Let $A \in \mathbb{R}^{n \times n}$ be symmetric positive definite with condition number $\kappa = \lambda_{\max}(A)/\lambda_{\min}(A)$. The conjugate gradient method produces iterates $x^{(k)}$ satisfying:
$$\|x - x^{(k)}\|_A \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k} \|x - x^{(0)}\|_A$$
where $\|v\|_A = \sqrt{v^T A v}$ is the $A$-norm. Moreover, if $A$ has only $m$ distinct eigenvalues, CG terminates in at most $m$ iterations.
Proof
CG optimality. By construction, $x^{(k)} = x^{(0)} + p$ where $p$ minimizes $\phi(x^{(0)}+p) = \frac{1}{2}(x^{(0)}+p)^T A (x^{(0)}+p) - b^T (x^{(0)}+p)$ over $p \in \mathcal{K}_k(A, r^{(0)})$. Equivalently, $x^{(k)}$ minimizes $\|x - y\|_A$ over $y \in x^{(0)} + \mathcal{K}_k$.
Since $\mathcal{K}_k(A, r^{(0)}) = \{p(A)\, r^{(0)} : p \in \mathcal{P}_{k-1}\}$, where $\mathcal{P}_{k-1}$ is the set of polynomials of degree $\le k-1$, and $r^{(0)} = b - Ax^{(0)} = Ae^{(0)}$:
$$e^{(k)} = e^{(0)} - p(A)\, r^{(0)} = e^{(0)} - p(A)\, A e^{(0)} = (I - p(A)A)\, e^{(0)} = q(A)\, e^{(0)}$$
where $e^{(k)} = x - x^{(k)}$ and $q(t) = 1 - t\, p(t) \in \mathcal{P}_k$ satisfies $q(0) = 1$.
Minimization over polynomials. Therefore:
$$\|e^{(k)}\|_A = \min_{q \in \mathcal{P}_k,\ q(0)=1} \|q(A)\, e^{(0)}\|_A$$
Let $A = Q \Lambda Q^T$ with eigenvalues $\lambda_1, \dots, \lambda_n$. Write $e^{(0)} = Qc$:
$$\|q(A)\, e^{(0)}\|_A^2 = \sum_{i=1}^n \lambda_i\, q(\lambda_i)^2 c_i^2 \le \max_i q(\lambda_i)^2 \sum_i \lambda_i c_i^2 = \max_i q(\lambda_i)^2\, \|e^{(0)}\|_A^2$$
Chebyshev bound. To bound $\min_{q(0)=1} \max_{t \in [\lambda_{\min}, \lambda_{\max}]} |q(t)|$, use the shifted Chebyshev polynomial
$$q_k(t) = T_k\!\left(\frac{\lambda_{\max} + \lambda_{\min} - 2t}{\lambda_{\max} - \lambda_{\min}}\right) \Big/\ T_k\!\left(\frac{\lambda_{\max} + \lambda_{\min}}{\lambda_{\max} - \lambda_{\min}}\right)$$
which satisfies $q_k(0) = 1$ and $\max_{[\lambda_{\min}, \lambda_{\max}]} |q_k| = 1/T_k\!\left(\frac{\kappa+1}{\kappa-1}\right)$.
Using $T_k(x) = \frac{1}{2}\left[(x + \sqrt{x^2-1})^k + (x - \sqrt{x^2-1})^k\right]$ and noting that at $x = \frac{\kappa+1}{\kappa-1}$ we have $x + \sqrt{x^2-1} = \frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}$, so that $T_k\!\left(\frac{\kappa+1}{\kappa-1}\right) \ge \frac{1}{2}\left(\frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}\right)^k$:
$$\frac{1}{T_k\!\left(\frac{\kappa+1}{\kappa-1}\right)} \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k$$
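This inequality can be checked numerically using the identity $T_k(x) = \cosh(k \operatorname{arccosh} x)$ for $x \ge 1$ (a quick sketch; the $\kappa$ values are arbitrary choices):

```python
import numpy as np

def cheb_T(k, x):
    # For x >= 1, T_k(x) = cosh(k * arccosh(x))
    return np.cosh(k * np.arccosh(x))

holds = True
for kappa in (4.0, 25.0, 100.0, 1e4):
    s = np.sqrt(kappa)
    for k in range(1, 40):
        lhs = 1.0 / cheb_T(k, (kappa + 1) / (kappa - 1))
        rhs = 2 * ((s - 1) / (s + 1)) ** k
        holds = holds and lhs <= rhs * (1 + 1e-12)
```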
Finite termination. If $A$ has $m$ distinct eigenvalues $\mu_1, \dots, \mu_m$, the polynomial $q(t) = \prod_{j=1}^m (1 - t/\mu_j)$ has degree $m$, satisfies $q(0) = 1$, and $q(\mu_j) = 0$ for all $j$. Hence $\|e^{(m)}\|_A = 0$. $\square$
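The finite-termination property is easy to observe in floating point. Below is a sketch with an illustrative test matrix ($n = 40$, $m = 4$ prescribed eigenvalues; all specific values are assumptions for the demo):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 40, 4
mus = np.array([1.0, 3.0, 7.0, 20.0])   # the m distinct eigenvalues
lam = mus[rng.integers(0, m, size=n)]   # spectrum drawn from only m values
lam[:m] = mus                           # make sure every mu_j appears
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(lam) @ Q.T
b = rng.standard_normal(n)
x_true = np.linalg.solve(A, b)

# m steps of plain CG from x^(0) = 0
x, r = np.zeros(n), b.copy()
p = r.copy()
for _ in range(m):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x = x + alpha * p
    r_new = r - alpha * Ap
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new

# Relative A-norm error after exactly m steps: machine-precision small
rel_err = np.sqrt((x_true - x) @ A @ (x_true - x) / (x_true @ A @ x_true))
```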
Example: Effect of Preconditioning on CG Convergence
For the 2D Laplacian on an $N \times N$ grid: $\kappa = O(N^2)$, so unpreconditioned CG needs $O(N)$ iterations. With incomplete Cholesky preconditioning: $\kappa_{\text{eff}} = O(N)$, reducing to $O(\sqrt{N})$ iterations. With multigrid preconditioning: $\kappa_{\text{eff}} = O(1)$, giving $O(1)$ CG iterations per solve, i.e., optimal $O(N^2)$ total work for $N^2$ unknowns.
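The unpreconditioned $O(N)$ scaling can be observed directly. The sketch below uses a dense 5-point Laplacian for simplicity (grid sizes and the constant right-hand side are illustrative; a sparse format would be used in practice):

```python
import numpy as np

def laplacian_2d(N):
    # 5-point finite-difference Laplacian on an N x N interior grid (Dirichlet)
    T = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
    return np.kron(np.eye(N), T) + np.kron(T, np.eye(N))

def cg_iteration_count(A, b, tol=1e-8):
    # Iterations until the relative residual drops below tol
    x, r = np.zeros_like(b), b.copy()
    p, rs = r.copy(), b @ b
    k = 0
    while np.sqrt(rs) > tol * np.linalg.norm(b):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
        k += 1
    return k

iters = {N: cg_iteration_count(laplacian_2d(N), np.ones(N * N))
         for N in (8, 16, 32)}
```

Doubling the grid dimension roughly doubles the iteration count, consistent with $\kappa = O(N^2)$ and the $\sqrt{\kappa}$ dependence of the bound.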
Remark: Superlinear Convergence
In practice, CG often converges faster than the $\kappa$-bound suggests. If the eigenvalues cluster into $m$ groups, CG behaves after an initial phase as if there were only $m$ distinct eigenvalues, leading to superlinear convergence. The precise rate depends on the full eigenvalue distribution, not just the extreme eigenvalues.
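The effect of clustering can be isolated by comparing two spectra with the same extreme eigenvalues. Since CG convergence depends only on the spectrum and the eigencomponents of the initial error, diagonal matrices suffice for the sketch below (the cluster widths and $\kappa = 10^3$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

def cg_iteration_count(lam, tol=1e-10):
    # Iterations of plain CG on diag(lam) until the relative residual < tol
    A, b = np.diag(lam), rng.standard_normal(lam.size)
    x, r = np.zeros(lam.size), b.copy()
    p, rs = r.copy(), b @ b
    k = 0
    while np.sqrt(rs) > tol * np.linalg.norm(b):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
        k += 1
    return k

# Same kappa = 1000, very different interior distributions
spread = np.linspace(1.0, 1000.0, n)
clustered = np.concatenate([1.0 + 0.01 * rng.random(n // 2),
                            1000.0 - 0.01 * rng.random(n // 2)])
few = cg_iteration_count(clustered)    # two tight clusters
many = cg_iteration_count(spread)      # uniformly spread spectrum
```

The clustered spectrum converges in a handful of iterations, while the spread spectrum behaves like the worst-case $\sqrt{\kappa}$ bound.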