
Convergence of the Conjugate Gradient Method

Theorem 5.2 (CG Convergence Bound)

Let $A \in \mathbb{R}^{n \times n}$ be symmetric positive definite with condition number $\kappa = \lambda_{\max}(A)/\lambda_{\min}(A)$. The conjugate gradient method produces iterates $x^{(k)}$ satisfying

$$\|x - x^{(k)}\|_A \leq 2 \left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^k \|x - x^{(0)}\|_A,$$

where $\|v\|_A = \sqrt{v^T A v}$ is the $A$-norm. Moreover, if $A$ has only $m$ distinct eigenvalues, CG terminates in at most $m$ iterations.
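The bound can be sanity-checked numerically with a plain CG implementation on a small SPD system. This is a sketch: `cg_a_norm_errors` is a hypothetical helper written for this check, and the test matrix is an arbitrary well-conditioned construction.

```python
import numpy as np

def cg_a_norm_errors(A, b, x_star, iters):
    # Plain conjugate gradient, recording ||x - x^(k)||_A at each step.
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    a_norm = lambda v: np.sqrt(v @ (A @ v))
    errs = [a_norm(x_star - x)]
    for _ in range(iters):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
        errs.append(a_norm(x_star - x))
    return errs

rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # SPD with moderate condition number
x_star = rng.standard_normal(n)
b = A @ x_star

lam = np.linalg.eigvalsh(A)
kappa = lam[-1] / lam[0]
rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
errs = cg_a_norm_errors(A, b, x_star, 20)
# every iterate should satisfy ||e^(k)||_A <= 2 rho^k ||e^(0)||_A
```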


Proof

CG optimality. By construction, $x^{(k)} = x^{(0)} + p$ where $p$ minimizes $\phi(x^{(0)} + p) = \frac{1}{2}(x^{(0)} + p)^T A (x^{(0)} + p) - b^T(x^{(0)} + p)$ over $p \in \mathcal{K}_k(A, r^{(0)})$. Since $\phi(y) = \frac{1}{2}\|x - y\|_A^2 - \frac{1}{2}x^T A x$, this is equivalent to saying that $x^{(k)}$ minimizes $\|x - y\|_A$ over $y \in x^{(0)} + \mathcal{K}_k$.

Since $\mathcal{K}_k(A, r^{(0)}) = \{p(A) r^{(0)} : p \in \mathcal{P}_{k-1}\}$, where $\mathcal{P}_{k-1}$ is the set of polynomials of degree $\leq k - 1$, and $r^{(0)} = b - Ax^{(0)} = Ae^{(0)}$:

$$e^{(k)} = e^{(0)} - p(A)r^{(0)} = e^{(0)} - p(A)Ae^{(0)} = (I - p(A)A)e^{(0)} = q(A)e^{(0)},$$

where $e^{(k)} = x - x^{(k)}$ and $q(t) = 1 - tp(t) \in \mathcal{P}_k$ satisfies $q(0) = 1$.

Minimization over polynomials. Therefore:

$$\|e^{(k)}\|_A = \min_{q \in \mathcal{P}_k,\; q(0)=1} \|q(A)e^{(0)}\|_A.$$

Let $A = Q \Lambda Q^T$ with eigenvalues $\lambda_1, \ldots, \lambda_n$. Write $e^{(0)} = Qc$:

$$\|q(A)e^{(0)}\|_A^2 = \sum_{i=1}^n \lambda_i\, q(\lambda_i)^2 c_i^2 \leq \max_i q(\lambda_i)^2 \sum_i \lambda_i c_i^2 = \max_i q(\lambda_i)^2 \cdot \|e^{(0)}\|_A^2.$$

Thus $\|e^{(k)}\|_A \leq \min_{q(0)=1} \max_{\lambda \in \sigma(A)} |q(\lambda)| \cdot \|e^{(0)}\|_A$.
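This inequality holds for any particular polynomial with $q(0) = 1$ and is easy to check numerically. A small sketch using $q(t) = (1 - t/c)^2$, where the shift $c$ is an arbitrary choice made for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)          # SPD test matrix
e0 = rng.standard_normal(n)

lam = np.linalg.eigvalsh(A)
c = lam.mean()                   # arbitrary shift; q(0) = 1 by construction
B = np.eye(n) - A / c
qA = B @ B                       # q(A) for q(t) = (1 - t/c)^2

a_norm = lambda v: np.sqrt(v @ (A @ v))
lhs = a_norm(qA @ e0)
# here q(lambda) = (1 - lambda/c)^2 >= 0, so |q(lambda)| = (1 - lambda/c)^2
rhs = np.max((1 - lam / c) ** 2) * a_norm(e0)
```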

Chebyshev bound. To bound $\min_{q(0)=1} \max_{t \in [\lambda_{\min}, \lambda_{\max}]} |q(t)|$, use the shifted Chebyshev polynomial

$$q_k(t) = T_k\!\left(\frac{\lambda_{\max} + \lambda_{\min} - 2t}{\lambda_{\max} - \lambda_{\min}}\right) \Big/\, T_k\!\left(\frac{\lambda_{\max} + \lambda_{\min}}{\lambda_{\max} - \lambda_{\min}}\right),$$

which satisfies $q_k(0) = 1$ and $\max_{[\lambda_{\min}, \lambda_{\max}]} |q_k| = 1/T_k\!\left(\frac{\kappa + 1}{\kappa - 1}\right)$.

Using $T_k(x) = \frac{1}{2}\left[(x + \sqrt{x^2-1})^k + (x - \sqrt{x^2-1})^k\right]$ for $x \geq 1$, and noting that $x = \frac{\kappa+1}{\kappa-1}$ gives $x + \sqrt{x^2-1} = \frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}$, we obtain $T_k\!\left(\frac{\kappa+1}{\kappa-1}\right) \geq \frac{1}{2}\left(\frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}\right)^k$, hence

$$\frac{1}{T_k\!\left(\frac{\kappa+1}{\kappa-1}\right)} \leq 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k.$$
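The inequality can be confirmed numerically using the equivalent hyperbolic form $T_k(x) = \cosh(k \operatorname{arccosh} x)$, valid for $x \geq 1$ (a sketch with an arbitrary $\kappa$):

```python
import numpy as np

def cheb_T(k, x):
    # T_k(x) = cosh(k * arccosh(x)) for x >= 1; equivalent to the
    # closed form used in the proof
    return np.cosh(k * np.arccosh(x))

kappa = 100.0
x = (kappa + 1) / (kappa - 1)
rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)

lhs = [1.0 / cheb_T(k, x) for k in range(1, 25)]
rhs = [2 * rho**k for k in range(1, 25)]
# lhs[k] <= rhs[k] for every degree k
```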

Finite termination. If $A$ has $m$ distinct eigenvalues $\mu_1, \ldots, \mu_m$, the polynomial $q(t) = \prod_{j=1}^m (1 - t/\mu_j)$ has degree $m$, satisfies $q(0) = 1$, and $q(\mu_j) = 0$ for all $j$. Hence $\|e^{(m)}\|_A = 0$. $\square$

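Finite termination is easy to observe in practice. A sketch: build an SPD matrix with exactly three distinct eigenvalues via a random orthogonal similarity; in exact arithmetic CG needs at most three steps, and floating point comes very close.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.repeat([1.0, 5.0, 25.0], n // 3)   # exactly m = 3 distinct eigenvalues
A = Q @ np.diag(lam) @ Q.T
b = rng.standard_normal(n)

# plain CG, counting iterations to a small relative residual
x = np.zeros(n)
r = b.copy()
p = r.copy()
iters = 0
while np.linalg.norm(r) > 1e-8 * np.linalg.norm(b):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x += alpha * p
    r_new = r - alpha * Ap
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new
    iters += 1
```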

Example (Effect of Preconditioning on CG Convergence)

For the 2D Laplacian on an $N \times N$ grid, $\kappa = O(N^2)$, so unpreconditioned CG needs $O(\sqrt{\kappa}) = O(N)$ iterations. With incomplete Cholesky preconditioning, $\kappa_{\text{eff}} = O(N)$, reducing this to $O(\sqrt{N})$ iterations. With multigrid preconditioning, $\kappa_{\text{eff}} = O(1)$, giving $O(1)$ CG iterations per solve, i.e., optimal $O(N^2)$ total work for $N^2$ unknowns.
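The $\kappa = O(N^2)$ scaling follows from the known eigenvalues of the 5-point discrete Laplacian with Dirichlet boundary conditions, $\lambda_{ij} = 4\sin^2\!\frac{i\pi}{2(N+1)} + 4\sin^2\!\frac{j\pi}{2(N+1)}$, and can be checked directly without forming any matrix (a sketch):

```python
import numpy as np

def laplacian2d_condition(N):
    # Condition number of the 5-point 2D Laplacian on an N x N interior
    # grid, computed from its known eigenvalues (no matrix is formed).
    s2 = np.sin(np.arange(1, N + 1) * np.pi / (2 * (N + 1))) ** 2
    lam_min = 4 * s2[0] + 4 * s2[0]
    lam_max = 4 * s2[-1] + 4 * s2[-1]
    return lam_max / lam_min

kappas = [laplacian2d_condition(N) for N in (16, 32, 64)]
ratios = [kappas[1] / kappas[0], kappas[2] / kappas[1]]
# doubling N should roughly quadruple kappa, consistent with kappa = O(N^2)
```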

Remark (Superlinear Convergence)

In practice, CG often converges faster than the $\kappa$-bound suggests. If eigenvalues cluster into $m$ groups, CG behaves as if there are only $m$ distinct eigenvalues after an initial phase, leading to superlinear convergence. The precise rate depends on the eigenvalue distribution, not just the extreme eigenvalues.
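A sketch of this effect: two spectra with the same extreme eigenvalues ($\kappa = 10^4$), one spread uniformly and one concentrated in two tight clusters, give very different CG iteration counts. The matrix construction, cluster widths, and tolerance are all illustrative choices.

```python
import numpy as np

def cg_iteration_count(A, b, tol=1e-8):
    # Plain CG; returns iterations to reach relative residual tol
    # (capped at n iterations).
    n = len(b)
    x = np.zeros(n)
    r = b.copy()
    p = r.copy()
    k = 0
    while np.linalg.norm(r) > tol * np.linalg.norm(b) and k < n:
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
        k += 1
    return k

rng = np.random.default_rng(3)
n = 200
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
b = rng.standard_normal(n)

# same kappa = 1e4, different interior distributions
uniform = np.linspace(1.0, 1e4, n)
clustered = np.concatenate([1.0 + 1e-3 * rng.random(n // 2),
                            1e4 * (1.0 - 1e-7 * rng.random(n // 2))])

it_uniform = cg_iteration_count(Q @ np.diag(uniform) @ Q.T, b)
it_clustered = cg_iteration_count(Q @ np.diag(clustered) @ Q.T, b)
# the clustered spectrum converges in far fewer iterations
```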