
Convergence of the Conjugate Gradient Method

Theorem 5.2 (CG Convergence Bound)

Let $A \in \mathbb{R}^{n \times n}$ be symmetric positive definite with condition number $\kappa = \lambda_{\max}(A)/\lambda_{\min}(A)$. The conjugate gradient method produces iterates $x^{(k)}$ satisfying

$$\|x - x^{(k)}\|_A \leq 2 \left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^k \|x - x^{(0)}\|_A,$$

where $\|v\|_A = \sqrt{v^T A v}$ is the $A$-norm. Moreover, if $A$ has only $m$ distinct eigenvalues, CG terminates in at most $m$ iterations.
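The bound can be sanity-checked numerically with a plain CG implementation on a small SPD system. This is a sketch: `cg_a_norm_errors` is a hypothetical helper written for this check, and the test matrix is an arbitrary well-conditioned construction.

```python
import numpy as np

def cg_a_norm_errors(A, b, x_star, iters):
    # Plain conjugate gradient, recording ||x - x^(k)||_A at each step.
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    a_norm = lambda v: np.sqrt(v @ (A @ v))
    errs = [a_norm(x_star - x)]
    for _ in range(iters):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
        errs.append(a_norm(x_star - x))
    return errs

rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # SPD with moderate condition number
x_star = rng.standard_normal(n)
b = A @ x_star

lam = np.linalg.eigvalsh(A)
kappa = lam[-1] / lam[0]
rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
errs = cg_a_norm_errors(A, b, x_star, 20)
# every iterate should satisfy ||e^(k)||_A <= 2 rho^k ||e^(0)||_A
```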


Proof

CG optimality. By construction, $x^{(k)} = x^{(0)} + p$ where $p$ minimizes $\phi(x^{(0)} + p) = \frac{1}{2}(x^{(0)} + p)^T A (x^{(0)} + p) - b^T(x^{(0)} + p)$ over $p \in \mathcal{K}_k(A, r^{(0)})$. Since $\phi(y) = \frac{1}{2}\|x - y\|_A^2 - \frac{1}{2}x^T A x$, this is equivalent to saying that $x^{(k)}$ minimizes $\|x - y\|_A$ over $y \in x^{(0)} + \mathcal{K}_k$.

Since $\mathcal{K}_k(A, r^{(0)}) = \{p(A) r^{(0)} : p \in \mathcal{P}_{k-1}\}$, where $\mathcal{P}_{k-1}$ is the set of polynomials of degree $\leq k - 1$, and $r^{(0)} = b - Ax^{(0)} = Ae^{(0)}$:

$$e^{(k)} = e^{(0)} - p(A)r^{(0)} = e^{(0)} - p(A)Ae^{(0)} = (I - p(A)A)e^{(0)} = q(A)e^{(0)},$$

where $e^{(k)} = x - x^{(k)}$ and $q(t) = 1 - tp(t) \in \mathcal{P}_k$ satisfies $q(0) = 1$.

Minimization over polynomials. Therefore:

$$\|e^{(k)}\|_A = \min_{q \in \mathcal{P}_k,\; q(0)=1} \|q(A)e^{(0)}\|_A.$$

Let $A = Q \Lambda Q^T$ with eigenvalues $\lambda_1, \ldots, \lambda_n$. Write $e^{(0)} = Qc$:

$$\|q(A)e^{(0)}\|_A^2 = \sum_{i=1}^n \lambda_i\, q(\lambda_i)^2 c_i^2 \leq \max_i q(\lambda_i)^2 \sum_i \lambda_i c_i^2 = \max_i q(\lambda_i)^2 \cdot \|e^{(0)}\|_A^2.$$

Thus $\|e^{(k)}\|_A \leq \min_{q(0)=1} \max_{\lambda \in \sigma(A)} |q(\lambda)| \cdot \|e^{(0)}\|_A$.
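This inequality holds for any particular polynomial with $q(0) = 1$ and is easy to check numerically. A small sketch using $q(t) = (1 - t/c)^2$, where the shift $c$ is an arbitrary choice made for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)          # SPD test matrix
e0 = rng.standard_normal(n)

lam = np.linalg.eigvalsh(A)
c = lam.mean()                   # arbitrary shift; q(0) = 1 by construction
B = np.eye(n) - A / c
qA = B @ B                       # q(A) for q(t) = (1 - t/c)^2

a_norm = lambda v: np.sqrt(v @ (A @ v))
lhs = a_norm(qA @ e0)
# here q(lambda) = (1 - lambda/c)^2 >= 0, so |q(lambda)| = (1 - lambda/c)^2
rhs = np.max((1 - lam / c) ** 2) * a_norm(e0)
```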

Chebyshev bound. To bound $\min_{q(0)=1} \max_{t \in [\lambda_{\min}, \lambda_{\max}]} |q(t)|$, use the shifted Chebyshev polynomial

$$q_k(t) = T_k\!\left(\frac{\lambda_{\max} + \lambda_{\min} - 2t}{\lambda_{\max} - \lambda_{\min}}\right) \Big/\, T_k\!\left(\frac{\lambda_{\max} + \lambda_{\min}}{\lambda_{\max} - \lambda_{\min}}\right),$$

which satisfies $q_k(0) = 1$ and $\max_{[\lambda_{\min}, \lambda_{\max}]} |q_k| = 1/T_k\!\left(\frac{\kappa + 1}{\kappa - 1}\right)$.

Using $T_k(x) = \frac{1}{2}\left[(x + \sqrt{x^2-1})^k + (x - \sqrt{x^2-1})^k\right]$ for $x \geq 1$, and noting that $x = \frac{\kappa+1}{\kappa-1}$ gives $x + \sqrt{x^2-1} = \frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}$, we obtain $T_k\!\left(\frac{\kappa+1}{\kappa-1}\right) \geq \frac{1}{2}\left(\frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}\right)^k$, hence

$$\frac{1}{T_k\!\left(\frac{\kappa+1}{\kappa-1}\right)} \leq 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k.$$
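The inequality can be confirmed numerically using the equivalent hyperbolic form $T_k(x) = \cosh(k \operatorname{arccosh} x)$, valid for $x \geq 1$ (a sketch with an arbitrary $\kappa$):

```python
import numpy as np

def cheb_T(k, x):
    # T_k(x) = cosh(k * arccosh(x)) for x >= 1; equivalent to the
    # closed form used in the proof
    return np.cosh(k * np.arccosh(x))

kappa = 100.0
x = (kappa + 1) / (kappa - 1)
rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)

lhs = [1.0 / cheb_T(k, x) for k in range(1, 25)]
rhs = [2 * rho**k for k in range(1, 25)]
# lhs[k] <= rhs[k] for every degree k
```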

Finite termination. If $A$ has $m$ distinct eigenvalues $\mu_1, \ldots, \mu_m$, the polynomial $q(t) = \prod_{j=1}^m (1 - t/\mu_j)$ has degree $m$, satisfies $q(0) = 1$, and $q(\mu_j) = 0$ for all $j$. Hence $\|e^{(m)}\|_A = 0$. $\square$

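Finite termination is easy to observe in practice. A sketch: build an SPD matrix with exactly three distinct eigenvalues via a random orthogonal similarity; in exact arithmetic CG needs at most three steps, and floating point comes very close.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.repeat([1.0, 5.0, 25.0], n // 3)   # exactly m = 3 distinct eigenvalues
A = Q @ np.diag(lam) @ Q.T
b = rng.standard_normal(n)

# plain CG, counting iterations to a small relative residual
x = np.zeros(n)
r = b.copy()
p = r.copy()
iters = 0
while np.linalg.norm(r) > 1e-8 * np.linalg.norm(b):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x += alpha * p
    r_new = r - alpha * Ap
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new
    iters += 1
```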

Example (Effect of Preconditioning on CG Convergence)

For the 2D Laplacian on an $N \times N$ grid, $\kappa = O(N^2)$, so unpreconditioned CG needs $O(\sqrt{\kappa}) = O(N)$ iterations. With incomplete Cholesky preconditioning, $\kappa_{\text{eff}} = O(N)$, reducing this to $O(\sqrt{N})$ iterations. With multigrid preconditioning, $\kappa_{\text{eff}} = O(1)$, giving $O(1)$ CG iterations per solve, i.e., optimal $O(N^2)$ total work for $N^2$ unknowns.
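The $\kappa = O(N^2)$ scaling follows from the known eigenvalues of the 5-point discrete Laplacian with Dirichlet boundary conditions, $\lambda_{ij} = 4\sin^2\!\frac{i\pi}{2(N+1)} + 4\sin^2\!\frac{j\pi}{2(N+1)}$, and can be checked directly without forming any matrix (a sketch):

```python
import numpy as np

def laplacian2d_condition(N):
    # Condition number of the 5-point 2D Laplacian on an N x N interior
    # grid, computed from its known eigenvalues (no matrix is formed).
    s2 = np.sin(np.arange(1, N + 1) * np.pi / (2 * (N + 1))) ** 2
    lam_min = 4 * s2[0] + 4 * s2[0]
    lam_max = 4 * s2[-1] + 4 * s2[-1]
    return lam_max / lam_min

kappas = [laplacian2d_condition(N) for N in (16, 32, 64)]
ratios = [kappas[1] / kappas[0], kappas[2] / kappas[1]]
# doubling N should roughly quadruple kappa, consistent with kappa = O(N^2)
```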

Remark (Superlinear Convergence)

In practice, CG often converges faster than the $\kappa$-bound suggests. If eigenvalues cluster into $m$ groups, CG behaves as if there are only $m$ distinct eigenvalues after an initial phase, leading to superlinear convergence. The precise rate depends on the eigenvalue distribution, not just the extreme eigenvalues.
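A sketch of this effect: two spectra with the same extreme eigenvalues ($\kappa = 10^4$), one spread uniformly and one concentrated in two tight clusters, give very different CG iteration counts. The matrix construction, cluster widths, and tolerance are all illustrative choices.

```python
import numpy as np

def cg_iteration_count(A, b, tol=1e-8):
    # Plain CG; returns iterations to reach relative residual tol
    # (capped at n iterations).
    n = len(b)
    x = np.zeros(n)
    r = b.copy()
    p = r.copy()
    k = 0
    while np.linalg.norm(r) > tol * np.linalg.norm(b) and k < n:
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
        k += 1
    return k

rng = np.random.default_rng(3)
n = 200
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
b = rng.standard_normal(n)

# same kappa = 1e4, different interior distributions
uniform = np.linspace(1.0, 1e4, n)
clustered = np.concatenate([1.0 + 1e-3 * rng.random(n // 2),
                            1e4 * (1.0 - 1e-7 * rng.random(n // 2))])

it_uniform = cg_iteration_count(Q @ np.diag(uniform) @ Q.T, b)
it_clustered = cg_iteration_count(Q @ np.diag(clustered) @ Q.T, b)
# the clustered spectrum converges in far fewer iterations
```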