
Singular Value Decomposition

The singular value decomposition (SVD) is arguably the most important matrix factorization in numerical linear algebra. It provides optimal low-rank approximations, reveals the rank and condition number, and is the basis for many algorithms in data science and scientific computing.


Definition and Properties

Definition 6.7 (Singular Value Decomposition)

Every matrix $A \in \mathbb{R}^{m \times n}$ (with $m \geq n$) has a factorization $A = U \Sigma V^T$, where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n) \in \mathbb{R}^{m \times n}$ with $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0$. The $\sigma_i$ are the singular values (the square roots of the eigenvalues of $A^T A$). The columns of $U$ and $V$ are the left and right singular vectors.
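The factorization and its relation to the eigenvalues of $A^T A$ can be checked numerically. A minimal NumPy sketch (the matrix here is an arbitrary random example):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # example with m = 5 >= n = 3

# Full SVD: U is 5x5, Vt is 3x3, s holds the 3 singular values
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n Sigma and verify A = U Sigma V^T
Sigma = np.zeros((5, 3))
np.fill_diagonal(Sigma, s)
assert np.allclose(A, U @ Sigma @ Vt)

# Singular values are non-increasing and equal the square roots
# of the eigenvalues of A^T A
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]  # sort descending
assert np.allclose(s, np.sqrt(eigvals))
```

Note that `np.linalg.svd` returns $V^T$ (as `Vt`), not $V$.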

Example (Geometric Interpretation)

The SVD says $A$ maps the unit sphere in $\mathbb{R}^n$ to an ellipsoid in $\mathbb{R}^m$. The semi-axes have lengths $\sigma_1, \ldots, \sigma_n$, oriented along $u_1, \ldots, u_n$. The directions $v_1, \ldots, v_n$ are the pre-images of these axes. Thus $\sigma_1 = \|A\|_2$ (the operator norm), $\sigma_n = 1/\|A^{-1}\|_2$ (if $A$ is square and invertible), and $\kappa_2(A) = \sigma_1/\sigma_n$.
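These three identities are easy to confirm on a random square matrix, which is invertible with probability one:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
s = np.linalg.svd(A, compute_uv=False)  # singular values only

# sigma_1 is the operator (spectral) 2-norm of A
assert np.isclose(s[0], np.linalg.norm(A, 2))

# sigma_n is the reciprocal of the norm of the inverse
assert np.isclose(s[-1], 1.0 / np.linalg.norm(np.linalg.inv(A), 2))

# kappa_2(A) = sigma_1 / sigma_n
assert np.isclose(s[0] / s[-1], np.linalg.cond(A, 2))
```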


Computation

Definition 6.8 (Golub-Kahan Bidiagonalization)

The standard SVD algorithm first reduces $A$ to upper bidiagonal form $B = U_1^T A V_1$ using Householder reflectors, at cost $O(mn^2)$. The SVD of $B$ is then computed iteratively using a variant of the QR algorithm applied implicitly to $B^T B$ (the Golub-Kahan-Reinsch algorithm). Each iteration costs $O(n)$, and convergence with Wilkinson-type shifts is cubic, giving a total cost of $O(mn^2)$ for the full SVD.
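The bidiagonalization phase can be sketched with explicit (unblocked) Householder reflectors. This is a minimal dense illustration, not the blocked form production libraries use; the function names are ours:

```python
import numpy as np

def householder(x):
    """Unit vector v such that (I - 2 v v^T) x is a multiple of e_1."""
    v = x.astype(float).copy()
    v[0] += (1.0 if v[0] >= 0 else -1.0) * np.linalg.norm(x)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def bidiagonalize(A):
    """Reduce A (m >= n) to upper bidiagonal B with A = U B V^T."""
    m, n = A.shape
    B = A.astype(float).copy()
    U, V = np.eye(m), np.eye(n)
    for j in range(n):
        # Left reflector: zero out B[j+1:, j] (below the diagonal)
        v = householder(B[j:, j])
        B[j:, j:] -= 2.0 * np.outer(v, v @ B[j:, j:])
        U[:, j:] -= 2.0 * np.outer(U[:, j:] @ v, v)
        if j < n - 2:
            # Right reflector: zero out B[j, j+2:] (past the superdiagonal)
            w = householder(B[j, j+1:])
            B[j:, j+1:] -= 2.0 * np.outer(B[j:, j+1:] @ w, w)
            V[:, j+1:] -= 2.0 * np.outer(V[:, j+1:] @ w, w)
    return U, B, V

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
U, B, V = bidiagonalize(A)
assert np.allclose(A, U @ B @ V.T)
# B is upper bidiagonal, and it has the same singular values as A
assert np.allclose(np.tril(B, -1), 0) and np.allclose(np.triu(B, 2), 0)
assert np.allclose(np.linalg.svd(B, compute_uv=False),
                   np.linalg.svd(A, compute_uv=False))
```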

Remark (Divide-and-Conquer SVD)

The divide-and-conquer approach splits the bidiagonal matrix into two halves plus a rank-1 update, recursively computes the SVDs of the halves, then merges them via a secular equation $1 + \sum_i \frac{d_i^2}{\sigma_i^2 - \lambda} = 0$. This achieves $O(n^2)$ average complexity for computing all singular values and vectors of a bidiagonal matrix, making it the fastest method in practice for the dense SVD.
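Both bidiagonal solvers are exposed through LAPACK: SciPy's `scipy.linalg.svd` selects between them via `lapack_driver` (`'gesdd'` is divide-and-conquer, `'gesvd'` is QR iteration). Assuming SciPy is available, the two drivers agree to rounding error:

```python
import numpy as np
from scipy.linalg import svd

rng = np.random.default_rng(3)
A = rng.standard_normal((300, 200))

# 'gesdd': divide-and-conquer (the default); 'gesvd': Golub-Kahan-Reinsch QR
s_dc = svd(A, compute_uv=False, lapack_driver='gesdd')
s_qr = svd(A, compute_uv=False, lapack_driver='gesvd')
assert np.allclose(s_dc, s_qr)
```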


Applications

Definition 6.9 (Low-Rank Approximation and Pseudoinverse)

The Eckart-Young theorem states that the best rank-$k$ approximation to $A$ (in both the Frobenius and 2-norms) is $A_k = \sum_{i=1}^k \sigma_i u_i v_i^T$, with errors $\|A - A_k\|_2 = \sigma_{k+1}$ and $\|A - A_k\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_n^2}$. The Moore-Penrose pseudoinverse is $A^+ = V \Sigma^+ U^T$, where $\Sigma^+ = \mathrm{diag}(\sigma_1^{-1}, \ldots, \sigma_r^{-1}, 0, \ldots, 0)$ and $r = \mathrm{rank}(A)$.
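Both the Eckart-Young error formulas and the SVD form of the pseudoinverse can be verified directly (the example matrix is generic, hence full rank, so no singular values are zeroed):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k truncation A_k = sum_{i=1}^k sigma_i u_i v_i^T
k = 2
Ak = (U[:, :k] * s[:k]) @ Vt[:k]

# 2-norm error is sigma_{k+1} (index k with 0-based numbering)
assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])
# Frobenius error is the root sum of squares of the discarded values
assert np.isclose(np.linalg.norm(A - Ak, 'fro'), np.sqrt(np.sum(s[k:]**2)))

# Pseudoinverse A^+ = V Sigma^+ U^T agrees with np.linalg.pinv
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
assert np.allclose(A_pinv, np.linalg.pinv(A))
```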

Example (Principal Component Analysis)

PCA is the SVD applied to the centered data matrix $X \in \mathbb{R}^{n \times p}$ (samples $\times$ features). The right singular vectors $v_j$ are the principal directions, and $\sigma_j^2/(n-1)$ are the corresponding variances. Truncating to $k$ components via $X_k = U_k \Sigma_k V_k^T$ gives the optimal $k$-dimensional approximation, capturing a fraction $\sum_{j=1}^k \sigma_j^2 / \sum_{j=1}^p \sigma_j^2$ of the total variance.
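The correspondence between SVD-based PCA and the eigendecomposition of the sample covariance matrix can be sketched as follows (the data is a random correlated example):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 4)) @ rng.standard_normal((4, 4))  # correlated features
Xc = X - X.mean(axis=0)  # center each feature (column)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# sigma_j^2 / (n - 1) are the eigenvalues of the sample covariance matrix
pc_var = s**2 / (Xc.shape[0] - 1)
cov_eig = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
assert np.allclose(pc_var, cov_eig)

# Fraction of total variance captured by the first k components
k = 2
explained = np.sum(s[:k]**2) / np.sum(s**2)
```

The rows of `Vt` are the principal directions $v_j$; `np.cov` divides by $n-1$ by default, matching the $\sigma_j^2/(n-1)$ convention.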