
Singular Value Decomposition

The singular value decomposition (SVD) is arguably the most important matrix factorization in numerical linear algebra. It provides optimal low-rank approximations, reveals the rank and condition number, and is the basis for many algorithms in data science and scientific computing.


Definition and Properties

Definition 6.7 (Singular Value Decomposition)

Every matrix $A \in \mathbb{R}^{m \times n}$ (with $m \geq n$) has a factorization $A = U \Sigma V^T$, where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n) \in \mathbb{R}^{m \times n}$ with $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0$. The $\sigma_i$ are the singular values (the square roots of the eigenvalues of $A^T A$). The columns of $U$ and $V$ are the left and right singular vectors.
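The factorization and its relation to the eigenvalues of $A^T A$ can be checked numerically. A minimal NumPy sketch (the matrix here is an arbitrary random example):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # example with m = 5 >= n = 3

# Full SVD: U is 5x5, Vt is 3x3, s holds the 3 singular values
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n Sigma and verify A = U Sigma V^T
Sigma = np.zeros((5, 3))
np.fill_diagonal(Sigma, s)
assert np.allclose(A, U @ Sigma @ Vt)

# Singular values are non-increasing and equal the square roots
# of the eigenvalues of A^T A
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]  # sort descending
assert np.allclose(s, np.sqrt(eigvals))
```

Note that `np.linalg.svd` returns $V^T$ (as `Vt`), not $V$.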

Example (Geometric Interpretation)

The SVD says $A$ maps the unit sphere in $\mathbb{R}^n$ to an ellipsoid in $\mathbb{R}^m$. The semi-axes have lengths $\sigma_1, \ldots, \sigma_n$, oriented along $u_1, \ldots, u_n$. The directions $v_1, \ldots, v_n$ are the pre-images of these axes. Thus $\sigma_1 = \|A\|_2$ (the operator norm), $\sigma_n = 1/\|A^{-1}\|_2$ (if $A$ is square and invertible), and $\kappa_2(A) = \sigma_1/\sigma_n$.
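These three identities are easy to confirm on a random square matrix, which is invertible with probability one:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
s = np.linalg.svd(A, compute_uv=False)  # singular values only

# sigma_1 is the operator (spectral) 2-norm of A
assert np.isclose(s[0], np.linalg.norm(A, 2))

# sigma_n is the reciprocal of the norm of the inverse
assert np.isclose(s[-1], 1.0 / np.linalg.norm(np.linalg.inv(A), 2))

# kappa_2(A) = sigma_1 / sigma_n
assert np.isclose(s[0] / s[-1], np.linalg.cond(A, 2))
```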


Computation

Definition 6.8 (Golub-Kahan Bidiagonalization)

The standard SVD algorithm first reduces $A$ to upper bidiagonal form $B = U_1^T A V_1$ using Householder reflectors, at cost $O(mn^2)$. The SVD of $B$ is then computed iteratively using a variant of the QR algorithm applied implicitly to $B^T B$ (the Golub-Kahan-Reinsch algorithm). Each iteration costs $O(n)$, and convergence with Wilkinson-type shifts is cubic, giving a total cost of $O(mn^2)$ for the full SVD.
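The bidiagonalization phase can be sketched with explicit (unblocked) Householder reflectors. This is a minimal dense illustration, not the blocked form production libraries use; the function names are ours:

```python
import numpy as np

def householder(x):
    """Unit vector v such that (I - 2 v v^T) x is a multiple of e_1."""
    v = x.astype(float).copy()
    v[0] += (1.0 if v[0] >= 0 else -1.0) * np.linalg.norm(x)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def bidiagonalize(A):
    """Reduce A (m >= n) to upper bidiagonal B with A = U B V^T."""
    m, n = A.shape
    B = A.astype(float).copy()
    U, V = np.eye(m), np.eye(n)
    for j in range(n):
        # Left reflector: zero out B[j+1:, j] (below the diagonal)
        v = householder(B[j:, j])
        B[j:, j:] -= 2.0 * np.outer(v, v @ B[j:, j:])
        U[:, j:] -= 2.0 * np.outer(U[:, j:] @ v, v)
        if j < n - 2:
            # Right reflector: zero out B[j, j+2:] (past the superdiagonal)
            w = householder(B[j, j+1:])
            B[j:, j+1:] -= 2.0 * np.outer(B[j:, j+1:] @ w, w)
            V[:, j+1:] -= 2.0 * np.outer(V[:, j+1:] @ w, w)
    return U, B, V

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
U, B, V = bidiagonalize(A)
assert np.allclose(A, U @ B @ V.T)
# B is upper bidiagonal, and it has the same singular values as A
assert np.allclose(np.tril(B, -1), 0) and np.allclose(np.triu(B, 2), 0)
assert np.allclose(np.linalg.svd(B, compute_uv=False),
                   np.linalg.svd(A, compute_uv=False))
```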

Remark (Divide-and-Conquer SVD)

The divide-and-conquer approach splits the bidiagonal matrix into two halves plus a rank-1 update, recursively computes the SVDs of the halves, then merges them via a secular equation $1 + \sum_i \frac{d_i^2}{\sigma_i^2 - \lambda} = 0$. This achieves $O(n^2)$ average complexity for computing all singular values and vectors of a bidiagonal matrix, making it the fastest method in practice for the dense SVD.
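Both bidiagonal solvers are exposed through LAPACK: SciPy's `scipy.linalg.svd` selects between them via `lapack_driver` (`'gesdd'` is divide-and-conquer, `'gesvd'` is QR iteration). Assuming SciPy is available, the two drivers agree to rounding error:

```python
import numpy as np
from scipy.linalg import svd

rng = np.random.default_rng(3)
A = rng.standard_normal((300, 200))

# 'gesdd': divide-and-conquer (the default); 'gesvd': Golub-Kahan-Reinsch QR
s_dc = svd(A, compute_uv=False, lapack_driver='gesdd')
s_qr = svd(A, compute_uv=False, lapack_driver='gesvd')
assert np.allclose(s_dc, s_qr)
```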


Applications

Definition 6.9 (Low-Rank Approximation and Pseudoinverse)

The Eckart-Young theorem states that the best rank-$k$ approximation to $A$ (in both the Frobenius and 2-norms) is $A_k = \sum_{i=1}^k \sigma_i u_i v_i^T$, with errors $\|A - A_k\|_2 = \sigma_{k+1}$ and $\|A - A_k\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_n^2}$. The Moore-Penrose pseudoinverse is $A^+ = V \Sigma^+ U^T$, where $\Sigma^+ = \mathrm{diag}(\sigma_1^{-1}, \ldots, \sigma_r^{-1}, 0, \ldots, 0)$ and $r = \mathrm{rank}(A)$.
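Both the Eckart-Young error formulas and the SVD form of the pseudoinverse can be verified directly (the example matrix is generic, hence full rank, so no singular values are zeroed):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k truncation A_k = sum_{i=1}^k sigma_i u_i v_i^T
k = 2
Ak = (U[:, :k] * s[:k]) @ Vt[:k]

# 2-norm error is sigma_{k+1} (index k with 0-based numbering)
assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])
# Frobenius error is the root sum of squares of the discarded values
assert np.isclose(np.linalg.norm(A - Ak, 'fro'), np.sqrt(np.sum(s[k:]**2)))

# Pseudoinverse A^+ = V Sigma^+ U^T agrees with np.linalg.pinv
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
assert np.allclose(A_pinv, np.linalg.pinv(A))
```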

Example (Principal Component Analysis)

PCA is the SVD applied to the centered data matrix $X \in \mathbb{R}^{n \times p}$ (samples $\times$ features). The right singular vectors $v_j$ are the principal directions, and $\sigma_j^2/(n-1)$ are the corresponding variances. Truncating to $k$ components via $X_k = U_k \Sigma_k V_k^T$ gives the optimal $k$-dimensional approximation, capturing a fraction $\sum_{j=1}^k \sigma_j^2 / \sum_{j=1}^p \sigma_j^2$ of the total variance.
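The correspondence between SVD-based PCA and the eigendecomposition of the sample covariance matrix can be sketched as follows (the data is a random correlated example):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 4)) @ rng.standard_normal((4, 4))  # correlated features
Xc = X - X.mean(axis=0)  # center each feature (column)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# sigma_j^2 / (n - 1) are the eigenvalues of the sample covariance matrix
pc_var = s**2 / (Xc.shape[0] - 1)
cov_eig = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
assert np.allclose(pc_var, cov_eig)

# Fraction of total variance captured by the first k components
k = 2
explained = np.sum(s[:k]**2) / np.sum(s**2)
```

The rows of `Vt` are the principal directions $v_j$; `np.cov` divides by $n-1$ by default, matching the $\sigma_j^2/(n-1)$ convention.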