Singular Value Decomposition

The singular value decomposition (SVD) extends the spectral theorem to non-square and non-symmetric matrices. It is arguably the most important matrix factorization in applied mathematics.

Theorem (Singular Value Decomposition)

Every $m \times n$ matrix $A$ (real, or complex with $V^T$ replaced by the conjugate transpose $V^*$) can be factored as
$$A = U \Sigma V^T$$

where:

  • $U$ is $m \times m$ orthogonal (columns are left singular vectors)
  • $\Sigma$ is $m \times n$ diagonal with non-negative entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$ (the singular values)
  • $V$ is $n \times n$ orthogonal (columns are right singular vectors)

The number of nonzero singular values equals $\operatorname{rank}(A) = r$.

The SVD generalizes eigenvalue decomposition: while $A = Q \Lambda Q^T$ requires $A$ symmetric, $A = U \Sigma V^T$ works for any matrix by using different orthonormal bases ($U$ and $V$) for the codomain and domain.
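A minimal NumPy sketch of the factorization (the matrix here is just an illustrative example):

```python
import numpy as np

# An arbitrary 3x2 example matrix.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# np.linalg.svd returns U (m x m), the singular values s, and V^T (n x n)
# when full_matrices=True.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n Sigma from the singular values.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

# The product U @ Sigma @ V^T recovers A up to floating-point error,
# and the singular values come back in decreasing order.
assert np.allclose(U @ Sigma @ Vt, A)
assert np.all(s[:-1] >= s[1:])
```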

Theorem (Connection to Spectral Theorem)

The singular values and vectors of $A$ are related to eigenvalues of symmetric matrices:

  1. Singular values of $A$ are $\sigma_i = \sqrt{\lambda_i(A^T A)} = \sqrt{\lambda_i(A A^T)}$
  2. Right singular vectors (columns of $V$) are eigenvectors of $A^T A$
  3. Left singular vectors (columns of $U$) are eigenvectors of $A A^T$
  4. $A^T A = V \Sigma^T \Sigma V^T$ and $A A^T = U \Sigma \Sigma^T U^T$ are spectral decompositions
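These relationships are easy to check numerically; a short sketch with NumPy (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 2.0],
              [0.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# 1. Singular values are square roots of the eigenvalues of A^T A.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # decreasing order
assert np.allclose(s, np.sqrt(eigvals))

# 2. Columns of V are eigenvectors of A^T A: (A^T A) v_i = sigma_i^2 v_i.
for i, v in enumerate(Vt):            # rows of V^T are columns of V
    assert np.allclose(A.T @ A @ v, s[i] ** 2 * v)

# 3. Left singular vectors satisfy (A A^T) u_i = sigma_i^2 u_i.
for i in range(len(s)):
    u = U[:, i]
    assert np.allclose(A @ A.T @ u, s[i] ** 2 * u)
```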

Example (Computing SVD)

For $A = \begin{bmatrix} 3 & 0 \\ 4 & 5 \end{bmatrix}$:

Compute $A^T A = \begin{bmatrix} 25 & 20 \\ 20 & 25 \end{bmatrix}$ with eigenvalues $45, 5$ and eigenvectors forming $V$.

Compute $A A^T = \begin{bmatrix} 9 & 12 \\ 12 & 41 \end{bmatrix}$ with eigenvalues $45, 5$ and eigenvectors forming $U$.

Singular values: $\sigma_1 = \sqrt{45} = 3\sqrt{5}$, $\sigma_2 = \sqrt{5}$.

Then $A = U \begin{bmatrix} 3\sqrt{5} & 0 \\ 0 & \sqrt{5} \end{bmatrix} V^T$.
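The computation above can be verified numerically:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# The two symmetric products from the example.
assert np.allclose(A.T @ A, [[25, 20], [20, 25]])
assert np.allclose(A @ A.T, [[9, 12], [12, 41]])

# Both share eigenvalues 45 and 5 (eigvalsh returns ascending order) ...
assert np.allclose(np.linalg.eigvalsh(A.T @ A), [5, 45])
assert np.allclose(np.linalg.eigvalsh(A @ A.T), [5, 45])

# ... so the singular values are 3*sqrt(5) and sqrt(5).
_, s, _ = np.linalg.svd(A)
assert np.allclose(s, [3 * np.sqrt(5), np.sqrt(5)])
```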

Theorem (Applications of SVD)

Low-rank approximation: The best rank-$k$ approximation to $A$ in Frobenius norm (the Eckart-Young theorem) is
$$A_k = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^T$$
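A sketch of truncated SVD on a random matrix, also checking the standard fact that the Frobenius error of the optimal $A_k$ equals $\sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}$:

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Truncated SVD: keep only the k largest singular triples.
A_k = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(k))

# Frobenius error of the best rank-k approximation is
# sqrt(sigma_{k+1}^2 + ... + sigma_r^2).
err = np.linalg.norm(A - A_k, "fro")
assert np.allclose(err, np.sqrt(np.sum(s[k:] ** 2)))
assert np.linalg.matrix_rank(A_k) == k
```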

Moore-Penrose pseudoinverse: $A^+ = V \Sigma^+ U^T$, where the $n \times m$ matrix $\Sigma^+$ has entries $1/\sigma_i$ for each nonzero $\sigma_i$ and zeros elsewhere.
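Constructing the pseudoinverse from the SVD, checked against NumPy's built-in `np.linalg.pinv` (the test matrix is an arbitrary full-column-rank example):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0],
              [0.0, 1.0]])   # full column rank, so A^+ A = I

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sigma^+ inverts each nonzero singular value and leaves zeros alone.
s_plus = np.where(s > 1e-12, 1.0 / s, 0.0)
A_plus = Vt.T @ np.diag(s_plus) @ U.T

# Matches NumPy's Moore-Penrose pseudoinverse, and acts as a left inverse.
assert np.allclose(A_plus, np.linalg.pinv(A))
assert np.allclose(A_plus @ A, np.eye(2))
```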

Fundamental subspaces:

  • $\operatorname{Im}(A)$ is spanned by the first $r$ columns of $U$
  • $\ker(A)$ is spanned by the last $n-r$ columns of $V$
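A sketch checking both subspace descriptions on a rank-1 example:

```python
import numpy as np

# A rank-1 matrix: the second row is twice the first.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))
assert r == 1

# The last n - r rows of V^T (i.e. columns of V) span ker(A): A v = 0.
for v in Vt[r:]:
    assert np.allclose(A @ v, 0.0)

# The first r columns of U span Im(A): projecting any column of A
# onto span{u_1} recovers that column exactly.
u1 = U[:, 0]
for col in A.T:
    assert np.allclose(np.outer(u1, u1) @ col, col)
```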

Example (Image Compression)

An $m \times n$ grayscale image can be stored as a matrix. Storing it directly requires $mn$ values, but the rank-$k$ approximation needs only $k(m+n+1)$ values ($k$ vectors of length $m$, $k$ of length $n$, and $k$ singular values), a huge savings when $k \ll \min(m,n)$.

If $A = U \Sigma V^T$ and we keep only the $k$ largest singular values:
$$A_k = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^T$$

This is the optimal rank-$k$ approximation. (JPEG compression is actually built on the discrete cosine transform rather than the SVD, but the underlying idea of discarding small transform components is the same.)
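The storage arithmetic for a hypothetical $512 \times 512$ image:

```python
# Storage counts for a hypothetical 512 x 512 grayscale image.
m, n = 512, 512
full = m * n                        # 262,144 raw pixel values

for k in (5, 20, 50):
    # k columns of U (length m), k columns of V (length n), k singular values.
    truncated = k * (m + n + 1)
    print(f"k={k}: {truncated} values, {truncated / full:.1%} of original")
```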

Remark

The SVD is ubiquitous: in data science (dimensionality reduction via truncated SVD), signal processing (noise filtering), control theory (system reduction), and numerical analysis (solving ill-conditioned systems). Its optimality for low-rank approximation makes it the gold standard for matrix compression and denoising. The SVD reveals the "essential structure" of any linear transformation.