Singular Value Decomposition

The singular value decomposition (SVD) extends the spectral theorem to non-square and non-symmetric matrices. It is arguably the most important matrix factorization in applied mathematics.

Theorem (Singular Value Decomposition)

Every $m \times n$ matrix $A$ (real, or complex with $V^T$ replaced by the conjugate transpose $V^*$) can be factored as
$$A = U \Sigma V^T$$

where:

  • $U$ is $m \times m$ orthogonal (columns are left singular vectors)
  • $\Sigma$ is $m \times n$ diagonal with non-negative entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$ (the singular values)
  • $V$ is $n \times n$ orthogonal (columns are right singular vectors)

The number of nonzero singular values equals $\operatorname{rank}(A) = r$.

The SVD generalizes eigenvalue decomposition: while $A = Q \Lambda Q^T$ requires $A$ symmetric, $A = U \Sigma V^T$ works for any matrix by using different orthonormal bases ($U$ and $V$) for the codomain and domain.
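A minimal NumPy sketch of the factorization (the matrix here is just an illustrative example):

```python
import numpy as np

# An arbitrary 3x2 example matrix.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# np.linalg.svd returns U (m x m), the singular values s, and V^T (n x n)
# when full_matrices=True.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n Sigma from the singular values.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

# The product U @ Sigma @ V^T recovers A up to floating-point error,
# and the singular values come back in decreasing order.
assert np.allclose(U @ Sigma @ Vt, A)
assert np.all(s[:-1] >= s[1:])
```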

Theorem (Connection to Spectral Theorem)

The singular values and vectors of $A$ are related to eigenvalues of symmetric matrices:

  1. Singular values of $A$ are $\sigma_i = \sqrt{\lambda_i(A^T A)} = \sqrt{\lambda_i(A A^T)}$
  2. Right singular vectors (columns of $V$) are eigenvectors of $A^T A$
  3. Left singular vectors (columns of $U$) are eigenvectors of $A A^T$
  4. $A^T A = V \Sigma^T \Sigma V^T$ and $A A^T = U \Sigma \Sigma^T U^T$ are spectral decompositions
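These relationships are easy to check numerically; a short sketch with NumPy (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 2.0],
              [0.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# 1. Singular values are square roots of the eigenvalues of A^T A.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # decreasing order
assert np.allclose(s, np.sqrt(eigvals))

# 2. Columns of V are eigenvectors of A^T A: (A^T A) v_i = sigma_i^2 v_i.
for i, v in enumerate(Vt):            # rows of V^T are columns of V
    assert np.allclose(A.T @ A @ v, s[i] ** 2 * v)

# 3. Left singular vectors satisfy (A A^T) u_i = sigma_i^2 u_i.
for i in range(len(s)):
    u = U[:, i]
    assert np.allclose(A @ A.T @ u, s[i] ** 2 * u)
```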

Example (Computing SVD)

For $A = \begin{bmatrix} 3 & 0 \\ 4 & 5 \end{bmatrix}$:

Compute $A^T A = \begin{bmatrix} 25 & 20 \\ 20 & 25 \end{bmatrix}$ with eigenvalues $45, 5$ and eigenvectors forming $V$.

Compute $A A^T = \begin{bmatrix} 9 & 12 \\ 12 & 41 \end{bmatrix}$ with eigenvalues $45, 5$ and eigenvectors forming $U$.

Singular values: $\sigma_1 = \sqrt{45} = 3\sqrt{5}$, $\sigma_2 = \sqrt{5}$.

Then $A = U \begin{bmatrix} 3\sqrt{5} & 0 \\ 0 & \sqrt{5} \end{bmatrix} V^T$.
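The computation above can be verified numerically:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# The two symmetric products from the example.
assert np.allclose(A.T @ A, [[25, 20], [20, 25]])
assert np.allclose(A @ A.T, [[9, 12], [12, 41]])

# Both share eigenvalues 45 and 5 (eigvalsh returns ascending order) ...
assert np.allclose(np.linalg.eigvalsh(A.T @ A), [5, 45])
assert np.allclose(np.linalg.eigvalsh(A @ A.T), [5, 45])

# ... so the singular values are 3*sqrt(5) and sqrt(5).
_, s, _ = np.linalg.svd(A)
assert np.allclose(s, [3 * np.sqrt(5), np.sqrt(5)])
```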

Theorem (Applications of SVD)

Low-rank approximation: The best rank-$k$ approximation to $A$ in Frobenius norm (the Eckart-Young theorem) is
$$A_k = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^T$$
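A sketch of truncated SVD on a random matrix, also checking the standard fact that the Frobenius error of the optimal $A_k$ equals $\sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}$:

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Truncated SVD: keep only the k largest singular triples.
A_k = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(k))

# Frobenius error of the best rank-k approximation is
# sqrt(sigma_{k+1}^2 + ... + sigma_r^2).
err = np.linalg.norm(A - A_k, "fro")
assert np.allclose(err, np.sqrt(np.sum(s[k:] ** 2)))
assert np.linalg.matrix_rank(A_k) == k
```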

Moore-Penrose pseudoinverse: $A^+ = V \Sigma^+ U^T$, where the $n \times m$ matrix $\Sigma^+$ has entries $1/\sigma_i$ for each nonzero $\sigma_i$ and zeros elsewhere.
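Constructing the pseudoinverse from the SVD, checked against NumPy's built-in `np.linalg.pinv` (the test matrix is an arbitrary full-column-rank example):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0],
              [0.0, 1.0]])   # full column rank, so A^+ A = I

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sigma^+ inverts each nonzero singular value and leaves zeros alone.
s_plus = np.where(s > 1e-12, 1.0 / s, 0.0)
A_plus = Vt.T @ np.diag(s_plus) @ U.T

# Matches NumPy's Moore-Penrose pseudoinverse, and acts as a left inverse.
assert np.allclose(A_plus, np.linalg.pinv(A))
assert np.allclose(A_plus @ A, np.eye(2))
```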

Fundamental subspaces:

  • $\operatorname{Im}(A)$ is spanned by the first $r$ columns of $U$
  • $\ker(A)$ is spanned by the last $n-r$ columns of $V$
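A sketch checking both subspace descriptions on a rank-1 example:

```python
import numpy as np

# A rank-1 matrix: the second row is twice the first.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))
assert r == 1

# The last n - r rows of V^T (i.e. columns of V) span ker(A): A v = 0.
for v in Vt[r:]:
    assert np.allclose(A @ v, 0.0)

# The first r columns of U span Im(A): projecting any column of A
# onto span{u_1} recovers that column exactly.
u1 = U[:, 0]
for col in A.T:
    assert np.allclose(np.outer(u1, u1) @ col, col)
```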

Example (Image Compression)

An $m \times n$ grayscale image can be stored as a matrix. Storing it directly requires $mn$ values, but the rank-$k$ approximation needs only $k(m+n+1)$ values ($k$ vectors of length $m$, $k$ of length $n$, and $k$ singular values), a huge savings when $k \ll \min(m,n)$.

If $A = U \Sigma V^T$ and we keep only the $k$ largest singular values:
$$A_k = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^T$$

This is the optimal rank-$k$ approximation. (JPEG compression is actually built on the discrete cosine transform rather than the SVD, but the underlying idea of discarding small transform components is the same.)
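The storage arithmetic for a hypothetical $512 \times 512$ image:

```python
# Storage counts for a hypothetical 512 x 512 grayscale image.
m, n = 512, 512
full = m * n                        # 262,144 raw pixel values

for k in (5, 20, 50):
    # k columns of U (length m), k columns of V (length n), k singular values.
    truncated = k * (m + n + 1)
    print(f"k={k}: {truncated} values, {truncated / full:.1%} of original")
```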

Remark

The SVD is ubiquitous: in data science (dimensionality reduction via truncated SVD), signal processing (noise filtering), control theory (system reduction), and numerical analysis (solving ill-conditioned systems). Its optimality for low-rank approximation makes it the gold standard for matrix compression and denoising. The SVD reveals the "essential structure" of any linear transformation.