
Projection Theorem

The Projection Theorem states that in an inner product space, every vector has a unique closest point in any closed subspace. This closest point is the orthogonal projection, and the residual is perpendicular to the subspace. This theorem is the foundation of least-squares approximation, best approximation theory, and the geometry of Hilbert spaces.


Statement

Theorem 6.6 (Orthogonal Projection Theorem)

Let $W$ be a finite-dimensional subspace of an inner product space $V$, and let $v \in V$. Then there exists a unique vector $\hat{v} \in W$ such that:

  1. $v - \hat{v} \in W^\perp$ (the residual is orthogonal to $W$).
  2. $\|v - \hat{v}\| \leq \|v - w\|$ for all $w \in W$ (the projection is the closest point).

Moreover, $\hat{v}$ is uniquely determined by either of these equivalent conditions.

Theorem 6.7 (Projection formula via an orthonormal basis)

If $\{e_1, \ldots, e_k\}$ is an orthonormal basis for $W$, then:

$$\hat{v} = \operatorname{proj}_W(v) = \sum_{i=1}^k \langle v, e_i \rangle \, e_i.$$

The projection matrix (in the standard basis of $\mathbb{R}^n$) is $P = QQ^T$, where $Q$ is the matrix with columns $e_1, \ldots, e_k$.
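As a sanity check, the sum formula and the matrix form $QQ^T$ give the same projection. A minimal NumPy sketch (the spanning set `A` below is an arbitrary illustrative choice, not taken from the text):

```python
import numpy as np

# Illustrative 2-dimensional subspace W of R^3, spanned by the columns of A.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
Q, _ = np.linalg.qr(A)   # columns of Q: an orthonormal basis e_1, ..., e_k for W

v = np.array([1.0, 2.0, 3.0])

# Sum formula: v_hat = sum_i <v, e_i> e_i
v_hat_sum = sum(np.dot(v, Q[:, i]) * Q[:, i] for i in range(Q.shape[1]))

# Matrix form: v_hat = Q Q^T v
v_hat_mat = Q @ Q.T @ v

# The residual v - v_hat is orthogonal to every spanning vector of W.
residual = v - v_hat_mat
```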


Examples: projection onto a line

Example (Projection onto a line in $\mathbb{R}^2$)

$W = \operatorname{span}\{(3, 4)\}$, $v = (7, 1)$.

$$\hat{v} = \frac{\langle v, (3,4) \rangle}{\|(3,4)\|^2}(3, 4) = \frac{21 + 4}{25}(3, 4) = (3, 4).$$

Residual: $v - \hat{v} = (7, 1) - (3, 4) = (4, -3)$. Check: $(4, -3) \cdot (3, 4) = 12 - 12 = 0$ ✓.

Distance from $v$ to $W$: $\|v - \hat{v}\| = \|(4, -3)\| = 5$.

Example (Projection onto a line in $\mathbb{R}^3$)

$W = \operatorname{span}\{(1, 1, 1)\}$, $v = (1, 2, 3)$.

$$\hat{v} = \frac{1 + 2 + 3}{3}(1, 1, 1) = 2(1, 1, 1) = (2, 2, 2).$$

Residual: $(1, 2, 3) - (2, 2, 2) = (-1, 0, 1)$. Check: $(-1, 0, 1) \cdot (1, 1, 1) = 0$ ✓.

Distance: $\|(-1, 0, 1)\| = \sqrt{2}$.
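Both line projections use the single formula $\hat{v} = \frac{\langle v, w \rangle}{\langle w, w \rangle} w$. A short sketch reproducing the two examples (the helper `proj_line` is our own illustrative name):

```python
import numpy as np

def proj_line(v, w):
    """Orthogonal projection of v onto span{w}: (<v, w> / <w, w>) w."""
    v, w = np.asarray(v, dtype=float), np.asarray(w, dtype=float)
    return (np.dot(v, w) / np.dot(w, w)) * w

# R^2 example: v = (7, 1) onto span{(3, 4)}
v_hat_2d = proj_line([7, 1], [3, 4])            # -> (3, 4)
dist_2d = np.linalg.norm([7, 1] - v_hat_2d)     # -> 5

# R^3 example: v = (1, 2, 3) onto span{(1, 1, 1)}
v_hat_3d = proj_line([1, 2, 3], [1, 1, 1])      # -> (2, 2, 2)
dist_3d = np.linalg.norm([1, 2, 3] - v_hat_3d)  # -> sqrt(2)
```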


Examples: projection onto a plane

Example (Projection onto a plane in $\mathbb{R}^3$)

$W = \operatorname{span}\{(1, 0, 0), (0, 1, 0)\}$ (the $xy$-plane), $v = (3, 4, 5)$.

$\hat{v} = (3, 4, 0)$ (just drop the $z$-component). Residual: $(0, 0, 5) \perp W$ ✓.

Example (Projection onto a general plane)

$W = \operatorname{span}\{w_1, w_2\}$ with $w_1 = (1, 1, 0)$, $w_2 = (0, 1, 1)$. First orthonormalize:

$e_1 = \frac{1}{\sqrt{2}}(1, 1, 0)$. Then $u_2 = (0, 1, 1) - \frac{1}{2}(1, 1, 0) = (-\frac{1}{2}, \frac{1}{2}, 1)$, so $e_2 = \frac{1}{\sqrt{3/2}}(-\frac{1}{2}, \frac{1}{2}, 1) = \frac{1}{\sqrt{6}}(-1, 1, 2)$.

For $v = (1, 0, 0)$:

$$\hat{v} = \langle v, e_1 \rangle e_1 + \langle v, e_2 \rangle e_2 = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1, 1, 0) + \frac{-1}{\sqrt{6}} \cdot \frac{1}{\sqrt{6}}(-1, 1, 2)$$

$$= \frac{1}{2}(1, 1, 0) + \frac{1}{6}(1, -1, -2) = \left(\tfrac{1}{2} + \tfrac{1}{6},\ \tfrac{1}{2} - \tfrac{1}{6},\ -\tfrac{1}{3}\right) = \left(\tfrac{2}{3}, \tfrac{1}{3}, -\tfrac{1}{3}\right).$$

Check: $v - \hat{v} = (\frac{1}{3}, -\frac{1}{3}, \frac{1}{3})$, and $(\frac{1}{3}, -\frac{1}{3}, \frac{1}{3}) \cdot (1, 1, 0) = 0$ ✓, $(\frac{1}{3}, -\frac{1}{3}, \frac{1}{3}) \cdot (0, 1, 1) = 0$ ✓.
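The Gram-Schmidt step plus the projection formula can be checked numerically; a sketch reproducing this example with NumPy:

```python
import numpy as np

w1 = np.array([1.0, 1.0, 0.0])
w2 = np.array([0.0, 1.0, 1.0])

# Gram-Schmidt: orthonormalize {w1, w2}
e1 = w1 / np.linalg.norm(w1)
u2 = w2 - np.dot(w2, e1) * e1
e2 = u2 / np.linalg.norm(u2)

# Project v = (1, 0, 0) onto W = span{e1, e2}
v = np.array([1.0, 0.0, 0.0])
v_hat = np.dot(v, e1) * e1 + np.dot(v, e2) * e2   # -> (2/3, 1/3, -1/3)

# Residual should be orthogonal to both original spanning vectors.
residual = v - v_hat
```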


Projection matrices

Definition 6.9 (Orthogonal projection matrix)

If $A$ is an $n \times k$ matrix whose columns form a basis for $W$ (not necessarily orthonormal), the projection matrix is:

$$P = A(A^TA)^{-1}A^T.$$

$P$ satisfies $P^2 = P$ (idempotent) and $P^T = P$ (symmetric). The projection of $v$ is $\hat{v} = Pv$.

Example (Projection matrix for a line)

$W = \operatorname{span}\{(1, 2)\}$: $A = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$, $A^TA = 5$.

$$P = \frac{1}{5}\begin{pmatrix} 1 \\ 2 \end{pmatrix}\begin{pmatrix} 1 & 2 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}.$$

Check: $P^2 = P$ ✓, $P^T = P$ ✓, $\operatorname{tr}(P) = 1$ (rank $1$), eigenvalues $0, 1$.

Example (Projection matrix for a plane)

$W = \operatorname{span}\{(1,0,1), (0,1,1)\}$: $A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}$.

$$A^TA = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad (A^TA)^{-1} = \frac{1}{3}\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}.$$

$$P = A(A^TA)^{-1}A^T = \frac{1}{3}\begin{pmatrix} 2 & -1 & 1 \\ -1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}.$$

$\operatorname{tr}(P) = 2$ (the dimension of $W$), eigenvalues $1, 1, 0$.
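The formula $P = A(A^TA)^{-1}A^T$ and its claimed properties can be verified directly; a NumPy sketch for this plane:

```python
import numpy as np

# Basis for W = span{(1,0,1), (0,1,1)} as the columns of A
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# P = A (A^T A)^{-1} A^T  ->  (1/3) * [[2,-1,1],[-1,2,1],[1,1,2]]
P = A @ np.linalg.inv(A.T @ A) @ A.T

# Expected properties: idempotent, symmetric, trace = dim W = 2
```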


Least-squares approximation

Theorem (Least-squares solution)

The least-squares solution of the overdetermined system $Ax = b$ (where $A$ is $m \times n$ with $m > n$ and full column rank) minimizes $\|Ax - b\|^2$. The solution satisfies the normal equations:

$$A^T A \hat{x} = A^T b, \qquad \hat{x} = (A^TA)^{-1}A^Tb.$$

The projection of $b$ onto the column space of $A$ is $\hat{b} = A\hat{x} = Pb$.

Example (Best-fit line)

Data: $(0, 1), (1, 3), (2, 4)$. Fit $y = a + bx$.

$$A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix}.$$

$$A^TA = \begin{pmatrix} 3 & 3 \\ 3 & 5 \end{pmatrix}, \qquad A^Tb = \begin{pmatrix} 8 \\ 11 \end{pmatrix}.$$

$$\hat{x} = \begin{pmatrix} 3 & 3 \\ 3 & 5 \end{pmatrix}^{-1}\begin{pmatrix} 8 \\ 11 \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 5 & -3 \\ -3 & 3 \end{pmatrix}\begin{pmatrix} 8 \\ 11 \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 7 \\ 9 \end{pmatrix}.$$

Best-fit line: $y = \frac{7}{6} + \frac{3}{2}x$.
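The same fit in NumPy, solving the normal equations directly (a sketch of the computation above):

```python
import numpy as np

# Data (0, 1), (1, 3), (2, 4); design matrix for y = a + b x
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 4.0])
A = np.column_stack([np.ones_like(x), x])   # columns: 1, x

# Normal equations: (A^T A) x_hat = A^T y
x_hat = np.linalg.solve(A.T @ A, A.T @ y)   # -> (7/6, 3/2)
```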

Example (Best-fit polynomial)

Fitting $y = a + bx + cx^2$ to data $(0, 1), (1, 2), (2, 1), (3, 0)$:

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 2 \\ 1 \\ 0 \end{pmatrix}.$$

Solve $A^TA\hat{x} = A^Tb$ for the best parabola fit.
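Carrying the solve out numerically (a sketch; `numpy.linalg.lstsq` minimizes $\|Ax - b\|$ via the SVD, so it agrees with the normal equations when $A$ has full column rank):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 1.0, 0.0])
A = np.column_stack([np.ones_like(x), x, x**2])   # columns: 1, x, x^2

# Normal equations: (A^T A) x_hat = A^T y
x_hat = np.linalg.solve(A.T @ A, A.T @ y)

# Equivalent SVD-based least-squares solve
x_hat_svd, *_ = np.linalg.lstsq(A, y, rcond=None)

# Both give the coefficients (a, b, c) of the best parabola, approx (1.1, 1.1, -0.5)
```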


Best approximation in function spaces

Example (Best polynomial approximation)

Find the best constant approximation to $f(x) = x^2$ on $[-1, 1]$ with $\langle f, g \rangle = \int_{-1}^1 fg\,dx$.

$W = \operatorname{span}\{1\}$: $\operatorname{proj}_W(x^2) = \frac{\langle x^2, 1 \rangle}{\langle 1, 1 \rangle} = \frac{2/3}{2} = \frac{1}{3}$.

The best constant approximation to $x^2$ on $[-1, 1]$ in the $L^2$ sense is $c = 1/3$.

Error: $\|x^2 - 1/3\|^2 = \int_{-1}^1 (x^2 - 1/3)^2\,dx = \frac{8}{45}$.
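The same projection can be checked numerically by discretizing the $L^2$ inner product; a sketch using a midpoint Riemann sum:

```python
import numpy as np

# Midpoint-rule approximation of the L^2 inner product on [-1, 1]
n = 100_000
dx = 2.0 / n
x = -1.0 + (np.arange(n) + 0.5) * dx

f = x**2
ones = np.ones_like(x)

# Best constant: c = <f, 1> / <1, 1>  -> approx 1/3
c = np.sum(f * ones) * dx / (np.sum(ones * ones) * dx)

# Squared L^2 error: ||f - c||^2  -> approx 8/45
err = np.sum((f - c) ** 2) * dx
```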

Example (Best trigonometric approximation)

Find the best approximation to $f(x) = |x|$ on $[-\pi, \pi]$ using $W = \operatorname{span}\{1, \cos x\}$.

$$a_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |x|\,dx = \frac{\pi}{2}, \qquad a_1 = \frac{1}{\pi}\int_{-\pi}^{\pi} |x|\cos x\,dx = -\frac{4}{\pi}.$$

Best approximation: $\frac{\pi}{2} - \frac{4}{\pi}\cos x$.
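The two Fourier coefficients can likewise be confirmed by quadrature; a minimal sketch:

```python
import numpy as np

# Midpoint-rule quadrature on [-pi, pi]
n = 200_000
dx = 2 * np.pi / n
x = -np.pi + (np.arange(n) + 0.5) * dx
f = np.abs(x)

a0 = np.sum(f) * dx / (2 * np.pi)          # -> approx pi/2
a1 = np.sum(f * np.cos(x)) * dx / np.pi    # -> approx -4/pi
```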

Example (Best polynomial approximation to $e^x$)

Approximate $e^x$ on $[0, 1]$ by a linear polynomial $a + bx$, minimizing $\int_0^1 (e^x - a - bx)^2\,dx$.

This is the projection of $e^x$ onto $\operatorname{span}\{1, x\}$ with $\langle f, g \rangle = \int_0^1 fg\,dx$.

The normal equations give $a = 4e - 10 \approx 0.873$ and $b = 18 - 6e \approx 1.690$.
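Here the Gram matrix of $\{1, x\}$ on $[0,1]$ has exact entries $\langle 1,1\rangle = 1$, $\langle 1,x\rangle = \frac{1}{2}$, $\langle x,x\rangle = \frac{1}{3}$, and the right-hand side is $\langle e^x, 1\rangle = e - 1$, $\langle e^x, x\rangle = 1$. A sketch of the solve:

```python
import numpy as np

# Gram matrix of {1, x} and right-hand side <e^x, 1>, <e^x, x> on [0, 1]
G = np.array([[1.0, 1.0 / 2.0],
              [1.0 / 2.0, 1.0 / 3.0]])
rhs = np.array([np.e - 1.0, 1.0])   # int_0^1 e^x dx = e - 1, int_0^1 x e^x dx = 1

a, b = np.linalg.solve(G, rhs)      # -> a = 4e - 10, b = 18 - 6e
```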


Properties of orthogonal projections

Theorem (Properties of the projection operator)

The orthogonal projection $P_W: V \to V$ onto a subspace $W$ satisfies:

  1. $P_W^2 = P_W$ (idempotent).
  2. $P_W^* = P_W$ (self-adjoint), meaning $\langle P_W u, v \rangle = \langle u, P_W v \rangle$.
  3. $\operatorname{im}(P_W) = W$ and $\ker(P_W) = W^\perp$.
  4. $I - P_W = P_{W^\perp}$ (the complementary projection).
  5. $\|P_W v\| \leq \|v\|$ for all $v$ (projections are contractions).

Example (Complementary projections)

$W = \operatorname{span}\{(1, 0)\}$ in $\mathbb{R}^2$: $P_W = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, $P_{W^\perp} = I - P_W = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$.

For $v = (3, 5)$: $P_W v = (3, 0)$, $P_{W^\perp} v = (0, 5)$, $P_W v + P_{W^\perp} v = v$ ✓.
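A quick numerical confirmation of the complementary decomposition:

```python
import numpy as np

P_W = np.array([[1.0, 0.0],
                [0.0, 0.0]])
P_perp = np.eye(2) - P_W    # complementary projection onto W^perp

v = np.array([3.0, 5.0])
v_W = P_W @ v               # -> (3, 0)
v_perp = P_perp @ v         # -> (0, 5); the two components sum back to v
```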


Summary

Remark (The Projection Theorem as a unifying principle)

The Projection Theorem is the geometric heart of applied linear algebra:

  • Existence and uniqueness of best approximations in inner product spaces.
  • Least-squares solutions to overdetermined systems.
  • Fourier analysis as projection onto trigonometric subspaces.
  • Signal processing: extracting the component of a signal in a subspace.
  • The projection $\hat{v} = \sum_i \langle v, e_i \rangle \, e_i$ reduces all approximation problems to computing inner products.