
Orthogonal Projections and Least Squares

Orthogonal projections provide the best approximation of a vector by elements of a subspace. This geometric insight solves the fundamental problem of least squares approximation.

Theorem (Best Approximation)

Let $W$ be a finite-dimensional subspace of an inner product space $V$, and let $\mathbf{v} \in V$. The orthogonal projection $\text{proj}_W(\mathbf{v})$ is the unique closest point in $W$ to $\mathbf{v}$:

$$\|\mathbf{v} - \text{proj}_W(\mathbf{v})\| < \|\mathbf{v} - \mathbf{w}\|$$

for every $\mathbf{w} \in W$ with $\mathbf{w} \neq \text{proj}_W(\mathbf{v})$.

The error vector $\mathbf{v} - \text{proj}_W(\mathbf{v})$ is orthogonal to $W$.

This theorem is the theoretical foundation for least squares: the best fit is achieved when the residual is orthogonal to the approximation space.
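The theorem can be checked numerically. The sketch below (using NumPy; the subspace $W$, here the column space of an arbitrary matrix `A`, and the vector `v` are illustrative choices, not from the text) verifies that the residual is orthogonal to $W$ and that the projection beats every other candidate point:

```python
import numpy as np

# W = column space of A, a 2-dimensional subspace of R^3 (arbitrary example)
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
v = np.array([1.0, 2.0, 6.0])

# Orthogonal projection of v onto W via the projection matrix A(A^T A)^{-1} A^T
P = A @ np.linalg.inv(A.T @ A) @ A.T
proj_v = P @ v

# The residual is orthogonal to W: it is orthogonal to every column of A
residual = v - proj_v
print(np.allclose(A.T @ residual, 0))   # True

# proj_v is at least as close to v as any other point of W
rng = np.random.default_rng(0)
for _ in range(1000):
    w = A @ rng.standard_normal(2)      # a random point of W
    assert np.linalg.norm(v - proj_v) <= np.linalg.norm(v - w)
```

The inequality is strict whenever $\mathbf{w} \neq \text{proj}_W(\mathbf{v})$; the `<=` in the check only guards against the measure-zero case of sampling the projection itself.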

Theorem (Normal Equations for Least Squares)

Consider the overdetermined system $A\mathbf{x} = \mathbf{b}$, where $A$ is $m \times n$ with $m > n$ (more equations than unknowns). The least squares solution $\hat{\mathbf{x}}$ minimizes $\|A\mathbf{x} - \mathbf{b}\|^2$ and satisfies the normal equations:

$$A^T A \hat{\mathbf{x}} = A^T \mathbf{b}$$

If $A$ has full column rank, the solution is unique: $\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}$.

The projection matrix $P = A(A^T A)^{-1} A^T$ projects onto the column space of $A$.
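A short NumPy sketch of the normal equations in action (the matrix `A` and vector `b` are random illustrative data, assumed to give $A$ full column rank): the normal-equation solution matches NumPy's dedicated least squares routine, and $P\mathbf{b}$ is exactly $A\hat{\mathbf{x}}$, the projection of $\mathbf{b}$ onto the column space.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # m = 6 > n = 3; full column rank (a.s.)
b = rng.standard_normal(6)

# Solve the normal equations A^T A x = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Same answer as NumPy's built-in least squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x_lstsq))      # True

# P = A (A^T A)^{-1} A^T projects b onto the column space of A
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(P @ b, A @ x_hat))    # True: P b = A x_hat
```

In practice one would call `np.linalg.lstsq` (or a QR factorization) directly, since forming $A^T A$ squares the condition number; the explicit normal equations are shown here to mirror the theorem.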

Example (Linear Regression)

Fit a line y=mx+by = mx + b to data points (x1,y1),,(xn,yn)(x_1, y_1), \ldots, (x_n, y_n).

Set up system: [x11x21xn1][mb][y1y2yn]\begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}\begin{bmatrix} m \\ b \end{bmatrix} \approx \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}

The least squares solution (m,b)(m, b) minimizes the sum of squared residuals i=1n(yimxib)2\sum_{i=1}^n (y_i - mx_i - b)^2.

Solving normal equations: (ATA)[mb]=ATy(A^TA)\begin{bmatrix} m \\ b \end{bmatrix} = A^T\mathbf{y} gives the regression coefficients.
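The regression recipe above can be carried out in a few lines of NumPy (the data points are hypothetical, chosen to lie near $y = 2x + 1$):

```python
import numpy as np

# Hypothetical data near y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix: a column of x values and a column of ones
A = np.column_stack([x, np.ones_like(x)])

# Solve the normal equations (A^T A) [m, b]^T = A^T y
m, b = np.linalg.solve(A.T @ A, A.T @ y)
print(f"y = {m:.3f} x + {b:.3f}")   # slope near 2, intercept near 1
```

The recovered slope and intercept are close to the values used to generate the data, with the small deviations reflecting the noise in the $y_i$.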

Theorem (Orthogonal Decomposition)

Let $W$ be a subspace of a finite-dimensional inner product space $V$. Every $\mathbf{v} \in V$ can be written uniquely as:

$$\mathbf{v} = \mathbf{w} + \mathbf{w}^\perp$$

where $\mathbf{w} \in W$ and $\mathbf{w}^\perp \in W^\perp$. This decomposition satisfies:

  • $\mathbf{w} = \text{proj}_W(\mathbf{v})$
  • $\mathbf{w}^\perp = \mathbf{v} - \text{proj}_W(\mathbf{v})$
  • $V = W \oplus W^\perp$

Theorem (Projection Matrix Properties)

A matrix $P$ is an orthogonal projection matrix if and only if:

  1. $P^2 = P$ (idempotent)
  2. $P^T = P$ (symmetric)

The matrix $I - P$ projects onto the orthogonal complement of the range of $P$.
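Both defining properties, and the complementary role of $I - P$, can be checked on any projection matrix built as $A(A^T A)^{-1}A^T$ (the matrix `A` below is random illustrative data):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2))
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P @ P, P))   # True: idempotent, P^2 = P
print(np.allclose(P.T, P))     # True: symmetric,  P^T = P

# I - P projects onto the orthogonal complement of range(P) = col(A),
# so it sends every column of A to zero
Q = np.eye(5) - P
print(np.allclose(Q @ A, 0))   # True
```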

Remark

Orthogonal projection is pervasive in applications: data fitting (regression), signal processing (filtering), computer graphics (shadows), and quantum mechanics (measurement). The geometric insight—that the best approximation is achieved when the error is perpendicular—translates into the algebraic condition captured by the normal equations.