
Orthogonal Projections and Least Squares

Orthogonal projections provide the best approximation of a vector by elements of a subspace. This geometric insight solves the fundamental problem of least squares approximation.

Theorem (Best Approximation)

Let $W$ be a finite-dimensional subspace of an inner product space $V$, and let $\mathbf{v} \in V$. The orthogonal projection $\text{proj}_W(\mathbf{v})$ is the unique closest point in $W$ to $\mathbf{v}$:

$$\|\mathbf{v} - \text{proj}_W(\mathbf{v})\| < \|\mathbf{v} - \mathbf{w}\|$$

for every $\mathbf{w} \in W$ with $\mathbf{w} \neq \text{proj}_W(\mathbf{v})$.

The error vector $\mathbf{v} - \text{proj}_W(\mathbf{v})$ is orthogonal to $W$.

This theorem is the theoretical foundation for least squares: the best fit is achieved when the residual is orthogonal to the approximation space.
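The theorem can be checked numerically. The sketch below (using NumPy; the subspace $W$, here the column space of an arbitrary matrix `A`, and the vector `v` are illustrative choices, not from the text) verifies that the residual is orthogonal to $W$ and that the projection beats every other candidate point:

```python
import numpy as np

# W = column space of A, a 2-dimensional subspace of R^3 (arbitrary example)
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
v = np.array([1.0, 2.0, 6.0])

# Orthogonal projection of v onto W via the projection matrix A(A^T A)^{-1} A^T
P = A @ np.linalg.inv(A.T @ A) @ A.T
proj_v = P @ v

# The residual is orthogonal to W: it is orthogonal to every column of A
residual = v - proj_v
print(np.allclose(A.T @ residual, 0))   # True

# proj_v is at least as close to v as any other point of W
rng = np.random.default_rng(0)
for _ in range(1000):
    w = A @ rng.standard_normal(2)      # a random point of W
    assert np.linalg.norm(v - proj_v) <= np.linalg.norm(v - w)
```

The inequality is strict whenever $\mathbf{w} \neq \text{proj}_W(\mathbf{v})$; the `<=` in the check only guards against the measure-zero case of sampling the projection itself.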

Theorem (Normal Equations for Least Squares)

Consider the overdetermined system $A\mathbf{x} = \mathbf{b}$, where $A$ is $m \times n$ with $m > n$ (more equations than unknowns). The least squares solution $\hat{\mathbf{x}}$ minimizes $\|A\mathbf{x} - \mathbf{b}\|^2$ and satisfies the normal equations:

$$A^T A \hat{\mathbf{x}} = A^T \mathbf{b}$$

If $A$ has full column rank, the solution is unique: $\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}$.

The projection matrix $P = A(A^T A)^{-1} A^T$ projects onto the column space of $A$.
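A short NumPy sketch of the normal equations in action (the matrix `A` and vector `b` are random illustrative data, assumed to give $A$ full column rank): the normal-equation solution matches NumPy's dedicated least squares routine, and $P\mathbf{b}$ is exactly $A\hat{\mathbf{x}}$, the projection of $\mathbf{b}$ onto the column space.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # m = 6 > n = 3; full column rank (a.s.)
b = rng.standard_normal(6)

# Solve the normal equations A^T A x = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Same answer as NumPy's built-in least squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x_lstsq))      # True

# P = A (A^T A)^{-1} A^T projects b onto the column space of A
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(P @ b, A @ x_hat))    # True: P b = A x_hat
```

In practice one would call `np.linalg.lstsq` (or a QR factorization) directly, since forming $A^T A$ squares the condition number; the explicit normal equations are shown here to mirror the theorem.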

Example (Linear Regression)

Fit a line y=mx+by = mx + b to data points (x1,y1),,(xn,yn)(x_1, y_1), \ldots, (x_n, y_n).

Set up system: [x11x21xn1][mb][y1y2yn]\begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}\begin{bmatrix} m \\ b \end{bmatrix} \approx \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}

The least squares solution (m,b)(m, b) minimizes the sum of squared residuals i=1n(yimxib)2\sum_{i=1}^n (y_i - mx_i - b)^2.

Solving normal equations: (ATA)[mb]=ATy(A^TA)\begin{bmatrix} m \\ b \end{bmatrix} = A^T\mathbf{y} gives the regression coefficients.
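The regression recipe above can be carried out in a few lines of NumPy (the data points are hypothetical, chosen to lie near $y = 2x + 1$):

```python
import numpy as np

# Hypothetical data near y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix: a column of x values and a column of ones
A = np.column_stack([x, np.ones_like(x)])

# Solve the normal equations (A^T A) [m, b]^T = A^T y
m, b = np.linalg.solve(A.T @ A, A.T @ y)
print(f"y = {m:.3f} x + {b:.3f}")   # slope near 2, intercept near 1
```

The recovered slope and intercept are close to the values used to generate the data, with the small deviations reflecting the noise in the $y_i$.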

Theorem (Orthogonal Decomposition)

Let $W$ be a subspace of a finite-dimensional inner product space $V$. Every $\mathbf{v} \in V$ can be written uniquely as:

$$\mathbf{v} = \mathbf{w} + \mathbf{w}^\perp$$

where $\mathbf{w} \in W$ and $\mathbf{w}^\perp \in W^\perp$. This decomposition satisfies:

  • $\mathbf{w} = \text{proj}_W(\mathbf{v})$
  • $\mathbf{w}^\perp = \mathbf{v} - \text{proj}_W(\mathbf{v})$
  • $V = W \oplus W^\perp$

Theorem (Projection Matrix Properties)

A matrix $P$ is an orthogonal projection matrix if and only if:

  1. $P^2 = P$ (idempotent)
  2. $P^T = P$ (symmetric)

The matrix $I - P$ projects onto the orthogonal complement of the range of $P$.
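Both defining properties, and the complementary role of $I - P$, can be checked on any projection matrix built as $A(A^T A)^{-1}A^T$ (the matrix `A` below is random illustrative data):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2))
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P @ P, P))   # True: idempotent, P^2 = P
print(np.allclose(P.T, P))     # True: symmetric,  P^T = P

# I - P projects onto the orthogonal complement of range(P) = col(A),
# so it sends every column of A to zero
Q = np.eye(5) - P
print(np.allclose(Q @ A, 0))   # True
```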

Remark

Orthogonal projection is pervasive in applications: data fitting (regression), signal processing (filtering), computer graphics (shadows), and quantum mechanics (measurement). The geometric insight—that the best approximation is achieved when the error is perpendicular—translates into the algebraic condition captured by the normal equations.