
Projection Theorem

The Projection Theorem states that in an inner product space, every vector has a unique closest point in any closed subspace. This closest point is the orthogonal projection, and the residual is perpendicular to the subspace. This theorem is the foundation of least-squares approximation, best approximation theory, and the geometry of Hilbert spaces.


Statement

Theorem 6.6 (Orthogonal Projection Theorem)

Let $W$ be a finite-dimensional subspace of an inner product space $V$, and let $v \in V$. Then there exists a unique vector $\hat{v} \in W$ such that:

  1. $v - \hat{v} \in W^\perp$ (the residual is orthogonal to $W$).
  2. $\|v - \hat{v}\| \leq \|v - w\|$ for all $w \in W$ (the projection is the closest point).

Moreover, $\hat{v}$ is uniquely determined by either of these equivalent conditions.

Theorem 6.7 (Projection formula via an orthonormal basis)

If $\{e_1, \ldots, e_k\}$ is an orthonormal basis for $W$, then:

$$\hat{v} = \operatorname{proj}_W(v) = \sum_{i=1}^k \langle v, e_i \rangle \, e_i.$$

The projection matrix (in the standard basis of $\mathbb{R}^n$) is $P = QQ^T$, where $Q$ is the matrix with columns $e_1, \ldots, e_k$.
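As a sanity check, the sum formula and the matrix form $QQ^T$ give the same projection. A minimal NumPy sketch (the spanning set `A` below is an arbitrary illustrative choice, not taken from the text):

```python
import numpy as np

# Illustrative 2-dimensional subspace W of R^3, spanned by the columns of A.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
Q, _ = np.linalg.qr(A)   # columns of Q: an orthonormal basis e_1, ..., e_k for W

v = np.array([1.0, 2.0, 3.0])

# Sum formula: v_hat = sum_i <v, e_i> e_i
v_hat_sum = sum(np.dot(v, Q[:, i]) * Q[:, i] for i in range(Q.shape[1]))

# Matrix form: v_hat = Q Q^T v
v_hat_mat = Q @ Q.T @ v

# The residual v - v_hat is orthogonal to every spanning vector of W.
residual = v - v_hat_mat
```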


Examples: projection onto a line

Example (Projection onto a line in $\mathbb{R}^2$)

$W = \operatorname{span}\{(3, 4)\}$, $v = (7, 1)$.

$$\hat{v} = \frac{\langle v, (3,4) \rangle}{\|(3,4)\|^2}(3, 4) = \frac{21 + 4}{25}(3, 4) = (3, 4).$$

Residual: $v - \hat{v} = (7, 1) - (3, 4) = (4, -3)$. Check: $(4, -3) \cdot (3, 4) = 12 - 12 = 0$ ✓.

Distance from $v$ to $W$: $\|v - \hat{v}\| = \|(4, -3)\| = 5$.

Example (Projection onto a line in $\mathbb{R}^3$)

$W = \operatorname{span}\{(1, 1, 1)\}$, $v = (1, 2, 3)$.

$$\hat{v} = \frac{1 + 2 + 3}{3}(1, 1, 1) = 2(1, 1, 1) = (2, 2, 2).$$

Residual: $(1, 2, 3) - (2, 2, 2) = (-1, 0, 1)$. Check: $(-1, 0, 1) \cdot (1, 1, 1) = 0$ ✓.

Distance: $\|(-1, 0, 1)\| = \sqrt{2}$.
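Both line projections use the single formula $\hat{v} = \frac{\langle v, w \rangle}{\langle w, w \rangle} w$. A short sketch reproducing the two examples (the helper `proj_line` is our own illustrative name):

```python
import numpy as np

def proj_line(v, w):
    """Orthogonal projection of v onto span{w}: (<v, w> / <w, w>) w."""
    v, w = np.asarray(v, dtype=float), np.asarray(w, dtype=float)
    return (np.dot(v, w) / np.dot(w, w)) * w

# R^2 example: v = (7, 1) onto span{(3, 4)}
v_hat_2d = proj_line([7, 1], [3, 4])            # -> (3, 4)
dist_2d = np.linalg.norm([7, 1] - v_hat_2d)     # -> 5

# R^3 example: v = (1, 2, 3) onto span{(1, 1, 1)}
v_hat_3d = proj_line([1, 2, 3], [1, 1, 1])      # -> (2, 2, 2)
dist_3d = np.linalg.norm([1, 2, 3] - v_hat_3d)  # -> sqrt(2)
```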


Examples: projection onto a plane

Example (Projection onto a plane in $\mathbb{R}^3$)

$W = \operatorname{span}\{(1, 0, 0), (0, 1, 0)\}$ (the $xy$-plane), $v = (3, 4, 5)$.

$\hat{v} = (3, 4, 0)$ (just drop the $z$-component). Residual: $(0, 0, 5) \perp W$ ✓.

Example (Projection onto a general plane)

$W = \operatorname{span}\{w_1, w_2\}$ with $w_1 = (1, 1, 0)$, $w_2 = (0, 1, 1)$. First orthonormalize:

$e_1 = \frac{1}{\sqrt{2}}(1, 1, 0)$. Then $u_2 = (0, 1, 1) - \frac{1}{2}(1, 1, 0) = (-\frac{1}{2}, \frac{1}{2}, 1)$, so $e_2 = \frac{1}{\sqrt{3/2}}(-\frac{1}{2}, \frac{1}{2}, 1) = \frac{1}{\sqrt{6}}(-1, 1, 2)$.

For $v = (1, 0, 0)$:

$$\hat{v} = \langle v, e_1 \rangle e_1 + \langle v, e_2 \rangle e_2 = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1, 1, 0) + \frac{-1}{\sqrt{6}} \cdot \frac{1}{\sqrt{6}}(-1, 1, 2)$$

$$= \frac{1}{2}(1, 1, 0) + \frac{1}{6}(1, -1, -2) = \left(\tfrac{1}{2} + \tfrac{1}{6},\ \tfrac{1}{2} - \tfrac{1}{6},\ -\tfrac{1}{3}\right) = \left(\tfrac{2}{3}, \tfrac{1}{3}, -\tfrac{1}{3}\right).$$

Check: $v - \hat{v} = (\frac{1}{3}, -\frac{1}{3}, \frac{1}{3})$, and $(\frac{1}{3}, -\frac{1}{3}, \frac{1}{3}) \cdot (1, 1, 0) = 0$ ✓, $(\frac{1}{3}, -\frac{1}{3}, \frac{1}{3}) \cdot (0, 1, 1) = 0$ ✓.
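The Gram-Schmidt step plus the projection formula can be checked numerically; a sketch reproducing this example with NumPy:

```python
import numpy as np

w1 = np.array([1.0, 1.0, 0.0])
w2 = np.array([0.0, 1.0, 1.0])

# Gram-Schmidt: orthonormalize {w1, w2}
e1 = w1 / np.linalg.norm(w1)
u2 = w2 - np.dot(w2, e1) * e1
e2 = u2 / np.linalg.norm(u2)

# Project v = (1, 0, 0) onto W = span{e1, e2}
v = np.array([1.0, 0.0, 0.0])
v_hat = np.dot(v, e1) * e1 + np.dot(v, e2) * e2   # -> (2/3, 1/3, -1/3)

# Residual should be orthogonal to both original spanning vectors.
residual = v - v_hat
```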


Projection matrices

Definition 6.9 (Orthogonal projection matrix)

If $A$ is an $n \times k$ matrix whose columns form a basis for $W$ (not necessarily orthonormal), the projection matrix is:

$$P = A(A^TA)^{-1}A^T.$$

$P$ satisfies $P^2 = P$ (idempotent) and $P^T = P$ (symmetric). The projection of $v$ is $\hat{v} = Pv$.

Example (Projection matrix for a line)

$W = \operatorname{span}\{(1, 2)\}$: $A = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$, $A^TA = 5$.

$$P = \frac{1}{5}\begin{pmatrix} 1 \\ 2 \end{pmatrix}\begin{pmatrix} 1 & 2 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}.$$

Check: $P^2 = P$ ✓, $P^T = P$ ✓, $\operatorname{tr}(P) = 1$ (rank $1$), eigenvalues $0, 1$.

Example (Projection matrix for a plane)

$W = \operatorname{span}\{(1,0,1), (0,1,1)\}$: $A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}$.

$$A^TA = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad (A^TA)^{-1} = \frac{1}{3}\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}.$$

$$P = A(A^TA)^{-1}A^T = \frac{1}{3}\begin{pmatrix} 2 & -1 & 1 \\ -1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}.$$

$\operatorname{tr}(P) = 2$ (the dimension of $W$), eigenvalues $1, 1, 0$.
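The formula $P = A(A^TA)^{-1}A^T$ and its claimed properties can be verified directly; a NumPy sketch for this plane:

```python
import numpy as np

# Basis for W = span{(1,0,1), (0,1,1)} as the columns of A
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# P = A (A^T A)^{-1} A^T  ->  (1/3) * [[2,-1,1],[-1,2,1],[1,1,2]]
P = A @ np.linalg.inv(A.T @ A) @ A.T

# Expected properties: idempotent, symmetric, trace = dim W = 2
```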


Least-squares approximation

Theorem (Least-squares solution)

The least-squares solution of the overdetermined system $Ax = b$ (where $A$ is $m \times n$ with $m > n$ and full column rank) minimizes $\|Ax - b\|^2$. The solution satisfies the normal equations:

$$A^T A \hat{x} = A^T b, \qquad \hat{x} = (A^TA)^{-1}A^Tb.$$

The projection of $b$ onto the column space of $A$ is $\hat{b} = A\hat{x} = Pb$.

Example (Best-fit line)

Data: $(0, 1), (1, 3), (2, 4)$. Fit $y = a + bx$.

$$A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix}.$$

$$A^TA = \begin{pmatrix} 3 & 3 \\ 3 & 5 \end{pmatrix}, \qquad A^Tb = \begin{pmatrix} 8 \\ 11 \end{pmatrix}.$$

$$\hat{x} = \begin{pmatrix} 3 & 3 \\ 3 & 5 \end{pmatrix}^{-1}\begin{pmatrix} 8 \\ 11 \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 5 & -3 \\ -3 & 3 \end{pmatrix}\begin{pmatrix} 8 \\ 11 \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 7 \\ 9 \end{pmatrix}.$$

Best-fit line: $y = \frac{7}{6} + \frac{3}{2}x$.
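The same fit in NumPy, solving the normal equations directly (a sketch of the computation above):

```python
import numpy as np

# Data (0, 1), (1, 3), (2, 4); design matrix for y = a + b x
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 4.0])
A = np.column_stack([np.ones_like(x), x])   # columns: 1, x

# Normal equations: (A^T A) x_hat = A^T y
x_hat = np.linalg.solve(A.T @ A, A.T @ y)   # -> (7/6, 3/2)
```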

Example (Best-fit polynomial)

Fitting $y = a + bx + cx^2$ to data $(0, 1), (1, 2), (2, 1), (3, 0)$:

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 2 \\ 1 \\ 0 \end{pmatrix}.$$

Solve $A^TA\hat{x} = A^Tb$ for the best parabola fit.
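Carrying the solve out numerically (a sketch; `numpy.linalg.lstsq` minimizes $\|Ax - b\|$ via the SVD, so it agrees with the normal equations when $A$ has full column rank):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 1.0, 0.0])
A = np.column_stack([np.ones_like(x), x, x**2])   # columns: 1, x, x^2

# Normal equations: (A^T A) x_hat = A^T y
x_hat = np.linalg.solve(A.T @ A, A.T @ y)

# Equivalent SVD-based least-squares solve
x_hat_svd, *_ = np.linalg.lstsq(A, y, rcond=None)

# Both give the coefficients (a, b, c) of the best parabola, approx (1.1, 1.1, -0.5)
```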


Best approximation in function spaces

Example (Best polynomial approximation)

Find the best constant approximation to $f(x) = x^2$ on $[-1, 1]$ with $\langle f, g \rangle = \int_{-1}^1 fg\,dx$.

$W = \operatorname{span}\{1\}$: $\operatorname{proj}_W(x^2) = \frac{\langle x^2, 1 \rangle}{\langle 1, 1 \rangle} = \frac{2/3}{2} = \frac{1}{3}$.

The best constant approximation to $x^2$ on $[-1, 1]$ in the $L^2$ sense is $c = 1/3$.

Error: $\|x^2 - 1/3\|^2 = \int_{-1}^1 (x^2 - 1/3)^2\,dx = \frac{8}{45}$.
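The same projection can be checked numerically by discretizing the $L^2$ inner product; a sketch using a midpoint Riemann sum:

```python
import numpy as np

# Midpoint-rule approximation of the L^2 inner product on [-1, 1]
n = 100_000
dx = 2.0 / n
x = -1.0 + (np.arange(n) + 0.5) * dx

f = x**2
ones = np.ones_like(x)

# Best constant: c = <f, 1> / <1, 1>  -> approx 1/3
c = np.sum(f * ones) * dx / (np.sum(ones * ones) * dx)

# Squared L^2 error: ||f - c||^2  -> approx 8/45
err = np.sum((f - c) ** 2) * dx
```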

Example (Best trigonometric approximation)

Find the best approximation to $f(x) = |x|$ on $[-\pi, \pi]$ using $W = \operatorname{span}\{1, \cos x\}$.

$$a_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |x|\,dx = \frac{\pi}{2}, \qquad a_1 = \frac{1}{\pi}\int_{-\pi}^{\pi} |x|\cos x\,dx = -\frac{4}{\pi}.$$

Best approximation: $\frac{\pi}{2} - \frac{4}{\pi}\cos x$.
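The two Fourier coefficients can likewise be confirmed by quadrature; a minimal sketch:

```python
import numpy as np

# Midpoint-rule quadrature on [-pi, pi]
n = 200_000
dx = 2 * np.pi / n
x = -np.pi + (np.arange(n) + 0.5) * dx
f = np.abs(x)

a0 = np.sum(f) * dx / (2 * np.pi)          # -> approx pi/2
a1 = np.sum(f * np.cos(x)) * dx / np.pi    # -> approx -4/pi
```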

Example (Best polynomial approximation to $e^x$)

Approximate $e^x$ on $[0, 1]$ by a linear polynomial $a + bx$, minimizing $\int_0^1 (e^x - a - bx)^2\,dx$.

This is the projection of $e^x$ onto $\operatorname{span}\{1, x\}$ with $\langle f, g \rangle = \int_0^1 fg\,dx$.

The normal equations give $a = 4e - 10 \approx 0.873$ and $b = 18 - 6e \approx 1.690$.
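Here the Gram matrix of $\{1, x\}$ on $[0,1]$ has exact entries $\langle 1,1\rangle = 1$, $\langle 1,x\rangle = \frac{1}{2}$, $\langle x,x\rangle = \frac{1}{3}$, and the right-hand side is $\langle e^x, 1\rangle = e - 1$, $\langle e^x, x\rangle = 1$. A sketch of the solve:

```python
import numpy as np

# Gram matrix of {1, x} and right-hand side <e^x, 1>, <e^x, x> on [0, 1]
G = np.array([[1.0, 1.0 / 2.0],
              [1.0 / 2.0, 1.0 / 3.0]])
rhs = np.array([np.e - 1.0, 1.0])   # int_0^1 e^x dx = e - 1, int_0^1 x e^x dx = 1

a, b = np.linalg.solve(G, rhs)      # -> a = 4e - 10, b = 18 - 6e
```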


Properties of orthogonal projections

Theorem (Properties of the projection operator)

The orthogonal projection $P_W: V \to V$ onto a subspace $W$ satisfies:

  1. $P_W^2 = P_W$ (idempotent).
  2. $P_W^* = P_W$ (self-adjoint), meaning $\langle P_W u, v \rangle = \langle u, P_W v \rangle$.
  3. $\operatorname{im}(P_W) = W$ and $\ker(P_W) = W^\perp$.
  4. $I - P_W = P_{W^\perp}$ (the complementary projection).
  5. $\|P_W v\| \leq \|v\|$ for all $v$ (projections are contractions).

Example (Complementary projections)

$W = \operatorname{span}\{(1, 0)\}$ in $\mathbb{R}^2$: $P_W = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, $P_{W^\perp} = I - P_W = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$.

For $v = (3, 5)$: $P_W v = (3, 0)$, $P_{W^\perp} v = (0, 5)$, $P_W v + P_{W^\perp} v = v$ ✓.
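A quick numerical confirmation of the complementary decomposition:

```python
import numpy as np

P_W = np.array([[1.0, 0.0],
                [0.0, 0.0]])
P_perp = np.eye(2) - P_W    # complementary projection onto W^perp

v = np.array([3.0, 5.0])
v_W = P_W @ v               # -> (3, 0)
v_perp = P_perp @ v         # -> (0, 5); the two components sum back to v
```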


Summary

Remark (The Projection Theorem as a unifying principle)

The Projection Theorem is the geometric heart of applied linear algebra:

  • Existence and uniqueness of best approximations in inner product spaces.
  • Least-squares solutions to overdetermined systems.
  • Fourier analysis as projection onto trigonometric subspaces.
  • Signal processing: extracting the component of a signal in a subspace.
  • The projection $\hat{v} = \sum_i \langle v, e_i \rangle \, e_i$ reduces all approximation problems to computing inner products.