
Proofs of the Cayley--Hamilton Theorem

We present multiple proofs of the Cayley--Hamilton theorem, each illuminating a different aspect of the result. The adjugate (cofactor) proof is the most elementary, while the extension-of-scalars proof is the most conceptual.


Statement

Theorem 5.4 (Cayley--Hamilton Theorem)

Let $A \in M_{n \times n}(F)$ with characteristic polynomial $p_A(\lambda) = \det(\lambda I - A)$. Then $p_A(A) = 0$.
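Before the proofs, the statement can be sanity-checked numerically. A minimal pure-Python sketch (the sample matrix and the $2 \times 2$ formula $p(\lambda) = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$ are our own choices):

```python
# Check p_A(A) = 0 for a sample 2x2 integer matrix, where
# p(lambda) = lambda^2 - tr(A) * lambda + det(A).

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[2, 1], [1, 2]]
tr = A[0][0] + A[1][1]                       # trace: 4
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # determinant: 3

A2 = matmul(A, A)
# p(A) = A^2 - tr(A)*A + det(A)*I, computed entrywise
pA = [[A2[i][j] - tr * A[i][j] + (det if i == j else 0) for j in range(2)]
      for i in range(2)]
print(pA)  # -> [[0, 0], [0, 0]]
```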


Proof 1: Via the adjugate matrix

Proof: using the classical adjoint (adjugate)

Key idea: The adjugate matrix $\operatorname{adj}(\lambda I - A)$ is a matrix whose entries are polynomials in $\lambda$, and the identity $(\lambda I - A) \cdot \operatorname{adj}(\lambda I - A) = \det(\lambda I - A) \cdot I$ will give us the result when we "substitute" $\lambda = A$.

Step 1: The adjugate identity. For any matrix $M$, we have $M \cdot \operatorname{adj}(M) = \det(M) \cdot I$. Applying this to $M = \lambda I - A$:

$$(\lambda I - A) \cdot \operatorname{adj}(\lambda I - A) = p_A(\lambda) \cdot I.$$

Step 2: Expand the adjugate as a polynomial in $\lambda$. Each entry of $\operatorname{adj}(\lambda I - A)$ is (up to sign) an $(n-1) \times (n-1)$ minor of $\lambda I - A$, hence a polynomial in $\lambda$ of degree at most $n - 1$. So we can write:

$$\operatorname{adj}(\lambda I - A) = B_{n-1}\lambda^{n-1} + B_{n-2}\lambda^{n-2} + \cdots + B_1 \lambda + B_0,$$

where $B_0, B_1, \ldots, B_{n-1}$ are $n \times n$ matrices with entries in $F$ (independent of $\lambda$).

Step 3: Expand both sides. Write $p_A(\lambda) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0$. The left side is:

$$(\lambda I - A)(B_{n-1}\lambda^{n-1} + \cdots + B_0) = B_{n-1}\lambda^n + (B_{n-2} - AB_{n-1})\lambda^{n-1} + \cdots + (B_0 - AB_1)\lambda + (-AB_0).$$

The right side is:

$$p_A(\lambda) \cdot I = I\lambda^n + c_{n-1}I\lambda^{n-1} + \cdots + c_1 I \lambda + c_0 I.$$

Step 4: Match coefficients. Comparing coefficients of each power of $\lambda$:

$$B_{n-1} = I, \quad B_{n-2} - AB_{n-1} = c_{n-1}I, \quad \ldots, \quad B_0 - AB_1 = c_1 I, \quad -AB_0 = c_0 I.$$

Step 5: Multiply and sum. Multiply the equation coming from the coefficient of $\lambda^k$ by $A^k$ on the left:

  • From $\lambda^n$: $B_{n-1} = I$ $\implies$ multiply by $A^n$: $A^n B_{n-1} = A^n$.
  • From $\lambda^{n-1}$: $B_{n-2} - AB_{n-1} = c_{n-1}I$ $\implies$ multiply by $A^{n-1}$: $A^{n-1}B_{n-2} - A^n B_{n-1} = c_{n-1}A^{n-1}$.
  • From $\lambda^{n-2}$: $B_{n-3} - AB_{n-2} = c_{n-2}I$ $\implies$ multiply by $A^{n-2}$: $A^{n-2}B_{n-3} - A^{n-1}B_{n-2} = c_{n-2}A^{n-2}$.
  • $\vdots$
  • From $\lambda^0$: $-AB_0 = c_0 I$ $\implies$ multiply by $I$: $-AB_0 = c_0 I$.

Step 6: Telescope. Add all these equations. On the left side, the positive term $A^k B_{k-1}$ from the $\lambda^k$ equation cancels against the negative term $-A^k B_{k-1}$ from the $\lambda^{k-1}$ equation, so everything cancels:

$$A^n B_{n-1} + (A^{n-1}B_{n-2} - A^n B_{n-1}) + (A^{n-2}B_{n-3} - A^{n-1}B_{n-2}) + \cdots + (-AB_0) = 0.$$

The total of the right sides is $A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I = p_A(A)$.

Therefore $p_A(A) = 0$. $\blacksquare$
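The coefficient matching above can be run forward as an algorithm: the Faddeev--LeVerrier recursion generates the matrices $B_k$ and the coefficients $c_k$ together, and as a consistency check the final matrix $A B_0 + c_0 I$ vanishes, matching the $\lambda^0$ equation $-AB_0 = c_0 I$. A sketch in exact rational arithmetic (the test matrix, with eigenvalues $1, 2, 3$, is our own choice):

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def char_poly(A):
    """Faddeev-LeVerrier: return [1, c_{n-1}, ..., c_0] with
    p(lambda) = lambda^n + c_{n-1} lambda^{n-1} + ... + c_0."""
    n = len(A)
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # B_{n-1} = I
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        c = -sum(AM[i][i] for i in range(n)) / k   # c_{n-k} = -tr(A M_k) / k
        coeffs.append(c)
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    # Final M = A*B_0 + c_0*I must be the zero matrix (the lambda^0 equation).
    assert all(x == 0 for row in M for x in row)
    return coeffs

A = [[Fraction(x) for x in r] for r in [[1, 0, 1], [0, 2, 0], [0, 0, 3]]]
coeffs = char_poly(A)
print(coeffs)  # -> [1, -6, 11, -6], i.e. p = x^3 - 6x^2 + 11x - 6
```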

Example: adjugate proof illustrated for 2×2

$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, $p(\lambda) = \lambda^2 - (a+d)\lambda + (ad - bc)$.

$\lambda I - A = \begin{pmatrix} \lambda - a & -b \\ -c & \lambda - d \end{pmatrix}$, $\operatorname{adj}(\lambda I - A) = \begin{pmatrix} \lambda - d & b \\ c & \lambda - a \end{pmatrix}$.

So $B_1 = I$ and $B_0 = \begin{pmatrix} -d & b \\ c & -a \end{pmatrix}$.

The equations are:

  • $\lambda^2$: $B_1 = I$.
  • $\lambda^1$: $B_0 - AB_1 = -(a+d)I$, i.e., $B_0 = A - (a+d)I = \begin{pmatrix} -d & b \\ c & -a \end{pmatrix}$ ✓.
  • $\lambda^0$: $-AB_0 = (ad-bc)I$, i.e., $-A\begin{pmatrix} -d & b \\ c & -a \end{pmatrix} = (ad-bc)I$ ✓.

Multiplying the three equations by $A^2$, $A$, and $I$ respectively and summing, the left sides telescope to $0$, while the right sides total $A^2 - (a+d)A + (ad-bc)I = p(A)$. Hence $p(A) = 0$.


Proof 2: Via extension of scalars (for diagonalizable case)

Proof: for diagonalizable matrices

Special case: Assume $A$ is diagonalizable, i.e., $A = PDP^{-1}$ where $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$.

Then $p_A(\lambda) = \prod_{i=1}^n (\lambda - \lambda_i)$, and:

$$p_A(A) = \prod_{i=1}^n (A - \lambda_i I) = P \prod_{i=1}^n (D - \lambda_i I) P^{-1}.$$

Now $D - \lambda_i I = \operatorname{diag}(\lambda_1 - \lambda_i, \ldots, \lambda_n - \lambda_i)$, which has a zero in the $i$-th diagonal entry. The product $\prod_i (D - \lambda_i I)$ has every diagonal entry equal to zero (the $j$-th entry is $\prod_i (\lambda_j - \lambda_i)$, which contains the factor $\lambda_j - \lambda_j = 0$). So $\prod_i (D - \lambda_i I) = 0$, hence $p_A(A) = 0$.
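This argument is easy to check directly in exact arithmetic. A small sketch (the matrices $P$ and $D$ are our own example, chosen so that $P^{-1}$ has integer entries):

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# A = P D P^{-1} with D = diag(1, 2), so the eigenvalues are 1 and 2.
P    = [[Fraction(1), Fraction(1)], [Fraction(1), Fraction(2)]]
Pinv = [[Fraction(2), Fraction(-1)], [Fraction(-1), Fraction(1)]]  # det(P) = 1
D    = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(2)]]
A = matmul(matmul(P, D), Pinv)

def shift(M, lam):
    """Return M - lam*I."""
    return [[M[i][j] - (lam if i == j else 0) for j in range(2)] for i in range(2)]

# p_A(A) = (A - 1*I)(A - 2*I) -- every entry should be 0.
prod = matmul(shift(A, 1), shift(A, 2))
print(prod)  # every entry is 0
```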

Remark: extending to all matrices

The diagonalizable case is not sufficient, since not all matrices are diagonalizable over $F$. However, over $\overline{F}$ (the algebraic closure), the diagonalizable matrices are dense (in the Zariski topology). Since $p_A(A) = 0$ is a polynomial identity in the entries of $A$, and it holds on a dense set, it holds everywhere. This is the "extension of scalars" or "density" argument.

More concretely: the entries of $p_A(A)$ are polynomial functions of the entries of $A$. These polynomials vanish on all diagonalizable matrices (a Zariski-dense set), hence vanish identically.


Proof 3: Via eigenvectors

Proof: by checking on each eigenvector (over algebraically closed fields)

Over an algebraically closed field $F$, let $\lambda$ be an eigenvalue of $A$ with eigenvector $v$. Then:

$$p_A(A)v = p_A(\lambda) v = 0 \cdot v = 0,$$

since $A^k v = \lambda^k v$ for all $k$, so $p(A)v = p(\lambda)v$ for any polynomial $p$.

If $A$ has $n$ linearly independent eigenvectors $v_1, \ldots, v_n$ (i.e., $A$ is diagonalizable), then $p_A(A)v_i = 0$ for all $i$ implies $p_A(A) = 0$.

For non-diagonalizable $A$: use generalized eigenvectors. If $v$ is a generalized eigenvector for the eigenvalue $\lambda_0$, say $(A - \lambda_0 I)^m v = 0$, and $\lambda_0$ has algebraic multiplicity $m_0$ in $p_A$, then one can take $m \leq m_0$, so $(A - \lambda_0 I)^{m_0} v = 0$; since the factors of $p_A(A)$ commute, $p_A(A)v = 0$. Since the generalized eigenvectors span $F^n$ (over an algebraically closed field), $p_A(A) = 0$.

Example: checking on generalized eigenvectors

$A = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}$, $p(\lambda) = (\lambda - 2)^2$.

Eigenvector: $v_1 = (1, 0)$, $Av_1 = 2v_1$. Check: $p(A)v_1 = (A-2I)^2 v_1 = 0$ since $(A-2I)v_1 = 0$.

Generalized eigenvector: $v_2 = (0, 1)$, $(A - 2I)v_2 = (1, 0) = v_1 \neq 0$, but $(A-2I)^2 v_2 = (A-2I)v_1 = 0$.

So $p(A)v_2 = 0$. Since $v_1, v_2$ span $\mathbb{R}^2$, $p(A) = 0$.
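These two checks take only a few lines (a sketch using the same $A$, $v_1$, $v_2$):

```python
def matvec(M, v):
    """Matrix-vector product over plain Python lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]

A = [[2, 1], [0, 2]]
N = [[A[i][j] - (2 if i == j else 0) for j in range(2)] for i in range(2)]  # N = A - 2I

v1, v2 = [1, 0], [0, 1]
print(matvec(N, v1))   # -> [0, 0]   (v1 is a true eigenvector)
w = matvec(N, v2)
print(w)               # -> [1, 0]   ((A - 2I) v2 = v1)
print(matvec(N, w))    # -> [0, 0]   ((A - 2I)^2 v2 = 0, so p(A) v2 = 0)
```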


Verification examples

Example: verification for a rotation matrix

$R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$.

$p(\lambda) = \lambda^2 - 2\cos\theta \cdot \lambda + 1$.

Compute $R^2 - 2\cos\theta \cdot R + I$:

$R^2 = \begin{pmatrix} \cos 2\theta & -\sin 2\theta \\ \sin 2\theta & \cos 2\theta \end{pmatrix}$, so the $(1,1)$ entry of $R^2 - 2\cos\theta \cdot R + I$ is:

$$\cos 2\theta - 2\cos^2\theta + 1 = (2\cos^2\theta - 1) - 2\cos^2\theta + 1 = 0 \ \text{✓}.$$

Similarly, all entries vanish.
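A numerical check of the same identity (a sketch in floating point, so the entries vanish only up to roundoff; the angle $\theta = 0.7$ is an arbitrary choice):

```python
import math

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

theta = 0.7
c, s = math.cos(theta), math.sin(theta)
R = [[c, -s], [s, c]]

R2 = matmul(R, R)
# p(R) = R^2 - 2 cos(theta) R + I, entrywise
pR = [[R2[i][j] - 2 * c * R[i][j] + (1 if i == j else 0) for j in range(2)]
      for i in range(2)]
err = max(abs(pR[i][j]) for i in range(2) for j in range(2))
print(err)  # essentially zero (roundoff only)
```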

Example: verification for a triangular 3×3 matrix

$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$, $p(\lambda) = \det(\lambda I - A) = (\lambda - 1)^3 = \lambda^3 - 3\lambda^2 + 3\lambda - 1$.

$p(A) = (A - I)^3 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}^3 = 0$ (a $3 \times 3$ nilpotent matrix cubed is zero) ✓.
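The same computation as a short sketch:

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

N = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # N = A - I
N3 = matmul(matmul(N, N), N)
print(N3)  # -> [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```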

Example: verification for a symmetric matrix

$A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, $p(\lambda) = \lambda^2 - 4\lambda + 3 = (\lambda - 3)(\lambda - 1)$.

$A^2 - 4A + 3I = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix} - \begin{pmatrix} 8 & 4 \\ 4 & 8 \end{pmatrix} + \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ ✓.

Example: verification over $\mathbb{C}$

$A = \begin{pmatrix} i & 1 \\ 0 & -i \end{pmatrix}$ over $\mathbb{C}$. $p(\lambda) = \lambda^2 + 1$.

$A^2 = \begin{pmatrix} i & 1 \\ 0 & -i \end{pmatrix}^2 = \begin{pmatrix} i^2 & i \cdot 1 + 1 \cdot (-i) \\ 0 & (-i)^2 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = -I$.

So $A^2 + I = -I + I = 0$ ✓.

Example: verification for a 4×4 permutation matrix

$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{pmatrix}$ (cyclic permutation).

$A^4 = I$, so $p(\lambda) = \lambda^4 - 1$ and $p(A) = A^4 - I = 0$ ✓.
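A quick check of $A^4 = I$ (a sketch; the index convention $A[i][j] = 1$ iff $j \equiv i+1 \pmod 4$ matches the matrix above):

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

n = 4
A = [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]  # cyclic shift
I = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A4 = A
for _ in range(3):        # A4 = A * A * A * A
    A4 = matmul(A4, A)
print(A4 == I)  # -> True, so p(A) = A^4 - I = 0
```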


Consequences revisited

Example: every matrix is annihilated by a degree-$n$ polynomial

By Cayley--Hamilton, the vector space $\{p(A) : p \in F[x],\ \deg p \leq n - 1\}$ contains all powers $A^k$ (by repeatedly using the relation $A^n = -(c_{n-1}A^{n-1} + \cdots + c_0 I)$). This means:

$$F[A] = \operatorname{span}\{I, A, A^2, \ldots, A^{n-1}\},$$

a vector space of dimension at most $n$. The exact dimension is $\deg m_A$, where $m_A$ is the minimal polynomial of $A$.
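To illustrate how every power collapses into $\operatorname{span}\{I, A, \ldots, A^{n-1}\}$, a sketch that reduces $x^5$ modulo the characteristic polynomial and compares with direct computation (the $2 \times 2$ matrix with $p(\lambda) = \lambda^2 - 4\lambda + 3$ is our own example; Cayley--Hamilton gives $A^2 = 4A - 3I$):

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[2, 1], [1, 2]]      # p(lambda) = lambda^2 - 4*lambda + 3

# Reduce x^5 modulo p: maintain x^k = a*x + b, using x^2 = 4x - 3.
a, b = 1, 0               # x^1
for _ in range(4):        # step up to x^5
    a, b = 4 * a + b, -3 * a

# By Cayley-Hamilton, A^5 = a*A + b*I.
reduced = [[a * A[i][j] + (b if i == j else 0) for j in range(2)] for i in range(2)]

direct = A
for _ in range(4):        # direct = A^5
    direct = matmul(direct, A)

print(reduced == direct)  # -> True
print(reduced)            # -> [[122, 121], [121, 122]]
```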

Example: general inverse formula

For $n = 3$ with $p(\lambda) = \lambda^3 + c_2\lambda^2 + c_1\lambda + c_0$ and $c_0 = (-1)^3 \det(A) = -\det(A) \neq 0$:

$$A^3 + c_2 A^2 + c_1 A + c_0 I = 0 \implies A(A^2 + c_2 A + c_1 I) = -c_0 I,$$

$$A^{-1} = -\frac{1}{c_0}(A^2 + c_2 A + c_1 I).$$

For $A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}$: $p(\lambda) = (\lambda-1)(\lambda-2)(\lambda-3) = \lambda^3 - 6\lambda^2 + 11\lambda - 6$.

$$A^{-1} = \frac{1}{6}(A^2 - 6A + 11I) = \frac{1}{6}\left(\begin{pmatrix} 1 & 0 & 4 \\ 0 & 4 & 0 \\ 0 & 0 & 9 \end{pmatrix} - \begin{pmatrix} 6 & 0 & 6 \\ 0 & 12 & 0 \\ 0 & 0 & 18 \end{pmatrix} + \begin{pmatrix} 11 & 0 & 0 \\ 0 & 11 & 0 \\ 0 & 0 & 11 \end{pmatrix}\right) = \frac{1}{6}\begin{pmatrix} 6 & 0 & -2 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$
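The same inverse, computed and verified in exact arithmetic (a sketch):

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[Fraction(x) for x in r] for r in [[1, 0, 1], [0, 2, 0], [0, 0, 3]]]

# p(lambda) = lambda^3 - 6 lambda^2 + 11 lambda - 6, so by Cayley-Hamilton
# A^{-1} = (1/6)(A^2 - 6A + 11I).
A2 = matmul(A, A)
Ainv = [[(A2[i][j] - 6 * A[i][j] + (11 if i == j else 0)) / 6 for j in range(3)]
        for i in range(3)]

I = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
check = matmul(A, Ainv)
print(check == I)  # -> True
```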


Summary

Remark: three proof strategies

The Cayley--Hamilton theorem admits several proofs, each with distinct flavor:

  • Adjugate proof: Elementary, works over any commutative ring. Uses the identity $(\lambda I - A) \cdot \operatorname{adj}(\lambda I - A) = p_A(\lambda) I$ and a telescoping argument.
  • Density argument: Proves the result for diagonalizable matrices (where it is obvious) and extends by continuity/Zariski density to all matrices.
  • Generalized eigenvector proof: Checks $p_A(A)v = 0$ on each (generalized) eigenvector over the algebraic closure.

All three illuminate the same truth: the characteristic polynomial is the "identity card" of the matrix, and the matrix itself is constrained to satisfy this identity.