TheoremComplete

Cayley-Hamilton Theorem

The Cayley--Hamilton theorem states that every square matrix satisfies its own characteristic polynomial. This deep result connects the polynomial algebra of eigenvalues to the matrix algebra itself, and has far-reaching consequences for computing matrix functions, finding inverse matrices, and understanding the structure of linear operators.


Statement

Theorem5.4Cayley-Hamilton Theorem

Let AMn×n(F)A \in M_{n \times n}(F) be a square matrix over a field FF, and let pA(λ)=det(AλI)p_A(\lambda) = \det(A - \lambda I) be its characteristic polynomial. Then:

pA(A)=0.p_A(A) = 0.

That is, if pA(λ)=(1)nλn+cn1λn1++c1λ+c0p_A(\lambda) = (-1)^n \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1 \lambda + c_0, then:

(1)nAn+cn1An1++c1A+c0I=0.(-1)^n A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I = 0.

RemarkWhat the theorem does NOT say

The theorem does not follow from simply "substituting AA for λ\lambda" in det(AλI)\det(A - \lambda I). The expression det(AAI)=det(0)=0\det(A - AI) = \det(0) = 0 is trivially true but irrelevant. The content is that when you expand pA(λ)p_A(\lambda) as a polynomial in λ\lambda, collect the scalar coefficients, and then form the matrix polynomial pA(A)=cnAn++c0Ip_A(A) = c_n A^n + \cdots + c_0 I, the result is the zero matrix.


Verification in small cases

ExampleCayley-Hamilton for a 2x2 matrix

A=(1234)A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.

Characteristic polynomial (monic): p(λ)=λ25λ2p(\lambda) = \lambda^2 - 5\lambda - 2.

Verify A25A2I=0A^2 - 5A - 2I = 0:

A2=(7101522)A^2 = \begin{pmatrix} 7 & 10 \\ 15 & 22 \end{pmatrix}, 5A=(5101520)5A = \begin{pmatrix} 5 & 10 \\ 15 & 20 \end{pmatrix}, 2I=(2002)2I = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}.

A25A2I=(752101001515022202)=(0000)A^2 - 5A - 2I = \begin{pmatrix} 7-5-2 & 10-10-0 \\ 15-15-0 & 22-20-2 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.

ExampleCayley-Hamilton for a 3x3 matrix

A=(210021002)A = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}.

Characteristic polynomial: p(λ)=(2λ)3=λ3+6λ212λ+8p(\lambda) = (2 - \lambda)^3 = -\lambda^3 + 6\lambda^2 - 12\lambda + 8, or in monic form: p(λ)=λ36λ2+12λ8p(\lambda) = \lambda^3 - 6\lambda^2 + 12\lambda - 8.

Verify: A36A2+12A8I=0A^3 - 6A^2 + 12A - 8I = 0.

A2I=(010001000)A - 2I = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, (A2I)2=(001000000)(A-2I)^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, (A2I)3=0(A-2I)^3 = 0.

So p(A)=(A2I)3=0p(A) = (A - 2I)^3 = 0.

ExampleCayley-Hamilton for diagonal matrices

If A=diag(λ1,,λn)A = \operatorname{diag}(\lambda_1, \ldots, \lambda_n), then p(λ)=i(λλi)p(\lambda) = \prod_i (\lambda - \lambda_i), and p(A)=i(AλiI)=diag(i(λ1λi),)=0p(A) = \prod_i (A - \lambda_i I) = \operatorname{diag}(\prod_{i}(\lambda_1 - \lambda_i), \ldots) = 0 since each diagonal entry contains a factor (λjλj)=0(\lambda_j - \lambda_j) = 0.


Applications

TheoremMatrix inverse via Cayley-Hamilton

If AA is invertible (detA0\det A \neq 0), the Cayley--Hamilton theorem provides an explicit formula for A1A^{-1} as a polynomial in AA.

For a 2×22 \times 2 matrix with p(λ)=λ2tr(A)λ+det(A)p(\lambda) = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A): A2tr(A)A+det(A)I=0A^2 - \operatorname{tr}(A) \cdot A + \det(A) \cdot I = 0, so:

A1=1det(A)(tr(A)IA).A^{-1} = \frac{1}{\det(A)}(\operatorname{tr}(A) \cdot I - A).

ExampleComputing inverse via Cayley-Hamilton

A=(4321)A = \begin{pmatrix} 4 & 3 \\ 2 & 1 \end{pmatrix}, tr(A)=5\operatorname{tr}(A) = 5, det(A)=2\det(A) = -2.

A1=12(5IA)=12(1324)=(1/23/212)A^{-1} = \frac{1}{-2}(5I - A) = \frac{1}{-2}\begin{pmatrix} 1 & -3 \\ -2 & 4 \end{pmatrix} = \begin{pmatrix} -1/2 & 3/2 \\ 1 & -2 \end{pmatrix}.

Verify: AA1=(4321)(1/23/212)=(1001)AA^{-1} = \begin{pmatrix} 4 & 3 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} -1/2 & 3/2 \\ 1 & -2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.

ExampleReducing higher powers of A

For A=(1234)A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} with p(λ)=λ25λ2p(\lambda) = \lambda^2 - 5\lambda - 2, the relation A2=5A+2IA^2 = 5A + 2I lets us express any power of AA as a linear combination of AA and II:

  • A3=5A2+2A=5(5A+2I)+2A=27A+10IA^3 = 5A^2 + 2A = 5(5A + 2I) + 2A = 27A + 10I.
  • A4=27A2+10A=27(5A+2I)+10A=145A+54IA^4 = 27A^2 + 10A = 27(5A + 2I) + 10A = 145A + 54I.
  • In general, An=anA+bnIA^n = a_n A + b_n I where an+1=5an+bna_{n+1} = 5a_n + b_n and bn+1=2anb_{n+1} = 2a_n.
ExampleCayley-Hamilton for nilpotent matrices

A nilpotent matrix NN with Nk=0N^k = 0 has characteristic polynomial p(λ)=(1)nλnp(\lambda) = (-1)^n \lambda^n, so Cayley--Hamilton gives (1)nNn=0(-1)^n N^n = 0, i.e., Nn=0N^n = 0. This means:

The index of nilpotency of an n×nn \times n nilpotent matrix is at most nn.

For N=(010001000)N = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}: N20N^2 \neq 0 but N3=0N^3 = 0. Cayley--Hamilton guarantees N3=0N^3 = 0 without computing.


Cayley--Hamilton and the minimal polynomial

Definition5.9Minimal polynomial

The minimal polynomial mA(λ)m_A(\lambda) of AA is the monic polynomial of smallest degree such that mA(A)=0m_A(A) = 0.

By Cayley--Hamilton, degmAn\deg m_A \leq n. The minimal polynomial divides the characteristic polynomial: mApAm_A \mid p_A.

ExampleMinimal polynomial vs characteristic polynomial
  • A=(2003)A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}: pA(λ)=(λ2)(λ3)p_A(\lambda) = (\lambda - 2)(\lambda - 3). Since AA is diagonalizable with distinct eigenvalues, mA(λ)=(λ2)(λ3)=pA(λ)m_A(\lambda) = (\lambda - 2)(\lambda - 3) = p_A(\lambda).

  • A=(2002)=2IA = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = 2I: pA(λ)=(λ2)2p_A(\lambda) = (\lambda - 2)^2. But (A2I)=0(A - 2I) = 0, so mA(λ)=λ2m_A(\lambda) = \lambda - 2. Here mApAm_A \neq p_A.

  • A=(2102)A = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}: pA(λ)=(λ2)2p_A(\lambda) = (\lambda - 2)^2. Check: (A2I)=(0100)0(A - 2I) = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \neq 0. So mA(λ)=(λ2)2=pA(λ)m_A(\lambda) = (\lambda - 2)^2 = p_A(\lambda).

ExampleMinimal polynomial of block diagonal

A=diag((2102),(3003))A = \operatorname{diag}\left(\begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}, \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix}\right).

pA(λ)=(λ2)2(λ3)2p_A(\lambda) = (\lambda - 2)^2 (\lambda - 3)^2, but mA(λ)=(λ2)2(λ3)m_A(\lambda) = (\lambda - 2)^2(\lambda - 3) since the 33-block is 3I3I (killed by λ3\lambda - 3) while the 22-block requires (λ2)2(\lambda - 2)^2.


Consequences

ExampleEvery matrix satisfies a polynomial of degree n

Cayley--Hamilton guarantees that AnA^n can be written as a linear combination of I,A,A2,,An1I, A, A^2, \ldots, A^{n-1}. This means:

The set {I,A,A2,,An1}\{I, A, A^2, \ldots, A^{n-1}\} spans all powers of AA. In particular, A1A^{-1} (when it exists) is a polynomial in AA of degree at most n1n - 1.

For an n×nn \times n matrix, the algebra F[A]={p(A):pF[x]}F[A] = \{p(A) : p \in F[x]\} has dimension at most nn over FF (in fact, dimension equals degmA\deg m_A).

ExampleNewton's identities from Cayley-Hamilton

For AM2×2(F)A \in M_{2 \times 2}(F): A2tr(A)A+det(A)I=0A^2 - \operatorname{tr}(A) \cdot A + \det(A) \cdot I = 0.

Taking traces: tr(A2)tr(A)2+2det(A)=0\operatorname{tr}(A^2) - \operatorname{tr}(A)^2 + 2\det(A) = 0, so:

det(A)=tr(A)2tr(A2)2.\det(A) = \frac{\operatorname{tr}(A)^2 - \operatorname{tr}(A^2)}{2}.

This is a special case of Newton's identities relating power sums to elementary symmetric functions.

For A=(3124)A = \begin{pmatrix} 3 & 1 \\ 2 & 4 \end{pmatrix}: tr(A)=7\operatorname{tr}(A) = 7, tr(A2)=tr(1171418)=29\operatorname{tr}(A^2) = \operatorname{tr}\begin{pmatrix} 11 & 7 \\ 14 & 18 \end{pmatrix} = 29. So det(A)=49292=10\det(A) = \frac{49 - 29}{2} = 10.


Summary

RemarkSignificance of Cayley-Hamilton

The Cayley--Hamilton theorem is a cornerstone result:

  • Every n×nn \times n matrix satisfies a polynomial relation of degree nn, constraining the algebra generated by AA.
  • It provides formulas for A1A^{-1} and higher powers of AA in terms of lower powers.
  • It establishes that the minimal polynomial divides the characteristic polynomial.
  • It is the starting point for the theory of canonical forms (Jordan form, rational canonical form).
  • It generalizes to linear operators on infinite-dimensional spaces (with appropriate definitions) and to ring-theoretic settings (the structure theorem for modules over a PID).