Diagonalization

A matrix is diagonalizable if it is similar to a diagonal matrix -- equivalently, if there exists a basis of eigenvectors. Diagonalization is the most desirable form a matrix can take: it reduces matrix computations to scalar operations and reveals the geometric action of the transformation as scaling along independent directions.


Definition

Definition 5.8 (Diagonalizable matrix)

A matrix $A \in M_{n \times n}(F)$ is diagonalizable (over $F$) if there exists an invertible matrix $P$ such that

$$P^{-1}AP = D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n),$$

where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$. Equivalently, $A = PDP^{-1}$.

The columns of $P$ are eigenvectors of $A$: $Ap_i = \lambda_i p_i$, where $p_i$ is the $i$-th column of $P$.

Theorem 5.3 (Diagonalizability criterion)

$A \in M_{n \times n}(F)$ is diagonalizable over $F$ if and only if:

  1. The characteristic polynomial $p_A(\lambda)$ splits completely over $F$ (all roots lie in $F$), and
  2. For each eigenvalue $\lambda$, the geometric multiplicity equals the algebraic multiplicity: $m_g(\lambda) = m_a(\lambda)$.

Equivalently, $A$ is diagonalizable iff $F^n$ has a basis consisting of eigenvectors of $A$.


Diagonalization procedure

Example (Diagonalizing a 2x2 matrix)

$A = \begin{pmatrix} 4 & 1 \\ 2 & 3 \end{pmatrix}$.

Step 1: Characteristic polynomial: $\lambda^2 - 7\lambda + 10 = (\lambda - 5)(\lambda - 2)$.

Step 2: Eigenvalues: $\lambda_1 = 5$, $\lambda_2 = 2$.

Step 3: Eigenvectors:

  • $\lambda = 5$: $(A - 5I)v = 0$ gives $\begin{pmatrix} -1 & 1 \\ 2 & -2 \end{pmatrix}v = 0$, so $v_1 = (1, 1)$.
  • $\lambda = 2$: $(A - 2I)v = 0$ gives $\begin{pmatrix} 2 & 1 \\ 2 & 1 \end{pmatrix}v = 0$, so $v_2 = (1, -2)$.

Step 4: $P = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}$, $D = \begin{pmatrix} 5 & 0 \\ 0 & 2 \end{pmatrix}$, and $A = PDP^{-1}$.
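The factorization can be sanity-checked directly. A small self-contained Python sketch (standard library only; the `matmul` and `inv2` helpers are defined inline, and `fractions` keeps the arithmetic exact):

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(X):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[ X[1][1] / det, -X[0][1] / det],
            [-X[1][0] / det,  X[0][0] / det]]

A = [[Fraction(4), Fraction(1)], [Fraction(2), Fraction(3)]]
P = [[Fraction(1), Fraction(1)], [Fraction(1), Fraction(-2)]]
D = [[Fraction(5), Fraction(0)], [Fraction(0), Fraction(2)]]

# Reassembling P D P^{-1} should recover A exactly.
reconstructed = matmul(matmul(P, D), inv2(P))
```

Using exact rationals avoids the floating-point noise that would otherwise make the equality check fuzzy.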

Example (Diagonalizing a 3x3 matrix)

$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 1 & 4 \end{pmatrix}$.

Characteristic polynomial: $(1 - \lambda)[(1-\lambda)(4-\lambda) + 2] = (1-\lambda)(\lambda^2 - 5\lambda + 6) = (1-\lambda)(\lambda - 2)(\lambda - 3)$.

Eigenvalues: $1, 2, 3$ (all distinct, so $A$ is automatically diagonalizable).

Eigenvectors: $v_1 = (1, 0, 0)$ for $\lambda = 1$; $v_2 = (0, -2, 1)$ for $\lambda = 2$; $v_3 = (0, -1, 1)$ for $\lambda = 3$.

$P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -2 & -1 \\ 0 & 1 & 1 \end{pmatrix}$, $D = \operatorname{diag}(1, 2, 3)$.


When diagonalization fails

Example (A non-diagonalizable 2x2 matrix)

$A = \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix}$.

Characteristic polynomial: $(\lambda - 3)^2$. Eigenvalue $\lambda = 3$ with $m_a = 2$.

$A - 3I = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, so $\ker(A - 3I) = \operatorname{span}\{(1, 0)\}$ and $m_g = 1$.

Since $m_g = 1 < 2 = m_a$, $A$ is not diagonalizable. There is no basis of $\mathbb{R}^2$ consisting of eigenvectors.
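The one-dimensional kernel is easy to exhibit concretely. A minimal Python check (the `apply` helper is defined inline) that $(1,0)$ lies in $\ker(A - 3I)$ while $(0,1)$ does not:

```python
A = [[3, 1], [0, 3]]
M = [[A[0][0] - 3, A[0][1]],
     [A[1][0], A[1][1] - 3]]  # A - 3I = [[0,1],[0,0]]

def apply(X, v):
    """Apply a 2x2 matrix to a vector."""
    return [X[0][0] * v[0] + X[0][1] * v[1],
            X[1][0] * v[0] + X[1][1] * v[1]]

in_kernel_e1 = apply(M, [1, 0]) == [0, 0]   # (1,0) is an eigenvector
in_kernel_e2 = apply(M, [0, 1]) == [0, 0]   # (0,1) is not
```

Since only one of the two standard basis vectors is killed by $A - 3I$, the eigenspace is a line, confirming $m_g = 1$.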

Example (Non-diagonalizable over $\mathbb{R}$ but diagonalizable over $\mathbb{C}$)

$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$.

Over $\mathbb{R}$: $p_A(\lambda) = \lambda^2 + 1$ has no real roots, so $A$ is not diagonalizable over $\mathbb{R}$.

Over $\mathbb{C}$: eigenvalues $i, -i$ with eigenvectors $(1, -i)$ and $(1, i)$. Then $P^{-1}AP = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}$.
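The complex eigenpair claims can be verified with Python's built-in complex numbers (a self-contained sketch; the `apply` helper is defined inline, and a tolerance absorbs floating-point roundoff):

```python
A = [[0, -1], [1, 0]]

def apply(X, v):
    """Apply a 2x2 matrix to a vector."""
    return [X[0][0] * v[0] + X[0][1] * v[1],
            X[1][0] * v[0] + X[1][1] * v[1]]

v1, lam1 = [1, -1j],  1j   # claimed eigenpair for  i
v2, lam2 = [1,  1j], -1j   # claimed eigenpair for -i

# Check A v = lambda v entrywise.
err1 = max(abs(a - lam1 * b) for a, b in zip(apply(A, v1), v1))
err2 = max(abs(a - lam2 * b) for a, b in zip(apply(A, v2), v2))
```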

Example (Nonzero nilpotent matrices are never diagonalizable)

If $N \neq 0$ is nilpotent with $N^k = 0$, then the only eigenvalue of $N$ is $0$. If $N$ were diagonalizable, $D$ would be the zero matrix, giving $N = P \cdot 0 \cdot P^{-1} = 0$ and contradicting $N \neq 0$.

Example: $N = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$ has $N^3 = 0$, eigenvalue $0$ with $m_a = 3$ but $m_g = 1$.
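Both nilpotency claims for this $N$ check out by direct multiplication (self-contained Python; the `matmul` helper is defined inline):

```python
def matmul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

N = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
Z = [[0] * 3 for _ in range(3)]

N2 = matmul(N, N)
N3 = matmul(N2, N)
# N is nilpotent of index 3: N^2 is nonzero but N^3 vanishes.
```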


Sufficient conditions for diagonalizability

Theorem (Distinct eigenvalues imply diagonalizability)

If $A \in M_{n \times n}(F)$ has $n$ distinct eigenvalues in $F$, then $A$ is diagonalizable.

Example (Distinct eigenvalues guarantee diagonalizability)

$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$. Characteristic polynomial: $p(\lambda) = \lambda^2 - 5\lambda - 2$, with discriminant $25 + 8 = 33 > 0$.

Two distinct real eigenvalues $\frac{5 \pm \sqrt{33}}{2} \approx 5.37, -0.37$. Since they are distinct, $A$ is diagonalizable over $\mathbb{R}$.

Theorem (Symmetric matrices are diagonalizable)

Every real symmetric matrix ($A^T = A$) is diagonalizable over $\mathbb{R}$. Moreover, it is orthogonally diagonalizable: there exists an orthogonal matrix $Q$ ($Q^T Q = I$) such that $Q^T A Q = D$.

Example (Orthogonal diagonalization of a symmetric matrix)

$A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$.

Eigenvalues: $3, 1$. Eigenvectors: $\frac{1}{\sqrt{2}}(1, 1)$ and $\frac{1}{\sqrt{2}}(1, -1)$.

$Q = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, $Q^TAQ = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}$.
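Both the orthogonality of $Q$ and the diagonalization can be checked numerically (pure standard-library Python with inline helpers; a small tolerance handles the irrational entries):

```python
from math import sqrt

def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

s = 1 / sqrt(2)
A = [[2, 1], [1, 2]]
Q = [[s, s], [s, -s]]

QtQ = matmul(transpose(Q), Q)              # should be the identity
QtAQ = matmul(transpose(Q), matmul(A, Q))  # should be diag(3, 1)
```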


Powers and functions of diagonalizable matrices

Theorem (Powers via diagonalization)

If $A = PDP^{-1}$ where $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$, then

$$A^k = PD^kP^{-1} = P \operatorname{diag}(\lambda_1^k, \ldots, \lambda_n^k) P^{-1}.$$

Example (Computing $A^{10}$)

$A = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}$, eigenvalues $1, 2$, $P = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, $P^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$.

$A^{10} = P \begin{pmatrix} 1 & 0 \\ 0 & 1024 \end{pmatrix} P^{-1} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1024 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1023 \\ 0 & 1024 \end{pmatrix}$.
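The closed-form answer can be compared against brute-force repeated multiplication (self-contained Python with an inline `matmul` helper; exact integer arithmetic):

```python
def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1], [0, 2]]
M = [[1, 0], [0, 1]]  # identity
for _ in range(10):
    M = matmul(M, A)  # after the loop, M = A^10
```

For a single small power the loop is fine; the point of diagonalization is that $A^k$ costs the same for any $k$ once $P$ and $D$ are known.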

Example (Fibonacci numbers via diagonalization)

The Fibonacci recurrence $F_{n+2} = F_{n+1} + F_n$ is encoded by $\begin{pmatrix} F_{n+1} \\ F_n \end{pmatrix} = A^n \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, where $A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$.

Eigenvalues: $\phi = \frac{1+\sqrt{5}}{2}$ and $\hat{\phi} = \frac{1-\sqrt{5}}{2}$.

By diagonalization, $A^n = PD^nP^{-1}$, leading to Binet's formula:

$$F_n = \frac{\phi^n - \hat{\phi}^n}{\sqrt{5}}.$$
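Binet's formula can be checked against the recurrence itself (standard-library Python; rounding absorbs the tiny $\hat{\phi}^n$ term, since $|\hat{\phi}| < 1$, along with floating-point error):

```python
from math import sqrt

phi = (1 + sqrt(5)) / 2
phi_hat = (1 - sqrt(5)) / 2

def fib_binet(n):
    """F_n via Binet's formula, rounded to the nearest integer."""
    return round((phi ** n - phi_hat ** n) / sqrt(5))

# Iterative Fibonacci for comparison: F_0 = 0, F_1 = 1.
fibs = [0, 1]
for _ in range(30):
    fibs.append(fibs[-1] + fibs[-2])
```

The rounding trick works for moderate $n$; for very large $n$ the floating-point powers lose precision and exact integer matrix powers are preferable.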

Example (Matrix exponential via diagonalization)

If $A = PDP^{-1}$, then $e^A = Pe^DP^{-1} = P\operatorname{diag}(e^{\lambda_1}, \ldots, e^{\lambda_n})P^{-1}$.

For $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$: eigenvalues $i, -i$, so $e^A = P\begin{pmatrix} e^i & 0 \\ 0 & e^{-i} \end{pmatrix}P^{-1}$. Converting back to real coordinates gives $e^A = \begin{pmatrix} \cos 1 & \sin 1 \\ -\sin 1 & \cos 1 \end{pmatrix}$, a rotation by $1$ radian.
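An independent way to confirm the rotation is the defining Taylor series $e^A = \sum_k A^k/k!$, truncated; at this scale about two dozen terms are ample. A pure-Python sketch with an inline `matmul` helper:

```python
from math import cos, sin

def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[0.0, 1.0], [-1.0, 0.0]]
E = [[0.0, 0.0], [0.0, 0.0]]     # running sum for e^A
term = [[1.0, 0.0], [0.0, 1.0]]  # current term, starting at A^0 / 0!

for k in range(1, 25):
    E = [[E[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    term = [[x / k for x in row] for row in matmul(term, A)]  # next: A^k / k!

expected = [[cos(1), sin(1)], [-sin(1), cos(1)]]
```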


Simultaneous diagonalization

Theorem (Simultaneous diagonalization)

Two diagonalizable matrices $A, B \in M_{n \times n}(F)$ are simultaneously diagonalizable (i.e., there exists $P$ such that both $P^{-1}AP$ and $P^{-1}BP$ are diagonal) if and only if $AB = BA$.

Example (Commuting matrices are simultaneously diagonalizable)

$A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$, $B = \begin{pmatrix} 3 & 0 \\ 0 & 5 \end{pmatrix}$. Since both are diagonal, $AB = BA$, and they are already simultaneously diagonalized (with $P = I$).

A nontrivial example: $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, $B = \begin{pmatrix} 4 & -1 \\ -1 & 4 \end{pmatrix}$. Check: $AB = \begin{pmatrix} 7 & 2 \\ 2 & 7 \end{pmatrix} = BA$. Common eigenvectors: $(1,1)$ and $(1,-1)$. In this basis, $A = \operatorname{diag}(3, 1)$ and $B = \operatorname{diag}(3, 5)$.
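The nontrivial pair can be checked end to end (self-contained Python; inline `matmul` helper, exact arithmetic via `fractions`, with $P^{-1}$ written out by hand):

```python
from fractions import Fraction as Fr

def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[Fr(2), Fr(1)], [Fr(1), Fr(2)]]
B = [[Fr(4), Fr(-1)], [Fr(-1), Fr(4)]]
P = [[Fr(1), Fr(1)], [Fr(1), Fr(-1)]]            # columns (1,1) and (1,-1)
P_inv = [[Fr(1, 2), Fr(1, 2)], [Fr(1, 2), Fr(-1, 2)]]

commute = matmul(A, B) == matmul(B, A)
DA = matmul(P_inv, matmul(A, P))  # expect diag(3, 1)
DB = matmul(P_inv, matmul(B, P))  # expect diag(3, 5)
```

The same change of basis diagonalizes both matrices precisely because they share a full set of common eigenvectors.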

Example (Non-commuting diagonalizable matrices)

$A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$, $B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. Then $AB = \begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix} \neq \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix} = BA$.

Both are diagonalizable (distinct eigenvalues), but they cannot be simultaneously diagonalized. Their eigenvector bases differ: $A$ uses $e_1, e_2$ while $B$ uses $(1,1), (1,-1)$.


Applications of diagonalization

Example (Systems of differential equations)

Consider the system $\mathbf{x}'(t) = A\mathbf{x}(t)$ with $A = \begin{pmatrix} -3 & 1 \\ 1 & -3 \end{pmatrix}$.

Eigenvalues: $-2, -4$. Eigenvectors: $(1,1), (1,-1)$.

General solution: $\mathbf{x}(t) = c_1 e^{-2t}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + c_2 e^{-4t}\begin{pmatrix} 1 \\ -1 \end{pmatrix}$.

Both eigenvalues are negative, so all solutions decay to $0$ as $t \to \infty$ (the origin is a stable node).
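Each term of the general solution satisfies $\mathbf{x}' = A\mathbf{x}$ because $\frac{d}{dt}\, e^{\lambda t} v = \lambda e^{\lambda t} v = A(e^{\lambda t} v)$ whenever $Av = \lambda v$, so the only facts to verify are the eigenpairs themselves. A minimal check (inline `apply` helper):

```python
A = [[-3, 1], [1, -3]]

def apply(X, v):
    """Apply a 2x2 matrix to a vector."""
    return [X[0][0] * v[0] + X[0][1] * v[1],
            X[1][0] * v[0] + X[1][1] * v[1]]

pair1 = apply(A, [1, 1]) == [-2 * 1, -2 * 1]      # eigenvalue -2
pair2 = apply(A, [1, -1]) == [-4 * 1, -4 * (-1)]  # eigenvalue -4
```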

Example (Steady states of Markov chains)

$A = \begin{pmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{pmatrix}$ (transition matrix, columns summing to $1$). Eigenvalues: $\lambda_1 = 1$, $\lambda_2 = 0.5$.

Eigenvectors: $v_1 = (3, 2)$ for $\lambda_1 = 1$, $v_2 = (1, -1)$ for $\lambda_2 = 0.5$.

As $n \to \infty$, $A^n$ converges to the projection onto $E_1$. The steady-state distribution is $\pi = \frac{1}{5}(3, 2)$: $60\%$ in state $1$, $40\%$ in state $2$.
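Convergence to the steady state can be observed by just powering the matrix (floating-point Python sketch with an inline `matmul` helper):

```python
def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[0.8, 0.3], [0.2, 0.7]]
M = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(50):
    M = matmul(M, A)  # M = A^50

# Every column of A^n approaches the steady state (0.6, 0.4),
# since the lambda = 0.5 component shrinks like 0.5^n.
col0 = (M[0][0], M[1][0])
col1 = (M[0][1], M[1][1])
```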


Summary

Remark (Diagonalization as the ideal decomposition)

Diagonalization decomposes $V$ into a direct sum of eigenspaces: $V = E_{\lambda_1} \oplus E_{\lambda_2} \oplus \cdots \oplus E_{\lambda_k}$. In this decomposition:

  • $T$ acts on each $E_{\lambda_i}$ by scalar multiplication by $\lambda_i$.
  • Powers $T^n$ act by $\lambda_i^n$ on each eigenspace.
  • Functions $f(T)$ act by $f(\lambda_i)$ on each eigenspace.
  • Matrix equations reduce to scalar equations.

When diagonalization fails ($m_g < m_a$ for some eigenvalue), the next best option is the Jordan normal form, which introduces Jordan blocks to handle the deficiency.