
The Ergodic Theorem for Markov Chains

The ergodic theorem is the fundamental result connecting time averages and stationary distributions for Markov chains. It generalizes the strong law of large numbers to dependent processes, stating that the long-run proportion of time spent in each state converges to the stationary probability of that state.


Statement of the theorem

Theorem 1.1 (Ergodic theorem: strong law for Markov chains)

Let $(X_n)_{n \geq 0}$ be an irreducible, positive recurrent Markov chain on a countable state space $S$ with stationary distribution $\pi$. Then for any initial distribution and any state $j \in S$:

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \mathbf{1}_{X_k = j} = \pi_j \quad \text{almost surely}.$$

Moreover, for any bounded function $f: S \to \mathbb{R}$:

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(X_k) = \sum_{j \in S} \pi_j f(j) = \mathbb{E}_\pi[f(X_0)] \quad \text{almost surely}.$$

This theorem says that time averages equal ensemble averages: the proportion of time the chain spends in state $j$ equals the stationary probability $\pi_j$, and more generally, time averages of any observable $f$ converge to its expectation under the stationary distribution.

Remark (Comparison with the SLLN)

The classical strong law of large numbers (SLLN) states that for i.i.d. random variables $Y_1, Y_2, \ldots$ with mean $\mu$:

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^n Y_k = \mu \quad \text{a.s.}$$

The ergodic theorem extends this to Markov chains, where the $X_k$ are not independent. The role of $\mu$ is played by $\mathbb{E}_\pi[f(X_0)]$.


Interpretation and consequences

Example (Two-state chain)

For the two-state chain with transition probabilities $p_{01} = \alpha$ and $p_{10} = \beta$, the stationary distribution is $\pi = (\beta/(\alpha+\beta), \alpha/(\alpha+\beta))$, and the ergodic theorem says:

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \mathbf{1}_{X_k = 0} = \frac{\beta}{\alpha + \beta}.$$

Starting from any initial state, the long-run proportion of time spent in state 0 is $\beta/(\alpha+\beta)$.
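This convergence is easy to check numerically. The sketch below is a minimal Python simulation (the function name and the parameter values are illustrative): it runs the two-state chain with $p_{01} = \alpha$, $p_{10} = \beta$ and compares the empirical fraction of time spent in state 0 against $\beta/(\alpha+\beta)$.

```python
import random

def simulate_two_state(alpha, beta, n_steps, seed=0):
    """Simulate the two-state chain with p01 = alpha, p10 = beta.

    Returns the fraction of the first n_steps spent in state 0."""
    rng = random.Random(seed)
    state = 0
    visits_to_0 = 0
    for _ in range(n_steps):
        visits_to_0 += (state == 0)
        if state == 0:
            if rng.random() < alpha:
                state = 1
        else:
            if rng.random() < beta:
                state = 0
    return visits_to_0 / n_steps

alpha, beta = 0.3, 0.1          # illustrative parameters
freq = simulate_two_state(alpha, beta, n_steps=200_000)
print(freq, beta / (alpha + beta))  # empirical frequency vs. pi_0 = 0.25
```

With a long enough run the two printed numbers agree to a couple of decimal places, regardless of the starting state.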

Example (Connection to renewal theory)

Fix a state $i$ and let $T_0 = 0$ and $T_1, T_2, \ldots$ be the successive return times to $i$, starting from $X_0 = i$. By the strong Markov property, the increments $\tau_k = T_k - T_{k-1}$ are i.i.d. with mean $\mu_i = \mathbb{E}[\tau_1 \mid X_0 = i]$. By the renewal theorem, the number of visits to $i$ up to time $n$ is asymptotically $n/\mu_i$, so

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \mathbf{1}_{X_k = i} = \frac{1}{\mu_i} = \pi_i.$$

This is a special case of the ergodic theorem.
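The identity $\pi_i = 1/\mu_i$ can also be checked by simulation. The sketch below (illustrative Python, reusing the two-state parameterization $p_{01} = \alpha$, $p_{10} = \beta$) estimates the mean return time to state 0 and compares it with $1/\pi_0 = (\alpha+\beta)/\beta$.

```python
import random

def mean_return_time(alpha, beta, n_returns, seed=1):
    """Estimate the mean return time to state 0 for the two-state
    chain with p01 = alpha, p10 = beta (illustrative parameters)."""
    rng = random.Random(seed)
    state = 0       # start at state 0, so each cycle is a return time
    steps = 0       # steps since the last visit to 0
    total = 0
    returns = 0
    while returns < n_returns:
        # one transition of the chain
        if state == 0:
            state = 1 if rng.random() < alpha else 0
        else:
            state = 0 if rng.random() < beta else 1
        steps += 1
        if state == 0:
            returns += 1
            total += steps
            steps = 0
    return total / n_returns

alpha, beta = 0.3, 0.1
mu0 = mean_return_time(alpha, beta, n_returns=50_000)
print(mu0, (alpha + beta) / beta)  # both close to 4
```

For these parameters $\mu_0 = 1 + \alpha/\beta = 4$ exactly, matching $1/\pi_0$.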


Proof sketch

The proof relies on the strong Markov property and renewal theory. Here is an outline:

Step 1: Renewal structure. Fix a state $i$ and decompose the trajectory into i.i.d. excursions between visits to $i$. Let $T_1, T_2, \ldots$ be the return times to $i$, with increments $\tau_k = T_k - T_{k-1}$ i.i.d. of mean $\mu_i < \infty$ (finite by positive recurrence).

Step 2: Apply the renewal theorem. By the strong law of large numbers for the i.i.d. sequence $\{\tau_k\}$:

$$\lim_{m \to \infty} \frac{T_m}{m} = \mu_i \quad \text{a.s.}$$

Let $N(n) = \max\{m : T_m \leq n\}$ be the number of returns to $i$ by time $n$. Then $T_{N(n)} \leq n < T_{N(n)+1}$, so

$$\frac{T_{N(n)}}{N(n)} \leq \frac{n}{N(n)} < \frac{T_{N(n)+1}}{N(n)}.$$

As $n \to \infty$, $N(n) \to \infty$ a.s., and both sides of this sandwich converge to $\mu_i$ by Step 2, so $n/N(n) \to \mu_i$.

Step 3: Counting visits. The number of visits to $i$ up to time $n$ is $N(n) + O(1)$, so

$$\frac{1}{n} \sum_{k=0}^{n-1} \mathbf{1}_{X_k = i} = \frac{N(n)}{n} + O\!\left(\frac{1}{n}\right) \to \frac{1}{\mu_i} = \pi_i \quad \text{a.s.}$$

Step 4: Extend to all states. For any other state $j$, count the visits to $j$ during each excursion from $i$ back to $i$. The expected number of visits to $j$ per excursion is finite (by positive recurrence), and the strong law of large numbers applied to these i.i.d. excursion counts yields the result for all $j$.
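Step 4 can be made concrete for the two-state chain (assuming, as in the earlier example, $p_{01} = \alpha$, $p_{10} = \beta$): over many i.i.d. excursions from state 0 back to state 0, the ratio of the average number of visits to state 1 to the average excursion length should approach $\pi_1 = \alpha/(\alpha+\beta)$. The following sketch is illustrative, not part of the proof.

```python
import random

def excursion_statistics(alpha, beta, n_excursions, seed=2):
    """For the two-state chain (p01 = alpha, p10 = beta), record the
    average excursion length from state 0 back to state 0 and the
    average number of visits to state 1 per excursion."""
    rng = random.Random(seed)
    total_len = 0
    total_visits_1 = 0
    for _ in range(n_excursions):
        state = 0
        length = 0
        while True:
            # one transition of the chain
            if state == 0:
                state = 1 if rng.random() < alpha else 0
            else:
                state = 0 if rng.random() < beta else 1
            length += 1
            if state == 1:
                total_visits_1 += 1
            if state == 0:     # excursion ends on the return to 0
                break
        total_len += length
    return total_len / n_excursions, total_visits_1 / n_excursions

alpha, beta = 0.3, 0.1
mu0, v1 = excursion_statistics(alpha, beta, 50_000)
print(v1 / mu0, alpha / (alpha + beta))  # ratio close to pi_1 = 0.75
```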

Remark (Aperiodicity not required)

Crucially, the ergodic theorem holds for all irreducible, positive recurrent chains, even periodic ones. Convergence of time averages does not require convergence of the distributions $\mathbb{P}(X_n = j)$; the latter does require aperiodicity.


Convergence of distributions

Theorem 1.2 (Convergence to stationarity, aperiodic case)

If the chain is also aperiodic, then for all states $i, j \in S$:

$$\lim_{n \to \infty} p_{ij}^{(n)} = \pi_j.$$

In other words, $\mathbb{P}(X_n = j \mid X_0 = i) \to \pi_j$ as $n \to \infty$, regardless of the starting state $i$.
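One way to see this convergence concretely is to raise a transition matrix to a high power and watch every row approach $\pi$. The sketch below uses a small, hypothetical aperiodic 3-state chain and plain Python lists (no external libraries).

```python
def mat_mul(A, B):
    """Multiply two small square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# A hypothetical aperiodic, irreducible 3-state chain.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]

Pn = P
for _ in range(60):   # compute P^61
    Pn = mat_mul(Pn, P)

for row in Pn:
    print([round(x, 6) for x in row])  # all three rows are (nearly) identical
```

Each row of $P^n$ is the distribution of $X_n$ for one starting state; all rows collapsing to a common vector is exactly the statement $p_{ij}^{(n)} \to \pi_j$ for every $i$.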

For periodic chains, $p_{ij}^{(n)}$ need not converge, but the Cesàro average always does:

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} p_{ij}^{(k)} = \pi_j.$$

This is the Cesàro ergodic theorem, a weaker form of convergence.

Example (Periodic chain without pointwise convergence)

Consider the deterministic cyclic chain on $\{0, 1\}$ with $p_{01} = p_{10} = 1$. The stationary distribution is $\pi = (1/2, 1/2)$. However,

$$p_{00}^{(n)} = \begin{cases} 1 & \text{if } n \text{ is even}, \\ 0 & \text{if } n \text{ is odd}. \end{cases}$$

The sequence $\{p_{00}^{(n)}\}$ oscillates and does not converge. But the Cesàro average satisfies

$$\frac{1}{n} \sum_{k=0}^{n-1} p_{00}^{(k)} \to \frac{1}{2} = \pi_0.$$
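This limit is elementary to verify: the partial averages of the sequence $1, 0, 1, 0, \ldots$ are $\lceil n/2 \rceil / n$, which tends to $1/2$. A short check:

```python
def cesaro_average(n):
    """Average of p00^(0), ..., p00^(n-1) for the deterministic 2-cycle,
    where p00^(k) is 1 for even k and 0 for odd k."""
    return sum(1 if k % 2 == 0 else 0 for k in range(n)) / n

print(cesaro_average(10))    # 0.5
print(cesaro_average(11))    # 6/11 = 0.5454...
print(cesaro_average(1000))  # 0.5
```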


Applications

Example (Markov chain Monte Carlo)

In MCMC algorithms (e.g., Metropolis-Hastings, Gibbs sampling), we construct a Markov chain $(X_n)$ whose stationary distribution is a target distribution $\pi$ (often a posterior distribution). The ergodic theorem guarantees that

$$\frac{1}{n} \sum_{k=0}^{n-1} f(X_k) \to \mathbb{E}_\pi[f(X_0)]$$

for any $\pi$-integrable function $f$. This allows us to approximate expectations under $\pi$ by simulating the chain.
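A minimal sketch of this idea, assuming a toy unnormalized target on $\{0, \ldots, 4\}$ and a symmetric $\pm 1$ random-walk proposal (all names and parameters are illustrative, not a production sampler):

```python
import random

def metropolis_mean(weights, n_steps, seed=3):
    """Metropolis sampler for an unnormalized discrete target on
    {0, ..., len(weights)-1}, with a symmetric +/-1 random-walk proposal.
    Returns the time average of the state: the ergodic estimate of E_pi[X]."""
    rng = random.Random(seed)
    m = len(weights)
    x = 0
    total = 0
    for _ in range(n_steps):
        total += x
        y = x + rng.choice((-1, 1))
        # proposals outside the state space are rejected (chain stays put),
        # which keeps the proposal symmetric and pi stationary
        if 0 <= y < m and rng.random() < min(1.0, weights[y] / weights[x]):
            x = y
    return total / n_steps

weights = [1, 2, 3, 2, 1]   # hypothetical target: pi_i = weights[i] / 9
est = metropolis_mean(weights, 200_000)
print(est)  # close to E_pi[X] = 2 (by symmetry of the weights)
```

Note that the ergodic theorem, not independence of the samples, is what justifies averaging along a single correlated trajectory.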

Example (Card shuffling)

Consider shuffling an $n$-card deck by repeatedly applying a random transposition. The state space is the symmetric group $S_n$ of permutations, and the chain converges to the uniform distribution on $S_n$. The ergodic theorem implies that the proportion of time the deck spends in any given configuration converges to $1/n!$.

Remark (Central limit theorem for Markov chains)

Beyond the law of large numbers, there is also a central limit theorem for Markov chains: under appropriate conditions,

$$\frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} \left(f(X_k) - \mathbb{E}_\pi[f]\right) \xrightarrow{d} \mathcal{N}(0, \sigma^2),$$

where $\sigma^2$ depends on the variance of $f$ and on the mixing properties of the chain. This is essential for quantifying the error in MCMC estimates.
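In practice $\sigma^2$ is estimated from the simulated trajectory itself, for example by the batch-means method. The sketch below is illustrative (the two-state chain and its parameters are assumptions carried over from the earlier examples); for that chain $\sigma^2$ is available in closed form as $\pi_0 \pi_1 (1+\lambda)/(1-\lambda)$ with $\lambda = 1 - \alpha - \beta$, so the estimate can be checked.

```python
import random

def chain_indicator(alpha, beta, n_steps, seed=4):
    """Trajectory of 1_{X_k = 0} for the two-state chain (p01=alpha, p10=beta)."""
    rng = random.Random(seed)
    state = 0
    out = []
    for _ in range(n_steps):
        out.append(1.0 if state == 0 else 0.0)
        if state == 0:
            state = 1 if rng.random() < alpha else 0
        else:
            state = 0 if rng.random() < beta else 1
    return out

def batch_means_variance(xs, n_batches):
    """Estimate the asymptotic variance sigma^2 via non-overlapping batch means."""
    b = len(xs) // n_batches
    means = [sum(xs[i * b:(i + 1) * b]) / b for i in range(n_batches)]
    grand = sum(means) / n_batches
    return b * sum((m - grand) ** 2 for m in means) / (n_batches - 1)

xs = chain_indicator(0.3, 0.1, 200_000)
sigma2 = batch_means_variance(xs, n_batches=200)
# exact value here: 0.25 * 0.75 * (1 + 0.6) / (1 - 0.6) = 0.75
print(sigma2)
```

The batch length must be much longer than the chain's mixing time for the batch means to be approximately independent; that is the key design choice in this estimator.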


Summary

The ergodic theorem is the cornerstone of Markov chain theory:

  • Time averages = ensemble averages: Long-run frequencies converge to stationary probabilities.
  • Applies to all positive recurrent chains: Aperiodicity is not required.
  • Foundation for MCMC: Justifies using Markov chains to approximate expectations.
  • Extends the SLLN: From i.i.d. to dependent sequences with a renewal structure.

The theorem shows that stationary distributions are not just algebraic objects ($\pi P = \pi$) but have a profound probabilistic interpretation as long-run frequencies.