The Ergodic Theorem for Markov Chains
The ergodic theorem is the fundamental result connecting time averages and stationary distributions for Markov chains. It generalizes the strong law of large numbers to dependent processes, stating that the long-run proportion of time spent in each state converges to the stationary probability of that state.
Statement of the theorem
Let $(X_n)_{n \ge 0}$ be an irreducible, positive recurrent Markov chain on a countable state space $S$ with stationary distribution $\pi$. Then for any initial distribution and any state $i \in S$:
$$\frac{1}{n}\sum_{k=0}^{n-1} \mathbf{1}\{X_k = i\} \longrightarrow \pi_i \quad \text{a.s. as } n \to \infty.$$
Moreover, for any bounded function $f : S \to \mathbb{R}$:
$$\frac{1}{n}\sum_{k=0}^{n-1} f(X_k) \longrightarrow \sum_{i \in S} \pi_i f(i) = \mathbb{E}_\pi[f] \quad \text{a.s.}$$
This theorem says that time averages equal ensemble averages: the proportion of time the chain spends in state $i$ equals the stationary probability $\pi_i$, and more generally, time averages of any bounded observable $f$ converge to its expectation under the stationary distribution.
The classical strong law of large numbers (SLLN) states that for i.i.d. random variables $Y_1, Y_2, \dots$ with mean $\mu$:
$$\frac{1}{n}\sum_{k=1}^{n} Y_k \longrightarrow \mu \quad \text{a.s.}$$
The ergodic theorem extends this to Markov chains, where the $X_k$ are not independent. The role of $\mu$ is played by $\mathbb{E}_\pi[f]$.
Interpretation and consequences
For the two-state chain with stationary distribution $\pi = (\pi_0, \pi_1)$, the ergodic theorem says:
$$\frac{1}{n}\sum_{k=0}^{n-1} \mathbf{1}\{X_k = 0\} \longrightarrow \pi_0 \quad \text{a.s.}$$
Starting from any initial state, the long-run proportion of time in state 0 is $\pi_0$.
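This is easy to check by simulation. The sketch below uses an illustrative parametrization (an assumption, not from the text): transition probabilities $P(0,1) = p$ and $P(1,0) = q$, for which $\pi_0 = q/(p+q)$. Starting from state 1, the occupation fraction of state 0 still converges to $\pi_0$.

```python
import random

# Minimal sketch: simulate a two-state chain with P(0->1)=p, P(1->0)=q
# (illustrative values) and compare the fraction of time spent in state 0
# with the stationary probability pi_0 = q/(p+q).
random.seed(0)
p, q = 0.3, 0.2               # assumed, illustrative transition probabilities
pi0 = q / (p + q)             # stationary probability of state 0 (= 0.4 here)

n = 200_000
state, visits0 = 1, 0         # start in state 1: the initial state is irrelevant
for _ in range(n):
    if state == 0:
        visits0 += 1          # count this step as time spent in state 0
        if random.random() < p:
            state = 1
    else:
        if random.random() < q:
            state = 0

fraction0 = visits0 / n
print(fraction0, pi0)         # the two numbers should be close
```

With $n = 200{,}000$ steps the empirical fraction typically agrees with $\pi_0$ to two decimal places.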
Fix a state $i$ and let $T_1, T_2, \dots$ be the successive return times to $i$. The increments $T_{k+1} - T_k$ are i.i.d. with mean $m_i = \mathbb{E}_i[T_i] = 1/\pi_i$. By the renewal theorem, the number of visits to $i$ up to time $n$ is asymptotically $n/m_i$, so
$$\frac{1}{n}\sum_{k=0}^{n-1} \mathbf{1}\{X_k = i\} \longrightarrow \frac{1}{m_i} = \pi_i \quad \text{a.s.}$$
This is a special case of the ergodic theorem.
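The return-time identity $\pi_i = 1/\mathbb{E}_i[T_i]$ can also be checked numerically. The sketch below reuses the assumed two-state parametrization $P(0,1) = p$, $P(1,0) = q$, under which the mean return time to state 0 is $1/\pi_0 = (p+q)/q$.

```python
import random

# Numerical check of pi_i = 1 / E_i[T_i] on an illustrative two-state chain:
# the mean return time to state 0 should approach (p+q)/q = 1/pi_0.
random.seed(1)
p, q = 0.3, 0.2               # assumed, illustrative parameters

def step(state):
    """One transition of the two-state chain."""
    if state == 0:
        return 1 if random.random() < p else 0
    return 0 if random.random() < q else 1

# Record the lengths of successive excursions between visits to state 0.
returns, state, clock = [], 0, 0
for _ in range(200_000):
    state = step(state)
    clock += 1
    if state == 0:
        returns.append(clock)  # time elapsed since the last visit to 0
        clock = 0

mean_return = sum(returns) / len(returns)
print(mean_return, (p + q) / q)   # both near 2.5 for these parameters
```

Each recorded excursion length is an independent copy of the return time $T_0$, so their average converges to $m_0 = 1/\pi_0$ by the ordinary SLLN.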
Proof sketch
The proof relies on the strong Markov property and renewal theory. Here is an outline:
Step 1: Renewal structure. Fix a state $i$ and decompose the trajectory into i.i.d. excursions between visits to $i$. Let $T_1 < T_2 < \dots$ be the return times to $i$, with increments $\tau_k = T_{k+1} - T_k$ i.i.d. with mean $m_i = \mathbb{E}_i[T_i]$.
Step 2: Apply the renewal theorem. By the strong law of large numbers for the i.i.d. sequence $(\tau_k)$:
$$\frac{T_k}{k} \longrightarrow m_i \quad \text{a.s.}$$
Let $N_n$ be the number of returns to $i$ by time $n$. Then $T_{N_n} \le n < T_{N_n+1}$, so
$$\frac{T_{N_n}}{N_n} \le \frac{n}{N_n} < \frac{T_{N_n+1}}{N_n+1} \cdot \frac{N_n+1}{N_n}.$$
As $n \to \infty$, $N_n \to \infty$ a.s., so $n/N_n \to m_i$ a.s., equivalently $N_n/n \to 1/m_i$.
Step 3: Counting visits. The number of visits to $i$ up to time $n$ is $N_n$ (up to an initial boundary term), so
$$\frac{1}{n}\sum_{k=0}^{n-1} \mathbf{1}\{X_k = i\} = \frac{N_n}{n} + o(1) \longrightarrow \frac{1}{m_i} = \pi_i \quad \text{a.s.}$$
Step 4: Extend to all states. For any other state $j$, count the visits to $j$ during each excursion from $i$ back to $i$. The expected number of visits to $j$ per excursion is finite (by positive recurrence), and the law of large numbers for renewals yields the result for all $j$.
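Step 4 can be illustrated numerically: the expected number of visits to $j$ per excursion from $i$ equals $\pi_j/\pi_i$. In the assumed two-state parametrization ($P(0,1)=p$, $P(1,0)=q$, both illustrative), excursions from state 0 should contain on average $\pi_1/\pi_0 = p/q$ visits to state 1.

```python
import random

# Sketch of Step 4 on an illustrative two-state chain: over a long run, the
# number of visits to state 1 per completed excursion from state 0 back to
# state 0 approaches pi_1 / pi_0 = p/q.
random.seed(2)
p, q = 0.3, 0.2               # assumed, illustrative parameters

state, returns_to_0, time_in_1 = 0, 0, 0
for _ in range(200_000):
    if state == 0:
        if random.random() < p:
            state = 1
        else:
            returns_to_0 += 1  # trivial excursion 0 -> 0 completed
    else:
        time_in_1 += 1         # one visit to state 1 during this excursion
        if random.random() < q:
            state = 0
            returns_to_0 += 1  # excursion through state 1 completed

ratio = time_in_1 / returns_to_0
print(ratio, p / q)            # both near 1.5 for these parameters
```

Averaging the per-excursion visit counts and dividing by the mean excursion length $m_0$ is exactly how the proof transfers the result from state $i$ to every other state $j$.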
Crucially, the ergodic theorem holds for all irreducible, positive recurrent chains, even periodic ones. Convergence of time averages does not require convergence of the distribution — the latter requires aperiodicity.
Convergence of distributions
If the chain is also aperiodic, then for all states $i, j$:
$$P^n(i,j) = \mathbb{P}(X_n = j \mid X_0 = i) \longrightarrow \pi_j \quad \text{as } n \to \infty.$$
In other words, $\mathbb{P}(X_n = j) \to \pi_j$ as $n \to \infty$, regardless of the starting state $i$.
For periodic chains, $P^n(i,j)$ does not converge, but the Cesàro average does:
$$\frac{1}{n}\sum_{k=0}^{n-1} P^k(i,j) \longrightarrow \pi_j.$$
This is the Cesàro ergodic theorem, a weaker form of convergence.
Consider the deterministic cyclic chain on $\{0, 1, 2\}$ with $P(i, (i+1) \bmod 3) = 1$. The stationary distribution is $\pi = (1/3, 1/3, 1/3)$. However,
$$P^n(0,0) = \begin{cases} 1 & \text{if } n \equiv 0 \pmod 3, \\ 0 & \text{otherwise.} \end{cases}$$
The sequence $P^n(0,0)$ oscillates ($1, 0, 0, 1, 0, 0, \dots$) and does not converge. But the Cesàro average
$$\frac{1}{n}\sum_{k=0}^{n-1} P^k(0,0) \longrightarrow \frac{1}{3}.$$
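The oscillation and the Cesàro convergence are both visible from the matrix powers directly. This small sketch computes $P^k(0,0)$ for the cyclic chain by repeated matrix multiplication (plain nested lists, no external libraries):

```python
# Direct check on the cyclic chain: P^n(0,0) cycles through 1, 0, 0, ...
# with no limit, while the Cesaro average of P^k(0,0) tends to pi_0 = 1/3.

def matmul(A, B):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]          # deterministic cycle 0 -> 1 -> 2 -> 0

Pk = [[1 if i == j else 0 for j in range(3)] for i in range(3)]  # P^0 = I
diag_entries, cesaro_sum = [], 0.0
n = 300
for k in range(n):
    diag_entries.append(Pk[0][0])   # record P^k(0,0)
    cesaro_sum += Pk[0][0]
    Pk = matmul(Pk, P)

print(diag_entries[:6])       # [1, 0, 0, 1, 0, 0]: no convergence
print(cesaro_sum / n)         # exactly 1/3, since 300 is a multiple of 3
```

Since the entries of $P$ are 0 and 1, the computation is exact: the Cesàro average over any multiple of 3 steps equals $1/3$ on the nose.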
Applications
In MCMC algorithms (e.g., Metropolis-Hastings, Gibbs sampling), we construct a Markov chain whose stationary distribution is a target distribution $\pi$ (often a posterior distribution). The ergodic theorem guarantees that
$$\frac{1}{n}\sum_{k=1}^{n} f(X_k) \longrightarrow \mathbb{E}_\pi[f] \quad \text{a.s.}$$
for any $\pi$-integrable function $f$. This allows us to approximate expectations under $\pi$ by simulating the chain.
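A minimal Metropolis-Hastings sketch makes this concrete. The target below is an assumed toy distribution on $\{0,\dots,4\}$ (unnormalized weights chosen for illustration); the proposal is a symmetric random walk on neighbors, and the time average of $f(i) = i$ is compared with the exact expectation.

```python
import random

# Hedged sketch of Metropolis-Hastings on the state space {0,...,4}.
# The target pi(i) is proportional to weights[i]; weights are an assumption
# for illustration, not taken from the text.
random.seed(3)
weights = [1.0, 2.0, 3.0, 2.0, 1.0]      # unnormalized target (illustrative)
Z = sum(weights)                          # normalizing constant

def mh_step(i):
    """Propose a uniform neighbor and accept with prob min(1, pi(j)/pi(i))."""
    j = i + random.choice([-1, 1])
    if j < 0 or j > 4:
        return i                          # proposal off the space: reject
    if random.random() < min(1.0, weights[j] / weights[i]):
        return j
    return i

n, state, f_sum = 200_000, 0, 0.0
for _ in range(n):
    state = mh_step(state)
    f_sum += state                        # observable f(i) = i

estimate = f_sum / n
exact = sum(i * w for i, w in enumerate(weights)) / Z
print(estimate, exact)                    # both close to 2.0
```

Because the proposal is symmetric, the acceptance ratio only needs the unnormalized weights, which is exactly why MCMC is useful when $Z$ is intractable; the ergodic theorem is what licenses replacing $\mathbb{E}_\pi[f]$ by the time average.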
Consider shuffling a deck of cards by repeatedly performing a random transposition. The state space is the symmetric group $S_{52}$ (permutations of the 52 cards), and the chain converges to the uniform distribution on $S_{52}$. The ergodic theorem implies that the proportion of time the deck spends in any given configuration converges to $1/52!$.
Beyond the law of large numbers, there is also a central limit theorem for Markov chains: under appropriate conditions,
$$\sqrt{n}\left(\frac{1}{n}\sum_{k=1}^{n} f(X_k) - \mathbb{E}_\pi[f]\right) \Longrightarrow \mathcal{N}(0, \sigma_f^2),$$
where $\sigma_f^2$ depends on the variance of $f$ and on the mixing properties of the chain. This is essential for quantifying the error in MCMC estimates.
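In practice $\sigma_f^2$ is unknown and must be estimated from the run itself. One standard approach (sketched here on the assumed two-state chain with illustrative parameters $p = 0.3$, $q = 0.2$ and $f(i) = i$) is the batch-means method: split the trajectory into batches and use the spread of batch averages as a rough standard error.

```python
import random, math

# Hedged sketch of the batch-means method for MCMC error bars, applied to an
# illustrative two-state chain with f(i) = i, so E_pi[f] = pi_1.
random.seed(4)
p, q = 0.3, 0.2               # assumed, illustrative parameters
n, state, samples = 200_000, 0, []
for _ in range(n):
    if state == 0:
        state = 1 if random.random() < p else 0
    else:
        state = 0 if random.random() < q else 1
    samples.append(state)     # f(X_k) = X_k

num_batches = 100
batch = n // num_batches
means = [sum(samples[b*batch:(b+1)*batch]) / batch for b in range(num_batches)]
overall = sum(means) / num_batches
var_batch = sum((m - overall) ** 2 for m in means) / (num_batches - 1)
stderr = math.sqrt(var_batch / num_batches)
print(overall, stderr)        # overall near pi_1 = 0.6, stderr small
```

Because batch averages over long batches are approximately independent and Gaussian (by the Markov chain CLT), their empirical variance captures the autocorrelation that a naive i.i.d. standard error would miss.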
Summary
The ergodic theorem is the cornerstone of Markov chain theory:
- Time averages = ensemble averages: Long-run frequencies converge to stationary probabilities.
- Applies to all positive recurrent chains: Aperiodicity is not required.
- Foundation for MCMC: Justifies using Markov chains to approximate expectations.
- Extends the SLLN: From i.i.d. to dependent sequences with a renewal structure.
The theorem shows that stationary distributions are not just algebraic objects (solutions of $\pi = \pi P$) but have a profound probabilistic interpretation as long-run frequencies.