The Ergodic Theorem for Markov Chains
The ergodic theorem is the fundamental result connecting time averages and stationary distributions for Markov chains. It generalizes the strong law of large numbers to dependent processes, stating that the long-run proportion of time spent in each state converges to the stationary probability of that state.
Statement of the theorem
Let $(X_n)_{n \ge 0}$ be an irreducible, positive recurrent Markov chain on a countable state space $S$ with stationary distribution $\pi$. Then for any initial distribution and any state $i \in S$:
$$\frac{1}{n}\sum_{k=0}^{n-1} \mathbf{1}\{X_k = i\} \longrightarrow \pi_i \quad \text{a.s. as } n \to \infty.$$
Moreover, for any bounded function $f : S \to \mathbb{R}$:
$$\frac{1}{n}\sum_{k=0}^{n-1} f(X_k) \longrightarrow \sum_{i \in S} \pi_i f(i) = \mathbb{E}_\pi[f] \quad \text{a.s.}$$
This theorem says that time averages equal ensemble averages: the proportion of time the chain spends in state $i$ equals the stationary probability $\pi_i$, and more generally, time averages of any bounded observable $f$ converge to its expectation under the stationary distribution.
The classical strong law of large numbers (SLLN) states that for i.i.d. random variables $Y_1, Y_2, \dots$ with mean $\mu$:
$$\frac{1}{n}\sum_{k=1}^{n} Y_k \longrightarrow \mu \quad \text{a.s.}$$
The ergodic theorem extends this to Markov chains, where the $X_k$ are not independent. The role of $\mu$ is played by $\mathbb{E}_\pi[f]$.
Interpretation and consequences
For the two-state chain with stationary distribution $\pi = (\pi_0, \pi_1)$, the ergodic theorem says:
$$\frac{1}{n}\sum_{k=0}^{n-1} \mathbf{1}\{X_k = 0\} \longrightarrow \pi_0 \quad \text{a.s.}$$
Starting from any initial state, the long-run proportion of time in state 0 is $\pi_0$.
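This is easy to check by simulation. The sketch below uses an illustrative parametrization (an assumption, not from the text): transition probabilities $P(0,1) = p$ and $P(1,0) = q$, for which $\pi_0 = q/(p+q)$. Starting from state 1, the occupation fraction of state 0 still converges to $\pi_0$.

```python
import random

# Minimal sketch: simulate a two-state chain with P(0->1)=p, P(1->0)=q
# (illustrative values) and compare the fraction of time spent in state 0
# with the stationary probability pi_0 = q/(p+q).
random.seed(0)
p, q = 0.3, 0.2               # assumed, illustrative transition probabilities
pi0 = q / (p + q)             # stationary probability of state 0 (= 0.4 here)

n = 200_000
state, visits0 = 1, 0         # start in state 1: the initial state is irrelevant
for _ in range(n):
    if state == 0:
        visits0 += 1          # count this step as time spent in state 0
        if random.random() < p:
            state = 1
    else:
        if random.random() < q:
            state = 0

fraction0 = visits0 / n
print(fraction0, pi0)         # the two numbers should be close
```

With $n = 200{,}000$ steps the empirical fraction typically agrees with $\pi_0$ to two decimal places.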
Fix a state $i$ and let $T_1, T_2, \dots$ be the successive return times to $i$. The increments $T_{k+1} - T_k$ are i.i.d. with mean $m_i = \mathbb{E}_i[T_i] = 1/\pi_i$. By the renewal theorem, the number of visits to $i$ up to time $n$ is asymptotically $n/m_i$, so
$$\frac{1}{n}\sum_{k=0}^{n-1} \mathbf{1}\{X_k = i\} \longrightarrow \frac{1}{m_i} = \pi_i \quad \text{a.s.}$$
This is a special case of the ergodic theorem.
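The return-time identity $\pi_i = 1/\mathbb{E}_i[T_i]$ can also be checked numerically. The sketch below reuses the assumed two-state parametrization $P(0,1) = p$, $P(1,0) = q$, under which the mean return time to state 0 is $1/\pi_0 = (p+q)/q$.

```python
import random

# Numerical check of pi_i = 1 / E_i[T_i] on an illustrative two-state chain:
# the mean return time to state 0 should approach (p+q)/q = 1/pi_0.
random.seed(1)
p, q = 0.3, 0.2               # assumed, illustrative parameters

def step(state):
    """One transition of the two-state chain."""
    if state == 0:
        return 1 if random.random() < p else 0
    return 0 if random.random() < q else 1

# Record the lengths of successive excursions between visits to state 0.
returns, state, clock = [], 0, 0
for _ in range(200_000):
    state = step(state)
    clock += 1
    if state == 0:
        returns.append(clock)  # time elapsed since the last visit to 0
        clock = 0

mean_return = sum(returns) / len(returns)
print(mean_return, (p + q) / q)   # both near 2.5 for these parameters
```

Each recorded excursion length is an independent copy of the return time $T_0$, so their average converges to $m_0 = 1/\pi_0$ by the ordinary SLLN.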
Proof sketch
The proof relies on the strong Markov property and renewal theory. Here is an outline:
Step 1: Renewal structure. Fix a state $i$ and decompose the trajectory into i.i.d. excursions between visits to $i$. Let $T_1 < T_2 < \dots$ be the return times to $i$, with increments $\tau_k = T_{k+1} - T_k$ i.i.d. with mean $m_i = \mathbb{E}_i[T_i]$.
Step 2: Apply the renewal theorem. By the strong law of large numbers for the i.i.d. sequence $(\tau_k)$:
$$\frac{T_k}{k} \longrightarrow m_i \quad \text{a.s.}$$
Let $N_n$ be the number of returns to $i$ by time $n$. Then $T_{N_n} \le n < T_{N_n+1}$, so
$$\frac{T_{N_n}}{N_n} \le \frac{n}{N_n} < \frac{T_{N_n+1}}{N_n+1} \cdot \frac{N_n+1}{N_n}.$$
As $n \to \infty$, $N_n \to \infty$ a.s., so $n/N_n \to m_i$ a.s., equivalently $N_n/n \to 1/m_i$.
Step 3: Counting visits. The number of visits to $i$ up to time $n$ is $N_n$ (up to an initial boundary term), so
$$\frac{1}{n}\sum_{k=0}^{n-1} \mathbf{1}\{X_k = i\} = \frac{N_n}{n} + o(1) \longrightarrow \frac{1}{m_i} = \pi_i \quad \text{a.s.}$$
Step 4: Extend to all states. For any other state $j$, count the visits to $j$ during each excursion from $i$ back to $i$. The expected number of visits to $j$ per excursion is finite (by positive recurrence), and the law of large numbers for renewals yields the result for all $j$.
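Step 4 can be illustrated numerically: the expected number of visits to $j$ per excursion from $i$ equals $\pi_j/\pi_i$. In the assumed two-state parametrization ($P(0,1)=p$, $P(1,0)=q$, both illustrative), excursions from state 0 should contain on average $\pi_1/\pi_0 = p/q$ visits to state 1.

```python
import random

# Sketch of Step 4 on an illustrative two-state chain: over a long run, the
# number of visits to state 1 per completed excursion from state 0 back to
# state 0 approaches pi_1 / pi_0 = p/q.
random.seed(2)
p, q = 0.3, 0.2               # assumed, illustrative parameters

state, returns_to_0, time_in_1 = 0, 0, 0
for _ in range(200_000):
    if state == 0:
        if random.random() < p:
            state = 1
        else:
            returns_to_0 += 1  # trivial excursion 0 -> 0 completed
    else:
        time_in_1 += 1         # one visit to state 1 during this excursion
        if random.random() < q:
            state = 0
            returns_to_0 += 1  # excursion through state 1 completed

ratio = time_in_1 / returns_to_0
print(ratio, p / q)            # both near 1.5 for these parameters
```

Averaging the per-excursion visit counts and dividing by the mean excursion length $m_0$ is exactly how the proof transfers the result from state $i$ to every other state $j$.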
Crucially, the ergodic theorem holds for all irreducible, positive recurrent chains, even periodic ones. Convergence of time averages does not require convergence of the distribution — the latter requires aperiodicity.
Convergence of distributions
If the chain is also aperiodic, then for all states $i, j$:
$$P^n(i,j) = \mathbb{P}(X_n = j \mid X_0 = i) \longrightarrow \pi_j \quad \text{as } n \to \infty.$$
In other words, $\mathbb{P}(X_n = j) \to \pi_j$ as $n \to \infty$, regardless of the starting state $i$.
For periodic chains, $P^n(i,j)$ does not converge, but the Cesàro average does:
$$\frac{1}{n}\sum_{k=0}^{n-1} P^k(i,j) \longrightarrow \pi_j.$$
This is the Cesàro ergodic theorem, a weaker form of convergence.
Consider the deterministic cyclic chain on $\{0, 1, 2\}$ with $P(i, (i+1) \bmod 3) = 1$. The stationary distribution is $\pi = (1/3, 1/3, 1/3)$. However,
$$P^n(0,0) = \begin{cases} 1 & \text{if } n \equiv 0 \pmod 3, \\ 0 & \text{otherwise.} \end{cases}$$
The sequence $P^n(0,0)$ oscillates ($1, 0, 0, 1, 0, 0, \dots$) and does not converge. But the Cesàro average
$$\frac{1}{n}\sum_{k=0}^{n-1} P^k(0,0) \longrightarrow \frac{1}{3}.$$
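The oscillation and the Cesàro convergence are both visible from the matrix powers directly. This small sketch computes $P^k(0,0)$ for the cyclic chain by repeated matrix multiplication (plain nested lists, no external libraries):

```python
# Direct check on the cyclic chain: P^n(0,0) cycles through 1, 0, 0, ...
# with no limit, while the Cesaro average of P^k(0,0) tends to pi_0 = 1/3.

def matmul(A, B):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]          # deterministic cycle 0 -> 1 -> 2 -> 0

Pk = [[1 if i == j else 0 for j in range(3)] for i in range(3)]  # P^0 = I
diag_entries, cesaro_sum = [], 0.0
n = 300
for k in range(n):
    diag_entries.append(Pk[0][0])   # record P^k(0,0)
    cesaro_sum += Pk[0][0]
    Pk = matmul(Pk, P)

print(diag_entries[:6])       # [1, 0, 0, 1, 0, 0]: no convergence
print(cesaro_sum / n)         # exactly 1/3, since 300 is a multiple of 3
```

Since the entries of $P$ are 0 and 1, the computation is exact: the Cesàro average over any multiple of 3 steps equals $1/3$ on the nose.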
Applications
In MCMC algorithms (e.g., Metropolis-Hastings, Gibbs sampling), we construct a Markov chain whose stationary distribution is a target distribution $\pi$ (often a posterior distribution). The ergodic theorem guarantees that
$$\frac{1}{n}\sum_{k=1}^{n} f(X_k) \longrightarrow \mathbb{E}_\pi[f] \quad \text{a.s.}$$
for any $\pi$-integrable function $f$. This allows us to approximate expectations under $\pi$ by simulating the chain.
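A minimal Metropolis-Hastings sketch makes this concrete. The target below is an assumed toy distribution on $\{0,\dots,4\}$ (unnormalized weights chosen for illustration); the proposal is a symmetric random walk on neighbors, and the time average of $f(i) = i$ is compared with the exact expectation.

```python
import random

# Hedged sketch of Metropolis-Hastings on the state space {0,...,4}.
# The target pi(i) is proportional to weights[i]; weights are an assumption
# for illustration, not taken from the text.
random.seed(3)
weights = [1.0, 2.0, 3.0, 2.0, 1.0]      # unnormalized target (illustrative)
Z = sum(weights)                          # normalizing constant

def mh_step(i):
    """Propose a uniform neighbor and accept with prob min(1, pi(j)/pi(i))."""
    j = i + random.choice([-1, 1])
    if j < 0 or j > 4:
        return i                          # proposal off the space: reject
    if random.random() < min(1.0, weights[j] / weights[i]):
        return j
    return i

n, state, f_sum = 200_000, 0, 0.0
for _ in range(n):
    state = mh_step(state)
    f_sum += state                        # observable f(i) = i

estimate = f_sum / n
exact = sum(i * w for i, w in enumerate(weights)) / Z
print(estimate, exact)                    # both close to 2.0
```

Because the proposal is symmetric, the acceptance ratio only needs the unnormalized weights, which is exactly why MCMC is useful when $Z$ is intractable; the ergodic theorem is what licenses replacing $\mathbb{E}_\pi[f]$ by the time average.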
Consider shuffling a deck of cards by repeatedly performing a random transposition. The state space is the symmetric group $S_{52}$ (permutations of the 52 cards), and the chain converges to the uniform distribution on $S_{52}$. The ergodic theorem implies that the proportion of time the deck spends in any given configuration converges to $1/52!$.
Beyond the law of large numbers, there is also a central limit theorem for Markov chains: under appropriate conditions,
$$\sqrt{n}\left(\frac{1}{n}\sum_{k=1}^{n} f(X_k) - \mathbb{E}_\pi[f]\right) \Longrightarrow \mathcal{N}(0, \sigma_f^2),$$
where $\sigma_f^2$ depends on the variance of $f$ and on the mixing properties of the chain. This is essential for quantifying the error in MCMC estimates.
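In practice $\sigma_f^2$ is unknown and must be estimated from the run itself. One standard approach (sketched here on the assumed two-state chain with illustrative parameters $p = 0.3$, $q = 0.2$ and $f(i) = i$) is the batch-means method: split the trajectory into batches and use the spread of batch averages as a rough standard error.

```python
import random, math

# Hedged sketch of the batch-means method for MCMC error bars, applied to an
# illustrative two-state chain with f(i) = i, so E_pi[f] = pi_1.
random.seed(4)
p, q = 0.3, 0.2               # assumed, illustrative parameters
n, state, samples = 200_000, 0, []
for _ in range(n):
    if state == 0:
        state = 1 if random.random() < p else 0
    else:
        state = 0 if random.random() < q else 1
    samples.append(state)     # f(X_k) = X_k

num_batches = 100
batch = n // num_batches
means = [sum(samples[b*batch:(b+1)*batch]) / batch for b in range(num_batches)]
overall = sum(means) / num_batches
var_batch = sum((m - overall) ** 2 for m in means) / (num_batches - 1)
stderr = math.sqrt(var_batch / num_batches)
print(overall, stderr)        # overall near pi_1 = 0.6, stderr small
```

Because batch averages over long batches are approximately independent and Gaussian (by the Markov chain CLT), their empirical variance captures the autocorrelation that a naive i.i.d. standard error would miss.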
Summary
The ergodic theorem is the cornerstone of Markov chain theory:
- Time averages = ensemble averages: Long-run frequencies converge to stationary probabilities.
- Applies to all positive recurrent chains: Aperiodicity is not required.
- Foundation for MCMC: Justifies using Markov chains to approximate expectations.
- Extends the SLLN: From i.i.d. to dependent sequences with a renewal structure.
The theorem shows that stationary distributions are not just algebraic objects (solutions of $\pi = \pi P$) but have a profound probabilistic interpretation as long-run frequencies.