Convergence Theorem for Markov Chains
The convergence theorem describes the long-term behavior of the transition probabilities for irreducible, aperiodic, positive recurrent Markov chains. It states that the distribution of the chain converges to the unique stationary distribution, regardless of the initial state.
The main result
Let $(X_n)_{n \ge 0}$ be an irreducible, aperiodic, positive recurrent Markov chain on a countable state space $S$ with unique stationary distribution $\pi$. Then for all states $i, j \in S$:
$$\lim_{n \to \infty} P^n(i, j) = \pi(j).$$
Moreover, on a finite state space the convergence is uniform in the starting state $i$: for any $\epsilon > 0$, there exists $N$ such that for all $n \ge N$ and all $i \in S$:
$$|P^n(i, j) - \pi(j)| < \epsilon.$$
This theorem says that no matter where the chain starts, the probability of being in state $j$ after $n$ steps converges to $\pi(j)$ as $n \to \infty$. The initial distribution is "forgotten" asymptotically.
Aperiodicity ensures that $P^n(i, j)$ does not oscillate as $n \to \infty$. For periodic chains, the sequence may fail to converge (e.g., the deterministic cycle on $\{0, 1, \dots, N-1\}$), but the Cesàro averages $\frac{1}{n} \sum_{k=0}^{n-1} P^k(i, j)$ still converge to $\pi(j)$.
Proof outline
The proof combines several key ingredients:
Step 1: Coupling argument. Construct two independent copies of the chain, $(X_n)$ and $(Y_n)$, starting from states $i$ and $j$ respectively. By irreducibility and aperiodicity, there is a positive probability that $X_n = Y_n$ for some finite $n$. Once they meet, we run them together from then on; by the Markov property, the merged process is still a copy of the chain.
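The coupling in Step 1 can be simulated directly. Below is a minimal sketch: two independent copies of a small illustrative 3-state chain (the matrix `P` is an assumption for the demo, not from the text) run until they first meet, and the empirical coupling time is averaged.

```python
import random

# Illustrative 3-state chain: irreducible, and aperiodic thanks to self-loops.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]

def step(state, rng):
    """Advance one step according to the transition matrix P."""
    return rng.choices(range(3), weights=P[state])[0]

def coupling_time(x0, y0, seed=0):
    """Run two independent copies from x0 and y0 until they first meet."""
    rng_x, rng_y = random.Random(seed), random.Random(seed + 1)
    x, y, n = x0, y0, 0
    while x != y:
        x, y, n = step(x, rng_x), step(y, rng_y), n + 1
    return n  # after this time, the coalesced chain runs as a single copy

times = [coupling_time(0, 2, seed=s) for s in range(1000)]
print("mean coupling time:", sum(times) / len(times))
```

The fast decay of the coupling-time tail is exactly what Step 3 quantifies.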
Step 2: Total variation distance. Define the total variation distance between distributions $\mu$ and $\nu$ on $S$ as
$$\|\mu - \nu\|_{TV} = \sup_{A \subseteq S} |\mu(A) - \nu(A)| = \frac{1}{2} \sum_{j \in S} |\mu(j) - \nu(j)|.$$
Then,
$$\|P^n(i, \cdot) - \pi\|_{TV} \le \Pr(X_n \ne Y_n) = \Pr(\tau > n),$$
where $(Y_n)$ is an independent copy starting from the stationary distribution $\pi$ and $\tau = \min\{n : X_n = Y_n\}$ is the coupling time.
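The total variation distance of Step 2 is straightforward to compute for distributions on a finite state space; the sketch below uses the half-$L^1$ form of the definition (the example vectors are illustrative).

```python
def tv_distance(mu, nu):
    """Total variation distance: half the L1 distance between two pmfs,
    equal to the largest discrepancy |mu(A) - nu(A)| over events A."""
    return 0.5 * sum(abs(m - n) for m, n in zip(mu, nu))

mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
print(tv_distance(mu, nu))  # half of (0.3 + 0.0 + 0.3)
```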
Step 3: Exponential decay. By aperiodicity, there exists $\alpha \in (0, 1)$ such that
$$\Pr(\tau > n) \le C \alpha^n$$
for some constant $C > 0$. This is the geometric ergodicity property. The rate $\alpha$ depends on the spectral gap of the transition matrix.
Step 4: Pointwise convergence. Since the total variation distance controls pointwise differences,
$$|P^n(i, j) - \pi(j)| \le \|P^n(i, \cdot) - \pi\|_{TV} \le C \alpha^n \to 0.$$
For irreducible, aperiodic chains on a finite state space, the convergence is always geometrically fast: there exist constants $C > 0$ and $\alpha \in (0, 1)$ such that
$$\max_{i, j} |P^n(i, j) - \pi(j)| \le C \alpha^n.$$
The rate $\alpha$ can be taken to be $|\lambda_2|$, the second-largest eigenvalue of $P$ in absolute value.
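The geometric rate can be observed numerically. This sketch (the 3-state matrix is an illustrative assumption, not from the text) compares the worst-case error $\max_{i,j}|P^n(i,j) - \pi(j)|$ with $|\lambda_2|^n$:

```python
import numpy as np

# Illustrative 3-state chain (rows sum to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()

# Second-largest eigenvalue modulus governs the geometric rate.
alpha = sorted(np.abs(vals))[-2]

for n in [1, 5, 10, 20]:
    err = np.max(np.abs(np.linalg.matrix_power(P, n) - pi))
    print(f"n={n:2d}  max|P^n(i,j)-pi(j)|={err:.2e}  alpha^n={alpha**n:.2e}")
```

The printed errors shrink in lockstep with $\alpha^n$, as the theorem predicts.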
Examples
For the two-state transition matrix
$$P = \begin{pmatrix} 1 - a & a \\ b & 1 - b \end{pmatrix}, \qquad a, b \in (0, 1),$$
the eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 1 - a - b$. The stationary distribution is $\pi = \left( \frac{b}{a+b}, \frac{a}{a+b} \right)$. We have
$$P^n = \frac{1}{a+b} \begin{pmatrix} b & a \\ b & a \end{pmatrix} + \frac{(1-a-b)^n}{a+b} \begin{pmatrix} a & -a \\ -b & b \end{pmatrix}.$$
Thus,
$$|P^n(i, j) - \pi(j)| \le |1 - a - b|^n.$$
The convergence rate is $|\lambda_2| = |1 - a - b|$.
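As a numeric sanity check on the two-state example, the sketch below picks illustrative values $a = 0.3$, $b = 0.2$ (assumptions chosen for the demo) and compares the actual worst-case error with $|\lambda_2|^n$:

```python
import numpy as np

a, b = 0.3, 0.2  # illustrative values; any a, b in (0, 1) work
P = np.array([[1 - a, a],
              [b, 1 - b]])
pi = np.array([b / (a + b), a / (a + b)])  # stationary distribution
lam2 = 1 - a - b                           # second eigenvalue

for n in [1, 5, 10]:
    Pn = np.linalg.matrix_power(P, n)
    err = np.max(np.abs(Pn - pi))
    print(f"n={n:2d}  error={err:.3e}  |lam2|^n={abs(lam2)**n:.3e}")
```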
A lazy random walk on a graph stays at the current vertex with probability $1/2$ and moves to a uniformly random neighbor with probability $1/2$. This modification ensures aperiodicity: $P(v, v) = 1/2 > 0$ for all $v$, so every state is aperiodic.
For the lazy walk on the cycle $\mathbb{Z}_N$, the stationary distribution is uniform: $\pi(v) = 1/N$. The convergence rate is governed by the second eigenvalue of the transition matrix, which is approximately $1 - \frac{\pi^2}{N^2}$ for large $N$.
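The second eigenvalue of the lazy cycle walk is $(1 + \cos(2\pi/N))/2$, which is approximately $1 - \pi^2/N^2$ for large $N$. The sketch below checks this numerically for an illustrative $N = 50$:

```python
import numpy as np

N = 50  # cycle length (illustrative)
# Lazy walk on the N-cycle: stay w.p. 1/2, step left/right w.p. 1/4 each.
P = np.zeros((N, N))
for v in range(N):
    P[v, v] = 0.5
    P[v, (v + 1) % N] = 0.25
    P[v, (v - 1) % N] = 0.25

eigs = np.sort(np.linalg.eigvalsh(P))[::-1]  # P is symmetric, so eigvalsh works
lam2 = eigs[1]
approx = 1 - np.pi ** 2 / N ** 2             # large-N approximation
print(lam2, approx)
```

The tiny gap $1 - \lambda_2 = \Theta(1/N^2)$ is why the cycle mixes slowly, in $\Theta(N^2)$ steps.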
Mixing time
The mixing time $t_{\mathrm{mix}}(\epsilon)$ of a Markov chain is the smallest $n$ such that
$$\max_{i \in S} \|P^n(i, \cdot) - \pi\|_{TV} \le \epsilon.$$
Equivalently, after time $t_{\mathrm{mix}}(\epsilon)$, the distribution is within total variation distance $\epsilon$ of the stationary distribution, regardless of the starting state. A standard convention takes $\epsilon = 1/4$.
The mixing time quantifies how long it takes for the chain to "forget" its initial condition.
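For small chains the definition can be turned into a direct computation: iterate $P^n$ and stop when the worst-case total variation distance drops below $\epsilon$. The 3-state chain below is an illustrative assumption; $\epsilon = 1/4$ is a common convention.

```python
import numpy as np

def mixing_time(P, eps=0.25, n_max=10_000):
    """Smallest n with max_i TV(P^n(i,.), pi) <= eps, by direct iteration."""
    # Stationary distribution via the left eigenvector for eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi /= pi.sum()
    Pn = np.eye(len(P))
    for n in range(1, n_max + 1):
        Pn = Pn @ P
        if 0.5 * np.abs(Pn - pi).sum(axis=1).max() <= eps:
            return n
    raise RuntimeError("did not mix within n_max steps")

# Illustrative chain (not from the text):
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
print(mixing_time(P))
```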
The hypercube $\{0, 1\}^d$ has $2^d$ vertices. A lazy random walk on the hypercube stays put with probability $1/2$ and otherwise flips a uniformly random coordinate (laziness avoids the periodicity of the pure coordinate-flipping walk, whose parity alternates). The stationary distribution is uniform: $\pi(x) = 2^{-d}$.
The mixing time is $\Theta(d \log d)$: after $C\, d \log d$ steps (for a suitable constant $C$), the distribution is close to uniform. This is a foundational result in the theory of random walks on graphs.
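For small $d$ the hypercube walk can be checked exactly by matrix powers. The sketch below (with illustrative choices $d = 8$ and $C = 1$) measures the worst-case total variation distance after about $d \log d$ steps of the lazy walk:

```python
import numpy as np

d = 8                  # dimension (illustrative); 2^8 = 256 states
n_states = 2 ** d

# Lazy walk: stay w.p. 1/2, else flip a uniformly random coordinate.
P = np.zeros((n_states, n_states))
for x in range(n_states):
    P[x, x] = 0.5
    for k in range(d):
        P[x, x ^ (1 << k)] = 0.5 / d

pi = np.full(n_states, 1 / n_states)  # uniform stationary distribution
steps = int(d * np.log(d))            # ~ C d log d with C = 1
Pn = np.linalg.matrix_power(P, steps)
tv = 0.5 * np.abs(Pn - pi).sum(axis=1).max()
print(f"{steps} steps, worst-case TV distance = {tv:.4f}")
```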
Rate of convergence and the spectral gap
For a finite-state reversible Markov chain with eigenvalues $1 = \lambda_1 > \lambda_2 \ge \cdots \ge \lambda_{|S|} \ge -1$, the spectral gap is
$$\gamma = 1 - \lambda_2.$$
A larger spectral gap implies faster convergence to stationarity.
For a reversible Markov chain on a finite state space, the mixing time satisfies
$$t_{\mathrm{mix}}(\epsilon) \le \frac{1}{\gamma_*} \log\left( \frac{1}{\epsilon\, \pi_{\min}} \right),$$
where $\gamma_* = 1 - \max_{k \ge 2} |\lambda_k|$ is the absolute spectral gap and $\pi_{\min} = \min_j \pi(j)$.
This theorem connects the algebraic property (spectral gap) to the probabilistic property (mixing time). Techniques from spectral graph theory and linear algebra provide powerful tools for bounding mixing times.
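The spectral-gap bound can be tested on a small reversible chain. The sketch below uses a lazy walk on a 6-vertex path (an illustrative choice), computes the absolute spectral gap via the symmetrization $D^{1/2} P D^{-1/2}$, and compares the bound with the exact mixing time:

```python
import numpy as np

# Reversible example: lazy random walk on a path with m vertices
# (stay w.p. 1/2, else move to a uniform neighbor).
m = 6
deg = np.array([1] + [2] * (m - 2) + [1], dtype=float)
P = np.zeros((m, m))
for v in range(m):
    P[v, v] = 0.5
    for u in (v - 1, v + 1):
        if 0 <= u < m:
            P[v, u] = 0.5 / deg[v]

pi = deg / deg.sum()  # stationary distribution, proportional to degree
# Reversibility makes D^{1/2} P D^{-1/2} symmetric, so eigenvalues are real.
D = np.diag(np.sqrt(pi))
S = D @ P @ np.linalg.inv(D)
eigs = np.sort(np.linalg.eigvalsh(S))[::-1]
gap = 1 - max(abs(eigs[1]), abs(eigs[-1]))  # absolute spectral gap

eps = 0.25
bound = (1 / gap) * np.log(1 / (eps * pi.min()))

# Exact mixing time by direct iteration.
Pn, t = np.eye(m), 0
while 0.5 * np.abs(Pn - pi).sum(axis=1).max() > eps:
    Pn, t = Pn @ P, t + 1
print(f"gap={gap:.3f}  bound={bound:.1f}  actual t_mix={t}")
```

As expected, the actual mixing time sits below the spectral bound.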
On the complete graph $K_n$ (all pairs of vertices connected), a random walk moves to a uniformly random neighbor at each step. The stationary distribution is uniform: $\pi(v) = 1/n$. The nontrivial eigenvalues all equal $-\frac{1}{n-1}$, so the absolute spectral gap is $1 - \frac{1}{n-1}$ and the mixing time is $O(1)$.
Periodic chains and Cesàro convergence
For any irreducible, positive recurrent Markov chain (possibly periodic), the Cesàro averages converge:
$$\frac{1}{n} \sum_{k=0}^{n-1} P^k(i, j) \to \pi(j) \quad \text{as } n \to \infty.$$
This weaker form of convergence holds even for periodic chains. However, for practical applications (e.g., MCMC), we usually modify the chain to be aperiodic (e.g., by adding a self-loop with positive probability).
On the cycle $\mathbb{Z}_N$ with deterministic transitions $i \to i + 1 \pmod{N}$, the stationary distribution is uniform: $\pi(j) = 1/N$. However,
$$P^n(i, j) = \begin{cases} 1 & \text{if } j \equiv i + n \pmod{N}, \\ 0 & \text{otherwise.} \end{cases}$$
The sequence oscillates, but
$$\frac{1}{n} \sum_{k=0}^{n-1} P^k(i, j) \to \frac{1}{N}.$$
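The Cesàro averaging can be verified numerically for the deterministic cycle; the sketch below uses an illustrative $N = 4$ and averages 400 powers (a multiple of $N$, so the average is exactly uniform):

```python
import numpy as np

N = 4  # cycle length (illustrative)
P = np.zeros((N, N))
for i in range(N):
    P[i, (i + 1) % N] = 1.0  # deterministic rotation: period N, never converges

# P^n oscillates: it returns to the identity every N steps.
assert (np.linalg.matrix_power(P, N) == np.eye(N)).all()

# Cesàro average of the first 400 powers.
n_terms = 400
Pk, cesaro = np.eye(N), np.zeros((N, N))
for _ in range(n_terms):
    cesaro += Pk
    Pk = Pk @ P
cesaro /= n_terms
print(cesaro[0])  # the uniform stationary distribution [0.25 0.25 0.25 0.25]
```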
Summary
The convergence theorem describes the asymptotic behavior of Markov chains:
- Irreducible + aperiodic + positive recurrent: $P^n(i, j) \to \pi(j)$ as $n \to \infty$.
- Geometric convergence: For finite state spaces, the rate is exponential, governed by the spectral gap.
- Mixing time: Quantifies how long it takes to reach near-stationarity.
- Cesàro convergence: Weaker result that holds even for periodic chains.
These results are central to applications in MCMC, statistical physics (Gibbs measures), and randomized algorithms (e.g., approximate counting).