Introduction to Ergodic Theory - Key Proof

Proof Sketch of Birkhoff's Ergodic Theorem

We outline the proof of Birkhoff's ergodic theorem, emphasizing key ideas over technical details.

Theorem Statement: Let $(X, \mu, T)$ be a measure-preserving system with $\mu$ a probability measure, and let $f \in L^1(X, \mu)$. Then the limit $\lim_{n \to \infty} \frac{1}{n}\sum_{k=0}^{n-1} f(T^k(x)) = f^*(x)$ exists for almost every $x$, the limit function $f^*$ is $T$-invariant, and $\int f^* \, d\mu = \int f \, d\mu$.

Step 1: Maximal ergodic lemma

Define the maximal function:

$f^+(x) = \sup_{n \geq 1} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k(x))$

The maximal ergodic lemma states that on the set $E = \{x : f^+(x) > 0\}$ we have:

$\int_E f \, d\mu \geq 0$

Proof of lemma: Define $S_n = \sum_{k=0}^{n-1} f \circ T^k$ and note that:

$\max_{1 \leq k \leq n} S_k(x) = \max\{f(x), f(x) + \max_{1 \leq k \leq n-1} S_k(T(x))\}$

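Set $M_n = \max\{0, S_1, \dots, S_n\}$. On the set $\{M_n > 0\}$ the recursion gives $M_n(x) \leq f(x) + M_n(T(x))$, while $M_n = 0$ off this set and $M_n \circ T \geq 0$ everywhere. Measure-preservation then yields:

$\int_{\{M_n > 0\}} f \, d\mu \geq \int_{\{M_n > 0\}} (M_n - M_n \circ T) \, d\mu \geq \int_X M_n \, d\mu - \int_X M_n \circ T \, d\mu = 0$

The sets $\{M_n > 0\}$ increase to $E$ as $n \to \infty$, so dominated convergence (with dominating function $|f|$) gives $\int_E f \, d\mu \geq 0$. This lemma is the technical heart of the proof.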
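The finite-$n$ form of the lemma can be checked exactly on a toy system. The sketch below is an illustration only, under assumptions not in the text above: $X$ is a finite set with the uniform measure, $T$ is a random permutation (which preserves that measure), and $f$ takes integer values so all sums are exact. The function name `maximal_lemma_sum` and all parameters are hypothetical.

```python
import random


def maximal_lemma_sum(perm, f, n_max):
    """Sum f over E_N = {x : max_{1<=k<=N} S_k(x) > 0}, with N = n_max.

    perm[x] plays the role of T(x); f[x] is the observable at x.
    With the uniform measure, the integral over E_N is this sum divided by len(perm).
    """
    total = 0
    for x in range(len(perm)):
        partial = 0      # running partial sum S_k(x)
        best = None      # max of S_1(x), ..., S_k(x) seen so far
        y = x
        for _ in range(n_max):
            partial += f[y]
            y = perm[y]
            if best is None or partial > best:
                best = partial
        if best > 0:     # x belongs to E_N
            total += f[x]
    return total


# Assumed toy data: 200 points, a random permutation, integer-valued f.
rng = random.Random(0)
m = 200
perm = list(range(m))
rng.shuffle(perm)
f = [rng.randint(-5, 5) for _ in range(m)]

results = [maximal_lemma_sum(perm, f, n) for n in (1, 2, 5, 20, 100)]
print(results)  # every entry should be nonnegative
```

Each entry is nonnegative regardless of the seed, as the lemma predicts, even though $f$ itself takes many negative values.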

Step 2: Limsup and liminf

Define:

$\overline{f}(x) = \limsup_{n \to \infty} \frac{1}{n}\sum_{k=0}^{n-1} f(T^k(x))$

$\underline{f}(x) = \liminf_{n \to \infty} \frac{1}{n}\sum_{k=0}^{n-1} f(T^k(x))$

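Both $\overline{f}$ and $\underline{f}$ are $T$-invariant: the averages along the orbit of $T(x)$ satisfy $\frac{1}{n}\sum_{k=0}^{n-1} f(T^{k+1}(x)) = \frac{n+1}{n} \cdot \frac{1}{n+1}\sum_{k=0}^{n} f(T^k(x)) - \frac{1}{n} f(x)$, and since $\frac{n+1}{n} \to 1$ and $\frac{1}{n} f(x) \to 0$, shifting the starting point from $x$ to $T(x)$ changes neither the limsup nor the liminf.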

Step 3: Showing $\overline{f} = \underline{f}$ almost everywhere

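The maximal lemma also holds on invariant sets: if $A$ is measurable with $T^{-1}A = A$, applying the lemma to $f \cdot 1_A$ gives $\int_{E \cap A} f \, d\mu \geq 0$.

Fix rationals $a < b$ and consider the invariant set:

$E_{a,b} = \{x : \underline{f}(x) < a < b < \overline{f}(x)\}$

On $E_{a,b}$ we have $\overline{f} > b$, so the maximal function of $f - b$ is positive there. The maximal lemma applied to $f - b$ on $E_{a,b}$ gives:

$\int_{E_{a,b}} (f - b) \, d\mu \geq 0$

Similarly, $\underline{f} < a$ on $E_{a,b}$, and the lemma applied to $a - f$ gives:

$\int_{E_{a,b}} (a - f) \, d\mu \geq 0$

Together these yield:

$b \, \mu(E_{a,b}) \leq \int_{E_{a,b}} f \, d\mu \leq a \, \mu(E_{a,b})$

Since $a < b$ and $\mu$ is a finite measure, this forces $\mu(E_{a,b}) = 0$. By density of the rationals, $\{\overline{f} > \underline{f}\}$ is the countable union of the sets $E_{a,b}$ over rational pairs $a < b$, so $\mu(\{\overline{f} > \underline{f}\}) = 0$.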

Thus $\overline{f} = \underline{f} =: f^*$ almost everywhere, so the limit exists.

Step 4: Invariance and integral preservation

$f^*$ is invariant: $f^* \circ T = f^*$ by construction (time averages are shift-invariant).

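For integral preservation, interchange limit and integral. For bounded $f$ this is the dominated convergence theorem (the averages are bounded by $\|f\|_\infty$ and $\mu$ is finite); for general $f \in L^1$, approximate $f$ by bounded truncations and use that averaging along orbits does not increase the $L^1$ norm: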

$\int f^* \, d\mu = \int \lim_{n \to \infty} \frac{1}{n}\sum_{k=0}^{n-1} f \circ T^k \, d\mu = \lim_{n \to \infty} \frac{1}{n}\sum_{k=0}^{n-1} \int f \circ T^k \, d\mu$

Since $T$ preserves measure, $\int f \circ T^k \, d\mu = \int f \, d\mu$, yielding:

$\int f^* \, d\mu = \lim_{n \to \infty} \frac{1}{n} \cdot n \int f \, d\mu = \int f \, d\mu$

Conclusion: The time average $f^*$ exists almost everywhere, is invariant, and has the same integral as $f$.

■

This proof combines measure theory, functional analysis, and clever inequalities. The maximal ergodic lemma is the key technical tool, controlling fluctuations in partial sums. Once this is established, the rest follows from standard arguments.

Remark

For ergodic $T$, every invariant function is constant almost everywhere (this is one of the equivalent definitions of ergodicity). Thus $f^* = c$ a.e., and integrating against the probability measure $\mu$ gives $c = \int f \, d\mu$. This completes the classical statement: for ergodic systems, time averages equal the space average.
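A small finite example may help show why ergodicity is needed for $f^*$ to be constant. The sketch below assumes a six-point space with the uniform measure; the maps `T_two_cycles` and `T_one_cycle` and the values in `f` are hypothetical, chosen only for illustration. With two separate 3-cycles the system is not ergodic and $f^*$ depends on the starting point, yet it is still invariant and has the same mean as $f$.

```python
from fractions import Fraction


def time_average(T, f, x, n):
    """Exact average of f along the first n points of the orbit of x under T."""
    total = 0
    for _ in range(n):
        total += f[x]
        x = T[x]
    return Fraction(total, n)


f = [1, 2, 3, 10, 20, 30]
n = 300  # a multiple of both cycle lengths, so the averages are already exact

# Non-ergodic: two invariant cycles {0, 1, 2} and {3, 4, 5}.
T_two_cycles = [1, 2, 0, 4, 5, 3]
f_star = [time_average(T_two_cycles, f, x, n) for x in range(6)]

# Ergodic (for the uniform measure): a single 6-cycle.
T_one_cycle = [1, 2, 3, 4, 5, 0]
f_star_ergodic = [time_average(T_one_cycle, f, x, n) for x in range(6)]

space_average = Fraction(sum(f), len(f))
print(f_star)          # 2 on the first cycle, 20 on the second
print(f_star_ergodic)  # 11 everywhere
print(space_average)   # 11
```

In the two-cycle case the time average is the mean of $f$ over the cycle containing the starting point; averaging those values over all six points recovers the space average, in line with the integral identity in Step 4.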

The proof extends to $L^p$ spaces and more general settings (amenable groups, noncommutative spaces), demonstrating the robustness of the ergodic theorem beyond its original formulation.

Example: Application to Monte Carlo Methods

Birkhoff's theorem justifies Monte Carlo integration along a single trajectory. To compute $\int_X f \, d\mu$ for an ergodic system:

1. Choose an initial point $x_0$ (the theorem covers $\mu$-almost every choice)
2. Compute the time average: $\frac{1}{N}\sum_{k=0}^{N-1} f(T^k(x_0))$
3. As $N \to \infty$, this converges to $\int f \, d\mu$ for $\mu$-almost every $x_0$

This provides a theoretical foundation for Markov Chain Monte Carlo methods widely used in statistics, physics, and machine learning.
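The three steps above can be sketched on a deterministic toy system. The choices here are assumptions made for illustration, not part of the text: $T$ is rotation of the unit interval by the golden-ratio conjugate (ergodic for Lebesgue measure), the observable is $f(x) = \cos^2(2\pi x)$ with exact integral $1/2$, and the starting point is arbitrary. Practical MCMC uses random Markov chains rather than a rotation, so this only illustrates the time-average idea.

```python
import math


def birkhoff_average(f, T, x0, n):
    """Time average (1/n) * sum_{k=0}^{n-1} f(T^k(x0))."""
    total = 0.0
    x = x0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n


# Assumed example system: irrational rotation of [0, 1).
ALPHA = (math.sqrt(5) - 1) / 2


def rotate(x):
    return (x + ALPHA) % 1.0


def observable(x):
    # Integral over [0, 1) with respect to Lebesgue measure is exactly 1/2.
    return math.cos(2 * math.pi * x) ** 2


estimate = birkhoff_average(observable, rotate, x0=0.1234, n=10_000)
print(estimate)  # close to 0.5
```

For this smooth observable the error decays roughly like $1/N$, which is faster than the $1/\sqrt{N}$ rate typical of independent random sampling; that is a feature of this particular rotation and observable, not something Birkhoff's theorem guarantees in general.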

Birkhoff's ergodic theorem stands among the great results of 20th-century mathematics, connecting dynamics, probability, and analysis. It provides rigorous foundations for statistical mechanics, justifies computational methods, and reveals deep connections between individual trajectories and ensemble statistics.