Berry-Esseen Theorem and CLT Refinements

While the CLT states that normalized sums converge in distribution to the standard normal, the Berry-Esseen theorem quantifies the rate of that convergence, providing a non-asymptotic error bound valid for every $n$.

Berry-Esseen Bound

Theorem 7.9 (Berry-Esseen Theorem)

Let $X_1, X_2, \ldots$ be i.i.d. random variables with $E[X_i] = 0$, $E[X_i^2] = \sigma^2 > 0$, and $E[|X_i|^3] = \rho < \infty$. Let $S_n = \frac{X_1 + \cdots + X_n}{\sigma\sqrt{n}}$ be the normalized sum and $\Phi$ the standard normal CDF. Then there exists a universal constant $C$ such that $\sup_{x \in \mathbb{R}} |P(S_n \leq x) - \Phi(x)| \leq \frac{C\rho}{\sigma^3 \sqrt{n}}$. The best known upper bound on the constant is $C \leq 0.4748$ (Shevtsova, 2011).

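The Berry-Esseen theorem shows that the CLT approximation error is $O(1/\sqrt{n})$ uniformly in $x$. This rate is sharp in general: for symmetric Bernoulli summands the CDF of $S_n$ has jumps of order $1/\sqrt{n}$, so no continuous CDF can approximate it more closely than that. The bound depends on the distribution only through the scale-invariant ratio $\rho/\sigma^3$, which is always at least $1$ and serves as a rough measure of the "non-normality" of the distribution.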

Example (Bernoulli case)

For $X_i = B_i - 1/2$ with $B_i \sim \text{Bernoulli}(1/2)$: $\sigma^2 = 1/4$ and $\rho = E[|X_i|^3] = 1/8$, so $\rho/\sigma^3 = 1$. The Berry-Esseen bound gives $\sup_x |P(S_n \leq x) - \Phi(x)| \leq \frac{0.4748 \cdot 1/8}{(1/2)^3 \sqrt{n}} = \frac{0.4748}{\sqrt{n}}$. For $n = 100$ the error is at most $0.047$: the normal CDF differs from the true CDF of $S_n$ by no more than about $4.7$ percentage points at any $x$.
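The Bernoulli example can be checked numerically, since the exact distribution of the sum is binomial. The following is a minimal sketch using only the Python standard library; the function names (`normal_cdf`, `berry_esseen_bound`, `exact_sup_distance`) are illustrative and not part of the original text. It relies on the fact that, for a step-function CDF, the supremum distance to $\Phi$ is attained at a jump point, either at the value or at the left limit.

```python
import math

C_BOUND = 0.4748  # constant quoted in the theorem (Shevtsova, 2011)


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def berry_esseen_bound(rho: float, sigma: float, n: int, c: float = C_BOUND) -> float:
    """Right-hand side of the Berry-Esseen inequality."""
    return c * rho / (sigma ** 3 * math.sqrt(n))


def exact_sup_distance(n: int) -> float:
    """Exact sup_x |P(S_n <= x) - Phi(x)| for X_i = Bernoulli(1/2) - 1/2."""
    sigma = 0.5
    cdf = 0.0
    worst = 0.0
    for k in range(n + 1):
        # Location of the k-th jump of the normalized sum S_n.
        x = (k - n / 2) / (sigma * math.sqrt(n))
        phi = normal_cdf(x)
        worst = max(worst, abs(cdf - phi))   # left limit at the jump
        cdf += math.comb(n, k) / 2 ** n      # binomial mass at k
        worst = max(worst, abs(cdf - phi))   # value at the jump
    return worst


if __name__ == "__main__":
    n = 100
    print(f"bound = {berry_esseen_bound(1 / 8, 0.5, n):.5f}")
    print(f"exact = {exact_sup_distance(n):.5f}")
```

For $n = 100$ the exact distance is roughly $0.040$, which sits below the bound of $0.0475$ but is of the same order, consistent with the $1/\sqrt{n}$ rate being sharp for this distribution.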

Lindeberg-Feller CLT

Theorem 7.10 (Lindeberg-Feller CLT)

Let $X_1, X_2, \ldots$ be independent (not necessarily identically distributed) with $E[X_i] = 0$, $\sigma_i^2 = \operatorname{Var}(X_i) < \infty$, and $s_n^2 = \sum_{i=1}^n \sigma_i^2$. If the Lindeberg condition holds, that is, for every $\epsilon > 0$, $\frac{1}{s_n^2} \sum_{i=1}^n E[X_i^2 \cdot \mathbf{1}_{\{|X_i| > \epsilon s_n\}}] \to 0 \quad \text{as } n \to \infty$, then $\frac{X_1 + \cdots + X_n}{s_n} \xrightarrow{d} N(0, 1)$.

The Lindeberg condition ensures that no single summand dominates the sum. It is sufficient and, under a mild uniformity assumption ($\max_i \sigma_i^2 / s_n^2 \to 0$), also necessary.
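To see what "no single summand dominates" means in practice, the Lindeberg sum can be evaluated exactly for a simple family. The sketch below assumes, purely for illustration, that $X_i = c_i \varepsilon_i$ with independent random signs $\varepsilon_i = \pm 1$ and nonnegative scales $c_i$; then $|X_i| = c_i$ exactly, so $E[X_i^2 \mathbf{1}_{\{|X_i| > \epsilon s_n\}}]$ equals $c_i^2$ when $c_i > \epsilon s_n$ and $0$ otherwise. The function name `lindeberg_ratio` is hypothetical.

```python
import math


def lindeberg_ratio(scales, eps):
    """Lindeberg sum (1/s_n^2) * sum_i E[X_i^2 1{|X_i| > eps*s_n}]
    for X_i = c_i * (random sign), where scales = [c_1, ..., c_n] >= 0."""
    s_n_sq = sum(c * c for c in scales)
    s_n = math.sqrt(s_n_sq)
    # Only summands whose (deterministic) magnitude exceeds eps*s_n contribute.
    tail = sum(c * c for c in scales if c > eps * s_n)
    return tail / s_n_sq


# Equal scales: every summand is negligible relative to s_n = sqrt(n).
equal = [1.0] * 400
# Geometric scales: the last few summands carry almost all the variance.
geometric = [2.0 ** i for i in range(1, 31)]

print(lindeberg_ratio(equal, 0.1))
print(lindeberg_ratio(geometric, 0.1))
```

With equal scales the ratio is exactly zero once $\sqrt{n} > 1/\epsilon$, so the Lindeberg condition holds. With geometrically growing scales the ratio stays close to $1$ however large $n$ is, so the condition fails: the final summands dominate the sum.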

Remark (Multivariate CLT)

The CLT extends to random vectors: if $\mathbf{X}_1, \mathbf{X}_2, \ldots$ are i.i.d. random vectors in $\mathbb{R}^d$ with mean $\boldsymbol{\mu}$ and finite covariance matrix $\Sigma$, then $\sqrt{n}(\bar{\mathbf{X}}_n - \boldsymbol{\mu}) \xrightarrow{d} N(\mathbf{0}, \Sigma)$, where $\bar{\mathbf{X}}_n$ is the sample mean. This multivariate version underlies the large-sample theory of multivariate statistical methods, including principal component analysis and multivariate hypothesis testing.
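A small simulation can illustrate the multivariate statement. The sketch below is an assumption-laden example, not part of the original text: it takes $\mathbf{X} = (U, U + V)$ with $U, V$ independent and uniform on $[-1, 1]$, so that $\boldsymbol{\mu} = \mathbf{0}$ and $\Sigma = \begin{pmatrix} 1/3 & 1/3 \\ 1/3 & 2/3 \end{pmatrix}$. The names `draw_vector`, `scaled_mean`, and `simulate` are hypothetical. Note that the covariance of $\sqrt{n}\,\bar{\mathbf{X}}_n$ equals $\Sigma$ for every $n$; the CLT concerns the shape of the distribution, which the code probes through one marginal probability.

```python
import math
import random

# Covariance of X = (U, U + V) with U, V independent Uniform(-1, 1).
SIGMA = ((1 / 3, 1 / 3), (1 / 3, 2 / 3))


def draw_vector(rng):
    """One draw of the illustrative vector X = (U, U + V)."""
    u = rng.uniform(-1.0, 1.0)
    v = rng.uniform(-1.0, 1.0)
    return u, u + v


def scaled_mean(rng, n):
    """sqrt(n) * (sample mean - mu); here mu = 0, so this is sum / sqrt(n)."""
    s1 = s2 = 0.0
    for _ in range(n):
        x1, x2 = draw_vector(rng)
        s1 += x1
        s2 += x2
    root = math.sqrt(n)
    return s1 / root, s2 / root


def simulate(n=50, reps=4000, seed=0):
    """Return the sample covariance of the scaled means and the fraction
    of first coordinates lying below one marginal standard deviation."""
    rng = random.Random(seed)
    zs = [scaled_mean(rng, n) for _ in range(reps)]
    m = [sum(z[j] for z in zs) / reps for j in range(2)]
    cov = [[sum((z[i] - m[i]) * (z[j] - m[j]) for z in zs) / (reps - 1)
            for j in range(2)] for i in range(2)]
    sd1 = math.sqrt(SIGMA[0][0])
    frac = sum(1 for z in zs if z[0] <= sd1) / reps
    return cov, frac


if __name__ == "__main__":
    cov, frac = simulate()
    print("sample covariance:", cov)
    print("P(Z_1 <= sd_1) estimate:", frac)
```

If the limit $N(\mathbf{0}, \Sigma)$ is a good approximation, the sample covariance should be close to $\Sigma$ and the estimated probability close to $\Phi(1) \approx 0.841$, up to Monte Carlo error.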