The Central Limit Theorem

The Central Limit Theorem (CLT) is arguably the most important result in probability theory, explaining why the normal distribution appears so frequently in nature and providing the theoretical foundation for statistical inference.


Statement

Definition

A sequence of random variables $X_1, X_2, \ldots$ converges in distribution to a random variable $X$, written $X_n \xrightarrow{d} X$, if $\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$ for every $x$ at which $F_X$ is continuous, where $F_{X_n}$ and $F_X$ are the cumulative distribution functions of $X_n$ and $X$.
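As a short worked illustration of why the definition only asks for convergence at continuity points of $F_X$, consider the deterministic sequence $X_n = 1/n$ and the constant $X = 0$. This example is not part of the original text; it is a standard sketch chosen for its simplicity.

```latex
F_{X_n}(x) = \begin{cases} 0, & x < 1/n \\ 1, & x \geq 1/n \end{cases}
\qquad
F_X(x) = \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases}

% For x < 0:  F_{X_n}(x) = 0 for all n, so the limit is 0 = F_X(x).
% For x > 0:  F_{X_n}(x) = 1 as soon as n \geq 1/x, so the limit is 1 = F_X(x).
% At  x = 0:  F_{X_n}(0) = 0 for all n, while F_X(0) = 1.
```

The limit fails only at $x = 0$, which is exactly the point where $F_X$ jumps, so $X_n \xrightarrow{d} X$ still holds under the definition.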

Theorem 7.3: Central Limit Theorem (CLT)

Let $X_1, X_2, \ldots$ be independent and identically distributed (i.i.d.) random variables with mean $\mu = E[X_i]$ and finite variance $\sigma^2 = \operatorname{Var}(X_i) > 0$. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then the standardized sample mean converges in distribution to the standard normal:

$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} = \frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty.$$

Equivalently, $P\left(\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \leq z\right) \to \Phi(z)$ for all $z$, where $\Phi$ is the standard normal CDF.
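The convergence in the theorem can be checked numerically. The following is a minimal simulation sketch, not a proof: it assumes Exponential(1) draws (so $\mu = 1$ and $\sigma = 1$), an arbitrary fixed seed, and illustrative helper names (`standardized_mean`, `normal_cdf`, `empirical_cdf_at`) that do not come from the original text. It estimates $P(Z_n \leq 0)$ for the standardized mean $Z_n$ and prints it next to $\Phi(0) = 0.5$.

```python
import math
import random
import statistics


def standardized_mean(sample, mu, sigma):
    """Return (mean(sample) - mu) / (sigma / sqrt(n)) for a sample of size n."""
    n = len(sample)
    return (statistics.fmean(sample) - mu) / (sigma / math.sqrt(n))


def normal_cdf(z):
    """Standard normal CDF Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def empirical_cdf_at(z, n, trials, rng):
    """Monte Carlo estimate of P(Z_n <= z) for i.i.d. Exponential(1) draws."""
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        if standardized_mean(sample, mu=1.0, sigma=1.0) <= z:
            hits += 1
    return hits / trials


rng = random.Random(0)  # seed chosen arbitrarily, only for reproducibility
for n in (1, 10, 100):
    estimate = empirical_cdf_at(0.0, n, trials=5000, rng=rng)
    print(f"n = {n:3d}   P(Z_n <= 0) ~ {estimate:.3f}   Phi(0) = {normal_cdf(0.0):.3f}")
```

Because the exponential distribution is strongly right-skewed, the estimate for $n = 1$ should sit well above $0.5$ and drift toward it as $n$ grows. The estimates carry Monte Carlo noise, so small fluctuations between runs with different seeds are expected.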


Examples

Example: CLT for dice rolls

Let $X_i$ be the outcome of rolling a fair die: $\mu = 3.5$, $\sigma^2 = 35/12$. For $n = 100$ rolls, $\bar{X}_{100}$ is approximately normal with mean $3.5$ and standard deviation $\sigma/\sqrt{100} \approx 0.1708$. The probability of the average exceeding $3.7$ is approximately:

$$P(\bar{X}_{100} > 3.7) \approx 1 - \Phi\left(\frac{3.7 - 3.5}{0.1708}\right) = 1 - \Phi(1.17) \approx 0.121$$
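The arithmetic in this example can be reproduced in a few lines. The sketch below recomputes $\mu$, $\sigma^2$, and the normal-approximation tail probability, then, as an added comparison that is not in the original example, computes the exact probability that the sum of 100 dice exceeds 370 by repeated convolution. Variable names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal CDF Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


faces = range(1, 7)
mu = sum(faces) / 6                           # mean of one fair die
var = sum((k - mu) ** 2 for k in faces) / 6   # variance of one fair die
n = 100
se = math.sqrt(var / n)                       # standard deviation of the sample mean

# Normal approximation to P(mean > 3.7), without continuity correction.
z = (3.7 - mu) / se
approx_tail = 1.0 - normal_cdf(z)

# Exact distribution of the sum of n dice: dist[s] = P(sum == s).
dist = [1.0]  # with zero dice, the sum is 0 with probability 1
for _ in range(n):
    new = [0.0] * (len(dist) + 6)
    for s, p in enumerate(dist):
        for k in faces:
            new[s + k] += p / 6
    dist = new

# mean > 3.7 is the same event as sum > 370, i.e. sum >= 371.
exact_tail = sum(dist[371:])

print(f"mu = {mu}, var = {var:.4f}, se = {se:.4f}, z = {z:.3f}")
print(f"normal approximation: {approx_tail:.4f}")
print(f"exact (convolution):  {exact_tail:.4f}")
```

The two numbers will generally differ slightly because the sum of dice is discrete while the normal distribution is continuous; a continuity correction would typically narrow the gap.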

Example: CLT for Bernoulli trials

For $X_i \sim \text{Bernoulli}(p)$ with $n$ trials, $\hat{p} = \bar{X}_n$ is the sample proportion, and since $\mu = p$ and $\sigma^2 = p(1-p)$,

$$\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \xrightarrow{d} N(0, 1).$$

Replacing the unknown $p$ in the standard error by its estimate $\hat{p}$ gives the approximate $95\%$ confidence interval $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$.
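The interval formula translates directly into code. The following is a minimal sketch; the function name `wald_interval` and the counts in the usage line (540 successes in 1000 trials) are hypothetical and chosen only for illustration.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Approximate CI for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).

    z = 1.96 corresponds to an approximate 95% level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data: 540 successes out of 1000 trials.
lower, upper = wald_interval(540, 1000)
print(f"approximate 95% CI: ({lower:.4f}, {upper:.4f})")
```

This interval relies on the normal approximation, so it tends to be unreliable when $n$ is small or $\hat{p}$ is close to $0$ or $1$; in those cases the endpoints can even fall outside $[0, 1]$.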


Remark: No assumption on the underlying distribution

The CLT is remarkable because it requires no assumption about the shape of the distribution of the $X_i$, only that the variance is finite and positive (which also guarantees that the mean exists). Whether the $X_i$ are discrete, continuous, skewed, or multimodal, the distribution of the standardized average $\bar{X}_n$ approaches the standard normal. This universality helps explain the prevalence of the bell curve in real-world data: a quantity that is the sum of many small independent effects, none of which dominates, tends to be approximately normal.