The Central Limit Theorem

The Central Limit Theorem (CLT) is arguably the most important result in probability theory, explaining why the normal distribution appears so frequently in nature and providing the theoretical foundation for statistical inference.

Statement

Definition

A sequence of random variables $X_1, X_2, \ldots$ converges in distribution to a random variable $X$, written $X_n \xrightarrow{d} X$, if $\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$ for every $x$ at which $F_X$ is continuous, where $F_{X_n}$ and $F_X$ are the cumulative distribution functions of $X_n$ and $X$.

Theorem 7.3: Central Limit Theorem (CLT)

Let $X_1, X_2, \ldots$ be independent and identically distributed (i.i.d.) random variables with mean $\mu = E[X_i]$ and finite variance $\sigma^2 = \operatorname{Var}(X_i) > 0$. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then the standardized sample mean converges in distribution to the standard normal: $\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} = \frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty.$ Equivalently, $P\left(\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \leq z\right) \to \Phi(z)$ for all $z$, where $\Phi$ is the standard normal CDF.
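The convergence in the theorem can be checked numerically. The following is a minimal Monte Carlo sketch, not a proof: it assumes $X_i \sim \text{Exponential}(1)$ (so $\mu = \sigma = 1$), and the sample size `n=200`, the repetition count, the seed, and the helper names `standardized_means` and `empirical_cdf` are illustrative choices that do not come from the text. It compares the empirical CDF of the standardized sample mean with $\Phi(z)$ at a few values of $z$.

```python
import math
import random
from statistics import NormalDist


def standardized_means(n, reps, rng):
    """Draw `reps` standardized sample means of n i.i.d. Exponential(1) variables."""
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)
    out = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append((xbar - mu) / (sigma / math.sqrt(n)))
    return out


def empirical_cdf(sample, z):
    """Fraction of the sample that is <= z."""
    return sum(1 for s in sample if s <= z) / len(sample)


rng = random.Random(0)  # fixed seed so the run is reproducible
sample = standardized_means(n=200, reps=5000, rng=rng)
phi = NormalDist().cdf  # standard normal CDF

for z in (-1.0, 0.0, 1.0, 1.96):
    print(f"z={z:+.2f}  empirical={empirical_cdf(sample, z):.3f}  Phi(z)={phi(z):.3f}")
```

The two columns should agree up to Monte Carlo noise plus a small finite-$n$ bias caused by the skewness of the exponential distribution.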

Examples

Example: CLT for dice rolls

Let $X_i$ be the outcome of rolling a fair die, so that $\mu = 3.5$ and $\sigma^2 = 35/12$. For $n = 100$ rolls, $\bar{X}_{100}$ is approximately normal with mean $3.5$ and standard deviation $\sigma/\sqrt{100} \approx 0.1708$. The probability that the average exceeds $3.7$ is approximately $P(\bar{X}_{100} > 3.7) \approx 1 - \Phi\left(\frac{3.7 - 3.5}{0.1708}\right) \approx 1 - \Phi(1.17) \approx 0.121$.
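The arithmetic in this example can be reproduced and compared with the exact probability. The sketch below recomputes the normal approximation, then builds the exact distribution of the sum of 100 dice by repeated convolution. The convolution step and the variable names are additions for illustration and are not part of the example itself.

```python
import math
from statistics import NormalDist

faces = range(1, 7)
mu = sum(faces) / 6                          # 3.5
var = sum((k - mu) ** 2 for k in faces) / 6  # 35/12
n = 100
se = math.sqrt(var / n)                      # standard deviation of the sample mean

# Normal approximation to P(mean > 3.7), as in the example
approx = 1 - NormalDist().cdf((3.7 - mu) / se)

# Exact distribution of the sum of n dice, built by repeated convolution
dist = {0: 1.0}
for _ in range(n):
    new = {}
    for total, prob in dist.items():
        for k in faces:
            new[total + k] = new.get(total + k, 0.0) + prob / 6
    dist = new

# mean > 3.7 is the same event as sum > 370
exact = sum(prob for total, prob in dist.items() if total > 370)

print(f"normal approximation: {approx:.4f}")
print(f"exact probability:    {exact:.4f}")
```

Because the sum takes only integer values, the plain normal approximation is expected to be slightly off. Replacing $3.7$ with $3.705$ (a continuity correction) usually brings it closer to the exact value.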

Example: CLT for Bernoulli trials

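Let $X_1, \ldots, X_n$ be i.i.d. $\text{Bernoulli}(p)$ with $0 < p < 1$, so that $\mu = p$ and $\sigma^2 = p(1-p)$. The sample mean $\hat{p} = \bar{X}_n$ is the sample proportion, and the CLT gives $\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \xrightarrow{d} N(0, 1)$. Since $p$ is unknown in practice, the standard error is estimated by substituting $\hat{p}$ for $p$, which is justified for large $n$ because $\hat{p}$ is a consistent estimator of $p$. This gives the approximate $95\%$ confidence interval (the Wald interval) $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$.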
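As a rough illustration of this interval, the sketch below computes $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$ and estimates by simulation how often it covers the true $p$. The function names `wald_interval` and `coverage`, the counts `56` out of `100`, and the values `p=0.3`, `n=200` are hypothetical choices made for this example only.

```python
import math
import random
from statistics import NormalDist


def wald_interval(successes, n, level=0.95):
    """Approximate confidence interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(p, n, reps, rng):
    """Fraction of simulated intervals that contain the true p."""
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        lo, hi = wald_interval(successes, n)
        hits += lo <= p <= hi
    return hits / reps


rng = random.Random(1)  # fixed seed so the run is reproducible
interval = wald_interval(56, 100)
cov = coverage(p=0.3, n=200, reps=2000, rng=rng)

print("interval for 56 successes in 100 trials:", interval)
print("estimated coverage at p=0.3, n=200:", cov)
```

The estimated coverage should be close to, though not exactly, $0.95$. The approximation tends to degrade when $n$ is small or $p$ is near $0$ or $1$.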

Remark: No assumption on the underlying distribution

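The CLT is remarkable because it requires no assumption about the shape of the distribution of the $X_i$: it is enough that they are i.i.d. with finite mean and finite, nonzero variance. Whether the $X_i$ are discrete, continuous, skewed, or multimodal, the distribution of the standardized average $\bar{X}_n$ approaches the standard normal, although the rate of convergence depends on the underlying distribution. This universality helps explain the prevalence of the bell curve in real-world data: a quantity that is the sum of many small, independent effects with finite variance, none of which dominates, tends to be approximately normal.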
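The remark can be illustrated by simulating differently shaped distributions. The sketch below measures the largest gap between the empirical CDF of the standardized mean and $\Phi$ for $n = 1$ and $n = 50$. The three source distributions, the helper `max_cdf_gap`, and the sample sizes are assumptions chosen for illustration, and the reported gaps include Monte Carlo noise.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(2)  # fixed seed so the run is reproducible
phi = NormalDist().cdf  # standard normal CDF

# name -> (sampler, mean, standard deviation) for three differently shaped sources
sources = {
    "discrete (fair die)": (lambda: rng.randint(1, 6), 3.5, math.sqrt(35 / 12)),
    "skewed (Exponential(1))": (lambda: rng.expovariate(1.0), 1.0, 1.0),
    # equal mixture of N(-2, 0.5^2) and N(2, 0.5^2): mean 0, variance 4 + 0.25
    "bimodal (normal mixture)": (
        lambda: rng.gauss(rng.choice((-2.0, 2.0)), 0.5), 0.0, math.sqrt(4.25)),
}


def max_cdf_gap(draw, mu, sigma, n, reps=4000):
    """Largest gap between the empirical CDF of standardized means and Phi."""
    scale = sigma / math.sqrt(n)
    zs = sorted((sum(draw() for _ in range(n)) / n - mu) / scale
                for _ in range(reps))
    gap = 0.0
    for i, z in enumerate(zs):
        target = phi(z)
        # The empirical CDF steps from i/reps to (i+1)/reps at z; check both sides.
        gap = max(gap, abs((i + 1) / reps - target), abs(i / reps - target))
    return gap


gaps = {}
for name, (draw, mu, sigma) in sources.items():
    gaps[name] = (max_cdf_gap(draw, mu, sigma, n=1),
                  max_cdf_gap(draw, mu, sigma, n=50))
    print(f"{name:26s} gap at n=1: {gaps[name][0]:.3f}   gap at n=50: {gaps[name][1]:.3f}")
```

For each source the gap at $n = 50$ should be noticeably smaller than at $n = 1$, which is consistent with the standardized mean approaching $N(0, 1)$ regardless of the starting shape.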