
Proof of the Cramér-Rao Lower Bound

We prove the Cramér-Rao inequality, establishing the fundamental limit on the precision of unbiased estimators.


Proof

Theorem (Cramér-Rao): Let $X_1, \dots, X_n$ be i.i.d. with density $f(x; \theta)$. If $\hat{\theta}$ is an unbiased estimator of $\theta$ and the regularity conditions hold, then $\operatorname{Var}(\hat{\theta}) \geq 1/(nI(\theta))$.

Step 1: The score function.

Define the score function

$$S_n(\theta) = \frac{\partial}{\partial \theta} \log L(\theta) = \sum_{i=1}^n \frac{\partial}{\partial \theta} \log f(X_i; \theta),$$

where $L(\theta) = \prod_{i=1}^n f(X_i; \theta)$ is the likelihood.

The score has mean zero:

$$E[S_n(\theta)] = E\left[\frac{\partial}{\partial \theta} \log L(\theta)\right] = \int \frac{\partial}{\partial \theta} f(\mathbf{x}; \theta) \cdot \frac{1}{f(\mathbf{x};\theta)} \cdot f(\mathbf{x};\theta)\,d\mathbf{x} = \frac{\partial}{\partial \theta} \int f(\mathbf{x};\theta)\,d\mathbf{x} = \frac{\partial}{\partial \theta} 1 = 0,$$

where $f(\mathbf{x}; \theta)$ denotes the joint density and we used the regularity condition to interchange differentiation and integration.

The variance of the score is the Fisher information:

$$\operatorname{Var}(S_n(\theta)) = E[S_n(\theta)^2] = nI(\theta),$$

since the $n$ per-observation scores are i.i.d. with mean zero, so their variances add.
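As a quick numerical sanity check of these two identities (a simulation sketch, not part of the proof; the Bernoulli model and the values of $p$ and the number of draws are illustrative assumptions), for a single Bernoulli($p$) observation the score is $x/p - (1-x)/(1-p)$ and $I(p) = 1/(p(1-p))$:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3          # true parameter (illustrative)
reps = 200_000   # Monte Carlo draws of a single observation

x = rng.binomial(1, p, size=reps)
# Score of one Bernoulli(p) observation: d/dp log f(x; p) = x/p - (1 - x)/(1 - p)
score = x / p - (1 - x) / (1 - p)

print(np.mean(score))                     # ≈ 0: the score has mean zero
print(np.var(score), 1 / (p * (1 - p)))  # both ≈ 4.76: Var(score) = I(p)
```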

Step 2: Covariance calculation.

Since $\hat{\theta}$ is unbiased, $E[\hat{\theta}] = \theta$ for all $\theta$. Differentiating both sides with respect to $\theta$:

$$\frac{\partial}{\partial \theta} E[\hat{\theta}] = 1.$$

$$\frac{\partial}{\partial \theta} \int \hat{\theta}(\mathbf{x})\, f(\mathbf{x}; \theta)\,d\mathbf{x} = \int \hat{\theta}(\mathbf{x})\, \frac{\partial f}{\partial \theta}(\mathbf{x}; \theta)\,d\mathbf{x}$$

$$= \int \hat{\theta}(\mathbf{x})\, \frac{\partial \log f}{\partial \theta}(\mathbf{x}; \theta)\, f(\mathbf{x}; \theta)\,d\mathbf{x} = E[\hat{\theta} \cdot S_n(\theta)].$$

Since $E[S_n] = 0$:

$$\operatorname{Cov}(\hat{\theta}, S_n) = E[\hat{\theta} \cdot S_n] - E[\hat{\theta}] \cdot E[S_n] = 1 - \theta \cdot 0 = 1.$$
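The identity $\operatorname{Cov}(\hat{\theta}, S_n) = 1$ can also be checked by simulation (a sketch under an assumed $N(\theta, 1)$ model, where the score is $S_n(\theta) = \sum_i (X_i - \theta)$; all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 50, 100_000

# reps independent datasets, each of n i.i.d. N(theta, 1) draws.
x = rng.normal(theta, 1.0, size=(reps, n))
theta_hat = x.mean(axis=1)        # unbiased estimator: the sample mean
score = (x - theta).sum(axis=1)   # S_n(theta) = sum_i (x_i - theta) for N(theta, 1)

# Cov(theta_hat, S_n) should be close to 1, regardless of n.
print(np.cov(theta_hat, score)[0, 1])
```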

Step 3: Apply the Cauchy-Schwarz inequality.

By the Cauchy-Schwarz inequality:

$$[\operatorname{Cov}(\hat{\theta}, S_n)]^2 \leq \operatorname{Var}(\hat{\theta}) \cdot \operatorname{Var}(S_n)$$

$$1 = 1^2 \leq \operatorname{Var}(\hat{\theta}) \cdot nI(\theta).$$

Therefore:

$$\operatorname{Var}(\hat{\theta}) \geq \frac{1}{nI(\theta)}.$$

Step 4: Equality condition.

Equality in Cauchy-Schwarz holds if and only if $\hat{\theta} - \theta = c \cdot S_n(\theta)$ for some constant $c$ (possibly depending on $\theta$, but not on $\mathbf{x}$). This occurs precisely when the estimator is a linear function of the score, which is characteristic of exponential family distributions. $\square$
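Both the bound and the equality condition can be illustrated numerically (a sketch under an assumed $N(\theta, 1)$ model, where $I(\theta) = 1$ and the bound is $1/n$): the sample mean satisfies $\hat{\theta} = \theta + S_n/n$, so it is linear in the score and attains the bound, while the sample median does not, its variance being approximately $\pi/(2n) > 1/n$:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 0.0, 51, 100_000   # odd n so the median is a single order statistic

x = rng.normal(theta, 1.0, size=(reps, n))
var_mean = x.mean(axis=1).var()          # ≈ 1/n: the mean attains the bound
var_median = np.median(x, axis=1).var()  # ≈ pi/(2n) > 1/n: the median does not

print(var_mean, var_median, 1.0 / n)
```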


Remark: Extension to biased estimators

For a biased estimator $\hat{\theta}$ with $E[\hat{\theta}] = g(\theta)$, the Cramér-Rao bound becomes $\operatorname{Var}(\hat{\theta}) \geq [g'(\theta)]^2 / (nI(\theta))$. The proof is identical except that $\operatorname{Cov}(\hat{\theta}, S_n) = g'(\theta)$ instead of $1$.
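A brief simulation sketch of this biased version (illustrative assumptions throughout: $N(\theta, 1)$ data and the biased estimator $\bar{X}^2$ of $\theta^2$, for which $E[\bar{X}^2] = g(\theta) = \theta^2 + 1/n$, so $g'(\theta) = 2\theta$ and, with $I(\theta) = 1$, the bound is $4\theta^2/n$):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 2.0, 50, 200_000

x = rng.normal(theta, 1.0, size=(reps, n))
est = x.mean(axis=1) ** 2          # biased: E[est] = g(theta) = theta^2 + 1/n
score = (x - theta).sum(axis=1)    # S_n(theta) for N(theta, 1)

bound = (2 * theta) ** 2 / n       # [g'(theta)]^2 / (n I(theta))
print(np.cov(est, score)[0, 1])    # ≈ g'(theta) = 4
# Exact variance is 4*theta^2/n + 2/n^2, just above the bound:
print(est.var(), bound)
```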