
The Gauss-Markov Theorem

The Gauss-Markov theorem establishes that ordinary least squares produces the best estimators among all linear unbiased estimators, justifying the central role of OLS in statistical practice.


The Theorem

Theorem 10.4 (Gauss-Markov Theorem)

In the linear model $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$ with $E[\boldsymbol{\epsilon}] = \mathbf{0}$ and $\operatorname{Cov}(\boldsymbol{\epsilon}) = \sigma^2 \mathbf{I}$ (no normality assumed), the OLS estimator $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$ is the Best Linear Unbiased Estimator (BLUE) of $\boldsymbol{\beta}$. Specifically, for any linear combination $\mathbf{c}^T\boldsymbol{\beta}$ and any other linear unbiased estimator $\tilde{\theta} = \mathbf{a}^T\mathbf{Y}$ with $E[\tilde{\theta}] = \mathbf{c}^T\boldsymbol{\beta}$ for all $\boldsymbol{\beta}$:

$$\operatorname{Var}(\mathbf{c}^T\hat{\boldsymbol{\beta}}) \leq \operatorname{Var}(\tilde{\theta}).$$

Best means minimum variance, Linear means the estimator is a linear function of $\mathbf{Y}$, and Unbiased means $E[\hat{\boldsymbol{\beta}}] = \boldsymbol{\beta}$.
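As a numerical sanity check on the theorem's setup, the sketch below (assuming NumPy, a small hypothetical design matrix with an intercept and one standard-normal predictor, and $\sigma = 1$) computes the OLS estimator and compares the theoretical variance $\sigma^2\,\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c}$ of a linear combination $\mathbf{c}^T\hat{\boldsymbol{\beta}}$ with a Monte Carlo estimate over repeated noise draws:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Hypothetical design: intercept column plus one standard-normal predictor
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])   # illustrative true coefficients
sigma = 1.0
c = np.array([0.0, 1.0])      # target linear combination: the slope

# Theoretical variance of c^T beta_hat under Cov(eps) = sigma^2 I
XtX_inv = np.linalg.inv(X.T @ X)
var_theory = sigma**2 * c @ XtX_inv @ c

# Monte Carlo: re-draw the noise many times with X held fixed
reps = 20000
ests = np.empty(reps)
for r in range(reps):
    Y = X @ beta + sigma * rng.normal(size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)  # OLS via normal equations
    ests[r] = c @ beta_hat

print(var_theory, ests.var(), ests.mean())
```

The empirical mean of the estimates should land near the true slope (unbiasedness) and the empirical variance near `var_theory`.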


Implications

Example (Optimality of the sample mean)

For the intercept-only model $Y_i = \mu + \epsilon_i$, the design matrix is $\mathbf{X} = \mathbf{1}_n$ and $\hat{\mu}_{OLS} = \bar{Y}$. The Gauss-Markov theorem says $\bar{Y}$ has the smallest variance among all linear unbiased estimators of $\mu$: no other linear combination $\sum a_i Y_i$ with $\sum a_i = 1$ can have smaller variance.
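This can be seen directly from the variance formulas: any weights summing to one give an unbiased estimator, but $\operatorname{Var}(\sum a_i Y_i) = \sigma^2 \sum a_i^2 \geq \sigma^2/n$ by Cauchy-Schwarz, with equality only at equal weights. A minimal sketch (assuming NumPy and an illustrative $n$ and $\sigma$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 10, 2.0

# Arbitrary positive weights, normalized to sum to 1 (still unbiased for mu)
w = rng.random(n)
w /= w.sum()

# Var(Ybar) = sigma^2 / n; Var(sum w_i Y_i) = sigma^2 * sum w_i^2
var_mean = sigma**2 / n
var_weighted = sigma**2 * np.sum(w**2)

# Cauchy-Schwarz gives sum w_i^2 >= 1/n, with equality iff every w_i = 1/n,
# so the sample mean attains the minimum variance in this class
print(var_mean, var_weighted)
```

Unequal weights always lose: `var_weighted` exceeds `var_mean` unless the weights are exactly $1/n$ each.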


Extensions

Theorem 10.5 (Generalized Gauss-Markov)

If $\operatorname{Cov}(\boldsymbol{\epsilon}) = \sigma^2 \mathbf{V}$ for a known positive definite matrix $\mathbf{V} \neq \mathbf{I}$ (heteroscedasticity or correlation), the BLUE is the generalized least squares (GLS) estimator:

$$\hat{\boldsymbol{\beta}}_{GLS} = (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{V}^{-1}\mathbf{Y}.$$

This reduces to OLS when $\mathbf{V} = \mathbf{I}$.
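Because both estimators are linear, their sampling covariances can be computed exactly without simulation: $\operatorname{Cov}(\hat{\boldsymbol{\beta}}_{OLS}) = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{V}\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}$ and $\operatorname{Cov}(\hat{\boldsymbol{\beta}}_{GLS}) = \sigma^2 (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}$. A sketch (assuming NumPy and a hypothetical heteroscedastic variance pattern $v_i = e^{2x_i}$) comparing the two:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
v = np.exp(2 * X[:, 1])        # hypothetical known heteroscedastic variances
V = np.diag(v)
Vinv = np.diag(1.0 / v)
sigma2 = 1.0

# Exact sampling covariances of the two linear estimators
XtX_inv = np.linalg.inv(X.T @ X)
cov_ols = sigma2 * XtX_inv @ X.T @ V @ X @ XtX_inv
cov_gls = sigma2 * np.linalg.inv(X.T @ Vinv @ X)

# Sanity check: with V = I the two covariances coincide (GLS reduces to OLS)
assert np.allclose(sigma2 * XtX_inv,
                   sigma2 * XtX_inv @ X.T @ np.eye(n) @ X @ XtX_inv)

print(cov_ols[1, 1], cov_gls[1, 1])  # slope variances: GLS <= OLS
```

The generalized theorem guarantees the GLS variances are no larger than the OLS ones, coordinate by coordinate, under the stated $\mathbf{V}$.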

Remark (Beyond BLUE)

The Gauss-Markov theorem restricts attention to linear unbiased estimators. Biased estimators, such as ridge regression $\hat{\boldsymbol{\beta}}_\lambda = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{Y}$, can achieve lower MSE by trading a small bias for a large variance reduction, especially when predictors are highly correlated (multicollinearity). This observation motivates the regularization methods that dominate modern statistics and machine learning.
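The bias-variance trade can be made concrete by simulation. The sketch below (assuming NumPy, two nearly collinear hypothetical predictors, and an untuned illustrative penalty $\lambda = 1$) compares the Monte Carlo MSE $E\|\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\|^2$ of OLS and ridge:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 2
# Nearly collinear predictors: both are small perturbations of the same z
z = rng.normal(size=n)
X = np.column_stack([z + 0.05 * rng.normal(size=n),
                     z + 0.05 * rng.normal(size=n)])
beta = np.array([1.0, 1.0])    # illustrative true coefficients
lam = 1.0                      # illustrative penalty, not tuned
I = np.eye(p)

reps = 5000
err_ols = np.empty(reps)
err_ridge = np.empty(reps)
for r in range(reps):
    Y = X @ beta + rng.normal(size=n)
    b_ols = np.linalg.solve(X.T @ X, X.T @ Y)
    b_ridge = np.linalg.solve(X.T @ X + lam * I, X.T @ Y)
    err_ols[r] = np.sum((b_ols - beta) ** 2)
    err_ridge[r] = np.sum((b_ridge - beta) ** 2)

print(err_ols.mean(), err_ridge.mean())
```

Under this collinear design the ridge MSE comes out well below the OLS MSE: the small bias ridge introduces is dwarfed by the variance it removes along the ill-conditioned direction of $\mathbf{X}^T\mathbf{X}$.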