
Multiple Linear Regression

Multiple linear regression extends simple regression to model the response as a linear function of several predictors, using matrix algebra for a compact and general formulation.


The Matrix Formulation

Definition

The multiple linear regression model is $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$ is the $n \times 1$ response vector, $\mathbf{X}$ is the $n \times p$ design matrix (with rows $(1, x_{i1}, \ldots, x_{i,p-1})$), $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_{p-1})^T$ is the $p \times 1$ parameter vector, and $\boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2 \mathbf{I}_n)$.
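As a concrete sketch of the setup above, the following builds a design matrix for hypothetical data with two predictors (the variable names x1, x2 are illustrative, not from the notes): the first column of ones carries the intercept $\beta_0$, giving $p = 3$ columns in total.

```python
import numpy as np

# Hypothetical data: n = 5 observations of two predictors
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Design matrix X: a column of ones for the intercept beta_0,
# then one column per predictor -> shape (n, p) with p = 3
X = np.column_stack([np.ones_like(x1), x1, x2])
```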

Definition

The OLS estimator minimizes $\|\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}\|^2$ and is given by $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$, provided $\mathbf{X}^T\mathbf{X}$ is invertible (i.e., $\mathbf{X}$ has full column rank $p$). The fitted values are $\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{H}\mathbf{Y}$, where $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ is the hat matrix.
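A minimal numerical check of these formulas on simulated data (the true coefficient values and noise level are arbitrary choices for illustration). Solving the normal equations $\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{Y}$ with a linear solver is numerically preferable to forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # full column rank
beta = np.array([1.0, 2.0, -0.5])                           # arbitrary true values
Y = X @ beta + rng.normal(scale=0.1, size=n)

# OLS estimate: solve (X^T X) beta_hat = X^T Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Hat matrix H = X (X^T X)^{-1} X^T and fitted values Y_hat = H Y
H = X @ np.linalg.solve(X.T @ X, X.T)
Y_hat = H @ Y
```

The hat matrix is the orthogonal projector onto the column space of $\mathbf{X}$: it is symmetric, idempotent ($\mathbf{H}\mathbf{H} = \mathbf{H}$), and has trace equal to $p$.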


Properties

Example: Distribution of the OLS estimator

Under the normal model $\boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$:

$\hat{\boldsymbol{\beta}} \sim N(\boldsymbol{\beta}, \sigma^2(\mathbf{X}^T\mathbf{X})^{-1})$

$\frac{(n-p)\hat{\sigma}^2}{\sigma^2} = \frac{\mathbf{e}^T\mathbf{e}}{\sigma^2} \sim \chi^2_{n-p}$

where $\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}}$ is the residual vector and $\hat{\sigma}^2 = \mathbf{e}^T\mathbf{e}/(n-p)$.

Moreover, β^\hat{\boldsymbol{\beta}} and σ^2\hat{\sigma}^2 are independent.
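These quantities can be computed directly (simulated data again, with an arbitrary noise level $\sigma = 2$). The key algebraic fact behind the independence claim is that the residuals satisfy $\mathbf{X}^T\mathbf{e} = \mathbf{0}$, i.e., $\mathbf{e}$ lies in the orthogonal complement of the column space of $\mathbf{X}$, while $\hat{\boldsymbol{\beta}}$ is a function of the projection onto that column space:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(scale=2.0, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat              # residual vector e = Y - Y_hat
sigma2_hat = (e @ e) / (n - p)    # unbiased estimator of sigma^2

# Residuals are orthogonal to every column of X: X^T e = 0
orthogonality = X.T @ e
```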


Inference

Remark: Testing individual coefficients

To test $H_0: \beta_j = 0$ (whether predictor $j$ contributes after accounting for all other predictors), use $t_j = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)} = \frac{\hat{\beta}_j}{\hat{\sigma}\sqrt{[(\mathbf{X}^T\mathbf{X})^{-1}]_{jj}}} \sim t_{n-p}$ under $H_0$. The overall F-test of $H_0: \beta_1 = \cdots = \beta_{p-1} = 0$ uses $F = \frac{SSR/(p-1)}{SSE/(n-p)} \sim F_{p-1, n-p}$.
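Both test statistics follow directly from the quantities already defined; a sketch on simulated data (true coefficients $(1, 3, 0)$ chosen so that predictor 1 clearly matters, which the t-statistic should reflect):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 3.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
e = Y - X @ beta_hat
sigma2_hat = (e @ e) / (n - p)

# t-statistic for each coefficient: beta_hat_j / SE(beta_hat_j),
# with SE(beta_hat_j) = sigma_hat * sqrt([(X^T X)^{-1}]_jj)
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))
t_stats = beta_hat / se

# Overall F-test: F = (SSR / (p-1)) / (SSE / (n-p))
sse = e @ e
ssr = np.sum((X @ beta_hat - Y.mean()) ** 2)
F = (ssr / (p - 1)) / (sse / (n - p))
```

Each $t_j$ would be compared against $t_{n-p}$ quantiles and $F$ against $F_{p-1,\,n-p}$ quantiles (e.g., via `scipy.stats`) to obtain p-values.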