Analysis of Variance (ANOVA) and the F-Test

ANOVA decomposes the total variability in the data into components attributable to different sources, enabling simultaneous comparison of multiple group means through a single F-test.

The ANOVA Decomposition

Theorem 10.6 (ANOVA Decomposition)

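In the linear model with an intercept and $p$ parameters, fitted by least squares, the total sum of squares decomposes as: $\underbrace{\sum_{i=1}^n (y_i - \bar{y})^2}_{SST} = \underbrace{\sum_{i=1}^n (\hat{y}_i - \bar{y})^2}_{SSR} + \underbrace{\sum_{i=1}^n (y_i - \hat{y}_i)^2}_{SSE}$, that is, $\text{Total variation} = \text{Explained variation} + \text{Unexplained variation}$, with degrees of freedom $n - 1 = (p - 1) + (n - p)$. The intercept matters: it forces the residuals to sum to zero, which, together with their orthogonality to the fitted values, makes the cross term $\sum_{i=1}^n (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)$ vanish.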
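A quick numerical check of the decomposition can make it concrete. The sketch below is a minimal illustration, assuming a straight-line fit with an intercept ($p = 2$). It uses only the Python standard library, and the helper names `fit_line` and `anova_decomposition` and the toy data are hypothetical, not taken from the text.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for the model y ~ b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def anova_decomposition(y, y_hat):
    """Return (SST, SSR, SSE) for observations y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)            # explained
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained
    return sst, ssr, sse


# Hypothetical toy data (not from the text).
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
b0, b1 = fit_line(x, y)
y_hat = [b0 + b1 * xi for xi in x]
sst, ssr, sse = anova_decomposition(y, y_hat)
print(sst, ssr, sse)
```

The fitted line is $\hat{y} = 0.5 + 0.8x$, and the sums come out to $SST = 5$, $SSR = 3.2$, $SSE = 1.8$ (up to floating-point rounding). These values satisfy $SST = SSR + SSE$ with $3 = 1 + 2$ degrees of freedom.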

The F-Test

Theorem 10.7 (F-Test for Overall Significance)

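In the normal linear model with $p$ parameters $\beta_0, \beta_1, \dots, \beta_{p-1}$ (an intercept and $p - 1$ predictors), the test of $H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$ (none of the predictors has a linear effect on the response) uses $F = \frac{SSR/(p-1)}{SSE/(n-p)} = \frac{MSR}{MSE} \sim F_{p-1, n-p} \quad \text{under } H_0$. Reject $H_0$ at level $\alpha$ if $F > F_{p-1, n-p, \alpha}$, the upper-$\alpha$ quantile of the $F_{p-1, n-p}$ distribution.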
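The statistic in Theorem 10.7 can be computed directly from the observations and fitted values. The sketch below is a minimal version, assuming the fitted values come from a least-squares fit that includes an intercept, with $p$ counting the intercept. The function name `overall_f_test` and the data are hypothetical. If SciPy is available, `scipy.stats.f.sf(f_stat, df1, df2)` gives the p-value and `scipy.stats.f.ppf(1 - alpha, df1, df2)` the critical value. The sketch itself avoids that dependency.

```python
def overall_f_test(y, y_hat, p):
    """F statistic for H0: all slope coefficients are zero.

    p is the number of parameters including the intercept, so the
    degrees of freedom are (p - 1, n - p).
    """
    n = len(y)
    y_bar = sum(y) / n
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    msr = ssr / (p - 1)   # mean square for regression
    mse = sse / (n - p)   # mean squared error
    return msr / mse, (p - 1, n - p)


# Hypothetical data: y regressed on x = 1, 2, 3, 4 with a straight line
# (p = 2); the least-squares fitted values are 1.3, 2.1, 2.9, 3.7.
y = [1.0, 3.0, 2.0, 4.0]
y_hat = [1.3, 2.1, 2.9, 3.7]
f_stat, (df1, df2) = overall_f_test(y, y_hat, p=2)
print(f_stat, df1, df2)
```

Here $F = 3.2 / 0.9 \approx 3.56$ on $(1, 2)$ degrees of freedom. That is well below $F_{1,2,0.05} \approx 18.51$, so $H_0$ is not rejected at the 5% level for this tiny sample.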

Example (One-way ANOVA)

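For $k$ groups with $n_j$ observations in group $j$ and $N = \sum_{j=1}^k n_j$ observations in total, the model is $Y_{ij} = \mu_j + \epsilon_{ij}$ with independent $\epsilon_{ij} \sim N(0, \sigma^2)$. Write $\bar{Y}_j$ for the mean of group $j$ and $\bar{Y}$ for the grand mean. The test of $H_0: \mu_1 = \cdots = \mu_k$ then uses $F = \frac{\sum_j n_j(\bar{Y}_j - \bar{Y})^2 / (k-1)}{\sum_{i,j}(Y_{ij} - \bar{Y}_j)^2 / (N-k)} = \frac{MS_{\text{between}}}{MS_{\text{within}}} \sim F_{k-1, N-k}$ under $H_0$. For $k = 3$ groups with $n_1 = n_2 = n_3 = 10$, we have $N = 30$ and $F \sim F_{2, 27}$. At $\alpha = 0.05$ the critical value is $F_{2,27,0.05} \approx 3.35$, so $H_0$ is rejected if $F > 3.35$.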
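The between/within computation can be written out in a few lines. The sketch below is a minimal standard-library version of the one-way ANOVA $F$ statistic above. The function name `one_way_anova` and the small groups are hypothetical and chosen so the arithmetic can be checked by hand. If SciPy is installed, `scipy.stats.f_oneway(*groups)` computes the same statistic along with a p-value.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)                   # N
    grand_mean = sum(sum(g) for g in groups) / n_total      # Y-bar
    group_means = [sum(g) / len(g) for g in groups]         # Y-bar_j
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k


# Hypothetical groups (not from the text).
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
f_stat, df_b, df_w = one_way_anova(groups)
print(f_stat, df_b, df_w)  # -> 27.0 2 6
```

Here $MS_{\text{between}} = 54/2 = 27$ and $MS_{\text{within}} = 6/6 = 1$. With three groups of 10 observations each, the same function would return degrees of freedom $(2, 27)$, and the result would be compared with $3.35$.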

Partial F-Tests

Theorem 10.8 (Partial F-Test for Nested Models)

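To test whether a subset of predictors is significant in the normal linear model, compare the full model (with $p$ parameters) and a nested reduced model (with $q < p$ parameters, obtained by setting $p - q$ of the full model's coefficients to zero): $F = \frac{(SSE_{\text{reduced}} - SSE_{\text{full}}) / (p - q)}{SSE_{\text{full}} / (n - p)} \sim F_{p-q, n-p}$ under $H_0$ (the $p - q$ additional predictors have zero coefficients). Because the full model contains the reduced one, $SSE_{\text{reduced}} \geq SSE_{\text{full}}$, so $F \geq 0$, and large values favor the full model.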
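Once both nested models have been fitted, the partial $F$ statistic needs only their residual sums of squares. The sketch below is a minimal helper under that assumption. The name `partial_f_test` and the SSE values are hypothetical placeholders, not results from the text.

```python
def partial_f_test(sse_reduced, sse_full, n, p, q):
    """Partial F statistic comparing a reduced model (q params) nested in a full model (p params)."""
    if not q < p:
        raise ValueError("the reduced model must have fewer parameters (q < p)")
    numerator = (sse_reduced - sse_full) / (p - q)   # extra SS per added parameter
    denominator = sse_full / (n - p)                 # MSE of the full model
    return numerator / denominator, (p - q, n - p)


# Hypothetical residual sums of squares (not from the text):
# reduced model with q = 3 parameters, full model with p = 5, n = 30.
f_stat, dfs = partial_f_test(sse_reduced=20.0, sse_full=12.0, n=30, p=5, q=3)
print(round(f_stat, 4), dfs)  # -> 8.3333 (2, 25)
```

Taking the reduced model to be the intercept-only model ($q = 1$), whose SSE equals SST, recovers the overall $F$-test of Theorem 10.7.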

Remark (Connection to individual $t$-tests)

When testing a single coefficient, $H_0: \beta_j = 0$, the partial $F$-test with $p - q = 1$ gives $F = t_j^2$, where $t_j$ is the $t$-statistic for $\beta_j$. Because $F_{1, n-p, \alpha} = t_{n-p, \alpha/2}^2$, the partial $F$-test and the two-sided $t$-test reach the same decision for an individual coefficient. The $F$-test is more general, however: it can test several coefficients jointly, which a single $t$-test cannot.
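The identity $F = t_j^2$ can be checked numerically for the slope in a simple regression. The sketch below is a minimal standard-library example, assuming a full model with intercept and slope ($p = 2$) and a reduced intercept-only model ($q = 1$). The function name `slope_t_and_partial_f` and the data are hypothetical.

```python
import math


def slope_t_and_partial_f(x, y):
    """Return (t, F) for H0: slope = 0 in the simple regression y ~ b0 + b1 * x.

    Full model: intercept and slope (p = 2).
    Reduced model: intercept only (q = 1), whose SSE equals SST.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse_full = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sse_reduced = sum((yi - y_bar) ** 2 for yi in y)
    mse = sse_full / (n - 2)
    # t-statistic for the slope: estimate divided by its standard error.
    t = b1 / math.sqrt(mse / sxx)
    # Partial F with p - q = 1 numerator degree of freedom.
    f = ((sse_reduced - sse_full) / 1) / mse
    return t, f


# Hypothetical toy data (not from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.9, 4.2, 4.8, 6.1]
t, f = slope_t_and_partial_f(x, y)
print(t ** 2, f)  # the two numbers agree up to floating-point rounding
```

The agreement is exact algebraically: $SSE_{\text{reduced}} - SSE_{\text{full}} = \hat{\beta}_1^2 S_{xx}$, so both $F$ and $t^2$ equal $\hat{\beta}_1^2 S_{xx} / MSE$.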