Proof: Multivariable Chain Rule

The proof of the chain rule uses the definition of differentiability: approximating functions by linear maps. The key is showing that the composition of "good approximations" is itself a "good approximation."

Statement

Theorem 9.1 (Chain Rule)

If $g : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $\mathbf{a}$ and $f : \mathbb{R}^m \to \mathbb{R}^p$ is differentiable at $g(\mathbf{a})$ , then $f \circ g$ is differentiable at $\mathbf{a}$ with

$D(f \circ g)(\mathbf{a}) = Df(g(\mathbf{a})) \circ Dg(\mathbf{a}).$
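In coordinates, the formula says that the Jacobian matrix of $f \circ g$ at $\mathbf{a}$ is the matrix product of the Jacobian of $f$ at $g(\mathbf{a})$ with the Jacobian of $g$ at $\mathbf{a}$. The following is a minimal numerical sanity check, not part of the proof. The maps `f` and `g`, the point `a`, and the helper names are hypothetical choices made only for illustration, and the Jacobians `jac_f` and `jac_g` were computed by hand for these particular maps.

```python
import math

def g(x):
    # Hypothetical example map g : R^2 -> R^2.
    x1, x2 = x
    return [x1 * x2, x1 + x2 ** 2]

def f(y):
    # Hypothetical example map f : R^2 -> R^2.
    y1, y2 = y
    return [math.sin(y1) + y2, y1 * y2]

def jac_g(x):
    # Hand-computed Jacobian matrix of g at x.
    x1, x2 = x
    return [[x2, x1], [1.0, 2.0 * x2]]

def jac_f(y):
    # Hand-computed Jacobian matrix of f at y.
    y1, y2 = y
    return [[math.cos(y1), 1.0], [y2, y1]]

def matmul(A, B):
    # Plain matrix product; composing linear maps corresponds to this.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def numerical_jacobian(func, x, eps=1e-6):
    # Central-difference approximation; column j holds d(func)/d(x_j).
    m, n = len(func(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
    return J

a = [0.7, -1.3]
chain = matmul(jac_f(g(a)), jac_g(a))                # Df(g(a)) composed with Dg(a)
numeric = numerical_jacobian(lambda x: f(g(x)), a)   # direct estimate of D(f o g)(a)
max_diff = max(abs(chain[i][j] - numeric[i][j])
               for i in range(2) for j in range(2))
print(max_diff)  # expected to be tiny, limited by finite-difference error
```

Agreement at a single point is only evidence for this one example; the general statement is what the proof establishes.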

Proof

Let $\mathbf{b} = g(\mathbf{a})$ . By differentiability,

$g(\mathbf{a} + \mathbf{h}) = g(\mathbf{a}) + Dg(\mathbf{a})(\mathbf{h}) + \mathbf{r}(\mathbf{h}), \quad \text{where } \lim_{\mathbf{h} \to \mathbf{0}} \frac{\|\mathbf{r}(\mathbf{h})\|}{\|\mathbf{h}\|} = 0.$

$f(\mathbf{b} + \mathbf{k}) = f(\mathbf{b}) + Df(\mathbf{b})(\mathbf{k}) + \mathbf{s}(\mathbf{k}), \quad \text{where } \lim_{\mathbf{k} \to \mathbf{0}} \frac{\|\mathbf{s}(\mathbf{k})\|}{\|\mathbf{k}\|} = 0.$

Setting $\mathbf{k} = g(\mathbf{a} + \mathbf{h}) - g(\mathbf{a}) = Dg(\mathbf{a})(\mathbf{h}) + \mathbf{r}(\mathbf{h})$ ,

$(f \circ g)(\mathbf{a} + \mathbf{h}) = f(\mathbf{b} + \mathbf{k}) = f(\mathbf{b}) + Df(\mathbf{b})(Dg(\mathbf{a})(\mathbf{h}) + \mathbf{r}(\mathbf{h})) + \mathbf{s}(\mathbf{k}).$

Thus

$(f \circ g)(\mathbf{a} + \mathbf{h}) - (f \circ g)(\mathbf{a}) = [Df(\mathbf{b}) \circ Dg(\mathbf{a})](\mathbf{h}) + Df(\mathbf{b})(\mathbf{r}(\mathbf{h})) + \mathbf{s}(\mathbf{k}).$

The error term satisfies

$\frac{\|Df(\mathbf{b})(\mathbf{r}(\mathbf{h})) + \mathbf{s}(\mathbf{k})\|}{\|\mathbf{h}\|} \to 0 \quad \text{as } \mathbf{h} \to \mathbf{0},$

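which can be verified using the boundedness of the linear maps $Df(\mathbf{b})$ and $Dg(\mathbf{a})$ together with the limits for $\mathbf{r}$ and $\mathbf{s}$; continuity of $Dg$ is not required. Thus $f \circ g$ is differentiable at $\mathbf{a}$ with derivative $Df(\mathbf{b}) \circ Dg(\mathbf{a})$.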

■
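For readers who want the error estimate spelled out, one way to carry it out is sketched here. The symbols $M$, $N$, and $\varepsilon$ are notation introduced for this remark only: $M = \|Df(\mathbf{b})\|$ and $N = \|Dg(\mathbf{a})\|$ are operator norms, which are finite because linear maps between finite-dimensional spaces are bounded.

For the first term,

$\frac{\|Df(\mathbf{b})(\mathbf{r}(\mathbf{h}))\|}{\|\mathbf{h}\|} \le M \, \frac{\|\mathbf{r}(\mathbf{h})\|}{\|\mathbf{h}\|} \to 0 \quad \text{as } \mathbf{h} \to \mathbf{0}.$

For the second term, $\|\mathbf{r}(\mathbf{h})\| \le \|\mathbf{h}\|$ once $\mathbf{h}$ is small enough, so

$\|\mathbf{k}\| = \|Dg(\mathbf{a})(\mathbf{h}) + \mathbf{r}(\mathbf{h})\| \le (N + 1)\|\mathbf{h}\|,$

and in particular $\mathbf{k} \to \mathbf{0}$ as $\mathbf{h} \to \mathbf{0}$. Define $\varepsilon(\mathbf{k}) = \|\mathbf{s}(\mathbf{k})\| / \|\mathbf{k}\|$ for $\mathbf{k} \ne \mathbf{0}$ and $\varepsilon(\mathbf{0}) = 0$, so that $\|\mathbf{s}(\mathbf{k})\| = \varepsilon(\mathbf{k})\|\mathbf{k}\|$ also covers the case $\mathbf{k} = \mathbf{0}$, and $\varepsilon(\mathbf{k}) \to 0$ as $\mathbf{k} \to \mathbf{0}$. Then

$\frac{\|\mathbf{s}(\mathbf{k})\|}{\|\mathbf{h}\|} = \varepsilon(\mathbf{k}) \, \frac{\|\mathbf{k}\|}{\|\mathbf{h}\|} \le (N + 1)\,\varepsilon(\mathbf{k}) \to 0 \quad \text{as } \mathbf{h} \to \mathbf{0}.$

The triangle inequality combines the two bounds and gives the stated limit for the full error term.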

Summary

The chain rule proof:

Uses the definition of differentiability (linear approximation plus an error term).
Composes the linear approximations of $f$ and $g$ to obtain the derivative.
Shows that the combined error term vanishes faster than $\|\mathbf{h}\|$.
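The last point can be observed numerically. The sketch below is an illustration rather than a proof: it evaluates the ratio of the error term to $\|\mathbf{h}\|$ along one direction for shrinking step sizes. The maps `f` and `g`, the point `a`, the direction `d`, and the helper `linear_part` are hypothetical choices, with the derivatives computed by hand for these particular maps.

```python
import math

def g(x):
    # Hypothetical example map g : R^2 -> R^2.
    x1, x2 = x
    return (x1 * x2, x1 + x2 ** 2)

def f(y):
    # Hypothetical example map f : R^2 -> R.
    y1, y2 = y
    return math.sin(y1) + y1 * y2

def linear_part(a, h):
    # Evaluates [Df(b) o Dg(a)](h) with b = g(a), using hand-computed derivatives.
    a1, a2 = a
    b1, b2 = g(a)
    # k = Dg(a)(h), where Dg(a) has rows (a2, a1) and (1, 2*a2).
    k1 = a2 * h[0] + a1 * h[1]
    k2 = h[0] + 2.0 * a2 * h[1]
    # Df(b) is the row vector (cos(b1) + b2, b1).
    return (math.cos(b1) + b2) * k1 + b1 * k2

a = (0.7, -1.3)
d = (0.6, 0.8)  # unit direction along which h shrinks
ratios = []
for t in (1e-1, 1e-2, 1e-3, 1e-4):
    h = (t * d[0], t * d[1])
    a_plus_h = (a[0] + h[0], a[1] + h[1])
    # Error term: change in f o g minus the linear prediction.
    error = f(g(a_plus_h)) - f(g(a)) - linear_part(a, h)
    ratios.append(abs(error) / math.hypot(h[0], h[1]))
print(ratios)  # the ratios should shrink as t shrinks
```

Because these example maps are smooth, the ratio shrinks roughly in proportion to $\|\mathbf{h}\|$; differentiability alone only guarantees that it tends to zero.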

See Chain Rule for applications.