ProofComplete

Proof: Multivariable Chain Rule

The proof of the chain rule uses the definition of differentiability: approximating functions by linear maps. The key is showing that the composition of "good approximations" is itself a "good approximation."


Statement

Theorem9.1Chain Rule

If g:Rnβ†’Rmg : \mathbb{R}^n \to \mathbb{R}^m is differentiable at a\mathbf{a} and f:Rmβ†’Rpf : \mathbb{R}^m \to \mathbb{R}^p is differentiable at g(a)g(\mathbf{a}), then f∘gf \circ g is differentiable at a\mathbf{a} with

D(f∘g)(a)=Df(g(a))∘Dg(a).D(f \circ g)(\mathbf{a}) = Df(g(\mathbf{a})) \circ Dg(\mathbf{a}).


Proof

Proof

Let b=g(a)\mathbf{b} = g(\mathbf{a}). By differentiability,

g(a+h)=g(a)+Dg(a)(h)+r(h),whereΒ lim⁑hβ†’0βˆ₯r(h)βˆ₯βˆ₯hβˆ₯=0.g(\mathbf{a} + \mathbf{h}) = g(\mathbf{a}) + Dg(\mathbf{a})(\mathbf{h}) + \mathbf{r}(\mathbf{h}), \quad \text{where } \lim_{\mathbf{h} \to \mathbf{0}} \frac{\|\mathbf{r}(\mathbf{h})\|}{\|\mathbf{h}\|} = 0.

f(b+k)=f(b)+Df(b)(k)+s(k),whereΒ lim⁑kβ†’0βˆ₯s(k)βˆ₯βˆ₯kβˆ₯=0.f(\mathbf{b} + \mathbf{k}) = f(\mathbf{b}) + Df(\mathbf{b})(\mathbf{k}) + \mathbf{s}(\mathbf{k}), \quad \text{where } \lim_{\mathbf{k} \to \mathbf{0}} \frac{\|\mathbf{s}(\mathbf{k})\|}{\|\mathbf{k}\|} = 0.

Setting k=g(a+h)βˆ’g(a)=Dg(a)(h)+r(h)\mathbf{k} = g(\mathbf{a} + \mathbf{h}) - g(\mathbf{a}) = Dg(\mathbf{a})(\mathbf{h}) + \mathbf{r}(\mathbf{h}),

(f∘g)(a+h)=f(b+k)=f(b)+Df(b)(Dg(a)(h)+r(h))+s(k).(f \circ g)(\mathbf{a} + \mathbf{h}) = f(\mathbf{b} + \mathbf{k}) = f(\mathbf{b}) + Df(\mathbf{b})(Dg(\mathbf{a})(\mathbf{h}) + \mathbf{r}(\mathbf{h})) + \mathbf{s}(\mathbf{k}).

Thus

(f∘g)(a+h)βˆ’(f∘g)(a)=[Df(b)∘Dg(a)](h)+Df(b)(r(h))+s(k).(f \circ g)(\mathbf{a} + \mathbf{h}) - (f \circ g)(\mathbf{a}) = [Df(\mathbf{b}) \circ Dg(\mathbf{a})](\mathbf{h}) + Df(\mathbf{b})(\mathbf{r}(\mathbf{h})) + \mathbf{s}(\mathbf{k}).

The error term satisfies

βˆ₯Df(b)(r(h))+s(k)βˆ₯βˆ₯hβˆ₯β†’0asΒ hβ†’0,\frac{\|Df(\mathbf{b})(\mathbf{r}(\mathbf{h})) + \mathbf{s}(\mathbf{k})\|}{\|\mathbf{h}\|} \to 0 \quad \text{as } \mathbf{h} \to \mathbf{0},

which can be verified using continuity of DgDg and the bounds on r\mathbf{r} and s\mathbf{s}. Thus f∘gf \circ g is differentiable with derivative Df(b)∘Dg(a)Df(\mathbf{b}) \circ Dg(\mathbf{a}).

β– 

Summary

The chain rule proof:

  • Uses definition of differentiability (linear approximation plus error).
  • Composition of linear approximations gives the derivative.
  • Error terms vanish at correct rate.

See Chain Rule for applications.