We prove the chain rule for compositions of differentiable multivariable functions, which is the foundation for all computations involving multivariable derivatives.
Theorem: Let g: ℝᵐ → ℝⁿ be differentiable at a and f: ℝⁿ → ℝᵖ be differentiable at b = g(a). Then f∘g is differentiable at a and
D(f∘g)(a) = Df(b) ∘ Dg(a).
Proof.
Step 1: Setup.
Since f is differentiable at b, we can write
f(b+k) = f(b) + Df(b)k + ‖k‖ ε₁(k),
where ε₁(k) → 0 as k → 0.
Since g is differentiable at a:
g(a+h) = g(a) + Dg(a)h + ‖h‖ ε₂(h),
where ε₂(h) → 0 as h → 0.
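As a numerical sanity check (illustrative only, not part of the proof): for any concrete differentiable map, the remainder ε₂(h) = (g(a+h) − g(a) − Dg(a)h)/‖h‖ should shrink as h → 0. Here g, a, and the direction of h are hypothetical choices made for the demonstration.

```python
import numpy as np

# Hypothetical example map g: R^2 -> R^2, g(x, y) = (x*y, sin x).
def g(v):
    x, y = v
    return np.array([x * y, np.sin(x)])

def Dg(v):
    x, y = v
    # Jacobian of g at v (rows = output components, cols = input variables).
    return np.array([[y, x],
                     [np.cos(x), 0.0]])

a = np.array([0.5, -1.0])

# Fix a random unit direction and shrink the step length t.
rng = np.random.default_rng(0)
direction = rng.standard_normal(2)
direction /= np.linalg.norm(direction)

# eps_2(h) = (g(a+h) - g(a) - Dg(a)h) / ||h|| should tend to 0 with h.
for t in [1e-1, 1e-2, 1e-3]:
    h = t * direction
    eps = (g(a + h) - g(a) - Dg(a) @ h) / np.linalg.norm(h)
    print(t, np.linalg.norm(eps))
```

The printed norms decrease roughly linearly in t, consistent with the remainder being o(‖h‖) (in fact O(‖h‖²) for this smooth g).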
Step 2: Compose.
Set k = g(a+h) − g(a) = Dg(a)h + ‖h‖ ε₂(h). Then
(f∘g)(a+h) = f(b+k) = f(b) + Df(b)k + ‖k‖ ε₁(k)
= f(b) + Df(b)Dg(a)h + ‖h‖ Df(b)ε₂(h) + ‖k‖ ε₁(k).
For the first error term: ‖ ‖h‖ Df(b)ε₂(h) ‖ / ‖h‖ = ‖Df(b)ε₂(h)‖ ≤ ‖Df(b)‖ ‖ε₂(h)‖ → 0.
For the second error term: we need ‖k‖/‖h‖ to remain bounded. Indeed,
‖k‖/‖h‖ = ‖Dg(a)h + ‖h‖ ε₂(h)‖ / ‖h‖ ≤ ‖Dg(a)‖ + ‖ε₂(h)‖,
which is bounded as h → 0 (say by C = ‖Dg(a)‖ + 1 for small h).
Also, k → 0 as h → 0 (since g is continuous at a), so ε₁(k) → 0; combined with the bound above, ‖k‖ ‖ε₁(k)‖ / ‖h‖ ≤ C ‖ε₁(k)‖ → 0.
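The two facts just used, that ‖k‖/‖h‖ stays below C = ‖Dg(a)‖ + 1 for small h and that k → 0 with h, can be observed numerically. The map g, the point a, and the direction of h below are illustrative choices, not part of the proof.

```python
import numpy as np

# Hypothetical example map g: R^2 -> R^2, g(x, y) = (x + y^2, e^x - 1).
def g(v):
    x, y = v
    return np.array([x + y * y, np.exp(x) - 1.0])

def Dg(v):
    x, y = v
    return np.array([[1.0, 2.0 * y],
                     [np.exp(x), 0.0]])

a = np.array([0.3, 0.7])
op_norm = np.linalg.norm(Dg(a), 2)   # spectral (operator) norm ||Dg(a)||

for t in [1e-1, 1e-2, 1e-3]:
    h = t * np.array([0.6, -0.8])    # fixed unit direction, scaled by t
    k = g(a + h) - g(a)
    ratio = np.linalg.norm(k) / np.linalg.norm(h)
    # (1) ||k||/||h|| stays below C = ||Dg(a)|| + 1 for small h;
    # (2) ||k|| -> 0 as h -> 0.
    print(t, ratio, np.linalg.norm(k))
```

The ratio hovers near ‖Dg(a)‖ while ‖k‖ shrinks proportionally to ‖h‖, exactly the boundedness and continuity used above.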
The entire error term divided by ‖h‖ therefore tends to 0, establishing
(f∘g)(a+h) = (f∘g)(a) + [Df(b) ∘ Dg(a)]h + o(‖h‖).
By definition of differentiability, f∘g is differentiable at a with derivative D(f∘g)(a) = Df(b) ∘ Dg(a). □
Remark: Matrix multiplication and the chain rule
In matrix form, the chain rule says J_{f∘g}(a) = J_f(g(a)) · J_g(a): the chain rule for derivatives translates to matrix multiplication of Jacobians, evaluated at the matching points. This is why the derivative of a composition is the product (not the sum) of derivatives.
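The matrix form of the theorem can be checked numerically: multiply the two analytic Jacobians and compare against a finite-difference Jacobian of the composite. The maps f and g below are hypothetical examples chosen so that the dimensions (m = 2, n = 3, p = 2) differ.

```python
import numpy as np

# Hypothetical maps: g: R^2 -> R^3 and f: R^3 -> R^2.
def g(v):
    x, y = v
    return np.array([x * y, np.sin(x), y ** 2])

def Jg(v):
    x, y = v
    return np.array([[y, x],
                     [np.cos(x), 0.0],
                     [0.0, 2.0 * y]])

def f(w):
    u, s, t = w
    return np.array([u + s * t, np.exp(u)])

def Jf(w):
    u, s, t = w
    return np.array([[1.0, t, s],
                     [np.exp(u), 0.0, 0.0]])

a = np.array([0.4, 1.2])
b = g(a)

# Chain rule: the Jacobian of f∘g at a is the matrix product Jf(b) @ Jg(a).
chain = Jf(b) @ Jg(a)

# Cross-check against a central finite-difference Jacobian of f∘g.
def numeric_jacobian(func, x, eps=1e-6):
    cols = []
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = eps
        cols.append((func(x + e) - func(x - e)) / (2 * eps))
    return np.stack(cols, axis=1)

numeric = numeric_jacobian(lambda v: f(g(v)), a)
print(np.max(np.abs(chain - numeric)))   # small: the two Jacobians agree
```

Note the shapes: Jf(b) is 2×3 and Jg(a) is 3×2, so the product is the 2×2 Jacobian of f∘g, matching ℝ² → ℝ².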