Multivariable Chain Rule

The Chain Rule for multivariable functions states that the derivative of a composition is the composition of derivatives (matrix multiplication of Jacobians). This fundamental result enables computing derivatives of complex functions and is essential for differential geometry, machine learning (backpropagation), and optimization.

Statement

Theorem9.1Chain Rule

Let $g : \mathbb{R}^n \to \mathbb{R}^m$ be differentiable at $\mathbf{a}$ and $f : \mathbb{R}^m \to \mathbb{R}^p$ be differentiable at $g(\mathbf{a})$ . Then $f \circ g : \mathbb{R}^n \to \mathbb{R}^p$ is differentiable at $\mathbf{a}$ , and

$D(f \circ g)(\mathbf{a}) = Df(g(\mathbf{a})) \circ Dg(\mathbf{a}).$

In terms of Jacobian matrices,

$J(f \circ g)(\mathbf{a}) = Jf(g(\mathbf{a})) \cdot Jg(\mathbf{a}).$

Applications

ExampleChain rule in polar coordinates

Let $f(x, y) = x^2 + y^2$ and $(x, y) = g(r, \theta) = (r\cos\theta, r\sin\theta)$ . Then

$\frac{\partial(f \circ g)}{\partial r} = \frac{\partial f}{\partial x} \frac{\partial x}{\partial r} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial r} = 2x \cos\theta + 2y \sin\theta = 2r.$

Similarly, $\frac{\partial(f \circ g)}{\partial \theta} = 0$ .

ExampleBackpropagation in neural networks

Neural networks compute $f = f_L \circ f_{L-1} \circ \cdots \circ f_1$ . The chain rule gives

$\frac{\partial f}{\partial \mathbf{x}} = \frac{\partial f_L}{\partial f_{L-1}} \cdot \frac{\partial f_{L-1}}{\partial f_{L-2}} \cdots \frac{\partial f_1}{\partial \mathbf{x}},$

the basis for backpropagation (gradient descent training).

Summary

The multivariable chain rule:

Derivative of composition = composition of derivatives (matrix multiplication).
Essential for computing derivatives in complex systems.
Applications: coordinate transformations, optimization, machine learning.

See Total Derivative and Inverse Function Theorem.