Total Derivative

The total derivative (or Fréchet derivative) generalizes the derivative to functions between normed spaces; this section treats maps $f : \mathbb{R}^n \to \mathbb{R}^m$. Such a map is differentiable at a point if, near that point, it is well approximated by its value there plus a linear map of the displacement. This is stronger than having partial derivatives, and it is the notion needed for the chain rule, the implicit function theorem, and optimization.

Definition

Definition 9.1 (Total derivative)

Let $U \subseteq \mathbb{R}^n$ be open, $f : U \to \mathbb{R}^m$, and $\mathbf{a} \in U$. Then $f$ is differentiable at $\mathbf{a}$ if there exists a linear map $Df(\mathbf{a}) : \mathbb{R}^n \to \mathbb{R}^m$ such that

$\lim_{\mathbf{h} \to \mathbf{0}} \frac{\|f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - Df(\mathbf{a})(\mathbf{h})\|}{\|\mathbf{h}\|} = 0.$

The linear map $Df(\mathbf{a})$ is the total derivative (or derivative) of $f$ at $\mathbf{a}$.

Remark (Best linear approximation)

The total derivative is the best linear approximation to $f$ near $\mathbf{a}$:

$f(\mathbf{a} + \mathbf{h}) = f(\mathbf{a}) + Df(\mathbf{a})(\mathbf{h}) + o(\|\mathbf{h}\|) \quad \text{as } \mathbf{h} \to \mathbf{0}.$

The error term is $o(\|\mathbf{h}\|)$, meaning it vanishes faster than $\|\mathbf{h}\|$: its norm divided by $\|\mathbf{h}\|$ tends to $0$.
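The limit in the definition can be checked numerically. The sketch below is illustrative only: the map $f(x, y) = (x^2 + y^2, xy)$, the base point, and the direction are arbitrary choices, and the helper names (`Df`, `remainder_ratio`) are hypothetical. It evaluates the ratio $\|f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - Df(\mathbf{a})(\mathbf{h})\| / \|\mathbf{h}\|$ for shrinking $\mathbf{h}$.

```python
import math

def f(x, y):
    # Illustrative map f(x, y) = (x^2 + y^2, x*y).
    return (x * x + y * y, x * y)

def Df(a, h):
    # Candidate derivative at a = (x, y), applied to h: the linear map
    # with matrix [[2x, 2y], [y, x]] built from the partials of f.
    x, y = a
    return (2 * x * h[0] + 2 * y * h[1], y * h[0] + x * h[1])

def norm(v):
    # Euclidean norm of a tuple.
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(a, h):
    # ||f(a + h) - f(a) - Df(a)(h)|| / ||h||
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    lin = Df(a, h)
    err = tuple(p - q - r for p, q, r in zip(fah, fa, lin))
    return norm(err) / norm(h)

a = (1.0, 2.0)
direction = (0.6, -0.8)          # a unit vector
sizes = (1e-1, 1e-2, 1e-3)       # values of ||h||
ratios = [remainder_ratio(a, (s * direction[0], s * direction[1]))
          for s in sizes]
for s, r in zip(sizes, ratios):
    print(f"||h|| = {s:g}   ratio = {r:.3e}")
```

For this particular $f$ the remainder is exactly $(h_1^2 + h_2^2, h_1 h_2)$, so the printed ratio shrinks in proportion to $\|\mathbf{h}\|$, consistent with the limit being $0$.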

Jacobian matrix

Definition 9.2 (Jacobian matrix)

For $f : \mathbb{R}^n \to \mathbb{R}^m$ differentiable at $\mathbf{a}$, the matrix of $Df(\mathbf{a})$ with respect to the standard bases is the Jacobian matrix:

$Jf(\mathbf{a}) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{pmatrix}_{\mathbf{a}}.$

Example (Jacobian of f(x, y) = (x² + y², xy))

For $f(x, y) = (x^2 + y^2, xy)$, the Jacobian is

$Jf(x, y) = \begin{pmatrix} 2x & 2y \\ y & x \end{pmatrix}.$
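A Jacobian computed by hand can be sanity-checked against finite differences. The following is a minimal sketch, assuming central differences with a step of `1e-6`; the helper `jacobian_fd` and the test point are hypothetical choices made for illustration, not part of the notes.

```python
def f(x, y):
    # The map from the example: f(x, y) = (x^2 + y^2, x*y).
    return (x * x + y * y, x * y)

def jacobian_exact(x, y):
    # Jacobian from the example: rows are the gradients of f_1 and f_2.
    return [[2 * x, 2 * y], [y, x]]

def jacobian_fd(func, point, eps=1e-6):
    # Central differences: column j approximates the partials w.r.t. x_j.
    n = len(point)
    m = len(func(*point))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(point)
        minus = list(point)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = func(*plus), func(*minus)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

point = (1.5, -0.5)
J_exact = jacobian_exact(*point)
J_num = jacobian_fd(f, point)
max_diff = max(abs(J_exact[i][j] - J_num[i][j])
               for i in range(2) for j in range(2))
print("exact:  ", J_exact)
print("numeric:", J_num)
print("max abs difference:", max_diff)
```

Agreement here only supports the formula at the sampled point; it does not prove differentiability.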

Differentiability implies continuity

Theorem 9.1 (Differentiable implies continuous)

If $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $\mathbf{a}$, then $f$ is continuous at $\mathbf{a}$.

Theorem 9.2 (Sufficient condition for differentiability)

If all partial derivatives $\frac{\partial f_i}{\partial x_j}$ exist and are continuous in a neighborhood of $\mathbf{a}$, then $f$ is differentiable at $\mathbf{a}$.

Remark (Continuous partials are sufficient but not necessary)

Continuous partial derivatives guarantee differentiability, but the converse fails: a function can be differentiable at a point where its partial derivatives are discontinuous.
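A standard one-variable illustration, which is not taken from these notes, is $f(x) = x^2 \sin(1/x)$ with $f(0) = 0$. Here the only partial derivative is the ordinary derivative. The sketch below suggests numerically that the difference quotient at $0$ tends to $0$, so $f'(0) = 0$, while $f'(x) = 2x \sin(1/x) - \cos(1/x)$ keeps oscillating between values near $-1$ and $+1$ as $x \to 0$. The sample points $x_k = 1/(k\pi)$ are chosen for illustration.

```python
import math

def f(x):
    # f(x) = x^2 sin(1/x) for x != 0, extended by f(0) = 0.
    return x * x * math.sin(1.0 / x) if x != 0.0 else 0.0

def fprime(x):
    # Derivative away from 0, by the product and chain rules.
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

# Difference quotient at 0: (f(h) - f(0)) / h = h sin(1/h), bounded by |h|.
steps = (1e-2, 1e-4, 1e-6)
quotients = [abs((f(h) - f(0.0)) / h) for h in steps]

# At x_k = 1/(k*pi): sin(1/x_k) = 0 and cos(1/x_k) = (-1)^k,
# so f'(x_k) is -1 for even k and +1 for odd k.
samples = [fprime(1.0 / (k * math.pi)) for k in (1000, 1001)]

for h, q in zip(steps, quotients):
    print(f"h = {h:g}   |difference quotient at 0| = {q:.3e}")
print("f' at 1/(1000 pi) and 1/(1001 pi):", samples)
```

Since $f'$ takes values near both $-1$ and $+1$ arbitrarily close to $0$, it has no limit there, yet $f$ is differentiable at $0$.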

Chain rule

Theorem 9.3 (Chain rule)

If $g : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $\mathbf{a}$ and $f : \mathbb{R}^m \to \mathbb{R}^p$ is differentiable at $g(\mathbf{a})$, then $f \circ g$ is differentiable at $\mathbf{a}$, and

$D(f \circ g)(\mathbf{a}) = Df(g(\mathbf{a})) \circ Dg(\mathbf{a}).$

In terms of Jacobians,

$J(f \circ g)(\mathbf{a}) = Jf(g(\mathbf{a})) \cdot Jg(\mathbf{a}).$

Example (Chain rule application)

Let $g(t) = (t^2, t^3)$ and $f(x, y) = x + y^2$. Then $(f \circ g)(t) = t^2 + (t^3)^2 = t^2 + t^6$. By the chain rule,

$(f \circ g)'(t) = \nabla f(g(t)) \cdot g'(t) = (1, 2y)|_{(t^2, t^3)} \cdot (2t, 3t^2) = (1, 2t^3) \cdot (2t, 3t^2) = 2t + 6t^5.$

Verifying: differentiating $(f \circ g)(t) = t^2 + t^6$ directly gives $(f \circ g)'(t) = 2t + 6t^5$. ✓
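The same example can be checked in code by multiplying the Jacobians and comparing with the closed form and with a finite difference of the composition. This is a sketch only: the helpers `matmul`, `chain_rule_derivative`, and `central_difference`, the evaluation point `t = 1.5`, and the step size are all assumptions made for illustration.

```python
def g(t):
    # Inner map g(t) = (t^2, t^3).
    return (t ** 2, t ** 3)

def f(x, y):
    # Outer map f(x, y) = x + y^2.
    return x + y ** 2

def Jg(t):
    # 2x1 Jacobian of g (a column vector).
    return [[2 * t], [3 * t ** 2]]

def Jf(x, y):
    # 1x2 Jacobian of f (the gradient as a row).
    return [[1.0, 2 * y]]

def matmul(A, B):
    # Plain matrix product of nested lists.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def chain_rule_derivative(t):
    # J(f o g)(t) = Jf(g(t)) * Jg(t), a 1x1 matrix here.
    return matmul(Jf(*g(t)), Jg(t))[0][0]

def central_difference(t, eps=1e-6):
    # Finite-difference derivative of the composition f(g(t)).
    return (f(*g(t + eps)) - f(*g(t - eps))) / (2 * eps)

t = 1.5
via_chain = chain_rule_derivative(t)
via_formula = 2 * t + 6 * t ** 5
via_fd = central_difference(t)
print(via_chain, via_formula, via_fd)
```

The Jacobian product and the closed form $2t + 6t^5$ should agree up to rounding, and the finite difference should agree up to discretization error.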

Summary

The total derivative is the correct notion of differentiability in several variables:

Best linear approximation to $f$ near $\mathbf{a}$ .
Represented by the Jacobian matrix.
Differentiability $\Rightarrow$ continuity.
Chain rule: $D(f \circ g)(\mathbf{a}) = Df(g(\mathbf{a})) \circ Dg(\mathbf{a})$ (matrix multiplication of the Jacobians).

See Inverse Function Theorem for major applications.