Total Derivative

The total derivative (or Fréchet derivative) is the correct generalization of the derivative to functions between normed spaces. A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at a point if its increments near that point are well approximated by a linear map. This is stronger than having partial derivatives and is the right notion for the chain rule, the implicit function theorem, and optimization.


Definition

Definition 9.1 (Total derivative)

Let $f : U \subseteq \mathbb{R}^n \to \mathbb{R}^m$ and $\mathbf{a} \in U$. Then $f$ is differentiable at $\mathbf{a}$ if there exists a linear map $Df(\mathbf{a}) : \mathbb{R}^n \to \mathbb{R}^m$ such that

$$\lim_{\mathbf{h} \to \mathbf{0}} \frac{\|f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - Df(\mathbf{a})(\mathbf{h})\|}{\|\mathbf{h}\|} = 0.$$

The linear map $Df(\mathbf{a})$ is the total derivative (or derivative) of $f$ at $\mathbf{a}$.

Remark (Best linear approximation)

The total derivative is the best linear approximation to $f$ near $\mathbf{a}$:

$$f(\mathbf{a} + \mathbf{h}) = f(\mathbf{a}) + Df(\mathbf{a})(\mathbf{h}) + o(\|\mathbf{h}\|) \quad \text{as } \mathbf{h} \to \mathbf{0}.$$

The error term is $o(\|\mathbf{h}\|)$, meaning that it vanishes faster than $\|\mathbf{h}\|$: the error divided by $\|\mathbf{h}\|$ tends to $0$.
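
The defining limit can be checked numerically. The following is a minimal sketch, assuming the illustrative map $f(x, y) = (x^2 + y^2, xy)$, the base point $(1, 2)$, and a fixed direction for $\mathbf{h}$; the helper names `df` and `remainder_ratio` are hypothetical and chosen only for this example. It evaluates the ratio $\|f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - Df(\mathbf{a})(\mathbf{h})\| / \|\mathbf{h}\|$ for shrinking $\mathbf{h}$.

```python
import math

def f(x, y):
    """Illustrative map f(x, y) = (x^2 + y^2, x*y); an assumption for this sketch."""
    return (x * x + y * y, x * y)

def df(x, y, hx, hy):
    """Apply the candidate derivative Df(x, y) to the increment (hx, hy)."""
    return (2 * x * hx + 2 * y * hy, y * hx + x * hy)

def remainder_ratio(a, h):
    """Return ||f(a + h) - f(a) - Df(a)(h)|| / ||h|| in the Euclidean norm."""
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    lin = df(a[0], a[1], h[0], h[1])
    err = math.hypot(fah[0] - fa[0] - lin[0], fah[1] - fa[1] - lin[1])
    return err / math.hypot(h[0], h[1])

a = (1.0, 2.0)
direction = (0.6, -0.8)          # a unit vector; any fixed direction would do
scales = (1e-1, 1e-2, 1e-3)      # lengths of h, shrinking toward 0
ratios = [remainder_ratio(a, (s * direction[0], s * direction[1])) for s in scales]

for s, r in zip(scales, ratios):
    print(f"|h| = {s:.0e}   ratio = {r:.3e}")
```

For this quadratic map the remainder is exactly quadratic in $\mathbf{h}$, so the printed ratio should shrink roughly in proportion to $\|\mathbf{h}\|$. A single direction is only a sanity check: differentiability requires the ratio to tend to $0$ along every approach to $\mathbf{0}$.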


Jacobian matrix

Definition 9.2 (Jacobian matrix)

For $f : \mathbb{R}^n \to \mathbb{R}^m$, if $f$ is differentiable at $\mathbf{a}$, the matrix representation of $Df(\mathbf{a})$ is the Jacobian matrix:

$$Jf(\mathbf{a}) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{pmatrix}_{\mathbf{a}}.$$

Example (Jacobian of $f(x, y) = (x^2 + y^2, xy)$)

For $f(x, y) = (x^2 + y^2, xy)$, the Jacobian is

$$Jf(x, y) = \begin{pmatrix} 2x & 2y \\ y & x \end{pmatrix}.$$
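
A hand-computed Jacobian can be compared with a finite-difference approximation. The sketch below assumes the example map $f(x, y) = (x^2 + y^2, xy)$ and an arbitrarily chosen evaluation point; `numeric_jacobian` is a hypothetical helper written for this illustration, not a library routine.

```python
def f(x, y):
    """The example map f(x, y) = (x^2 + y^2, x*y)."""
    return (x * x + y * y, x * y)

def analytic_jacobian(x, y):
    """Jacobian computed by hand: rows are components, columns are variables."""
    return [[2 * x, 2 * y],
            [y, x]]

def numeric_jacobian(func, point, eps=1e-6):
    """Approximate the Jacobian of func at point by central differences."""
    n = len(point)
    m = len(func(*point))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(point)
        minus = list(point)
        plus[j] += eps
        minus[j] -= eps
        f_plus, f_minus = func(*plus), func(*minus)
        for i in range(m):
            # Column j holds the partial derivatives with respect to x_j.
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac

point = (1.5, -0.5)              # arbitrary evaluation point (an assumption)
J_exact = analytic_jacobian(*point)
J_approx = numeric_jacobian(f, point)
max_gap = max(abs(J_exact[i][j] - J_approx[i][j])
              for i in range(2) for j in range(2))

print("analytic:", J_exact)
print("numeric: ", J_approx)
print("largest entrywise gap:", max_gap)
```

Agreement of the two matrices only confirms the partial derivatives. By itself it does not establish differentiability, which is what makes the Jacobian the matrix of $Df(\mathbf{a})$.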


Differentiability implies continuity

Theorem 9.1 (Differentiable implies continuous)

If $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $\mathbf{a}$, then $f$ is continuous at $\mathbf{a}$.
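
A short proof sketch, written out as a worked estimate. The remainder symbol $r(\mathbf{h})$ and the operator norm $\|Df(\mathbf{a})\|_{\mathrm{op}}$ are notation introduced here for the sketch and are not used elsewhere in the text.

```latex
% Sketch: differentiability at a implies continuity at a.
% r(h) is the remainder; \|Df(a)\|_{op} is the operator norm (assumed notation).
\begin{aligned}
r(\mathbf{h}) &:= f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) - Df(\mathbf{a})(\mathbf{h}),
  \qquad \frac{\|r(\mathbf{h})\|}{\|\mathbf{h}\|} \to 0 \text{ as } \mathbf{h} \to \mathbf{0}, \\
\|f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a})\|
  &\le \|Df(\mathbf{a})(\mathbf{h})\| + \|r(\mathbf{h})\| \\
  &\le \|Df(\mathbf{a})\|_{\mathrm{op}} \, \|\mathbf{h}\| + \|r(\mathbf{h})\|
  \;\longrightarrow\; 0 \quad \text{as } \mathbf{h} \to \mathbf{0}.
\end{aligned}
```

Both terms on the right tend to $0$: the first because a linear map on $\mathbb{R}^n$ is bounded, the second because $\|r(\mathbf{h})\| = o(\|\mathbf{h}\|)$. Hence $f(\mathbf{a} + \mathbf{h}) \to f(\mathbf{a})$.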

Theorem 9.2 (Sufficient condition for differentiability)

If all partial derivatives $\frac{\partial f_i}{\partial x_j}$ exist and are continuous in a neighborhood of $\mathbf{a}$, then $f$ is differentiable at $\mathbf{a}$.

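Remark (Continuous partials are sufficient but not necessary)

Continuous partial derivatives guarantee differentiability, but the converse fails: a function can be differentiable at a point where its partial derivatives are discontinuous. A standard one-variable example is $f(x) = x^2 \sin(1/x)$ for $x \neq 0$ with $f(0) = 0$, which is differentiable everywhere although $f'$ is not continuous at $0$.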
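
The gap between differentiability and continuous derivatives can be seen numerically. The sketch below assumes the standard one-variable function $f(x) = x^2 \sin(1/x)$ for $x \neq 0$ with $f(0) = 0$, picked here as an illustration. It shows the difference quotients at $0$ shrinking to $0$, so $f'(0) = 0$, while $f'$ keeps taking values near $-1$ and $+1$ arbitrarily close to $0$.

```python
import math

def f(x):
    """f(x) = x^2 sin(1/x) for x != 0, extended by f(0) = 0."""
    return x * x * math.sin(1.0 / x) if x != 0.0 else 0.0

def f_prime(x):
    """Derivative away from the origin: 2x sin(1/x) - cos(1/x)."""
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

# Difference quotients at 0: |f(h) / h| = |h sin(1/h)| <= |h|, so they tend to 0.
steps = (1e-2, 1e-4, 1e-6)
quotients = [f(h) / h for h in steps]

# Sample f' along two sequences that both approach 0.
ks = (10, 100, 1000)
at_even = [f_prime(1.0 / (2 * k * math.pi)) for k in ks]        # cos(1/x) = 1
at_odd = [f_prime(1.0 / ((2 * k + 1) * math.pi)) for k in ks]   # cos(1/x) = -1

print("difference quotients at 0:", quotients)
print("f' at x = 1/(2k*pi):      ", at_even)
print("f' at x = 1/((2k+1)*pi):  ", at_odd)
```

Since $f'$ has no limit at $0$, it cannot be continuous there, even though $f$ is differentiable at $0$.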


Chain rule

Theorem 9.3 (Chain rule)

If $g : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $\mathbf{a}$ and $f : \mathbb{R}^m \to \mathbb{R}^p$ is differentiable at $g(\mathbf{a})$, then $f \circ g$ is differentiable at $\mathbf{a}$, and

$$D(f \circ g)(\mathbf{a}) = Df(g(\mathbf{a})) \circ Dg(\mathbf{a}).$$

In terms of Jacobians,

$$J(f \circ g)(\mathbf{a}) = Jf(g(\mathbf{a})) \cdot Jg(\mathbf{a}).$$

Example (Chain rule application)

Let $g(t) = (t^2, t^3)$ and $f(x, y) = x + y^2$. Then $(f \circ g)(t) = f(t^2, t^3) = t^2 + t^6$. By the chain rule,

$$(f \circ g)'(t) = \nabla f(g(t)) \cdot g'(t) = (1, 2y)\big|_{(t^2, t^3)} \cdot (2t, 3t^2) = (1, 2t^3) \cdot (2t, 3t^2) = 2t + 6t^5.$$

Verifying: differentiating $t^2 + t^6$ directly gives $(f \circ g)'(t) = 2t + 6t^5$, in agreement with the chain rule. ✓
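
The same computation can be phrased as a product of Jacobian matrices and checked numerically. This is a minimal sketch for $g(t) = (t^2, t^3)$ and $f(x, y) = x + y^2$; the helper `matmul`, the test point `t0`, and the step size are assumptions made for illustration.

```python
def g(t):
    """Inner map g(t) = (t^2, t^3)."""
    return (t ** 2, t ** 3)

def f(x, y):
    """Outer map f(x, y) = x + y^2."""
    return x + y ** 2

def jac_g(t):
    """Jacobian of g at t: a 2x1 matrix (the column vector g'(t))."""
    return [[2 * t], [3 * t ** 2]]

def jac_f(x, y):
    """Jacobian of f at (x, y): a 1x2 matrix (the gradient as a row)."""
    return [[1.0, 2 * y]]

def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def chain_rule_derivative(t):
    """Evaluate Jf(g(t)) * Jg(t), a 1x1 matrix, and return its single entry."""
    return matmul(jac_f(*g(t)), jac_g(t))[0][0]

def central_difference(t, eps=1e-6):
    """Finite-difference estimate of (f o g)'(t), independent of the chain rule."""
    return (f(*g(t + eps)) - f(*g(t - eps))) / (2 * eps)

t0 = 1.5                                   # arbitrary test point (an assumption)
via_chain = chain_rule_derivative(t0)
via_formula = 2 * t0 + 6 * t0 ** 5         # closed form from the worked example
via_difference = central_difference(t0)

print(via_chain, via_formula, via_difference)
```

The Jacobian of $f$ is evaluated at $g(t)$, not at $t$. The shapes also line up: a $1 \times 2$ matrix times a $2 \times 1$ matrix gives the $1 \times 1$ derivative of $f \circ g$.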


Summary

The total derivative is the correct notion of differentiability in several variables:

  • Best linear approximation to $f$ near $\mathbf{a}$.
  • Represented by the Jacobian matrix.
  • Differentiability $\Rightarrow$ continuity.
  • Chain rule: $D(f \circ g)(\mathbf{a}) = Df(g(\mathbf{a})) \circ Dg(\mathbf{a})$, which corresponds to multiplying the Jacobian matrices.

See Inverse Function Theorem for major applications.