ACE 328/Chapter 9

Multivariable Differentiation

Partial derivatives, the gradient, and the total derivative for functions of several variables. Clairaut's theorem on equality of mixed partials. Classes of differentiable functions (C^k).

Differential calculus in one variable is the study of the local linear approximation of a function. The derivative $f'(a)$ is a number only because linear maps $\mathbb{R} \to \mathbb{R}$ are determined by a single scalar; the geometrically correct object is the linear map $h \mapsto f'(a)h$. In several variables this viewpoint becomes mandatory. The derivative of a function $\vec{F}: \mathbb{R}^n \to \mathbb{R}^m$ at a point is a linear map $\mathbb{R}^n \to \mathbb{R}^m$, represented once bases are chosen by an $m \times n$ matrix of partial derivatives. This chapter develops the theory: directional and partial derivatives, the total (Fréchet) derivative, the Jacobian matrix, the delicate interplay between partial and total differentiability, the chain rule, higher-order partials, Clairaut's theorem on the equality of mixed partials, the classes $C^k$, and a mean-value inequality for vector-valued maps.

Remark.
A note on philosophy (Dieudonné). In his classic Foundations of Modern Analysis, Jean Dieudonné of the French Bourbaki school writes that his presentation of differential calculus "aims at keeping as close as possible to the fundamental idea of Calculus, namely the local approximation of functions by linear functions. In the classical teaching of Calculus, this idea is immediately obscured by the accidental fact that, on a one-dimensional vector space, there is a one-to-one correspondence between linear forms and numbers, and therefore the derivative at a point is defined as a number instead of a linear form. This slavish subservience to the shibboleth of numerical interpretation at any cost becomes much worse when dealing with functions of several variables." The viewpoint of this chapter, the derivative as a linear map, is the one Dieudonné recommends.

The One-Variable Derivative, Reconsidered

Before generalizing, it is worth reformulating the familiar definition of the one-variable derivative in a way that transports verbatim to several variables. Let $D \subseteq \mathbb{R}$ be an interval, let $f : D \to \mathbb{R}$, and let $a \in \operatorname{int}(D)$. The derivative $f'(a)$ exists (as a real number) iff $\lim_{x \to a} \frac{f(x) - f(a)}{x - a}$ exists, equivalently iff
$$\lim_{x \to a} \frac{f(x) - f(a) - f'(a)(x - a)}{x - a} = 0.$$
The numerator $f(x) - (f(a) + f'(a)(x - a))$ is the error we make when approximating $f(x)$ by its tangent line at $a$; the equation says this error tends to zero faster than $x - a$.

Definition: Little-oh Notation

Let $E, F$ be functions defined near $a$ with $F(x) \neq 0$ near $a$ (except possibly at $a$). We say $E(x) = o(F(x))$ as $x \to a$ if
$$\lim_{x \to a} \frac{|E(x)|}{|F(x)|} = 0.$$

Remark.
Intuition: $E = o(F)$ means $E$ becomes negligible compared to $F$ in the limit. "Little-oh of $h$" is the precise way to say "vanishes faster than linearly."

With this language, $f$ is differentiable at $a$ iff there exists a real number $\alpha$ (necessarily $\alpha = f'(a)$) such that
$$f(a + h) - f(a) - \alpha h = o(h) \quad \text{as } h \to 0.$$
Equivalently, $f$ is differentiable at $a$ iff there exists a linear map $L : \mathbb{R} \to \mathbb{R}$ such that $f(a + h) - f(a) - L(h) = o(h)$ as $h \to 0$; this $L$ is what we call the derivative of $f$ at $a$. The one-variable case hides the linear-map nature only because linear maps $\mathbb{R} \to \mathbb{R}$ are identified with single numbers via $L(y) = \alpha y$. In several variables, no such identification is possible, and the linear-map formulation is unavoidable.

Example: A tangent line that touches the graph infinitely often

Consider
$$f(x) = \begin{cases} x^2 \sin(1/x) & x \neq 0, \\ 0 & x = 0. \end{cases}$$
A direct computation shows $f'(0) = 0$: the difference quotient is $f(h)/h = h \sin(1/h)$, which is squeezed to $0$ by $|h|$. So the tangent line to the graph at the origin is $y = 0$ (the $x$-axis). Yet the graph of $f$ crosses the $x$-axis infinitely often in every neighbourhood of $0$ (wherever $\sin(1/x) = 0$, i.e. at $x = 1/(n\pi)$). The tangent line is not a line that "touches the graph once"; it is the best linear approximation to the increment, and nothing more.
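A quick numerical sanity check (a Python sketch, not part of the formal development): the difference quotients at $0$ are squeezed by $|h|$, while the graph keeps returning to the axis at $x = 1/(n\pi)$.

```python
import math

def f(x):
    # f(x) = x^2 sin(1/x) for x != 0, f(0) = 0
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

# Difference quotients (f(h) - f(0))/h = h*sin(1/h) are squeezed by |h|:
quotients = [abs(f(10.0 ** -k) / 10.0 ** -k) for k in range(1, 9)]

# The graph crosses the x-axis at x = 1/(n*pi) for every integer n >= 1:
crossings = [abs(f(1.0 / (n * math.pi))) < 1e-12 for n in (10, 100, 1000)]
print(quotients[-1], all(crossings))
```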

One-sided derivatives are sometimes useful: if $D = [\alpha, \beta]$,
$$f'_R(\alpha) = \lim_{h \to 0^+} \frac{f(\alpha + h) - f(\alpha)}{h}, \qquad f'_L(\beta) = \lim_{h \to 0^-} \frac{f(\beta + h) - f(\beta)}{h}$$
are the right derivative at the left endpoint and the left derivative at the right endpoint, respectively.


Directional and Partial Derivatives

Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$. The most naive way to extract one-variable information from $f$ is to restrict it to a line through a point $\vec{a} \in U$ and differentiate.

Definition: Directional Derivative

Let $U \subseteq \mathbb{R}^n$ be open, $f : U \to \mathbb{R}$, $\vec{a} \in U$, and $\vec{v} \in \mathbb{R}^n$ a direction vector. The directional derivative of $f$ at $\vec{a}$ in the direction $\vec{v}$ is
$$D_{\vec{v}} f(\vec{a}) = \lim_{h \to 0} \frac{f(\vec{a} + h\vec{v}) - f(\vec{a})}{h},$$
provided this limit exists. In particular, when $\vec{v} = \vec{e}_j = (0, \ldots, 0, 1, 0, \ldots, 0)$ is the $j$-th canonical basis vector, we call this the partial derivative of $f$ with respect to $x_j$ at $\vec{a}$ and denote it
$$\frac{\partial f}{\partial x_j}(\vec{a}) := \lim_{h \to 0} \frac{f(\vec{a} + h \vec{e}_j) - f(\vec{a})}{h}.$$
Other notations: $D_j f(\vec{a})$ and $f_{x_j}(\vec{a})$.

Remark.
Intuition: The partial derivative $\partial f / \partial x_j$ freezes every other variable $x_k$ at the value $a_k$ and differentiates $f$ as a one-variable function of $x_j$. It is the rate of change of $f$ when you move through $\vec{a}$ with unit velocity along the $j$-th coordinate axis. The directional derivative is the rate along any line.
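Both limits are easy to approximate numerically. The Python sketch below estimates partial and directional derivatives by symmetric difference quotients; the function $f$, the point, and the direction are illustrative choices of mine, not from the text.

```python
def f(x, y):
    # An illustrative smooth function (my choice, not from the text)
    return x * x * y + y ** 3

def directional_derivative(g, a, v, h=1e-6):
    """Symmetric difference quotient approximating D_v g(a)."""
    ax, ay = a
    vx, vy = v
    return (g(ax + h * vx, ay + h * vy) - g(ax - h * vx, ay - h * vy)) / (2 * h)

a, v = (1.0, 2.0), (3.0, 4.0)
# Partial derivatives are the directional derivatives along e_1 and e_2:
fx = directional_derivative(f, a, (1.0, 0.0))   # exact value: 2xy = 4
fy = directional_derivative(f, a, (0.0, 1.0))   # exact value: x^2 + 3y^2 = 13
# For this differentiable f, D_v f = grad f . v = 4*3 + 13*4 = 64:
dv = directional_derivative(f, a, v)
print(fx, fy, dv)
```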

Partial Derivatives Do Not Imply Continuity

A crucial warning: in several variables, the existence of every partial derivative at a point is not nearly enough to guarantee that the function is continuous there. Partial derivatives probe the function only along coordinate axes; the function can misbehave wildly on lines that are not axis-aligned.

Example: Partials exist at the origin but $f$ is not continuous there

Define $f : \mathbb{R}^2 \to \mathbb{R}$ by
$$f(x, y) = \begin{cases} 1 & \text{if } xy = 0, \\ 0 & \text{if } xy \neq 0. \end{cases}$$
Along either axis, $f$ is identically $1$, so
$$\frac{\partial f}{\partial x}(0, 0) = \lim_{h \to 0} \frac{f(h, 0) - f(0,0)}{h} = \lim_{h \to 0} \frac{1 - 1}{h} = 0, \qquad \frac{\partial f}{\partial y}(0, 0) = 0$$
exist. Yet $f$ is not continuous at $(0,0)$: the sequence $(1/n, 1/n) \to (0,0)$, but $f(1/n, 1/n) = 0 \not\to 1 = f(0,0)$.

Example: A nastier counterexample with partials everywhere

Define $f : \mathbb{R}^2 \to \mathbb{R}$ by
$$f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2} & (x, y) \neq (0,0), \\ 0 & (x, y) = (0,0). \end{cases}$$
Away from the origin, the quotient rule gives
$$\frac{\partial f}{\partial x}(x, y) = \frac{y^3 - y x^2}{(x^2 + y^2)^2}, \qquad \frac{\partial f}{\partial y}(x, y) = \frac{x^3 - x y^2}{(x^2 + y^2)^2}.$$
At the origin, $f(h, 0) = 0$ for all $h$, so $\partial f / \partial x(0,0) = 0$, and similarly for $y$. Thus the partial derivatives exist everywhere. Yet $f(t, t) = t^2 / (2 t^2) = 1/2$ for every $t \neq 0$, so $\lim_{t \to 0} f(t, t) = 1/2 \neq 0 = f(0,0)$, and $f$ is not continuous at the origin.
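A numerical restatement of the failure (a quick Python sketch): the axis difference quotients vanish identically, while the values along the diagonal never leave $1/2$.

```python
def f(x, y):
    # xy/(x^2 + y^2) away from the origin, 0 at the origin
    return x * y / (x * x + y * y) if (x, y) != (0.0, 0.0) else 0.0

# Both partials at the origin exist and vanish, since f is 0 on both axes:
h = 1e-8
fx0 = (f(h, 0.0) - f(0.0, 0.0)) / h
fy0 = (f(0.0, h) - f(0.0, 0.0)) / h

# Yet along the diagonal the function is constantly 1/2:
diagonal_values = [f(t, t) for t in (1e-1, 1e-4, 1e-8, 1e-12)]
print(fx0, fy0, diagonal_values)
```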

The lesson is that pointwise existence of partials is too weak a notion of differentiability. We need something that forces continuity — something that says the function is locally well-approximated by a linear map in every direction simultaneously, not just along axes.


The Total Derivative

The correct generalization of the one-variable derivative is the linear approximation viewpoint. In one variable, $f$ is differentiable at $a$ iff there is a scalar $\alpha$ (necessarily $f'(a)$) such that
$$f(a + h) - f(a) - \alpha h = o(h) \quad \text{as } h \to 0,$$
equivalently, there is a linear map $T : \mathbb{R} \to \mathbb{R}$ such that $f(a + h) - f(a) - T(h) = o(h)$. In higher dimensions we keep the same statement, replacing $\alpha$ by a matrix and absolute values by norms.

Definition: Total Derivative (Fréchet Derivative)

Let $\vec{F} : D \to \mathbb{R}^m$, where $D \subseteq \mathbb{R}^n$, and let $\vec{a} \in \operatorname{int}(D)$. We say that $\vec{F}$ is differentiable at $\vec{a}$ if there exists a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$ such that
$$\lim_{\vec{h} \to \vec{0}} \frac{\|\vec{F}(\vec{a} + \vec{h}) - \vec{F}(\vec{a}) - T\vec{h}\|}{\|\vec{h}\|} = 0.$$
When such a $T$ exists it is unique; we denote it $D\vec{F}(\vec{a})$ and call it the total derivative (or Fréchet derivative, or differential) of $\vec{F}$ at $\vec{a}$.

Remark.
Intuition: $D\vec{F}(\vec{a})$ is the unique linear map that agrees with the increment $\vec{F}(\vec{a} + \vec{h}) - \vec{F}(\vec{a})$ to higher than linear order. Informally, $\vec{F}(\vec{a} + \vec{h}) \approx \vec{F}(\vec{a}) + D\vec{F}(\vec{a})\vec{h}$, with an error that vanishes faster than $\|\vec{h}\|$. It is the best linear approximation to the increment. The definition is coordinate-free: it refers only to norms and linearity, not to specific axes.
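The defining limit can be watched converge. In this Python sketch the map $F(x,y) = (x^2 + y,\ xy)$ and the point are illustrative choices of mine; the Fréchet quotient with the hand-computed candidate linear map shrinks linearly with $\|\vec{h}\|$.

```python
import math

def F(x, y):
    # An illustrative differentiable map R^2 -> R^2 (my choice, not from the text)
    return (x * x + y, x * y)

def J(x, y):
    # Its Jacobian, computed by hand: rows are gradients of the components
    return ((2 * x, 1.0), (y, x))

a = (1.0, 2.0)
Fa, Ja = F(*a), J(*a)

def frechet_quotient(h1, h2):
    """||F(a+h) - F(a) - J_F(a) h|| / ||h||; should shrink with ||h||."""
    F1, F2 = F(a[0] + h1, a[1] + h2)
    r1 = F1 - Fa[0] - (Ja[0][0] * h1 + Ja[0][1] * h2)
    r2 = F2 - Fa[1] - (Ja[1][0] * h1 + Ja[1][1] * h2)
    return math.hypot(r1, r2) / math.hypot(h1, h2)

quotients = [frechet_quotient(t, -2 * t) for t in (1e-1, 1e-2, 1e-3, 1e-4)]
print(quotients)  # decreases roughly linearly in ||h||
```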

The definition has the right theoretical consequences, starting with the continuity that mere existence of partials failed to deliver.

Theorem: Differentiability implies continuity

Let $\vec{F} : D \to \mathbb{R}^m$, $D \subseteq \mathbb{R}^n$, $\vec{a} \in \operatorname{int}(D)$. If $\vec{F}$ is differentiable at $\vec{a}$, then $\vec{F}$ is continuous at $\vec{a}$.

The Jacobian Matrix

Every linear map $T : \mathbb{R}^n \to \mathbb{R}^m$ is represented, in the canonical bases, by a unique $m \times n$ matrix $M$ such that $T\vec{h} = M\vec{h}$. The next theorem identifies this matrix for $T = D\vec{F}(\vec{a})$: its entries are precisely the partial derivatives of the components of $\vec{F}$.

Theorem: The Total Derivative is the Jacobian

Let $\vec{F} : U \to \mathbb{R}^m$, $U \subseteq \mathbb{R}^n$, $\vec{a} \in \operatorname{int}(U)$, and write $\vec{F} = (F_1, \ldots, F_m)$. If $\vec{F}$ is differentiable at $\vec{a}$, then all partial derivatives $\partial F_i / \partial x_j(\vec{a})$ exist, and $D\vec{F}(\vec{a})$ is represented in the canonical bases by the $m \times n$ Jacobian matrix
$$J_{\vec{F}}(\vec{a}) = \begin{pmatrix} \dfrac{\partial F_1}{\partial x_1}(\vec{a}) & \cdots & \dfrac{\partial F_1}{\partial x_n}(\vec{a}) \\ \vdots & & \vdots \\ \dfrac{\partial F_m}{\partial x_1}(\vec{a}) & \cdots & \dfrac{\partial F_m}{\partial x_n}(\vec{a}) \end{pmatrix}.$$
In particular, differentiability implies the existence of every partial derivative of every component. The converse is not true in general.

Remark.
Intuition: The Jacobian is the matrix whose $j$-th column is the partial derivative $\partial \vec{F} / \partial x_j(\vec{a})$, viewed as a vector in $\mathbb{R}^m$. This column records the change of $\vec{F}$ produced by a unit push along the $j$-th axis. Multiplying by a general direction vector $\vec{h}$ builds up the total linear response by linearly combining the axis-responses, which is exactly what a linear map does.
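In code, this column-by-column picture says a finite-difference approximation of each $\partial \vec{F}/\partial x_j$ recovers the Jacobian. A Python sketch, using the polar-coordinates map as an illustrative choice of mine:

```python
import math

def F(r, theta):
    # The polar-coordinates map (an illustrative example, not from the text)
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian_fd(F, a, h=1e-6):
    """Finite-difference Jacobian: the j-th column approximates dF/dx_j at a."""
    n = len(a)
    m = len(F(*a))
    cols = []
    for j in range(n):
        ap, am = list(a), list(a)
        ap[j] += h
        am[j] -= h
        Fp, Fm = F(*ap), F(*am)
        cols.append([(Fp[i] - Fm[i]) / (2 * h) for i in range(m)])
    # Transpose the list of columns into a row-major m x n matrix
    return [[cols[j][i] for j in range(n)] for i in range(m)]

a = (2.0, math.pi / 6)
J_num = jacobian_fd(F, a)
# Hand-computed Jacobian: [[cos t, -r sin t], [sin t, r cos t]]
r, t = a
J_exact = [[math.cos(t), -r * math.sin(t)], [math.sin(t), r * math.cos(t)]]
print(J_num)
```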

The example $f(x, y) = xy/(x^2 + y^2)$ (extended to $0$ at the origin) from the previous section shows the converse is genuinely false: its partials exist at $(0,0)$, but $f$ is not even continuous there, so by the continuity theorem it cannot be differentiable.

Corollary: Component-wise Differentiability

Let $\vec{F} : U \to \mathbb{R}^m$, $\vec{F} = (F_1, \ldots, F_m)$, $\vec{a} \in \operatorname{int}(U)$. Then $D\vec{F}(\vec{a})$ exists if and only if $DF_j(\vec{a})$ exists for every $j \in \{1, \ldots, m\}$.

Remark.
Intuition: Differentiability of a vector-valued map is nothing more than simultaneous differentiability of its components. This is the vector-valued analogue of the fact that a sequence in $\mathbb{R}^m$ converges iff each of its component sequences does.

A Sufficient Condition: Continuous Partials

The example $f(x,y) = xy/(x^2 + y^2)$ shows that existence of the partials at a point is too weak for differentiability. The following theorem gives a practical sufficient condition: if the partials exist in a neighbourhood and are continuous at the point, then the function is differentiable there.

Theorem: Continuous Partials Imply Differentiability

Let $\vec{F} : U \to \mathbb{R}^m$, $U \subseteq \mathbb{R}^n$ open, $\vec{a} \in U$, and write $\vec{F} = (F_1, \ldots, F_m)$. Suppose there is $r > 0$ such that for every $j \in \{1, \ldots, m\}$ and every $i \in \{1, \ldots, n\}$, the partial derivative $\partial F_j / \partial x_i(\vec{x})$ exists for every $\vec{x} \in B_r(\vec{a})$, and the function $\vec{x} \mapsto \partial F_j / \partial x_i(\vec{x})$ is continuous at $\vec{a}$. Then $\vec{F}$ is differentiable at $\vec{a}$.

Remark.
Intuition: Partial derivatives alone know nothing about what happens off the coordinate axes. But if they vary continuously, they control the function in a full neighbourhood, and a mean-value telescoping argument along the edges of a small box pieces together an honest total derivative. "Continuous partials" is by far the most common way to check differentiability in practice.

The Gradient

Definition: Gradient

Let $U \subseteq \mathbb{R}^n$ be open and let $f : U \to \mathbb{R}$ have all partial derivatives on $U$. The gradient of $f$ is the function $\nabla f : U \to \mathbb{R}^n$ defined by
$$(\nabla f)(\vec{x}) = \left( \frac{\partial f}{\partial x_1}(\vec{x}), \ldots, \frac{\partial f}{\partial x_n}(\vec{x}) \right).$$
Equivalently, when $f$ is differentiable at $\vec{x}$, $Df(\vec{x})\vec{h} = \nabla f(\vec{x}) \cdot \vec{h}$.

Remark.
Intuition: The gradient is the vector field that records, at each point, the direction of steepest ascent of $f$, with magnitude equal to the rate of ascent in that direction. A function $U \to \mathbb{R}^n$ with $U \subseteq \mathbb{R}^n$ is called a vector field, so the gradient of a scalar-valued function is always a vector field.
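The steepest-ascent reading can be checked numerically. In this Python sketch (the function is an illustrative choice of mine), no sampled unit direction beats the gradient direction, and the best sampled rate approaches $\|\nabla f(\vec{a})\|$.

```python
import math

def f(x, y):
    # An illustrative smooth function (my choice, not from the text)
    return x * x + 3.0 * y * y

def grad_fd(x, y, h=1e-6):
    """Finite-difference gradient of f."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (fx, fy)

a = (1.0, 1.0)
g = grad_fd(*a)          # exact gradient is (2, 6)
gnorm = math.hypot(*g)   # exact value sqrt(40)

def dir_deriv(theta, h=1e-6):
    # Directional derivative along the unit vector at angle theta
    v = (math.cos(theta), math.sin(theta))
    return (f(a[0] + h * v[0], a[1] + h * v[1]) -
            f(a[0] - h * v[0], a[1] - h * v[1])) / (2 * h)

# Sample unit directions at 1-degree resolution: none beats ||grad f(a)||,
# and the best sampled rate comes close to it.
rates = [dir_deriv(2 * math.pi * k / 360) for k in range(360)]
print(max(rates), gnorm)
```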

The Chain Rule

The chain rule — the most important computational tool of differential calculus — takes an especially clean form when the derivative is viewed as a linear map: the derivative of a composition is the composition of the derivatives.

Theorem: Chain Rule

Let $\vec{G} : U \to \mathbb{R}^p$ with $U \subseteq \mathbb{R}^n$ open, and $\vec{F} : V \to \mathbb{R}^m$ with $V \subseteq \mathbb{R}^p$ open. Suppose $\vec{G}(U) \subseteq V$, let $\vec{a} \in U$, and set $\vec{b} = \vec{G}(\vec{a})$. If $\vec{G}$ is differentiable at $\vec{a}$ and $\vec{F}$ is differentiable at $\vec{b}$, then $\vec{F} \circ \vec{G} : U \to \mathbb{R}^m$ is differentiable at $\vec{a}$, and
$$D(\vec{F} \circ \vec{G})(\vec{a}) = D\vec{F}(\vec{b}) \circ D\vec{G}(\vec{a}).$$
In matrix form, the Jacobian of the composition is the product of the Jacobians: $J_{\vec{F} \circ \vec{G}}(\vec{a}) = J_{\vec{F}}(\vec{b}) \, J_{\vec{G}}(\vec{a})$.

Remark.
Intuition: To first order, $\vec{G}$ near $\vec{a}$ acts like its linear approximation $D\vec{G}(\vec{a})$, sending a small displacement $\vec{h}$ to $D\vec{G}(\vec{a}) \vec{h}$. Then $\vec{F}$ near $\vec{b}$ acts like $D\vec{F}(\vec{b})$. Composing linear approximations gives the linear approximation of the composition. The error terms are negligible because the operator norm tames them.
Corollary: Scalar Chain Rule

If $f : V \to \mathbb{R}$ is differentiable at $\vec{G}(\vec{a})$ and $\vec{G} : U \to \mathbb{R}^p$ is differentiable at $\vec{a}$, then
$$\frac{\partial (f \circ \vec{G})}{\partial x_j}(\vec{a}) = \sum_{k=1}^p \frac{\partial f}{\partial y_k}(\vec{G}(\vec{a})) \cdot \frac{\partial G_k}{\partial x_j}(\vec{a}).$$
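The matrix form is easy to test numerically. In the Python sketch below (the maps $\vec{G}$ and $f$ are illustrative choices of mine), the row vector $\nabla f(\vec{b})$ times the hand-computed $J_{\vec{G}}(\vec{a})$ matches finite-difference partials of the composite.

```python
def G(x, y):
    # Inner map R^2 -> R^2 (my illustrative choice, not from the text)
    return (x + y * y, x * y)

def f(u, v):
    # Outer scalar function R^2 -> R
    return u * v

a = (1.0, 2.0)
b = G(*a)

# Hand-computed Jacobians:
J_G = [[1.0, 2 * a[1]], [a[1], a[0]]]   # rows are gradients of G's components
grad_f_at_b = [b[1], b[0]]              # grad f(u, v) = (v, u)

# Chain rule: grad(f o G)(a) = grad f(b) * J_G(a)  (row vector times matrix)
chain = [sum(grad_f_at_b[k] * J_G[k][j] for k in range(2)) for j in range(2)]

# Direct finite-difference check on the composite H = f o G:
h = 1e-6
H = lambda x, y: f(*G(x, y))
direct = [(H(a[0] + h, a[1]) - H(a[0] - h, a[1])) / (2 * h),
          (H(a[0], a[1] + h) - H(a[0], a[1] - h)) / (2 * h)]
print(chain, direct)
```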


Higher-Order Partial Derivatives

Once we have partial derivatives, we can differentiate again. Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$. If $\partial f / \partial x_j$ exists on $U$, it is itself a function from $U$ to $\mathbb{R}$, and we can ask about its partial derivatives.

Definition: Second-Order Partial Derivative

Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$. Assume $\partial f / \partial x_j$ exists on a neighbourhood of $\vec{a} \in U$. The second-order partial derivative of $f$ with respect to $x_i$ and $x_j$ at $\vec{a}$ is
$$\frac{\partial^2 f}{\partial x_i \partial x_j}(\vec{a}) = \frac{\partial}{\partial x_i}\left( \frac{\partial f}{\partial x_j} \right)(\vec{a}) = \lim_{h \to 0} \frac{\dfrac{\partial f}{\partial x_j}(\vec{a} + h\vec{e}_i) - \dfrac{\partial f}{\partial x_j}(\vec{a})}{h},$$
when the limit exists. Alternative notations include $D_i D_j f(\vec{a})$ and $f_{x_j x_i}(\vec{a})$ (note the reversed order in subscript notation: $f_{x_j x_i}$ means "differentiate first with respect to $x_j$, then with respect to $x_i$").

Remark.
Intuition: A second-order partial measures the rate at which one partial changes as you move in a (possibly different) coordinate direction. The notation is a minor booby trap: $\partial^2 f / \partial x_i \partial x_j$ means "first $\partial/\partial x_j$, then $\partial/\partial x_i$"; the operator nearest $f$ is applied first.

Equality of Mixed Partials: Clairaut's Theorem

A natural question: does the order of differentiation matter? Is 2f/xy\partial^2 f / \partial x \partial y always equal to 2f/yx\partial^2 f / \partial y \partial x? In general, no — there is a classical counterexample. But under mild continuity hypotheses, yes: this is Clairaut's theorem (also called Schwarz's theorem).

Example: A function with unequal mixed partials

Define $f : \mathbb{R}^2 \to \mathbb{R}$ by
$$f(x, y) = \begin{cases} \dfrac{xy(x^2 - y^2)}{x^2 + y^2} & (x, y) \neq (0, 0), \\ 0 & (x, y) = (0, 0). \end{cases}$$
For $(x, y) \neq (0,0)$ a direct computation gives
$$\frac{\partial f}{\partial x}(x, y) = \frac{y(x^4 + 4x^2 y^2 - y^4)}{(x^2 + y^2)^2}, \qquad \frac{\partial f}{\partial y}(x, y) = \frac{x(x^4 - 4 x^2 y^2 - y^4)}{(x^2 + y^2)^2}.$$
At the origin, $f(x, 0) = 0$ for all $x$, so $\partial f / \partial x(0, 0) = 0$, and likewise $\partial f / \partial y(0, 0) = 0$. In particular, $\partial f / \partial x$ and $\partial f / \partial y$ are continuous on $\mathbb{R}^2$. Now compute along the axes:
$$\frac{\partial f}{\partial x}(0, y) = \frac{y \cdot (-y^4)}{y^4} = -y, \qquad \frac{\partial f}{\partial y}(x, 0) = \frac{x \cdot x^4}{x^4} = x.$$
Therefore
$$\frac{\partial^2 f}{\partial y \partial x}(0, 0) = \frac{d}{dy}\Big|_{y=0} (-y) = -1, \qquad \frac{\partial^2 f}{\partial x \partial y}(0, 0) = \frac{d}{dx}\Big|_{x=0} x = +1.$$
So the mixed partials are both defined at the origin, but $-1 \neq +1$: they are unequal. The obstruction is that $\partial^2 f / \partial x \partial y$ fails to be continuous at $(0,0)$.
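The two unequal values can be reproduced with nested difference quotients. A Python sketch (the step sizes are ad hoc choices of mine):

```python
def f(x, y):
    # xy(x^2 - y^2)/(x^2 + y^2) away from the origin, 0 at the origin
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def fx(x, y, h=1e-6):
    # difference quotient for the first partial in x
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def fy(x, y, h=1e-6):
    # difference quotient for the first partial in y
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

# Mixed partials at the origin, as difference quotients of the first partials:
k = 1e-3
fyx = (fx(0.0, k) - fx(0.0, -k)) / (2 * k)   # d/dy of f_x: should be near -1
fxy = (fy(k, 0.0) - fy(-k, 0.0)) / (2 * k)   # d/dx of f_y: should be near +1
print(fyx, fxy)
```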

The counterexample shows that continuity of the mixed partial is an essential hypothesis. The lemma and theorem below show that it is also sufficient.

Lemma: Two-Variable Equality of Mixed Partials

Let $f : D \to \mathbb{R}$, $D \subseteq \mathbb{R}^2$, $(a, b) \in \operatorname{int}(D)$. Suppose there is $\delta > 0$ such that $\partial f / \partial x$, $\partial f / \partial y$, and $\partial^2 f / \partial x \partial y$ all exist on $B_\delta(a, b)$, and that the map $(x, y) \mapsto \partial^2 f / \partial x \partial y(x, y)$ is continuous at $(a, b)$. Then $\partial^2 f / \partial y \partial x(a, b)$ exists and
$$\frac{\partial^2 f}{\partial y \partial x}(a, b) = \frac{\partial^2 f}{\partial x \partial y}(a, b).$$

Remark.
Intuition: Consider the "double increment"
$$\Delta(h, k) := f(a + h, b + k) - f(a + h, b) - f(a, b + k) + f(a, b).$$
This symmetric quantity can be rewritten two ways: first differencing in $x$ and then in $y$, or vice versa. Dividing by $hk$ and letting $h, k \to 0$ recovers either mixed partial. The Mean Value Theorem converts both differences into evaluations of $f_{xy}$ at nearby points, and continuity of $f_{xy}$ at $(a, b)$ forces both limits to the same value.
Theorem: Clairaut's Theorem (Equality of Mixed Partials)

Let $D \subseteq \mathbb{R}^n$, $f : D \to \mathbb{R}$, and $\vec{a} \in \operatorname{int}(D)$. Suppose there is $\delta > 0$ such that $\partial f / \partial x_i(\vec{x})$, $\partial f / \partial x_j(\vec{x})$, and $\partial^2 f / \partial x_i \partial x_j(\vec{x})$ exist for every $\vec{x} \in B_\delta(\vec{a})$, and the map $\vec{x} \mapsto \partial^2 f / \partial x_i \partial x_j(\vec{x})$ is continuous at $\vec{a}$. Then $\partial^2 f / \partial x_j \partial x_i(\vec{a})$ exists and
$$\frac{\partial^2 f}{\partial x_i \partial x_j}(\vec{a}) = \frac{\partial^2 f}{\partial x_j \partial x_i}(\vec{a}).$$

Remark.
Intuition: The $n$-variable theorem reduces to the $2$-variable lemma by freezing all coordinates except $x_i$ and $x_j$. Once only two variables wiggle, the two-variable proof applies verbatim.

The Classes C^k

Definition: C^k Functions

Let $U \subseteq \mathbb{R}^n$ be open. A function $\vec{F} : U \to \mathbb{R}^m$ is said to be of class $C^1$ on $U$ if all first-order partial derivatives of all component functions of $\vec{F}$ exist and are continuous on $U$. More generally, for $k \geq 1$, $\vec{F}$ is of class $C^k$ on $U$ if all partial derivatives of order $\leq k$ of all components of $\vec{F}$ exist and are continuous on $U$. The function is of class $C^\infty$ (or smooth) if it is of class $C^k$ for every $k \geq 1$.

Locally: $\vec{F}$ is of class $C^1$ at a point $\vec{a} \in \operatorname{int}(U)$ if there is an open ball around $\vec{a}$ on which every first-order partial of every component exists, and each such partial is continuous at $\vec{a}$.

By convention, $C^0(U)$ denotes the continuous functions $U \to \mathbb{R}^m$.

Remark.
Intuition: $C^k$ is a hierarchy of smoothness classes: $C^0 \supset C^1 \supset C^2 \supset \cdots \supset C^\infty$. The $C^1$ class is the natural setting for differential calculus, since continuous partials imply differentiability. The $C^2$ class is where Clairaut applies to every mixed pair simultaneously, so the Hessian matrix is symmetric.
Definition: Real-Analytic Functions

Let $U \subseteq \mathbb{R}^n$ be open. A function $f : U \to \mathbb{R}$ is real-analytic on $U$, written $f \in C^\omega(U)$, if for every $\vec{a} \in U$ there exists $r > 0$ such that for every $\vec{x}$ with $\|\vec{x} - \vec{a}\| < r$, the Taylor series of $f$ at $\vec{a}$ converges and equals $f(\vec{x})$:
$$f(\vec{x}) = \sum_{k = 0}^\infty \frac{1}{k!} \sum_{i_1, \ldots, i_k = 1}^n \frac{\partial^k f}{\partial x_{i_1} \cdots \partial x_{i_k}}(\vec{a}) (x_{i_1} - a_{i_1}) \cdots (x_{i_k} - a_{i_k}).$$

Remark.
Intuition: Analytic functions are "locally polynomials of infinite degree." They form a strictly smaller class than $C^\infty$: the function $f(x) = e^{-1/x^2}$ for $x \neq 0$, $f(0) = 0$, is $C^\infty$ on $\mathbb{R}$ with all derivatives vanishing at the origin, so its Taylor series at $0$ is identically zero; yet $f \not\equiv 0$. Hence $f \in C^\infty \setminus C^\omega$. The hierarchy is
$$C^0 \supsetneq C^1 \supsetneq C^2 \supsetneq \cdots \supsetneq C^\infty \supsetneq C^\omega.$$
Polynomials, exponentials, sines, and cosines are all analytic; so are compositions, sums, products, and (where the denominator is nonzero) quotients of analytic functions.
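The flatness of $e^{-1/x^2}$ at the origin is striking even numerically (a quick Python sketch): near $0$ its values are smaller than any power of $x$, which is why every Taylor coefficient at $0$ vanishes.

```python
import math

def f(x):
    # exp(-1/x^2) for x != 0, and 0 at the origin: C-infinity but not analytic
    return math.exp(-1.0 / (x * x)) if x != 0 else 0.0

# f vanishes at 0 faster than every power of x: f(x)/x^k -> 0 for all k.
x = 0.1
ratios = [f(x) / x ** k for k in (1, 5, 10, 20)]
print(f(x), ratios)  # f(0.1) = e^{-100}, about 3.7e-44; every ratio is tiny
```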
Corollary: C^2 Implies Symmetric Mixed Partials

If $f : U \to \mathbb{R}$ is of class $C^2$ on an open set $U \subseteq \mathbb{R}^n$, then for all $i, j \in \{1, \ldots, n\}$ and all $\vec{x} \in U$,
$$\frac{\partial^2 f}{\partial x_i \partial x_j}(\vec{x}) = \frac{\partial^2 f}{\partial x_j \partial x_i}(\vec{x}).$$
In particular, for $C^2$ functions the mixed partials do not depend on the order of differentiation.

Corollary: C^1 Implies Differentiable

If $\vec{F} : U \to \mathbb{R}^m$ is of class $C^1$ on an open set $U$, then $\vec{F}$ is differentiable at every $\vec{x} \in U$.


The Mean Value Inequality for Vector-Valued Maps

In one variable, the Mean Value Theorem is an equality: $f(b) - f(a) = f'(c)(b - a)$ for some $c \in (a, b)$. For vector-valued maps this equality fails in general; there is no single point $\vec{c}$ where the derivative reproduces the increment. (Consider $\vec{F}(t) = (\cos t, \sin t)$ on $[0, 2\pi]$: $\vec{F}(2\pi) - \vec{F}(0) = \vec{0}$, but $D\vec{F}(c) = (-\sin c, \cos c) \neq \vec{0}$ for any $c$.) What survives is an inequality bounding the increment by the supremum of the operator norm of the derivative along the segment.

Theorem: Mean Value Inequality (Higher-Dimensional MVT)

Let $U \subseteq \mathbb{R}^n$ be open and $\vec{F} : U \to \mathbb{R}^m$ be differentiable on $U$. Suppose the line segment $[\vec{p}, \vec{q}] := \{\vec{p} + t(\vec{q} - \vec{p}) : 0 \leq t \leq 1\}$ is contained in $U$. Then
$$\|\vec{F}(\vec{q}) - \vec{F}(\vec{p})\| \leq M \|\vec{q} - \vec{p}\|, \qquad \text{where } M = \sup_{\vec{x} \in [\vec{p}, \vec{q}]} \|D\vec{F}(\vec{x})\|_{\mathrm{op}}.$$

Remark.
Intuition: You cannot expect to write the increment as an exact derivative-times-displacement because different components of $\vec{F}$ might be "going back and forth" at cross-purposes. But the total speed along the segment is at most $M$, and the distance covered is at most speed $\times$ time. The operator norm $\|D\vec{F}(\vec{x})\|_{\mathrm{op}}$ is the largest stretch factor of the linear map $D\vec{F}(\vec{x})$.
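The circle map makes a concrete test case. Since $\|D\vec{F}(t)\|_{\mathrm{op}} = 1$ for every $t$, the inequality predicts that the chord never exceeds the arc; this Python sketch checks it on random pairs of parameters, and also confirms that the one-variable equality fails (zero increment over a full turn).

```python
import math
import random

def F(t):
    # The circle map from the text: F(t) = (cos t, sin t)
    return (math.cos(t), math.sin(t))

# DF(t) = (-sin t, cos t) has operator norm 1 for every t, so the
# Mean Value Inequality predicts ||F(q) - F(p)|| <= 1 * |q - p|.
random.seed(0)
ok = []
for _ in range(1000):
    p = random.uniform(0.0, 2.0 * math.pi)
    q = random.uniform(0.0, 2.0 * math.pi)
    lhs = math.hypot(F(q)[0] - F(p)[0], F(q)[1] - F(p)[1])
    ok.append(lhs <= abs(q - p) + 1e-12)

# The one-variable MVT equality genuinely fails here: the increment over
# [0, 2*pi] is 0 even though the derivative never vanishes.
increment = math.hypot(F(2 * math.pi)[0] - F(0.0)[0],
                       F(2 * math.pi)[1] - F(0.0)[1])
print(all(ok), increment)
```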
Corollary: Vanishing Derivative on a Convex Set

Let $U \subseteq \mathbb{R}^n$ be open and convex, and $\vec{F} : U \to \mathbb{R}^m$ differentiable on $U$ with $D\vec{F}(\vec{x}) = 0$ for every $\vec{x} \in U$. Then $\vec{F}$ is constant on $U$.


With partial derivatives, total derivatives, the chain rule, and a working notion of smoothness classes in hand, we are ready to tackle one of the deepest local-to-global results of differential calculus: the implicit function theorem, which tells us when an equation $\vec{F}(\vec{x}, \vec{y}) = \vec{0}$ can be solved locally for $\vec{y}$ in terms of $\vec{x}$.