Partial derivatives, the gradient, and the total derivative for functions of several variables. Clairaut's theorem on equality of mixed partials. Classes of differentiable functions (C^k).
Differential calculus in one variable is the study of the local linear approximation of a function. The derivative f′(a) is a number only because linear maps R→R are determined by a single scalar; the geometrically correct object is the linear map h↦f′(a)h. In several variables this viewpoint becomes mandatory. The derivative of a function F:Rn→Rm at a point is a linear map Rn→Rm, represented once bases are chosen by an m×n matrix of partial derivatives. This chapter develops the theory: directional and partial derivatives, the total (Fréchet) derivative, the Jacobian matrix, the delicate interplay between partial and total differentiability, the chain rule, higher-order partials, Clairaut's theorem on the equality of mixed partials, the classes Ck, and a mean-value inequality for vector-valued maps.
Remark.
A note on philosophy (Dieudonné). In his classic Foundations of Modern Analysis, Jean Dieudonné of the French Bourbaki school writes that presenting differential calculus "aims at keeping as close as possible to the fundamental idea of Calculus, namely the local approximation of functions by linear functions. In the classical teaching of Calculus, this idea is immediately obscured by the accidental fact that, on a one-dimensional vector space, there is a one-to-one correspondence between linear forms and numbers, and therefore the derivative at a point is defined as a number instead of a linear form. This slavish subservience to the shibboleth of numerical interpretation at any cost becomes much worse when dealing with functions of several variables." The viewpoint of this chapter — derivative as a linear map — is the one Dieudonné recommends.
The One-Variable Derivative, Reconsidered
Before generalizing, it is worth reformulating the familiar definition of the one-variable derivative in a way that transports verbatim to several variables. Let D⊆R be an interval and a∈int(D). The derivative f′(a) exists (as a real number) iff

lim_{x→a} (f(x) − f(a))/(x − a)

exists, equivalently iff

lim_{x→a} (f(x) − f(a) − f′(a)(x − a))/(x − a) = 0.

The numerator f(x)−(f(a)+f′(a)(x−a)) is the error we make when approximating f(x) by its tangent line at a; the equation says this error tends to zero faster than x − a.
Definition (Little-oh Notation)
Let E,F be functions defined near a with F(x) ≠ 0 near a (except possibly at a). We say E(x)=o(F(x)) as x→a if

lim_{x→a} |E(x)|/|F(x)| = 0.
Remark.
Intuition: E = o(F) means E becomes negligible compared to F in the limit. "Little-oh of h" is the precise way to say "vanishes faster than linearly."
With this language, f is differentiable at a iff there exists a real number α (necessarily α=f′(a)) such that
f(a+h) − f(a) − αh = o(h)  as h→0.
Equivalently, f is differentiable at a iff there exists a linear map L:R→R such that f(a+h)−f(a)−L(h)=o(h) as h→0; this L is what we call the derivative of f at a. The one-variable case hides the linear-map nature only because linear maps R→R are identified with single numbers via L(y)=αy. In several variables, no such identification is possible, and the linear-map formulation is unavoidable.
Example (A tangent line that touches the graph infinitely often)
Consider

f(x) = { x^2 sin(1/x),  x ≠ 0,
         0,             x = 0.
A direct computation shows f′(0)=0 — so the tangent line to the graph at the origin is y=0 (the x-axis). Yet the graph of f crosses the x-axis infinitely often in every neighbourhood of 0 (wherever sin(1/x)=0). The tangent line is not a line that "touches the graph once" — it is the best linear approximation to the increment, and nothing more.
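A quick numerical sketch in Python (not part of the formal development) confirms both claims: the difference quotients at 0 are bounded by |h|, so they tend to 0, and the graph meets the tangent line y = 0 at every point x = 1/(kπ).

```python
import math

def f(x):
    # f(x) = x^2 sin(1/x) for x != 0, f(0) = 0
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

# Difference quotients (f(h) - f(0))/h = h*sin(1/h) are bounded by |h|,
# so they tend to 0: the derivative at 0 is 0.
for h in (1e-2, 1e-4, 1e-6):
    quotient = (f(h) - f(0)) / h
    print(h, quotient)  # |quotient| <= |h|

# The graph crosses the tangent line y = 0 at every x = 1/(k*pi):
zeros = [1.0 / (k * math.pi) for k in range(1, 6)]
print([f(x) for x in zeros])  # all zero up to rounding
```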
One-sided derivatives are sometimes useful: if D=[α,β],
f′_R(α) = lim_{h→0+} (f(α+h) − f(α))/h,    f′_L(β) = lim_{h→0−} (f(β+h) − f(β))/h,
are the right derivative at the left endpoint and the left derivative at the right endpoint, respectively.
Directional and Partial Derivatives
Let U⊆Rn be open and f:U→R. The most naive way to extract one-variable information from f is to restrict it to a line through a point a∈U and differentiate.
Definition (Directional Derivative)
Let U⊆Rn be open, f:U→R, a∈U, and v∈Rn a direction vector. The directional derivative of f at a in the direction v is

D_v f(a) = lim_{h→0} (f(a + hv) − f(a))/h,

provided this limit exists. In particular, when v = e_j = (0,…,0,1,0,…,0) is the j-th canonical basis vector, we call this the partial derivative of f with respect to xj at a and denote it

∂f/∂xj(a) := lim_{h→0} (f(a + h e_j) − f(a))/h.

Other notations: D_j f(a) and f_{xj}(a).
Remark.
Intuition: The partial derivative ∂f/∂xj freezes all other variables at their values a_k (k ≠ j) and differentiates f as a one-variable function of xj. It is the rate of change of f when you move through a with unit velocity along the j-th coordinate axis. The directional derivative is the rate along any line.
Partial Derivatives Do Not Imply Continuity
A crucial warning: in several variables, the existence of every partial derivative at a point is not nearly enough to guarantee that the function is continuous there. Partial derivatives probe the function only along coordinate axes; the function can misbehave wildly on lines that are not axis-aligned.
Example (Partials exist at the origin but f is not continuous there)
Define f:R2→R by

f(x,y) = { 1,  xy = 0,
           0,  xy ≠ 0.
Along either axis, f is identically 1, so
∂f/∂x(0,0) = lim_{h→0} (f(h,0) − f(0,0))/h = lim_{h→0} (1 − 1)/h = 0,    ∂f/∂y(0,0) = 0

exist. Yet f is not continuous at (0,0): the sequence (1/n,1/n)→(0,0), but f(1/n,1/n) = 0 for every n, so the values do not converge to f(0,0) = 1.
Example (A nastier counterexample with partials everywhere)
Define f:R2→R by

f(x,y) = { xy/(x^2 + y^2),  (x,y) ≠ (0,0),
           0,               (x,y) = (0,0).
Away from the origin, the quotient rule gives
∂f/∂x(x,y) = (y^3 − y x^2)/(x^2 + y^2)^2,    ∂f/∂y(x,y) = (x^3 − x y^2)/(x^2 + y^2)^2.

At the origin, f(h,0)=0 for all h, so ∂f/∂x(0,0)=0, and similarly for y. Thus the partial derivatives exist everywhere. Yet f(t,t) = t^2/(2t^2) = 1/2 for every t ≠ 0, so lim_{t→0} f(t,t) = 1/2 ≠ 0 = f(0,0), and f is not continuous at the origin.
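The two features of this example are easy to observe numerically. In the Python sketch below, difference quotients along the axes give both partials at the origin as exactly 0, while along the diagonal f is constantly 1/2.

```python
def f(x, y):
    # f = x*y/(x^2 + y^2) away from the origin, 0 at the origin
    return x * y / (x * x + y * y) if (x, y) != (0.0, 0.0) else 0.0

# Both partials at the origin exist and equal 0: f vanishes on each axis.
h = 1e-6
fx0 = (f(h, 0.0) - f(0.0, 0.0)) / h   # = 0
fy0 = (f(0.0, h) - f(0.0, 0.0)) / h   # = 0

# Yet f is not continuous at (0,0): along the diagonal f is constantly 1/2.
diag = [f(t, t) for t in (1e-1, 1e-3, 1e-6)]
print(fx0, fy0, diag)  # 0.0 0.0 [0.5, 0.5, 0.5]
```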
The lesson is that pointwise existence of partials is too weak a notion of differentiability. We need something that forces continuity — something that says the function is locally well-approximated by a linear map in every direction simultaneously, not just along axes.
The Total Derivative
The correct generalization of the one-variable derivative is the linear approximation viewpoint. In one variable, f is differentiable at a iff there is a scalar α (necessarily f′(a)) such that
f(a+h) − f(a) − αh = o(h)  as h→0,
equivalently, there is a linear map T:R→R such that f(a+h)−f(a)−T(h)=o(h). In higher dimensions we keep the same statement, replacing α by a matrix and absolute values by norms.
Definition (Total Derivative, or Fréchet Derivative)
Let F:D→Rm, where D⊆Rn, and let a∈int(D). We say that F is differentiable at a if there exists a linear map T:Rn→Rm such that

lim_{h→0} ∥F(a+h) − F(a) − Th∥ / ∥h∥ = 0.

When such a T exists it is unique; we denote it DF(a) and call it the total derivative (or Fréchet derivative, or differential) of F at a.
Remark.
Intuition: DF(a) is the unique linear map that agrees with the increment F(a+h)−F(a) to higher than linear order. Informally, F(a+h) ≈ F(a) + DF(a)h, with an error that vanishes faster than ∥h∥. It is the best linear approximation to the increment. The definition is coordinate-free — it refers only to norms and linearity, not to specific axes.
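The defining limit can be watched converge. In this Python sketch (the map F(x,y) = (x^2 + y, xy) and the point a are our own illustrative choices), the error ratio ∥F(a+h) − F(a) − Jh∥/∥h∥ shrinks linearly with ∥h∥, which is exactly the o(∥h∥) behaviour the definition demands.

```python
import math

def F(x, y):
    # illustrative map F(x, y) = (x^2 + y, x*y), not from the text
    return (x * x + y, x * y)

a = (1.0, 3.0)
# Hand-computed Jacobian at a: rows are gradients of the components
J = [[2.0 * a[0], 1.0],   # d(x^2 + y) = (2x, 1)
     [a[1], a[0]]]        # d(xy)      = (y, x)

def error_ratio(h1, h2):
    # ||F(a+h) - F(a) - J h|| / ||h||
    Fa = F(*a)
    Fah = F(a[0] + h1, a[1] + h2)
    Th = (J[0][0] * h1 + J[0][1] * h2, J[1][0] * h1 + J[1][1] * h2)
    err = math.hypot(Fah[0] - Fa[0] - Th[0], Fah[1] - Fa[1] - Th[1])
    return err / math.hypot(h1, h2)

# The ratio shrinks like ||h|| itself, confirming the o(||h||) error
ratios = [error_ratio(t, -t) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)  # roughly [0.1, 0.01, 0.001]
```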
The definition has the right theoretical consequences, starting with the implication that partials existing alone failed to deliver.
Theorem (Differentiability implies continuity)
Let F:D→Rm, D⊆Rn, a∈int(D). If F is differentiable at a, then F is continuous at a.
The Jacobian Matrix
Every linear map T:Rn→Rm is represented, in the canonical bases, by a unique m×n matrix M such that Th=Mh. The next theorem identifies this matrix for T=DF(a): its entries are precisely the partial derivatives of the components of F.
Theorem (Total Derivative is the Jacobian)
Let F:U→Rm, U⊆Rn, a∈int(U), and write F=(F1,…,Fm). If F is differentiable at a, then all partial derivatives ∂Fi/∂xj(a) exist, and DF(a) is represented in the canonical bases by the m×n Jacobian matrix

JF(a) = [ ∂F1/∂x1(a)  ⋯  ∂F1/∂xn(a) ]
        [     ⋮               ⋮      ]
        [ ∂Fm/∂x1(a)  ⋯  ∂Fm/∂xn(a) ].
In particular, differentiability implies the existence of every partial derivative of every component. The converse is not true in general.
Remark.
Intuition: The Jacobian is the matrix whose j-th column is the partial derivative ∂F/∂xj(a), viewed as a vector in Rm. This column records the change of F produced by a unit push along the j-th axis. Multiplying by a general direction vector h builds up the total linear response by linearly combining the axis-responses — which is exactly what a linear map does.
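The column description can be checked directly. In the following Python sketch, F(x,y) = (xy, x+y, x−y) is an illustrative map of our own choosing; each Jacobian column is approximated by a difference quotient along the corresponding axis and compared with the hand-computed column.

```python
def F(x, y):
    # illustrative map from R^2 to R^3 (our choice): F(x, y) = (x*y, x+y, x-y)
    return (x * y, x + y, x - y)

a = (2.0, 5.0)
h = 1e-6

def column(j):
    # j-th Jacobian column ~ (F(a + h*e_j) - F(a)) / h
    shifted = list(a)
    shifted[j] += h
    Fa, Fs = F(*a), F(*shifted)
    return [(Fs[i] - Fa[i]) / h for i in range(3)]

approx = [column(0), column(1)]
# hand-computed columns at a = (2, 5): (y, 1, 1) and (x, 1, -1)
exact = [[5.0, 1.0, 1.0], [2.0, 1.0, -1.0]]
print(approx)  # close to exact
```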
The example f(x,y)=xy/(x2+y2) (extended to 0 at the origin) from the previous section shows the converse is genuinely false: its partials exist at (0,0), but f is not even continuous there, so by the continuity theorem it cannot be differentiable.
Corollary (Component-wise Differentiability)
Let F:U→Rm, F=(F1,…,Fm), a∈int(U). Then DF(a) exists if and only if DFj(a) exists for every j∈{1,…,m}.
Remark.
Intuition: Differentiability of a vector-valued map is nothing more than simultaneous differentiability of its components. This is the vector-valued analogue of the fact that a sequence in Rm converges iff each of its component sequences does.
A Sufficient Condition: Continuous Partials
The example f(x,y)=xy/(x2+y2) shows that the existence of partials at a single point is too weak for differentiability. The following theorem gives a practical sufficient condition: if the partials exist in a neighbourhood and are continuous at the point, then the function is differentiable there.
Theorem (Continuous Partials Imply Differentiability)
Let F:U→Rm, U⊆Rn open, a∈U, and write F=(F1,…,Fm). Suppose there is r>0 such that for every j∈{1,…,m} and every i∈{1,…,n}, the partial derivative ∂Fj/∂xi(x) exists for every x∈Br(a), and the function x↦∂Fj/∂xi(x) is continuous at a. Then F is differentiable at a.
Remark.
Intuition: Partial derivatives alone do not know about what happens off the coordinate axes. But if they vary continuously, they control the function in a full neighbourhood, and a mean-value telescoping argument along the edges of a small box pieces together an honest total derivative. "Continuous partials" is by far the most common way to check differentiability in practice.
The Gradient
Definition (Gradient)
Let U⊆Rn be open and f:U→R have all partial derivatives on U. The gradient of f is the function ∇f:U→Rn defined by

(∇f)(x) = (∂f/∂x1(x), …, ∂f/∂xn(x)).
Equivalently, when f is differentiable at x, Df(x)h=∇f(x)⋅h.
Remark.
Intuition: The gradient is the vector field that records, at each point, the direction of steepest ascent of f, with magnitude equal to the rate of ascent in that direction. A function U→Rn with U⊆Rn is called a vector field, so the gradient of a scalar-valued function is always a vector field.
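The steepest-ascent description can be tested numerically. In this Python sketch (the function f(x,y) = x^2 + 3y^2 and the base point are our own choices), we sample directional derivatives over many unit directions and check that the largest rate is ∥∇f(a)∥, attained along ∇f(a)/∥∇f(a)∥.

```python
import math

def f(x, y):
    # sample function (our choice): f(x, y) = x^2 + 3y^2, grad f = (2x, 6y)
    return x * x + 3.0 * y * y

a = (1.0, 1.0)
grad = (2.0 * a[0], 6.0 * a[1])  # = (2, 6)

def directional(vx, vy, h=1e-6):
    # D_v f(a) via a difference quotient, for a unit vector v = (vx, vy)
    return (f(a[0] + h * vx, a[1] + h * vy) - f(*a)) / h

# Sample unit directions one degree apart; the largest rate occurs
# in the direction of grad f(a), with magnitude |grad f(a)|.
best = max((directional(math.cos(t), math.sin(t)), t)
           for t in [k * 2 * math.pi / 360 for k in range(360)])
norm = math.hypot(*grad)  # sqrt(40)
print(best[0], norm)      # nearly equal
```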
The Chain Rule
The chain rule — the most important computational tool of differential calculus — takes an especially clean form when the derivative is viewed as a linear map: the derivative of a composition is the composition of the derivatives.
Theorem (Chain Rule)
Let G:U→Rp with U⊆Rn open, and F:V→Rm with V⊆Rp open. Suppose G(U)⊆V, let a∈U, and set b=G(a). If G is differentiable at a and F is differentiable at b, then F∘G:U→Rm is differentiable at a, and
D(F∘G)(a)=DF(b)∘DG(a).
In matrix form, the Jacobian of the composition is the product of the Jacobians:
JF∘G(a)=JF(b)JG(a).
Remark.
Intuition: To first order, G near a acts like its linear approximation DG(a), sending a small displacement h to DG(a)h. Then F near b acts like DF(b). Composing linear approximations gives the linear approximation of the composition. The error terms are negligible because the operator norm tames them.
Corollary (Scalar Chain Rule)
If f:V→R is differentiable at G(a) and G:U→Rp is differentiable at a, then

∂(f∘G)/∂xj(a) = Σ_{k=1}^{p} ∂f/∂yk(G(a)) · ∂Gk/∂xj(a).
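The matrix form of the chain rule is easy to verify numerically. In this Python sketch, F and G are illustrative maps of our own choosing; we multiply the hand-computed Jacobians and compare the product with central-difference estimates of the composite's partials.

```python
def G(x, y):
    # inner map (our illustration): G(x, y) = (x + y, x*y)
    return (x + y, x * y)

def F(u, v):
    # outer map (our illustration): F(u, v) = (u*v, u - v)
    return (u * v, u - v)

a = (1.0, 2.0)
b = G(*a)  # = (3, 2)

# Hand-computed Jacobians and their product JF(b) JG(a)
JG = [[1.0, 1.0], [a[1], a[0]]]   # rows: d(x+y), d(xy)
JF = [[b[1], b[0]], [1.0, -1.0]]  # rows: d(uv), d(u-v)
product = [[sum(JF[i][k] * JG[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]

# Jacobian of the composite H = F o G by central differences
h = 1e-5
def H(x, y):
    return F(*G(x, y))

numeric = [[(H(a[0] + h * (j == 0), a[1] + h * (j == 1))[i]
             - H(a[0] - h * (j == 0), a[1] - h * (j == 1))[i]) / (2 * h)
            for j in range(2)] for i in range(2)]
print(product)  # [[8.0, 5.0], [-1.0, 0.0]]
print(numeric)  # agrees with the product
```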
Higher-Order Partial Derivatives
Once we have partial derivatives, we can differentiate again. Let U⊆Rn be open and f:U→R. If ∂f/∂xj exists on U and is itself a function from U to R, we can ask about its partial derivatives.
Definition (Second-Order Partial Derivative)
Let U⊆Rn be open and f:U→R. Assume ∂f/∂xj exists on a neighbourhood of a∈U. The second-order partial derivative of f with respect to xi and xj at a is

∂^2f/∂xi∂xj(a) = ∂/∂xi(∂f/∂xj)(a) = lim_{h→0} (∂f/∂xj(a + h e_i) − ∂f/∂xj(a))/h,

when the limit exists. Alternative notations include D_i D_j f(a) and f_{xj xi}(a) (note the reversed order in subscript notation: f_{xj xi} means "differentiate first with respect to xj, then with respect to xi").
Remark.
Intuition: A second-order partial measures the rate at which one partial changes as you move in a (possibly different) coordinate direction. The notation is a minor booby trap: ∂2f/∂xi∂xj means "first ∂/∂xj, then ∂/∂xi" — the operator nearest f is applied first.
Equality of Mixed Partials: Clairaut's Theorem
A natural question: does the order of differentiation matter? Is ∂2f/∂x∂y always equal to ∂2f/∂y∂x? In general, no — there is a classical counterexample. But under mild continuity hypotheses, yes: this is Clairaut's theorem (also called Schwarz's theorem).
Example (A function with unequal mixed partials)
Define f:R2→R by

f(x,y) = { xy(x^2 − y^2)/(x^2 + y^2),  (x,y) ≠ (0,0),
           0,                          (x,y) = (0,0).

For (x,y) ≠ (0,0) a direct computation gives

∂f/∂x(x,y) = y(x^4 + 4x^2y^2 − y^4)/(x^2 + y^2)^2,    ∂f/∂y(x,y) = x(x^4 − 4x^2y^2 − y^4)/(x^2 + y^2)^2.

At the origin, f(x,0)=0 for all x, so ∂f/∂x(0,0)=0, and likewise ∂f/∂y(0,0)=0. In particular, ∂f/∂x and ∂f/∂y are continuous on R2. Now compute along the axes:

∂f/∂x(0,y) = y·(−y^4)/y^4 = −y,    ∂f/∂y(x,0) = x·x^4/x^4 = x.

Therefore

∂^2f/∂y∂x(0,0) = d/dy|_{y=0} (−y) = −1,    ∂^2f/∂x∂y(0,0) = d/dx|_{x=0} x = +1.

So the mixed partials are both defined at the origin, but −1 ≠ +1: they are unequal. The obstruction is that ∂^2f/∂x∂y fails to be continuous at (0,0).
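The two unequal values can be reproduced numerically. In this Python sketch (the step sizes are our own choices), we approximate f_x and f_y by central differences and then difference once more at the origin.

```python
def f(x, y):
    # f = x*y*(x^2 - y^2)/(x^2 + y^2), extended by 0 at the origin
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

h = 1e-5

def fx(x, y):
    # partial in x by a central difference
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def fy(x, y):
    # partial in y by a central difference
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

k = 1e-3
mixed_yx = (fx(0.0, k) - fx(0.0, 0.0)) / k  # d/dy of f_x at the origin
mixed_xy = (fy(k, 0.0) - fy(0.0, 0.0)) / k  # d/dx of f_y at the origin
print(mixed_yx, mixed_xy)  # approximately -1 and +1
```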
The counterexample shows that continuity of the mixed partial is essential; the following lemma and theorem show that it is in fact enough.
Lemma (Two-Variable Equality of Mixed Partials)
Let f:D→R, D⊆R2, (a,b)∈int(D). Suppose there is δ>0 such that ∂f/∂x, ∂f/∂y, and ∂^2f/∂x∂y all exist on Bδ(a,b), and that the map (x,y)↦∂^2f/∂x∂y(x,y) is continuous at (a,b). Then ∂^2f/∂y∂x(a,b) exists and

∂^2f/∂y∂x(a,b) = ∂^2f/∂x∂y(a,b).
Remark.
Intuition: Consider the "double increment"
Δ(h,k):=f(a+h,b+k)−f(a+h,b)−f(a,b+k)+f(a,b).
This symmetric quantity can be rewritten two ways: first differencing in x and then in y, or vice versa. Dividing by hk and letting h,k→0 recovers either mixed partial. The Mean Value Theorem converts both differences into evaluations of fxy at nearby points, and continuity of fxy at (a,b) forces both limits to the same value.
Theorem (Clairaut's Theorem: Equality of Mixed Partials)
Let D⊆Rn, f:D→R, and a∈int(D). Suppose there is δ>0 such that ∂f/∂xi(x), ∂f/∂xj(x), and ∂^2f/∂xi∂xj(x) exist for every x∈Bδ(a), and the map x↦∂^2f/∂xi∂xj(x) is continuous at a. Then ∂^2f/∂xj∂xi(a) exists and

∂^2f/∂xi∂xj(a) = ∂^2f/∂xj∂xi(a).
Remark.
Intuition: The n-variable theorem reduces to the 2-variable lemma by freezing all coordinates except xi and xj. Once only two variables wiggle, the two-variable proof applies verbatim.
The Classes Ck
Definition (C^k Functions)
Let U⊆Rn be open. A function F:U→Rm is said to be of class C1 on U if all first-order partial derivatives of all component functions of F exist and are continuous on U. More generally, for k≥1, F is of class Ck on U if all partial derivatives of order ≤k of all components of F exist and are continuous on U. The function is of class C∞ (or smooth) if it is of class Ck for every k≥1.
Locally: F is of class C1 at a point a∈int(U) if there is an open ball around a on which every first-order partial of every component exists, and each such partial is continuous at a.
By convention, C0(U) denotes the continuous functions U→Rm.
Remark.
Intuition: Ck is a hierarchy of smoothness classes: C0 ⊃ C1 ⊃ C2 ⊃ ⋯ ⊃ C∞. The C1 class is the natural setting for differential calculus, since continuous partials imply differentiability. The C2 class is where Clairaut applies to every mixed pair simultaneously, so the Hessian matrix is symmetric.
Definition (Real-Analytic Functions)
Let U⊆Rn be open. A function f:U→R is real-analytic on U, written f∈Cω(U), if for every a∈U there exists r>0 such that for every x with ∥x−a∥<r, the Taylor series of f at a converges and equals f(x):

f(x) = Σ_{k=0}^{∞} (1/k!) Σ_{i1,…,ik=1}^{n} ∂^k f/∂x_{i1}⋯∂x_{ik}(a) · (x_{i1} − a_{i1})⋯(x_{ik} − a_{ik}).
Remark.
Intuition: Analytic functions are "locally polynomials of infinite degree." They form a strictly smaller class than C∞: the function f(x) = e^{−1/x^2} for x ≠ 0, f(0) = 0, is C∞ on R with all derivatives vanishing at the origin, so its Taylor series at 0 is identically zero — yet f ≢ 0. Hence f∈C∞∖Cω. The hierarchy is
C0⊋C1⊋C2⊋⋯⊋C∞⊋Cω.
Polynomials, exponentials, sines, and cosines are all analytic; so are compositions, sums, products, and (where nonzero in the denominator) quotients of analytic functions.
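The standard non-analytic example can be probed numerically. In the Python sketch below, f(x) = exp(−1/x^2) (with f(0) = 0) is positive away from 0, yet it is flatter at 0 than any power of x, which is why every Taylor coefficient at 0 vanishes.

```python
import math

def f(x):
    # f(x) = exp(-1/x^2) for x != 0, f(0) = 0: smooth but not analytic at 0
    return math.exp(-1.0 / (x * x)) if x != 0 else 0.0

# f is flatter than any monomial at 0: even f(h)/h^5 tends to 0 as h -> 0.
for h in (0.5, 0.2, 0.1):
    print(h, f(h) / h**5)

# Yet f is not the zero function, so it disagrees with its (zero) Taylor series.
print(f(0.1))  # positive
```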
Corollary (C^2 Implies Symmetric Mixed Partials)
If f:U→R is of class C2 on an open set U⊆Rn, then for all i,j∈{1,…,n} and all x∈U,

∂^2f/∂xi∂xj(x) = ∂^2f/∂xj∂xi(x).

In particular, for C2 functions the mixed partials do not depend on the order of differentiation.
Corollary (C^1 Implies Differentiable)
If F:U→Rm is of class C1 on an open set U, then F is differentiable at every x∈U.
The Mean Value Inequality for Vector-Valued Maps
In one variable, the Mean Value Theorem is an equality: f(b)−f(a) = f′(c)(b−a) for some c∈(a,b). For vector-valued maps this equality fails in general — there is no single point c where the derivative reproduces the increment. (Consider F(t) = (cos t, sin t) on [0,2π]: F(2π)−F(0) = 0, but DF(c) = (−sin c, cos c) ≠ 0 for every c.) What survives is an inequality bounding the increment by the supremum of the operator norm of the derivative along the segment.
Theorem (Mean Value Inequality; Higher-Dimensional MVT)
Let U⊆Rn be open and F:U→Rm be differentiable on U. Suppose the line segment

[p,q] := {p + t(q−p) : 0 ≤ t ≤ 1}

is contained in U. Then

∥F(q) − F(p)∥ ≤ M∥q − p∥,    where M = sup_{x∈[p,q]} ∥DF(x)∥_op.
Remark.
Intuition: You cannot expect to write the increment as an exact derivative-times-displacement because different components of F might be "going back and forth" at cross-purposes. But the total speed along the segment is at most M, and the distance covered is at most speed × time. The operator norm ∥DF(x)∥op is the largest stretch factor of the linear map DF(x).
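A numerical illustration in Python, using the circle example from above: for F(t) = (cos t, sin t) the operator norm of DF(t) is 1 everywhere, so the inequality says the chord ∥F(q)−F(p)∥ never exceeds |q−p|; on [0, 2π] the increment is actually zero, while DF never vanishes, showing no MVT equality is possible.

```python
import math

def F(t):
    # curve F(t) = (cos t, sin t); ||DF(t)||_op = ||(-sin t, cos t)|| = 1 for all t
    return (math.cos(t), math.sin(t))

def increment(p, q):
    # ||F(q) - F(p)||, the chord length
    Fp, Fq = F(p), F(q)
    return math.hypot(Fq[0] - Fp[0], Fq[1] - Fp[1])

M = 1.0  # sup of the operator norm along any segment

# The inequality ||F(q) - F(p)|| <= M |q - p| holds on every interval...
pairs = [(0.0, 0.5), (0.0, math.pi), (1.0, 4.0), (0.0, 2 * math.pi)]
print([increment(p, q) <= M * abs(q - p) + 1e-12 for p, q in pairs])

# ...and on [0, 2*pi] the increment vanishes even though DF(c) never does:
print(increment(0.0, 2 * math.pi))  # 0 up to rounding
```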
Corollary (Vanishing Derivative on a Convex Set)
Let U⊆Rn be open and convex, and F:U→Rm differentiable on U with DF(x)=0 for every x∈U. Then F is constant on U.
With partial derivatives, total derivatives, the chain rule, and a working notion of smoothness classes in hand, we are ready to tackle one of the deepest local-to-global results of differential calculus: the implicit function theorem, which tells us when an equation F(x,y)=0 can be solved locally for y in terms of x.