Partial derivatives, the gradient, and the total derivative for functions of several variables. Clairaut's theorem on equality of mixed partials. Classes of differentiable functions (C^k).
Differential calculus in one variable is the study of the local linear approximation of a function. The derivative f′(a) is a number only because linear maps R→R are determined by a single scalar; the geometrically correct object is the linear map h↦f′(a)h. In several variables this viewpoint becomes mandatory. The derivative of a function F:Rn→Rm at a point is a linear map Rn→Rm, represented once bases are chosen by an m×n matrix of partial derivatives. This chapter develops the theory: directional and partial derivatives, the total (Fréchet) derivative, the Jacobian matrix, the delicate interplay between partial and total differentiability, the chain rule, higher-order partials, Clairaut's theorem on the equality of mixed partials, the classes Ck, and a mean-value inequality for vector-valued maps.
Remark.
A note on philosophy (Dieudonné). In his classic Foundations of Modern Analysis, Jean Dieudonné of the French Bourbaki school writes that presenting differential calculus "aims at keeping as close as possible to the fundamental idea of Calculus, namely the local approximation of functions by linear functions. In the classical teaching of Calculus, this idea is immediately obscured by the accidental fact that, on a one-dimensional vector space, there is a one-to-one correspondence between linear forms and numbers, and therefore the derivative at a point is defined as a number instead of a linear form. This slavish subservience to the shibboleth of numerical interpretation at any cost becomes much worse when dealing with functions of several variables." The viewpoint of this chapter — derivative as a linear map — is the one Dieudonné recommends.
The One-Variable Derivative, Reconsidered
Before generalizing, it is worth reformulating the familiar definition of the one-variable derivative in a way that transports verbatim to several variables. Let D⊆R be an interval and a∈int(D). The derivative f′(a) exists (as a real number) iff

lim_{x→a} (f(x) − f(a))/(x − a)

exists, equivalently iff

lim_{x→a} (f(x) − f(a) − f′(a)(x − a))/(x − a) = 0.

The numerator f(x)−(f(a)+f′(a)(x−a)) is the error we make when approximating f(x) by its tangent line at a; the equation says this error tends to zero faster than x − a.
Definition (Little-oh Notation)
Let E,F be functions defined near a with F(x) ≠ 0 near a (except possibly at a). We say E(x)=o(F(x)) as x→a if

lim_{x→a} |E(x)|/|F(x)| = 0.
Remark.
Intuition: E = o(F) means E becomes negligible compared to F in the limit. "Little-oh of h" is the precise way to say "vanishes faster than linearly."
With this language, f is differentiable at a iff there exists a real number α (necessarily α=f′(a)) such that
f(a+h) − f(a) − αh = o(h)  as h→0.
Equivalently, f is differentiable at a iff there exists a linear map L:R→R such that f(a+h)−f(a)−L(h)=o(h) as h→0; this L is what we call the derivative of f at a. The one-variable case hides the linear-map nature only because linear maps R→R are identified with single numbers via L(y)=αy. In several variables, no such identification is possible, and the linear-map formulation is unavoidable.
Example (A tangent line that touches the graph infinitely often)
Consider

f(x) = { x^2 sin(1/x),  x ≠ 0,
         0,             x = 0.
A direct computation shows f′(0)=0 — so the tangent line to the graph at the origin is y=0 (the x-axis). Yet the graph of f crosses the x-axis infinitely often in every neighbourhood of 0 (wherever sin(1/x)=0). The tangent line is not a line that "touches the graph once" — it is the best linear approximation to the increment, and nothing more.
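A quick numerical sketch in Python (not part of the formal development) confirms both claims: the difference quotients at 0 are bounded by |h|, so they tend to 0, and the graph meets the tangent line y = 0 at every point x = 1/(kπ).

```python
import math

def f(x):
    # f(x) = x^2 sin(1/x) for x != 0, f(0) = 0
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

# Difference quotients (f(h) - f(0))/h = h*sin(1/h) are bounded by |h|,
# so they tend to 0: the derivative at 0 is 0.
for h in (1e-2, 1e-4, 1e-6):
    quotient = (f(h) - f(0)) / h
    print(h, quotient)  # |quotient| <= |h|

# The graph crosses the tangent line y = 0 at every x = 1/(k*pi):
zeros = [1.0 / (k * math.pi) for k in range(1, 6)]
print([f(x) for x in zeros])  # all zero up to rounding
```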
One-sided derivatives are sometimes useful: if D=[α,β],
f′_R(α) = lim_{h→0+} (f(α+h) − f(α))/h,    f′_L(β) = lim_{h→0−} (f(β+h) − f(β))/h,
are the right derivative at the left endpoint and the left derivative at the right endpoint, respectively.
Directional and Partial Derivatives
Let U⊆Rn be open and f:U→R. The most naive way to extract one-variable information from f is to restrict it to a line through a point a∈U and differentiate.
Definition (Directional Derivative)
Let U⊆Rn be open, f:U→R, a∈U, and v∈Rn a direction vector. The directional derivative of f at a in the direction v is

D_v f(a) = lim_{h→0} (f(a + hv) − f(a))/h,

provided this limit exists. In particular, when v = e_j = (0,…,0,1,0,…,0) is the j-th canonical basis vector, we call this the partial derivative of f with respect to xj at a and denote it

∂f/∂xj(a) := lim_{h→0} (f(a + h e_j) − f(a))/h.

Other notations: D_j f(a) and f_{xj}(a).
Remark.
Intuition: The partial derivative ∂f/∂xj freezes all other variables at their values a_k (k ≠ j) and differentiates f as a one-variable function of xj. It is the rate of change of f when you move through a with unit velocity along the j-th coordinate axis. The directional derivative is the rate along any line.
Partial Derivatives Do Not Imply Continuity
A crucial warning: in several variables, the existence of every partial derivative at a point is not nearly enough to guarantee that the function is continuous there. Partial derivatives probe the function only along coordinate axes; the function can misbehave wildly on lines that are not axis-aligned.
Example (Partials exist at the origin but f is not continuous there)
Define f:R2→R by

f(x,y) = { 1,  xy = 0,
           0,  xy ≠ 0.
Along either axis, f is identically 1, so
∂f/∂x(0,0) = lim_{h→0} (f(h,0) − f(0,0))/h = lim_{h→0} (1 − 1)/h = 0,    ∂f/∂y(0,0) = 0

exist. Yet f is not continuous at (0,0): the sequence (1/n,1/n)→(0,0), but f(1/n,1/n) = 0 for every n, so the values do not converge to f(0,0) = 1.
Example (A nastier counterexample with partials everywhere)
Define f:R2→R by

f(x,y) = { xy/(x^2 + y^2),  (x,y) ≠ (0,0),
           0,               (x,y) = (0,0).
Away from the origin, the quotient rule gives
∂f/∂x(x,y) = (y^3 − y x^2)/(x^2 + y^2)^2,    ∂f/∂y(x,y) = (x^3 − x y^2)/(x^2 + y^2)^2.

At the origin, f(h,0)=0 for all h, so ∂f/∂x(0,0)=0, and similarly for y. Thus the partial derivatives exist everywhere. Yet f(t,t) = t^2/(2t^2) = 1/2 for every t ≠ 0, so lim_{t→0} f(t,t) = 1/2 ≠ 0 = f(0,0), and f is not continuous at the origin.
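The two features of this example are easy to observe numerically. In the Python sketch below, difference quotients along the axes give both partials at the origin as exactly 0, while along the diagonal f is constantly 1/2.

```python
def f(x, y):
    # f = x*y/(x^2 + y^2) away from the origin, 0 at the origin
    return x * y / (x * x + y * y) if (x, y) != (0.0, 0.0) else 0.0

# Both partials at the origin exist and equal 0: f vanishes on each axis.
h = 1e-6
fx0 = (f(h, 0.0) - f(0.0, 0.0)) / h   # = 0
fy0 = (f(0.0, h) - f(0.0, 0.0)) / h   # = 0

# Yet f is not continuous at (0,0): along the diagonal f is constantly 1/2.
diag = [f(t, t) for t in (1e-1, 1e-3, 1e-6)]
print(fx0, fy0, diag)  # 0.0 0.0 [0.5, 0.5, 0.5]
```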
The lesson is that pointwise existence of partials is too weak a notion of differentiability. We need something that forces continuity — something that says the function is locally well-approximated by a linear map in every direction simultaneously, not just along axes.
The Total Derivative
The correct generalization of the one-variable derivative is the linear approximation viewpoint. In one variable, f is differentiable at a iff there is a scalar α (necessarily f′(a)) such that
f(a+h) − f(a) − αh = o(h)  as h→0,
equivalently, there is a linear map T:R→R such that f(a+h)−f(a)−T(h)=o(h). In higher dimensions we keep the same statement, replacing α by a matrix and absolute values by norms.
Definition (Total Derivative, or Fréchet Derivative)
Let F:D→Rm, where D⊆Rn, and let a∈int(D). We say that F is differentiable at a if there exists a linear map T:Rn→Rm such that

lim_{h→0} ∥F(a+h) − F(a) − Th∥ / ∥h∥ = 0.

When such a T exists it is unique; we denote it DF(a) and call it the total derivative (or Fréchet derivative, or differential) of F at a.
Remark.
Intuition: DF(a) is the unique linear map that agrees with the increment F(a+h)−F(a) to higher than linear order. Informally, F(a+h) ≈ F(a) + DF(a)h, with an error that vanishes faster than ∥h∥. It is the best linear approximation to the increment. The definition is coordinate-free — it refers only to norms and linearity, not to specific axes.
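The defining limit can be watched converge. In this Python sketch (the map F(x,y) = (x^2 + y, xy) and the point a are our own illustrative choices), the error ratio ∥F(a+h) − F(a) − Jh∥/∥h∥ shrinks linearly with ∥h∥, which is exactly the o(∥h∥) behaviour the definition demands.

```python
import math

def F(x, y):
    # illustrative map F(x, y) = (x^2 + y, x*y), not from the text
    return (x * x + y, x * y)

a = (1.0, 3.0)
# Hand-computed Jacobian at a: rows are gradients of the components
J = [[2.0 * a[0], 1.0],   # d(x^2 + y) = (2x, 1)
     [a[1], a[0]]]        # d(xy)      = (y, x)

def error_ratio(h1, h2):
    # ||F(a+h) - F(a) - J h|| / ||h||
    Fa = F(*a)
    Fah = F(a[0] + h1, a[1] + h2)
    Th = (J[0][0] * h1 + J[0][1] * h2, J[1][0] * h1 + J[1][1] * h2)
    err = math.hypot(Fah[0] - Fa[0] - Th[0], Fah[1] - Fa[1] - Th[1])
    return err / math.hypot(h1, h2)

# The ratio shrinks like ||h|| itself, confirming the o(||h||) error
ratios = [error_ratio(t, -t) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)  # roughly [0.1, 0.01, 0.001]
```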
The definition has the right theoretical consequences, starting with the implication that partials existing alone failed to deliver.
Theorem (Differentiability implies continuity)
Let F:D→Rm, D⊆Rn, a∈int(D). If F is differentiable at a, then F is continuous at a.
The Jacobian Matrix
Every linear map T:Rn→Rm is represented, in the canonical bases, by a unique m×n matrix M such that Th=Mh. The next theorem identifies this matrix for T=DF(a): its entries are precisely the partial derivatives of the components of F.
Theorem (Total Derivative is the Jacobian)
Let F:U→Rm, U⊆Rn, a∈int(U), and write F=(F1,…,Fm). If F is differentiable at a, then all partial derivatives ∂Fi/∂xj(a) exist, and DF(a) is represented in the canonical bases by the m×n Jacobian matrix

JF(a) = [ ∂F1/∂x1(a)  ⋯  ∂F1/∂xn(a) ]
        [     ⋮               ⋮      ]
        [ ∂Fm/∂x1(a)  ⋯  ∂Fm/∂xn(a) ].
In particular, differentiability implies the existence of every partial derivative of every component. The converse is not true in general.
Remark.
Intuition: The Jacobian is the matrix whose j-th column is the partial derivative ∂F/∂xj(a), viewed as a vector in Rm. This column records the change of F produced by a unit push along the j-th axis. Multiplying by a general direction vector h builds up the total linear response by linearly combining the axis-responses — which is exactly what a linear map does.
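The column description can be checked directly. In the following Python sketch, F(x,y) = (xy, x+y, x−y) is an illustrative map of our own choosing; each Jacobian column is approximated by a difference quotient along the corresponding axis and compared with the hand-computed column.

```python
def F(x, y):
    # illustrative map from R^2 to R^3 (our choice): F(x, y) = (x*y, x+y, x-y)
    return (x * y, x + y, x - y)

a = (2.0, 5.0)
h = 1e-6

def column(j):
    # j-th Jacobian column ~ (F(a + h*e_j) - F(a)) / h
    shifted = list(a)
    shifted[j] += h
    Fa, Fs = F(*a), F(*shifted)
    return [(Fs[i] - Fa[i]) / h for i in range(3)]

approx = [column(0), column(1)]
# hand-computed columns at a = (2, 5): (y, 1, 1) and (x, 1, -1)
exact = [[5.0, 1.0, 1.0], [2.0, 1.0, -1.0]]
print(approx)  # close to exact
```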
The example f(x,y)=xy/(x2+y2) (extended to 0 at the origin) from the previous section shows the converse is genuinely false: its partials exist at (0,0), but f is not even continuous there, so by the continuity theorem it cannot be differentiable.
Corollary (Component-wise Differentiability)
Let F:U→Rm, F=(F1,…,Fm), a∈int(U). Then DF(a) exists if and only if DFj(a) exists for every j∈{1,…,m}.
Remark.
Intuition: Differentiability of a vector-valued map is nothing more than simultaneous differentiability of its components. This is the vector-valued analogue of the fact that a sequence in Rm converges iff each of its component sequences does.
A Sufficient Condition: Continuous Partials
The example f(x,y)=xy/(x2+y2) shows that the existence of partials at a single point is too weak for differentiability. The following theorem gives a practical sufficient condition: if the partials exist in a neighbourhood and are continuous at the point, then the function is differentiable there.
Theorem (Continuous Partials Imply Differentiability)
Let F:U→Rm, U⊆Rn open, a∈U, and write F=(F1,…,Fm). Suppose there is r>0 such that for every j∈{1,…,m} and every i∈{1,…,n}, the partial derivative ∂Fj/∂xi(x) exists for every x∈Br(a), and the function x↦∂Fj/∂xi(x) is continuous at a. Then F is differentiable at a.
Remark.
Intuition: Partial derivatives alone do not know about what happens off the coordinate axes. But if they vary continuously, they control the function in a full neighbourhood, and a mean-value telescoping argument along the edges of a small box pieces together an honest total derivative. "Continuous partials" is by far the most common way to check differentiability in practice.
The Gradient
Definition (Gradient)
Let U⊆Rn be open and f:U→R have all partial derivatives on U. The gradient of f is the function ∇f:U→Rn defined by

(∇f)(x) = (∂f/∂x1(x), …, ∂f/∂xn(x)).
Equivalently, when f is differentiable at x, Df(x)h=∇f(x)⋅h.
Remark.
Intuition: The gradient is the vector field that records, at each point, the direction of steepest ascent of f, with magnitude equal to the rate of ascent in that direction. A function U→Rn with U⊆Rn is called a vector field, so the gradient of a scalar-valued function is always a vector field.
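The steepest-ascent description can be tested numerically. In this Python sketch (the function f(x,y) = x^2 + 3y^2 and the base point are our own choices), we sample directional derivatives over many unit directions and check that the largest rate is ∥∇f(a)∥, attained along ∇f(a)/∥∇f(a)∥.

```python
import math

def f(x, y):
    # sample function (our choice): f(x, y) = x^2 + 3y^2, grad f = (2x, 6y)
    return x * x + 3.0 * y * y

a = (1.0, 1.0)
grad = (2.0 * a[0], 6.0 * a[1])  # = (2, 6)

def directional(vx, vy, h=1e-6):
    # D_v f(a) via a difference quotient, for a unit vector v = (vx, vy)
    return (f(a[0] + h * vx, a[1] + h * vy) - f(*a)) / h

# Sample unit directions one degree apart; the largest rate occurs
# in the direction of grad f(a), with magnitude |grad f(a)|.
best = max((directional(math.cos(t), math.sin(t)), t)
           for t in [k * 2 * math.pi / 360 for k in range(360)])
norm = math.hypot(*grad)  # sqrt(40)
print(best[0], norm)      # nearly equal
```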
The Chain Rule
The chain rule — the most important computational tool of differential calculus — takes an especially clean form when the derivative is viewed as a linear map: the derivative of a composition is the composition of the derivatives.
Theorem (Chain Rule)
Let G:U→Rp with U⊆Rn open, and F:V→Rm with V⊆Rp open. Suppose G(U)⊆V, let a∈U, and set b=G(a). If G is differentiable at a and F is differentiable at b, then F∘G:U→Rm is differentiable at a, and
D(F∘G)(a)=DF(b)∘DG(a).
In matrix form, the Jacobian of the composition is the product of the Jacobians:
JF∘G(a)=JF(b)JG(a).
Remark.
Intuition: To first order, G near a acts like its linear approximation DG(a), sending a small displacement h to DG(a)h. Then F near b acts like DF(b). Composing linear approximations gives the linear approximation of the composition. The error terms are negligible because the operator norm tames them.
Corollary (Scalar Chain Rule)
If f:V→R is differentiable at G(a) and G:U→Rp is differentiable at a, then

∂(f∘G)/∂xj(a) = Σ_{k=1}^{p} ∂f/∂yk(G(a)) · ∂Gk/∂xj(a).
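The matrix form of the chain rule is easy to verify numerically. In this Python sketch, F and G are illustrative maps of our own choosing; we multiply the hand-computed Jacobians and compare the product with central-difference estimates of the composite's partials.

```python
def G(x, y):
    # inner map (our illustration): G(x, y) = (x + y, x*y)
    return (x + y, x * y)

def F(u, v):
    # outer map (our illustration): F(u, v) = (u*v, u - v)
    return (u * v, u - v)

a = (1.0, 2.0)
b = G(*a)  # = (3, 2)

# Hand-computed Jacobians and their product JF(b) JG(a)
JG = [[1.0, 1.0], [a[1], a[0]]]   # rows: d(x+y), d(xy)
JF = [[b[1], b[0]], [1.0, -1.0]]  # rows: d(uv), d(u-v)
product = [[sum(JF[i][k] * JG[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]

# Jacobian of the composite H = F o G by central differences
h = 1e-5
def H(x, y):
    return F(*G(x, y))

numeric = [[(H(a[0] + h * (j == 0), a[1] + h * (j == 1))[i]
             - H(a[0] - h * (j == 0), a[1] - h * (j == 1))[i]) / (2 * h)
            for j in range(2)] for i in range(2)]
print(product)  # [[8.0, 5.0], [-1.0, 0.0]]
print(numeric)  # agrees with the product
```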
Higher-Order Partial Derivatives
Once we have partial derivatives, we can differentiate again. Let U⊆Rn be open and f:U→R. If ∂f/∂xj exists on U and is itself a function from U to R, we can ask about its partial derivatives.
Definition (Second-Order Partial Derivative)
Let U⊆Rn be open and f:U→R. Assume ∂f/∂xj exists on a neighbourhood of a∈U. The second-order partial derivative of f with respect to xi and xj at a is

∂^2f/∂xi∂xj(a) = ∂/∂xi(∂f/∂xj)(a) = lim_{h→0} (∂f/∂xj(a + h e_i) − ∂f/∂xj(a))/h,

when the limit exists. Alternative notations include D_i D_j f(a) and f_{xj xi}(a) (note the reversed order in subscript notation: f_{xj xi} means "differentiate first with respect to xj, then with respect to xi").
Remark.
Intuition: A second-order partial measures the rate at which one partial changes as you move in a (possibly different) coordinate direction. The notation is a minor booby trap: ∂2f/∂xi∂xj means "first ∂/∂xj, then ∂/∂xi" — the operator nearest f is applied first.
Equality of Mixed Partials: Clairaut's Theorem
A natural question: does the order of differentiation matter? Is ∂2f/∂x∂y always equal to ∂2f/∂y∂x? In general, no — there is a classical counterexample. But under mild continuity hypotheses, yes: this is Clairaut's theorem (also called Schwarz's theorem).
Example (A function with unequal mixed partials)
Define f:R2→R by

f(x,y) = { xy(x^2 − y^2)/(x^2 + y^2),  (x,y) ≠ (0,0),
           0,                          (x,y) = (0,0).

For (x,y) ≠ (0,0) a direct computation gives

∂f/∂x(x,y) = y(x^4 + 4x^2y^2 − y^4)/(x^2 + y^2)^2,    ∂f/∂y(x,y) = x(x^4 − 4x^2y^2 − y^4)/(x^2 + y^2)^2.

At the origin, f(x,0)=0 for all x, so ∂f/∂x(0,0)=0, and likewise ∂f/∂y(0,0)=0. In particular, ∂f/∂x and ∂f/∂y are continuous on R2. Now compute along the axes:

∂f/∂x(0,y) = y·(−y^4)/y^4 = −y,    ∂f/∂y(x,0) = x·x^4/x^4 = x.

Therefore

∂^2f/∂y∂x(0,0) = d/dy|_{y=0} (−y) = −1,    ∂^2f/∂x∂y(0,0) = d/dx|_{x=0} x = +1.

So the mixed partials are both defined at the origin, but −1 ≠ +1: they are unequal. The obstruction is that ∂^2f/∂x∂y fails to be continuous at (0,0).
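The two unequal values can be reproduced numerically. In this Python sketch (the step sizes are our own choices), we approximate f_x and f_y by central differences and then difference once more at the origin.

```python
def f(x, y):
    # f = x*y*(x^2 - y^2)/(x^2 + y^2), extended by 0 at the origin
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

h = 1e-5

def fx(x, y):
    # partial in x by a central difference
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def fy(x, y):
    # partial in y by a central difference
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

k = 1e-3
mixed_yx = (fx(0.0, k) - fx(0.0, 0.0)) / k  # d/dy of f_x at the origin
mixed_xy = (fy(k, 0.0) - fy(0.0, 0.0)) / k  # d/dx of f_y at the origin
print(mixed_yx, mixed_xy)  # approximately -1 and +1
```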
The counterexample shows that continuity of the mixed partial is essential; the following lemma and theorem show that it is in fact enough.
Lemma (Two-Variable Equality of Mixed Partials)
Let f:D→R, D⊆R2, (a,b)∈int(D). Suppose there is δ>0 such that ∂f/∂x, ∂f/∂y, and ∂^2f/∂x∂y all exist on Bδ(a,b), and that the map (x,y)↦∂^2f/∂x∂y(x,y) is continuous at (a,b). Then ∂^2f/∂y∂x(a,b) exists and

∂^2f/∂y∂x(a,b) = ∂^2f/∂x∂y(a,b).
Remark.
Intuition: Consider the "double increment"
Δ(h,k):=f(a+h,b+k)−f(a+h,b)−f(a,b+k)+f(a,b).
This symmetric quantity can be rewritten two ways: first differencing in x and then in y, or vice versa. Dividing by hk and letting h,k→0 recovers either mixed partial. The Mean Value Theorem converts both differences into evaluations of fxy at nearby points, and continuity of fxy at (a,b) forces both limits to the same value.
Theorem (Clairaut's Theorem: Equality of Mixed Partials)
Let D⊆Rn, f:D→R, and a∈int(D). Suppose there is δ>0 such that ∂f/∂xi(x), ∂f/∂xj(x), and ∂^2f/∂xi∂xj(x) exist for every x∈Bδ(a), and the map x↦∂^2f/∂xi∂xj(x) is continuous at a. Then ∂^2f/∂xj∂xi(a) exists and

∂^2f/∂xi∂xj(a) = ∂^2f/∂xj∂xi(a).
Remark.
Intuition: The n-variable theorem reduces to the 2-variable lemma by freezing all coordinates except xi and xj. Once only two variables wiggle, the two-variable proof applies verbatim.
The Classes Ck
Definition (C^k Functions)
Let U⊆Rn be open. A function F:U→Rm is said to be of class C1 on U if all first-order partial derivatives of all component functions of F exist and are continuous on U. More generally, for k≥1, F is of class Ck on U if all partial derivatives of order ≤k of all components of F exist and are continuous on U. The function is of class C∞ (or smooth) if it is of class Ck for every k≥1.
Locally: F is of class C1 at a point a∈int(U) if there is an open ball around a on which every first-order partial of every component exists, and each such partial is continuous at a.
By convention, C0(U) denotes the continuous functions U→Rm.
Remark.
Intuition: Ck is a hierarchy of smoothness classes: C0 ⊃ C1 ⊃ C2 ⊃ ⋯ ⊃ C∞. The C1 class is the natural setting for differential calculus, since continuous partials imply differentiability. The C2 class is where Clairaut applies to every mixed pair simultaneously, so the Hessian matrix is symmetric.
Definition (Real-Analytic Functions)
Let U⊆Rn be open. A function f:U→R is real-analytic on U, written f∈Cω(U), if for every a∈U there exists r>0 such that for every x with ∥x−a∥<r, the Taylor series of f at a converges and equals f(x):

f(x) = Σ_{k=0}^{∞} (1/k!) Σ_{i1,…,ik=1}^{n} ∂^k f/∂x_{i1}⋯∂x_{ik}(a) · (x_{i1} − a_{i1})⋯(x_{ik} − a_{ik}).
Remark.
Intuition: Analytic functions are "locally polynomials of infinite degree." They form a strictly smaller class than C∞: the function f(x) = e^{−1/x^2} for x ≠ 0, f(0) = 0, is C∞ on R with all derivatives vanishing at the origin, so its Taylor series at 0 is identically zero — yet f ≢ 0. Hence f∈C∞∖Cω. The hierarchy is
C0⊋C1⊋C2⊋⋯⊋C∞⊋Cω.
Polynomials, exponentials, sines, and cosines are all analytic; so are compositions, sums, products, and (where nonzero in the denominator) quotients of analytic functions.
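The standard non-analytic example can be probed numerically. In the Python sketch below, f(x) = exp(−1/x^2) (with f(0) = 0) is positive away from 0, yet it is flatter at 0 than any power of x, which is why every Taylor coefficient at 0 vanishes.

```python
import math

def f(x):
    # f(x) = exp(-1/x^2) for x != 0, f(0) = 0: smooth but not analytic at 0
    return math.exp(-1.0 / (x * x)) if x != 0 else 0.0

# f is flatter than any monomial at 0: even f(h)/h^5 tends to 0 as h -> 0.
for h in (0.5, 0.2, 0.1):
    print(h, f(h) / h**5)

# Yet f is not the zero function, so it disagrees with its (zero) Taylor series.
print(f(0.1))  # positive
```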
Corollary (C^2 Implies Symmetric Mixed Partials)
If f:U→R is of class C2 on an open set U⊆Rn, then for all i,j∈{1,…,n} and all x∈U,

∂^2f/∂xi∂xj(x) = ∂^2f/∂xj∂xi(x).

In particular, for C2 functions the mixed partials do not depend on the order of differentiation.
Corollary (C^1 Implies Differentiable)
If F:U→Rm is of class C1 on an open set U, then F is differentiable at every x∈U.
The Mean Value Inequality for Vector-Valued Maps
In one variable, the Mean Value Theorem is an equality: f(b)−f(a) = f′(c)(b−a) for some c∈(a,b). For vector-valued maps this equality fails in general — there is no single point c where the derivative reproduces the increment. (Consider F(t) = (cos t, sin t) on [0,2π]: F(2π)−F(0) = 0, but DF(c) = (−sin c, cos c) ≠ 0 for every c.) What survives is an inequality bounding the increment by the supremum of the operator norm of the derivative along the segment.
Theorem (Mean Value Inequality; Higher-Dimensional MVT)
Let U⊆Rn be open and F:U→Rm be differentiable on U. Suppose the line segment

[p,q] := {p + t(q−p) : 0 ≤ t ≤ 1}

is contained in U. Then

∥F(q) − F(p)∥ ≤ M∥q − p∥,    where M = sup_{x∈[p,q]} ∥DF(x)∥_op.
Remark.
Intuition: You cannot expect to write the increment as an exact derivative-times-displacement because different components of F might be "going back and forth" at cross-purposes. But the total speed along the segment is at most M, and the distance covered is at most speed × time. The operator norm ∥DF(x)∥op is the largest stretch factor of the linear map DF(x).
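A numerical illustration in Python, using the circle example from above: for F(t) = (cos t, sin t) the operator norm of DF(t) is 1 everywhere, so the inequality says the chord ∥F(q)−F(p)∥ never exceeds |q−p|; on [0, 2π] the increment is actually zero, while DF never vanishes, showing no MVT equality is possible.

```python
import math

def F(t):
    # curve F(t) = (cos t, sin t); ||DF(t)||_op = ||(-sin t, cos t)|| = 1 for all t
    return (math.cos(t), math.sin(t))

def increment(p, q):
    # ||F(q) - F(p)||, the chord length
    Fp, Fq = F(p), F(q)
    return math.hypot(Fq[0] - Fp[0], Fq[1] - Fp[1])

M = 1.0  # sup of the operator norm along any segment

# The inequality ||F(q) - F(p)|| <= M |q - p| holds on every interval...
pairs = [(0.0, 0.5), (0.0, math.pi), (1.0, 4.0), (0.0, 2 * math.pi)]
print([increment(p, q) <= M * abs(q - p) + 1e-12 for p, q in pairs])

# ...and on [0, 2*pi] the increment vanishes even though DF(c) never does:
print(increment(0.0, 2 * math.pi))  # 0 up to rounding
```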
Corollary (Vanishing Derivative on a Convex Set)
Let U⊆Rn be open and convex, and F:U→Rm differentiable on U with DF(x)=0 for every x∈U. Then F is constant on U.
With partial derivatives, total derivatives, the chain rule, and a working notion of smoothness classes in hand, we are ready to tackle one of the deepest local-to-global results of differential calculus: the implicit function theorem, which tells us when an equation F(x,y)=0 can be solved locally for y in terms of x.