ACE 328/Chapter 11

Taylor Expansion and Extrema

Taylor expansions for functions of several variables, with integral and Lagrange remainders. Critical points, the Hessian matrix, and the second derivative test for classifying extrema.

In this chapter we develop the higher-order differential calculus of scalar-valued functions of several real variables. After recalling the partial and total derivatives, we prove Clairaut's theorem on the equality of mixed partials, derive the multivariable Taylor expansion with both integral and Lagrange remainders, and use this expansion to analyze critical points via the Hessian matrix. The culmination is the second derivative test, which classifies nondegenerate critical points as local minima, local maxima, or saddle points via the spectral theorem for symmetric matrices.


Partial Derivatives and the Total Derivative

We briefly recall the two notions of derivative for a function of several variables.

Definition (Partial Derivative).

Let $U \subseteq \mathbb{R}^n$ be open, $f : U \to \mathbb{R}$, and $\vec{a} \in U$. For $j \in \{1, \dots, n\}$, the $j$-th partial derivative of $f$ at $\vec{a}$ is $$\frac{\partial f}{\partial x_j}(\vec{a}) := \lim_{t \to 0} \frac{f(\vec{a} + t \vec{e}_j) - f(\vec{a})}{t},$$ whenever this limit exists, where $\vec{e}_j$ is the $j$-th standard basis vector.

Remark.

Intuition: The partial derivative $\partial f / \partial x_j$ measures the rate of change of $f$ in the direction of the $j$-th coordinate axis. All other variables are held fixed, so this reduces to a single-variable derivative.
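
The defining limit is easy to probe numerically. Below is a minimal sketch using a symmetric difference quotient; the function $f(x, y) = x^2 y$ and the point are illustrative choices for the demo, not taken from the text.

```python
# A central (symmetric) difference quotient approximating the j-th
# partial derivative. The sample f(x, y) = x^2 * y and the point a
# are illustrative choices, not from the text.

def partial_derivative(f, a, j, t=1e-6):
    """Approximate (df/dx_j)(a) by a symmetric difference quotient."""
    ap, am = list(a), list(a)
    ap[j] += t
    am[j] -= t
    return (f(ap) - f(am)) / (2 * t)

f = lambda v: v[0] ** 2 * v[1]      # df/dx = 2xy, df/dy = x^2, by hand
a = [1.0, 3.0]
print(partial_derivative(f, a, 0))  # close to 2*1*3 = 6
print(partial_derivative(f, a, 1))  # close to 1^2 = 1
```

The symmetric quotient converges at rate $O(t^2)$ as $t \to 0$, which is why it is usually preferred over the one-sided quotient in numerical work.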

Definition (Differentiability and the Total Derivative).

Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$. We say $f$ is differentiable at $\vec{a} \in U$ if there exists a linear map $Df(\vec{a}) : \mathbb{R}^n \to \mathbb{R}$ such that $$\lim_{\vec{h} \to \vec{0}} \frac{|f(\vec{a} + \vec{h}) - f(\vec{a}) - Df(\vec{a})(\vec{h})|}{\|\vec{h}\|} = 0.$$ The linear map $Df(\vec{a})$ is called the total derivative (or differential) of $f$ at $\vec{a}$.

Remark.

Intuition: Differentiability is a stronger notion than the existence of partial derivatives. It asks for a single linear map which approximates $f$ well in every direction simultaneously. When $f$ is differentiable, $Df(\vec{a})$ is represented by the gradient: $$Df(\vec{a})(\vec{h}) = \nabla f(\vec{a}) \cdot \vec{h} = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}(\vec{a})\, h_j.$$
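
The defining limit can also be observed numerically: for a differentiable $f$, the error of the linear approximation divided by $\|\vec{h}\|$ shrinks with $\vec{h}$. The function, point, and direction below are illustrative assumptions for the demonstration.

```python
import math

# For a differentiable f, |f(a + h) - f(a) - grad f(a) . h| / ||h|| -> 0.
# Illustrative f(x, y) = x*y + y^2 with gradient (y, x + 2y), by hand.

def f(x, y):
    return x * y + y ** 2

a = (1.0, 2.0)
gx, gy = a[1], a[0] + 2 * a[1]      # gradient of f at a

ratios = []
for s in (1e-1, 1e-2, 1e-3):
    hx, hy = s, s                   # one fixed direction, scaled down
    err = abs(f(a[0] + hx, a[1] + hy) - f(*a) - (gx * hx + gy * hy))
    ratios.append(err / math.hypot(hx, hy))
print(ratios)  # each ratio roughly 10x smaller than the last
```

For this quadratic $f$ the error is exactly the second-order term, so the ratios scale linearly in $s$, exactly as the definition demands.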

Theorem (Continuously Differentiable Implies Differentiable).

If all partial derivatives of $f : U \to \mathbb{R}$ exist and are continuous on $U$ (that is, $f \in C^1(U)$), then $f$ is differentiable at every point of $U$.


Higher Order Partial Derivatives

We now iterate the process of taking partial derivatives.

Definition (Higher Order Partial Derivatives).

Let $f : U \to \mathbb{R}$ with $U \subseteq \mathbb{R}^n$ open. If $\partial f / \partial x_j$ exists on a neighbourhood of $\vec{a}$ and is itself differentiable in direction $x_i$ at $\vec{a}$, we define the second order partial derivative $$\frac{\partial^2 f}{\partial x_i \, \partial x_j}(\vec{a}) := \frac{\partial}{\partial x_i}\left( \frac{\partial f}{\partial x_j}\right)(\vec{a}).$$ By iteration, for a multi-index $(k_1, \dots, k_n)$ with $k_1 + \cdots + k_n = k$ we define $$\frac{\partial^k f}{\partial x_1^{k_1} \cdots \partial x_n^{k_n}}(\vec{a}).$$ We say $f$ is of class $C^r$ on $U$ (written $f \in C^r(U)$) if all partial derivatives of $f$ up to order $r$ exist and are continuous on $U$.

Remark.

Intuition: Class $C^r$ functions have $r$ continuous derivatives in every mix of directions. The notation $\partial^2 f / \partial x_i \, \partial x_j$ means "first differentiate with respect to $x_j$, then with respect to $x_i$." A priori, this is different from reversing the order. Clairaut's theorem tells us that sufficient smoothness makes the order irrelevant.

Theorem (Clairaut: Equality of Mixed Partials).

Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$. Suppose $\partial f / \partial x_i$, $\partial f / \partial x_j$, $\partial^2 f / \partial x_i \, \partial x_j$, and $\partial^2 f / \partial x_j \, \partial x_i$ all exist and are continuous on $U$. Then $$\frac{\partial^2 f}{\partial x_i \, \partial x_j}(\vec{a}) = \frac{\partial^2 f}{\partial x_j \, \partial x_i}(\vec{a}) \qquad \text{for every } \vec{a} \in U.$$ In particular, if $f \in C^2(U)$, mixed partials commute.

Remark.

Intuition: Differentiation with respect to different variables can be performed in any order, provided the second derivatives are continuous. This turns the Hessian matrix into a symmetric matrix, which will be crucial for the spectral argument in the second derivative test.
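
Clairaut's theorem can be checked numerically: a four-point finite-difference estimate of the mixed partial agrees with the analytic derivative taken in the opposite order. The function $f$ below is an illustrative assumption, not from the text.

```python
import math

# d^2 f / dx dy estimated by a four-point central difference, compared
# with d^2 f / dy dx computed analytically (by hand) in the opposite
# order. Illustrative smooth f, not from the text.

def f(x, y):
    return math.sin(x * y) + x ** 3 * y

def mixed_partial(f, x, y, t=1e-4):
    return (f(x + t, y + t) - f(x + t, y - t)
            - f(x - t, y + t) + f(x - t, y - t)) / (4 * t * t)

x, y = 0.5, 1.2
# First d/dy gives x*cos(x*y) + x^3; then d/dx of that:
exact_yx = math.cos(x * y) - x * y * math.sin(x * y) + 3 * x ** 2
print(mixed_partial(f, x, y), exact_yx)  # the two agree closely
```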


Multivariable Taylor Expansion

We first recall the single-variable Taylor theorem, then lift it to several variables by restricting ff to a line.

Theorem (Single-Variable Taylor Theorem with Integral Remainder).

Let $I \subseteq \mathbb{R}$ be open and $f : I \to \mathbb{R}$ of class $C^{r+1}$ for some $r \geq 0$. Let $a \in I$ and $h \in \mathbb{R}$ with $a + h \in I$. Then $$f(a+h) = \sum_{j=0}^{r} \frac{f^{(j)}(a)}{j!} h^{j} + \widetilde R_{a, r}(h),$$ where the integral remainder is $$\widetilde R_{a, r}(h) = \frac{h^{r+1}}{r!} \int_{0}^{1} f^{(r+1)}(a + th)(1 - t)^{r} \, dt.$$

Remark.

Intuition: The Taylor polynomial approximates $f$ at $a$ to order $r$, and the remainder records the error. The integral form of the remainder has a clean inductive proof by repeated integration by parts, and it is the easiest form to lift to several variables.
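
The integral-remainder formula can be verified numerically for a concrete function; here $f = \exp$ at $a = 0$, with a plain midpoint rule for the integral. This is an illustrative sketch, not part of the text.

```python
import math

# Degree-r Taylor polynomial of exp at a = 0, plus the integral remainder
# R(h) = h^(r+1)/r! * int_0^1 exp(t*h) * (1 - t)^r dt, evaluated by a
# midpoint rule. Their sum should reproduce exp(h).

def taylor_poly(h, r):
    return sum(h ** j / math.factorial(j) for j in range(r + 1))

def integral_remainder(h, r, steps=10_000):
    total = sum(math.exp((i + 0.5) / steps * h) * (1 - (i + 0.5) / steps) ** r
                for i in range(steps)) / steps
    return h ** (r + 1) / math.factorial(r) * total

h, r = 0.7, 3
approx = taylor_poly(h, r) + integral_remainder(h, r)
print(approx, math.exp(h))  # agree to many digits
```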

We now use this to expand a multivariable function by restricting to a line.

Theorem (Multivariable Taylor Theorem).

Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$ of class $C^{r+1}$. Let $\vec{a} \in U$ and $\vec{h} \in \mathbb{R}^n$ be small enough that $\vec{a} + t\vec{h} \in U$ for all $t \in [-1, 1]$. Then $$f(\vec{a} + \vec{h}) = \sum_{\ell = 0}^{r} \frac{1}{\ell!} \left[\mathcal{L}_{\vec{h}}^{\ell} f\right](\vec{a}) + R_{\vec{a}, r}(\vec{h}),$$ where $\mathcal{L}_{\vec{h}} := \sum_{i=1}^n h_i \, \partial/\partial x_i$ is the directional differential operator, and $$R_{\vec{a}, r}(\vec{h}) = \frac{1}{r!} \int_0^1 \left[\mathcal{L}_{\vec{h}}^{r+1} f\right](\vec{a} + t\vec{h})(1 - t)^{r}\, dt.$$

Remark.

Intuition: The multivariable Taylor polynomial is obtained by applying the "total directional derivative" operator $\mathcal{L}_{\vec{h}}$ repeatedly. Each application of $\mathcal{L}_{\vec{h}}$ brings down a factor of $\vec{h}$ by the chain rule, producing all mixed partials weighted by appropriate products of the components of $\vec{h}$.

Second-Order Expansion and the Hessian

The case $r = 2$ (so $f \in C^3$) is the one we need for the second derivative test. Expanding $\mathcal{L}_{\vec{h}} f = \sum_i h_i \, \partial f/\partial x_i$ and $\mathcal{L}_{\vec{h}}^{2} f = \sum_{i, j} h_i h_j \, \partial^2 f/\partial x_i \, \partial x_j$, we obtain:

Corollary (Taylor Expansion to Second Order).

Let $f : U \to \mathbb{R}$ be of class $C^3$ and $\vec{a} \in U$. Then for small $\vec{h}$, $$f(\vec{a} + \vec{h}) = f(\vec{a}) + \nabla f(\vec{a}) \cdot \vec{h} + \tfrac{1}{2}\, \vec{h}^{\,T} H_f(\vec{a})\, \vec{h} + R_{\vec{a}, 2}(\vec{h}),$$ where $$H_f(\vec{a}) := \left(\frac{\partial^2 f}{\partial x_i \, \partial x_j}(\vec{a})\right)_{i, j = 1, \dots, n}$$ is the Hessian matrix of $f$ at $\vec{a}$.

Remark.

Intuition: The Hessian plays the role of the second derivative for a scalar-valued function of several variables. Because $f \in C^3 \subseteq C^2$, Clairaut's theorem makes $H_f(\vec{a})$ symmetric. The quadratic form $\vec{h} \mapsto \tfrac{1}{2}\vec{h}^{\,T} H_f(\vec{a})\, \vec{h}$ captures the leading-order curvature of $f$ at $\vec{a}$.
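
The corollary predicts that the quadratic model $f(\vec{a}) + \nabla f(\vec{a}) \cdot \vec{h} + \tfrac{1}{2}\vec{h}^{\,T} H_f(\vec{a})\, \vec{h}$ tracks $f$ up to a cubic error. A sketch with an illustrative polynomial, whose gradient and Hessian were computed by hand (not from the text):

```python
# Quadratic Taylor model vs. the true value, for f(x, y) = x^2 * y + y^2
# at a = (1, 2). Gradient (2xy, x^2 + 2y) and Hessian [[2y, 2x], [2x, 2]]
# were computed by hand for this illustrative f.

def f(x, y):
    return x ** 2 * y + y ** 2

def quadratic_model(a, h):
    x, y = a
    gx, gy = 2 * x * y, x ** 2 + 2 * y
    fxx, fxy, fyy = 2 * y, 2 * x, 2.0
    hx, hy = h
    return (f(x, y) + gx * hx + gy * hy
            + 0.5 * (fxx * hx * hx + 2 * fxy * hx * hy + fyy * hy * hy))

a, h = (1.0, 2.0), (0.01, -0.02)
true_val = f(a[0] + h[0], a[1] + h[1])
print(true_val, quadratic_model(a, h))  # differ only by the cubic term hx^2 * hy
```

Since this $f$ is a cubic polynomial, the discrepancy is exactly the single third-order monomial $h_x^2 h_y$, about $2 \cdot 10^{-6}$ here.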

Bound on the Remainder

Theorem (Lagrange-Type Bound on the Second-Order Remainder).

Suppose $f \in C^3(U)$ and all third order partial derivatives of $f$ are bounded in absolute value by $M$ on a neighbourhood of $\vec{a}$. Then for $\vec{h}$ sufficiently small, $$|R_{\vec{a}, 2}(\vec{h})| \leq \frac{n^{3} M}{6} \|\vec{h}\|_{\infty}^{3},$$ where $\|\vec{h}\|_{\infty} = \max\{|h_1|, \dots, |h_n|\}$.

Example (Taylor Expansion of a Polynomial at a Point).

Let $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) = xy^2 + 2xy$. We compute the Taylor expansion of $f$ to second order about $(1, -1)$.

The partial derivatives are: $$\frac{\partial f}{\partial x}(x, y) = y^2 + 2y, \quad \frac{\partial f}{\partial y}(x, y) = 2xy + 2x, \quad \frac{\partial^2 f}{\partial x^2} = 0, \quad \frac{\partial^2 f}{\partial x \, \partial y} = 2y + 2, \quad \frac{\partial^2 f}{\partial y^2} = 2x.$$ Evaluating at $(1, -1)$: $$\nabla f(1, -1) = \begin{pmatrix} -1 \\ 0 \end{pmatrix}, \qquad H_f(1, -1) = \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}.$$ Noting $f(1, -1) = 1 \cdot 1 + 2 \cdot 1 \cdot (-1) = -1$, the second-order Taylor expansion reads $$f(1 + h, -1 + k) = -1 - h + k^{2} + R_{(1,-1), 2}(h, k).$$ Since $f$ is a polynomial, we may compute the remainder exactly: $$f(1 + h, -1 + k) = (1 + h)(-1 + k)^2 + 2(1 + h)(-1 + k) = -1 - h + k^2 + h k^2,$$ so $R_{(1,-1), 2}(h, k) = h k^{2}$, which is a cubic monomial and hence not part of the second-order Taylor polynomial.

For an upper bound: all third-order partials of $f$ vanish except $\partial^3 f/\partial x \, \partial y^2 = 2$. So $M = 2$, and with $n = 2$, $$|R_{(1, -1), 2}(h, k)| \leq \frac{2^3 \cdot 2}{6} (\max\{|h|, |k|\})^3 = \frac{8}{3}(\max\{|h|, |k|\})^3.$$

Numerical consequence. If $|h|, |k| \leq \tfrac{1}{10}$, then without computing the remainder explicitly we know $$|R_{(1, -1), 2}(h, k)| \leq \frac{8}{3} \cdot \frac{1}{10^3} = 0.002\overline{6},$$ so the true value of $f(1 + h, -1 + k)$ differs from the Taylor approximation $-1 - h + k^{2}$ by at most $0.003$. This is the practical payoff of the Lagrange bound: we can guarantee a quantitative error bound for the approximation without ever computing the error explicitly.
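
Since the exact remainder $R(h, k) = h k^2$ is known here, the bound can be stress-tested by brute force on a grid over $[-\tfrac1{10}, \tfrac1{10}]^2$; the grid size is an arbitrary choice for the check.

```python
# Grid check of the Lagrange-type bound for the example above:
# |h * k^2| <= (8/3) * max(|h|, |k|)^3 on [-0.1, 0.1]^2.

steps = 41
bound_ok, worst_ratio = True, 0.0
for i in range(steps):
    for j in range(steps):
        h = -0.1 + 0.2 * i / (steps - 1)
        k = -0.1 + 0.2 * j / (steps - 1)
        if h == 0.0 and k == 0.0:
            continue                      # bound is 0 = remainder there
        remainder = abs(h * k ** 2)
        bound = (8 / 3) * max(abs(h), abs(k)) ** 3
        bound_ok = bound_ok and remainder <= bound
        worst_ratio = max(worst_ratio, remainder / bound)
print(bound_ok, worst_ratio)  # True 0.375
```

The worst observed ratio is $3/8$ (attained at $|h| = |k|$), so for this polynomial the bound holds with a comfortable margin; the theorem trades tightness for generality.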


Relative Extrema and Critical Points

Definition (Relative Extrema and Saddle Points).

Let $U \subseteq \mathbb{R}^n$ be open, $f : U \to \mathbb{R}$, and $\vec{a} \in U$.

  • $f$ has a relative (local) minimum at $\vec{a}$ iff $f(\vec{x}) \geq f(\vec{a})$ for all $\vec{x}$ in some neighbourhood of $\vec{a}$.
  • $f$ has a relative (local) maximum at $\vec{a}$ iff $f(\vec{x}) \leq f(\vec{a})$ for all $\vec{x}$ in some neighbourhood of $\vec{a}$.
  • $f$ has a saddle point at $\vec{a}$ iff in every neighbourhood of $\vec{a}$ there exist $\vec{x}$ and $\vec{z}$ with $f(\vec{z}) < f(\vec{a}) < f(\vec{x})$.
  • $f$ has a relative extremum at $\vec{a}$ iff it has a relative minimum or maximum at $\vec{a}$.
Remark.

Intuition: The first two cases extend the familiar local min/max from calculus. A saddle point sits in between: along some directions $f$ increases away from $\vec{a}$, along others it decreases. Saddles are the genuinely higher-dimensional phenomenon.

We first recall the single-variable version, since the multivariable result reduces to it along each coordinate direction.

Theorem (Single-Variable Fermat Theorem).

Let $f : (a, b) \to \mathbb{R}$ and suppose $f$ has either a relative maximum or a relative minimum at $c \in (a, b)$. If $f'(c)$ exists, then $f'(c) = 0$.

Theorem (Vanishing Gradient at Interior Extrema).

Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$. If $f$ has a relative extremum at $\vec{a} \in U$ and $f$ is differentiable at $\vec{a}$, then $\nabla f(\vec{a}) = \vec{0}$.

Remark.

Intuition: This is the natural multivariable Fermat theorem. If we can approach $\vec{a}$ from every direction and $f$ has an extremum there, then the directional derivative must vanish in every direction; equivalently, the gradient vanishes.

Definition (Critical Point).

A point $\vec{a} \in U$ is a critical point of $f : U \to \mathbb{R}$ iff $\nabla f(\vec{a}) = \vec{0}$.

Remark.

Intuition: Critical points are the candidates for local extrema and saddle points. The vanishing gradient condition is necessary but not sufficient: not every critical point is an extremum. Already in one variable, $f(x) = x^3$ has $f'(0) = 0$ but neither a minimum nor a maximum at $0$.


The Hessian and Definiteness

Definition (Positive/Negative (Semi-)Definite, Indefinite).

Let $A$ be a symmetric $n \times n$ real matrix. We say $A$ is

  • positive definite iff $\vec{h}^{\,T} A \vec{h} > 0$ for all $\vec{h} \neq \vec{0}$;
  • negative definite iff $\vec{h}^{\,T} A \vec{h} < 0$ for all $\vec{h} \neq \vec{0}$;
  • positive semi-definite iff $\vec{h}^{\,T} A \vec{h} \geq 0$ for all $\vec{h}$;
  • negative semi-definite iff $\vec{h}^{\,T} A \vec{h} \leq 0$ for all $\vec{h}$;
  • indefinite iff there exist $\vec{h}_+, \vec{h}_- \in \mathbb{R}^n$ with $\vec{h}_+^{\,T} A \vec{h}_+ > 0$ and $\vec{h}_-^{\,T} A \vec{h}_- < 0$.
Remark.

Intuition: The sign of the quadratic form $\vec{h} \mapsto \vec{h}^{\,T} A \vec{h}$ encodes how curvature bends. Positive definite matrices curve upward in every direction, negative definite curve downward. Indefinite matrices curve up in some directions and down in others: the hallmark of a saddle.

Theorem (Definiteness via Eigenvalues).

Let $A$ be a symmetric $n \times n$ real matrix with eigenvalues $\lambda_1, \dots, \lambda_n$. Then

  • $A$ is positive definite iff $\lambda_j > 0$ for all $j$;
  • $A$ is negative definite iff $\lambda_j < 0$ for all $j$;
  • $A$ is indefinite iff $A$ has both a positive and a negative eigenvalue;
  • $A$ is nonsingular iff no eigenvalue equals $0$, i.e. iff $\det A \neq 0$.
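
The eigenvalue criterion translates directly into code. A sketch using NumPy's symmetric eigenvalue solver; the helper name and the zero tolerance are my own choices.

```python
import numpy as np

# Classify a symmetric matrix by the signs of its eigenvalues, following
# the theorem above. eigvalsh is NumPy's solver for symmetric matrices.

def classify(A, tol=1e-12):
    lam = np.linalg.eigvalsh(np.asarray(A, dtype=float))
    if np.any(np.abs(lam) <= tol):
        return "singular"            # some eigenvalue is (numerically) zero
    if np.all(lam > 0):
        return "positive definite"
    if np.all(lam < 0):
        return "negative definite"
    return "indefinite"

print(classify([[2, 0], [0, 3]]))   # positive definite
print(classify([[0, 1], [1, 0]]))   # indefinite (eigenvalues -1 and 1)
print(classify([[1, 0], [0, 0]]))   # singular
```

Note the tolerance: in floating point, "eigenvalue equals $0$" must be read as "smaller than roundoff", which is exactly the degenerate case where the upcoming test is inconclusive.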

The Second Derivative Test

We can now state and prove the multivariable second derivative test, which is the main application of the Taylor expansion.

Theorem (Second Derivative Test).

Let $U \subseteq \mathbb{R}^n$ be open and $f \in C^3(U)$. Let $\vec{a} \in U$ be a critical point of $f$ (so $\nabla f(\vec{a}) = \vec{0}$) and assume $H_f(\vec{a})$ is nonsingular. Then:

(i) If $H_f(\vec{a})$ is positive definite, $f$ has a relative minimum at $\vec{a}$.

(ii) If $H_f(\vec{a})$ is negative definite, $f$ has a relative maximum at $\vec{a}$.

(iii) If $H_f(\vec{a})$ is indefinite, $f$ has a saddle point at $\vec{a}$.

Remark.

Intuition: Because $\nabla f(\vec{a}) = \vec{0}$, the leading order behaviour of $f$ near $\vec{a}$ is controlled by the quadratic form $\tfrac{1}{2} \vec{h}^{\,T} H_f(\vec{a})\, \vec{h}$. The remainder is of order $\|\vec{h}\|^3$ and becomes negligible for small $\vec{h}$. The definiteness of the Hessian therefore determines whether $f$ curves up, down, or in different directions, locally.

Remark.

Intuition: When $\det H_f(\vec{a}) = 0$, the test is inconclusive: higher-order terms in the Taylor expansion are needed to decide the nature of the critical point. Examples such as $f(x, y) = x^4 + y^4$, $f(x, y) = -x^4 - y^4$, and $f(x, y) = x^4 - y^4$ at the origin show that a critical point with zero Hessian determinant may be a min, max, or saddle.

Remark.

Computational note. Checking whether $\det H_f(\vec{a}) \neq 0$ and computing the eigenvalues (or equivalently the signs of the leading principal minors, by Sylvester's criterion) can be done quite easily numerically provided $n$ is not too large. In two variables, with $H_f = \begin{pmatrix} A & B \\ B & C \end{pmatrix}$, the test simplifies to:

  • $\det H_f = AC - B^2 > 0$ and $A > 0$: local minimum.
  • $\det H_f = AC - B^2 > 0$ and $A < 0$: local maximum.
  • $\det H_f < 0$: saddle.
  • $\det H_f = 0$: inconclusive.
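
The four cases above are small enough to implement verbatim; the function name below is my own choice.

```python
# The 2x2 second derivative test, transcribed from the bullet list above:
# classify a critical point from A = f_xx, B = f_xy, C = f_yy at the point.

def second_derivative_test_2d(A, B, C):
    det = A * C - B * B
    if det > 0 and A > 0:
        return "local minimum"
    if det > 0 and A < 0:
        return "local maximum"
    if det < 0:
        return "saddle"
    return "inconclusive"           # det == 0

print(second_derivative_test_2d(2, 0, 3))   # local minimum
print(second_derivative_test_2d(-4, 0, 2))  # saddle  (det = -8)
print(second_derivative_test_2d(1, 1, 1))   # inconclusive  (det = 0)
```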

A Worked Example

Example (Classifying Critical Points of a Polynomial).

Classify all critical points of the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $$f(x, y) = x^4 + y^3 - 2 x^2 + y^2.$$

Step 1: Find the critical points. We have $$\frac{\partial f}{\partial x} = 4 x^3 - 4 x = 4 x (x^2 - 1), \qquad \frac{\partial f}{\partial y} = 3 y^2 + 2 y = y(3 y + 2).$$ Setting both to zero gives $x \in \{-1, 0, 1\}$ and $y \in \{-\tfrac{2}{3}, 0\}$, so the six critical points are $$(0, 0), \quad (0, -\tfrac{2}{3}), \quad (1, 0), \quad (1, -\tfrac{2}{3}), \quad (-1, 0), \quad (-1, -\tfrac{2}{3}).$$

Step 2: Compute the Hessian. We have $$\frac{\partial^2 f}{\partial x^2} = 12 x^2 - 4, \quad \frac{\partial^2 f}{\partial x \, \partial y} = 0, \quad \frac{\partial^2 f}{\partial y^2} = 6 y + 2.$$ Thus $H_f(x, y) = \begin{pmatrix} 12 x^2 - 4 & 0 \\ 0 & 6 y + 2 \end{pmatrix}$, which is diagonal, so its eigenvalues are the diagonal entries.

Step 3: Classify each point.

  • At $(0, 0)$: $H_f = \operatorname{diag}(-4, 2)$, indefinite, so $(0, 0)$ is a saddle point.
  • At $(0, -\tfrac{2}{3})$: $H_f = \operatorname{diag}(-4, -2)$, negative definite, so it is a relative maximum.
  • At $(\pm 1, 0)$: $H_f = \operatorname{diag}(8, 2)$, positive definite, so these are relative minima.
  • At $(\pm 1, -\tfrac{2}{3})$: $H_f = \operatorname{diag}(8, -2)$, indefinite, so these are saddle points.
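
The classification can be cross-checked by machine: at each of the six points the gradient vanishes and the Hessian eigenvalue signs reproduce the verdicts above. A sketch using NumPy, with the gradient and Hessian transcribed from Steps 1 and 2:

```python
import numpy as np

# Cross-check of the worked example: gradient and Hessian of
# f(x, y) = x^4 + y^3 - 2x^2 + y^2, transcribed from Steps 1 and 2.

def grad(x, y):
    return np.array([4 * x ** 3 - 4 * x, 3 * y ** 2 + 2 * y])

def hessian(x, y):
    return np.array([[12 * x ** 2 - 4, 0.0], [0.0, 6 * y + 2]])

expected = {
    (0.0, 0.0): "saddle",
    (0.0, -2 / 3): "relative maximum",
    (1.0, 0.0): "relative minimum",
    (-1.0, 0.0): "relative minimum",
    (1.0, -2 / 3): "saddle",
    (-1.0, -2 / 3): "saddle",
}

results = {}
for (x, y), kind in expected.items():
    assert np.allclose(grad(x, y), 0.0)          # really a critical point
    lam = np.linalg.eigvalsh(hessian(x, y))
    if np.all(lam > 0):
        results[(x, y)] = "relative minimum"
    elif np.all(lam < 0):
        results[(x, y)] = "relative maximum"
    else:
        results[(x, y)] = "saddle"
print(results == expected)  # True
```
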
Remark.

Intuition: In low dimensions the Hessian is diagonal or $2 \times 2$, and one can often see the classification at a glance. In higher dimensions one computes eigenvalues (or equivalently the principal minors) numerically. The key fact is that the second derivative test reduces the multivariable problem to an eigenvalue problem for a symmetric matrix.