ACE 328/Chapter 11

Taylor Expansion and Extrema

Taylor expansions for functions of several variables, with integral and Lagrange remainders. Critical points, the Hessian matrix, and the second derivative test for classifying extrema.

In this chapter we develop the higher-order differential calculus of scalar-valued functions of several real variables. After recalling the partial and total derivatives, we prove Clairaut's theorem on the equality of mixed partials, derive the multivariable Taylor expansion with both integral and Lagrange remainders, and use this expansion to analyze critical points via the Hessian matrix. The culmination is the second derivative test, which classifies nondegenerate critical points as local minima, local maxima, or saddle points via the spectral theorem for symmetric matrices.


Partial Derivatives and the Total Derivative

We briefly recall the two notions of derivative for a function of several variables.

Definition (Partial Derivative).

Let $U \subseteq \mathbb{R}^n$ be open, $f : U \to \mathbb{R}$, and $\vec{a} \in U$. For $j \in \{1, \dots, n\}$, the $j$-th partial derivative of $f$ at $\vec{a}$ is $$\frac{\partial f}{\partial x_j}(\vec{a}) := \lim_{t \to 0} \frac{f(\vec{a} + t \vec{e}_j) - f(\vec{a})}{t},$$ whenever this limit exists, where $\vec{e}_j$ is the $j$-th standard basis vector.

Remark.

Intuition: The partial derivative $\partial f / \partial x_j$ measures the rate of change of $f$ in the direction of the $j$-th coordinate axis. All other variables are held fixed, so this reduces to a single-variable derivative.
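
The defining limit is easy to probe numerically. Below is a minimal sketch using a symmetric difference quotient; the function $f(x, y) = x^2 y$ and the point are illustrative choices for the demo, not taken from the text.

```python
# A central (symmetric) difference quotient approximating the j-th
# partial derivative. The sample f(x, y) = x^2 * y and the point a
# are illustrative choices, not from the text.

def partial_derivative(f, a, j, t=1e-6):
    """Approximate (df/dx_j)(a) by a symmetric difference quotient."""
    ap, am = list(a), list(a)
    ap[j] += t
    am[j] -= t
    return (f(ap) - f(am)) / (2 * t)

f = lambda v: v[0] ** 2 * v[1]      # df/dx = 2xy, df/dy = x^2, by hand
a = [1.0, 3.0]
print(partial_derivative(f, a, 0))  # close to 2*1*3 = 6
print(partial_derivative(f, a, 1))  # close to 1^2 = 1
```

The symmetric quotient converges at rate $O(t^2)$ as $t \to 0$, which is why it is usually preferred over the one-sided quotient in numerical work.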

Definition (Differentiability and the Total Derivative).

Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$. We say $f$ is differentiable at $\vec{a} \in U$ if there exists a linear map $Df(\vec{a}) : \mathbb{R}^n \to \mathbb{R}$ such that $$\lim_{\vec{h} \to \vec{0}} \frac{|f(\vec{a} + \vec{h}) - f(\vec{a}) - Df(\vec{a})(\vec{h})|}{\|\vec{h}\|} = 0.$$ The linear map $Df(\vec{a})$ is called the total derivative (or differential) of $f$ at $\vec{a}$.

Remark.

Intuition: Differentiability is a stronger notion than the existence of partial derivatives. It asks for a single linear map which approximates $f$ well in every direction simultaneously. When $f$ is differentiable, $Df(\vec{a})$ is represented by the gradient: $$Df(\vec{a})(\vec{h}) = \nabla f(\vec{a}) \cdot \vec{h} = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}(\vec{a})\, h_j.$$
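
The defining limit can also be observed numerically: for a differentiable $f$, the error of the linear approximation divided by $\|\vec{h}\|$ shrinks with $\vec{h}$. The function, point, and direction below are illustrative assumptions for the demonstration.

```python
import math

# For a differentiable f, |f(a + h) - f(a) - grad f(a) . h| / ||h|| -> 0.
# Illustrative f(x, y) = x*y + y^2 with gradient (y, x + 2y), by hand.

def f(x, y):
    return x * y + y ** 2

a = (1.0, 2.0)
gx, gy = a[1], a[0] + 2 * a[1]      # gradient of f at a

ratios = []
for s in (1e-1, 1e-2, 1e-3):
    hx, hy = s, s                   # one fixed direction, scaled down
    err = abs(f(a[0] + hx, a[1] + hy) - f(*a) - (gx * hx + gy * hy))
    ratios.append(err / math.hypot(hx, hy))
print(ratios)  # each ratio roughly 10x smaller than the last
```

For this quadratic $f$ the error is exactly the second-order term, so the ratios scale linearly in $s$, exactly as the definition demands.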

Theorem (Continuously Differentiable Implies Differentiable).

If all partial derivatives of $f : U \to \mathbb{R}$ exist and are continuous on $U$ (that is, $f \in C^1(U)$), then $f$ is differentiable at every point of $U$.


Higher Order Partial Derivatives

We now iterate the process of taking partial derivatives.

Definition (Higher Order Partial Derivatives).

Let $f : U \to \mathbb{R}$ with $U \subseteq \mathbb{R}^n$ open. If $\partial f / \partial x_j$ exists on a neighbourhood of $\vec{a}$ and is itself differentiable in direction $x_i$ at $\vec{a}$, we define the second order partial derivative $$\frac{\partial^2 f}{\partial x_i \, \partial x_j}(\vec{a}) := \frac{\partial}{\partial x_i}\left( \frac{\partial f}{\partial x_j}\right)(\vec{a}).$$ By iteration, for a multi-index $(k_1, \dots, k_n)$ with $k_1 + \cdots + k_n = k$ we define $$\frac{\partial^k f}{\partial x_1^{k_1} \cdots \partial x_n^{k_n}}(\vec{a}).$$ We say $f$ is of class $C^r$ on $U$ (written $f \in C^r(U)$) if all partial derivatives of $f$ up to order $r$ exist and are continuous on $U$.

Remark.

Intuition: Class $C^r$ functions have $r$ continuous derivatives in every mix of directions. The notation $\partial^2 f / \partial x_i \, \partial x_j$ means "first differentiate with respect to $x_j$, then with respect to $x_i$." A priori, this is different from reversing the order. Clairaut's theorem tells us that sufficient smoothness makes the order irrelevant.

Theorem (Clairaut: Equality of Mixed Partials).

Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$. Suppose $\partial f / \partial x_i$, $\partial f / \partial x_j$, $\partial^2 f / \partial x_i \, \partial x_j$, and $\partial^2 f / \partial x_j \, \partial x_i$ all exist and are continuous on $U$. Then $$\frac{\partial^2 f}{\partial x_i \, \partial x_j}(\vec{a}) = \frac{\partial^2 f}{\partial x_j \, \partial x_i}(\vec{a}) \qquad \text{for every } \vec{a} \in U.$$ In particular, if $f \in C^2(U)$, mixed partials commute.

Remark.

Intuition: Differentiation with respect to different variables can be performed in any order, provided the second derivatives are continuous. This turns the Hessian matrix into a symmetric matrix, which will be crucial for the spectral argument in the second derivative test.
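
Clairaut's theorem can be checked numerically: a four-point finite-difference estimate of the mixed partial agrees with the analytic derivative taken in the opposite order. The function $f$ below is an illustrative assumption, not from the text.

```python
import math

# d^2 f / dx dy estimated by a four-point central difference, compared
# with d^2 f / dy dx computed analytically (by hand) in the opposite
# order. Illustrative smooth f, not from the text.

def f(x, y):
    return math.sin(x * y) + x ** 3 * y

def mixed_partial(f, x, y, t=1e-4):
    return (f(x + t, y + t) - f(x + t, y - t)
            - f(x - t, y + t) + f(x - t, y - t)) / (4 * t * t)

x, y = 0.5, 1.2
# First d/dy gives x*cos(x*y) + x^3; then d/dx of that:
exact_yx = math.cos(x * y) - x * y * math.sin(x * y) + 3 * x ** 2
print(mixed_partial(f, x, y), exact_yx)  # the two agree closely
```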


Multivariable Taylor Expansion

We first recall the single-variable Taylor theorem, then lift it to several variables by restricting ff to a line.

Theorem (Single-Variable Taylor Theorem with Integral Remainder).

Let $I \subseteq \mathbb{R}$ be open and $f : I \to \mathbb{R}$ of class $C^{r+1}$ for some $r \geq 0$. Let $a \in I$ and $h \in \mathbb{R}$ with $a + h \in I$. Then $$f(a+h) = \sum_{j=0}^{r} \frac{f^{(j)}(a)}{j!} h^{j} + \widetilde R_{a, r}(h),$$ where the integral remainder is $$\widetilde R_{a, r}(h) = \frac{h^{r+1}}{r!} \int_{0}^{1} f^{(r+1)}(a + th)(1 - t)^{r} \, dt.$$

Remark.

Intuition: The Taylor polynomial approximates $f$ at $a$ to order $r$, and the remainder records the error. The integral form of the remainder has a clean inductive proof by repeated integration by parts, and it is the easiest form to lift to several variables.
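
The integral-remainder formula can be verified numerically for a concrete function; here $f = \exp$ at $a = 0$, with a plain midpoint rule for the integral. This is an illustrative sketch, not part of the text.

```python
import math

# Degree-r Taylor polynomial of exp at a = 0, plus the integral remainder
# R(h) = h^(r+1)/r! * int_0^1 exp(t*h) * (1 - t)^r dt, evaluated by a
# midpoint rule. Their sum should reproduce exp(h).

def taylor_poly(h, r):
    return sum(h ** j / math.factorial(j) for j in range(r + 1))

def integral_remainder(h, r, steps=10_000):
    total = sum(math.exp((i + 0.5) / steps * h) * (1 - (i + 0.5) / steps) ** r
                for i in range(steps)) / steps
    return h ** (r + 1) / math.factorial(r) * total

h, r = 0.7, 3
approx = taylor_poly(h, r) + integral_remainder(h, r)
print(approx, math.exp(h))  # agree to many digits
```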

We now use this to expand a multivariable function by restricting to a line.

Theorem (Multivariable Taylor Theorem).

Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$ of class $C^{r+1}$. Let $\vec{a} \in U$ and $\vec{h} \in \mathbb{R}^n$ be small enough that $\vec{a} + t\vec{h} \in U$ for all $t \in [-1, 1]$. Then $$f(\vec{a} + \vec{h}) = \sum_{\ell = 0}^{r} \frac{1}{\ell!} \left[\mathcal{L}_{\vec{h}}^{\ell} f\right](\vec{a}) + R_{\vec{a}, r}(\vec{h}),$$ where $\mathcal{L}_{\vec{h}} := \sum_{i=1}^n h_i \, \partial/\partial x_i$ is the directional differential operator, and $$R_{\vec{a}, r}(\vec{h}) = \frac{1}{r!} \int_0^1 \left[\mathcal{L}_{\vec{h}}^{r+1} f\right](\vec{a} + t\vec{h})(1 - t)^{r}\, dt.$$

Remark.

Intuition: The multivariable Taylor polynomial is obtained by applying the "total directional derivative" operator $\mathcal{L}_{\vec{h}}$ repeatedly. Each application of $\mathcal{L}_{\vec{h}}$ brings down a factor of $\vec{h}$ by the chain rule, producing all mixed partials weighted by appropriate products of the components of $\vec{h}$.

Second-Order Expansion and the Hessian

The case $r = 2$ (so $f \in C^3$) is the one we need for the second derivative test. Expanding $\mathcal{L}_{\vec{h}} f = \sum_i h_i \, \partial f/\partial x_i$ and $\mathcal{L}_{\vec{h}}^{2} f = \sum_{i, j} h_i h_j \, \partial^2 f/\partial x_i \, \partial x_j$, we obtain:

Corollary (Taylor Expansion to Second Order).

Let $f : U \to \mathbb{R}$ be of class $C^3$ and $\vec{a} \in U$. Then for small $\vec{h}$, $$f(\vec{a} + \vec{h}) = f(\vec{a}) + \nabla f(\vec{a}) \cdot \vec{h} + \tfrac{1}{2}\, \vec{h}^{\,T} H_f(\vec{a})\, \vec{h} + R_{\vec{a}, 2}(\vec{h}),$$ where $$H_f(\vec{a}) := \left(\frac{\partial^2 f}{\partial x_i \, \partial x_j}(\vec{a})\right)_{i, j = 1, \dots, n}$$ is the Hessian matrix of $f$ at $\vec{a}$.

Remark.

Intuition: The Hessian plays the role of the second derivative for a scalar-valued function of several variables. Because $f \in C^3 \subseteq C^2$, Clairaut's theorem makes $H_f(\vec{a})$ symmetric. The quadratic form $\vec{h} \mapsto \tfrac{1}{2}\vec{h}^{\,T} H_f(\vec{a})\, \vec{h}$ captures the leading-order curvature of $f$ at $\vec{a}$.
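
The corollary predicts that the quadratic model $f(\vec{a}) + \nabla f(\vec{a}) \cdot \vec{h} + \tfrac{1}{2}\vec{h}^{\,T} H_f(\vec{a})\, \vec{h}$ tracks $f$ up to a cubic error. A sketch with an illustrative polynomial, whose gradient and Hessian were computed by hand (not from the text):

```python
# Quadratic Taylor model vs. the true value, for f(x, y) = x^2 * y + y^2
# at a = (1, 2). Gradient (2xy, x^2 + 2y) and Hessian [[2y, 2x], [2x, 2]]
# were computed by hand for this illustrative f.

def f(x, y):
    return x ** 2 * y + y ** 2

def quadratic_model(a, h):
    x, y = a
    gx, gy = 2 * x * y, x ** 2 + 2 * y
    fxx, fxy, fyy = 2 * y, 2 * x, 2.0
    hx, hy = h
    return (f(x, y) + gx * hx + gy * hy
            + 0.5 * (fxx * hx * hx + 2 * fxy * hx * hy + fyy * hy * hy))

a, h = (1.0, 2.0), (0.01, -0.02)
true_val = f(a[0] + h[0], a[1] + h[1])
print(true_val, quadratic_model(a, h))  # differ only by the cubic term hx^2 * hy
```

Since this $f$ is a cubic polynomial, the discrepancy is exactly the single third-order monomial $h_x^2 h_y$, about $2 \cdot 10^{-6}$ here.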

Bound on the Remainder

Theorem (Lagrange-Type Bound on the Second-Order Remainder).

Suppose $f \in C^3(U)$ and all third order partial derivatives of $f$ are bounded in absolute value by $M$ on a neighbourhood of $\vec{a}$. Then for $\vec{h}$ sufficiently small, $$|R_{\vec{a}, 2}(\vec{h})| \leq \frac{n^{3} M}{6} \|\vec{h}\|_{\infty}^{3},$$ where $\|\vec{h}\|_{\infty} = \max\{|h_1|, \dots, |h_n|\}$.

Example (Taylor Expansion of a Polynomial at a Point).

Let $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) = xy^2 + 2xy$. We compute the Taylor expansion of $f$ to second order about $(1, -1)$.

The partial derivatives are: $$\frac{\partial f}{\partial x}(x, y) = y^2 + 2y, \quad \frac{\partial f}{\partial y}(x, y) = 2xy + 2x, \quad \frac{\partial^2 f}{\partial x^2} = 0, \quad \frac{\partial^2 f}{\partial x \, \partial y} = 2y + 2, \quad \frac{\partial^2 f}{\partial y^2} = 2x.$$ Evaluating at $(1, -1)$: $$\nabla f(1, -1) = \begin{pmatrix} -1 \\ 0 \end{pmatrix}, \qquad H_f(1, -1) = \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}.$$ Noting $f(1, -1) = 1 \cdot 1 + 2 \cdot 1 \cdot (-1) = -1$, the second-order Taylor expansion reads $$f(1 + h, -1 + k) = -1 - h + k^{2} + R_{(1,-1), 2}(h, k).$$ Since $f$ is a polynomial, we may compute the remainder exactly: $$f(1 + h, -1 + k) = (1 + h)(-1 + k)^2 + 2(1 + h)(-1 + k) = -1 - h + k^2 + h k^2,$$ so $R_{(1,-1), 2}(h, k) = h k^{2}$, which is a cubic monomial and hence not part of the second-order Taylor polynomial.

For an upper bound: all third-order partials of $f$ vanish except $\partial^3 f/\partial x \, \partial y^2 = 2$. So $M = 2$, and with $n = 2$, $$|R_{(1, -1), 2}(h, k)| \leq \frac{2^3 \cdot 2}{6} (\max\{|h|, |k|\})^3 = \frac{8}{3}(\max\{|h|, |k|\})^3.$$

Numerical consequence. If $|h|, |k| \leq \tfrac{1}{10}$, then without computing the remainder explicitly we know $$|R_{(1, -1), 2}(h, k)| \leq \frac{8}{3} \cdot \frac{1}{10^3} = 0.002\overline{6},$$ so the true value of $f(1 + h, -1 + k)$ differs from the Taylor approximation $-1 - h + k^{2}$ by at most $0.003$. This is the practical payoff of the Lagrange bound: we can guarantee a quantitative error bound for the approximation without ever computing the error explicitly.
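
Since the exact remainder $R(h, k) = h k^2$ is known here, the bound can be stress-tested by brute force on a grid over $[-\tfrac1{10}, \tfrac1{10}]^2$; the grid size is an arbitrary choice for the check.

```python
# Grid check of the Lagrange-type bound for the example above:
# |h * k^2| <= (8/3) * max(|h|, |k|)^3 on [-0.1, 0.1]^2.

steps = 41
bound_ok, worst_ratio = True, 0.0
for i in range(steps):
    for j in range(steps):
        h = -0.1 + 0.2 * i / (steps - 1)
        k = -0.1 + 0.2 * j / (steps - 1)
        if h == 0.0 and k == 0.0:
            continue                      # bound is 0 = remainder there
        remainder = abs(h * k ** 2)
        bound = (8 / 3) * max(abs(h), abs(k)) ** 3
        bound_ok = bound_ok and remainder <= bound
        worst_ratio = max(worst_ratio, remainder / bound)
print(bound_ok, worst_ratio)  # True 0.375
```

The worst observed ratio is $3/8$ (attained at $|h| = |k|$), so for this polynomial the bound holds with a comfortable margin; the theorem trades tightness for generality.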


Relative Extrema and Critical Points

Definition (Relative Extrema and Saddle Points).

Let $U \subseteq \mathbb{R}^n$ be open, $f : U \to \mathbb{R}$, and $\vec{a} \in U$.

  • $f$ has a relative (local) minimum at $\vec{a}$ iff $f(\vec{x}) \geq f(\vec{a})$ for all $\vec{x}$ in some neighbourhood of $\vec{a}$.
  • $f$ has a relative (local) maximum at $\vec{a}$ iff $f(\vec{x}) \leq f(\vec{a})$ for all $\vec{x}$ in some neighbourhood of $\vec{a}$.
  • $f$ has a saddle point at $\vec{a}$ iff in every neighbourhood of $\vec{a}$ there exist $\vec{x}$ and $\vec{z}$ with $f(\vec{z}) < f(\vec{a}) < f(\vec{x})$.
  • $f$ has a relative extremum at $\vec{a}$ iff it has a relative minimum or maximum at $\vec{a}$.
Remark.

Intuition: The first two cases extend the familiar local min/max from calculus. A saddle point sits in between: along some directions $f$ increases away from $\vec{a}$, along others it decreases. Saddles are the genuinely higher-dimensional phenomenon.

We first recall the single-variable version, since the multivariable result reduces to it along each coordinate direction.

Theorem (Single-Variable Fermat Theorem).

Let $f : (a, b) \to \mathbb{R}$ and suppose $f$ has either a relative maximum or a relative minimum at $c \in (a, b)$. If $f'(c)$ exists, then $f'(c) = 0$.

Theorem (Vanishing Gradient at Interior Extrema).

Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$. If $f$ has a relative extremum at $\vec{a} \in U$ and $f$ is differentiable at $\vec{a}$, then $\nabla f(\vec{a}) = \vec{0}$.

Remark.

Intuition: This is the natural multivariable Fermat theorem. If we can approach $\vec{a}$ from every direction and $f$ has an extremum there, then the directional derivative must vanish in every direction; equivalently, the gradient vanishes.

Definition (Critical Point).

A point $\vec{a} \in U$ is a critical point of $f : U \to \mathbb{R}$ iff $\nabla f(\vec{a}) = \vec{0}$.

Remark.

Intuition: Critical points are the candidates for local extrema and saddle points. The vanishing gradient condition is necessary but not sufficient: not every critical point is an extremum. Already in one variable, $f(x) = x^3$ has $f'(0) = 0$ but neither a minimum nor a maximum at $0$.


The Hessian and Definiteness

Definition (Positive/Negative (Semi-)Definite, Indefinite).

Let $A$ be a symmetric $n \times n$ real matrix. We say $A$ is

  • positive definite iff $\vec{h}^{\,T} A \vec{h} > 0$ for all $\vec{h} \neq \vec{0}$;
  • negative definite iff $\vec{h}^{\,T} A \vec{h} < 0$ for all $\vec{h} \neq \vec{0}$;
  • positive semi-definite iff $\vec{h}^{\,T} A \vec{h} \geq 0$ for all $\vec{h}$;
  • negative semi-definite iff $\vec{h}^{\,T} A \vec{h} \leq 0$ for all $\vec{h}$;
  • indefinite iff there exist $\vec{h}_+, \vec{h}_- \in \mathbb{R}^n$ with $\vec{h}_+^{\,T} A \vec{h}_+ > 0$ and $\vec{h}_-^{\,T} A \vec{h}_- < 0$.
Remark.

Intuition: The sign of the quadratic form $\vec{h} \mapsto \vec{h}^{\,T} A \vec{h}$ encodes how curvature bends. Positive definite matrices curve upward in every direction, negative definite curve downward. Indefinite matrices curve up in some directions and down in others: the hallmark of a saddle.

Theorem (Definiteness via Eigenvalues).

Let $A$ be a symmetric $n \times n$ real matrix with eigenvalues $\lambda_1, \dots, \lambda_n$. Then

  • $A$ is positive definite iff $\lambda_j > 0$ for all $j$;
  • $A$ is negative definite iff $\lambda_j < 0$ for all $j$;
  • $A$ is indefinite iff $A$ has both a positive and a negative eigenvalue;
  • $A$ is nonsingular iff no eigenvalue equals $0$, i.e. iff $\det A \neq 0$.
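
The eigenvalue criterion translates directly into code. A sketch using NumPy's symmetric eigenvalue solver; the helper name and the zero tolerance are my own choices.

```python
import numpy as np

# Classify a symmetric matrix by the signs of its eigenvalues, following
# the theorem above. eigvalsh is NumPy's solver for symmetric matrices.

def classify(A, tol=1e-12):
    lam = np.linalg.eigvalsh(np.asarray(A, dtype=float))
    if np.any(np.abs(lam) <= tol):
        return "singular"            # some eigenvalue is (numerically) zero
    if np.all(lam > 0):
        return "positive definite"
    if np.all(lam < 0):
        return "negative definite"
    return "indefinite"

print(classify([[2, 0], [0, 3]]))   # positive definite
print(classify([[0, 1], [1, 0]]))   # indefinite (eigenvalues -1 and 1)
print(classify([[1, 0], [0, 0]]))   # singular
```

Note the tolerance: in floating point, "eigenvalue equals $0$" must be read as "smaller than roundoff", which is exactly the degenerate case where the upcoming test is inconclusive.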

The Second Derivative Test

We can now state and prove the multivariable second derivative test, which is the main application of the Taylor expansion.

Theorem (Second Derivative Test).

Let $U \subseteq \mathbb{R}^n$ be open and $f \in C^3(U)$. Let $\vec{a} \in U$ be a critical point of $f$ (so $\nabla f(\vec{a}) = \vec{0}$) and assume $H_f(\vec{a})$ is nonsingular. Then:

(i) If $H_f(\vec{a})$ is positive definite, $f$ has a relative minimum at $\vec{a}$.

(ii) If $H_f(\vec{a})$ is negative definite, $f$ has a relative maximum at $\vec{a}$.

(iii) If $H_f(\vec{a})$ is indefinite, $f$ has a saddle point at $\vec{a}$.

Remark.

Intuition: Because $\nabla f(\vec{a}) = \vec{0}$, the leading order behaviour of $f$ near $\vec{a}$ is controlled by the quadratic form $\tfrac{1}{2} \vec{h}^{\,T} H_f(\vec{a})\, \vec{h}$. The remainder is of order $\|\vec{h}\|^3$ and becomes negligible for small $\vec{h}$. The definiteness of the Hessian therefore determines whether $f$ curves up, down, or in different directions, locally.

Remark.

Intuition: When $\det H_f(\vec{a}) = 0$, the test is inconclusive: higher-order terms in the Taylor expansion are needed to decide the nature of the critical point. Examples such as $f(x, y) = x^4 + y^4$, $f(x, y) = -x^4 - y^4$, and $f(x, y) = x^4 - y^4$ at the origin show that a critical point with zero Hessian determinant may be a min, max, or saddle.

Remark.

Computational note. Checking whether $\det H_f(\vec{a}) \neq 0$ and computing the eigenvalues (or equivalently the signs of the leading principal minors, by Sylvester's criterion) can be done quite easily numerically provided $n$ is not too large. In two variables, with $H_f = \begin{pmatrix} A & B \\ B & C \end{pmatrix}$, the test simplifies to:

  • $\det H_f = AC - B^2 > 0$ and $A > 0$: local minimum.
  • $\det H_f = AC - B^2 > 0$ and $A < 0$: local maximum.
  • $\det H_f < 0$: saddle.
  • $\det H_f = 0$: inconclusive.
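
The four cases above are small enough to implement verbatim; the function name below is my own choice.

```python
# The 2x2 second derivative test, transcribed from the bullet list above:
# classify a critical point from A = f_xx, B = f_xy, C = f_yy at the point.

def second_derivative_test_2d(A, B, C):
    det = A * C - B * B
    if det > 0 and A > 0:
        return "local minimum"
    if det > 0 and A < 0:
        return "local maximum"
    if det < 0:
        return "saddle"
    return "inconclusive"           # det == 0

print(second_derivative_test_2d(2, 0, 3))   # local minimum
print(second_derivative_test_2d(-4, 0, 2))  # saddle  (det = -8)
print(second_derivative_test_2d(1, 1, 1))   # inconclusive  (det = 0)
```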

A Worked Example

Example (Classifying Critical Points of a Polynomial).

Classify all critical points of the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $$f(x, y) = x^4 + y^3 - 2 x^2 + y^2.$$

Step 1: Find the critical points. We have $$\frac{\partial f}{\partial x} = 4 x^3 - 4 x = 4 x (x^2 - 1), \qquad \frac{\partial f}{\partial y} = 3 y^2 + 2 y = y(3 y + 2).$$ Setting both to zero gives $x \in \{-1, 0, 1\}$ and $y \in \{-\tfrac{2}{3}, 0\}$, so the six critical points are $$(0, 0), \quad (0, -\tfrac{2}{3}), \quad (1, 0), \quad (1, -\tfrac{2}{3}), \quad (-1, 0), \quad (-1, -\tfrac{2}{3}).$$

Step 2: Compute the Hessian. We have $$\frac{\partial^2 f}{\partial x^2} = 12 x^2 - 4, \quad \frac{\partial^2 f}{\partial x \, \partial y} = 0, \quad \frac{\partial^2 f}{\partial y^2} = 6 y + 2.$$ Thus $H_f(x, y) = \begin{pmatrix} 12 x^2 - 4 & 0 \\ 0 & 6 y + 2 \end{pmatrix}$, which is diagonal, so its eigenvalues are the diagonal entries.

Step 3: Classify each point.

  • At $(0, 0)$: $H_f = \operatorname{diag}(-4, 2)$, indefinite, so $(0, 0)$ is a saddle point.
  • At $(0, -\tfrac{2}{3})$: $H_f = \operatorname{diag}(-4, -2)$, negative definite, so it is a relative maximum.
  • At $(\pm 1, 0)$: $H_f = \operatorname{diag}(8, 2)$, positive definite, so these are relative minima.
  • At $(\pm 1, -\tfrac{2}{3})$: $H_f = \operatorname{diag}(8, -2)$, indefinite, so these are saddle points.
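
The classification can be cross-checked by machine: at each of the six points the gradient vanishes and the Hessian eigenvalue signs reproduce the verdicts above. A sketch using NumPy, with the gradient and Hessian transcribed from Steps 1 and 2:

```python
import numpy as np

# Cross-check of the worked example: gradient and Hessian of
# f(x, y) = x^4 + y^3 - 2x^2 + y^2, transcribed from Steps 1 and 2.

def grad(x, y):
    return np.array([4 * x ** 3 - 4 * x, 3 * y ** 2 + 2 * y])

def hessian(x, y):
    return np.array([[12 * x ** 2 - 4, 0.0], [0.0, 6 * y + 2]])

expected = {
    (0.0, 0.0): "saddle",
    (0.0, -2 / 3): "relative maximum",
    (1.0, 0.0): "relative minimum",
    (-1.0, 0.0): "relative minimum",
    (1.0, -2 / 3): "saddle",
    (-1.0, -2 / 3): "saddle",
}

results = {}
for (x, y), kind in expected.items():
    assert np.allclose(grad(x, y), 0.0)          # really a critical point
    lam = np.linalg.eigvalsh(hessian(x, y))
    if np.all(lam > 0):
        results[(x, y)] = "relative minimum"
    elif np.all(lam < 0):
        results[(x, y)] = "relative maximum"
    else:
        results[(x, y)] = "saddle"
print(results == expected)  # True
```
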
Remark.

Intuition: In low dimensions the Hessian is diagonal or $2 \times 2$, and one can often see the classification at a glance. In higher dimensions one computes eigenvalues (or equivalently the principal minors) numerically. The key fact is that the second derivative test reduces the multivariable problem to an eigenvalue problem for a symmetric matrix.