ACE 328/Chapter 12

Lagrange Multipliers

Optimizing a function subject to equality constraints. The method of Lagrange multipliers, derived via the implicit function theorem. Extends to multiple constraints with a full-rank Jacobian condition.

We now turn from unconstrained optimization (where the critical point condition is $\nabla f = \vec{0}$) to constrained optimization, where we optimize $f$ subject to one or more constraints of the form $g(\vec{x}) = 0$. The method of Lagrange multipliers expresses the necessary condition for an interior constrained extremum as a geometric statement: at such a point, the gradient of $f$ must lie in the span of the gradients of the constraints. We derive the single-constraint theorem from the implicit function theorem, generalize to several constraints with a full-rank Jacobian hypothesis, and illustrate with worked examples.


Constrained Extrema

Definition (Constrained Extremum)

Let $U \subseteq \mathbb{R}^n$ be open, $f : U \to \mathbb{R}$, and $S \subseteq U$. We say $\vec{v}_0 \in S$ is a constrained extremum for $f$ (subject to the constraint $S$) iff the restriction $f|_S : S \to \mathbb{R}$ has a relative extremum at $\vec{v}_0$. That is, there is a neighbourhood $V$ of $\vec{v}_0$ in $\mathbb{R}^n$ such that $f(\vec{x}) \geq f(\vec{v}_0)$ (or $\leq$) for all $\vec{x} \in S \cap V$.

Remark.

Intuition: A constrained extremum need not be an unconstrained extremum. The constraint set $S$ is typically a level set of one or more functions $g_i$; the question is how $f$ behaves when we are forced to stay on $S$. For example, the function $f(x, y) = x + y$ has no unconstrained maximum on $\mathbb{R}^2$, but it has a constrained maximum on the unit circle $S = \{x^2 + y^2 = 1\}$.


Geometric Motivation

Suppose $S = \{(x, y) \in \mathbb{R}^2 : g(x, y) = 0\}$ is a smooth curve in $\mathbb{R}^2$ and $f$ has a constrained extremum at $\vec{a} \in S$. Let $\gamma : (-\delta, \delta) \to S$ be a smooth path with $\gamma(0) = \vec{a}$. Then $t \mapsto f(\gamma(t))$ has a relative extremum at $t = 0$, so by the chain rule and the single-variable Fermat theorem,
$$0 = (f \circ \gamma)'(0) = \nabla f(\vec{a}) \cdot \gamma'(0).$$
Thus $\nabla f(\vec{a})$ is orthogonal to the tangent direction $\gamma'(0)$ to the curve $S$ at $\vec{a}$. On the other hand, $\nabla g(\vec{a})$ is always orthogonal to the level curve $\{g = 0\}$ at $\vec{a}$ (assuming $\nabla g(\vec{a}) \neq \vec{0}$). Since both $\nabla f(\vec{a})$ and $\nabla g(\vec{a})$ are orthogonal to the tangent line to $S$ at $\vec{a}$, and the tangent line is one-dimensional, the two gradients must be parallel:
$$\nabla f(\vec{a}) = \lambda \, \nabla g(\vec{a}) \qquad \text{for some } \lambda \in \mathbb{R}.$$
The scalar $\lambda$ is called the Lagrange multiplier.
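The two orthogonality claims above can be checked numerically. The sketch below (helper names are our own) uses the running example $f(x, y) = x + y$ on the unit circle $g(x, y) = x^2 + y^2 - 1 = 0$, parametrized through the maximizer $\vec{a} = (1/\sqrt{2}, 1/\sqrt{2})$ by $\gamma(t) = (\cos(\pi/4 + t), \sin(\pi/4 + t))$, and verifies both that $\nabla f(\vec{a}) \cdot \gamma'(0) = 0$ and that $\nabla f(\vec{a})$ is parallel to $\nabla g(\vec{a})$:

```python
import math

def grad_f(x, y):
    return (1.0, 1.0)            # gradient of f(x, y) = x + y

def grad_g(x, y):
    return (2 * x, 2 * y)        # gradient of g(x, y) = x^2 + y^2 - 1

# Tangent direction of gamma(t) = (cos(pi/4 + t), sin(pi/4 + t)).
def gamma_prime(t):
    return (-math.sin(math.pi / 4 + t), math.cos(math.pi / 4 + t))

a = (math.cos(math.pi / 4), math.sin(math.pi / 4))

# 1. grad f(a) is orthogonal to the tangent direction gamma'(0).
dot = sum(u * v for u, v in zip(grad_f(*a), gamma_prime(0.0)))
print(abs(dot) < 1e-12)          # True

# 2. grad f(a) is parallel to grad g(a): the 2D cross product vanishes.
gf, gg = grad_f(*a), grad_g(*a)
cross = gf[0] * gg[1] - gf[1] * gg[0]
print(abs(cross) < 1e-12)        # True
```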

Example (A Line Constrained to a Circle)

Maximize $f(x, y) = x + y$ subject to the constraint $g(x, y) = x^2 + y^2 - 1 = 0$.

Compute $\nabla f(x, y) = (1, 1)$ and $\nabla g(x, y) = (2x, 2y)$. The Lagrange condition $\nabla f = \lambda \nabla g$ at an extremum on the circle reads
$$(1, 1) = \lambda (2x, 2y) \qquad \Longleftrightarrow \qquad x = y \;(\text{with } \lambda \neq 0).$$
Intersecting $x = y$ with the constraint $x^2 + y^2 = 1$ gives the two candidate points $(1/\sqrt{2}, 1/\sqrt{2})$ and $(-1/\sqrt{2}, -1/\sqrt{2})$, where $f$ takes the values $\sqrt{2}$ and $-\sqrt{2}$ respectively.

Global extrema. The constraint set $S = \{x^2 + y^2 = 1\}$ is compact (closed and bounded in $\mathbb{R}^2$) and $f$ is continuous, so by the extreme value theorem $f|_S$ attains both a global max and a global min on $S$. Any such global extremum must satisfy the Lagrange condition (since $\nabla g \neq \vec{0}$ everywhere on $S$), so the candidates above include them. Therefore the maximum is $\sqrt{2}$ at $(1/\sqrt{2}, 1/\sqrt{2})$ and the minimum is $-\sqrt{2}$ at $(-1/\sqrt{2}, -1/\sqrt{2})$.
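As a sanity check on the global claim, the compact constraint set can be sampled directly. This brute-force sketch (sample count is an arbitrary choice of ours) confirms that the sampled maximum of $f$ on the circle agrees with the Lagrange value $\sqrt{2}$:

```python
import math

def f(x, y):
    return x + y

# Sample the unit circle via its angle parametrization and take the max.
best = max(f(math.cos(t), math.sin(t))
           for t in (2 * math.pi * k / 100000 for k in range(100000)))

# The sampled maximum matches f(1/sqrt2, 1/sqrt2) = sqrt(2).
print(abs(best - math.sqrt(2)) < 1e-6)   # True
```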

Remark.

Global versus local extrema. The Lagrange condition is necessary for a constrained extremum (given the regularity $\nabla g \neq \vec{0}$), not sufficient. To confirm a candidate is a global max or min, we typically argue via compactness: if $S$ is compact and $f$ continuous, $f|_S$ attains its extrema, and these extrema must appear among the Lagrange candidates. For non-compact constraint sets, additional reasoning (coercivity, asymptotic behaviour) is needed to decide global extrema.


Single-Constraint Lagrange Multiplier Theorem

We now make the parallel-gradients condition rigorous using the implicit function theorem.

Theorem (Lagrange Multiplier Theorem — Single Constraint)

Let $U \subseteq \mathbb{R}^n$ be open and $f, g : U \to \mathbb{R}$ be $C^1$ functions. Let $\vec{v}_0 \in U$ with $g(\vec{v}_0) = 0$ and $\nabla g(\vec{v}_0) \neq \vec{0}$. Set $S := \{\vec{x} \in U : g(\vec{x}) = 0\}$. If $f$, subject to the constraint $S$, has a relative extremum at $\vec{v}_0$, then there exists $\lambda \in \mathbb{R}$ such that
$$\nabla f(\vec{v}_0) = \lambda \, \nabla g(\vec{v}_0).$$

Remark.

Intuition: The hypothesis $\nabla g(\vec{v}_0) \neq \vec{0}$ guarantees that the constraint set $S$ is a smooth $(n-1)$-dimensional manifold near $\vec{v}_0$ (by the implicit function theorem). When this fails, the geometric argument breaks down and the conclusion need not hold. The scalar $\lambda$ measures the proportionality between $\nabla f$ and $\nabla g$ at $\vec{v}_0$.

The Lagrangian

It is often convenient to combine the necessary condition into a single system of equations via an auxiliary function.

Definition (Lagrangian for One Constraint)

Given $f, g : U \to \mathbb{R}$, the Lagrangian is the function $\mathcal{L} : U \times \mathbb{R} \to \mathbb{R}$ defined by
$$\mathcal{L}(\vec{x}, \lambda) := f(\vec{x}) - \lambda\, g(\vec{x}).$$

Remark.

Intuition: Setting $\nabla_{\vec{x}} \mathcal{L} = \vec{0}$ recovers $\nabla f = \lambda \nabla g$. Setting $\partial \mathcal{L}/\partial \lambda = 0$ recovers the constraint $g(\vec{x}) = 0$. So finding the constrained critical points reduces to finding the unconstrained critical points of $\mathcal{L}$.
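To illustrate the remark, $\nabla \mathcal{L} = \vec{0}$ can be treated as an ordinary root-finding problem. The sketch below (a minimal Newton iteration with a hand-rolled $3 \times 3$ Gaussian elimination; all helper names are our own) solves the system for the circle example $f = x + y$, $g = x^2 + y^2 - 1$, recovering the maximizer $(1/\sqrt{2}, 1/\sqrt{2})$ and multiplier $\lambda = 1/\sqrt{2}$:

```python
# F(x, y, lam) = grad L for L(x, y, lam) = x + y - lam*(x^2 + y^2 - 1).
def F(v):
    x, y, lam = v
    return [1 - 2 * lam * x,          # dL/dx
            1 - 2 * lam * y,          # dL/dy
            1 - x * x - y * y]        # dL/dlam = -(x^2 + y^2 - 1)

def JF(v):
    x, y, lam = v
    return [[-2 * lam, 0.0, -2 * x],
            [0.0, -2 * lam, -2 * y],
            [-2 * x, -2 * y, 0.0]]

def solve3(A, b):
    # Gaussian elimination with partial pivoting on a 3x3 system.
    n = 3
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= fac * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

v = [1.0, 0.5, 1.0]                   # starting guess near the maximum
for _ in range(50):
    step = solve3(JF(v), [-fi for fi in F(v)])
    v = [vi + si for vi, si in zip(v, step)]

x, y, lam = v
print(round(x, 6), round(y, 6), round(lam, 6))  # 0.707107 0.707107 0.707107
```

Starting near the minimizer instead would converge to the other critical point of $\mathcal{L}$; Newton finds whichever root its basin contains.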

Example (Constrained Extrema on a Sphere)

Find the maximum and minimum of $f(x, y, z) = x - 2y + 2z$ subject to $x^2 + y^2 + z^2 = 9$.

Set $g(x, y, z) = x^2 + y^2 + z^2 - 9$. Form the Lagrangian
$$\mathcal{L}(x, y, z, \lambda) = x - 2y + 2z - \lambda(x^2 + y^2 + z^2 - 9).$$
The first-order conditions $\nabla \mathcal{L} = \vec{0}$ are:

  • $1 - 2\lambda x = 0$
  • $-2 - 2\lambda y = 0$
  • $2 - 2\lambda z = 0$
  • $x^2 + y^2 + z^2 = 9$

Solving the first three equations for $x$, $y$, $z$ in terms of $\lambda$:

$$x = \tfrac{1}{2\lambda}, \qquad y = -\tfrac{1}{\lambda}, \qquad z = \tfrac{1}{\lambda}.$$

Substituting into the constraint:

$$\tfrac{1}{4\lambda^2} + \tfrac{1}{\lambda^2} + \tfrac{1}{\lambda^2} = 9.$$
This gives $\tfrac{9}{4\lambda^2} = 9$, so $\lambda = \pm \tfrac{1}{2}$. The corresponding points are $(1, -2, 2)$ (when $\lambda = \tfrac{1}{2}$) and $(-1, 2, -2)$ (when $\lambda = -\tfrac{1}{2}$). Evaluating,
$$f(1, -2, 2) = 1 + 4 + 4 = 9, \qquad f(-1, 2, -2) = -1 - 4 - 4 = -9.$$
Since the sphere is compact and $f$ is continuous, the maximum and minimum are attained: the maximum is $9$ at $(1, -2, 2)$ and the minimum is $-9$ at $(-1, 2, -2)$.
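A quick numerical corroboration (random sampling of the sphere; the seed and sample count are arbitrary choices of ours): by the computation above, no point of the sphere should give a value beyond $\pm 9$, and dense sampling should come close to both bounds.

```python
import math
import random

random.seed(0)

def f(x, y, z):
    return x - 2 * y + 2 * z

samples = []
for _ in range(100000):
    # Uniform direction via a normalized Gaussian vector, scaled to radius 3.
    v = [random.gauss(0, 1) for _ in range(3)]
    r = math.sqrt(sum(c * c for c in v))
    samples.append(f(*(3 * c / r for c in v)))

# All sampled values respect the Lagrange bounds (tolerance for rounding)...
print(max(samples) <= 9 + 1e-9 and min(samples) >= -9 - 1e-9)  # True
# ...and dense sampling approaches both bounds.
print(max(samples) > 8.9 and min(samples) < -8.9)              # True
```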


Multiple Constraints: The Rank Condition

We now generalize to $k$ constraints $g_1(\vec{x}) = \cdots = g_k(\vec{x}) = 0$ with $0 < k < n$. The key hypothesis is that the Jacobian of $\vec{g} = (g_1, \dots, g_k)$ has full rank $k$ at the critical point.

Theorem (Lagrange Multiplier Theorem — Several Constraints)

Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$ of class $C^1$. Let $0 < k < n$ and let $\vec{g} = (g_1, \dots, g_k) : U \to \mathbb{R}^k$ be $C^1$. Set
$$S := \{\vec{x} \in U : \vec{g}(\vec{x}) = \vec{0}\} = \bigcap_{i=1}^{k} \{\vec{x} \in U : g_i(\vec{x}) = 0\}.$$
Suppose $f$, subject to the constraints $\vec{g} = \vec{0}$, has a constrained relative extremum at $\vec{v}_0 \in S$, and the $k \times n$ Jacobian matrix
$$J_{\vec{g}}(\vec{v}_0) = \left(\frac{\partial g_i}{\partial x_j}(\vec{v}_0)\right)_{\substack{i = 1, \dots, k \\ j = 1, \dots, n}}$$
has rank $k$. Then there exist $\lambda_1, \dots, \lambda_k \in \mathbb{R}$ such that
$$\nabla f(\vec{v}_0) = \lambda_1 \nabla g_1(\vec{v}_0) + \cdots + \lambda_k \nabla g_k(\vec{v}_0).$$

Remark.

Intuition: The full-rank hypothesis says the $k$ constraints are independent at $\vec{v}_0$: the gradient vectors $\nabla g_1(\vec{v}_0), \dots, \nabla g_k(\vec{v}_0)$ are linearly independent, so the constraint set $S$ is locally a smooth $(n - k)$-dimensional manifold. The conclusion asserts that $\nabla f(\vec{v}_0)$ lies in the span of these gradient vectors — equivalently, in the normal space to $S$ at $\vec{v}_0$.
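To make the span statement concrete, here is a small hypothetical instance (ours, not from the text): $n = 3$, $k = 2$, maximizing $f(x, y, z) = x + y$ on the equator obtained by intersecting the unit sphere $g_1 = x^2 + y^2 + z^2 - 1 = 0$ with the plane $g_2 = z = 0$. At the maximizer $\vec{a} = (1/\sqrt{2}, 1/\sqrt{2}, 0)$ we check that $\nabla f(\vec{a})$ is a linear combination of the two constraint gradients:

```python
import math

a = (1 / math.sqrt(2), 1 / math.sqrt(2), 0.0)

grad_f = (1.0, 1.0, 0.0)
grad_g1 = tuple(2 * c for c in a)        # gradient of x^2 + y^2 + z^2 - 1
grad_g2 = (0.0, 0.0, 1.0)                # gradient of z

# Solve grad_f = lam1*grad_g1 + lam2*grad_g2 component by component.
lam1 = grad_f[0] / grad_g1[0]            # from the x-component
lam2 = grad_f[2] - lam1 * grad_g1[2]     # from the z-component (here 0)
residual = max(abs(grad_f[i] - lam1 * grad_g1[i] - lam2 * grad_g2[i])
               for i in range(3))
print(residual < 1e-12)                  # True: grad f lies in the span
```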

Remark.

Why the rank condition matters: If the Jacobian fails to have rank $k$ at $\vec{v}_0$, then the conclusion may fail even for smooth data. A classical example is $f(x, y) = x$ with constraints $g_1(x, y) = y$ and $g_2(x, y) = y - x^2$. The constraint set is $\{(0, 0)\}$, at which the Jacobian has rank $1$ (not $2$). The gradient of $f$ at the origin is $(1, 0)$, but the gradients of $g_1, g_2$ at the origin are both $(0, 1)$, so their span is the $y$-axis and does not contain $(1, 0)$.
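The degenerate example can be checked mechanically. This sketch (hand-rolled $2 \times 2$ rank helper; naming is our own) confirms the rank drop at the origin, and a comment records why no multipliers can exist there:

```python
def jacobian_g(x, y):
    # Rows are grad g1 = (0, 1) and grad g2 = (-2x, 1).
    return [[0.0, 1.0], [-2.0 * x, 1.0]]

def rank_2x2(M):
    (a, b), (c, d) = M
    if abs(a * d - b * c) > 1e-12:
        return 2
    return 1 if any(abs(e) > 1e-12 for row in M for e in row) else 0

rank_at_origin = rank_2x2(jacobian_g(0.0, 0.0))
print(rank_at_origin)                    # 1 (full rank would be 2)

# Every combination lam1*grad_g1 + lam2*grad_g2 at the origin has the form
# (0, lam1 + lam2): its first component is 0, so it can never equal
# grad f(0, 0) = (1, 0). The Lagrange conclusion fails, as claimed.
```

Away from the origin the rank recovers: for instance `rank_2x2(jacobian_g(1.0, 0.0))` is `2`.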

The Multi-Constraint Lagrangian

Definition (Lagrangian for Several Constraints)

The Lagrangian for the problem of optimizing $f$ subject to $g_1 = \cdots = g_k = 0$ is the function of $n + k$ variables
$$\mathcal{L}(\vec{x}, \lambda_1, \dots, \lambda_k) := f(\vec{x}) - \lambda_1 g_1(\vec{x}) - \cdots - \lambda_k g_k(\vec{x}).$$

Remark.

Intuition: Solving $\nabla \mathcal{L} = \vec{0}$ (in all $n + k$ variables) is equivalent to the system
$$\nabla f(\vec{x}) = \sum_{j=1}^{k} \lambda_j \nabla g_j(\vec{x})$$
together with the $k$ constraint equations $g_1(\vec{x}) = 0, \ldots, g_k(\vec{x}) = 0$. At solutions where $J_{\vec{g}}(\vec{x})$ has rank $k$, these are the candidate constrained extrema.


Worked Example with Two Constraints

Example (Distance from the Origin to a Space Curve)

Consider the curve $C$ in $\mathbb{R}^3$ defined as the intersection of the surfaces $z^2 = x^2 + y^2$ and $x - 2z = 3$. Find the distance from the origin to the closest point on $C$.

Setup. Minimizing the distance is equivalent to minimizing its square. Let $f(x, y, z) = x^2 + y^2 + z^2$ and set
$$g_1(x, y, z) = x^2 + y^2 - z^2, \qquad g_2(x, y, z) = x - 2z - 3.$$
The Lagrangian is $\mathcal{L}(x, y, z, \lambda, \mu) = f - \lambda g_1 - \mu g_2$.

The Lagrange system. Setting $\nabla \mathcal{L} = \vec{0}$ gives:

  • $2x = 2\lambda x + \mu$
  • $2y = 2\lambda y$
  • $2z = -2\lambda z - 2\mu$
  • $z^2 = x^2 + y^2$ (constraint 1)
  • $x - 2z = 3$ (constraint 2)

The second equation gives $(1 - \lambda) y = 0$, so either $\lambda = 1$ or $y = 0$.

Case $\lambda = 1$. The first equation becomes $2x = 2x + \mu$, so $\mu = 0$. The third becomes $2z = -2z$, so $z = 0$. The constraint $x - 2z = 3$ gives $x = 3$, and the constraint $z^2 = x^2 + y^2$ gives $0 = 9 + y^2$, which has no real solution. No solutions.

Case $y = 0$, $\lambda \neq 1$. Rewrite the first and third equations as
$$2(1 - \lambda) x = \mu, \qquad 2(1 + \lambda) z = -2\mu \quad \Longleftrightarrow \quad (1 + \lambda) z = -\mu.$$
The fourth becomes $z^2 = x^2$, i.e. $z = \pm x$.

Subcase $z = x$. Then $x - 2z = -x = 3$, so $x = -3$ and $z = -3$. The point is $(-3, 0, -3)$.

Subcase $z = -x$. Then $x - 2z = 3x = 3$, so $x = 1$ and $z = -1$. The point is $(1, 0, -1)$.

Rank check. At both points,
$$J_{\vec{g}} = \begin{pmatrix} 2x & 2y & -2z \\ 1 & 0 & -2 \end{pmatrix}.$$
At $(-3, 0, -3)$, this is $\begin{pmatrix} -6 & 0 & 6 \\ 1 & 0 & -2 \end{pmatrix}$, and the first and third columns give $\det\begin{pmatrix} -6 & 6 \\ 1 & -2 \end{pmatrix} = 6 \neq 0$, so rank $2$. At $(1, 0, -1)$, the matrix is $\begin{pmatrix} 2 & 0 & 2 \\ 1 & 0 & -2 \end{pmatrix}$ and the first/third columns have $\det = -6 \neq 0$, so rank $2$ as well.

Values.
$$f(-3, 0, -3) = 9 + 0 + 9 = 18, \qquad f(1, 0, -1) = 1 + 0 + 1 = 2.$$
Substituting $x = 3 + 2z$ into the cone equation gives $y^2 = -3(z + 1)(z + 3)$, so $z$ ranges over $[-3, -1]$ and $C$ is a bounded closed curve, hence compact; $f|_C$ therefore attains its extrema among the candidates. The constrained maximum of $f$ is $18$ at $(-3, 0, -3)$ (the farthest point on $C$) and the constrained minimum is $2$ at $(1, 0, -1)$. The distance from the origin to the closest point on $C$ is $\sqrt{2}$.
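The answer can be double-checked by eliminating variables, an independent route worth sketching: on $C$, the plane gives $x = 3 + 2z$ and the cone gives $x^2 + y^2 = z^2$, so $f = x^2 + y^2 + z^2 = 2z^2$ with $z \in [-3, -1]$. Sampling this parametrization (helper names are our own):

```python
def point_on_curve(z, sign=1):
    # Valid for z in [-3, -1]: y^2 = -3(z+1)(z+3) from substituting the plane
    # equation x = 3 + 2z into the cone equation z^2 = x^2 + y^2.
    y2 = -3.0 * (z + 1.0) * (z + 3.0)
    assert y2 >= -1e-12, "z must lie in [-3, -1]"
    return (3.0 + 2.0 * z, sign * max(y2, 0.0) ** 0.5, z)

def f(x, y, z):
    return x * x + y * y + z * z

# Sample the curve finely and compare with the Lagrange candidates.
values = [f(*point_on_curve(-3.0 + 2.0 * k / 10000)) for k in range(10001)]
print(abs(min(values) - 2.0) < 1e-6, abs(max(values) - 18.0) < 1e-6)  # True True
```

Since $f = 2z^2$ on $C$, the minimum $2$ (at $z = -1$) and maximum $18$ (at $z = -3$) are exact, matching the multiplier computation.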

Remark.

Intuition: The Lagrange system is typically nonlinear, and real-world problems require careful case analysis. The rank check is essential: without it, the Lagrange conditions are not guaranteed to hold at the extremum. Once the candidates are in hand, plug them into $f$ to identify which is the max and which is the min; use compactness of the constraint set (when available) or direct estimation to confirm that the extrema are attained.