ACE 328/Chapter 12

Lagrange Multipliers

Optimizing a function subject to equality constraints. The method of Lagrange multipliers, derived via the implicit function theorem. Extends to multiple constraints with a full-rank Jacobian condition.

We now turn from unconstrained optimization (where the critical point condition is $\nabla f = \vec{0}$) to constrained optimization, where we optimize $f$ subject to one or more constraints of the form $g(\vec{x}) = 0$. The method of Lagrange multipliers expresses the necessary condition for an interior constrained extremum as a geometric statement: at such a point, the gradient of $f$ must lie in the span of the gradients of the constraints. We derive the single-constraint theorem from the implicit function theorem, generalize to several constraints with a full-rank Jacobian hypothesis, and illustrate with worked examples.


Constrained Extrema

Definition (Constrained Extremum)

Let $U \subseteq \mathbb{R}^n$ be open, $f : U \to \mathbb{R}$, and $S \subseteq U$. We say $\vec{v}_0 \in S$ is a constrained extremum for $f$ (subject to the constraint $S$) iff the restriction $f|_S : S \to \mathbb{R}$ has a relative extremum at $\vec{v}_0$. That is, there is a neighbourhood $V$ of $\vec{v}_0$ in $\mathbb{R}^n$ such that $f(\vec{x}) \geq f(\vec{v}_0)$ (or $\leq$) for all $\vec{x} \in S \cap V$.

Remark.

Intuition: A constrained extremum need not be an unconstrained extremum. The constraint set $S$ is typically a level set of one or more functions $g_i$; the question is how $f$ behaves when we are forced to stay on $S$. For example, the function $f(x, y) = x + y$ has no unconstrained maximum on $\mathbb{R}^2$, but it has a constrained maximum on the unit circle $S = \{x^2 + y^2 = 1\}$.


Geometric Motivation

Suppose $S = \{(x, y) \in \mathbb{R}^2 : g(x, y) = 0\}$ is a smooth curve in $\mathbb{R}^2$ and $f$ has a constrained extremum at $\vec{a} \in S$. Let $\gamma : (-\delta, \delta) \to S$ be a smooth path with $\gamma(0) = \vec{a}$. Then $t \mapsto f(\gamma(t))$ has a relative extremum at $t = 0$, so by the chain rule and the single-variable Fermat theorem,
$$0 = (f \circ \gamma)'(0) = \nabla f(\vec{a}) \cdot \gamma'(0).$$
Thus $\nabla f(\vec{a})$ is orthogonal to the tangent direction $\gamma'(0)$ to the curve $S$ at $\vec{a}$. On the other hand, $\nabla g(\vec{a})$ is always orthogonal to the level curve $\{g = 0\}$ at $\vec{a}$ (assuming $\nabla g(\vec{a}) \neq \vec{0}$). Since both $\nabla f(\vec{a})$ and $\nabla g(\vec{a})$ are orthogonal to the tangent line to $S$ at $\vec{a}$, and the tangent line is one-dimensional, the two gradients must be parallel:
$$\nabla f(\vec{a}) = \lambda \, \nabla g(\vec{a}) \qquad \text{for some } \lambda \in \mathbb{R}.$$
The scalar $\lambda$ is called the Lagrange multiplier.
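The two orthogonality claims above can be checked numerically. The sketch below (helper names are our own) uses the running example $f(x, y) = x + y$ on the unit circle $g(x, y) = x^2 + y^2 - 1 = 0$, parametrized through the maximizer $\vec{a} = (1/\sqrt{2}, 1/\sqrt{2})$ by $\gamma(t) = (\cos(\pi/4 + t), \sin(\pi/4 + t))$, and verifies both that $\nabla f(\vec{a}) \cdot \gamma'(0) = 0$ and that $\nabla f(\vec{a})$ is parallel to $\nabla g(\vec{a})$:

```python
import math

def grad_f(x, y):
    return (1.0, 1.0)            # gradient of f(x, y) = x + y

def grad_g(x, y):
    return (2 * x, 2 * y)        # gradient of g(x, y) = x^2 + y^2 - 1

# Tangent direction of gamma(t) = (cos(pi/4 + t), sin(pi/4 + t)).
def gamma_prime(t):
    return (-math.sin(math.pi / 4 + t), math.cos(math.pi / 4 + t))

a = (math.cos(math.pi / 4), math.sin(math.pi / 4))

# 1. grad f(a) is orthogonal to the tangent direction gamma'(0).
dot = sum(u * v for u, v in zip(grad_f(*a), gamma_prime(0.0)))
print(abs(dot) < 1e-12)          # True

# 2. grad f(a) is parallel to grad g(a): the 2D cross product vanishes.
gf, gg = grad_f(*a), grad_g(*a)
cross = gf[0] * gg[1] - gf[1] * gg[0]
print(abs(cross) < 1e-12)        # True
```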

Example (A Line Constrained to a Circle)

Maximize $f(x, y) = x + y$ subject to the constraint $g(x, y) = x^2 + y^2 - 1 = 0$.

Compute $\nabla f(x, y) = (1, 1)$ and $\nabla g(x, y) = (2x, 2y)$. The Lagrange condition $\nabla f = \lambda \nabla g$ at an extremum on the circle reads
$$(1, 1) = \lambda (2x, 2y) \qquad \Longleftrightarrow \qquad x = y \;(\text{with } \lambda \neq 0).$$
Intersecting $x = y$ with the constraint $x^2 + y^2 = 1$ gives the two candidate points $(1/\sqrt{2}, 1/\sqrt{2})$ and $(-1/\sqrt{2}, -1/\sqrt{2})$, where $f$ takes the values $\sqrt{2}$ and $-\sqrt{2}$ respectively.

Global extrema. The constraint set $S = \{x^2 + y^2 = 1\}$ is compact (closed and bounded in $\mathbb{R}^2$) and $f$ is continuous, so by the extreme value theorem $f|_S$ attains both a global max and a global min on $S$. Any such global extremum must satisfy the Lagrange condition (since $\nabla g \neq \vec{0}$ everywhere on $S$), so the candidates above include them. Therefore the maximum is $\sqrt{2}$ at $(1/\sqrt{2}, 1/\sqrt{2})$ and the minimum is $-\sqrt{2}$ at $(-1/\sqrt{2}, -1/\sqrt{2})$.
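As a sanity check on the global claim, the compact constraint set can be sampled directly. This brute-force sketch (sample count is an arbitrary choice of ours) confirms that the sampled maximum of $f$ on the circle agrees with the Lagrange value $\sqrt{2}$:

```python
import math

def f(x, y):
    return x + y

# Sample the unit circle via its angle parametrization and take the max.
best = max(f(math.cos(t), math.sin(t))
           for t in (2 * math.pi * k / 100000 for k in range(100000)))

# The sampled maximum matches f(1/sqrt2, 1/sqrt2) = sqrt(2).
print(abs(best - math.sqrt(2)) < 1e-6)   # True
```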

Remark.

Global versus local extrema. The Lagrange condition is necessary for a constrained extremum (given the regularity $\nabla g \neq \vec{0}$), not sufficient. To confirm a candidate is a global max or min, we typically argue via compactness: if $S$ is compact and $f$ continuous, $f|_S$ attains its extrema, and these extrema must appear among the Lagrange candidates. For non-compact constraint sets, additional reasoning (coercivity, asymptotic behaviour) is needed to decide global extrema.


Single-Constraint Lagrange Multiplier Theorem

We now make the parallel-gradients condition rigorous using the implicit function theorem.

Theorem (Lagrange Multiplier Theorem — Single Constraint)

Let $U \subseteq \mathbb{R}^n$ be open and $f, g : U \to \mathbb{R}$ be $C^1$ functions. Let $\vec{v}_0 \in U$ with $g(\vec{v}_0) = 0$ and $\nabla g(\vec{v}_0) \neq \vec{0}$. Set $S := \{\vec{x} \in U : g(\vec{x}) = 0\}$. If $f$, subject to the constraint $S$, has a relative extremum at $\vec{v}_0$, then there exists $\lambda \in \mathbb{R}$ such that
$$\nabla f(\vec{v}_0) = \lambda \, \nabla g(\vec{v}_0).$$

Remark.

Intuition: The hypothesis $\nabla g(\vec{v}_0) \neq \vec{0}$ guarantees that the constraint set $S$ is a smooth $(n-1)$-dimensional manifold near $\vec{v}_0$ (by the implicit function theorem). When this fails, the geometric argument breaks down and the conclusion need not hold. The scalar $\lambda$ measures the proportionality between $\nabla f$ and $\nabla g$ at $\vec{v}_0$.

The Lagrangian

It is often convenient to combine the necessary condition into a single system of equations via an auxiliary function.

Definition (Lagrangian for One Constraint)

Given $f, g : U \to \mathbb{R}$, the Lagrangian is the function $\mathcal{L} : U \times \mathbb{R} \to \mathbb{R}$ defined by
$$\mathcal{L}(\vec{x}, \lambda) := f(\vec{x}) - \lambda\, g(\vec{x}).$$

Remark.

Intuition: Setting $\nabla_{\vec{x}} \mathcal{L} = \vec{0}$ recovers $\nabla f = \lambda \nabla g$. Setting $\partial \mathcal{L}/\partial \lambda = 0$ recovers the constraint $g(\vec{x}) = 0$. So finding the constrained critical points reduces to finding the unconstrained critical points of $\mathcal{L}$.
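To illustrate the remark, $\nabla \mathcal{L} = \vec{0}$ can be treated as an ordinary root-finding problem. The sketch below (a minimal Newton iteration with a hand-rolled $3 \times 3$ Gaussian elimination; all helper names are our own) solves the system for the circle example $f = x + y$, $g = x^2 + y^2 - 1$, recovering the maximizer $(1/\sqrt{2}, 1/\sqrt{2})$ and multiplier $\lambda = 1/\sqrt{2}$:

```python
# F(x, y, lam) = grad L for L(x, y, lam) = x + y - lam*(x^2 + y^2 - 1).
def F(v):
    x, y, lam = v
    return [1 - 2 * lam * x,          # dL/dx
            1 - 2 * lam * y,          # dL/dy
            1 - x * x - y * y]        # dL/dlam = -(x^2 + y^2 - 1)

def JF(v):
    x, y, lam = v
    return [[-2 * lam, 0.0, -2 * x],
            [0.0, -2 * lam, -2 * y],
            [-2 * x, -2 * y, 0.0]]

def solve3(A, b):
    # Gaussian elimination with partial pivoting on a 3x3 system.
    n = 3
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= fac * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

v = [1.0, 0.5, 1.0]                   # starting guess near the maximum
for _ in range(50):
    step = solve3(JF(v), [-fi for fi in F(v)])
    v = [vi + si for vi, si in zip(v, step)]

x, y, lam = v
print(round(x, 6), round(y, 6), round(lam, 6))  # 0.707107 0.707107 0.707107
```

Starting near the minimizer instead would converge to the other critical point of $\mathcal{L}$; Newton finds whichever root its basin contains.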

Example (Constrained Extrema on a Sphere)

Find the maximum and minimum of $f(x, y, z) = x - 2y + 2z$ subject to $x^2 + y^2 + z^2 = 9$.

Set $g(x, y, z) = x^2 + y^2 + z^2 - 9$. Form the Lagrangian
$$\mathcal{L}(x, y, z, \lambda) = x - 2y + 2z - \lambda(x^2 + y^2 + z^2 - 9).$$
The first-order conditions $\nabla \mathcal{L} = \vec{0}$ are:

  • $1 - 2\lambda x = 0$
  • $-2 - 2\lambda y = 0$
  • $2 - 2\lambda z = 0$
  • $x^2 + y^2 + z^2 = 9$

Solving the first three equations for $x$, $y$, $z$ in terms of $\lambda$:

$$x = \tfrac{1}{2\lambda}, \qquad y = -\tfrac{1}{\lambda}, \qquad z = \tfrac{1}{\lambda}.$$

Substituting into the constraint:

$$\tfrac{1}{4\lambda^2} + \tfrac{1}{\lambda^2} + \tfrac{1}{\lambda^2} = 9.$$
This gives $\tfrac{9}{4\lambda^2} = 9$, so $\lambda = \pm \tfrac{1}{2}$. The corresponding points are $(1, -2, 2)$ (when $\lambda = \tfrac{1}{2}$) and $(-1, 2, -2)$ (when $\lambda = -\tfrac{1}{2}$). Evaluating,
$$f(1, -2, 2) = 1 + 4 + 4 = 9, \qquad f(-1, 2, -2) = -1 - 4 - 4 = -9.$$
Since the sphere is compact and $f$ is continuous, the maximum and minimum are attained: the maximum is $9$ at $(1, -2, 2)$ and the minimum is $-9$ at $(-1, 2, -2)$.
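A quick numerical corroboration (random sampling of the sphere; the seed and sample count are arbitrary choices of ours): by the computation above, no point of the sphere should give a value beyond $\pm 9$, and dense sampling should come close to both bounds.

```python
import math
import random

random.seed(0)

def f(x, y, z):
    return x - 2 * y + 2 * z

samples = []
for _ in range(100000):
    # Uniform direction via a normalized Gaussian vector, scaled to radius 3.
    v = [random.gauss(0, 1) for _ in range(3)]
    r = math.sqrt(sum(c * c for c in v))
    samples.append(f(*(3 * c / r for c in v)))

# All sampled values respect the Lagrange bounds (tolerance for rounding)...
print(max(samples) <= 9 + 1e-9 and min(samples) >= -9 - 1e-9)  # True
# ...and dense sampling approaches both bounds.
print(max(samples) > 8.9 and min(samples) < -8.9)              # True
```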


Multiple Constraints: The Rank Condition

We now generalize to $k$ constraints $g_1(\vec{x}) = \cdots = g_k(\vec{x}) = 0$ with $0 < k < n$. The key hypothesis is that the Jacobian of $\vec{g} = (g_1, \dots, g_k)$ has full rank $k$ at the critical point.

Theorem (Lagrange Multiplier Theorem — Several Constraints)

Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$ of class $C^1$. Let $0 < k < n$ and let $\vec{g} = (g_1, \dots, g_k) : U \to \mathbb{R}^k$ be $C^1$. Set
$$S := \{\vec{x} \in U : \vec{g}(\vec{x}) = \vec{0}\} = \bigcap_{i=1}^{k} \{\vec{x} \in U : g_i(\vec{x}) = 0\}.$$
Suppose $f$, subject to the constraints $\vec{g} = \vec{0}$, has a constrained relative extremum at $\vec{v}_0 \in S$, and the $k \times n$ Jacobian matrix
$$J_{\vec{g}}(\vec{v}_0) = \left(\frac{\partial g_i}{\partial x_j}(\vec{v}_0)\right)_{\substack{i = 1, \dots, k \\ j = 1, \dots, n}}$$
has rank $k$. Then there exist $\lambda_1, \dots, \lambda_k \in \mathbb{R}$ such that
$$\nabla f(\vec{v}_0) = \lambda_1 \nabla g_1(\vec{v}_0) + \cdots + \lambda_k \nabla g_k(\vec{v}_0).$$

Remark.

Intuition: The full-rank hypothesis says the $k$ constraints are independent at $\vec{v}_0$: the gradient vectors $\nabla g_1(\vec{v}_0), \dots, \nabla g_k(\vec{v}_0)$ are linearly independent, so the constraint set $S$ is locally a smooth $(n - k)$-dimensional manifold. The conclusion asserts that $\nabla f(\vec{v}_0)$ lies in the span of these gradient vectors — equivalently, in the normal space to $S$ at $\vec{v}_0$.
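To make the span statement concrete, here is a small hypothetical instance (ours, not from the text): $n = 3$, $k = 2$, maximizing $f(x, y, z) = x + y$ on the equator obtained by intersecting the unit sphere $g_1 = x^2 + y^2 + z^2 - 1 = 0$ with the plane $g_2 = z = 0$. At the maximizer $\vec{a} = (1/\sqrt{2}, 1/\sqrt{2}, 0)$ we check that $\nabla f(\vec{a})$ is a linear combination of the two constraint gradients:

```python
import math

a = (1 / math.sqrt(2), 1 / math.sqrt(2), 0.0)

grad_f = (1.0, 1.0, 0.0)
grad_g1 = tuple(2 * c for c in a)        # gradient of x^2 + y^2 + z^2 - 1
grad_g2 = (0.0, 0.0, 1.0)                # gradient of z

# Solve grad_f = lam1*grad_g1 + lam2*grad_g2 component by component.
lam1 = grad_f[0] / grad_g1[0]            # from the x-component
lam2 = grad_f[2] - lam1 * grad_g1[2]     # from the z-component (here 0)
residual = max(abs(grad_f[i] - lam1 * grad_g1[i] - lam2 * grad_g2[i])
               for i in range(3))
print(residual < 1e-12)                  # True: grad f lies in the span
```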

Remark.

Why the rank condition matters: If the Jacobian fails to have rank $k$ at $\vec{v}_0$, then the conclusion may fail even for smooth data. A classical example is $f(x, y) = x$ with constraints $g_1(x, y) = y$ and $g_2(x, y) = y - x^2$. The constraint set is $\{(0, 0)\}$, at which the Jacobian has rank $1$ (not $2$). The gradient of $f$ at the origin is $(1, 0)$, but the gradients of $g_1, g_2$ at the origin are both $(0, 1)$, so their span is the $y$-axis and does not contain $(1, 0)$.
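The degenerate example can be checked mechanically. This sketch (hand-rolled $2 \times 2$ rank helper; naming is our own) confirms the rank drop at the origin, and a comment records why no multipliers can exist there:

```python
def jacobian_g(x, y):
    # Rows are grad g1 = (0, 1) and grad g2 = (-2x, 1).
    return [[0.0, 1.0], [-2.0 * x, 1.0]]

def rank_2x2(M):
    (a, b), (c, d) = M
    if abs(a * d - b * c) > 1e-12:
        return 2
    return 1 if any(abs(e) > 1e-12 for row in M for e in row) else 0

rank_at_origin = rank_2x2(jacobian_g(0.0, 0.0))
print(rank_at_origin)                    # 1 (full rank would be 2)

# Every combination lam1*grad_g1 + lam2*grad_g2 at the origin has the form
# (0, lam1 + lam2): its first component is 0, so it can never equal
# grad f(0, 0) = (1, 0). The Lagrange conclusion fails, as claimed.
```

Away from the origin the rank recovers: for instance `rank_2x2(jacobian_g(1.0, 0.0))` is `2`.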

The Multi-Constraint Lagrangian

Definition (Lagrangian for Several Constraints)

The Lagrangian for the problem of optimizing $f$ subject to $g_1 = \cdots = g_k = 0$ is the function of $n + k$ variables
$$\mathcal{L}(\vec{x}, \lambda_1, \dots, \lambda_k) := f(\vec{x}) - \lambda_1 g_1(\vec{x}) - \cdots - \lambda_k g_k(\vec{x}).$$

Remark.

Intuition: Solving $\nabla \mathcal{L} = \vec{0}$ (in all $n + k$ variables) is equivalent to the system
$$\nabla f(\vec{x}) = \sum_{j=1}^{k} \lambda_j \nabla g_j(\vec{x})$$
together with the $k$ constraint equations $g_1(\vec{x}) = 0, \ldots, g_k(\vec{x}) = 0$. At solutions where $J_{\vec{g}}(\vec{x})$ has rank $k$, these are the candidate constrained extrema.


Worked Example with Two Constraints

Example (Distance from the Origin to a Space Curve)

Consider the curve $C$ in $\mathbb{R}^3$ defined as the intersection of the surfaces $z^2 = x^2 + y^2$ and $x - 2z = 3$. Find the distance from the origin to the closest point on $C$.

Setup. Minimizing the distance is equivalent to minimizing its square. Let $f(x, y, z) = x^2 + y^2 + z^2$ and set
$$g_1(x, y, z) = x^2 + y^2 - z^2, \qquad g_2(x, y, z) = x - 2z - 3.$$
The Lagrangian is $\mathcal{L}(x, y, z, \lambda, \mu) = f - \lambda g_1 - \mu g_2$.

The Lagrange system. Setting $\nabla \mathcal{L} = \vec{0}$ gives:

  • $2x = 2\lambda x + \mu$
  • $2y = 2\lambda y$
  • $2z = -2\lambda z - 2\mu$
  • $z^2 = x^2 + y^2$ (constraint 1)
  • $x - 2z = 3$ (constraint 2)

The second equation gives $(1 - \lambda) y = 0$, so either $\lambda = 1$ or $y = 0$.

Case $\lambda = 1$. The first equation becomes $2x = 2x + \mu$, so $\mu = 0$. The third becomes $2z = -2z$, so $z = 0$. The constraint $x - 2z = 3$ gives $x = 3$, and the constraint $z^2 = x^2 + y^2$ gives $0 = 9 + y^2$, which has no real solution. No solutions.

Case $y = 0$, $\lambda \neq 1$. Rewrite the first and third equations as
$$2(1 - \lambda) x = \mu, \qquad 2(1 + \lambda) z = -2\mu \quad \Longleftrightarrow \quad (1 + \lambda) z = -\mu.$$
The fourth becomes $z^2 = x^2$, i.e. $z = \pm x$.

Subcase $z = x$. Then $x - 2z = -x = 3$, so $x = -3$ and $z = -3$. The point is $(-3, 0, -3)$.

Subcase $z = -x$. Then $x - 2z = 3x = 3$, so $x = 1$ and $z = -1$. The point is $(1, 0, -1)$.

Rank check. At both points,
$$J_{\vec{g}} = \begin{pmatrix} 2x & 2y & -2z \\ 1 & 0 & -2 \end{pmatrix}.$$
At $(-3, 0, -3)$, this is $\begin{pmatrix} -6 & 0 & 6 \\ 1 & 0 & -2 \end{pmatrix}$, and the first and third columns give $\det\begin{pmatrix} -6 & 6 \\ 1 & -2 \end{pmatrix} = 6 \neq 0$, so rank $2$. At $(1, 0, -1)$, the matrix is $\begin{pmatrix} 2 & 0 & 2 \\ 1 & 0 & -2 \end{pmatrix}$ and the first/third columns have $\det = -6 \neq 0$, so rank $2$ as well.

Values.
$$f(-3, 0, -3) = 9 + 0 + 9 = 18, \qquad f(1, 0, -1) = 1 + 0 + 1 = 2.$$
Substituting $x = 3 + 2z$ into the cone equation gives $y^2 = -3(z + 1)(z + 3)$, so $z$ ranges over $[-3, -1]$ and $C$ is a bounded closed curve, hence compact; $f|_C$ therefore attains its extrema among the candidates. The constrained maximum of $f$ is $18$ at $(-3, 0, -3)$ (the farthest point on $C$) and the constrained minimum is $2$ at $(1, 0, -1)$. The distance from the origin to the closest point on $C$ is $\sqrt{2}$.
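The answer can be double-checked by eliminating variables, an independent route worth sketching: on $C$, the plane gives $x = 3 + 2z$ and the cone gives $x^2 + y^2 = z^2$, so $f = x^2 + y^2 + z^2 = 2z^2$ with $z \in [-3, -1]$. Sampling this parametrization (helper names are our own):

```python
def point_on_curve(z, sign=1):
    # Valid for z in [-3, -1]: y^2 = -3(z+1)(z+3) from substituting the plane
    # equation x = 3 + 2z into the cone equation z^2 = x^2 + y^2.
    y2 = -3.0 * (z + 1.0) * (z + 3.0)
    assert y2 >= -1e-12, "z must lie in [-3, -1]"
    return (3.0 + 2.0 * z, sign * max(y2, 0.0) ** 0.5, z)

def f(x, y, z):
    return x * x + y * y + z * z

# Sample the curve finely and compare with the Lagrange candidates.
values = [f(*point_on_curve(-3.0 + 2.0 * k / 10000)) for k in range(10001)]
print(abs(min(values) - 2.0) < 1e-6, abs(max(values) - 18.0) < 1e-6)  # True True
```

Since $f = 2z^2$ on $C$, the minimum $2$ (at $z = -1$) and maximum $18$ (at $z = -3$) are exact, matching the multiplier computation.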

Remark.

Intuition: The Lagrange system is typically nonlinear, and real-world problems require careful case analysis. The rank check is essential: without it, the Lagrange conditions are not guaranteed to hold at the extremum. Once the candidates are in hand, plug them into $f$ to identify which is the max and which is the min; use compactness of the constraint set (when available) or direct estimation to confirm that the extrema are attained.