Hello World, the Lagrange Multiplier Theorem, and Kuhn-Tucker conditions

July 26, 2009

Posted by Phi. Isett in Calculus, economics, Uncategorized.

This is a blog which was started on a whim in order to host a very extensive reply to a blog entry of Terence Tao.  This entry is not the reply — the next entry is.

I may continue this blog to contain pieces of mathematics that I find pretty, important and/or not so commonly known.  In any case, I’d feel silly devoting an entire blog to a single reply.  So here.. I’ll take this opportunity to include my favorite proof of the Lagrange multiplier theorem, and later on maybe I’ll post a nice proof of Stirling’s formula, a group algebraic proof of the Riemann-Lebesgue lemma, or something…

The Lagrange Multiplier Theorem helps us solve constrained maximization / minimization problems, making it (among other things) extremely important in economics.  A (weak form of) the Lagrange Multiplier theorem can be stated as follows:

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real-valued, continuously differentiable function (called the “objective function” — the thing we want to maximize or minimize), and let $g : \mathbb{R}^n \to \mathbb{R}^m$ be continuously differentiable (called the “constraint function” for reasons which will soon be clear).

If $x_0$ is an extremizer of the restriction of $f$ to the (codimension $m$) “constraint manifold” given by $M \equiv \{ x : g(x) = 0 \}$, then there is a unique linear functional $\lambda : \mathbb{R}^m \to \mathbb{R}$ satisfying the equality of linear maps $Df(x_0) = \lambda \circ Dg(x_0)$.

We assume $m \leq n$ — things tend to be more complicated when the number of constraints exceeds the number of degrees of freedom.  We also assume the constraints are independent at $x_0$, i.e. that $Dg(x_0)$ has full rank $m$; this is what guarantees that $\lambda$ exists and is unique.
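To make the statement concrete, here is a small numerical check in Python.  The example — maximizing $f(x,y) = x + y$ on the unit circle $g(x,y) = x^2 + y^2 - 1 = 0$ (so $n = 2$, $m = 1$) — is mine, chosen only for illustration:

```python
# Check the Lagrange condition Df(x0) = lambda ∘ Dg(x0) on a made-up example:
# maximize f(x, y) = x + y on the unit circle g(x, y) = x^2 + y^2 - 1 = 0.
import math

def Df(x, y):
    return (1.0, 1.0)          # gradient of f(x, y) = x + y

def Dg(x, y):
    return (2.0 * x, 2.0 * y)  # gradient of g(x, y) = x^2 + y^2 - 1

# By symmetry (or direct check) the maximizer is x0 = (1/sqrt(2), 1/sqrt(2)).
x0 = (1.0 / math.sqrt(2), 1.0 / math.sqrt(2))

# Solve Df(x0) = lam * Dg(x0) using the first component, verify the second.
lam = Df(*x0)[0] / Dg(*x0)[0]
assert abs(Df(*x0)[1] - lam * Dg(*x0)[1]) < 1e-12
print(lam)  # lambda = 1/sqrt(2)
```

Here $\lambda$ is a single number because $m = 1$; for $m > 1$ it becomes a row vector, one entry per constraint.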

Proof:

We know that $f$ must satisfy some first order condition: namely (by further restricting $f$ to trajectories on $M$) that $Df(x_0)v = 0$ for any $v \in \mathbb{R}^n$ which can be realized as a velocity at $x_0$ by a path within the constraint manifold.  An exercise in the implicit function theorem shows that these velocities include all of $\mbox{ Ker } Dg(x_0)$ (in fact these are exactly the realizable velocities: they are the directions in which one can move without changing $g$, and under generic circumstances they together form the “tangent space” of $M$ at $x_0$).

Now that we know $\mbox{ Ker } Dg(x_0) \subseteq \mbox{ Ker } Df(x_0)$, it is a trivial exercise in linear algebra to show that there is a unique linear functional $\lambda : \mathbb{R}^m \to \mathbb{R}$ satisfying the identity $Df(x_0) = \lambda \circ Dg(x_0)$.  Basically, we have shown that the zero set of the linear map $Dg(x_0)$ is contained in the zero set of $Df(x_0)$; but because the two maps are linear, their level sets are all translations of their zero level sets, and therefore every level set of $Dg(x_0)$ is contained in a level set of $Df(x_0)$.
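This factoring step can be sketched numerically: given the $m \times n$ matrix of $Dg(x_0)$ (full rank) and the row vector $Df(x_0)$ with $\mbox{ Ker } Dg(x_0) \subseteq \mbox{ Ker } Df(x_0)$, the multiplier $\lambda$ is recovered by least squares.  The matrices below are made up purely for illustration:

```python
# Recover the unique lambda with Df = lambda @ Dg by least squares,
# given a full-rank Dg(x0) and a Df(x0) vanishing on Ker Dg(x0).
import numpy as np

Dg = np.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, 3.0]])        # m = 2 constraints, n = 3 variables
lam_true = np.array([2.0, -1.0])        # the multiplier we will recover
Df = lam_true @ Dg                      # by construction Ker Dg ⊆ Ker Df

# Solve Dg.T @ lam = Df.T in the least-squares sense; full rank => exact.
lam, *_ = np.linalg.lstsq(Dg.T, Df, rcond=None)
assert np.allclose(lam, lam_true)
assert np.allclose(Df, lam @ Dg)        # Df(x0) = lambda ∘ Dg(x0)
```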


Edit (29 Aug, 2009):

The same ideas can be used to give a proof of the Kuhn-Tucker conditions for a constrained maximization problem where we replace the system of equalities $\{ g(x) = 0 \}$ with the system of inequalities $\{ g_i(x) \leq 0 \mid i = 1, \ldots, m \}$.  We simply replace the role of trajectories within the constraint manifold by trajectories going into the constraint set.

Without loss of generality, we may assume a maximizer belongs to the boundary where the constraints bind (i.e. $g(x_0) = 0$), for if one constraint does not bind (say $g_m(x_0) < 0$), we may restrict our domain to the open set $\{ x : g_m(x) < 0 \}$ and simply remove this component from the constraint function, thereby decreasing $m$ (the corresponding component of the Lagrange multiplier will therefore be zero).  Thus, we can picture the point $x_0$ as a point of transverse intersection of $m$ hypersurfaces which together form the boundary of the constraint set (at least near $x_0$) — some kind of corner.

We now modify the proof of the Lagrange multiplier theorem by showing also that $Dg(x_0)v \leq 0 \Rightarrow Df(x_0) v \leq 0$ at a maximizer $x_0$ where $g(x_0) = 0$ — but this is easy to see.  The inequality $Dg(x_0)v \leq 0$ (understood componentwise) means that the velocity $v$ points into the constraint set $\{ x : g_i(x) \leq 0, \ i = 1, \ldots, m \}$, since no constraint function increases in this direction.  Therefore $f$ cannot increase in this direction at a constrained maximum, so we cannot have $\frac{f(x_0 + \epsilon v) - f(x_0)}{\epsilon} \sim Df(x_0) v > 0$.

Again, this argument relies on applying the implicit function theorem to $g$ at the point $x_0$ in order to assure that all such velocities for which $Dg(x_0)v \leq 0$ can be achieved as a velocity at $x_0$ of a trajectory which travels into the constraint set, but like many other applications of the implicit function theorem, this fact is intuitively clear when one pictures the generic situation.

The monotonicity property we have just established — $Dg(x_0)v \leq 0 \Rightarrow Df(x_0) v \leq 0$ — is equivalent to the non-negativity of the functional $\lambda$ satisfying $\lambda \circ Dg(x_0) = Df(x_0)$.  Note here that the Lagrange multiplier theorem still applies (i.e. there is a $\lambda$) since $x_0$ still maximizes the objective function over the smaller constraint set $\{ x : g(x) = 0 \}$ after we have removed the constraints which do not bind.
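A one-dimensional, made-up illustration of both the implication $Dg(x_0)v \leq 0 \Rightarrow Df(x_0)v \leq 0$ and the resulting non-negative multiplier:

```python
# Maximize f(x) = -(x - 2)^2 subject to g(x) = x - 1 <= 0.
# The constraint binds at the maximizer x0 = 1.
def Df(x):
    return -2.0 * (x - 2.0)   # f'(x)

Dg = 1.0                      # g'(x) for g(x) = x - 1
x0 = 1.0

# Any velocity v with Dg * v <= 0 (i.e. v <= 0) points into {g <= 0},
# and f must not increase in such a direction at the maximizer:
for v in (-1.0, -0.5, 0.0):
    assert Dg * v <= 0.0
    assert Df(x0) * v <= 0.0

# The multiplier is lambda = Df(x0) / Dg = 2 >= 0, as Kuhn-Tucker requires.
lam = Df(x0) / Dg
assert lam >= 0.0
```

The sign makes intuitive sense: $f$ is still climbing at $x_0 = 1$ (its unconstrained maximum sits outside the constraint set at $x = 2$), so relaxing the binding constraint would raise the maximum, at a positive rate $\lambda$.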

The above extension of the Lagrange multiplier theorem is due to Karush and the resulting conditions are sometimes referred to as “Kuhn-Tucker” conditions (or “Karush-Kuhn-Tucker conditions”).  One runs into such inequality constraints at every corner in economics.  For example, a consumer cannot buy more than she can afford (unless she lives in the US?), nor can a firm produce more than its available production technologies and resources allow…

I should comment on how one can more directly see the conclusions of the Lagrange multiplier theorem and the Karush-Kuhn-Tucker conditions.  First, the Lagrange multiplier theorem: we return to the setting of a codimension $m$ constraint manifold $\{ x : g_1(x) = \ldots = g_m(x) = 0 \}$.

The Lagrange multiplier theorem asserts that $Df(x_0) = \lambda_1 Dg_1(x_0) + \ldots + \lambda_m Dg_m(x_0)$; in other words, the derivative of $f$ (which points in the direction of increase of $f$) lies in the linear space spanned by the derivatives of the constraint functions.  This span is exactly the orthogonal complement of the tangent space at $x_0$ to the constraint manifold.  If $Df(x_0)$ did not lie entirely in this orthogonal complement, there would be a nontrivial projection of $Df(x_0)$ in a direction tangent to the constraint manifold.  Choosing a trajectory within the constraint manifold which achieves this projection as a velocity, the objective function $f$ would increase along this trajectory at $x_0$, which cannot occur at a constrained maximizer/minimizer.
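As a concrete (hypothetical) check of this orthogonality: for $f(x,y) = x + y$ constrained to the unit circle $g(x,y) = x^2 + y^2 - 1 = 0$, the derivative $Df$ at the maximizer should have no component tangent to the circle:

```python
# f(x, y) = x + y on the unit circle, maximized at x0 = (1/sqrt(2), 1/sqrt(2)).
# Df(x0) should be orthogonal to the tangent line of the circle at x0,
# i.e. lie entirely in the span of Dg(x0).
import math

x0 = (1.0 / math.sqrt(2), 1.0 / math.sqrt(2))
Df = (1.0, 1.0)               # gradient of f
tangent = (-x0[1], x0[0])     # tangent direction to the circle at x0

dot = Df[0] * tangent[0] + Df[1] * tangent[1]
assert abs(dot) < 1e-12       # no component along the constraint manifold
```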

To see the necessity of the Karush-Kuhn-Tucker conditions directly (that $\lambda_i \geq 0$), it is easiest to visualize a corner formed where two lines in a plane intersect transversely, or a corner where three planes in three-dimensional space intersect transversely.  If we picture the inequality constraint as a shaded region in the ambient space with this corner as the boundary, then the vectors $\{ Dg_1(x_0), \ldots, Dg_m(x_0) \}$ are normal to the boundary hypersurfaces, pointing outside of the constraint region since they correspond to directions of increase of the constraint functions.  In order for the corner $x_0$ to be a maximizer for an objective function $f$, the gradient of $f$, which points in the direction of increase of $f$, must point within the cone which lies “between” all the normal vectors $\{ Dg_1(x_0), \ldots, Dg_m(x_0) \}$.  This fact is easiest to see in a picture… which I… really should… provide…  This cone is the set of non-negative linear combinations of the normal vectors.
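In lieu of the picture, a made-up two-constraint corner in the plane, checking that the gradient of $f$ is a non-negative combination of the outward normals:

```python
# Maximize f(x, y) = x + 2y subject to g1(x, y) = x - 1 <= 0 and
# g2(x, y) = y - 1 <= 0.  Both constraints bind at the corner x0 = (1, 1),
# where Dg1 = (1, 0) and Dg2 = (0, 1) are the outward normals.
import numpy as np

Df = np.array([1.0, 2.0])     # gradient of f at the corner
Dg = np.array([[1.0, 0.0],    # rows are Dg1(x0), Dg2(x0)
               [0.0, 1.0]])

# Df must lie in the cone "between" Dg1 and Dg2, i.e. be a NON-NEGATIVE
# linear combination of the normals:
lam, *_ = np.linalg.lstsq(Dg.T, Df, rcond=None)
assert np.allclose(lam, [1.0, 2.0])
assert (lam >= 0).all()       # Karush-Kuhn-Tucker non-negativity
```

If instead we tried $f(x,y) = x - 2y$ at the same corner, the least-squares multiplier would come out as $(1, -2)$, the sign condition would fail, and indeed $(1,1)$ is not a maximizer: moving down along the boundary $x = 1$ increases $f$ while staying feasible.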