## On the geometric meaning of the Cauchy-Schwarz inequality, an intro to exterior powers, and surface integrals August 19, 2009

Posted by Phi. Isett in Uncategorized.

This is a brief remark on the Cauchy-Schwarz inequality and one way of understanding its geometric meaning (at least in the context of a real inner product space).  In a real inner product space $V$, the inner product $\langle \cdot, \cdot \rangle$ allows for a generalization of the intuitive geometric notions of “length”, “angle”, and “perpendicular” for vectors in $V$.  For two elements $u, v \in V$, I will write $u \wedge v$ to indicate the parallelogram formed by taking $u$ and $v$ as edges.  The reader may as well simply imagine $V = {\mathbb{R}}^n$ for his favorite $n$ and that the inner product is the usual dot product.

The Cauchy-Schwarz inequality says that the area of a parallelogram $u \wedge v$ is positive unless $u$ and $v$ are co-linear (it is also equivalent to the triangle inequality, but I will be talking about this formulation instead).  If $u$ and $v$ are co-linear (they point in the same direction or perhaps opposite directions; maybe $v = - u$), the parallelogram one forms with these two vectors is degenerate and has zero area; otherwise, you wind up with a parallelogram which has positive area.  In fact, the area of $u \wedge v$ is $\| u \| \cdot \| v \|$ when the two vectors $u$ and $v$ are perpendicular.  If $u$ and $v$ fail to be perpendicular, then we can observe that shifting the edge $u$ by any amount in the direction of the other edge $v$ does not change the area of the parallelogram; thus the area $A(u \wedge v)$ is the same as the area $A [ (u + \alpha v) \wedge v ]$ for any $\alpha \in \mathbb{R}$.  By choosing $\alpha$ to minimize the length of the first edge (pick $\alpha = - \frac{\langle u, v \rangle}{\|v\|^2}$), we can make the two edges perpendicular.  The area of the resulting parallelogram must be non-negative (and must be positive when $u$ and $v$ are linearly independent), giving the inequality $\| u + \alpha v \|^2 \| v \|^2 = \|u\|^2\|v\|^2 - |\langle u, v \rangle|^2 \geq 0$.  (This is actually a self-contained proof of the Cauchy-Schwarz inequality, and it’s the usual proof, but with some motivation regarding how to choose $\alpha$.)
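As a sanity check, here is a small pure-Python sketch of that computation (the vectors are arbitrary choices of mine): shifting $u$ by $\alpha = -\langle u,v\rangle / \|v\|^2$ in the direction of $v$ makes the edges perpendicular, and base times height then recovers the Cauchy-Schwarz gap.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

u = [3.0, 1.0, 2.0]
v = [1.0, -2.0, 4.0]

# the minimizing shift from the proof above
alpha = -dot(u, v) / dot(v, v)
shifted = [ui + alpha * vi for ui, vi in zip(u, v)]

# the shifted edge is perpendicular to v ...
print(abs(dot(shifted, v)) < 1e-12)  # True

# ... and ||u + alpha v||^2 ||v||^2 equals ||u||^2 ||v||^2 - <u,v>^2 >= 0
lhs = dot(shifted, shifted) * dot(v, v)
rhs = dot(u, u) * dot(v, v) - dot(u, v) ** 2
print(abs(lhs - rhs) < 1e-9, rhs >= 0)  # True True
```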

Of course, the Cauchy-Schwarz inequality is also equivalent to the triangle inequality, and the relationship between these two geometric interpretations can be seen by inspecting the parallelogram one forms by drawing “$u + v = v + u$“.

I hope to write on how the formalization of this area-of-parallelogram concept works (only on an intuitive level, for now).  For those already familiar with exterior powers of vector spaces and exterior algebras (and their geometric meaning in terms of parallelograms), I can cut to the chase and say that there’s an induced inner product on the exterior powers and the Cauchy-Schwarz inequality is what you get by writing $\langle u \wedge v , u \wedge v \rangle \geq 0$.

I will also discuss the relation to surface integrals and determinants.

It shouldn’t be so surprising that the same inner product which allows us to generalize the familiar notions of length, angle, and perpendicularity also allows us to measure areas and volumes of parallelograms with edges formed from elements of $V$.  In fact, take your favorite two (non-degenerate) oriented parallelograms $u_1 \wedge u_2$ and $v_1 \wedge v_2$ in three dimensional space.  Imagine them with a common vertex at the origin.  You should be able to convince yourself that there is a definite, well-defined notion of angle between these two parallelograms, and that this angle depends only on the planes generated by your choice of parallelograms (the planes which contain them).  So oriented parallelograms like these seem to already have a definite notion of “area”, “angle”, “projection” and “perpendicular” in much the same way as vectors themselves have notions of “length”, “angle” and “perpendicular”.  So one may ask: is there also an inner product on the space of two-parallelograms?

Yes!.. Sort of.  The space of parallelograms in $V$ with common vertex at the origin is not exactly a vector space (although it is a perfectly interesting topological space); basically, there isn’t a clear, natural way to add parallelograms.  But there is a vector space associated to $V$ which fills this role; it is called “the second exterior power of $V$” and is more briefly denoted $\Lambda^2(V)$.  Typical elements of $\Lambda^2(V)$ may look like parallelograms $u \wedge v$ for $u, v \in V$, or linear combinations of such things like $u \wedge v + (4v) \wedge w$, but there is a catch: there are infinitely many different ways to write any particular element.

There is good reason for this ambiguity, without which we would not be able to successfully define a sensible notion of “addition” or “scalar multiplication”.  Note, however, that we are all accustomed to mathematically constructed sets in which every element admits infinitely many different representations: the numbers “1/2, 3/6, 48/96, etc.” all represent the same rational number, which we call “one-half”.  And it is important that we can manipulate these different representations when we actually work with the number “one-half”: e.g. “one half plus one third” = 3/6 + 2/6 = “five sixths”.  The multitude of ways to represent elements of $\Lambda^2(V)$ is just as natural.

Take, for example, the zero element $0 \in \Lambda^2(V)$.  One way of forming the zero element is by taking the parallelogram $0 \wedge 0$ whose two edges are both the zero vector in $V$, but one could start with an arbitrary vector $v \in V$, take another in the same direction (say $16v$) and form the degenerate parallelogram $v \wedge (16v)$.  Doing so gives another way to represent zero; all degenerate parallelograms like $v \wedge (-v), 0 \wedge v, v \wedge 0$ are zero in $\Lambda^2(V)$, and this fact is critical.

With the above consideration, we now have a hope of defining a nice addition law.  One geometrically intuitive requirement for addition in $\Lambda^2(V)$ is that for all pairs of parallelograms sharing a common second edge $w$, we have $u \wedge w + v \wedge w = (u + v) \wedge w$, and similarly when a common first edge is shared (there is a simple picture to imagine here involving concatenation of parallelograms).  This requirement certainly does not prohibit addition from being commutative and associative, since these facts were already true in $V$.  Probably the best evidence that this is the right thing to do comes from the special case where $V$ is a plane and these parallelograms all have a certain signed area $\mbox{Vol}(\cdot)$; this choice of addition allows for the property $\mbox{Vol}(u \wedge w) + \mbox{Vol}(v \wedge w) = \mbox{Vol}( (u + v) \wedge w)$.  But before discussing signed volumes and areas and their connection to exterior powers, let us elaborate on the consequences of this choice of addition.
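For readers who want something concrete now: in $\mathbb{R}^3$ the cross product is a standard model of the wedge of two vectors, and the addition law above becomes bilinearity of the cross product. A small pure-Python check with arbitrary vectors of my choosing:

```python
def cross(a, b):
    # the cross product in R^3, playing the role of a ^ b
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

u = [1.0, 2.0, 0.0]
v = [0.5, -1.0, 3.0]
w = [2.0, 0.0, 1.0]

# (u ^ w) + (v ^ w) = (u + v) ^ w
lhs = [a + b for a, b in zip(cross(u, w), cross(v, w))]
rhs = cross([a + b for a, b in zip(u, v)], w)
print(lhs == rhs)  # True
```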

We are making a vector space out of parallelograms, and a real vector space requires a notion of scalar multiplication, which has essentially been determined already.  For example, we have required implicitly that

$(3u) \wedge w = (u + u + u) \wedge w = u \wedge w + u \wedge w + u \wedge w = u \wedge (3w)$

are all different ways of writing $3 (u \wedge w)$, and the same equality must hold when $3$ has been replaced by an arbitrary rational number.  Therefore we have no choice but to require that, for any real number $\alpha$,

$(\alpha u ) \wedge w = \alpha (u \wedge w) = u \wedge (\alpha w)$

all represent the same element of $\Lambda^2(V)$, even though the left and right are different parallelograms.  But when we combine the desired property for our addition with our requirement that all degenerate parallelograms be zero, we see some more funny algebraic consequences (but bear with me and these will also become more intuitive).  For one thing, unless everything is completely trivial, the order of edges matters when you represent a parallelogram: expanding the identity $(u+v)\wedge(u+v) = 0$ with our addition law yields the property $u \wedge v = - v \wedge u$, which requires us to think of these parallelograms as oriented, in a sense, if we hope to maintain an intuitive grasp of this $\Lambda^2(V)$ we are constructing.

More generally, if we take a parallelogram $u \wedge w$ and shift one edge in a direction parallel to the other, $( u + \alpha w) \wedge w = u \wedge w + \alpha\, w \wedge w = u \wedge w$, we end up with an equivalent parallelogram in $\Lambda^2(V)$.  However, provided $u$ and $w$ are not linearly dependent, we will not be able to construct a degenerate parallelogram in this manner.  But in fact, one can use such “parallel shifts” (shifting either the first or the second edge) to construct all equivalent parallelograms in $\Lambda^2(V)$.
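Sticking with the cross-product model of $\Lambda^2(\mathbb{R}^3)$, both the antisymmetry $u \wedge v = -v \wedge u$ and the invariance under parallel shifts can be checked directly (a sketch; the vectors are arbitrary):

```python
def cross(a, b):
    # the cross product in R^3, playing the role of a ^ b
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

u = [1.0, 2.0, 0.0]
v = [0.5, -1.0, 3.0]
w = [2.0, 0.0, 1.0]

# antisymmetry: u ^ v = -(v ^ u)
antisym = cross(u, v) == [-c for c in cross(v, u)]

# parallel shift: (u + alpha w) ^ w = u ^ w
alpha = 0.75
shifted = cross([u[i] + alpha * w[i] for i in range(3)], w)
shift_invariant = shifted == cross(u, w)

print(antisym, shift_invariant)  # True True
```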

Earlier on we noticed that the angle between two (oriented) parallelograms $u_1 \wedge u_2$ and $v_1 \wedge v_2$ depends only on the planes spanned by the two parallelograms, and is therefore in particular independent of parallel shifts in the edges.  I also pointed out at the beginning of the entry that the area of a parallelogram is likewise independent of such parallel shifts (a lot of valuable intuition is in this exercise).  These geometric observations are what lies behind the following fact: there is an induced inner product on this vector space $\Lambda^2(V)$ which was made from parallelograms.  In fact, the induced norm coming from the inner product on $\Lambda^2(V)$ is exactly the “area” when restricted to parallelograms, and one can use the usual formula for the cosine of an angle between two vectors in order to compute the angle between two oriented planes.  As long as one normalizes properly, we have the familiar property that “area = base × height”, so that $\| u \wedge v \| = \|u\| \|v \|$ when $u$ and $v$ are perpendicular.  From the special case of perpendicular edges and the bilinearity of the induced inner product, one deduces the more general area formula $\| u \wedge v \|^2 = \|u\|^2 \|v\|^2 - |\langle u, v \rangle|^2$, which ought to be positive unless $u$ and $v$ are linearly dependent.  (The calculation is essentially the following: we “parallel shift” $u$ by an appropriate amount in the direction of $v$ until the two are perpendicular.)  Thus we recover the Cauchy-Schwarz inequality: the area of a non-degenerate parallelogram is positive.  In fact, we see that the inner product between arbitrary parallelograms can be computed by the formula $\langle u_1 \wedge u_2 , v_1 \wedge v_2 \rangle = \det ( \langle u_i , v_j \rangle )$ — which one notices remains constant under parallel shifts like $u_1 \to u_1 + \alpha u_2$ — because, by the polarization identity, an inner product is completely determined by the norm it induces.
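Here is a small numerical sketch of that determinant formula and its invariance under parallel shifts (the vectors are arbitrary choices of mine):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def wedge_ip(u1, u2, v1, v2):
    # <u1 ^ u2, v1 ^ v2> = det of the 2x2 matrix of inner products <u_i, v_j>
    return dot(u1, v1) * dot(u2, v2) - dot(u1, v2) * dot(u2, v1)

u1, u2 = [1.0, 0.0, 2.0], [0.0, 3.0, 1.0]
v1, v2 = [2.0, 1.0, 0.0], [1.0, 1.0, 1.0]

# the induced norm squared recovers ||u1||^2 ||u2||^2 - <u1, u2>^2
area_sq = wedge_ip(u1, u2, u1, u2)
print(abs(area_sq - (dot(u1, u1) * dot(u2, u2) - dot(u1, u2) ** 2)) < 1e-12)  # True

# a parallel shift u1 -> u1 + alpha * u2 leaves the inner product unchanged
alpha = 2.5
shifted = [u1[i] + alpha * u2[i] for i in range(3)]
print(abs(wedge_ip(shifted, u2, v1, v2) - wedge_ip(u1, u2, v1, v2)) < 1e-12)  # True
```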

The preceding discussion generalizes to the construction of higher exterior powers.  For example, there is a space $\Lambda^3(V)$ whose elements are linear combinations of 3-parallelograms, and these 3-parallelograms are again equivalent if one can be constructed from the other by a sequence of parallel shifts of the edges.  That is, $u \wedge v \wedge w = u\wedge v \wedge (w + \alpha u + \beta v)$, so that one of the edges can be shifted in any direction in the span of the other two and the resulting 3-parallelogram is the same element of $\Lambda^3(V)$ — once again, all equivalent parallelograms can be achieved by a finite number of such shifts.  Likewise, there are higher exterior powers $\Lambda^k(V)$, and the ideas are the same.  Their usefulness arises in great part from the fact that a k-parallelogram $u_1 \wedge \ldots \wedge u_k$ is $0 \in \Lambda^k(V)$ exactly when the vectors $\{u_1, \ldots, u_k \}$ are linearly dependent.
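This linear-dependence criterion can be phrased through the same induced inner product: the squared “volume” of $u_1 \wedge \ldots \wedge u_k$ is the Gram determinant $\det(\langle u_i, u_j \rangle)$, which vanishes exactly when the $u_i$ are dependent. A quick check for $k = 3$ (my own example vectors):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def det3(m):
    # determinant of a 3x3 matrix by cofactor expansion
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def gram_det(vs):
    # squared volume of the 3-parallelogram with the vs as edges
    return det3([[dot(a, b) for b in vs] for a in vs])

indep = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 2.0]]
dep = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 4.0, 2.0]]  # third = first + 2 * second

print(gram_det(indep) > 0, gram_det(dep) == 0)  # True True
```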

There is one especially important case: when $V$ has finite dimension $n$ with basis $\{ e_1, \ldots, e_n \}$, the space $\Lambda^n(V)$ is a one-dimensional vector space.  Indeed, any nontrivial n-parallelogram $v_1 \wedge \ldots \wedge v_n$ can be shifted into some scalar multiple of a standard n-parallelogram $e_1 \wedge \ldots \wedge e_n$.  In a different notation, this amounts to proving that any invertible matrix can be “row-reduced” to a diagonal matrix after finitely many column operations — indeed, the processes of column operations and row operations for a matrix can be visualized in terms of these parallel shifts.  We say the two (ordered) bases $e_1 , \ldots, e_n$ and $v_1 , \ldots , v_n$ carry different “orientations” if this scalar is negative (in three dimensions, you can then tell the difference between “right-handed” and “left-handed” bases).
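In coordinates, this scalar is a determinant: writing the $v_i$ as the columns of a matrix $M$ in the basis $e_1, \ldots, e_n$, one has $v_1 \wedge \ldots \wedge v_n = \det(M)\, e_1 \wedge \ldots \wedge e_n$, and the sign of $\det(M)$ records the orientation. A minimal sketch for $n = 3$:

```python
def det3(m):
    # determinant of a 3x3 matrix by cofactor expansion
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# the standard basis of R^3, and the same basis with e1 and e2 swapped
same = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
flip = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]

# swapping two basis vectors flips the sign, i.e. reverses orientation
print(det3(same), det3(flip))  # 1 -1
```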

It is clear that any linear map $T : V \to V$ (maybe a rotation, a rescaling, or a projection onto a lower dimensional subspace) induces a map $v_1 \wedge \ldots \wedge v_k \to T v_1 \wedge \ldots \wedge T v_k$ on k-parallelograms.  Since $T$ preserves the addition and scalar multiplication of $V$, k-parallelograms which are equivalent by parallel shift remain equivalent after mapping by $T$.  Therefore, there is a well-defined, induced linear map $\Lambda^k(V) \to \Lambda^k(V)$ (just extend by linearity).  (The extension of the inner product to $\Lambda^k(V)$ is accomplished similarly, with bilinearity playing the key role in place of the linearity of $T$, but it is a bit more complicated.)

When $V$ has dimension $n$, the induced linear map on the one dimensional vector space $\Lambda^n(V)$ must be multiplication by a scalar (called the determinant of $T$), which will be zero exactly when the image of $T$ is contained in a lower dimensional subspace of $V$.  Often $\Lambda^n(V)$ carries a notion of signed volume: for instance, one could choose $\mbox{Vol}( e_1 \wedge \ldots \wedge e_n) = 1$ for some orthonormal basis $e_1, \ldots, e_n$, and this choice, which entirely determines the volume form $\mbox{Vol} : \Lambda^n(V) \to {\mathbb{R}}$, depends only on the orientation of the basis.  In this situation, the determinant is the factor by which $T$ multiplies signed volume.
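To see the determinant acting as a volume-scaling factor, one can compare the signed volume of a 3-parallelogram before and after applying a linear map $T$ (a sketch; the matrix and edge vectors are arbitrary choices of mine):

```python
def det3(m):
    # determinant of a 3x3 matrix by cofactor expansion
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def cols(vs):
    # the matrix whose columns are the given vectors
    return [[v[i] for v in vs] for i in range(3)]

T = [[2.0, 1.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 3.0]]
vs = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0], [1.0, 1.0, 0.0]]

# signed volume of v1 ^ v2 ^ v3, and of T v1 ^ T v2 ^ T v3
vol_before = det3(cols(vs))
vol_after = det3(cols([matvec(T, v) for v in vs]))

# the volume is multiplied by exactly det(T)
print(abs(vol_after - det3(T) * vol_before) < 1e-9)  # True
```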

An important application of the above theory arises when one studies integration on lower-dimensional subsets of Euclidean space (or Riemannian manifolds) — for example, integration on a surface $S$ in three-dimensional space parameterized by a smooth map $\Phi : (0, 1) \times (0, 1) \to S$.  Here all the above theory is applied at a local level.  One wishes to integrate some function $f$ on $S$ by using the function $\Phi(s,t)$ which parameterizes the surface $S$, or perhaps one wishes to measure the size of a region on the surface.  The correct formula for the surface integral is certainly not $\int_S f \, dS = \int_0^1 \int_0^1 f( \Phi(s,t) ) \, ds \, dt$; for example, when the function $f$ is identically $1$ this formula would say that the area of the surface $S$ is exactly 1, but not every surface in space has unit area.  The problem with the formula is with the volume form, which we write $ds \wedge dt$ to suggest an infinitesimally small 2-parallelogram.  The map $\Phi$ does not preserve volume: it will send a very small rectangle $[s_0, s_0 + \Delta s ] \times [t_0, t_0 + \Delta t]$ to another, possibly funny-shaped region (resembling some kind of parallelogram at small scales), and this region may be only half or perhaps even 400 times as large.  Thus the form $ds \wedge dt$ we were using for integration in the incorrect formula must be replaced by a volume form with a density factor $\rho(s,t) \, ds \wedge dt$ to account for how the map $\Phi$ may shrink or increase volumes near a point.

Our theory in the preceding comes into play to compute this density factor $\rho(s,t)$ at any point $(s_0, t_0) \in (0, 1) \times (0, 1)$ in terms of the behavior of $\Phi$ infinitesimally close to $(s_0, t_0)$.  Because $\Phi$ sends points near $(s_0, t_0)$ in the square to points near $\Phi(s_0, t_0)$ in the surface, $\Phi$ also sends trajectories passing through the point $(s_0, t_0)$ to trajectories on the surface $S$ which pass through the point $\Phi(s_0, t_0)$.  In particular, if we consider a path in the square $x(\tau)$ which reaches $(s_0, t_0) = x(0)$ at time $\tau = 0$ with velocity $v \in \mathbb{R}^2$, then the trajectory $\Phi(x(\tau))$ reaches the point $\Phi(s_0, t_0)$ at time $\tau = 0$ with a velocity $\Phi'(s_0, t_0) v \in \mathbb{R}^3$, where $\Phi'(s_0, t_0)$ is a linear map (or matrix) called the “derivative” of $\Phi$ at the point $(s_0, t_0)$.  Now we are in almost exactly our earlier situation regarding this map $T = \Phi'(s_0, t_0)$, which is a linear map sending velocities $v$ realized at the point $(s_0, t_0)$ by trajectories in the square to the space of velocities realized at the point $\Phi(s_0, t_0)$ by trajectories in the surface $S$.  The latter space of velocities in the surface $S$ is called “the tangent space to $S$ at the point $\Phi(s_0, t_0)$” and is a two-dimensional vector space because $S$ is a surface.  Similarly, the domain of $\Phi'(s_0, t_0)$ (namely, the space of velocities realized at the point $(s_0, t_0)$ by trajectories in the square) is also two-dimensional.  In both these vector spaces, there is a notion of “length” which allows us to measure the “speed” of a trajectory, and there is also a notion of “angle” or “perpendicularity” between two trajectories.
Our ability to measure speeds of parametrized curves allows us to measure lengths of curves by integration, and similarly our natural notion of area for a parallelogram $u \wedge v$ given by two such velocities allows us to compute areas of parametrized surfaces by integration.  In general, the linear map $\Phi'(s_0, t_0)$ will not preserve length or angles between these two spaces of velocity vectors, and hence will not preserve areas of parallelograms.  The factor by which $\Phi'(s_0, t_0)$ multiplies areas of parallelograms is exactly the density factor $\rho(s_0,t_0)$, and can be computed explicitly in examples.
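As a concrete sketch: for the flat graph $\Phi(s,t) = (s, t, 2s + t)$ (a made-up example), the density $\rho = \sqrt{\|a\|^2 \|b\|^2 - \langle a, b \rangle^2}$ computed from the partial derivatives $a = \partial_s \Phi$ and $b = \partial_t \Phi$ is constant, and integrating it over the unit square gives the patch's true area $\sqrt{6}$ rather than the naive answer $1$:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rho(a, b):
    # area of the parallelogram a ^ b spanned by the partial derivatives
    return math.sqrt(dot(a, a) * dot(b, b) - dot(a, b) ** 2)

# partial derivatives of Phi(s, t) = (s, t, 2s + t), constant in this flat case
a = [1.0, 0.0, 2.0]  # d Phi / ds
b = [0.0, 1.0, 1.0]  # d Phi / dt

# Riemann sum of rho over the unit square
n = 50
area = sum(rho(a, b) * (1 / n) ** 2 for _ in range(n * n))
print(abs(area - math.sqrt(6)) < 1e-9)  # True
```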

I should include some computational examples to help with all this….

In the meantime, check out this post of Terence Tao on how this same induced inner product can be used to construct interesting processes by which to generate random $n$-element subsets of a finite set $\{1, 2, \ldots, N \}$.