On the geometric meaning of the Cauchy Schwarz inequality, an intro to exterior powers, and surface integrals
August 19, 2009
Posted by Phil Isett in Uncategorized. Tags: Cauchy Schwarz, Cauchy Schwarz inequality, Determinants, Exterior Algebra, Exterior power, inner product, Surface integrals, Surface integration
This is a brief remark on the Cauchy-Schwarz inequality and one way of understanding its geometric meaning (at least in the context of a real inner product space). In a real inner product space $V$, the inner product $\langle \cdot, \cdot \rangle$ allows for a generalization of intuitive geometric notions of “length”, “angle”, and “perpendicular” for vectors in $V$. For two elements $u, v \in V$, I will write $u \wedge v$ to indicate the parallelogram formed by taking $u$ and $v$ as edges. The reader may as well simply imagine $V = \mathbb{R}^n$ for his favorite $n$ and that the inner product is the usual dot product.
The Cauchy-Schwarz inequality says that the area of a parallelogram is positive unless $u$ and $v$ are co-linear (it is also equivalent to the triangle inequality, but I will be talking about this formulation instead). If $u$ and $v$ are co-linear (they point in the same direction or perhaps opposite directions; maybe $v = cu$ for some scalar $c$), the parallelogram one forms with these two vectors is degenerate and has zero area; otherwise, you wind up with a parallelogram which has positive area. In fact, the area of $u \wedge v$ is $|u|\,|v|$ when the two vectors $u$ and $v$ are perpendicular. If $u$ and $v$ fail to be perpendicular, then we can observe that shifting the edge $u$ by any amount in the direction of the other edge $v$ does not change the area of the parallelogram; thus the area of $u \wedge v$ is the same as the area of $(u - tv) \wedge v$ for any $t \in \mathbb{R}$. By choosing $t$ to minimize the length of the first edge (pick $t = \langle u, v \rangle / |v|^2$, assuming $v \neq 0$), we can make both edges perpendicular. The area $|u - tv|\,|v|$ of the resulting parallelogram must be non-negative (and must be positive when $u$ and $v$ are linearly independent); since for this choice of $t$ we have $|u - tv|^2 |v|^2 = |u|^2 |v|^2 - \langle u, v \rangle^2$, this gives the inequality $\langle u, v \rangle^2 \leq |u|^2 |v|^2$. (This is actually a self-contained proof of the Cauchy-Schwarz inequality, and it’s the usual proof, but with some motivation regarding how to choose $t$.)
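The shift-to-perpendicular argument can be checked numerically. The following is a minimal Python sketch, assuming $V = \mathbb{R}^n$ with the usual dot product; the helper names `dot`, `shift_perpendicular` and `squared_area` are hypothetical, introduced only for illustration. It shifts the edge $u$ by $t = \langle u, v \rangle / |v|^2$ times $v$, confirms that the shifted edge is perpendicular to $v$, and compares the squared area $|u - tv|^2 |v|^2$ with $|u|^2 |v|^2 - \langle u, v \rangle^2$.

```python
# Assumption for this sketch: V = R^n with the ordinary dot product.

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def shift_perpendicular(u, v):
    """Return (t, u - t v) with t = <u, v> / |v|^2, so that u - t v is perpendicular to v.

    Requires v != 0.
    """
    t = dot(u, v) / dot(v, v)
    return t, [a - t * b for a, b in zip(u, v)]


def squared_area(u, v):
    """Squared area of the parallelogram u ^ v, computed as |u - t v|^2 |v|^2."""
    _, w = shift_perpendicular(u, v)
    return dot(w, w) * dot(v, v)


u, v = [3.0, 1.0, 2.0], [1.0, 2.0, 0.0]
t, w = shift_perpendicular(u, v)
print(t, dot(w, v))                            # 1.0 0.0  (w is perpendicular to v)
print(squared_area(u, v))                      # 45.0
print(dot(u, u) * dot(v, v) - dot(u, v) ** 2)  # 45.0, the same value
```

Since the squared area is a square times a square, it can never be negative, which is exactly the inequality $\langle u, v \rangle^2 \leq |u|^2 |v|^2$.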
Of course, the Cauchy-Schwarz inequality is also equivalent to the triangle inequality, and the relationship between these two geometric interpretations can be seen by inspecting the parallelogram one forms by drawing $u$, $v$ and $u + v$.
I hope to write on how formalizing this area-of-parallelogram concept works (only on an intuitive level, for now). For those already familiar with exterior powers of vector spaces and exterior algebras (and their geometric meaning in terms of parallelograms), I can cut to the chase and say that there’s an induced inner product on the exterior powers and the Cauchy-Schwarz inequality is what you get by writing $|u \wedge v|^2 = |u|^2 |v|^2 - \langle u, v \rangle^2 \geq 0$.
I will also discuss the relation to surface integrals and determinants.
It shouldn’t be so surprising that the same inner product which allows us to generalize the familiar notions of length, angle, and perpendicularity also allows us to measure areas and volumes of parallelograms with edges formed from elements of $V$. In fact, take your favorite two (non-degenerate) oriented parallelograms $u_1 \wedge v_1$ and $u_2 \wedge v_2$ in three-dimensional space. Imagine them with common vertex at the origin. You should be able to convince yourself that there is a definite, well-defined notion of angle between these two parallelograms, and that this angle depends only on the planes which contain the parallelograms. So oriented parallelograms like these seem to already have a definite notion of “area”, “angle”, “projection” and “perpendicular” in much the same way as vectors themselves have notions of “length”, “angle” and “perpendicular”. So one may ask: is there also an inner product on the space of 2-parallelograms?
Yes! Sort of. The space of parallelograms in $V$ with common vertex at the origin is not exactly a vector space (although it is a perfectly interesting topological space); basically, there isn’t a clear, natural way to add parallelograms. But there is a vector space associated to $V$ which fills this role; it is called “the second exterior power of $V$” and is more briefly denoted $\Lambda^2 V$. Typical elements of $\Lambda^2 V$ may look like parallelograms $u \wedge v$ for $u, v \in V$, or linear combinations of such things like $a\, u_1 \wedge v_1 + b\, u_2 \wedge v_2$ with $a, b \in \mathbb{R}$, but there is a catch: there are infinitely many different ways to write any particular element.
There is good reason for this ambiguity, without which we would not be able to successfully define a sensible notion of “addition” or “scalar multiplication”. Note, however, that we are all accustomed to mathematically constructed sets in which every element admits infinitely many different representations: the numbers “1/2, 3/6, 48/96, etc.” all represent the same rational number, which we call “one-half”. And it is important that we can manipulate these different representations when we actually work with the number “one-half”: e.g. “one half plus one third” = 3/6 + 2/6 = “five sixths”. The multitude of ways to represent elements of $\Lambda^2 V$ is just as natural.
Take, for example, the zero element $0 \in \Lambda^2 V$. One way of forming the zero element is by taking the parallelogram $0 \wedge 0$ whose two edges are both the zero vector in $V$, but one could start with an arbitrary vector $u \in V$, take another in the same direction (say $cu$ for some scalar $c$) and form the degenerate parallelogram $u \wedge cu$. Doing so gives another way to represent zero; all degenerate parallelograms like $u \wedge cu$ are zero in $\Lambda^2 V$, and this fact is critical.
With the above consideration, we now have a hope of defining a nice addition law. One geometrically intuitive requirement for addition in $\Lambda^2 V$ is that for all pairs of parallelograms sharing a common second edge $w$, we have $u \wedge w + v \wedge w = (u + v) \wedge w$, and similarly for when a common first edge is shared (there is a simple picture to imagine here involving concatenation of parallelograms). This requirement certainly does not prohibit addition from being commutative and associative, since these facts were already true in $V$. Probably the best evidence that this is the right thing to do comes from the special case where $V$ is a plane and these parallelograms all have a certain signed area $A(u \wedge w)$; this choice of addition allows for the property $A(u \wedge w) + A(v \wedge w) = A((u + v) \wedge w)$. But before discussing signed volumes and areas and their connection to exterior powers, let us elaborate the consequences of this choice of addition.
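To see this additivity of signed area concretely, here is a small Python sketch for $V = \mathbb{R}^2$. It assumes (this is not spelled out above) that the signed area $A(u \wedge w)$ is the $2 \times 2$ determinant $u_1 w_2 - u_2 w_1$ of the edge coordinates; the function names are hypothetical.

```python
def signed_area(u, w):
    """Signed area A(u ^ w) of the parallelogram with edges u, w in R^2.

    Assumption for this sketch: A is the 2x2 determinant u_1 w_2 - u_2 w_1.
    """
    return u[0] * w[1] - u[1] * w[0]


def add(u, v):
    """Vector addition in R^2."""
    return (u[0] + v[0], u[1] + v[1])


u, v, w = (1, 2), (4, -1), (3, 5)
# A(u ^ w) + A(v ^ w) should equal A((u + v) ^ w).
print(signed_area(u, w) + signed_area(v, w), signed_area(add(u, v), w))  # 22 22
# Swapping the two edges flips the sign of the signed area.
print(signed_area(u, w), signed_area(w, u))  # -1 1
```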
We are making a vector space out of parallelograms, and a real vector space requires a notion of scalar multiplication, which has essentially been determined already. For example, we have required implicitly that $u \wedge v + u \wedge v$, $(2u) \wedge v$ and $u \wedge (2v)$ are all different ways of writing $2(u \wedge v)$, and the same equality must hold when $2$ has been replaced by an arbitrary rational number. Therefore we have essentially no choice but to require that, for any real number $c$, $(cu) \wedge v$, $c(u \wedge v)$ and $u \wedge (cv)$ all represent the same element of $\Lambda^2 V$, even though $(cu) \wedge v$ and $u \wedge (cv)$ are different parallelograms. But when we combine the desired property for our addition with our requirement that all degenerate parallelograms be zero, we see some more funny algebraic consequences (but bear with me and these will also become more intuitive). For one thing, unless everything is completely trivial, the order of edges matters when you represent a parallelogram: expanding the identity $(u + v) \wedge (u + v) = 0$ with our addition law gives $u \wedge u + u \wedge v + v \wedge u + v \wedge v = u \wedge v + v \wedge u = 0$, that is, $u \wedge v = -\, v \wedge u$. This requires us to think of these parallelograms as oriented in a way, if we hope to maintain an intuitive grasp of this $\Lambda^2 V$ we are constructing.
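One concrete way to experiment with these rules is to model $\Lambda^2 \mathbb{R}^n$ by coordinates. The sketch below assumes the standard identification (not derived in this post) in which $u \wedge v$ is recorded by its coefficients $u_i v_j - u_j v_i$ on the basis elements $e_i \wedge e_j$ with $i < j$; two parallelograms then represent the same element exactly when their coordinate tuples agree. The names `wedge` and `scale` are hypothetical. The checks cover degenerate parallelograms being zero, $u \wedge v = -\, v \wedge u$, and $(cu) \wedge v = u \wedge (cv)$.

```python
from itertools import combinations


def wedge(u, v):
    """Coordinates of u ^ v in Lambda^2 R^n on the basis e_i ^ e_j, i < j.

    Assumed model: the coefficient of e_i ^ e_j is u_i v_j - u_j v_i.
    """
    return tuple(u[i] * v[j] - u[j] * v[i] for i, j in combinations(range(len(u)), 2))


def scale(c, x):
    """Scalar multiple c x of a vector x."""
    return [c * a for a in x]


u, v = [1, 2, 0], [0, 1, 3]
print(wedge(u, scale(2, u)))                           # (0, 0, 0): degenerate means zero
print(wedge(u, v), wedge(v, u))                        # (1, 3, 6) (-1, -3, -6)
print(wedge(scale(3, u), v) == wedge(u, scale(3, v)))  # True
```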
More generally, if we take a parallelogram $u \wedge v$ and shift one edge in a direction parallel to the other, $(u + tv) \wedge v = u \wedge v + t\, v \wedge v = u \wedge v$, we end up with an equivalent parallelogram in $\Lambda^2 V$. However, provided $u$ and $v$ are linearly independent, we will not be able to construct a degenerate parallelogram in this manner. But in fact, one can use such “parallel shifts” (either shifting the first or second edge) to construct all equivalent parallelograms in $\Lambda^2 V$.
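Using a coordinate model of $\Lambda^2 \mathbb{R}^n$ (an assumption of this sketch: $u \wedge v$ is recorded by the coefficients $u_i v_j - u_j v_i$ for $i < j$; helper names are hypothetical), one can check that parallel shifts of either edge leave the element unchanged, while a genuine rescaling of an edge does not. Exact rational arithmetic via `fractions.Fraction` avoids floating-point noise.

```python
from fractions import Fraction
from itertools import combinations


def wedge(u, v):
    """Coordinates u_i v_j - u_j v_i of u ^ v on the basis e_i ^ e_j, i < j (assumed model)."""
    return tuple(u[i] * v[j] - u[j] * v[i] for i, j in combinations(range(len(u)), 2))


def shift(u, v, t):
    """Parallel shift of the edge u in the direction of v: returns u + t v."""
    return [a + t * b for a, b in zip(u, v)]


u, v = [1, 2, 0], [0, 1, 3]
for t in (Fraction(-2), Fraction(1, 3), Fraction(5)):
    assert wedge(shift(u, v, t), v) == wedge(u, v)  # shift the first edge along v
    assert wedge(u, shift(v, u, t)) == wedge(u, v)  # shift the second edge along u

# Doubling one edge is not a parallel shift: it doubles the element instead.
assert wedge(u, [2 * b for b in v]) != wedge(u, v)
print("all parallel shifts preserved u ^ v")
```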
Earlier on we noticed that the angle between two (oriented) parallelograms $u_1 \wedge v_1$ and $u_2 \wedge v_2$ depended only on the planes spanned by the two parallelograms, and is therefore in particular independent of parallel shifts in the edges. I also pointed out at the beginning of the entry that the area of one parallelogram is also independent of such parallel shifts (a lot of valuable intuition is in this exercise). These geometric observations are what lie behind the following fact: there is an induced inner product on this vector space $\Lambda^2 V$ which was made from parallelograms. In fact, the norm it induces (coming from the inner product on $V$) is exactly the “area” when restricted to parallelograms, and one can use the usual formula for the cosine of an angle between two vectors in order to compute the angle between two oriented planes. As long as one normalizes properly, we have the familiar property that “area = base × height”, so that $|u \wedge v| = |u|\,|v|$ when $u$ and $v$ are perpendicular. From the special case of perpendicular edges and the bilinearity of the induced inner product, one deduces the more general area formula $|u \wedge v|^2 = |u|^2 |v|^2 - \langle u, v \rangle^2$, which is positive unless $u$ and $v$ are linearly dependent. (The calculation is essentially the following: we “parallel shift” $u$ by an appropriate amount in the direction of $v$ until the two are perpendicular.) Thus we recover the Cauchy-Schwarz inequality: the area of a non-degenerate parallelogram is positive. In fact, we see that the inner product between arbitrary parallelograms can be computed by the formula $\langle u_1 \wedge v_1, u_2 \wedge v_2 \rangle = \langle u_1, u_2 \rangle \langle v_1, v_2 \rangle - \langle u_1, v_2 \rangle \langle v_1, u_2 \rangle$ (which one notices remains constant under parallel shifts like $u_1 \mapsto u_1 + t v_1$), because, by the polarization identity, an inner product is completely determined by the norm it induces.
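The formula for the induced inner product can be compared against a coordinate model of $\Lambda^2 \mathbb{R}^n$. This Python sketch assumes that $u \wedge v$ has coordinates $u_i v_j - u_j v_i$ on the basis elements $e_i \wedge e_j$ ($i < j$) and that these basis elements are orthonormal, which is consistent with the normalization $|u \wedge v| = |u|\,|v|$ for perpendicular $u, v$. Under that assumption, $\langle u_1, u_2 \rangle \langle v_1, v_2 \rangle - \langle u_1, v_2 \rangle \langle v_1, u_2 \rangle$ agrees with the ordinary dot product of coordinates (a special case of the Cauchy-Binet identity). Function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def dot(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))


def wedge(u, v):
    """Coordinates of u ^ v on the basis e_i ^ e_j, i < j (assumed orthonormal)."""
    return tuple(u[i] * v[j] - u[j] * v[i] for i, j in combinations(range(len(u)), 2))


def wedge_inner(u1, v1, u2, v2):
    """<u1 ^ v1, u2 ^ v2> = <u1, u2><v1, v2> - <u1, v2><v1, u2>."""
    return dot(u1, u2) * dot(v1, v2) - dot(u1, v2) * dot(v1, u2)


u1, v1 = [1, 2, 0], [0, 1, 3]
u2, v2 = [2, -1, 1], [1, 1, 1]
print(wedge_inner(u1, v1, u2, v2), dot(wedge(u1, v1), wedge(u2, v2)))  # -6 -6
print(wedge_inner(u1, v1, u1, v1))  # 46 = |u1|^2 |v1|^2 - <u1, v1>^2 = 5 * 10 - 2**2

# The value is unchanged by a parallel shift u1 -> u1 + t v1.
t = Fraction(3, 4)
shifted = [a + t * b for a, b in zip(u1, v1)]
assert wedge_inner(shifted, v1, u2, v2) == wedge_inner(u1, v1, u2, v2)
```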
The preceding discussion generalizes to the construction of higher exterior powers. For example, there is a space $\Lambda^3 V$ whose elements are linear combinations of 3-parallelograms $u \wedge v \wedge w$, and these 3-parallelograms are also equivalent if one can be constructed from the other by a sequence of parallel shifts of edges, i.e. $(u + sv + tw) \wedge v \wedge w = u \wedge v \wedge w$ for all $s, t \in \mathbb{R}$, so that one of the edges can be shifted in any direction in the span of the other two and the resulting 3-parallelogram is the same element of $\Lambda^3 V$. Once again, all equivalent 3-parallelograms can be achieved by a finite number of such shifts. Likewise, there are higher exterior powers $\Lambda^k V$, and the ideas are the same. Their usefulness arises in great part from the fact that a $k$-parallelogram $v_1 \wedge \cdots \wedge v_k$ is $0$ exactly when the vectors $v_1, \ldots, v_k$ are linearly dependent.
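A convenient test for whether a $k$-parallelogram vanishes uses its squared volume $|v_1 \wedge \cdots \wedge v_k|^2$, which, generalizing the area formula $|u \wedge v|^2 = |u|^2 |v|^2 - \langle u, v \rangle^2$, equals the Gram determinant $\det(\langle v_i, v_j \rangle)$ (a standard fact assumed here rather than derived). This Python sketch computes it exactly over the rationals with Gaussian elimination; the helper names are hypothetical.

```python
from fractions import Fraction


def dot(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))


def det(m):
    """Determinant by Gaussian elimination over the rationals (exact)."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    sign, result = 1, Fraction(1)
    for k in range(n):
        pivot = next((r for r in range(k, n) if a[r][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]  # a row swap flips the sign
            sign = -sign
        result *= a[k][k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
    return sign * result


def squared_volume(*vectors):
    """|v_1 ^ ... ^ v_k|^2 as the Gram determinant det(<v_i, v_j>)."""
    return det([[dot(v, w) for w in vectors] for v in vectors])


a, b = [1, 0, 2, 1], [0, 1, 1, 0]
print(squared_volume(a, b, [1, 1, 0, 0]))  # 12: independent, so positive
print(squared_volume(a, b, [2, 3, 7, 2]))  # 0: the third vector is 2a + 3b
```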
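There is one especially important case: when $V$ has finite dimension $n$ with basis $e_1, \ldots, e_n$, the space $\Lambda^n V$ is a one-dimensional vector space. Indeed, any nonzero $n$-parallelogram $v_1 \wedge \cdots \wedge v_n$ can be shifted into some scalar multiple of the standard $n$-parallelogram $e_1 \wedge \cdots \wedge e_n$. In a different notation (writing the coordinates of $v_1, \ldots, v_n$ as the columns of a matrix), this amounts to proving that any invertible matrix can be reduced to a diagonal matrix by finitely many column operations of the form “add a multiple of one column to another”; a diagonal matrix with entries $d_1, \ldots, d_n$ corresponds to $d_1 e_1 \wedge \cdots \wedge d_n e_n = (d_1 \cdots d_n)\, e_1 \wedge \cdots \wedge e_n$. Indeed, the processes of column operations and row operations for a matrix can be visualized in terms of these parallel shifts. We say the two (ordered) bases $(v_1, \ldots, v_n)$ and $(e_1, \ldots, e_n)$ carry different “orientations” if this scalar is negative (in three dimensions, you can then tell the difference between “right-handed” and “left-handed” bases).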
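The reduction by parallel shifts can be carried out mechanically. The following Python sketch (an illustration with hypothetical names, not an optimized algorithm) stores the coordinates of $v_1, \ldots, v_n$ as the columns of a matrix and uses only the shift “add a multiple of one column to another” to reach a diagonal matrix. The product of the diagonal entries is then the scalar $c$ in $v_1 \wedge \cdots \wedge v_n = c\, e_1 \wedge \cdots \wedge e_n$, and its sign tells whether the two bases have the same orientation. Column swaps are never used.

```python
from fractions import Fraction


def add_column_multiple(a, src, dst, c):
    """Parallel shift: column dst += c * column src (the wedge of the columns is unchanged)."""
    for row in a:
        row[dst] += c * row[src]


def diagonalize_by_shifts(m):
    """Reduce an invertible square matrix to diagonal form using only parallel shifts.

    Returns the diagonal entries d_1, ..., d_n.
    """
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    # Phase 1: clear each row to the right of the diagonal (lower triangular result).
    for k in range(n):
        if a[k][k] == 0:
            j = next((j for j in range(k + 1, n) if a[k][j] != 0), None)
            if j is None:
                raise ValueError("columns are linearly dependent")
            add_column_multiple(a, j, k, 1)  # make the pivot nonzero without swapping
        for j in range(k + 1, n):
            add_column_multiple(a, k, j, -a[k][j] / a[k][k])
    # Phase 2: clear below the diagonal, starting from the last column.
    for i in range(n - 1, -1, -1):
        for k in range(i):
            add_column_multiple(a, i, k, -a[i][k] / a[i][i])
    return [a[i][i] for i in range(n)]


def scalar_factor(m):
    """The scalar c with (columns of m wedged together) = c * e_1 ^ ... ^ e_n."""
    c = Fraction(1)
    for d in diagonalize_by_shifts(m):
        c *= d
    return c


print(scalar_factor([[0, 1], [1, 0]]))  # -1: swapping e_1 and e_2 reverses orientation
print(scalar_factor([[2, 1], [1, 3]]))  # 5
```

The scalar produced this way is the determinant of the matrix, which is the subject of the next paragraphs.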
It is clear that any linear map $T: V \to V$ (maybe a rotation, a rescaling, or a projection onto a lower-dimensional subspace) induces a map $v_1 \wedge \cdots \wedge v_k \mapsto Tv_1 \wedge \cdots \wedge Tv_k$ on $k$-parallelograms. Since $T$ preserves the addition and scalar multiplication of $V$, $k$-parallelograms which are equivalent by parallel shift remain equivalent after mapping by $T$. Therefore, there is a well-defined, induced linear map $\Lambda^k T: \Lambda^k V \to \Lambda^k V$ (just extend by linearity). (The extension of the inner product to $\Lambda^k V$ is accomplished similarly, with bilinearity playing the key role that the linearity of $T$ plays here, but it is a bit more complicated.)
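Well-definedness can be spot-checked in coordinates. This Python sketch assumes the coordinate model of $\Lambda^2 \mathbb{R}^3$ in which $u \wedge v$ has coefficients $u_i v_j - u_j v_i$ for $i < j$; the matrix `T` is an arbitrary example and all function names are hypothetical. Three different representations of the same element are all sent by $\Lambda^2 T$ to the same element.

```python
from fractions import Fraction
from itertools import combinations


def wedge(u, v):
    """Coordinates of u ^ v on the basis e_i ^ e_j, i < j (assumed model of Lambda^2 R^n)."""
    return tuple(u[i] * v[j] - u[j] * v[i] for i, j in combinations(range(len(u)), 2))


def apply(T, x):
    """Matrix-vector product T x, with T given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in T]


def induced(T, u, v):
    """(Lambda^2 T)(u ^ v) = T u ^ T v."""
    return wedge(apply(T, u), apply(T, v))


T = [[1, 2, 0], [0, 1, 1], [3, 0, 1]]  # an arbitrary example map R^3 -> R^3
u, v = [1, 2, 0], [0, 1, 3]
t = Fraction(2, 7)

# Three different representations of the same element of Lambda^2 R^3.
reps = [
    (u, v),
    ([a + t * b for a, b in zip(u, v)], v),             # parallel shift of the first edge
    ([2 * a for a in u], [Fraction(b, 2) for b in v]),  # (2u) ^ (v / 2)
]
assert all(wedge(x, y) == wedge(u, v) for x, y in reps)
# Lambda^2 T sends all of them to the same element.
assert all(induced(T, x, y) == induced(T, u, v) for x, y in reps)
print(induced(T, u, v))  # (16, 9, -6)
```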
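When $V$ has dimension $n$, the induced linear map $\Lambda^n T$ on the one-dimensional vector space $\Lambda^n V$ must be multiplication by a scalar (called the determinant of $T$, written $\det T$), which will be zero exactly when the image of $T$ is contained in a lower-dimensional subspace of $V$. Often $\Lambda^n V$ carries a notion of signed volume: for instance, one could choose $\mathrm{vol}(e_1 \wedge \cdots \wedge e_n) = 1$ for some orthonormal basis $e_1, \ldots, e_n$, and this choice, which entirely determines the volume form $\mathrm{vol}: \Lambda^n V \to \mathbb{R}$, depends only on the orientation of the basis. In this situation, the determinant is the factor by which $T$ multiplies signed volume: $\mathrm{vol}(Tv_1 \wedge \cdots \wedge Tv_n) = \det T \cdot \mathrm{vol}(v_1 \wedge \cdots \wedge v_n)$.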
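In $\mathbb{R}^3$ with the normalization $\mathrm{vol}(e_1 \wedge e_2 \wedge e_3) = 1$ for the standard basis, the signed volume of $u \wedge v \wedge w$ is the scalar triple product. This Python sketch (example matrices and names are hypothetical) computes $\det T$ as $\mathrm{vol}(Te_1 \wedge Te_2 \wedge Te_3)$ and checks that $T$ multiplies the signed volume of another 3-parallelogram by that same factor; a projection onto a plane gets determinant $0$.

```python
def vol(u, v, w):
    """Signed volume of u ^ v ^ w with vol(e1 ^ e2 ^ e3) = 1 (the scalar triple product)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def apply(T, x):
    """Matrix-vector product T x, with T given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in T]


def det(T):
    """det T as vol(T e1 ^ T e2 ^ T e3); the columns of T are T e1, T e2, T e3."""
    columns = [[row[j] for row in T] for j in range(3)]
    return vol(*columns)


T = [[1, 2, 0], [0, 1, 1], [3, 0, 1]]
u, v, w = [1, 2, 0], [0, 1, 3], [2, -1, 1]
print(det(T), vol(u, v, w))                        # 7 16
print(vol(apply(T, u), apply(T, v), apply(T, w)))  # 112 = 7 * 16

P = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]  # projection onto the xy-plane
print(det(P))  # 0: the image lies in a lower-dimensional subspace
```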
An important application of the above theory arises when one studies integration on lower-dimensional subsets of Euclidean space (or Riemannian manifolds), for example, integration on a surface in three-dimensional space parameterized by a smooth map $\phi: [0,1]^2 \to \mathbb{R}^3$. Here all the above theory is applied at a local level. One wishes to integrate some function $f$ on the surface $S = \phi([0,1]^2)$ by using the map $\phi$ which parameterizes it, or perhaps one wishes to measure the size of a region on the surface. The correct formula for the surface integral is certainly not $\int_S f = \int_{[0,1]^2} f(\phi(x, y))\, dx\, dy$; for example, when the function $f$ is identically $1$, this formula would say that the area of the surface $S$ is exactly 1, but not every surface in space has unit area. The problem with the formula is with the volume form, which we write $dx\, dy$ to suggest an infinitesimally small 2-parallelogram. The map $\phi$ does not preserve volume: it will send a very small rectangle $[x, x + \Delta x] \times [y, y + \Delta y]$ to another, possibly funny-shaped region (resembling some kind of parallelogram at small scales), and this region may be only half as large or perhaps even 400 times as large. Thus the form $dx\, dy$ we were using for integration in the incorrect formula must be replaced by a volume form $J(x, y)\, dx\, dy$ with a density factor $J(x, y)$ to account for how the map $\phi$ may shrink or increase volumes near a point.
Our theory in the preceding comes into play to compute this density factor at any point $p$ in terms of the behavior of $\phi$ infinitesimally close to $p$. Because $\phi$ sends points near $p$ in the square to points near $\phi(p)$ in the surface, $\phi$ also sends trajectories passing through the point $p$ to trajectories on the surface $S$ which pass through the point $\phi(p)$. In particular, if we consider a path $\gamma(t)$ in the square which reaches $p$ at time $t = 0$ with velocity $\gamma'(0) = w$, then the trajectory $\phi(\gamma(t))$ reaches the point $\phi(p)$ at time $t = 0$ with velocity $D\phi_p(w)$, where $D\phi_p$ is a linear map (or matrix) called the “derivative” of $\phi$ at the point $p$. Now we are in almost exactly our earlier situation regarding this map $D\phi_p$, which is a linear map sending velocities $w$ realized at the point $p$ by trajectories in the square to the space of velocities realized at the point $\phi(p)$ by trajectories in the surface $S$. The latter space of velocities in the surface $S$ is called “the tangent space to $S$ at the point $\phi(p)$” and is a two-dimensional vector space because $S$ is a surface. Similarly, the domain of $D\phi_p$ (namely, the space of velocities realized at the point $p$ by trajectories in the square) is also two-dimensional. In both these vector spaces, there is a notion of “length” which allows us to measure the “speed” of a trajectory, and there is also a notion of “angle” or “perpendicularity” between two trajectories. Our ability to measure speeds of parametrized curves allows us to measure lengths of curves by integration, and similarly our natural notion of area for a parallelogram $w_1 \wedge w_2$ given by two such velocities allows us to compute areas of parametrized surfaces by integration. In general, the linear map $D\phi_p$ will not preserve lengths or angles between these two spaces of velocity vectors, and hence will not preserve areas of parallelograms. The factor by which $D\phi_p$ multiplies areas of parallelograms is exactly the density factor $J(p)$, and it can be computed explicitly in examples: the unit square spanned by the coordinate directions is sent to the parallelogram $\partial_x \phi(p) \wedge \partial_y \phi(p)$, so $J(p) = |\partial_x \phi(p) \wedge \partial_y \phi(p)| = \sqrt{|\partial_x \phi|^2 |\partial_y \phi|^2 - \langle \partial_x \phi, \partial_y \phi \rangle^2}$.
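Here is one such computation, sketched in Python. The example map $\phi(x, y) = (\sin \pi y \cos 2\pi x,\ \sin \pi y \sin 2\pi x,\ \cos \pi y)$, which sends the unit square onto the unit sphere, its hand-computed partial derivatives, and the midpoint rule with $n = 200$ are all choices made for this illustration, and the function names are hypothetical. The density factor is $J = |\partial_x \phi \wedge \partial_y \phi| = \sqrt{|\partial_x \phi|^2 |\partial_y \phi|^2 - \langle \partial_x \phi, \partial_y \phi \rangle^2}$, and integrating it over the square should give the sphere’s area $4\pi$.

```python
import math


def dot(x, y):
    """Euclidean inner product in R^3."""
    return sum(a * b for a, b in zip(x, y))


def density(phi_x, phi_y):
    """J = |phi_x ^ phi_y| = sqrt(|phi_x|^2 |phi_y|^2 - <phi_x, phi_y>^2)."""
    sq = dot(phi_x, phi_x) * dot(phi_y, phi_y) - dot(phi_x, phi_y) ** 2
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


def sphere_partials(x, y):
    """Hand-computed partials of the example map
    phi(x, y) = (sin(pi y) cos(2 pi x), sin(pi y) sin(2 pi x), cos(pi y))."""
    s, c = math.sin(math.pi * y), math.cos(math.pi * y)
    a, b = math.cos(2 * math.pi * x), math.sin(2 * math.pi * x)
    phi_x = (-2 * math.pi * s * b, 2 * math.pi * s * a, 0.0)
    phi_y = (math.pi * c * a, math.pi * c * b, -math.pi * s)
    return phi_x, phi_y


def surface_area(partials, n=200):
    """Midpoint-rule approximation of the integral of J(x, y) dx dy over [0, 1]^2."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            phi_x, phi_y = partials((i + 0.5) * h, (j + 0.5) * h)
            total += density(phi_x, phi_y) * h * h
    return total


print(surface_area(sphere_partials), 4 * math.pi)  # both approximately 12.566
```

The incorrect formula with $dx\, dy$ alone would report an area of 1 for the whole sphere; weighting by $J$ (here $J = 2\pi^2 \sin \pi y$) recovers approximately $4\pi$.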
I should include some computational examples to help with all this…
In the meantime, check out this post of Terence Tao on how this same induced inner product can be used to construct interesting processes by which to generate random $k$-element subsets of a finite set.
Hey Phil: just found your blog through MO. A little nit-pick: it is Cauchy-Schwarz (no T!!!), not Cauchy-Schwartz.
Sigh, I thought I had fixed all of those… I don’t know why that ‘t’ is so tempting… Thanks.
[...] to make the parallelograms into something mathematical. This topic is discussed somewhat in this previous post on the Cauchy Schwartz inequality. [...]