Take a sector of a disk of radius $R$ (it could be the whole disk, but for now it’s better to picture only a partial sector with 2 radial “edges”). Now, foliate it into circular arcs having radii $0 < r \le R$, starting each arc at the same edge of the sector. Now, straighten each of those circular arcs into a line segment perpendicular to the edge from which it begins. The circular arc of radius $r$ has length $r\theta$, where $\theta$ is the angle of the sector.

After straightening, you end up with a right triangle of base $R$ and height $R\theta$ (the height is $R\theta$ because that is the length of the outermost circular arc).

Punch-line (not obvious): the map preserves area. So you have the formula:

$$\text{Area of the sector} = \tfrac{1}{2}\cdot R\cdot R\theta = \tfrac{1}{2}R^2\theta.$$

**So why does bending the circles preserve area?**

This part requires much more detail, but is still interesting. Imagine chopping the triangle up into extremely small squares whose sides are parallel to the two perpendicular edges of the triangle. I claim that the area of each of those little squares is preserved. If you look at one of those little (“infinitesimal”) squares with (bottom) horizontal edge $h$ and (left) vertical edge $v$ inside the triangle and try to map it back to the sector, you don’t end up with a rectangle but rather with a parallelogram with an oblique angle. (At least, if your square is small enough, back in the sector it really does look like a parallelogram. At the end of the post, I’ll say how to make this heuristic argument fully rigorous.)

When viewed in the sector, the edge which used to be vertical, $v$, now points in the angular direction and has the same length (this is because vertical line segments in the triangle correspond to circular arcs in the sector). But the horizontal edge $h$, upon being mapped back to the sector, does not point in the radial direction, but instead makes an oblique angle with $v$. Fortunately, we never change the area of a parallelogram by shifting one edge in a direction parallel to the other. So after shifting $h$ in the direction of $v$, we see that only the component of $h$ perpendicular to $v$ (i.e. the “radial” component rather than the “angular” component) matters. To see how long this radial component is, look at where $h$ lives in the triangle. Its endpoints lie on two different vertical lines, at horizontal positions $r$ and $r + |h|$. Those lines, remember, used to be circles of radius $r$ and $r + |h|$ respectively, so we conclude that the radial component of $h$ has the same length $|h|$ as the original edge.

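That is essentially why the map preserves area. The situation is more complicated than for polar coordinates, where going back from a rectangle in the $(r, \varphi)$-plane to a sector takes small squares to (approximately) rectangles. In that case you don’t need a parallel shift, but area is not preserved: a square of side $\varepsilon$ at radius $r$ becomes a region of area approximately $r\varepsilon^2$.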
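A quick numerical sanity check of the area-preservation claim (a sketch of my own, not part of the argument; the function names are hypothetical). Assume the triangle is drawn so that the straightened arc of radius $x$ is the vertical segment at horizontal position $x$; undoing the straightening sends the point at height $y$ on that segment to the point at angle $y/x$ on the circle of radius $x$. A central-difference estimate of the Jacobian determinant of this map should be close to $1$ everywhere, while ordinary polar coordinates give approximately $r$.

```python
import math

def triangle_to_sector(x, y):
    """Undo the straightening (hypothetical helper): the vertical segment at
    horizontal position x > 0 is bent back into the circle of radius x, and
    height y becomes arc length y, i.e. angle y / x."""
    angle = y / x
    return (x * math.cos(angle), x * math.sin(angle))

def polar_to_plane(r, angle):
    """Ordinary polar coordinates, for comparison."""
    return (r * math.cos(angle), r * math.sin(angle))

def jacobian_det(F, u, v, h=1e-6):
    """Central-difference estimate of the Jacobian determinant of F at (u, v)."""
    du_plus, du_minus = F(u + h, v), F(u - h, v)
    dv_plus, dv_minus = F(u, v + h), F(u, v - h)
    X_u = (du_plus[0] - du_minus[0]) / (2 * h)
    Y_u = (du_plus[1] - du_minus[1]) / (2 * h)
    X_v = (dv_plus[0] - dv_minus[0]) / (2 * h)
    Y_v = (dv_plus[1] - dv_minus[1]) / (2 * h)
    return X_u * Y_v - X_v * Y_u
```

In $(r, \varphi)$ coordinates the map is $(x, y)\mapsto(x, y/x)$, whose Jacobian matrix $\begin{pmatrix}1 & 0\\ -y/x^2 & 1/x\end{pmatrix}$ is triangular: the off-diagonal entry is the shear (the parallel shift in the argument above), and the diagonal factor $1/x$ cancels the factor $r = x$ contributed by polar coordinates.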

A formalization of this intuition involves a bit of differential geometry and the notion of “exterior powers of tangent spaces” to make the parallelograms into something mathematical. This topic is discussed somewhat in this previous post on the Cauchy-Schwarz inequality.

Every function which is uniformly continuous on a bounded interval is bounded. You can prove this by bounding $|f(x) - f(y)|$: break the total change into a bunch of small changes (each controlled by uniform continuity), just as in one common proof of the Fundamental Theorem of Calculus (this technique also gets used in the proofs of Sard’s theorem, Harnack’s inequality, and some other things).
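The chaining argument can be made concrete. A minimal sketch, assuming we are given $\delta > 0$ such that $|s - t| \le \delta$ implies $|f(s) - f(t)| \le 1$ (uniform continuity with $\varepsilon = 1$); `chain` and `chained_bound` are hypothetical helper names, and $f = \sqrt{\cdot}$ on $[0, 4]$ with $\delta = 1$ is only an illustrative example (it works because $|\sqrt{s} - \sqrt{t}| \le \sqrt{|s - t|}$).

```python
import math

def chain(a, x, delta):
    """Points a = x_0, x_1, ..., x_m = x, equally spaced, with gaps <= delta."""
    m = max(1, math.ceil(abs(x - a) / delta))
    return [a + (x - a) * i / m for i in range(m + 1)]

def chained_bound(f, a, b, delta):
    """Bound sup |f| on [a, b] by telescoping along a chain.

    Assumption: |s - t| <= delta implies |f(s) - f(t)| <= 1.  Every x in
    [a, b] is joined to a by at most ceil((b - a) / delta) steps, each of
    which changes f by at most 1, so |f(x)| <= |f(a)| + ceil((b - a) / delta).
    """
    return abs(f(a)) + math.ceil((b - a) / delta)

# Illustrative example: sqrt on [0, 4] with delta = 1.
bound = chained_bound(math.sqrt, 0.0, 4.0, 1.0)
```

Here `bound` is 4, while the true supremum of $\sqrt{x}$ on $[0, 4]$ is 2: the chaining bound is crude but finite, which is all the argument needs.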

Now replace the bounded interval by a bounded subset $E \subseteq \mathbb{R}^n$. Exercise: are uniformly continuous functions on $E$ still bounded?

(It’ll be pretty cool if it catches on! But for me it probably means I will help to run the blog, and for this purpose I’ll definitely need the converter.)

I need to at least try to write some kind of math, so I’ll explain something which I think is cute: how to express the coefficients of the characteristic polynomial of a matrix in terms of sums of determinants of other matrices constructed from its entries. Actually, I’ll first give an example which contains all the ideas. Consider a $3\times 3$ matrix $A$ with entries $a_{ij}$; let’s say

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{pmatrix}.$$

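The characteristic polynomial $p(x)$ is the determinant of the matrix $xI - A$, where $I$ is the $3\times 3$ identity matrix. It is a degree 3 polynomial with leading coefficient 1. In terms of the $a_{ij}$, we have

$$p(x) = \det(xI - A) = x^3 - (a_{11} + a_{22} + a_{33})\,x^2 + \left(\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix} + \begin{vmatrix} a_{11} & a_{13}\\ a_{31} & a_{33}\end{vmatrix} + \begin{vmatrix} a_{22} & a_{23}\\ a_{32} & a_{33}\end{vmatrix}\right)x - \det A.$$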

I discussed the one-dimensional vector space $\Lambda^3\mathbb{R}^3$ and its geometric meaning in a previous post about the Cauchy-Schwarz inequality and integration. We know that $(A + xI)e_1\wedge(A + xI)e_2\wedge(A + xI)e_3 = f(x)\,e_1\wedge e_2\wedge e_3$, where $f(x) = \det(A + xI)$, and plugging $x = 0$ into the above expression, we see that $f(0) = \det A$. The point of this entry is that we can calculate the other derivatives $f^{(k)}(0)$ by differentiating, using the multilinearity of the wedge product to differentiate easily. Below the fold I will give an example of how this computation works out, state the nice, general formula proven by this method, and discuss the geometric meaning of this computation. (And at the end I will ask a question about this LaTeX2WP/Python business which is still troubling me.)

Characteristic Polynomial Coefficients

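For example, $\frac{d}{dx}\big((A + xI)e_i\big) = e_i$, so differentiating the wedge identity for $f$ with the product rule (valid because the wedge product is multilinear) gives

$$f'(x)\,e_1\wedge e_2\wedge e_3 = e_1\wedge(A + xI)e_2\wedge(A + xI)e_3 + (A + xI)e_1\wedge e_2\wedge(A + xI)e_3 + (A + xI)e_1\wedge(A + xI)e_2\wedge e_3,$$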

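and after performing some parallel shifts (using $(A + xI)e_j = \sum_i a_{ij}e_i + x\,e_j$ and discarding, in each term, the components along the lone basis vector), this identity reduces to

$$\begin{aligned} f'(x)\,e_1\wedge e_2\wedge e_3 ={}& e_1\wedge\big((a_{22}+x)e_2 + a_{32}e_3\big)\wedge\big(a_{23}e_2 + (a_{33}+x)e_3\big)\\ &+ \big((a_{11}+x)e_1 + a_{31}e_3\big)\wedge e_2\wedge\big(a_{13}e_1 + (a_{33}+x)e_3\big)\\ &+ \big((a_{11}+x)e_1 + a_{21}e_2\big)\wedge\big(a_{12}e_1 + (a_{22}+x)e_2\big)\wedge e_3, \end{aligned}$$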

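giving the expression

$$f'(x) = \begin{vmatrix} a_{22}+x & a_{23}\\ a_{32} & a_{33}+x\end{vmatrix} + \begin{vmatrix} a_{11}+x & a_{13}\\ a_{31} & a_{33}+x\end{vmatrix} + \begin{vmatrix} a_{11}+x & a_{12}\\ a_{21} & a_{22}+x\end{vmatrix},$$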

and we can plug in $x = 0$ to obtain $f'(0)$ as the sum of three subdeterminants of $A$.

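Proceeding this way in the general case, when $A: \mathbb{R}^n\to\mathbb{R}^n$ is linear and $f(x) = \det(A + xI)$, and counting shared boundary faces, one shows that

$$f^{(k)}(x) = k!\sum_{|S| = n-k}\det\big((A + xI)_S\big),$$

where $S$ ranges over subsets of $\{1,\dots,n\}$ and $M_S$ denotes the principal submatrix of $M$ with rows and columns indexed by $S$ (the determinant of the empty matrix being $1$).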

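(It is sufficient to prove this identity at $x = 0$, since $\det\big((A + x_0I) + xI\big) = f(x_0 + x)$.) The characteristic polynomial itself is, of course, $p(x) = \det(xI - A) = (-1)^n f(-x)$, so applying the above formula and Taylor expanding $f$ about $x = 0$, we obtain the explicit formula in terms of subdeterminants

$$\det(xI - A) = \sum_{k=0}^{n}(-1)^{n-k}\Big(\sum_{|S| = n-k}\det A_S\Big)x^k,$$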

which can be compared to the expression $\det(xI - A) = \prod_{i=1}^{n}(x - \lambda_i)$ in terms of the eigenvalues $\lambda_1,\dots,\lambda_n$ of $A$.
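A small exact check of the subdeterminant formula $\det(xI - A) = \sum_{k=0}^{n}(-1)^{n-k}\big(\sum_{|S|=n-k}\det A_S\big)x^k$ (a sketch, not from the original post; `det`, `principal_minor_sum` and `char_poly_coeffs` are hypothetical helper names): compute the coefficients from sums of principal minors using exact rational arithmetic, then compare with $\det(xI - A)$ computed directly at a few integer points. Agreement at $n+1$ points pins down a degree-$n$ polynomial.

```python
from fractions import Fraction
from itertools import combinations

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(entry) for entry in row] for row in M]
    n, result = len(M), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result  # a row swap flips the sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return result  # the empty matrix (n = 0) has determinant 1

def principal_minor_sum(A, m):
    """Sum of det(A_S) over all index sets S with |S| = m (empty det = 1)."""
    return sum((det([[A[i][j] for j in S] for i in S])
                for S in combinations(range(len(A)), m)), Fraction(0))

def char_poly_coeffs(A):
    """Coefficients c[k] of x**k in det(xI - A), via the subdeterminant formula."""
    n = len(A)
    return [(-1) ** (n - k) * principal_minor_sum(A, n - k) for k in range(n + 1)]
```

For the sample matrix $\begin{pmatrix}2&1&0\\1&3&1\\0&1&4\end{pmatrix}$, `char_poly_coeffs` returns the coefficients of $x^3 - 9x^2 + 24x - 18$, listed from the constant term up.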

Now, the same formula holds for matrices over arbitrary commutative rings, because it is a polynomial identity in the entries with integer coefficients. This allows us to use the result over $\mathbb{R}$ (and the universal property of the polynomial ring) to pass to the general case: basically, use indeterminates for the entries, and then plug arbitrary ring elements into them.

There is a nice geometric interpretation of the above procedure, which is most easily understood in the setting of the real numbers. We know that the $k$’th exterior power $\Lambda^k\mathbb{R}^n$ corresponds to $k$-parallelograms of vectors in $\mathbb{R}^n$ which remain equivalent under parallel shift, so it should be easy to see why the differentiation results in an expression involving parallelograms of dimension one less. For the first derivative, think of the differentiation at $x = 0$ as a limit of difference quotients, where we compare the signed volumes of two nearby $n$-parallelograms, one of which is

$$Ae_1\wedge Ae_2\wedge\cdots\wedge Ae_n,$$

whatever the linear map does to the standard basis parallelogram, and the other being the very nearby parallelogram

$$(A + xI)e_1\wedge(A + xI)e_2\wedge\cdots\wedge(A + xI)e_n,$$

with $x$ a very small number. Thinking of these parallelograms as oriented regions in $\mathbb{R}^n$, the difference in signed volume is easily seen to be an integral supported essentially on the boundary of the parallelogram $Ae_1\wedge\cdots\wedge Ae_n$. Because the perturbation is linear, the quantity integrated on each face depends, in the limit as $x$ goes to $0$, only on the direction of the face, and this is why, upon differentiating once, we obtain the expression we have seen for the fluxes through the faces of $Ae_1\wedge\cdots\wedge Ae_n$. The resulting expression ends up being particularly simple when the perturbation of $A$ is in the direction of the identity, as in our case, but the picture generalizes to perturbations in general directions, and so one has to go into more detail in order to understand precisely what this flux is for any particular perturbation (I believe it is basically the original volume form contracted with a certain vector field associated to the perturbation that ends up being integrated over the boundary). When one differentiates twice, one similarly obtains integrals over the oriented $(n-2)$-dimensional boundaries of the $(n-1)$-dimensional faces, which share $(n-2)$-dimensional faces with each other (although in the general case it is much less obvious that the answer should be supported on the boundary). This sharing of faces results in the factor of $k!$ in the formula: at the $j$-th differentiation, each $(n-j)$-dimensional face is shared by exactly $j$ of the $(n-j+1)$-dimensional faces, and these counts multiply to $1\cdot 2\cdots k = k!$.

I hope the picture is clear. This method of computation can be used to show that differentiating the determinant function at the identity matrix gives the trace of the perturbation direction (which one can in fact take as an intrinsic definition of trace, if one should desire), although that fact is more closely related to the asymptotics of the computation presented above as $x\to\infty$, since $\det(xI + A) = x^n\det(I + A/x)$. By the way, does anybody know any different, intrinsic definitions of trace? (The more geometric, the better; I am very curious. The one I have given has the geometric meaning of a proportional rate of change of volumes of regions under the flow of a vector field, but I don’t know if there’s a better one.)
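The claim that the derivative of $\det$ at $I$ in the direction $B$ is $\operatorname{tr}B$ is easy to test numerically. A minimal sketch, restricted to $3\times 3$ matrices for brevity (`det3`, `identity_plus` and `volume_rate` are hypothetical helper names): the symmetric difference quotient compares the signed volumes of the slightly perturbed unit cubes spanned by the columns of $I\pm\varepsilon B$. Since $\det(I + \varepsilon B) = 1 + \varepsilon\operatorname{tr}B + \varepsilon^2(\cdots) + \varepsilon^3\det B$, the even-order terms cancel and the quotient equals $\operatorname{tr}B + \varepsilon^2\det B$ exactly (up to rounding).

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def identity_plus(B, s):
    """The matrix I + s*B."""
    return [[(1.0 if r == c else 0.0) + s * B[r][c] for c in range(3)]
            for r in range(3)]

def volume_rate(B, eps=1e-6):
    """Symmetric difference quotient of det at I in the direction B.

    Equals tr(B) + eps**2 * det(B) in exact arithmetic.
    """
    return (det3(identity_plus(B, eps)) - det3(identity_plus(B, -eps))) / (2 * eps)
```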

Welp.. The post seems to have turned out OK. (A huge amount of gratitude goes out to Luca Trevisan for writing his program). I am slowly learning that it can be better to edit the tex file than to edit the post itself. I wonder if the excess of plus signs looks stupid. I wanted to put some kind of brackets around the vectors in the wedges, but doing so turned out to be a disaster.

But now I have a technical question about how to use the program (or Python?), and I would be very grateful for help:

I’m using Windows Vista (sorry), and for this reason (combined with my own stupidity, my lack of Python knowledge, etc.) I had a bit of a headache figuring out how to work this program. What I ended up doing (and I’m pretty sure this is stupid, but if anyone knows what to do instead, please let me know) is I literally edited the file latex2wp.py, and in the main body of the code I actually put in the names of the files. E.g.

inputfile = r"C:\Users\[more stuff]\charPolCoeffs.tex"

outputfile = r"C:\Users\[the same stuff]\charPolCoeffs.html"

I had opened the .py file directly through IDLE and was able to use the “Run Module” feature in that program. It worked, but I don’t think this is what Luca had in mind, and in particular is not what he instructed (but since I’m not on Linux, I can’t exactly do as he instructed). Namely, Luca suggested (assuming, I believe, that I was in the Linux command prompt) that I input into the command line:

python latex2wp.py charPolCoeffs.tex

When I put these things into the Python command prompt (after the >>> arrows), I think it believed I was trying to define the variable named “python”… and perhaps also the variable “latex2wp”, but it got very confused upon reaching the ‘.’. Basically, I don’t think it was aware of what I was trying to do at all, nor was I aware of how to tell it to go run this ‘.py’ file in a distant directory with the ‘.tex’ file as a parameter. I don’t know anything about how Python works, so I don’t understand what was wrong exactly or how to fix it. Any help would be greatly appreciated, although it looks like I can survive with the uncomfortable, makeshift solution of literally editing the document for a while.

Edit (15 Nov 2009): I have included a discussion of the geometric meaning of the computation and a statement of the general result obtained by the method.

So that the entry is not completely lame, here is a way to compute the Fourier transform of $f(x) = e^{-|x|}$ on the real line:

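Differentiating in the sense of distributions, we have $f'' = f - 2\delta$, where $\delta$ is the delta-function (the density function corresponding to a point mass at the origin). By taking the Fourier transform of both sides, we get $-4\pi^2\xi^2\hat f(\xi) = \hat f(\xi) - 2$, and so we conclude (depending on “where you put the $2\pi$”; here $\hat f(\xi) = \int f(x)e^{-2\pi i x\xi}\,dx$)

$$\hat f(\xi) = \int_{\mathbb{R}} e^{-|x|}e^{-2\pi i x\xi}\,dx = \frac{2}{1 + 4\pi^2\xi^2}.$$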

(In particular, we’ve actually computed the integral of $e^{-|x|}$ to be $2$, corresponding to $\xi = 0$.)
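A numerical sanity check of this formula (a sketch assuming $f(x) = e^{-|x|}$ and the convention $\hat f(\xi) = \int f(x)e^{-2\pi i x\xi}\,dx$; `simpson`, `fourier_numeric` and `fourier_exact` are hypothetical helper names). Since $f$ is even, $\hat f(\xi) = 2\int_0^\infty e^{-x}\cos(2\pi x\xi)\,dx$, which composite Simpson’s rule on a truncated interval approximates well.

```python
import math

def simpson(g, a, b, n):
    """Composite Simpson's rule with n subintervals (n rounded up to even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def fourier_numeric(xi, cutoff=40.0, n=40000):
    """Approximate the Fourier transform of exp(-|x|) at frequency xi.

    Uses evenness and truncates at `cutoff`; the neglected tail is at most
    2 * exp(-cutoff).
    """
    def integrand(x):
        return math.exp(-x) * math.cos(2 * math.pi * xi * x)
    return 2 * simpson(integrand, 0.0, cutoff, n)

def fourier_exact(xi):
    """Closed form 2 / (1 + 4 pi^2 xi^2) from the distributional computation."""
    return 2 / (1 + 4 * math.pi ** 2 * xi ** 2)
```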

It should be noted, of course, that there are more elementary ways to compute this Fourier transform. Also note that the Fourier transform has a meromorphic continuation into the complex plane, whose poles at $\xi = \pm i/(2\pi)$ can be anticipated from the physical space representation (the exponential decay rate of $e^{-|x|}$).

**Proof:**

Pretend you had a non-constant, bounded holomorphic function $f$ on all of $\mathbb{C}$. Since $f$ is bounded near $\infty$, Riemann’s theorem on removable singularities implies that $f$ extends to a holomorphic (and hence continuous) function on the Riemann sphere $\mathbb{C}\cup\{\infty\}$, which is compact. Since $f$ is not constant, the open mapping theorem applies to this extension, so the image of the Riemann sphere is an open subset of $\mathbb{C}$; it is also nonempty and compact, being the continuous image of a compact set. But this cannot be the case, because $\mathbb{C}$ has no nonempty subsets which are both open and compact.

There are some downsides to this proof. It does not rely on the theory of Riemann surfaces (not a downside). It does, however, rely on some relatively (though not truly) heavy machinery, and in order for it to be a correct (non-circular) proof, one must develop this machinery without using Liouville’s theorem (which is possible). Liouville’s theorem follows rather immediately from Cauchy’s integral formulae, and I don’t personally know how to establish things like analyticity, Riemann’s theorem and the open mapping theorem without this tool (although I would be interested if anyone else does!). I also don’t think it really generalizes very well to other PDE for which a Liouville theorem holds.

The Cauchy-Schwarz inequality says that the area of a parallelogram $u\wedge v$ is positive unless $u$ and $v$ are collinear (it is also equivalent to the triangle inequality, but I will be talking about this formulation instead). If $u$ and $v$ are collinear (they point in the same direction or perhaps opposite directions, e.g. $u = cv$ for some scalar $c$), the parallelogram one forms with these two vectors is degenerate and has zero area; otherwise, you wind up with a parallelogram which has positive area. In fact, the area of $u\wedge v$ is $|u|\,|v|$ when the two vectors $u$ and $v$ are perpendicular. If $u$ and $v$ fail to be perpendicular, then we can observe that shifting the edge $u$ by any amount in the direction of the other edge $v$ does not change the area of the parallelogram; thus the area of $u\wedge v$ is the same as the area of $(u - tv)\wedge v$ for any $t\in\mathbb{R}$. By choosing $t$ to minimize the length of the first edge (pick $t = \langle u, v\rangle/|v|^2$), we can make both edges perpendicular. The area $|u - tv|\,|v|$ of the resulting parallelogram must be non-negative (and must be positive when $u$ and $v$ are linearly independent), and squaring it gives $|u|^2|v|^2 - \langle u, v\rangle^2 \ge 0$, i.e. the inequality $|\langle u, v\rangle| \le |u|\,|v|$. (This is actually a self-contained proof of the Cauchy-Schwarz inequality, and it’s the usual proof, but with some motivation regarding how to choose $t$.)
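The choice of $t$ can be checked with exact arithmetic (a sketch; `dot`, `perpendicular_shift` and `squared_area` are hypothetical names, and integer input vectors with $v \ne 0$ are assumed so that `Fraction` keeps everything exact): shifting $u$ to $u - tv$ with $t = \langle u, v\rangle/|v|^2$ makes the edges perpendicular, and the squared base-times-height equals $|u|^2|v|^2 - \langle u, v\rangle^2$.

```python
from fractions import Fraction

def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def perpendicular_shift(u, v):
    """Parallel-shift the edge u along v to u - t v, t = <u, v> / <v, v>.

    Assumes v is a nonzero integer vector; returns (shifted edge, t).
    """
    t = Fraction(dot(u, v), dot(v, v))
    return [a - t * b for a, b in zip(u, v)], t

def squared_area(u, v):
    """(base * height)^2 of the parallelogram u ^ v after the shift."""
    w, _ = perpendicular_shift(u, v)
    return dot(w, w) * dot(v, v)
```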

Of course, the Cauchy-Schwarz inequality is also equivalent to the triangle inequality, and the relationship between these two geometric interpretations can be seen by inspecting the parallelogram one forms by drawing the diagonal $u + v$.

I hope to write on how formalizing this area-of-parallelogram concept works (only on an intuitive level, for now). For those already familiar with exterior powers of vector spaces and exterior algebras (and their geometric meaning in terms of parallelograms), I can cut to the chase and say that there’s an induced inner product on the exterior powers, and the Cauchy-Schwarz inequality is what you get by writing $|u\wedge v|^2 = |u|^2|v|^2 - \langle u, v\rangle^2 \ge 0$.

I will also discuss the relation to surface integrals and determinants.

It shouldn’t be so surprising that the same inner product which allows us to generalize the familiar notions of length, angle, and perpendicularity also allows us to measure areas and volumes of parallelograms with edges formed from elements of $V$. In fact, take your favorite two (non-degenerate) oriented parallelograms in three-dimensional space. Imagine them with a common vertex at the origin. You should be able to convince yourself that there is a definite, well-defined notion of angle between these two parallelograms, and that this angle only depends on the planes which contain the two parallelograms. So oriented parallelograms like these seem to already have a definite notion of “area”, “angle”, “projection” and “perpendicular” in much the same way as vectors themselves have notions of “length”, “angle” and “perpendicular”. So one may ask: is there also an inner product on the space of 2-parallelograms?

Yes! Sort of. The space of parallelograms in $V$ with common vertex at the origin is not exactly a vector space (although it is a perfectly interesting topological space); basically, there isn’t a clear, natural way to add parallelograms. But there is a vector space associated to $V$ which fills this role; it is called “the second exterior power of $V$” and is more briefly denoted $\Lambda^2 V$. Typical elements of $\Lambda^2 V$ may look like parallelograms $u\wedge v$ for $u, v\in V$, or linear combinations of such things like $a\,(u_1\wedge v_1) + b\,(u_2\wedge v_2)$, but there is a catch: there are infinitely many different ways to write any particular element.

There is good reason for this ambiguity, without which we would not be able to successfully define a sensible notion of “addition” or “scalar multiplication”. Note, however, that we are all accustomed to mathematically constructed sets in which every element admits infinitely many different representations: the numbers “1/2, 3/6, 48/96, etc.” all represent the same rational number, which we call “one-half”. And it is important that we can manipulate these different representations when we actually work with the number “one-half”: e.g. “one half plus one third” = 3/6 + 2/6 = “five sixths”. The multitude of ways to represent elements of $\Lambda^2 V$ is just as natural.

Take, for example, the zero element $0\in\Lambda^2 V$. One way of forming the zero element is by taking the parallelogram whose two edges are both the zero vector in $V$, but one could start with an arbitrary vector $v\in V$, take another in the same direction (say $2v$) and form the degenerate parallelogram $v\wedge 2v$. Doing so gives another way to represent zero; all degenerate parallelograms like $v\wedge 2v$ are zero in $\Lambda^2 V$, and this fact is critical.

With the above consideration, we now have a hope of defining a nice addition law. One geometrically intuitive requirement for addition in $\Lambda^2 V$ is that for all pairs of parallelograms sharing a common second edge $v$, we have $u_1\wedge v + u_2\wedge v = (u_1 + u_2)\wedge v$, and similarly when a common first edge is shared (there is a simple picture to imagine here involving concatenation of parallelograms). This requirement certainly does not prohibit addition from being commutative and associative, since these facts were already true in $V$. Probably the best evidence that this is the right thing to do comes from the special case where $V$ is a plane and these parallelograms all have a certain **signed** area $\mathcal{A}(u\wedge v)$; this choice of addition allows for the property $\mathcal{A}(u_1\wedge v) + \mathcal{A}(u_2\wedge v) = \mathcal{A}\big((u_1 + u_2)\wedge v\big)$. But before discussing signed volumes and areas and their connection to exterior powers, let us elaborate the consequences of this choice of addition.

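We are making a vector space out of parallelograms, and a real vector space requires a notion of scalar multiplication, which has essentially been determined already. For example, we have implicitly required that the expressions

$$(2u)\wedge v,\qquad (u + u)\wedge v,\qquad u\wedge v + u\wedge v,\qquad u\wedge(v + v),\qquad u\wedge(2v)$$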

are all different ways of writing $2\,(u\wedge v)$, and the same equality must hold when $2$ has been replaced by an arbitrary rational number. Therefore we have no choice but to require that, for any real number $c$,

$$(cu)\wedge v,\qquad c\,(u\wedge v),\qquad u\wedge(cv)$$

all represent the same element of $\Lambda^2 V$, even though the left and right are different parallelograms. But when we combine the desired property for our addition with our requirement that all degenerate parallelograms be zero, we see some more funny algebraic consequences (but bear with me, and these will also become more intuitive). For one thing, unless everything is completely trivial, the order of edges matters when you represent a parallelogram: expanding the identity $0 = (u + v)\wedge(u + v)$ with our addition law yields $0 = u\wedge v + v\wedge u$, i.e. the property $u\wedge v = -\,v\wedge u$, which requires us to think of these parallelograms as oriented, if we hope to maintain an intuitive grasp of the space $\Lambda^2 V$ we are constructing.

More generally, if we take a parallelogram $u\wedge v$ and shift one edge in a direction parallel to the other, we end up with an equivalent parallelogram in $\Lambda^2 V$: $(u + tv)\wedge v = u\wedge v + t\,(v\wedge v) = u\wedge v$. However, provided $u$ and $v$ are linearly independent, we will not be able to construct a degenerate parallelogram in this manner. But in fact, one can use such “parallel shifts” (either shifting the first or second edge) to construct all parallelograms equivalent to $u\wedge v$ in $\Lambda^2 V$.
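In the special case of a plane, these requirements can be checked concretely (a sketch, assuming $V = \mathbb{R}^2$ with standard basis $e_1, e_2$; `signed_area`, `add` and `scale` are hypothetical helper names): the signed area $\mathcal{A}(u\wedge v) = u_xv_y - u_yv_x$ of $u = (u_x, u_y)$, $v = (v_x, v_y)$ is additive in each edge, scales correctly, vanishes on degenerate parallelograms, changes sign when the edges are swapped, and is unchanged by parallel shifts.

```python
def signed_area(u, v):
    """Signed area of the oriented parallelogram u ^ v in R^2: the
    coefficient c with u ^ v = c * (e_1 ^ e_2)."""
    return u[0] * v[1] - u[1] * v[0]

def add(u, v):
    """Vector addition in R^2."""
    return (u[0] + v[0], u[1] + v[1])

def scale(c, u):
    """Scalar multiplication in R^2."""
    return (c * u[0], c * u[1])

u1, u2, v = (2, 1), (-1, 3), (1, 4)
```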

Earlier on we noticed that the angle between two (oriented) parallelograms depended only on the planes spanned by the two parallelograms, and is therefore in particular independent of parallel shifts in the edges. I also pointed out at the beginning of the entry that the area of a parallelogram is independent of such parallel shifts (a lot of valuable intuition is in this exercise). These geometric observations are what lie behind the following fact: there is an inner product on the vector space $\Lambda^2 V$ made from parallelograms, induced by the inner product on $V$. In fact, the norm coming from this inner product is exactly the “area” when restricted to parallelograms, and one can use the usual formula for the cosine of the angle between two vectors in order to compute the angle between two oriented planes. As long as one normalizes properly, we have the familiar property that “area = base × height”, so that $|u\wedge v| = |u|\,|v|$ when $u$ and $v$ are perpendicular. From the special case of perpendicular edges and the bilinearity of the induced inner product, one deduces the more general area formula $|u\wedge v|^2 = |u|^2|v|^2 - \langle u, v\rangle^2$, which ought to be positive unless $u$ and $v$ are linearly dependent. (The calculation is essentially the following: we “parallel shift” $u$ by an appropriate amount in the direction of $v$ until the two are perpendicular.) Thus we recover the Cauchy-Schwarz inequality: the area of a non-degenerate parallelogram is positive. In fact, since by the polarization identity an inner product is completely determined by the norm it induces, the inner product between arbitrary parallelograms can be computed by the formula

$$\langle u_1\wedge v_1, u_2\wedge v_2\rangle = \det\begin{pmatrix}\langle u_1, u_2\rangle & \langle u_1, v_2\rangle\\ \langle v_1, u_2\rangle & \langle v_1, v_2\rangle\end{pmatrix},$$

which one notices remains constant under parallel shifts like $u_1\mapsto u_1 + t\,v_1$.
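For $V = \mathbb{R}^n$ with the dot product, the Gram-determinant formula $\langle u_1\wedge v_1, u_2\wedge v_2\rangle = \langle u_1, u_2\rangle\langle v_1, v_2\rangle - \langle u_1, v_2\rangle\langle v_1, u_2\rangle$ is easy to test (a sketch; `dot`, `wedge_inner` and `cross` are hypothetical helper names). It reproduces the area formula, is unchanged by a parallel shift of an edge, and in $\mathbb{R}^3$ agrees with the dot product of cross products (the Binet-Cauchy identity).

```python
def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def wedge_inner(u1, v1, u2, v2):
    """Induced inner product <u1 ^ v1, u2 ^ v2> as a 2x2 Gram determinant:
    det [[<u1, u2>, <u1, v2>], [<v1, u2>, <v1, v2>]]."""
    return dot(u1, u2) * dot(v1, v2) - dot(u1, v2) * dot(v1, u2)

def cross(a, b):
    """Cross product in R^3, used only for comparison."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

u, v = (1, 2, 0), (0, 1, 3)
w, z = (2, -1, 1), (1, 1, 1)
shifted_u = tuple(a + 3 * b for a, b in zip(u, v))  # parallel shift u -> u + 3v
```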

The preceding discussion generalizes to the construction of higher exterior powers. For example, there is a space $\Lambda^3 V$ whose elements are linear combinations of 3-parallelograms, and these 3-parallelograms are also equivalent if one can be constructed from the other by a sequence of parallel shifts of edges, i.e. $(u + sv + tw)\wedge v\wedge w = u\wedge v\wedge w$, so that one of the edges can be shifted in any direction in the span of the other two and the resulting 3-parallelogram is the same element of $\Lambda^3 V$. Once again, all equivalent 3-parallelograms can be reached by a finite number of such shifts. Likewise, there are higher exterior powers $\Lambda^k V$, and the ideas are the same. Their usefulness arises in great part from the fact that a $k$-parallelogram $v_1\wedge\cdots\wedge v_k$ is $0$ exactly when the vectors $v_1,\dots,v_k$ are linearly dependent.

There is one especially important case: when $V$ has finite dimension $n$ with basis $e_1,\dots,e_n$, the space $\Lambda^n V$ is a one-dimensional vector space. Indeed, any nontrivial $n$-parallelogram $v_1\wedge\cdots\wedge v_n$ can be shifted into some scalar multiple of the standard $n$-parallelogram $e_1\wedge\cdots\wedge e_n$. In a different notation, this amounts to proving that any invertible $n\times n$ matrix can be reduced to a diagonal matrix after finitely many column operations (adding a multiple of one column to another); indeed, the processes of column operations and row operations for a matrix can be visualized in terms of these parallel shifts. We say the two (ordered) bases $(e_1,\dots,e_n)$ and $(v_1,\dots,v_n)$ carry different “orientations” if this scalar is negative (in three dimensions, this lets you tell the difference between “right-handed” and “left-handed” bases).

It is clear that any linear map $T: V\to V$ (maybe a rotation, a rescaling, or a projection onto a lower dimensional subspace) induces a map on $k$-parallelograms, $v_1\wedge\cdots\wedge v_k\mapsto Tv_1\wedge\cdots\wedge Tv_k$. Since $T$ preserves the addition and scalar multiplication of $V$, $k$-parallelograms which are equivalent by parallel shift remain equivalent after mapping by $T$. Therefore, there is a well-defined, induced linear map $\Lambda^k T: \Lambda^k V\to\Lambda^k V$ (just extend by linearity). (The extension of the inner product to $\Lambda^k V$ is accomplished similarly, with the bilinearity of the inner product playing the key role that the linearity of $T$ plays here, but it is a bit more complicated.)

When $V$ has dimension $n$, the induced linear map on the one-dimensional vector space $\Lambda^n V$ must be multiplication by a scalar (called the determinant of $T$), which will be zero exactly when the image of $T$ is contained in a lower dimensional subspace of $V$. Often $\Lambda^n V$ carries a notion of signed volume: for instance, one could choose $\mathrm{vol}(e_1\wedge\cdots\wedge e_n) = 1$ for some orthonormal basis $e_1,\dots,e_n$, and this choice, which entirely determines the volume form $\mathrm{vol}$, depends only on the orientation of the basis. In this situation, the determinant is the factor by which $T$ multiplies signed volume.
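The statement that $T$ multiplies signed volume by $\det T$ can be tested concretely in $\mathbb{R}^n$ (a sketch; `det`, `signed_volume` and `apply` are hypothetical helper names, and the signed volume is normalized so that $e_1\wedge\cdots\wedge e_n$ has volume $1$). The same check shows that a parallel shift leaves the volume unchanged and that swapping two edges flips its sign.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not M:
        return 1  # empty matrix
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def signed_volume(vectors):
    """Coefficient c with v_1 ^ ... ^ v_n = c * (e_1 ^ ... ^ e_n):
    the determinant of the matrix whose columns are the v_j."""
    n = len(vectors)
    return det([[vectors[j][i] for j in range(n)] for i in range(n)])

def apply(T, v):
    """Matrix-vector product T v."""
    return [sum(T[i][k] * v[k] for k in range(len(v))) for i in range(len(T))]

T = [[2, 1, 0], [0, 1, 3], [1, 0, 1]]
vs = [[1, 0, 2], [0, 1, 1], [1, 1, 0]]
```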

An important application of the above theory arises when one studies integration on lower-dimensional subsets of Euclidean space (or Riemannian manifolds); for example, integration on a surface $S$ in three-dimensional space parameterized by a smooth map $\Phi: [0,1]^2\to\mathbb{R}^3$. Here all the above theory is applied at a local level. One wishes to integrate some function $g$ on $S$ by using the function $\Phi$ which parameterizes the surface $S$, or perhaps one wishes to measure the size of a region on the surface. The correct formula for the surface integral is certainly not $\int_{[0,1]^2} g(\Phi(s,t))\,ds\,dt$; for example, when the function $g$ is identically $1$, this formula would say that the area of the surface is exactly 1, but not every surface in space has unit area. The problem with the formula lies in the volume form, which we write $ds\,dt$ to suggest an infinitesimally small 2-parallelogram. The map $\Phi$ does not preserve volume, and will send a very small rectangle to another, possibly funny-shaped region (resembling some kind of parallelogram at small scales), and this region may be only half as large, or perhaps even 400 times as large. Thus the form $ds\,dt$ we were using for integration in the incorrect formula must be replaced by a volume form $J(s,t)\,ds\,dt$ with a density factor $J(s,t)$ to account for how the map may shrink or increase volumes near a point.

Our theory comes into play to compute this density factor at any point $p = (s,t)$ in terms of the behavior of $\Phi$ infinitesimally close to $p$. Because $\Phi$ sends points near $p$ in the square to points near $\Phi(p)$ in the surface, $\Phi$ also sends trajectories passing through the point $p$ to trajectories on the surface which pass through the point $\Phi(p)$. In particular, if we consider a path $\gamma(\tau)$ in the square which reaches $p$ at time $\tau = 0$ with velocity $\gamma'(0) = w$, then the trajectory $\Phi(\gamma(\tau))$ reaches the point $\Phi(p)$ at time $0$ with velocity $D\Phi(p)\,w$, where $D\Phi(p)$ is a linear map (or matrix) called the “derivative” of $\Phi$ at the point $p$. Now we are in almost exactly our earlier situation regarding this map $D\Phi(p)$, which is a linear map sending velocities realized at the point $p$ by trajectories in the square to the space of velocities realized at the point $\Phi(p)$ by trajectories in the surface $S$. The latter space of velocities in the surface is called “the tangent space to $S$ at the point $\Phi(p)$” and is a two-dimensional vector space because $S$ is a surface. Similarly, the domain of $D\Phi(p)$ (namely, the space of velocities realized at the point $p$ by trajectories in the square) is also two-dimensional. In both these vector spaces, there is a notion of “length” which allows us to measure the “speed” of a trajectory, and there is also a notion of “angle” or “perpendicularity” between two trajectories. Our ability to measure speeds of parametrized curves allows us to measure lengths of curves by integration, and similarly our natural notion of area for a parallelogram given by two such velocities allows us to compute areas of parametrized surfaces by integration. In general, the linear map $D\Phi(p)$ will not preserve lengths or angles between these two spaces of velocity vectors, and hence will not preserve areas of parallelograms. The factor by which $D\Phi(p)$ multiplies areas of parallelograms is exactly the density factor $J(p) = |\partial_s\Phi(p)\wedge\partial_t\Phi(p)|$, and can be computed explicitly in examples.
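As a small computational example (a sketch; the unit-sphere parameterization `phi` and the helper names are illustrative choices, not from the original post): for a surface in $\mathbb{R}^3$, the area of the parallelogram spanned by $\partial_s\Phi$ and $\partial_t\Phi$ equals the length of their cross product, so the density factor can be estimated numerically and integrated over the square. For the sphere parameterized below, the exact density is $2\pi^2\sin(\pi s)$ and the total area is $4\pi$, whereas the naive formula would give $1$.

```python
import math

def phi(s, t):
    """Illustrative parameterization of the unit sphere by the unit square."""
    return (math.sin(math.pi * s) * math.cos(2 * math.pi * t),
            math.sin(math.pi * s) * math.sin(2 * math.pi * t),
            math.cos(math.pi * s))

def partials(f, s, t, h=1e-6):
    """Central-difference approximations of the partial derivatives of f at (s, t)."""
    f_s = [(p - m) / (2 * h) for p, m in zip(f(s + h, t), f(s - h, t))]
    f_t = [(p - m) / (2 * h) for p, m in zip(f(s, t + h), f(s, t - h))]
    return f_s, f_t

def density(f, s, t):
    """J(s, t): area of the parallelogram spanned by the two partial derivatives,
    computed as the length of their cross product (valid for surfaces in R^3)."""
    a, b = partials(f, s, t)
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    return math.sqrt(sum(c * c for c in cross))

def surface_area(f, n=200):
    """Midpoint-rule approximation of the integral of J over [0, 1]^2."""
    h = 1.0 / n
    return h * h * sum(density(f, (i + 0.5) * h, (j + 0.5) * h)
                       for i in range(n) for j in range(n))
```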

I should include some computational examples to help with all this….

In the meantime, check out this post of Terence Tao on how this same induced inner product can be used to construct interesting processes by which to generate random $k$-element subsets of a finite set.

First of all, I am not suggesting we discontinue working on problems — rather I’m suggesting a (necessarily) smaller side-project. Polymath should try to write a textbook… or something expository.

There are plenty of reasons to feel initially apprehensive about this idea. We might imagine a group of authors’ artistic disagreements turning into edit wars. Would a great mathematician’s insight be lost through this process? And after all, doesn’t Wikipedia already provide this kind of service? And look at all the obvious inadequacies there! Not the ideal way to be learning mathematics at all, is it? And yet we still try reading it… Well, even if I don’t, other people do and it ends up consistently at the top of Google’s search results along with some other random papers, which only contributes more to the difficulty of finding good mathematical exposition on the internet.

These reasons to feel apprehension only highlight the importance of finding the solution to the broader problem: find the best way to compose en masse. In some ways, it seems to be open, at least for mathematical purposes, and yet the applications clearly extend beyond mathematics. So I think it’s worth solving (carefully), and here are a few more reasons we have to solve it (for the time being, the reader may envision we are trying to write a textbook on some well-understood mathematical subject):

- To **help the Tricki**: Many of the applications in Tricki articles are to problems commonly solved in textbooks. And yet this is a bit asymmetric: isn’t it more often that we see these funny-looking solutions to problems and then afterward spend a great deal of time and energy thinking “What? Where the heck did that idea come from?” or “I’ll have to remember this technique later for my completely unrelated purposes”, etc.? At the same time, we will find that these collaboratively written expositions may inspire many articles for the Tricki, and give the Tricki good places to which to link for examples. So in short, this project may have great potential to grow along with the Tricki, and that is something that is (to my knowledge) essentially absent from the math we usually read.
- To **help the polymath projects**: You may consider the problem of mathematical exposition to be just one important subproblem of how to optimally do Polymath. This problem appears throughout the course of a project when summaries appear, and at the end of the project when the papers must be written out in full. I think this subproblem (just like the one considered in my last post) can be isolated, so we might benefit from concentrating on it.
- **Stretching our abilities**: Somehow, a printed topology textbook always fails to say as much as it wants to. And, I don’t know about you all, but I am far from comfortable including dynamic media in my posts, so I wouldn’t be able to write a decent online topology book, even if I did understand the subject very well. You can probably see I have other things to fix in my technical writing, too, but one way for me to learn is to copy the techniques which appear through optimized collaborative writing. As we develop these online expository techniques/technologies and make them easier, we can record how they work in a separate wiki.
- **Saying things in more than one way**: When you listen to music in a car, you probably get to choose the volume, the level of bass, the active speakers, and so on. On the contrary, when you read a textbook (or Wikipedia), you have no knob for adjusting the level of abstraction or technical detail. Authors of textbooks have spent much energy optimizing these aspects to their liking, and once they have made their choices, you are left to figure out for yourself how to say it in “layman’s terms” or what meaning, if any, lies behind the particular algebraic manipulations. Granted, these can often be great exercises, but they can also be frustrating, and we will always find good conceptual exercises for ourselves.

  Having these different options available, one can then construct one or many different printable books by making these same choices in one of many different ways, but there are advantages to having various approaches available simultaneously. (I also think this flexibility is necessary because, for good reason, nobody will agree on exactly one single best way to present something, but we just might be able to get by with a bounded number of fundamentally different modes of presentation / points of view.)
- **Possible updating**: Some fantastic books require supplementation simply because they are outdated in a few ways which aren’t fundamental. When your product is on the internet, you can always update it.
- **A concentration of fantastic problems**: In my previous essay, I contended that “ratings” were not in the spirit of doing mathematics, and that we need, instead, to be more specific when classifying/criticizing posts, but still cannot afford to leave things unclassified. A collaboratively written textbook, even if it somehow manages to be not so well-written, is more likely to attract the most fantastic problems, especially if we provide ratings and other simple, popular statistics for these problems (which measure difficulty, estimate time commitment, how valuable the lessons learned were, the number of people marking it as a “favorite”, or whatever seems appropriate). Of course, there is a delicate issue regarding how and when to store solutions, if at all, and we’d have to hope people give the problems an honest shot before rating. But I find it hard to imagine how a truly massive collaboration can fail to accomplish at least this feat, which by itself is a great thing.

I’ll stop here and hope I convinced everyone that solving the massive exposition problem is worth a shot even though it’s clearly a very difficult problem (but who better than mathematicians and programmers to take it on, eh?). I think it’s better to try to solve this problem simultaneously with the development of polymath. To me these times for polymath resemble the times in which the US Constitution was written — the US could have easily started a government without even a Bill of Rights were there not so much rigorous intellectual debate during the formation! (which is why it’s so great that those leading the polymath movement have chosen to do things this way)

I’ll leave you all with some questions…

*How would we determine what to try to write first?*

*What does a good candidate exposition project look like?*

*Is there any kind of exposition a massive, internet-based approach seems inherently incapable of doing? Or does the potential seem unlimited?*

*Are there any subtle but important issues I have completely ignored?*

*Should I have called this a polymath proposal, or should the collaborative exposition idea be considered separate?*

The one comment I think I should make is that the first experimental exposition projects, if enough people are interested, should be very small, with well-defined boundaries, and should help to set a precedent of interaction with the Tricki. I also think that, much like the research flavor of polymath (and the US Constitution), we would have to make things extremely robust from the very beginning. For example, had the Tricki only come into being at some point in the future, it should still have been easy to integrate into whatever expository polymath system we introduce.

—-

Unrelated… How do you turn off the annoying preview thing for the hyperlinks? And how do you make it so that you only get a small preview of the entry from my blog’s homepage? I’m new to this blogging stuff..

Edit (30 Aug, 2009):

There is one thing which can be accomplished on the internet which might have great applications for exposition: we can keep our writing simple by attaching small links (like footnotes in LaTeX) to statements whose proofs may be obvious to a decently large fraction of readers, or when proving such a statement would disrupt the flow of the prose. A pop-up could then appear giving a complete proof or two. This feature would not only help to compress the prose and accelerate reading (people get to read their choice of details), but simultaneously this feature would allow for a more detailed exposition of whatever we decide to write.

My point of view is based in part on my experience as a moderator of a webforum of up to a dozen active members which devoted a few years to the collaborative production of a complex storyline with several deeply interwoven subplots. Time will tell how well large collaborations can produce mathematics; they are certainly an amazing tool for story-writing, and some comparisons can be made upon abstraction, so the experience may be relevant. The Google Groups format had tremendous advantages and shortcomings, but an inability to harness people’s free time ultimately led to our story’s stagnation.

——————————————————————–

At the moment, polymath seems to function in many ways analogously to various forms of entertainment and “time-wasting” (reading blogs and webcomics, participating in forums, watching movies, watching YouTube, etc.) – indeed, this “wasted time” is in some sense exactly the incredible resource which polymath must compete to harness, although within a more restricted audience and for the noble purpose of serving mathematics. I am sort of joking, but this is only my interpretation of Prof. Tao’s original request that the participation in the latest polymath problem solving experiment be casual.

Now, much work has already been done to procrastinate as efficiently as possible: we read a select few blogs/webcomics/twitters and have efficient means of getting to them (feeds, bookmarks); we look at Rotten Tomatoes to decide which movies to watch. On YouTube, we often look at the “N Views” and the “x/5.0 (y votes)” user-rating figures to decide which videos are most worth our wasted time (and many videos on YouTube are “replies” to others, which are linked). YouTube, for instance, has had to develop mechanisms to help ensure good posts do not get squashed, which is a problem polymath also faces. Regardless of their shortcomings and their obviously different objectives from those of Polymath, these “procrastination” activities provide good examples of how to effectively organize a mass of information which is spread among many people so as to save time, and that is the problem I want to address here (as opposed to the closely related problem of optimal technology and formatting). In other words, I will not propose any particular technologies, but rather make a list of things which I believe a technology has to be capable of doing in order to harness massive collaboration and spare time in a satisfactory way, a task which is crucial to maximizing the long-term impact of polymath.

I wanted to be able to reference people, but the blog format makes this very difficult (.. or at least I don’t know how to do it!..), so instead I must apologize to everyone whose ideas I reproduce here without giving credit. Doing so will obviously be a mistake on my part (and will thus help me raise the point that easy citation is important); many ideas and issues raised below were scattered within the first 43 comments on this entry of Prof. Tao’s blog. In fact, none of the ideas presented below are mine, essentially; they have all been taken from various other internet services which have attempted to solve similar problems.

Here is a **table of contents** of issues raised in that post that will be addressed:

– Sloppy writing — see ***Favorites***, ***Post Classification***, ***View Counting***

– Evaluation/classification of ideas (currently under-explored?, flawed?, showing promise?, novel observation?, etc…) – see ***Post Classification***

– Ease of linking to / citing other posts – see ***Finally*** and ***Private messaging***

– Estimating time commitment – see following paragraph

– Keeping a sense of chronology but being able to edit – I think **Google Wave** and its playback feature should be able to help once it supports LaTeX… But see ***Finally***

– Need for **Leadership** – see the entire section on personal accounts.

– “Visualizing the tree” – see ***Finally***, but I have not considered the problem of visualizing logical dependences

– Private vs. public messaging – see ***Private messaging***

– Personal notes/ journal – see ***Personal blog***

Consider the following model of Polymath, which to a first approximation is a perturbation of YouTube combined with certain aspects of webforum functionality. A person involved in Polymath must have a Polymath account (blog activity / wiki activity, etc. within particular projects is all somehow contained inside a larger entity called “Polymath” – compare Google Groups), and although one need not necessarily go by his true identity in such an account, he must register for any individual project (and, for example, verify he is not a robot). Before deciding whether to participate in a project, he knows which of his “Friends” are involved (where “friendship” is mutual, akin to Facebook friendship; humans themselves are a natural and efficient means by which we already organize our time). He also has good ways to estimate the starting cost: all the posts are publicly readable, as are the results of simple surveys of those involved, a description of prerequisite knowledge, statistics about the distance to a leaf in the comment tree, the age of the project, blahblahblah. Only after registering, however, may he contribute to the project, classify posts, etc.; once he has done so, the project is added to his “My Projects” folder, from which he can access any project to which he subscribes (compare YouTube, Google Groups). From here on in, everything is contained within a particular polymath project.

The perfect means by which to organize any particular project—be it forum, wiki, blog, or what-have-you– is an open problem we hope will be solved by market forces, but I describe two features which I believe to be essential in this part (see Prof. Tao’s comment on his own entry for a very good list).

Firstly, I believe there must be **personal profiles** (similar to those of a webforum and YouTube). A personal profile is in part generated automatically, and is in part privately constructed. I hope that these personal profiles can help to provide the amount of leadership in a project that is clearly necessary. When viewed by other registered users of a project, a personal profile provides at the very least the following:

– A ***personal record***: a record of what this person has done within the project which can be easily searched and allows for easy citation. For many reasons, it is important to record what particular individuals have done and have found important. For example, one or more people’s participation in a problem during a period of time can give an approximation to what one might call a “line of attack” or “chapter” in the greater problem; this fulfills a good part of the job of a summary and can make writing summaries easier. I learned this lesson first-hand while building summaries of years’ worth of subplots of the story I mentioned earlier; I used people’s records to trace sub-stories. Again, human records provide great, natural organizational means; I suspect those of you who wrote summaries for the DHJ project may have used a feature to search for individuals, and I’ll come back to this point.

– A directory of ***Favorites*** — Here one bookmarks the posts he has found most valuable to his understanding of the problem. A person uses his “Favorites” directory to make them easily available for citing and referencing. They are public information so that people will have an easier time “getting on the same page” and producing summaries. One should be able to view the Favorites in the order in which they were placed in the Favorites list, rather than simply the order in which they came into being. Favorites have brief comments attached to them describing why they’re bookmarked, some of which may be publicly available, some of which are personal notes. Favorites also often have more extensive notes attached to them, which one can also opt to make public every now and then.

– A ***personal blog*** – The blog contains public and private entries (see the remarks of David Speyer). Here you get to express your own thoughts and evolving point of view. You can also tell people you’re leaving the project, ask for help with something (though other mechanisms should also exist for this purpose), inform people of upcoming mini-collaborations you’re having, or link to breakthrough posts you’ve found. In the end, exactly what these blogs are used for will settle into some equilibrium, but they have much potential to help people keep up with the parts of the project on which you are concentrating. However, note that the personal blog is meant to be… *personal*, as opposed to the public blogs devoted to the project. There may be no need for comments or public discussion on these blogs, and in fact it may be better to disable such features to keep discussions of general interest from happening in the wrong place (perhaps their entries may, however, be referenced or bookmarked? Or not). Just as we have feeds for our favorite blogs/webcomics/twitters, we can have feeds for the most important blogs within the project, and an option to make the contents of our feed publicly available to help newcomers (just as last.fm lets other people know what music you’re listening to), analogous in principle to the above Favorites idea for posts. Above all, we have already seen through the examples of Profs. Tao and Gowers the impact blogs can have for leadership purposes.

– A ***private messaging*** capability – There are reasons to be apprehensive about this, but in the end I am guessing we will find it both useful and necessary, and that it must have all the features of any other form of posting. [A Student] suggested private messaging as a means to correct or ask questions politely. I suspect such conversations would sometimes lead to important discoveries, which must then be brought to the attention of the problem’s community, but only in an organized form. Therefore, if they exist, PMs must be much more robust! (This need for robustness truly makes me think of where Google Wave can come in once it can support LaTeX.) In the collaborative story-writing setting, the best plot twists were discussed in secret by a few people before being unveiled in some well-planned posts (which also tended to be written better); I see no reason that analogous mini-collaborations should not play an important part in the PolyMath process. Indeed, such private messages will inevitably exist regardless of whether or not we choose to welcome and incorporate them; we cannot get rid of them, no matter how hesitant we may be as to whether or not they align with the spirit of polyMath.

Secondly, I believe that we must incorporate statistics to classify and measure the popular opinion regarding individual posts. In his comment, Prof. Tao put this idea under the “less mandatory” section, and I agree that it will not be mandatory in every setting. But in the situations where things get very, very big and post numbers become extremely large, I think they are basically necessary just as Rotten Tomatoes and YouTube statistics are necessary to help people efficiently waste their time at the movies and online (and, like I said, wasted time is the resource for which polyMath competes). Here are some examples of how this can be done in conjunction with my previous recommendations:

– ***Post classification*** – An anonymous poster made the excellent observation that many people probably waste time evaluating the accuracy / importance of (sometimes poorly / vaguely written) arguments. But if an argument is flawed, and the flaw has been pointed out, this information should be obvious upon viewing the post, and a link to the corrected analysis should be present before a million people have to struggle with it. Or, likewise, if an argument is vague/imprecise, and a more precise version has been written, this classification and a link should be immediately available. The post may be vague not as a fault but because it is what one of my professors would call “the two-minute version”, and there are “five-minute” and “five-hour” versions also available. In any case, corrections and elaborations (or, just as importantly, compressions), if they exist, should be linked and labeled as such. The taxonomy here is incomplete: it would also be nice to know when an approach has been (by consensus) beaten to death and we have done all we can with it and understand its limits, or when an idea is “underexplored” or “showing promise”, similar to how Wikipedia allows us to label its articles’ flaws (e.g. “not enough references”, “too much jargon”, etc.). (Maybe this element of Wikipedia is the best solution for the classification problem?)

- These can be **incorporated with the Favorites** that I proposed above, so that if something should happen to a Favorite post (e.g. a flaw is discovered and elaborated), those who are concerned about that post are informed on their Favorites page. I believe this kind of classification with linking is all we need to “punish” errors: when errors become common, we will put the common errors (or their corrections) in our Favorites list, so that we may quickly provide a link when they reappear. It is good for a Favorites list to have errors in it, as long as they remain well-classified.

– A ***view counter*** – Gives an idea how commonly viewed the knowledge is. It is important for individuals to bookmark a not-well-known post which they feel is important so as to prevent the phenomenon of everyone always looking at the most popular posts. Many posts will become unpopular because they are badly written, so I propose using Favorites or Post Classification to highlight those worth the read despite their low numbers (or not worth reading despite high numbers). One should also keep track of how many people have set any given post as a favorite. With these statistics (along with the age of the post, tags and a subject header), one can approximately isolate what’s really worth reading when the amount of material out there vastly exceeds his available time.

– No numerical ratings — While classification of posts can be quite useful, numerical ratings or “thumbs up / thumbs down” ratings do not promote the overall good. Not only do they offer no explanation as to what may be good or bad about an entry, they tend to be assigned without any real, deep thought. It is very easy to crush a funny-sounding, good idea with just a few bad reviews.

***Finally***, I want to discuss the importance of universal ease of reference / linking and its interaction with “visualizing the tree” and with summaries. For very large projects with very many collaborators, I agree with Kareem Carr, for example, that we do need ways of understanding and approximating the tree structure. I think we could use some kind of metric to measure, in a useful way, the distance between summaries. A summary pertains to a certain region of the tree, which can be approximated based on the humans referenced, and the individual posts referenced. For example, one may approximate what you might call a sub-problem by collecting a handful of the main people involved and the time in which it took place, looking for the correct region of the graph in which they interact most closely, and then taking a small neighborhood of those posts. In this way one can try to associate a region to a summary.
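The neighborhood heuristic just described could be prototyped roughly as follows. This is only a sketch under invented assumptions: the `Post` record and its fields, `summary_region`, and `region_distance` are hypothetical and come from no existing Polymath tool. Posts form a reply tree; a summary's region is seeded with posts by the chosen people inside a time window and then widened by a few reply-links, and two summaries are compared with the Jaccard distance between their regions, which is just one simple choice of metric.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    # Hypothetical record; a real system would store much more.
    post_id: int
    author: str
    time: int               # e.g. hours since the project started
    parent: Optional[int]   # id of the post this one replies to, if any


def summary_region(posts, authors, start, end, radius=1):
    """Seed with posts by `authors` whose time lies in [start, end], then add
    every post within `radius` reply-links of some seed (multi-source BFS)."""
    neighbors = {p.post_id: set() for p in posts}
    for p in posts:
        if p.parent in neighbors:  # None or unknown parents are skipped
            neighbors[p.post_id].add(p.parent)
            neighbors[p.parent].add(p.post_id)
    seeds = [p.post_id for p in posts
             if p.author in authors and start <= p.time <= end]
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the requested radius
        for nb in neighbors[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


def region_distance(region_a, region_b):
    """Jaccard distance 1 - |A & B| / |A | B|, a metric on finite sets."""
    union = region_a | region_b
    if not union:
        return 0.0
    return 1.0 - len(region_a & region_b) / len(union)


# Tiny made-up thread: 2 replies to 1, 3 to 2, 4 to 3, and 5 (much later) to 1.
posts = [Post(1, "A", 0, None), Post(2, "B", 1, 1), Post(3, "C", 2, 2),
         Post(4, "D", 3, 3), Post(5, "B", 10, 1)]
region = summary_region(posts, {"B"}, 0, 5)  # -> {1, 2, 3}
```

Widening `radius` trades precision for coverage; nothing here is tuned, and a real tool would likely weight links by time or by how closely the people involved interact.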

We’ve already noticed that much can be learned from the paradigm of open-source programming, but I hope to have convinced the reader that models from various other popular internet services may also inform the constitution of polymath in particular by helping us see how people themselves can be used to naturally organize massive material. There’s a lot of wasted time out there just waiting to be harnessed for the service of mathematics! Like.. all the time I just spent writing this essay…… ……. …………

————-

If no product exists which is robust enough to do all of the above (while meeting the demands Dr. Tao listed)… I would suggest some people work to develop the product. On the internet you never know what will be the next Facebook, so you might as well start very robust. Last of all, did anyone ever notice that participation in PolyMath projects may end up proving valuable to gauging graduate school applicants? Maybe this is an extremely tricky question. I also wonder what would happen if polymath tried to write a textbook…

Edit (28 July): I should have stressed that the underlying assumption in all the above is that things be organized so well that people actually want to use them. For example, the blogs should be good enough for taking actual notes.

I may continue this blog with pieces of mathematics that I find pretty, important and/or not so commonly known. In any case, I’d feel silly devoting an entire blog to a single reply. So here... I’ll take this opportunity to include my favorite proof of the Lagrange multiplier theorem, and later on maybe I’ll post a nice proof of Stirling’s formula, a group-algebraic proof of the Riemann-Lebesgue lemma, or something…

The Lagrange Multiplier Theorem helps us solve constrained maximization / minimization problems, making it (among other things) extremely important in economics. A weak form of the Lagrange multiplier theorem can be stated as follows:

Let $f: \mathbb{R}^n \to \mathbb{R}$ be a real-valued, continuously differentiable function (called the “objective function”, the thing we want to maximize or minimize), and let $g: \mathbb{R}^n \to \mathbb{R}^m$ be continuously differentiable (called the “constraint function” for reasons which will soon be clear).

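If $x_0$ is an extremizer of the restriction of $f$ to the (codimension $m$) “constraint manifold” $M$ given by $M = g^{-1}(0) = \{x \in \mathbb{R}^n : g(x) = 0\}$, then there is a unique linear functional $\lambda: \mathbb{R}^m \to \mathbb{R}$ satisfying the equality of linear maps $Df(x_0) = \lambda \circ Dg(x_0)$.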

We assume $m \le n$ and that $Dg(x_0)$ has full rank $m$; things tend to be more complicated when the number of constraints exceeds the number of degrees of freedom.
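A small worked example may help fix the notation. The particular $f$ and $g$ are my own illustrative choice, not part of the theorem: maximize $f(x,y) = x + y$ on the circle $x^2 + y^2 = 2$, so $n = 2$, $m = 1$ and $g(x,y) = x^2 + y^2 - 2$. By the Cauchy-Schwarz inequality, $x + y \le \sqrt{2(x^2 + y^2)} = 2$ on the circle, with equality at $x_0 = (1,1)$.

```latex
\[
Df(x,y) = \begin{pmatrix} 1 & 1 \end{pmatrix}, \qquad
Dg(x,y) = \begin{pmatrix} 2x & 2y \end{pmatrix}.
\]
At $x_0 = (1,1)$:
\[
Df(x_0) = \begin{pmatrix} 1 & 1 \end{pmatrix}
        = \tfrac{1}{2}\begin{pmatrix} 2 & 2 \end{pmatrix}
        = \lambda \circ Dg(x_0), \qquad \lambda(t) = \tfrac{t}{2},
\]
and indeed $\ker Dg(x_0) = \operatorname{span}\{(1,-1)\}$ with
$Df(x_0)\,(1,-1)^T = 1 - 1 = 0$.
```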

**Proof:**

We know that $x_0$ must satisfy some first-order condition: namely (by further restricting $f$ to trajectories $\gamma$ on $M$ with $\gamma(0) = x_0$, so that $t \mapsto f(\gamma(t))$ has an extremum at $t = 0$) that $Df(x_0)v = 0$ for any $v$ which can be realized as a velocity at $x_0$ of a path within the constraint manifold. An exercise in the implicit function theorem shows that these velocities include all of $\ker Dg(x_0)$ (these are all such velocities: they are the directions in which one can move without changing $g$, and together they are called the “tangent space” of $M$ at $x_0$ under generic circumstances).
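The first-order condition can be checked numerically on a toy problem. This is a sketch under an illustrative assumption, not part of the proof: $f(x,y) = x + y$, $g(x,y) = x^2 + y^2 - 2$, maximizer $x_0 = (1,1)$. It moves along one explicit path $\gamma$ inside the circle through $x_0$ and uses central finite differences (step `h` chosen ad hoc) to estimate both $\frac{d}{dt} f(\gamma(t))\big|_{t=0}$ and the velocity $\gamma'(0)$. The first should vanish, and the velocity should lie in $\ker Dg(x_0)$.

```python
import math

# Illustrative example (an assumption, not from the post):
# objective f(x, y) = x + y, constraint g(x, y) = x^2 + y^2 - 2 = 0,
# constrained maximizer x0 = (1, 1).


def f(p):
    x, y = p
    return x + y


def grad_g(p):
    x, y = p
    return (2 * x, 2 * y)


def gamma(t):
    """A path inside the constraint manifold (circle of radius sqrt(2))
    passing through x0 = (1, 1) at t = 0."""
    r = math.sqrt(2)
    return (r * math.cos(math.pi / 4 + t), r * math.sin(math.pi / 4 + t))


def derivative_at_zero(h_fn, h=1e-6):
    """Central finite-difference approximation of d/dt h_fn(t) at t = 0."""
    return (h_fn(h) - h_fn(-h)) / (2 * h)


# d/dt f(gamma(t)) at t = 0: the first-order condition says this vanishes.
df_along_path = derivative_at_zero(lambda t: f(gamma(t)))

# Velocity of the path at t = 0, estimated componentwise.
velocity = (derivative_at_zero(lambda t: gamma(t)[0]),
            derivative_at_zero(lambda t: gamma(t)[1]))

# Dg(x0) applied to the velocity: ~0 means the velocity lies in ker Dg(x0).
gx, gy = grad_g((1.0, 1.0))
dg_velocity = gx * velocity[0] + gy * velocity[1]

print(f"d/dt f(gamma(t)) at 0 ~ {df_along_path:.2e}")
print(f"Dg(x0) . velocity     ~ {dg_velocity:.2e}")
```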

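Finally, now that $\ker Dg(x_0) \subseteq \ker Df(x_0)$, it is a trivial exercise in linear algebra to show that there is a unique linear functional $\lambda$ defined by the identity $\lambda(Dg(x_0)v) = Df(x_0)v$ for all $v \in \mathbb{R}^n$. Basically, we have shown that the zero set of the linear map $Dg(x_0)$ is contained in the zero set of $Df(x_0)$, but because the two maps are linear, their level sets are all translations of the zero level set, and therefore we conclude that every level set of $Dg(x_0)$ is contained in a level set of $Df(x_0)$. This is exactly what makes $\lambda$ well defined, and since $Dg(x_0)$ is onto $\mathbb{R}^m$, $\lambda$ is defined on all of $\mathbb{R}^m$ and is unique.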
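The linear-algebra step can also be carried out concretely. The sketch below is one way to do it, not the post's method: it computes the coefficients $\lambda_i$ from the normal equations $(Dg\,Dg^T)\lambda = Dg\,Df^T$, which have a unique solution when $Dg(x_0)$ has full row rank, and then checks the residual. A nonzero residual means exactly that $\ker Dg(x_0) \not\subseteq \ker Df(x_0)$. Jacobians are plain lists of rows, and the helper names `solve` and `lagrange_functional` are my own; in practice a library least-squares routine would do the same job.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial
    pivoting. `a` is a list of rows; the inputs are copied, not modified."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: Dg(x0) must have full row rank")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def lagrange_functional(dg, df, tol=1e-9):
    """Given the m x n Jacobian dg = Dg(x0) (full row rank) and the row vector
    df = Df(x0), return lam with sum_i lam[i] * dg[i] == df.

    Multiplying lam . dg = df on the right by dg^T gives the normal equations
    (dg dg^T) lam = dg df^T. The residual check afterwards is the hypothesis
    ker Dg(x0) contained in ker Df(x0): if it fails, no such lam exists."""
    m, n = len(dg), len(df)
    gram = [[sum(dg[i][k] * dg[j][k] for k in range(n)) for j in range(m)]
            for i in range(m)]
    rhs = [sum(dg[i][k] * df[k] for k in range(n)) for i in range(m)]
    lam = solve(gram, rhs)
    residual = [df[k] - sum(lam[i] * dg[i][k] for i in range(m))
                for k in range(n)]
    if max(abs(r) for r in residual) > tol:
        raise ValueError("ker Dg(x0) is not contained in ker Df(x0)")
    return lam
```

For the circle example, `lagrange_functional([[2.0, 2.0]], [1.0, 1.0])` gives `[0.5]`, matching $\lambda(t) = t/2$.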

Does anyone know how to turn off the annoying preview thing for hyperlinks..? Never mind, got it.

Edit (29 Aug, 2009):

The same ideas can be used to give a proof of the Kuhn-Tucker conditions for a constrained maximization problem where we replace the system of equalities $g(x) = 0$ with the system of inequalities $g(x) \le 0$ (understood componentwise: $g_i(x) \le 0$ for $i = 1, \dots, m$). We simply replace the role of trajectories within the constraint manifold by trajectories going *into* the constraint set.

Without loss of generality, we may assume a maximizer $x_0$ belongs to the boundary where the constraints bind (i.e. $g(x_0) = 0$), for if one constraint does not bind (say $g_i(x_0) < 0$), we may restrict our domain to the open set $\{x : g_i(x) < 0\}$ and simply remove this component from the constraint function, thereby decreasing $m$ (the corresponding component of the Lagrange multiplier will therefore be zero). Thus, we can picture the point $x_0$ as a point of transverse intersection of $m$ hypersurfaces $\{g_i = 0\}$ which together form the boundary of the constraint set (at least near $x_0$): some kind of corner.

We now modify the proof of the Lagrange multiplier theorem by showing also that $Dg(x_0)v \le 0$ implies $Df(x_0)v \le 0$ at a maximizer $x_0$ where $g(x_0) = 0$; but this is easy to see. The inequality $Dg(x_0)v \le 0$ means (in this situation) that the velocity $v$ points into the constraint set, since no constraint function increases in this direction. Therefore we cannot have $f$ increasing in this direction at a constrained maximum, so we cannot have $Df(x_0)v > 0$.

Again, this argument relies on applying the implicit function theorem to $g$ at the point $x_0$ in order to ensure that every velocity $v$ for which $Dg(x_0)v \le 0$ can be achieved as the velocity at $x_0$ of a trajectory which travels into the constraint set, but like many other applications of the implicit function theorem, this fact is intuitively clear when one pictures the generic situation.

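The monotonicity property we have just established ($Dg(x_0)v \le 0 \Rightarrow Df(x_0)v \le 0$) is equivalent to the non-negativity of the functional $\lambda$ satisfying $Df(x_0) = \lambda \circ Dg(x_0)$, i.e. to $\lambda_i = \lambda(e_i) \ge 0$ for each $i$. Note here that the Lagrange multiplier theorem still applies (i.e. there is such a $\lambda$) since $x_0$ still maximizes the objective function over the smaller constraint set $\{g = 0\}$ after we have removed the constraints which do not bind.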
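The equivalence is short enough to write out. This sketch assumes $Dg(x_0)$ is surjective (the binding constraints are independent), writes $e_i$ for the $i$-th standard basis vector of $\mathbb{R}^m$, and sets $\lambda_i = \lambda(e_i)$, so that $\lambda(w) = \sum_i \lambda_i w_i$.

```latex
\[
\lambda_i \ge 0 \ \text{for all } i \ \text{ and } \ Dg(x_0)v \le 0
\;\Longrightarrow\;
Df(x_0)v = \lambda\big(Dg(x_0)v\big)
         = \sum_{i=1}^{m} \lambda_i \big(Dg(x_0)v\big)_i \le 0 .
\]
Conversely, by surjectivity choose $v$ with $Dg(x_0)v = -e_i$. Then
$Dg(x_0)v \le 0$, so monotonicity gives
\[
0 \ge Df(x_0)v = \lambda(-e_i) = -\lambda_i,
\qquad \text{i.e. } \lambda_i \ge 0 .
\]
```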

The above extension of the Lagrange multiplier theorem is due to Karush and the resulting conditions are sometimes referred to as “Kuhn-Tucker” conditions (or “Karush-Kuhn-Tucker conditions”). One runs into such inequality constraints at every corner in economics. For example, a consumer cannot buy more than she can afford (unless she lives in the US?), nor can a firm produce more than its available production technologies and resources allow…
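To make the consumer example concrete, here is a standard textbook case worked out in the notation above. The log utility and the symbols $p_x, p_y, w$ are my illustrative choices: maximize $f(x,y) = \ln x + \ln y$ subject to the budget constraint $g(x,y) = p_x x + p_y y - w \le 0$, with prices $p_x, p_y > 0$ and income $w > 0$.

```latex
\[
Df(x,y) = \begin{pmatrix} \tfrac{1}{x} & \tfrac{1}{y} \end{pmatrix}, \qquad
Dg(x,y) = \begin{pmatrix} p_x & p_y \end{pmatrix}.
\]
The budget binds (spending less leaves room to increase $f$), so
$Df(x_0) = \lambda\, Dg(x_0)$ gives $\tfrac{1}{x} = \lambda p_x$ and
$\tfrac{1}{y} = \lambda p_y$. Substituting into $p_x x + p_y y = w$:
\[
\frac{2}{\lambda} = w, \qquad
\lambda = \frac{2}{w} \ge 0, \qquad
x_0 = \left(\frac{w}{2p_x},\ \frac{w}{2p_y}\right).
\]
```

Here $\lambda = 2/w$ is also the marginal utility of income: the optimal value is $\ln\frac{w}{2p_x} + \ln\frac{w}{2p_y}$, whose derivative in $w$ is $2/w$.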

I should comment on how one can more directly see the conclusions of the Lagrange multiplier theorem and the Karush-Kuhn-Tucker conditions. First, the Lagrange multiplier theorem, where we return to the setting of a codimension $m$ constraint manifold $M = \{x : g(x) = 0\}$.

The Lagrange multiplier theorem asserts that $Df(x_0) = \lambda \circ Dg(x_0)$, i.e. $\nabla f(x_0) = \sum_{i=1}^m \lambda_i \nabla g_i(x_0)$; in other words, the gradient of $f$ (which points in the direction of increase of $f$) lies in the linear space spanned by the gradients of the constraint functions. This span is exactly the orthogonal complement of the tangent space at $x_0$ to the constraint manifold. If $\nabla f(x_0)$ did not lie entirely in this orthogonal complement, there would be a nontrivial projection of $\nabla f(x_0)$ in a direction tangent to the constraint manifold. Choosing a trajectory within the constraint manifold which achieves this projection as its velocity at $x_0$, the objective function would increase along this trajectory at $x_0$ (and decrease along the reversed trajectory), which cannot occur at a constrained maximizer/minimizer.
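The projection argument is easy to see numerically for a single constraint ($m = 1$), where the tangent space is the orthogonal complement of $\nabla g(x_0)$. This pure-Python sketch uses an assumed toy problem, $f(x,y) = x + y$ on $x^2 + y^2 = 2$, and a hypothetical helper `tangent_projection` that subtracts the normal component of $\nabla f$. The result is zero at the maximizer $(1,1)$ but not at the non-extremal point $(\sqrt{2}, 0)$, where it points along the circle in the direction in which $f$ increases.

```python
import math


def tangent_projection(grad_f, grad_g):
    """Project grad_f onto the tangent space of a single constraint g = 0 by
    removing its component along the normal grad_g:
        P(grad_f) = grad_f - (grad_f . grad_g / |grad_g|^2) grad_g."""
    dot = sum(a * b for a, b in zip(grad_f, grad_g))
    norm_sq = sum(b * b for b in grad_g)
    return [a - (dot / norm_sq) * b for a, b in zip(grad_f, grad_g)]


# Assumed example: f(x, y) = x + y on the circle x^2 + y^2 = 2.
def grad_f(p):
    return [1.0, 1.0]


def grad_g(p):
    return [2.0 * p[0], 2.0 * p[1]]


# At the maximizer the tangential component vanishes ...
at_max = tangent_projection(grad_f((1.0, 1.0)), grad_g((1.0, 1.0)))

# ... but at (sqrt(2), 0) it does not: it points along the circle, "upward".
p = (math.sqrt(2), 0.0)
at_other = tangent_projection(grad_f(p), grad_g(p))
```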

To see the necessity of the Karush-Kuhn-Tucker conditions directly (that $\lambda_i \ge 0$ for each $i$), it is easiest to visualize a corner formed where two lines in a plane intersect transversely ($n = m = 2$), or a corner where three planes in three-dimensional space intersect transversely ($n = m = 3$). If we picture the inequality constraint $g(x) \le 0$ as a shaded region in the ambient space with this corner on its boundary, then the vectors $\nabla g_i(x_0)$ are normal to the boundary hypersurfaces $\{g_i = 0\}$, pointing outside of the constraint region since they correspond to directions of increase of the constraint functions. In order for the corner to be a maximizer for an objective function $f$, the gradient $\nabla f(x_0)$, which points in the direction of increase of $f$, must point within the cone which lies “between” all the normal vectors $\nabla g_i(x_0)$. This fact is easiest to see in a picture… which I… really should… provide… This cone is the set of non-negative linear combinations of the normal vectors, so $\nabla f(x_0) = \sum_i \lambda_i \nabla g_i(x_0)$ with every $\lambda_i \ge 0$.
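For the planar corner ($n = m = 2$) this cone test takes only a few lines. In the sketch below (pure Python; the function names and the example constraints $g_1 = 2x + y - 3$, $g_2 = x + 2y - 3$, which bind at the corner $(1,1)$, are illustrative assumptions), `kkt_corner` solves $\nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2$ by Cramer's rule. If some $\lambda_i < 0$, it also returns a direction $v$ with $Dg(x_0)v = -e_i$. This $v$ points into the constraint set, and $Df(x_0)v = -\lambda_i > 0$, so $f$ increases and the corner is not a maximizer.

```python
def solve2(a, b, c, d, r1, r2):
    """Solve [[a, b], [c, d]] @ (x, y) = (r1, r2) by Cramer's rule."""
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("constraint gradients are not independent")
    return ((r1 * d - b * r2) / det, (a * r2 - r1 * c) / det)


def kkt_corner(grad_f, grad_g1, grad_g2):
    """At a planar corner where g1 <= 0 and g2 <= 0 both bind, write
    grad_f = lam1 * grad_g1 + lam2 * grad_g2.

    Returns (lam, v). v is None when lam >= 0 (grad_f lies in the cone of
    outward normals); otherwise v satisfies Dg v = -e_i for a negative
    lam_i, so v enters the constraint set while Df v = -lam_i > 0."""
    (g1x, g1y), (g2x, g2y), (fx, fy) = grad_g1, grad_g2, grad_f
    # Matrix columns are grad_g1 and grad_g2.
    lam = solve2(g1x, g2x, g1y, g2y, fx, fy)
    for i, lam_i in enumerate(lam):
        if lam_i < 0:
            # Solve Dg v = -e_i, where Dg has rows grad_g1 and grad_g2.
            rhs = (-1.0, 0.0) if i == 0 else (0.0, -1.0)
            v = solve2(g1x, g1y, g2x, g2y, *rhs)
            return lam, v
    return lam, None


lam_ok, v_ok = kkt_corner((1.0, 1.0), (2.0, 1.0), (1.0, 2.0))    # f = x + y
lam_bad, v_bad = kkt_corner((3.0, -1.0), (2.0, 1.0), (1.0, 2.0))  # f = 3x - y
```

With $f = x + y$ one gets $\lambda = (1/3, 1/3)$ and no improving direction. With $f = 3x - y$ one gets $\lambda = (7/3, -5/3)$ and $v = (1/3, -2/3)$, which runs along the edge $g_1 = 0$ into the constraint set while $f$ increases.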
