Contravariance and Covariance - Part 1
Founders of Tensor Analysis: Gregorio Ricci-Curbastro and Tullio Levi-Civita
In the design of computer languages, the notions of contravariance and covariance
have been borrowed from category theory to facilitate the discussion of type
coercion and signatures. Category theory, in turn, borrowed and abstracted
these notions from classical tensor analysis, which
itself had its origins in physics.
Unfortunately, the people familiar with category theory, tensor analysis,
and the ideas of contravariance and covariance
are more often mathematicians than computer scientists.
I therefore hope to give a gentle introduction to some of these ideas.
Figure 1 The flow of ideas leading to the development and use of category theory in computer science.
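Before leaving the computer-science usage behind, it may help to see the variance rule concretely: subtyping of function types is contravariant in parameter types and covariant in return types. Here is a minimal sketch in Python (the `Animal`/`Cat` classes and function names are hypothetical, invented purely for illustration):

```python
from typing import Callable

class Animal:
    pass

class Cat(Animal):
    pass

def handle_any_animal(a: Animal) -> Cat:
    """Accepts the wider type (Animal), returns the narrower one (Cat)."""
    return Cat()

def groom(handler: Callable[[Cat], Animal]) -> Animal:
    """Expects a handler for Cats that yields some Animal."""
    return handler(Cat())

# handle_any_animal is a valid Callable[[Cat], Animal] because
# parameter types vary contravariantly (Animal is wider than Cat)
# while return types vary covariantly (Cat is narrower than Animal).
result = groom(handle_any_animal)
print(type(result).__name__)  # Cat
```

A static type checker accepts this substitution for exactly these variance reasons: a function that demands less of its input and promises more of its output can stand in for the expected one.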
We begin with physics because the idea of contravariance and covariance
has its origins in the study of vectors. The concept of a vector has undergone a
great amount of refinement and generalization in its history. Depending on the mathematical
spaces one is considering, such generalization is necessary. But additionally, it
is also useful to have a variety of ways of thinking about the vector concept. To
quote Feynman:
"Theories of the known, which are described by different physical ideas, may be equivalent in all their predictions and hence scientifically indistinguishable. However, they are not psychologically identical when trying to move from that base into the unknown. For different views suggest different kinds of modifications which might be made and hence are not equivalent in the hypotheses one generates from them in one's attempt to understand what is not yet understood." -- Feynman [1]
(That is also a good quote to keep in the back of one's mind when we come to category theory.) Therefore, we will enumerate some of the common
vector definitions. However, no matter how much the concept is generalized, in engineering
and physics, the geometric notion of an arrow that obeys a superposition principle
is fundamental. And in mathematics, the algebraic notion contained in a linear vector
space, is fundamental. In both the arrow and vector space notions, there is a natural
idea of contravariance and covariance. In computer science, the concept of a vector
(i.e., an ADT) refers to a completely unrelated idea and in no way should be confused
with the concepts of vector we are considering here.
Vector Concept 1 - Directed Line Segment: The vector concept one first studies in school, usually in physics, is that of an arrow,
or a directed line segment.
We are told in elementary physics that forces and displacements can be modeled by directed line segments. We add an arrow to the tip of a line segment to indicate its direction. The length of the arrow corresponds to its magnitude.
Experimentally,
one usually finds that forces and displacements can be combined graphically by the
parallelogram method of addition. The discovery of this method seems to have been
first made by the Dutch mathematician Simon Stevinus (1548-1620).
Figure 2 Addition of vectors by the "parallelogram" method
Vectors (i.e., directed line segments) can also be added graphically by the "head-to-tail" method.
We place the tail of vector B onto the head of vector A. The resultant (i.e., the sum of A and B) is the
arrow drawn from the tail of A to the head of B.
Figure 2b Addition of vectors by the "head-to-tail" method
It must be emphasized that the addition of vectors by either method does NOT require a coordinate system.
What is required is the notion of "parallelism", which for E2 requires a notion of an angle.
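Although both constructions are coordinate-free, it may help to check that once coordinates are introduced, they reduce to the same componentwise sum. A small sketch (the particular components are arbitrary choices):

```python
# Two directed line segments, with tails at the origin of some coordinates.
A = (3.0, 1.0)
B = (1.0, 2.0)

# Head-to-tail: translate B so its tail sits on the head of A; the resultant
# runs from the tail of A (here the origin) to the head of the translated B.
translated_B_head = (A[0] + B[0], A[1] + B[1])
resultant_head_to_tail = translated_B_head

# Parallelogram: keep both tails at the origin; the resultant is the diagonal
# of the parallelogram spanned by A and B -- the same componentwise sum.
resultant_parallelogram = (A[0] + B[0], A[1] + B[1])

print(resultant_head_to_tail)  # (4.0, 3.0)
assert resultant_head_to_tail == resultant_parallelogram
```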
(Mathematical sidenote:
The plane taken
without a coordinate system will be designated by E2 (the Euclidean plane). This is the usual plane one encounters in elementary geometry in high school.
The Greeks, and in particular Euclid, successfully studied geometry long before Descartes introduced the idea of Cartesian coordinates in 1637.
The Euclidean plane, together with an arbitrary Cartesian coordinate system, we will designate by R2.)
By not introducing coordinates we can be sure our definitions are coordinate independent. Many authors start with R2, do not distinguish between E2 and R2,
and so one is never sure if the definition is dependent on the choice of coordinates.
Congruent or Equal?
In high school geometry, we ask when two shapes are congruent. But if we intend to use vectors to write equations in physics (i.e., vector equations),
then congruence will not be sufficient. Newton's 2nd Law f = ma, for example, is a vector equation that specifies an equality of vectors.
When, then, are two vectors to be considered equal? Clearly, we require equal vectors to have the same magnitude and same direction. What is less clear is whether vectors at different points of
E2 can be considered equal.
It turns out that we have actually "begged the question" when we postulated that there are
two different ways of adding vectors, and that they are equivalent.
By virtue of the equation
Parallelogram Method = Head-to-Tail Method,
if C denotes the translated copy of B used in the head-to-tail construction, then A + B = A + C, and hence B = C. We must therefore believe that translated vectors, i.e., vectors in a different position, may be considered equal when their magnitudes and directions are the same. (However, when we consider more general definitions of vectors on manifolds other than Rn, this will not necessarily be true.)
Abstracting from our physical examples of force and displacement, to an elementary idea of a vector we have:
Definition: A vector is an object that can be represented by a directed line segment and obeys the parallelogram method of addition. The magnitude of the vector is given by the length of the arrow.
The above vector definition is of great utility in physics and engineering,
but it is not capable of being used in more abstract spaces than Rn without
modification. However, those abstract spaces are typically manifolds, and
a manifold "locally looks like" Rn (where n is the dimension of
the manifold). So, even if we want to jump ahead to a more advanced definition, it
is quite useful to understand how vectors are defined in Rn.
Rectangular and Oblique Coordinate Systems in R2
Without loss of generality, we restrict our discussion now to mostly R2. Extending to Rn is not difficult, but complicates the example. Shown below is a rectangular coordinate system and the resolution of a vector A into its rectangular coordinates.
Figure 3 Rectangular Components of Vector A
In rectangular coordinates, the contravariant and covariant
components of a vector are usually the same, as will become clear later.
We now extend the class of coordinate systems to consider those whose axes are not perpendicular, in other words, oblique coordinates. Let OX be a line through the points O and X, and OY a line through the points O and Y, chosen so that the lines are distinct and meet at an oblique angle. Then, with O chosen as the origin, this constitutes an oblique coordinate system for R2 (Figure 4).
Figure 4 An Oblique Coordinate System
The parallel projection onto the OX axis gives the x coordinate, or abscissa, Ax,
of the vector A. The parallel projection onto OY gives the y coordinate, or ordinate,
Ay, of the vector A (Figure 5).
Figure 5 Oblique Coordinates of Vector A
The components Ax and Ay are called the
contravariant components of A. It is important to note
that x and y are not exponents, but their raised position relative to A is the standard
notation for contravariant components that is used to distinguish them from the
covariant components, which we will now introduce.
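Numerically, parallel projection onto oblique axes amounts to solving A = Ax e1 + Ay e2 for the coefficients, where e1 and e2 are unit vectors along OX and OY. A sketch (the 60-degree angle between the axes and the components of A are arbitrary choices for illustration):

```python
import math

# Unit vectors along the oblique axes OX and OY, meeting at 60 degrees.
e1 = (1.0, 0.0)
theta = math.radians(60)
e2 = (math.cos(theta), math.sin(theta))

A = (2.0, 3.0)

# Solve A = Ax*e1 + Ay*e2 by Cramer's rule; the coefficients (Ax, Ay)
# are the contravariant components of A in this oblique system.
det = e1[0] * e2[1] - e2[0] * e1[1]
Ax = (A[0] * e2[1] - e2[0] * A[1]) / det
Ay = (e1[0] * A[1] - A[0] * e1[1]) / det

# The components recombine to give back A.
assert abs(Ax * e1[0] + Ay * e2[0] - A[0]) < 1e-12
assert abs(Ax * e1[1] + Ay * e2[1] - A[1]) < 1e-12
```

Note that for a rectangular system (theta = 90 degrees) this reduces to Ax = A[0] and Ay = A[1], the familiar rectangular components.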
Reciprocal Lattice
To define the covariant components of A, we have to introduce the concept of the reciprocal lattice. Rotating the x axis by 90 degrees counterclockwise gives the y'-axis of the reciprocal lattice, and rotating the y axis clockwise by 90 degrees gives the x'-axis of the reciprocal lattice. The original coordinate system is called the direct coordinate system, or direct lattice. The reciprocal lattice is shown, in green, in Figure 6.
Figure 6 The Reciprocal Lattice in R2
This procedure works in R2 only. In general, in Rn the reciprocal lattice is defined by introducing non-zero
"basis" vectors, not necessarily of unit length, along each axis x1,...,xn.
(Because the number of axes can in general exceed the number of distinct letters
in the alphabet, we now switch to labeling the axes not as x, y, z but as x1,...,xn.)
Let e1 be a non-zero vector along the x1 axis, e2 a non-zero vector along the x2 axis, ..., and en a non-zero vector along the xn axis; then the set {e1,...,en}
is a "basis" for the direct coordinate system. If we are given the basis, we can
form the coordinate axes, and vice versa. The reciprocal lattice "basis" {e^1,...,e^n} (with raised indices, in keeping with the convention for contravariant components) is then defined by the requirement
|ei| |e^k| cos[angle(i,k)] = 1 if i = k, and 0 otherwise,
where angle(i,k) is the angle between ei and e^k.
Figure 7 Covariant Components of Vector A
Having constructed the reciprocal lattice, the parallel projections of A onto the x'-axis and the y'-axis give the covariant components of A, as shown in Figure 7.
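The two-dimensional construction can be checked numerically: rotate each direct basis vector to get a vector perpendicular to the other, rescale to satisfy the duality requirement, and confirm that the dot products of A with the direct basis vectors (a standard identity for the covariant components) are exactly the coefficients of A in the reciprocal basis. A sketch (the 60-degree basis and the vector A are arbitrary choices):

```python
import math

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def perp(v):
    # Rotate v by 90 degrees counterclockwise.
    return (-v[1], v[0])

# Oblique direct basis: unit vectors at 60 degrees.
e1 = (1.0, 0.0)
e2 = (math.cos(math.radians(60)), math.sin(math.radians(60)))

# Reciprocal basis: r1 is perpendicular to e2, scaled so r1 . e1 = 1;
# r2 is perpendicular to e1, scaled so r2 . e2 = 1.
s1 = dot(perp(e2), e1)
r1 = (perp(e2)[0] / s1, perp(e2)[1] / s1)
s2 = dot(perp(e1), e2)
r2 = (perp(e1)[0] / s2, perp(e1)[1] / s2)

# Duality requirement: e_i . r_k = 1 if i == k, and 0 otherwise.
assert abs(dot(e1, r1) - 1) < 1e-12 and abs(dot(e2, r1)) < 1e-12
assert abs(dot(e2, r2) - 1) < 1e-12 and abs(dot(e1, r2)) < 1e-12

A = (2.0, 3.0)

# Covariant components of A: its dot products with the direct basis...
A_1, A_2 = dot(A, e1), dot(A, e2)

# ...which are precisely the coefficients of A in the reciprocal basis.
reconstructed = (A_1 * r1[0] + A_2 * r2[0], A_1 * r1[1] + A_2 * r2[1])
assert all(abs(a - b) < 1e-12 for a, b in zip(A, reconstructed))
```

Repeating the check with a rectangular basis (e2 = (0, 1)) makes the reciprocal basis coincide with the direct one, which is why the earlier remark holds that the two kinds of components agree in rectangular coordinates.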
For completeness, it should be noted it is also possible to give an
interpretation of the covariant components as orthogonal projections on the direct
coordinate system, rather than parallel projections on the reciprocal lattice. In
fact, this interpretation was given by the founders Ricci and Levi-Civita. However,
this interpretation requires multiplication of the components by a factor,
which destroys the symmetry of the formulation.
The reciprocal lattice seems to have its origins in crystallography,
but it also has great application in solid state
physics.
Next Week -- Part 2 -- Contravariance and covariance in abstract vector spaces.
Vectors as elements of vector spaces, change of basis, multilinear algebra
References:
[1] Richard Feynman, The Development of the Space-Time View of Quantum Electrodynamics. Nobel Lecture, 1965.