Contravariance and Covariance - Part 1
Founders of Tensor Analysis: Gregorio Ricci-Curbastro and Tullio Levi-Civita
In the design of computer languages, the notions of contravariance and covariance
have been borrowed from category theory to facilitate the discussion of type
coercion and signatures. Category theory, in turn, borrowed and abstracted
these notions from classical tensor analysis, which
itself had its origins in physics.
Unfortunately, the people familiar with category theory, tensor analysis,
and the ideas of contravariance and covariance
are more often mathematicians than computer scientists.
I therefore hope to give a gentle introduction to some of these ideas.
Figure 1 The flow of ideas leading to the development and use of category theory in computer science.
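Before leaving the computer-science usage behind, it may help to see the variance rule concretely: subtyping of function types is contravariant in parameter types and covariant in return types. Here is a minimal sketch in Python (the `Animal`/`Cat` classes and function names are hypothetical, invented purely for illustration):

```python
from typing import Callable

class Animal:
    pass

class Cat(Animal):
    pass

def handle_any_animal(a: Animal) -> Cat:
    """Accepts the wider type (Animal), returns the narrower one (Cat)."""
    return Cat()

def groom(handler: Callable[[Cat], Animal]) -> Animal:
    """Expects a handler for Cats that yields some Animal."""
    return handler(Cat())

# handle_any_animal is a valid Callable[[Cat], Animal] because
# parameter types vary contravariantly (Animal is wider than Cat)
# while return types vary covariantly (Cat is narrower than Animal).
result = groom(handle_any_animal)
print(type(result).__name__)  # Cat
```

A static type checker accepts this substitution for exactly these variance reasons: a function that demands less of its input and promises more of its output can stand in for the expected one.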
We begin with physics because the idea of contravariance and covariance
has its origins in the study of vectors. The concept of a vector has undergone a
great amount of refinement and generalization in its history. Depending on the mathematical
spaces one is considering, such generalization is necessary. But additionally, it
is also useful to have a variety of ways of thinking about the vector concept. To
quote Feynman:
"Theories of the known, which are described by different physical ideas, may be equivalent in all their predictions and hence scientifically indistinguishable. However, they are not psychologically identical when trying to move from that base into the unknown. For different views suggest different kinds of modifications which might be made and hence are not equivalent in the hypotheses one generates from them in one's attempt to understand what is not yet understood." -- Feynman [1]
(That is also a good quote to keep in the back of one's mind when we come to category theory.) Therefore, we will enumerate some of the common
vector definitions. However, no matter how much the concept is generalized, in engineering
and physics, the geometric notion of an arrow that obeys a superposition principle
is fundamental. And in mathematics, the algebraic notion contained in a linear vector
space, is fundamental. In both the arrow and vector space notions, there is a natural
idea of contravariance and covariance. In computer science, the concept of a vector
(i.e., an ADT) refers to a completely unrelated idea and in no way should be confused
with the concepts of vector we are considering here.
Vector Concept 1 - Directed Line Segment: The vector concept one first studies in school, usually in physics, is that of an arrow,
or a directed line segment.
We are told in elementary physics that forces and displacements can be modeled by directed line segments. We add an arrow to the tip of a line segment to indicate its direction. The length of the arrow corresponds to its magnitude.
Experimentally,
one usually finds that forces and displacements can be combined graphically by the
parallelogram method of addition. The discovery of this method seems to have been
first made by the Dutch mathematician Simon Stevinus (1548-1620).
Figure 2 Addition of vectors by the "parallelogram" method
Vectors (i.e., directed line segments) can also be added graphically by the "head-to-tail" method.
We place the tail of vector B onto the head of vector A. The resultant (i.e., the sum of A and B) is the
arrow drawn from the tail of A to the head of B.
Figure 2b Addition of vectors by the "head-to-tail" method
It must be emphasized that the addition of vectors by either method does NOT require a coordinate system.
What is required is the notion of "parallelism", which for E2 requires a notion of an angle.
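Although both constructions are coordinate-free, it may help to check that once coordinates are introduced, they reduce to the same componentwise sum. A small sketch (the particular components are arbitrary choices):

```python
# Two directed line segments, with tails at the origin of some coordinates.
A = (3.0, 1.0)
B = (1.0, 2.0)

# Head-to-tail: translate B so its tail sits on the head of A; the resultant
# runs from the tail of A (here the origin) to the head of the translated B.
translated_B_head = (A[0] + B[0], A[1] + B[1])
resultant_head_to_tail = translated_B_head

# Parallelogram: keep both tails at the origin; the resultant is the diagonal
# of the parallelogram spanned by A and B -- the same componentwise sum.
resultant_parallelogram = (A[0] + B[0], A[1] + B[1])

print(resultant_head_to_tail)  # (4.0, 3.0)
assert resultant_head_to_tail == resultant_parallelogram
```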
(Mathematical sidenote:
The plane taken
without a coordinate system will be designated by E2 (the Euclidean plane). This is the usual plane one encounters in elementary geometry in high school.
The Greeks, and in particular Euclid, successfully studied geometry long before Descartes introduced the idea of Cartesian coordinates in 1637.
The Euclidean plane, together with an arbitrary Cartesian coordinate system, we will designate by R2.)
By not introducing coordinates we can be sure our definitions are coordinate independent. Many authors start with R2, do not distinguish between E2 and R2,
and so one is never sure if the definition is dependent on the choice of coordinates.
Congruent or Equal?
In high school geometry, we ask when two shapes are congruent. But if we intend to use vectors to write equations in physics (i.e., vector equations),
then congruence will not be sufficient. Newton's 2nd Law f = ma, for example, is a vector equation that specifies an equality of vectors.
When, then, are two vectors to be considered equal? Clearly, we require equal vectors to have the same magnitude and same direction. What is less clear is whether vectors at different points of
E2 can be considered equal.
It turns out that we have actually "begged the question" when we postulated that there are
two different ways of adding vectors, and that they are equivalent.
By virtue of the equation
Parallelogram Method = Head-to-Tail Method,
if C denotes the translated copy of B used in the head-to-tail construction, then A + B = A + C, and hence B = C. We must therefore believe that translated vectors, i.e., vectors in a different position, may be considered equal when their magnitudes and directions are the same. (However, when we consider more general definitions of vectors on manifolds other than Rn, this will not necessarily be true.)
Abstracting from our physical examples of force and displacement, to an elementary idea of a vector we have:
Definition: A vector is an object that can be represented by a directed line segment and obeys the parallelogram method of addition. The magnitude of the vector is given by the length of the arrow.
The above vector definition is of great utility in physics and engineering,
but it is not capable of being used in more abstract spaces than Rn without
modification. However, those abstract spaces are typically manifolds, and
a manifold "locally looks like" Rn (where n is the dimension of
the manifold). So, even if we want to jump ahead to a more advanced definition, it
is quite useful to understand how vectors are defined in Rn.
Rectangular and Oblique Coordinate Systems in R2
Without loss of generality, we restrict our discussion now to mostly R2. Extending to Rn is not difficult, but complicates the example. Shown below is a rectangular coordinate system and the resolution of a vector A into its rectangular coordinates.
Figure 3 Rectangular Components of Vector A
In rectangular coordinates, the contravariant and covariant
components of a vector are usually the same, as will become clear later.
We now extend the class of coordinate systems to consider those whose axes are not perpendicular, in other words, oblique coordinates. Let OX be a line through the points O and X, and OY a line through the points O and Y, chosen so that the lines are distinct and meet at an oblique angle. Then, with O chosen as the origin, this constitutes an oblique coordinate system for R2 (Figure 4).
Figure 4 An Oblique Coordinate System
The parallel projection onto the OX axis gives the x coordinate, or abscissa, Ax,
of the vector A. The parallel projection onto OY gives the y coordinate, or ordinate,
Ay, of the vector A (Figure 5).
Figure 5 Oblique Coordinates of Vector A
The components Ax and Ay are called the
contravariant components of A. It is important to note
that x and y are not exponents, but their raised position relative to A is the standard
notation for contravariant components that is used to distinguish them from the
covariant components, which we will now introduce.
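Numerically, parallel projection onto oblique axes amounts to solving A = Ax e1 + Ay e2 for the coefficients, where e1 and e2 are unit vectors along OX and OY. A sketch (the 60-degree angle between the axes and the components of A are arbitrary choices for illustration):

```python
import math

# Unit vectors along the oblique axes OX and OY, meeting at 60 degrees.
e1 = (1.0, 0.0)
theta = math.radians(60)
e2 = (math.cos(theta), math.sin(theta))

A = (2.0, 3.0)

# Solve A = Ax*e1 + Ay*e2 by Cramer's rule; the coefficients (Ax, Ay)
# are the contravariant components of A in this oblique system.
det = e1[0] * e2[1] - e2[0] * e1[1]
Ax = (A[0] * e2[1] - e2[0] * A[1]) / det
Ay = (e1[0] * A[1] - A[0] * e1[1]) / det

# The components recombine to give back A.
assert abs(Ax * e1[0] + Ay * e2[0] - A[0]) < 1e-12
assert abs(Ax * e1[1] + Ay * e2[1] - A[1]) < 1e-12
```

Note that for a rectangular system (theta = 90 degrees) this reduces to Ax = A[0] and Ay = A[1], the familiar rectangular components.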
Reciprocal Lattice
To define the covariant components of A, we have to introduce the concept of the reciprocal lattice. Rotating the x axis by 90 degrees counterclockwise gives the y'-axis of the reciprocal lattice, and rotating the y axis clockwise by 90 degrees gives the x'-axis of the reciprocal lattice. The original coordinate system is called the direct coordinate system, or direct lattice. The reciprocal lattice is shown, in green, in Figure 6.
Figure 6 The Reciprocal Lattice in R2
This procedure works in R2 only. In general, in Rn the reciprocal lattice is defined by introducing non-zero
"basis" vectors, not necessarily of unit length, along each axis x1,...,xn.
(Because the number of axes can in general exceed the number of distinct letters
in the alphabet, we now switch to labeling the axes not as x, y, z but as x1,...,xn.)
Let e1 be a non-zero vector along the x1 axis, e2 a non-zero vector along the x2 axis, ..., and en a non-zero vector along the xn axis; then the set {e1,...,en}
is a "basis" for the direct coordinate system. If we are given the basis, we can
form the coordinate axes, and vice versa. The reciprocal lattice "basis" {e^1,...,e^n} (with raised indices, in keeping with the convention for contravariant components) is then defined by the requirement
|ei| |e^k| cos[angle(i,k)] = 1 if i = k, and 0 otherwise,
where angle(i,k) is the angle between ei and e^k.
Figure 7 Covariant Components of Vector A
Having constructed the reciprocal lattice, the parallel projections of A onto the x'-axis and the y'-axis give the covariant components of A, as shown in Figure 7.
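The two-dimensional construction can be checked numerically: rotate each direct basis vector to get a vector perpendicular to the other, rescale to satisfy the duality requirement, and confirm that the dot products of A with the direct basis vectors (a standard identity for the covariant components) are exactly the coefficients of A in the reciprocal basis. A sketch (the 60-degree basis and the vector A are arbitrary choices):

```python
import math

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def perp(v):
    # Rotate v by 90 degrees counterclockwise.
    return (-v[1], v[0])

# Oblique direct basis: unit vectors at 60 degrees.
e1 = (1.0, 0.0)
e2 = (math.cos(math.radians(60)), math.sin(math.radians(60)))

# Reciprocal basis: r1 is perpendicular to e2, scaled so r1 . e1 = 1;
# r2 is perpendicular to e1, scaled so r2 . e2 = 1.
s1 = dot(perp(e2), e1)
r1 = (perp(e2)[0] / s1, perp(e2)[1] / s1)
s2 = dot(perp(e1), e2)
r2 = (perp(e1)[0] / s2, perp(e1)[1] / s2)

# Duality requirement: e_i . r_k = 1 if i == k, and 0 otherwise.
assert abs(dot(e1, r1) - 1) < 1e-12 and abs(dot(e2, r1)) < 1e-12
assert abs(dot(e2, r2) - 1) < 1e-12 and abs(dot(e1, r2)) < 1e-12

A = (2.0, 3.0)

# Covariant components of A: its dot products with the direct basis...
A_1, A_2 = dot(A, e1), dot(A, e2)

# ...which are precisely the coefficients of A in the reciprocal basis.
reconstructed = (A_1 * r1[0] + A_2 * r2[0], A_1 * r1[1] + A_2 * r2[1])
assert all(abs(a - b) < 1e-12 for a, b in zip(A, reconstructed))
```

Repeating the check with a rectangular basis (e2 = (0, 1)) makes the reciprocal basis coincide with the direct one, which is why the earlier remark holds that the two kinds of components agree in rectangular coordinates.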
For completeness, it should be noted it is also possible to give an
interpretation of the covariant components as orthogonal projections on the direct
coordinate system, rather than parallel projections on the reciprocal lattice. In
fact, this interpretation was given by the founders Ricci and Levi-Civita. However,
this interpretation requires multiplication of the components by a factor,
which destroys the symmetry of the formulation.
The reciprocal lattice seems to have its origins in crystallography,
but it also has great application in solid state
physics.
Next Week -- Part 2 -- Contravariance and covariance in abstract vector spaces.
Vectors as elements of vector spaces, change of basis, multilinear algebra
References:
[1] Richard Feynman, The Development of the Space-Time View of Quantum Electrodynamics. Nobel Lecture, 1965.