Special Relativity/Print version

Note: current version of this book can be found at http://en.wikibooks.org/wiki/Special_relativity

Table of contents

The principle of relativity

Introduction
The Principle of Relativity
Frames of reference, events and transformations
Special relativity
The postulates of special relativity

Spacetime

The spacetime interpretation of special relativity
Spacetime
The lightcone
The Lorentz transformation equations
The relativity of simultaneity and the Andromeda paradox
The twin paradox
Addition of velocities

Relativistic dynamics

Introduction
Momentum
Force
Mass and energy

Light propagation and the aether

Introduction
The aether drag hypothesis
The Michelson-Morley experiment

Mathematical approach to special relativity

Introduction
Mathematical techniques
Vectors
Matrices
Linear Transformations
Indicial Notation
Analysis of curved surfaces and transformations
Four vectors
The Lorentz transformation
The linearity and homogeneity of spacetime
The Lorentz transformation
Length contraction, time dilation and phase
Hyperbolic geometry
Addition of velocities
Acceleration transformation
Relativistic dynamics
Introduction
Momentum
Relativistic Mass

Appendices

Mathematics of the Lorentz Transformation Equations



Introduction

The Special Theory of Relativity is a description of classical physics that was developed at the end of the nineteenth century and the beginning of the twentieth century. It changed our understanding of older physical theories such as Newtonian Physics and led to early Quantum Theory and General Relativity.

Special relativity applies at all velocities and Newtonian physics is nowadays viewed by physicists as a set of formulae that simplify the main results of relativity and quantum theory for school pupils. Special Relativity provides a deeper understanding of physics as well as explaining everyday effects such as magnetism and the relativistic inertia that underlies kinetic energy and hence the whole of dynamics.

Special Relativity is one of the foundation blocks of physics. It is in no sense a provisional theory and is largely compatible with quantum theory; it not only led to the idea of matter waves but is the origin of quantum 'spin' and underlies the existence of the antiparticles. Special Relativity is a theory of exceptional elegance. Einstein crafted the theory from simple postulates about the constancy of physical laws and of the speed of light, and his work has been refined further so that the laws of physics themselves, and even the constancy of the speed of light, are now understood in terms of the most basic symmetries in space and time.

Further Reading

Feynman Lectures on Physics. Symmetry in Physical Laws. (World Student) Vol 1. Ch 52.

Gross, D.J. The role of symmetry in fundamental physics. PNAS December 10, 1996 vol. 93 no. 25 14256-14259 http://www.pnas.org/content/93/25/14256.full

Historical Development

Special Relativity is not a theory about light; it is a theory about space and time. It was, however, the strange behaviour of light that first alerted scientists to the possibility that the universe had an unexpected geometry. The short history of Special Relativity given here will start with light but will end with the discovery that the behaviour of light is related to the geometry of the universe.

In the nineteenth century it was widely accepted that light travelled as waves in a substance called the “aether”. Light was thought to travel in this aether in much the same way as waves in general travel in material substances: light would travel to our eyes as waves through the aether just as sound waves travel to our ears as waves in the air.

The nature of the aether was unknown but a possible link between the aether and electrical and magnetic fields became apparent during the first half of the nineteenth century. Faraday demonstrated that the polarisation of light was affected by magnetic fields and Weber showed that electrical effects could be transmitted across non-conducting materials so there was a strong suggestion that light might be some sort of electromagnetic effect.

James Clerk Maxwell
In 1865 the Scottish physicist, James Clerk Maxwell, drew together the various experiments on electricity and magnetism into an electromagnetic theory of light based on the idea of an aether. One of his key observations was that electrical effects seemed to propagate at nearly light speed. He wrote of the velocity of electrical interactions that:

“This velocity is so nearly that of light that it seems we have strong reason to conclude that light itself (including radiant heat and other radiations, if any) is an electromagnetic disturbance in the form of waves propagated through the electromagnetic field according to electromagnetic laws.”

Maxwell's theory explained radio, heat radiation, light and many other phenomena as electromagnetic waves travelling in an aether. The velocity of these waves depended upon the properties of the aether itself. Someone who was stationary within the aether would measure the speed of light to be constant as a result of the constant properties of the aether. A light ray going from one 'stationary' observer to another in the aether would take the same amount of time to make the journey no matter who observed it, in the same way as the speed of sound is the same in all directions in a small, stationary volume of air. However, although stationary observers would all observe the same velocity for light, moving observers would measure the velocity of light as the sum of their velocity relative to the aether and the velocity of light in the aether.

If space were indeed full of an aether then the motion of objects through this aether should be detectable by measuring the velocity of light rays. In practice it is difficult to measure the velocity of light with sufficient precision. Maxwell proposed that an instrument called an "interferometer" would provide the required accuracy. An interferometer splits a ray of light into two identical beams arranged at right angles to each other. It then brings these beams together so that the light waves reinforce each other if they arrive at the same time and are reduced in amplitude if one beam is slightly delayed, a phenomenon known as "interference". If one beam is reflected back and forth at right angles to the direction of travel of the interferometer and the other reflected along the direction of travel then the velocity of the interferometer should affect the velocity of each beam differently and create a delay and hence observable interference. Maxwell proposed that if an interferometer were moved through the aether the addition of the velocity of the equipment to the velocity of the light in the aether would cause a distinctive interference pattern. Maxwell's idea was submitted as a letter to Nature in 1879 (posthumously).

Albert Michelson read Maxwell's paper and in 1887 Michelson and Morley performed an 'interferometer' experiment to test whether the observed velocity of light is indeed the sum of the speed of light in the aether and the velocity of the observer. Michelson and Morley discovered that the measured velocity of light did not change with the velocity of the observer. To everyone's surprise the experiment showed that the speed of light was independent of the speed of the destination or source of the light in the proposed aether.
Albert Abraham Michelson
How might this "null result" of the interferometer experiment be explained? How could the speed of light in a vacuum be constant for all observers no matter how they are moving themselves? It was possible that Maxwell's theory was correct but the theory about the way that velocities add together (known as Galilean Relativity) was wrong. Alternatively it was possible that Maxwell's theory was wrong and Galilean Relativity was correct. However, the most popular interpretation at the time was that both Maxwell and Galileo were correct and something was happening to the measuring equipment. Perhaps the instrument was being squeezed in some way by the aether or some other material effect was occurring.

Various physicists attempted to explain the Michelson and Morley experiment. George FitzGerald (1889) and Hendrik Lorentz (1895) suggested that objects tend to contract along the direction of motion relative to the aether, and Joseph Larmor (1897) and Hendrik Lorentz (1899) proposed that moving objects are contracted and that moving clocks run slow as a result of motion in the aether. The contributions of FitzGerald, Larmor and Lorentz to the analysis of light propagation are of huge importance because they produced the "Lorentz Transformation Equations". The Lorentz Transformation Equations were developed to describe how physical effects would need to change the length of the interferometer arms and the rate of clocks to account for the lack of change in interference fringes in the interferometer experiment. It took the rebellious streak in Einstein to realise that the equations could also be applied to changes in space and time itself.

Albert Einstein
By the late nineteenth century it was becoming clear that aether theories of light propagation were problematic. Any aether would have properties such as being massless, incompressible, entirely transparent, continuous, devoid of viscosity and nearly infinitely rigid. In 1905 Albert Einstein realised that Maxwell's equations did not require an aether. On the basis of Maxwell's equations he showed that the Lorentz Transformation was sufficient to explain that length contraction occurs and clocks appear to go slow provided that the old Galilean concept of how velocities add together was abandoned. Einstein's remarkable achievement was to be the first physicist to propose that Galilean relativity might only be an approximation to reality. He came to this conclusion by being guided by the Lorentz Transformation Equations themselves and noticing that these equations only contain relationships between space and time without any references to the properties of an aether.

In 1905 Einstein was on the edge of the idea that made relativity special. It remained for the mathematician Hermann Minkowski to provide the full explanation of why an aether was entirely superfluous. He announced the modern form of Special Relativity theory in an address delivered at the 80th Assembly of German Natural Scientists and Physicians on September 21, 1908. The consequences of the new theory were radical, as Minkowski put it:

"The views of space and time which I wish to lay before you have sprung from the soil of experimental physics, and therein lies their strength. They are radical. Henceforth space by itself, and time by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality."

What Minkowski had spotted was that Einstein's theory was actually related to the theories in differential geometry that had been developed by mathematicians during the nineteenth century. Initially Minkowski's discovery was unpopular with many physicists including Poincaré, Lorentz and even Einstein. Physicists had become used to a thoroughly materialist approach to nature in which lumps of matter were thought to bounce off each other and the only events of any importance were those occurring at some universal, instantaneous, present moment. The possibility that the geometry of the world might include time as well as space was an alien idea. The possibility that phenomena such as length contraction could be due to the physical effects of spacetime geometry rather than the increase or decrease of forces between objects was as unexpected for physicists in 1908 as it is for the modern high school student. Einstein rapidly assimilated these new ideas and went on to develop General Relativity as a theory based on differential geometry but many of the earlier generation of physicists were unable to accept the new way of looking at the world.

The adoption of differential geometry as one of the foundations of relativity theory has been traced by Walter (1999). Walter's study shows that by the 1920s modern differential geometry had become the principal theoretical approach to relativity, replacing Einstein's original electrodynamic approach.

Henri Poincaré
It has become popular to credit Henri Poincaré with the discovery of the theory of Special Relativity, but Poincaré got many of the right answers for some of the wrong reasons. He even came up with a version of E = mc^2. In 1904 Poincaré had gone as far as to enunciate the "principle of relativity" in which "The laws of physical phenomena must be the same, whether for a fixed observer, as also for one dragged in a motion of uniform translation, so that we do not and cannot have any means to discern whether or not we are dragged in a such motion." Furthermore, in 1905 Poincaré coined the term "Lorentz Transformation" for the equation that explained the null result of the Michelson-Morley experiment. Although Poincaré derived equations to explain the null result of the Michelson-Morley experiment, his assumptions were still based upon an aether. It remained for Einstein to show that an aether was unnecessary.

It is also popular to claim that Special Relativity and aether theories such as those due to Poincaré and Lorentz are equivalent and only separated by Occam's Razor. This is not strictly true. Occam's Razor is used to separate a complex theory from a simple theory, the two theories being different. Poincaré's and Lorentz's aether theories both contain the Lorentz Transformation, which is already sufficient to explain the Michelson-Morley experiment, length contraction, time dilation etc. without an aether. The aether theorists simply failed to notice that this is a possibility because they rejected spacetime as a concept for reasons of philosophy or prejudice. In Poincaré's case he rejected spacetime because of philosophical objections to the idea of spatial or temporal extension (see note 1).

It is curious that Einstein actually returned to thinking based on an aether for philosophical reasons similar to those that haunted Poincaré (see Granek 2001). The geometrical form of Special Relativity as formalised by Minkowski does not forbid action at a distance and this was considered to be dubious philosophically. This led Einstein, in 1920, to reintroduce some of Poincaré's ideas into the theory of General Relativity. Whether an aether of the type proposed by Einstein is truly required for physical theory is still an active question in physics. However, such an aether leaves the spacetime of Special Relativity almost intact and is a complex merger of the material and geometrical that would be unrecognisable to 19th century theorists.

  • FitzGerald, G. F. (1889), The Ether and the Earth’s Atmosphere, Science 13: 390.
  • Larmor, J. (1897), On a Dynamical Theory of the Electric and Luminiferous Medium, Part 3, Relations with material media, Phil. Trans. Roy. Soc. 190: 205–300, doi:10.1098/rsta.1897.0020
  • Lorentz, H. A. (1895), Versuch einer Theorie der electrischen und optischen Erscheinungen in bewegten Körpern, Brill, Leyden.
  • Maxwell, J. C. (1865), A Dynamical Theory of the Electromagnetic Field, Philosophical Transactions 155: 459.

Note 1: The modern philosophical objection to the spacetime of Special Relativity is that it acts on bodies without being acted upon. In General Relativity, however, spacetime is acted upon by its content.

Intended Audience

This book presents special relativity (SR) from first principles and logically arrives at the conclusions. There will be simple diagrams and some thought experiments. Although the final form of the theory came to use Minkowski spaces and metric tensors, it is possible to discuss SR using nothing more than high school algebra. That is the method used here in the first half of the book. That being said, the subject is open to a wide range of readers. All that is really required is a genuine interest.

For a more mathematically sophisticated treatment of the subject, please refer to the Advanced Text in Wikibooks.

The book is designed to confront the way students fail to understand the relativity of simultaneity. This problem is well documented and described in depth in: Student understanding of time in special relativity: simultaneity and reference frames by Scherr et al.

What's so special?

The special theory was suggested in 1905 in Einstein's article "On the Electrodynamics of Moving Bodies", and is so called because it applies in the absence of non-uniform gravitational fields.

In search of a more complete theory, Einstein developed the general theory of relativity published in 1915. General relativity (GR), a more mathematically demanding subject, describes physics in the presence of gravitational fields.

The conceptual difference between the two is the model of spacetime used. Special relativity makes use of a Euclidean-like (flat) spacetime. GR lives in a spacetime that is generally not flat but curved, and it is this curvature which represents gravity. The domain of applicability for SR is not so limited, however. Spacetime can often be approximated as flat, and there are techniques to deal with accelerating special relativistic objects.

Common Pitfalls in Relativity

Here is a collection of common misunderstandings and misconceptions about SR. If you are unfamiliar with SR then you can safely skip this section and come back to it later. If you are an instructor, perhaps this can help you avert some problems before they start by bringing up these points during your presentation when appropriate.

Beginners often believe that special relativity is only about objects that are moving at high velocities. Strictly speaking, this is a mistake. Special relativity applies at all velocities but at low velocity the predictions of special relativity are almost identical to those of the Newtonian empirical formulae. As an object increases its velocity the predictions of relativity gradually diverge from Newtonian Mechanics.
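
The size of this divergence can be estimated with a short calculation. The sketch below is illustrative only: it uses the factor \gamma = 1/\sqrt{1 - v^2/c^2} that appears later in this book with the Lorentz Transformation Equations, and reports \gamma - 1 as a rough measure of how far relativistic predictions depart from Newtonian ones. The function name and the sample speeds are assumptions chosen for the example.

```python
import math

C = 299_792_458.0  # speed of light in metres per second


def gamma_minus_one(v, c=C):
    """Return gamma - 1 for speed v, where gamma = 1/sqrt(1 - v^2/c^2).

    Written as beta^2 / (s * (1 + s)) with s = sqrt(1 - beta^2), which is
    algebraically the same as 1/s - 1 but avoids subtracting two nearly
    equal numbers at everyday speeds.
    """
    beta_sq = (v / c) ** 2
    s = math.sqrt(1.0 - beta_sq)
    return beta_sq / (s * (1.0 + s))


# Illustrative speeds (assumed values, in metres per second).
for label, v in [("car, 30 m/s", 30.0),
                 ("one tenth of c", 0.1 * C),
                 ("six tenths of c", 0.6 * C)]:
    print(f"{label}: gamma - 1 = {gamma_minus_one(v):.3e}")
```

For the car the result is of order 10^-15, far below anything an everyday measurement could detect, whereas at six tenths of the speed of light it is 0.25.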

There is sometimes a problem differentiating between the two different concepts "relativity of simultaneity" and "signal latency/delay." This book differs from some other presentations because it deals with the geometry of spacetime directly and avoids the treatment of delays due to light propagation. This approach is taken because students would not be taught Euclid's geometry using continuous references to the equipment and methods used to measure lengths and angles. Continuous reference to the measurement process obscures the underlying geometrical theory whether the geometry is three dimensional or four dimensional.

If students do not grasp that, from the outset, modern Special Relativity proposes that the universe is four dimensional, then, like Poincaré, they will consider that the constancy of the speed of light is just an event awaiting a mechanical explanation and waste their time by pondering the sorts of mechanical or electrical effects that could adjust the velocity of light to be compatible with observation.

A Word about Wiki

This is a wikibook. That means it has great potential for improvement and enhancement. The improvement can be in the form of refined language, clear mathematics, simple diagrams, and better practice problems and answers. The enhancement can be in the form of artwork, historical context of SR, anything. Feel free to improve and enhance Special Relativity and other wikibooks as you see necessary.

The principle of relativity

Galileo Galilei
Principles of relativity address the relationship between observations made at different places. This problem has been a difficult theoretical challenge since the earliest times and involves physical questions such as how the velocities of objects can be combined and how influences are transmitted between moving objects.

One of the most fruitful approaches to this problem was the investigation of how observations are affected by the velocity of the observer. This problem had been tackled by classical philosophers but it was the work of Galileo that produced a real breakthrough. Galileo (1632), in his "Dialogue Concerning the Two Chief World Systems", considered observations of motion made by people inside a ship who could not see the outside:

"have the ship proceed with any speed you like, so long as the motion is uniform and not fluctuating this way and that. You will discover not the least change in all the effects named, nor could you tell from any of them whether the ship was moving or standing still."

According to Galileo, if the ship moved smoothly then someone inside it would be unable to determine whether they were moving. If people in Galileo's moving ship were eating dinner they would see their peas fall from their fork straight down to their plate in the same way as they might if they were at home on dry land. The peas move along with the people and do not appear to the diners to fall diagonally. This means that the peas continue in a state of uniform motion unless someone intercepts them or otherwise acts on them. It also means that simple experiments that the people on the ship might perform would give the same results on the ship or at home. This concept led to “Galilean Relativity” in which it was held that things continue in a state of motion unless acted upon and that the laws of physics are independent of the velocity of the laboratory.

This simple idea challenged the previous ideas of Aristotle. Aristotle had argued in his Physics that objects must either be moved or be at rest. According to Aristotle, on the basis of complex and interesting arguments about the possibility of a 'void', objects cannot remain in a state of motion without something moving them. As a result Aristotle proposed that objects would stop entirely in empty space. If Aristotle were right the peas that you dropped whilst dining aboard a moving ship would fall in your lap rather than falling straight down on to your plate. Aristotle's idea had been very widely believed, so Galileo's new proposal was extraordinary and, because it was nearly right, became the foundation of physics.

Galilean Relativity contains two important principles: first, it is impossible to determine who is actually at rest; second, things continue in uniform motion unless acted upon. The second principle is known as Galileo’s Law of Inertia or Newton's First Law of Motion.

Special relativity

Until the nineteenth century it appeared that Galilean relativity treated all observers as equivalent no matter how fast they were moving. If you throw a ball straight up in the air at the North Pole it falls straight back down again, and this also happens at the equator even though the equator is moving about a thousand miles an hour faster than the pole. Galilean velocities are additive, so the ball continues moving at a thousand miles an hour when it is thrown upwards at the equator and continues with this motion until it is acted on by an external agency.

This simple scheme came into question in 1865 when James Clerk Maxwell discovered the equations that describe the propagation of electromagnetic waves such as light. His equations showed that the speed of light depended upon constants that were thought to be simple properties of a physical medium or “aether” that pervaded all space. If this were the case then, according to Galilean relativity, it should be possible to add your own velocity to the velocity of incoming light, so that if you were travelling through the aether at half the speed of light then any light approaching you from ahead would be observed to be travelling at 1.5 times the speed of light in the aether. Similarly, any light approaching you from behind would strike you at 0.5 times the speed of light in the aether. Light itself would always go at the same speed in the aether, so if you shone a light from a torch whilst travelling at high speed the light would plop into the aether and slow right down to its normal speed. This would spoil Galileo's Relativity because all you would need to do to discover whether you were in a moving ship or on dry land would be to measure the speed of light in different directions. The light would go slower in your direction of travel through the aether and faster in the opposite direction.

If the Maxwell equations are valid and the simple classical addition of velocities applies then there should be a preferred reference frame, the frame of the stationary aether. The preferred reference frame would be considered the true zero point to which all velocity measurements could be referred.

Special relativity restored a principle of relativity in physics by maintaining that Maxwell's equations are correct but that classical velocity addition is wrong: there is no preferred reference frame. Special relativity brought back the interpretation that in all inertial reference frames the same physics is going on and there is no phenomenon that would allow an observer to pinpoint a zero point of velocity. Einstein preserved the principle of relativity by proposing that the laws of physics are the same regardless of the velocity of the observer. According to Einstein, whether you are in the hold of Galileo's ship or in the cargo bay of a space ship going at a large fraction of the speed of light the laws of physics will be the same.

Einstein's idea shared the same philosophy as Galileo's idea: both men believed that the laws of physics would be unaffected by motion at a constant velocity. In the years between Galileo and Einstein it was believed that it was the way velocities simply add to each other that preserved the laws of physics, but Einstein adapted this simple concept to allow for Maxwell's equations.

Frames of reference, events and transformations

Before proceeding further with the analysis of relative motion the concepts of reference frames, events and transformations need to be defined more closely.

Physical observers are considered to be surrounded by a reference frame which is a set of coordinate axes in terms of which position or movement may be specified or with reference to which physical laws may be mathematically stated.

An event is something that happens independently of the reference frame that might be used to describe it. Turning on a light or the collision of two objects would constitute an event.

Suppose there is a small event, such as a light being turned on, that is at coordinates x,y,z,t in one reference frame. What coordinates would another observer, in another reference frame moving relative to the first at velocity v along the x axis assign to the event? This problem is illustrated below:

[Figure: two reference frames, the second moving at velocity v along the x axis of the first]

What we are seeking is the relationship between the second observer's coordinates for the event x^', y^', z^', t^' and the first observer's coordinates for the event x,y,z,t. The coordinates refer to the positions and timings of the event that are measured by each observer and, for simplicity, the observers are arranged so that they are coincident at t=0. According to Galilean Relativity:

x^' = x - vt

y^' = y

z^' = z

t^' = t

This set of equations is known as a Galilean coordinate transformation or Galilean transformation.
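
As a minimal sketch, the Galilean transformation can be written as a short function. The function name and the sample numbers are illustrative assumptions; positions are taken to be in metres, times in seconds and v in metres per second.

```python
def galilean_transform(x, y, z, t, v):
    """Return (x', y', z', t') for an event at (x, y, z, t).

    The second frame moves at velocity v along the x axis of the first,
    and the two origins coincide at t = 0.
    """
    return (x - v * t, y, z, t)


# An event 100 m along the x axis, 4 s after the origins coincided,
# described from a frame moving at 10 m/s (assumed sample values).
print(galilean_transform(100.0, 2.0, 3.0, 4.0, v=10.0))  # (60.0, 2.0, 3.0, 4.0)

# Transforming back uses the opposite relative velocity.
print(galilean_transform(60.0, 2.0, 3.0, 4.0, v=-10.0))  # (100.0, 2.0, 3.0, 4.0)
```

Only the x coordinate changes, and the time assigned to the event is the same in both frames.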

These equations show how the position of an event in one reference frame is related to the position of an event in another reference frame. But what happens if the event is something that is moving? How do velocities transform from one frame to another?

The calculation of velocities depends on Newton's formula for velocity as the rate of change of position with time, u = dx/dt, where u is used for the velocity of a moving object to distinguish it from the relative velocity v of the two reference frames. The use of Newtonian physics to calculate velocities and other physical variables has led to Galilean Relativity being called Newtonian Relativity in the case where conclusions are drawn beyond simple changes in coordinates. The velocity transformations for the velocities in the three directions in space are, according to Galilean relativity:

\mathbf{u^'_x = u_x - v}

\mathbf{u^'_y = u_y}

\mathbf{u^'_z = u_z}

Where \mathbf{u^'_x, u^'_y, u^'_z} are the velocities of a moving object in the three directions in space recorded by the second observer, \mathbf{u_x, u_y, u_z} are the velocities recorded by the first observer and \mathbf{v} is the relative velocity of the observers. The minus sign in front of the \mathbf{v} arises because the second observer is moving in the positive x direction relative to the first, so an object's velocity along x is smaller by \mathbf{v} when measured by the second observer.

This result is known as the classical velocity addition theorem and summarises the transformation of velocities between two Galilean frames of reference. It means that the velocities of projectiles must be determined relative to the velocity of the source and destination of the projectile. For example, if a sailor throws a stone at 10 km/hr from Galileo's ship which is moving towards shore at 5 km/hr then the stone will be moving at 15 km/hr when it hits the shore.
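
The stone example can be checked with a short sketch of the velocity transformation. The function name is an illustrative assumption, the shore is taken as the first frame and the ship as the second, and all speeds are in km/hr along the x axis.

```python
def galilean_velocity(u_x, u_y, u_z, v):
    """Return (u'_x, u'_y, u'_z) as measured in a frame moving at v along x."""
    return (u_x - v, u_y, u_z)


ship_speed = 5.0  # ship's speed towards the shore, measured from the shore

# The sailor measures the stone at 10 km/hr. Going from the ship's frame
# back to the shore's frame uses the opposite relative velocity, -v.
stone_seen_from_shore = galilean_velocity(10.0, 0.0, 0.0, v=-ship_speed)
print(stone_seen_from_shore)  # (15.0, 0.0, 0.0)

# Going from the shore's frame to the ship's frame recovers the sailor's value.
print(galilean_velocity(*stone_seen_from_shore, v=ship_speed))  # (10.0, 0.0, 0.0)
```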

In Newtonian Relativity the geometry of space is assumed to be Euclidean and the measurement of time is assumed to be the same for all observers.

The derivation of the classical velocity addition theorem is as follows. The Galilean transformation for x is:
x^' = x - vt
Differentiating with respect to t, with v constant:
dx^'/dt = dx/dt - v
But in Galilean relativity t^' = t and so dx^'/dt^' = dx^'/dt therefore:
dx^'/dt^' = dx/dt - v
dy^'/dt^' = dy/dt
dz^'/dt^' = dz/dt
If we write u^'_x = dx^'/dt^' etc. then:
u^'_x = u_x - v
u^'_y = u_y
u^'_z = u_z

The postulates of special relativity

In the previous section transformations from one frame of reference to another were described using the simple addition of velocities that was introduced in Galileo's time. These transformations are consistent with Galileo's main postulate, which was that the laws of physics would be the same for all inertial observers so that no-one could tell who was at rest. Aether theories had threatened Galileo's postulate because the aether would be at rest and observers could determine whether they were at rest in it simply by measuring the speed of light in the direction of motion. Einstein preserved Galileo's fundamental postulate that the laws of physics are the same in all inertial frames of reference, but to do so he had to introduce a new postulate that the speed of light would be the same for all observers. These postulates are listed below:

1. First postulate: the principle of relativity

Formally: the laws of physics are the same in all inertial frames of reference.

Informally: every physical theory should look the same mathematically to every inertial observer. Experiments in a physics laboratory in a spaceship, or on a planet orbiting the sun and moving through the galaxy, will give the same results no matter how fast the laboratory is moving.

2. Second postulate: the invariance of the speed of light

Formally: the speed of light in free space is a constant in all inertial frames of reference.

Informally: the speed of light in a vacuum, commonly denoted c, is the same for all inertial observers, is the same in all directions, and does not depend on the velocity of the object emitting the light.

Using these postulates Einstein was able to calculate how the observation of events depends upon the relative velocity of observers. He was then able to construct a theory of physics that led to predictions such as the equivalence of mass and energy and early quantum theory.

Einstein's formulation of the axioms of relativity is known as the electrodynamic approach to relativity. It has been superseded in most advanced textbooks by the “space-time approach” in which the laws of physics themselves are due to symmetries in space-time and the constancy of the speed of light is a natural consequence of the existence of space-time. However, Einstein's approach is equally valid and represents a tour de force of deductive reasoning which provided the insights required for the modern treatment of the subject.

Einstein's Relativity - the electrodynamic approach

Einstein asked how the lengths and times that are measured by the observers might need to vary if both observers found that the speed of light was constant. He looked at the formulae for the velocity of light that would be used by the two observers, (x = ct) and (x^' = ct^'), and asked what constants would need to be introduced to keep the measurement of the speed of light at the same value even though the relative motion of the observers meant that the x^' axis was continually advancing. His working is shown in detail in the appendix. The result of this calculation is the Lorentz Transformation Equations:

x' = \gamma (x - vt)\,
y' = y \,
z' = z \,
t' = \gamma (t - \frac{v x}{c^{2}})\,

Where the constant  \gamma = \frac {1}{\sqrt {1 -\frac{v^2}{c^2}}}. These equations apply to any two observers in relative motion but note that the sign within the brackets changes according to the direction of the velocity - see the appendix.
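As a minimal numerical sketch of the equations above, the transformation can be written in Python. The function names gamma and lorentz_transform and the sample event are illustrative assumptions rather than part of the original text, and c is taken as the SI value.

```python
import math

C = 299_792_458.0  # speed of light in m/s (SI defined value)


def gamma(v, c=C):
    """Lorentz factor for a relative speed v (requires |v| < c)."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)


def lorentz_transform(x, y, z, t, v, c=C):
    """Coordinates (x', y', z', t') of an event for an observer moving
    at velocity v along the x-axis of the unprimed observer."""
    g = gamma(v, c)
    return g * (x - v * t), y, z, g * (t - v * x / c ** 2)


# Hypothetical event: 1000 m along the x-axis, 1 microsecond after the origin event
print(lorentz_transform(1000.0, 0.0, 0.0, 1e-6, 0.6 * C))
```

Passing a negative v gives the transformation for an observer moving in the opposite direction, which is the sign change mentioned above.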

The Lorentz Transformation is the equivalent of the Galilean Transformation with the added assumption that everyone measures the same velocity for the speed of light no matter how fast they are travelling. The speed of light is a ratio of distance to time (ie: metres per second) so for everyone to measure the same value for the speed of light the length of measuring rods, the length of space between light sources and receivers and the number of ticks of clocks must dynamically differ between the observers. So long as lengths and time intervals vary with the relative velocity of two observers (v) as described by the Lorentz Transformation the observers can both calculate the speed of light as the ratio of the distance travelled by a light ray divided by the time taken to travel this distance and get the same value.

Einstein's approach is "electrodynamic" because it assumes, on the basis of Maxwell's equations, that light travels at a constant velocity. As mentioned above, the idea of a universal constant velocity is strange because velocity is a ratio of distance to time. Do the Lorentz Transformation Equations hide a deeper truth about space and time? Einstein himself (Einstein 1920) gives one of the clearest descriptions of how the Lorentz Transformation equations are actually describing properties of space and time itself. His general reasoning is given below.

If the equations are combined they satisfy the relation:

(1) x^{'2} - c^2t^{'2} = x^2 - c^2t^2 \,

Einstein (1920) describes how this can be extended to describe movement in any direction in space:

(2) x^{'2} + y^{'2} + z^{'2} - c^2t^{'2} = x^2 + y^2 + z^2 - c^2t^2 \,

Equation (2) is a geometrical postulate about the relationship between lengths and times in the universe. It suggests that there is a constant s such that:

s^2 = x^{'2} + y^{'2} + z^{'2} - c^2t^{'2} \,
s^2 = x^2 + y^2 + z^2 - c^2t^2 \,

This equation was recognised by Minkowski as an extension of Pythagoras' Theorem (ie: s^2 = x^2 + y^2), such extensions being well known in early twentieth century mathematics. What the Lorentz Transformation is telling us is that the universe is a four dimensional spacetime and as a result there is no need for any "aether". (See Einstein 1920, appendices, for Einstein's discussion of how the Lorentz Transformation suggests a four dimensional universe but be cautioned that "imaginary time" has now been replaced by the use of "metric tensors").
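Relation (2) can be checked numerically. The following is only a sketch: the event coordinates and the relative velocity of 0.8c are arbitrary illustrative values, and the helper names boost_x and interval_squared are assumptions made for this example.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def boost_x(x, y, z, t, v, c=C):
    """Lorentz transformation for relative velocity v along the x-axis."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return g * (x - v * t), y, z, g * (t - v * x / c ** 2)


def interval_squared(x, y, z, t, c=C):
    """s^2 = x^2 + y^2 + z^2 - c^2 t^2"""
    return x ** 2 + y ** 2 + z ** 2 - (c * t) ** 2


event = (4.0e5, 1.0e5, -2.0e5, 1.0e-3)   # hypothetical event (metres, seconds)
moved = boost_x(*event, v=0.8 * C)       # the same event for the moving observer

s2 = interval_squared(*event)
s2_moved = interval_squared(*moved)
print(s2, s2_moved)                      # the two values agree up to rounding error
```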

Einstein's analysis shows that the x-axis and time axis of two observers in relative motion do not overlie each other. The equation relating one observer's time to the other observer's time shows that this relationship changes with distance along the x-axis, ie:

t' = \gamma (t - \frac{v x}{c^{2}})\,

This means that the whole idea of "frames of reference" needs to be re-visited to allow for the way that axes no longer overlie each other.


Einstein, A. (1920). Relativity. The Special and General Theory. Methuen & Co Ltd 1920. Written December, 1916. Robert W. Lawson (Authorised translation). http://www.bartleby.com/173/

Inertial reference frames

The Lorentz Transformation for time involves a component (vx/c^2) which results in time measurements being different along the x-axis of relatively moving observers. This means that the old idea of a frame of reference that simply involves three space dimensions with a time that is in common between all of the observers no longer applies. To compare measurements between observers the concept of a "reference frame" must be extended to include the observer's clocks.

An inertial reference frame is a conceptual, three-dimensional latticework of measuring rods set at right angles to each other with clocks at every point that are synchronised with each other (see below for a full definition). An object that is part of, or attached to, an inertial frame of reference is defined as an object which does not disturb the synchronisation of the clocks and remains at a constant spatial position within the reference frame. The inertial frame of reference that has a moving, non-rotating body attached to it is known as the inertial rest frame for that body. An inertial reference frame that is a rest frame for a particular body moves with the body when observed by observers in relative motion.

Inertial.svg

This type of reference frame became known as an "inertial" frame of reference because, as will be seen later in this book, each system of objects that are co-moving according to Newton's law of inertia (without rotation, gravitational fields or forces acting) has a common rest frame, with clocks that differ in synchronisation and rods that differ in length from those in other, relatively moving, rest frames.

There are many other definitions of an "inertial reference frame" but most of these, such as "an inertial reference frame is a reference frame in which Newton's First Law is valid" do not provide essential details about how the coordinates are arranged and/or represent deductions from more fundamental definitions.

The following definition by Blandford and Thorne (2004) is a fairly complete summary of what working physicists mean by an inertial frame of reference:

"An inertial reference frame is a (conceptual) three-dimensional latticework of measuring rods and clocks with the following properties: (i ) The latticework moves freely through spacetime (i.e., no forces act on it), and is attached to gyroscopes so it does not rotate with respect to distant, celestial objects. (ii ) The measuring rods form an orthogonal lattice and the length intervals marked on them are uniform when compared to, e.g., the wavelength of light emitted by some standard type of atom or molecule; and therefore the rods form an orthonormal, Cartesian coordinate system with the coordinate x measured along one axis, y along another, and z along the third. (iii ) The clocks are densely packed throughout the latticework so that, ideally, there is a separate clock at every lattice point. (iv ) The clocks tick uniformly when compared, e.g., to the period of the light emitted by some standard type of atom or molecule; i.e., they are ideal clocks. (v) The clocks are synchronized by the Einstein synchronization process: If a pulse of light, emitted by one of the clocks, bounces off a mirror attached to another and then returns, the time of bounce t_b as measured by the clock that does the bouncing is the average of the times of emission and reception as measured by the emitting and receiving clock: t_b=1/2(t_e + t_r)

¹For a deeper discussion of the nature of ideal clocks and ideal measuring rods see, e.g., pp. 23-29 and 395-399 of Misner, Thorne, and Wheeler (1973)."
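Property (v) of the definition can be illustrated with a short sketch. The function names and the clock readings below are hypothetical, chosen only to show how the averaging rule would be applied to correct a distant clock.

```python
def einstein_bounce_time(t_emit, t_receive):
    """Time the distant clock should read at the bounce: the average of the
    emission and reception times recorded on the emitting clock."""
    return 0.5 * (t_emit + t_receive)


def sync_correction(t_emit, t_receive, remote_reading):
    """Amount to add to the distant clock so that it agrees with the
    Einstein synchronisation convention."""
    return einstein_bounce_time(t_emit, t_receive) - remote_reading


# Hypothetical readings in seconds: pulse sent at 0, returned after 2 microseconds,
# while the distant clock read 1.5 microseconds at the moment of the bounce.
print(einstein_bounce_time(0.0, 2.0e-6))
print(sync_correction(0.0, 2.0e-6, 1.5e-6))
```

In this illustration the distant clock would need to be set back by half a microsecond.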

Special Relativity demonstrates that the inertial rest frames of objects that are moving relative to each other do not overlay one another. Each observer sees the other, moving observer's, inertial frame of reference as distorted. This discovery is the essence of Special Relativity and means that the transformation of coordinates and other measurements between moving observers is complicated. It will be discussed in depth below.

Inertialoverlay.GIF

Blandford, R.D. and Thorne, K.S. (2004). Applications of Classical Physics. California Institute of Technology. See: http://www.pma.caltech.edu/Courses/ph136/yr2004/

Spacetime

The modern approach to relativity

Although the special theory of relativity was first proposed by Einstein in 1905, the modern approach to the theory depends upon the concept of a four-dimensional universe that was first proposed by Hermann Minkowski in 1908.

Minkowski's contribution appears complicated but is simply an extension of Pythagoras' Theorem:

In 2 dimensions: h^2 = x^2 + y^2

In 3 dimensions: h^2 = x^2 + y^2 + z^2

In 4 dimensions: s^2 = x^2 + y^2 + z^2 + kt^2

(where k = -c^2)

The modern approach uses the concept of invariance to explore the types of coordinate systems that are required to provide a full physical description of the location and extent of things. The modern theory of special relativity begins with the concept of "length". In everyday experience, it seems that the length of objects remains the same no matter how they are rotated or moved from place to place. We think that the simple length of a thing is "invariant". However, as is shown in the illustrations below, what we are actually suggesting is that length seems to be invariant in a three-dimensional coordinate system.

Coord1b.gif

The length of a thing in a two-dimensional coordinate system is given by Pythagoras's theorem:

 x^2 + y^2 = h^2

This two-dimensional length is not invariant if the thing is tilted out of the two-dimensional plane. In everyday life, a three-dimensional coordinate system seems to describe the length fully. The length is given by the three-dimensional version of Pythagoras's theorem:

 h^2 = x^2 + y^2 + z^2

The derivation of this formula is shown in the illustration below.

Coord2.gif

It seems that, provided all the directions in which a thing can be tilted or arranged are represented within a coordinate system, then the coordinate system can fully represent the length of a thing. However, it is clear that things may also be changed over a period of time. Time is another direction in which things can be arranged. This is shown in the following diagram:

Coord3.gif

The length of a straight line between two events in space and time is called a "space-time interval".

In 1908 Hermann Minkowski pointed out that if things could be rearranged in time, then the universe might be four-dimensional. He boldly suggested that Einstein's recently-discovered theory of Special Relativity was a consequence of this four-dimensional universe. He proposed that the space-time interval might be related to space and time by Pythagoras' theorem in four dimensions:

 s^2 = x^2 + y^2 + z^2 + (ict)^2

Where i is the imaginary unit (sometimes imprecisely called \sqrt{-1}), c is a constant, and t is the time interval spanned by the space-time interval, s. The symbols x, y and z represent displacements in space along the corresponding axes. In this equation, the 'second' becomes just another unit of length. In the same way as centimetres and inches are both units of length related by centimetres = 'conversion constant' times inches, metres and seconds are related by metres = 'conversion constant' times seconds. The conversion constant, c, has a value of about 300,000,000 metres per second. Now i^2 is equal to minus one, so the space-time interval is given by:

 s^2 = x^2 + y^2 + z^2 - (ct)^2

Minkowski's use of the imaginary unit has been superseded by the use of advanced geometry that uses a tool known as the "metric tensor". The metric tensor permits the existence of "real" time and the negative sign in the expression for the square of the space-time interval originates in the way that distance changes with time when the curvature of spacetime is analysed (see advanced text). We now use real time but Minkowski's original equation for the square of the interval survives so that the space-time interval is still given by:

 s^2 = x^2 + y^2 + z^2 - (ct)^2

Space-time intervals are difficult to imagine; they extend between one place and time and another place and time, so the velocity of the thing that travels along the interval is already determined for a given observer.

If the universe is four-dimensional, then the space-time interval will be invariant, rather than spatial length. Whoever measures a particular space-time interval will get the same value, no matter how fast they are travelling. In physical terminology the invariance of the spacetime interval is a type of Lorentz Invariance. The invariance of the spacetime interval has some dramatic consequences.

The first consequence is the prediction that if a thing is travelling at a velocity of c metres per second, then all observers, no matter how fast they are travelling, will measure the same velocity for the thing. The velocity c will be a universal constant. This is explained below.

When an object is travelling at c, the space-time interval is zero. This is shown below:

The distance travelled by an object moving at velocity v in the x direction for t seconds is:
 x = vt
If there is no motion in the y or z directions the space-time interval is  s^2 = x^2 + 0 + 0 - (ct)^2
So:  s^2 = (vt)^2 - (ct)^2
But when the velocity v equals c:
 s^2 = (ct)^2 - (ct)^2
And hence the space time interval  s^2 = (ct)^2 - (ct)^2 = 0

A space-time interval of zero only occurs when the velocity is c (if x>0). All observers observe the same space-time interval so when observers observe something with a space-time interval of zero, they all observe it to have a velocity of c, no matter how fast they are moving themselves.
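The same conclusion can be illustrated with the Lorentz Transformation Equations given earlier. This is a sketch under assumed values: a light pulse travels along the x-axis for 2 seconds, and the observer speeds of 0.1c, 0.5c and 0.9c are arbitrary choices. The helper name boost_x is an assumption of this example.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def boost_x(x, t, v, c=C):
    """Lorentz transformation of (x, t) for relative velocity v along x."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return g * (x - v * t), g * (t - v * x / c ** 2)


t = 2.0      # seconds of travel according to the first observer
x = C * t    # position of a light pulse emitted from the origin at t = 0

for fraction in (0.1, 0.5, 0.9):
    x_p, t_p = boost_x(x, t, fraction * C)
    # Distance divided by time for the moving observer: the measured speed
    print(fraction, x_p / t_p)
```

Each moving observer assigns a different distance and a different time to the pulse, but the ratio is c in every case, up to rounding error.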

The universal constant, c, is known for historical reasons as the "speed of light in a vacuum". In the first decade or two after the formulation of Minkowski's approach many physicists, although supporting Special Relativity, expected that light might not travel at exactly c, but might travel at very nearly c. There are now few physicists who believe that light in a vacuum does not propagate at c.

The second consequence of the invariance of the space-time interval is that clocks will appear to go slower on objects that are moving relative to you. Suppose there are two people, Bill and John, on separate planets that are moving away from each other. John draws a graph of Bill's motion through space and time. This is shown in the illustration below:

Coord5.gif

Being on planets, both Bill and John think they are stationary, and just moving through time. John spots that Bill is moving through what John calls space, as well as time, when Bill thinks he is moving through time alone. Bill would also draw the same conclusion about John's motion. To John, it is as if Bill's time axis is leaning over in the direction of travel and to Bill, it is as if John's time axis leans over.

John calculates the length of Bill's space-time interval as:
s^2 = (vt)^2 - (ct)^2
whereas Bill doesn't think he has travelled in space, so writes:
s^2 = (0)^2 - (cT)^2

The space-time interval, s^2, is invariant. It has the same value for all observers, no matter who measures it or how they are moving in a straight line. Bill's  s^2 equals John's  s^2 so:

(0)^2 - (cT)^2 = (vt)^2 - (ct)^2
and
-(cT)^2 = (vt)^2 - (ct)^2
hence
t = T / \sqrt{1 - v^2/c^2}.

So, if John sees Bill measure a time interval of 1 second (T = 1) between two ticks of a clock that is at rest in Bill's frame, John will find that his own clock measures between these same ticks an interval t, called coordinate time, which is greater than one second. It is said that clocks in motion run slow relative to those of observers at rest. This is known as "relativistic time dilation of a moving clock". The time that is measured in the rest frame of the clock (in Bill's frame) is called the proper time of the clock.

John will also observe measuring rods at rest on Bill's planet to be shorter than his own measuring rods, in the direction of motion. This is a prediction known as "relativistic length contraction of a moving rod". If the length of a rod at rest on Bill's planet is X, then we call this quantity the proper length of the rod. The length x of that same rod as measured from John's planet, is called coordinate length, and given by

x = X \sqrt{1 - v^2/c^2}.

This equation can be derived directly and validly from the time dilation result with the assumption that the speed of light is constant.
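The size of the time dilation and length contraction effects at different speeds can be tabulated with a short sketch. The function names and the chosen speeds are illustrative assumptions; T and X are taken as 1 second and 1 metre for convenience.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def lorentz_factor(v, c=C):
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)


def dilated_time(proper_time, v, c=C):
    """Coordinate time t that John measures for a proper time T on Bill's clock."""
    return proper_time * lorentz_factor(v, c)


def contracted_length(proper_length, v, c=C):
    """Coordinate length x that John measures for a rod of proper length X."""
    return proper_length / lorentz_factor(v, c)


for fraction in (0.1, 0.6, 0.9, 0.99):
    v = fraction * C
    print(f"v = {fraction:.2f}c  t = {dilated_time(1.0, v):.4f} s  "
          f"x = {contracted_length(1.0, v):.4f} m")
```

At everyday speeds both factors are indistinguishable from 1, which is why these effects went unnoticed before precise measurements at high velocities became possible.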

The last consequence is that clocks will appear to be out of phase with each other along the length of a moving object. This means that if one observer sets up a line of clocks that are all synchronised so they all read the same time, then another observer who is moving along the line at high speed will see the clocks all reading different times. In other words observers who are moving relative to each other see different events as simultaneous. This effect is known as Relativistic Phase or the Relativity of Simultaneity. Relativistic phase is often overlooked by students of Special Relativity, but if it is understood then phenomena such as the twin paradox are easier to understand.

The way that clocks go out of phase along the line of travel can be calculated from the concepts of the invariance of the space-time interval and length contraction.

Relphasesimple.gif

In the diagram above John is conventionally stationary. Distances between two points according to Bill are simple lengths in space (x) all at t=0 whereas John sees Bill's measurement of distance as a combination of a distance (X) and a time interval (T):

x^2 = X^2 - (cT)^2

Notice that the quantities represented by capital letters are proper lengths and times and in this example refer to John's measurements.

Bill's distance, x, is the length that he would obtain for things that John believes to be X metres in length. For Bill it is John who has rods that contract in the direction of motion so Bill's determination "x" of John's distance "X" is given from:

x = X \sqrt{1 - v^2/c^2}.

This relationship between proper and coordinate lengths was seen above to relate Bill's proper lengths to John's measurements. It also applies to how Bill observes John's proper lengths.

x = X \sqrt{1 - v^2/c^2}
Thus x^2 = X^2 - (v^2/c^2)X^2
So: (cT)^2 = (v^2/c^2)X^2
And cT = (v/c)X
So: T = (v/c^2)X

Clocks that are synchronised for one observer go out of phase along the line of travel for another observer moving at v metres per second by (v/c^2) seconds for every metre. This is one of the most important results of Special Relativity and should be thoroughly understood by students.

The net effect of the four-dimensional universe is that observers who are in motion relative to you seem to have time coordinates that lean over in the direction of motion and consider things to be simultaneous that are not simultaneous for you. Spatial lengths in the direction of travel are shortened, because they tip upwards and downwards, relative to the time axis in the direction of travel, akin to a rotation out of three-dimensional space.

Coord6.gif

What velocity would cause events A and B to be simultaneous?
Example
An observer records an event A next to herself then an event B a millisecond later and 600 kilometres away. How fast would another observer need to travel in the direction of event B to record it as simultaneous with event A?
The phase difference between moving clocks is given by: T = (v/c^2)X
The velocity needed to create a phase difference of 10^{-3} seconds over 6 \times 10^{5} metres is:
v = \frac{Tc^2}{X} = \frac{10^{-3} \times 3 \times 10^8 \times c}{6 \times 10^{5}} = 0.5c
This means that another observer who is travelling at 0.5c relative to the first observer in the direction of the line joining events A and B will consider that these events are simultaneous.
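The arithmetic in this example can be checked with a short sketch. It assumes the rounded value c = 3 x 10^8 metres per second used in the example, and the function names are illustrative. The second function applies the Lorentz transformation for time to confirm that both events receive the same time for the moving observer.

```python
import math

C = 3.0e8  # rounded speed of light in m/s, as used in the example


def speed_for_simultaneity(delta_t, delta_x, c=C):
    """Speed v at which two events separated by delta_t seconds and
    delta_x metres become simultaneous, from delta_t = v * delta_x / c^2."""
    return delta_t * c ** 2 / delta_x


def transformed_time(x, t, v, c=C):
    """Lorentz transformation for the time of an event."""
    return (t - v * x / c ** 2) / math.sqrt(1.0 - (v / c) ** 2)


v = speed_for_simultaneity(1.0e-3, 6.0e5)
print(v / C)                               # speed as a fraction of c
print(transformed_time(0.0, 0.0, v))       # event A for the moving observer
print(transformed_time(6.0e5, 1.0e-3, v))  # event B for the moving observer
```

Both transformed times come out as zero to within rounding error, so the moving observer records A and B as simultaneous.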


Interpreting space-time diagrams

Great care is needed when interpreting space-time diagrams. Diagrams present data in two dimensions, and cannot show faithfully how, for instance, a zero length space-time interval appears.

Coord8.gif

When diagrams are used to show both space and time it is important to be alert to space and time being related by Minkowski's equation and not by simple Euclidean geometry. The diagrams are only aids to understanding the approximate relation between space and time and it must not be assumed, for instance, that simple trigonometric relationships can be used to relate lines that represent spatial displacements and lines that represent temporal displacements.

It is sometimes mistakenly held that the time dilation and length contraction results only apply for observers at x=0 and t=0. This is untrue. An inertial frame of reference is defined so that length and time comparisons can be made anywhere within a given reference frame.

Time dilation applies to time measurements taken between corresponding planes of simultaneity

Time differences in one inertial reference frame can be compared with time differences anywhere in another inertial reference frame provided it is remembered that these differences apply to corresponding pairs of lines or pairs of planes of simultaneous events.

Spacetime

Spacetime diagram showing an event, a world line, and a line of simultaneity.

In order to gain an understanding of both Galilean and Special Relativity it is important to begin thinking of space and time as being different dimensions of a four-dimensional vector space called spacetime. Actually, since we can't visualize four dimensions very well, it is easiest to start with only one space dimension and the time dimension. The figure shows a graph with time plotted on the vertical axis and the one space dimension plotted on the horizontal axis. An event is something that occurs at a particular time and a particular point in space. ("Julius X. wrecks his car in Lemitar, NM on 21 June at 6:17 PM.") A world line is a plot of the position of some object as a function of time (more properly, the time of the object as a function of position) on a spacetime diagram. Thus, a world line is really a line in spacetime, while an event is a point in spacetime. A horizontal line parallel to the position axis (x-axis) is a line of simultaneity; in Galilean Relativity all events on this line occur simultaneously for all observers. It will be seen that the line of simultaneity differs between Galilean and Special Relativity; in Special Relativity the line of simultaneity depends on the state of motion of the observer.

In a spacetime diagram the slope of a world line has a special meaning. Notice that a vertical world line means that the object it represents does not move -- the velocity is zero. If the object moves to the right, then the world line tilts to the right, and the faster it moves, the more the world line tilts. Quantitatively, we say that

velocity = \frac{1}{slope~of~world~line} . (5.1)

Notice that this works for negative slopes and velocities as well as positive ones. If the object changes its velocity with time, then the world line is curved, and the instantaneous velocity at any time is the inverse of the slope of the tangent to the world line at that time.

The hardest thing to realize about spacetime diagrams is that they represent the past, present, and future all in one diagram. Thus, spacetime diagrams don't change with time -- the evolution of physical systems is represented by looking at successive horizontal slices in the diagram at successive times. Spacetime diagrams represent the evolution of events, but they don't evolve themselves.

The lightcone

Things that move at the speed of light in our four dimensional universe have surprising properties. If something travels at the speed of light along the x-axis and covers x metres from the origin in t seconds, the space-time interval of its path is zero.

s^2 = x^2 - (ct)^2

but  x = ct so:

s^2 = (ct)^2 - (ct)^2 = 0

Extending this result to the general case, if something travels at the speed of light in any direction into or out from the origin it has a space-time interval of 0:

0 = x^2 + y^2 + z^2 - (ct)^2

This equation is known as the Minkowski Light Cone Equation. If light were travelling towards the origin then the Light Cone Equation would describe the position and time of emission of all those photons that could be at the origin at a particular instant. If light were travelling away from the origin the equation would describe the position of the photons emitted at a particular instant at any future time 't'.

Rellightcone.gif

At the superficial level the light cone is easy to interpret. Its backward surface represents the path of light rays that strike a point observer at an instant and its forward surface represents the possible paths of rays emitted from the point observer. Things that travel along the surface of the light cone are said to be light-like and the path taken by such things is known as a null geodesic.

Events that lie outside the cones are said to be space-like or, better still, space separated because their space-time interval from the observer has the same sign as space (positive according to the convention used here). Events that lie within the cones are said to be time-like or time separated because their space-time interval has the same sign as time.
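The classification of events relative to the light cone can be sketched as follows. The function names, the sample events and the rounding tolerance tol are assumptions of this example; the sign convention is the one used in this section (space positive, time negative).

```python
C = 299_792_458.0  # speed of light in m/s


def interval_squared(x, y, z, t, c=C):
    """s^2 = x^2 + y^2 + z^2 - (ct)^2 measured from the origin."""
    return x ** 2 + y ** 2 + z ** 2 - (c * t) ** 2


def classify_separation(x, y, z, t, c=C, tol=1e-9):
    """Classify an event relative to an observer at the origin.

    tol is a relative tolerance (an assumption of this sketch) so that
    rounding errors do not push light-like events off the cone."""
    s2 = interval_squared(x, y, z, t, c)
    scale = x ** 2 + y ** 2 + z ** 2 + (c * t) ** 2
    if abs(s2) <= tol * scale:
        return "light-like"
    return "space-like" if s2 > 0 else "time-like"


print(classify_separation(C * 1.0, 0.0, 0.0, 1.0))  # on the light cone
print(classify_separation(1.0, 0.0, 0.0, 1.0))      # inside the cone
print(classify_separation(1.0e9, 0.0, 0.0, 1.0))    # outside the cone
```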

However, there is more to the light cone than the propagation of light. If the added assumption is made that the speed of light is the maximum possible velocity then events that are space separated cannot affect the observer directly. Events within the backward cone can have affected the observer so the backward cone is known as the "affective past" and the observer can affect events in the forward cone hence the forward cone is known as the "affective future".

The assumption that the speed of light is the maximum velocity for all communications is neither inherent in nor required by four dimensional geometry although the speed of light is indeed the maximum velocity for objects if the principle of causality is to be preserved by physical theories (ie: that causes precede effects).

The Lorentz transformation equations

The discussion so far has involved the comparison of interval measurements (time intervals and space intervals) between two observers. The observers might also want to compare more general sorts of measurement such as the time and position of a single event that is recorded by both of them. The equations that describe how each observer describes the other's recordings in this circumstance are known as the Lorentz Transformation Equations. (Note that the symbols below signify coordinates.)

Rellorentz.gif

The table below shows the Lorentz Transformation Equations.

x^' = \frac{x - vt}{\sqrt{(1 - v^2/c^2)}}

x = \frac{x^' + vt^'}{\sqrt{(1 - v^2/c^2)}}

y^' = y

y = y^'

z^' = z

z = z^'

t^' = \frac{t - (v/c^2)x}{\sqrt{(1 - v^2/c^2)}}

t = \frac{t^' + (v/c^2)x^'}{\sqrt{(1 - v^2/c^2)}}

See mathematical derivation of Lorentz transformation.

Notice how the phase ( (v/c^2)x ) is important and how these formulae for absolute time and position of a joint event differ from the formulae for intervals.
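The paired equations above are inverses of one another: transforming an event to the primed coordinates and back should recover the original values. A minimal sketch of this round trip follows; the function names, the event and the relative velocity of 0.75c are illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def to_primed(x, t, v, c=C):
    """x' and t' from x and t."""
    root = math.sqrt(1.0 - v ** 2 / c ** 2)
    return (x - v * t) / root, (t - (v / c ** 2) * x) / root


def to_unprimed(x_p, t_p, v, c=C):
    """x and t from x' and t' (note the changed sign of v)."""
    root = math.sqrt(1.0 - v ** 2 / c ** 2)
    return (x_p + v * t_p) / root, (t_p + (v / c ** 2) * x_p) / root


x, t, v = 1.2e6, 3.0e-3, 0.75 * C   # hypothetical event and relative velocity
x_p, t_p = to_primed(x, t, v)
x_back, t_back = to_unprimed(x_p, t_p, v)
print(x_p, t_p)
print(x_back, t_back)               # agrees with (x, t) up to rounding error
```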

A spacetime representation of the Lorentz Transformation

Spacetime representation of Lorentz Transformation for time
Bill and John are moving at a relative velocity, v, and synchronise clocks when they pass each other. Both Bill and John observe an event along Bill's direction of motion. What times will Bill and John assign to the event? It was shown above that the relativistic phase was given by: vx/c^2. This means that Bill will observe an extra amount of time elapsing on John's time axis due to the position of the event. Taking phase into account and using the time dilation equation Bill is going to observe that the amount of time his own clocks measure can be compared with John's clocks using:

T = \frac {t - vx/c^2}{\sqrt{1 - v^2/c^2}}.

This relationship between the times of a common event between reference frames is known as the Lorentz Transformation Equation for time.

Simultaneity, time dilation and length contraction

More about the relativity of simultaneity

Most physical theories assume that it is possible to synchronise clocks. If you set up an array of synchronised clocks over a volume of space and take a snapshot of all of them simultaneously, you will find that the one closest to you will appear to show a later time than the others, due to the time light needs to travel from each of the distant clocks towards you. However, if the correct clock positions are known, by taking the transmission time of light into account, one can easily compensate for the differences and synchronise the clocks properly. The possibility of truly synchronising clocks exists because the speed of light is constant and this constant velocity can be used in the synchronisation process (the use of the predictable delays when light is used for synchronising clocks is known as "Einstein synchronisation").

The Lorentz transformation for time compares the readings of synchronised clocks at any instant. It compares the actual readings on clocks allowing for any time delay due to transmitting information between observers and answers the question "what does the other observer's clock actually read now, at this moment". The answer to this question is shocking. The Lorentz transformation for time shows that the clocks in any frame of reference moving relative to you cease to be synchronised!

Inertialoverlay.GIF

The desynchronisation between relatively moving observers is illustrated below with a simpler diagram:

Rel1.gif

The effect of the relativity of simultaneity is for each observer to consider that a different set of events is simultaneous. The relativistic phase difference between clocks ("relativistic phase") means that observers who are moving relative to each other have different sets of things that are simultaneous, or in their "present moment". It is this discovery that time is no longer absolute that profoundly unsettles many students of relativity.

Coord6.gif

The amount by which the clocks differ between two observers depends upon the distance of the clock from the observer (t = xv/c^2 ). Notice that if both observers are part of inertial frames of reference with clocks that are synchronised at every point in space then the phase difference can be obtained by simply reading the difference between the clocks at the distant point and clocks at the origin. This difference will have the same value for both observers.

Example: flashing lights

Two lights that are stationary with respect to each other are separated by 3000 metres. The lights flash one after another with a time interval of 4 microseconds between the flashes. How fast would you need to travel along the line of the lights to see the lights flashing simultaneously?

Two successive flashes

This problem is a straightforward example of relativistic phase, events that are simultaneous for a moving observer being successive events in their own frame of reference. The phase difference between clocks with distance is given by:

\Delta t = vx/c^2

So v = \Delta tc^2/x

Dividing by c to express v as a fraction of the speed of light gives:

v_c = \Delta tc/x

Therefore:  v_c = \frac{4 \times 10^{-6} \times 3 \times 10^8}{3 \times 10^3}

Thus v_c = 0.4, that is, v = 0.4c

(See the "explosion" example in Scherr et al (2001))

Discuss the relationship between the times that a stationary and a moving observer will assign to the flashes in the example. The Lorentz transformation for the time of an event is t' = \gamma (t - vx/c^2); it is composed of an elapsed time, t, and a phase difference, vx/c^2. The flashing lights example is arranged so that t = t' = 0 when the observers in the two inertial frames of reference pass each other at the first flash, and so that the second flash occurs at time t in the stationary frame and at t' = 0 in the moving frame.

If t' = 0 then 0 = \gamma (t - vx/c^2) so, according to the Lorentz transformation the moving observer finds events to be simultaneous in the stationary inertial reference frame when t = vx/c^2, ie: when the elapsed time in the stationary frame is equal to the phase difference between the frames.
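As a sketch of this discussion, the time that the moving observer assigns to the second flash can be tabulated for several speeds. It assumes the rounded value c = 3 x 10^8 metres per second, that the first flash is at x = 0, t = 0, and that the observer travels from the first light towards the second. The function name moving_time is illustrative.

```python
import math

C = 3.0e8            # rounded speed of light in m/s, as in the example
SEPARATION = 3000.0  # metres between the lights
DELAY = 4.0e-6       # seconds between the flashes in the lights' frame


def moving_time(x, t, v, c=C):
    """Lorentz transformation for the time of an event."""
    return (t - v * x / c ** 2) / math.sqrt(1.0 - (v / c) ** 2)


for fraction in (0.0, 0.2, 0.4, 0.6, 0.8):
    t_second = moving_time(SEPARATION, DELAY, fraction * C)
    print(f"v = {fraction:.1f}c  second flash at t' = {t_second:+.3e} s")
```

Below 0.4c the second flash is still later than the first, at 0.4c the two are simultaneous to within rounding error, and above 0.4c the moving observer assigns the second flash an earlier time than the first.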

Scherr, R.E., Shaffer, P.S. and Vokos, S. Student understanding of time in special relativity: simultaneity and reference frames. Physics Education Research, American Journal of Physics Supplement, 69, S24-35 (2001)

Example: Relativistic Bug Capture

How fast would a 1000 m long spaceship need to travel to observe a nanosecond difference between stationary clocks aligned with its bow and stern?

From t = xv/c^2 :

10^{-9} = 10^3 v/(9 \times 10^{16})

Therefore v = 9 \times 10^4 ms^{-1} or 90 km per second.
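The same relation can be rearranged for either the speed or the length. A minimal sketch, assuming the rounded value of c used above and hypothetical helper names:

```python
C = 3.0e8  # speed of light in m/s (rounded, as in the text)


def speed_for_clock_offset(offset_s, length_m, c=C):
    """Solve t = x v / c^2 for the speed v."""
    return offset_s * c**2 / length_m


def length_for_clock_offset(offset_s, speed_m_s, c=C):
    """Solve t = x v / c^2 for the length x."""
    return offset_s * c**2 / speed_m_s


v = speed_for_clock_offset(1.0e-9, 1.0e3)
print(f"{v / 1000:.0f} km/s")
```

Calling length_for_clock_offset with a chosen clock difference and a speed of 9 \times 10^4 m/s gives the length of spaceship needed for that difference.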

This effect could be used to bring together bugs of different ages that are twins in their own inertial frame of reference:

How the relativistic phase difference between clocks in moving and stationary inertial frames provides a moving observer with simultaneous access to different times in the stationary frame.

Bill travels past Jim in a very long spaceship and simultaneously captures two bugs, one coincident with the bow of the spaceship and one coincident with the stern. Although the bugs are twins in their own frame of reference they are different ages in Bill's reference frame. The captured bugs are free to wander up and down Bill's spaceship and could meet each other.

What length of spaceship travelling at 90 km per second would be needed to capture the bugs in the picture?

The Andromeda Paradox

What do we mean when we say that events are occurring "now"? If we are looking out over a cityscape, watching the traffic, lots of things appear to be happening all at once. We can take a snapshot with a camera and the scene on the photo consists of all those things that happened at very nearly the same time. They are only "very nearly" at the same time because the events on the photo that were furthest away actually occurred slightly earlier than those that were nearby, because of the time taken for light to reach the camera. If we want to discover the events that really happened at the same time we need to subtract the time taken for the light to get to us. This correction would certainly be necessary if you were observing events on the Moon: if you were on the Earth and saw the time on a lunar clock you would know that the real time on the Moon was more than a second later. But would this be enough? What about the relativistic phase differences between clocks due to motion?

Special Relativity introduces yet another factor, in addition to the travel time of light, that upsets our knowledge of which events are simultaneous. The relativistic phase differences between clocks are tiny at the distance of the moon but have the startling consequence that at distances as large as our separation from nearby galaxies an observer who is driving on the earth can have a radically different set of events that are simultaneous with her "present moment" from another person who is standing on the earth. The classic example of this effect of relativistic phase is the "Andromeda Paradox", also known as the "Rietdijk-Putnam-Penrose" argument. Penrose described the argument:

"Two people pass each other on the street; and according to one of the two people, an Andromedean space fleet has already set off on its journey, while to the other, the decision as to whether or not the journey will actually take place has not yet been made. How can there still be some uncertainty as to the outcome of that decision? If to either person the decision has already been made, then surely there cannot be any uncertainty. The launching of the space fleet is an inevitability." (Penrose 1989).

The argument is illustrated below:

Rel2.gif

Notice that neither observer can actually "see" what is happening on Andromeda now. The argument is not about what can be "seen", it is purely about what different observers consider to be contained in their instantaneous present moment. The two observers observe the same, two million year old events in their telescopes but the moving observer must assume that events at the present moment on Andromeda are a day or two in advance of those in the present moment of the stationary observer. (Incidentally, the two observers see the same events in their telescopes because length contraction of the distance from Earth to Andromeda compensates exactly for the time difference on Andromeda.)
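The size of the effect follows from the phase relation t = vx/c^2. The sketch below is an estimate under stated assumptions: a distance of two million light years (taken from the "two million year old events" above) and a walking pace of 1 m/s, which is an illustrative choice and not a figure from the text.

```python
C = 299_792_458.0        # speed of light, m/s
DAYS_PER_YEAR = 365.25   # Julian year, used to define the light year


def simultaneity_shift_days(speed_m_s, distance_light_years):
    """Phase difference v x / c^2 expressed in days.

    With x = distance_light_years * c * (1 year), v x / c^2 reduces to
    (v / c) * distance_light_years years.
    """
    return (speed_m_s / C) * distance_light_years * DAYS_PER_YEAR


shift = simultaneity_shift_days(1.0, 2.0e6)
print(f"{shift:.1f} days")
```

The shift scales linearly with speed, so faster observers have correspondingly larger differences in what they regard as the present moment on Andromeda.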

This "paradox" has generated considerable philosophical debate on the nature of time and free-will. The advanced text of this book provides a discussion of some of the issues surrounding this geometrical interpretation of special relativity.

A result of the relativity of simultaneity is that if the car driver launches a space rocket towards the Andromeda galaxy it might have a several days head start compared with a space rocket launched from the ground. This is because the "present moment" for the moving car driver is progressively advanced with distance compared with the present moment on the ground. The present moment for the car driver is shown in the illustration below:

Rel3.gif

The net effect of the Andromeda paradox is that when someone is moving towards a distant point there are later events at that point than for someone who is not moving towards the distant point. There is a time gap between the events in the present moment of the two people.

The twin paradox

The "Twin Paradox" derives from an article by Langevin (1911) who used travel to a distant star and back to describe the relationships between times in different inertial reference frames. Langevin's original example was called the "Clock Paradox" and showed that a space traveller who travels to a distant star and back finds that he has aged less than the people who stay on Earth. Benguigui (2012) provides a detailed account of how, over time, this example became the story of two twins, one who travels out into space and one who stays at home, and how it became described by Weyl in 1922 as the "Twin Paradox". The "Twin Paradox" is an interesting example of the relativity of simultaneity and time dilation and deserves close study so that these features of Special Relativity can be understood. However, be warned, the reason that the "twin paradox" has attracted so much puzzlement is that, although superficially a simple problem, the analysis of the various inertial frames of reference is complex.

A one way trip

The twin "paradox" consists of two journeys, an outbound journey and a return journey. Much can be learnt about the relativity of simultaneity by considering just the outbound journey without any return. The single journey without any return might consist of the following scenario: Jim stays at home on Earth and Bill goes off in a spaceship, Bill flies past Jim at a velocity of 0.8c, they both set all of their clocks to zero as they pass each other and Bill flies straight to Mars where he drops off a record of his clock reading. Jim, who stays on Earth, finds that Bill's clocks record less time than his own for the journey.

The journey to Mars consists of two inertial frames of reference. In Jim's frame of reference Bill is moving and Jim is stationary. In Bill's frame of reference Jim, Earth and Mars move and Bill is stationary. There are only two reference frames so questions such as "what does Jim observe if he considers himself to be moving" are equivalent to asking "how does Jim move in Bill's frame of reference?".

The start of the journey is shown below:

Jim and Bill at the start of the journey

Suppose for ease of calculation Mars is assumed to be 100 light seconds away from Earth. Jim's view of the journey is straightforward and is shown below:

Jim's view of the journey from Earth to Mars

If Mars is 100 light seconds away, Jim will time Bill's journey to take 125 seconds (100/0.8). Relativistic time dilation will cause Bill's clocks to read 75 seconds, i.e.:

Bill's clocks will read: 125/\gamma

where gamma is: \frac{1}{\sqrt{1-\frac{v^2}{c^2}}} = 1.667

so  125/1.6667 = 75 seconds.

In Jim's inertial frame of reference Jim has aged by 125 seconds when Bill gets to Mars but Bill, as a result of time dilation, has only aged by 75 seconds.

The reason that this lack of aging by Bill seems to be paradoxical is that it might be thought that Jim must also have aged less than Bill, after all, Jim travels away from Bill as much as Bill travels away from Jim so Bill should find Jim to have aged less than himself in the same way as Jim finds Bill to have aged less than himself. In fact there is no paradox because Bill and Jim have very different ideas of the journey. Bill's view of the journey is shown below.

Bill's view of the journey from Earth to Mars

As a result of relativistic phase, Bill finds that the clocks on Mars start the journey 80 seconds ahead of his own. The Martian clocks record another 45 seconds for the journey. The 45 seconds elapsed time for the journey is what Bill expects when time dilation is taken into account (i.e. 75/\gamma = 75/1.6667 = 45 seconds). More surprising still, when Bill considers Jim's clock readings back on Earth he finds that only 45 seconds have elapsed on Jim's Earth-based clocks during the journey: the 80 seconds of relativistic phase difference means that Bill's clocks as he passes Mars are simultaneous with events on Earth that are only 45 seconds later than when he passed Earth.
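The clock readings for the outbound journey can be reproduced with a few lines of arithmetic. This is a sketch that works in light seconds and seconds (so that c = 1); the function and key names are illustrative.

```python
import math


def one_way_trip(distance_ls, beta):
    """Clock readings in seconds for a one way trip, with c = 1 light second per second."""
    contraction = math.sqrt(1.0 - beta**2)   # equals 1 / gamma
    jim_time = distance_ls / beta             # duration of the trip in Jim's frame
    bill_time = jim_time * contraction        # time elapsed on Bill's clocks
    phase = beta * distance_ls                # v x / c^2: lead of the Mars clocks in Bill's frame
    jim_elapsed_for_bill = bill_time * contraction  # time elapsed on Jim's clocks in Bill's frame
    return {
        "jim_time": jim_time,
        "bill_time": bill_time,
        "phase": phase,
        "jim_elapsed_for_bill": jim_elapsed_for_bill,
        "mars_clock_on_arrival": phase + jim_elapsed_for_bill,
    }


for name, value in one_way_trip(100.0, 0.8).items():
    print(f"{name}: {value:.1f} s")
```

The last entry shows that the 80 second phase lead plus the 45 seconds of dilated time gives the 125 seconds that the Martian clocks display when Bill arrives, which is the reading Jim expects.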

If Bill had an absurdly long rocket he could use a version of "relativistic bug capture" (see above) to net one of Jim's clocks the moment Bill reaches Mars. Bill could then show there was a Jim on Earth whose clocks had only progressed by 45 seconds for Bill's journey in Bill's inertial frame of reference. If Bill had captured one of Jim's clocks, the Jim who is simultaneous in his own frame of reference with the moment that Bill passes Mars would remember a giant net coming down from the tail of a spaceship and taking one of his clocks 45 seconds after Bill passed him.

Notice that there are two Jims in this story, one who is simultaneous with the Mars that Bill visits and another who is simultaneous with Bill in Bill's frame of reference. The Jim who is simultaneous with Bill in Bill's frame of reference would be an earlier Jim than the Jim who is simultaneous in his own frame of reference with Mars when Bill arrives.

The one way journey is symmetrical in the sense that Jim observes Bill to age less than himself during the journey and Bill also observes an earlier Jim to have aged less than himself.

The one way journey would become fully symmetrical if Bill continued past Mars until his clocks read that 125 seconds had elapsed; at this point he would assess that Jim had experienced 75 seconds of elapsed time. When Jim's clocks read 125 seconds Jim finds Bill's clocks read 75 seconds and, symmetrically, when Bill's clocks read 125 seconds Bill finds that Jim's clocks read 75 seconds. This symmetry is emphasised in the schematic diagram below which compares Jim and Bill's clock readings:

Schematic of Jim and Bill's clock readings

What happens when there is a change of velocity?

In the previous section it was shown that time dilation is symmetrical when observers separate at a constant velocity. The symmetry is evident in the way that, during the one way journey, Jim observes Bill's clocks to go slow and Bill also observes Jim's clocks to go slow. The symmetry between the observers means that Bill can regard himself as stationary and observing Jim departing or vice versa.

Jim and Bill's view of the turn

Bill finds that there are two Jims at two separate times involved in going to Mars then suddenly changing direction to return to Earth: these are labelled A and C in the diagram. The first Jim has clocks that read 45 seconds since Bill passed Earth, this is the Jim who is simultaneous with Bill in Bill's frame of reference as Bill passes Mars. The second Jim has clocks that read 205 seconds since meeting Bill, this Jim is simultaneous with Bill, in Bill's frame of reference, the moment after he has turned around at Mars and reached a velocity of -0.8c for the return journey.

The lack of any single Jim that is simultaneous with Bill when Bill changes velocity introduces an asymmetry into Special Relativity. Bill turning around at Mars to come back to Earth is not equivalent to Earth turning to meet Bill because it is not clear, until Bill has made the turn, which Earth and which Jim is making the journey. Langevin (1911), who first proposed the example of a traveller departing and returning younger, was well aware of the asymmetry and stated that: "Thus the asymmetry – which occurred because only the traveler, in the middle of his journey, has undergone an acceleration that changes the direction of his velocity".

In Special Relativity the laws of physics are the same for each observer in an inertial frame of reference. An inertial frame of reference might be all the clocks and measuring rods in a room on board a ship or in an entire city, the crucial feature of an inertial reference frame being that the clocks and measuring rods are stationary with respect to each other. Motions are measured so that the rest of the universe moves relative to the observer's inertial frame of reference. There are three inertial frames of reference in the example of Bill going to Mars and returning to Earth: Jim, outbound Bill and inbound Bill. Outbound and inbound Bill are separated by a period of varying velocity when Bill turns around. This period of changing velocity can also be regarded as a non-inertial hiatus in Bill's single inertial reference frame after which the relations between Bill's clocks and measuring rods and those in the rest of the universe have changed.

Questions such as "how does the Jim who is simultaneous with Bill after the turn view events if he regards himself as moving towards Bill?" are really about the inbound Bill's reference frame. In Jim's own inertial frame of reference Bill just goes to Mars, turns round and comes back again. Jim always regards himself as stationary unless his reference frame becomes non-inertial by experiencing a change in velocity in which case Jim would regard himself as moving from one stationary state to another that has a different set of relationships with the universe. Special Relativity holds that the laws of physics are the same in all inertial reference frames, it does not hold that all motion is relative, even in non-inertial changes.

The gap in time between the Jim who is simultaneous with Bill as Bill reaches Mars and the Jim who is simultaneous with Bill as he starts the journey back to Earth is known as the "Time Gap". The time gap in the "twin paradox" consists of the sum of the outgoing and incoming phase differences and in this case the time gap is 160 seconds.
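As a worked check of these figures, using the values of the example (v = 0.8c, x = 100 light seconds) and treating the turn as effectively instantaneous:

```latex
\begin{align*}
\text{time gap} &= \frac{vx}{c^2} + \frac{vx}{c^2}
  = 2 \times 0.8 \times 100\ \text{s} = 160\ \text{s} \\
\text{Jim's clock after the turn} &= 45\ \text{s} + 160\ \text{s} = 205\ \text{s}
\end{align*}
```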

The time gap description of the "twin paradox"

Once it is accepted that Bill and Jim have very different views of the journey these views can be summarised in the "Time Gap" description of the journey. In this description Bill flies to Mars and discovers that the clocks there are reading a later time than his own clock. He turns round to fly back to Earth and realises that the relativity of simultaneity means that, for Bill, the clocks on Earth will have jumped forward and are ahead of those on Mars: yet another "time gap" appears. When Bill gets back to Earth the time gaps and time dilations mean that people on Earth have recorded more clock ticks than he did.

For ease of calculation suppose that Bill is moving at a truly astonishing velocity of 0.8c in the direction of a distant point that is 100 light seconds away (about 30 million kilometres). The illustration below shows Jim and Bill's observations:

Overalljourney.jpg

From Bill's viewpoint there is both a time dilation and a phase effect. It is the added factor of "phase" that explains why, although the time dilation occurs for both observers, Bill observes the same readings on Jim's clocks over the whole journey as does Jim.

To summarise the mathematics of the twin paradox using the example:

Jim observes the distance as 100 light seconds and the distant point is in his frame of reference. According to Jim it takes Bill the following time to make the journey:

Time taken = distance / velocity therefore according to Jim:
t = 100/0.8 = 125 seconds
Again according to Jim, time dilation should affect the observed time on Bill's clocks:
T = t \times \sqrt {1 - v^2/c^2} so:
T = 125 \times \sqrt {1 - 0.8^2} = 75 seconds

So for Jim the round trip takes 250 secs and Bill's clock reads 150 secs.

Bill measures the distance as:

X = x \times \sqrt {1 - v^2/c^2} =  100 \times \sqrt {1 - 0.8^2} = 60 light seconds.
For Bill it takes X/v = 60/0.8 = 75 seconds.

Bill observes Jim's clocks to appear to run slow as a result of time dilation:

t^' = T \times \sqrt {1 - v^2/c^2} so:
t^' = 75 \times \sqrt {1 - 0.8^2} = 45 seconds

But there is also a time gap of vx/c^2 = 80 seconds.

So for Bill, Jim's clocks register 125 secs have passed from the start to the distant point. This is composed of 45 secs elapsing on Jim's clocks at the turn round point plus an 80 secs time gap from the start of the journey. Bill sees 250 secs total time recorded on Jim's clocks over the whole journey, this is the same time as Jim observes on his own clocks.
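The bookkeeping for the whole journey can be sketched in the same way. The code below assumes an instantaneous turn and works in light seconds and seconds (c = 1); the names are illustrative.

```python
import math


def round_trip(distance_ls, beta):
    """Round trip totals in seconds, assuming an instantaneous turn and c = 1."""
    contraction = math.sqrt(1.0 - beta**2)              # sqrt(1 - v^2/c^2)
    jim_total = 2.0 * distance_ls / beta                 # Jim's own clocks
    bill_total = jim_total * contraction                 # Bill's clocks
    bill_distance = distance_ls * contraction            # contracted distance X
    dilated_per_leg = (bill_total / 2.0) * contraction   # Jim's clocks per leg, in Bill's frame
    time_gap = 2.0 * beta * distance_ls                  # outgoing plus incoming phase
    return {
        "jim_total": jim_total,
        "bill_total": bill_total,
        "bill_distance": bill_distance,
        "dilated_per_leg": dilated_per_leg,
        "time_gap": time_gap,
        "jim_total_for_bill": 2.0 * dilated_per_leg + time_gap,
    }


for name, value in round_trip(100.0, 0.8).items():
    print(f"{name}: {value:.1f}")
```

The final entry is Bill's account of Jim's clocks, 45 + 160 + 45 seconds, and it agrees with the 250 seconds that Jim records for himself.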


Further reading:

Benguigui, L (2012). A tale of two twins. arXiv:1212.4414v1

Bohm, D. The Special Theory of Relativity (W. A. Benjamin, 1965).

D’Inverno, R. Introducing Einstein’s Relativity (Oxford University Press, 1992).

Eagle, A. A note on Dolby and Gull on radar time and the twin "paradox". American Journal of Physics. 2005, VOL 73; NUMB 10, pages 976-978. http://arxiv.org/PS_cache/physics/pdf/0411/0411008v2.pdf

Langevin, P. (1911) L'Evolution de l'espace et du temps Scientia 10 (1911), 31-34

The nature of length contraction

According to special relativity items such as measuring rods consist of events distributed in space and time and a three dimensional rod is the events that compose the rod at a single instant. However, from the relativity of simultaneity it is evident that two observers in relative motion will have different sets of events that are present at a given instant. This means that two observers moving relative to each other will usually be observing measuring rods that are composed of different sets of events. If the word "rod" means the three dimensional form of the object called a rod then these two observers in relative motion observe different rods.

The way that measuring rods differ between observers can be seen by using a Minkowski diagram. The area of a Minkowski diagram that corresponds to all of the events that compose an object over a period of time is known as the worldtube of the object. It can be seen in the image below that length contraction is the result of individual observers having different sections of an object's worldtube in their present instant.

Relcontract.gif

(It should be recalled that the longest lengths on space-time diagrams are often the shortest in reality).

It is sometimes said that length contraction occurs because objects rotate into the time axis. This is actually a half truth, there is no actual rotation of a three dimensional rod, instead the observed three dimensional slice of a four dimensional rod is changed which makes it appear as if the rod has rotated into the time axis. In special relativity it is not the rod that rotates into time, it is the observer's slice of the worldtube of the rod that rotates.

There can be no doubt that the three dimensional slice of the worldtube of a rod does indeed have different lengths for relatively moving observers. The issue of whether or not the events that compose the worldtube of the rod are always existent is a matter for philosophical speculation.

Further reading:

Vesselin Petkov. (2005) Is There an Alternative to the Block Universe View?

Dragan V Redžić (2010). Relativistic length agony continued

More about time dilation

The term "time dilation" is applied to the way that observers who are moving relative to you record fewer clock ticks between events than you. In special relativity this is not due to properties of the clocks, such as their mechanisms getting heavier. Indeed, it should not even be said that the clocks tick faster or slower because what is truly occurring is that the clocks record shorter or longer elapsed times and this recording of elapsed time is independent of the mechanism of the clocks. The differences between clock readings are due to the clocks traversing shorter or longer distances between events along an observer's path through spacetime. This can be seen most clearly by re-examining the Andromeda Paradox.

Suppose Bill passes Jim at high velocity on the way to Mars. Jim has previously synchronised the clocks on Mars with his Earth clocks but for Bill the Martian clocks read times well in advance of Jim's. This means that Bill has a head start because his present instant contains what Jim considers to be the Martian future. Jim observes that Bill travels through both space and time and expresses this observation by saying that Bill's clocks recorded fewer ticks than his own. Bill achieves this strange time travel by having what Jim considers to be the future of distant objects in his present moment. Bill is literally travelling into future parts of Jim's frame of reference.

In special relativity time dilation and length contraction are not material effects, they are physical effects due to travel within a four dimensional spacetime. The mechanisms of the clocks and the structures of measuring rods are irrelevant.

It is important for advanced students to be aware that special relativity and General Relativity differ about the nature of spacetime. General Relativity, in the form championed by Einstein, avoids the idea of extended space and time and is what is known as a "relationalist" theory of physics. Special relativity, on the other hand, is a theory where extended spacetime is pre-eminent. The brilliant flowering of physical theory in the early twentieth century has tended to obscure this difference because, within a decade, special relativity had been subsumed within General Relativity. The interpretation of special relativity that is presented here should be learnt before advancing to more advanced interpretations.

The Pole-barn paradox

The length contraction in relativity is symmetrical. When two observers in relative motion pass each other they both measure a contraction of length.

Rellencon.gif

(Note that Minkowski's metric involves the subtraction of displacements in time, so what appear to be the longest lengths on a 2D sheet of paper are often the shortest lengths in a (3+1)D reality).

The symmetry of length contraction leads to two questions. Firstly, how can a succession of events be observed as simultaneous events by another observer? This question led to the concept of de Broglie waves and quantum theory. Secondly, if a rod is simultaneously between two points in one frame how can it be observed as being successively between those points in another frame? For instance, if a pole enters a building at high speed how can one observer find it is fully within the building and another find that the two ends of the rod are opposed to the two ends of the building at successive times? What happens if the rod hits the end of the building? The second question is known as the "pole-barn paradox" or "ladder paradox".

Polebarn.GIF

The pole-barn paradox states the following: suppose a superhero runs at 0.75c carrying a horizontal pole 15 m long towards a barn 10 m long, with front and rear doors. When the runner and the pole are inside the barn, a ground observer closes and then opens both doors (by remote control) so that the runner and pole are momentarily captured inside the barn and then proceed to exit the barn from the back door.

One may be surprised to see a 15-m pole fit inside a 10-m barn. But the pole is in motion with respect to the ground observer, who measures the pole to be contracted to a length of 9.9 m (check using equations).

The “paradox” arises when we consider the runner’s point of view. The runner sees the barn contracted to 6.6 m. Because the pole is in the rest frame of the runner, the runner measures it to have its proper length of 15 m. Now, how can our superhero make it safely through the barn?

The resolution of the “paradox” lies in the relativity of simultaneity. The closing of the two doors is measured to be simultaneous by the ground observer. However, since the doors are at different positions, the runner says that they do not close simultaneously. The rear door closes and then opens first, allowing the leading edge of the pole to exit. The front door of the barn does not close until the trailing edge of the pole passes by.
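The figures quoted above, and the order in which the doors close for the runner, can be checked with the Lorentz transformation. The sketch assumes that the ground observer closes both doors at t = 0, with the front door at x = 0 and the rear door at x = 10 m; these coordinates are choices made here for illustration.

```python
import math

C = 299_792_458.0   # speed of light, m/s

beta = 0.75          # runner's speed as a fraction of c
pole_rest = 15.0     # proper length of the pole, m
barn_rest = 10.0     # proper length of the barn, m
gamma = 1.0 / math.sqrt(1.0 - beta**2)

pole_in_ground_frame = pole_rest / gamma   # contracted pole, ground observer
barn_in_runner_frame = barn_rest / gamma   # contracted barn, runner


def runner_time(t, x):
    """Time of the ground-frame event (t, x) in the runner's frame."""
    return gamma * (t - beta * x / C)


front_close = runner_time(0.0, 0.0)        # front door closes (ground t = 0)
rear_close = runner_time(0.0, barn_rest)   # rear door closes (ground t = 0)

print(f"pole for ground observer: {pole_in_ground_frame:.2f} m")
print(f"barn for runner:          {barn_in_runner_frame:.2f} m")
print(f"rear door closes {abs(rear_close - front_close) * 1e9:.1f} ns before the front door")
```

The negative time for the rear door in the runner's frame is the numerical form of the statement that the rear door closes and opens first.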

What happens if the rear door is kept closed and made out of some impenetrable material? Can we or can we not trap the rod inside the barn by closing the front door while the whole rod is inside according to a ground observer? When the front end of the rod hits the rear door, information about this impact will travel backwards along the rod in the form of a shock wave. The information cannot travel faster than c, so the rear end of the rod will continue to travel forward at its original speed until the wave reaches it. Even if the shock wave is traveling at the speed of light, it will not reach the rear end of the rod until after the rear end has passed through the front door even in the runner's frame. Therefore the whole rod (albeit quite scrunched up) will be inside the barn when the front door closes. If it is infinitely elastic, it will end up compressed and "spring loaded" against the inside of the closed barn.

Evidence for length contraction, the field of an infinite straight current

Length contraction can be directly observed in the field of an infinitely long straight current. This is shown in the illustration below.

Relelectro.GIF

Non-relativistic electromagnetism describes the electric field at a distance r from an infinitely long line of charge, with charge per unit length \lambda, using:

E = \frac{\lambda}{2 \pi \epsilon_0 r}

and describes the magnetic field due to an infinitely long straight current using the Biot-Savart law:

B = \frac{\mu_0 I}{2 \pi r}

Or using the charge density (from I = \lambda v where \lambda is the charge per unit length of the moving charges):

B = \frac{\mu_0 \lambda v}{2 \pi r}

Using relativity it is possible to show that the formula for the magnetic field given above can be derived using the relativistic effect of length contraction on the electric field and so what we call the "magnetic" field can be understood as relativistic observations of a single phenomenon. The relativistic calculation is given below.

If Jim is moving relative to the wire at the same velocity as the negative charges he sees the wire contracted relative to Bill:

l_+ = l \sqrt{1 - v^2/c^2}

Bill should see the space between the charges that are moving along the wire to be contracted by the same amount but the requirement for electrical neutrality means that the moving charges will be spread out to match those in the frame of the fixed charges in the wire.

This means that Jim sees the negative charges spread out so that:

l_- = \frac{l}{\sqrt{1 - v^2/c^2}}

The net charge density observed by Jim is:

\lambda = \frac{q}{l_-} - \frac{q}{l_+}

Substituting:

\lambda = \frac{q}{l} ( \sqrt{1 - v^2/c^2} - \frac{1}{\sqrt{1 - v^2/c^2}})

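Using the binomial expansion to first order in v^2/c^2, so that \sqrt{1 - v^2/c^2} \approx 1 - \frac{v^2}{2c^2} and \frac{1}{\sqrt{1 - v^2/c^2}} \approx 1 + \frac{v^2}{2c^2}:

\lambda \approx \frac{q}{l} \left( 1 - \frac{v^2}{2c^2} - 1 - \frac{v^2}{2c^2} \right)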

Therefore, allowing for a net positive charge, the positive charges being fixed:

\lambda = \frac{qv^2}{l c^2}

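Writing \lambda from here on for the charge per unit length q/l of the current carriers, so that the net charge density is \lambda v^2/c^2, the electric field at Jim's position is given by: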

E = \frac{\lambda v^2}{2 \pi \epsilon_0 r c^2}

The force due to the electrical field at Jim's position is given by F = Eq which is:

F = \frac{q \lambda v^2}{2 \pi \epsilon_0 r c^2}

Now, from classical electromagnetism:

c^2 = \frac{1}{\epsilon_0 \mu_0}

So substituting this into F = \frac{q \lambda v^2}{2 \pi \epsilon_0 r c^2}:

(1) F = \frac{q\mu_0 \lambda v^2}{2 \pi r}

This is the formula for the relativistic electric force that is observed by Bill as a magnetic force. How does this compare with the non-relativistic calculation of the magnetic force? The force on a charge at Jim's position due to the magnetic field is, from the classical formula:

F = Bqv

Which from the Biot-Savart law is:

(2) F = \frac{q\mu_0 \lambda v^2}{2 \pi r}

which shows that the same formula applies for the relativistic excess electrical force experienced by Jim as the formula for the classical magnetic force.
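The agreement between the two formulas can also be checked numerically without the binomial approximation. The sketch below is an illustration under stated assumptions: the charge, line density and distance are hypothetical values, the conventional value 4\pi \times 10^{-7} H/m is used for \mu_0, and a speed of 0.01c is chosen because at realistic drift velocities the difference of the two square roots is lost to floating point rounding.

```python
import math

C = 299_792_458.0                 # speed of light, m/s
MU_0 = 4.0e-7 * math.pi           # magnetic constant, H/m (conventional value)
EPSILON_0 = 1.0 / (MU_0 * C**2)   # from c^2 = 1 / (epsilon_0 mu_0)


def electric_force_jim(q, lam, v, r):
    """Force from the net line charge seen by Jim, with no binomial approximation."""
    root = math.sqrt(1.0 - (v / C) ** 2)
    net_density = lam * (1.0 / root - root)   # magnitude of the net line charge density
    return q * net_density / (2.0 * math.pi * EPSILON_0 * r)


def magnetic_force_bill(q, lam, v, r):
    """Classical F = B q v with B = mu_0 lambda v / (2 pi r)."""
    return q * MU_0 * lam * v**2 / (2.0 * math.pi * r)


q = 1.602e-19    # test charge, C (hypothetical value)
lam = 1.0e-6     # line charge density of the carriers, C/m (hypothetical value)
r = 0.01         # distance from the wire, m (hypothetical value)
v = 0.01 * C     # speed of Jim and of the carriers

ratio = electric_force_jim(q, lam, v, r) / magnetic_force_bill(q, lam, v, r)
gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
print(f"ratio of forces = {ratio:.8f}")
print(f"gamma           = {gamma:.8f}")
```

The ratio is not exactly 1 but equals \gamma, since \frac{1}{\sqrt{1 - v^2/c^2}} - \sqrt{1 - v^2/c^2} = \gamma v^2/c^2. The first order expansion in the text sets this factor to 1, which is an excellent approximation at the drift velocities found in wires.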

It can be seen that once the idea of space-time is understood the unification of the two fields is straightforward. Jim is moving relative to the wire at the same speed as the negatively charged current carriers so Jim only experiences an electric field. Bill is stationary relative to the wire and observes that the charges in the wire are balanced whereas Jim observes an imbalance of charge. Bill assigns the attraction between Jim and the current carriers to a "magnetic field".

It is important to notice that, in common with the explanation of length contraction given above, the events that constitute the stream of negative charges for Jim are not the same events as constitute the stream of negative charges for Bill. Bill and Jim's negative charges occupy different moments in time.

Incidentally, the drift velocity of electrons in a wire is about a millimetre per second but a huge charge is available in a wire (see the links below).

Further reading:

Purcell, E. M. Electricity and Magnetism. Berkeley Physics Course. Vol. 2. 2nd ed. New York, NY: McGraw-Hill. 1984. ISBN: 0070049084.

Useful links:

http://hyperphysics.phy-astr.gsu.edu/hbase/electric/ohmmic.html

http://hyperphysics.phy-astr.gsu.edu/hbase/relativ/releng.html

De Broglie waves

De Broglie noticed that the differing three dimensional sections of the universe would cause oscillations in the rest frame of an observer to appear as wave trains in the rest frame of observers who are moving.

Reldebroglie.gif

He combined this insight with Einstein's ideas on the quantisation of energy to create the foundations of quantum theory. De Broglie's insight is also a round-about proof of the description of length contraction given above - observers in relative motion have differing three dimensional slices of a four dimensional universe. The existence of matter waves is direct experimental evidence of the relativity of simultaneity.
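A minimal sketch of the argument, assuming an oscillation of angular frequency \omega_0 in the rest frame of a particle of rest mass m and taking \hbar \omega_0 = mc^2 as the quantisation condition. The symbols \omega_0, \omega, k and m are introduced here for illustration, and \lambda in this block denotes a wavelength, not a charge density.

```latex
\begin{align*}
\cos(\omega_0 t') &= \cos\!\left(\gamma \omega_0 \left(t - \frac{vx}{c^2}\right)\right)
  = \cos(\omega t - kx), \\
\omega &= \gamma \omega_0, \qquad k = \frac{\gamma \omega_0 v}{c^2}, \\
\lambda &= \frac{2\pi}{k} = \frac{2\pi c^2}{\gamma \omega_0 v} = \frac{h}{\gamma m v}
  \quad \text{when } \hbar \omega_0 = mc^2 .
\end{align*}
```

The spatial dependence of the wave comes entirely from the phase term vx/c^2 in the Lorentz transformation of time, which is why the wavelength can be read as a consequence of the relativity of simultaneity.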

Constuddebroglie.gif

Further reading: de Broglie, L. (1925) On the theory of quanta. A translation of: RECHERCHES SUR LA THEORIE DES QUANTA (Ann. de Phys., 10e série, t. III (Janvier-Février 1925)) by A. F. Kracklauer. http://replay.web.archive.org/20090509012910/http://www.ensmp.fr/aflb/LDB-oeuvres/De_Broglie_Kracklauer.pdf

Bell's spaceship paradox

Bell devised a thought experiment called the "Spaceship Paradox" to enquire whether length contraction involved a force and whether this contraction was a contraction of space. In the Spaceship Paradox two spaceships are connected by a thin, stiff string and are both equally and linearly accelerated to a velocity v relative to the ground, at which, in the special relativity version of the paradox, the acceleration ceases. The acceleration on both spaceships is arranged to be equal according to ground observers so, according to observers on the ground, the spaceships will stay the same distance apart. It is asked whether the string would break.

It is useful when considering this problem to investigate what happens to a single spaceship. If a spaceship that has rear thrusters is accelerated linearly, according to ground observers, to a velocity v then the ground observers will observe it to have contracted in the direction of motion. The acceleration experienced by the front of the spaceship will have been slightly less than the acceleration experienced by the rear of the spaceship during contraction and then would suddenly reach a high value, equalising the front and rear velocities, once the rear acceleration and increasing contraction had ceased. From the ground it would be observed that overall the acceleration at the rear could be linear but the acceleration at the front would be non-linear.

In Bell's thought experiment both spaceships are artificially constrained to have constant acceleration, according to the ground observers, until the acceleration ceases. Sudden adjustments are not allowed. Furthermore no difference between the accelerations at the front and rear of the assembly is permitted, so any tendency towards contraction would need to be borne as tension and extension in the string.

The most interesting part of the paradox is what happens to the space between the ships. From the ground the spaceships will stay the same distance apart (the experiment is arranged to achieve this) whilst according to observers on the spaceships they will appear to become increasingly separated. This implies that acceleration is not invariant between reference frames (see Part II) and the force applied to the spaceships will indeed be affected by the difference in separation of the ships observed by each frame.

The section on the nature of length contraction above shows that as the string changes velocity the observers on the ground observe a changing set of events that compose the string. These new events define a string that is shorter than the original. This means that the string will indeed attempt to contract as observed from the ground and will be drawn out under tension as observed from the spaceships. If the string were unable to bear the extension and tension in the moving frame or the tension in the rest frame it would break.

Another interesting aspect of Bell's Spaceship Paradox is that in the inertial frames of the ships, owing to the relativity of simultaneity, the lead spaceship will always be moving slightly faster than the rear spaceship so the spaceship-string system does not form a true inertial frame of reference until the acceleration ceases in the frames of reference of both ships. The asynchrony of the cessation of acceleration shows that the lead ship reaches the final velocity before the rear ship in the frame of reference of either ship. However, this time difference is very slight (less than the time taken for an influence to travel down the string at the speed of light x/c > vx/c^2 ).

It is necessary at this stage to give a warning about extrapolating special relativity into the domain of general relativity (GR). SR cannot be applied with confidence to accelerating systems which is why the comments above have been confined to qualitative observations.

Further reading

Bell, J. S. (1976). Speakable and unspeakable in quantum mechanics. Cambridge University Press 1987 ISBN 0-521-52338-9

Hsu, J-P and Suzuki, N. (2005) Extended Lorentz Transformations for Accelerated Frames and the Solution of the “Two-Spaceship Paradox” AAPPS Bulletin October 2005 p.17 http://www.aapps.org/archive/bulletin/vol15/15-5/15_5_p17p21%7F.pdf

Matsuda, T and Kinoshita, A (2004). A Paradox of Two Space Ships in Special Relativity. AAPPS Bulletin February 2004 p3. http://www.aapps.org/archive/bulletin/vol14/14_1/14_1_p03p07.pdf

The transverse doppler effect

The existence of time dilation means that light emitted from a moving source should be red shifted even when it is observed in directions that are perpendicular to the direction of motion, where there is no ordinary doppler shift due to the source approaching or receding. The transverse doppler effect is given by:

\nu = \nu^{\prime} \sqrt{ 1 - \frac{v^2}{c^2}}

Where \nu is the observed frequency and \nu^{\prime} is the frequency if the source were stationary relative to the observer (the proper frequency).

This effect was first confirmed by Ives and Stilwell in 1938. The transverse doppler effect is a purely relativistic effect and is often cited as experimental evidence that time dilation occurs.
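The formula above can be evaluated numerically. The following is a minimal sketch, assuming SI units; the function name and the example source (500 THz, moving at a tenth of the speed of light) are illustrative choices and are not taken from the text.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def transverse_doppler(nu_proper, v):
    """Observed frequency for a source moving at speed v,
    viewed perpendicular to its direction of motion."""
    return nu_proper * math.sqrt(1.0 - (v / C) ** 2)

# Hypothetical example: a 500 THz source moving at 10% of c.
nu_proper = 5.0e14
nu_observed = transverse_doppler(nu_proper, 0.1 * C)
print(nu_observed / nu_proper)  # ratio slightly below 1 (a red shift)
```

The ratio printed is simply the reciprocal of the Lorentz factor, so the shift only becomes appreciable at speeds that are a sizeable fraction of c.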

Relativistic transformation of angles

If a rod makes an angle with its direction of motion toward or away from an observer the component of its length in the direction of motion will be contracted. This means that observed angles are also transformed during changes of frames of reference. Assuming that motion occurs along the x-axis, suppose the rod has a proper length (rest length) of L^{\prime} metres and makes an angle of \theta^{\prime} degrees with the x'-axis in its rest frame. The tangent of the angle made with the axes is:

Tangent in rest frame of rod = \tan \theta^{\prime} = \frac{L^{\prime}_y}{L^{\prime}_x}
Tangent in observer's frame = \tan \theta = \frac{L_y}{L_x}
Therefore:
\frac {\tan \theta}{\tan \theta^{\prime}} =  \frac {L_y L^{\prime}_x} {L^{\prime}_y L_x}
But L_x = L^{\prime}_x \sqrt {1 - v^2/c^2}
And L_y = L^{\prime}_y
So
\tan \theta =\frac {\tan \theta^{\prime}} {\sqrt {1 - v^2/c^2}}

Showing that angles with the direction of motion are observed to increase with velocity.
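As a rough numerical illustration of this result, the sketch below evaluates the transformed angle for a rod at 45 degrees in its rest frame. The function name is hypothetical, the speed is expressed as the fraction beta = v/c, and the rest-frame angle is assumed to lie between 0 and 90 degrees.

```python
import math

def observed_angle(theta_rest_deg, beta):
    """Angle (in degrees) that a moving rod makes with the x-axis
    in the observer's frame, where beta = v/c."""
    tan_rest = math.tan(math.radians(theta_rest_deg))
    tan_observed = tan_rest / math.sqrt(1.0 - beta ** 2)
    return math.degrees(math.atan(tan_observed))

# The observed angle grows as the relative speed increases.
for beta in (0.0, 0.3, 0.6, 0.9):
    print(beta, observed_angle(45.0, beta))
```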

The angle made by a moving object with the x-axis also involves a transformation of velocities to calculate the correct angle of incidence.

Addition of velocities

How can two observers, moving at v m/sec relative to each other, compare their observations of the velocity of a third object?

Relvel.gif

Suppose one of the observers measures the velocity of the object as u^' where:

u^' =  \frac{x^'}{t^'}

The coordinates x^' and t^' are given by the Lorentz transformations:

x^' = \frac{x - vt}{\sqrt{(1 - v^2/c^2)}}

and

t^' = \frac{t - (v/c^2)x}{\sqrt{(1 - v^2/c^2)}}

but

x^' =  u^' t^'

so:

\frac{x - vt}{\sqrt{(1 - v^2/c^2)}} = u^' \frac{t - (v/c^2)x}{\sqrt{(1 - v^2/c^2)}}

and hence:

x - vt = u^' ( t - vx/c^2)

Notice the role of the phase term vx/c^2. The equation can be rearranged as:

x = \frac{(u^' + v)}{(1 + u^'v/c^2)} t

given that x =  u t:

u = \frac{(u^' + v)}{(1 + u^'v/c^2)}

This is known as the relativistic velocity addition theorem. It applies to velocities parallel to the direction of mutual motion.

The existence of time dilation means that even when objects are moving perpendicular to the direction of motion there is a discrepancy between the velocities reported for an object by observers who are moving relative to each other. If there is any component of velocity in the x direction ({u_x} , {{u^'}_x}) then the phase affects time measurement and hence the velocities perpendicular to the x-axis. The table below summarises the relativistic addition of velocities in the various directions in space.

{u^'}_x = \frac{(u_x - v)}{(1 - u_x v/c^2)}

u_x = \frac{({u^'}_x + v)}{(1 + {u^'}_x v/c^2)}

{u^'}_y = \frac{u_y \sqrt{1 - v^2/c^2}}{(1 - u_x v/c^2)}

u_y = \frac{{u^'}_y \sqrt{1 - v^2/c^2}}{(1 + {u^'}_x v/c^2)}

{u^'}_z = \frac{u_z \sqrt{1 - v^2/c^2}}{(1 - u_x v/c^2)}

u_z = \frac{{u^'}_z \sqrt{1 - v^2/c^2}}{(1 + {u^'}_x v/c^2)}

Notice that for an observer in another reference frame the sum of two velocities (u and v) can never exceed the speed of light. This means that the speed of light is the maximum velocity in any frame of reference.
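The velocity transformations listed above can be tried out numerically. This is a minimal sketch in SI units; the function names are illustrative and the test velocities are arbitrary choices.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def add_parallel(u_prime, v):
    """Velocity u along x in the unprimed frame, given u' in a frame
    moving at v along x (the velocity addition theorem)."""
    return (u_prime + v) / (1.0 + u_prime * v / C ** 2)

def transform_velocity(ux, uy, uz, v):
    """Components (u'_x, u'_y, u'_z) in a frame moving at v along x."""
    denom = 1.0 - ux * v / C ** 2
    root = math.sqrt(1.0 - v ** 2 / C ** 2)
    return ((ux - v) / denom, uy * root / denom, uz * root / denom)

# Two velocities of 0.9c combine to a value that is still below c.
print(add_parallel(0.9 * C, 0.9 * C) / C)

# Light travelling along y keeps speed c in the moving frame.
ux, uy, uz = transform_velocity(0.0, C, 0.0, 0.5 * C)
print(math.sqrt(ux ** 2 + uy ** 2 + uz ** 2) / C)
```

In the second example the direction of the light ray changes between frames but its speed does not, which is the behaviour the transverse formulas are built to preserve.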

Dynamics

Introduction

The way that the velocity of a particle can differ between observers who are moving relative to each other means that momentum needs to be redefined as a result of relativity theory.

The illustration below shows a typical collision of two particles. In the right hand frame the collision is observed from the viewpoint of someone moving at the same velocity as one of the particles, in the left hand frame it is observed by someone moving at a velocity that is intermediate between those of the particles.

Relcollision.gif

If momentum is redefined then all the variables such as force (rate of change of momentum), energy etc. will become redefined and relativity will lead to an entirely new physics. The new physics has an effect at the ordinary level of experience through the relation K = \gamma m c^2 - m c^2\, whereby it is the tiny deviations in gamma from unity that are expressed as everyday kinetic energy so that the whole of physics is related to "relativistic" reasoning rather than Newton's empirical ideas.

Momentum

In physics momentum is conserved within a closed system: this is the law of conservation of momentum. Consider the special case of identical particles colliding symmetrically as illustrated below:

Relcollision2.gif

The momentum change by the red ball is:

2m\mathbf{u_{yR}}

The momentum change by the blue ball is:

-2m\mathbf{u_{yB}}

The situation is symmetrical so the Newtonian conservation of momentum law is demonstrated:

2m\mathbf{u_{yR}}=2m\mathbf{u_{yB}}

Notice that this result depends upon the y components of the velocities being equal, that is, \mathbf{u_{yR}}=\mathbf{u_{yB}}.

The relativistic case is rather different. The collision is illustrated below: the left hand frame shows the collision as it appears for one observer and the right hand frame shows exactly the same collision as it appears for another observer moving at the same velocity as the blue ball:

Relcollision3.gif

The configuration shown above has been simplified because one frame contains a stationary blue ball (ie: u_{xB}=0) and the velocities are chosen so that the vertical velocity of the red ball is exactly reversed after the collision ie: u_{yR}^' = -u_{yB}^'. Both frames show exactly the same event; it is only the observers who differ between frames. The relativistic velocity transformations between frames are:

u_{yR}^' = \frac{u_{yR} \sqrt{1 - v^2/c^2}}{1 - u_{xR}v/c^2}

u_{yB}^' =\frac{u_{yB} \sqrt{1 - v^2/c^2}}{1 - u_{xB}v/c^2}= u_{yB} \sqrt{1 - v^2/c^2} given that u_{xB}=0\,.

Suppose that the y components are equal in one frame, in Newtonian physics they will also be equal in the other frame. However, in relativity, if the y components are equal in one frame they are not necessarily equal in the other frame (time dilation is not directional so perpendicular velocities differ between the observers). For instance if u_{yR}^' = u_{yB}^' then:

u_{yB} = \frac{u_{yR}}{1 - u_{xR}v/c^2}

So if u_{yR}^' = u_{yB}^' then in this case u_{yR} \ne u_{yB}.

If the mass were constant between collisions and between frames then although 2 m\mathbf{u_{yR}^'} = 2m\mathbf{u_{yB}^'} it is found that:

2 m\mathbf{u_{yR}} \ne 2m\mathbf{u_{yB}}

So momentum defined as mass times velocity is not conserved in a collision when the collision is described in frames moving relative to each other. Notice that the discrepancy is very small if u_{xR} and v are small.

To preserve the principle of momentum conservation in all inertial reference frames, the definition of momentum has to be changed. The new definition must reduce to the Newtonian expression when objects move at speeds much smaller than the speed of light, so as to recover the Newtonian formulas.

The velocities in the y direction are related by the following equation when the observer is travelling at the same velocity as the blue ball ie: when  u_{xB} = 0\,:

u_{yB} = \frac{u_{yR}}{1 - u_{xR}v/c^2}

If we write m_B for the mass of the blue ball and m_R for the mass of the red ball as observed from the frame of the blue ball then, if the principle of relativity applies:

2 m_R u_{yR} = 2 m_B u_{yB} \,

So:

m_R = m_B \frac{u_{yB}}{u_{yR}}

But:

u_{yB} = \frac{u_{yR}}{1 - u_{xR}v/c^2}

Therefore:

m_R = \frac{m_B}{1 - u_{xR}v/c^2}

This means that, if the principle of relativity is to apply then the mass must change by the amount shown in the equation above for the conservation of momentum law to be true.

The reference frame was chosen so that u_{yR}^' = -u_{yB}^' and hence  u_{xR}^' = v. This allows v to be determined in terms of u_{xR}\,:

u_{xR}^' = \frac{u_{xR} - v}{1 - u_{xR}v/c^2} = v

and hence:

v = \frac{c^2}{u_{xR}}(1 - \sqrt{1-u_{xR}^2/c^2})

So substituting for v in m_R = \frac{m_B}{1 - u_{xR}v/c^2}:

m_R = \frac{m_B}{\sqrt{1 - u_{xR}^2/c^2}}
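The algebra in the last few steps can be checked numerically. The sketch below works in units where c = 1 and picks u_{xR} = 0.8 purely as an illustrative value; the function name is hypothetical.

```python
import math

def frame_speed(u_xR, c=1.0):
    """Speed v of the symmetric frame, from the expression for v above."""
    return (c ** 2 / u_xR) * (1.0 - math.sqrt(1.0 - u_xR ** 2 / c ** 2))

c = 1.0
u_xR = 0.8
v = frame_speed(u_xR, c)

# The transformed x velocity of the red ball should equal v.
u_xR_prime = (u_xR - v) / (1.0 - u_xR * v / c ** 2)
print(v, u_xR_prime)

# The mass ratio 1 / (1 - u_xR v / c^2) should equal 1 / sqrt(1 - u_xR^2 / c^2).
print(1.0 / (1.0 - u_xR * v / c ** 2), 1.0 / math.sqrt(1.0 - u_xR ** 2 / c ** 2))
```

Both printed pairs agree, which confirms that substituting the expression for v turns the denominator 1 - u_{xR}v/c^2 into the square root factor.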

The blue ball is at rest so its mass is sometimes known as its rest mass, and is given the symbol m. As the balls were identical at the start of the boost the mass of the red ball is the mass that a blue ball would have if it were in motion relative to an observer; this mass is sometimes known as the relativistic mass symbolised by M. These terms are now infrequently used in modern physics, as will be explained at the end of this section. The discussion given above was related to the relative motions of the blue and red balls, as a result u_{xR} corresponds to the speed of the moving ball relative to an observer who is stationary with respect to the blue ball. These considerations mean that the relativistic mass is given by:

M = \frac{m}{\sqrt{1 - u^2/c^2}}

The relativistic momentum is given by the product of the relativistic mass and the velocity \mathbf{p}=M\mathbf{u}.

The overall expression for momentum in terms of rest mass is:

\mathbf{p} = \frac{m\mathbf{u}}{\sqrt{1-u^2/c^2}}

and the components of the momentum are:

p_x = \frac{mu_x}{\sqrt{1-u^2/c^2}}

p_y = \frac{mu_y}{\sqrt{1-u^2/c^2}}

p_z = \frac{mu_z}{\sqrt{1-u^2/c^2}}

So the components of the momentum depend upon the appropriate velocity component and the speed.

Since the factor with the square root is cumbersome to write, the following abbreviation is often used, called the Lorentz gamma factor:

 \gamma = \frac{1}{\sqrt{1-u^2/c^2}}

The expression for the momentum then reads  \mathbf{p} = m \gamma \mathbf{u} .
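A minimal sketch of the momentum expression follows, assuming SI units. The function names and the example particle (1 kg moving at 0.6c along x) are illustrative choices.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def lorentz_gamma(speed):
    """Lorentz gamma factor for a given speed."""
    return 1.0 / math.sqrt(1.0 - (speed / C) ** 2)

def momentum(m, ux, uy, uz):
    """Components of p = gamma * m * u for invariant mass m."""
    speed = math.sqrt(ux ** 2 + uy ** 2 + uz ** 2)
    g = lorentz_gamma(speed)
    return (g * m * ux, g * m * uy, g * m * uz)

# At 0.6c gamma is 1.25, so the momentum is 25% above the Newtonian m*u.
px, py, pz = momentum(1.0, 0.6 * C, 0.0, 0.0)
print(px / (1.0 * 0.6 * C))
```

Note that gamma depends on the full speed, so each component of the momentum depends on all three velocity components.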

It can be seen from the discussion above that we can write the momentum of an object moving with velocity \mathbf{u} as the product of a function  M(u) of the speed  u and the velocity  \mathbf{u} :

 M(u) \mathbf{u}

The function  M(u) must reduce to the object's mass  m at small speeds, in particular when the object is at rest  M(0) = m .

There is a debate about the usage of the term "mass" in relativity theory. If inertial mass is defined in terms of momentum then it does indeed vary as M = \gamma m for a single particle that has rest mass, furthermore, as will be shown below the energy of a particle that has a rest mass is given by E=Mc^2. Prior to the debate about nomenclature the function  M(u) , or the relation M = \gamma m, used to be called 'relativistic mass', and its value in the frame of the particle was referred to as the 'rest mass' or 'invariant mass'. The relativistic mass, M = \gamma m, would increase with velocity. Both terms are now largely obsolete: the 'rest mass' is today simply called the mass, and the 'relativistic mass' is often no longer used since, as will be seen in the discussion of energy below, it is identical to the energy but for the units.

Force

Newton's second law states that the total force acting on a particle equals the rate of change of its momentum. The same form of Newton's second law holds in relativistic mechanics. The relativistic 3 force is given by:

 \mathbf{f} = d\mathbf{p}/dt

If the relativistic mass M is used:

\frac{d\mathbf{p}}{dt}= \frac{d(M\mathbf{u})}{dt}

By Leibniz's law (the product rule), where d(xy)=xdy+ydx:

\mathbf{f} = \frac{d\mathbf{p}}{dt}= M\frac{d\mathbf{u}}{dt}+\mathbf{u}\frac{dM}{dt}

This equation for force will be used below to derive relativistic expressions for the energy of a particle in terms of the old concept of "relativistic mass".

The relativistic force can also be written in terms of acceleration. Newton's second law can be written in the familiar form

 \mathbf{F} = m \mathbf{a}

where  \mathbf{a} = d\mathbf{v}/dt is the acceleration.

Here m is not the relativistic mass but the invariant mass.

In relativistic mechanics, momentum is  \mathbf{p} = m \gamma \mathbf{v}

again m being the invariant mass and the force is given by  \mathbf{F} = \frac{d\mathbf{p}}{dt} = m \frac{d(\gamma \mathbf{v})}{dt}

This form of force is used in the derivation of the expression for energy without relying on relativistic mass.

It will be seen in the second section of this book that Newton's second law in terms of acceleration is given by:

 \mathbf{F} = m \gamma( \mathbf{a} + \frac{\gamma^2 v}{c^2} \frac{dv}{dt} \mathbf{v})
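The two forms of the force can be compared numerically. The sketch below is only an illustration: it works in units where c = 1, uses a made-up two dimensional trajectory with constant coordinate acceleration, and estimates d(m \gamma v)/dt with a central difference. It also uses the identity v (dv/dt) = v . a for the speed v. All names are hypothetical.

```python
import math

C = 1.0     # units where c = 1 (an illustrative choice)
MASS = 1.0  # invariant mass, arbitrary units

def velocity(t):
    # Hypothetical trajectory; the speed stays well below c near t = 1.
    return (0.3 + 0.1 * t, 0.2 * t)

def acceleration(t):
    return (0.1, 0.2)

def gamma(v):
    return 1.0 / math.sqrt(1.0 - (v[0] ** 2 + v[1] ** 2) / C ** 2)

def force_from_momentum(t, h=1e-6):
    """F = d(m gamma v)/dt, estimated with a central difference."""
    def p(s):
        v = velocity(s)
        g = gamma(v)
        return (MASS * g * v[0], MASS * g * v[1])
    p_plus, p_minus = p(t + h), p(t - h)
    return tuple((a - b) / (2.0 * h) for a, b in zip(p_plus, p_minus))

def force_from_acceleration(t):
    """F = m gamma (a + gamma^2 (v dv/dt / c^2) v), with v dv/dt = v . a."""
    v, a = velocity(t), acceleration(t)
    g = gamma(v)
    v_dot_a = v[0] * a[0] + v[1] * a[1]
    factor = g ** 2 * v_dot_a / C ** 2
    return tuple(MASS * g * (ai + factor * vi) for ai, vi in zip(a, v))

print(force_from_momentum(1.0))
print(force_from_acceleration(1.0))
```

The two results agree to within the accuracy of the finite difference, and neither is parallel to the acceleration, which is the main practical difference from the Newtonian form.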

Energy

The debate over the use of the concept "relativistic mass" means that modern physics courses may forbid the use of this in the derivation of energy. The newer derivation of energy without using relativistic mass is given in the first section and the older derivation using relativistic mass is given in the second section. The two derivations can be compared to gain insight into the debate about mass but a knowledge of 4 vectors is really required to discuss the problem in depth. In principle the first derivation is most mathematically correct because "relativistic mass" is given by: M = \frac{m}{\sqrt{1 - u^2/c^2}} which involves the constants m and c.


Derivation of relativistic energy using the relativistic momentum

In the following, modern derivation, m means the invariant mass - what used to be called the "rest mass". Energy is defined as the work done in moving a body from one place to another. We will make use of the relativistic momentum  p=\gamma mv . Energy is given from:

 dE = \mathbf{f}d\mathbf{x}

so, over the whole path:

 E = \int_{0}^{x} \mathbf{f}d\mathbf{x}

Kinetic energy (K) is the energy used to move a body from a velocity of 0 to a velocity \mathbf{u}. Restricting the motion to one dimension:

K = \int_{u=0}^{u=u} \mathbf{f} dx

Using the relativistic 3 force:

K = \int_{u=0}^{u=u} \frac{d(m\gamma u)}{dt}dx=\int_{u=0}^{u=u}m \frac{d(\gamma u)}{dt}dx= \int_{u=0}^{u=u} m d(\gamma u)\frac{dx}{dt}

substituting for d(\gamma u) and using dx/dt=u:

K = \int_{u=0}^{u=u} m (\gamma du + ud\gamma) u

Which gives:

K = \int_{u=0}^{u=u} m (u\gamma du + u^2 d\gamma)

The Lorentz factor  \gamma is given by:

\gamma = \frac{1}{\sqrt{1 - u^2/c^2}}

meaning that :

d\gamma = \frac{u}{c^2}\gamma^3du

du = \frac{c^2}{u\gamma^3}d\gamma

So that

K = \int_{\gamma=1}^{\gamma=\gamma} m (u\gamma \frac{c^2}{u\gamma^3}d\gamma + u^2 d\gamma)
= \int_{\gamma=1}^{\gamma=\gamma} m (\frac{c^2}{\gamma^2} + u^2) d\gamma
= \int_{\gamma=1}^{\gamma=\gamma} m c^2 d\gamma

Alternatively, we can use the fact that:

\gamma^2c^2 - \gamma^2u^2 = c^2\,

Differentiating:

2\gamma c^2d\gamma - \gamma^22udu -u^22\gamma d\gamma =0\,

So, rearranging:

\gamma u du + u^2 d\gamma = c^2 d\gamma\,

In which case:

K = \int_{u=0}^{u=u} m (u\gamma du + u^2 d\gamma) = \int_{u=0}^{u=u} m c^2 d\gamma \,

As  u goes from 0 to  u , the Lorentz factor  \gamma goes from 1 to  \gamma , so:

K = m c^2 \int_{\gamma=1}^{\gamma=\gamma} d\gamma \,

and hence:

K = \gamma m c^2 - m c^2\,

The amount \gamma mc^2 is known as the total energy of the particle. The amount m c^2 is known as the rest energy of the particle. If the total energy of the particle is given the symbol E:

 E = \gamma m c^2  = mc^2 + K \,

So it can be seen that m c^2 is the energy of a mass that is stationary. This energy is known as mass energy.

The Newtonian approximation for kinetic energy can be derived by using the binomial theorem to expand \gamma = (1-u^2/c^2)^{-\frac{1}{2}}.

The binomial expansion is:

(a + x)^n = a^n + na^{n-1}x + \frac{n(n-1)}{2!}a^{n-2}x^2 ....

So expanding (1-u^2/c^2)^{-\frac{1}{2}}:

K = \frac{1}{2} m u^2 + \frac{3m u^4}{8c^2} + \frac{5m u^6}{16c^4} + ...

So if u is much less than c:

K \approx \frac{1}{2} m u^2

which is the Newtonian approximation for low velocities.
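The quality of the approximation can be seen by comparing the exact kinetic energy with the series and with the Newtonian term. This is a minimal sketch; the function names are illustrative and the speed is given as the fraction beta = u/c.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def kinetic_exact(m, beta):
    """K = gamma m c^2 - m c^2."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return (g - 1.0) * m * C ** 2

def kinetic_series(m, beta):
    """First three terms of the binomial expansion of K."""
    u = beta * C
    return (0.5 * m * u ** 2
            + 3.0 * m * u ** 4 / (8.0 * C ** 2)
            + 5.0 * m * u ** 6 / (16.0 * C ** 4))

def kinetic_newton(m, beta):
    """Newtonian kinetic energy, the leading term of the series."""
    return 0.5 * m * (beta * C) ** 2

for beta in (0.01, 0.1, 0.5):
    exact = kinetic_exact(1.0, beta)
    print(beta, kinetic_newton(1.0, beta) / exact, kinetic_series(1.0, beta) / exact)
```

At one percent of the speed of light the Newtonian value is already within about one part in ten thousand of the exact result, while at half the speed of light it falls noticeably short.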

Derivation of relativistic energy using the concept of relativistic mass

Energy is defined as the work done in moving a body from one place to another. Energy is given from:

 dE = \mathbf{F}d\mathbf{x}

so, over the whole path:

 E = \int_{0}^{x} \mathbf{F}d\mathbf{x}

Kinetic energy (K) is the energy used to move a body from a velocity of 0 to a velocity u. So:

K = \int_{u=0}^{u=u} F dx

Using the relativistic force:

K = \int_{u=0}^{u=u} \frac{d(Mu)}{dt}dx

So:

K = \int_{u=0}^{u=u} d(Mu)\frac{dx}{dt}

substituting for d(Mu) and using dx/dt=u:

K = \int_{u=0}^{u=u} (Mdu + udM) u

Which gives:

K = \int_{u=0}^{u=u} (Mu du + u^2 dM)

The relativistic mass is given by:

M = \frac{m}{\sqrt{1 - u^2/c^2}}

Which can be expanded as:

M^2c^2 - M^2u^2 = m^2c^2

Differentiating:

2Mc^2dM - M^22udu -u^22MdM =0

So, rearranging:

Mu du + u^2 dM = c^2 dM

In which case:

K = \int_{u=0}^{u=u} (Mu du + u^2 dM)

is simplified to:

K = \int_{u=0}^{u=u} c^2 dM

But the mass goes from m to M so:

K = c^2 \int_{M=m}^{M=M} dM

and hence:

K = Mc^2 - mc^2

The amount Mc^2 is known as the total energy of the particle. The amount mc^2 is known as the rest energy of the particle. If the total energy of the particle is given the symbol E:

E = mc^2 + K

So it can be seen that mc^2 is the energy of a mass that is stationary. This energy is known as mass energy and is the origin of the famous formula E=mc^2 that is iconic of the nuclear age.

The Newtonian approximation for kinetic energy can be derived by substituting the rest mass for the relativistic mass ie:

M = \frac{m}{\sqrt{1 - u^2/c^2}}

and:

K = Mc^2 - mc^2

So:

K = \frac{mc^2}{\sqrt{1-u^2/c^2}} - mc^2

ie:

K = mc^2 ((1-u^2/c^2)^{-\frac{1}{2}} -1)


The binomial theorem can be used to expand (1-u^2/c^2)^{-\frac{1}{2}}:

The binomial theorem is:

(a + x)^n = a^n + na^{n-1}x + \frac{n(n-1)}{2!}a^{n-2}x^2 ....

So expanding (1-u^2/c^2)^{-\frac{1}{2}}:

K = \frac{1}{2} mu^2 + \frac{3mu^4}{8c^2} + \frac{5mu^6}{16c^4} + ...

So if u is much less than c:

K \approx \frac{1}{2} mu^2

Which is the Newtonian approximation for low velocities.

Nuclear Energy

When protons and neutrons (nucleons) combine to form elements the combination of particles tends to be in a lower energy state than the free neutrons and protons. Iron has the lowest energy and elements above and below iron in the scale of atomic masses tend to have higher energies. This decrease in energy as neutrons and protons bind together is known as the binding energy. The atomic masses of elements are slightly different from those calculated from their constituent particles and this difference in mass energy, calculated from E=mc^2, is almost exactly equal to the binding energy.

The binding energy can be released by converting elements with higher masses per nucleon to those with lower masses per nucleon. This can be done by either splitting heavy elements such as uranium into lighter elements such as barium and krypton or by joining together light elements such as hydrogen into heavier elements such as deuterium. If atoms are split the process is known as nuclear fission and if atoms are joined the process is known as nuclear fusion. Atoms that are lighter than iron can be fused to release energy and those heavier than iron can be split to release energy.

When hydrogen and a neutron are combined to make deuterium the energy released can be calculated as follows:

The mass of a proton is 1.00731 amu, the mass of a neutron is 1.00867 amu and the mass of a deuterium nucleus is 2.0136 amu. The difference in mass between a deuterium nucleus and its components is 0.00238 amu. The energy of this mass difference is:

E = mc^2 = 1.66 \times 10^{-27} \times 0.00238 \times (3 \times 10^8)^2

So the energy released is 3.56 \times 10^{-13} joules or about 2 \times 10^{11} joules per gram of protons (ionised hydrogen).

(Assuming 1 amu = 1.66 \times 10^{-27} kg, Avogadro's number = 6 \times 10^{23} and the speed of light is 3 \times 10^8 metres per second)
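The arithmetic can be reproduced in a few lines. The sketch below uses only the rounded constants and masses quoted in the text; the variable names are illustrative, and one gram of protons is taken to contain roughly Avogadro's number of protons.

```python
AMU_KG = 1.66e-27    # kilograms per atomic mass unit (rounded, as in the text)
C = 3.0e8            # speed of light in m/s (rounded, as in the text)
AVOGADRO = 6.0e23    # particles per mole (rounded, as in the text)

m_proton = 1.00731   # amu
m_neutron = 1.00867  # amu
m_deuteron = 2.0136  # amu

# Mass lost when a proton and a neutron bind into a deuterium nucleus.
mass_defect_amu = m_proton + m_neutron - m_deuteron

# E = m c^2 for the missing mass, in joules per nucleus formed.
energy_per_nucleus = mass_defect_amu * AMU_KG * C ** 2

# One gram of protons is about one mole, so multiply by Avogadro's number.
energy_per_gram = energy_per_nucleus * AVOGADRO

print(mass_defect_amu, energy_per_nucleus, energy_per_gram)
```

With these rounded inputs the energy per nucleus comes out at a little under 3.6 \times 10^{-13} joules, and the energy per gram at roughly 2 \times 10^{11} joules.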

Present day nuclear reactors use a process called nuclear fission in which rods of uranium emit neutrons which combine with the uranium in the rod to produce uranium isotopes such as 236U which rapidly decay into smaller nuclei such as barium and krypton plus three neutrons which can cause further generation of 236U and further decay. The fact that each neutron can cause the generation of three more neutrons means that a self sustaining or chain reaction can occur. The generation of energy results from the equivalence of mass and energy; the decay products, barium and krypton, have a lower mass than the original 236U, the missing mass being released as 177 MeV of radiation. The nuclear equation for the decay of 236U is written as follows:

^{236}_{92}U \rightarrow ^{144}_{56}Ba + ^{89}_{36}Kr + 3n + 177 MeV

Nuclear explosion
If a large amount of the uranium isotope 235U (the critical mass) is confined the chain reaction can get out of control and almost instantly release a large amount of energy. A device that confines a critical mass of uranium is known as an atomic bomb or A-bomb. A bomb based on the fusion of deuterium atoms is known as a thermonuclear bomb, hydrogen bomb or H-bomb.

aether

Introduction

Many students confuse Relativity Theory with a theory about the propagation of light. According to modern Relativity Theory the constancy of the speed of light is a consequence of the geometry of spacetime rather than something specifically due to the properties of photons; but the statement "the speed of light is constant" often distracts the student into a consideration of light propagation. This confusion is amplified by the importance assigned to interferometry experiments, such as the Michelson-Morley experiment, in most textbooks on Relativity Theory.

The history of theories of the propagation of light is an interesting topic in physics and was indeed important in the early days of Relativity Theory. In the seventeenth century two competing theories of light propagation were developed. Christiaan Huygens published a wave theory of light which was based on Huygens' principle whereby every point in a wavelike disturbance can give rise to further disturbances that spread out spherically. In contrast Newton considered that the propagation of light was due to the passage of small particles or "corpuscles" from the source to the illuminated object. His theory is known as the corpuscular theory of light. Newton's theory was widely accepted until the nineteenth century.

In the early nineteenth century Thomas Young performed his Young's slits experiment and the interference pattern that occurred was explained in terms of diffraction due to the wave nature of light. The wave theory was accepted generally until the twentieth century when quantum theory confirmed that light had a corpuscular nature and that Huygens' principle could not be applied.

The idea of light as a disturbance of some medium, or aether (US spelling: "ether"), that permeates the universe was problematical from its inception. The first problem that arose was that the speed of light did not change with the velocity of the observer. If light were indeed a disturbance of some stationary medium then as the earth moves through the medium towards a light source the speed of light should appear to increase. It was found however that the speed of light did not change as expected. Each experiment on the velocity of light required corrections to existing theory and led to a variety of subsidiary theories such as the "aether drag hypothesis". Ultimately it was experiments that were designed to investigate the properties of the aether that provided the first experimental evidence for Relativity Theory.

The aether drag hypothesis

The aether drag hypothesis was an early attempt to explain the way experiments such as Arago's experiment showed that the speed of light is constant. The aether drag hypothesis is now considered to be incorrect.

According to the aether drag hypothesis light propagates in a special medium, the aether, that remains attached to things as they move. If this is the case then, no matter how fast the earth moves around the sun or rotates on its axis, light on the surface of the earth would travel at a constant velocity.

Stellar Aberration. If a telescope is travelling at high speed, only light incident at a particular angle can avoid hitting the walls of the telescope tube.

The primary reason the aether drag hypothesis is considered invalid is the occurrence of stellar aberration. In stellar aberration the position of a star when viewed with a telescope swings each side of a central position by about 20.5 seconds of arc every six months. This amount of swing is the amount expected when considering the speed of earth's travel in its orbit. In 1871, George Biddell Airy demonstrated that stellar aberration occurs even when a telescope is filled with water. It seems that if the aether drag hypothesis were true then stellar aberration would not occur because the light would be travelling in the aether which would be moving along with the telescope.

The "train analogy" for the absence of aether drag.

If you visualize a bucket on a train about to enter a tunnel and a drop of water drips from the tunnel entrance into the bucket at the very centre, the drop will not hit the centre at the bottom of the bucket. The bucket is the tube of a telescope, the drop is a photon and the train is the earth. If aether is dragged then the droplet would be travelling with the train when it is dropped and would hit the centre of bucket at the bottom.

The amount of stellar aberration, α is given by:

\tan \alpha = \frac{v \delta t}{c \delta t}

So:

\tan \alpha = \frac{v}{c}

The speed at which the earth goes round the sun is about v = 30 km/s and the speed of light is about c = 300,000,000 m/s, which gives α of approximately 20.5 seconds of arc, the swing that is seen every six months. This amount of aberration is observed and this contradicts the aether drag hypothesis.
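The size of the aberration angle can be checked directly from tan α = v/c. In the sketch below the function name is illustrative, and the more precise orbital speed (29.78 km/s) and speed of light (299,792,458 m/s) are assumed values that do not appear in the text.

```python
import math

def aberration_arcsec(v, c):
    """Aberration angle in seconds of arc, from tan(alpha) = v / c."""
    return math.degrees(math.atan(v / c)) * 3600.0

# Rounded values quoted in the text.
print(aberration_arcsec(30.0e3, 3.0e8))

# More precise values (assumed here, not taken from the text).
print(aberration_arcsec(29.78e3, 299_792_458.0))
```

The rounded values give an angle just above 20.6 seconds of arc, while the more precise values give a figure close to the quoted 20.5 seconds of arc.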

In 1818, Augustin Jean Fresnel introduced a modification to the aether drag hypothesis that only applies to the interface between media. This was accepted during much of the nineteenth century but has now been replaced by the special theory of relativity (see below).

The aether drag hypothesis is historically important because it was one of the reasons why Newton's corpuscular theory of light was replaced by the wave theory and it is used in early explanations of light propagation without relativity theory. It originated as a result of early attempts to measure the speed of light.

In 1810, François Arago realised that variations in the refractive index of a substance predicted by the corpuscular theory would provide a useful method for measuring the velocity of light. These predictions arose because the refractive index of a substance such as glass depends on the ratio of the velocities of light in air and in the glass. Arago attempted to measure the extent to which corpuscles of light would be refracted by a glass prism at the front of a telescope. He expected that there would be a range of different angles of refraction due to the variety of different velocities of the stars and the motion of the earth at different times of the day and year. Contrary to this expectation he found that there was no difference in refraction between stars, between times of day or between seasons. All Arago observed was ordinary stellar aberration.

In 1818 Fresnel examined Arago's results using a wave theory of light. He realised that even if light were transmitted as waves the refractive index of the glass-air interface should have varied as the glass moved through the aether to strike the incoming waves at different velocities when the earth rotated and the seasons changed.

Fresnel proposed that the glass prism would carry some of the aether along with it so that "...the aether is in excess inside the prism". He realised that the velocity of propagation of waves depends on the density of the medium so proposed that the velocity of light in the prism would need to be adjusted by an amount of 'drag'.

The velocity of light  v_n in the glass without any adjustment is given by:

 v_n = c / n

The drag adjustment  v_d is given by:

 v_d  = v (1 - \frac {\rho_e}{\rho_g})

Where  \rho_e is the aether density in the environment, \rho_g is the aether density in the glass and v is the velocity of the prism with respect to the aether.

The factor (1 - \frac {\rho_e}{\rho_g}) can be written as  (1 - \frac{1}{n^2}) because the refractive index, n, would be dependent on the density of the aether. This is known as the Fresnel drag coefficient.

The velocity of light in the glass is then given by:

 V = \frac {c}{n} + v (1 - \frac{1}{n^2})

This correction was successful in explaining the null result of Arago's experiment. It introduces the concept of a largely stationary aether that is dragged by substances such as glass but not by air. Its success favoured the wave theory of light over the previous corpuscular theory.

The Fresnel drag coefficient was confirmed by an interferometer experiment performed by Fizeau. Water was passed at high speed along two glass tubes that formed the optical paths of the interferometer and it was found that the fringe shifts were as predicted by the drag coefficient.

Relfizeau.gif

The special theory of relativity predicts the result of the Fizeau experiment from the velocity addition theorem without any need for an aether.

If V is the velocity of light relative to the Fizeau apparatus and U is the velocity of light relative to the water and v is the velocity of the water:

 U = \frac {c}{n}
 V = \frac {c/n + v}{1 + v/nc}

which, if v/c is small can be expanded using the binomial expansion to become:

 V = \frac {c}{n} + v (1 - \frac{1}{n^2})

This is identical to Fresnel's equation.
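As a rough numerical check of this agreement, the sketch below compares Fresnel's formula with the exact relativistic expression. The refractive index of water (n = 1.333) and the flow speed (7 m/s) are illustrative assumptions, not values taken from the text.

```python
# Compare Fresnel's drag formula with relativistic velocity addition.
# n_water and v_flow are illustrative assumptions, not values from the text.
C = 299_792_458.0  # speed of light in vacuum, m/s


def fresnel_speed(n, v):
    """Speed of light in a medium moving at v, using Fresnel's drag coefficient."""
    return C / n + v * (1.0 - 1.0 / n**2)


def relativistic_speed(n, v):
    """Speed of light in the moving medium from relativistic velocity addition."""
    u = C / n  # speed of light relative to the medium
    return (u + v) / (1.0 + u * v / C**2)


n_water = 1.333   # assumed refractive index of water
v_flow = 7.0      # assumed flow speed, m/s

v_fresnel = fresnel_speed(n_water, v_flow)
v_rel = relativistic_speed(n_water, v_flow)
print(f"Fresnel:      {v_fresnel:.6f} m/s")
print(f"Relativistic: {v_rel:.6f} m/s")
print(f"Difference:   {abs(v_fresnel - v_rel):.3e} m/s")
```

For such a small v/c the two expressions differ by far less than the drag term itself, which is why the Fizeau experiment could not distinguish between them.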

It may appear as if Fresnel's analysis can be substituted for the relativistic approach. However, more recent work has shown that Fresnel's assumptions should lead to different amounts of aether drag for different frequencies of light and should violate Snell's law (see Ferraro and Sforza (2005)).

The aether drag hypothesis was one of the arguments used in an attempt to explain the Michelson-Morley experiment before the widespread acceptance of the special theory of relativity.

The Fizeau experiment is consistent with relativity and approximately consistent with each individual body (a prism, a lens and so on) dragging its own aether with it. This contradicts some modified versions of the aether drag hypothesis, which argue that aether drag may happen on a global (or larger) scale and that stellar aberration is merely transferred into the entrained "bubble" around the earth, which then faithfully carries the modified angle of incidence directly to the observer.

The Michelson-Morley experiment

The Michelson-Morley experiment, one of the most important and famous experiments in the history of physics, was performed in 1887 by Albert Michelson and Edward Morley at what is now Case Western Reserve University, and is considered to be the first strong evidence against the theory of a luminiferous aether.

Physics theories of the late 19th century postulated that, just as water waves must have a medium to move across (water), and audible sound waves require a medium to move through (air), so also light waves require a medium, the "luminiferous aether". The speed of light being so great, designing an experiment to detect the presence and properties of this aether took considerable thought.

Measuring aether

AetherWind.svg
A depiction of the concept of the “aether wind”.

Each year, the Earth travels a tremendous distance in its orbit around the sun, at a speed of around 30 km/second, over 100,000 km per hour. It was reasoned that the Earth would at all times be moving through the aether and producing a detectable "aether wind". At any given point on the Earth's surface, the magnitude and direction of the wind would vary with time of day and season. By analysing the effective wind at various different times, it should be possible to separate out components due to motion of the Earth relative to the Solar System from any due to the overall motion of that system.

The effect of the aether wind on light waves would be like the effect of wind on sound waves. Sound waves travel at a constant speed relative to the medium that they are travelling through (this varies with pressure, temperature and so on, but is typically around 340 m/s). So, if the speed of sound in our conditions is 340 m/s and there is a 10 m/s wind relative to the ground, sound travelling into the wind will appear to move at 330 m/s (340 - 10), while sound travelling downwind will appear to move at 350 m/s (340 + 10). Measuring the speed of sound relative to the ground in different directions therefore enables us to calculate the speed of the air relative to the ground.

If the speed of the sound cannot be directly measured, an alternative method is to measure the time that the sound takes to bounce off a reflector and return to the origin. This is done both parallel and perpendicular to the wind (since the direction of the wind is unknown beforehand, the time is simply determined for several different directions). The cumulative round-trip effects of the wind in the two orientations slightly favour the sound travelling at right angles to it. Similarly, the effect of an aether wind on a beam of light would be for the beam to take slightly longer to travel round-trip in the direction parallel to the “wind” than to travel the same round-trip distance at right angles to it.
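The round-trip argument can be illustrated with a short sketch. The speeds of 340 m/s and 10 m/s come from the sound example; the 100 m distance to the reflector is an assumption made only for this illustration, and the perpendicular formula is the standard right-angled-triangle result for a path carried sideways by the wind.

```python
import math


def round_trip_parallel(length, c, v):
    """Round-trip time along the wind: out against it, back with it."""
    return length / (c - v) + length / (c + v)


def round_trip_perpendicular(length, c, v):
    """Round-trip time at right angles to the wind."""
    return 2.0 * length / math.sqrt(c**2 - v**2)


# 340 m/s and 10 m/s are the figures used in the text; the 100 m distance
# to the reflector is an assumption made for this example.
c_sound, wind, distance = 340.0, 10.0, 100.0
t_par = round_trip_parallel(distance, c_sound, wind)
t_perp = round_trip_perpendicular(distance, c_sound, wind)
print(f"parallel:      {t_par:.6f} s")
print(f"perpendicular: {t_perp:.6f} s")
print(f"difference:    {t_par - t_perp:.2e} s")
```

The parallel trip takes slightly longer, and the difference is small compared with either travel time, which is the point made above about the sensitivity required.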

“Slightly” is key, in that, over a distance such as a few meters, the difference in time for the two round trips would be only about a millionth of a millionth of a second. At this point the only truly accurate measurements of the speed of light were those carried out by Albert Abraham Michelson, which had resulted in measurements accurate to a few meters per second. While a stunning achievement in its own right, this was certainly not nearly enough accuracy to be able to detect the aether.

The experiments

Michelson, though, had already seen a solution to this problem. His design, later known as an interferometer, sent a single source of white light through a half-silvered mirror that was used to split it into two beams travelling at right angles to one another. After leaving the splitter, the beams travelled out to the ends of long arms where they were reflected back into the middle on small mirrors. They then recombined on the far side of the splitter in an eyepiece, producing a pattern of constructive and destructive interference based on the length of the arms. Any slight change in the amount of time the beams spent in transit would then be observed as a shift in the positions of the interference fringes. If the aether were stationary relative to the sun, then the Earth's motion would produce a shift of about 0.04 fringes.

Michelson had made several measurements with an experimental device in 1881, in which he noticed that the expected shift of 0.04 was not seen, and a smaller shift of about 0.02 was. However his apparatus was a prototype, and had experimental errors far too large to say anything about the aether wind. For a measurement of the aether wind, a much more accurate and tightly controlled experiment would have to be carried out. The prototype was, however, successful in demonstrating that the basic method was feasible.

A Michelson interferometer

He then combined forces with Edward Morley and spent a considerable amount of time and money creating an improved version with more than enough accuracy to detect the drift. In their experiment the light was repeatedly reflected back and forth along the arms, increasing the path length to 11 m. At this length the drift would be about 0.4 fringes. To make that easily detectable the apparatus was located in a closed room in the basement of a stone building, eliminating most thermal and vibrational effects. Vibrations were further reduced by building the apparatus on top of a huge block of marble, which was then floated in a pool of mercury. They calculated that effects of about 1/100th of a fringe would be detectable.

The mercury pool allowed the device to be turned, so that it could be rotated through the entire range of possible angles to the "aether wind". Even over a short period of time some sort of effect would be noticed simply by rotating the device, such that one arm rotated into the direction of the wind and the other away. Over longer periods day/night cycles or yearly cycles would also be easily measurable.

During each full rotation of the device, each arm would be parallel to the wind twice (facing into and away from the wind) and perpendicular to the wind twice. This effect would show readings in a sine wave formation with two peaks and two troughs. Additionally if the wind was only from the earth's orbit around the sun, the wind would fully change directions east/west during a 12 hour period. In this ideal conceptualization, the sine wave of day/night readings would be in opposite phase.

Because it was assumed that the motion of the solar system would cause an additional component to the wind, the yearly cycles would be detectable as an alteration of the magnitude of the wind. An example of this effect is a helicopter flying forward. While on the ground, a helicopter's blades would be measured as travelling around at 50 km/h at the tips. However, if the helicopter is travelling forward at 50 km/h, there are points at which the tips of the blades are travelling 0 km/h and 100 km/h with respect to the air they are travelling through. This increases the magnitude of the lift on one side and decreases it on the other, just as it would increase and decrease the magnitude of an aether wind on a yearly basis.

The most famous failed experiment

Ironically, after all this thought and preparation, the experiment became what might be called the most famous failed experiment to date. Instead of providing insight into the properties of the aether, Michelson and Morley's 1887 article in the American Journal of Science reported the measurement to be as small as one-fortieth of the expected displacement but, "since the displacement is proportional to the square of the velocity", they concluded that the measured velocity was approximately one-sixth of the expected velocity of the Earth's motion in orbit and "certainly less than one-fourth". Although this small "velocity" was measured, it was considered far too small to be used as evidence of aether, and it was later said to be within the range of an experimental error that would allow the speed to actually be zero.
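The step from "one-fortieth of the displacement" to "one-sixth of the velocity" follows from the square-law dependence quoted above. A minimal sketch of the arithmetic, using the 30 km/s orbital speed mentioned earlier:

```python
import math

# Fringe displacement is proportional to the square of the velocity,
# so a displacement ratio r corresponds to a velocity ratio sqrt(r).
displacement_ratio = 1.0 / 40.0   # "one-fortieth", as quoted in the text
velocity_ratio = math.sqrt(displacement_ratio)

orbital_speed_km_s = 30.0         # Earth's orbital speed quoted earlier
print(f"velocity ratio: {velocity_ratio:.3f} (about 1/{1 / velocity_ratio:.1f})")
print(f"implied speed:  {velocity_ratio * orbital_speed_km_s:.1f} km/s")
```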

Although Michelson and Morley went on to different experiments after their first publication in 1887, both remained active in the field. Other versions of the experiment were carried out with increasing sophistication. Kennedy and Illingworth both modified the mirrors to include a half-wave "step", eliminating the possibility of some sort of standing wave pattern within the apparatus. Illingworth could detect changes on the order of 1/300th of a fringe, Kennedy up to 1/1500th. Miller later built a non-magnetic device to eliminate magnetostriction, while Michelson built one of non-expanding invar to eliminate any remaining thermal effects. Others from around the world increased accuracy, eliminated possible side effects, or both. All of these, with the exception of Dayton Miller's, also returned what is considered a null result.

Morley was not convinced of his own results, and went on to conduct additional experiments with Dayton Miller. Miller worked on increasingly large experiments, culminating in one with a 32 m (effective) arm length at an installation at the Mount Wilson observatory. To avoid the possibility of the aether wind being blocked by solid walls, he used a special shed with thin walls, mainly of canvas. He consistently measured a small positive effect that varied, as expected, with each rotation of the device, the sidereal day and on a yearly basis. The low magnitude of the results he attributed to aether entrainment (see below). His measurements amounted to only ~10 km/s instead of the ~30 km/s expected from the earth's orbital motion alone. He remained convinced this was due to partial entrainment, though he did not attempt a detailed explanation.

Though Kennedy later also carried out an experiment at Mount Wilson, finding 1/10 the drift measured by Miller, and no seasonal effects, Miller's findings were considered important at the time, and were discussed by Michelson, Hendrik Lorentz and others at a meeting reported in 1928 (ref below). There was general agreement that more experimentation was needed to check Miller's results. Lorentz recognised that the results, whatever their cause, did not quite tally with either his or Einstein's versions of special relativity. Einstein was not present at the meeting and felt the results could be dismissed as experimental error (see Shankland ref below).

| Name | Year | Arm length (meters) | Fringe shift expected | Fringe shift measured | Experimental resolution | Upper limit on V_aether |
| --- | --- | --- | --- | --- | --- | --- |
| Michelson | 1881 | 1.2 | 0.04 | 0.02 | | |
| Michelson and Morley | 1887 | 11.0 | 0.4 | < 0.01 | | 8 km/s |
| Morley and Miller | 1902–1904 | 32.2 | 1.13 | 0.015 | | |
| Miller | 1921 | 32.0 | 1.12 | 0.08 | | |
| Miller | 1923–1924 | 32.0 | 1.12 | 0.03 | | |
| Miller (Sunlight) | 1924 | 32.0 | 1.12 | 0.014 | | |
| Tomaschek (Starlight) | 1924 | 8.6 | 0.3 | 0.02 | | |
| Miller | 1925–1926 | 32.0 | 1.12 | 0.088 | | |
| Kennedy (Mt Wilson) | 1926 | 2.0 | 0.07 | 0.002 | | |
| Illingworth | 1927 | 2.0 | 0.07 | 0.0002 | 0.0006 | 1 km/s |
| Piccard and Stahel (Rigi) | 1927 | 2.8 | 0.13 | 0.006 | | |
| Michelson et al. | 1929 | 25.9 | 0.9 | 0.01 | | |
| Joos | 1930 | 21.0 | 0.75 | 0.002 | | |

In recent times versions of the MM experiment have become commonplace. Lasers and masers amplify light by repeatedly bouncing it back and forth inside a carefully tuned cavity, thereby inducing high-energy atoms in the cavity to give off more light. The result is an effective path length of kilometers. Better yet, the light emitted in one cavity can be used to start the same cascade in another set at right angles, thereby creating an interferometer of extreme accuracy.

The first such experiment was led by Charles H. Townes, one of the co-creators of the first maser. Their 1958 experiment put an upper limit on drift, including any possible experimental errors, of only 30 m/s. In 1974 a repeat with accurate lasers in the triangular Trimmer experiment reduced this to 0.025 m/s, and included tests of entrainment by placing one leg in glass. In 1979 the Brillet-Hall experiment put an upper limit of 30 m/s for any one direction, but reduced this to only 0.000001 m/s for a two-direction case (i.e. still or partially entrained aether). A year-long repeat known as Hils and Hall, published in 1990, reduced this to 2 \times 10^{-13}.

Fallout

This result was rather astounding and not explainable by the then-current theory of wave propagation in a static aether. Several explanations were attempted, among them, that the experiment had a hidden flaw (apparently Michelson's initial belief), or that the Earth's gravitational field somehow "dragged" the aether around with it in such a way as locally to eliminate its effect. Miller would have argued that, in most if not all experiments other than his own, there was little possibility of detecting an aether wind since it was almost completely blocked out by the laboratory walls or by the apparatus itself. Be this as it may, the idea of a simple aether, what became known as the First Postulate, had been dealt a serious blow.

A number of experiments were carried out to investigate the concept of aether dragging, or entrainment. The most convincing was carried out by Hamar, who placed one arm of the interferometer between two huge lead blocks. If aether were dragged by mass, the blocks would, it was theorised, have been enough to cause a visible effect. Once again, no effect was seen.

Walter Ritz's Emission theory (or ballistic theory), was also consistent with the results of the experiment, not requiring aether, more intuitive and paradox-free. This became known as the Second Postulate. However it also led to several "obvious" optical effects that were not seen in astronomical photographs, notably in observations of binary stars in which the light from the two stars could be measured in an interferometer.

The Sagnac experiment placed the MM apparatus on a constantly rotating turntable. In doing so any ballistic theories such as Ritz's could be tested directly, as the light going one way around the device would have different length to travel than light going the other way (the eyepiece and mirrors would be moving toward/away from the light). In Ritz's theory there would be no shift, because the net velocity between the light source and detector was zero (they were both mounted on the turntable). However in this case an effect was seen, thereby eliminating any simple ballistic theory. This fringe-shift effect is used today in laser gyroscopes.

Another possible solution was found in the Lorentz-FitzGerald contraction hypothesis. In this theory all objects physically contract along the line of motion relative to the aether, so while the light may indeed transit slower on that arm, it also ends up travelling a shorter distance that exactly cancels out the drift.

In 1932 the Kennedy-Thorndike experiment modified the Michelson-Morley experiment by making the path lengths of the split beam unequal, with one arm being very long. In this version the two ends of the experiment were at different velocities due to the rotation of the earth, so the contraction would not "work out" to exactly cancel the result. Once again, no effect was seen.

Ernst Mach was among the first physicists to suggest that the experiment actually amounted to a disproof of the aether theory. The development of what became Einstein's special theory of relativity had the Fitzgerald-Lorentz contraction derived from the invariance postulate, and was also consistent with the apparently null results of most experiments (though not, as was recognised at the 1928 meeting, with Miller's observed seasonal effects). Today relativity is generally considered the "solution" to the MM null result.

The Trouton-Noble experiment is regarded as the electrostatic equivalent of the Michelson-Morley optical experiment, though whether or not it can ever be done with the necessary sensitivity is debatable. On the other hand, the 1908 Trouton-Rankine experiment that spelled the end of the Lorentz-FitzGerald contraction hypothesis achieved an incredible sensitivity.

Mathematical analysis of the Michelson Morley Experiment

The Michelson interferometer splits light into rays that travel along two paths then recombines them. The recombined rays interfere with each other. If the path length changes in one of the arms the interference pattern will shift slightly, moving relative to the cross hairs in the telescope. The Michelson interferometer is arranged as an optical bench on a concrete block that floats on a large pool of mercury. This allows the whole apparatus to be rotated smoothly.

If the earth were moving through an aether at the same velocity as it orbits the sun (30 km/sec) then Michelson and Morley calculated that a rotation of the apparatus should cause a shift in the fringe pattern. The basis of this calculation is given below.

Relmm.gif

Consider the time taken t_1 for light to travel along Path 1 in the illustration:

t_1 = \frac{L_f}{c-v} + \frac{L_f}{c+v}

Rearranging terms:

\frac{L_f}{c-v} + \frac{L_f}{c+v} = \frac{2L_fc}{c^2-v^2}

further rearranging:

\frac{2L_fc}{c^2-v^2} = \frac{2L_f}{c}\frac{1}{1-v^2/c^2}

hence:

t_1 = \frac{2L_f}{c}\frac{1}{1-v^2/c^2}

Considering Path 2, the light traces out two right angled triangles so:

ct_2 = 2 \sqrt{L_m^2 + (vt_2/2)^2}

Rearranging:

t_2 = \frac{2L_m}{\sqrt{c^2-v^2}}

So:

t_2 =\frac{2L_m}{c} \frac{1}{\sqrt{1-(v/c)^2}}

It is now easy to calculate the difference \Delta t between the times spent by the light in Path 1 and Path 2:

\Delta t = \frac{2}{c} \left(\frac{L_m}{\sqrt{1-v^2/c^2}}-\frac{L_f}{1-v^2/c^2}\right)

If the apparatus is rotated by 90 degrees the new time difference is:

\Delta t' = \frac{2}{c} \left(\frac{L_m}{1-v^2/c^2}-\frac{L_f}{\sqrt{1-v^2/c^2}}\right)

because L_m and L_f exchange roles.

The interference fringes due to the time difference between the paths will be different after rotation if \Delta t and \Delta t' are different.

\Delta t' - \Delta t = \frac{2}{c} \left(\frac{L_m+L_f}{1-v^2/c^2}-\frac{L_f+L_m}{\sqrt{1-v^2/c^2}}\right)

This difference between the two times can be calculated if the binomial expansions of \frac{1}{1-v^2/c^2} and \frac{1}{\sqrt{1-v^2/c^2}} are used:

\frac{1}{1-v^2/c^2}= 1 + \frac{v^2}{c^2} + \left(\frac{v^2}{c^2}\right)^2 + ....
\frac{1}{\sqrt{1-v^2/c^2}}= 1 + \frac{1}{2}\frac{v^2}{c^2} + \frac{3}{8}\left(\frac{v^2}{c^2}\right)^2 + ....

So:

\Delta t' - \Delta t \approx \frac{L_f + L_m}{c}\frac{v^2}{c^2}

If the period of one vibration of the light is T then the number of fringes (n) that will move past the cross hairs of the telescope when the apparatus is rotated will be:

n = \frac{\Delta t' - \Delta t}{T}

Inserting the formula for \Delta t' - \Delta t:

n \approx \frac{L_f + L_m}{cT}\frac{v^2}{c^2}

But cT for a light wave is the wavelength of the light, i.e. cT = \lambda, so:

n \approx \frac{L_f + L_m}{\lambda}\frac{v^2}{c^2}

If the wavelength of the light is 5 \times 10^{-7} metres, the total path length L_f + L_m is 20 metres and v^2/c^2 = 10^{-8} (the earth's orbital velocity of 30 km/sec is 10^{-4} of c) then:

n = \left(\frac{20}{5 \times 10^{-7}}\right)10^{-8}

So the pattern will shift by 0.4 of a fringe (i.e. 40% of the fringe spacing) when the apparatus is rotated.
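The calculation above can be checked numerically. The sketch below evaluates both the exact expression for the change in time difference and its binomial approximation. The total path of 20 metres, the wavelength and v/c = 10^{-4} follow the worked example; the split into two 10 metre arms and the rounded value c = 3 \times 10^8 m/s are assumptions made for the illustration.

```python
import math


def exact_shift_time(l_f, l_m, v, c):
    """Exact change in time difference on a 90 degree rotation (aether model)."""
    beta2 = (v / c) ** 2
    return (2.0 / c) * (l_f + l_m) * (1.0 / (1.0 - beta2) - 1.0 / math.sqrt(1.0 - beta2))


def approx_shift_time(l_f, l_m, v, c):
    """First-order binomial approximation of the same quantity."""
    return (l_f + l_m) / c * (v / c) ** 2


# Total path L_f + L_m = 20 m, wavelength 5e-7 m and v/c = 1e-4 follow the
# worked example; two equal 10 m arms and c = 3e8 m/s are assumptions.
c = 3.0e8
v = 1.0e-4 * c
l_f = l_m = 10.0
wavelength = 5.0e-7

period = wavelength / c   # T = lambda / c
n_exact = exact_shift_time(l_f, l_m, v, c) / period
n_approx = approx_shift_time(l_f, l_m, v, c) / period
print(f"exact:  {n_exact:.4f} fringes")
print(f"approx: {n_approx:.4f} fringes")
```

At this velocity the higher-order terms of the expansion are negligible, so the two values agree to many decimal places.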

However, no fringe shift is observed. The null result of the Michelson-Morley experiment is nowadays explained in terms of the constancy of the speed of light. The assumption that the light would have a velocity of c-v and c+v depending on the direction relative to the hypothetical "aether wind" is false: light always travels at c between two points in a vacuum, and the speed of light is not affected by any "aether wind". In special relativity the Lorentz transformations lead to a length contraction of the arm that is parallel to the motion. Repeating the above calculations with this contraction, we obtain:

L_f=L_m{\sqrt{1-v^2/c^2}}

(taking into consideration the length contraction)

It is now easy to recalculate the difference \Delta t between the times spent by the light in Path 1 and Path 2:

\Delta t = \frac{2}{c} \left(\frac{L_m}{\sqrt{1-v^2/c^2}}-\frac{L_f}{1-v^2/c^2}\right)=0 because L_f=L_m{\sqrt{1-v^2/c^2}}

If the apparatus is rotated by 90 degrees, the arm that is now parallel to the motion is the one that is contracted, so L_m=L_f{\sqrt{1-v^2/c^2}} and the new time difference is:

\Delta t' = \frac{2}{c} \left(\frac{L_m}{1-v^2/c^2}-\frac{L_f}{\sqrt{1-v^2/c^2}}\right)=0

Since \Delta t and \Delta t' are both zero, they do not differ and no fringe shift is expected when the apparatus is rotated:

\Delta t' - \Delta t = 0
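A small numerical sketch shows the cancellation produced by the contraction L_f=L_m{\sqrt{1-v^2/c^2}}. The rest length of 11 m, the speed of 30 km/s and c = 3 \times 10^8 m/s are illustrative assumptions.

```python
import math


def time_difference(l_perp, l_par, v, c):
    """Delta t = t_2 - t_1 for a perpendicular arm l_perp and a parallel arm l_par."""
    beta2 = (v / c) ** 2
    return (2.0 / c) * (l_perp / math.sqrt(1.0 - beta2) - l_par / (1.0 - beta2))


# Assumed illustrative values: equal rest lengths of 11 m and v = 30 km/s.
c, v, rest_length = 3.0e8, 3.0e4, 11.0
contracted = rest_length * math.sqrt(1.0 - (v / c) ** 2)

# Without contraction the two arms give different travel times ...
dt_plain = time_difference(rest_length, rest_length, v, c)
# ... with the parallel arm contracted the difference vanishes (to rounding error).
dt_contracted = time_difference(rest_length, contracted, v, c)
print(f"no contraction:   {dt_plain:.3e} s")
print(f"with contraction: {dt_contracted:.3e} s")
```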

Wave propagation in moving medium

It has also been pointed out that the medium through which the light travels in the Michelson-Morley experiment is the air, and that the velocity of this medium relative to the apparatus is zero (v = 0). Therefore, for arms of equal length L:

t_1 = \frac{L}{c-v} + \frac{L}{c+v}=\frac{2L}{c}
t_2=\frac{2L}{c}
\Delta t=0

After the apparatus is rotated by 90° there is no movement of the interference fringes. [1]

Coherence length

The coherence length of light rays from a source that has wavelengths that differ by \Delta \lambda is:

x =  \frac{\lambda^2}{2 \pi \Delta \lambda}

If path lengths differ by more than this amount then interference fringes will not be observed. White light has a wide range of wavelengths and interferometers using white light must have paths that are equal to within a small fraction of a millimetre for interference to occur. This means that the ideal light source for a Michelson Interferometer should be monochromatic and the arms should be as near as possible equal in length.

The calculation of the coherence length is based on the fact that interference fringes become unclear when light rays are about 60 degrees (about 1 radian or one sixth of a wavelength (\approx 1/2\pi)) out of phase. This means that when two beams are:

\frac{\lambda}{2 \pi}

metres out of step they will no longer give a well defined interference pattern. Suppose a light beam contains two wavelengths of light, \lambda and \lambda + \Delta \lambda, then in:

\frac{\lambda}{2 \pi \Delta \lambda}

cycles they will be \frac{\lambda}{2 \pi} out of phase.

The distance required for the two different wavelengths of light to be this much out of phase is the coherence length. Coherence length = number of cycles x length of each cycle so:

coherence length = \frac{\lambda^2}{2 \pi \Delta \lambda} .
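The formula can be evaluated for two hypothetical sources to show why white light demands nearly equal arms. Both sets of numbers below are assumptions chosen for illustration, not measured values.

```python
import math


def coherence_length(wavelength, delta_wavelength):
    """Coherence length x = lambda^2 / (2 * pi * delta_lambda), in metres."""
    return wavelength**2 / (2.0 * math.pi * delta_wavelength)


# Both sources below are hypothetical examples chosen for illustration.
white_light = coherence_length(550e-9, 300e-9)   # broad visible band
narrow_line = coherence_length(550e-9, 1e-12)    # narrow spectral line
print(f"white light: {white_light:.2e} m")
print(f"narrow line: {narrow_line:.2e} m")
```

With these assumed figures the broadband source stays coherent over well under a micrometre, while the narrow line stays coherent over several centimetres.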

Lorentz-Fitzgerald Contraction Hypothesis

After the first Michelson-Morley experiments in 1881 there were several attempts to explain the null result. The most obvious point of attack is to propose that the path that is parallel to the direction of motion is contracted by \sqrt{1-v^2/c^2}, in which case \Delta t and \Delta t' would be identical and no fringe shift would occur. This possibility was proposed by Fitzgerald in 1889 and, independently, by Lorentz in 1892. Lorentz produced an "electron theory of matter" that would account for such a contraction.

Students sometimes make the mistake of assuming that the Lorentz-Fitzgerald contraction is equivalent to the Lorentz transformations. However, in the absence of any treatment of the time dilation effect the Lorentz-Fitzgerald explanation would result in a fringe shift if the apparatus is moved between two different velocities. The rotation of the earth allows this effect to be tested as the earth orbits the sun. Kennedy and Thorndike (1932) performed the Michelson-Morley experiment with a highly sensitive apparatus that could detect any effect due to the rotation of the earth; they found no effect. They concluded that both time dilation and Lorentz-Fitzgerald Contraction take place, thus confirming relativity theory.

If only the Lorentz-Fitzgerald contraction applied then the fringe shifts due to changes in velocity would be: n = (v_1^2 - v_2^2)/c^2 \times (L_f-L_m)/\lambda. Notice how the sensitivity of the experiment is dependent on the difference in path length L_f-L_m and hence a long coherence length is required.
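The dependence on path difference can be seen by evaluating this expression for a few cases. In the sketch below the two velocities, the wavelength and the arm lengths are all hypothetical values chosen for illustration.

```python
def contraction_only_shift(v1, v2, l_f, l_m, wavelength, c=3.0e8):
    """Fringe shift n = (v1^2 - v2^2)/c^2 * (L_f - L_m)/lambda."""
    return (v1**2 - v2**2) / c**2 * (l_f - l_m) / wavelength


# Illustrative assumptions: the apparatus speed changes between 30.4 km/s
# and 29.6 km/s, the wavelength is 5e-7 m and the shorter arm is 1 m long.
wavelength = 5.0e-7
l_m = 1.0
for path_difference in (0.0, 0.01, 0.2):   # L_f - L_m in metres
    n = contraction_only_shift(3.04e4, 2.96e4, l_m + path_difference, l_m, wavelength)
    print(f"L_f - L_m = {path_difference:4.2f} m -> shift of {n:.2e} fringes")
```

With equal arms the predicted shift is zero whatever the change in velocity, which is why an unequal-arm interferometer is needed for this test.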

Recent Michelson-Morley experiments

Optical tests of the isotropy of the speed of light have become commonplace. New technologies, including the use of lasers and masers, have significantly improved measurement precision.

| Author | Year | Description | Upper bounds |
| --- | --- | --- | --- |
| Essen[2] | 1955 | The frequency of a rotating microwave optical cavity resonator is compared with that of a quartz clock. | ~3 km/s |
| Jaseja et al.[3] | 1964 | The frequencies of two helium–neon lasers, mounted perpendicular to each other on a rotating table, were compared. | ~30 m/s |
| Shamir and Fox[4] | 1969 | Both arms of the interferometer were contained in a transparent solid (poly(methyl methacrylate)). The light source was a helium–neon laser. | ~7 km/s |

Still more recent experiments, using other techniques such as optical resonators (Eisele et al.[5]), have shown that the speed of light is constant to within 10^{-8} m/s.

References

  1. The model of wave propagation in classical physics.
  2. Essen, L. (1955). "A New Æther-Drift Experiment". Nature 175 (4462): 793–794. doi:10.1038/175793a0. Bibcode: 1955Natur.175..793E.
  3. Jaseja, T. S.; Javan, A.; Murray, J.; Townes, C. H. (1964). "Test of Special Relativity or of the Isotropy of Space by Use of Infrared Masers". Phys. Rev. 133 (5a): 1221–1225. doi:10.1103/PhysRev.133.A1221. Bibcode: 1964PhRv..133.1221J.
  4. Shamir, J.; Fox, R. (1969). "A new experimental test of special relativity". Il Nuovo Cimento B 62 (2): 258–264. doi:10.1007/BF02710136. Bibcode: 1969NCimB..62..258S.
  5. Eisele, Ch.; Nevsky, A. Yu.; Schiller, S. (2009). "Laboratory Test of the Isotropy of Light Propagation at the 10^{-17} level". Physical Review Letters 103 (9): 090401. doi:10.1103/PhysRevLett.103.090401. PMID 19792767. Bibcode: 2009PhRvL.103i0401E. http://www.exphy.uni-duesseldorf.de/Publikationen/2009/Eisele%20et%20al%20Laboratory%20Test%20of%20the%20Isotropy%20of%20Light%20Propagation%20at%20the%2010-17%20Level%202009.pdf

Mathematical approach

Vectors

Physical effects involve things acting on other things to produce a change of position, tension etc. These effects usually depend upon the strength, angle of contact, separation etc of the interacting things rather than on any absolute reference frame so it is useful to describe the rules that govern the interactions in terms of the relative positions and lengths of the interacting things rather than in terms of any fixed viewpoint or coordinate system. Vectors were introduced in physics to allow such relative descriptions.

The use of vectors in elementary physics often avoids any real understanding of what they are. They are a new concept, as unique as numbers themselves, which have been related to the rest of mathematics and geometry by a series of formulae such as linear combinations, scalar products etc.

Vectors are defined as "directed line segments" which means they are lines drawn in a particular direction. The introduction of time as a geometric entity means that this definition of a vector is rather archaic; a better definition might be that a vector is information arranged as a continuous succession of points in space and time. Vectors have length and direction, the direction being from earlier to later.

Vectors are represented by lines terminated with arrow symbols to show the direction. A point that moves from the left to the right for about three centimetres can be represented as:

Relvectsym.gif

If a vector is represented within a coordinate system it has components along each of the axes of the system. These components do not normally start at the origin of the coordinate system.

Relvector2.jpg

The vector represented by the bold arrow has components a, b and c which are lengths on the coordinate axes. If the vector starts at the origin the components become simply the coordinates of the end point of the vector and the vector is known as the position vector of the end point.

Addition of Vectors

If two vectors are connected so that the end point of one is the start of the next the sum of the two vectors is defined as a third vector drawn from the start of the first to the end of the second:

Relvectoradd.jpg

c is the sum of a and b:

c = a + b

If the components of a are a, b, c and the components of b are d, e, f then the components of the sum of the two vectors are (a+d), (b+e) and (c+f). In other words, when vectors are added it is the components that add numerically rather than the lengths of the vectors themselves.
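A minimal sketch of component-wise addition follows. The example vectors are chosen for illustration, and the length function assumes ordinary Cartesian components (Pythagoras' theorem).

```python
import math


def add(a, b):
    """Add two vectors component by component."""
    return tuple(x + y for x, y in zip(a, b))


def length(a):
    """Euclidean length of a vector given by Cartesian components."""
    return math.sqrt(sum(x * x for x in a))


# Example components chosen for illustration.
a = (3.0, 0.0, 0.0)
b = (0.0, 4.0, 0.0)
c = add(a, b)
print(c)                        # components add: (3.0, 4.0, 0.0)
print(length(a) + length(b))    # lengths summed: 7.0
print(length(c))                # length of the sum: 5.0
```

The components add, but the length of the sum (5) is not the sum of the lengths (7).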

Rules of Vector Addition

1. Commutativity a + b = b + a

2. Associativity (a + b) + c = a + (b + c)

If the zero vector (which has no length) is labelled as 0 then:

3. a + (-a) = 0

4. a + 0 = a

Rules of Vector Multiplication by a Scalar

The discussion of components and vector addition shows that if vector a has components a,b,c then qa has components qa, qb, qc. The meaning of vector multiplication is shown below:

Relvectormult.jpg

The bottom vector c is added three times which is equivalent to multiplying it by 3.

1. Distributive laws q(a + b) = qa + qb and (q + p)a = qa + pa

2. Associativity q(pa) = qpa

Also 1 a = a

If the rules of vector addition and multiplication by a scalar apply to a set of elements they are said to define a vector space.
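The rules listed above can be spot-checked on a concrete example. The sketch below uses tuples of integers as vectors; the particular components and scalars are arbitrary choices, and a single example illustrates the rules rather than proving them.

```python
def add(a, b):
    """Component-wise vector addition."""
    return tuple(x + y for x, y in zip(a, b))


def scale(q, a):
    """Multiply every component of vector a by the scalar q."""
    return tuple(q * x for x in a)


# Integer components (chosen arbitrarily) keep the comparisons exact.
a, b = (1, -2, 3), (4, 0, -5)
q, p = 3, -2

assert add(a, b) == add(b, a)                                  # a + b = b + a
assert scale(q, add(a, b)) == add(scale(q, a), scale(q, b))    # q(a + b) = qa + qb
assert scale(q + p, a) == add(scale(q, a), scale(p, a))        # (q + p)a = qa + pa
assert scale(q, scale(p, a)) == scale(q * p, a)                # q(pa) = (qp)a
assert scale(1, a) == a                                        # 1a = a
assert add(a, scale(-1, a)) == (0, 0, 0)                       # a + (-a) = 0
print("all rules hold for this example")
```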

Linear Combinations and Linear Dependence

An element of the form:

q_1\mathbf{a_1} + q_2\mathbf{a_2} + q_3\mathbf{a_3} +.... + q_m \mathbf{a_m}

is called a linear combination of the vectors.

The set of all linear combinations that can be formed from a set of vectors is called the span of the vectors. The word span is used because the scalars (q) can have any value, which means that every point in the subset of the vector space defined by the span can be reached by some linear combination of the vectors.

Suppose there is a set of vectors ({a_1,a_2,.... ,a_m}). If it is possible to express one of these vectors in terms of the others, using any linear combination, then the set is said to be linearly dependent. If it is not possible to express any one of the vectors in terms of the others, using any linear combination, the set is said to be linearly independent.

In other words, if there are values of the scalars such that:

(1). \mathbf{a_1} = q_2\mathbf{a_2} + q_3\mathbf{a_3} +.... + q_m\mathbf{a_m}

the set is said to be linearly dependent.

There is a way of determining linear dependence. From (1) it can be seen that if q_1 is set to minus one then:

q_1\mathbf{a_1} + q_2\mathbf{a_2} + q_3\mathbf{a_3} +.... + q_m\mathbf{a_m} = 0

So in general, if a linear combination can be written that sums to a zero vector, with at least one of the scalars not equal to zero, then the set of vectors (\mathbf{a_1,a_2,.... ,a_m}) is not linearly independent.

If two vectors are linearly dependent then they lie along the same line (wherever a and b lie on the line, scalars can be found to produce a linear combination which is a zero vector). If three vectors are linearly dependent they lie on the same line or on a plane (collinear or coplanar).
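For three vectors in three dimensions, one common practical test (a standard technique, not one introduced in the text above) is that the vectors are linearly dependent exactly when the determinant formed from their components is zero. A minimal sketch with illustrative integer vectors:

```python
def determinant3(a, b, c):
    """Determinant of the 3x3 matrix whose rows are the vectors a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def linearly_dependent(a, b, c):
    """Three 3D vectors are linearly dependent exactly when the determinant is zero."""
    return determinant3(a, b, c) == 0


# Example vectors with integer components (chosen for illustration).
a, b = (1, 0, 2), (0, 1, 1)
coplanar = (2, 3, 7)        # equals 2a + 3b, so it lies in the plane of a and b
independent = (0, 0, 1)

print(linearly_dependent(a, b, coplanar))      # True
print(linearly_dependent(a, b, independent))   # False
```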

Dimension

If a vector space contains a set of n linearly independent vectors, but every set of n+1 vectors in it is linearly dependent, then the space is said to have a dimension of n. Such a set of n linearly independent vectors is said to be a basis of the vector space.

Scalar Product

Also known as the 'dot product' or 'inner product'. The scalar product is a way of removing the problem of angular measures from the relationship between vectors and, as Weyl put it, a way of comparing the lengths of vectors that are arbitrarily inclined to each other.

Consider two vectors with a common origin:

Relscalar.jpg

The projection of \mathbf{a} on the adjacent side is:

P = | \mathbf{a} | cos \theta

Where | \mathbf{a} | is the length of \mathbf{a}.

The scalar product is defined as:

(2) \mathbf{a . b} = | \mathbf{a} | | \mathbf{b} | cos \theta

Notice that cos \theta is zero if \mathbf{a} and \mathbf{b} are perpendicular. This means that if the scalar product is zero the vectors composing it are orthogonal (perpendicular to each other).

(2) also allows cos \theta to be defined as:

cos \theta = \mathbf{a . b} / ( | \mathbf{a} | | \mathbf{b} |)

The definition of the scalar product also allows a definition of the length of a vector in terms of the concept of a vector itself. The scalar product of a vector with itself is:

\mathbf{a . a} = | \mathbf{a} | | \mathbf{a} | cos 0

cos 0 (the cosine of zero) is one so:

\mathbf{a . a} = a^2

which is our first direct relationship between vectors and scalars. This can be expressed as:

(3) a = \sqrt{\mathbf{a . a}}

where a is the length of \mathbf{a}.

Properties:

1. Linearity [G\mathbf{a} + H\mathbf{b}].\mathbf{c} = G\mathbf{a.c} + H\mathbf{b.c}

2. symmetry \mathbf{a.b} = \mathbf{b.a}

3. Positive definiteness \mathbf{a.a} is greater than or equal to 0, with equality only when \mathbf{a} is the zero vector

4. Distributivity for vector addition \mathbf{(a + b).c} = \mathbf{a.c + b.c}

5. Schwarz inequality | \mathbf{a.b} | \leq ab

6. Parallelogram equality | \mathbf{a} + \mathbf{b} |^2 + | \mathbf{a} - \mathbf{b} |^2 = 2( | \mathbf{a} |^2 + | \mathbf{b} |^2)

From the point of view of vector physics the most important property of the scalar product is the expression of the scalar product in terms of coordinates.

7. \mathbf{a.b} = a_1b_1 + a_2b_2 + a_3b_3

This gives us the length of a vector in terms of coordinates (Pythagoras' theorem) from:

8. \mathbf{a.a} = a^2 = a_1^2 + a_2^2 + a_3^2

The derivation of 7 is:

\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}

where \mathbf{i}, \mathbf{j}, \mathbf{k} are unit vectors along the coordinate axes. From property 4 (distributivity):

\mathbf{a.b} = (a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}) .\mathbf{b} = a_1\mathbf{i}.\mathbf{b} + a_2\mathbf{j}.\mathbf{b} + a_3\mathbf{k}.\mathbf{b}

but \mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}

so:

\mathbf{a.b} = b_1a_1\mathbf{i .i} + b_2a_1\mathbf{i .j} + b_3a_1\mathbf{i .k} + b_1a_2\mathbf{j.i} + b_2a_2\mathbf{j.j} + b_3a_2\mathbf{j.k} + b_1a_3\mathbf{k.i} + b_2a_3\mathbf{k.j} + b_3a_3\mathbf{k.k}

\mathbf{i .j, i .k, j .k,} etc. are all zero because the vectors are orthogonal, also \mathbf{i .i, j.j} and \mathbf{k.k} are all one (these are unit vectors defined to be 1 unit in length).

Using these results:

\mathbf{a.b} = a_1b_1 + a_2b_2 + a_3b_3
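The coordinate formulae 7 and 8, together with the definition of cos \theta, can be checked with a short sketch. The function names `dot`, `length` and `angle` and the example vectors are illustrative assumptions.

```python
import math


def dot(a, b):
    """Scalar product in coordinates: a.b = a_1 b_1 + a_2 b_2 + a_3 b_3."""
    return sum(ai * bi for ai, bi in zip(a, b))


def length(a):
    """Length from the scalar product of a vector with itself (equation 3)."""
    return math.sqrt(dot(a, a))


def angle(a, b):
    """Angle between a and b, from cos(theta) = a.b / (|a||b|)."""
    return math.acos(dot(a, b) / (length(a) * length(b)))


a = (3.0, 4.0, 0.0)
b = (-4.0, 3.0, 0.0)
print(length(a))                    # Pythagoras: sqrt(9 + 16)
print(dot(a, b))                    # zero, so a and b are orthogonal
print(math.degrees(angle(a, b)))    # angle between a and b in degrees
```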

Matrices

Matrices are sets of numbers arranged in a rectangular array. They are especially important in linear algebra because they can be used to represent the elements of linear equations.

11a + 2b = c

5a + 7b = d

The constants in the equation above can be represented as a matrix:

\mathbf{A} =
\begin{bmatrix}
11 & 2 \\
5 & 7 \\
\end{bmatrix}

The elements of matrices are usually denoted symbolically using lower case letters:

\mathbf{A} =
\begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22} \\
\end{bmatrix}


Matrices are said to be equal if all of the corresponding elements are equal.

Eg: if a_{ij}= b_{ij}

Then \mathbf{A} = \mathbf{B}

Matrix Addition

Matrices are added by adding the individual elements of one matrix to the corresponding elements of the other matrix.

c_{ij} = a_{ij} + b_{ij}

or \mathbf{C} = \mathbf{A} + \mathbf{B}

Matrix addition has the following properties:

1. Commutativity \mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}

2. Associativity (\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})

and

3. \mathbf{A} + (-\mathbf{A}) = 0

4. \mathbf{A} + 0 = \mathbf{A}

From matrix addition it can be seen that the product of a matrix \mathbf{A} and a number p is simply p\mathbf{A} where every element of the matrix is multiplied individually by p.

Transpose of a Matrix

A matrix is transposed when the rows and columns are interchanged:

\mathbf{A} =
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33} \\
\end{bmatrix}
\mathbf{A^T} =
\begin{bmatrix}
a_{11} & a_{21} & a_{31} \\
a_{12} & a_{22} & a_{32} \\
a_{13} & a_{23} & a_{33} \\
\end{bmatrix}


Notice that the principal diagonal elements stay the same after transposition.

A matrix is symmetric if it is equal to its transpose eg: a_{kj} = a_{jk}.

It is skew symmetric if \mathbf{A^T} = -\mathbf{A} eg: a_{kj} = -a_{jk}. The principal diagonal of a skew symmetric matrix is composed of elements that are zero.

Other Types of Matrix

Diagonal matrix: all elements above and below the principal diagonal are zero.

\begin{bmatrix}
4 & 0 & 0 \\
0 & -1 & 0 \\
0 & 0 & 2 \\
\end{bmatrix}


Unit matrix: denoted by I, is a diagonal matrix where all elements of the principal diagonal are 1.

\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
\end{bmatrix}

Matrix Multiplication

Matrix multiplication is defined in terms of the problem of determining the coefficients in linear transformations.

Consider a set of linear transformations between 2 coordinate systems that share a common origin and are related to each other by a rotation of the coordinate axes.


Two Coordinate Systems Rotated Relative to Each Other

If there are 3 coordinate systems, x, y, and z these can be transformed from one to another:

x_1 = a_{11}y_1 + a_{12}y_2

x_2 = a_{21}y_1 + a_{22}y_2


y_1 = b_{11}z_1 + b_{12}z_2

y_2 = b_{21}z_1 + b_{22}z_2


x_1 = c_{11}z_1 + c_{12}z_2

x_2 = c_{21}z_1 + c_{22}z_2


By substitution:

x_1 = a_{11}(b_{11}z_1 + b_{12}z_2) + a_{12}(b_{21}z_1 + b_{22}z_2)

x_2 = a_{21}(b_{11}z_1 + b_{12}z_2) + a_{22}(b_{21}z_1 + b_{22}z_2)


x_1 = (a_{11}b_{11} + a_{12}b_{21})z_1 + (a_{11}b_{12} + a_{12}b_{22})z_2

x_2 = (a_{21}b_{11} + a_{22}b_{21})z_1 + (a_{21}b_{12} + a_{22}b_{22})z_2


Therefore:

c_{11} = (a_{11}b_{11} + a_{12}b_{21})

c_{12} = (a_{11}b_{12} + a_{12}b_{22})

c_{21} = (a_{21}b_{11} + a_{22}b_{21})

c_{22} = (a_{21}b_{12} + a_{22}b_{22})


The coefficient matrices are:

\mathbf{A} =
\begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22} \end{bmatrix}
\mathbf{B} =
\begin{bmatrix}
b_{11} & b_{12} \\
b_{21} & b_{22} \end{bmatrix}
\mathbf{C} =
\begin{bmatrix}
c_{11} & c_{12} \\
c_{21} & c_{22} \end{bmatrix}

From the linear transformation the product of A and B is defined as:

\mathbf{C} = \mathbf{AB} =
\begin{bmatrix}
(a_{11}b_{11} + a_{12}b_{21}) & (a_{11}b_{12} + a_{12}b_{22}) \\
(a_{21}b_{11} + a_{22}b_{21}) & (a_{21}b_{12} + a_{22}b_{22}) \end{bmatrix}

In the discussion of scalar products it was shown that, for a plane, the scalar product is calculated as: \mathbf{a.b} = a_1b_1 + a_2b_2 where a_1, a_2 and b_1, b_2 are the coordinates of the vectors \mathbf{a} and \mathbf{b}.

Now mathematicians define the rows and columns of a matrix as vectors:

A Column vector is \mathbf{b}=
\begin{bmatrix}
b_{11} \\
b_{21} \end{bmatrix}

And a Row vector \mathbf{a}=
\begin{bmatrix}
a_{11} & a_{12} \end{bmatrix}


Matrices can be described as vectors eg:

\mathbf{A} =
\begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22} \end{bmatrix}
=
\begin{bmatrix}
\mathbf{a_{1}} \\ 
\mathbf{a_{2}} \end{bmatrix}

and

\mathbf{B} =
\begin{bmatrix}
b_{11} & b_{12} \\
b_{21} & b_{22} \end{bmatrix}
=
\begin{bmatrix}
\mathbf{b_{1}} & \mathbf{b_{2}} \end{bmatrix}

Matrix multiplication is then defined as the scalar products of the vectors so that:


\mathbf{C} =
\begin{bmatrix}
\mathbf{a_1.b_1} & \mathbf{a_1.b_2} \\
\mathbf{a_2.b_1} & \mathbf{a_2.b_2} \end{bmatrix}

From the definition of the scalar product \mathbf{a_1.b_1} = a_{11}b_{11} + a_{12}b_{21} etc.

In the general case:

\mathbf{C} =
\begin{bmatrix}
\mathbf{a_1.b_1} & \mathbf{a_1.b_2} & . & \mathbf{a_1.b_n} \\
\mathbf{a_2.b_1} & \mathbf{a_2.b_2} & . & \mathbf{a_2.b_n} \\
. & . & . & . \\
\mathbf{a_m.b_1} & \mathbf{a_m.b_2} & . & \mathbf{a_m.b_n} \end{bmatrix}

This is described as the multiplication of rows into columns (eg: row vectors into column vectors). The first matrix must have the same number of columns as there are rows in the second matrix or the multiplication is undefined.

After matrix multiplication the product matrix has the same number of rows as the first matrix and columns as the second matrix:

\begin{bmatrix}
1 & 3 & 4 \\
6 & 3 & 2 \end{bmatrix}
times 
\begin{bmatrix}
2 \\
3 \\
7 \end{bmatrix}
has 2 rows and 1 column 
\begin{bmatrix}
39 \\
35 \end{bmatrix}

ie: first row is 1*2 + 3*3 + 4*7 = 39 and second row is 6*2 + 3*3 + 2*7 = 35

\mathbf{AB} = \begin{bmatrix}
1 & 3 & 2 \\
2 & -1 & 3 \end{bmatrix}
times 
\begin{bmatrix}
2 & 3 & 4\\
3 & 2 & 1\\
5 & 1 & 3 \end{bmatrix}
has 2 rows and 3 columns
\begin{bmatrix}
21 & 11 & 13 \\
16 & 7 & 16 \end{bmatrix}

Notice that \mathbf{BA} cannot be determined because the number of columns in the first matrix must equal the number of rows in the second matrix to perform matrix multiplication.
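The rows-into-columns rule can be sketched in a few lines. The function name `matmul` and the list-of-rows representation are assumptions made for illustration; the numbers are the two worked examples above.

```python
def matmul(A, B):
    """Multiply matrices given as lists of rows: c_ij = (row i of A).(column j of B)."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    columns_of_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns_of_B]
            for row in A]


A = [[1, 3, 2],
     [2, -1, 3]]
B = [[2, 3, 4],
     [3, 2, 1],
     [5, 1, 3]]

print(matmul([[1, 3, 4], [6, 3, 2]], [[2], [3], [7]]))  # 2 rows, 1 column
print(matmul(A, B))                                      # 2 rows, 3 columns

try:
    matmul(B, A)        # B has 3 columns but A has only 2 rows
except ValueError as error:
    print("BA is undefined:", error)
```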


Properties of Matrix Multiplication

1. Not commutative: in general \mathbf{AB} \ne \mathbf{BA}

2. Associative \mathbf{A(BC)} = \mathbf{(AB)C}

(k\mathbf{A})\mathbf{B} = k(\mathbf{AB}) = \mathbf{A}(k\mathbf{B})

3. Distributive for matrix addition

(\mathbf{A} + \mathbf{B})\mathbf{C} = \mathbf{AC} + \mathbf{BC}

matrix multiplication is not commutative so \mathbf{C}(\mathbf{A} + \mathbf{B}) = \mathbf{CA} + \mathbf{CB} is a separate case.

4. The cancellation law is not always true:

\mathbf{AB} = 0 does not mean \mathbf{A}=0 or \mathbf{B}=0

There is a case where matrix multiplication is commutative. This involves the scalar matrix, a diagonal matrix in which the values of the principal diagonal are all equal. Eg:

\mathbf{S} =
\begin{bmatrix}
k & 0 & 0 \\
0 & k & 0 \\
0 & 0 & k \end{bmatrix}

In this case \mathbf{AS} = \mathbf{SA} = k\mathbf{A}. If the scalar matrix is the unit matrix: \mathbf{AI} = \mathbf{IA} = \mathbf{A}.
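These properties can be seen in small numerical cases. The sketch below uses an illustrative `matmul` helper and example matrices that are assumptions, not taken from the text.

```python
def matmul(A, B):
    """Rows-into-columns product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(matmul(A, B), matmul(B, A))      # the two products differ

P = [[1, 0], [0, 0]]
Q = [[0, 0], [0, 1]]
print(matmul(P, Q))                    # zero matrix, though P and Q are not zero

S = [[5, 0], [0, 5]]                   # scalar matrix with k = 5
print(matmul(A, S) == matmul(S, A))    # a scalar matrix commutes with A
```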

Linear Transformations

A simple linear transformation such as:

x_1 = a_{11}y_1 + a_{12}y_2

x_2 = a_{21}y_1 + a_{22}y_2

can be expressed as:

\mathbf{x} = \mathbf{Ay}

eg:


\begin{bmatrix}
x_1 \\
x_2 \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22} \end{bmatrix}
*
\begin{bmatrix}
y_1 \\
y_2 \end{bmatrix}

and y_1 = b_{11}z_1 + b_{12}z_2

y_2 = b_{21}z_1 + b_{22}z_2


as: \mathbf{y} = \mathbf{Bz}

Using the associative law:

\mathbf{x} = \mathbf{A}(\mathbf{Bz}) = \mathbf{ABz} = \mathbf{Cz}

and so:

\mathbf{C} = \mathbf{AB} =
\begin{bmatrix}
(a_{11}b_{11} + a_{12}b_{21}) & (a_{11}b_{12} + a_{12}b_{22}) \\
(a_{21}b_{11} + a_{22}b_{21}) & (a_{21}b_{12} + a_{22}b_{22}) \end{bmatrix}

as before.
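The equivalence of applying two transformations in turn and applying their product once can be checked numerically. This is a sketch only: the helper names and the choice of rotation matrices as coefficient matrices are assumptions made for illustration.

```python
import math


def matmul(A, B):
    """Rows-into-columns product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def apply(A, v):
    """Apply the coefficient matrix A to the coordinate column v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def rotation(angle):
    """Assumed example: coefficient matrix for a rotation of the axes."""
    return [[math.cos(angle), math.sin(angle)],
            [-math.sin(angle), math.cos(angle)]]


A, B = rotation(0.2), rotation(0.5)
z = [1.0, 2.0]

x_stepwise = apply(A, apply(B, z))     # x = A(Bz)
x_combined = apply(matmul(A, B), z)    # x = (AB)z = Cz
print(x_stepwise)
print(x_combined)
```

For rotations the product matrix C = AB is itself the rotation through the sum of the two angles.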

Indicial Notation

Consider a simple rotation of coordinates:

Relrotate2.jpg

x^{\mu} is defined as x_1 , x_2

x^{\nu} is defined as x_{1}^' , x_{2}^'

The scalar product can be written as:

\mathbf{s.s} =g_{\mu \nu} x^\mu x^\nu

Where:

g_{\mu \nu} =
\begin{bmatrix}
1 & 0 \\
0 & 1 \end{bmatrix}

and is called the metric tensor for this 2D space.

 \mathbf{s.s} = g_{11} x_1x_{1}^' + g_{12} x_1x_2^' + g_{21} x_2x_{1}^' + g_{22} x_2x_{2}^'

Now, g_{11} = 1, g_{12} = 0, g_{21} = 0, g_{22} = 1 so:

\mathbf{s.s} = x_1x_{1}^' + x_2x_{2}^'

If there is no rotation of coordinates the scalar product is:

\mathbf{s.s} = x_1x_1 + x_2x_2

s^2 = x_{1}^2 + x_{2}^2

Which is Pythagoras' theorem.

The Summation Convention

Indexes that appear as both subscripts and superscripts are summed over.

g_{\mu \nu} x^\mu x^\nu = g_{11} x_1x_{1}^' + g_{12} x_1x_2^' + g_{21} x_2x_{1}^' + g_{22} x_2x_{2}^'

By promoting \nu to a superscript it is taken out of the summation, ie:

g_{\mu}^\nu x^\mu x^\nu = g_{1\nu} x_1x_{\nu}^' + g_{2\nu} x_2x_{\nu}^'

Matrix Multiplication in Indicial Notation

Consider:

Columns times rows:


\begin{bmatrix}
x_1 \\
x_2 \end{bmatrix}
times \begin{bmatrix}y_1 & y_2 \end{bmatrix}
= 
\begin{bmatrix}
x_1 y_1 & x_1 y_2 \\
x_2 y_1 & x_2 y_2 \end{bmatrix}


Matrix product \mathbf{XY} = x_iy_j Where i = 1, 2 j = 1, 2

There being no summation the indexes are both subscripts.

Rows times columns: 
\begin{bmatrix}
x_1 & x_2 \end{bmatrix}
times 
\begin{bmatrix}
y_1 \\
y_2 \end{bmatrix}
= 
\begin{bmatrix}
x_1 y_1 + x_2 y_2 \end{bmatrix}

Matrix product \mathbf{XY} = \delta_{ij} x^iy^j

Where \delta_{ij} is known as Kronecker delta and has the value 0 when i \ne j and 1 when i = j . It is the indicial equivalent of the unit matrix:

\begin{bmatrix}
1 & 0 \\
0 & 1 \end{bmatrix}

There being summation one value of i is a subscript and the other a superscript.

A matrix in general can be specified by any of:

M_{i}^j , M_{ij} , M^{i}_j , M^{ij} depending on which subscript or superscript is being summed over.

Vectors in Indicial Notation

A vector can be expressed as a sum of basis vectors.

\mathbf{x} = a_1\mathbf{e}_1 + a_2\mathbf{e}_2 + a_3\mathbf{e}_3

In indicial notation this is: x = a^ie_i

Linear Transformations in indicial notation

Consider \mathbf{x} = \mathbf{Ay} where \mathbf{A} is a coefficient matrix and \mathbf{x} and \mathbf{y} are coordinate matrices.

In indicial notation this is:

x^{\mu} = A^{\mu}_{\nu} y^{\nu}

which, writing the components of \mathbf{y} as x^{'}_1, x^{'}_2, x^{'}_3, becomes:

x_1 = a_{11} x^{'}_1+ a_{12} x^{'}_2+ a_{13} x^{'}_3

x_2 = a_{21} x^{'}_1+ a_{22} x^{'}_2+ a_{23} x^{'}_3

x_3 = a_{31} x^{'}_1+ a_{32} x^{'}_2+ a_{33} x^{'}_3

The Scalar Product in indicial notation

In indicial notation the scalar product is:

\mathbf{x.y} = \delta_{ij} x^i y^j
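In a program, a summed index corresponds to a loop. The sketch below is illustrative only; the names `transform` and `scalar_product` are assumptions, and Python indices start at 0 rather than 1.

```python
def transform(A, x_primed):
    """x^mu = A^mu_nu x'^nu: the repeated index nu is summed over."""
    return [sum(A[mu][nu] * x_primed[nu] for nu in range(len(x_primed)))
            for mu in range(len(A))]


def scalar_product(x, y, metric=None):
    """g_ij x^i y^j; with no metric given, the Kronecker delta is used."""
    n = len(x)
    if metric is None:
        metric = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return sum(metric[i][j] * x[i] * y[j] for i in range(n) for j in range(n))


print(scalar_product([3, 4], [3, 4]))            # Pythagoras: 9 + 16
print(scalar_product([1, 2, 3], [4, 5, 6]))      # 4 + 10 + 18
print(transform([[0, 1], [1, 0]], [7, 9]))       # swaps the two coordinates
```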

Analysis of curved surfaces and transformations

It became apparent at the start of the nineteenth century that issues such as Euclid's parallel postulate required the development of a new type of geometry that could deal with curved surfaces and real and imaginary planes. At the foundation of this approach is Gauss's analysis of curved surfaces which allows us to work with a variety of coordinate systems and displacements on any type of surface.

Elementary geometric analysis is useful as an introduction to Special Relativity because it suggests the physical meaning of the coefficients that appear in coordinate transformations.

Suppose there is a line on a surface. The length of this line can be expressed in terms of a coordinate system. A short length of line \Delta s in a two dimensional space may be expressed in terms of Pythagoras' theorem as:

\Delta s^2 = \Delta x^2 + \Delta y^2

Suppose there is another coordinate system on the surface with two axes: x1, x2, how can the length of the line be expressed in terms of these coordinates? Gauss tackled this problem and his analysis is quite straightforward for two coordinate axes:

Figure 1:

Constudtensor.gif

It is possible to use elementary differential geometry to describe displacements along the plane in terms of displacements on the curved surfaces:

 \Delta Y = \Delta x_1 \frac {\delta Y}{\delta x_1} + \Delta x_2 \frac{\delta Y}{\delta x_2}

 \Delta Z = \Delta x_1 \frac {\delta Z}{\delta x_1} + \Delta x_2 \frac{\delta Z}{\delta x_2}

The displacement of a short line is then assumed to be given by a formula, called a metric, such as Pythagoras' theorem

\Delta S^2 = \Delta Y^2 + \Delta Z^2

The values of  \Delta Y and  \Delta Z can then be substituted into this metric:

\Delta S^2 = ( \Delta x_1 \frac {\delta Y}{\delta x_1} + \Delta x_2 \frac{\delta Y}{\delta x_2} )^2 + ( \Delta x_1 \frac {\delta Z}{\delta x_1} + \Delta x_2 \frac{\delta Z}{\delta x_2} )^2

Which, when expanded, gives the following:

\Delta S^2 =

( \frac{\delta Y}{\delta x_1}\frac{\delta Y}{\delta x_1}  + \frac{\delta Z}{\delta x_1} \frac{\delta Z}{\delta x_1} ) \Delta x_1 \Delta x_1

 +( \frac{\delta Y}{\delta x_2}\frac{\delta Y}{\delta x_1}  + \frac{\delta Z}{\delta x_2} \frac{\delta Z}{\delta x_1} ) \Delta x_2 \Delta x_1

 + ( \frac{\delta Y}{\delta x_1}\frac{\delta Y}{\delta x_2}  + \frac{\delta Z}{\delta x_1} \frac{\delta Z}{\delta x_2} ) \Delta x_1 \Delta x_2

 + ( \frac{\delta Y}{\delta x_2}\frac{\delta Y}{\delta x_2}  + \frac{\delta Z}{\delta x_2} \frac{\delta Z}{\delta x_2} ) \Delta x_2 \Delta x_2

This can be represented using summation notation:

\Delta S^2 =  \sum_{i=1}^2 \sum_{k=1}^2 (\frac{\delta Y}{\delta x_i}\frac{\delta Y}{\delta x_k}  + \frac{\delta Z}{\delta x_i} \frac{\delta Z}{\delta x_k} ) \Delta x_i \Delta x_k

Or, using indicial notation:

\Delta S^2 = g_{ik} \Delta x^i \Delta x^k

Where:

 g_{ik} = (\frac{\delta Y}{\delta x^i}\frac{\delta Y}{\delta x^k}  + \frac{\delta Z}{\delta x^i} \frac{\delta Z}{\delta x^k} )

If the coordinates are not merged then \Delta s is dependent on both sets of coordinates. In matrix notation:

\Delta s^2 = \mathbf{g} \Delta \mathbf{x} \Delta \mathbf{x}

becomes:


\begin{bmatrix}
\Delta x_1 & \Delta x_2 \end{bmatrix}
times 
\begin{bmatrix}
a & b \\
c & d \end{bmatrix}
times 
\begin{bmatrix}
\Delta x_1 \\
\Delta x_2 \end{bmatrix}

Where a, b, c, d stand for the values of g_{ik}.

Therefore:


\begin{bmatrix}
\Delta x_1a + \Delta x_2c & \Delta x_1b + \Delta x_2d \end{bmatrix}
times 
\begin{bmatrix}
\Delta x_1 \\
\Delta x_2 \end{bmatrix}

Which is:

(\Delta{x_1}a + \Delta{x_2}c) \Delta{x_1} + (\Delta{x_1}b + \Delta{x_2}d) \Delta{x_2} = \Delta{x_1}^2a + \Delta{x_1}\Delta{x_2}(c + b) + \Delta{x_2}^2d

So:

\Delta{s}^2 = \Delta{x_1}^2a + \Delta{x_1}\Delta{x_2}(c + b) + \Delta{x_2}^2d

\Delta{s}^2 is a bilinear form that depends on both \Delta{x_1} and \Delta{x_2}. It can be written in matrix notation as:

\Delta{s}^2 = \mathbf{\Delta{x}^T A \Delta{x}}

Where A is the matrix containing the values in g_{ik}. This is a special case of the bilinear form known as the quadratic form because the same matrix (\mathbf{\Delta{x}}) appears twice; in the generalised bilinear form \mathbf{B} = \mathbf{x^TAy} the matrices \mathbf{x} and \mathbf{y} are different.

If the surface is a Euclidean plane then the values of g_{ik} are:


\begin{bmatrix}
\delta{Y}/\delta{x_1} \delta{Y}/\delta{x_1} + \delta{Z}/\delta{x_1} \delta{Z}/\delta{x_1} & \delta{Y}/\delta{x_2} \delta{Y}/\delta{x_1} + \delta{Z}/\delta{x_2} \delta{Z}/\delta{x_1} \\
\delta{Y}/\delta{x_2} \delta{Y}/\delta{x_1} + \delta{Z}/\delta{x_2} \delta{Z}/\delta{x_1} & \delta{Y}/\delta{x_2} \delta{Y}/\delta{x_2} + \delta{Z}/\delta{x_2} \delta{Z}/\delta{x_2} \end{bmatrix}

Which become:

g_{\mu \nu} =
\begin{bmatrix}
1 & 0 \\
0 & 1 \end{bmatrix}

So the matrix A is the unit matrix I and:

\Delta{s}^2 = \mathbf{\Delta{x^T} I \Delta{x}}

and:

\Delta{s}^2 = \Delta{x_1}^2 + \Delta{x_2}^2

Which recovers Pythagoras' theorem yet again.

If the surface is derived from some other metric such as \Delta{s^2} = -\Delta{Y}^2 + \Delta{Z}^2 then the values of g_{ik} are:


\begin{bmatrix}
-\delta{Y}/\delta{x_1} \delta{Y}/\delta{x_1} + \delta{Z}/\delta{x_1} \delta{Z}/\delta{x_1} &  -\delta{Y}/\delta{x_2} \delta{Y}/\delta{x_1} + \delta{Z}/\delta{x_2} \delta{Z}/\delta{x_1} \\
-\delta{Y}/\delta{x_2} \delta{Y}/\delta{x_1} + \delta{Z}/\delta{x_2} \delta{Z}/\delta{x_1} &  -\delta{Y}/\delta{x_2} \delta{Y}/\delta{x_2} + \delta{Z}/\delta{x_2} \delta{Z}/\delta{x_2} \end{bmatrix}

Which becomes:

g_{\mu \nu} =
\begin{bmatrix}
-1 & 0 \\
 0 & 1 \end{bmatrix}

Which allows the original metric to be recovered ie: \Delta{s^2} = -\Delta{x_1}^2 + \Delta{x_2}^2.
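The construction of g_{ik} from the partial derivatives can be sketched as follows. The function name `metric_from_jacobian`, the choice of rotated axes for the first example and the choice Y = x_1, Z = x_2 for the second are all assumptions made for illustration.

```python
import math


def metric_from_jacobian(jacobian, signs):
    """g_ik = sum over embedding coordinates of sign * (dW/dx_i) * (dW/dx_k).

    jacobian[w][i] holds the partial derivative of embedding coordinate w
    (Y or Z) with respect to surface coordinate x_i.
    """
    n = len(jacobian[0])
    return [[sum(sign * row[i] * row[k] for sign, row in zip(signs, jacobian))
             for k in range(n)]
            for i in range(n)]


# Assumed example: x_1, x_2 are axes rotated by theta relative to Y, Z.
theta = 0.3
rotation = [[math.cos(theta), -math.sin(theta)],   # dY/dx_1, dY/dx_2
            [math.sin(theta),  math.cos(theta)]]   # dZ/dx_1, dZ/dx_2
print(metric_from_jacobian(rotation, signs=(1, 1)))      # close to the unit matrix

# With Y = x_1, Z = x_2 and the metric -dY^2 + dZ^2:
identity = [[1.0, 0.0], [0.0, 1.0]]
print(metric_from_jacobian(identity, signs=(-1, 1)))
```

The first result is the unit matrix up to rounding, which is the Euclidean case above; the second reproduces the matrix with -1 and 1 on the diagonal.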

It is interesting to compare the geometrical analysis with the transformation based on matrix algebra that was derived in the section on indicial notation above:

 \mathbf{s.s} = g_{11} x_1x_{1}^' + g_{12} x_1x_2^' + g_{21} x_2x_{1}^' + g_{22} x_2x_{2}^'

Now,

g_{\mu \nu} =
\begin{bmatrix}
1 & 0 \\
0 & 1 \end{bmatrix}

ie: g_{11} = 1, g_{12} = 0, g_{21} = 0, g_{22} = 1 so:

\mathbf{s.s} = x_1x_{1}^' + x_2x_{2}^'

If there is no rotation of coordinates the scalar product is:

\mathbf{s.s} = x_1x_1 + x_2x_2

s^2 = x_{1}^2 + x_{2}^2

Which recovers Pythagoras' theorem. However, the reader may have noticed that Pythagoras' theorem had been assumed from the outset in the derivation of the scalar product (see above).

The geometrical analysis shows that if a metric is assumed and the conditions that allow differential geometry are present then it is possible to derive one set of coordinates from another. This analysis can also be performed using matrix algebra with the same assumptions.

The example above used a simple two dimensional Pythagorean metric; some other metric, such as the metric of a 4D Minkowskian space:

\Delta S^2 = - \Delta T^2 + \Delta X^2 + \Delta Y^2 + \Delta Z^2

could be used instead of Pythagoras' theorem.

Waves

Applications of Special Relativity

In this chapter we continue the study of special relativity by applying the ideas developed in the previous chapter to the study of waves.

First, we shall show how to describe waves in the context of spacetime. We then see how waves which have no preferred reference frame (such as that of a medium supporting them) are constrained by special relativity to have a dispersion relation of a particular form. This dispersion relation turns out to be that of the relativistic matter waves of quantum mechanics.

Second, we shall investigate the Doppler shift phenomenon, in which the frequency of a wave takes on different values in different coordinate systems.

Third, we shall show how to add velocities in a relativistically consistent manner. This will also prove useful when we come to discuss particle behaviour in special relativity.

A new mathematical idea will be presented in the context of relativistic waves, namely the spacetime vector or four-vector. Writing the laws of physics totally in terms of relativistic scalars and four-vectors ensures that they will be valid in all inertial reference frames.

Waves in Spacetime

We now look at the characteristics of waves in spacetime. Recall that a wave in one space dimension can be represented by

A(x,t) = A_0 \sin(kx- \omega t) \,

where A_0 is the (constant) amplitude of the wave, k is the wavenumber, and \omega is the angular frequency, and that the quantity \phi = kx - \omega t is called the phase of the wave. For a wave in three space dimensions, the wave is represented in a similar way,

A(\mathbf{x},t) = A_0 \sin(\mathbf{k}\cdot\mathbf{x}- \omega t)

where \mathbf{x} is now the position vector and \mathbf{k} is the wave vector. The magnitude of the wave vector, |\mathbf{k}| = k is just the wavenumber of the wave and the direction of this vector indicates the direction the wave is moving. The phase of the wave in this case is \phi = \mathbf{k}\cdot\mathbf{x} - \omega t.

Figure 5.1: Sketch of wave fronts for a wave in spacetime. The large arrow is the associated wave four-vector, which has slope \omega /ck. The slope of the wave fronts is the inverse, ck/ \omega.

In the one-dimensional case \phi = kx - \omega t. A wave front has constant phase \phi, so solving this equation for t and multiplying by c, the speed of light in a vacuum, gives us an equation for the world line of a wave front:

ct = \frac{ckx}{\omega}-\frac{c\phi}{\omega}=\frac{cx}{u_p}-\frac{c\phi}{\omega} \quad \mbox{(wave front).}

The slope of the world line in a spacetime diagram is the coefficient of x, or c/u_p, where u_p = \omega/k is the phase speed.

An application of relativity to waves

Returning to the phase of a wave, we immediately see that

\phi = \mathbf{k} \cdot \mathbf{x} - \omega t =
	\mathbf{k} \cdot \mathbf{x} - ( \omega /c)(ct)
	= \underline{k} \cdot \underline{x} .

Thus, a compact way to write a wave is A( \underline{x} ) = A_0 \sin ( \underline{k} \cdot \underline{x} ). (6.8)

Since \underline{x} is known to be a four-vector and since the phase of a wave is known to be a scalar independent of reference frame, it follows that \underline{k} is indeed a four-vector rather than just a set of numbers. Thus, the square of the length of the wave four-vector must also be a scalar independent of reference frame

\underline{k} \cdot \underline{k} =
	\mathbf{k} \cdot \mathbf{k} - \omega^2 / c^2 = const .


Figure 5.2: Resolution of a four-vector into components in two different reference frames.

Let us review precisely what this means. As this figure shows, we can resolve a position four-vector x into components in two different reference frames, e.g. (X,T) and (X′,T′), but these are just different ways of writing the same vector.

This is exactly the same as the way a three-vector has different components in a rotated frame.

Similarly, just as a three-vector has the same magnitude in all frames, so does a four-vector; i.e.,

X^2-c^2 T^2 = X^{\prime 2} - c^2 T^{\prime 2}

Applying this to the wave four-vector, we infer that

k^2-c^{-2} \omega^2 = k^{\prime 2} - c^{-2} \omega^{\prime 2}

where the unprimed and primed values of k and ω refer to the components of the wave four-vector in two different reference frames.
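This invariance can be checked numerically. The sketch below assumes the standard Lorentz transformation for a frame moving at velocity v along the x axis, applied to the components (k, \omega /c) of the wave four-vector; the function names and the sample values of k and \omega are illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def boost_wave_vector(k, omega, v):
    """Components (k', omega') seen from a frame moving at velocity v along x."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    k_prime = gamma * (k - v * omega / C**2)
    omega_prime = gamma * (omega - v * k)
    return k_prime, omega_prime


def invariant(k, omega):
    """Square of the length of the wave four-vector: k^2 - omega^2 / c^2."""
    return k**2 - (omega / C) ** 2


k, omega = 2.0, 1.0e9          # illustrative values in rad/m and rad/s
k_p, omega_p = boost_wave_vector(k, omega, 0.6 * C)
print(invariant(k, omega))
print(invariant(k_p, omega_p))  # agrees with the line above up to rounding
```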

Up to now, this argument applies to any wave. However, waves can be divided into two categories, those for which a special reference frame exists, and those for which there is no such special frame.

As an example of the former, sound waves look simplest in the reference frame in which the gas carrying the sound is stationary. The same is true of light propagating through a material medium with an index of refraction not equal to unity. In both cases the speed of the wave is the same in all directions only in the frame in which the material medium is stationary.

If there is no material medium, then there is no unambiguous way of finding a special frame so the waves must fall into the second category. This includes all waves in a vacuum, such as light.

In this case the following argument can be made. An observer moving with respect to waves of frequency ω and wave number k sees waves of frequency ω′ and wave number k′. If the observer could tell in any way that the waves come from a source moving with respect to them, then they could use this to identify a special frame for those waves, so the waves must look just like ones from a stationary source of frequency ω′.

This forces us to conclude that for such waves

 \omega^2= c^2 k^2 + \mu^2 \,

where μ is a constant. All waves in a vacuum must have this form, a much more restricted choice than in classical physics.

In classical physics, ω and k for light are related by

\omega = c k  \,

In relativistic physics, we've seen that for waves with no special reference frame, such as light, ω and k are related by

\omega^2 = c^2 k^2 + \mu^2  \,

If μ=0 then the relativistic equation reduces to the classical, so we can assume that, for light, μ does equal zero.

This means that light does not have a minimum frequency.

If μ is not zero then the waves being described are dispersive. The phase speed is

u_p = \frac{\omega}{k} = \sqrt{c^2 + \frac{\mu^2}{k^2}}

This phase speed always exceeds c, which at first may seem like an unphysical conclusion. However, the group velocity of the wave is

u_g = \frac{d\omega}{dk} = \frac{kc^2}{\sqrt{k^2c^2+\mu^2}}
= \frac{kc^2}{\omega} = \frac{c^2}{u_p}

which is always less than c. Since wave packets and hence signals propagate at the group velocity, waves of this type are physically reasonable even though the phase speed exceeds the speed of light.

Another interesting property of such waves is that the wave four-vector is parallel to the world line of a wave packet in spacetime. This is easily shown by the following argument.

The spacelike component of a wave four-vector is k, while the timelike component is ω/c . The slope of the four-vector on a spacetime diagram is therefore ω/kc. However, the slope of the world line of a wave packet moving with group velocity is c/ug, which is also ω/kc .

Note that when k is zero we have ω=μ. In this case the group velocity of the wave is zero. For this reason we sometimes call μ the rest frequency of the wave.
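The relation between the phase speed and the group velocity can be illustrated with a short sketch. Units in which c = 1 are assumed by default, and the function names and sample values of k and μ are illustrative only.

```python
import math


def omega_of_k(k, mu, c=1.0):
    """Dispersion relation: omega^2 = c^2 k^2 + mu^2."""
    return math.sqrt(c**2 * k**2 + mu**2)


def phase_speed(k, mu, c=1.0):
    """u_p = omega / k."""
    return omega_of_k(k, mu, c) / k


def group_speed(k, mu, c=1.0):
    """u_g = d(omega)/dk = k c^2 / omega."""
    return k * c**2 / omega_of_k(k, mu, c)


k, mu = 3.0, 4.0                       # illustrative values with c = 1
print(phase_speed(k, mu))              # greater than c
print(group_speed(k, mu))              # less than c
print(phase_speed(k, mu) * group_speed(k, mu))   # equals c^2 up to rounding
```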

The Doppler Effect

You have probably heard how the pitch of a train horn changes as it passes you. When the train is approaching, the pitch or frequency is higher than when it is moving away from you. This is called the Doppler effect. A similar, but distinct effect occurs if you are moving past a source of sound. If a stationary whistle is blowing, the pitch perceived from a moving car is higher while moving toward the source than when moving away. The first case thus has a moving source, while the second case has a moving observer.

In this section we will compute the Doppler effect as it applies to light moving through a vacuum. The figure below shows the geometry for computing the time between wave fronts of light for a stationary and a moving reference frame.

  • The time between wavefronts for the stationary observer, in the stationary frame, is T.
  • The time between wavefronts for the moving observer, in the stationary frame, is T^{\prime}.
  • The time between wavefronts for the moving observer, in the moving observer's own frame, is \tau.

Since the world lines of the wave fronts have a slope of unity, the sides of the shaded triangle both have the same value, X. If the observer is moving at velocity v, the slope of the observer's world line is c/v, i.e.

\frac{c}{v}=\frac{cT+X}{X}

Solving this for X and substituting into cT^{\prime} = cT + X gives

T^{\prime}=\frac{T}{1-\frac{v}{c}} \quad (1)

In classical physics T^{\prime} and \tau are the same so this formula, as it stands, leads directly to the classical Doppler shift for a moving observer.

However, in relativity T^{\prime} and \tau are different. We can use the Lorentz transformation to correct for this.

The second wavefront passes the moving observer at (vT^{\prime}, cT^{\prime}) in the stationary observer's frame, but at (0,c\tau) in its own frame. The Lorentz transform tells us that:

c\tau = \gamma \left( cT'-\frac{v}{c}vT' \right) 
= cT' \gamma \left( 1 - \frac{v^2}{c^2} \right) = cT'/ \gamma

Substituting in equation (1) gives

\begin{matrix}
\tau & = & T \frac{ \sqrt{ 1 - v^2/c^2 }}{1-v/c} \\
& = & T \sqrt{ \frac{ (1-v/c)(1+v/c)}{(1-v/c)^2}} \\
& = & T \sqrt{ \frac{c+v}{c-v} } 
\end{matrix}

From this we infer the relativistic Doppler shift formula for light in a vacuum:

\nu^{\prime}= \nu \sqrt{ \frac{c-v}{c+v} }

since frequency is inversely proportional to time.
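The two steps of the derivation can be followed numerically. In this sketch units with c = 1 are assumed by default, the function names are illustrative, and a positive v is taken to mean that the observer moves in the same direction as the wave fronts, as the sign in equation (1) implies.

```python
import math


def doppler_period(T, v, c=1.0):
    """Period tau measured by the moving observer, built from the two steps above."""
    T_prime = T / (1.0 - v / c)                    # equation (1)
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return T_prime / gamma                         # from c*tau = c*T'/gamma


def doppler_frequency(nu, v, c=1.0):
    """Relativistic Doppler shift: nu' = nu * sqrt((c - v) / (c + v))."""
    return nu * math.sqrt((c - v) / (c + v))


T, v = 1.0, 0.6
tau = doppler_period(T, v)
print(tau)                                  # T * sqrt((c + v) / (c - v))
print(doppler_frequency(1.0 / T, v))        # the reciprocal of tau
```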

We could go on to determine the Doppler shift resulting from a moving source. However, by the principle of relativity, the laws of physics should be the same in the reference frame in which the observer is stationary and the source is moving. Furthermore, the speed of light is still c in that frame. Therefore, the problem of a stationary observer and a moving source is conceptually the same as the problem of a moving observer and a stationary source when the wave is moving at speed c.

This is unlike the case for, say, sound waves, where the moving-observer and moving-source cases yield different formulas for the Doppler shift.

Mathematical Transformations


The teaching of Special Relativity on undergraduate physics courses requires considerable mathematical background knowledge. Particularly important are the manipulation of vectors and matrices and an elementary knowledge of curvature. The background mathematics is given at the end of this section and can be referenced by those who are unfamiliar with these techniques.

The Lorentz transformation

The Lorentz transformation deals with the problem of observers who are moving relative to each other. How are the coordinates of an event recorded by one observer related to the coordinates of the event recorded by the other observer? The standard configuration used in the calculation of the Lorentz transformation is shown below:

Relstandard.gif

There are several ways of deriving the Lorentz transformations. The usual method is to work from Einstein's postulates (that the laws of physics are the same in all inertial reference frames and that the speed of light is constant) whilst adding assumptions about isotropy, linearity and homogeneity. An alternative is to work from the assumption of a four dimensional Minkowskian metric.

In mathematics transformations are frequently symbolised with the "maps to" symbol:

 (x,y,z,t) \mapsto (x^',y^',z^',t^')

The linearity and homogeneity of spacetime

Consider a clock moving freely. According to Newton's first law, objects continue in a state of uniform motion unless acted upon by a force; so, the velocity of the clock in any given direction (dx_i/dt) must be a constant.

If the clock is a real clock with readings given by \tau then the rate of change with respect to these readings of the elapsed time at any other point in an inertial frame of reference, dt/d\tau, will be a constant. If the clock were to tick at an uneven rate compared with other clocks then the universe would not be homogeneous in time—at some times the clock would appear to accelerate. This would also mean that Newton's first law would be broken and the universe would not be homogeneous in space.

If dx_i/dt and dt/d\tau are constant then dx_{\mu}/d\tau (\mu=1,2,3,4) is also constant. This means that the clock is not accelerating, i.e. that d^2x_{\mu}/d\tau^2 = 0.

Linearity is demonstrated by the way that the length of things does not depend on position or relative position; for instance, if x' = ax^2 the distance between two points would depend upon the position of the observer whereas if the relationship is linear (x' = ax) separations are independent of position.

The linearity and homogeneity assumptions mean that the coordinates of objects in the S' inertial frame are related to those in the S inertial frame by:

x_{\nu}' = (\sum \Lambda_{\nu\mu}x_\mu) + B_\nu

This formula is known as a Poincaré transformation. It can be expressed in indicial notation as:

x^{'\nu} = \Lambda_{\mu}^\nu x^\mu + B^\nu

If the origins of the frames coincide then B_\nu can be assumed to be zero and the equation becomes:

x_{\nu}' = \sum \Lambda_{\nu\mu}x_\mu

Those who are unfamiliar with the notation should note that the symbols x_1 etc. mean x_1=x, x_2=y, x_3=z and x_4=t, and that the coefficients a_{\nu\mu} below are the components of \Lambda, so the equation above is shorthand for:

x' = a_{11} x + a_{12} y + a_{13} z  + a_{14} t

y' = a_{21} x + a_{22} y + a_{23} z  + a_{24} t

z' = a_{31} x + a_{32} y + a_{33} z  + a_{34} t

t' = a_{41} x + a_{42} y + a_{43} z  + a_{44} t

In matrix notation the set of equations can be written as:

\mathbf{x'}=\mathbf{\Lambda x}

The standard configuration (see diagram above) has several properties, for instance:

The spatial origins of both observers' coordinate systems lie on the line of motion so the x axes can be chosen to be parallel.
The point given by x=vt is the same as x'=0.
The origins of both coordinate systems can coincide so that clocks can be synchronised when they are next to each other.
The coordinate axes y, y' and z, z' can be arranged to be orthogonal (at right angles) to the direction of motion.
Isotropy means that coordinate planes that are orthogonal at y=0 and z=0 in one frame are orthogonal at y'=0 and z'=0 in the other frame.

According to the relativity principle any transformations between the same two inertial frames of reference must be the same. This is known as the reciprocity theorem.

The Lorentz transformation

From the linearity assumption, and given that y=0=y^' at the origin so that there is no constant offset, y^'=Ky. By reciprocity y=Ky^' as well, so K^2=1; since the y axes point the same way, K=1. So:

y^'=y

and, by the same reasoning:

z^'=z

Now, considering the x coordinate of the event, the y and z coordinates can be assumed to be 0 (ie: an arbitrary shift of the coordinates to allow the event to lie on the x axis). If this is done then the linearity consideration and the fact that x=vt and x^'=0 are the same point gives:

(1) x^' = \gamma (x - vt)

where \gamma is a constant. According to the reciprocity theorem we also have:

(2) x = \gamma (x^' + vt^')

Einstein's assumption that the speed of light is a constant can now be introduced so that x=ct and also x^'=ct^'. So:

ct^'= \gamma t(c - v)

and

ct= \gamma t^'(c + v)

So:

c^2tt^'= \gamma^2tt^'(c^2 - v^2)

and

\mathbf{\gamma = \frac {1}{\sqrt {1-v^2/c^2}}}

Therefore the Lorentz transformation equations are:

t^' = \gamma (t - vx/c^2)

x^' = \gamma (x - vt)

y^'=y

z^'=z

The transformation for the time coordinate can be derived from the transformation for the x coordinate assuming x=ct and x^'=ct^' or directly from equations (1) and (2) with a similar substitution for x=ct.

The coefficients of the Lorentz transformation can be represented in matrix format:


\begin{bmatrix}
c t' \\x' \\y' \\z'
\end{bmatrix}
=
\begin{bmatrix}
\gamma&-\frac{v}{c} \gamma&0&0\\
-\frac{v}{c} \gamma&\gamma&0&0\\
0&0&1&0\\
0&0&0&1\\
\end{bmatrix}
\begin{bmatrix}
c t\\x\\y\\z
\end{bmatrix}.

A coordinate transformation of this type, that is due to motion, is known as a boost.
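
The transformation equations can be tried out numerically. The Python sketch below is only an illustration: the helper names `gamma` and `boost`, the choice of units with c = 1 and the sample event are assumptions of this example. It applies the boost to an event, undoes it with a boost of -v (reciprocity), and checks that an event on the light ray x = ct still satisfies x' = ct'.

```python
import math


def gamma(v, c=1.0):
    """Lorentz factor 1 / sqrt(1 - v^2/c^2)."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)


def boost(event, v, c=1.0):
    """Transform event = (t, x, y, z) from S to S' in standard configuration."""
    t, x, y, z = event
    g = gamma(v, c)
    return (g * (t - v * x / c**2), g * (x - v * t), y, z)


# Units with c = 1 are assumed for the sample values below.
event = (2.0, 1.0, 3.0, -4.0)
primed = boost(event, 0.6)
restored = boost(primed, -0.6)  # reciprocity: boosting by -v undoes the boost

light = boost((1.0, 1.0, 0.0, 0.0), 0.6)  # an event on the light ray x = ct

print(primed)
print(restored)
print(light)
```

With v = 0.6 the factor \gamma is 1.25, so the sample event maps to t' = 1.25(2 - 0.6) and x' = 1.25(1 - 1.2), while y and z are unchanged.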

Relltex.gif

The Lorentz transformation equations can be used to show that:

c^2dt^{'2} - dx^{'2} - dy^{'2} - dz^{'2} = c^2dt^{2} - dx^{2} - dy^{2} - dz^{2}

Although whether the assumptions of linearity, isotropy and homogeneity in the derivation of the Lorentz transformation actually assumed this identity from the outset is a moot point.

Given that: c^2dt^{'2} - dx^{'2} - dy^{'2} - dz^{'2} also equals c^2dt^{''2} - dx^{''2} - dy^{''2} - dz^{''2} and a continuous range of other transformations it is clear that:

\mathbf{\Delta s}^2 = c^2\Delta{t}^{2} - \Delta{x}^{2} - \Delta{y}^{2} - \Delta{z}^{2}

The quantity \Delta s is known as the spacetime interval and the quantity \mathbf{\Delta s}^2 is known as the squared displacement.

A given squared displacement is constant for all observers, no matter how fast they are travelling; it is said to be invariant.

The equation:

ds^2 = c^2dt^{2} - dx^{2} - dy^{2} - dz^{2}

is known as the metric of spacetime.
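
The invariance of the squared displacement can be checked numerically. In this minimal Python sketch the function names, the sample separation and the units (c = 1) are assumptions made for illustration. Because the Lorentz transformation is linear, it can be applied directly to the separation between two events.

```python
import math


def boost_tx(t, x, v, c=1.0):
    """Lorentz boost along x applied to the pair (t, x)."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return g * (t - v * x / c**2), g * (x - v * t)


def squared_displacement(dt, dx, dy=0.0, dz=0.0, c=1.0):
    """c^2 dt^2 - dx^2 - dy^2 - dz^2 for a separation between two events."""
    return (c * dt) ** 2 - dx**2 - dy**2 - dz**2


dt, dx = 3.0, 2.0  # sample separation, in units with c = 1
results = []
for v in (0.1, 0.5, 0.9, -0.99):
    dt_p, dx_p = boost_tx(dt, dx, v)
    results.append(squared_displacement(dt_p, dx_p))

print(squared_displacement(dt, dx))
print(results)
```

Each entry of `results` agrees with the unprimed value 3^2 - 2^2 = 5 up to floating-point rounding, whatever the relative velocity.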

Another look at the Lorentz transformation

We have seen how length and duration both look different to a moving observer:

  • lengths are contracted by a factor γ, when measured at the same time.
  • durations are dilated by a factor γ, when measured in the same place.

We can extend these results to allow for measurements taking place at different times and places.

Let's consider two observers, O and O', such that O' is moving at velocity v along the x-axis with respect to O. We'll use primed variables for all the measurements O' makes.

We can assume for now both observers have the same origin and x-axis because we already know how to allow for observers being relatively rotated and displaced. We can put these complications back in later.

Now any length or duration can be written as the difference between two coordinates, for the two ends of the body or the start and end of the event, so it is sufficient to know how to change coordinates from one frame to the other.

We know how to do this in classical physics,

\begin{matrix} x' & = & x-vt \\ t' & = & t \end{matrix}

we need to extend this to relativity.

Notice that in classical physics the relationship is linear; the graphs of these equations are straight lines. This makes the maths much simpler, so we will try to find a linear relationship between the coordinates for relativity, i.e. equations of the general form

\begin{matrix} x' & = & mx + nt \\ t' & = & px + qt \end{matrix}

where m, n, p and q are all independent of the coordinates.

To begin with, we know that

  • when t=0, x′ = γx (Lorentz contraction)
  • when x=0, t′ = γt (Time dilation)

and that O' is travelling at velocity v. O' measures its own position to be x′=0, but O measures it to be at x=vt, so we must have

x′=0 when x-vt=0

The only relationship between x and x′ that satisfies these criteria is

x'=\gamma (x - vt) \,

Both observers must measure the same speed for light,

 x=ct \Rightarrow x'=ct' \,

or, substituting and rearranging,

 x=ct \Rightarrow t'=\gamma \frac{x-vt}{c}

The only linear relationship between t and t′ that satisfies these criteria is

t'=\gamma (-\frac{vx}{c^2} + t) \,


So the primed and unprimed coordinates are related by

\begin{matrix} 
x' & = & \gamma (x-vt) \\ t' & = & \gamma \left(-\frac{vx}{c^2} + t \right)
\end{matrix}

These equations are called the Lorentz transform.

They look simpler if we write them in terms of ct rather than t

\begin{matrix} 
x' & = & \gamma \left( x-\frac{v}{c}ct \right) \\ 
ct' & = & \gamma \left( -\frac{v}{c}x + ct \right)
\end{matrix}

Written this way they look much like the equations describing a rotation in three dimensions. In fact, once we allow for the different Pythagorean theorem, they are exactly like the equations for rotation.

  • If observers are moving relative to each other, their coordinate systems are rotated in the (x,ct) plane.

The derivation of the constancy of the speed of light from the first postulate alone

The following text was taken from a "Wikiversity" lecture.

The Lorentz transformation can be deduced from the first postulate alone with no additional assumptions.

The Lorentz transformation describes the way a vector in spacetime as seen by an observer O1 changes when it is seen by an observer O2 in a different inertial system.

The postulate of special relativity states that the laws of physics are independent of the velocity of an observer. This is equivalent to the requirement that the mathematical formulation of the laws of physics must be invariant if the coordinates of space and time are altered using a Lorentz transformation.

This is the reason why the Lorentz transformation is a central concept in special relativity.

Four-vectors are vectors that, subjected to the Lorentz transformation, behave like vectors from spacetime. The standard approach to find relativistic laws to the corresponding non-relativistic ones is to find four-vectors to replace space vectors. Surprisingly often this simple approach is successful.

Here, for simplicity, the space coordinates are chosen so that the x-axis points in the direction in which O2 is moving with respect to O1 and the y and z coordinates are not considered.

Here is a way to determine a matrix L(v) that transforms coordinates from O1 to O2 where v is the velocity of O2 seen from O1.

Most generally, this matrix looks like L(v) =\left(\begin{matrix}a & b \\ c & d\end{matrix}\right) and transforms \left(\begin{matrix}t \\ x\end{matrix}\right) to \left(\begin{matrix}t^\prime \\ x^\prime\end{matrix}\right) via \left(\begin{matrix}t' \\ x'\end{matrix}\right) = \left(\begin{matrix}a & b \\ c & d\end{matrix}\right)\left(\begin{matrix}t \\ x\end{matrix}\right) .

The first requirement on this matrix is that O2 will be at rest (in space) within its own coordinate system.

Therefore \left(\begin{matrix}t' \\ 0\end{matrix}\right) = \left(\begin{matrix}a & b \\ c & d\end{matrix}\right)\left(\begin{matrix} t \\ tv\end{matrix}\right) for any time t.

This gives us  ct +tvd = 0 \Rightarrow c = -vd and L(v) =\left(\begin{matrix}a & b \\ -vd & d\end{matrix}\right).

The backward transformation from O2 to O1 is described by the inverse matrix L(v') = L(v)^{-1} = (\det L(v))^{-1}\left(\begin{matrix}d & -b \\ vd & a\end{matrix}\right) (where \, v' is the velocity of O1 seen from O2), which has to leave O1 at rest in its own coordinates.

Assuming  v' = -v (strictly, this would have to be proved) we have \left(\begin{matrix}t \\ 0\end{matrix}\right) = (\det L(v))^{-1}\left(\begin{matrix}d & -b \\ vd & a\end{matrix}\right)\left(\begin{matrix} t' \\ -t'v\end{matrix}\right) for any time \, t', resulting in  a=d \Rightarrow L(v) =\left(\begin{matrix}a & b \\ -va & a\end{matrix}\right).

The coefficients are functions of v. For all v the transformation is invertible, so 0 \neq \det L(v) = a(v)^2 + v a(v) b(v) \Rightarrow a(v) \neq 0.

For v=0 L(v) is the identity matrix, so b(0)=0. This allows us to write  b(v) = -v a(v) \varepsilon(v) with an as yet unknown function \varepsilon(v).


Now  L(v) =a(v) \left(\begin{matrix} 1 & -v\varepsilon(v) \\ -v & 1\end{matrix}\right).


Now for every v_0, v_1 there exists a velocity v_2 such that L(v_0) L(v_1) = L(v_2). Using what we have proved up to now gives \begin{align}
L(v_0)L(v_1) & = a(v_0) \left(\begin{matrix} 1 & -v_0\varepsilon(v_0) \\ -v_0 & 1\end{matrix}\right) a(v_1) \left(\begin{matrix} 1 & -v_1\varepsilon(v_1) \\ -v_1 & 1\end{matrix}\right) \\
& = a(v_0) a(v_1)\left(\begin{matrix} 1+v_0v_1\varepsilon(v_0) & -v_1\varepsilon(v_1)-v_0\varepsilon(v_0) \\ -v_0-v_1 & 1+v_0 v_1 \varepsilon(v_1)\end{matrix}\right) \\
& = a(v_2) \left(\begin{matrix} 1 & -v_2\varepsilon(v_2) \\ -v_2 & 1\end{matrix}\right) \\
& = L(v_2).
\end{align}

Comparing the diagonal components, we see immediately that \varepsilon(v_0) = \varepsilon(v_1) for arbitrary values of v_0, v_1, and so \varepsilon(v) = \operatorname{const.} = \varepsilon.

Using this, and comparing the components at indices (1,1) and (2,1) respectively, we obtain a(v_0)a(v_1)(1+v_0v_1\varepsilon) = a(v_2) and \,a(v_0)a(v_1)(-v_0-v_1) = -v_2 a(v_2). Substituting a(v_2) from the first of these into the second yields

(1) \ \ a(v_0)a(v_1)(-v_0-v_1) = -v_2 a(v_0)a(v_1)(1+v_0v_1\varepsilon)

from which we have v_2 = \frac{v_0+v_1}{1+v_0v_1\varepsilon}.

Reinserting this into the first equation we have 
a(v_0)a(v_1)(1+v_0v_1\varepsilon) = a(\frac{v_0+v_1}{1+v_0v_1\varepsilon}).

In case  v := v_0 = -v_1 this simplifies to 
a(v)a(-v)(1-v^2\varepsilon) = a(0) = 1 as \, L(0) is the identity matrix.


Let \varphi(x) = \varphi(-x) be an even function and \psi(x) = -\psi(-x) an odd function such that a(v) = e^{\varphi(v) + \psi(v)}.

This gives us e^{\varphi(v) + \psi(v) + \varphi(v) - \psi(v)-2\pi\operatorname i k} = (1-v^2\varepsilon)^{-1}. Taking the logarithm and simplifying yields \varphi(v) = -\frac12\ln (1-v^2\varepsilon) + \pi\operatorname i k from which we get a(v)=\pm\frac{1}{\sqrt{(1-v^2\varepsilon)}} e^{\psi(v)} with any odd function \,\psi(-v)=-\psi(v). The negative sign is ruled out by \, a(0)=1.


Substituting \, a(v) into the first equation, a(v_0)a(v_1)(1+v_0v_1\varepsilon) = a(v_2), verifies the solution but leaves us with \psi(v_0) + \psi(v_1) = \psi(\frac{v_0+v_1}{1+v_0v_1\varepsilon}).

To prove \psi(v) \equiv 0 we flip the direction of the x-axis in the base system as well as in the transformed system. The choice of the direction of the axis must not lead to different results. In the flipped coordinates the Lorentz transformation takes the form L(v') = a(v') \left(\begin{matrix} 1 & -v'\varepsilon \\ -v' & 1\end{matrix}\right) = a(-v) \left(\begin{matrix} 1 & v\varepsilon \\ v & 1\end{matrix}\right) for v' = -v.

On the other hand we can compute L(v') by using a coordinate transformation on L(v):

 L(v') = \left(\begin{matrix} 1 & 0\\ 0 & -1\end{matrix}\right) a(v) \left(\begin{matrix} 1 & -v\varepsilon \\ -v & 1\end{matrix}\right) \left(\begin{matrix} 1 & 0 \\ 0 & -1\end{matrix}\right) =
a(v)\left(\begin{matrix} 1 & v\varepsilon \\ v & 1\end{matrix}\right) from which we get \,a(-v) = a(v) and hence \,\psi(-v)=\psi(v). Together with \,\psi(-v)=-\psi(v) this gives \psi(v) \equiv 0.

Looking at the eigenvectors of L(v) we see they are \left(\begin{matrix} 1 \\ \pm\frac{1}{\sqrt\varepsilon}\end{matrix}\right) independent of v.

This is equivalent to a transformation-invariant velocity \frac{1}{\sqrt{\varepsilon}}=:c_0 which experimentation shows to be equal to the speed of light. Altogether we have:

L(v)= \frac{1}{\sqrt{1-\frac{v^2}{c_0^2}}}\left(\begin{matrix} 1 & -\frac{v}{c_0^2} \\ -v & 1\end{matrix}\right)

Note that for \varepsilon=0 we would have the Galilean case with no finite invariant velocity.

Note that if we made the same deduction not in the spacetime plane (t, x) but in the space plane (x, y), with slope m=y/x instead of velocity v=x/t, nothing would change except that, not by calculation but by experimentation alone, it would turn out that \,\varepsilon = -1 and L(m) is just a rotation in Euclidean space with no real-valued eigenvectors.


It is an interesting exercise to compare this result with the assumptions of reciprocity, homogeneity and isotropy mentioned earlier. Do the linear equations contained in the matrix formulation of the Lorentz transformation above assume a particular spacetime from the outset?


It is further important to mention that the determinant \, \det L(v) = 1 independent of velocity. This directly implies that spacetime volumes are conserved.
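
The composition rule and the unit determinant derived in this section can be checked numerically. The Python sketch below is only an illustration: the helper names (`L`, `matmul`, `det`, `combine`, `close`) and the sample values are assumptions of this example. It builds L(v) = a(v) [[1, -v\varepsilon], [-v, 1]] with a(v) = 1/\sqrt{1-v^2\varepsilon} and tests the three cases \varepsilon = 1 (invariant speed c_0 = 1), \varepsilon = 0 (Galilean) and \varepsilon = -1 (Euclidean rotation).

```python
import math


def L(v, eps):
    """Matrix acting on (t, x): a(v) * [[1, -v*eps], [-v, 1]]."""
    a = 1.0 / math.sqrt(1.0 - eps * v * v)
    return [[a, -a * v * eps], [-a * v, a]]


def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def det(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def combine(v0, v1, eps):
    """Composition law v2 = (v0 + v1) / (1 + v0*v1*eps)."""
    return (v0 + v1) / (1.0 + v0 * v1 * eps)


def close(A, B, tol=1e-12):
    """True when two 2x2 matrices agree entry by entry within tol."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))


# eps = 1 corresponds to c_0 = 1, eps = 0 to the Galilean case,
# and eps = -1 to an ordinary rotation parametrised by a slope.
v0, v1 = 0.5, 0.3
for eps in (1.0, 0.0, -1.0):
    product = matmul(L(v0, eps), L(v1, eps))
    print(eps, close(product, L(combine(v0, v1, eps), eps)), det(product))
```

In each case the product of two transformations is again a transformation of the same family, with the parameter given by the composition law, and its determinant is 1 up to rounding.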

The geometry of space-time

SR uses a 'flat' 4-dimensional Minkowski space, which is an example of a space-time. This space, however, is very similar to the standard 3 dimensional Euclidean space, and fortunately by that fact, very easy to work with.

The differential of distance (ds) in Cartesian 3D space is defined as:

 ds^2 = dx_1^2 + dx_2^2 + dx_3^2

where (dx_1,dx_2,dx_3) are the differentials of the three spatial dimensions. In the geometry of special relativity a fourth dimension, time, is added and multiplied by c so that it has the units of length, and the equation for the differential of distance becomes:

 ds^2 = dx_1^2 + dx_2^2 + dx_3^2 - c^2 dt^2

In many situations it may be convenient to treat time as imaginary (e.g. it may simplify equations), in which case t in the above equation is replaced by it', and the metric becomes

 ds^2 = dx_1^2 + dx_2^2 + dx_3^2 + c^2(dt')^2

Note that ds in this case is the distance, and not the interval. Caution should be exercised in the use of 'imaginary' time, as it is not part of the modern theory of relativity. Blandford and Thorne (2004) in their "Applications of Classical Physics" (http://www.pma.caltech.edu/Courses/ph136/yr2004/ ), write the following about imaginary time: "(i) it hides the true physical geometry of Minkowski spacetime, (ii) it cannot be extended in any reasonable manner to non-orthonormal bases in flat spacetime, and (iii) it cannot be extended in any reasonable manner to the curvilinear coordinates that one must use in general relativity."

If we reduce the spatial dimensions to 2, so that we can represent the physics in a 3-D space, the metric becomes:

 ds^2 = dx_1^2 + dx_2^2 - c^2 dt^2

We see that things such as light which move at the speed of light lie along a dual-cone:

Sr1.jpg

defined by the equation

 ds^2 = 0 = dx_1^2 + dx_2^2 - c^2 dt^2

or

 dx_1^2 + dx_2^2 = c^2 dt^2

Which is the equation of a circle with r=c dt. The path of something that moves at the speed of light is known as a null geodesic. If we extend the equation above to three spatial dimensions, the null geodesics are continuous concentric spheres, with radius = distance = c×(±time).

Null spherical space (special relativity).jpg

 ds^2 = 0 = dx_1^2 + dx_2^2 + dx_3^2 - c^2 dt^2
 dx_1^2 + dx_2^2 + dx_3^2 = c^2 dt^2

This null dual-cone represents the "line of sight" of a point in space. That is, when we look at the stars and say "The light from that star which I am receiving is X years old.", we are looking down this line of sight: a null geodesic. We are looking at an event d = \sqrt{x_1^2+x_2^2+x_3^2} meters away and d/c seconds in the past. For this reason the null dual cone is also known as the 'light cone'. (The point in the lower left of the picture below represents the star, the origin represents the observer, and the line represents the null geodesic "line of sight".)

Sr1.jpg

The cone in the -t region is the information that the point is 'receiving', while the cone in the +t section is the information that the point is 'sending'.

Length contraction, time dilation and phase

Consider two inertial frames in standard configuration. A rigid rod is at rest in the second (primed) frame, so it moves at v m/s relative to the first frame. The length of a moving rod is determined by observing the positions of its end points simultaneously; it would be nonsense to use any other measure of length. An observer who is moving at the same velocity as the rod measures its "rest length". The Lorentz transformation for coordinates along the x axis is:

x^' = \gamma (x - vt)

Suppose the positions, x_1, x_2, of the two ends of the rod are determined simultaneously (ie: t is constant):

(x^'_1 - x^'_2) = \gamma (x_1 - x_2)

Or, using L_0 = (x^'_1 - x^'_2) for the rest length of the rod and L = (x_1 - x_2) for the length of the rod that is measured by the observer who sees it fly past at v m/s:

\mathbf{L_0 = \gamma L}

Or, elaborating \gamma:

\mathbf{L = {L_0}\sqrt{1-v^2/c^2}}

In other words the length of an object moving with velocity v is contracted in the direction of motion by a factor \sqrt{1-v^2/c^2}.
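
The contraction formula and its link to the Lorentz transformation can be illustrated with a short Python sketch. The function names, the sample speed and the units (c = 1) are assumptions of this example. Both ends of the rod are located at the same time t in the unprimed frame, and transforming them to the primed frame recovers the rest length.

```python
import math


def gamma(v, c=1.0):
    """Lorentz factor 1 / sqrt(1 - v^2/c^2)."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)


def contracted_length(rest_length, v, c=1.0):
    """L = L_0 * sqrt(1 - v^2/c^2)."""
    return rest_length * math.sqrt(1.0 - (v / c) ** 2)


def x_primed(x, t, v, c=1.0):
    """Lorentz transformation of the x coordinate: x' = gamma * (x - v t)."""
    return gamma(v, c) * (x - v * t)


v = 0.8             # speed of the rod as a fraction of c (c = 1 here)
rest_length = 10.0  # L_0, measured by an observer moving with the rod
length = contracted_length(rest_length, v)

# Both ends are located at the same time t in the unprimed frame.
t = 2.0
x2 = v * t          # trailing end, which left the origin at t = 0
x1 = x2 + length    # leading end
recovered_rest_length = x_primed(x1, t, v) - x_primed(x2, t, v)

print(length, recovered_rest_length)
```

At v = 0.8c the factor \sqrt{1-v^2/c^2} is 0.6, so a rod of rest length 10 is measured as 6, and \gamma times 6 gives back 10.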

The Lorentz transformation also affects the rate at which clocks appear to change their readings. The Lorentz transformation for time is:

t^' = \gamma (t - vx/c^2)

This transformation has two components:

t^' = \gamma t - \gamma vx/c^2

and is a straight line graph (ie: of the form t^' = mt + k, where m is the gradient and k the intercept).

The gradient of the graph is \gamma so:

\Delta t^' = \gamma \Delta t

or:

t^'_1-t^'_2 = \gamma (t_1 - t_2)

Therefore clocks in the moving frame will appear to go slow. If T_0 is a time interval in the rest frame and T is a time interval in the moving frame:

T = \gamma T_0

Or, expanding:

\mathbf{T = \frac{T_0}{\sqrt{1-v^2/c^2}}}

The intercept of the graph is:

-\gamma vx/c^2

This means that if a clock at point x is compared with a clock that was synchronised between frames at the origin it will show a constant time difference of \gamma vx/c^2 seconds. This quantity is known as the relativistic phase difference or "phase".

Reldil.gif

The relativistic phase is as important as the length contraction and time dilation results. It is the amount by which clocks that are synchronised at the origin go out of synchronisation with distance along the direction of travel. Phase affects all clocks except those at the point where clocks are synchronised and the infinitesimal y and z planes that cut this point. All clocks everywhere else will be out of synchronisation between the frames. The effect of phase is shown in the illustration below:

Rel1.gif

If the inertial frames are each composed of arrays of clocks spread over space then the clocks will be out of synchronisation as shown in the illustration above.

It is interesting to see how the phase term cancels out when time differences are being considered. It cancels out because the phase term x^'v/c^2 is applied to the clock that is at a constant position in the primed frame. The Lorentz transform for time is:

t = \frac{t^' + (v/c^2)x^'}{\sqrt{(1 - v^2/c^2)}} \,

A difference between two times is:

t_1 - t_2 = \frac{t_1^' + (v/c^2)x^' - t_2^' - (v/c^2)x^'}{\sqrt{(1 - v^2/c^2)}} \,

The important thing to note is that the clock is at the same position x^' in its own reference frame.

The result is that:

t_1 - t_2 = \frac{t_1^' - t_2^'}{\sqrt{(1 - v^2/c^2)}} \,

Where the phase terms have cancelled.
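
The cancellation can be seen numerically in the following Python sketch, in which the function name `t_unprimed`, the sample readings and the units (c = 1) are assumptions made for illustration. The phase offset depends on the clock's position x', but the difference between two readings of the same clock does not.

```python
import math


def t_unprimed(t_p, x_p, v, c=1.0):
    """Inverse Lorentz transformation for time: t = gamma * (t' + v x'/c^2)."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return g * (t_p + v * x_p / c**2)


v = 0.6  # relative speed in units with c = 1, so gamma = 1.25
offsets = []
differences = []
for x_p in (0.0, 5.0, -40.0):
    # Unprimed time at which the primed clock at x' reads zero (the phase).
    offsets.append(t_unprimed(0.0, x_p, v))
    # Two readings t'_1 = 3 and t'_2 = 1 of the same clock at fixed x'.
    differences.append(t_unprimed(3.0, x_p, v) - t_unprimed(1.0, x_p, v))

print(offsets)
print(differences)
```

The offsets grow in proportion to x', while every difference equals \gamma(t'_1 - t'_2) = 1.25 × 2 up to rounding.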

Definition sketch for understanding the Lorentz contraction

Hyperbolic geometry

In the flat spacetime of Special Relativity:

s^2 = c^2t^{2} - x^{2} - y^2 - z^2

Considering the x-axis alone:

s^2 = c^2t^{2} - x^{2}

The standard equation of a hyperbola is:

\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1

In the case of spacetime:

\frac{(ct)^2}{s^2} - \frac{x^2}{s^2} = 1

Spacetime intervals separate one place or event in spacetime from another. So, for a given motion from one place to another or a given fixed length in one reference frame, given time interval etc. the metric of spacetime describes a hyperbolic space. This hyperbolic space encompasses the coordinates of all the observations made of the given interval by any observers.

Relhyperb.gif

It is possible to conceive of rotations in hyperbolic space in a similar way to rotations in Euclidean space. The idea of a rotation in hyperbolic space is summarised in the illustration below:

Relhyper2.gif

A rotation in hyperbolic space is equivalent to changing from one frame of reference to another whilst observing the same spacetime interval. It is moving from coordinates that give:

(ct)^2 - x^2 = s^2

to coordinates that give:

(ct^')^2 - {x^'}^2 = s^2

The formula for a rotation in hyperbolic space provides an alternative form of the Lorentz transformation ie:

\begin{bmatrix}
c t \\x 
\end{bmatrix}
=
\begin{bmatrix}
\cosh \phi & \sinh \phi \\
\sinh \phi & \cosh \phi 
\end{bmatrix}
\begin{bmatrix}
c t'\\x'
\end{bmatrix}.

From which:

x = x^' \cosh \phi + ct^' \sinh \phi

ct = x^' \sinh \phi + ct^' \cosh \phi

The value of \phi can be determined by considering the coordinates assigned to a light that moves along the x axis from the origin at v m sec^{-1}, flashes on for t seconds and then flashes off.

The coordinates assigned by an observer on the light are t', 0, 0, 0; the coordinates assigned by the stationary observer are t, x=vt, 0, 0. The hyperbola representing these observations is illustrated below:

Relbeta.gif

The equation of the hyperbola is:

(ct)^2 - x^2 = s^2 = (ct^')^2

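but x=vt for the end of the flash and, setting x^'=0 in the rotation equations above, x = ct^' \sinh \phi and ct = ct^' \cosh \phi, so dividing x by ct gives: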

\tanh \phi = \frac {v}{c}

Now, from hyperbolic trigonometry:

\frac{1}{\sqrt{1-\tanh^2{\phi}}} = \sqrt{ 
\frac{\cosh^2\phi} 
{\cosh^2\phi-\sinh^2\phi}
}= \cosh\phi

But \tanh\phi = \frac{v}{c} so:

\cosh\phi = \frac{1}{\sqrt{1-v^2/c^2}} = \gamma

and, from the hyperbolic trigonometric formula \sinh\phi = \tanh\phi \cosh\phi:

\sinh\phi = {\frac{v}{c}}\gamma

Inserting these values into the equations for the hyperbolic rotation:

x = x^' \cosh \phi + ct^' \sinh \phi

x = \gamma x^' + ct^' \gamma v/c

Which gives the standard transform for x:

x = \gamma ( x^' + vt^')

In a similar way ct = x^' \sinh \phi + ct^' \cosh \phi is equivalent to:

t = \gamma (t^' + vx^'/c^2)

So the Lorentz transformations can also be derived from the assumption that boosts are equivalent to rotations in hyperbolic space with a metric s^2 = c^2t^{2} - x^{2} - y^2 - z^2.

The quantity \phi is known as the rapidity of the boost.
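
The equivalence between the hyperbolic rotation and the standard transform can be verified numerically. In the Python sketch below the function names, the sample coordinates and the units (c = 1) are assumptions of this example.

```python
import math


def rapidity(v, c=1.0):
    """phi such that tanh(phi) = v/c."""
    return math.atanh(v / c)


def hyperbolic_rotation(ct_p, x_p, phi):
    """Map primed coordinates (ct', x') to unprimed coordinates (ct, x)."""
    ct = ct_p * math.cosh(phi) + x_p * math.sinh(phi)
    x = ct_p * math.sinh(phi) + x_p * math.cosh(phi)
    return ct, x


v = 0.6
phi = rapidity(v)
g = 1.0 / math.sqrt(1.0 - v**2)  # gamma for c = 1

ct_p, x_p = 2.0, 1.0
ct, x = hyperbolic_rotation(ct_p, x_p, phi)

print(math.cosh(phi), g)               # cosh(phi) against gamma
print(math.sinh(phi), g * v)           # sinh(phi) against gamma * v/c
print(x, g * (x_p + v * ct_p))         # x against gamma * (x' + v t')
print(ct, g * (ct_p + v * x_p))        # ct against gamma * (ct' + v x'/c)
print(ct**2 - x**2, ct_p**2 - x_p**2)  # the interval is unchanged
```

Each printed pair agrees up to floating-point rounding, which is what the derivation above predicts.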

Addition of velocities

Suppose there are three observers 1, 2, and 3 who are moving at different velocities along the x-axis. Observers 1 and 2 are moving at a relative velocity v and observers 2 and 3 are moving at a relative velocity of u^'. The problem is to determine the velocity of observer 3 as observed by observer 1 (u).

It turns out that there is a very convenient relationship between rapidities that solves this problem:

If v/c = \tanh{\phi} and u^'/c = \tanh{\alpha} then:

u/c = \tanh(\phi + \alpha)

In other words the rapidities can be simply added from one observer to another ie:

\sigma = \phi + \alpha

Hence:

\tanh(\sigma) =  \tanh(\phi + \alpha)

So the velocities can be added by simply adding the rapidities. Using hyperbolic trigonometry:

u/c = \tanh(\alpha + \phi) = \frac{\tanh{\alpha} + \tanh{\phi}}{1+\tanh{\alpha}\tanh{\phi}} = \frac{u^'/c + v/c}{1+u^'v/c^2}

Therefore:

\mathbf{u = \frac{u^' + v}{1+u^'v/c^2}}

Which is the relativistic velocity addition theorem.

The relationship u/c = \tanh(\phi + \alpha) is shown below:

Relvelh.gif
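
As a small numerical check, the Python sketch below compares the velocity addition theorem with the sum of rapidities. The function names and the units (c = 1) are assumptions of this example, and the rapidity route only applies to speeds strictly below c, where the inverse hyperbolic tangent is finite.

```python
import math


def add_velocities(u_prime, v, c=1.0):
    """Relativistic velocity addition: u = (u' + v) / (1 + u'v/c^2)."""
    return (u_prime + v) / (1.0 + u_prime * v / c**2)


def add_via_rapidity(u_prime, v, c=1.0):
    """Same result obtained by adding rapidities (needs |u'|, |v| < c)."""
    return c * math.tanh(math.atanh(u_prime / c) + math.atanh(v / c))


print(add_velocities(0.5, 0.5), add_via_rapidity(0.5, 0.5))
print(add_velocities(0.9, 0.9), add_via_rapidity(0.9, 0.9))
```

Two velocities of 0.5c combine to 1/1.25 = 0.8c rather than c, and both routes give the same value up to rounding.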

Velocity transformations can be obtained without referring to the rapidity. The general case of the transformation of velocities in any direction is derived as follows:

\mathbf{u^'}= (u^'_1, u^'_2, u^'_3)

where u^'_1 etc. are the components of the velocity in the x, y, z directions.

Writing out the components of velocity:

u^'_1 = dx^'/dt^'

u^'_2 = dy^'/dt^'

u^'_3 = dz^'/dt^'

But from the Lorentz transformations:

dx^' = \gamma (dx - v dt)

dy^' = dy

dz^' = dz

dt^' = \gamma (dt - vdx/c^2)

Therefore:

u^'_1 = dx^'/dt^' = \frac{\gamma (dx - v dt)}{\gamma (dt - vdx/c^2)}

u^'_2 = dy^'/dt^' = \frac{dy}{\gamma (dt - vdx/c^2)}

u^'_3 = dz^'/dt^' = \frac{dz}{\gamma (dt - vdx/c^2)}

Dividing top and bottom of each fraction by dt:

u^'_1 = \frac{\gamma (dx/dt - v)}{\gamma (1 - vdx/dt/c^2)}

u^'_2 = \frac{dy/dt}{\gamma (1 - vdx/dt/c^2)}

u^'_3 = \frac{dz/dt}{\gamma (1 - vdx/dt/c^2)}

Substituting \mathbf{u} = (u_1, u_2, u_3)

u^'_1 = \frac{u_1 - v}{1 - u_1v/c^2}

u^'_2 = \frac{u_2}{\gamma (1 - u_1v/c^2)}

u^'_3 = \frac{u_3}{\gamma (1 - u_1v/c^2)}

The full velocity transformations are tabulated below:

{u^'}_x = \frac{(u_x - v)}{(1 - u_x v/c^2)}

u_x = \frac{({u^'}_x + v)}{(1 + {u^'}_x v/c^2)}

{u^'}_y = \frac{u_y \sqrt{1 - v^2/c^2}}{(1 - u_x v/c^2)}

u_y = \frac{{u^'}_y \sqrt{1 - v^2/c^2}}{(1 + {u^'}_x v/c^2)}

{u^'}_z = \frac{u_z \sqrt{1 - v^2/c^2}}{(1 - u_x v/c^2)}

u_z = \frac{{u^'}_z \sqrt{1 - v^2/c^2}}{(1 + {u^'}_x v/c^2)}

Having calculated the components of the velocity vector it is now possible to calculate the magnitudes of the overall vectors between frames:

u = \sqrt{u_1^2 + u_2^2 + u_3^2}

u^' = \sqrt{u^{'2}_1 + u^{'2}_2 + u^{'2}_3}
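
The tabulated transformations can be collected into a short Python sketch. The function names, the tuple representation of the velocity and the units (c = 1) are assumptions made for illustration. The inverse transformation is obtained by replacing v with -v, as in the table above.

```python
import math


def transform_velocity(u, v, c=1.0):
    """Map u = (u_x, u_y, u_z) in S to u' in S', which moves at v along x."""
    ux, uy, uz = u
    root = math.sqrt(1.0 - (v / c) ** 2)
    denom = 1.0 - ux * v / c**2
    return ((ux - v) / denom, uy * root / denom, uz * root / denom)


def inverse_transform_velocity(u_p, v, c=1.0):
    """Map u' in S' back to u in S by reversing the sign of v."""
    return transform_velocity(u_p, -v, c)


def speed(u):
    """Magnitude of a velocity vector."""
    return math.sqrt(sum(component**2 for component in u))


# A light ray travelling along the y axis of S, seen from S' (v = 0.6c).
light_primed = transform_velocity((0.0, 1.0, 0.0), 0.6)
print(light_primed, speed(light_primed))

# A slower object, transformed to S' and back again.
u = (0.3, 0.4, -0.2)
round_trip = inverse_transform_velocity(transform_velocity(u, 0.5), 0.5)
print(round_trip)
```

The light ray acquires an x component of -0.6c in the primed frame and its y component shrinks to 0.8c, so its speed is still c.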

Addition of velocities - another approach

In classical physics, velocities simply add. If an object moves with speed u in one reference frame, which is itself moving at v with respect to a second frame, the object moves at speed u+v in that second frame.

This is inconsistent with relativity because it predicts that if the speed of light is c in the first frame it will be v+c in the second.

We need to find an alternative formula for combining velocities. We can do this with the Lorentz transform.

Because the factor v/c will keep recurring we shall call that ratio β.

We are considering three frames; frame O, frame O' which moves at speed u with respect to frame O, and frame O" which moves at speed v with respect to frame O'.

We want to know the speed U of O" with respect to frame O, which would classically be u+v.

The transforms from O to O' and O' to O" can be written as matrix equations,

\begin{pmatrix} x' \\ ct' \end{pmatrix} = \gamma \begin{pmatrix} 1 & - \beta \\ -\beta & 1 \end{pmatrix}
\begin{pmatrix} x \\ ct \end{pmatrix} \quad
\begin{pmatrix} x'' \\ ct'' \end{pmatrix} = \gamma' 
\begin{pmatrix} 1 & - \beta' \\ -\beta' & 1 \end{pmatrix}
\begin{pmatrix} x' \\ ct' \end{pmatrix}

where we are defining the β's and γ's as

\begin{matrix}
\beta = \frac{u}{c} & \gamma = \frac{1}{\sqrt{1-\beta^2 }} \\
\beta^\prime = \frac{v}{c} & 
\gamma^\prime  = \frac{1}{\sqrt{1-{\beta^\prime}^2 }} 
\end{matrix}

We can combine these to get the relationship between the O and O" coordinates simply by multiplying the matrices, giving


\begin{pmatrix} x'' \\ ct'' \end{pmatrix} = \gamma \gamma^\prime 
\begin{pmatrix} 1+\beta \beta' & - (\beta + \beta') \\
- (\beta + \beta')  & 1+\beta \beta'  \end{pmatrix}
\begin{pmatrix} x \\ ct \end{pmatrix} \quad (1)

This should be the same as the Lorentz transform between the two frames,


\begin{pmatrix} x'' \\ ct'' \end{pmatrix} = 
\gamma'' \begin{pmatrix} 1 & - \beta'' \\ -\beta'' & 1 \end{pmatrix}
\begin{pmatrix} x \\ ct \end{pmatrix} \quad (2) \mbox{  where }
\begin{matrix} \beta'' & = & \frac{U}{c} \\
\gamma'' & = & \frac{1}{\sqrt{1-{\beta''}^2 }} \end{matrix}

These two sets of equations do look similar. We can make them look more similar still by taking a factor of 1+ββ' out of the matrix in (1) giving:


\begin{pmatrix} x'' \\ ct'' \end{pmatrix} = \gamma \gamma' (1+\beta \beta')  
\begin{pmatrix} 1 & - \frac{\beta + \beta'}{1+\beta \beta'} \\
- \frac{\beta + \beta'}{1+\beta \beta'}  & 1  \end{pmatrix}
\begin{pmatrix} x \\ ct \end{pmatrix}

This will be identical with equation 2 if

\beta''=\frac{\beta + \beta'}{1+\beta \beta'} \mbox{ (3a)   and } 
\gamma'' = \gamma \gamma' (1+\beta \beta')  \mbox{ (3b)}

Since the two equations must give identical results, we know these conditions must be true.

Writing the β's in terms of the velocities equation 3a becomes

\frac{U}{c}=\frac{\frac{u}{c} + \frac{v}{c}}{1+\frac{uv}{c^2}}

which tells us U in terms of u and v.

A little algebra shows that this implies equation 3b is also true.

Multiplying by c we can finally write:

U = \frac{u+v}{1+\frac{uv}{c^2}}

Notice that if u or v is much smaller than c the denominator is approximately 1, and the velocities approximately add but if either u or v is c then so is U, just as we expected.
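
The claim that equation 3a implies equation 3b, and the two limiting cases just mentioned, can be checked numerically. In the Python sketch below the function names, the sample speeds and the units (c = 1) are assumptions of this example.

```python
import math


def gamma(beta):
    """Lorentz factor for a speed expressed as a fraction of c."""
    return 1.0 / math.sqrt(1.0 - beta**2)


def combine(u, v, c=1.0):
    """U = (u + v) / (1 + uv/c^2)."""
    return (u + v) / (1.0 + u * v / c**2)


c = 1.0
u, v = 0.6, 0.5
U = combine(u, v, c)

lhs = gamma(U / c)                                        # gamma'' computed from U
rhs = gamma(u / c) * gamma(v / c) * (1 + u * v / c**2)    # right side of (3b)

print(U, lhs, rhs)
print(combine(1e-4, 2e-4))  # both speeds tiny compared with c
print(combine(c, 0.5))      # one of the speeds equals c
```

The two values of \gamma'' agree up to rounding, the small speeds combine to very nearly their classical sum, and combining c with any other speed returns c.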

Acceleration transformation

This section is under development.

It was seen above that:

\frac {u}{c} = \tanh{\phi}

and, if \frac {v}{c} =\tanh \beta and \frac{u^'}{c} = \tanh \epsilon the velocity addition theorem can be expressed as the sum of the rapidities:

\phi = \beta + \epsilon

If we differentiate this equation with respect to t to investigate acceleration, then assuming v is constant:

\frac {d\phi}{dt} = \frac{d\epsilon}{dt}

so


(1) \frac {d\phi}{dt} = \frac{d\epsilon}{dt^'}\frac{dt^'}{dt}

But \frac {d\phi}{dt} \! is also equal to:

\frac {d\phi}{dt} = \frac {d\phi}{du}\frac{du}{dt}

But \phi = \tanh^{-1} (u/c) \! and the derivative of the inverse hyperbolic tangent is given by:

\frac{d \tanh^{-1}(x)}{dx} = \frac{1}{1-x^2}

and hence:

\frac{d\phi}{du} = \frac{1}{c}\frac {1}{(1-u^2/c^2)}

For brevity we will use the notation:

\frac {1}{1-u^2/c^2} = \gamma^2(u)

i.e. \gamma(u) is gamma for observers moving at a relative velocity of u.

So:

(2)\frac {d\phi}{dt} = \frac{1}{c} \gamma^2(u) \frac{du}{dt}\!

The velocity of the object observed by the unprimed and primed observers is:

u^2 = {u_1}^2 + {u_2}^2 + {u_3}^2 \!

u^{'2} = {u_1}^{'2} + {u_2}^{'2} + {u_3}^{'2}\!

and:

u = \frac{dr}{dt}\!

u^' = \frac{dr^'}{dt^'}\!

So:

u^2dt^2 = dr^2 \!

u^{'2}dt^{'2} = dr^{'2} \!

and using the Minkowski metric:

ds^2 = dr^2 - c^2dt^2 = dr^{'2} - c^2dt^{'2} \!

So:

u^2dt^2 - c^2dt^2 = u^{'2}dt^{'2} - c^2dt^{'2} \!

Hence:

dt^2(c^2 - u^2) = dt^{'2}(c^2 - u^{'2}) \!

Given that \gamma (u) = \frac {c}{\sqrt{c^2 - u^2}} \! and \gamma (u^') = \frac {c}{\sqrt{c^2 - u^{'2}}}\!

(3)\frac{dt^'}{dt} = \frac {\gamma(u^')}{\gamma(u)}

Therefore, substituting (2) and (3) in (1):

\frac{1}{c} \gamma^2(u) \frac{du}{dt} = \frac{d\epsilon}{dt^'}\frac {\gamma(u^')}{\gamma(u)}

Applying the differential of arctanh as before to determine \frac{d\epsilon}{dt^'}:

\gamma^3(u^')\frac {du^'}{dt^'} = \gamma^3(u)\frac {du}{dt}

This is a different result from the Newtonian formula in which du/dt = du^'/dt^'. The proper acceleration, \alpha \! is defined as the acceleration of an object in its rest frame. It is the instantaneous change in velocity for an observer for whom u^'=0 \! and \alpha = du^'/dt^' \!. In these circumstances:

\alpha = \gamma^3(u)\frac{du}{dt}

The term \frac{du}{dt}\! is called "coordinate acceleration".
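As a rough numerical check of the acceleration transformation, the sketch below takes an object with velocity u and coordinate acceleration du/dt in the unprimed frame, transforms a short stretch of its motion into a frame moving at v, and compares γ³(u')du'/dt' with γ³(u)du/dt. The function names, the finite-difference step and the sample values are assumptions made for illustration, with c set to 1 and all motion along one axis.

```python
import math

C = 1.0  # units in which c = 1

def gamma(u):
    return 1.0 / math.sqrt(1.0 - (u / C) ** 2)

def velocity_in_primed_frame(u, v):
    """Velocity addition theorem solved for u'."""
    return (u - v) / (1.0 - u * v / C**2)

def primed_acceleration(u, du_dt, v, h=1e-6):
    """Finite-difference estimate of du'/dt' over a short time h."""
    u_later = u + du_dt * h
    du_primed = (velocity_in_primed_frame(u_later, v)
                 - velocity_in_primed_frame(u, v))
    # dt' = gamma(v) (dt - v dx / c^2) with dx = u dt along the world line
    dt_primed = gamma(v) * (1.0 - u * v / C**2) * h
    return du_primed / dt_primed

u, du_dt, v = 0.6, 0.05, 0.3          # illustrative values
u_p = velocity_in_primed_frame(u, v)

lhs = gamma(u_p) ** 3 * primed_acceleration(u, du_dt, v)
rhs = gamma(u) ** 3 * du_dt
rapidity_sum = math.atanh(v / C) + math.atanh(u_p / C)

print(lhs, rhs)
print(rapidity_sum, math.atanh(u / C))
```

The second printed line checks that the rapidities add, which is the starting point of the derivation above.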

Another approach to acceleration

Classically we would talk about a particle at x(t) with acceleration d²x/dt². In relativity we must treat time as just another coordinate, and use derivatives with respect to proper time, which means our notion of acceleration will change. We can assume the particle is moving slower than light.

Since τ increases monotonically with t (no time travel allowed) we can easily parameterise the path of the particle with τ rather than t. Derivatives with respect to these two variables then differ only by a factor of γ

\frac{d}{d\tau} = \gamma \frac{d}{dt}

which gives us the connection between the classical and relativistic formulae.

Let's suppose we have a particle moving on the path (x(τ),ct(τ)), where τ is the particle's proper time.

Its velocity four vector is

\underline{u} = \frac{d}{d\tau}(x, ct) = \gamma (\dot{\mathbf{x}}, c)

In the particle's rest frame this is (0,c), whose squared magnitude is -c². This scalar must be the same in all frames, thus

\underline{u} \cdot \underline{u} = -c^2

That is, the magnitude of the four-velocity is always constant.

Differentiating this, we can immediately say

\underline{u} \cdot \underline{a} = 
\frac{1}{2} \frac{d}{d\tau} \left( \underline{u} \cdot \underline{u} \right) = 0

so the four-velocity and the four-acceleration are always perpendicular.

These two results are much simpler than in classical physics.

Differentiating the velocity gives us, after some algebra,

\underline{a} = \gamma^4 \ddot{x} (1, \frac{v}{c})

when the motion is along the x-axis. For motion in an arbitrary direction the spatial and temporal components must be a 3-vector and a scalar respectively, and the same differentiation gives


\underline{a} = \left( \gamma^2 \ddot{\mathbf{x}} + \gamma^4 \frac{\ddot{\mathbf{x}}\cdot \mathbf{v}}{c^2} \mathbf{v} , \gamma^4 \frac{\ddot{\mathbf{x}}\cdot \mathbf{v} }{c} \right)

which reduces to the expression above when the acceleration is parallel to the velocity.

For motion along the x-axis, the magnitude of this is

|\underline{a}| = \gamma^3 \ddot{x}

which is the classical magnitude, corrected by a factor dependent on γ. When the velocity is much less than c this factor is approximately 1, so in that limit the magnitudes are the same, as they should be.
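For motion along the x-axis these statements can be checked numerically. The sketch below is only illustrative: it sets c to 1, picks an arbitrary velocity and coordinate acceleration, and estimates the four-acceleration by a central difference, using d/dτ = γ d/dt. The function names and the sign convention of `minkowski_dot` (space minus time, so that the four-velocity has squared magnitude -c²) are assumptions chosen to match the text.

```python
import math

C = 1.0  # units in which c = 1

def gamma(v):
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

def four_velocity(v):
    """Components (gamma v, gamma c) for motion along x."""
    g = gamma(v)
    return (g * v, g * C)

def minkowski_dot(p, q):
    """Scalar product with the spatial part positive."""
    return p[0] * q[0] - p[1] * q[1]

def four_acceleration(v, v_dot, h=1e-6):
    """Central difference of the four-velocity, times gamma to get d/dtau."""
    ahead = four_velocity(v + v_dot * h)
    behind = four_velocity(v - v_dot * h)
    g = gamma(v)
    return tuple(g * (a - b) / (2.0 * h) for a, b in zip(ahead, behind))

v, v_dot = 0.5, 0.1                     # illustrative values
u4 = four_velocity(v)
a4 = four_acceleration(v, v_dot)

norm_u = minkowski_dot(u4, u4)          # should be -c^2
u_dot_a = minkowski_dot(u4, a4)         # should vanish
mag_a = math.sqrt(minkowski_dot(a4, a4))

print(norm_u, u_dot_a)
print(mag_a, gamma(v) ** 3 * v_dot)
```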

Knowing this, we can work out the equation of motion for a particle with constant acceleration a along the x-axis. For simplicity, we'll assume the initial velocity is 0.

Since the acceleration is constant

 \frac{\dot{v}}{\left( 1-\frac{v^2}{c^2} \right)^{\frac{3}{2}}} = a

Integrating once we get

\frac{v}{\sqrt{1-\frac{v^2}{c^2}}} = at

or, on rearranging

 v = \frac{at}{\sqrt{1+a^2t^2/c^2}}

If at is much less than c this gives a velocity of approximately at, identical to the classical result, but as t tends to infinity the velocity tends to c.

The position is

 x = d+ \frac{c^2}{a}\sqrt{1+ \frac{a^2 t^2}{c^2}}

where d is some constant. We can choose our coordinate system to make d zero.

Calling the interval between the origin and the particle, I, we have

\begin{matrix}
I^2 & = & x^2 & - & c^2t^2 \\
& = & \frac{c^4}{a^2} \left( 1+ \frac{a^2 t^2}{c^2} \right) & - & c^2t^2\\
& = & \frac{c^4}{a^2} & & \end{matrix}

Thus the interval between the particle and the origin is constant. Notice, this is the equation for a hyperbola, so we know the particle's trajectory.
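The behaviour of this solution is easy to explore numerically. The sketch below evaluates the velocity and position formulas above for an assumed acceleration of 9.81 m/s² (an illustrative value, not one taken from the text) with d set to zero, and confirms that the squared interval from the origin stays at c⁴/a².

```python
import math

C = 299_792_458.0   # speed of light in m/s
A = 9.81            # assumed constant acceleration in m/s^2 (illustrative)

def velocity(t):
    """Velocity after coordinate time t, starting from rest."""
    return A * t / math.sqrt(1.0 + (A * t / C) ** 2)

def position(t):
    """Position with the constant d chosen to be zero."""
    return (C**2 / A) * math.sqrt(1.0 + (A * t / C) ** 2)

def interval_squared(t):
    """Squared interval between the origin and the particle."""
    return position(t) ** 2 - (C * t) ** 2

YEAR = 365.25 * 24 * 3600.0  # seconds

for t in (1.0, YEAR, 10 * YEAR):
    # The last column should stay at 1: I^2 in units of c^4 / a^2
    print(t, velocity(t) / C, interval_squared(t) * A**2 / C**4)
```

For small t the velocity is close to at, while for large t the ratio v/c approaches 1 without reaching it.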

Momentum

Classically, momentum is velocity multiplied by mass. We can use the same definition in relativity, and see where it takes us.

\underline{p}=m_0 \underline{u}= m_0 \gamma (\mathbf{v}, c)

You may sometimes see the product m₀γ which shows up here called the relativistic mass, but we will not be using this approach.

The mass, m₀, is generally called the rest mass, to distinguish it from the relativistic mass.

The spatial component of the four-momentum is clearly the classical momentum, scaled by a factor of γ. At speeds much less than c this factor will be approximately 1.

The temporal component is m₀γc. To see what this means we can look at its value when v/c is small.

m_0 \gamma c = \frac{m_0 c}{\sqrt{1-v^2/c^2}}= 
m_0 c \left( 1 + \frac{1}{2}\frac{v^2}{c^2} + \frac{3}{8}\frac{v^4}{c^4} + \cdots
\right)

The first term in this expansion is a constant.

The second term is

\frac{m_0 v^2}{2c}

which we recognise as being the classical kinetic energy, divided by c.

Now, adding a constant to the definition of kinetic energy makes no real difference, since all that matters are changes in energy, so we can identify this temporal component of relativistic momentum with the energy over c.

\underline{p}= (\gamma m_0 \mathbf{v}, E/c)

We then have

E = m_0 c^2 + \frac{1}{2}m_0 v^2 + \frac{3}{8}\frac{m_0 v^4}{c^2} 
+ \cdots

Even at rest, the particle has an energy,

E=m_0 c^2, \,

the most famous relativistic equation.
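To see how quickly the series converges, the sketch below compares the exact energy m₀γc² with the first three terms of the expansion above. The function names and the choice of a 1 kg mass moving at 1% of c are illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def total_energy(m0, v):
    """Exact energy m0 * gamma * c^2."""
    return m0 * C**2 / math.sqrt(1.0 - (v / C) ** 2)

def series_energy(m0, v):
    """First three terms of the expansion in v/c."""
    beta2 = (v / C) ** 2
    return m0 * C**2 * (1.0 + 0.5 * beta2 + 0.375 * beta2**2)

m0 = 1.0            # illustrative rest mass in kg
v = 0.01 * C        # illustrative speed

exact = total_energy(m0, v)
approx = series_energy(m0, v)
kinetic = exact - m0 * C**2   # energy in excess of the rest energy

print(exact, approx)
print(kinetic, 0.5 * m0 * v**2)
```

At this speed the excess over m₀c² differs from the classical kinetic energy only by the small v⁴ term.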

Force

Classically, we have

\mathbf{F}= \frac{d\mathbf{p}}{dt}

We can get the equivalent relativistic equation simply by replacing 3-vectors with 4-vectors and t with τ, giving

\underline{F} = \frac{d\underline{p}}{d\tau}

Provided the rest mass is constant, as it is for all simple systems, we can rewrite this as

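\underline{F} = m_0 \underline{a}

Since d/d\tau = \gamma \, d/dt, we can also write the four-force in terms of the ordinary three-force \mathbf{F} = d\mathbf{p}/dt, where \mathbf{p} = m_0 \gamma \mathbf{v} is the spatial part of the four-momentum:

\underline{F} = \gamma \left( \mathbf{F}, 
\frac{ \mathbf{F} \cdot \mathbf{v} }{c} \right)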

The temporal component of this is essentially the power, the rate of change of energy with time, as might be expected from energy being the temporal component of momentum.

Accelerated Frames and Event Horizons

Figure: Spacetime diagram showing the world line of the origin of a reference frame undergoing constant acceleration.

We know this world line is a hyperbola, with asymptotes x = ±ct. Looking at the diagram shows another property.

No light from the 'twilight zone', where x is smaller than ct, can ever reach the accelerated observer. Hence, no event in the twilight zone can affect the observer.

Conversely, there is a region, where x is smaller than -ct, which no light from the accelerated observer can ever reach. Hence the observer can not affect any events in this region.

Because of this we call the two asymptotes event horizons. In some ways, they are like one-way barriers in space-time. The accelerated observer can see objects vanishing behind an event horizon, but they never see any re-emerge from it.

These particular event horizons are observer dependent, as are all event horizons in special relativity. However, general relativity permits event-horizons which are observer independent.

We are not going to prove this, but we can get some idea about how event horizons apply to general relativity by using its basic principle, called the equivalence principle by Einstein: that gravity is nothing more than an inertial force.

This means that gravity is locally indistinguishable from forces due to acceleration; if the elevator doors are closed, we can't tell whether we are stationary in a 1g gravity field or being accelerated at 1g in zero gravity.

This suggests that something can be learned about general relativity by examining the properties of accelerated reference frames.

Now, since the gravitational force on the Earth points downward, it follows that we must be constantly accelerating upward as we stand on the surface of the Earth. The obvious problem with this interpretation of gravity is that we don't appear to be moving away from the center of the Earth, which would seem to be a natural consequence of such an acceleration.

However, the calculation of the previous chapter shows that even though the object associated with the curved world line in the figure above is accelerating away from the origin, it always remains the same distance (in its own frame) from the origin. In other words, even though we are accelerating away from the center of the Earth, the distance to the center of the Earth remains constant!

Therefore, because of the equivalence principle, we can expect to see the same things as an accelerated observer in special relativity, including an event horizon.

For acceleration in special relativity, inertial objects pass the event horizon at speed c at time infinity, relative to the observer.

Similarly, in general relativity, freely falling objects pass the event horizon at speed c at time infinity, relative to a static observer. By reversing the trajectories we can conclude that at the event horizon the escape velocity is c, i.e. gravity is so strong that light cannot escape.

Thus, using the equivalence principle, and behaviour under acceleration in special relativity, we've predicted black holes.

There isn't actually an event horizon inside the earth, since the mass of the earth is not concentrated in a point and the earth can only be treated as a point mass while outside the earth's surface, which lies outside where the event horizon should otherwise have been.

The red shift

Spacetime diagram for explaining the gravitational red shift.

Light emitted at a lower level in a gravitational field has its frequency reduced as it travels to a higher level. This phenomenon is called the gravitational red shift.

We can see why this happens by using the principle of equivalence. Being in a gravitational field is equivalent to being in an accelerated frame, so knowing how the Doppler shift works in such a frame will tell us how it works in a gravitational field.

We view the process of light emission and absorption from the unaccelerated or inertial frame, as shown in the figure above. In this reference frame the observer of the light is accelerating to the right, as indicated by the curved red world line, which is equivalent to a gravitational force to the left.

The light is emitted at point A with frequency \omega by a source which is stationary at this instant. At this instant the observer is also stationary in this frame. However, by the time the light gets to the observer, they have a velocity to the right which means that the observer measures a Doppler shifted frequency \omega' for the light. Since the observer is moving away from the source, \omega'<\omega, as indicated above.

The relativistic Doppler shift is given by

 \frac{\omega'}{\omega} = \sqrt{ \frac{1 - U/c}{1 + U/c}} ,

so we need to compute U/c. The line of simultaneity for the observer at point B goes through the origin, and is thus given by line segment OB. The slope of this line is U/c, where U is the velocity of the observer at point B. From the figure we see that this slope is also given by the ratio X' /X.

Equating these, eliminating X in favor of L = √(X² - X′²), which is the actual invariant distance of the observer from the origin, and substituting into the previous equation results in our gravitational red shift formula:

\begin{matrix} 
\frac{\omega'}{\omega} & = & \sqrt{ \frac{X - X'}{X + X'} } & = &
\sqrt{\frac{(L^2 + X'^2 )^{1/2} - X'}{(L^2 + X'^2 )^{1/2} + X'} }\\
& =& \frac{(L^2 + X'^2 )^{1/2} - X'}{L} & =& \frac{X-X'}{L}
\end{matrix}

If X′ = 0, then there is no redshift, because the source is collocated with the observer. On the other hand, if the source is located at the origin, so X′=X, the Doppler shifted frequency is zero. In addition, the light never gets to the observer, since the world line is asymptotic to the light world line passing through the origin. If the source is at a higher level in the gravitational field than the observer, so that X′ < 0, then the frequency is shifted to a higher value, i. e., it becomes a blue shift.

To see how this Doppler shift relates to the strength of gravity, g, and the distance h between the source and the observer, first note that

L=\frac{c^2}{g} \qquad X-X'=L-h

Making these substitutions gives

\frac{\omega'}{\omega} = 1-\frac{gh}{c^2} \qquad (1)

So the fractional redshift, gh/c², is proportional to the strength of gravity.

Since this Doppler shift doesn't depend on the type of wave we can conclude that it is actually caused by time dilation, just like the Doppler shift due to relative motion.

That is, gravity slows down time.
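The chain of substitutions leading to equation 1 can be verified with a few lines of arithmetic. In the sketch below, X is the observer's position when the light is received and X′ is read as the ct coordinate of that event, which is what the slope argument above suggests; this reading, the function names and the sample values (c = 1, g = 0.1, h = 2) are assumptions made for illustration.

```python
import math

def reception_event(L, h):
    """Solve X - X' = L - h and X^2 - X'^2 = L^2 for (X, X')."""
    difference = L - h              # X - X'
    total = L**2 / difference       # X + X'
    return 0.5 * (total + difference), 0.5 * (total - difference)

def doppler_ratio(X, X_prime):
    """Relativistic Doppler ratio with U/c = X'/X."""
    return math.sqrt((X - X_prime) / (X + X_prime))

def gravitational_ratio(g, h, c):
    """Equation 1: omega'/omega = 1 - g h / c^2."""
    return 1.0 - g * h / c**2

c, g, h = 1.0, 0.1, 2.0             # illustrative values
L = c**2 / g
X, X_prime = reception_event(L, h)

print(X, X_prime)
print(doppler_ratio(X, X_prime), (X - X_prime) / L, gravitational_ratio(g, h, c))
```

All three numbers on the last line should coincide, showing that the Doppler formula, the geometric ratio (X - X′)/L and equation 1 are the same statement.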

Energy and frequency

In equation 1, gh is the change in gravitational potential energy per unit mass, so the change in frequency is proportional to the change in potential energy, which suggests there might be a connection with energy conservation.

However, since we haven't yet established any connection between frequency and energy, we can't simply apply an energy conservation argument. Instead, we can argue in reverse, finding out what the energy-frequency relationship must be if energy is to be conserved.

Suppose we have two identical systems, both at rest in a uniform gravitational field g with initial energy E separated by a vertical distance h.

Each system has mass E/c², giving it potential energy, so the total energy of the two systems is initially

E_i=2E - \frac{gh}{c^2}E

where the second term is due to the lesser potential energy of the lower system.

The lower system emits a burst of waves, frequency ω, energy E(ω). When this reaches the upper system the waves have been red-shifted to frequency ω', energy E(ω'). This energy is absorbed by the upper system.

The total energy is now

E_f=E-E(\omega)+E+ E(\omega') - \left(E-E(\omega)\right)\frac{gh}{c^2}

Since we want to preserve energy conservation, these two equations must give the same result. Equating them, we get

E(\omega') = E(\omega)(1-\frac{gh}{c^2})

Comparing this with the Doppler shift we see that

\frac{E(\omega')}{E(\omega)} = \frac{\omega'}{\omega}

which can only be true if E(ω) is proportional to ω.

That is, energy conservation implies that energy is proportional to frequency, which is one of the axioms of quantum theory.
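The bookkeeping in this argument is short enough to spell out in code. The sketch below solves the condition that the initial and final totals are equal for the received energy E(ω'), using arbitrary illustrative numbers; the function name and the values are assumptions, not part of the original argument.

```python
def received_energy(E, E_emitted, g, h, c):
    """Energy the upper system must absorb for the totals to balance."""
    k = g * h / c**2
    initial_total = 2.0 * E - k * E
    # Final total with the absorbed burst left out:
    # lower system (E - E_emitted) with its potential term, upper system E
    final_without_burst = (E - E_emitted) + E - (E - E_emitted) * k
    return initial_total - final_without_burst

E, E_emitted = 100.0, 1.0           # illustrative energies
g, h, c = 0.1, 2.0, 1.0             # illustrative field, height and c

E_received = received_energy(E, E_emitted, g, h, c)
ratio = E_received / E_emitted

print(ratio, 1.0 - g * h / c**2)
```

The ratio does not depend on the size of E or of the emitted burst, only on gh/c², which is why it can be compared directly with the frequency ratio.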

We could equally well have started with the quantum result and proved the gravitational red shift must exist. Either theory requires the other for consistency.

Since energy and frequency are each the temporal components of a four-vector, their being proportional implies the four-vectors themselves, and their spatial components, are also proportional. So, for waves, momentum is proportional to k.

Remember too, we saw earlier, when we looked at Hamilton's equations, that classical mechanics would be equivalent to a theory of anisotropic waves, in the geometrical optics limit, if energy were proportional to frequency and momentum to wave number. This proportionality isn't just required for energy conservation, it would make possible a theory uniting waves and particles.

None of this actually proves the proportionality; doing that requires experiment. It does, however, make it a natural assumption, which is indeed confirmed by experiment.

Because of all this, from now on we'll assume that energy and frequency are related in this way, with the constant of proportionality being \hbar

 (c\mathbf{p}, E) = \hbar (c\mathbf{k}, \omega)

Gravity and curvature

The gravitational redshift also implies that spacetime is curved. We can see this by considering a rectangle in space-time.

Without gravity, if we start at some point A, wait for time t, then move at light-speed to the right for a distance h, we get to the same place, B, as if we move at light-speed for a distance h then wait for time t at rest with respect to A.

With gravity, if we rise a distance h first and then wait for time t, the wait takes place at the upper level. If we instead begin by waiting for time t at the lower level, this wait is dilated by gravity: to an observer at B we appear to be waiting for a time t(1+gh/c²) before we start, so we end up at B later than on the other path.

Thus, with gravity, it matters which order we add distance vectors in. This can't happen if spacetime is flat, so spacetime must be curved.

To describe how it's curved we'd need the techniques of General Relativity.


Potential Momentum

In classical physics we know that kinematics can often be described by a potential energy alone. Now we've seen that in relativity the energy is just the temporal component of the momentum 4-vector, so we should expect the same of the potential energy. To see how this works, we'll reason by analogy from the classical case.

For a free, non-relativistic particle of mass m, the total energy E equals the kinetic energy K and is related to the momentum Π of the particle by

 E = K = \frac{| \boldsymbol{\Pi} |^2}{2m} \qquad \mbox{(free, non-relativistic)} .

In the non-relativistic case, the momentum is Π= mv, where v is the particle velocity.

If the particle is not free, but is subject to forces associated with a potential energy U(x,y,z), then the equation must be modified to account for the contribution of U to the total energy:

 E - U = K = \frac{| \boldsymbol{\Pi} |^2}{2m} \qquad \mbox{(non-free, non-relativistic)} .

The force on the particle is related to the potential energy by

 \mathbf{F} = - \left( \frac{\partial U}{\partial x} , \frac{\partial U}{\partial y} , \frac{\partial U}{\partial z} \right) .

For a free, relativistic particle, we have

 E = ( | \boldsymbol{\Pi} |^2 c^2 + m^2 c^4 )^{1/2} \qquad \mbox{(free, relativistic)} .

The obvious way to add forces to the relativistic case is by rewriting this equation with a potential energy:

 E - U = ( | \boldsymbol{\Pi} |^2 c^2 + m^2 c^4 )^{1/2} \qquad \mbox{(incomplete!)} .

However \underline{\Pi} = (\boldsymbol{\Pi}, E/c) is a four-vector, so an equation with something subtracted from just one of the components of this four-vector is not relativistically invariant. In other words, this equation doesn't obey the principle of relativity, and therefore cannot be correct!

How can we fix this problem? One way is to define a new four-vector with U/c being its timelike part and some new vector Q being its spacelike part:

 \underline{Q} \equiv ( \mathbf{ Q} , U/c) \qquad \mbox{(potential four-momentum)} .

We then subtract \underline{Q} from the four-momentum \underline{\Pi}. When we do this, the incomplete equation above becomes

 E - U = ( | \boldsymbol{\Pi} - \mathbf{ Q}|^2 c^2 + m^2 c^4 )^{1/2} \qquad \mbox{(non-free, relativistic)} .

The quantity \mathbf{Q} is called the potential momentum and \underline{Q} is the potential four-momentum.

If |Π-Q| is much smaller than mc, this becomes approximately

E = U + mc^2 + \frac{1}{2m}\left( \boldsymbol{\Pi} - \mathbf{Q} \right)^2

This expression for the energy has the same form as the Hamiltonian we looked at for classical velocity dependent forces, so we know it predicts a force perpendicular to the velocity when the condition is met. It turns out to be perpendicular even when the condition is not met.

In classical physics the potential momentum is an optional extra. In relativity it is a necessary part of any potential field.
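The quality of the small-momentum approximation can be seen from a quick numerical comparison. The sketch below evaluates the exact relativistic expression and its low-momentum form, keeping the potential energy U from the exact expression in both; the function names, the units (m = c = 1) and the sample vectors are illustrative assumptions.

```python
import math

def kinetic_momentum_squared(total_momentum, potential_momentum):
    """|Pi - Q|^2 for momenta given as 3-component tuples."""
    return sum((P - Q) ** 2 for P, Q in zip(total_momentum, potential_momentum))

def exact_energy(total_momentum, potential_momentum, U, m, c):
    """E from E - U = (|Pi - Q|^2 c^2 + m^2 c^4)^(1/2)."""
    p2 = kinetic_momentum_squared(total_momentum, potential_momentum)
    return U + math.sqrt(p2 * c**2 + m**2 * c**4)

def low_momentum_energy(total_momentum, potential_momentum, U, m, c):
    """Approximation valid when |Pi - Q| is much smaller than m c."""
    p2 = kinetic_momentum_squared(total_momentum, potential_momentum)
    return U + m * c**2 + p2 / (2.0 * m)

m, c, U = 1.0, 1.0, 0.5                 # illustrative units and potential
Pi = (0.03, 0.0, 0.04)                  # illustrative total momentum
Q = (0.02, 0.0, 0.03)                   # illustrative potential momentum

exact = exact_energy(Pi, Q, U, m, c)
approx = low_momentum_energy(Pi, Q, U, m, c)

print(exact, approx, approx - exact)
```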

Some additional terminology is useful. We define

 \mathbf{ p} \equiv \boldsymbol{\Pi} - \mathbf{ Q} \qquad \mbox{(kinetic momentum)}

as the kinetic momentum since in the classical case it reduces to mv. In order to avoid confusion, we rename Π the total momentum. Thus, the total momentum equals the kinetic plus the potential momentum, in analogy with energy.


Conservation of 4 momentum

We earlier introduced the ideas of energy and momentum conservation. In other words, if we have a number of particles isolated from the rest of the universe, each with momentum \mathbf{p}_i and energy E_i, then particles may be created and destroyed and they may collide with each other.

In these interactions the energy and momentum of each particle may change, but the sum total of all the energy and the sum total of all the momentum remains constant with time:

 E = \sum_i E_i = \mbox{const} \qquad 
\mathbf{p} = \sum_i \mathbf{p}_i = \mbox{const}

The expression is simpler in terms of four-momentum:

 \underline{p} = \sum_i \underline{p}_i = \mbox{const}

At this point a statement such as the one above should ring alarm bells. Just what does it mean to say that the total energy and momentum remain constant with time in the context of relativity? Which time? The time in which reference frame?

Suppose two particles exchange four-momentum remotely at the time indicated by the fat horizontal bar in the left panel. Conservation of four-momentum implies that

 \underline{p}_A + \underline{p}_B = \underline{p}'_A + \underline{p}'_B

where the subscripted letters correspond to the particle labels in the figure. Primed values refer to the momentum after the exchange while no primes indicates values before the exchange.

Now view the exchange from the reference frame in the right panel. A problem with four-momentum conservation exists in the region between the thin horizontal lines. In this region particle B has already transferred its four-momentum, but it has yet to be received by particle A. In other words, four-momentum is not conserved in this reference frame!

This problem is so serious that we must eliminate the concept of action at a distance from the repertoire of physics. The only way to have particles interact remotely and still conserve four-momentum in all reference frames is to assume that all remote interactions are mediated by another particle, or by a field.

If the force is being mediated by a particle then first, particle A emits particle C in a manner which conserves the four-momentum. Second, particle C is absorbed by particle B in a similarly conservative interaction.

If the force is being mediated by a field then first, particle A emits wave C, with momentum proportional to its wavenumber, in a manner which conserves the four-momentum. Wave C then travels at c or less until it is absorbed by particle B in a similarly conservative interaction.

We'll see that in quantum theory the difference between a particle and a field vanishes, so these two pictures actually both describe the same mechanism from a different perspective. Whichever picture we use, four-momentum is conserved at all times in all reference frames.

In other words, momentum and energy are transferred from particle A to particle B in a two step process. In between the momentum resides in a particle or field.
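The two-step picture can be illustrated with a small amount of bookkeeping. In the sketch below, four-momenta are stored as (cp, E) pairs for motion along one axis, and the particular numbers for A, B and the mediator C are hypothetical values chosen only for illustration. The totals are computed before, during and after the exchange, in several frames, together with the total that would be obtained during the exchange if the mediator were left out.

```python
import math

def add(p, q):
    return (p[0] + q[0], p[1] + q[1])

def subtract(p, q):
    return (p[0] - q[0], p[1] - q[1])

def boost(p, beta):
    """Lorentz-boost a (cp, E) pair along x by velocity beta * c."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return (g * (p[0] - beta * p[1]), g * (p[1] - beta * p[0]))

# Hypothetical four-momenta as (c*p, E) in arbitrary energy units
p_A = (0.0, 5.0)      # particle A before emitting C
p_B = (1.0, 4.0)      # particle B before absorbing C
p_C = (0.5, 0.5)      # mediator C, taken to be massless

p_A_after = subtract(p_A, p_C)   # A has emitted C
p_B_after = add(p_B, p_C)        # B has absorbed C

def stage_totals(beta):
    """Total four-momentum at each stage, seen from a frame moving at beta*c."""
    a, b, c = boost(p_A, beta), boost(p_B, beta), boost(p_C, beta)
    a2, b2 = boost(p_A_after, beta), boost(p_B_after, beta)
    before = add(a, b)
    during = add(add(a2, c), b)      # C is in flight
    after = add(a2, b2)
    without_mediator = add(a2, b)    # what is missing if C is ignored
    return before, during, after, without_mediator

for beta in (0.0, 0.6):
    print(beta, stage_totals(beta))
```

In every frame the first three totals agree, while the fourth falls short by exactly the four-momentum carried by C.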

Mathematical Appendix

Mathematics of the Lorentz Transformation Equations

Consider two observers O and O^', moving at velocity v \, relative to each other who synchronise their clocks so that t=t^'=0 as they pass each other. They both observe the same event as a flash of light. How will the coordinates recorded by the observers of the event that produced the light be interrelated?

The relationship between the coordinates can be derived using linear algebra on the basis of the postulates of relativity and an extra homogeneity and isotropy assumption.

The homogeneity and isotropy assumption: space is uniform and homogeneous in all directions. If this were not the case then when comparing lengths between coordinate systems the lengths would depend upon the position of the measurement. For instance, if  x^' = a x^2 \, the distance between two points would depend upon position.

The linear equations relating coordinates in the primed and unprimed frames are:

x^' = a_{11} x + a_{12} y + a_{13} z  + a_{14} t \,
y^' = a_{21} x + a_{22} y + a_{23} z  + a_{24} t \,
z^' = a_{31} x + a_{32} y + a_{33} z  + a_{34} t \,
t^' = a_{41} x + a_{42} y + a_{43} z  + a_{44} t \,

There is no relative motion in the y or z directions so, according to the 'relativity' postulate:

z^' = z \,
y^' = y \,

Hence:

a_{22} = 1 \, and a_{21} = a_{23} = a_{24} = 0 \,
a_{33} = 1 \, and a_{31} = a_{32} = a_{34} = 0 \,

So the following equations remain to be solved:

x^' = a_{11} x + a_{12} y + a_{13} z  + a_{14} t \,
t^' = a_{41} x + a_{42} y + a_{43} z  + a_{44} t \,

If space is isotropic (the same in all directions) then the motion of clocks should be independent of the y and z axes (otherwise clocks placed symmetrically around the x-axis would appear to disagree). Hence:

 a_{42} = a_{43} = 0 \,

so:

t^' = a_{41} x + a_{44} t \,

Events satisfying x^' = 0 \, must also satisfy x = vt \,. So:

0 = a_{11} vt + a_{12} y + a_{13} z  + a_{14} t \,

and

-a_{11} vt = a_{12} y + a_{13} z  + a_{14} t \,

Given that the equations are linear then a_{12} y + a_{13} z = 0 \, and:

-a_{11} vt = a_{14} t \,

and

-a_{11} v = a_{14} \,


Therefore the correct transformation equation for x^' \, is:

x^' = a_{11} (x - vt) \,

The analysis to date gives the following equations:

x^' = a_{11} (x - vt) \,
y^' = y \,
z^' = z \,
t^' = a_{41} x + a_{44} t \,


Assuming that the speed of light is constant, the coordinates of a flash of light that expands as a sphere will satisfy the following equations in each coordinate system:

x^2 + y^2 + z^2 = c^2t^2 \,
x^{'2} + y^{'2} + z^{'2} = c^2t^{'2} \,

Substituting the coordinate transformation equations into the second equation gives:

a_{11}^2(x - vt)^2 + y^2 + z^2 = c^2(a_{41}x + a_{44}t)^2 \,

rearranging:

(a_{11}^2 - c^2 a_{41}^2)x^2 + y^2 + z^2 - 2(va_{11}^2 + c^2a_{41} a_{44}) xt = (c^2 a_{44}^2 - v^2 a_{11}^2)t^2 \,

We demand that this is equivalent with

x^2 + y^2 + z^2 = c^2t^2 \,

So we get:

 c^2 a_{44}^2 - v^2 a_{11}^2 = c^2 \,
 a_{11}^2 - c^2 a_{41}^2 = 1 \,
va_{11}^2 + c^2a_{41} a_{44} = 0 \,


Solving these 3 simultaneous equations gives:

 a_{44} = \frac{1}{\sqrt{(1 - v^2/c^2)}} \,
 a_{11} = \frac{1}{\sqrt{(1 - v^2/c^2)}} \,
 a_{41} = -\frac{v/c^2}{\sqrt{(1 - v^2/c^2)}} \,

Substituting these values into:

x^' = a_{11} (x - vt) \,
y^' = y \,
z^' = z \,
t^' = a_{41} x + a_{44} t \,

gives:

x^' = \frac{x - vt}{\sqrt{(1 - v^2/c^2)}} \,
y^' = y \,
z^' = z \,
t^' = \frac{t - (v/c^2)x}{\sqrt{(1 - v^2/c^2)}} \,

The inverse transformation is:

x = \frac{x^' + vt^'}{\sqrt{(1 - v^2/c^2)}} \,
y = y^' \,
z = z^' \,
t = \frac{t^' + (v/c^2)x^'}{\sqrt{(1 - v^2/c^2)}} \,
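The solution can be checked by substituting it back into the three simultaneous equations. The following sketch does this numerically and also confirms that the inverse transformation undoes the forward one; the function names and the choice v = 0.6c with c = 1 are illustrative assumptions.

```python
import math

def lorentz_coefficients(v, c):
    """Return (a11, a41, a44) as solved above."""
    g = 1.0 / math.sqrt(1.0 - v**2 / c**2)
    return g, -g * v / c**2, g

def transform(x, t, v, c):
    """Coordinates (x', t') of an event on the x axis."""
    a11, a41, a44 = lorentz_coefficients(v, c)
    return a11 * (x - v * t), a41 * x + a44 * t

def inverse_transform(x_p, t_p, v, c):
    """The inverse is the same transformation with v replaced by -v."""
    return transform(x_p, t_p, -v, c)

v, c = 0.6, 1.0
a11, a41, a44 = lorentz_coefficients(v, c)

# Left-hand side minus right-hand side of the three simultaneous equations
residuals = (
    c**2 * a44**2 - v**2 * a11**2 - c**2,
    a11**2 - c**2 * a41**2 - 1.0,
    v * a11**2 + c**2 * a41 * a44,
)

x, t = 3.0, 2.0                          # an arbitrary event
x_p, t_p = transform(x, t, v, c)
x_back, t_back = inverse_transform(x_p, t_p, v, c)

print(residuals)
print((x, t), (x_back, t_back))
```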


Einstein's original approach

How would two observers measure the position and timing of an event by using light rays if the speed of light were constant? The modern analysis of this problem, exposing the assumptions involved, is given above but Einstein's original reasoning (Einstein 1905,1920) is as follows.

Light is transmitted along the positive x axis according to the equation x = ct where c is the velocity of light. This can be rewritten as:

x - ct = 0

Another observer, moving relatively to the first may find different values for x and t but the same equation will apply:

x^' - ct^' = 0

A simple relationship between these formulae, which apply to the same event is:

(x^' - ct^') = \lambda (x - ct)


Light is transmitted along the negative x axis according to the equation x = -ct where c is the velocity of light. This can be rewritten as:

x + ct = 0
x^' + ct^' = 0

And:

(x^' + ct^') = \mu (x + ct)

Adding and subtracting the equations and substituting a = \frac{\lambda + \mu}{2} and b = \frac{\lambda - \mu}{2}:

(1) x' = ax - bct
(2) ct' = act - bx

The origin of one set of coordinates can be set so that x^' = 0 hence:

x = \frac{bc}{a} t

If v is the velocity of one observer relative to the other then v = \frac{x}{t} and:

(3) v = \frac{bc}{a}

At t = 0:

(4) x^' = ax

Therefore two points separated by unit distance in the primed frame of reference, i.e. when \Delta x^' = 1, have the following separation in the unprimed frame:

(5) \Delta x = \frac{1}{a}

Now t can be eliminated from equations (1) and (2) and combined with v = \frac{bc}{a} and (4) to give in the case where x=1 and t^' = 0:

(6) x^' = a (1 -\frac{v^2}{c^2}) x

And, if \Delta x = 1:

(7) \Delta x^' = a (1 -\frac{v^2}{c^2})

Now if the two moving systems are identical and the situation is symmetrical a measurement in the unprimed system of a division showing one metre on a measuring rod in the primed system is going to be identical to a measurement in the primed system of a division showing one metre on a measuring rod in the unprimed system. Thus (5) and (7) can be combined so that:

\frac {1}{a} = a (1 -\frac{v^2}{c^2})

So:

a^2 = \frac {1}{(1 -\frac{v^2}{c^2})}

Inserting this value for a into equations (1) and (2) and solving for b gives:


 x^' = \frac {x - vt}{\sqrt {1 -\frac{v^2}{c^2}}}


 t^' = \frac {t - (v/c^2)x}{\sqrt {1 -\frac{v^2}{c^2}}}

These are the Lorentz Transformation Equations for events on the x axis.
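As a consistency check on this route to the transformation, the sketch below computes a and b for a sample velocity, recovers λ and μ from them, and confirms that the two light-ray relations hold for an arbitrary event. The function names and the values (v = 0.6c, c = 1) are assumptions made for illustration.

```python
import math

def einstein_coefficients(v, c):
    """a from a^2 = 1 / (1 - v^2/c^2), and b from equation (3), v = bc/a."""
    a = 1.0 / math.sqrt(1.0 - v**2 / c**2)
    b = a * v / c
    return a, b

v, c = 0.6, 1.0
a, b = einstein_coefficients(v, c)
lam, mu = a + b, a - b        # inverting a = (lam + mu)/2, b = (lam - mu)/2

def primed(x, t):
    """Equations (1) and (2): x' = a x - b c t and c t' = a c t - b x."""
    return a * x - b * c * t, (a * c * t - b * x) / c

x, t = 3.0, 2.0               # an arbitrary event on the x axis
x_p, t_p = primed(x, t)

print(x_p - c * t_p, lam * (x - c * t))
print(x_p + c * t_p, mu * (x + c * t))
print(lam * mu)
```

The product λμ comes out as 1, reflecting the symmetry between the two observers used in the argument above.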

Einstein, A. (1920). Relativity. The Special and General Theory. Methuen & Co Ltd 1920. Written December, 1916. Robert W. Lawson (Authorised translation). http://www.bartleby.com/173/


License

GNU Free Documentation License

Version 1.3, 3 November 2008 Copyright (C) 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc. <http://fsf.org/>

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

0. PREAMBLE

The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.

1. APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.

A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.

The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.

A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.

The "publisher" means any person or entity that distributes copies of the Document to the public.

A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", "Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" according to this definition.

The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.

2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and you may publicly display copies.

3. COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.

4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

  1. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
  2. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.
  3. State on the Title page the name of the publisher of the Modified Version, as the publisher.
  4. Preserve all the copyright notices of the Document.
  5. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
  6. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
  7. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice.
  8. Include an unaltered copy of this License.
  9. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
  10. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.
  11. For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
  12. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.
  13. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version.
  14. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section.
  15. Preserve any Warranty Disclaimers.

If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.

You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties—for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements".

6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.

8. TRANSLATION

Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.

If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.

9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, or distribute it is void, and will automatically terminate your rights under this License.

However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice.

Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, receipt of a copy of some or all of the same material does not give you any rights to use it.

10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation. If the Document specifies that a proxy can decide which future versions of this License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Document.

11. RELICENSING

"Massive Multiauthor Collaboration Site" (or "MMC Site") means any World Wide Web server that publishes copyrightable works and also provides prominent facilities for anybody to edit those works. A public wiki that anybody can edit is an example of such a server. A "Massive Multiauthor Collaboration" (or "MMC") contained in the site means any set of copyrightable works thus published on the MMC site.

"CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0 license published by Creative Commons Corporation, a not-for-profit corporation with a principal place of business in San Francisco, California, as well as future copyleft versions of that license published by that same organization.

"Incorporate" means to publish or republish a Document, in whole or in part, as part of another Document.

An MMC is "eligible for relicensing" if it is licensed under this License, and if all works that were first published under this License somewhere other than this MMC, and subsequently incorporated in whole or in part into the MMC, (1) had no cover texts or invariant sections, and (2) were thus incorporated prior to November 1, 2008.

The operator of an MMC Site may republish an MMC contained in the site under CC-BY-SA on the same site at any time before August 1, 2009, provided the MMC is eligible for relicensing.

ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:

Copyright (c) YEAR YOUR NAME.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the "with...Texts." line with this:

with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.

If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation.

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.