Special Relativity/Mathematical approach

From Wikibooks, open books for an open world


Physical effects involve things acting on other things to produce a change of position, tension etc. These effects usually depend upon the strength, angle of contact, separation etc. of the interacting things rather than on any absolute reference frame, so it is useful to describe the rules that govern the interactions in terms of the relative positions and lengths of the interacting things rather than in terms of any fixed viewpoint or coordinate system. Vectors were introduced in physics to allow such relative descriptions.

The way vectors are used in elementary physics often obscures any real understanding of what they are. They are a new concept, as unique as numbers themselves, which has been related to the rest of mathematics and geometry by a series of constructions such as linear combinations, scalar products etc.

Vectors are defined as "directed line segments", which means they are lines drawn in a particular direction. The introduction of time as a geometric entity means that this definition of a vector is rather archaic; a better definition might be that a vector is information arranged as a continuous succession of points in space and time. Vectors have length and direction, the direction being from earlier to later.

Vectors are represented by lines terminated with arrow symbols to show the direction. A point that moves from the left to the right for about three centimetres can be represented as:


If a vector is represented within a coordinate system it has components along each of the axes of the system. These components do not normally start at the origin of the coordinate system.


The vector represented by the bold arrow has components a, b and c which are lengths on the coordinate axes. If the vector starts at the origin the components become simply the coordinates of the end point of the vector and the vector is known as the position vector of the end point.

Addition of Vectors

If two vectors are connected so that the end point of one is the start of the next the sum of the two vectors is defined as a third vector drawn from the start of the first to the end of the second:


c is the sum of a and b:

c = a + b

If the components of \mathbf{a} are a, b and c and the components of \mathbf{b} are d, e and f then the components of the sum of the two vectors are (a+d), (b+e) and (c+f). In other words, when vectors are added it is the components that add numerically rather than the lengths of the vectors themselves.
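The componentwise rule can be checked with a short script (a minimal sketch; the sample tuples and the helper name `add` are illustrative, not part of the text):

```python
# Componentwise addition of two 3D vectors, as described above:
# the components add numerically.
def add(u, v):
    return tuple(x + y for x, y in zip(u, v))

a = (1.0, 2.0, 3.0)   # components a, b, c
b = (4.0, 5.0, 6.0)   # components d, e, f

print(add(a, b))      # the sum has components (a+d), (b+e), (c+f)
```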

Rules of Vector Addition

1. Commutativity a + b = b + a

2. Associativity (a + b) + c = a + (b + c)

If the zero vector (which has no length) is labelled as 0

3. a + (-a) = 0

4. a + 0 = a

Rules of Vector Multiplication by a Scalar

The discussion of components and vector addition shows that if vector a has components a,b,c then qa has components qa, qb, qc. The meaning of vector multiplication is shown below:


The bottom vector c is added three times which is equivalent to multiplying it by 3.

1. Distributive laws q(a + b) = qa + qb and (q + p)a = qa + pa

2. Associativity q(pa) = qpa

Also 1\mathbf{a} = \mathbf{a}

If the rules of vector addition and multiplication by a scalar apply to a set of elements they are said to define a vector space.
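The scalar-multiplication rule above, and the observation that adding a vector three times is equivalent to multiplying it by 3, can be sketched as follows (the helper names `scale` and `add` are illustrative):

```python
# Multiplying a vector by a scalar q multiplies each component by q.
def scale(q, v):
    return tuple(q * x for x in v)

def add(u, v):
    return tuple(x + y for x, y in zip(u, v))

c = (1.0, 0.5, 2.0)
# Adding c three times is equivalent to multiplying it by 3:
print(add(add(c, c), c))
print(scale(3, c))
```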

Linear Combinations and Linear Dependence

An element of the form:

q_1\mathbf{a_1} + q_2\mathbf{a_2} + q_3\mathbf{a_3} +.... + q_m \mathbf{a_m}

is called a linear combination of the vectors.

The set of all linear combinations of a given set of vectors is called the span of those vectors. The word span is used because the scalars (q) can take any value, so every point in the subset of the vector space spanned by the vectors can be reached by some linear combination.

Suppose there is a set of vectors (\mathbf{a_1,a_2,.... ,a_m}). If it is possible to express one of these vectors as a linear combination of the others, the set is said to be linearly dependent. If no vector in the set can be expressed as a linear combination of the others, the set is said to be linearly independent.

In other words, if there are values of the scalars such that:

(1). \mathbf{a_1} = q_2\mathbf{a_2} + q_3\mathbf{a_3} +.... + q_m\mathbf{a_m}

the set is said to be linearly dependent.

There is a way of determining linear dependence. From (1) it can be seen that if q_1 is set to minus one then:

q_1\mathbf{a_1} + q_2\mathbf{a_2} + q_3\mathbf{a_3} +.... + q_m\mathbf{a_m} = 0

So in general, if a linear combination can be written that sums to a zero vector then the set of vectors (\mathbf{a_1,a_2,.... ,a_m}) are not linearly independent.

If two vectors are linearly dependent then they lie along the same line (wherever \mathbf{a} and \mathbf{b} lie on the line, scalars can be found that produce a linear combination summing to the zero vector). If three vectors are linearly dependent they are collinear or coplanar (they lie on the same line or in the same plane).


If every set of n+1 vectors in a vector space is linearly dependent, but there exist n vectors that are linearly independent, the space is said to have a dimension of n. Any set of n linearly independent vectors is called a basis of the vector space.
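For two vectors in the plane, the test above (a non-trivial combination summing to the zero vector) is equivalent to asking whether the 2x2 determinant of their components vanishes. A minimal sketch, with an illustrative function name:

```python
def dependent_2d(a, b, tol=1e-12):
    # Two plane vectors are linearly dependent exactly when
    # a1*b2 - a2*b1 = 0, i.e. they lie along the same line.
    return abs(a[0] * b[1] - a[1] * b[0]) < tol

print(dependent_2d((1, 2), (2, 4)))   # (2,4) = 2*(1,2): dependent
print(dependent_2d((1, 0), (0, 1)))   # independent
```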

Scalar Product

Also known as the 'dot product' or 'inner product'. The scalar product is a way of removing the problem of angular measures from the relationship between vectors and, as Weyl put it, a way of comparing the lengths of vectors that are arbitrarily inclined to each other.

Consider two vectors with a common origin:


The projection of \mathbf{a} on the adjacent side is:

P = | \mathbf{a} | cos \theta

Where | \mathbf{a} | is the length of \mathbf{a}.

The scalar product is defined as:

(2) \mathbf{a . b} = | \mathbf{a} | | \mathbf{b} | cos \theta

Notice that cos \theta is zero if \mathbf{a} and \mathbf{b} are perpendicular. This means that if the scalar product is zero the vectors composing it are orthogonal (perpendicular to each other).

(2) also allows cos \theta to be defined as:

cos \theta = \mathbf{a . b} / ( | \mathbf{a} | | \mathbf{b} |)

The definition of the scalar product also allows a definition of the length of a vector in terms of the concept of a vector itself. The scalar product of a vector with itself is:

\mathbf{a . a} = | \mathbf{a} | | \mathbf{a} | cos 0

cos 0 (the cosine of zero) is one so:

\mathbf{a . a} = a^2

which is our first direct relationship between vectors and scalars. This can be expressed as:

(3) a = \sqrt{\mathbf{a . a}}

where a is the length of \mathbf{a}.


Properties of the Scalar Product

1. Linearity [G\mathbf{a} + H\mathbf{b}].\mathbf{c} = G\mathbf{a.c} + H\mathbf{b.c}

2. Symmetry \mathbf{a.b} = \mathbf{b.a}

3. Positive definiteness \mathbf{a.a} \geq 0

4. Distributivity for vector addition \mathbf{(a + b).c} = \mathbf{a.c + b.c}

5. Schwarz inequality | \mathbf{a.b} | \leq ab

6. Parallelogram equality | \mathbf{a} + \mathbf{b} |^2 + | \mathbf{a} - \mathbf{b} |^2 = 2( | \mathbf{a} |^2 + | \mathbf{b} |^2)

From the point of view of vector physics the most important property of the scalar product is the expression of the scalar product in terms of coordinates.

7. \mathbf{a.b} = a_1b_1 + a_2b_2 + a_3b_3

This gives us the length of a vector in terms of coordinates (Pythagoras' theorem) from:

8. \mathbf{a.a} = a^2 = a_1^2 + a_2^2 + a_3^2

The derivation of 7 is:

\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}

where \mathbf{i}, \mathbf{j}, \mathbf{k} are unit vectors along the coordinate axes. From the linearity property (1):

\mathbf{a.b} = (a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}) .\mathbf{b} = a_1\mathbf{i}.\mathbf{b} + a_2\mathbf{j}.\mathbf{b} + a_3\mathbf{k}.\mathbf{b}

but \mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}


\mathbf{a.b} = b_1a_1\mathbf{i .i} + b_2a_1\mathbf{i .j} + b_3a_1\mathbf{i .k} + b_1a_2\mathbf{j.i} + b_2a_2\mathbf{j.j} + b_3a_2\mathbf{j.k} + b_1a_3\mathbf{k.i} + b_2a_3\mathbf{k.j} + b_3a_3\mathbf{k.k}

\mathbf{i .j, i .k, j .k,} etc. are all zero because the vectors are orthogonal, also \mathbf{i .i, j.j} and \mathbf{k.k} are all one (these are unit vectors defined to be 1 unit in length).

Using these results:

\mathbf{a.b} = a_1b_1 + a_2b_2 + a_3b_3
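The coordinate formula can be checked numerically against the |a||b| cos θ definition; a short sketch (the helper names and sample vectors are illustrative):

```python
import math

def dot(a, b):
    # a.b = a1*b1 + a2*b2 + a3*b3
    return sum(x * y for x, y in zip(a, b))

def length(v):
    # result 3: a = sqrt(a.a)
    return math.sqrt(dot(v, v))

a = (3.0, 0.0, 0.0)
b = (1.0, 1.0, 0.0)
cos_theta = dot(a, b) / (length(a) * length(b))
print(cos_theta)                  # cos of the 45-degree angle between them
print(dot((1, 0, 0), (0, 1, 0)))  # orthogonal vectors: scalar product is 0
```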


Matrices

Matrices are sets of numbers arranged in a rectangular array. They are especially important in linear algebra because they can be used to represent the coefficients of linear equations, eg:

11a + 2b = c

5a + 7b = d

The constants in the equation above can be represented as a matrix:

\mathbf{A} = \begin{bmatrix}
11 & 2 \\
5 & 7 \end{bmatrix}

The elements of matrices are usually denoted symbolically using lower case letters:

\mathbf{A} = \begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22} \end{bmatrix}

Matrices are said to be equal if all of the corresponding elements are equal.

Eg: if a_{ij} = b_{ij} for all i and j

Then \mathbf{A} = \mathbf{B}

Matrix Addition

Matrices are added by adding the individual elements of one matrix to the corresponding elements of the other matrix.

c_{ij} = a_{ij} + b_{ij}

or \mathbf{C} = \mathbf{A} + \mathbf{B}

Matrix addition has the following properties:

1. Commutativity \mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}

2. Associativity (\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})


If the zero matrix (whose elements are all zero) is labelled as 0:

3. \mathbf{A} + (-\mathbf{A}) = 0

4. \mathbf{A} + 0 = \mathbf{A}

From matrix addition it can be seen that the product of a matrix \mathbf{A} and a number p is simply p\mathbf{A} where every element of the matrix is multiplied individually by p.
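Matrix addition and multiplication by a number can be sketched with nested lists (the helper names are illustrative):

```python
def mat_add(A, B):
    # c_ij = a_ij + b_ij
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(p, A):
    # every element is multiplied individually by p
    return [[p * a for a in row] for row in A]

A = [[11, 2], [5, 7]]
B = [[1, 0], [0, 1]]
print(mat_add(A, B))
print(mat_scale(2, A))
```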

Transpose of a Matrix

A matrix is transposed when the rows and columns are interchanged:

\mathbf{A} = \begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33} \end{bmatrix}
\mathbf{A^T} = \begin{bmatrix}
a_{11} & a_{21} & a_{31} \\
a_{12} & a_{22} & a_{32} \\
a_{13} & a_{23} & a_{33} \end{bmatrix}

Notice that the principal diagonal elements stay the same after transposition.

A matrix is symmetric if it is equal to its transpose eg: a_{kj} = a_{jk}.

It is skew symmetric if \mathbf{A^T} = -\mathbf{A} eg: a_{kj} = -a_{jk}. The principal diagonal of a skew symmetric matrix is composed of elements that are zero.
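Transposition, and the observation that a skew symmetric matrix has a zero principal diagonal, can be sketched as follows (the sample matrix is illustrative):

```python
def transpose(A):
    # interchange the rows and columns
    return [list(col) for col in zip(*A)]

S = [[0, 2, -1],
     [-2, 0, 3],
     [1, -3, 0]]          # skew symmetric: S^T = -S

neg_S = [[-a for a in row] for row in S]
print(transpose(S) == neg_S)            # True: S is skew symmetric
print([S[i][i] for i in range(3)])      # the principal diagonal is all zero
```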

Other Types of Matrix

Diagonal matrix: all elements above and below the principal diagonal are zero.

\begin{bmatrix}
4 & 0 & 0 \\
0 & -1 & 0 \\
0 & 0 & 2 \end{bmatrix}

Unit matrix: denoted by I, is a diagonal matrix where all elements of the principal diagonal are 1.

\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \end{bmatrix}

Matrix Multiplication

Matrix multiplication is defined in terms of the problem of determining the coefficients in linear transformations.

Consider a set of linear transformations between 2 coordinate systems that share a common origin and are related to each other by a rotation of the coordinate axes.

Two Coordinate Systems Rotated Relative to Each Other

If there are 3 coordinate systems, x, y, and z these can be transformed from one to another:

x_1 = a_{11}y_1 + a_{12}y_2

x_2 = a_{21}y_1 + a_{22}y_2

y_1 = b_{11}z_1 + b_{12}z_2

y_2 = b_{21}z_1 + b_{22}z_2

x_1 = c_{11}z_1 + c_{12}z_2

x_2 = c_{21}z_1 + c_{22}z_2

By substitution:

x_1 = a_{11}(b_{11}z_1 + b_{12}z_2) + a_{12}(b_{21}z_1 + b_{22}z_2)

x_2 = a_{21}(b_{11}z_1 + b_{12}z_2) + a_{22}(b_{21}z_1 + b_{22}z_2)

x_1 = (a_{11}b_{11} + a_{12}b_{21})z_1 + (a_{11}b_{12} + a_{12}b_{22})z_2

x_2 = (a_{21}b_{11} + a_{22}b_{21})z_1 + (a_{21}b_{12} + a_{22}b_{22})z_2

Comparing coefficients with the transformation from z directly to x:

c_{11} = a_{11}b_{11} + a_{12}b_{21}

c_{12} = a_{11}b_{12} + a_{12}b_{22}

c_{21} = a_{21}b_{11} + a_{22}b_{21}

c_{22} = a_{21}b_{12} + a_{22}b_{22}

The coefficient matrices are:

\mathbf{A} = \begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22} \end{bmatrix}
\mathbf{B} = \begin{bmatrix}
b_{11} & b_{12} \\
b_{21} & b_{22} \end{bmatrix}
\mathbf{C} = \begin{bmatrix}
c_{11} & c_{12} \\
c_{21} & c_{22} \end{bmatrix}

From the linear transformation the product of A and B is defined as:

\mathbf{C} = \mathbf{AB} = \begin{bmatrix}
(a_{11}b_{11} + a_{12}b_{21}) & (a_{11}b_{12} + a_{12}b_{22}) \\
(a_{21}b_{11} + a_{22}b_{21}) & (a_{21}b_{12} + a_{22}b_{22}) \end{bmatrix}

In the discussion of scalar products it was shown that, for a plane, the scalar product is calculated as: \mathbf{a.b} = a_1b_1 + a_2b_2 where a_1, a_2 and b_1, b_2 are the components of the vectors \mathbf{a} and \mathbf{b}.

Now mathematicians define the rows and columns of a matrix as vectors:

A column vector is \mathbf{b} = \begin{bmatrix}
b_{11} \\
b_{21} \end{bmatrix}

and a row vector is \mathbf{a} = \begin{bmatrix}
a_{11} & a_{12} \end{bmatrix}

Matrices can be written in terms of their row and column vectors, eg:

\mathbf{A} = \begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix}
\mathbf{a_{1}} \\
\mathbf{a_{2}} \end{bmatrix}

where \mathbf{a_1} and \mathbf{a_2} are the row vectors of \mathbf{A}, and

\mathbf{B} = \begin{bmatrix}
b_{11} & b_{12} \\
b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix}
\mathbf{b_{1}} & \mathbf{b_{2}} \end{bmatrix}

where \mathbf{b_1} and \mathbf{b_2} are the column vectors of \mathbf{B}.

Matrix multiplication is then defined as the scalar products of the vectors so that:

\mathbf{C} = \begin{bmatrix}
\mathbf{a_1.b_1} & \mathbf{a_1.b_2} \\
\mathbf{a_2.b_1} & \mathbf{a_2.b_2} \end{bmatrix}

From the definition of the scalar product \mathbf{a_1.b_1} = a_{11}b_{11} + a_{12}b_{21} etc.

In the general case:

\mathbf{C} = \begin{bmatrix}
\mathbf{a_1.b_1} & \mathbf{a_1.b_2} & \cdots & \mathbf{a_1.b_n} \\
\mathbf{a_2.b_1} & \mathbf{a_2.b_2} & \cdots & \mathbf{a_2.b_n} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{a_m.b_1} & \mathbf{a_m.b_2} & \cdots & \mathbf{a_m.b_n} \end{bmatrix}

This is described as the multiplication of rows into columns (eg: row vectors into column vectors). The first matrix must have the same number of columns as there are rows in the second matrix or the multiplication is undefined.

After matrix multiplication the product matrix has the same number of rows as the first matrix and the same number of columns as the second matrix:

\begin{bmatrix}
1 & 3 & 4 \\
6 & 3 & 2 \end{bmatrix}
\begin{bmatrix}
2 \\
3 \\
7 \end{bmatrix}
= \begin{bmatrix}
39 \\
35 \end{bmatrix}

which has 2 rows and 1 column,

ie: the first row is 1*2 + 3*3 + 4*7 = 39 and the second row is 6*2 + 3*3 + 2*7 = 35

\mathbf{AB} = \begin{bmatrix}
1 & 3 & 2 \\
2 & -1 & 3 \end{bmatrix}
\begin{bmatrix}
2 & 3 & 4\\
3 & 2 & 1\\
5 & 1 & 3 \end{bmatrix}
= \begin{bmatrix}
21 & 11 & 13 \\
16 & 7 & 16 \end{bmatrix}

which has 2 rows and 3 columns.

Notice that \mathbf{BA} cannot be determined because the number of columns in the first matrix must equal the number of rows in the second matrix to perform matrix multiplication.
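Rows-into-columns multiplication can be sketched and checked against the worked example above (the helper name `mat_mul` is illustrative):

```python
def mat_mul(A, B):
    # Each entry of the product is the scalar product of a row of A
    # with a column of B; defined only when cols(A) == rows(B).
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 3, 2], [2, -1, 3]]
B = [[2, 3, 4], [3, 2, 1], [5, 1, 3]]
print(mat_mul(A, B))   # [[21, 11, 13], [16, 7, 16]]
```

Attempting `mat_mul(B, A)` raises an error, matching the observation that BA is undefined here.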

Properties of Matrix Multiplication

1. Not commutative \mathbf{AB} \ne \mathbf{BA}

2. Associative \mathbf{A(BC)} = \mathbf{(AB)C}

(k\mathbf{A})\mathbf{B} = k(\mathbf{AB}) = \mathbf{A}(k\mathbf{B})

3. Distributive for matrix addition

(\mathbf{A} + \mathbf{B})\mathbf{C} = \mathbf{AC} + \mathbf{BC}

Matrix multiplication is not commutative, so \mathbf{C}(\mathbf{A} + \mathbf{B}) = \mathbf{CA} + \mathbf{CB} is a separate case.

4. The cancellation law is not always true:

\mathbf{AB} = 0 does not mean \mathbf{A}=0 or \mathbf{B}=0

There is a case where matrix multiplication is commutative. This involves the scalar matrix, where the values of the principal diagonal are all equal. Eg:

\mathbf{S} = \begin{bmatrix}
k & 0 & 0 \\
0 & k & 0 \\
0 & 0 & k \end{bmatrix}

In this case \mathbf{AS} = \mathbf{SA} = k\mathbf{A}. If the scalar matrix is the unit matrix: \mathbf{AI} = \mathbf{IA} = \mathbf{A}.

Linear Transformations

A simple linear transformation such as:

x_1 = a_{11}y_1 + a_{12}y_2

x_2 = a_{21}y_1 + a_{22}y_2

can be expressed as:

\mathbf{x} = \mathbf{Ay}


\begin{bmatrix}
x_1 \\
x_2 \end{bmatrix}
= \begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix}
y_1 \\
y_2 \end{bmatrix}

and y_1 = b_{11}z_1 + b_{12}z_2

y_2 = b_{21}z_1 + b_{22}z_2

as: \mathbf{y} = \mathbf{Bz}

Using the associative law:

\mathbf{x} = \mathbf{A}(\mathbf{Bz}) = \mathbf{ABz} = \mathbf{Cz}

and so:

\mathbf{C} = \mathbf{AB} = \begin{bmatrix}
(a_{11}b_{11} + a_{12}b_{21}) & (a_{11}b_{12} + a_{12}b_{22}) \\
(a_{21}b_{11} + a_{22}b_{21}) & (a_{21}b_{12} + a_{22}b_{22}) \end{bmatrix}

as before.
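The associativity step \mathbf{A}(\mathbf{Bz}) = (\mathbf{AB})\mathbf{z} can be checked numerically; a sketch with hypothetical coefficient matrices:

```python
def mat_mul(A, B):
    # rows of A into columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[2, 1], [0, 3]]
B = [[1, 4], [5, 2]]
z = [[7], [1]]          # a column vector, written as a 2x1 matrix

# Transform z to y, then y to x ...
x_two_steps = mat_mul(A, mat_mul(B, z))
# ... or apply C = AB in a single step:
x_one_step = mat_mul(mat_mul(A, B), z)
print(x_two_steps, x_one_step)   # the two agree
```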

Indicial Notation

Consider a simple rotation of coordinates:


x^{\mu} is defined as x_1 , x_2

x^{\nu} is defined as x_1' , x_2'

The scalar product can be written as:

\mathbf{s.s} =g_{\mu \nu} x^\mu x^\nu


where

g_{\mu \nu} = \begin{bmatrix}
1 & 0 \\
0 & 1 \end{bmatrix}

and is called the metric tensor for this 2D space.

\mathbf{s.s} = g_{11} x_1x_1' + g_{12} x_1x_2' + g_{21} x_2x_1' + g_{22} x_2x_2'

Now, g_{11} = 1, g_{12} = 0, g_{21} = 0, g_{22} = 1 so:

\mathbf{s.s} = x_1x_1' + x_2x_2'

If there is no rotation of coordinates the scalar product is:

\mathbf{s.s} = x_1x_1 + x_2x_2

s^2 = x_{1}^2 + x_{2}^2

Which is Pythagoras' theorem.

The Summation Convention

Indexes that appear as both subscripts and superscripts are summed over.

g_{\mu \nu} x^\mu x^\nu = g_{11} x_1x_1' + g_{12} x_1x_2' + g_{21} x_2x_1' + g_{22} x_2x_2'

By promoting \nu to a superscript it is taken out of the summation, ie:

g_{\mu}^\nu x^\mu x^\nu = g_{1\nu} x_1x_\nu' + g_{2\nu} x_2x_\nu'

Matrix Multiplication in Indicial Notation


Columns times rows:

\begin{bmatrix}
x_1 \\
x_2 \end{bmatrix}
times \begin{bmatrix}y_1 & y_2 \end{bmatrix}
= \begin{bmatrix}
x_1 y_1 & x_1 y_2 \\
x_2 y_1 & x_2 y_2 \end{bmatrix}

Matrix product \mathbf{XY} = x_iy_j where i = 1, 2 and j = 1, 2

There being no summation the indexes are both subscripts.

Rows times columns:

\begin{bmatrix}
x_1 & x_2 \end{bmatrix}
\begin{bmatrix}
y_1 \\
y_2 \end{bmatrix}
= \begin{bmatrix}
x_1 y_1 + x_2 y_2 \end{bmatrix}

Matrix product \mathbf{XY} = \delta_{ij} x^iy^j

Where \delta_{ij} is known as Kronecker delta and has the value 0 when i \ne j and 1 when i = j . It is the indicial equivalent of the unit matrix:

\begin{bmatrix}
1 & 0 \\
0 & 1 \end{bmatrix}

There being summation one value of i is a subscript and the other a superscript.

A matrix in general can be specified by any of:

M_{i}^j , M_{ij} , M^{i}_j , M^{ij} depending on which subscript or superscript is being summed over.
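The indicial formula \delta_{ij} x^i y^j can be written out as an explicit double sum (a sketch; Python indices run from 0 rather than 1):

```python
def kronecker(i, j):
    # delta_ij: 1 when i == j, 0 otherwise
    return 1 if i == j else 0

def indicial_dot(x, y):
    # delta_ij x^i y^j, summing over both indices
    n = len(x)
    return sum(kronecker(i, j) * x[i] * y[j]
               for i in range(n) for j in range(n))

x = (1.0, 2.0)
y = (3.0, 4.0)
print(indicial_dot(x, y))   # same as x1*y1 + x2*y2
```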

Vectors in Indicial Notation

A vector can be expressed as a sum of basis vectors.

\mathbf{x} = a_1\mathbf{e}_1 + a_2\mathbf{e}_2 + a_3\mathbf{e}_3

In indicial notation this is: \mathbf{x} = a^i\mathbf{e}_i

Linear Transformations in Indicial Notation

Consider \mathbf{x} = \mathbf{Ay} where \mathbf{A} is a coefficient matrix and \mathbf{x} and \mathbf{y} are coordinate matrices.

In indicial notation this is:

x^{\mu} = A^{\mu}_{\nu} x^{\nu}

which becomes:

x_1 = a_{11} x_1' + a_{12} x_2' + a_{13} x_3'

x_2 = a_{21} x_1' + a_{22} x_2' + a_{23} x_3'

x_3 = a_{31} x_1' + a_{32} x_2' + a_{33} x_3'

The Scalar Product in Indicial Notation

In indicial notation the scalar product is:

\mathbf{x.y} = \delta_{ij} x^i y^j

Analysis of curved surfaces and transformations

It became apparent at the start of the nineteenth century that issues such as Euclid's parallel postulate required the development of a new type of geometry that could deal with curved surfaces and real and imaginary planes. At the foundation of this approach is Gauss's analysis of curved surfaces which allows us to work with a variety of coordinate systems and displacements on any type of surface.

Elementary geometric analysis is useful as an introduction to Special Relativity because it suggests the physical meaning of the coefficients that appear in coordinate transformations.

Suppose there is a line on a surface. The length of this line can be expressed in terms of a coordinate system. A short length of line \Delta s in a two dimensional space may be expressed in terms of Pythagoras' theorem as:

\Delta s^2 = \Delta x^2 + \Delta y^2

Suppose there is another coordinate system on the surface with two axes: x1, x2, how can the length of the line be expressed in terms of these coordinates? Gauss tackled this problem and his analysis is quite straightforward for two coordinate axes:

Figure 1:


It is possible to use elementary differential geometry to describe displacements along the plane in terms of displacements on the curved surfaces:

 \Delta Y = \Delta x_1 \frac {\delta Y}{\delta x_1} + \Delta x_2 \frac{\delta Y}{\delta x_2}

 \Delta Z = \Delta x_1 \frac {\delta Z}{\delta x_1} + \Delta x_2 \frac{\delta Z}{\delta x_2}

The displacement of a short line is then assumed to be given by a formula, called a metric, such as Pythagoras' theorem

\Delta S^2 = \Delta Y^2 + \Delta Z^2

The values of  \Delta Y and  \Delta Z can then be substituted into this metric:

\Delta S^2 = ( \Delta x_1 \frac {\delta Y}{\delta x_1} + \Delta x_2 \frac{\delta Y}{\delta x_2} )^2 + ( \Delta x_1 \frac {\delta Z}{\delta x_1} + \Delta x_2 \frac{\delta Z}{\delta x_2} )^2

Which, when expanded, gives the following:

\Delta S^2 =

( \frac{\delta Y}{\delta x_1}\frac{\delta Y}{\delta x_1}  + \frac{\delta Z}{\delta x_1} \frac{\delta Z}{\delta x_1} ) \Delta x_1 \Delta x_1

 +( \frac{\delta Y}{\delta x_2}\frac{\delta Y}{\delta x_1}  + \frac{\delta Z}{\delta x_2} \frac{\delta Z}{\delta x_1} ) \Delta x_2 \Delta x_1

 + ( \frac{\delta Y}{\delta x_1}\frac{\delta Y}{\delta x_2}  + \frac{\delta Z}{\delta x_1} \frac{\delta Z}{\delta x_2} ) \Delta x_1 \Delta x_2

 + ( \frac{\delta Y}{\delta x_2}\frac{\delta Y}{\delta x_2}  + \frac{\delta Z}{\delta x_2} \frac{\delta Z}{\delta x_2} ) \Delta x_2 \Delta x_2

This can be represented using summation notation:

\Delta S^2 =  \sum_{i=1}^2 \sum_{k=1}^2 (\frac{\delta Y}{\delta x_i}\frac{\delta Y}{\delta x_k}  + \frac{\delta Z}{\delta x_i} \frac{\delta Z}{\delta x_k} ) \Delta x_i \Delta x_k

Or, using indicial notation:

\Delta S^2 = g_{ik} \Delta x^i \Delta x^k


where  g_{ik} = (\frac{\delta Y}{\delta x^i}\frac{\delta Y}{\delta x^k}  + \frac{\delta Z}{\delta x^i} \frac{\delta Z}{\delta x^k} )
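The contraction \Delta S^2 = g_{ik} \Delta x^i \Delta x^k can be sketched as an explicit double sum; with the Euclidean metric (the identity) it recovers Pythagoras' theorem (the sample displacements are illustrative):

```python
def interval_squared(g, dx):
    # Delta S^2 = g_ik dx^i dx^k, summed over both indices
    n = len(dx)
    return sum(g[i][k] * dx[i] * dx[k]
               for i in range(n) for k in range(n))

g_euclid = [[1, 0], [0, 1]]
dx = (3.0, 4.0)
print(interval_squared(g_euclid, dx))   # 3^2 + 4^2 = 25.0
```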

If the coordinates are not merged then \Delta s is dependent on both sets of coordinates. In matrix notation:

\Delta s^2 = \Delta\mathbf{x}^T \mathbf{g} \Delta\mathbf{x}


\Delta s^2 = \begin{bmatrix}
\Delta x_1 & \Delta x_2 \end{bmatrix}
\begin{bmatrix}
a & b \\
c & d \end{bmatrix}
\begin{bmatrix}
\Delta x_1 \\
\Delta x_2 \end{bmatrix}

Where a, b, c, d stand for the values of g_{ik}.


= \begin{bmatrix}
\Delta x_1a + \Delta x_2c & \Delta x_1b + \Delta x_2d \end{bmatrix}
\begin{bmatrix}
\Delta x_1 \\
\Delta x_2 \end{bmatrix}

Which is:

(\Delta{x_1}a + \Delta{x_2}c) \Delta{x_1} + (\Delta{x_1}b + \Delta{x_2}d) \Delta{x_2} = \Delta{x_1}^2a + \Delta{x_1}\Delta{x_2}(c + b) + \Delta{x_2}^2d

so:

\Delta{s}^2 = \Delta{x_1}^2a + \Delta{x_1}\Delta{x_2}(c + b) + \Delta{x_2}^2d

\Delta{s}^2 is a bilinear form that depends on both \Delta{x_1} and \Delta{x_2}. It can be written in matrix notation as:

\Delta{s}^2 = \mathbf{\Delta{x}^T A \Delta{x}}

Where \mathbf{A} is the matrix containing the values of g_{ik}. This is a special case of the bilinear form known as the quadratic form, because the same vector (\mathbf{\Delta{x}}) appears twice; in the generalised bilinear form \mathbf{B} = \mathbf{x^TAy} the vectors \mathbf{x} and \mathbf{y} are different.
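The quadratic form and its expansion a \Delta{x_1}^2 + (b + c)\Delta{x_1}\Delta{x_2} + d \Delta{x_2}^2 can be compared directly in code (the values of a, b, c, d and the displacements are illustrative):

```python
def quadratic_form(A, x):
    # x^T A x for a 2x2 matrix A and a 2-component column x
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

a, b, c, d = 2.0, 0.5, 1.5, 3.0
A = [[a, b], [c, d]]
dx1, dx2 = 1.0, 2.0

lhs = quadratic_form(A, (dx1, dx2))
rhs = a * dx1**2 + (b + c) * dx1 * dx2 + d * dx2**2
print(lhs, rhs)   # the two agree
```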

If the surface is a Euclidean plane then the values of g_{ik} are:

\begin{bmatrix}
\delta{Y}/\delta{x_1} \delta{Y}/\delta{x_1} + \delta{Z}/\delta{x_1} \delta{Z}/\delta{x_1} & \delta{Y}/\delta{x_2} \delta{Y}/\delta{x_1} + \delta{Z}/\delta{x_2} \delta{Z}/\delta{x_1} \\
\delta{Y}/\delta{x_2} \delta{Y}/\delta{x_1} + \delta{Z}/\delta{x_2} \delta{Z}/\delta{x_1} & \delta{Y}/\delta{x_2} \delta{Y}/\delta{x_2} + \delta{Z}/\delta{x_2} \delta{Z}/\delta{x_2} \end{bmatrix}

Which become:

g_{ik} = \begin{bmatrix}
1 & 0 \\
0 & 1 \end{bmatrix}

So the matrix A is the unit matrix I and:

\Delta{s}^2 = \mathbf{\Delta{x^T} I \Delta{x}}


\Delta{s}^2 = \Delta{x_1}^2 + \Delta{x_2}^2

Which recovers Pythagoras' theorem yet again.

If the surface is derived from some other metric such as \Delta{s^2} = -\Delta{Y}^2 + \Delta{Z}^2 then the values of g_{ik} are:

\begin{bmatrix}
-\delta{Y}/\delta{x_1} \delta{Y}/\delta{x_1} + \delta{Z}/\delta{x_1} \delta{Z}/\delta{x_1} &  -\delta{Y}/\delta{x_2} \delta{Y}/\delta{x_1} + \delta{Z}/\delta{x_2} \delta{Z}/\delta{x_1} \\
-\delta{Y}/\delta{x_2} \delta{Y}/\delta{x_1} + \delta{Z}/\delta{x_2} \delta{Z}/\delta{x_1} &  -\delta{Y}/\delta{x_2} \delta{Y}/\delta{x_2} + \delta{Z}/\delta{x_2} \delta{Z}/\delta{x_2} \end{bmatrix}

Which becomes:

g_{ik} = \begin{bmatrix}
-1 & 0 \\
 0 & 1 \end{bmatrix}

Which allows the original metric to be recovered ie: \Delta{s^2} = -\Delta{x_1}^2 + \Delta{x_2}^2.

It is interesting to compare the geometrical analysis with the transformation based on matrix algebra that was derived in the section on indicial notation above:

\mathbf{s.s} = g_{11} x_1x_1' + g_{12} x_1x_2' + g_{21} x_2x_1' + g_{22} x_2x_2'


where

g_{\mu \nu} = \begin{bmatrix}
1 & 0 \\
0 & 1 \end{bmatrix}

ie: g_{11} = 1, g_{12} = 0, g_{21} = 0, g_{22} = 1 so:

\mathbf{s.s} = x_1x_1' + x_2x_2'

If there is no rotation of coordinates the scalar product is:

\mathbf{s.s} = x_1x_1 + x_2x_2

s^2 = x_{1}^2 + x_{2}^2

Which recovers Pythagoras' theorem. However, the reader may have noticed that Pythagoras' theorem had been assumed from the outset in the derivation of the scalar product (see above).

The geometrical analysis shows that if a metric is assumed and the conditions that allow differential geometry are present then it is possible to derive one set of coordinates from another. This analysis can also be performed using matrix algebra with the same assumptions.

The example above used a simple two dimensional Pythagorean metric; some other metric, such as the metric of a 4D Minkowskian space:

\Delta S^2 = - \Delta T^2 + \Delta X^2 + \Delta Y^2 + \Delta Z^2

could be used instead of Pythagoras' theorem.
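The same double-sum contraction with the Minkowskian metric diag(-1, 1, 1, 1) gives the 4D interval directly (the sample displacements are illustrative):

```python
def interval_squared(g, dx):
    # Delta S^2 = g_ik dx^i dx^k, summed over both indices
    n = len(dx)
    return sum(g[i][k] * dx[i] * dx[k]
               for i in range(n) for k in range(n))

# Metric of a 4D Minkowskian space: -dT^2 + dX^2 + dY^2 + dZ^2
g_minkowski = [[-1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]]

dT, dX, dY, dZ = 2.0, 1.0, 1.0, 1.0
print(interval_squared(g_minkowski, (dT, dX, dY, dZ)))   # -4 + 1 + 1 + 1 = -1.0
```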