Objectives

define vector and matrix
add, subtract, and multiply matrices
find the transpose of a square matrix
define the inverse of a matrix
setup simultaneous linear equations in matrix form

Resources

Vectors and Matrices

A matrix is a rectangular array of things, such as numbers, symbols, or expressions. Matrices are commonly used to express linear transformations and system of linear equations.

A triangular matrix is a special type of square matrices. If all entries of A below the main diagonal are zero, A is called an upper triangular matrix.

a_{ij}=0

for all

i>j

Similarly if all entries of A above the main diagonal are zero, A is called a lower triangular matrix.

a_{ij}=0

for all

i<j

If all entries outside the main diagonal are zero, A is called a diagonal matrix.

a_{ij}=0

for all

i\neq j

This matrix

{\begin{bmatrix}1&4&2\\0&3&4\\0&0&1\\\end{bmatrix}}

is upper triangular and this matrix

{\begin{bmatrix}1&0&0\\2&8&0\\4&9&7\\\end{bmatrix}}

is lower triangular.

The identity matrix of size n is the n-by-n matrix in which all the elements on the main diagonal are equal to 1 and all other elements are equal to 0.

Matrix Multiplication

Matrix multiplication takes two matrices and produces another matrix. The rule for the operation is illustrated as follows:

{\overset {4\times 2{\text{ matrix}}}{\begin{bmatrix}{\color {Brown}{a_{11}}}&{\color {Brown}{a_{12}}}\\\cdot &\cdot \\{\color {Orange}{a_{31}}}&{\color {Orange}{a_{32}}}\\\cdot &\cdot \\\end{bmatrix}}}{\overset {2\times 3{\text{ matrix}}}{\begin{bmatrix}\cdot &{\color {Plum}{b_{12}}}&{\color {Violet}{b_{13}}}\\\cdot &{\color {Plum}{b_{22}}}&{\color {Violet}{b_{23}}}\\\end{bmatrix}}}={\overset {4\times 3{\text{ matrix}}}{\begin{bmatrix}\cdot &x_{12}&x_{13}\\\cdot &\cdot &\cdot \\\cdot &x_{32}&x_{33}\\\cdot &\cdot &\cdot \\\end{bmatrix}}}

The values at the intersections marked with circles are:

{\begin{aligned}x_{12}&={\color {Brown}{a_{11}}}{\color {Plum}{b_{12}}}+{\color {Brown}{a_{12}}}{\color {Plum}{b_{22}}}\\x_{13}&={\color {Brown}{a_{11}}}{\color {Violet}{b_{13}}}+{\color {Brown}{a_{12}}}{\color {Violet}{b_{23}}}\\x_{32}&={\color {Orange}{a_{31}}}{\color {Plum}{b_{12}}}+{\color {Orange}{a_{32}}}{\color {Plum}{b_{22}}}\\x_{33}&={\color {Orange}{a_{31}}}{\color {Violet}{b_{13}}}+{\color {Orange}{a_{32}}}{\color {Violet}{b_{23}}}\end{aligned}}

Transpose of a Matrix

The transpose of a matrix is defined as follows: The transpose of a matrix A is another matrix A^T created by any one of the following equivalent actions:

reflect A over its main diagonal (which runs from top-left to bottom-right) to obtain A^T
write the rows of A as the columns of A^T
write the columns of A as the rows of A^T:

The following figure illustrates the idea:

Inverse of a Matrix

An n-by-n square matrix A is called invertible if there exists an n-by-n square matrix B such that

\mathbf {AB} =\mathbf {BA} =\mathbf {I} _{n}\

where I_n denotes the n-by-n identity matrix and the multiplication used is ordinary matrix multiplication. B is called the inverse of A, denoted by A⁻¹.

Represent System of Linear Equations in Matrix Form

{\begin{alignedat}{7}2x&&\;+\;&&y&&\;-\;&&z&&\;=\;&&8&\qquad (L_{1})\\-3x&&\;-\;&&y&&\;+\;&&2z&&\;=\;&&-11&\qquad (L_{2})\\-2x&&\;+\;&&y&&\;+\;&&2z&&\;=\;&&-3&\qquad (L_{3})\end{alignedat}}

We can represent this system of linear equations in matrix form as shown below:

{\begin{bmatrix}2x&+&y&-&z\\-3x&-&y&+&2z\\-2x&+&y&+&2z\end{bmatrix}}={\begin{bmatrix}8\\-11\\-3\end{bmatrix}}

Then we can use matrix multiplication to separate the variables.

{\begin{bmatrix}2&1&-1\\-3&-1&2\\-2&1&2\end{bmatrix}}{\begin{bmatrix}x\\y\\z\end{bmatrix}}={\begin{bmatrix}8\\-11\\-3\end{bmatrix}}

{\begin{bmatrix}25&5&1\\64&8&1\\144&12&1\end{bmatrix}}{\begin{bmatrix}a_{1}\\a_{2}\\a_{3}\end{bmatrix}}={\begin{bmatrix}106.8\\177.2\\279.2\end{bmatrix}}

The general matrix form for system of linear equations is as follows:

A_{n\times n}X_{n\times 1}=C_{n\times 1}

Matrix A is the coefficient matrix. X is the solution vector (matrix) and C is the right hand side vector. If we multiply the inverse of A on bother sides we can see the solution is closely related to the inverse of A.

{\begin{alignedat}{3}A^{-1}AX&&\;=\;&&A^{-1}C\\IX&&\;=\;&&A^{-1}C\\X&&\;=\;&&A^{-1}C\end{alignedat}}

Gaussian Elimination

Sources:

book chapter

Gaussian elimination is an algorithm for solving a system of linear equations, which is similar to finding the inverse of an invertible square matrix. The algorithm consists of a sequence of row reduction operations performed on the associated matrix of coefficients. There are three types of elementary row operations:

swapping two rows
multiplying a row by a non-zero constant
adding a multiple of one row to another row.

For example, the first linear equation (L₁, pivot equation) can be used to eliminate $x$ from the next two equations: :

{\begin{alignedat}{7}2x&&\;+\;&&y&&\;-\;&&z&&\;=\;&&8&\qquad (L_{1})\\-3x&&\;-\;&&y&&\;+\;&&2z&&\;=\;&&-11&\qquad (L_{2})\\-2x&&\;+\;&&y&&\;+\;&&2z&&\;=\;&&-3&\qquad (L_{3})\end{alignedat}}

Then L₂ (pivot equation) can be used to eliminate $y$ from L₃. This process is called forward elimination. Now L₃ can be solved with one unknown $z$ , which can be used to substitute the $z$ in L₂ to solve $y$ . This process is called backward substitution.

The algorithm for the Gaussian Elimination method can be implemented as follows:

''' 
x = gauss_elimin(a, b)
Solves [a][x] = [b] by Gauss elimination.
'''
from numpy import dot, array

def gauss_elimin(a, b):
  (rows, cols) = a.shape
  # elimination phase
  for row in range(0, rows-1): # pivot equation/row
    for i in range(row+1, rows):
      if a[i, row] != 0.0:
        factor = a[i, row]/a[row, row]
        a[i, row+1:rows] = a[i, row+1:rows] - factor*a[row, row+1:rows]
        b[i] = b[i] - factor*b[row]
  # back substitution 
  for k in range(rows-1,-1,-1):
    b[k] = (b[k] - dot(a[k, k+1:rows],b[k+1:rows]))/a[k, k]
  return b

a = array([[3, 2.0], [-6, 6]])
b = array([7, 6])
print gauss_elimin(a, b)

Gaussian Elimination with Partial Pivoting

Gaussian elimination method is prone to round-off errors especially when a large number equations allows the error to accumulate. Another problem is the division by zero error. For example, the following matrix will break the method because the second pivot element is zero and the method requires the pivot elements to be non-zero.

\left[{\begin{array}{ccc|c}1&-1&2&8\\0&0&-1&-11\\0&2&-1&-3\end{array}}\right]

Gaussian elimination with partial pivoting is similar to the Gaussian elimination except that in the $i$ th forward elimination step the maximum of $|a_{i,i}|,|a_{i+1,i}|,...,|a_{n,i}|$ is found and the row that contains the maximum will be swapped with the pivot row. The previous coefficient matrix would look as follows after the swaps.

\left[{\begin{array}{ccc|c}1&-1&2&8\\0&2&-1&-3\\0&0&-1&-11\end{array}}\right]

Gauss-Seidel Method

Gaussian elimination is a direct (straightforward) method that transforms the original equations to equivalent ones that are easier to solve. Some systems of equations have no solution because for example the number of equations is less than the number of unknowns or one equation contradicts another equation.

Gauss-Seidel method is an iterative (or indirect) method that starts with a guess at the solution and repeatedly refine the guess till it converges (the convergence criterion is met). This method is advantageous because it is more computationally efficient when the coefficient matrix is large and sparse and the round-off errors can be control by adjusting the error tolerance.

The algorithm consists the following steps:

rewrite each equation for the corresponding unknowns, for example, the first equation is written with $x_{1}$ on the left-hand side and the second equation with $x_{2}$ on the left-hand side:

x_{i}={\frac {c_{i}-\sum _{j=1,j\neq i}^{n}a_{ij}x_{j}}{a_{ii}}},i=1,2,3,...,n.

find initial guess for the $x_{i}$ 's.
use the rewritten equations to calculate the new estimates - always using the most recent estimates.
repeat the previous step till the largest absolute relative approximate error among all $x_{i}$ 's is less than the specified tolerance.

The drawback of most iterative method is that it may not converge when the system of equations has a coefficient matrix that is not diagonally dominant. A system of equations where the coefficient matrix is diagonally dominant does always converge. The matrix A is diagonally dominant if

|a_{ii}|\geq \sum _{j\neq i}|a_{ij}|\quad {\text{for all }}i,

and

|a_{ii}|>\sum _{j\neq i}|a_{ij}|\quad {\text{for at least one }}i.

An implementation in Python using numpy simply iterates to produce the solution vector.

LU Decomposition

LU decomposition factors the coefficient matrix A to the product of a lower triangular matrix and an upper triangular matrix: A = LU. The process of deriving L and U from A is called LU decomposition or LU factorization, which is similar to Gaussian elimination method.

Once LU decomposition is done, we can use the system of linear equations represented by A using forward and back substitutions:

{\begin{alignedat}{3}AX&&\;=\;&&C\\LUX&&\;=\;&&C\\L^{-1}LUX&&\;=\;&&L^{-1}C\\UX&&\;=\;&&L^{-1}C\\\end{alignedat}}

Let $L^{-1}C=Z$ , which means $LZ=C$ and $Z$ can be solved by forward elimination. From $UX=Z$ we can solve X using backward elimination.

One method of LU decomposition is similar to Gaussian elimination. For a $3\times 3$ :

A={\begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\\\end{bmatrix}}={\begin{bmatrix}1&0&0\\l_{21}&1&0\\l_{31}&l_{32}&1\\\end{bmatrix}}{\begin{bmatrix}u_{11}&u_{12}&u_{13}\\0&u_{22}&u_{23}\\0&0&u_{33}\\\end{bmatrix}}.

A={\begin{bmatrix}u_{11}&u_{12}&u_{13}\\u_{11}l_{21}&u_{12}l_{21}+u_{22}&u_{13}l_{21}+u_{23}\\u_{11}l_{31}&u_{12}l_{31}+u_{22}l_{32}&u_{13}l_{31}+u_{23}l_{32}+u_{33}\\\end{bmatrix}}

From the last matrix we can see to eliminate $a_{21}$ we need to multiple the first row of A by $l_{21}$ and similarly to eliminate $a_{31}$ by multiplying the first row by $l_{31}$ . To calculate L we can simply record the multipliers used in the forward elimination steps and the matrix U is the same as the resultant upper triangular matrix from forward elimination.

Sample code in Python

Find the Inverse of a Matrix

Finding the inverse of a matrix is a process related to solving the system of linear equations. If the inverse of a coefficient matrix is found then the system of equations represented by the coefficient matrix is solved.

{\begin{alignedat}{3}AX&&\;=\;&&C\\A^{-1}AX&&\;=\;&&A^{-1}C\\X&&\;=\;&&A^{-1}C\\\end{alignedat}}

One way to find the inverse of a matrix is to find each column of the inverse matrix separately by solving a system of linear equations. If $AB=I$ and

A={\begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{bmatrix}}\ B={\begin{bmatrix}b_{11}&b_{12}&b_{13}\\b_{21}&b_{22}&b_{23}\\b_{31}&b_{32}&b_{33}\end{bmatrix}}

then

A{\begin{bmatrix}b_{11}\\b_{21}\\b_{31}\end{bmatrix}}=LU{\begin{bmatrix}b_{11}\\b_{21}\\b_{31}\end{bmatrix}}={\begin{bmatrix}1\\0\\0\end{bmatrix}}

.

We can calculate the first column of $B$ by solving a system of equations using the LU decomposition method. The rest of the columns can be calculated in a similar fashion. As you can see the LU decomposition method is advantageous the LU decomposition is performed once and the result is used many times, which lowers the computational complexity of the whole solution.