Statistics/Numerical Methods/Basic Linear Algebra and Gram-Schmidt Orthogonalization

Introduction

All of the material in this section can also be found in a linear algebra book. However, the Gram-Schmidt orthogonalization is used in statistical algorithms and in the solution of statistical problems. Therefore, we briefly review the linear algebra that is necessary to understand the Gram-Schmidt orthogonalization.

The following subsections also contain examples. It is very important for further understanding that the concepts presented here are valid not only for typical vectors, i.e. tuples of real numbers, but also for functions, which can likewise be considered vectors.

Fields

Definition

A set ${\displaystyle R}$ with two operations ${\displaystyle +}$ and ${\displaystyle *}$ on its elements is called a field (written ${\displaystyle (R,+,*)}$ for short) if the following conditions hold:

1. For all ${\displaystyle \alpha ,\beta \in R}$ holds ${\displaystyle \alpha +\beta \in R}$
2. For all ${\displaystyle \alpha ,\beta \in R}$ holds ${\displaystyle \alpha +\beta =\beta +\alpha }$ (commutativity)
3. For all ${\displaystyle \alpha ,\beta ,\gamma \in R}$ holds ${\displaystyle \alpha +(\beta +\gamma )=(\alpha +\beta )+\gamma }$ (associativity)
4. There exists a unique element ${\displaystyle 0}$, called zero, such that for all ${\displaystyle \alpha \in R}$ holds ${\displaystyle \alpha +0=\alpha }$
5. For all ${\displaystyle \alpha \in R}$ there exists a unique element ${\displaystyle -\alpha }$ such that ${\displaystyle \alpha +(-\alpha )=0}$
6. For all ${\displaystyle \alpha ,\beta \in R}$ holds ${\displaystyle \alpha *\beta \in R}$
7. For all ${\displaystyle \alpha ,\beta \in R}$ holds ${\displaystyle \alpha *\beta =\beta *\alpha }$ (commutativity)
8. For all ${\displaystyle \alpha ,\beta ,\gamma \in R}$ holds ${\displaystyle \alpha *(\beta *\gamma )=(\alpha *\beta )*\gamma }$ (associativity)
9. There exists a unique element ${\displaystyle 1}$, called one, such that for all ${\displaystyle \alpha \in R}$ holds ${\displaystyle \alpha *1=\alpha }$
10. For all non-zero ${\displaystyle \alpha \in R}$ there exists a unique element ${\displaystyle \alpha ^{-1}}$ such that ${\displaystyle \alpha *\alpha ^{-1}=1}$
11. For all ${\displaystyle \alpha ,\beta ,\gamma \in R}$ holds ${\displaystyle \alpha *(\beta +\gamma )=\alpha *\beta +\alpha *\gamma }$ (distributivity)

The elements of ${\displaystyle R}$ are also called scalars.

Examples

It can easily be proven that the real numbers with the well-known addition and multiplication, ${\displaystyle (IR,+,*)}$, form a field. The same holds for the complex numbers with their addition and multiplication. In fact, not many other sets with two operations fulfill all of these conditions.

For statistics, only the real and complex numbers with the addition and multiplication are important.

Vector spaces

Definition

A set ${\displaystyle V}$ with two operations ${\displaystyle +}$ and ${\displaystyle *}$ on its elements is called a vector space over ${\displaystyle R}$ if the following conditions hold:

1. For all ${\displaystyle x,y\in V}$ holds ${\displaystyle x+y\in V}$
2. For all ${\displaystyle x,y\in V}$ holds ${\displaystyle x+y=y+x}$ (commutativity)
3. For all ${\displaystyle x,y,z\in V}$ holds ${\displaystyle x+(y+z)=(x+y)+z}$ (associativity)
4. There exists a unique element ${\displaystyle \mathbb {O} }$, called origin, such that for all ${\displaystyle x\in V}$ holds ${\displaystyle x+\mathbb {O} =x}$
5. For all ${\displaystyle x\in V}$ there exists a unique element ${\displaystyle -x}$ such that ${\displaystyle x+(-x)=\mathbb {O} }$
6. For all ${\displaystyle \alpha \in R}$ and ${\displaystyle x\in V}$ holds ${\displaystyle \alpha *x\in V}$
7. For all ${\displaystyle \alpha ,\beta \in R}$ and ${\displaystyle x\in V}$ holds ${\displaystyle \alpha *(\beta *x)=(\alpha *\beta )*x}$ (associativity)
8. For all ${\displaystyle x\in V}$ and ${\displaystyle 1\in R}$ holds ${\displaystyle 1*x=x}$
9. For all ${\displaystyle \alpha \in R}$ and for all ${\displaystyle x,y\in V}$ holds ${\displaystyle \alpha *(x+y)=\alpha *x+\alpha *y}$ (distributivity wrt. vector addition)
10. For all ${\displaystyle \alpha ,\beta \in R}$ and for all ${\displaystyle x\in V}$ holds ${\displaystyle (\alpha +\beta )*x=\alpha *x+\beta *x}$ (distributivity wrt. scalar addition)

Note that we used the same symbols ${\displaystyle +}$ and ${\displaystyle *}$ for different operations in ${\displaystyle R}$ and ${\displaystyle V}$. The elements of ${\displaystyle V}$ are also called vectors.

Examples:

1. The set ${\displaystyle IR^{p}}$ of real-valued vectors ${\displaystyle (x_{1},...,x_{p})}$ with elementwise addition ${\displaystyle x+y=(x_{1}+y_{1},...,x_{p}+y_{p})}$ and elementwise scalar multiplication ${\displaystyle \alpha *x=(\alpha x_{1},...,\alpha x_{p})}$ is a vector space over ${\displaystyle IR}$.
2. The set of polynomials of degree ${\displaystyle p}$, ${\displaystyle P(x)=b_{0}+b_{1}x+b_{2}x^{2}+...+b_{p}x^{p}}$, with usual addition and multiplication is a vector space over ${\displaystyle IR}$.

Linear combinations

A vector ${\displaystyle x}$ can be written as a linear combination of vectors ${\displaystyle x_{1},...,x_{n}}$, if

${\displaystyle x=\sum _{i=1}^{n}\alpha _{i}x_{i}}$

with ${\displaystyle \alpha _{i}\in R}$.

Examples:

• ${\displaystyle (1,2,3)}$ is a linear combination of ${\displaystyle (1,0,0),\,(0,1,0),\,(0,0,1)}$ since ${\displaystyle (1,2,3)=1*(1,0,0)+2*(0,1,0)+3*(0,0,1)}$
• ${\displaystyle 1+2*x+3*x^{2}}$ is a linear combination of ${\displaystyle 1+x+x^{2},\,x+x^{2},\,x^{2}}$ since ${\displaystyle 1+2*x+3*x^{2}=1*(1+x+x^{2})+1*(x+x^{2})+1*(x^{2})}$
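The first example can be checked numerically. A minimal, stdlib-only sketch in Python (the helper name `linear_combination` is our own, not from the text):

```python
def linear_combination(coeffs, vectors):
    """Compute sum_i alpha_i * x_i elementwise for tuples of equal length."""
    p = len(vectors[0])
    return tuple(sum(a * v[k] for a, v in zip(coeffs, vectors)) for k in range(p))

e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(linear_combination([1, 2, 3], [e1, e2, e3]))  # (1, 2, 3)
```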

Basis of a vector space

A set of vectors ${\displaystyle x_{1},...,x_{n}}$ is called a basis of the vector space ${\displaystyle V}$, if

1. for each vector ${\displaystyle x\in V}$ there exist scalars ${\displaystyle \alpha _{1},...,\alpha _{n}\in R}$ such that ${\displaystyle x=\sum _{i}\alpha _{i}x_{i}}$
2. there is no proper subset of ${\displaystyle \{x_{1},...,x_{n}\}}$ such that 1. is fulfilled.

Note that a vector space can have several bases.

Examples:

• Each vector ${\displaystyle (\alpha _{1},\alpha _{2},\alpha _{3})\in IR^{3}}$ can be written as ${\displaystyle \alpha _{1}*(1,0,0)+\alpha _{2}*(0,1,0)+\alpha _{3}*(0,0,1)}$. Therefore ${\displaystyle \{(1,0,0),(0,1,0),(0,0,1)\}}$ is a basis of ${\displaystyle IR^{3}}$.
• Each polynomial of degree ${\displaystyle p}$ can be written as a linear combination of ${\displaystyle \{1,x,x^{2},...,x^{p}\}}$, which therefore forms a basis for this vector space.

Actually, for both examples we would have to prove condition 2., but it is clear that it holds.

Dimension of a vector space

The dimension of a vector space is the number of vectors necessary for a basis. A vector space has infinitely many bases, but the dimension is uniquely determined. Note that a vector space may have infinite dimension, e.g. consider the space of continuous functions.

Examples:

• The dimension of ${\displaystyle IR^{3}}$ is three, the dimension of ${\displaystyle IR^{p}}$ is ${\displaystyle p}$ .
• The dimension of the polynomials of degree ${\displaystyle p}$ is ${\displaystyle p+1}$.

Scalar products

A mapping ${\displaystyle <.,.>:V\times V\rightarrow R}$ is called a scalar product if the following holds for all ${\displaystyle x,x_{1},x_{2},y,y_{1},y_{2}\in V}$ and ${\displaystyle \alpha _{1},\alpha _{2}\in R}$ :

1. ${\displaystyle <\alpha _{1}x_{1}+\alpha _{2}x_{2},y>=\alpha _{1}<x_{1},y>+\alpha _{2}<x_{2},y>}$
2. ${\displaystyle <x,\alpha _{1}y_{1}+\alpha _{2}y_{2}>=\alpha _{1}<x,y_{1}>+\alpha _{2}<x,y_{2}>}$
3. ${\displaystyle <x,y>={\overline {<y,x>}}}$ with ${\displaystyle {\overline {\alpha +\imath \beta }}=\alpha -\imath \beta }$
4. ${\displaystyle <x,x>\geq 0}$ with ${\displaystyle <x,x>=0\Leftrightarrow x=\mathbb {O} }$

Examples:

• The typical scalar product in ${\displaystyle IR^{p}}$ is ${\displaystyle <x,y>=\sum _{i}x_{i}y_{i}}$.
• ${\displaystyle <f,g>=\int _{a}^{b}f(x)*g(x)dx}$ is a scalar product on the vector space of polynomials of degree ${\displaystyle p}$.

Norm

A norm of a vector is a mapping ${\displaystyle \|.\|:V\rightarrow R}$ such that the following holds:

1. ${\displaystyle \|x\|\geq 0}$ for all ${\displaystyle x\in V}$ and ${\displaystyle \|x\|=0\Leftrightarrow x=\mathbb {O} }$ (positive definiteness)
2. ${\displaystyle \|\alpha x\|=\mid \alpha \mid \|x\|}$ for all ${\displaystyle x\in V}$ and all ${\displaystyle \alpha \in R}$
3. ${\displaystyle \|x+y\|\leq \|x\|+\|y\|}$ for all ${\displaystyle x,y\in V}$ (triangle inequality)

Examples:

• The ${\displaystyle L_{q}}$ norm of a vector in ${\displaystyle IR^{p}}$ is defined as ${\displaystyle \|x\|_{q}={\sqrt[{q}]{\sum _{i=1}^{p}\mid x_{i}\mid ^{q}}}}$.
• Each scalar product generates a norm by ${\displaystyle \|x\|={\sqrt {<x,x>}}}$; therefore ${\displaystyle \|f\|={\sqrt {\int _{a}^{b}f^{2}(x)dx}}}$ is a norm for the polynomials of degree ${\displaystyle p}$.
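The ${\displaystyle L_{q}}$ norm from the first example can be written in a few lines of Python (a sketch; `lq_norm` is our own helper name):

```python
def lq_norm(x, q):
    """L_q norm of a vector x in IR^p: (sum |x_i|^q)^(1/q)."""
    return sum(abs(xi) ** q for xi in x) ** (1.0 / q)

x = (3.0, 4.0)
print(lq_norm(x, 2))  # 5.0, the Euclidean (L_2) norm

# triangle inequality: ||x + y|| <= ||x|| + ||y||
y = (1.0, -2.0)
s = tuple(a + b for a, b in zip(x, y))
print(lq_norm(s, 2) <= lq_norm(x, 2) + lq_norm(y, 2))  # True
```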

Orthogonality

Two vectors ${\displaystyle x}$ and ${\displaystyle y}$ are orthogonal to each other if ${\displaystyle <x,y>=0}$. In ${\displaystyle IR^{p}}$ the cosine of the angle between two vectors can be expressed as

${\displaystyle \cos(\angle (x,y))={\frac {<x,y>}{\|x\|\|y\|}}}$.

If the angle between ${\displaystyle x}$ and ${\displaystyle y}$ is ninety degrees (orthogonal), then the cosine is zero and it follows that ${\displaystyle <x,y>=0}$.

A set of vectors ${\displaystyle x_{1},...,x_{p}}$ is called orthonormal, if

${\displaystyle <x_{i},x_{j}>={\begin{cases}0&{\mbox{ if }}i\neq j\\1&{\mbox{ if }}i=j\end{cases}}}$.

If we consider a basis ${\displaystyle e_{1},...,e_{p}}$ of a vector space, then we would like to have an orthonormal basis. Why?

Since we have a basis, each vector ${\displaystyle x}$ and ${\displaystyle y}$ can be expressed by ${\displaystyle x=\alpha _{1}e_{1}+...+\alpha _{p}e_{p}}$ and ${\displaystyle y=\beta _{1}e_{1}+...+\beta _{p}e_{p}}$. Therefore the scalar product of ${\displaystyle x}$ and ${\displaystyle y}$ reduces to

${\displaystyle <x,y>=<\alpha _{1}e_{1}+...+\alpha _{p}e_{p},\beta _{1}e_{1}+...+\beta _{p}e_{p}>=\sum _{i=1}^{p}\sum _{j=1}^{p}\alpha _{i}\beta _{j}<e_{i},e_{j}>=\sum _{i=1}^{p}\alpha _{i}\beta _{i}=\alpha _{1}\beta _{1}+...+\alpha _{p}\beta _{p}.}$

Consequently, the computation of a scalar product is reduced to simple multiplications and additions if the coefficients are known. Remember that for our polynomials we would otherwise have to solve an integral!
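This reduction can be illustrated numerically: expand two vectors in an orthonormal basis of ${\displaystyle IR^{2}}$ (here the standard basis rotated by 30 degrees, our own choice) and compare the coefficient sum with the directly computed scalar product:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# an orthonormal basis of IR^2: the standard basis rotated by 30 degrees
t = math.pi / 6
e1 = (math.cos(t), math.sin(t))
e2 = (-math.sin(t), math.cos(t))

x, y = (1.0, 2.0), (3.0, -1.0)
# coefficients of x and y with respect to the orthonormal basis
alpha = [dot(x, e) for e in (e1, e2)]
beta = [dot(y, e) for e in (e1, e2)]

# <x,y> equals alpha_1*beta_1 + alpha_2*beta_2
print(abs(dot(x, y) - sum(a * b for a, b in zip(alpha, beta))) < 1e-12)  # True
```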

Gram-Schmidt orthogonalization

Algorithm

The aim of the Gram-Schmidt orthogonalization is to find, for a set of vectors ${\displaystyle x_{1},...,x_{p}}$, an equivalent set of orthonormal vectors ${\displaystyle o_{1},...,o_{p}}$ such that any vector that can be expressed as a linear combination of ${\displaystyle x_{1},...,x_{p}}$ can also be expressed as a linear combination of ${\displaystyle o_{1},...,o_{p}}$:

1. Set ${\displaystyle b_{1}=x_{1}}$ and ${\displaystyle o_{1}=b_{1}/\|b_{1}\|}$

2. For each ${\displaystyle i>1}$ set ${\displaystyle b_{i}=x_{i}-\sum _{j=1}^{i-1}{\frac {<x_{i},b_{j}>}{<b_{j},b_{j}>}}b_{j}}$ and ${\displaystyle o_{i}=b_{i}/\|b_{i}\|}$. In each step the vector ${\displaystyle x_{i}}$ is projected on ${\displaystyle b_{j}}$ and the result is subtracted from ${\displaystyle x_{i}}$.
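The two steps can be sketched in Python (stdlib only; the function name is ours). Projecting on the normalized ${\displaystyle o_{j}}$ instead of ${\displaystyle b_{j}}$ is equivalent, since ${\displaystyle {\frac {<x_{i},b_{j}>}{<b_{j},b_{j}>}}b_{j}=<x_{i},o_{j}>o_{j}}$:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: orthonormalize linearly independent vectors."""
    ortho = []
    for x in vectors:
        # subtract the projections of x on all previous directions
        b = list(x)
        for o in ortho:
            c = dot(x, o)
            b = [bi - c * oi for bi, oi in zip(b, o)]
        n = math.sqrt(dot(b, b))
        ortho.append([bi / n for bi in b])
    return ortho

o1, o2 = gram_schmidt([(1.0, 1.0, 0.0), (1.0, 0.0, 1.0)])
print(abs(dot(o1, o2)) < 1e-12)  # True: the result is orthogonal
```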

Example

Consider the polynomials of degree two on the interval ${\displaystyle [-1,1]}$ with the scalar product ${\displaystyle <f,g>=\int _{-1}^{1}f(x)g(x)dx}$ and the norm ${\displaystyle \|f\|={\sqrt {<f,f>}}}$. We know that ${\displaystyle f_{1}(x)=1,f_{2}(x)=x}$ and ${\displaystyle f_{3}(x)=x^{2}}$ form a basis of this vector space. Let us now construct an orthonormal basis:

Step 1a: ${\displaystyle b_{1}(x)=f_{1}(x)=1}$

Step 1b: ${\displaystyle o_{1}(x)={\frac {b_{1}(x)}{\|b_{1}(x)\|}}={\frac {1}{\sqrt {<b_{1},b_{1}>}}}={\frac {1}{\sqrt {\int _{-1}^{1}1dx}}}={\frac {1}{\sqrt {2}}}}$

Step 2a: ${\displaystyle b_{2}(x)=f_{2}(x)-{\frac {<f_{2},b_{1}>}{<b_{1},b_{1}>}}b_{1}(x)=x-{\frac {\int _{-1}^{1}x\ 1dx}{2}}1=x-{\frac {0}{2}}1=x}$

Step 2b: ${\displaystyle o_{2}(x)={\frac {b_{2}(x)}{\|b_{2}(x)\|}}={\frac {x}{\sqrt {<b_{2},b_{2}>}}}={\frac {x}{\sqrt {\int _{-1}^{1}x^{2}dx}}}={\frac {x}{\sqrt {2/3}}}=x{\sqrt {3/2}}}$

Step 3a: ${\displaystyle b_{3}(x)=f_{3}(x)-{\frac {<f_{3},b_{1}>}{<b_{1},b_{1}>}}b_{1}(x)-{\frac {<f_{3},b_{2}>}{<b_{2},b_{2}>}}b_{2}(x)=x^{2}-{\frac {\int _{-1}^{1}x^{2}1\ dx}{2}}1-{\frac {\int _{-1}^{1}x^{2}x\ dx}{2/3}}x=x^{2}-{\frac {2/3}{2}}1-{\frac {0}{2/3}}x=x^{2}-1/3}$

Step 3b: ${\displaystyle o_{3}(x)={\frac {b_{3}(x)}{\|b_{3}(x)\|}}={\frac {x^{2}-1/3}{\sqrt {<b_{3},b_{3}>}}}={\frac {x^{2}-1/3}{\sqrt {\int _{-1}^{1}(x^{2}-1/3)^{2}dx}}}={\frac {x^{2}-1/3}{\sqrt {\int _{-1}^{1}x^{4}-2/3x^{2}+1/9\ dx}}}={\frac {x^{2}-1/3}{\sqrt {8/45}}}={\sqrt {\frac {5}{8}}}(3x^{2}-1)}$

It can be proven that ${\displaystyle 1/{\sqrt {2}},x{\sqrt {3/2}}}$ and ${\displaystyle {\sqrt {\frac {5}{8}}}(3x^{2}-1)}$ form an orthonormal basis with the above scalar product and norm.
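The orthonormality claim can also be checked numerically, approximating the integrals with a simple midpoint rule (a sketch; the function name `inner` is ours):

```python
import math

def inner(f, g, n=20000):
    """Approximate <f,g> = integral over [-1,1] of f(x)g(x) dx (midpoint rule)."""
    h = 2.0 / n
    return h * sum(f(-1 + (k + 0.5) * h) * g(-1 + (k + 0.5) * h) for k in range(n))

o1 = lambda x: 1 / math.sqrt(2)
o2 = lambda x: x * math.sqrt(3 / 2)
o3 = lambda x: math.sqrt(5 / 8) * (3 * x * x - 1)

# each basis function has norm 1, and all pairs are orthogonal
print(all(abs(inner(f, f) - 1) < 1e-6 for f in (o1, o2, o3)))  # True
print(all(abs(inner(f, g)) < 1e-6
          for f, g in [(o1, o2), (o1, o3), (o2, o3)]))  # True
```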

Numerical instability

Consider the vectors ${\displaystyle x_{1}=(1,\epsilon ,0,0),x_{2}=(1,0,\epsilon ,0)}$ and ${\displaystyle x_{3}=(1,0,0,\epsilon )}$. Assume that ${\displaystyle \epsilon }$ is so small that ${\displaystyle 1+\epsilon ^{2}=1}$ holds on a computer (see http://en.wikipedia.org/wiki/Machine_epsilon). Let us compute an orthonormal basis for these vectors in ${\displaystyle IR^{4}}$ with the standard scalar product ${\displaystyle <x,y>=x_{1}y_{1}+x_{2}y_{2}+x_{3}y_{3}+x_{4}y_{4}}$ and the norm ${\displaystyle \|x\|={\sqrt {x_{1}^{2}+x_{2}^{2}+x_{3}^{2}+x_{4}^{2}}}}$.

Step 1a. ${\displaystyle b_{1}=x_{1}=(1,\epsilon ,0,0)}$

Step 1b. ${\displaystyle o_{1}={\frac {b_{1}}{\|b_{1}\|}}={\frac {b_{1}}{\sqrt {1+\epsilon ^{2}}}}=b_{1}}$ with ${\displaystyle 1+\epsilon ^{2}=1}$

Step 2a. ${\displaystyle b_{2}=x_{2}-{\frac {<x_{2},b_{1}>}{<b_{1},b_{1}>}}b_{1}=(1,0,\epsilon ,0)-{\frac {1}{1+\epsilon ^{2}}}(1,\epsilon ,0,0)=(0,-\epsilon ,\epsilon ,0)}$

Step 2b. ${\displaystyle o_{2}={\frac {b_{2}}{\|b_{2}\|}}={\frac {b_{2}}{\sqrt {2\epsilon ^{2}}}}=(0,-{\frac {1}{\sqrt {2}}},{\frac {1}{\sqrt {2}}},0)}$

Step 3a. ${\displaystyle b_{3}=x_{3}-{\frac {<x_{3},b_{1}>}{<b_{1},b_{1}>}}b_{1}-{\frac {<x_{3},b_{2}>}{<b_{2},b_{2}>}}b_{2}=(1,0,0,\epsilon )-{\frac {1}{1+\epsilon ^{2}}}(1,\epsilon ,0,0)-{\frac {0}{2\epsilon ^{2}}}(0,-\epsilon ,\epsilon ,0)=(0,-\epsilon ,0,\epsilon )}$

Step 3b. ${\displaystyle o_{3}={\frac {b_{3}}{\|b_{3}\|}}={\frac {b_{3}}{\sqrt {2\epsilon ^{2}}}}=(0,-{\frac {1}{\sqrt {2}}},0,{\frac {1}{\sqrt {2}}})}$

It is obvious that for the vectors

- ${\displaystyle o_{1}=(1,\epsilon ,0,0)\ }$

- ${\displaystyle o_{2}=(0,-{\frac {1}{\sqrt {2}}},{\frac {1}{\sqrt {2}}},0)}$

- ${\displaystyle o_{3}=(0,-{\frac {1}{\sqrt {2}}},0,{\frac {1}{\sqrt {2}}})}$

the scalar product is ${\displaystyle <o_{2},o_{3}>=1/2\neq 0}$. All other pairs are also not zero, but their scalar products are multiplied by ${\displaystyle \epsilon }$, so we get results near zero.
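This breakdown can be reproduced in double-precision floating point, where ${\displaystyle \epsilon =10^{-10}}$ already satisfies ${\displaystyle 1+\epsilon ^{2}=1}$ (a self-contained sketch of the classical algorithm; function names are ours):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def classical_gram_schmidt(vectors):
    """Classical Gram-Schmidt: project each x_i on the previous o_j, normalize."""
    ortho = []
    for x in vectors:
        b = list(x)
        for o in ortho:
            c = dot(x, o)
            b = [bi - c * oi for bi, oi in zip(b, o)]
        n = math.sqrt(dot(b, b))
        ortho.append([bi / n for bi in b])
    return ortho

eps = 1e-10
print(1 + eps**2 == 1)  # True: eps**2 is below machine epsilon
o1, o2, o3 = classical_gram_schmidt([(1, eps, 0, 0), (1, 0, eps, 0), (1, 0, 0, eps)])
# o2 and o3 are far from orthogonal: <o2,o3> is about 1/2, not 0
print(round(dot(o2, o3), 6))  # 0.5
```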

Modified Gram-Schmidt

To solve this problem, a modified Gram-Schmidt algorithm is used:

1. Set ${\displaystyle b_{i}=x_{i}}$ for all ${\displaystyle i}$
2. for each ${\displaystyle i}$ from ${\displaystyle 1}$ to ${\displaystyle n}$ compute
1. ${\displaystyle o_{i}={\frac {b_{i}}{\|b_{i}\|}}}$
2. for each ${\displaystyle j}$ from ${\displaystyle i+1}$ to ${\displaystyle n}$ compute ${\displaystyle b_{j}=b_{j}-<b_{j},o_{i}>o_{i}\ }$

The difference is that we first compute the new ${\displaystyle o_{i}}$ and immediately subtract its contribution from all remaining ${\displaystyle b_{j}}$. The (inexactly computed) vector thus enters all subsequent vectors consistently, instead of each ${\displaystyle b_{i}}$ being computed separately from the original ${\displaystyle x_{i}}$.

Example (recomputed)

Step 1. ${\displaystyle b_{1}=(1,\epsilon ,0,0)}$, ${\displaystyle b_{2}=(1,0,\epsilon ,0)}$, ${\displaystyle b_{3}=(1,0,0,\epsilon )}$

Step 2a. ${\displaystyle o_{1}={\frac {b_{1}}{\|b_{1}\|}}={\frac {b_{1}}{\sqrt {1+\epsilon ^{2}}}}=b_{1}=(1,\epsilon ,0,0)}$ with ${\displaystyle 1+\epsilon ^{2}=1}$

Step 2b. ${\displaystyle b_{2}=b_{2}-<b_{2},o_{1}>o_{1}=(1,0,\epsilon ,0)-(1,\epsilon ,0,0)=(0,-\epsilon ,\epsilon ,0)\ }$

Step 2c. ${\displaystyle b_{3}=b_{3}-<b_{3},o_{1}>o_{1}=(1,0,0,\epsilon )-(1,\epsilon ,0,0)=(0,-\epsilon ,0,\epsilon )\ }$

Step 3a. ${\displaystyle o_{2}={\frac {b_{2}}{\|b_{2}\|}}={\frac {b_{2}}{\sqrt {2\epsilon ^{2}}}}=(0,-{\frac {1}{\sqrt {2}}},{\frac {1}{\sqrt {2}}},0)}$

Step 3b. ${\displaystyle b_{3}=b_{3}-<b_{3},o_{2}>o_{2}=(0,-\epsilon ,0,\epsilon )-{\frac {\epsilon }{\sqrt {2}}}(0,-{\frac {1}{\sqrt {2}}},{\frac {1}{\sqrt {2}}},0)=(0,-\epsilon /2,-\epsilon /2,\epsilon )}$

Step 4a. ${\displaystyle o_{3}={\frac {b_{3}}{\|b_{3}\|}}={\frac {b_{3}}{\sqrt {3/2\epsilon ^{2}}}}=(0,-{\frac {1}{\sqrt {6}}},-{\frac {1}{\sqrt {6}}},{\frac {2}{\sqrt {6}}})}$

We can easily verify that ${\displaystyle <o_{2},o_{3}>=0}$.
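The modified algorithm can be implemented and checked against the recomputed example (a stdlib-only sketch; function names are ours):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def modified_gram_schmidt(vectors):
    """Modified Gram-Schmidt: normalize b_i, then update all remaining b_j."""
    b = [list(v) for v in vectors]
    ortho = []
    for i in range(len(b)):
        n = math.sqrt(dot(b[i], b[i]))
        o = [c / n for c in b[i]]
        ortho.append(o)
        for j in range(i + 1, len(b)):
            c = dot(b[j], o)  # <b_j, o_i>
            b[j] = [bj - c * oc for bj, oc in zip(b[j], o)]
    return ortho

eps = 1e-10  # small enough that 1 + eps**2 == 1 in double precision
o1, o2, o3 = modified_gram_schmidt([(1, eps, 0, 0), (1, 0, eps, 0), (1, 0, 0, eps)])
print(abs(dot(o2, o3)) < 1e-6)  # True: orthogonality is recovered
```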

Application

Exploratory Projection Pursuit

In the analysis of high-dimensional data we usually analyze projections of the data. This approach rests on the theorem of Cramer-Wold, which states that a multidimensional distribution is determined if all of its one-dimensional projections are known. Another theorem states that most (one-dimensional) projections of multivariate data look normal, even if the multivariate distribution of the data is highly non-normal.

Therefore, in Exploratory Projection Pursuit we judge the interestingness of a projection by comparing it with a (standard) normal distribution. If we assume that the one-dimensional data ${\displaystyle x}$ are standard normally distributed, then after the transformation ${\displaystyle z=2\Phi (x)-1}$, with ${\displaystyle \Phi (x)}$ the cumulative distribution function of the standard normal distribution, ${\displaystyle z}$ is uniformly distributed in the interval ${\displaystyle [-1;1]}$.

Thus the interestingness can be measured by ${\displaystyle \int _{-1}^{1}(f(z)-1/2)^{2}dz}$ with ${\displaystyle f(z)}$ a density estimated from the data. If the density ${\displaystyle f(z)}$ is equal to ${\displaystyle 1/2}$ in the interval ${\displaystyle [-1;1]}$, then the integral becomes zero and we have found that our projected data are normally distributed. A value larger than zero indicates a deviation from the normal distribution of the projected data and, hopefully, an interesting distribution.

Expansion with orthonormal polynomials

Let ${\displaystyle L_{i}(z)}$ be a set of orthonormal polynomials with respect to the scalar product ${\displaystyle <f,g>=\int _{-1}^{1}f(z)g(z)dz}$ and the norm ${\displaystyle \|f\|={\sqrt {<f,f>}}}$. What can we derive about a density ${\displaystyle f(z)}$ on the interval ${\displaystyle [-1;1]}$?

If ${\displaystyle f(z)=\sum _{i=0}^{I}a_{i}L_{i}(z)}$ for some maximal degree ${\displaystyle I}$ then it holds

${\displaystyle \int _{-1}^{1}f(z)L_{j}(z)dz=\int _{-1}^{1}\sum _{i=0}^{I}a_{i}L_{i}(z)L_{j}(z)dz=a_{j}\int _{-1}^{1}L_{j}(z)L_{j}(z)dz=a_{j}}$

We can also write ${\displaystyle \int _{-1}^{1}f(z)L_{j}(z)dz=E(L_{j}(z))}$, so empirically we get the estimator ${\displaystyle {\hat {a}}_{j}={\frac {1}{n}}\sum _{k=1}^{n}L_{j}(z_{k})}$.

We expand the constant term as ${\displaystyle 1/2=\sum _{i=0}^{I}b_{i}L_{i}(z)}$ and get for our integral

${\displaystyle \int _{-1}^{1}(f(z)-1/2)^{2}dz=\int _{-1}^{1}\left(\sum _{i=0}^{I}(a_{i}-b_{i})L_{i}(z)\right)^{2}dz=\sum _{i,j=0}^{I}\int _{-1}^{1}(a_{i}-b_{i})(a_{j}-b_{j})L_{i}(z)L_{j}(z)dz=\sum _{i=0}^{I}(a_{i}-b_{i})^{2}.}$

So using an orthonormal function set allows us to reduce the integral to a sum of squared coefficient differences, which can be estimated from the data by plugging the ${\displaystyle {\hat {a}}_{j}}$ into the formula above. The coefficients ${\displaystyle b_{i}}$ can be precomputed in advance.
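The estimation step can be sketched with the three orthonormal polynomials from the first example. We use `random` data as a stand-in for projected and transformed observations (all names are ours); for uniform data on ${\displaystyle [-1,1]}$ the index should be close to zero:

```python
import math
import random

# orthonormal Legendre polynomials up to degree 2 (from the earlier example)
L = [lambda z: 1 / math.sqrt(2),
     lambda z: z * math.sqrt(3 / 2),
     lambda z: math.sqrt(5 / 8) * (3 * z * z - 1)]

# coefficients of the constant 1/2: 1/2 = b_0 * L_0(z), so b_0 = 1/sqrt(2)
b = [1 / math.sqrt(2), 0.0, 0.0]

random.seed(0)
z = [random.uniform(-1, 1) for _ in range(10000)]  # "uninteresting" data

# hat{a}_j = (1/n) sum_k L_j(z_k), then index = sum_j (hat{a}_j - b_j)^2
a_hat = [sum(Lj(zk) for zk in z) / len(z) for Lj in L]
index = sum((aj - bj) ** 2 for aj, bj in zip(a_hat, b))
print(index < 0.01)  # True: nearly uniform, hence a "normal looking" projection
```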

Normalized Legendre polynomials

The only problem left is to find the set of orthonormal polynomials ${\displaystyle L_{i}(z)}$ up to degree ${\displaystyle I}$. We know that ${\displaystyle 1,x,x^{2},...,x^{I}}$ form a basis for this space, and we have to apply the Gram-Schmidt orthogonalization to find the orthonormal polynomials. This was started in the first example.

The resulting polynomials are called normalized Legendre polynomials. Up to a scaling factor the normalized Legendre polynomials are identical to Legendre polynomials. The Legendre polynomials have a recursive expression of the form

${\displaystyle L_{i}(z)={\frac {(2i-1)zL_{i-1}(z)-(i-1)L_{i-2}(z)}{i}}}$ with ${\displaystyle L_{0}(z)=1}$ and ${\displaystyle L_{1}(z)=z}$.

So computing our integral reduces to computing ${\displaystyle L_{0}(z_{k})}$ and ${\displaystyle L_{1}(z_{k})}$ and using the recursive relationship to compute the ${\displaystyle {\hat {a}}_{j}}$'s. Please note that the recursion can be numerically unstable!
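The recursion, with the starting values ${\displaystyle L_{0}(z)=1}$ and ${\displaystyle L_{1}(z)=z}$, can be sketched as follows. The normalization factor ${\displaystyle {\sqrt {(2i+1)/2}}}$, a standard fact we add here, turns the Legendre polynomials into the orthonormal ones; for degree two it reproduces ${\displaystyle {\sqrt {5/8}}(3z^{2}-1)}$ from the first example:

```python
import math

def legendre(z, degree):
    """Evaluate the Legendre polynomials P_0..P_degree at z via the recursion."""
    p = [1.0, float(z)]  # P_0(z) = 1, P_1(z) = z
    for i in range(2, degree + 1):
        p.append(((2 * i - 1) * z * p[i - 1] - (i - 1) * p[i - 2]) / i)
    return p[:degree + 1]

def orthonormal_legendre(z, degree):
    # the factor sqrt((2i+1)/2) makes the integral of L_i^2 over [-1,1] equal 1
    return [math.sqrt((2 * i + 1) / 2) * v
            for i, v in enumerate(legendre(z, degree))]

# degree 2 matches sqrt(5/8) * (3z^2 - 1) from the Gram-Schmidt example
z = 0.3
print(abs(orthonormal_legendre(z, 2)[2]
          - math.sqrt(5 / 8) * (3 * z * z - 1)) < 1e-12)  # True
```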
