# Parallel Spectral Numerical Methods/Finite Precision Arithmetic

For more on this, see a textbook on numerical methods such as Bradie. Because computers have a finite amount of memory, floating-point numbers can only be stored with a finite number of digits of precision. This limits the accuracy with which the solution to a numerical problem can be obtained in finite time. Most computers use binary IEEE 754 arithmetic to perform numerical calculations. Other formats exist, but IEEE 754 is the one most relevant to us.
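As a small illustration of finite precision, the sketch below estimates the machine epsilon (the spacing between 1 and the next larger representable number) in double and single precision by repeated halving. It assumes that a Python `float` is an IEEE 754 double, which holds on essentially all current platforms, and it emulates single precision by rounding through the `struct` module's `"f"` format. The helper names `to_float32` and `machine_epsilon` are illustrative only.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def machine_epsilon(round_fn=lambda v: v) -> float:
    """Halve eps until 1 + eps/2 rounds back to 1 under round_fn.
    The final eps is the spacing of floating-point numbers just above 1."""
    eps = 1.0
    while round_fn(1.0 + eps / 2) != 1.0:
        eps /= 2
    return eps

eps_double = machine_epsilon()            # native double-precision rounding
eps_single = machine_epsilon(to_float32)  # emulated single-precision rounding

print(eps_double)        # 2.220446049250313e-16, i.e. 2**-52
print(eps_single)        # 1.1920928955078125e-07, i.e. 2**-23
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False
```

Halving by exact powers of two keeps every intermediate `1 + eps/2` exactly representable in double precision, so the only rounding that matters is the one performed by `round_fn`.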

## Exercises

1)
a) In this standard, what are the range and precision of numbers in:
i) Single precision
ii) Double precision
b) What does the standard specify for quadruple precision?
c) What does the standard specify about how elementary functions should be computed? How does this affect the portability of programs?
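One way to explore exercise 1 a) is to build numbers directly from their IEEE 754 bit patterns. The sketch below, under the same assumption that Python floats are IEEE 754 doubles, reinterprets 32-bit and 64-bit patterns with `struct` to obtain the largest finite value, the smallest positive normal value and the smallest positive subnormal value of each format. The helpers `f32_from_bits` and `f64_from_bits` are hypothetical names used only here.

```python
import struct

def f32_from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision number."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def f64_from_bits(bits: int) -> float:
    """Interpret a 64-bit pattern as an IEEE 754 double-precision number."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

# Largest finite value: exponent field all ones except its last bit, fraction all ones.
single_max = f32_from_bits(0x7F7FFFFF)
double_max = f64_from_bits(0x7FEFFFFFFFFFFFFF)

# Smallest positive normal value: exponent field equal to 1, fraction zero.
single_min_normal = f32_from_bits(0x00800000)
double_min_normal = f64_from_bits(0x0010000000000000)

# Smallest positive subnormal value: only the lowest fraction bit set.
single_min_subnormal = f32_from_bits(0x00000001)
double_min_subnormal = f64_from_bits(0x0000000000000001)

for name, value in [
    ("single max", single_max),
    ("single min normal", single_min_normal),
    ("single min subnormal", single_min_subnormal),
    ("double max", double_max),
    ("double min normal", double_min_normal),
    ("double min subnormal", double_min_subnormal),
]:
    print(f"{name:>20}: {value!r}")
```

Reading the values off bit patterns, rather than hard-coding decimal constants, makes the link between the exponent and fraction fields and the resulting range explicit.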
2) Suppose we discretize a function for $x\in [-1,1]$. For what values of $\epsilon$ does the computed value of
$\epsilon \log \left(\cosh \left({\frac {x}{\epsilon }}\right)\right)$ equal $\lvert x\rvert$ in:

i) Single precision?
ii) Double precision?
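A sketch for experimenting with exercise 2 in double precision follows. Direct evaluation overflows in `math.cosh` once $\lvert x/\epsilon\rvert$ exceeds roughly 710, so it is compared with the algebraically equivalent form $\log\cosh y = \lvert y\rvert - \log 2 + \log\left(1+e^{-2\lvert y\rvert}\right)$. The function names, the 201-point grid and the chosen $\epsilon$ values are assumptions for illustration; a single-precision study would additionally need every intermediate result rounded to single precision.

```python
import math

def logcosh_naive(x: float, eps: float) -> float:
    """Direct evaluation of eps*log(cosh(x/eps)).
    math.cosh raises OverflowError once |x/eps| exceeds roughly 710 in double precision."""
    return eps * math.log(math.cosh(x / eps))

def logcosh_stable(x: float, eps: float) -> float:
    """Same quantity via log(cosh(y)) = |y| - log(2) + log1p(exp(-2|y|)),
    which cannot overflow because exp is only called with a non-positive argument."""
    y = abs(x / eps)
    return eps * (y - math.log(2.0) + math.log1p(math.exp(-2.0 * y)))

def max_abs_error(f, eps: float, n: int = 201) -> float:
    """Largest |f(x, eps) - |x|| over n equally spaced points on [-1, 1]."""
    xs = [-1.0 + 2.0 * k / (n - 1) for k in range(n)]
    return max(abs(f(x, eps) - abs(x)) for x in xs)

for eps in [1e-1, 1e-2, 1e-3]:
    try:
        naive = max_abs_error(logcosh_naive, eps)
    except OverflowError:
        naive = float("nan")  # cosh(x/eps) overflowed somewhere on the grid
    stable = max_abs_error(logcosh_stable, eps)
    print(f"eps={eps:g}: naive error {naive:.3e}, stable error {stable:.3e}")
```

Since the exact difference is $\epsilon\left(\log\left(1+e^{-2\lvert x\rvert/\epsilon}\right)-\log 2\right)$, the error at $x=\pm 1$ is close to $\epsilon\log 2$ for small $\epsilon$; equality can only hold once this falls below the rounding error of $\lvert x\rvert$ itself.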
3) Suppose we discretize a function for $x\in [-1,1]$. For what values of $\epsilon$ does the computed value of
$\tanh \left({\frac {x}{\epsilon }}\right)$ equal ${\begin{cases}1, & x\geq 0\\ -1, & x<0\end{cases}}$ in:

i) Single precision?
ii) Double precision?
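For exercise 3, a minimal sketch can list the grid points at which $\tanh(x/\epsilon)$ differs from the step function. Single precision is only approximated here, by rounding the double-precision result of `math.tanh` with `struct`; a true single-precision evaluation would also round the argument and use a single-precision `tanh`. The helper names and the 201-point grid are assumptions. Because $\tanh(0)=0$, the point $x=0$ never matches the step function as written, so it always appears in the list.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def step(x: float) -> float:
    """Right-hand side of exercise 3: 1 for x >= 0 and -1 for x < 0."""
    return 1.0 if x >= 0.0 else -1.0

def mismatches(eps: float, n: int = 201, single: bool = False) -> list:
    """Grid points on [-1, 1] where tanh(x/eps) differs from step(x).
    With single=True the tanh value is rounded to single precision afterwards."""
    xs = [-1.0 + 2.0 * k / (n - 1) for k in range(n)]
    bad = []
    for x in xs:
        t = math.tanh(x / eps)
        if single:
            t = to_float32(t)
        if t != step(x):
            bad.append(x)
    return bad

for eps in [1e-2, 1e-3, 1e-4]:
    print(f"eps={eps:g}: double {len(mismatches(eps))} mismatches, "
          f"single {len(mismatches(eps, single=True))} mismatches")
```

Because $1-\tanh y \approx 2e^{-2y}$ for large $y$, single precision reaches $\pm 1$ at a noticeably smaller $\lvert x/\epsilon\rvert$ than double precision, which is what the two mismatch counts compare.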
4)
a) What is the largest value that can be stored in a 4-byte (32-bit) signed integer?
b) Suppose you are doing a simulation with $N^{3}$ grid points and need to calculate $N^{3}$. If $N$ is stored as a 4-byte integer, what is the largest value of $N$ for which $N^{3}$ can also be stored as a 4-byte integer?
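The sketch below illustrates exercise 4 by emulating 4-byte two's-complement integers in Python, whose own integers never overflow. It assumes that overflow wraps modulo $2^{32}$; that is what commonly happens on current hardware, but in C, for example, signed overflow is undefined behaviour, so the wrapped value is not guaranteed. The names `wrap_int32`, `cube_int32` and `largest_n_with_cube_in_int32` are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1  # range of a 4-byte two's-complement integer

def wrap_int32(v: int) -> int:
    """Keep the low 32 bits of an exact integer and reinterpret them as signed."""
    v &= 0xFFFFFFFF
    return v - 2**32 if v > INT32_MAX else v

def cube_int32(n: int) -> int:
    """Emulate n*n*n evaluated step by step in wrapping 32-bit signed arithmetic."""
    return wrap_int32(wrap_int32(n * n) * n)

def largest_n_with_cube_in_int32() -> int:
    """Largest non-negative N whose cube still fits in a 4-byte signed integer."""
    n = 0
    while (n + 1) ** 3 <= INT32_MAX:
        n += 1
    return n

n_max = largest_n_with_cube_in_int32()
print(INT32_MAX)
print(n_max, n_max**3, cube_int32(n_max))                # exact and emulated cubes agree
print(n_max + 1, (n_max + 1)**3, cube_int32(n_max + 1))  # emulated cube has wrapped
```

Promoting $N$ to an 8-byte integer type before forming the product is a common way to avoid this kind of overflow.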