# Embedded Systems/Floating Point Unit

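Floating point numbers are a way of representing real numbers in a fixed number of bits: each value is stored as a sign, a significand (the significant digits), and an exponent that scales the significand, much like scientific notation.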

Like all information, floating point numbers are represented by bits.
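As one concrete illustration, the sketch below pulls apart the bits of a C `float`, assuming the compiler stores it in the common 32-bit IEEE 754 "single" layout (1 sign bit, 8 exponent bits, 23 fraction bits). The names `float_fields` and `split_float` are made up for this example, and other formats divide the bits differently.

```c
#include <stdint.h>
#include <string.h>

/* Fields of a 32-bit IEEE 754 "single" value (assumed layout). */
typedef struct {
    uint32_t sign;      /* 1 bit: 0 = positive, 1 = negative */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t fraction;  /* 23 bits; normal numbers have an implicit leading 1 */
} float_fields;

/* Hypothetical helper: split a float into its three bit fields. */
float_fields split_float(float f)
{
    uint32_t bits;
    float_fields out;

    /* Copy the raw bits; memcpy avoids pointer-aliasing problems. */
    memcpy(&bits, &f, sizeof bits);

    out.sign     = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.fraction = bits & 0x7FFFFFu;
    return out;
}
```

For example, under that assumption `split_float(1.0f)` gives a sign of 0, a stored exponent of 127 (meaning 2^0), and a fraction of 0.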

Early computers used a variety of floating-point number formats. Each one required slightly different subroutines to add, subtract, and do other operations on them.

Because some computer applications use floating point numbers heavily, Intel settled on one particular format and designed floating-point hardware that calculated much more quickly than the software subroutines. The 8086 could be paired with a separate floating-point co-processor, the 8087, which handled the floating point arithmetic functions (a later co-processor, the 80187, served the 80C186). In newer processors, the floating point unit (FPU) has been integrated directly into the microprocessor.

Many small embedded systems, however, do not have an FPU (internal or external). Therefore, they manipulate floating-point numbers, when necessary, the old way. They use software subroutines, often called a "floating point emulation library".

However, floating-point numbers are not necessary in many embedded systems. Many embedded system programmers try to eliminate floating point numbers from their programs,^{[1]} using fixed-point arithmetic instead. Such programs use less space (fixed-point subroutine libraries are far smaller than floating-point libraries, especially when just one or two routines are put into the system). On microprocessors without a floating-point unit, the fixed-point version of a program usually runs faster than the floating-point version. However, these embedded system programmers must figure out exactly how much accuracy a particular application needs, and make sure their fixed-point routines maintain at least that much accuracy.
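A minimal sketch of the fixed-point idea, assuming a signed 16.16 format (16 integer bits and 16 fraction bits held in a 32-bit integer). The type and function names are hypothetical and do not come from any particular library, and the 64-bit intermediate product is chosen for clarity rather than speed.

```c
#include <stdint.h>

/* Hypothetical 16.16 fixed-point type: value = raw / 65536. */
typedef int32_t q16_16;
#define Q16_ONE ((q16_16)65536)   /* the number 1.0 in 16.16 format */

/* Convert a small integer (roughly -32768..32767) to 16.16. */
q16_16 q16_from_int(int32_t n)
{
    return (q16_16)(n * Q16_ONE);
}

/* Addition needs no special handling: the scale factors match. */
q16_16 q16_add(q16_16 a, q16_16 b)
{
    return a + b;
}

/* Multiplication doubles the number of fraction bits, so the
 * product is divided by 65536 to return to 16.16 format.
 * The result is truncated toward zero and can overflow if the
 * true product does not fit in 16 integer bits. */
q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) / Q16_ONE);
}
```

In this format the smallest representable step is 1/65536 (about 0.000015), which is the kind of accuracy limit the programmer has to check against the application's needs. As a usage example, 1.5 is stored as 98304 and 2.25 as 147456, and `q16_mul(98304, 147456)` returns 221184, which is 3.375.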

## Math Routines

*(Is there a better place in this wikibook for this discussion? It doesn't even mention floating point.)*

Low-end embedded microcontrollers typically don't even have integer multiply in their instruction set[1].

So on low-end CPUs, you must use routines that synthesize basic math operators (multiply, divide, square root, etc.) from even simpler steps. Practically all microprocessors have such routines, posted on the internet by their manufacturer or other users ("Multiplication and Division Made Easy" by Robert Ashby, "Novel Methods of Integer Multiplication and Division", "efficient bit twiddling methods", etc.).
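As a rough sketch of what such a routine looks like, the classic shift-and-add method builds an 8-bit by 8-bit multiply out of nothing but shifts, tests, and additions. The function name `mul8x8` is made up for this example; real libraries are usually hand-written in assembly for a specific processor.

```c
#include <stdint.h>

/* Multiply two unsigned 8-bit numbers using only shift, test, and add.
 * Works like long multiplication in binary: for every 1 bit in b,
 * add a correspondingly shifted copy of a to the product. */
uint16_t mul8x8(uint8_t a, uint8_t b)
{
    uint16_t product = 0;
    uint16_t addend = a;        /* multiplicand, widened so it can shift left */

    while (b != 0) {
        if (b & 1u) {
            product += addend;  /* this bit of b is set: add shifted a */
        }
        addend <<= 1;           /* next bit of b is worth twice as much */
        b >>= 1;
    }
    return product;
}
```

The loop runs at most 8 times, and finishes early when the remaining bits of `b` are all zero.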

Following the advice known as "Make It Work Make It Right Make It Fast" and "Make It Work Make It Small Make It Fast", many people pick one or two number resolutions that are adequate for the largest and most precise kind of data handled in a program, and use that resolution for everything. For desktop machines, 32-bit integers and 64-bit "double precision floating point" numbers are often more than adequate. For embedded systems, 24-bit integers and 24-bit "fixed point" numbers are often more than adequate. If the software fits in the microcontroller, and is plenty fast enough, it is a waste of valuable human time to try to "optimize" it further.

Alas, sometimes the software does not fit in the microcontroller.

- If you run out of RAM, check whether some variables can be stored in less space: sometimes 2 bytes, 1 byte, 4 bits, or even 1 bit is enough for a particular variable.
- If you run out of time, sometimes you can add lower-precision math routines that quickly calculate the results needed for that inner loop, even though other parts of the code may need higher-precision math routines.
- If you run out of ROM, sometimes you can trade time for ROM space. Rather than a collection of sets of math routines, each one customized to a slightly different width, you can use a single set of math routines that can handle the maximum possible width. If you have some variables less than that width (to save RAM), then you typically sign-extend variables into a full-size register or global buffer, do full-width calculations there, and then truncate and store the result to the small size.
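The last trade-off can be sketched as follows, assuming a single 32-bit routine shared by every caller and a variable kept in one byte to save RAM. The names `muldiv32` and `apply_ratio` are hypothetical.

```c
#include <stdint.h>

/* The one full-width routine kept in ROM: computes (a * b) / divisor.
 * The caller must make sure a * b fits in 32 bits and divisor != 0. */
int32_t muldiv32(int32_t a, int32_t b, int32_t divisor)
{
    return (a * b) / divisor;
}

/* Scale a value that is stored in a single byte.
 * Hypothetical wrapper showing the widen / compute / narrow pattern. */
void apply_ratio(int8_t *stored, int8_t numerator, int8_t denominator)
{
    int32_t wide = *stored;              /* sign-extend 8 bits to 32 bits */

    wide = muldiv32(wide, numerator, denominator);

    /* Truncate back to 8 bits. If the result does not fit, the stored
     * value is implementation-defined, so the caller must check the range. */
    *stored = (int8_t)wide;
}
```

Every narrow variable pays the cost of the full-width calculation, but only one copy of the math routine has to fit in ROM.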

## FFT

People use many tricks and techniques to speed up Fourier transform calculation. The fast Fourier transform (FFT) is the biggest speedup, but there are several other tricks on top of that, each of which can give another factor of two improvement.^{[2]}^{[3]}

Many people do FFT using fixed-point arithmetic.
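As a hedged illustration of what this involves, the sketch below shows one radix-2 decimation-in-time butterfly in 16-bit "Q15" fixed point (value = raw / 32768). It is not taken from any of the references listed here. The names `cq15` and `butterfly_q15` are made up, and halving the outputs on every stage is just one common way to keep intermediate values from overflowing.

```c
#include <stdint.h>

/* Complex number in Q15 fixed point: each part is raw / 32768. */
typedef struct {
    int16_t re;
    int16_t im;
} cq15;

/* One radix-2 butterfly, in place:
 *     a' = (a + w*b) / 2
 *     b' = (a - w*b) / 2
 * Assumptions: w is a twiddle factor on the unit circle, and the
 * complex magnitudes of a and b do not exceed 1.0. Under those
 * assumptions the halved results fit in 16 bits. */
void butterfly_q15(cq15 *a, cq15 *b, cq15 w)
{
    /* t = w * b, computed in 32 bits and scaled back to Q15. */
    int32_t t_re = ((int32_t)w.re * b->re - (int32_t)w.im * b->im) / 32768;
    int32_t t_im = ((int32_t)w.re * b->im + (int32_t)w.im * b->re) / 32768;

    int32_t a_re = a->re;
    int32_t a_im = a->im;

    /* Halve the sum and difference to guard against overflow. */
    b->re = (int16_t)((a_re - t_re) / 2);
    b->im = (int16_t)((a_im - t_im) / 2);
    a->re = (int16_t)((a_re + t_re) / 2);
    a->im = (int16_t)((a_im + t_im) / 2);
}
```

Because each stage divides by 2, an N-point transform built from this butterfly produces results scaled down by a factor of N, and some precision is lost to truncation at every stage. Whether that is acceptable depends on the accuracy the application needs.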

*... more tips and hints here ...*

- "Develop FFT apps on low-power MCUs" by Paul Holden 2005
- "Comparing Floating-Point and Fixed-Point Implementations on ADI Blackfin Processors with LabVIEW"
- "Fixed-Point Fast Fourier Transform (FFT)"
- (program listed for a fixed-point FFT)
- EE-18: Choosing and Using FFTs for ADSP-21xx (a fixed-point DSP)
- Kiss FFT library that can use either fixed or floating point data types.

## Further reading

- [1] Avoiding floating point arithmetic on the iPhone
- [2] Douglas L. Jones. "Decimation-in-time (DIT) Radix-2 FFT". OpenStax-CNX. September 15, 2006.
- [3] Douglas L. Jones. "Efficient FFT Algorithm and Programming Tricks". OpenStax-CNX. February 24, 2007.

- AN660: floating point routines for the Microchip PICmicro
- AN617: fixed point routines for the Microchip PICmicro
- "Algorithm - ArcTan as Fast as You Can - AN2341" fixed point routine for the Cypress PSoC
- "Floating Point Approximations" collected by the Ganssle Group, giving code and test cases. (Assumes you already have floating-point add, subtract, multiply, and divide, and gives formulas for trig, roots, logarithms, and exponents ... various formulas, with different tradeoffs between accuracy, speed, and range).
- AVRfix: A library for fixed point calculation in s15.16, s7.24 and s7.8 format, entirely written in ANSI C for embedded software (with main focus on the Atmel AVR platforms).
- Microchip AN575: IEEE 754 Compliant Floating Point Routines "in a modified IEEE 754 32-bit format together with versions in 24-bit reduced format." ... "float to integer conversion, integer to float conversion, normalize, add/subtract, multiply, divide."
- PICFLOAT: open source IEEE 32-bit ("single") floating point library for midrange PICmicro processors. Includes most of the common C floating point math functions. The full library plus testing code all fit inside 2 KBytes.
- "fixed point" libraries on SourceForge.