Embedded Systems/ARM Microprocessors

From Wikibooks, open books for an open world

The ARM architecture is a widely used 32-bit RISC processor architecture. In fact, the ARM family accounts for about 75% of all 32-bit CPUs, and about 90% of all embedded 32-bit CPUs. ARM Limited licenses several popular microprocessor cores to many vendors (ARM does not sell physical microprocessors). ARM originally stood for Acorn RISC Machine, and later came to stand for Advanced RISC Machines.

Some cores offered by ARM:

  • ARM7TDMI
  • ARM9
  • ARM11

Some examples of ARM based processors:

  • Intel X-Scale (PXA-255 and PXA-270), used in Palm PDAs
  • Philips LPC2000 family (ARM7TDMI-S core), LPC3000 family (ARM9 core)
  • Atmel AT91SAM7 (ARM7TDMI core)
  • ST Microelectronics STR710 (ARM7TDMI core)
  • Freescale MCIMX27 series (ARM9 core)

The lowest-cost ARM processors (in the LPC2000 series) have dropped below US$5 in single-unit quantities, which is less than the cost of many 16-bit and 8-bit microcontrollers.

Thumb calling convention

In ARM Thumb code, the 16 registers r0 - r15 typically have the same roles they have in all ARM code:

  • r0 - r3, called a1 - a4: argument/scratch/result registers.
  • r4 - r9, called v1 - v6: variables
  • r10, called sl: stack limit
  • r11, called fp: frame pointer (usually not used in Thumb code)
  • r12, called ip: intra-procedure-call scratch register
  • r13, called sp: stack pointer
  • r14, called lr: link register
  • r15, called pc: the program counter
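The register aliases above can be summarised as a lookup table. The C sketch below is illustrative only: `apcs_register_name` is a hypothetical helper, not part of any ARM library, and it simply encodes the list above.

```c
#include <assert.h>

/* Hypothetical helper: map a register number 0..15 to the APCS alias
   used in the list above. */
static const char *apcs_register_name(int r)
{
    static const char *const names[16] = {
        "a1", "a2", "a3", "a4",             /* r0-r3: arguments, scratch, results */
        "v1", "v2", "v3", "v4", "v5", "v6", /* r4-r9: variables */
        "sl",                               /* r10: stack limit */
        "fp",                               /* r11: frame pointer */
        "ip",                               /* r12: intra-procedure-call scratch */
        "sp",                               /* r13: stack pointer */
        "lr",                               /* r14: link register */
        "pc"                                /* r15: program counter */
    };
    assert(r >= 0 && r < 16);
    return names[r];
}
```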

The standard C calling convention for ARM Thumb is:[1]

Subroutine-preserved registers

When the subroutine returns by placing the return address in pc (r15), the sp, fp, sl, and v1-v6 registers must contain the same values they did when the subroutine was called.
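One way to make this rule concrete is a check that a test harness might run on register snapshots taken around a call. The sketch assumes a hypothetical harness able to capture r0-r15 into arrays; `callee_preserved_ok` is an illustrative name, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test-harness helper. before[] and after[] hold r0-r15
   captured just before a call and just after it returns. */
static bool callee_preserved_ok(const uint32_t before[16],
                                const uint32_t after[16])
{
    /* v1-v6 (r4-r9), sl (r10), fp (r11) and sp (r13) */
    static const int preserved[] = { 4, 5, 6, 7, 8, 9, 10, 11, 13 };

    assert(before != NULL && after != NULL);
    for (size_t i = 0; i < sizeof preserved / sizeof preserved[0]; i++) {
        if (before[preserved[i]] != after[preserved[i]])
            return false;
    }
    return true; /* a1-a4, ip and lr are allowed to differ */
}
```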

The stack

Every execution environment has a limit to how low in memory the stack can grow: the "minimum sp".

In order to give interrupts (which may occur at any time) room to work, at every instant the memory between sp and the "minimum sp" must contain nothing of value to the executing program.

Systems in which the application and its library support code are responsible for detecting and handling stack overflow are called "explicit stack limit" systems. In such systems, the sl register must always point to an address at least 256 bytes above the "minimum sp".
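The arithmetic behind the stack limit can be sketched in C, assuming a full descending stack, 32-bit addresses, and frames that are whole words. In a real explicit-stack-limit system the check is made in the function prologue in assembly; `sl_invariant_holds` and `frame_fits` are hypothetical names that only model the comparisons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_LIMIT_MARGIN 256u /* bytes between "minimum sp" and sl */

/* The invariant described above: sl lies at least 256 bytes above
   the "minimum sp". */
static bool sl_invariant_holds(uint32_t sl, uint32_t min_sp)
{
    return sl >= min_sp && sl - min_sp >= STACK_LIMIT_MARGIN;
}

/* Hypothetical model of a prologue check: on a descending stack,
   would allocating frame_bytes keep sp at or above sl?
   Compares distances instead of computing sp - frame_bytes,
   so it cannot wrap around below address 0. */
static bool frame_fits(uint32_t sp, uint32_t frame_bytes, uint32_t sl)
{
    assert(frame_bytes % 4 == 0); /* frames are whole words */
    if (sp < sl)
        return false;
    return sp - sl >= frame_bytes;
}
```

For example, with sp = 0x1000 and sl = 0x0F00, a 256-byte frame fits exactly, while a 260-byte frame does not.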

Caller-preserved registers

A subroutine is free to clobber a1-a4, ip, and lr.

Return values

If the subroutine returns a simple value no bigger than one word, the value must be in a1 (r0).

If the subroutine returns a simple floating-point value, the value is encoded in a1; or {a1, a2}; or {a1, a2, a3}, whichever is sufficient to hold the full precision.
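The return-value rule amounts to counting 4-byte words. The sketch below assumes 4-byte words and covers only the sizes described above; `return_register_count` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many of a1, a2, a3 (r0-r2) a simple return
   value of size_bytes occupies, assuming 4-byte words. Returns 0 for
   sizes the rules above do not cover. */
static int return_register_count(size_t size_bytes)
{
    assert(size_bytes > 0);
    if (size_bytes <= 4)
        return 1;  /* one word, e.g. int, pointer or float: a1 */
    if (size_bytes <= 8)
        return 2;  /* e.g. double: {a1, a2} */
    if (size_bytes <= 12)
        return 3;  /* e.g. 3-word extended precision: {a1, a2, a3} */
    return 0;
}
```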

A typical subroutine

The simplest entry and exit sequence for Thumb functions is:[1]

an_example_Thumb_subroutine:
    PUSH {r4-r7, lr}     ; one-line entry sequence: save the registers this function uses (here r4-r7) and the return address
    ; ... first part of function ...
    BL subroutine_name   ; target must be within +/- 4 MB
    ; ... rest of function goes here, perhaps including other function calls
    ; ...
    POP {r4-r7, pc}      ; one-line exit sequence: restore the registers and return
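The Thumb entry and exit sequences rely on PUSH and POP. In the original 16-bit Thumb encoding, PUSH can name only r0-r7 plus lr, and POP only r0-r7 plus pc, which is why Thumb functions save low registers. The C sketch below models those constraints with a register bitmask (bit n stands for rn); the function names are hypothetical, and 32-bit Thumb-2 encodings, which accept more registers, are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REG(n)    (1u << (n))
#define LOW_REGS  0x00FFu   /* r0-r7 */

/* 16-bit Thumb PUSH: r0-r7 plus lr. */
static bool thumb_push_list_ok(uint16_t mask)
{
    return mask != 0 && (mask & ~(LOW_REGS | REG(14))) == 0;
}

/* 16-bit Thumb POP: r0-r7 plus pc. */
static bool thumb_pop_list_ok(uint16_t mask)
{
    return mask != 0 && (mask & ~(LOW_REGS | REG(15))) == 0;
}

/* Bytes of stack a PUSH of this list uses: one word per register. */
static unsigned stacked_bytes(uint16_t mask)
{
    unsigned count = 0;
    assert(mask != 0); /* an empty register list is not a valid PUSH/POP */
    for (unsigned m = mask; m != 0; m &= m - 1)
        count++;        /* clear the lowest set bit each time */
    return 4 * count;
}
```

For example, PUSH {r4-r7, lr} is a legal 16-bit list and uses 20 bytes of stack; the matching POP {r4-r7, pc} reloads the saved lr slot into pc.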

ARM calling convention

The standard C calling convention for ARM is specified in detail by ARM in the "Procedure Call Standard for the ARM Architecture" (AAPCS).[2]

The simplest entry and exit sequence for 32-bit ARM functions is very similar to Thumb functions:[3][4][5]

an_example_ARM32_subroutine:
    PUSH {r4-r11, lr} ; one-line function prologue
    ; ... first part of function ...
    BL subroutine_name   ; target must be within +/- 32 MB (ARM-state BL)
    ; ... rest of function goes here, perhaps including other function calls
    ; ...
    POP {r4-r11, pc} ; one-line exit sequence (function epilogue)

Using alternate mnemonics for the same instructions, the same function can be written as:

an_example_ARM32_subroutine:
    ; Push the return address (in LR) and the work registers
    ; "store multiple registers, full descending"
    STMFD sp!,{r4-r11, lr} ; aka PUSH {r4-r11, lr}
    ; (Writing plain "sp" would leave the stack pointer unchanged;
    ; "sp!" writes the updated address back to sp.)
    ; ... first part of function ...
    BL subroutine_name   ; target must be within +/- 32 MB (ARM-state BL)
    ; ... rest of function goes here, perhaps including other function calls
    ; ...
    ; Pop the return address (into PC) and the work registers
    ; and return automatically.
    ; "load multiple registers, full descending"
    LDMFD sp!,{r4-r11, pc} ; aka POP {r4-r11, pc}
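The effect of "full descending" and of the "!" writeback is easiest to see in a toy model: sp points at the last word stored, the stack grows toward lower addresses, and STM/LDM place the lowest-numbered register at the lowest address. The C below simulates memory as a small word array; it models the addressing only, not an emulator, and `stmfd_writeback`/`ldmfd_writeback` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64u
static uint32_t mem[MEM_WORDS];   /* toy memory: word i is at address 4*i */

/* Models STMFD sp!, {list}: regs[] holds the listed registers in
   ascending register order; the lowest-numbered one ends up at the
   lowest address. Returns the written-back sp. */
static uint32_t stmfd_writeback(uint32_t sp, const uint32_t *regs, unsigned count)
{
    assert(sp % 4 == 0 && sp / 4 >= count && sp / 4 <= MEM_WORDS);
    uint32_t new_sp = sp - 4 * count;
    for (unsigned i = 0; i < count; i++)
        mem[new_sp / 4 + i] = regs[i];
    return new_sp;   /* without '!', sp itself would keep its old value */
}

/* Models LDMFD sp!, {list}: loads upward from sp, then writes back
   the incremented sp. */
static uint32_t ldmfd_writeback(uint32_t sp, uint32_t *regs, unsigned count)
{
    assert(sp % 4 == 0 && sp / 4 + count <= MEM_WORDS);
    for (unsigned i = 0; i < count; i++)
        regs[i] = mem[sp / 4 + i];
    return sp + 4 * count;
}
```

Pushing {r4-r11, lr} (nine words) moves sp down by 36 bytes. Popping the same nine words into {r4-r11, pc} restores sp and delivers the saved lr value to the pc slot, which is what makes the single LDMFD return from the subroutine.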

The BL (branch-and-link) instruction stores the return address in the link register LR (r14) and loads the program counter PC (r15) with the subroutine address. Typical subroutines (as shown above) immediately push that return address onto the stack. That frees up r14 so that the subroutine can call sub-subroutines of its own.
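The reach of a single BL depends on the instruction set. In ARM state the offset is a signed 24-bit word count measured from the instruction address + 8 (about +/- 32 MB); in the original two-halfword Thumb BL it is a signed 22-bit halfword count measured from the instruction address + 4 (about +/- 4 MB). Thumb-2 cores use a wider encoding that reaches about +/- 16 MB, which this sketch does not model. `bl_reaches` is a hypothetical helper for same-state calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can a single BL at address `from` reach `to`?
   ARM:   offset = to - (from + 8), multiple of 4, in [-2^25, 2^25 - 4].
   Thumb: offset = to - (from + 4), multiple of 2, in [-2^22, 2^22 - 2]. */
static bool bl_reaches(uint32_t from, uint32_t to, bool thumb)
{
    int64_t offset = (int64_t)to - ((int64_t)from + (thumb ? 4 : 8));

    if (thumb) {
        assert(from % 2 == 0);   /* Thumb instructions are halfword aligned */
        return offset % 2 == 0 && offset >= -(1LL << 22) && offset < (1LL << 22);
    }
    assert(from % 4 == 0);       /* ARM instructions are word aligned */
    return offset % 4 == 0 && offset >= -(1LL << 25) && offset < (1LL << 25);
}
```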

Subroutine-preserved registers

Typically r4-r11 are used to hold local variables of the currently-executing routine.

The registers r4-r11 are "subroutine-preserved registers": when the subroutine returns by placing the return address in pc (r15), the registers r4-r11 and the stack pointer sp (r13) must contain the same values they did when the subroutine was called.

Typical subroutines (as shown above) immediately push the values of those registers onto the stack. That frees up r4-r11 to hold local variables of the currently-executing subroutine.

Optimizing ARM compilers save and restore the precise subset of r4-r11 and r14 (if any) actually modified by that subroutine, since it is a little slower (but otherwise harmless) to save and restore registers that are unused by that subroutine.
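The selection rule can be modelled with register bitmasks (bit n stands for rn). This is a hypothetical sketch of the idea, not how any particular compiler is implemented; real compilers also take stack alignment and other constraints into account, which the sketch ignores.

```c
#include <assert.h>
#include <stdint.h>

#define REG(n)  (1u << (n))
#define V_REGS  0x0FF0u   /* r4-r11: subroutine-preserved */

/* Hypothetical model: push only the preserved registers the function
   actually writes, plus lr if it calls other functions (each BL
   overwrites lr). */
static uint16_t registers_to_save(uint16_t written_mask, int makes_calls)
{
    assert((written_mask & (REG(13) | REG(15))) == 0); /* sp and pc are not saved this way */
    uint16_t save = (uint16_t)(written_mask & V_REGS);
    if (makes_calls)
        save = (uint16_t)(save | REG(14));
    return save;
}
```

A leaf function that only touches r0-r3 and r12 needs to save nothing, which is why such functions often have no prologue at all.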

Scratch registers

A subroutine is free to clobber r0-r3, r12, and the link register lr (r14).

The first four registers r0-r3 are used to pass argument values into a subroutine and to return a result value from a function.
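When a function takes more than four arguments, the remaining words are passed on the stack. For a call such as `f(a, b, c, d, e)` with five int arguments (`f` is hypothetical), a-d travel in r0-r3 and e is stored at [sp] just before the BL. The sketch below assumes every argument is exactly one 32-bit word; 64-bit and structure arguments follow additional rules in the AAPCS that it ignores. `arg_location` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: where the n-th argument word (counting from 0)
   is at the moment of the BL. Returns the register number 0-3 and sets
   *stack_offset to -1, or returns -1 and sets *stack_offset to the
   byte offset from sp. */
static int arg_location(int n, int *stack_offset)
{
    assert(n >= 0 && stack_offset != NULL);
    if (n < 4) {
        *stack_offset = -1;
        return n;                     /* r0-r3, i.e. a1-a4 */
    }
    *stack_offset = 4 * (n - 4);      /* word 4 at [sp], word 5 at [sp, #4], ... */
    return -1;
}
```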

Mixed ARM32 and Thumb calls

Normal function calls are easy with the BL instruction. The programmer simply writes

    BL destination_subroutine

and the assembler and linker automatically do the right thing, inserting the appropriate (32-bit-long) ARM32 BL instruction for an ARM32-to-ARM32 or ARM32-to-Thumb call, or the appropriate (32-bit-long)[6] Thumb BL instruction for a Thumb-to-Thumb or Thumb-to-ARM32 call.

(Some mixed calls and some long branches require the linker to insert a short stub, often called a veneer, that overwrites the scratch register r12 with a temporary value. Exactly how the linker does this can be confusing, especially when BX and BLX instructions are also involved.[7][8])
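Much of the interworking machinery hinges on one detail of BX (and register-form BLX): bit 0 of the target register selects the instruction set, and the remaining bits give the new pc. Toolchains typically set bit 0 in the address of a Thumb function for this reason. The C sketch below models that decoding; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the BX target selects Thumb state (1) or ARM state (0). */
static bool bx_enters_thumb(uint32_t target)
{
    return (target & 1u) != 0;
}

/* The address the pc is loaded with: the target with bit 0 cleared. */
static uint32_t bx_new_pc(uint32_t target)
{
    uint32_t pc = target & ~1u;
    assert(bx_enters_thumb(target) || (pc & 3u) == 0); /* ARM code is word aligned */
    return pc;
}
```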

For further reading

  1. ARM. ARM Software Development Toolkit. 1997. Chapter 9: ARM Procedure Call Standard. Chapter 10: Thumb Procedure Call Standard.
  2. The "Procedure Call Standard for the ARM Architecture"
  3. RealView Compilation Tools Developer Guide "Calling between C, C++, and ARM assembly language"
  4. [1] section "Stacks and Subroutines" p. 59
  5. "Stacking registers for nested subroutines"
  6. "Thumb Branch with Link (BL)", infocenter.arm.com/help/topic/com.arm.doc.ddi0234b/i107462.html
  7. "Arm/Thumb: using BX in Thumb code, to call a Thumb function, or to jump to a Thumb instruction in another function"
  8. "Arm / Thumb Interworking"