Embedded Systems/Threading and Synchronization

From Wikibooks, the open-content textbooks collection

Jump to: navigation, search

One of the most useful developments in the history of computing is multitasking and multithreading. This technique isn't always available to an embedded system engineer, but some embedded systems and RTOS have multithreading (MT) capability. The chapters in this section will talk about some of the uses of MT, and will discuss some of the common pitfalls associated with MT programming. This page is only going to serve as a brief reference to multi-threaded programming.

Contents

[edit] Preemptive Multithreading

When the first multi-tasking systems were established, they did not have a central controller. Multi-tasking was established by having programs voluntarily give up control to the system, and the system would then give control to another process. This system worked reasonably well, except that any program that was misbehaving would slow down the entire system. For instance, if a program got stuck in an infinite loop, it would never give up control, and the system would freeze.

The solution to this problem is preemptive multithreading. In a preemptive environment, control could be moved from one process to another process at any given time. The process that was "preempted" would not even know that anything had happened, except maybe there would be a larger then average delay between 2 instructions. Preemptive multithreading allows for programs that do not voluntarily give up control, and it also allows a computer to continue functioning when a single process hangs.

There are a number of problems associated with preemptive multithreading that all stem from the fact that control is taken away from one process when it is not necessarily prepared to give up control. For instance, if one process were writing to a memory location, and was preempted, the next process would see half-written data, or even corrupted data in that memory location. Or, if a task was reading in data from an input port, and it was preempted, the timing would be wrong, and data would be missed from the line. Clearly, this is unacceptable.

The solution to this new problem then is the idea of synchronization. Synchronization is a series of tools provided by the preemptive multithreaded operating sytem to ensure that these problems are avoided. Synchronization tools can include timers, "critical sections," and locks. Timers can ensure that a given process may be preempted, but only for a certain time. Critical sections are commands in the code that prevent the system from switching control for a certain time. Locks are commands that prevent interference in atomic operations. These topics will all be discussed in the following chapters.

[edit] Mutexes

The term Mutex is short for "Mutual Exclusion", and is a type of mechanism used in a preemptive environment that can prevent unauthorized access to resources that are currently in use. Mutexes follow several rules:

  1. Mutexes are system wide objects, that are maintained by the kernel.
  2. Mutexes can only be owned by one process at a time
  3. Mutexes can be aquired by asking the kernel to allocate that mutex to the current task
  4. If a Mutex is already allocated, the request function will block until the mutex is available.

In general, it is considered good programming practice to release mutexes as quickly as possible. Some problems with mutexes will be discussed in the chapter on deadlocks.

[edit] Spin Locks

Spin locks is a quick form of synchronization methods. It is named after its behavior - spin in the loop while the condition is false. To implement spin lock system should support test-and-set idiom or give exclusive access to a locking thread by any means (masking interrupts, locking bus).

An advantage of spin locks is that they are very simple. A disadvantage is that they waste CPU cycles in loop waiting. Most common use of spin locks is to syncronize quick access to objects. It is not advisable to do a long computations while spin locked a section.

[edit] Sentinels

[edit] Critical Sections

A critical section is a sequence of computer instructions that may malfunction if interrupted. An atomic operation is a sequence of computer instructions that cannot be interrupted and function correctly. In practice, these two subtly different definitions are often combined. Operating systems provide synchronization objects to meet these requirements, and some actually call these objects as "critical sections," "atomic operations" or "monitors."

An example of a critical section is code that removes data from a queue that is filled by an interrupt. If the critical section is not protected, the interrupt can occur while the dequeuing function is changing pointers, and corrupt the queue pointers. An example of an atomic operation is an I/O read where the process must read all the data at a particular rate, and cannot be preempted while reading.

A generally good programming practice is to have programs exit their critical sections as quickly as possible, because holding a critical section for too long will cause other processes on the system not to get any time, and will cause a major performance decrease. Critical sections should be used with care.

[edit] Priority Scheduling

Many RTOS have a mechanism to distinguish the relative priorities of different tasks. High-priority tasks are executed more frequently than the low priority tasks. Each implementation of priority scheduling will be a little different, however.

[edit] Deadlock

A deadlock occurs when a series of synchronization objects are held in a preemptive MT system in such a way that no process can move forward. Let us take a look at an example:

Let's say that we have 2 threads: T1 and T2. We also have 2 mutexes, M1 and M2.

  1. T1 asks for and acquires mutex M1.
  2. T2 acquires M2
  3. T1 asks for M2, and the system transfers control to T2 until T2 releases M2.
  4. T2 asks for M1, and the system is in deadlock (neither thread can continue until the other releases it's mutex).

This is a very difficult problem to diagnose, and an even more difficult problem to fix. This chapter will provide some general guide-lines for preventing deadlocks.

[edit] Watchdog Timer

In an embedded environment, far away from the lab, and far away from the programmers, engineers, and technicians, all sorts of things can go wrong, and the embedded system needs to be able to fix itself. Remember, once you close the box, and shrink-wrap your product, it's hard to get back in there and fix your mistakes.

In a typical computer systems, cosmic rays flip a bit of RAM about once a month[citation needed]. If that happens to the wrong bit, the program can "hang", stuck in a short infinite loop.

Turning the power off then on again gets it unstuck. But how do you jiggle the power switch when you are on Earth and your embedded system is near Neptune? Or you are in Paris, and your embedded system is in Antarctica?

One of the most important tools of an embedded systems engineer is the Watch-Dog Timer (WDT). A WDT is a timer with a very long fuse (several seconds, usually).

The WDT counts down toward zero(*), like the big red numbers counting down on the bombs in the movies. Left to itself, eventually the counter will reach zero. When the counter reaches zero, the WDT resets the microcontroller (as if the power were turned off, then turned back on).

When the system is running normally, you don't want it to randomly reset itself, so you need to make sure that your program always "feeds the watch-dog" long before time runs out. Good practice is to reset the WDT less than halfway-through it's countdown. For instance, if the WDT has a timer of 20 seconds, then you will want to feed the WDT at least once every 10 seconds.

Unlike when our hero deals with bombs in the movies, feeding the watch-dog doesn't stop the countdown. When the code uses a "reset" or "clear" command to feed the watchdog, it merely sets the WDT back to some large number -- and then the watchdog timer immediately starts counting down from there.

If the programmer fails to feed the watchdog in time -- or if the program hangs for any reason -- then sooner or later WDT will time out, and the program will reset, hopefully getting your system unstuck.

(*) Some watchdogs count up. With this kind of watchdog, "feeding the watchdog" resets it to zero. If it ever reaches some high limit, it resets the system.

[edit] further reading