Embedded Control Systems Design/Operating systems

Definition

An operating system (OS) is a computer program that manages the hardware and software resources of a computer. It provides the interface between application programs and the system hardware. In general, an OS for embedded control systems has the following responsibilities: task management and scheduling, interrupt servicing, inter-process communication and memory management. These topics are discussed in detail later on; we first take a more general look at operating systems.

Necessity of an operating system

When choosing an operating system for an embedded control system, the first question to ask is: is an operating system really necessary? For simple systems that only have to do one job, it may be easier and more efficient to write a program that runs directly on the microprocessor, perhaps using a super-loop architecture. An operating system becomes necessary when the embedded control system has to perform more complex tasks, or has to connect or interface with other devices, for example. An operating system allows for more flexibility. The most common real-time operating systems are listed in the next section.
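For comparison, a super-loop is just an endless loop that reads inputs, computes and writes outputs in a fixed order, with no scheduler involved. The sketch below is a minimal illustration, not taken from any particular product: `read_sensor()`, `compute_control()` and `write_actuator()` are hypothetical stand-ins for hardware access, and the loop is bounded by an `iterations` parameter only so that it can be tested; on a real microcontroller it would be `for (;;)`.

```c
/* Hypothetical stand-ins for hardware access; on a real target these
 * would read and write peripheral registers. */
static int sensor_value = 0;    /* simulated sensor reading  */
static int actuator_value = 0;  /* simulated actuator output */

static int read_sensor(void) { return sensor_value; }
static void write_actuator(int u) { actuator_value = u; }

/* Hypothetical proportional controller with a fixed gain of 2. */
static int compute_control(int setpoint, int measurement)
{
    return 2 * (setpoint - measurement);
}

/* Super-loop: every iteration runs the same jobs in the same fixed order.
 * There is no OS and no scheduler; the timing depends entirely on how
 * long each job takes. Bounded here only for testing. */
static void super_loop(int setpoint, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        int y = read_sensor();                /* 1. sample input   */
        int u = compute_control(setpoint, y); /* 2. compute output */
        write_actuator(u);                    /* 3. update output  */
    }
}
```

Every job shares one fixed cycle, so this structure stays simple only as long as all jobs can run at the same rate.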

List of real-time operating systems

Linux-based real-time operating systems:
- RTAI [1]
- RTLinux [2]
- uClinux [3]
Other real-time operating systems:
- eCos [4]
- Xenomai [5]
- VxWorks [6]
- WinCE [7]
- RTEMS [8] (RTEMS for Embedded Software Developers)
- QNX [9]
- FreeRTOS [10]
- MQX [11]
- BeRTOS [12]

Things to consider when choosing an operating system

To facilitate the choice of the operating system, the following things can be considered:

Compatibility: Not every operating system runs on every kind of microprocessor. When choosing an operating system, one should therefore already have an idea of the available hardware.
System requirements and footprint: The footprint of an embedded system is its total memory size. For example, if the total size of an embedded control system is 600 kB, the system's footprint is 600 kB. This is a small footprint; larger footprints are in the order of several MB. A system's footprint has to be considered to make sure that the available hardware can support the system.
Software support: Another important factor in the choice of an operating system is the available supporting software. Are there many editors, compilers and libraries, or only very few? This is related to the next point on the list.
Number of users: If an operating system is used by many people, more software will be available for it. It will also be easier to find documentation, solutions for specific problems, or other people using the same operating system.
Availability of device drivers: Are drivers available for the peripheral devices that the system has to use?
Possibility to dynamically add code: Sometimes it is necessary to add programs to a system that is already running. Does the operating system allow this?
Certification: Some applications demand that the operating system is certified for use in that particular application. An example is an operating system that will be used in space (satellites, rockets, ...).
Soft/hard real-time: How important is it that timing constraints are met? In soft real-time systems, it is undesirable but not disastrous if a calculation is not completed within the specified time. Examples are audio and video playback: nobody will notice a delay of a few microseconds. In hard real-time systems there is no exception: the system has to meet the specified timing constraints. Examples of hard real-time systems are surgical robots, space probes and precision mills.

When choosing an operating system, the following trends can also be considered:

Remote access: It is increasingly necessary that the system can be accessed remotely. Manufacturers that ship their machines all over the world cannot travel to every installation site whenever they need to access a control system. Remote access is therefore becoming increasingly important.
Safety Integrity Level: The safety integrity level (SIL) is a relative measure of the reliability of a system. Systems that control dangerous processes should have a high SIL rating. To reach a given SIL, specific requirements are placed on both software and hardware. This can also affect the choice of an operating system.
Development of operating systems: There used to be hundreds of embedded operating systems available. When Microsoft introduced WinCE, many systems disappeared, and programmers of different systems combined their efforts to create a common embedded operating system, further reducing the number of available systems.

Operating system basics

This section discusses the basics of an operating system in more detail. As mentioned, the main tasks of an embedded operating system are task management and scheduling, interrupt servicing, inter-process communication and memory management.

Task management and scheduling

A task (a process or thread) is an instance of a computer program that is being executed. While a program itself is just a passive collection of instructions, a process is something that actually executes those instructions. Several processes can be associated with the same program; each executes independently. Modern computer systems allow multiple processes to be loaded into memory at the same time and, through time-sharing (multitasking), give the appearance that they are executed simultaneously even if there is only one processor.
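On a POSIX system, several tasks running the same code independently can be illustrated with threads. In this sketch, which assumes a POSIX threads implementation, two threads execute the same function `count_up()`, each on its own data; `count_up()`, `run_two_tasks()` and `struct task_data` are illustrative names, not part of any RTOS API.

```c
#include <pthread.h>

/* Per-task data: each task works on its own copy. */
struct task_data {
    long limit;
    long result;
};

/* One piece of code (the "program")... */
static void *count_up(void *arg)
{
    struct task_data *d = arg;
    long sum = 0;
    for (long i = 1; i <= d->limit; i++)
        sum += i;
    d->result = sum;
    return NULL;
}

/* ...executed by two independent tasks. Returns 0 on success. */
static int run_two_tasks(struct task_data *a, struct task_data *b)
{
    pthread_t t1, t2;
    if (pthread_create(&t1, NULL, count_up, a) != 0)
        return -1;
    if (pthread_create(&t2, NULL, count_up, b) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);   /* wait for both tasks to finish */
    pthread_join(t2, NULL);
    return 0;
}
```

Whether the two threads really run in parallel or are interleaved by time-sharing on a single processor is decided by the OS; the results are the same either way.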

The difficult part of task management is making sure that a task can exit without blocking other tasks. A task should not be deleted blindly, because it shares many of its components with other tasks: its memory space and locks must not be released while other tasks are still using them.
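A common way to make sure that shared components are released only when nobody needs them anymore is reference counting. The sketch below is one possible illustration, not the mechanism of any specific OS: the `shared_buf` type and its functions are hypothetical, and the counter uses C11 atomics so that several tasks can take and drop references concurrently.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical object shared by several tasks. */
struct shared_buf {
    atomic_int refcount;   /* number of tasks still using the buffer */
    char data[64];
};

static struct shared_buf *shared_buf_create(void)
{
    struct shared_buf *b = calloc(1, sizeof *b);
    if (b != NULL)
        atomic_init(&b->refcount, 1);   /* the creator holds the first reference */
    return b;
}

/* A task that starts using the buffer takes a reference. */
static void shared_buf_get(struct shared_buf *b)
{
    atomic_fetch_add(&b->refcount, 1);
}

/* A task that exits drops its reference. Only the last user frees the
 * memory, so an exiting task never releases memory others still need.
 * Returns 1 if this call freed the buffer, 0 otherwise. */
static int shared_buf_put(struct shared_buf *b)
{
    if (atomic_fetch_sub(&b->refcount, 1) == 1) {
        free(b);
        return 1;
    }
    return 0;
}
```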

In general, multiple tasks are active at the same time. The OS is responsible for sharing the available resources (CPU time, memory, etc.) among the tasks. The CPU is one of the most important resources, and deciding how to share the CPU among the tasks is called scheduling.

The simplest approach to the scheduling problem is to assign static properties to all tasks: the priority is given to a task at the time it is created. Sometimes, however, dynamic properties are assigned to a task: the scheduling algorithm then has to calculate the task's priority on-line, based on a number of dynamically changing parameters (time until the next deadline, amount of work to process, etc.). Note that scheduling is pure overhead: all time spent on calculating which task to run next is lost for the tasks doing the actual productive work.
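The difference between static and dynamic priorities can be made concrete with two selection functions. This is a sketch with a hypothetical `struct task`; earliest deadline first (EDF) is used here as one well-known example of a dynamic policy, not as the only option. Both functions scan all tasks at every decision, which is exactly the kind of overhead mentioned above.

```c
#include <stddef.h>

/* Hypothetical task descriptor for illustration. */
struct task {
    const char *name;
    int  priority;   /* static: fixed when the task is created     */
    long deadline;   /* dynamic: absolute time of the next deadline */
    int  ready;      /* 1 if the task can run now                   */
};

/* Static scheduling: pick the ready task with the highest fixed priority. */
static struct task *pick_static(struct task *t, size_t n)
{
    struct task *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (t[i].ready && (best == NULL || t[i].priority > best->priority))
            best = &t[i];
    return best;
}

/* Dynamic scheduling (EDF): the effective priority is recomputed at
 * every decision from the current deadlines. */
static struct task *pick_edf(struct task *t, size_t n)
{
    struct task *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (t[i].ready && (best == NULL || t[i].deadline < best->deadline))
            best = &t[i];
    return best;
}
```

With a "logger" task (priority 1, deadline 50) and a "control" task (priority 5, deadline 100) both ready, the static policy runs "control" while EDF runs "logger", because its deadline comes first.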

Interrupts

An operating system also has to be able to service peripheral hardware, such as timers, motors, sensors, communication devices and disks. All of these can request the attention of the OS asynchronously, i.e. at any time they want to use the OS, so the OS has to make sure it is ready to service their requests. Such a request for attention is called an interrupt.

There are two kinds of interrupts:

Hardware interrupts: the peripheral device can put a signal on a particular hardware line that triggers the processor on which the OS runs, to signal that the device needs servicing. As a result, the processor saves its current state and jumps to an address in its memory space that was associated with that hardware interrupt at initialization time.
Software (synchronous) interrupts: many processors have built-in instructions with which the effect of a hardware interrupt can be generated in software. A software interrupt also triggers the processor, so that it jumps to a pre-specified address. Synchronous interrupts also occur on events such as a division by zero or a memory segmentation fault. This kind of interrupt is thus not caused by a hardware event but by the execution of a particular machine instruction. A synchronous interrupt is often called a trap.

Many systems have more than one hardware interrupt line, and the hardware manufacturer typically collects all these interrupt lines in an interrupt vector. Hardware and software interrupts share the same interrupt vector, but that vector provides separate ranges for hardware and software interrupts.

An interrupt controller is a piece of hardware that shields the OS from the electronic details of the interrupt lines. Some controllers are able to queue interrupts, so that none of them gets lost.
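The mechanism described above can be simulated in plain C to show how the pieces fit together. This is an illustration, not how any specific processor works: the vector is an array of function pointers with an assumed layout (entries 0-7 for hardware interrupts, 8-15 for traps), and the interrupt controller is modelled as a per-vector counter so that repeated requests are queued rather than lost. All names are hypothetical; on real hardware the vector layout is fixed by the processor, and saving the processor state is done by the hardware and the OS.

```c
#include <stddef.h>

/* Assumed layout: vectors 0-7 hardware interrupts, 8-15 traps. */
#define NUM_HW_IRQS 8u
#define NUM_VECTORS 16u

typedef void (*isr_t)(void);

static isr_t    vector_table[NUM_VECTORS];  /* handler address per vector   */
static unsigned pending[NUM_VECTORS];       /* queued requests per vector   */

/* Connect a handler to a vector at initialization time. */
static void install_isr(unsigned vec, isr_t handler)
{
    if (vec < NUM_VECTORS)
        vector_table[vec] = handler;
}

/* Simulated interrupt controller: a device raises a hardware line and the
 * request is counted, so repeated requests are not lost while the CPU is busy. */
static void raise_irq(unsigned line)
{
    if (line < NUM_HW_IRQS)
        pending[line]++;
}

/* A synchronous interrupt (trap) uses the software range of the vector. */
static void raise_trap(unsigned n)
{
    if (n < NUM_VECTORS - NUM_HW_IRQS)
        pending[NUM_HW_IRQS + n]++;
}

/* What the processor does when it can take interrupts: for each queued
 * request, jump to the handler stored in the vector table. In this sketch
 * lower vector numbers are serviced first. */
static void dispatch_pending(void)
{
    for (unsigned v = 0; v < NUM_VECTORS; v++) {
        while (pending[v] > 0) {
            pending[v]--;
            if (vector_table[v] != NULL)
                vector_table[v]();
        }
    }
}
```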

Interrupts are discussed in more detail in other chapters of this book: Embedded Control Systems Design/Real Time Operating systems (section "Interrupt servicing") and Embedded Control Systems Design/Design Patterns (section "Interrupts").

Inter process communication

Different tasks sometimes need to be synchronized, e.g. because the order in which they are executed matters. Tasks may also need to exchange data. An example is a peripheral device whose data is processed by a dedicated task, which in turn passes the result on to another task. Synchronization and data exchange are together called inter-process communication. An operating system should provide a wide range of inter-process communication primitives, so that tasks can easily be synchronized or exchange data.

Synchronization
- Semaphore
  A Semaphore is a protected variable (or abstract data type) and constitutes the classic method for restricting access to shared resources (e.g. storage) in a multiprogramming environment. It was invented by Edsger Dijkstra and first used in the THE operating system. The value of the semaphore is initialized to the number of equivalent shared resources it is implemented to control. In the special case where there is a single equivalent shared resource, the semaphore is called a binary semaphore. The general case semaphore is often called a counting semaphore. Semaphores are the classic solution to the dining philosophers problem, although they do not prevent all deadlocks.
- Mutex
  The term mutex is short for "mutual exclusion". A mutex is a mechanism used in a preemptive environment to prevent concurrent access to a resource that is currently in use. Mutexes follow several rules: mutexes are system-wide objects maintained by the kernel. A mutex can be owned by only one task at a time. A task acquires a mutex by asking the kernel to allocate that mutex to it. If the mutex is already allocated, the request blocks until the mutex becomes available. In general, it is considered good programming practice to release mutexes as quickly as possible.
- Condition variable
  To avoid entering a busy waiting state, processes must be able to signal each other about events of interest. Monitors provide this capability through condition variables. When a monitor function requires a particular condition to be true before it can proceed, it waits on an associated condition variable. By waiting, it gives up the lock and is removed from the set of runnable processes. Any process that subsequently causes the condition to be true may then use the condition variable to notify a process waiting for the condition. A process that has been notified regains the lock and can proceed.
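The three primitives above are closely related. As an illustration only, and not as the way any particular OS implements them, a counting semaphore can be built from a mutex and a condition variable, following the monitor pattern just described. The `csem` names are hypothetical; POSIX already provides semaphores (`sem_init()`, `sem_wait()`, `sem_post()`), which would normally be used instead.

```c
#include <pthread.h>

/* Counting semaphore as a small monitor: the mutex protects the count,
 * the condition variable lets waiting tasks sleep instead of busy waiting. */
struct csem {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;
    unsigned        count;   /* number of free resources */
};

static void csem_init(struct csem *s, unsigned initial)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
}

/* Dijkstra's P (wait): block until a resource is free, then take it. */
static void csem_wait(struct csem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)                          /* re-check after every wakeup */
        pthread_cond_wait(&s->nonzero, &s->lock);  /* gives up the lock while asleep */
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Non-blocking variant: returns 1 if a resource was taken, 0 otherwise. */
static int csem_trywait(struct csem *s)
{
    int taken = 0;
    pthread_mutex_lock(&s->lock);
    if (s->count > 0) {
        s->count--;
        taken = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return taken;
}

/* Dijkstra's V (signal): return a resource and wake one waiting task. */
static void csem_post(struct csem *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}
```

Initializing the count to 1 gives a binary semaphore; a larger initial value gives a counting semaphore for that many equivalent resources.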
Data exchange
- Buffers
  A buffer is a region of memory used to temporarily hold output or input data, comparable to buffers in telecommunication. The data can be output to or input from devices outside the computer or processes within a computer. Buffers can be implemented in either hardware or software, but the vast majority of buffers are implemented in software. Buffers are used when there is a difference between the rate at which data is received and the rate at which it can be processed, or in the case that these rates are variable, for example in a printer spooler.
- FIFO
  FIFO ("first in, first out") refers to the way data stored in a queue is processed: the first item added to the queue (enqueue) is the first item to be removed (dequeue), and processing proceeds sequentially in the same order.
- Messages
  Message passing is a form of communication used in concurrent programming, parallel programming, object-oriented programming, and interprocess communication. Communication is made by the sending of messages to recipients.
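Buffers and FIFOs are often combined in a fixed-size circular buffer that passes data from a producer task (for example a device driver) to a consumer task. This is a minimal sketch assuming integer items and an arbitrary capacity of 8. It contains no locking: if producer and consumer run as separate tasks, access must be protected, for instance with a mutex, or the message queues of the OS can be used instead.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_SIZE 8   /* capacity; arbitrary value for this sketch */

/* Fixed-size FIFO (circular buffer). No dynamic allocation, so every
 * operation takes a bounded, predictable time. */
struct fifo {
    int    items[FIFO_SIZE];
    size_t head;    /* index of the oldest item       */
    size_t count;   /* number of items currently held */
};

static void fifo_init(struct fifo *f) { f->head = 0; f->count = 0; }

/* Enqueue at the tail; refuse (instead of overwriting) when full. */
static bool fifo_put(struct fifo *f, int value)
{
    if (f->count == FIFO_SIZE)
        return false;
    f->items[(f->head + f->count) % FIFO_SIZE] = value;
    f->count++;
    return true;
}

/* Dequeue from the head: the first item put in is the first taken out. */
static bool fifo_get(struct fifo *f, int *value)
{
    if (f->count == 0)
        return false;
    *value = f->items[f->head];
    f->head = (f->head + 1) % FIFO_SIZE;
    f->count--;
    return true;
}
```

Refusing new data when the buffer is full is a design choice; a logger that only cares about recent samples might overwrite the oldest entry instead.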

Memory management

Terminology

When executing a task, the system needs RAM. This RAM is mainly used for code, data and IPC (inter-process communication) with other tasks. Memory management makes it seem as if the task is the only one using the memory while it runs. In summary, memory management has three main tasks:

making the memory appear larger than it physically is.
distributing the memory over various pages.
protecting the memory from access by other tasks.
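The second and third points rely on translating the addresses a task uses into physical addresses, page by page. The sketch below assumes 4 KiB pages and a single-level table of a hypothetical `struct pte`; real MMUs do this in hardware, usually with multi-level tables. A page that is not present models one that has been swapped out, which is how memory can appear larger than it is.

```c
#include <stdint.h>

/* Assumed parameters: 4 KiB pages, a tiny 16-entry table for one task. */
#define PAGE_SHIFT 12u
#define PAGE_SIZE  (1u << PAGE_SHIFT)   /* 4096 bytes */
#define NUM_PAGES  16u

struct pte {
    uint32_t frame;    /* physical page frame number               */
    int      present;  /* 0: page not in RAM (would cause a fault) */
    int      user_ok;  /* 0: access from this task not allowed     */
};

/* Translate a task's virtual address into a physical address.
 * Returns 0 on success, -1 on a page fault or protection violation. */
static int translate(const struct pte table[NUM_PAGES],
                     uint32_t vaddr, uint32_t *paddr)
{
    uint32_t page   = vaddr >> PAGE_SHIFT;      /* which page        */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);  /* where in the page */

    if (page >= NUM_PAGES || !table[page].present || !table[page].user_ok)
        return -1;
    *paddr = (table[page].frame << PAGE_SHIFT) | offset;
    return 0;
}
```

For instance, with page 1 mapped to frame 7, the virtual address 0x1234 (page 1, offset 0x234) translates to 0x7234.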

These general operating system requirements are not the main concerns of real-time systems. The real-time requirements that have to be fulfilled are:

Fast and deterministic memory management: The fastest memory management is, of course, no memory management at all. This means that the program can use all of the physical RAM directly. This approach is only useful when programming small embedded control systems.
Page locking: Each task gets a fixed number of pages in RAM. Pages that have not been used recently are swapped out by a non-deterministic process. When a task needs code or data from a swapped-out page, the system has to retrieve this page from disk, which often requires another page in RAM to be swapped out. The system therefore has to lock the pages of real-time tasks; otherwise there is a good chance that the necessary pages are swapped out while the task runs. Page locking is thus a Quality of Service measure which guarantees that each task gets the amount of memory it requires.
Allocation: A task may always need more memory at some point during execution. This extra memory is provided by a pool of free pages locked in physical RAM. One has to be careful with this approach: it is never guaranteed that there are enough free pages available in the memory pool.
Memory mapping: Real-time and embedded systems have to access peripheral devices. These devices store their data in on-board registers, which have to be mapped into the address space of the corresponding device driver task.
Memory sharing: Shared memory allows different tasks to communicate with each other. The operating system allocates and deallocates the shared memory and synchronizes access to it.
RAM disks: A RAM disk uses memory to emulate a hard disk: the memory is organized and accessed as a file system. The benefits of this kind of memory management are:

avoiding the non-deterministic overhead of accessing hard disks.
no extra cost.
no extra space needed.
no reduction of robustness caused by mechanical disk devices.

Flash disk: When a system has to preserve data while the power is switched off, the designer has to use flash disks. These can be rewritten repeatedly and keep their contents when there is no power. The flash memory is programmed from within the system itself.
Stripped libraries: RAM is scarce, so designers prefer to use stripped-down versions of general utility libraries, e.g. the C library or GUI libraries.
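Page locking and a pool of locked free memory can be combined as follows. This is a sketch under stated assumptions, not a production allocator: the block size and count are arbitrary, the pool is a static array locked with the POSIX `mlock()` call (an alternative is `mlockall(MCL_CURRENT | MCL_FUTURE)`, which locks the whole process), and it is not thread-safe as written. Locking may fail without sufficient privileges or a large enough memory-lock limit. Allocation and release take constant time, but as noted above the pool can still run out, so callers must handle `NULL`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/mman.h>

/* Assumed pool geometry for this sketch. */
#define BLOCK_SIZE 256
#define NUM_BLOCKS 32

/* A free block stores the link to the next free block in its own memory. */
union block {
    union block  *next;
    unsigned char bytes[BLOCK_SIZE];
};

static union block  pool[NUM_BLOCKS];   /* preallocated storage */
static union block *free_list;

/* Build the free list and try to lock the pool in physical RAM so it is
 * never swapped out. Returns the mlock() result (0 on success). */
static int pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < NUM_BLOCKS; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
    return mlock(pool, sizeof pool);
}

/* O(1) allocation: pop the first free block, or NULL if the pool is empty. */
static void *pool_alloc(void)
{
    union block *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

/* O(1) release: push the block back onto the free list. */
static void pool_free(void *p)
{
    union block *b = p;
    b->next = free_list;
    free_list = b;
}
```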

Shared memory in Linux

There are various ways to allocate shared memory in Linux. With RTLinux, allocation is allowed; with RTAI, it is not.

Allocation at boot time: a block of shared memory can be reserved at boot time, so that Linux cannot use it for general purposes. There is one drawback, though, which makes boot-time allocation a poor option for the average user: it is only available to code linked into the kernel image, so a device driver using this allocation can only be installed or replaced by rebuilding the kernel and rebooting the computer.
Allocation in kernel space: with this approach the memory does not have to be reserved at boot time, and there is no fixed limit on the memory size. Unfortunately, this allocation is not real-time.
Allocation in module: Linux has an option called ‘Loadable Module Support’, which means that kernel functionality can be divided into modules that can be loaded and called. These modules are pieces of code that can be inserted into the kernel's memory and removed again when desired. This approach is preferably used at boot time.

Device drivers

The main goal of a device driver is to allow communication between the operating system and the peripheral devices. It also provides a link between the kernel software and the user software. It is important to do this in a systematic way, which makes the interface more recognizable for the user. The driver is responsible for hiding the details of the hardware from the operating system and, ultimately, from the user. This section discusses a few of the available device drivers for embedded control systems.

Mechanism and policy

A device driver must provide mechanism, not policy. It has to supply the interfacing capabilities but nothing more (mechanism); how those capabilities are used is left to the user (policy). A driver therefore leaves the data alone: it should never alter a data stream. It only makes it possible to use a device, without prescribing how.
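Mechanism without policy can be illustrated with a uniform table of driver operations, similar in spirit to the `file_operations` table used by Linux drivers but much simplified. The `driver_ops` structure and the loopback device below are hypothetical: the driver only moves bytes between the caller and the device and never interprets or modifies them.

```c
#include <stddef.h>
#include <string.h>

/* Uniform driver interface: every driver fills in the same operations,
 * so applications can use all devices in the same way. */
struct driver_ops {
    int  (*open)(void *dev);
    long (*read)(void *dev, void *buf, size_t len);
    long (*write)(void *dev, const void *buf, size_t len);
    void (*close)(void *dev);
};

/* Hypothetical loopback device: whatever is written can be read back. */
struct loopback {
    unsigned char data[64];
    size_t len;
};

static int lb_open(void *dev) { ((struct loopback *)dev)->len = 0; return 0; }

/* Mechanism only: bytes are copied unchanged, with no scaling, filtering
 * or reformatting. What the bytes mean is decided by the caller. */
static long lb_write(void *dev, const void *buf, size_t len)
{
    struct loopback *lb = dev;
    if (len > sizeof lb->data)
        len = sizeof lb->data;
    memcpy(lb->data, buf, len);
    lb->len = len;
    return (long)len;
}

static long lb_read(void *dev, void *buf, size_t len)
{
    struct loopback *lb = dev;
    if (len > lb->len)
        len = lb->len;
    memcpy(buf, lb->data, len);
    return (long)len;
}

static void lb_close(void *dev) { (void)dev; }

static const struct driver_ops loopback_ops = {
    .open = lb_open, .read = lb_read, .write = lb_write, .close = lb_close,
};
```

Policy, such as converting raw samples to physical units or deciding when to read, stays in the application, which can drive every device through the same four calls.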

Available device drivers

Unix device drivers: In Unix a device driver is part of the Unix kernel, which makes drivers somewhat different from standalone programs. The kernel consists of base code plus driver code that lets the system communicate with the hardware. The following points have to be considered when writing a device driver in Unix:
- Namespace: before writing any code it is important to name your device.
- Memory allocation: memory is allocated with the kmalloc() function, which provides memory in pieces whose size is a power of two, up to a limit of 131056 bytes. kmalloc() also requires a second argument: the priority. kfree() releases the memory again.
- Character vs. block devices: these are the two main types of devices under Unix. Character devices use no buffer and block devices are accessed via a cache.
- Interrupts vs. polling: There is a difference between interrupt-driven device drivers and polling device drivers. In Unix the kernel is not a separate task. Each part or process of the system has its own copy of the kernel [citation needed]. When a process asks the system to execute some task on its behalf, the process is said to be "in kernel mode". The process has not transferred control to another process. In kernel mode it is obvious where to place the data, because the process is running in its own context. When interrupts occur, however, they alter this data placement and may cause macros [citation needed] (which allow the system to access user-space memory) to write over random memory or cause the kernel to crash. The driver has to schedule an interrupt, and while doing so it must provide temporary space to put information in. In a block device driver this temporary space is provided by the buffer. In a character device driver, the device driver itself is responsible for the allocation.
Complex device drivers: a normal device driver does nothing more than read and write a data stream from or to a peripheral device. When a device needs interrupts to communicate with the user's software, the driver tends to be more complex: such drivers need an ISR (interrupt service routine) and possibly also a DSR (deferred service routine). There is another reason why a device driver can become complex: some devices support shared memory or even Direct Memory Access (DMA), in which data is exchanged without involving the processor. The operating system must support DMA. When designing a complex device driver it is best to standardize the structure and the API for the device:
- API: when a device offers mechanisms similar to those of other devices, the designer has to use the same software interface.
- Structure: the device driver must be able to cope with several layers of interface functionality.
Comedi: Data acquisition cards are used for measurement and control purposes, and Comedi provides an interface for them. It strikes a balance between modularity and complexity. Here are some issues that designers have to keep in mind when dealing with Comedi:
- It works on a standard Linux kernel as well as on RTAI and RTLinux.
- It consists of two packages: Kcomedilib and Comedilib.
- The cards have the following features:
  - analog input/output channels.
  - digital input/output channels.
- The kernel space structure uses a specific hierarchy:
  - channel: represents the properties of one single data channel.
  - sub-device: a set of functionally identical channels.
  - device: a set of sub-devices, i.e. the interface card itself.
- The basic functionalities are:
  - instruction: synchronously perform one single data acquisition.
  - scan: repeated instructions on a number of different channels.
  - command: start or stop the data acquisition.
- It allows querying the capabilities of the installed Comedi devices.
- It offers event-driven functionality.
- It provides locking primitives when multiple devices are accessed at the same time.

Writing a Comedi device driver and the different steps in the writing process are explained in the Comedi documentation.

Real-time serial line drivers: these device drivers are integrated into RTAI. They provide:
- highly configurable address initialization,
- interrupt handling,
- buffering, callbacks, non-intrusive buffer inspection, ...
Real-time parallel port drivers: these drivers are integrated into Comedi. They allow a real-time interrupt handler to be connected to the interrupt of a parallel port.
Real-time networking drivers: integrated into RTAI, these drivers provide a common programming interface between the RTOS and the device drivers of Ethernet networks.

POSIX

When developing an operating system, following established standards has many advantages. Software is developed in teams, and standards make the design easier, faster and more transparent. Standards increase the portability of existing applications to new hardware. They also facilitate the cooperation of different pieces of software.

POSIX or "Portable Operating System Interface" is the collective name of a family of related standards specified by the IEEE to define the application programming interface (API) for software compatible with variants of the Unix operating system. POSIX specifies the user and software interfaces to the OS in some 15 different documents. For real-time systems, POSIX has the following standards:

Standard	Name	Description
1003.1b	Real-time Extensions	Functions needed for real-time systems; includes support for: real-time signals, priority scheduling, timers, asynchronous I/O, prioritized I/O, synchronized I/O, file sync, mapped files, memory locking, memory protection, message passing, semaphores
1003.1d	Additional Real-time Extensions	Additional interfaces; includes support for: new process create semantics (spawn), sporadic server scheduling, execution time monitoring of processes and threads, I/O advisory information, timeouts on blocking functions, device control
1003.1j	Advanced Real-time Extensions	More real-time functions including support for: typed memory, nanosleep improvements, barrier synchronization, reader/writer locks, spin locks, and persistent notification for message queues
1003.21	Distributed Real-time	Functions to support real-time distributed communication; includes support for: buffer management, send control blocks, asynchronous and synchronous operations, bounded blocking, message priorities, message labels, and implementation protocols
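As an example of the 1003.1b timer interfaces, a periodic real-time task is commonly written with `clock_nanosleep()` and an absolute wake-up time, so that the period does not drift with the execution time of the work. This is a minimal sketch with a hypothetical `run_periodic()` helper; a real application would usually also give the thread a real-time priority (for example `SCHED_FIFO` via `pthread_setschedparam()`) and lock its memory, which typically requires privileges and is omitted here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Add nanoseconds to a timespec, keeping tv_nsec normalized. */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Run `work` `cycles` times with the given period. Each wake-up time is
 * computed from the previous one, not from "now", so jitter in the work
 * does not accumulate. Returns 0, or the error from clock_nanosleep(). */
static int run_periodic(long period_ns, int cycles, void (*work)(void))
{
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < cycles; i++) {
        timespec_add_ns(&next, period_ns);
        work();
        int rc;
        do {   /* clock_nanosleep() returns the error number directly */
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```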

POSIX (in the 1003.13 standard) defines profiles for real-time systems, based on the number of processes and the presence of a file system.

Profile	Number of processes	Threads	File system
PSE54	Multiple	Yes	Yes
PSE53	Multiple	Yes	No
PSE52	Single	Yes	Yes
PSE51	Single	Yes	No
