Aros/Developer/Docs/Libraries/Exec

This library manages all the other libraries, devices, resources, interrupts, applications and the computer's memory.

Exec is the heart and soul of the OS. It provides preemptive multitasking, built-in linked list primitives, message passing primitives, I/O primitives, etc., and is heavily reused throughout the system. If you understand Exec, you'll understand the low-level innards of the rest of the system.

Effectively, most of AmigaOS, and by extension AROS, is built on extensive use of message passing, which uses signals to indicate when a new message is available, and on semaphores for safe access to certain data structures.

This is how the Amiga dealt with the issue of portable code - all of the code was written using offsets into a pointer to the "base" of a library and there were no fixed addresses, other than address 4. Every library has a call table of 6 bytes per entry (a jump instruction) that then vectors off into the correct entry point. It was easy to write your own library, provided you had the correct format for your "header". Device drivers "borrowed" the structure of libraries and used the same concept to provide default I/O handlers.
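As a rough illustration of the jump-table layout described above, the sketch below computes where a vector sits relative to the library base. It assumes the classic m68k layout with 6-byte entries; the helper names lvo_offset() and lvo_address() are hypothetical and do not come from any Exec header.

```c
#include <assert.h>
#include <stddef.h>

/* Size of one jump-table entry on classic m68k: a 2-byte JMP opcode
   followed by a 4-byte absolute address. */
#define LVO_ENTRY_SIZE 6

/* Hypothetical helper: byte offset, relative to the library base, of the
   n-th vector (n starts at 1). Vectors live below the base, so the
   offset is negative. */
static long lvo_offset(unsigned n)
{
    assert(n >= 1);
    return -(long)LVO_ENTRY_SIZE * (long)n;
}

/* Hypothetical helper: address of the n-th vector for a given base. */
static const unsigned char *lvo_address(const unsigned char *base, unsigned n)
{
    return base + lvo_offset(n);
}
```

On classic AmigaOS the first four vectors (Open, Close, Expunge and a reserved slot) occupy offsets -6 to -24, so the first library-specific function is found at lvo_offset(5), i.e. -30.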

Packing of structures: for structs defined by the API of OS3.x libraries, MOS and OS4 pack them in an m68k-compatible way (mostly on two-byte boundaries), while other structs are packed in the native PPC way (4-byte data on 4-byte boundaries, etc.). We even had a solution for implementing this in the AROS includes, using #include to enable the packing:

#include <aros/compat_packing_on.h>

/* Definition of structs with compat packing */

#include <aros/compat_packing_off.h>

The contents of aros/compat_packing_on.h and aros/compat_packing_off.h could then depend on the architecture and compiler.

Another thing to fix is the way LVO tables are set up and the stub code to fake m68k entry of the library functions.

To summarize, the ppc arch would use native packing for all structs and a pointer table for the LVOs, without special stub code. The ppcmos arch would apply special packing to the structs that need it and provide the MOS LVO table convention for shared libraries.

Move SysBase away from 4L. This will additionally provide full debug info for displaying KS function names in crash backtraces.

Because our kickstart (KS) is modular, parts of it can be made fully portable. There are three parts:

BSP - board-support package. Kernel, exec, hardware drivers.
Strap - this part differs only between hosted and native. It includes bootloader.resource and dosboot.resource. On native it also includes resident filesystems (AFS, FFS, CDVDFS).
Base. Core modules (graphics, utility, intuition, dos, etc). This part is assumed to be fully machine-independent.

Currently all hosted kickstarts and x86-64 native kickstart follow this scheme. It gives much more comfortable code maintenance. If you want to add some new machine-independent module (last time it was filesystem.resource for example), you add it in one place, and all ports get it. When you test something on one port, you are 100% sure that any other port will behave in the same way.

Other ports (i386, m68k and ppc native) to follow. Currently, m68k-amiga violates this specification because it has hardware-specific code in graphics.library.

Memory


When an application needs memory, it can either allocate a little itself or ask Exec to find a suitable area of memory that meets its requirements.

#include <proto/exec.h>
APTR AllocMem( ULONG size, ULONG flags );
void FreeMem( APTR memory, ULONG size );   /* returns no status; size must match the allocation */
APTR AllocVec( ULONG size, ULONG flags );   /* remembers the allocation size */
void FreeVec( APTR memory );   /* returns no status */

flags
MEMF_TOTAL - Used with AvailMem(): report all memory, not just what is currently free.
MEMF_CLEAR - The allocated memory area is initialized with zeros.
MEMF_LOCAL - Get memory that will not be flushed if the computer is reset.
MEMF_CHIP - Get memory that is accessible by the graphics and sound chips. Required for some functions.
MEMF_FAST - Get memory that is not accessible by the graphics and sound chips. You should normally not set this flag.
MEMF_PUBLIC - This flag must be set if the memory you allocate is to be accessible by other tasks.
MEMF_REVERSE - If this flag is set, the order of the search for empty memory blocks is reversed.
MEMF_NO_EXPUNGE - Normally, if not enough free memory is found, AROS tries to free unused memory (for example by expunging unused libraries). Setting this flag disables that behaviour.
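The practical difference between the AllocMem()/FreeMem() and AllocVec()/FreeVec() pairs listed above is who keeps track of the block size. The following is a minimal portable sketch of that idea, not the Exec implementation: every demo_ name is hypothetical, and calloc()/free() stand in for the real allocator so the example runs anywhere.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for AllocMem()/FreeMem(), backed by calloc()/free()
   so the sketch runs anywhere. */
static void *demo_alloc_mem(size_t size)
{
    return calloc(1, size);
}

static void demo_free_mem(void *memory, size_t size)
{
    (void)size;   /* the real FreeMem() needs the size; free() does not */
    free(memory);
}

/* Header placed in front of every "vec" block. The extra union members only
   serve to keep the user memory that follows suitably aligned. */
typedef union {
    size_t      total;   /* header + user bytes */
    long double pad_ld;
    void       *pad_ptr;
} demo_vec_header;

static void *demo_alloc_vec(size_t size)
{
    demo_vec_header *h;

    if (size > (size_t)-1 - sizeof(demo_vec_header))
        return NULL;                          /* size + header would wrap around */
    h = demo_alloc_mem(size + sizeof(demo_vec_header));
    if (h == NULL)
        return NULL;
    h->total = size + sizeof(demo_vec_header);
    return h + 1;                             /* user memory starts after the header */
}

static void demo_free_vec(void *memory)
{
    demo_vec_header *h;

    if (memory == NULL)
        return;
    h = (demo_vec_header *)memory - 1;        /* step back to the header */
    demo_free_mem(h, h->total);               /* the size comes from the header */
}

/* Hypothetical helper: number of user bytes recorded for a vec block. */
static size_t demo_vec_size(const void *memory)
{
    assert(memory != NULL);
    return ((const demo_vec_header *)memory - 1)->total - sizeof(demo_vec_header);
}
```

The price of the convenience is the header in front of each block, which is why storing the size can get in the way when a block also has to meet strict alignment requirements.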

Exec provides the routines AllocEntry() and FreeEntry() to allocate multiple memory blocks in a single call via MemList.

struct	MemList {
    struct  Node ml_Node;
    UWORD   ml_NumEntries;	/* number of entries in this struct */
    struct  MemEntry ml_ME[1];	/* the first entry	*/
};

struct	MemEntry {
    union {
        ULONG   meu_Reqs;	/* the AllocMem requirements */
        APTR    meu_Addr;	/* the address of this memory region */
    } me_Un;
    ULONG   me_Length;		/* the length of this memory region */
};

Memory allocated with these functions must be freed after use.

KickMemPtr/KickTagPtr/KickCheckSum support (reset-proof residents) is now implemented and committed. Note that it currently calls InitResident() on them immediately. They should instead be injected into the ResModules list (in the correct order, too), which isn't trivial: KickTags can be processed only after autoconfig devices have been configured (SysBase or KickMemTags can be located in autoconfig fast RAM), and the ResModules list is quite an annoying and stupid list.

Actually, no Exec function should fill in Dos/IoErr(). OpenLibrary() (when overridden by LdDaemon) being the notable exception. AllocMem() is not patched by DOS and won't ever set any DOS error codes. Only those exec functions that are patched by DOS may change IoErr() but it is only a side-effect, it is not a supported way to get error codes.

The new memory system apparently treats the mh_Upper field in struct MemHeader as the top-most valid address within the memory area, whereas the old implementation and AOS treat it as one byte higher than the last valid address.
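The two readings of mh_Upper differ by exactly one byte when the size of a region is computed. A small sketch of both conventions follows; the function names are hypothetical, and which convention a given build actually uses has to be checked against its sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Classic convention: upper is one byte past the last valid address. */
static size_t region_size_exclusive(uintptr_t lower, uintptr_t upper)
{
    assert(upper >= lower);
    return (size_t)(upper - lower);
}

/* Convention the text attributes to the new memory system: upper is the
   last valid address itself. */
static size_t region_size_inclusive(uintptr_t lower, uintptr_t upper)
{
    assert(upper >= lower);
    return (size_t)(upper - lower) + 1;
}
```

For a 4KB region starting at 0x1000, the classic convention stores 0x2000 in the upper bound while the inclusive one stores 0x1FFF. Mixing the two yields off-by-one errors at the end of the region.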

Our AutoDoc for AllocMem() states that the returned memory is aligned to sizeof(struct MemChunk), but the AmigaOS 3.x AutoDoc only guarantees that "The memory block returned is long word aligned." AmigaOS 3.x does in fact return MemChunk-aligned blocks, but are we not subtly changing the API here? No: MemChunk alignment is a superset of LONG alignment (all MemChunk-aligned blocks are also LONG aligned), so this is not an issue. It would only be a problem if a MemChunk-aligned block were not long word aligned, and MemChunk is a multiple of IPTR. There was a weird set of macros previously, but they have been simplified, so the alignment is now either AROS_WORSTALIGN or sizeof(struct MemChunk), whichever is larger. And struct MemChunk is intentionally one APTR and one IPTR, so it is also a multiple of sizeof(IPTR). See the MEMCHUNK_TOTAL definition in exec/memory.h.

If someone wanted to port AROS software to one of the other OSes, and that OS had a memory manager that only provided LONG-aligned blocks, it could be a problem if the software being ported relied on blocks being MemChunk aligned (the application programmer shouldn't need to know what a MemChunk is anyway). It is the task of the programmer to make their code compatible with the OSes it should run on. We could add a phrase like the following to the AllocMem() AutoDoc: 'For compatibility with other Amiga-like OSes, programmers should only count on a LONG-aligned memory block, as documented in the original AmigaOS 3.x.'

struct	MemChunk {
    struct  MemChunk *mc_Next;	/* pointer to next chunk */
    ULONG   mc_Bytes;		/* chunk byte size	*/
};

struct	MemHeader {
    struct  Node mh_Node;
    UWORD   mh_Attributes;	/* characteristics of this region */
    struct  MemChunk *mh_First; /* first free region		*/
    APTR    mh_Lower;		/* lower memory bound		*/
    APTR    mh_Upper;		/* upper memory bound+1	*/
    ULONG   mh_Free;		/* total number of free bytes	*/
};

MEMF_CHIP refers to memory needed by the (amiga) CHIPset. IMHO using it for the memory needed by a graphics cards chipset is no different. Graphics memory is not system memory at all, since, for example, you cannot execute code from there in most cases.

Many programs supply the MEMF_CHIP flag when they allocate buffers for planar graphics data. In system-friendly software, this data is then fed to graphics/intuition. Even our graphics.library uses CHIP allocations inside AllocBitMap(), to stay compatible with Amiga hardware. Consequently, if we don't declare any RAM as CHIP (thus leaving CHIP as a VRAM indicator), then we cannot do these allocations. The alternative is to declare all RAM as CHIP; in this case we have no problem with AllocMem()s for planar graphics, and the CHIP flag exists purely for backwards compatibility. Yes, GFX card memory then needs to be queried in some other way.

Memory Pools

When allocating memory, the Exec functions do not check whether the requested byte size has wrapped around (ULONG). This can happen if someone tries, for some unknown reason, to allocate far too much memory and the byte size grows internally beyond 2^32. Instead of failing, the allocation could succeed, but with a messed-up byte size. This might not be a real-world scenario, though, and so not a real problem...
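The wrap-around is easy to reproduce with plain 32-bit arithmetic. The sketch below is only an illustration with hypothetical function names: it contrasts an unchecked size adjustment with one that refuses to wrap, using uint32_t in place of ULONG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unchecked: mirrors the problem described above. With 32-bit sizes the
   sum silently wraps around. */
static uint32_t adjusted_size_unchecked(uint32_t bytesize, uint32_t overhead)
{
    return (uint32_t)(bytesize + overhead);
}

/* Checked: returns 1 and stores the sum on success, 0 if it would wrap. */
static int adjusted_size_checked(uint32_t bytesize, uint32_t overhead,
                                 uint32_t *out)
{
    assert(out != NULL);
    if (bytesize > UINT32_MAX - overhead)
        return 0;
    *out = bytesize + overhead;
    return 1;
}
```

A request for 0xFFFFFFF0 bytes plus 0x20 bytes of internal overhead becomes a request for just 0x10 bytes in the unchecked version, which is exactly the "allocation done, but with messed up bytesize" case.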

I'd also like to know if someone is extending the memory allocation routines. I need something that can allocate memory that is aligned and also fits within certain boundaries. It sounds like you need a slab allocator - one can be trivially written as a wrapper around AllocMem().

A slab is a large chunk of memory, split into equal-size blocks, with memory usage tracked by a bitmap.
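To make the description concrete, here is a minimal single-slab sketch with a 32-bit usage bitmap. It is not the AmigaOS 4.x interface or any existing AROS code: the names and the sizes (32 blocks of 64 bytes) are assumptions chosen for illustration, and the storage is embedded in the struct instead of coming from AllocMem().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLAB_BLOCKS     32u   /* blocks per slab; one bit each in the bitmap */
#define SLAB_BLOCK_SIZE 64u   /* every block has the same size */

struct slab {
    uint32_t      used;                                    /* bit n set = block n in use */
    unsigned char storage[SLAB_BLOCKS * SLAB_BLOCK_SIZE];  /* the large chunk */
};

static void slab_init(struct slab *s)
{
    s->used = 0;
}

/* Hand out the first free block, or NULL if the slab is full. */
static void *slab_alloc(struct slab *s)
{
    unsigned i;

    for (i = 0; i < SLAB_BLOCKS; i++) {
        if ((s->used & (UINT32_C(1) << i)) == 0) {
            s->used |= UINT32_C(1) << i;
            return s->storage + (size_t)i * SLAB_BLOCK_SIZE;
        }
    }
    return NULL;
}

/* Return a block to the slab by clearing its bit. */
static void slab_free(struct slab *s, void *block)
{
    size_t offset = (size_t)((unsigned char *)block - s->storage);
    size_t index  = offset / SLAB_BLOCK_SIZE;

    assert(offset % SLAB_BLOCK_SIZE == 0 && index < SLAB_BLOCKS);
    assert(s->used & (UINT32_C(1) << index));   /* must currently be allocated */
    s->used &= ~(UINT32_C(1) << index);
}
```

Allocation and freeing are just bit operations, which is where the speed comes from. The cost is that every object occupies a full block, whatever its real size.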

AmigaOS 4.x has an interface for this - we may want to implement an API-compatible version.

The main benefit of slab allocators is the speed, but they add quite a bit of extra memory overheads too. So if you think you're going to fix out of memory problems I suspect you'll be surprised.

Not only that. Slab allocators are great if they are used for system objects, but cannot be really applied for any other use. They are, per definition, meant to simplify and speed up allocation of objects of the same size.

The buddy allocator used in Linux (the kernel) is very fast - but also wastes lots of memory (I think it's even worse than a slab allocator). It just doesn't really matter, as the kernel isn't using huge amounts of memory and it isn't used as a general allocator.

Well, the buddy allocator is just there to get the raw memory (MMU pages), which is further assigned to an address space. A great benefit is that buddy always gives you memory aligned to the allocation size boundary. The disadvantage is that the size is a power of two (starting with the MMU page - 4K - as the smallest allocation quantity).
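The size and alignment properties mentioned above follow directly from the arithmetic of a buddy scheme. The sketch below shows only that arithmetic, assuming a 4K page; the function names are hypothetical and no free-list management is included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUDDY_PAGE_SIZE 4096u   /* smallest allocation quantity, as in the text */

/* Order 0 = one page, order 1 = two pages, and so on. Returns the smallest
   order whose block can hold `size` bytes. */
static unsigned buddy_order(size_t size)
{
    unsigned order = 0;
    size_t   block = BUDDY_PAGE_SIZE;

    assert(size > 0 && size <= SIZE_MAX / 2);
    while (block < size) {
        block <<= 1;
        order++;
    }
    return order;
}

static size_t buddy_block_size(unsigned order)
{
    return (size_t)BUDDY_PAGE_SIZE << order;
}

/* Offset of the "buddy" of a block: flip the bit that equals the block
   size. Offsets are relative to the start of the managed area. */
static size_t buddy_of(size_t offset, unsigned order)
{
    assert(offset % buddy_block_size(order) == 0);   /* blocks are size-aligned */
    return offset ^ buddy_block_size(order);
}
```

A 10000-byte request is served from a 16384-byte block, which illustrates the internal waste, while the XOR in buddy_of() only works because each block is aligned to its own size.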

1. Scales nicely from few MB of managed memory up to huge amounts - I've successfully tested it with 12GB AFAIR.
2. The allocation time, including all locking, takes about 250 cpu cycles - less than 0.1 microsecond on 2.5 GHz cpu.
3. Freeing memory is slightly faster - 0.09 microsecond
4. Memory overhead for buddy itself: 0.4%

Buddy itself does not have to waste a lot of memory, but then it is rather slow. The last figure is significant: 0.4% means about 16MB of wasted RAM on a 4GB machine… Reducing it, however, makes the whole buddy allocator awfully slow.

Here's a short list of allocations, with their alignments and boundaries, for which I need something better:

                                            Max Size    Boundary        Alignment
        Device Context Base Address Array   2048        PAGESIZE        64
        Device Context                      2048        PAGESIZE        64
        Input Control Context               64          PAGESIZE        64
        Slot Context                        64          PAGESIZE        32
        Endpoint Context                    64          PAGESIZE        32
        Stream Context                      16          PAGESIZE        16
        Stream Array (Linear)               1MB         None            16
        Stream Array (Pri/Sec)              4KB         PAGESIZE        16
        Transfer Ring segments              64KB        64KB            16
        Command Ring segments               64KB        64KB            64
        Event Ring segments                 64KB        64KB            64
        Event Ring Segment Table            512KB       None            64
        Scratchpad Buffer Array             248         PAGESIZE        64
        Scratchpad Buffers                  PAGESIZE    PAGESIZE        PAGESIZE

All of our drivers need to align the memory (if needed) by themselves, and this could lead to a mess. The page size could be user defined and not necessarily the same as the CPU/system one, or is this available somewhere?

Drivers aligning memory by themselves leads to more memory usage than is needed. If I were to allocate 64KB on a 64KB boundary with an alignment of 16 bytes, I'd actually have to allocate at least 128KB.

Is memory fragmentation a real issue with AROS? If not, I'd be happy with something like AllocVecAligned(bytesize, alignment, boundary) (allocating from memory accessible to PCI devices) instead of the OS stealing memory it might never use. Aligned memory allocations would mostly be used in driver code, and drivers tend to keep their memory and never release it (until a soft reboot); I'm not sure though...

Memory fragmentation is a HUGE issue in AROS - on limited memory systems. It's just masked by the fact that most AROS ports are running on systems with enormous quantities of memory.

http://www.morphos-team.net/tlsf (but previously available also AFAIK on Aminet or elsewhere):

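/* Over-allocate, free, then re-allocate the aligned sub-block with AllocAbs().
   align must be a power of two. */
void *allocaligned(ULONG size, ULONG flags, ULONG align)
{
  void *ptr = AllocMem(size + align - 1, flags & ~MEMF_CLEAR);
  if (ptr)
  {
    /* Round the address up to the next multiple of align (pointer-sized math) */
    IPTR alignptr = ((IPTR) ptr + align - 1) & ~((IPTR) align - 1);
    Forbid();   /* nobody else may grab the block between FreeMem() and AllocAbs() */
    FreeMem(ptr, size + align - 1);
    ptr = AllocAbs(size, (APTR) alignptr);
    Permit();
    if (ptr && (flags & MEMF_CLEAR))
      memset(ptr, 0, size);
  }
  return ptr;
}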

IMHO an allocation with a 64KB boundary still has only its explicit 16-byte alignment... A boundary only means that the allocated memory cannot cross a certain address multiple; it does not mean that the memory needs to be aligned to the size of the boundary.

Hardware uses such restrictions because it then does not need to update the full memory address register: with a 64KB boundary it only needs to update the low 16 bits, and thus the memory it addresses can't cross that border.

In my case, I have an "Event Ring Segment Table" that contains 64-bit addresses (bits 0-5 are reserved and unused, hence 64-byte alignment) pointing to "Event Ring segments". Those segments cannot cross a 64KB boundary, although the lower 16 bits of the address are given (minus bits 0-5). A boundary is not an alignment.
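The distinction between boundary and alignment can be written down as two separate checks. The following sketch uses hypothetical helper names and plain integer addresses: it tests whether a block crosses a boundary and finds the first usable address inside a raw region. It assumes power-of-two values with align <= boundary and size <= boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int is_pow2(size_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Does the block [addr, addr + size) cross a multiple of `boundary`? */
static int crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    assert(size > 0 && is_pow2(boundary));
    return (addr / boundary) != ((addr + size - 1) / boundary);
}

/* First address >= start that is `align`-aligned and where a block of
   `size` bytes does not cross a `boundary` multiple. */
static uintptr_t place_block(uintptr_t start, size_t size,
                             size_t align, size_t boundary)
{
    uintptr_t addr;

    assert(is_pow2(align) && is_pow2(boundary));
    assert(align <= boundary && size > 0 && size <= boundary);

    addr = (start + (align - 1)) & ~(uintptr_t)(align - 1);   /* alignment only */
    if (crosses_boundary(addr, size, boundary))
        addr = (addr / boundary + 1) * boundary;              /* skip to next boundary */
    return addr;
}
```

A 32-byte block at 0x1FFE0 is fine for a 64KB boundary even though it is nowhere near 64KB aligned. Conversely, a full 64KB block whose raw region starts at 0x10010 can only be placed at 0x20000, so the raw region has to reach 0x30000, just under 128KB in total. That is the over-allocation a driver pays when it has to do the placement itself.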

Don't go partially deallocating memory, as that would trash (or render useless) the mungwalls around the original allocation; and even if it did not, it is still the OS's job to provide decent routines to allocate things cleanly.

I'd prefer to just have this all handled via an AllocMem() that takes tags as input (we already have MEMF_HWALIGNED). Huh? Why? Why not rather AllocMemTags()? I don't really like such circular dependencies, i.e. exec.library depending on utility.library, which depends on exec.library...

#define MEMF_TAGGED    (1L << 21)    /* Additional allocation tags */

APTR AllocMem(IPTR size, ULONG flags, ...);

  buff = AllocMem(680, MEMF_CLEAR | MEMF_ANY);

  IPTR hw_addr;
  hw_buff = AllocMem(1024*4, MEMF_CLEAR | MEMF_TAGGED,
                             MEMT_Alignment, 1024,
                             MEMT_BusDevice, pcidev,
                             MEMT_BusDMAAddress, &hw_addr,
                             TAG_END);

It does not need to allocate on the boundary nor browse the memory lists. 'prefix' is the bits that must be the same throughout the allocation, and is equivalent to the boundary. 'valid_bits' combines the address size and the alignment.

APTR AllocMemAligned(IPTR size, ULONG flags, UQUAD prefix, UQUAD valid_bits)
{
   APTR mem;
   IPTR alloc_size;
   UQUAD addr;

   /* Check if a full 64-bit pointer is allowed */

   if((valid_bits & 0x8000000000000000ULL) == 0)
   {
      flags |= MEMF_31BIT;
      valid_bits &= 0x7fffffffULL;
   }

   /* Allocate a block large enough to definitely include a suitable
      sub-block */

   alloc_size = (size << 1) + (IPTR)~valid_bits;
   mem = AllocMem(alloc_size, flags);

   /* Shift starting position of sub-block to fit alignment etc.;
      allocate sub-block and give back the rest */

   if(mem != NULL)
   {
      Forbid();
      /* AllocMem() blocks are released with FreeMem() and the same size */
      FreeMem(mem, alloc_size);
      addr = (UQUAD)(IPTR)mem;   /* do the bit arithmetic on an integer */
      if((addr & ~valid_bits) != 0)
         addr = ((addr & valid_bits) | ~valid_bits) + 1;
      if((addr & prefix) != ((addr + size) & prefix))
         addr = (addr + size) & prefix;
      mem = AllocAbs(size, (APTR)(IPTR)addr);
      Permit();
   }

   return mem;
}

I have already written some kind of private AllocVecAligned() function, but IMHO it is a bad thing for driver code to go and browse the memory lists by itself. At the moment, though, it is the only viable option: trying to allocate ON the boundary could exhaust ALL of the memory regions lying ON a boundary on very limited systems with very little memory. The problem with a #?Vec#? function instead of a #?Mem#? one is that having to store the size may upset the alignment and thus waste space. I like AllocVec() too, but IMHO a #?Mem#? function isn't inconvenient for allocating DMA blocks etc., which are often a constant size.

The Nouveau driver will add gfx card memory to the memory lists; that region is not accessible from my PCI(-e) device and has to be skipped, so only true system memory will be used.

What is the reason for alignOffset? It's for the cases where you need to store something aligned in memory, but also need to store additional data before the aligned area begins.

APTR AllocVecAligned(ULONG bytesize, ULONG requirements, ULONG alignment, ULONG boundary) {
    D(bug("AllocVecAligned(%ld, %lx, %lx, %lx)\n", bytesize, requirements, alignment, boundary));

    UBYTE  *ret;
    ULONG   bytesize_adjusted;

    struct MemHeader *mh;
    struct MemChunk *mc=NULL, *p1, *p2;

    if( ((!bytesize) || (bytesize>boundary)) )
        return NULL;

    /* Add room for stored size */
    bytesize_adjusted = bytesize + AROS_ALIGN(sizeof(ULONG));

    /* Round to a multiple of MEMCHUNK_TOTAL */
    bytesize_adjusted = AROS_ROUNDUP2(bytesize_adjusted, MEMCHUNK_TOTAL);

    /* Be really anal about the size */
    if(bytesize_adjusted<bytesize)
        return NULL;

    Forbid();

My aligned first-fit allocation would browse the memory list for a region fulfilling the requirements, then check whether a chunk contains enough space for bytesize_adjusted. If so, it would add sizeof(ULONG) to the pointer, align the pointer to the alignment value, and check that the original bytesize would not span the boundary.

bytesize_adjusted needs to be tweaked a bit more for this to work. IIRC all allocations are on a 64-byte boundary, which means that, for the new allocation not to break the alignment, allocations must start and end at that alignment. The overhead would then be 128 bytes, which is still better than manually aligning without AllocAbs().

Aligned allocations are slow for first fit allocation scheme.

Altering the allocation scheme is way out of my league, and anyway it would need some sort of consensus if it were ever to happen. The main benefit of a TLSF type of allocation scheme (if I understand the concept correctly) is that there is no need to search for a block with enough space: the allocator keeps separate lists for different sizes of free block and therefore returns the best fit for an allocation. This can lead to very small free blocks being left over after the allocation (if the allocated size is less than the chunk size), whereas first fit takes the memory from the first suitable block and returns the rest as free. In my case the first block is the rest of the system memory, as my driver gets loaded very early and nothing has freed memory yet; its size is almost the full 2GB of my AROS box's memory (MEMF_FAST).

Speed of allocations is somewhat irrelevant: why would anyone allocate memory while doing something that requires speed? The fragmentation level can be lowered by using memory pools, but they are not suitable for aligned allocations. Speed of allocation does matter a lot. Think about system startup and all the other cases where thousands or even millions of memory allocations are done. Speed really does matter there.

MorphOS alignOffset is still in my mind kind of useless, why not allocate extra data at the end or somewhere else? Because you may want the hardware structure to be a field within the software structure. For example, in Poseidon's OHCI implementation, the software ED and TD structures begin with software-only fields, followed by the hardware structure. Only the hardware structure needs to be aligned. In the split-off OHCI driver I'm working on, these software structures now begin with struct MinNodes, so they have to go before the hardware structures to allow easy placement within struct Lists (unless awkward workarounds are to be used).

MorphOS aligned allocation also lacks the boundary requirement which is a MUST HAVE for hw-driver coding. However, this limitation can probably be worked around quite easily. For example, if the alignment is 128 bytes and the boundary is 1024 bytes, you could round up the requested size to the next power-of-two that's greater-than-or-equal-to the alignment, and use that as the requested alignment. Thus if the needed size rounds up to 512, you request 512 byte alignment, and the boundary won't be crossed.
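The workaround described above can be sketched as a small calculation. The helper names are hypothetical, and the result is only meaningful when all values are powers of two and the rounded-up size does not exceed the boundary; in that case a block aligned to the returned value can never straddle a boundary multiple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest power of two that is >= v. */
static size_t next_pow2(size_t v)
{
    size_t p = 1;

    assert(v > 0 && v <= SIZE_MAX / 2 + 1);
    while (p < v)
        p <<= 1;
    return p;
}

/* Alignment to request so that a block of `size` bytes cannot cross
   `boundary`. Returns 0 if the trick does not apply (block too large). */
static size_t alignment_for_boundary(size_t size, size_t align, size_t boundary)
{
    size_t a = next_pow2(size);

    if (a < align)
        a = align;            /* never go below the required alignment */
    return (a <= boundary) ? a : 0;
}

/* Check used to confirm the property. */
static int crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    return (addr / boundary) != ((addr + size - 1) / boundary);
}
```

For a 400-byte structure with 128-byte alignment and a 1024-byte boundary the function returns 512, matching the example in the text. The trade-off is that the stricter alignment makes suitable free blocks harder to find.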

AROS drivers that could benefit from aligned allocations include HDAudio, the USB drivers, AHCI and probably others. They all have requirements for aligned allocations, and some of our ethernet drivers may need them too; for now the aligning is implemented in their own code. An aligned allocation function would be useful.

As far as I can see, all graphics memory management is currently implemented using Allocate() on its own private MemHeader, stored in VRAM. Well, Allocate() uses a slow algorithm. Additionally, what if Nouveau relies on its own allocator? In that case, what if we introduce the ability to implement memory pool handlers, e.g. as classes? One such class would be Exec's pool manager. AllocPooled()/FreePooled() could then be used to perform allocations in VRAM. If e.g. Nouveau uses its own allocator, it exposes it as a class plus an object pointer (the pool itself), so that the pool functions can use it.

Libraries

Libraries are made up of four parts: a Library Node, a function (vector) table, set of functions and global data for the library concerned.

struct Library {
    struct Node lib_Node;
    UBYTE lib_Flags;
    UBYTE lib_pad;
    UWORD lib_NegSize;
    UWORD lib_PosSize;
    UWORD lib_Version;
    UWORD lib_Revision;
    APTR lib_IdString;
    ULONG lib_Sum;
    UWORD lib_OpenCnt;
};

Library functions without correct AROS_LH-style register arguments are currently assumed to take stack-based arguments. To fix this, add a .conf file and AROS_LH(). Some libraries use SDI includes for portability.

Talking about private AROS structures in library bases. (No one cares about some extra LVOs, it is only 6 bytes/LVO). If they are private to AROS then there really is no reason we cannot dynamically allocate them as demand requires then. But if they are initialized to a certain value, like most of the OOP arrays you will need to store the initialization value somewhere. Allocate them with enough space to store it also? One (perhaps ugly) way would be to allocate storage + array, store the initialization value, record the pointer to array itself in the required structure - and access the initialization value via a macro?

It is aroscbase that is set to NULL (I guess it is also freed) when the original process exits, while it is of course still in use by the detached process. Doesn't the app use detach.o? It should, to avoid such problems.

The problem is not arosc.library per se, but its autoopening. Of course, when the program exits the symbol sets are traversed to close all the libraries that were autoopened, and this causes arosc.library to be closed as well.

APTR SetFunction( struct Library *lib, LONG funcOffset, APTR funcEntry );   /* replace a library vector; returns the old entry point */

Tasks

Exec arranges the time slicing (multitasking) for various tasks depending on their priority and need.

struct Task  {
    struct Node tc_Node;
    UBYTE       tc_Flags;
    UBYTE       tc_State;
    BYTE        tc_IDNestCnt;   /* intr disabled nesting */
    BYTE        tc_TDNestCnt;   /* task disabled nesting */
    ULONG       tc_SigAlloc;    /* sigs allocated */
    ULONG       tc_SigWait;     /* sigs we are waiting for */
    ULONG       tc_SigRecvd;    /* sigs we have received */
    ULONG       tc_SigExcept;   /* sigs we will take excepts for */
    UWORD       tc_TrapAlloc;   /* traps allocated */
    UWORD       tc_TrapAble;    /* traps enabled */
    APTR        tc_ExceptData;  /* points to except data */
    APTR        tc_ExceptCode;  /* points to except code */
    APTR        tc_TrapData;    /* points to trap data */
    APTR        tc_TrapCode;    /* points to trap code */
    APTR        tc_SPReg;       /* stack pointer */
    APTR        tc_SPLower;     /* stack lower bound */
    APTR        tc_SPUpper;     /* stack upper bound + 2*/
    VOID      (*tc_Switch)();   /* task losing CPU */
    VOID      (*tc_Launch)();   /* task getting CPU */
    struct List tc_MemEntry;    /* allocated memory */
    APTR        tc_UserData;    /* per task data */
    };

CHILD_NOTNEW    = 1 
CHILD_NOTFOUND  = 2
CHILD_EXITED    = 3
CHILD_ACTIVE    = 4

Tasks have message ports, and can wait on signals and/or the delivery of a message to a port. This leads on to Process in the DOS library.

Scheduler

Signals

Each task has its own set of 32 signals, 16 of which are set aside for system use. When one task signals a second task, it asks the OS to set a specific bit in the 32-bit long word set aside for the second task's signals.

Using Wait() to put a task to sleep:

mysignals = Wait(1L<<17 | 1L<<19);   /* waiting on signal 17 or 19 to wake up */
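The bit manipulation behind such a call can be tried out without a running Exec. The sketch below is a simulation with hypothetical names, not the Exec implementation: it models a task's received-signal word, how another task sets bits in it, and the non-blocking part of Wait(), which returns the waited-for signals that have arrived and clears them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature of a task's signal state (compare tc_SigRecvd). */
struct demo_task {
    uint32_t sig_recvd;
};

/* Mask for a single signal bit, e.g. SIGMASK(17). */
#define SIGMASK(bit) (UINT32_C(1) << (bit))

/* What signalling a task boils down to: set bits in its signal word. */
static void demo_signal(struct demo_task *t, uint32_t mask)
{
    t->sig_recvd |= mask;
}

/* The non-blocking part of Wait(): pick up the signals that were waited
   for, clear them, and leave all other pending signals untouched. */
static uint32_t demo_take_signals(struct demo_task *t, uint32_t wait_mask)
{
    uint32_t got = t->sig_recvd & wait_mask;

    t->sig_recvd &= ~got;
    return got;
}
```

The returned mask should always be tested bit by bit, because several of the awaited signals may have arrived at once and each one has to be handled.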

SetSignal() can change a task's signal bits; it can also be used to monitor them.

Polling signals in a tight loop wastes CPU time. One easy way around this is for a task to sleep briefly within its polling loop by using the timer.device or the graphics function WaitTOF(). If it's a Process, use the DOS library Delay() or WaitForChar() functions.

Locks

Semaphore


Semaphores are a method of locking in which all tasks agree on a locking convention before accessing shared data structures.

struct SignalSemaphore {
    struct  Node ss_Link;
    WORD    ss_NestCount;
    struct  MinList ss_WaitQueue;
    struct  SemaphoreRequest ss_MultipleLink;
    struct  Task *ss_Owner;
    WORD    ss_QueueCount;
};

My proposal is to get rid of Forbid()/Permit() where possible and replace Forbid locks with semaphores in AROS. Leave Forbid()/Permit() as they are now, but either complain (#warning) or punish the code (with a debug log or in some other way). If you are going to use forbid(obj), you may as well just use semaphores, since everything boils down to locking access to a particular resource.

A programmer should grasp the following if programming in C or another low level language.

struct SignalSemaphore mymemory_lock;

...

int init_all(void)
{
    ...

    InitSemaphore(&mymemory_lock);

    ...
}

int myfunc(...)
{
    ...

    ObtainSemaphore(&mymemory_lock);

    /* Critical section */

    ReleaseSemaphore(&mymemory_lock);

    ...
}

You lock your object (semaphore) with ObtainSemaphore() and release it with ReleaseSemaphore().

When you want to exclude Forbid() during the critical section it would then become something like this in the function.

int myfunc(...)
{
    ...

    ObtainSemaphoreTags(&mymemory_lock, SEM_DONTFORBID, TRUE, TAG_END);

    /* Critical section */

    ReleaseSemaphore(&mymemory_lock);

    ...
}

It is "Amiga-like" to hide semaphores: they are generally hidden because they do not need to be accessed by external code, or are handled as part of a routine.

ObtainSemaphore() actually uses Forbid() even in the "happy path", i.e. the case where the semaphore is available. Semaphore functions forbid only for a short period.

As a performance improvement: the memory allocator locks for the whole operation, i.e. it Forbid()s before the start of the allocation and Permit()s only when it is done. So I decided to try using a semaphore instead of forbidding, which should let other tasks work during the allocation.

Improved semaphore validation: alerts are now posted if the semaphore API is called in supervisor mode. Memory management has been switched to use a global semaphore instead of a Forbid()/Permit() pair, which should improve multitasking. It would be better to make this a run-time (boot) option, with Forbid()/Permit() as the default, because there is code around which does not expect that a FreeMem() may break the Forbid state.

Lots of programs require original behavior, for example many programs do something like this:

Forbid()
Create new task or process.
Set some task variables and do other stuff.
Permit()

At least m68k must always use original behavior.

Lists

MorphOS has a good introduction to this subject as well.

struct Node {
    struct Node *ln_Succ;
    struct Node *ln_Pred;
    UBYTE        ln_Type;
    BYTE         ln_Pri;
    char        *ln_Name;
};

ln_Name is currently in the wrong position compared to above, will be fixed by AROS ABIv1.

This basic node struct is the starting point of many structs used by AmigaOS (TM) and AROS, such as lists. A lot of information in AROS and other Amiga-like OSes is stored in lists.

Before adding a node to a list, you must initialise its ln_Type, ln_Pri, and ln_Name fields to their appropriate values.

struct List {
    struct Node *lh_Head;     /*first node in list*/
    struct Node *lh_Tail;     /*always NULL*/
    struct Node *lh_TailPred; /*last node in list*/
    UBYTE        lh_Type;     /*type of node*/
    UBYTE        lh_Pad;      /*byte not used*/
};

1. Set the lh_Head field to the address of the lh_Tail.
2. Clear the lh_Tail field.
3. Set the lh_TailPred field to the address of lh_Head.
4. Set lh_Type to the same data type as the nodes to be kept in the list.
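The initialisation steps above can be sketched with stand-in structures so that the example compiles without the Exec headers. DemoNode, DemoList and the demo_ functions are hypothetical; only the field names mirror struct Node and struct List. Real Exec code lets the list header overlap with two sentinel nodes and tests ln_Succ for NULL; this sketch compares against the sentinel addresses instead, which keeps it strictly portable C.

```c
#include <assert.h>
#include <stddef.h>

struct DemoNode {
    struct DemoNode *ln_Succ;
    struct DemoNode *ln_Pred;
};

struct DemoList {
    struct DemoNode *lh_Head;
    struct DemoNode *lh_Tail;
    struct DemoNode *lh_TailPred;
};

/* Steps 1 to 3 from the list above. */
static void demo_new_list(struct DemoList *l)
{
    l->lh_Head     = (struct DemoNode *)&l->lh_Tail;
    l->lh_Tail     = NULL;
    l->lh_TailPred = (struct DemoNode *)&l->lh_Head;
}

static int demo_is_empty(const struct DemoList *l)
{
    return l->lh_TailPred == (const struct DemoNode *)&l->lh_Head;
}

/* Append a node at the end of the list. */
static void demo_add_tail(struct DemoList *l, struct DemoNode *n)
{
    n->ln_Succ = (struct DemoNode *)&l->lh_Tail;   /* tail sentinel */
    n->ln_Pred = l->lh_TailPred;
    if (demo_is_empty(l))
        l->lh_Head = n;
    else
        l->lh_TailPred->ln_Succ = n;
    l->lh_TailPred = n;
}

/* Walk the list until the tail sentinel is reached. */
static unsigned demo_count(const struct DemoList *l)
{
    const struct DemoNode *n;
    unsigned count = 0;

    for (n = l->lh_Head; n != (const struct DemoNode *)&l->lh_Tail; n = n->ln_Succ)
        count++;
    return count;
}
```

Because the header never contains a NULL head pointer, adding and removing nodes needs no special case for "first node" in the real implementation. The explicit empty check in demo_add_tail() exists only because this sketch avoids the overlapping-structure trick.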


#include <proto/exec.h>

void AddHead( struct List *list, struct Node *node );
void AddTail( struct List *list, struct Node *node );
void Enqueue( struct List *list, struct Node *node );
void Insert( struct List *list, struct Node *node, struct Node *pred );

Messages

struct Message {
    struct Node     mn_Node;
    struct MsgPort *mn_ReplyPort;
    UWORD           mn_Length;
};

which leads to IntuiMessage in Intuition.

MsgPort

CreateMsgPort() does all of the following:

   allocate a signal
   create a message port
   set the message ports sigbit and task to your allocated signal and self

WaitPort()

   listen for that signal in wait
   respond when messages signal your task

To communicate with a device, you must open a port from the program. Then you must initialize a block of memory which will serve as a "conduit" between your program and the device. In this block you put all the information necessary for the device with which you want to talk. In return, this memory will be filled with information you requested. The last step is to open the device.

Once finished, you must close everything in order, so as not to lose the memory blocks used. CAUTION: you must free any allocations in the reverse order of their opening.

To establish the communication port, you need the function CreateMsgPort(), which creates a message port (MsgPort) in your program by setting up the following structure:

struct MsgPort {
    struct Node mp_Node;
    UBYTE mp_Flags;
    UBYTE mp_SigBit;
    void * mp_SigTask;
    struct List mp_MsgList;
};

Send a message via PutMsg(); if a reply is required, set up mn_ReplyPort first. To wait for a message, initialise mp_SigTask with the address of your task and set mp_SigBit to a signal number. Remove the top message from the queue via GetMsg(), which returns NULL if there is none.

Each SendIO must be paired with a corresponding WaitIO. A program must not reuse a request (for example a timerequest) while its previous use has not finished. You should not use GetMsg on an IORequest reply port. When the signal is received, a call to WaitIO will do all the necessary request handling, including an implicit GetMsg. WaitIO will not wait again if the request has already been replied. A manual GetMsg would be improper.

Both DoIO() and WaitIO() (DoIO() actually jumps to WaitIO()) extend the BYTE-sized io_Error code to LONG, just like OpenDevice() does.
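
Why the extension matters can be shown with plain C integer conversions. The helper names below are hypothetical; the value -3 mirrors IOERR_NOCMD from exec/errors.h. Error codes are negative, so the byte has to be sign-extended; treating it as an unsigned byte would turn -3 into 253.

```c
#include <assert.h>

#define DEMO_IOERR_NOCMD (-3)   /* same value as IOERR_NOCMD */

/* Sign-extend the byte-sized error code, as DoIO()/WaitIO() are described to do. */
long demo_extend_error(signed char io_Error)
{
    return (long)io_Error;
}

/* The mistake to avoid: zero-extending the same byte. */
long demo_misread_error(signed char io_Error)
{
    return (long)(unsigned char)io_Error;
}
```

With the sign-extended result, a test such as `if (DoIO(req) < 0)` or a comparison against a negative IOERR_ constant works as expected.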

This is what the low-level IPC looks like:

ssize_t uade_ipc_read(void *f, const struct uade_msg *um, const void *buf) 
{ 
    struct MsgPort *msgport = (struct MsgPort *) f; 
    struct UADEMessage *msg;

    /*Wait(1 << msgport->mp_SigBit); 
    WaitPort(msgport); 
    msg = (struct UADEMessage *) GetMsg(msgport);*/ 
     
    /* ugly busy loop, because WaitPort isn't working... */ 
    while (!(msg = (struct UADEMessage *) GetMsg(msgport))) 
    { 
        Delay(1); 
    }

    CopyMemQuick(&msg->um, um, sizeof(struct uade_msg)); 
    CopyMem(msg->data, buf, ntohl(um->size)); 
    ReplyMsg(msg); 
     
    return 1; 
}

ssize_t uade_ipc_write(void *f, const struct uade_msg *um, const void *buf) 
{ 
    struct MsgPort *msgport = (struct MsgPort *) f; 
    struct MsgPort *replyport; 
    struct UADEMessage msg;

    if ((replyport = CreateMsgPort())) 
    { 
        msg.message.mn_Node.ln_Type = NT_MESSAGE; 
        msg.message.mn_Length = sizeof(struct UADEMessage); 
        msg.message.mn_ReplyPort = replyport;

        CopyMemQuick(um, &msg.um, sizeof(struct uade_msg));     
        CopyMem(buf, msg.data, ntohl(um->size)); 
     
        PutMsg(msgport, (struct Message *) &msg); 
        WaitPort(replyport); 
        DeleteMsgPort(replyport);

        return 1; 
    }

    return 0; 
}

uade_ipc_write always creates a new reply message port, which is very inefficient, but I had some problems with reusing the same port for different writes. Perhaps the bigger problem is uade_ipc_read, because I couldn't use WaitPort or Wait there. They work with the first message, but after that uade_ipc_read will always block on WaitPort, no matter how many messages are put into the port. Wait is no better unfortunately, so I had to use that nasty while(!GetMsg) loop with a 1/50th sec delay.

Is your msgport allocated in a different task/process? If so you need to update the mp_SigTask field in the msgport first before you can use it.

msgport->mp_SigTask = FindTask(NULL);

Note also that mp_SigBit will be allocated in the wrong task so you might want to allocate a new one just to be safe.

msgport->mp_SigBit = AllocSignal(-1);

You can create a port with no signal like this:

struct MsgPort *CreatePortNoSignal (void) { 
    struct MsgPort *port; 
    port = AllocVec(sizeof(*port), MEMF_CLEAR); 
    if (port) { 
        port->mp_Node.ln_Type = NT_MSGPORT; 
        port->mp_Flags = PA_IGNORE; 
        port->mp_MsgList.lh_Head = (struct Node *)&port->mp_MsgList.lh_Tail; 
        port->mp_MsgList.lh_Tail = NULL; 
        port->mp_MsgList.lh_TailPred = (struct Node *)&port->mp_MsgList.lh_Head; 
    } 
    return port; 
}

void DeletePortNoSignal (struct MsgPort *port) { 
    FreeVec(port); 
}

Just remember to set mp_SigTask, mp_SigBit and mp_Flags to the correct values before using it (mp_Flags should be PA_SIGNAL).

IORequest

Talking to a device requires:

obtaining a message port
allocating memory for a specialized message packet called an I/O request
setting a pointer to the message port in the I/O request
setting up the link to the device itself by opening it

Many methods exist for creating an I/O request

Declaring it as a structure. The memory required will be allocated at compile time.
Declaring it as a pointer and calling the AllocMem() function. Call the FreeMem() function to release the memory
Declaring it as a pointer and calling the CreateIORequest() function.

error = OpenDevice(device_name, unit_number, ioRequest, flags)

DoIO() and SendIO() are most commonly used.

CloseDevice(IORequest)

struct IORequest {
    struct Message  io_Message;
    struct Device  *io_Device;   /* Pointer to the device */
    struct Unit    *io_Unit;     /* Unit (driver private) */
    UWORD           io_Command;  /* Command to perform */
    UBYTE           io_Flags;
    BYTE            io_Error;    /* Error or warning number */
};

struct IOStdReq {
    struct  Message io_Message;
    struct  Device  *io_Device;     /* device node pointer  */
    struct  Unit    *io_Unit;	    /* unit (driver private)*/
    UWORD   io_Command;	            /* device command */
    UBYTE   io_Flags;
    BYTE    io_Error;		    /* error or warning num */
    ULONG   io_Actual;		    /* actual number of bytes transferred */
    ULONG   io_Length;		    /* requested number bytes transferred*/
    APTR    io_Data;		    /* points to data area */
    ULONG   io_Offset;		    /* offset for block structured devices */
};

/* library vector offsets for device reserved vectors */
#define DEV_BEGINIO	(-30)
#define DEV_ABORTIO	(-36)
/* io_Flags defined bits */
#define IOB_QUICK	0
#define IOF_QUICK	(1<<0)

#define CMD_INVALID	0
#define CMD_RESET	1
#define CMD_READ	2
#define CMD_WRITE	3
#define CMD_UPDATE	4
#define CMD_CLEAR	5
#define CMD_STOP	6
#define CMD_START	7
#define CMD_FLUSH	8
#define CMD_NONSTD	9
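
The two reserved offsets follow from the jump table layout. The sketch below assumes the classic m68k layout of 6 bytes per vector; on other CPUs the entry size may differ, so DEMO_VECTOR_SIZE is an assumption, and the demo_* helpers are hypothetical names.

```c
#include <assert.h>

#define DEMO_VECTOR_SIZE 6        /* bytes per jump-table entry (m68k assumption) */
#define DEMO_IOF_QUICK   (1 << 0) /* same bit as IOF_QUICK */

/* Convert a negative library vector offset into a 1-based vector number. */
int demo_lvo_to_index(int lvo)
{
    assert(lvo < 0 && (-lvo) % DEMO_VECTOR_SIZE == 0);
    return -lvo / DEMO_VECTOR_SIZE;
}

/* Test whether quick I/O is requested (or was performed) in io_Flags. */
int demo_is_quick(unsigned char io_Flags)
{
    return (io_Flags & DEMO_IOF_QUICK) != 0;
}
```

With this layout DEV_BEGINIO (-30) is the fifth vector and DEV_ABORTIO (-36) the sixth, directly after the four vectors every library reserves for open, close, expunge and a reserved slot.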

struct Unit {
    struct   MsgPort unit_MsgPort;
    UBYTE    unit_flags;
    UBYTE    unit_pad;
    UWORD    unit_OpenCnt;
};

You can find an example of patching with SetFunction() in "Snoopy". Before you exit your application you have to remove the patch (by calling SetFunction() with the old function's address). But you cannot catch the case when another application has patched the same function in the meantime.

You could, however, get a rough idea of whether a function has been patched by checking if the function's address lies within the memory occupied by the resource in question. Perhaps this is an area AROS could try to improve on, by having the SetFunction() code sit above a more flexible lower-level implementation that provides a means for aware applications to find out what else is patching the code, possibly with some means to arbitrate access. For instance, a patch could declare that it only wants to read the resulting data (therefore causing the patches which actually manipulate it to go first).
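
The removal problem can be illustrated with a small model of a vector table. Everything here is hypothetical (DemoLibrary, demo_set_function(), demo_remove_patch() and the three sample functions); it only mimics the way SetFunction() swaps an entry and returns the previous one.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*DemoFunc)(int);

/* A library reduced to its table of entry points. */
struct DemoLibrary {
    DemoFunc vectors[4];
};

/* Replace one vector and hand back the previous entry. */
DemoFunc demo_set_function(struct DemoLibrary *lib, int index, DemoFunc newfunc)
{
    DemoFunc old;
    assert(lib != NULL && index >= 0 && index < 4);
    old = lib->vectors[index];
    lib->vectors[index] = newfunc;
    return old;
}

/* Remove a patch only if it is still the topmost one. Returns 1 on success,
 * 0 if somebody else has patched the same vector in the meantime. */
int demo_remove_patch(struct DemoLibrary *lib, int index, DemoFunc ours, DemoFunc old)
{
    if (lib->vectors[index] != ours)
        return 0;
    lib->vectors[index] = old;
    return 1;
}

/* Sample entry points used to exercise the model. */
int demo_original(int x) { return x; }
int demo_patch_a(int x)  { return x + 1; }
int demo_patch_b(int x)  { return x + 2; }
```

If patch B is installed on top of patch A, A cannot be removed safely: restoring A's saved pointer would silently drop B, and B's saved pointer would still refer to A's code. In a real program the check and the restore would also have to be protected against task switches, for example with Forbid() and Permit().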

The proper way to handle an io request together with other signals is this:

OpenDevice();

// put first command in iorequest
SendIO (iorequest);

while (running)
    {
    received = Wait (othersignal | iosignal);

    if (received & iosignal)
        {
        WaitIO (iorequest);

        // handle data in iorequest if needed

        // put next command in iorequest
        SendIO (iorequest);
        }

    if (received & othersignal)
        {
        // handle other signal
        }
    }

if (CheckIO (iorequest) == NULL)
    AbortIO (iorequest);

WaitIO (iorequest);

CloseDevice();

  /*
   * This is an example of using the serial device.
   * First, we will attempt to create a message port with CreateMsgPort()
   * Next, we will attempt to create the I/O request with CreateIORequest()
   * Then, we will attempt to open the serial device with OpenDevice()
   * If successful, we will send the SDCMD_QUERY command to it and reverse our steps. 
   * If we encounter an error at any time, we will gracefully exit.
   *
   * Run from CLI only
   */

  #include <exec/types.h>
  #include <exec/memory.h>
  #include <exec/io.h>
  #include <devices/serial.h>

  #include <proto/exec.h>

  #include <stdio.h>

  int main(void)
  {
  struct MsgPort *SerialMP;       /* pointer to our message port */
  struct IOExtSer *SerialIO;      /* pointer to our I/O request */

      /* Create the message port */
  if (SerialMP=CreateMsgPort())
      {
          /* Create the I/O request */
      if (SerialIO = CreateIORequest(SerialMP,sizeof(struct IOExtSer)))
          {
              /* Open the serial device */
          if (OpenDevice(SERIALNAME,0,(struct IORequest *)SerialIO,0L))

              /* Inform user that it could not be opened */
              printf("Error: %s did not open\n",SERIALNAME);
          else
              {
              /* device opened, send query command to it */
              SerialIO->IOSer.io_Command  = SDCMD_QUERY;
              if (DoIO((struct IORequest *)SerialIO))

                  /* Inform user that query failed */
                  printf("Query  failed. Error - %d\n",SerialIO->IOSer.io_Error);
              else
                  /* Print serial device status - see include file for meaning */
                  printf("\n\tSerial device status: %x\n\n",SerialIO->io_Status);

              /* Close the serial device */
              CloseDevice((struct IORequest *)SerialIO);
              }
          /* Delete the I/O request */
          DeleteIORequest(SerialIO);
          }
      else
          /* Inform user that the I/O request could not be created */
          printf("Error: Could not create I/O request\n");

      /* Delete the message port */
      DeleteMsgPort(SerialMP);
      }
  else
      /* Inform user that the message port could not be created */
      printf("Error: Could not create message port\n");
  }

Interrupts

struct Interrupt {
    struct Node  is_Node;
    APTR         is_Data;
    VOID         (*is_Code)();
};

Devices

An Amiga device is very similar to an Amiga library, except that a device normally controls some sort of I/O hardware, and generally contains a limited set of standard functions which receive commands for controlling I/O.

struct DeviceList {
    BPTR		dl_Next;	/* bptr to next device list */
    LONG		dl_Type;	/* see DLT below */
    struct MsgPort *	dl_Task;	/* ptr to handler task */
    BPTR		dl_Lock;	/* not for volumes */
    struct DateStamp	dl_VolumeDate;	/* creation date */
    BPTR		dl_LockList;	/* outstanding locks */
    LONG		dl_DiskType;	/* 'DOS', etc */
    LONG		dl_unused;
    BSTR		dl_Name;	/* bptr to bcpl name */
};

On AmigaOS, filesystems (also called "handlers") are each implemented as a single task that waits to receive "packets" from an application (or from DOS, via calls like Open()). Each packet contains a command for the filesystem task (like "open a file" or "create a folder") and any data needed for the command (such as the filename).

So, when a packet is sent to a filesystem task, that task can't do anything with it until the task that sent the packet has been interrupted. Then it will process the packet. Meanwhile, the sending task (the application) can continue doing other work. The filesystem task signals the sending task when the command has completed.

The I/O request is created with CreateIORequest(). Once that is done, you open the device with OpenDevice().

Error = OpenDevice(Name, Unit, IORequest, Flags)

Name: name of the device. Unit: 0 by default. Error: 0 if the device could be opened, otherwise a non-zero error code.

Whatever was opened last is closed first. Indeed, if we delete the message port (MsgPort) first and the request structure (IORequest) afterwards, expect to lose memory or, worse, crash the system.

CloseDevice()

The IORequest structure is then released with DeleteIORequest() and finally the port is removed with DeleteMsgPort().

Now that we know the commands to use to communicate synchronously and asynchronously (DoIO, SendIO, AbortIO and CheckIO), we must know what to fill in the IOStdReq structure.

struct IOStdReq {
    ...
    struct Device *io_Device;   /* Pointer to the device */
    struct Unit   *io_Unit;     /* Unit */
    UWORD          io_Command;  /* Command */
    UBYTE          io_Flags;    /* IOF_QUICK or not */
    BYTE           io_Error;    /* Error */
    ULONG          io_Actual;   /* Number of bytes transferred */
    ULONG          io_Length;   /* Number of bytes to transfer */
    APTR           io_Data;     /* Pointer to memory area */
    ...
};

Fill io_Command with the command to execute, io_Length with the total number of bytes to transfer, and io_Data with a pointer to the start of the data area.

Console

Work on the AROS console.device and console-handler aims both to bring them up to 3.x level and to add additional functionality (effectively re-implementing many of the important features from KingCON, more cleanly layered on console.device/console-handler instead of stuffing it all into a replacement console-handler).

console.device/handler have private APIs to talk to ConClip. The AROS ConClip uses the Intuition edit hook to get cut/paste events from string gadgets and turn them into a message to the ConClip task; console.device/handler are to use the same message format.

Four steps are needed to open the console device:

Create a message port using CreatePort().
Create an I/O request structure of type IOStdReq.
Open an Intuition window and set a pointer to it in the io_Data field of the IOStdReq and the size of the window in the io_Length field. The window must be SIMPLE_REFRESH for use with the CONU_CHARMAP and CONU_SNIPMAP units.
Open the console device by calling OpenDevice()

Console device units:

CONU_LIBRARY  - return device pointer, no console is opened

CONU_STANDARD - synchronous communication to open a standard console

CONU_CHARMAP  - asynchronous mode to open a console with a character map

CONU_SNIPMAP  - open a console with a character map and copy-and-paste support

Console device flags:

 CONFLAG_DEFAULT - redraw the window when it is resized 
 CONFLAG_NODRAW_ON_NEWSIZE - will not redraw the window when it is resized

The character map units, CONU_CHARMAP and CONU_SNIPMAP, are the only units which use the flags parameter to set how the character map is used. CONU_STANDARD units ignore the flags parameter.

Calling setbuf() to turn off buffering of stdout has a side effect: on exit, the stream remains unbuffered (with debug on, all following commands can be seen executing with no buffering), which is not good.

The console.device does work fine without it - it just falls back to its internal copy/paste buffer. But what ConClip achieves on classics is to decouple the console.device/handler from the read/write paths to clipboard.device. Pointless, mostly, for those who always have the clipboards in RAM. Not so pointless if you do large copy/paste's and have the clipboard stored on a slow disk.

There are already ESC sequences that specify the pen (palette entry) number for text or background, as was done on original AmigaOS. However, there is one small problem: on deep screens (>3bpp) only the first four pens and the last four pens are defined (the latter in order to allow pixel inversion to work). Pens between them are undefined. They can be either uninitialized (set to black) or dynamically allocated by some other application and set to the value it wants.

Advanced shells (like VinCED and MUICon) just allocate pens they want, and map color numbers to allocated pen numbers. This is the standard way of working with palette on public screens. On direct-color screens palette still works, this is handled by graphics.library. So it's okay to use palette when you need to use a small fixed number of defined colors, like in your case.

The shell doesn't simply parse arguments from the command line and then pass the buffer to the invoked program, as you might believe. Rather, it pushes the data into a file handle used as a "string stream", where data can be read from a memory buffer. Such tricks are documented nowhere. Actually, the behavior of the shell is documented nowhere; it is a sad collection of hacks.

The only interaction it should have with the rastport is ScrollRaster, RectFill(), Text(), SetAPen(), SetABPenDrMd() and SetSoftStyles().

More compatible KS replacement for emulation. Don't like rom code that pre-allocates too much memory (or even worse, uses static arrays) instead of allocating it when the memory is actually needed. But I am not going to make changes that would make code more unreadable and messy just because it would decrease memory usage by 100 bytes or something :)

RawDoFmt RawPutChar

Extending RawDoFmt() was a bad idea. RawDoFmt() should only do what the original Autodocs say, and all AROS programs should use VNewRawDoFmt(). These constants came from MorphOS, and the NULL pointer (RAWFMTFUNC_STRING) is also supported by AmigaOS4. m68k code on these systems can make use of these.

locale.library patched VNewRawDoFmt so that LocVNewRawDoFmt is actually used. When original VNewRawDoFmt is used, the crash does not occur. I did some code comparison and came to some conclusions:

First, the data stream is set in memory in following way:

i : 4 bytes
processor : from 5th byte

even if i is downcasted ( (UWORD)i ) it still occupies 4 bytes, that's why this change did not help.

Next, both functions consider that a bare '%d' should be 2 bytes/sizeof(UWORD) long while "%ld" is considered 4 bytes/sizeof(ULONG). The difference however is in how both functions read the data from va stream.

VNewRawDoFmt reads any data smaller or equal to int as int, while LocVNewRawDoFmt reads UWORDS as 2 bytes.
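
The size rule for a memory data stream can be made concrete with a tiny helper. This is a hypothetical sketch (demo_stream_bytes() is not part of any library) that only understands %d, %ld and %%, and simply adds up how many bytes such a format string would consume under the rule stated above: 2 bytes for a bare %d, 4 bytes for %ld.

```c
#include <assert.h>
#include <stddef.h>

/* Count the bytes a RawDoFmt-style memory stream must provide.
 * Only %d, %ld and %% are recognised; anything else is ignored. */
size_t demo_stream_bytes(const char *fmt)
{
    size_t bytes = 0;

    assert(fmt != NULL);
    while (*fmt != '\0') {
        if (*fmt++ != '%')
            continue;                     /* ordinary character */
        if (*fmt == '%') {                /* literal percent sign */
            fmt++;
            continue;
        }
        if (*fmt == 'l' && fmt[1] == 'd') {
            bytes += 4;                   /* %ld: sizeof(ULONG) */
            fmt += 2;
        } else if (*fmt == 'd') {
            bytes += 2;                   /* %d: sizeof(UWORD) */
            fmt += 1;
        }
    }
    return bytes;
}
```

For "%d %ld" the stream is expected to hold 6 bytes. If the caller stored a 4-byte value for the bare %d, every argument after it would be read from the wrong position, which matches the symptom described above.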

The other notable difference is that VNewRawDoFmt has two fetching macros (fetch_va_arg and fetch_mem_arg) while LocVNewRawDoFmt uses the same macro for both cases (va stream and memory stream).

Which behavior is the correct one (fetch_va_arg vs stream_addr)?

stdiowin.c

The original reason for the CON: window is compiler/autoinit/stdiowin.c, which always sets the input and output streams to "CON://///AUTO/CLOSE". This forces a console window open if the program does any read or write on Input()/Output().

Some tests on original AOS showed no CON window when ReadArgs was called on start from WB.

In stdiowin, do something like read oldin and reinject it into the new input. Then ReadArgs should get the EOF originally injected by createnewproc or runcommand.
Alternatively, set __stdiowin to NIL: as default, which seems to work as well as I have just tested.

Crash Protection

Introduced kernel.resource in UNIX-hosted and now moving the functionality to it step by step. The first thing is to unify InternalLoadSeg(). Now dos.library registers loaded executables in kernel.resource instead of using own private structures. kernel.resource exports one static variable in order to let gdb to access this list.

Reconfigured AROS with --enable-debug=modules,symbols. I guess "symbols" should add complete debug information to the kernel executable. But after this gdb just printed failure, something like "Dwarf error: failed to resolve reference no 28". And again it was unable to read type information.

BTW the new code does not need SET_START_ADDR() any more, this means gdb debugging will work on all architectures now.

We already have working tc_TrapCode field. If we point it to some code, it will be executed in supervisor mode when CPU hits a trap. At least on i386 native and Windows-hosted. However the code currently receives only alert number, and this is not enough. In order to be able to do any advanced diagnostics we need to have an access to CPU context. We also should be able to modify it. Under classic AmigaOS this handler is called with C convention (arguments on a stack). The stack layout is:

0(sp).l = trap#
4(sp) Processor dependent exception frame

AROS doesn't save CPU context right in stack. Instead it uses a separate storage inside private portion of ETask structure. The simplest thing to do here is to declare TrapCode entry point to be the following:

void TrapCode(ULONG trapNum, struct ExceptionContext *ctx);

So that we could have two pointers on the stack. If we develop the idea a little more, we will see that it's easy to provide binary and source compatibility with m68k AROS port if we declare it as following:

void TrapCode(ULONG trapNum, APTR ContextHandle);

and inside the code we use a macro to access the actual CPU context:

struct ExceptionContext *ctx = GET_CONTEXT(ContextHandle);

Where GET_CONTEXT expands to the following on m68k:

#define GET_CONTEXT(x) ((struct ExceptionContext *)&(x))

and on other systems:

#define GET_CONTEXT(x) ((struct ExceptionContext *)(x))

Now we have a context pointer. However in order to be able to do something with it we need to declare the actual structure of the context. I would suggest to do it in aros/cpu.h. The context for the same CPU should be uniform for all flavors (hosted and native).

I selected name ExceptionContext for the structure in order to be source-compatible with AmigaOS4 (and perhaps even binary-compatible in future). MorphOS lacks trap handling facilities (not really, but the mechanism is totally different and works by message passing), so there's nothing to take from there.

In AmigaOS4 ExceptionContext has the following definition:

struct ExceptionContext
{
    uint32  Flags;    /* Flags, describing the context (READ-ONLY)*/
    uint32  Traptype; /* Type of trap (READ-ONLY) */
    uint32  msr;      /* Machine state */
    uint32  ip;       /* Return instruction pointer */
    uint32  gpr[32];  /* r0 - r31 */
    uint32  cr;       /* Condition code register */
    uint32  xer;      /* Extended exception register */
    uint32  ctr;      /* Count register */
    uint32  lr;       /* Link register */
    uint32  dsisr;    /* DSI status register. Only set when valid */
    uint32  dar;      /* Data address register. Only set when valid */
    float64 fpr[32];  /* Floating point registers */
    uint64  fpscr;    /* Floating point control and status register */
    /* The following are only used on AltiVec */
    uint8   vscr[16]; /* AltiVec vector status and control register */
    uint8   vr[512];  /* AltiVec vector register storage */
    uint32  vrsave;   /* AltiVec VRSAVE register */
};

Flags is a special field which tells which fields are actually present in this structure. This way a backwards compatibility with possible future CPUs with more registers is provided.

enum enECFlags
{
    ECF_FULL_GPRS = 1<<0, /* Set if all register have been saved */
                          /* If this flag is not set, gpr[14] through */
                          /* gpr[31] are invalid */
    ECF_FPU       = 1<<1, /* Set if the FPU registers have been saved */
    ECF_FULL_FPU  = 1<<2, /* Set if all FPU registers have been saved */
    ECF_VECTOR    = 1<<3, /* Set if vector registers have been saved */
    ECF_VRSAVE    = 1<<4  /* Set if VRSAVE reflects state of vector */
                          /* registers saved */
};
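
A handler would consult Flags before trusting a register value. The sketch below uses hypothetical DEMO_ constants with the same bit values as the enum above and a hypothetical helper, demo_gpr_valid(); it only encodes the rule from the comments, namely that gpr[14] through gpr[31] are valid only when the full-GPR flag is set.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit values as enECFlags above; the DEMO_ names are hypothetical. */
enum {
    DEMO_ECF_FULL_GPRS = 1 << 0,
    DEMO_ECF_FPU       = 1 << 1,
    DEMO_ECF_FULL_FPU  = 1 << 2,
    DEMO_ECF_VECTOR    = 1 << 3,
    DEMO_ECF_VRSAVE    = 1 << 4
};

/* Return 1 if gpr[reg] may be read from a context carrying these flags. */
int demo_gpr_valid(uint32_t flags, unsigned reg)
{
    if (reg > 31)
        return 0;                 /* no such register */
    if (reg < 14)
        return 1;                 /* r0 - r13 are always saved */
    return (flags & DEMO_ECF_FULL_GPRS) != 0;
}
```

The same pattern would apply to the FPU and AltiVec fields: check the matching flag first, then read the field.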

I would suggest adopting the same structure for PPC AROS. I've looked at the task switcher code; it's possible to rearrange registers to match this layout. Anyway, this layout is better than what is currently used because it supports AltiVec.

As to i386, as examples i've seen current AROS context (poorly defined), and Windows context. The interesting detail here is that AROS saves either 8087 FPU register set or SSE2 register set. Is this really mutually exclusive? Can't SSE be used together with 8087 FPU? This is what we have in Windows:

#define MAXIMUM_SUPPORTED_EXTENSION  512
typedef struct _FLOATING_SAVE_AREA {
        IPTR    ControlWord;
        IPTR    StatusWord;
        IPTR    TagWord;
        IPTR    ErrorOffset;
        IPTR    ErrorSelector;
        IPTR    DataOffset;
        IPTR    DataSelector;
        UBYTE   RegisterArea[80];
        IPTR    Cr0NpxState;
} FLOATING_SAVE_AREA;

struct AROSCPUContext {
        IPTR    ContextFlags;
        IPTR    Dr0;
        IPTR    Dr1;
        IPTR    Dr2;
        IPTR    Dr3;
        IPTR    Dr6;
        IPTR    Dr7;
        FLOATING_SAVE_AREA FloatSave;
        IPTR    SegGs;
        IPTR    SegFs;
        IPTR    SegEs;
        IPTR    SegDs;
        IPTR    Edi;
        IPTR    Esi;
        IPTR    Ebx;
        IPTR    Edx;
        IPTR    Ecx;
        IPTR    Eax;
        IPTR    Ebp;
        IPTR    Eip;
        IPTR    SegCs;
        IPTR    EFlags;
        IPTR    Esp;
        IPTR    SegSs;
        BYTE    ExtendedRegisters[MAXIMUM_SUPPORTED_EXTENSION];
};

ContextFlags are:

#define CONTEXT_i386    0x10000
#define CONTEXT_i486    0x10000
#define CONTEXT_CONTROL (CONTEXT_i386|0x00000001L)
#define CONTEXT_INTEGER (CONTEXT_i386|0x00000002L)
#define CONTEXT_SEGMENTS        (CONTEXT_i386|0x00000004L)
#define CONTEXT_FLOATING_POINT  (CONTEXT_i386|0x00000008L)
#define CONTEXT_DEBUG_REGISTERS (CONTEXT_i386|0x00000010L)
#define CONTEXT_EXTENDED_REGISTERS (CONTEXT_i386|0x00000020L)
#define CONTEXT_FULL    (CONTEXT_CONTROL|CONTEXT_INTEGER|CONTEXT_SEGMENTS)
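
One detail of these definitions is easy to get wrong: every constant also contains the CONTEXT_i386 bit, so a plain `flags & CONTEXT_x` test is non-zero for any i386 context. A minimal sketch of a safer test follows; the DEMO_ constants copy the values above and demo_has_part() is a hypothetical helper.

```c
#include <assert.h>

/* Copies of the values listed above; the DEMO_ prefix marks them as local. */
#define DEMO_CONTEXT_i386           0x10000UL
#define DEMO_CONTEXT_CONTROL        (DEMO_CONTEXT_i386 | 0x00000001UL)
#define DEMO_CONTEXT_INTEGER        (DEMO_CONTEXT_i386 | 0x00000002UL)
#define DEMO_CONTEXT_SEGMENTS       (DEMO_CONTEXT_i386 | 0x00000004UL)
#define DEMO_CONTEXT_FLOATING_POINT (DEMO_CONTEXT_i386 | 0x00000008UL)
#define DEMO_CONTEXT_FULL \
    (DEMO_CONTEXT_CONTROL | DEMO_CONTEXT_INTEGER | DEMO_CONTEXT_SEGMENTS)

/* A part is present only if all of its bits are set, not just the CPU bit. */
int demo_has_part(unsigned long flags, unsigned long part)
{
    return (flags & part) == part;
}
```

With CONTEXT_FULL the floating point area is not included, yet `CONTEXT_FULL & CONTEXT_FLOATING_POINT` is still non-zero because of the shared CPU bit; comparing against the whole mask avoids that false positive.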

As you can see, there are two separate areas for SSE and 8087. Should we adopt the same structure for i386 AROS, or is it ambiguous? Anyway, the i386 AROS context should also be expandable and have some flags. I didn't look at the x86-64 port, but it should be implemented in a similar way there. I also didn't look at UNIX-hosted ports. Their context is a dark forest; however, the idea is similar, and at least it can be converted to a universal form we select.

An alternative idea came to me. The context structure could stay a private blob, and we could have some functions in kernel.resource for parsing it. Something like KrnGetContextReg() and KrnSetContextReg(). This would be more flexible, but perhaps slower and more complex solution.

boot time.
work-time double-fault or supervisor-mode fault.

If you read and study Alert() and underlying code, you'll see rather nontrivial state machine involving ETask. I put many efforts into making it as robust as it is now. The basic idea is that every task can have "Crashed" state, with some context. In recent implementation, there are several types of context. It can be CPU context or mungwall context. More contexts are to come (memory manager context for example). Whenever the task crashes, crash context is recorded in ETask. Then control is passed to Alert() (or, more precise, Exec_ExtAlert()). Exec_ExtAlert() first checks current CPU privilege level. If it's user, then we are likely running on usual task context (it's either direct Alert() call, or a CPU exception leveraged down by exec's handler). In this case Exec_UserAlert() is called. It attempts to bring up a requester via intuition.library. This process ends up in several ways:

Successful completion, user responded "Continue" to recoverable alert. Crash status cleared, Alert() returns.
Double-alert. Alert() is entered once again. In this case a new crash context won't be remembered. Exec_UserAlert() (if entered again), will see that the task is already in a crash state, and give up the execution to Exec_SystemAlert(), raising alert level to DEADEND. Note that "missing display driver" (AN_SysScrn) also counts. Since the crash context was not overwritten, we are still going to represent original cause of the problem. It's deadend for the task, not for the whole system of course. You'll see Suspend/Reboot choice instead of Continue button, that's all.

There are three subsystem-specific implementations of ShutdownA(), placed in different components:

Default implementation for original IBM AT hardware. Performs cold reboot via AT keyboard controller.
ACPI implementation. Currently supports only cold reboot via ACPI reset register (doesn't need AML interpreter).
EFI implementation, uses EFI runtime interface.

These components patch ShutdownA() vector to install own routine, if usable. EFI has the topmost priority (ACPI doesn't install own patch if efi.resource is present).

In the beginning it was done without patches. exec.library code looked in the following way:

if (EFIBase)
{
    ... efi shutdown...
}
if (ACPIBase)
{
    .... acpi shutdown...
}
... kbc shutdown ...

I really disliked such a pattern. A fourth method would add one more 'if', and so on. This code cannot really be consolidated, which is why I don't see any abstraction layer for the shutdown mechanism.

If Alert() is called at supervisor level, we assume we crashed inside e.g. an interrupt handler. We simply call Exec_SystemAlert(). Now, Exec_SystemAlert() must do whatever is possible to inform the user about the problem. The alert is 99% deadend here, and the system is in an unusable state (it was unable to bring up an Intuition requester). We can do absolutely everything here. Clobber screen, VRAM, etc. A reboot will follow.

My experimental code implements Exec_SystemAlert() on a VESA framebuffer via libbootconsole.

I believe there were talks about protecting the kickstart code area from writes. Would an application trying to write to this area cause a GURU or a Recoverable Software Failure (I would like the latter, actually)? Deadend, but software failure. Try tests/exec/traptest illegal <address> on any hosted or x86-64-native.

What do you mean by "setting up a framebuffer"? Do you mean something like "force drop from native mode to VGA mode" Whatever driver's writer wants to.

Where there would be the code that displays something on this "forced framebuffer". How would this code know what is the organization (width/height/bpp/align) of "framebuffer". The callback could fill in some structure, similar to one provided by bootloader.resource. A pointer to the structure would be provided by kernel to this callback.

Would each driver be required to provide a framebuffer of predefined organization? My current suggestion is to leave this up to the driver. It may provide absolutely anything, supported by libbootconsole. Currently this lib supports either EGA/VGA hardware text mode, or chunky framebuffer (both LUT and direct-color).

So a driver itself would receive the structure that contains the text to be displayed? The driver would receive the structure into which it will put framebuffer parameters (type, resolution, address). libbootconsole would then use it.

How do you envision driver registering this callback? Will there be a new function in exec for doing this? In kernel. Something like KrnAddVideoReserCallback(). It's video-specific.

There is also an alternate idea - use machine's BIOS. Does anyone know, is it possible to restore functional BIOS state from within AROS? Again, no care about memory clobber etc. Just initialize some video mode, display text and halt. What is needed except jumping into 16-bit real mode? Will i have to restore interrupt vectors etc ? EFI seems to be more polite about it. At least its runtime service offers some console.

AlertBuffer

Such a define is wrong - software should determine at runtime how much memory is available and setup appropriately. BTW, this hash stuff is redundant at all. We can use exec's AVL routines for association. The same applies to graphics.library and oop.library. It is quite difficult to change kernel memory usage dynamically when library bases have "huge" static arrays.

execbase AlertBuffer is also relatively large. (from A500 point of view). Does it really need to be huge? In fact it needs to hold full alert text (including backtrace). After the text is printed, the system is rebooted. So, theoretically it's possible just to have some pointer. It's safe to trash memory here (well, almost, we should not hurt reset-survival code). BTW, on native ports zero page can be used for this. AFAIK it's reserved anyway.

AROS usage ranges from running one or two programs from a floppy on an Amiga 500 clone, to heavy multitasking desktop usage on a $2000 PC.

Resources

Since dosboot.resource doesn't have machine-specific code any more, going to remove strap from the specification. These two modules will go into base. Filesystems will stay in 'FS' package on native.

Threads

There is no PThread library for AROS. However, parts of the C runtime library are in shared libraries which, by nature, are reentrant code. Only the parts of the C runtimes that are statically linked only can be thread-unsafe.

The current C library makes a C context for each task (e.g. thread) so that malloc/free will behave correctly. I think you will get problems if you start sharing file handles between threads. I think you can configure the C library to share its context with its children; e.g. all threads would then use the same context, but then I think malloc/free won't be multithread safe anymore. For the current implementation of the C library in ABI V1 malloc/free are not thread safe.

At the moment malloc/free is based on semaphore protected (MEMF_SEM_PROTECTED) Exec memory pools, so it is thread safe. This thus means that ABI V1's implementation of the mem function is also thread safe.

However, as you already pointed out some weeks ago, the current C library is not thread safe regarding the fopen()/fread()/fwrite(), etc. functions which use FILE* handles. This is in fact the reason why AROS support for the latest YAM nightly builds is currently broken: with YAM 2.7 many things have been moved into separate threads, which requires all these functions to be thread safe. On other platforms (OS4/newlib, OS3/clib2) this is already the case.

Hooks

Examples

References

Interrupts

void Disable( void ); Disables interrupts for this task. Interrupts should not be disabled for long periods of time. Wait() will break out of Disable state, but interrupts will be disabled again when the task resumes execution. Each call to Disable() will increase the interrupt disable count.

void Enable( void ); Will decrease interrupt disable count. When the count reaches zero, enables interrupts. Subsequent calls to Enable() do nothing, disable nesting count will remain zero.

void Forbid( void ); Will increase task switching disable count. Wait() will break out of Forbid state but task switching is disabled again when the task resumes execution. Basically Forbid() disables task switching. Use only if you can't use semaphores !

void Permit( void ); Will decrease task switching disable count. When the count reaches zero, switches are again enabled. Subsequent calls to Permit() do nothing.
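The nesting rules described for Disable()/Enable() and Forbid()/Permit() can be illustrated with a small counter model. This is only a sketch: `struct NestCount` and the `nest_*` helpers are hypothetical names invented for this example and are not part of exec.library.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a nesting counter such as the interrupt
 * disable count or the task switching disable count. */
struct NestCount {
    int depth;                      /* 0 means "enabled" */
};

/* Models Disable()/Forbid(): every call adds one nesting level. */
void nest_enter(struct NestCount *n)
{
    n->depth++;
}

/* Models Enable()/Permit(): drops one level, but never below zero,
 * so surplus calls do nothing. */
void nest_leave(struct NestCount *n)
{
    if (n->depth > 0)
        n->depth--;
}

/* Interrupts (or task switches) are allowed only at depth zero. */
bool nest_is_enabled(const struct NestCount *n)
{
    return n->depth == 0;
}
```

Two nested `nest_enter()` calls therefore need two `nest_leave()` calls before the enabled state is reached again, which is the behaviour the text describes for paired Disable()/Enable() and Forbid()/Permit() calls.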

void AddIntServer( long vector (D0), struct Interrupt *server (A0) ); Will add a global interrupt server to a given exception vector. All interrupt servers in the chain will be called in order. The interrupt server is called as:

is_Code( ULONG is_Data (D0), UWORD *StackFrame (A1) );

The parameters are both in registers and in stack for C-language handlers. StackFrame contains a longword with the exception number and then the 'normal' 68000 stack frame.

A private (task-specific) exception handler can be installed using tc_ExceptData and tc_ExceptCode. If there is no global handler installed for an exception, the task-specific handler is called. If there is no task-specific handler either (tc_ExceptCode = NULL), the task is suspended.

void RemIntServer( long vector (D0), struct Interrupt *server (A0) ); Will remove the specified interrupt server from the interrupt server chain. Remember that the source that generates interrupts for the handler must first be disabled.

void Cause( long vector (D0) ); Calls an interrupt server in supervisor mode. Will behave just like a real interrupt. If there is no global handler installed for that exception and no task-specific handler installed, the task will be suspended.

Cause() is used by the message passing system to implement message ports that create soft-interrupts whenever a message arrives.

Memory Allocation

void *AllocMem( ULONG size (D0), ULONG flags (D1) ); Allocates a block of memory of the requested size. Returns a pointer to the memory block, which is at least long word aligned. flags can be set to MEMF_CLEAR to initialize the allocated memory to all zeros.

The allocation method used is 'first fit'. The first block in the free memory list that is big enough is returned.

void FreeMem( void *block (A0), ULONG size (D0) ); Will free a block of memory. Size must be the size used in the corresponding allocation and block should be the value returned by the allocation.

If the block being freed is adjacent to other free memory blocks, they are merged.
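The first-fit search and the merging of adjacent free blocks can be sketched on a toy free list. This is not the exec implementation: `struct FreeList`, `ff_alloc()` and `ff_free()` are hypothetical names, blocks are described by offsets into an imaginary arena, and the fixed-size array is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHUNKS 16

/* Hypothetical free list: chunks are kept sorted by start offset. */
struct FreeChunk { size_t start, size; };
struct FreeList  { struct FreeChunk c[MAX_CHUNKS]; int count; };

/* First fit: carve the request out of the first chunk that is big enough.
 * Returns the offset of the block, or (size_t)-1 if nothing fits. */
size_t ff_alloc(struct FreeList *fl, size_t size)
{
    for (int i = 0; i < fl->count; i++) {
        if (fl->c[i].size >= size) {
            size_t start = fl->c[i].start;
            fl->c[i].start += size;
            fl->c[i].size  -= size;
            if (fl->c[i].size == 0) {           /* chunk used up: drop it */
                for (int j = i; j < fl->count - 1; j++)
                    fl->c[j] = fl->c[j + 1];
                fl->count--;
            }
            return start;
        }
    }
    return (size_t)-1;
}

/* Free: find the slot in address order, then merge with any neighbour
 * that touches the freed block. Returns 0 on success, -1 if the toy
 * list is full. */
int ff_free(struct FreeList *fl, size_t start, size_t size)
{
    int i = 0;
    while (i < fl->count && fl->c[i].start < start)
        i++;

    /* Freed block directly follows its predecessor: grow the predecessor. */
    if (i > 0 && fl->c[i - 1].start + fl->c[i - 1].size == start) {
        fl->c[i - 1].size += size;
        /* The grown chunk may now touch its successor as well. */
        if (i < fl->count &&
            fl->c[i - 1].start + fl->c[i - 1].size == fl->c[i].start) {
            fl->c[i - 1].size += fl->c[i].size;
            for (int j = i; j < fl->count - 1; j++)
                fl->c[j] = fl->c[j + 1];
            fl->count--;
        }
        return 0;
    }
    /* Freed block directly precedes its successor: extend it downwards. */
    if (i < fl->count && start + size == fl->c[i].start) {
        fl->c[i].start = start;
        fl->c[i].size += size;
        return 0;
    }
    /* No neighbour touches it: insert a new standalone chunk. */
    if (fl->count == MAX_CHUNKS)
        return -1;
    for (int j = fl->count; j > i; j--)
        fl->c[j] = fl->c[j - 1];
    fl->c[i].start = start;
    fl->c[i].size  = size;
    fl->count++;
    return 0;
}
```

Freeing the middle block of three adjacent allocations collapses the list back into a single chunk, which is why the caller must pass the same size to FreeMem() that was used for the allocation.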

ULONG AvailMem( ULONG flags (D0) ); Will return size of the free memory left. Due to free memory fragmentation it might be impossible to allocate all of it. The flag MEMF_LARGEST can be used to find out the size of the largest contiguous free block of memory.

void *AllocRemember( struct List *memlist (A0), ULONG size (D0), ULONG flags (D1) ); Allocates a block of memory of at least the requested size and adds accounting information to the specified list (memlist), so that all allocated blocks in that list can be deallocated at the same time. AllocRemember() returns a pointer to the memory block, which is at least long word aligned. Size and flags are the same as for AllocMem().

NULL memlist causes the allocations to be added to the task's allocation list and the allocations will be automatically freed when the task exits if they haven't been freed earlier.

Because programs run from the shell are run synchronously in the same task/process context, it is not usually wise to use the task's memory list. Use your own list, NewList(memlist) before usage and use FreeRememberAll(memlist) at the end.

void FreeRemember( void *block (A0) ); Frees memory which was previously allocated by AllocRemember() . The memory is removed from the allocation list and is returned back to the system.

Do NOT use FreeRemember() on a memory block that is not allocated with AllocRemember()!

void FreeRememberAll( struct List *memlist (A0) ); Frees all allocations that have been associated to the memlist. After FreeRememberAll() the list will be empty and can be reused without NewList().

List Routines

The list header must be initialized (NewList()) before anything is added to the list.

A node can only exist in one list at a time. A node MUST be removed from the current list before adding it into another. Also, a node MUST NOT be removed if it isn't currently in a list.

void Insert( struct List *list (A0), struct Node *node (A1), struct Node *pred (A2) ); Adds the node after the pred node in the list. If pred is NULL, adds to the head of the list.

void AddHead( struct List *list (A0), struct Node *node (A1)); Adds the node to the head of the list.

void AddTail( struct List *list (A0), struct Node *node (A1) ); Adds the node to the tail of the list.

void Remove( struct Node *node (A1) ); Removes the node from the list it is in. DO NOT USE if the node is not in a list or you will trash memory!!

struct Node *RemHead( struct List *list (A0) ); Removes the first node from the list. Returns NULL if the list is empty.

struct Node *RemTail( struct List *list (A0) ); Removes the last node from the list. Returns NULL if the list is empty.

void Enqueue( struct List *list (A0), struct Node *node (A1) ); Adds the node to the list according to the ln_Pri field. A smaller number means lower priority. Higher priority nodes end up nearer the head of the list, but after all nodes with an equal priority. This also means that RemHead()/ Enqueue() will round-robin the nodes that have the highest priority.
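The insertion rule of Enqueue() and the resulting round-robin effect can be sketched with a simplified list. The structures below are hypothetical stand-ins (`struct PNode`, `struct PList`) using a single sentinel node; they do not reproduce the real struct List layout.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; not the real exec structures. */
struct PNode {
    struct PNode *next, *prev;
    signed char   pri;              /* plays the role of ln_Pri */
};

struct PList {
    struct PNode head;              /* sentinel: head.next is the first node */
};

void plist_init(struct PList *l)
{
    l->head.next = l->head.prev = &l->head;
}

/* Insert before the first node with a strictly lower priority, i.e.
 * after all nodes of equal or higher priority. */
void plist_enqueue(struct PList *l, struct PNode *n)
{
    struct PNode *pos = l->head.next;

    while (pos != &l->head && pos->pri >= n->pri)
        pos = pos->next;

    n->next = pos;
    n->prev = pos->prev;
    pos->prev->next = n;
    pos->prev = n;
}

/* Remove and return the first node, or NULL if the list is empty. */
struct PNode *plist_remhead(struct PList *l)
{
    struct PNode *n = l->head.next;

    if (n == &l->head)
        return NULL;
    l->head.next = n->next;
    n->next->prev = &l->head;
    return n;
}
```

With two nodes of priority 5 and one of priority -1, taking the head and enqueueing it again moves it behind the other priority 5 node but keeps it ahead of the priority -1 node, so the two top-priority nodes take turns at the head.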

struct Node *FindName( struct List *list (A0), UBYTE *name (A1) ); Locates a node in the list by name. If no node with the specified name is found, NULL is returned. DO NOT use if the ln_Name fields are not initialized !!

Remember that node names may not be unique. Because of the way FindName() operates, you can pass it a node structure also. You can use the following construct to find all entries in a list with a name.

	struct List *mylist;
	struct Node *node;

	node = (struct Node *)mylist;
	while((node = FindName((struct List *)node, name)))
	{
		/* Do something with the node. */
	}

void NewList( struct List *list (A0) ); Initializes the list header to the state of an empty list.

struct Node *HeadNode( struct List *list (A0) ); Returns the first node from a list or NULL if the list is empty.

struct Node *NextNode( struct Node *list (A0) ); Returns the next node or NULL if the node was the last one.

struct Node *TailNode( struct List *list (A0) ); Returns the last node from a list or NULL if the list is empty.

struct Node *PrevNode( struct Node *list (A0) ); Returns the previous node or NULL if the node was the first one.

Task Manipulation

void AddTask( struct Task *task (A1), long InitialPC (D0), long FinalPC (D1) ); Adds a task to the system. Execution will start from InitialPC and when the task exits, execution will jump to FinalPC. If FinalPC is NULL, a default Expunge routine is used instead (recommended). Task structure should contain tc_StackSize and tc_Stack initialized. Other members of the structure are initialized by AddTask(). The new task gets its parent task's Task pointer as an argument.

struct Task *FindTask( char *name (A1) ); Will find a task by the name. If name is NULL, the task structure pointer of the current task will be returned. Returns NULL if no task matching the name is found.

Remember that task names are not unique and many tasks with the same name may be present in the task lists.

WORD SetTaskPri( struct Task *task (A0), WORD priority (D1) ); Sets the priority of a task. The higher the value, the higher the priority. This call will return the old priority. Actually the priority is only a BYTE value. A priority of -128 is the lowest, 127 is the highest.

OBSERVE: Currently the new priority will take effect the next time the task gets the CPU. This can be considered a misfeature, but because priority changes are not very often done to other tasks, this predictable delay was considered acceptable.

ULONG SetSignal( ULONG newsigs (D0), ULONG sigmask (D1) ); Will set and reset signals. Only signal bits set in the sigmask are changed. Newsigs contains the new signal states for the signals defined in the sigmask. The old state of the signals is returned.

SetSignal(0, 0) can be used to read the current signal state without affecting the signals.
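The mask semantics of SetSignal() come down to one line of bit arithmetic. The sketch below models it on a plain 32-bit word; `struct SigState` and `sig_set()` are hypothetical names for this example, not exec API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a task's signal word. */
struct SigState {
    uint32_t sigs;
};

/* Change only the bits selected by mask, taking their new values from
 * newsigs, and return the previous state of all signals. */
uint32_t sig_set(struct SigState *s, uint32_t newsigs, uint32_t mask)
{
    uint32_t old = s->sigs;

    s->sigs = (old & ~mask) | (newsigs & mask);
    return old;
}
```

Calling `sig_set(&s, 0, 0)` selects no bits at all, so it only reports the current state, which matches the SetSignal(0, 0) idiom.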

ULONG Wait( ULONG mask (D0) ); Waits for any of the signals defined in mask. If none of the signals is active, suspends the task until any of the signals in the mask has been received. This call will return the signals that caused the wakeup. If a signal defined in the mask is already set, no waiting is done. Signals defined in the mask are cleared before returning.

void Signal( struct Task *task (A0), ULONG sigmask (D0) ); Sends a signal or signals to a specified task. If the task is waiting for signals, it is awakened.

Signal() may be called from an interrupt.

ULONG AllocSignal( void ); Allocates a signal bit for the task. Will return the NUMBER of a bit. To get a signal mask, you have to do (1L<<bitnumber).

VOID FreeSignal( ULONG sigbitnum (D0) ); Frees a previously allocated signal bit. Uses the NUMBER of a bit. Parameter should be a bit number returned by the AllocSignal() call.
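The difference between a signal bit NUMBER and a signal mask is easy to get wrong, so here is a small model of it. Everything in it is an assumption made for illustration: `struct SigAlloc` and the `sigbit_*` helpers are hypothetical, the lowest free bit is chosen arbitrarily, and -1 as the failure value is invented because the text does not say what is returned when no bit is free.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-task bitmap of allocated signal bits. */
struct SigAlloc {
    uint32_t used;
};

/* Returns the NUMBER (0..31) of a newly allocated bit, or -1 if all
 * bits are taken (the failure value is an assumption of this sketch). */
int sigbit_alloc(struct SigAlloc *a)
{
    for (int bit = 0; bit < 32; bit++) {
        uint32_t m = UINT32_C(1) << bit;
        if (!(a->used & m)) {
            a->used |= m;
            return bit;
        }
    }
    return -1;
}

/* Releases a bit by NUMBER; out-of-range numbers are ignored. */
void sigbit_free(struct SigAlloc *a, int bit)
{
    if (bit >= 0 && bit < 32)
        a->used &= ~(UINT32_C(1) << bit);
}

/* Converts a bit NUMBER into the mask expected by Wait() and Signal(). */
uint32_t sigbit_mask(int bit)
{
    return UINT32_C(1) << bit;
}
```

The value to keep around is the bit number, since that is what FreeSignal() expects; the mask is derived from it whenever a call such as Wait() or Signal() needs one.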

'Resident' programs

void AddProgram( struct Program *program (A1) ); Will add a program to the system. The program struct includes all information needed for relocation of the program. When a program is added to the system, multiple copies of it may be started. The memory allocated for the struct Program will become owned by the system, thus it must be allocated with AllocMem().

Programs added with AddProgram() can be loaded and relocated by the dos.library call ROMLoadSeg() .

struct Program *RemProgram( char *name (A1) ); Removes a program from the system. This call does not free the memory allocated for the program. The task that used the call becomes the owner of the data and should free the memory before exiting.

Ports, Messages, Devices

void AddPort( struct MsgPort *port (A1) ); Adds a public port to the system. The port should be initialized correctly and NOT be a public port already. See CreatePort().

void RemPort( struct MsgPort *port (A1) ); Removes a public port from the system. Use this call ONLY if the port IS a public port. See DeletePort().

OBSERVE: It is not safe to remove a public port, because there is no record of the tasks holding a reference to the port. A semi-safe solution is to 1) remove the port from the public port list 2) wait some time for any messages to the port and only if no messages arrive, 3) delete the port. This assumes that if a task gets a reference to a public port, it supposedly will use that reference in the near future to send a message into that port.

If you need public ports that may be safely removed, use public semaphores, because they have reference counts.

struct MsgPort *CreatePort( char *name (A1) ); Allocates and initializes a message port. If name is not NULL, the port is added to the system public port list. If the name is NULL, the port becomes private and can later be added to the public port list if so desired. (In this case mp_Node.ln_Name MUST be initialized before AddPort() is called.)

void DeletePort( struct MsgPort *port (A1) ); Frees the message port created by CreatePort(). If the port is a public port, it is also removed from the system list.

Remember that removing a public port is not totally safe. See the description under RemPort().

struct MsgPort *FindPort( char *name (A1) ); Searches the port list for the specified port name. If the port is not found, NULL is returned.

Remember that port names are not unique.

void WaitPort( struct MsgPort *port (A0) ); Waits until there is a message in the port. Does NOT remove the message from the port. Use GetMsg() to get the message. If there already is a message waiting, no waiting is done.

void PutMsg( struct MsgPort *port (A0), struct Message *message (A1) ); Puts a message to the specified message port. After this call the message becomes the property of the receiver and should not be tampered with until it is received back through the replyport. mn_Node.ln_Type becomes NT_MESSAGE.

One-way communication is also possible. In this case the receiver does not reply to the message and must free or reuse the message received. This is only possible between tasks that are written like this.

struct Message *GetMsg( struct MsgPort *port (A0) ); Gets a message from the message port. If no messages are waiting, returns NULL.

Remember that several messages may have arrived at the message port while you only get one signal. Use while() to process all the messages before waiting again. If you use WaitPort(), it takes care of this for you (checks the port before waiting). Also, special care must be taken to ensure that the message port is empty before exiting.
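Why the while() loop matters can be seen in a toy model where several messages share one signal flag. The port below is purely hypothetical (`struct ToyPort`, `toy_put()`, `toy_get()`, `toy_handle_wakeup()`); it stores plain integers instead of struct Message and only imitates the behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>

#define PORT_CAP 8

/* Hypothetical toy port: a small ring of message ids plus ONE signal
 * flag, no matter how many messages are queued. */
struct ToyPort {
    int  msgs[PORT_CAP];
    int  head, count;
    bool signalled;
};

/* Models PutMsg(): queue the message and raise the single signal. */
bool toy_put(struct ToyPort *p, int msg)
{
    if (p->count == PORT_CAP)
        return false;
    p->msgs[(p->head + p->count) % PORT_CAP] = msg;
    p->count++;
    p->signalled = true;
    return true;
}

/* Models GetMsg(): returns false when the port is empty. */
bool toy_get(struct ToyPort *p, int *msg)
{
    if (p->count == 0)
        return false;
    *msg = p->msgs[p->head];
    p->head = (p->head + 1) % PORT_CAP;
    p->count--;
    return true;
}

/* One wakeup: consume the signal, then drain EVERY queued message.
 * Returns the number of messages handled. */
int toy_handle_wakeup(struct ToyPort *p)
{
    int msg, handled = 0;

    p->signalled = false;
    while (toy_get(p, &msg))
        handled++;                  /* real code would process msg here */
    return handled;
}
```

If the handler fetched only one message per wakeup, the remaining messages would sit in the port with the signal already cleared until some later message happened to raise it again.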

void ReplyMsg( struct Message *message (A1) ); Will return a message to its reply port. mn_Node.ln_Type becomes NT_REPLYMSG.

Limitation: The ReplyPort should never be a softinterrupt-port. (But you can't create one with CreatePort() anyway:-)

void SendTimerReq( struct TimerReq *request (A1) ); Sends a timer request to the system.

void AddDevice( struct Device *device (A1) ); Adds a device to the system. Device structure should be initialized and be ready to accept Open() calls.

struct Device *RemDevice( struct Device *device (A1) ); Removes a device from the system. If the device is opened by anyone, the call returns NULL.

LONG OpenDevice( char *name (A1), ULONG unit (D0), struct StdIOReq *req (A0), ULONG flags (D1)); Opens a specified device. Struct StdIOReq is filled in by the OpenDevice() call and the internal device Open() call. The device itself maintains lib_OpenCnt. If the open was successful, 0 is returned.

unit, req and flags are pushed into stack for C-language Open() handlers.

void CloseDevice( struct StdIOReq *req (A1) ); Closes the device opened by OpenDevice() (makes an internal call to the device's Close() entry). Device itself maintains lib_OpenCnt.

req is pushed into stack for C-language Close() handlers.

LONG DoIO( struct StdIOReq *req (A1) ); Starts and waits for the completion of the IO request. (Internally calls SendIO()/ WaitIO().)

LONG SendIO( struct StdIOReq *req (A1) ); Sends a request to the device without waiting for a reply. Internally calls the device's BeginIO() entry.

req is pushed into stack for C-language BeginIO() handlers.

struct StdIOReq *CheckIO( struct StdIOReq *req (A1) ); Checks if the request has been returned. Only checks if mn_Node.ln_Type == NT_REPLYMSG.

LONG WaitIO( struct StdIOReq *req (A1) ); Waits for the completion of the request and removes it from the replyport. DO NOT call WaitIO() on a request that has not been sent!

void AbortIO( struct StdIOReq *req (A1) ); Tries to abort the handling of a request by internally calling the device's AbortIO() entry. The device is required to implement this feature. You should call WaitIO() after AbortIO().

req is pushed into stack for C-language AbortIO() handlers.

Libraries

void AddLibrary( struct Library *library (A1) ); Will add a library to the system. The library should be initialized correctly and be ready to receive Open() calls.

struct Library *RemLibrary( struct Library *library (A1) ); Will remove a library from the system. If the library is currently opened by anyone, the call returns NULL.

void InitLibFuncs( struct Library *lib (A0), ULONG *funcs (A1), ULONG num (D0) ); Initializes the library call vectors to the addresses specified in the array funcs. The first element of the array becomes the address of the jump in offset -6, the next in offset -12 and so on. The library structure should have negative size of at least 6*num so that this call will not overwrite anything.
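The offset arithmetic behind the call vectors is simple enough to write down. The helpers below are hypothetical (`lib_vector_offset()`, `lib_neg_size()`); they only restate the "6 bytes per entry, growing downwards from the library base" rule given above.

```c
#include <assert.h>

#define VECTOR_SIZE 6               /* bytes per jump entry, as stated above */

/* Offset, relative to the library base, of the jump for funcs[index]:
 * index 0 -> -6, index 1 -> -12, and so on. */
long lib_vector_offset(unsigned long index)
{
    return -(long)(VECTOR_SIZE * (index + 1));
}

/* Minimum negative size the library structure needs for num vectors. */
unsigned long lib_neg_size(unsigned long num)
{
    return VECTOR_SIZE * num;
}
```

The same negative offsets are what SetFunction() takes as its offset argument.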

struct Library *OpenLibrary( char *name (A1) ); Will open a specified library. If the library does not exist, or can't be opened, NULL is returned. Otherwise the library base of the library is returned. Internally the library's Open() entry is called and the return value checked to determine if the open was successful. The library itself maintains lib_OpenCnt.

void CloseLibrary( struct Library *library (A1) ); Will close a previously opened library. Internally the library's Close() entry is called. The library itself maintains lib_OpenCnt.

ULONG SetFunction( struct Library *library (A1), LONG offset (D0), ULONG val (D1) ); Will change the specified library call vector to point to the new routine. The old value of the vector is returned. offset is the negative call offset used when calling the library.

DO NOT try to patch Disable()/Enable() nor Forbid()/Permit() .

Other

The BOAR clock runs UTC (GMT) time. The shell variable TIMEZONE should be used to do the adjustment to get the local time.

void DateStamp( struct DateStamp *date (A0) ); Will return the current system time. This is an atomic operation, interrupts do not need to be disabled.

void SetDate( struct DateStamp *date (A0) ); Will set a new system time. This is an atomic operation, interrupts do not need to be disabled.

Semaphores

Semaphore functions use C-language calling conventions (parameters in stack).

void InitSemaphore( struct SignalSemaphore *sem ); Initializes a private semaphore for usage.

void AddSemaphore( struct SignalSemaphore *sem, const char *name, BYTE pri ); Adds a semaphore to the system's Public Semaphores list. All initialization is done by AddSemaphore().

void RemSemaphore( struct SignalSemaphore *sem ); Removes a semaphore from the system's Public Semaphores list.

OBSERVE: There may still be locks held on the semaphore and maybe also requests pending. To remove a public semaphore use the reference count as follows:

	Forbid();
	while(1)
	{
		if(sem->ss_Public == 0)
		{
			RemSemaphore(sem);
			break;
		}
		/* Delay() breaks Forbid() state */
		Delay(100);
	}
	Permit();

struct SignalSemaphore *FindSemaphore( const char *name ); Searches the system's Public Semaphores list for a semaphore. Returns NULL if the named semaphore can't be found. Use FreeSemaphore() to relinquish the obtained semaphore handle.

Remember that public semaphore names are not guaranteed to be unique by the system.

void FreeSemaphore( struct SignalSemaphore *reference ); Ends the usage of a public semaphore obtained by FindSemaphore().

LONG AttemptSemaphore( struct SignalSemaphore *sem ); Tries to get an exclusive lock on the semaphore without blocking. Returns zero for failure, non-zero for success.

LONG AttemptSemaphoreShared( struct SignalSemaphore *sem ); Tries to get a shared lock on the semaphore without blocking. Returns zero for failure, non-zero for success.

void ObtainSemaphore( struct SignalSemaphore *sem ); Gets an exclusive (write) lock on the semaphore. Blocks (waits) until the lock is obtained.

void ObtainSemaphoreShared( struct SignalSemaphore *sem ); Gets a shared (read) lock on the semaphore. Blocks (waits) until the lock is obtained.

void ReleaseSemaphore( struct SignalSemaphore *sem ); Releases an obtained lock on the semaphore, whether shared or exclusive.

CPU Information

I also examined AmigaOS v4 SDK and found the following function:

exec.library/GetCPUInfo                                                     exec.library/GetCPUInfo

   NAME   
       GetCPUInfo -- Get information about the current CPU        (V50)

   SYNOPSIS
       void GetCPUInfo(struct TagItem *tagList);

       void GetCPUInfoTags(ULONG tag1, ...);

   FUNCTION
       This function is used to retrieve information about the CPU(s)
       installed.

       This function replaces the ExecBase attention flag mechanism.

   INPUTS
       Input to this function is a tag list containing the items to
       be queried. Each tag item's data must point to a sufficiently
       large storage where the result is copied to. The list of tag
       items below lists the size of the required storage in
       brackets. For example, GCIT_NumberOfCPUs requires a pointer
       to an uint32, GCIT_ProcessorSpeed requires a pointer to a
       variable which is of type uint64.

       Currently, the following items are available:
       
       GCIT_NumberOfCPUs (uint32 *)
           Number of CPUs available in the system. This is likely to
           be 1 at the moment.

       GCIT_Family (uint32 *)
           CPU family as a symbolic constant. Currently, these are
           defined:
               CPUFAMILY_UNKNOWN - Unknown CPU
               CPUFAMILY_60X  - All PowerPC 60x, like 603 and 604
               CPUFAMILY_7X0  - All G3 PowerPC 7X0, like 740, 750,
                                750CXe, 750FX
               CPUFAMILY_74xx - All G4 PowerPC 74xx, like 7400, 7410,
                                7441

       GCIT_Model (uint32 *)
           CPU model as a symbolic constant. Currently, these are
           defined:
               CPUTYPE_UNKNOWN        - Unknown CPU
               CPUTYPE_PPC603E        - PowerPC 603e
               CPUTYPE_PPC604E        - PowerPC 604e
               CPUTYPE_PPC750CXE      - PowerPC 750CXe
               CPUTYPE_PPC750FX       - PowerPC 750FX
               CPUTYPE_PPC750GX       - PowerPC 750GX
               CPUTYPE_PPC7410        - PowerPC 7410
               CPUTYPE_PPC74XX_VGER   - PowerPC 7440, 7441, 7450, 7451 (Vger types)
               CPUTYPE_PPC74XX_APOLLO - PowerPC 7445, 7447, 7455, 7457 (Apollo 6/7 types)

       GCIT_ModelString (CONST_STRPTR *)
           CPU model as a read-only string. For example, the 604e would be
           returned as "PowerPC 604e".

       GCIT_Version (uint32 *)
           CPU version and revision. The major and minor numbers are
           returned as a number with the lower 16 bit as 0xVV.R, where
           VV is the version number, and R is the revision. For
           example, on a PPC750FX DD2, the result would be 0x0201,
           depicting a PowerPC 750FX V2.1.
           Note: If a version is not available, the value returned is
           0.

       GCIT_VersionString (CONST_STRPTR *)
           CPU version and revision as a read-only string, in the form
           "major.minor".

       GCIT_FrontsideSpeed (uint64 *)
           CPU frontside bus speed.
           Note: This is actually a 64 bit number.

       GCIT_ProcessorSpeed (uint64 *)
           CPU internal frequency.
           Note: This is actually a 64 bit number.
           
       GCIT_L1CacheSize (uint32 *)
       GCIT_L2CacheSize (uint32 *)
       GCIT_L3CacheSize (uint32 *)
           Size of the appropriate cache, if available, otherwise 0.

       GCIT_CacheLineSize (uint32 *)
           Size of a cache line. Note that this is also the alignment used
           by CacheClearE/CacheClearU.

       GCIT_VectorUnit (uint32 *)
           CPU's vector unit, as a symbolic constant. Currently
           defined are:
               VECTORTYPE_NONE     - No vector unit
               VECTORTYPE_ALTIVEC  - Motorola AltiVec (tm) Unit
              (VECTORTYPE_VMX      - IBM VMX Unit)

       GCIT_Extensions (CONST_STRPTR *)
           CPU feature string. The result is a read-only string that
           describes nonstandard features of the CPU.

       GCIT_CPUPageSize (uint32 *)
       GCIT_ExecPageSize (uint32 *)
           Physical and logical page sizes. CPUPageSize determines the
           supported sizes of the CPU, while ExecPageSize determines the
           supported Exec (i.e. "virtual") page sizes. The latter is the size
           supported by Exec API functions.
           In general, these tags return a bit mask. If bit n is set, a page
           size of 2^n is supported.
           For example, GCIT_CPUPageSize might return 0x1000, i.e. bit 12
           is set, hence the CPU supports hardware pages of 4096 bytes.

       GCIT_TimeBaseSpeed (uint64 *)
           Speed of the CPU timer.
           Note: This is actually a 64 bit number.

   EXAMPLE

       /* Query model and version */
       CONST_STRPTR Model;
       CONST_STRPTR Version;

       IExec->GetCPUInfoTags(
               GCIT_ModelString, &Model,
               GCIT_VersionString, &Version,
               TAG_DONE);

       printf("CPU: %s V%s\n", Model, Version);
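Two of the values described in this autodoc are packed and need decoding: the GCIT_Version number and the page size bit masks. The helpers below are a hypothetical sketch, not part of any SDK. The 8 bit / 8 bit split of the version is an assumption inferred from the 0x0201 = "V2.1" example, and the page size helpers follow the "bit n set means 2^n is supported" rule.

```c
#include <assert.h>
#include <stdint.h>

/* Version number as described for GCIT_Version; the 8/8 bit split is
 * an assumption based on the 0x0201 -> "V2.1" example. */
unsigned cpu_version_major(uint32_t v) { return (v >> 8) & 0xFFu; }
unsigned cpu_version_minor(uint32_t v) { return v & 0xFFu; }

/* Page size masks: bit n set means a page size of 2^n is supported.
 * Returns 1 if 'size' is a power of two whose bit is set in 'mask'. */
int page_size_supported(uint32_t mask, uint32_t size)
{
    return size != 0 && (size & (size - 1)) == 0 && (mask & size) != 0;
}

/* Smallest supported page size, or 0 if the mask is empty. */
uint32_t smallest_page_size(uint32_t mask)
{
    return mask & (~mask + 1u);     /* isolates the lowest set bit */
}
```

For the mask 0x1000 from the example above, only 4096 is reported as supported, and it is also the smallest page size.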

MorphOS has this function in exec.library:

exec.library/NewGetSystemAttrsA

    NAME
    NewGetSystemAttrsA -- Get Exec Info           (V50)
    NewGetSystemAttrs -- Varargs Stub for NewGetSystemAttrsA()

    SYNOPSIS
    Result NewGetSystemAttrsA(Data,DataSize,Type,Tags )
    D0                    A0   D0       D1   A1

    ULONG NewGetSystemAttrsA(void*,ULONG,ULONG,struct TagItem*);
    ULONG NewGetSystemAttrs(void*,ULONG,ULONG,...);

    FUNCTION
    Allows you to get information about the system like cpu type,
    caches and so on.

    INPUTS
    Data     - Ptr to a buffer
    DataSize - size of the buffer
    Type     - Information Type
    Tags     - Additional argument buffer for the type

    o SYSTEMINFOTYPE_SYSTEM
      returns the System family.
      Tags: none
      Data: String[DataSize]

    o SYSTEMINFOTYPE_VENDOR
      returns the System Vendor.
      Tags: none
      Data: String[DataSize]

    o SYSTEMINFOTYPE_REVISION
      returns the System revision *string*. Sorry, this is Openfirmware
      heritage.
      Tags: none
      Data: String[DataSize]

    o SYSTEMINFOTYPE_MAGIC1
      returns the Magic1 field in ExecBase.
      Tags: none
      Data: u_int32_t

    o SYSTEMINFOTYPE_MAGIC2
      returns the Magic2 field in ExecBase.
      Tags: none
      Data: u_int32_t

    o SYSTEMINFOTYPE_MACHINE
      returns the CPU family. Currently only PowerPC
      Tags: none
      Data: u_int32_t

    o SYSTEMINFOTYPE_PAGESIZE
      returns the mmu page size which is needed for mmu related routines.
      Can return 0 if no mmu is there in some embedded systems.
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_CPUVERSION
      returns the CPU type.  Depends on CPU family.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_CPUREVISION
      returns the CPU revision. Depends on CPU family.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_CPUCLOCK
      returns the CPU clock.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int64_t

    o SYSTEMINFOTYPE_PPC_BUSCLOCK
      returns the CPU bus clock.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int64_t

    o SYSTEMINFOTYPE_PPC_CACHEL1TYPE
      returns the CPU L1 Type.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_CACHEL1FLAGS
      returns the CPU L1 Flags.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_ICACHEL1SIZE
      returns the CPU L1 instruction cache size.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_ICACHEL1LINES
      returns the CPU L1 instruction cache line count.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_ICACHEL1LINESIZE
      returns the CPU L1 instruction cache line size.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_DCACHEL1SIZE
      returns the CPU L1 data cache size.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_DCACHEL1LINES
      returns the CPU L1 data cache line count.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_DCACHEL1LINESIZE
      returns the CPU L1 data cache line size.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_CACHEL2TYPE
      returns the CPU L2 Type.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_CACHEL2FLAGS
      returns the CPU L2 Flags.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_ICACHEL2SIZE
      returns the CPU L2 instruction cache size.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_ICACHEL2LINES
      returns the CPU L2 instruction cache line count.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_ICACHEL2LINESIZE
      returns the CPU L2 instruction cache line size.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_DCACHEL2SIZE
      returns the CPU L2 data cache size.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_DCACHEL2LINES
      returns the CPU L2 data cache line count.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_DCACHEL2LINESIZE
      returns the CPU L2 data cache line size.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_CACHEL3TYPE
      returns the CPU L3 Type.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_CACHEL3FLAGS
      returns the CPU L3 Flags.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_ICACHEL3SIZE
      returns the CPU L3 instruction cache size.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_ICACHEL3LINES
      returns the CPU L3 instruction cache line count.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_ICACHEL3LINESIZE
      returns the CPU L3 instruction cache line size.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_DCACHEL3SIZE
      returns the CPU L3 data cache size.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_DCACHEL3LINES
      returns the CPU L3 data cache line count.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_DCACHEL3LINESIZE
      returns the CPU L3 data cache line size.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_FPU
      returns >0 if the CPU supports FPU instructions.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_ALTIVEC
      returns >0 if the CPU supports Altivec instructions.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_PERFMONITOR
      returns 1 if the CPU supports the Performance Monitor Unit.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_DATASTREAM
      returns 1 if the CPU supports datastream instructions.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_RESERVATIONSIZE
      returns the alignment size of the reservation instructions like lwarx.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_BUSTICKS
      returns the bus ticks the cpu needs to increase the timer.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_CPUTEMP
      returns the CPU temperature in degrees Celsius as an 8.24
      fixed-point value. May not be implemented for all CPU types and
      MorphOS versions.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_PPC_DABR
      returns 1 if the CPU supports Data Address Breakpoint Register.
      Tags: SYSTEMINFOTAG_CPUINDEX (optional) for the CPU Number
      Data: u_int32_t

    o SYSTEMINFOTYPE_TBCLOCKFREQUENCY
      returns the timebase clock frequency used for system timing.
      this is the same value as returned by timer.device/ReadCPUClock
      base.
      Data: u_int32_t

    o SYSTEMINFOTYPE_UPTIMETICKS
      returns the total system uptime in timebase ticks.
      Data: u_int64_t

    o SYSTEMINFOTYPE_LASTSECTICKS
      returns number of timebase ticks for 'lastsec' measurement.
      To get accurate measurements, you should Forbid(), read this
      tag value, read other *LASTSEC* values, Permit(). Then do
      the math.
      Data: u_int64_t

    o SYSTEMINFOTYPE_RECENTTICKS
      returns number of timebase ticks for 'recent' measurement.
      To get accurate measurements, you should Forbid(), read this
      tag value, read other *RECENT* values, Permit(). Then do the
      math.
      Data: u_int64_t

    o SYSTEMINFOTYPE_CPUTIME
      returns the total system cpu usage in timebase ticks for
      SYSTEMINFOTYPE_UPTIMETICKS time.
      Data: u_int64_t

    o SYSTEMINFOTYPE_LASTSECCPUTIME
      returns the system cpu usage in timebase ticks for
      SYSTEMINFOTYPE_LASTSECTICKS time.
      Data: u_int64_t

    o SYSTEMINFOTYPE_RECENTCPUTIME
      returns the decaying average system cpu usage in timebase ticks
      for SYSTEMINFOTYPE_RECENTTICKS time.
      Data: u_int64_t

    o SYSTEMINFOTYPE_VOLUNTARYCSW
      returns the total number of voluntary task context switches
      (task called Wait(), or RemTask()ed self).
      Data: u_int64_t

    o SYSTEMINFOTYPE_INVOLUNTARYCSW
      returns the total number of involuntary task context switches
      (task was busy and ran for Quantum slice and other task at the
      same priority was made running, or higher priority task appeared
      and was made running).
      Data: u_int64_t

    o SYSTEMINFOTYPE_LASTSECVOLUNTARYCSW
      returns the number of voluntary task context switches for
      SYSTEMINFOTYPE_LASTSECTICKS time.
      Data: u_int32_t

    o SYSTEMINFOTYPE_LASTSECINVOLUNTARYCSW
      returns the number of involuntary task context switches for
      SYSTEMINFOTYPE_LASTSECTICKS time.
      Data: u_int32_t

    o SYSTEMINFOTYPE_LOADAVG1
      returns the average system load for the last minute.
      The returned value is 10.11 fixedpoint value. To get floating
      point value use: (load / 2048.0f);
      Data: u_int32_t

    o SYSTEMINFOTYPE_LOADAVG2
      returns the average system load for the last three minutes.
      The returned value is 10.11 fixedpoint value. To get floating
      point value use: (load / 2048.0f);
      Data: u_int32_t

    o SYSTEMINFOTYPE_LOADAVG3
      returns the average system load for the last fifteen minutes.
      The returned value is 10.11 fixedpoint value. To get floating
      point value use: (load / 2048.0f);
      Data: u_int32_t

    o SYSTEMINFOTYPE_TASKSCREATED
      returns the total number of tasks created.
      Data: u_int64_t

    o SYSTEMINFOTYPE_TASKSFINISHED
      returns the total number of tasks deleted.
      Data: u_int64_t

    o SYSTEMINFOTYPE_TASKSRUNNING
      returns the total number of running tasks.
      Data: u_int32_t

    o SYSTEMINFOTYPE_TASKSLEEPING
      returns the total number of waiting tasks.
      Data: u_int32_t

    o SYSTEMINFOTYPE_LAUNCHTIMETICKS
      returns the timebase for system (scheduler) startup, starting
      from 0.
      Data: u_int64_t

    o SYSTEMINFOTYPE_LAUNCHTIMETICKS1978
      returns the timebase for system (scheduler) startup, starting
      from Jan 1st 1978. This is useful for formatting the system boot
      time with dos/DateToStr.
      Data: u_int64_t

    o SYSTEMINFOTYPE_EMULHANDLESIZE
      returns the emulhandle's size.
      Data: u_int32_t

    o SYSTEMINFOTYPE_EXCEPTIONMSGPORT
      returns the global native exception handler's msgport.
      Data: u_int32_t

    o SYSTEMINFOTYPE_TASKEXITCODE
      returns the global native task's exitcode.
      Data: u_int32_t

    o SYSTEMINFOTYPE_TASKEXITCODE_M68K
      returns the global m68k task's exitcode.
      Data: u_int32_t

    o SYSTEMINFOTYPE_EMULATION_START
      returns the address of the abox emulation area.
      Data: u_int32_t

    o SYSTEMINFOTYPE_EMULATION_SIZE
      returns the size of the abox emulation area.
      Data: u_int32_t

    o SYSTEMINFOTYPE_MODULE_START
      returns the address of the module area.
      Data: u_int32_t

    o SYSTEMINFOTYPE_MODULE_SIZE
      returns the size of the module area.
      Data: u_int32_t

    o SYSTEMINFOTYPE_EXCEPTIONMSGPORT
      returns the native Machine Exception MsgPort
      Data: struct MsgPort*

    o SYSTEMINFOTYPE_EXCEPTIONMSGPORT_68K
      returns the 68K Exception MsgPort. This is the msgport to which
      the default system 68k traphandler sends its msgs. If the system
      68k traphandler is replaced, msgs to this exception msgport
      aren't guaranteed. Usually the msgport is controlled by the
      ABox Log Server.
      Data: struct MsgPort*

    o SYSTEMINFOTYPE_ALERTMSGPORT
      returns the Alert MsgPort.
      Usually the msgport is controlled by the ABox Log Server.
      Data: struct MsgPort*

    o SYSTEMINFOTYPE_MAXHITCOUNT
      returns the maxhit default for tasks.
      Data: ULONG

    o SYSTEMINFOTYPE_MAXALERTCOUNT
      returns the max alerts default for tasks.
      Data: ULONG

    o SYSTEMINFOTYPE_REGUSER
      returns the registered user
      Tags: none
      Data: String[DataSize]

    o SYSTEMINFOTYPE_FREEBLOCKS
      returns information about the free memory blocks
      Tags: SYSTEMINFOTAG_MEMHEADER (optional) to specify single
            memheader, defaults to all system memory
            SYSTEMINFOTAG_HOOK (optional) call hook rather than
            filling array. See exec/system.h
      Data: struct MemEntry []

    RESULT
    Result is the number of bytes returned at Data.
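
Two of the entries above return fixed-point values: SYSTEMINFOTYPE_PPC_CPUTEMP (8.24) and the SYSTEMINFOTYPE_LOADAVG1..3 types (divide by 2048.0f). The following is a minimal sketch of decoding them in plain C. The function names are hypothetical and not part of any system header, the standard uint32_t stands in for the u_int32_t used in the listing, and the raw values are assumed to have been read from the system already.

```c
#include <stdint.h>

/* Decode a SYSTEMINFOTYPE_PPC_CPUTEMP value.
   8.24 fixed point: 8 integer bits and 24 fractional bits,
   so one degree Celsius corresponds to 1 << 24 = 16777216.
   (Hypothetical helper name.) */
double cputemp_to_celsius(uint32_t raw)
{
    return (double)raw / 16777216.0;
}

/* Decode a SYSTEMINFOTYPE_LOADAVG1/2/3 value.
   The listing gives the conversion as (load / 2048.0f),
   i.e. 11 fractional bits. (Hypothetical helper name.) */
float loadavg_to_float(uint32_t raw)
{
    return (float)raw / 2048.0f;
}
```

For instance, a raw temperature of 0x2D800000 decodes to 45.5 degrees, and a raw load value of 3072 decodes to 1.5.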

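The "do the math" step mentioned for the *LASTSEC* and *RECENT* values, and the boot-time conversion hinted at for SYSTEMINFOTYPE_LAUNCHTIMETICKS1978, might look like the sketch below. It is only arithmetic on values that are assumed to have been read already (inside Forbid()/Permit() where the listing asks for it). All names are hypothetical, struct BootStamp is a local stand-in that mirrors the days/minute/tick layout of the DOS DateStamp, and it is an assumption that dividing timebase ticks by SYSTEMINFOTYPE_TBCLOCKFREQUENCY yields seconds.

```c
#include <stdint.h>

#define STAMP_TICKS_PER_SECOND 50  /* DateStamp ticks are 1/50 second */

/* Local stand-in for the DOS DateStamp layout (hypothetical type). */
struct BootStamp {
    int32_t days;    /* days since Jan 1st 1978 */
    int32_t minute;  /* minutes past midnight */
    int32_t tick;    /* 1/50 s ticks past the minute */
};

/* CPU usage in percent over one measurement interval, e.g.
   SYSTEMINFOTYPE_LASTSECCPUTIME over SYSTEMINFOTYPE_LASTSECTICKS.
   Returns 0.0 for an empty interval to avoid dividing by zero. */
double cpu_usage_percent(uint64_t cpu_ticks, uint64_t interval_ticks)
{
    if (interval_ticks == 0)
        return 0.0;
    return (double)cpu_ticks * 100.0 / (double)interval_ticks;
}

/* Convert timebase ticks counted from Jan 1st 1978 into
   days/minute/tick fields, assuming tb_freq is ticks per second. */
struct BootStamp ticks1978_to_stamp(uint64_t ticks, uint32_t tb_freq)
{
    struct BootStamp s = { 0, 0, 0 };
    uint64_t secs, rem;

    if (tb_freq == 0)
        return s;

    secs = ticks / tb_freq;   /* whole seconds since 1978 */
    rem  = ticks % tb_freq;   /* leftover fraction of a second */

    s.days   = (int32_t)(secs / 86400);
    s.minute = (int32_t)((secs % 86400) / 60);
    s.tick   = (int32_t)((secs % 60) * STAMP_TICKS_PER_SECOND
                         + rem * STAMP_TICKS_PER_SECOND / tb_freq);
    return s;
}
```

On a real system the three fields would be copied into a struct DateStamp before being handed to the DOS date formatting call; that step is left out here because it cannot be compiled outside the target environment.
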
If you are interested in implementing it, I'll give you the rest of the information (LVO and definitions). Note, however, that definitions currently exist only for PPC CPUs.

It uses DOS error codes in an Exec device.
It adds a spaghetti-code goto statement.
It casts iotd to what it already is.
It makes several other seemingly pointless code changes, such as replacing the eject parameter with an iotd parameter, adding parentheses around a single term, etc.
