Aros/Developer/ABIv1

ABI V1 is mainly about structural alignment, LVOs and the C library: current AROS has some unusual incompatibilities in its library vectors, and these are impossible to fix without breaking binary compatibility. Most of the time libraries should be auto-opened, but discussion is still needed on how to make and use plug-ins. People need to know OpenLibrary() for that, but not for using normal/standard shared libraries.

As to confusing new developers... IMHO we should have some starter guide, explaining what can be done and what is appropriate for which case. Some code with manual opening, with appropriate explanations of why this is done, could be a good example of what happens behind the scenes, and be a learning aid instead of a confusing thing. As an excuse from my side, you can have a look at the new libnet.a. Now we can have auto-open of bsdsocket.library and miami.library. This may greatly aid porting.

Work on the split of arosc.library is progressing. The following should still happen before ABI V1 can be released:

- C library typedefs screening
- libbase passing to C library functions
- How to extend OS3.x shared libraries
- struct ETask
- Varargs handling
- dos.library BPTR
- What is part of ABIV1, what is not
- Autodocs screening + update (reference manual)
- screening m68k bugs and specifics

In ABI V1 there is support for calling a library function by finding the libbase at an offset of the current relbase (%ebx on i386, A4 or A6 on m68k; which one still has to be discussed on the list). This would allow making pure and rommable code (e.g. without a .bss section) without the need for these ugly #define hacks. Would prefer for this to be configurable, in that m68k applications should use A4, but m68k libraries should use A6. Would like it to be the same for programs and libraries. Currently in ABI V1 an extra libmodname_rel.a link library is generated for each library; it calls the functions using this offset. It is used in ABI V1 so that per opener libraries can use the C lib and have the C lib also opened for every libbase (librom.a is gone, remember). If the register in a library and in a program were different, then probably another extra link lib would need to be provided, one for each register. I would like to avoid that.
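As a rough illustration of "finding the libbase at an offset of the current relbase", here is a small host-side sketch. Everything in it is hypothetical: the toy structures, the `current_relbase` variable standing in for the reserved register, and the stub name. It is not the code that genmodule generates, only the shape of the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-opener bases; none of these are real AROS structures. */
struct ChildBase {
    int (*lvo_double)(struct ChildBase *self, int x); /* stand-in for one LVO entry */
    int calls;
};

struct ParentBase {
    int private_state;
    struct ChildBase *childbase; /* filled in when the parent library is opened */
};

/* Stand-in for the relbase register (%ebx, A4 or A6 in the text). */
static void *current_relbase;

static int child_double_impl(struct ChildBase *self, int x)
{
    self->calls++;
    return 2 * x;
}

/* What a generated *_rel stub could do: fetch the child libbase at a fixed
 * offset inside whatever base is current, then call through it. */
static int child_double_rel(int x)
{
    char *rel = current_relbase;
    struct ChildBase *child =
        *(struct ChildBase **)(rel + offsetof(struct ParentBase, childbase));

    assert(child != NULL);
    return child->lvo_double(child, x);
}
```

Because the stub never touches a global variable, the calling code needs no .bss entry for the child libbase, which is the property that makes pure or rommable code possible.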

One more question about the C calling convention: where is the first argument, where is the last argument? Through which register are the function arguments normally accessed?

First argument is lowest in the stack (at *(ULONG *)(A7 + 4), *(ULONG *)A7 is the return address). The next is at *(ULONG *)(A7 + 8), the next at *(ULONG *)(A7 + 12), and so on.

The function arguments are accessed either through A7 (sp) or A5 (fp), or any other alias of sp or fp. You can't really predict it - the optimizer can have all sorts of fun with reordering stuff.
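To make the layout concrete, here is a toy model of the stack described above, written as plain host C. The `ToyStack` type and helper names are made up for this sketch; it only mimics the offsets (return address at SP, first argument at SP + 4, and so on) and says nothing about what the optimizer does.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 16 };

/* Toy 32-bit stack: mem[] holds longwords, sp is the index of the top word.
 * The stack grows towards lower indices, like A7 on m68k. */
struct ToyStack {
    uint32_t mem[STACK_WORDS];
    int sp;
};

static void toy_init(struct ToyStack *s)
{
    s->sp = STACK_WORDS;
}

static void toy_push(struct ToyStack *s, uint32_t value)
{
    assert(s->sp > 0);
    s->mem[--s->sp] = value;
}

/* Caller side: push the arguments last-to-first, then the return address. */
static void toy_call(struct ToyStack *s, const uint32_t *args, int nargs,
                     uint32_t return_address)
{
    for (int i = nargs - 1; i >= 0; i--)
        toy_push(s, args[i]);
    toy_push(s, return_address);
}

/* Callee side: word 0 at SP is the return address ... */
static uint32_t toy_return_address(const struct ToyStack *s)
{
    return s->mem[s->sp];
}

/* ... and argument n (0-based) sits at byte offset 4 * (n + 1) from SP. */
static uint32_t toy_arg(const struct ToyStack *s, int n)
{
    return s->mem[s->sp + 1 + n];
}
```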

a) on x86 %ebx register is reserved for the system, so everything is compiled with -ffixed-ebx and for the gcc cross-compiler it is done with a proper patch.

b) C functions that go in a library are compiled as normal C functions. This way C code can just be the raw C code without any boilerplate code. The reasons for this are:

Minimize the work needed for porting external code; it will often just need a proper .conf file for the lib to tell which functions have to be exported from the library.
It allows the standard ISO C lib functions to be defined in the standard ISO C include files, which can be properly separated without name space pollution (e.g. stdio.h, string.h, ...).
Another thing I have done in my C lib patches: some (legacy) code even includes just the function prototype in its code without including the proper include file. IMO we have to support this situation, and this means we may not assume any attributes can be added to the function prototype.

c) The function addresses are put in the LVO table of the library by listing them in the .conf file of the library. At any point in the library, no matter how deep in the call chain, you can access the library base with the AROS_GET_LIBBASE macro (which actually just retrieves the value at the address pointed to by %ebx).

d) For each function in the library a corresponding stub function is generated in the libfoo.a static link lib. The only thing this function does on i386 is set the libbase and jump to the proper address. In order to be able to restore the previous libbase when coming out of the called lib func, I implemented %ebx as a second stack pointer. This way I can push the current libbase on this stack before setting the new libbase. After the function returns, this stack can be popped to get back to the previous libbase. (I implemented this stack to grow in the opposite direction of the normal stack, and for a new task it is initialized to the end of the stack; this means that at the start SP points to the top of the stack and %ebx to the bottom, and both pointers grow towards each other.)
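The two-stacks-in-one-area idea from d) can be sketched on the host like this. It is only a model under assumptions: `struct DualStack`, `call_via_stub` and the toy library bases are invented names, and the real stubs are assembler, not C.

```c
#include <assert.h>
#include <stddef.h>

enum { AREA_WORDS = 64 };

/* One stack area shared by two stacks growing towards each other. */
struct DualStack {
    void  *area[AREA_WORDS];
    size_t sp;  /* normal stack: starts at the end, grows down          */
    size_t alt; /* libbase stack: starts at index 0, grows up (%ebx role) */
};

static void dual_init(struct DualStack *d)
{
    d->sp = AREA_WORDS;
    d->alt = 0;
}

static int libbase_push(struct DualStack *d, void *libbase)
{
    if (d->alt >= d->sp)      /* the two stacks would collide */
        return 0;
    d->area[d->alt++] = libbase;
    return 1;
}

static void libbase_pop(struct DualStack *d)
{
    assert(d->alt > 0);
    d->alt--;
}

/* Rough equivalent of AROS_GET_LIBBASE: top of the libbase stack. */
static void *libbase_current(const struct DualStack *d)
{
    return d->alt ? d->area[d->alt - 1] : NULL;
}

typedef int (*LibFunc)(struct DualStack *d, int arg);

/* What a stub does: push the new libbase, call, pop to restore the old one. */
static int call_via_stub(struct DualStack *d, void *libbase, LibFunc func, int arg)
{
    int result;

    if (!libbase_push(d, libbase))
        return -1;
    result = func(d, arg);
    libbase_pop(d);
    return result;
}

/* Two toy libraries: the first calls into the second through a stub. */
struct ToyBase { int bias; struct ToyBase *other; };

static int add_bias(struct DualStack *d, int arg)
{
    const struct ToyBase *base = libbase_current(d);
    return arg + base->bias;
}

static int add_both(struct DualStack *d, int arg)
{
    struct ToyBase *self = libbase_current(d);
    int partial = call_via_stub(d, self->other, add_bias, arg);

    assert(libbase_current(d) == self); /* our own base is back on top */
    return partial + self->bias;
}
```

The function arguments never move: only the second stack changes, which is why the library code can stay plain C.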

Looking at how to implement it on other cpus. For most cpus we can probably copy the %ebx stack trick for the libbase pointer. Unfortunately this will not be possible for m68k. This ABI requires for i386 that at all occasions %ebx contains a valid stack pointer where data can be pushed. On m68k this can not be guaranteed as A6 may be anything in a normal program.

So what I want to achieve is to set A6 in d) above but I need a place to store the current libbase. This brings up again some things I have been playing with originally for i386:

Store the stack pointer in the ETask struct. Problem here is that not every task is guaranteed to have an ETask. So the thing to solve here is be able to guarantee that every task has an ETask or otherwise assume that a task without ETask won't ever call shared library functions with C argument passing. (When does this actually happen) This also goes against other patches in my tree where I actually try to get rid of ETask. This would also add a few memory accesses to the stub function, e.g. during each shared lib function call (SysBase->ThisTask->ETask->StackPointer).
Implement a compiler option that will compile the code in such a way that the function caller sets up both the SP and the FP. Function arguments would always be accessed through FP. In my stub function I could then just put A6 on the stack, set it to the libbase and call the routine. Problem here is that all static link libraries that one wants to link into the shared library also have to be compiled with this option (e.g. libgcc.a, ...). I don't know how much effort implementing such a thing in gcc would take.

Do even a somewhat more hacky/clever trick. Implement a compiler option that always forces to set the frame pointer as the first thing of entering a function and access the arguments through the frame pointer from then on e.g. in pseudo code:

function:
    A6 -> A5
function_FP:
    ...

In the stub code one could then again set the frame pointer, push and set A6 and jump to function_FP (actually this address would be stored in LVO and can be computed from address of 'function'). I think this is the least intrusive patch as it would be OK that all files containing functions that are exported by the library e.g. part of the LVO table are compiled with the option. Again I don't know how much effort this would take.

I have considered other options like moving all function args on the stack one place and then store libbase in the free space, or pushing libbase on stack and then repushing the function arguments but I found this too much overhead in compute cycles and for the latter in stack space.

Here a small overview of the major patches, which will also be done as separate commits.

Implementation of an alternative stack in arossupport (altstack). This basically uses end of stack as an alternative stack that can be used in programs without interfering with the program's stack, the compiler internals or function argument passing. It can thus also be used in stub functions without the need for standard stack manipulation.
Support in setjmp/longjmp to remember alternative stack state when doing a longjmp.
Use the altstack for passing libbase to shared library functions with C argument passing. This way libbase is accessible inside shared library functions without the need of adding it explicitly to the arguments of the functions. This should help porting external code that needs a per opener library base; e.g. a different state for each program that has opened the library.
Up to now the stubs for shared library functions used a global libbase to find the function address in the LVO table. In this patch stub functions are provided that will find the libbase as an offset in the current libbase. This allows to have a per opener library open other per opener libraries. Each time the first is opened also the other libraries are opened and the libbase value is stored in the current libbase. When the first library then calls a function of the other libraries their stub functions will find the right libbase in the current libbase. This feature should also be usable to create pure programs but support for that is not implemented yet.
Also use altstack for passing libbase to the general AROS_LHxxx. As m68k has its own AROS_LHxxx functions it does not have an impact on it.
Use the new C argument passing to convert arosc.library to a 'normal' library compiled with %build_module without the need for special fields in struct ETask.
Also remove program startup (e.g. from compiler/startup/startup.c) information from struct ETask and store it in arosc.library's libbase.

None of the implementations for x86_64, i386 or m68k are final. They still need to be optimized more. For m68k we should find a way to store the libbase in A6 also for C argument passing and not on altstack. I actually assume that the libbase is always stored at the same place, both for functions with register argument passing and C argument passing. Luckily no code exists yet that depends on this feature. For i386 I would like to use %ebx to pass libbase and not the top of altstack. Problem is compiling the needed code with the -ffixed-ebx flag so the compiler does not overwrite the register. For other CPUs it would also be good to find a register used for libbase argument passing. I do plan to try to pass libbase in a register on most archs (e.g. A6 on m68k, maybe %ebx on i386 if it does not have too much speed impact, r12 on PPC, ...).

The implementation where the top of altstack is the libbase is not meant to be the final implementation for any cpu. It is there to be able to get a new arch going rapidly without needing too much work.

If something is not clear or you want to discuss the implementation, just ask; we still have room for improvement. I will try to write some documentation in the following days to explain the current implementation.

Tell me also whether the "double stack" is a default solution right now? I'm not sure I like that one, since it seems to be a large hack for me. Couldn't we just implement a secondary stack in struct ETask for library base pointers? Having it there would be more logical to me, at least that's my feeling....

In the end I don't mind where the second stack is located. It should be accessible at all times with low overhead (in ETask is probably only one memory indirection more which is I think still acceptable).

Are you sure?

bottom stack approach:

1. FindTask(NULL)
2. Read tc_SPLower
3. Read "top of secondary stack" from there
4. Read library base

ETask approach:

1. FindTask(NULL)
2. Read tc_ETask
3. Read "top of secondary stack" from there
4. Read library base
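The two lookup sequences above can be written out side by side. This is a host-side sketch with toy structures: `ToyTask`, `ToyETask`, the `et_AltTop` field and `ToyFindTask()` are assumptions made for the comparison, not the real exec definitions. It shows that both paths have the same number of memory reads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ETask field holding the top of the secondary stack. */
struct ToyETask {
    void **et_AltTop;
};

struct ToyTask {
    void            *tc_SPLower; /* lowest stack address; holds the alt-top pointer */
    struct ToyETask *tc_ETask;
};

static struct ToyTask *toy_this_task;

static struct ToyTask *ToyFindTask(void)
{
    return toy_this_task;
}

/* Bottom stack approach. */
static void *libbase_via_splower(void)
{
    struct ToyTask *task = ToyFindTask(); /* 1. FindTask(NULL)              */
    void ***slot = task->tc_SPLower;      /* 2. read tc_SPLower             */
    void **top = *slot;                   /* 3. read top of secondary stack */
    return *top;                          /* 4. read library base           */
}

/* ETask approach. */
static void *libbase_via_etask(void)
{
    struct ToyTask *task = ToyFindTask(); /* 1. FindTask(NULL)              */
    struct ToyETask *etask = task->tc_ETask; /* 2. read tc_ETask            */
    void **top = etask->et_AltTop;        /* 3. read top of secondary stack */
    return *top;                          /* 4. read library base           */
}
```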

My original patch even compiled whole i386 hosted with -ffixed-ebx and %ebx itself was at all times a pointer to a stack where the top was the libbase. Don't know if I have to revisit this one, there were some questions by the speed impact of this reservation of %ebx; so I left this path. I also never got vesa working on native with the changes I made to compile it with -ffixed-ebx. If going this route a register will need to be reserved for each of the archs.

Basic principle: SysBase->ThisTask->tc_SPLower contains a pointer to the top of the stack; **(SysBase->ThisTask->tc_SPLower) is the top of the stack and most of the time contains the libbase. To port the current feature, what I think you have to do is:

arch/arm-all/exec/newstackswap.S implement pseudo C code there.
in arch/arm-all/clib implement store/restore of (SysBase->ThisTask->tc_SPLower) in setjmp/longjmp/vfork/vfork_longjmp.

I think this should get the basic going. For the rest it is best to wait for some consensus on how to proceed. Final target should be to use register to pass libbase to functions.

Working on a 'libbase in a register' implementation for m68k (picking A4), and I've run into a bit of a wall. I think it has to be A6, the same as for regcall. This way the libbase could also be used for putting variables in there that are addressed relative and it would not matter if the function was called with stackcall or regcall.

The problem is - where do I store the old A4 when calling a stub? I can't store it on the stack (it would mess up the calling sequence to the 'real' routine), and using altstack removes most of the advantages of using a register to hold the libbase. What was your solution for %EBX when moving from one libbase to another? The solution I did for %ebx was that I made %ebx point to the top of a stack. Pushing a value meant increasing %ebx by 4 and storing the new value at (%ebx), etc. Alternative: maybe instruct the compiler that a4 (a6?) is volatile, and clobbered by function calls. In this case you won't need to save it. I guess this is what's done in AmigaOS stubs.

Alternatively, if we eliminated support for variadic funcstub routines, then it would be trivial to implement a mechanism where the library base is the last argument (first pushed on the list), and the AROS_LIBFUNCSTUBn() macro(s) could use that storage space to hold the old A4/%EBX.

In UCLinux basically there's a per-task slot allocated for each possible library in the system, and that slot is used to keep the library data pointer (basically, the library base in our case - and this gives a hint about how to actually treat the data segment of libraries: just an extension of the library base). Of course this poses a limitation on how many and which libraries can be live (or be dormant) in the system at any given time, but probably a more dynamic method which extends this approach could be thought of. The problem is that libbases need to be shared between Tasks. The YAM port is stalled because files opened in one Task could not be accessed in another Task. This should now be solved, but file operations are not thread safe yet. The idea is to use A4 to point to this "global data segment"; it could actually be the GOT (global offset table) in ELF terms. Also, we're basically talking about an AROS-specific PIC implementation, and the ELF standard might be taken as reference.

I see much potential in the mechanism and it could solve a lot of our problems. I would extend the mechanism so that every LoadSegable object can ask for a slot in this GOT. It then basically becomes a TLS table more than a GOT, the latter of which stores pointers to functions or variables. Executables could also use this to make themselves pure by accessing their variables relative to the address stored in their TLS slot.

If it is per LoadSegable object the loader/relocator can during loading at the same time relocate the symbol so no lookup has to be done afterwards. Actually Pavel proposed to use TLS but I was afraid of the overhead of caching, locking and lookups but this system would actually solve this.

Would highly suggest using pr_GlobVec for this - know that it's for Process tasks only, but it was originally designed for very much the same purpose, and is unused on AROS (except for m68k, which can easily be modified to accommodate this). If we do, then 'AROS_GET_LIBBASE' would be

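/* Look up a libbase by LIBUID in the calling process's pr_GlobVec;
   plain Tasks (no struct Process) get NULL. */
#define AROS_GET_LIBBASE(libuid) \
   ({ struct Process *me = (struct Process *)FindTask(NULL); \
      (me->pr_Task.tc_Node.ln_Type == NT_PROCESS) ? \
        (struct Library *)(((IPTR *)me->pr_GlobVec)[libuid]) : \
        (struct Library *)NULL; \
    })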

pr_GlobVec[0] is reserved for the number of LIBUIDs supported in the system.
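For experimenting with the lookup outside of AROS, a portable stand-in could look like the sketch below. `ToyProcess`, the `TOY_NT_*` constants and `toy_get_libbase()` are made up for this example; the only idea taken from the text is that slot 0 holds the number of LIBUIDs and that a plain Task yields NULL. The extra range check against slot 0 is an addition of this sketch, not part of the macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy node types; the values are arbitrary for this sketch. */
enum { TOY_NT_TASK = 1, TOY_NT_PROCESS = 2 };

struct ToyProcess {
    int        node_type;
    uintptr_t *globvec; /* [0] = number of LIBUID slots, [1..n] = libbases */
};

/* Returns the libbase registered under libuid, or NULL when the caller is
 * not a process, has no vector, or libuid is outside 1..globvec[0]. */
static void *toy_get_libbase(const struct ToyProcess *me, size_t libuid)
{
    if (me->node_type != TOY_NT_PROCESS || me->globvec == NULL)
        return NULL;
    if (libuid == 0 || libuid > me->globvec[0])
        return NULL;
    return (void *)me->globvec[libuid];
}
```

The NULL result for non-process tasks is exactly the case questioned later in the discussion: every caller then has to check the result, or crash.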

LIBUIDs can either be statically allocated (yuck) or we have a uid.library that holds a master mapping of library names to uids (or any string to uids) that a library would allocate out of on Init(). Would prefer that it is not based on a string but on a unique address. E.g. address of main of a program, address of libbase in libInit for shared library, etc.

Let's not forget that x86 has two registers that can be used to store a pointer to the TLS: the segment registers %fs and %gs. They are already used on Linux and other OS's as well. In that case you set up the register at task starting time. But this can be worked on later, when the whole infrastructure is in place.

No, we cannot use them and I really regret that! I came to the same idea as you did now, but then I was informed that it would break compatibility of hosted architectures. That solution would work on native targets only... BTW, the same applies for ARM architectures. The ARMv7 features a special register which can be used as a TLS pointer. Of course, it would introduce incompatibility between hosted and native targets... Yeah! Having AROS hosted is surely a big feature of AROS, but also a huge disadvantage....

because we wouldn't be able to access the data the same way on every AROS target within e.g. x86 architecture?

It would be hideously ugly to have to have a different 'linux-i386' and 'i386' compiled binary for each 3rd party program. Setting and retrieving the TLS via %fs/%gs would be permitted in one architecture (pc-i386), but forbidden on another (linux-i386). Therefore, anything that used AROS_GET_LIBBASE would need to be compiled differently for the two cases.

It's not forbidden on Linux at all. What about Mac? What about Windows? We have hosted versions of AROS for those as well. Is it forbidden on those? Without proper information, I don't think we can take any decision. Besides, the compiler / linker could be changed to generate code that gets patched by the ELF loader the way the host platform needs it to be. Think we could handle it just like we do with SysBase (an absolute symbol in ELF file, resolved upon loading).

* There is a list of known symbol to SIPTR mappings, stored in a resource (let's call it globals.library), with the following API: 
  - BOOL AddGlobal(CONST_STRPTR symbol, SIPTR value);
  - BOOL RemGlobal(CONST_STRPTR symbol);
  - BOOL ClaimGlobal(CONST_STRPTR symbol, SIPTR *value);
  - VOID ReleaseGlobal(CONST_STRPTR symbol);
  - BOOL AddDynamicGlobal(CONST_STRPTR symbol, APTR func, APTR func_data) where func is:
    SIPTR claim_or_release_value(BOOL is_release, APTR func_data);

Then you can support per-task (well, per seglist) libraries that autoinit.
* The loader, while resolving symbols, if the symbol is 'external', then it:
     * Looks up the symbol in a list attached to the end of the seglist,
       which is a BPTR to a struct that looks like:
          struct GlobalMap {
             BPTR gm_Dummy[2]; /* BNULL, so that this looks like an empty Seg */
             ULONG gm_Magic; /* AROS_MAKE_ID('G','M','a','p') */
             struct GlobalEntry {
                struct MinNode ge_Node;
                STRPTR ge_Symbol;
                SIPTR  ge_Value;
             } *gm_Symbols;
          };
      * If the symbol is already present, use it.
      * If the symbol is not present, use ClaimGlobal() to
        claim its value, then put it into the process's GlobalMap.
      * Stuff the symbol into the in-memory segment for the new SegList
* DOS/UnLoadSeg needs to be modified to:
      * On process exit, call ReleaseGlobal() on all the symbols in the trailing seglist

* Libraries can use AddGlobal() in their Init() to add Globals to the system, and RemGlobal() in their Expunge().
  - NOTE: RemGlobal() will fail if there is still a Seg loaded with the named symbol!
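To get a feel for the claim/release bookkeeping, here is a minimal in-memory model of the proposed symbol table. It is a sketch only: the lower-case function names, the fixed-size table and the claim counter are assumptions of this example, and AddDynamicGlobal() and the loader integration are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { MAX_GLOBALS = 16, MAX_SYMBOL = 32 };

struct Global {
    bool     used;
    char     symbol[MAX_SYMBOL];
    intptr_t value;
    unsigned claims; /* how many loaded segments currently reference it */
};

static struct Global globals[MAX_GLOBALS];

static struct Global *find_global(const char *symbol)
{
    for (int i = 0; i < MAX_GLOBALS; i++)
        if (globals[i].used && strcmp(globals[i].symbol, symbol) == 0)
            return &globals[i];
    return NULL;
}

/* Models AddGlobal(): fails on duplicates, over-long names or a full table. */
static bool add_global(const char *symbol, intptr_t value)
{
    if (strlen(symbol) >= MAX_SYMBOL || find_global(symbol))
        return false;
    for (int i = 0; i < MAX_GLOBALS; i++) {
        if (!globals[i].used) {
            globals[i].used = true;
            strcpy(globals[i].symbol, symbol);
            globals[i].value = value;
            globals[i].claims = 0;
            return true;
        }
    }
    return false;
}

/* Models ClaimGlobal(): hands out the value and remembers the claim. */
static bool claim_global(const char *symbol, intptr_t *value)
{
    struct Global *g = find_global(symbol);

    if (!g)
        return false;
    g->claims++;
    *value = g->value;
    return true;
}

/* Models ReleaseGlobal(): drops one claim, e.g. from UnLoadSeg. */
static void release_global(const char *symbol)
{
    struct Global *g = find_global(symbol);

    if (g && g->claims > 0)
        g->claims--;
}

/* Models RemGlobal(): refuses while any segment still holds a claim. */
static bool rem_global(const char *symbol)
{
    struct Global *g = find_global(symbol);

    if (!g || g->claims > 0)
        return false;
    g->used = false;
    return true;
}
```

A library's Init() would correspond to add_global(), the loader to claim_global() and release_global(), and Expunge() to rem_global().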

Pros

All 'normal' libraries that support global.library can be made 'autoinit', simply by adding 'AddGlobal("LibraryBase", libbase)' in their Init, though we do have uuid.library...

Per-opener libraries continue to function as they do now (they don't export a global)

how is the libbase passed to these functions? That's the whole problem we try to solve. Sorry, I keep forgetting about non-m68k, where you no longer are passing libbases around via AROS_LHA in the stack. My only problem on m68k is 'libstub' libraries. You have a real difficulty, then.

Per-task libraries can have ETask/pr_GlobVec LIBUID indexed dynamically assigned by the system (ie a 'aroscbaseUID' symbol), and injected into the segment.

Cons

Maintain uniqueness of the strings: example PROGDIR:libs/some.library and libs:some.library. Two versions of the same program installed...

In that case, 'some.library' would not be registering itself with SysBase, correct? And whatever program is opening those libraries is already running (having been loadsegged), right?

In that case, the loader for some.library (lddaemon?) could add a "ThisLibrary_UID" symbol to the GlobalMap segment when loading the library.

Maybe have a pr_SegList[7] GlobalMap in the process too, that is searched for first before the global.library, so that a program can provide symbols to overlays it loads via LoadSeg?

Also think an OS should have a good enough view of the number of objects present in the system to make a good guess for allocation; I think a possible extension of the GOTs should be manageable.

In another mail I made the comment about sharing libbase between different tasks. I also think this can be solved by copying the parent's GOT table to the child's GOT table, and maybe limiting it to only those TLS entries that have indicated they want to be copied.

Another difficulty may be the implementation of RunCommand that reuses the Task structure. I think it can be solved by copying the current TLS table to another place, clearing it and after the program has run restoring the old table.
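Both ideas (copying selected slots to a child, and save/clear/restore around a RunCommand-style call) fit in a few lines. This is a sketch under assumptions: the `SlotTable` type, the per-slot `slot_inheritable` flag and the function names are invented here and do not exist in AROS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NSLOTS = 4 };

struct SlotTable {
    void *slot[NSLOTS];
};

/* Hypothetical per-slot flag: does this slot want to be copied to children? */
static bool slot_inheritable[NSLOTS];

/* Child creation: copy only the slots that asked for it. */
static void inherit_slots(struct SlotTable *child, const struct SlotTable *parent)
{
    for (int i = 0; i < NSLOTS; i++)
        child->slot[i] = slot_inheritable[i] ? parent->slot[i] : NULL;
}

static void clear_slots(struct SlotTable *table)
{
    for (int i = 0; i < NSLOTS; i++)
        table->slot[i] = NULL;
}

/* RunCommand-like call: the program reuses the Task, so park the current
 * table, give the program an empty one, and restore afterwards. */
static int run_command_like(struct SlotTable *task_slots,
                            int (*program)(struct SlotTable *))
{
    struct SlotTable saved = *task_slots;
    int rc;

    clear_slots(task_slots);
    rc = program(task_slots);
    *task_slots = saved;
    return rc;
}

/* Toy program: reports whether it started clean, then dirties slot 0. */
static int toy_program(struct SlotTable *slots)
{
    int started_clean = (slots->slot[0] == NULL);

    slots->slot[0] = slots;
    return started_clean;
}
```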

One possible discussion point is whether to store this TLS table pointer in ETask or reserve a register for this system wide. I think storing in ETask is OK; I would assume optimizing compilers are smart enough to cache the info in a register when it would improve performance.

Think the only feature we can't do is have two libbases of a shared library in the same Task. At the moment you can open a peropener base two times and you would get two libbases. It could for example be used by a shared library so that its malloc allocates from another heap than the malloc inside the main program itself. Another library could choose to share the libbase so that stdout in the library is the same as in the main program and the library can output to the output of the main task. Although I like such flexibility I think we can live without it and can find workarounds if something like that would be needed. It certainly is not needed for porting shared libraries from Linux as they already assume one library per process.

Summary: Hope people forgive me for exploring this mechanism further and not continue with documenting the current implementation.

IMO printf, etc, have to be in arosc. Porting libraries with variadic arguments should IMO not add extra work.

(Note that this is only talking about per-opener libraries, *not* altstack, nor 'stub' libraries where the base is retrieved with AROS_GET_LIBBASE). Instead of having AVL trees, maybe a simpler mechanism for per-opener libraries would be to be more like the AROSTCP implementation, and have the registered-with-SysBase library be a 'factory' that generates libraries. AVL trees are currently not used for peropener libraries; they are used for 'perid' libraries, e.g. that return the same libbase when the same library is opened twice from the same task. And when I look at the generated code MakeLibrary is only called in OpenLib and not in InitLib, so I think we already do what you say. No?

The registered with the system libbase (the 'factory') *only* has Open/Close/Init/Expunge in its negative side, and only struct Library on its positive side. It still has the same name as the 'real' library, though.

In the factory's Open(), it:

Creates the full library, but does *not* register it with SysBase
Calls the full library's Open(), and increments the factory's use count
Returns the full library to the caller.

In the factory's Close(), it:

Calls the full library's Close()
Decrements the factory's use count

This should simplify the implementation of per-opener library's calls, and (via genmodule) make it trivial to convert a 'pure' library from a system-global to a per-opener one.
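The factory steps above can be modelled on the host as follows. This is only a sketch: `struct Factory`, `struct FullLib` and the function names are invented for the illustration, and calloc()/free() stand in for MakeLibrary() and the library's own expunge.

```c
#include <assert.h>
#include <stdlib.h>

/* The base registered with the system: only Open/Close/Init/Expunge. */
struct Factory {
    int use_count;
};

/* The per-opener 'real' library; never registered with SysBase. */
struct FullLib {
    struct Factory *factory;
    int open_count;
    int private_state; /* per-opener data, not shared between openers */
};

/* Factory Open(): create the full library, open it, count the use. */
static struct FullLib *factory_open(struct Factory *f)
{
    struct FullLib *lib = calloc(1, sizeof *lib);

    if (!lib)
        return NULL;
    lib->factory = f;
    lib->open_count = 1; /* the full library's own Open() */
    f->use_count++;
    return lib;
}

/* Factory Close(): close the full library, then drop the factory's use. */
static void factory_close(struct FullLib *lib)
{
    struct Factory *f = lib->factory;

    if (--lib->open_count == 0)
        free(lib);
    f->use_count--;
}

/* The factory may only be expunged once nobody holds a full library. */
static int factory_can_expunge(const struct Factory *f)
{
    return f->use_count == 0;
}
```

Two opens give two independent bases, which matches the later answer in the discussion that auto-opening and manually opening the same library would return separate libbases.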

It changes the outer template pattern from "per-process" to "per-opener" (i.e. assuming that the previous one was per process).

I.e. if some code auto-opens a library and - at the same time - also explicitly opens it later in the user code, would the new approach still return the same lib base at both places? Would that again be using AVL trees within the "full Open"? No, with this method they would be two separate bases. Which, depending on circumstance, could be exactly what you want. (ie imagine a 'libgcrypt' - you don't want the two bases sharing memory). In some other case - e.g. a libnix clone with custom heap and malloc/free or open/close - you might exactly want the opposite. Sounds like conventions are still important ;-) (Independently from that it comes to mind, whether it still might be ok if a process shares his lib bases with child tasks, unless the library does DOS I/O.)

What happens if the task is not a process? The call is a NO-OP, or do you get a crash? A NO-OP implies a check on the result, which will slow things down quite a bit. Also, consider the option to allocate an ID at loadseg time, relocating the binary accordingly.

And if I recall correctly, back then we said we could just use a new reloc type to switch from one register to the other, depending on the platform.

I would call our approach MLS (modular local storage), it is some local storage but contrary to TLS it is not bound to Task or Processes.

I think most of the time the symbols can be anonymous and don't need a name. So I would rework your proposal in the following way:

Basic API

off_t AllocGlobVecSlot(void);
BOOL FreeGlobVecSlot(off_t);

The first gives you a slot; the second frees it.
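A minimal allocator behind such an API might look like the sketch below. It is not an existing AROS function: the lower-case names, the fixed slot count and the use of size_t instead of off_t are choices made for this example. Slot 0 is never handed out, so it can serve as the "no slot" value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_SLOTS = 8 };

/* slot_used[0] is never handed out, so 0 can mean "no slot". */
static bool slot_used[MAX_SLOTS + 1];

/* Models AllocGlobVecSlot(): returns a free slot number, or 0 when full. */
static size_t alloc_globvec_slot(void)
{
    for (size_t i = 1; i <= MAX_SLOTS; i++) {
        if (!slot_used[i]) {
            slot_used[i] = true;
            return i;
        }
    }
    return 0;
}

/* Models FreeGlobVecSlot(): fails for slot 0, out-of-range or free slots. */
static bool free_globvec_slot(size_t slot)
{
    if (slot == 0 || slot > MAX_SLOTS || !slot_used[slot])
        return false;
    slot_used[slot] = false;
    return true;
}
```

A real implementation would need locking around both functions, since loaders in different tasks can allocate slots at the same time.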

LoadSeg/UnloadSeg

There are special symbols indicating an MLS symbol, let's say MLS.1, MLS.2 and MLS.3. The loader would then call AllocGlobVecSlot() three times and replace each of the symbols with one of the three values.

During UnloadSeg(), three times FreeGlobVecSlot() is called with the three offsets.

library/libInit() & library/libExpunge()

libInit() is only called once, so it can call AllocGlobVecSlot() for the slots it wants and store them in the libbase. Expunge will then do the right thing (tm). Storage of the libbase itself would use the LoadSeg approach.

pure executable with shared slot, e.g. all the runs from the same SegList share a slot (don't know if it will ever be needed but it can be handled).

off_t slot = 0; /* offset 0 is not a valid offset */

PROTECT

if (!slot)
   slot = AllocGlobVecSlot();

UNPROTECT

Use case that need non-anonymous slots

For this a new API can be provided that works on top of the API above, with internal hashing, AVL look-up or another indexing mechanism.

off_t AllocNamedGlobVecSlot(SIPTR)
BOOL FreeNamedGlobVecSlot(SIPTR)

Peropener libraries could be handled, although they will be less efficient than per MLS libraries. Analogous to my old patch where %ebx was a pointer to a stack with pushed libbases, the MLS slot could point to such a stack. This would involve the following actions...

When entering a function of the shared library, a stub will push the libbase on the stack and pop it after the function has executed.
Inside the library the libbase can always be retrieved as the top of the stack.
libInit would init the MLS slot with a (small) allocated stack. One difficulty with these peropener libraries is the implications for setjmp/longjmp. If a longjmp is done from below in the call chain of a function in the peropener library to a function above or to the main code in the call chain, the stack pointer would need to be reset to the old value. This jumping may happen due to abort() or signal(). This is currently done in the altstack implementation but I think it is difficult to do for the MLS approach.
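The setjmp/longjmp issue can be shown with standard C. In this sketch the libbase stack is a plain static array, and `struct AltJmpBuf` and `alt_longjmp()` are hypothetical wrappers that record and restore its depth next to the normal jmp_buf; the real altstack code does the equivalent inside the C library's setjmp/longjmp.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

enum { ALT_DEPTH = 16 };

static void  *alt_stack[ALT_DEPTH];
static size_t alt_top; /* number of libbases currently pushed */

/* jmp_buf plus the libbase stack depth at the time of setjmp. */
struct AltJmpBuf {
    jmp_buf env;
    size_t  saved_alt_top;
};

static void alt_push(void *libbase)
{
    assert(alt_top < ALT_DEPTH);
    alt_stack[alt_top++] = libbase;
}

/* longjmp that first drops every libbase pushed since the setjmp. */
static void alt_longjmp(struct AltJmpBuf *buf, int val)
{
    alt_top = buf->saved_alt_top;
    longjmp(buf->env, val);
}

static int toy_libbase;

/* A library function whose stub pushed a libbase and which then bails
 * out (think abort() or a signal) without the stub ever popping it. */
static void aborting_libfunc(struct AltJmpBuf *escape)
{
    alt_push(&toy_libbase);
    alt_longjmp(escape, 1);
}

/* Returns 1 when the library function escaped via alt_longjmp(). */
static int call_and_catch(void)
{
    struct AltJmpBuf buf;

    buf.saved_alt_top = alt_top; /* remember the depth before setjmp */
    if (setjmp(buf.env) != 0)
        return 1;

    aborting_libfunc(&buf);
    return 0;
}
```

With a plain longjmp() the pushed libbase would stay on the stack, and the next AROS_GET_LIBBASE in the caller would see the wrong base.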

I'm not going to try to get these peropener libraries working in the first place.

Also I would like to revisit the requirement to need dos.library for having these kinds of libraries. The use case I have in mind is an internet router where no file system is present and there is no dos.library in the firmware, but where I would still like to use the C library in the software. Or maybe we want to replace dos.library with newdos.library that uses the IOFS approach and gets rid of all these ugly AOS compatibility hacks that keep popping up on the svn commit list :).

SVN tree

In the head of our repository there are now only 3 directories:

admin/
branches/
trunk/

I think it would be worthwhile to add two extra dirs there: tags and imports

Actually after we have a stable ABI I would like to move away as much as possible from the contrib directory to some other repositories. There are several reasons I feel this should be done:

The AROS repository should be for the core AROS code.
I think other contrib projects should be tried to be compiled for all Amiga-like OSes.
The release scheme for AROS and the other programs should not have to be aligned.
Binary versions should be provided on aros-archives and on aminet and/or OS4Depot to install them. (Some clever programs should maybe be provided to make the life of distribution developers easier).
We should avoid parallel forks of programs for AROS and the other amiga OSes.

If there is really a need for a place for hosting AROS projects we may investigate setting up such a server but then including bug tracking, governance, maillist, etc. for each project separately. I personally think there are already enough places like sourceforge, google code, savannah, etc. for people to go to host such projects.

As discussed when we branch ABI V0 and ABI V1 it would also be good to introduce tags. Normally this is done in a directory in the repository called tags. Currently we don't have this directory there. (We do have branches/tags that is a hack I have done because one doesn't have write access in the top directory. I think this directory is not clean and should be removed).

The second directory I would introduce is an imports directory for implementing vendor branches as discussed in the svn book. Currently we use code from several different projects and that code is stored inside the AROS tree; we seem to have problems with keeping this code up to date and merging our changes upstream. Maintainers of upstream projects like the MUI classes etc. have complained about this (to put it lightly).

I think introducing these vendor branches would make it easier to see what changes we have made and make patches to be sent upstream and make it easier to import newer upstream versions of their code.

The AROS repository should be for the core AROS code. I would keep the development stuff. It's nice to have linker libraries etc. with debugging enabled. Maybe we should keep anything which is needed for building AROS under AROS.

I agree. I think an extensive SDK is good to have, and it should be put in the AROS repository; maybe in a separate dev or sdk directory, replacing contrib?

I still think that the "ports" directory is a good idea, making it easier to build the applications for all AROS flavours. What is currently missing is a way to enable e.g. a monthly build.

I'm not against it but with some remarks:

Normal users should never have to compile their own programs. They should be able to download install packages.
Installing should not be all or nothing. Users should be able to install selected programs.
I would prefer it if each program had an official maintainer or maintainers. I don't like how it is done now: dump some source code in a big source tree and never look back.

Relocatable

OS4 uses real ELF executables indeed. I guess they just relocate them on load (adding the difference between the requested address and the used address). There's a second issue: page alignment. This will matter when we have memory protection. AmigaOS4 is a system which runs on a single CPU and a limited range of hardware. Our conditions are much broader. And sticking to 64KB alignment (ELF common page size) is a considerable waste of memory and disk space.

MOS uses relocatable objects. Yes, and its own BFD backend. Exactly. They compile things with the -q flag, which preserves ELF relocations. It's not that easy to make our own custom format with binutils. I tried long ago, and even got to a point where I could produce a very hunk-like ELF executable (to save space on disk), and in fact I built and committed a loader for it, but I never managed to clean up the messy code enough to submit it to the binutils maintainers. GNU code is madness. :D

See the binfmt_misc kernel module... Linux can support any format someone bothers to write a loader for. Kind of like datatypes for executables :) Nobody really uses it, though, particularly because file associations in file managers kind of make it irrelevant for most users (e.g. the things you might want to use it for, like starting UAE when clicking on an ADF, or starting Wine when trying to start a Windows executable, are generally built into the file managers, and shell users on Linux tend to be masochists that hate stuff happening behind their back).

librom.a is gone in ABI V1 and the shared C library is part of the kernel. Initialization and opening of C library creates problems with the changes.

For i386 I would go for a somewhat larger jmp_buf (128 bytes or so). For other cpus I am counting on your expertise. Current size for x86 is 8 longs, i.e. 32 bytes. Do you want the additional space just in case or do you have some ideas what could be stored there?

I'm always a fan of looking at what others are doing. E.g. what is stored by the *nix operating systems in jmp_buf, and might we need to do it the same way for some reason.

Looking at *nix systems is not always the best thing to do. They worry less about future binary compatibility. If they need to break ABI they bump the version of the shared library (e.g. libc.so.n -> libc.so.(n+1)) and all the shared libraries depending on it. They then run the old and new version in parallel as needed.

I want to prevent this for arosstdc.library and that's why I want to reserve some extra space for future usage so we can store something more in jmp_buf when needed without breaking ABI.

The alternative is that somebody researches this further and comes up with good documentation, making sure we never need to extend the jmp_buf size in the future.

I consider setjmp/longjmp a core feature of the OS now, not anymore a compiler supported feature. This way it can be safely used in whole of AROS without depending on a certain implementation for a certain compiler.
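The idea of reserving spare room in jmp_buf can be sketched as below. This is only an illustration: the struct name, the 8 used slots and the 128-byte total are taken from the numbers in this discussion or made up for the example, and it is not the layout used by arosstdc.library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical numbers: 8 slots are in use today, 128 bytes are reserved. */
#define EXAMPLE_JMPBUF_BYTES 128
#define EXAMPLE_JMPBUF_USED  8

struct example_jmp_buf
{
    uint32_t regs[EXAMPLE_JMPBUF_USED];   /* state saved by setjmp today */
    uint8_t  reserved[EXAMPLE_JMPBUF_BYTES
                      - EXAMPLE_JMPBUF_USED * sizeof(uint32_t)];
};

/* The total size is part of the ABI, so it is checked at compile time. */
_Static_assert(sizeof(struct example_jmp_buf) == EXAMPLE_JMPBUF_BYTES,
               "jmp_buf size must not change");

/* Number of bytes still free for future extensions. */
size_t example_jmp_buf_spare(void)
{
    return sizeof(((struct example_jmp_buf *)0)->reserved);
}
```

If more state has to be saved later, regs can grow at the expense of reserved and the static assertion keeps the overall size, and thus the ABI, unchanged.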

Macros

On the ABI V1 branch there is a difference between the handling of the libbase if you use AROS_UFH and AROS_LH for defining a function. When it is defined with AROS_UFH all arguments are passed on the stack; with AROS_LH the ebx register is used for passing the libbase.

This means that AROS_UFHx defined functions may not be called with the AROS_CALLx macro.

So if you want to be friendly to me, try to call AROS_LHx functions with AROS_CALLx or AROS_LC and AROS_UFHx functions with AROS_UFCx. Otherwise I have to do the debugging, which takes a lot of time. (I am now booting the ABI V1 branch up to the first use of loadseg.)

How do I have to define the patches in Snoopy? You should use the same AROS_LH definition as the original and use the AROS_CALL function to call the function.

AROS_UFH2(BPTR, New_CreateDir,
    AROS_UFHA(CONST_STRPTR, name,    D1),
    AROS_UFHA(APTR,         libbase, A6)
)
{
    AROS_USERFUNC_INIT

    // result is exclusive lock or NULL
    BPTR result = AROS_UFC2(BPTR, patches[PATCH_CreateDir].oldfunc,
        AROS_UFCA(CONST_STRPTR, name,    D1),
        AROS_UFCA(APTR,         libbase, A6));

    if (patches[PATCH_CreateDir].enabled)
    {
        main_output("CreateDir", name, 0, (IPTR)result, TRUE);
    }

    return result;

    AROS_USERFUNC_EXIT
}

e.g. (hopefully without a mistake)

AROS_LH1(BPTR, New_CreateDir,
    AROS_LHA(CONST_STRPTR, name, D1),
    struct DosLibrary *, DOSBase, 20, Dos
)
{
    AROS_LIBFUNC_INIT

    // result is exclusive lock or NULL
    BPTR result = AROS_CALL1(BPTR, patches[PATCH_CreateDir].oldfunc,
        AROS_LCA(CONST_STRPTR, name, D1),
        struct DosLibrary *, DOSBase
    );

    if (patches[PATCH_CreateDir].enabled)
    {
        main_output("CreateDir", name, 0, (IPTR)result, TRUE);
    }

    return result;

    AROS_LIBFUNC_EXIT
}

To get the name of this function you have to use AROS_SLIB_ENTRY(New_CreateDir, Dos).

I assume the macros are all right. I do think there is still some problem in AddDataTypes for m68k. I marked it with FIXME, please have a look if something has to be changed there.

I think that these functions have to be converted to use AROS_LH macros. (It does work on ABI V1 as everything but the libbase is passed on the stack there).

AROS_UFCx() should *only* be used for regcall routines. If VNewRawDoFmt() wants a stackcall routine, do NOT use AROS_UFCx(). Just declare a normal C routine. We should amend the docs to be more clear.

I guess we should switch to muimaster_bincompact.conf on all architectures *including* i386 as soon as we go for ABI v1.

Unfortunately WB 3.x disk won't ever boot correctly without correct Lock structure.

For example WB 3.0 Assign command takes a lock, reads fl_Volume field. The end. (All volume names shows as ??? and "can't cancel <name of assign>" when attempting to assign anything at all)

btw, I committed afs.handler MORE_CACHE (AddBuffers) update to SVN. Most boot disks that use "addbuffers df0: <num>" would get ugly error message without this change.

While debugging another non-working CLI program, the problem was again the program reading Lock structure fields. This is more common than originally thought. It really isn't worth the time wasted on debugging if most problems are Lock and dos packet related. So what's the plan now?

Does this mean dos routines (like Lock()) should directly call DoPkt(DeviceProc->dvp_Port,action number,arg,..) like "real" dos does or do I need to fit packet.handler somewhere? (and if yes, why?)

I think I'll do the first experiments in m68k-amiga/dos (overriding files in rom/dos, easy to disable/enable changes) until I am sure I know what I am doing (I should know after all the work with UAE FS, but dos is quite a strange thing). It does appear simple enough, just boring. I can also use UAE directory FS for testing ("only" need to fix dos, no need to touch afs.handler at first).

As far as I understand the topic, in the *final* implementation (ABI V1) the packet.handler would go away and all file systems would be rewritten to be packet based. I made a "short-cut" suggestion:

"Since the filesystems we use today (SFS, AmberRAM, CDROM, FAT) are all packet based (either via packet.handler or SFS packet emulation), why not, instead of changing the whole system, do the following:

add missing features to packet.handler
migrate SFS to use packet.handler
all newly written modules need to be packet-based

Advantages: less work for 1.0, preparing for future change. Disadvantages: this solution would probably not change until the next major release". DOS would keep using the device interface (so afs would not be migrated), it would be forbidden to write new device based handlers, packet.handler would be upgraded to have complete functionality, and the packet-simulation functionality in SFS (a duplication of packet.handler written before it was created) would be removed and replaced with usage of packet.handler.

Would much prefer that the DOS compatibility work would be done right at the first time in the ABI V1 branch. What we are mainly doing now is developing a compatible DOS for m68k and redoing the work for the rest of archs and integrating the m68k work later on.

I've never been a fan of the fixes to public structures etc. being done in another branch since (imho) these are bugs, and should be fixed in the main code immediately.

libcall.h

Basic reasoning is that we can use a scratch register to pass libbase to the shared library like arosc.library. If we have this ABI I don't see a reason why we have to introduce a second way of doing things (except for m68k for backwards compatibility) for AROS_Lxxx macros. The reason to do it would only be to work around deficiencies in the dev tools we use.

Some patches for libcall.h change how the libbase is passed to functions defined with our AROS_L[CDHP]xxx macros. This is close to what I think our final ABI could become; more extensive documentation will follow in the next days, but here is a quick summary:

In AROS shared libraries you can currently have two types of functions: functions defined with m68k registers and ones without. The former functions are handled in the source code by our AROS_L[CDHP]xxx macros; the latter are regular C functions.

Previous patches added the passing of the libbase to the normal C functions; a feature planned to be used to ease porting of Linux/Windows/... shared libraries. The first version used an alt-stack implementation to remember the previous value of libbase when setting a new value. This was not liked by all people. With help from Jason and Pavel the patch was reworked to use scratch registers to pass the value, so the previous value does not need to be remembered as the value is known to the compiler to be clobbered anyway. The following registers are used (typically the highest-numbered scratch register):

i386: %edx
x86_64: %r11
arm: ip (=r12)
ppc: r12 (as also used on MOS)
m68k: a1

Stub functions that set the libbase are present in the libxxx.a link library that accompanies the shared .library.

Now I just committed a patch that does the same for the functions defined with m68k registers on all CPUs except for m68k. The latter is not possible when keeping binary compatibility with classic AOS on m68k. But for the other CPUs the situation when entering the shared lib function is now independent of how the function is defined: function arguments are handled as specified by the SYSV standard, with the libbase passed in the AROS specified scratch register listed above.

Some more on m68k; the situation is now: functions specified with m68k registers use these registers of course. Those without use now the SYSV standard which is to put the arguments on stack and libbase is then in A1 as said above.

Wondering if we should not deviate from the standard here and use registers, f.ex. in this order: d5, a4, d4, a3, d3, a2, d2, d1, a0, d0; a1 would still contain the libbase. What do you think? Of course this would mean having to use an AROS specific gcc patch with possibly badly performing code and lots of bug fixing to do; vararg handling has to be looked at; etc. On the other hand I think it should be faster, especially if we want to keep on supporting older m68k CPUs with very limited or no data cache.

*IF* we had a functioning m68k LLVM toolchain, I would say "yes, sounds good, sign me up".

But gcc is just too nasty to work with to get regcall semantics right. Let's suffer with stackcall for now; there are much larger performance hits (i.e. LONGxLONG multiplies in graphics.library) that greatly overshadow regcall vs stackcall at the moment. I am willing to let m68k's ABI for 'C' interfaces stay stackcall for ABIv1. We can revisit this if m68k lives to ABIv2. I would prefer to say then that the ABI is not fixed yet for m68k. It would mean that all programs using, for example, arosc.library will stop working, giving no backwards compatibility. If possible I would prefer to not have any backwards compatibility breaks after ABIv1; only extensions.

Why doesn't a libbase go in A6? That is exactly why A1 is used. A scratch register is used on all the CPUs. For several reasons, including C compatibility[*], you need to be able to set the libbase in a stub function. If you used A6 as the register you would need to preserve the old value in the stub. For this problem we found no solution that all three of us were content with. Using a scratch register basically allows the libbase to be passed without any overhead on the calling side. You need to load the libbase address into a register anyway to compute the address of the function to call using its LVO number.

[*] A summary Jason once made during our private discussions on the requirements:

1) Must support per-opener globals for the library
2) Should not require any changes to 3rd party library code
3) Should support function calls without library base on the stack, but the library base is available to the called function
4) Should support variadic function calls without library bases on the stack

I may add that 3) follows out of 2) because of function pointers etc. f.ex. parsing routines often get a function pointer passed to fetch a character from a stream. #define tricks won't be compatible with that.
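Requirements 2) and 3) can be illustrated with a portable simulation. In this sketch a static variable stands in for the CPU scratch register, and all names (ExampleBase, example_getc, example_count_chars, ...) are hypothetical; real stubs would be a few assembler instructions in libxxx.a.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-opener library base with its own private state. */
struct ExampleBase
{
    const char *text;   /* data this opener reads from */
    size_t      pos;    /* read position, private to this opener */
};

/* Stands in for the CPU scratch register (%edx, r12, a1, ...). */
static struct ExampleBase *example_scratch_reg;

/* Library-side function: finds its libbase in the "scratch register". */
static int example_lib_getc(void)
{
    struct ExampleBase *base = example_scratch_reg;
    char c = base->text[base->pos];

    if (c == '\0')
        return -1;
    base->pos++;
    return (unsigned char)c;
}

/* Global libbase on the program side, set when the library is opened. */
struct ExampleBase *ExampleLibBase;

/* Stub from the link library: plain C prototype, no libbase argument. */
int example_getc(void)
{
    example_scratch_reg = ExampleLibBase;   /* load libbase into the register */
    return example_lib_getc();              /* jump to the real function */
}

/* Third-party style parser: it only knows a plain function pointer. */
size_t example_count_chars(int (*next_char)(void))
{
    size_t n = 0;

    while (next_char() != -1)
        n++;
    return n;
}
```

Because example_getc has an ordinary prototype, it can be handed to example_count_chars as a function pointer without touching the parser source, which a #define that rewrites the call expression could not do.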

It's been brought up before, but I think we really need to define a reference compiler version per architecture to avoid situations where person A introduces code that does not work for person B, because they use a different compiler. In such a case the question should be: does the code work on the nightly build? If yes, then I'm sorry, but person B needs to do all the work to make the code work with his setup. We are using a multitude of linux distributions with a multitude of dev environments and there is no way any of us can make his code work on all of those scenarios. I have seen too many times people complaining about "compiler bugs" here and there when the actual problem has nearly always been code bug(s), and the compiler finally improved enough to notice that it can optimize the code even more. Very common. So better confirm it first :)

About the inline asm code you use to jump to the library function calls: it looks a bit convoluted to me, but I'm perhaps missing something.

What's wrong with this simpler (and faster and smaller, in terms of generated code) approach?

typedef void (*LibFuncCall)(Library *base, Type1 arg1, ...)__attribute__((regparm(1)));

((LibFuncCall)( ((struct JumpVec *)__SysBase)[LVO].vec))(base, arg1, ...);

Basically, why have you chosen %edx instead of %eax for passing the libbase argument? It was just the rule to use the highest-numbered scratch register on all CPUs, but I don't have a problem with switching to %eax on i386 to help gcc. We can agree that for i386 we choose a register that better fits how gcc works at the moment; I am willing to change. The problem is that I needed to use a similar hack for ARM, and there you don't have these function attributes to make a more elegant solution. I would like to have a libbase function attribute that puts the libbase in the right register for the CPUs we support. Agreed, but the big question is who is going to do the task and do the maintenance afterwards. Is everybody willing to fix ourselves to one standardized compiler to compile AROS? Well, we fixed ourselves to gcc long ago. For other compilers other approaches might be taken. I was mainly considering the x86 port. Any architecture might have its own specific ABI.

Also your choice is limited to one register that is usable in between calling a function and entering the function. Any of the scratch registers can be used, I just don't see the need for that trampoline. It might also have a great impact on speed due to branch prediction and other such things Michal might have a better clue about. On ARM all scratch registers except ip are used for function argument passing. So we need to use that register to be compatible with setting that register when function arguments are already set; e.g. in the stub function.
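The direct call through the jump table proposed above can be sketched in portable C as follows. This is a simulation under assumptions: the names are hypothetical, the vector table is a plain array of function pointers placed just before the library base rather than real JumpVec entries, and the regparm attribute is only applied on i386 GCC, where it passes the first argument in %eax.

```c
#include <assert.h>
#include <stddef.h>

/* regparm is an i386-only GCC attribute; elsewhere use the default ABI. */
#if defined(__GNUC__) && defined(__i386__)
#  define EXAMPLE_LIBCALL __attribute__((regparm(1)))
#else
#  define EXAMPLE_LIBCALL
#endif

/* Hypothetical library base; real ones start with struct Library. */
struct ExampleLib
{
    long bias;
};

typedef long (*ExampleVec)(struct ExampleLib *base, long arg) EXAMPLE_LIBCALL;

/* Two library functions that receive the libbase as first argument. */
EXAMPLE_LIBCALL long example_add(struct ExampleLib *base, long arg)
{
    return arg + base->bias;
}

EXAMPLE_LIBCALL long example_sub(struct ExampleLib *base, long arg)
{
    return arg - base->bias;
}

#define EXAMPLE_LVO_ADD 1
#define EXAMPLE_LVO_SUB 2

/* Memory image of the library: vectors sit directly below the base. */
struct ExampleLibImage
{
    ExampleVec        vectors[2];   /* vectors[1] is LVO 1, vectors[0] is LVO 2 */
    struct ExampleLib base;
};

_Static_assert(offsetof(struct ExampleLibImage, base) == 2 * sizeof(ExampleVec),
               "no padding between the vectors and the base");

struct ExampleLibImage example_image = { { example_sub, example_add }, { 10 } };

/* Fetch the vector at a negative offset from the base and call it. */
long example_call(struct ExampleLib *base, int lvo, long arg)
{
    ExampleVec vec = ((ExampleVec *)base)[-lvo];

    return vec(base, arg);
}
```

The caller has the libbase in a register in any case, because it is needed to locate the vector; passing it on as the first register argument therefore costs nothing extra on the calling side.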

So I still am interested to know what made gcc decide to optimize away the code and if it is a gcc bug or there is some non-trivial strict aliasing violation in the code or not. Haven't looked deeper, but this seems to be the culprit, solved in 4.5.3: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45967

Anyway that will I think only solve part of the problem as Pavel said that with his compiler he also had problems on x86_64 and with my compiler it booted. x86_64 does not use the jmp trick.

There's also the "thiscall" attribute that could be used.

   `thiscall'
        On the Intel 386, the `thiscall' attribute causes the compiler to
        pass the first argument (if of integral type) in the register ECX.
        Subsequent and other typed arguments are passed on the stack. The
        called function will pop the arguments off the stack.  If the
        number of arguments is variable all arguments are pushed on the
        stack. The `thiscall' attribute is intended for C++ non-static
        member functions. As gcc extension this calling convention can be
        used for C-functions and for static member methods.

Note that both regparm and thiscall are disabled for variadic functions. When I try thiscall attribute I get the message that this attribute is ignored; regparm(1) seems to work fine. I see it was introduced in 4.6. It would be useful to use that one so libraries could also be implemented as C++ classes. Actually I also would prefer %ecx over %eax, but then for the reason that %eax is guaranteed to be clobbered by the return value of a function call and %ecx would maybe be used as global offset table pointer in the whole library by using -ffixed-ecx or something similar.

Could use regparm(2) on gcc <4.6 and just put 0 as first argument, libbase as second. Of course one could then have the discussion if this hack is worse or better than my convoluted jmp hack that started this discussion :).

Might be missing something, but don't you just need to put the libbase as 1st argument? We want to be able to also set the register when a function is called through a function pointer but does not have the libbase in its argument list; so we don't want to interfere with normal function argument passing.

The purpose is to be able to port external shared libraries without any need to make changes to the source; e.g. to not need to keep AROS specific patches. My ultimate wish is that you take an external shared library code, write a corresponding .conf file and then compile the shared library. This means that you can't use a lot of tricks we use now in our AROS_Lxxx macros. I also don't want to introduce load time linking to be able to perform this task; e.g. like the .so objects OS4 has introduced. I still want to keep compile time linking. Besides, OS4 needs virtual memory in order to be able to really share their .so objects between programs.

You basically want an AROS version of PIC code: passing the libbase in a register is an implementation detail; what you need is actually a way to tell the compiler that all global variables defined in the library need to be accessed relative to this libbase, and the loader, or the library init code, has to prepare a libbase containing all these variables. If your objective is to not change the library code, that is.

Yes but not yet, I just took the first step to be sure libbase is available when you enter the shared library function, as that is part of the ABI and has to be fixed as soon as possible. Later on the PIC mechanism can be implemented fully without needing to break or extend the ABI; it's just an internal affair of how the library uses the libbase it gets passed and initializes the libbase when you open the library. The mechanism to pass the libbase is also made more flexible than just mimicking UNIX shared library behavior. One task could open a library twice and choose to make one or the other active before calling functions of the library.

This actually complicates the matter from the point of view of the various compilers on the market today. PIC code is usually implemented transparently to the caller; in our case it would instead be the caller's duty to take care of it. I guess it could be done with trampolines, the way it's done with libc now, but you understand this is a heavy design decision.

The caller does not have to know if the library is done with PIC or not.

The only responsibility of the caller is to open the library at program entry and put the libbase of the shared library in the scratch register when calling one of its functions. One of the problems I see is that currently our library only handles function pointers in its LVO table and we probably need something similar for variables. Therefore errno for arosc is currently a #define that calls a function of the library. It also needs to be that way for thread-safety ;). Or it has to be a __thread variable. But I think that makes things even more complicated.
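The errno trick mentioned here can be sketched as follows. All names are hypothetical and a static pointer stands in for "the libbase of the current opener"; this is not the arosc.library implementation, only the general pattern of a macro that expands to a function call.

```c
#include <assert.h>

/* Hypothetical per-opener library base holding the C library state. */
struct ExampleCBase
{
    int err;   /* this opener's errno value */
};

/* Stands in for "the libbase of the current opener" (an assumption). */
static struct ExampleCBase *example_current_base;

void example_set_current_base(struct ExampleCBase *base)
{
    example_current_base = base;
}

/* The library exports a function, since the LVO table only holds functions. */
int *example_errno_location(void)
{
    return &example_current_base->err;
}

/* Callers keep writing 'example_errno' as if it were a plain variable. */
#define example_errno (*example_errno_location())
```

Two openers then see independent values: after setting example_errno while base A is active, switching to base B and setting it again leaves A's value untouched.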

Is there a good recent doc on how uclinux does it on different CPUs (e.g. m68k, i386, x86_64, ARM and PPC)? The docs I found mentioned that for most CPUs shared libraries were not supported (yet).

What would your system look like on the CPUs AROS currently supports? Scratch registers are available on all CPUs; %fs/%gs are not.

It is true that I want to find an ABI that best fits with Amiga-type shared libraries and not one that best fits with current existing compilers. But if there are equal choices I would go for the one which is easiest to use in current compilers. The closest would be a system that supported a 'tree of GOTs'. You could think of each GOT as a libbase, and the whole system is compiled as PIC. But, of existing systems, I can't think of any.

The other approach could be to use per task pointers to globally allocated "library bases" (GOT and PLT tables, actually), maybe accessed through the %fs or %gs registers (and you'd save one scratch register that way too).

I guess we can try your way and see how it goes. PIC could probably be implemented by having the compiler treat any functions it encounters as being a method belonging to a "global" class. It could be easier to use this approach once one decides to modify the compiler.

The library from its side can decide to return the same libbase when the same library is opened twice from the same task; e.g. the expected behavior when mimicking UNIX shared libraries. It can also decide to put only certain variables in the PIC table (e.g. libbase) and make other variables shared. ...

I have not read everything, but it seems you are discussing which register to use on x86 to store a reference to the GOT or something similar of a shared library? Might I suggest using the ebx register, as this is the register that's for exactly that? It's the extended base register. And it's also used by ELF for shared libs. I know that AROS' shared libs are not really shared libraries, especially not ELF shared libs, but you will get a lot fewer problems with gcc if you use ebx, I guess.

It probably won't matter anymore but that's how it was originally done. The problem is that %ebx is not a scratch register, so if you want to put a new value in there during asm stub code you need to preserve the old value. There was no good solution found to do this and using a scratch register solves that problem.

Well, isn't it that you only put the address of the GOT once into ebx and then never touch that register again? At least that's what PIC code generated by GCC does. If it absolutely has to touch it, it also restores it, as it's the only non-volatile register of e{a,b,c,d}x. It generally avoids ebx as much as possible then.

A shared library can still use %ebx internally as GOT register; it just has to save the previous value of %ebx when entering a function through the LVO table and move %eax to %ebx. On the calling side, it is easier if libbase is in a scratch register. We want to be able to call functions in a shared library through a function pointer without having the libbase in its prototype, f.ex. fgetc for parsing routines. Then you can use small stub code that just gets the libbase from the global variable and puts it in the scratch register.

Architectures

m68k

Years ago, there were certain Amiga compilers that had a combined .data/.bss segment that was referenced via the A4 register. When implemented correctly, this meant that the final binary was always 'pure'. Here's an example implementation, where Task->tc_TrapData is used to store A4. Task->tc_TrapCode was a proxy that called a 'C safe' trap routine, if the user had provided one. All references to .data/.bss were indirect, through the A4 register. This allowed the .data/.bss segment to be located anywhere in memory, and there were no .data/.bss symbol fixups in .text.

After all the .text symbols (that only referenced .text, of course) were fixed up by LoadSeg, the program start proceeded as so:

__start:

  UWORD *a4 = AllocVec(sizeof(.data + .bss), MEMF_CLEAR)

  /* Set up .data, from an offset, value table. On AOS, you could
   * do this by calling InternalLoadSeg on a RELA segment that is
   * stored in .rodata or .data, or roll your own mechanism.
   */
  For Each Symbol in .data; do
     Set a4 + SymbolOffset = SymbolValue

  FindTask(NULL)->tc_TrapCode = .text.TrapProxy
  FindTask(NULL)->tc_TrapData = a4

  /* Load all the libraries
   */
  For Each Automatically Loaded Library in .text.Library List
     a4 + Library Offset = OpenLibrary(.text.Library List Item)

  /* Call the C library startup, which calls 'main', etc. etc
   */
  jsr __start_c

  /* Close all the libraries
   */
  For Each Automatically Loaded Library in .text.Library List
     CloseLibrary(a4 + Library Offset)

  /* Free the .data+.bss */
  FreeVec(a4)

  ret

From then on, the compiler makes sure not to use A4, and if A4 would need to get clobbered by a LVO call, it saves it on the stack and restores it after the call. Therefore, every function in the compiled program can get access to .data+.bss through A4, and subsequent invocations of the program (which could be made Resident) would get their own .data+.bss segment, and share the .text segment from the original invocation.
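The effect of the A4 scheme can be imitated in plain C by keeping every "global" in one block that is reached only through a base pointer. The sketch below is a simulation with hypothetical names; a real implementation needs compiler support so that the base pointer lives in A4 instead of being an explicit argument.

```c
#include <assert.h>
#include <stdlib.h>

/* All "global" variables of the hypothetical program live in one block. */
struct ExampleData
{
    int counter;   /* would live in .bss */
    int step;      /* would live in .data, initial value 2 */
};

/* Startup: allocate a fresh data block and apply the .data initial values. */
struct ExampleData *example_start(void)
{
    struct ExampleData *a4 = calloc(1, sizeof(*a4));   /* zeroed, like MEMF_CLEAR */

    if (a4 != NULL)
        a4->step = 2;
    return a4;
}

/* Program code only touches data through the base pointer (the A4 role). */
int example_tick(struct ExampleData *a4)
{
    a4->counter += a4->step;
    return a4->counter;
}

/* Shutdown: free the data block again. */
void example_stop(struct ExampleData *a4)
{
    free(a4);
}
```

Two invocations that each call example_start get independent counters while sharing the same code, which is what makes the binary 'pure'.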

NOTE: One of the horrifying side effects was that you needed to explicitly use some crazy macros to retrieve the data segment back into A4 based upon your task ID for things like BOOPSI classes, interrupt handlers, and any other callbacks. I don't see how to get around this for any 'autopure' solution on AROS without deep compiler magic.

NOTE 2: Adding this support to GCC is not easy. Maybe not even possible for m68k. I only bring this up as a historical reference for how some 'compile as pure' implementations worked on AOS, and as Things To Think About for the AROS LLVM team.

I think I read somewhere that the variable heap should be stored in A6 on ABI v1 and spilled to the stack when A6 holds library bases instead. Thus, A4 would still be available to general purpose code in AROS even when making pure reentrant code. It's an idea. A4 was used on AOS compiles because few libraries used it for their LVOs.

I don't think it is even needed to fix that in the ABI. I think programs using A4 can run on the same OS as programs using A6 as base pointer. Of course it would be good to use the same approach for the whole of m68k AROS. To standardize the static link libraries (e.g. libxxx.a), that needs to be defined. For m68k it was already necessary to provide .a files for the different code models used. Can we, and do we want to, get rid of this?

Most disassembled Amiga code I've looked at would require a massive amount of spills for that to work, whereas it's pretty rare for any code to need enough address registers for it to be necessary to spill A4. Don't the AROS gcc macros spill the A6 register on m68k anyway when calling a function in a shared library? Only on unpatched GCC, where the frame pointer is A6. The GCC that is recommended for AROS m68k has a minor patch that changes the frame pointer to A5. Since we also compile with -fomit-frame-pointer, which reduces (but does not eliminate) the number of frame pointer usages, the A6 spill is very infrequent. Isn't an A6 frame pointer necessary for debug software like muforce? I've never used muforce, but in the Amiga software I've disassembled, I've never seen an example of one that used A6 as frame pointer, so that sounds like it'd be a strange requirement. Most early Amiga compilers at least use(d) A5. Of course, it's been a long time since I've done much on m68k, so I haven't looked at much code generated by gcc or vbcc.

In AROS rom/, we only have:

rom/exec/exec.conf:APTR NewAddTask
rom/graphics/graphics.conf:LONG DoRenderFunc

And in AROS workbench, we only have:

workbench/libs/icon/icon.conf:BOOL GetIconRectangleA
workbench/libs/reqtools/reqtools.conf:ULONG rtEZRequestA
workbench/libs/datatypes/datatypes.conf:ULONG SaveDTObjectA
workbench/libs/datatypes/datatypes.conf:ULONG DoDTDomainA
workbench/libs/workbench/workbench.conf:struct AppIcon *AddAppIconA
workbench/classes/datatypes/png/png.conf:void PNG_GetImageInfo

So it's still *very* rare for A4 to be used in a LVO call, and that saves you a register reload (which you would have to do for EVERY call with A6). (Mesa is not in this list, as it has a number of functions that go all the way to A5!).

A6 is that Library's libbase, and most of the intra-library calls in graphics.library are LVO (register) calls, so this is not surprising. No, these were internal calls to internal subroutines. Other arguments are pushed to stack, but a6 is not pushed. Callee continues to access GfxBase by a6. Then they may have used a hybrid call sequence, ie:

void foo(int a, char b, long d, REGPARM(A6) struct Library *GfxBase);

SAS/C and other compilers were capable of some unusual stunts.

Another thing I'd like to see go into the 68k LLVM compiler would be to make the heap biased to subtract -128 from the heap pointer and use -128 as the base instead of 0. This would double the likelihood that a variable has a single-byte offset from the heap pointer. If the variable heap grows bigger than 32k+128, we could then bias the heap pointer down to -32768 for a maximum of 64k of heap on a flat 68k. I think the PhxAss assembler uses that trick internally. Neat trick. Is that a byte or long offset? After checking the M68000PRM, it says that the address register indirect with displacement mode always uses 16-bit displacements. So that means that the bias will need to be -32768 instead of -128. The way it's implemented is simple: when we allocate the heap in the startup code, we subtract off the bias (SUBA -32768.w, A6) and in the shutdown code, we add it back on (ADDA -32768.w, A6) before we deallocate it. This will let us use unsigned offsets for all of the variables, thus getting us a heap size of 64K instead of 32K. I think MorphOS does a similar trick.
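The arithmetic behind the bias can be checked with a small sketch. It assumes the reading that the base register ends up pointing 32768 bytes into the heap, so that signed 16-bit displacements reach the whole 64K; the function names are hypothetical and no compiler is known to be implemented exactly this way.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_HEAP_SIZE 65536L   /* 64 KB of variables */
#define EXAMPLE_BIAS      32768L   /* distance from heap start to base pointer */

/* Base pointer as kept in the address register: the middle of the heap. */
uint8_t *example_biased_base(uint8_t *heap)
{
    return heap + EXAMPLE_BIAS;
}

/* Signed 16-bit displacement that reaches byte 'index' of the heap. */
int16_t example_displacement(long index)
{
    assert(index >= 0 && index < EXAMPLE_HEAP_SIZE);
    return (int16_t)(index - EXAMPLE_BIAS);
}

/* Effective address, as a base-plus-16-bit-displacement access computes it. */
uint8_t *example_effective_address(uint8_t *base, int16_t disp)
{
    return base + disp;
}
```

Heap byte 0 maps to displacement -32768 and heap byte 65535 to displacement 32767, so the full signed range is used instead of only its positive half.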

As for using A6 as the heap pointer holder, LLVM's PBQP register allocator(1) will be able to spill address register contents to data registers before the stack anyway so the pressure on the address registers will be less than in normal register allocators. The problem with PBQP is that it needs a matrix solver to work efficiently and that is best accomplished by compiling on an OpenCL-based OS with an up-to-date graphics card. Compiling release 68k code on a 68k would be an exercise in frustration, given the slowness of the PBQP allocator on older systems. Debug code would be fine though since it would normally use the LinearScan register allocator instead.

i386 ABI V1 changes

Changes currently in the ABI_V1 branches (trunk/branches/ABI_V1):

- dos.library changes: BPTR is always word based
- Changes to the handlers
- %ebx register reserved on i386 as base for relative addressing and as libbase pointer passed to library functions

Ongoing changes in the branches:

Working on C library and ETask. The purpose of this change is to remove all arosc specific extensions from the AROS Task structure. I am extending the autogenerated code from genmodule so that it provides the features so that arosc.library can store all its state in its own per-opener/per-task libbase.

to be done (discussion has to happen and actions to be defined and assigned):

SysBase location: Currently it is a special global variable handled by the ELF loader, but some people don't seem to like it. I would like to keep it like that. Another problem is that currently SysBase is one global pointer shared by all tasks. This is not SMP compatible, as we need a separate SysBase at least for each CPU/core. Michal then proposed to call a function every time you need SysBase. I proposed to use virtual memory for that, e.g. the SysBase pointer has the same virtual address for each task but points to a different physical address for each CPU.

exec: AllocTrap/FreeTrap and ETask? See ongoing changes above.

kernel.resource (can it wait for ABI>1.0?): This resource is meant to gather all arch and CPU specific code for exec.library. This way exec.library would not need any arch specific code but would call kernel.resource functions when arch specific information or actions are needed. This change can be delayed until after ABI V1.0 as exec.library is fixed; but programs that use kernel.resource directly will not be compatible with the next iteration of the ABI.

dos.library compatibility: Switch everything to DOS packets and remove the current system based on devices. This has been heavily discussed on the mail list. The current AROS implementation uses devices for message passing, while the classic system used ports and DOS packets. The latter is considered by many people to be a hack in AmigaOS that does not fully follow the rest of the OS internal structure. That's also the reason why the alternative for AROS was developed in the first place. But it became clear that we'll have to implement a DOS packets interface anyway, and thus the question became whether we should have two alternative implementations in AROS. In the beginning I was also a big opponent of DOS packets, mostly because the device interface allowed some code to run on the caller's task and thus avoid task switches, which reduce throughput. Using the PA_CALL and PA_FASTCALL features of AROS ports the same could be achieved. In the end it was concluded that everything you could do with the device interface could also be done with the ports interface and vice versa, and that having two systems next to each other is bloat. In the current implementation file-handles and locks are the same, and this should also be changed when switching to the DOS packets interface.

C library

The problem with the std C functions is that the compiler may not know at call time that it needs to set %ebx. Some code just includes the prototype of a std C function and not the include file. Additionally you have to be sure that you can pass function pointers to a std C function, e.g.

#include <stdio.h>

int parser(int (*parsefunc)(FILE *))
{
   ...

   while (parsefunc(myfile) != EOF)
   {
       ...
   }
}

int main(void)
{
    ...

    parser(fgetc);

    ...
}

That's why I use the stubs in the static link library to set the %ebx value to the C libbase, which is a global variable (or some offset from the current libbase). The problem is that I need to spill the old value of %ebx, and I did not find another easy way than making %ebx a stack pointer, pushing the new value on that stack and popping it after the function call. If you pushed the old value on the normal stack it would be interpreted as a return address or as a function argument.

One more question about the C calling convention: where is the first argument and where is the last? Through which register are the function arguments normally accessed? The first argument is lowest in the stack (at *(ULONG *)(A7 + 4); *(ULONG *)A7 is the return address). The next is at *(ULONG *)(A7 + 8), the next at *(ULONG *)(A7 + 12), and so on.

The function arguments are accessed either through A7 (sp) or A5 (fp), or any other alias of sp or fp. It is unpredictable - the optimizer can have all sorts of fun with reordering stuff.

But AFAICS this does not solve the problem of passing libbase to C functions with stack based argument passing. To summarize again: as I need the capability of using function pointers to functions in the standard C library, I need to set the libbase in the stub function. At that point the old value in A6 needs to be preserved, but there is no easy and fast way to store it.

Another question is how will A4 be set when entering such a XIP library? What is the overhead? Ah, I see your issue more clearly now. You're talking about a special case for AROS C library, not for the general struct Library * case.

Not AROS C specific, but for all functions using stack based argument passing. That is best used for all ported code, as otherwise you will just introduce stub functions that only generate calling overhead (e.g. arguments moved from stack to registers in the libxxx.a stub and back again inside the library itself). Why do we have the lib<foo>.a stubs libraries at all? On m68k, the <proto/foo.h> should be using the inlines anyway, and the stubs libraries are never used at all. Should it be the same on the other architectures too? Why do we have the stub libs? Because C has function pointers.

You do realize that gcc lets you use static inlines as function pointers, right?

#include <stdio.h>

static inline int foo(int bar)
{
    return bar + 0x100;
}

void calc(int i, int (*func)(int val))
{
    printf("0x%x => 0x%x\n", i, func(i));
}

int main(int argc, char **argv)
{
    int i;

    for (i = 0; i < 10; i++)
        calc(i, foo);
}

The problem is that some code also declares its own function prototypes, which will give an error if the function is also static inline. Some code, especially BSD, even only includes the function prototypes without including the header file, so there is no chance of defining it as static inline. Additionally, for the std C includes (and I think also for SDL etc.) there is a separation between the includes: if you include one header file only part of the lib interface is exposed. Our proto includes expose the full interface of the library. I think doing it through inline functions will increase the porting effort for libraries and for the programs that use them.

Imagine GL case - ported code will not have #include <proto/mesa.h> but will have #include <GL/gl.h>. The makefiles will also have -lGL, thus the libGL.a will be linked. The way that the system is set up now, makes it very easy to port stuff that uses GL or SDL (or probably any library ported from outside world)

The stub functions are also required, at least for GL, for the purpose of AROSMesaGetProcAddress. The parameter is the name of the function, the return value is the pointer to the function. The function however needs to meet the interface defined in GL itself, so it can't be an "amiga library function", because such functions also have libbase as a parameter. It needs to be a function from a stub, which will call the library function with the library base passed.

Example:

PFNGLBINDVERTEXARRAYPROC glBindVertexArray = AROSMesaGetProcAddress("glBindVertexArray");

glBindVertexArray(10);

This will call glBindVertexArray function in the stub library, which will then call Mesa_glBindVertexArray(10, MesaBase). (at least that's how it works now).

Why not make the C library like any other library, with an enumerated list of functions in the arosc.conf, and a libbase? The AROSC libbase could double as the pid. This would eliminate the 'specialness' of arosc.library.

AROSCBase would be opened by the -lauto init code, just like any other library. That all is implemented and is not the problem. (BTW auto-opening will not be done by -lauto but by -larosc or uselibs=arosc; e.g. how it should be :) ).

The problem is setting the library base when entering the function. As function pointers need to be supported (as explained in another mail) the compiler does not know it needs to set the libbase for a function. That's why I set the libbase in the stub function. The problem is that I need to preserve the old libbase value and there is no fast way to store the old value. That's why I implemented a second stack so that I can just push the new libbase value.

??? no static version of arosc.library: This makes building arosc quite complex and all programs should be able to use the shared version anyway. I don't think any program currently uses the static version of the library.

??? remove librom.a: I would split arosc.library in three parts:

- a std C subset that can be put in ROM as the first library, and thus does not need dos.library. This should then replace all usage of librom.
- a std C implementation; stdio is done as a lightweight wrapper around Amiga OS filehandles.
- a full POSIX implementation, possibly also including some BSD functions. This would then provide a (nearly) full POSIX implementation. This library should be optional and real Amiga programs should not need it.

varargs handling (stdarg.h, va_arg, SLOWSTACK_ARGS, ...): This is currently heavily discussed on the mailing list, but I myself don't have the time to delve into it. A summary of the different proposals with some example code to see the effect on real code would be handy. My proposal is to switch to the startvalinear & co. from the adtools gcc compiler. This will use the same solution as is used for OS4. The advantage is that only limited code changes are needed to adapt to it. The disadvantage is that the adtools gcc has to be used for compiling AROS.

oop.library optimization (can it wait for ABI>1.0?): I think the current oop.library and the hidds are still sub-optimal and need some more investigation. I think this work can wait till after ABI V1.0, with a note in ABI V1.0 that programs that use oop.library or hidds directly won't be compatible with ABI V2.0.

libcall.h/cpu.h clean-up (can it wait for ABI>1.0?): IMO some cruft has been building up in these files, and they could use a good discussion of what can be removed and what can be combined, followed by proper documentation. Probably this does not impact the ABI, so it may be delayed, but changes to these files may cause source compatibility problems. E.g. programs that depend on some specific features will not compile anymore with the new version and will need to be adapted, but programs compiled with the old version will keep on running on AROS.

How to extend OS3.x shared libraries ? Where to put AROS extension without hurting compatibility with future versions of MOS and/or AOS ?

Discuss and flame about these patches on the mail list before applying them to the main trunk. Possibly improve or rewrite part of them.

Write the (i386) ABI reference manual and put it on the web. This will involve a lot of discussion on the mail list to nail down the V1.0 version of several APIs. During the writing of this manual a lot of things will pop up that have not been thought through yet, so ABI V1 and AROS 1.0 can only be released once this task is finished.

One of the important discussions is to clear out what is handled by the compiler and what is defined by the ABI of AROS itself. This border is quite vague at the moment.

For each function in the library a corresponding stub function is generated in the libfoo.a static link lib. The only thing this function does on i386 is set the libbase and jump to the proper address. In order to be able to restore the previous libbase when I come out of the called lib func, I implemented %ebx as a second stack pointer. This way I can push the new libbase on this stack on entry, and after the function returns the stack can be popped to restore the previous libbase. (I implemented this stack to grow in the opposite direction of the normal stack, and for a new task it is initialized to the end of the stack; this means that at the start SP points to the top of the stack and %ebx to the bottom, and both pointers grow toward each other.)

With a normal stack, there may be a possibility in future to detect a stack overflow using the MMU, and even to extend the stack automatically. However, if the stack area actually consists of two stacks growing in opposite directions, I'm not sure there's a good way to detect when they meet in the middle.

Then you can make the stack pointers point to two different memory pages; there is nothing that forces them to point to the same stack. But for the current implementation - where often stack is allocated by user code the current solution is the most compatible.

And since this would only be for arosc applications, it shouldn't impact BCPL and legacy AOS apps.

I think we can in the end avoid the second stack if the code for the library is compiled in a special way. If function arguments would never be accessed through the stack pointer but always through the frame/argument pointer we could setup the frame/argument pointer so it points to the function arguments and push the old libbase on the normal stack and then jump inside the normal function (after the frame pointer setup). If we could get this working it would get my preference. No need anymore to reserve %ebx for the system on i386; only inside the libraries it would be a fixed register containing the libbase. I think on m68k it could work in much the same way but now with A6.

There are two ways to peg a register: one is to never allow it to be allocated, the other is as I described before, with a custom calling convention that passes the libbase in as a "this" pointer as C++ would do it. The latter way is generally preferred since the compiler can sometimes get away with stuffing the libbase onto the stack and retrieving it later in a high-pressure register loading situation. Now this may be a good idea: leveraging the C++ compiler. Hmm. Would it be worth investigating whether we could abuse the C++ compiler to generate the code we want? It may require static class data (i.e. class foo { static int val; }), namespace manipulation, and some other crazy things.

Here is my crazy prototype. Actually compiles and runs, and appears to generate the correct C interface stubs.

/****** Library definition *********/

struct fooBase {
    int a;
    int b;
};

/****** Library C++ prototype ******/

class fooBaseClass {
private:
    static struct fooBase base;
public:
    static int Open(void);
    static int DoSomething(int a);
    static int DoSomethingElse(int b);
};

/****** 'C++' implementation ***********/

struct fooBase fooBaseClass::base = {};

int fooBaseClass::Open(void)
{
    base.a = 0;
    base.b = 1;
    return 1;
}

int fooBaseClass::DoSomething(int new_a)
{
    int old_a = base.a;
    base.a = new_a;
    return old_a;
}

int fooBaseClass::DoSomethingElse(int new_b)
{
    int old_b = base.b;
    base.b = new_b;
    return old_b;
}
/***** 'C' interface ***************/

extern "C" {
int OpenFoo(void);
int DoSomething(int a);
int DoSomethingElse(int b);
};

int OpenFoo(void)
{
    return fooBaseClass::Open();
}

int DoSomething(int a)
{
    return fooBaseClass::DoSomething(a);
}

int DoSomethingElse(int b)
{
    return fooBaseClass::DoSomethingElse(b);
}

/******** test app ************/
#include <stdio.h>

int main(int argc, char **argv)
{
   if (!OpenFoo()) {
       printf("OpenFoo(): Failed\n");
       return 0;
   }

   DoSomething(100);
   DoSomethingElse(101);

   printf("DoSomething: %d\n", DoSomething(0));
   printf("DoSomethingElse: %d\n", DoSomethingElse(0));

   return 0;
}

Since the library structure would use a custom calling convention anyway, it wouldn't need to use the C calling convention. The problem is with variadic arguments. If you have more parameters than you can put in registers, we'd need some way to dictate the calling convention. The .a stubs could contain a small subroutine that uses the C calling convention for its varargs inputs, as long as it calls the body of the subroutine with the alternative libcall calling convention. Can we make a libcall convention that regloads all of the time, and a second one that uses maybe a parameter counter in %ECX or something for a varargs capable calling convention for libraries?

Current LLVM-calling conventions dictate that the C-calling convention supports varargs and the Fastcall-calling convention does not but uses as many registers as possible. Fastcalls are preferred for tail recursion calls since the compiler can convert them into iterations instead. C-calling convention is the default in many cases but if the function does not use variadic arguments it can sometimes be optimized into a Fastcall by the compiler.

I passed along my ideas for an object-oriented library format to Chris Handley for use with the PortablE compiler. If I rummage around in my outbox, I may still have some of the emails. Here's a text-based diagram I found of an object-oriented library base with interface inheritance:

--------------------------------
interface names (stored as EStrings)
--------------------------------
interface offsets to initialize hash table
--------------------------------
hash table (array aligned to pointer size boundary)
--------------------------------
globals (varying number and size)
--------------------------------
size of hash table (UINT containing number of array indexes in hash table)
--------------------------------
pointer to hash table (to allow the global space to grow)
--------------------------------
parent library handle
--------------------------------
library structure
======================
standard jump table entries for library (including init and expunge, etc.)
--------------------------------
interface 1 jump table entries
--------------------------------
interface 2 jump table entries
--------------------------------
...
--------------------------------
the actual code

The double line created by the = signs represents the address pointed to by the library handle with the globals above it and the jump table below it.

Hosted

AFAIK POSIX subsystem will be separated from other elements. I'd like to ask what will be done with POSIX filesystem. IMHO approach used in ixemul.library is poor. The library in fact uses own internal filesystem. This includes /dev emulation. What if we get DEV: handler on DOS level? I think this will be better because:

This integrates with the system better, every native program can have access to this filesystem if needed.
Since this is the complete filesystem, all filesystem calls work as expected, not only open() and close(). You may navigate it normally, for example chdir("/dev"); open("null") will also work.

There will be no need for separate ZERO: handler since this gets replaced by DEV:zero.
We also get DEV:null, DEV:urandom, etc.
We may get DEV:sdXXX entries for disks.
We may implement the full set of UNIX IOCTLs by introducing a dedicated IOCTL packet. This will make porting UNIX programs very simple.

I'm against this. We should not pollute the Amiga side of things with compatibility things from POSIX/Linux. IMHO the better place for these things is the POSIX compliant C library. I like how cygwin does things: it allows mount points in the cygwin file name space that don't need to correspond with the DOS/Win file name space. I'm not sure; I now call the lib arosuxc.library, but if somebody wants to port ixemul, our library may not be needed anymore.

If not I'll continue with moving the arosc code to arosuxc. I think one of the big differences between the AROS POSIX implementation and ixemul is that in the former stdio is wrapped around dos.library calls like Read/Write/Seek etc. and the latter uses DOS packets directly. I'm not sure I like the usage of DOS packets but it may be needed for good compatibility with UNIX/Linux. AFAIK also for ixemul its file name space is directly linked with the Amiga name space.

IMHO ixemul is too heavyweight. Perhaps things can be made simpler. BTW, if we manage to beat ixemul in performance, our project may become cross-platform IMHO. Maybe we should give it another name, like just posix.library? Additionally, "arosuxc.library" hurts my ears because it contains "sux". :)

I think one of the big differences between the AROS POSIX implementation and ixemul is that in the former stdio is wrapped around dos.library calls like Read/Write/Seek etc. and the latter uses DOS packets directly. I'm not sure I like the usage of DOS packets but it may be needed for good compatibility with UNIX/Linux. I guess it has something to do with the select() implementation.

AFAIK also for ixemul its file name space is directly linked with the Amiga name space. It's the same as in current arosc. The first component is treated as a volume name, so /usr becomes usr:, etc. Plain "/" gives a virtual directory listing volume names and assigns. BTW ixemul's virtual filesystem needs re-engineering IMHO. For example you can't cd /dev and list entries. Of course stat() on these entries does not give what is expected (major/minor numbers etc.). /proc is completely missing (although it could be useful). One day I thought about rewriting the whole thing; however, this would actually mean forking the development, since currently ixemul.library is held by the MorphOS team as a private project and it's impossible to join the development. So, anyway, I believe forking would mean a name change in order to prevent clashes in future.

The other problem is that now configure scripts for findutils and coreutils think we have a getmnt function :| Yes this is how the others do it, they use all sorts of additional functions to get info about mounted file systems. I would not want to implement them until we need them for some other stuff.

The only other alternative I have found by studying findutils/gnulib/lib/mountlist.c is another field in struct statfs (compiler/clib/include/sys/mount.h) called f_fstypename and which we do not currently have. So adding it might potentially break binary compatibility, so I would shift this for the ABI v1 switch and revert to mnt_names array usage for now.

In ABI V1 librom.a is gone. There is one library, arosstdc.library, that will be usable by all modules. It will be initialized as one of the first modules. Therefore the I/O functions are also moved to arosstdcdos.library, which is a disk-based module.

How are you handling the ctype.h family of functions? I would like to see an option for LANG=C only stripped down set.

Currently only the "C" locale is supported by the string functions in arosstdc.library. This also allows me to mark most of the string C functions as not needing the C libbase, reducing the overhead of calling one of these functions. People worried about the overhead of putting all C functions in a shared library can later on still provide inlined versions of the functions in the headers or use the __builtin_ variants when the compiler is gcc.

I think we need a more in-depth general discussion about the interaction of the locale of the C library and Amiga's locale.library and different character encodings and ...

And I am up to speed with the delicate C and POSIX library internals like vfork, function pointers, ... In a short time I have become somewhat of a git power user and my patches are now a nice tree of small commits that are locally rebased on main trunk on a regular basis. This allows me to work efficiently on the patches; part of the overview would be lost if I committed it to the main trunk. That's why I want to get this patch up to a level where there is a known solution for each of the CPUs before I commit it and let you CPU experts do your thing. For example my -ffixed-ebx changes for i386 seem to break the native vesa driver, but I won't fix it as I am neither a native nor a vesa expert.


References

Since AOS doesn't have the extended task structures, this code sets acpd = NULL->iet_acpd. Fix arosc not to use private fields in the task structure. You can use AVL trees for association (OS3.5+). Or duplicate these functions in arosc statically.

This is done in the ABI V1 tree. There it is a library like any other, using a per task libbase. The latter is implemented with AVL trees. But in ABI V1 the lookup in the AVL tree is only needed during OpenLibrary, as the libbase is passed through the %ebx register.

Remember also that arosc.library relies on some other AROS extensions like NewAddTask() and NewStackSwap(). It will conflict with the changes I have done for ABI V1. I'll bracket my arosc changes with AROS_FLAVOUR_BINCOMPAT - just remove them when you merge the ABI V1, since they won't be needed anymore.

Perhaps this is a silly thought, but I am curious whether it would be possible to make OpenLibrary support the use of an internal AVL list of library bases for tasks/libraries that support them, so that external libraries could open the correct ones for a particular task (e.g. mesa/egl/glu/etc. sharing the same library base without having to implement their own per task lists)?

In ABI V1 there is a peridbase option in the lib .conf file (by default it returns a unique libbase per task with task defined as a unique combination of task pointer and process return address). I may decide to change the option name before merging in main trunk. IMO it is not exec's task to decide when to make a new libbase or not; it has to be decided in the OpenLib (e.g. LVO == 1) function of a library.

LVOs

Long ago, I promised to sum up some information about ABI v1 TODOs. Here is the first piece. This is analysis of needed fixes to exec.library LVO table. The goal is to make AROS functions binary-compatible with AmigaOS 3.9 and potentially with MorphOS. This includes (re)moving LVOs that conflict with MorphOS.

But I try to reserve LVOs for all functions defined in POSIX.1-2008 so adding POSIX functions will not be that difficult to handle.

Below you'll find fragments from AROS exec.conf file and AmigaOS 3.9 and MorphOS v2.5 exec_lib.fd files with numbered LVOs. I stripped original AmigaOS v3.1 functions in order to keep lists shorter.

AROS (current):

131 ULONG ObtainQuickVector(APTR interruptCode) (A0)
132 .skip 2 # MorphOS: NewSetFunction(), NewCreateLibrary()
134 IPTR NewStackSwap(struct StackSwapStruct *newStack, APTR function, struct StackSwapArgs *args) (A0, A1, A2)
135 APTR TaggedOpenLibrary(LONG tag) (D0)
136 ULONG ReadGayle() ()
137 STRPTR VNewRawDoFmt(CONST_STRPTR FormatString, VOID_FUNC PutChProc, APTR PutChData, va_list VaListStream) (A0, A2, A3, A1)
138 .skip 1 # MorphOS: CacheFlushDataArea()
139 struct AVLNode *AVL_AddNode(struct AVLNode **root, struct AVLNode *node, AVLNODECOMP func) (A0, A1, A2)
140 struct AVLNode *AVL_RemNodeByAddress(struct AVLNode **root, struct AVLNode *node) (A0, A1)
141 struct AVLNode *AVL_RemNodeByKey(struct AVLNode **root, AVLKey key, AVLKEYCOMP func) (A0, A1, A2)
142 struct AVLNode *AVL_FindNode(const struct AVLNode *root, AVLKey key, AVLKEYCOMP func) (A0, A1, A2)
143 struct AVLNode *AVL_FindPrevNodeByAddress(const struct AVLNode *node) (A0)
144 struct AVLNode *AVL_FindPrevNodeByKey(const struct AVLNode *root, AVLKey key, AVLKEYCOMP func) (A0, A1, A2)
145 struct AVLNode *AVL_FindNextNodeByAddress(const struct AVLNode *node) (A0)
146 struct AVLNode *AVL_FindNextNodeByKey(const struct AVLNode *node, AVLKey key, AVLKEYCOMP func) (A0, A1, A2)
147 struct AVLNode *AVL_FindFirstNode(const struct AVLNode *root) (A0)
148 struct AVLNode *AVL_FindLastNode(const struct AVLNode *root) (A0)
149 APTR AllocVecPooled(APTR pool, ULONG size) (D0, D1)
150 void FreeVecPooled(APTR pool, APTR memory) (D0, D1)
151 BOOL NewAllocEntry(struct MemList *entry, struct MemList **return_entry, ULONG *return_flags) (A0, A1, D0)
152 APTR NewAddTask(struct Task *task, APTR initialPC, APTR finalPC, struct TagItem *tagList) (A1, A2, A3, A4)
153 .skip 14
167 BOOL AddResetCallback(struct Interrupt *resetCallback) (A0)
168 void RemResetCallback(struct Interrupt *resetCallback) (A0)
169 .skip 2 # MorphOS: private9(), private10()
171 .skip 2 # MorphOS: DumpTaskState(), AddExecNotifyType()
173 ULONG ShutdownA(ULONG action) (D0)
# MorphOS functions follow:
# private11()
# AvailPool()
# private12()
# PutMsgHead()
# NewGetTaskPIDAttrsA()
# NewSetTaskPIDAttrsA()
##end functionlist

Conversely, 68k AROS could have the AVL functions at OS3.9-compatible LVOs, with the #?VecPooled() functions at OS3.9-unused LVOs or in amiga.lib. For (some) simplicity, other archs should follow either the 68k or PPC LVOs rather than use a third set.

AmigaOS 3.9:

131 ObtainQuickVector(interruptCode)(a0)
##private
132 execPrivate14()()
133 execPrivate15()()
134 execPrivate16()()
135 execPrivate17()()
136 execPrivate18()()
137 execPrivate19()()
##public
*--- functions in V45 or higher ---
*------ Finally the list functions are complete
138 NewMinList(minlist)(a0)
##private
139 execPrivate20()()
140 execPrivate21()()
141 execPrivate22()()
##public
*------ New AVL tree support for V45. Yes, this is intentionally part of Exec!
142 AVL_AddNode(root,node,func)(a0/a1/a2)
143 AVL_RemNodeByAddress(root,node)(a0/a1)
144 AVL_RemNodeByKey(root,key,func)(a0/a1/a2)
145 AVL_FindNode(root,key,func)(a0/a1/a2)
146 AVL_FindPrevNodeByAddress(node)(a0)
147 AVL_FindPrevNodeByKey(root,key,func)(a0/a1/a2)
148 AVL_FindNextNodeByAddress(node)(a0)
149 AVL_FindNextNodeByKey(root,key,func)(a0/a1/a2)
150 AVL_FindFirstNode(root)(a0)
151 AVL_FindLastNode(root)(a0)
##private
152 *--- (10 function slots reserved here) ---
##bias 972
##end

Presumably this will mean that MorphOS binaries that call these functions won't work under PPC AROS. Alternatively, how about keeping these functions at MorphOS-compatible LVOs for PPC AROS, and putting the AVL functions either at MorphOS-unused LVOs or in amiga.lib.

MorphOS:

131 ObtainQuickVector(interruptCode)(a0)
132 NewSetFunction(library,function,offset,tags)(a0,a1,d0,a2)
133 NewCreateLibrary(tags)(a0)
134 NewPPCStackSwap(newStack,function,args)(a0,a1,a2)
135 TaggedOpenLibrary(LibTag)(d0)
136 ReadGayle()()
137 VNewRawDoFmt(FmtString,PutChProc,PutChData,args)(base,sysv)
138 CacheFlushDataArea(Address,Size)(a0,d0)
139 CacheInvalidInstArea(Address,Size)(a0,d0)
140 CacheInvalidDataArea(Address,Size)(a0,d0)
141 CacheFlushDataInstArea(Address,Size)(a0,d0)
142 CacheTrashCacheArea(Address,Size)(a0,d0)
143 AllocTaskPooled(Size)(d0)
144 FreeTaskPooled(Address,Size)(a1,d0)
145 AllocVecTaskPooled(Size)(d0)
146 FreeVecTaskPooled(Address)(a1)
147 FlushPool(poolHeader)(a0)
148 FlushTaskPool()()
149 AllocVecPooled(poolHeader,memSize)(a0,d0)
150 FreeVecPooled(poolHeader,memory)(a0/a1)
151 NewGetSystemAttrsA(Data,DataSize,Type,Tags)(a0,d0,d1,a1)
152 NewSetSystemAttrsA(Data,DataSize,Type,Tags)(a0,d0,d1,a1)
153 NewCreateTaskA(Tags)(a0)
154 NewRawDoFmt(FmtString,PutChProc,PutChData,...)(base,sysv)
155 AllocateAligned(memHeader,byteSize,alignSize,alignOffset)(base,sysv)
156 AllocMemAligned(byteSize,attributes,alignSize,alignOffset)(base,sysv)
157 AllocVecAligned(byteSize,attributes,alignSize,alignOffset)(base,sysv)
158 AddExecNotify(hook)(base,sysv)
159 RemExecNotify(hook)(base,sysv)
160 FindExecNode(type,name)(d0/a0)
161 AddExecNodeA(innode,Tags)(a0/a1)
162 AllocVecDMA(byteSize,requirements)(d0/d1)
163 FreeVecDMA(memoryBlock)(a1)
164 AllocPooledAligned(poolHeader,byteSize,alignSize,alignOffset)(base,sysv)
165 AddResident(resident)(base,sysv)
166 FindTaskByPID(processID)(base,sysv)
##private
167 private7()()
168 private8()()
169 private9()()
170 private10()()
##public
171 DumpTaskState(task)(a0)
172 AddExecNotifyType(hook,type)(base,sysv)
173 ShutdownA(TagItems)(base,sysv)
##private
174 private11()()
##public
175 AvailPool(poolHeader,flags)(base,sysv)
##private
176 private12()()
##public
177 PutMsgHead(port,message)(base,sysv)
178 NewGetTaskPIDAttrsA(TaskPID,Data,DataSize,Type,Tags)(d0,a0,d1,d2,a1)
179 NewSetTaskPIDAttrsA(TaskPID,Data,DataSize,Type,Tags)(d0,a0,d1,d2,a1)
##end

The following problems can be seen:

1. AVL tree functions are not compatible with AmigaOS 3.9. Actually they were meant to be compatible, but the author of the AVL support misplaced them.
2. AllocVecPooled() and FreeVecPooled() are not compatible with MorphOS. They also clash with the AmigaOS 3.9 AVL functions: in MorphOS they occupy the AVL functions' offsets, and the AVL tree functions in MorphOS are moved to a separate btree.library (which also provides red-black trees).
3. The AROS-specific NewAddTask() and NewAllocEntry() functions occupy LVOs owned by other MorphOS functions.

My proposals to fix all this:

1. Move the AVL functions to AmigaOS 3.9-compatible offsets (142-151). There is a strong reason to keep them in exec.library because they are useful for building associative arrays (like GfxAssociate() does); also, I'm going to use them in the new protected memory allocator. In AROS applications the AVL functions are already made popular by libraries with a per-opener base, which use AVL trees for associating the base with a task. I don't know how popular AVL tree functions are in AmigaOS applications.
2. Move AllocVecPooled() and FreeVecPooled() to libamiga.a. They are small and simple enough for this.
3. Remove NewAddTask(), whose functionality is covered by the MorphOS NewCreateTaskA() function, which is much simpler to use.
4. NewAllocEntry() is subject to further discussion. First, I don't like its declaration (it could return struct MemList * instead of BOOL). Second, perhaps we could accommodate its functionality in the existing AllocEntry(). A third variant is to move it to some LVO which is reserved in both AmigaOS and MorphOS (like 169).
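To make the offset clashes discussed above easier to check, here is a minimal C sketch. It assumes the numbers in the listings are vector slot indexes and that each m68k vector occupies 6 bytes, so slot n has bias 6*n (this agrees with the `##bias 972` line, which follows slot 161). The constant and function names are made up for illustration and are not part of any AROS header.

```c
/* Assumed size in bytes of one m68k library vector (JMP + 32-bit address). */
#define VECTOR_SIZE_M68K 6

/* Positive "bias" as written in .fd files for a given slot index. */
int slot_to_bias(int slot)
{
    return slot * VECTOR_SIZE_M68K;
}

/* Negative offset from the library base for a given slot index. */
int slot_to_lvo(int slot)
{
    return -slot_to_bias(slot);
}

/* Nonzero when the inclusive slot ranges [a_first, a_last] and
 * [b_first, b_last] share at least one slot. */
int slot_ranges_overlap(int a_first, int a_last, int b_first, int b_last)
{
    return a_first <= b_last && b_first <= a_last;
}
```

Under these assumptions, the AmigaOS 3.9 AVL range 142-151 overlaps the MorphOS slots 149-150 used by AllocVecPooled() and FreeVecPooled(), while a single reserved slot such as 169 does not overlap it.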

Any other opinions?

Other libraries will follow (layers, intuition, graphics, etc.).

arch/common is for drivers where it is difficult to say to which CPU and/or arch they belong: for example, a graphics driver using the PCI API could run inside hosted Linux as well as on PPC native. Then it's arch-independent code and it should in fact be outside of arch. Currently they are in workbench/devs/drivers. This can be discussed, but it looks like it's just a matter of being used to a particular location. At least no one has changed this.

It is not about whether it is a small or large program. The problem is that code in the repository is often taken as a starting point by programmers for new projects. This way bad habits get spread.

It's arguable that doing everything manually is a bad habit. AmigaOS pure programs always do this, because there's no other way. Ported code also does this.

In the past, much time was spent getting rid of this manual opening of libraries, which is often filled with bugs because the exception clauses are not in the common code path. Annoyed to see this code being re-added. Just for small CLI tools. BTW, binary size gets even smaller.

Once in ABI V1, I do plan to implement a compile switch that makes programs residentable without any extra work. Why wait? We can have resident-able CLI utilities right now. Isn't it useful? BTW, I want to tell about ABI v1 in general. I don't have time to write an article about this, but I tried to implement ET_EXEC type files as AROS executables. I came to the conclusion that this is not feasible. It becomes more difficult to load them, because of static address assignment. It's much easier to implement a BFD backend that produces relocatable files but executes the final linking steps (omitted with -r). As to -fPIC, it does not provide any improvement. The GOT is overhead, nothing else, for non-UNIX systems. The resulting code is slower and not smaller than with the large code model. I committed a patch to AROS gcc which tells it to use the large code model by default on x86-64.

ET_EXEC is not appropriate for AROS because it's a non-relocatable snapshot of the address space. Yes, we actually can relocate it, provided that we keep relocs in the output (using the -q option), and relocation is even simpler in this case (adding a fixed offset to all absolute addresses), but there is another issue: section alignment. ELF suggests that different sections (code and data) have different memory-access rights. Code is read-only and data are not executable. This implies that these sections are aligned to memory-page borders. In order to match different page sizes (4KB, 8KB, 16KB, etc.), the linker uses a "common page size", which is 64KB. This means that there are empty 64KB between code and data in the file. It does not occupy space on disk, because it is encoded in section offsets. It also does not occupy space in memory on UNIX (where address space is virtual, it simply misses that portion), but it would occupy these 64KB if loaded on AROS, where memory has 1:1 mapping. Yes, we could shorten this size, for example to 4KB, but could we then have problems, especially on hosted AROS? Can there be host OSes using pages larger than 4KB? If so, this will mean that we can't work correctly on these OSes. This issue can be bypassed by splitting the file section-by-section, in the same way as it is done now. However, in this case ET_EXEC is more difficult to handle than ET_REL, and has no advantages over the current loader. It is more difficult because, for ET_REL relocations, implicit addends are destroyed and we need to calculate them back. What I described here implies that in the near future AROS will have memory protection. I know how Michal and I will implement it, but it's not ready yet.

I tested another approach: I implemented a backend for the ld linker which produced relocatable files as output. It worked fine; it even built binaries with -fPIC successfully. It was done as an experiment in using the small PIC code model for x86-64. I am sorry that it did not survive. I erased it upon discovering that PIC gives no real advantage on AROS (because it is designed to share the same code mapped in different address spaces; mitigating the x86-64 addressing limitation was a pure side-effect and was less efficient than using the large code model). I thought that this work was not needed, and failed to think about the future. Here is a summary of what it did:

1. The backend was implemented as a new emulation (-melf_aros_x86_64) which was used by default. It was possible to build an ET_EXEC file by specifying -melf_x86_64 on the command line. This can allow us to get rid of $(KERNEL_CC) for native ports (removing the need for one more toolchain).
2. -fPIC worked, producing working binaries (however they had no advantage). My aim was not to implement base-relative data, so I did not develop it further, lacking the needed knowledge.
3. When an undefined symbol is discovered, ld says where it was referenced from (as usual). It's much better than "there are undefined symbols" in collect-aros.
4. collect-aros' job was reduced to collecting symbol sets. It did not need to supply additional options like -r. BTW, -r was not broken; it worked, producing a partially linked file.
5. Resulting binaries were marked with ELFOSABI_AROS. The ABI version field can also be filled in (this is what we want).
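As an illustration of how a loader could tell these file kinds apart, here is a minimal C sketch that inspects the first 18 bytes of an ELF header. The field positions and the ET_REL/ET_EXEC values come from the ELF specification; the value 15 for ELFOSABI_AROS is taken from commonly distributed ELF headers and should be verified against the toolchain in use. The function names are hypothetical and this is not the actual AROS loader code.

```c
#include <stddef.h>

/* Constants from the ELF specification (normally provided by <elf.h>). */
enum { EI_DATA = 5, EI_OSABI = 7, EI_NIDENT = 16 };
enum { ELFDATA2LSB = 1, ELFDATA2MSB = 2 };
enum { ET_REL = 1, ET_EXEC = 2 };
enum { ELFOSABI_AROS = 15 };   /* assumed value, check your elf.h */

/* Returns 1 if buf holds at least e_ident plus e_type and starts
 * with the ELF magic number, 0 otherwise. */
int elf_header_ok(const unsigned char *buf, size_t len)
{
    return len >= EI_NIDENT + 2 &&
           buf[0] == 0x7f && buf[1] == 'E' && buf[2] == 'L' && buf[3] == 'F';
}

/* Reads the 16-bit e_type field (offset 16), honouring the byte order
 * announced in e_ident[EI_DATA]. */
unsigned elf_file_type(const unsigned char *buf)
{
    if (buf[EI_DATA] == ELFDATA2MSB)
        return ((unsigned)buf[16] << 8) | buf[17];
    return ((unsigned)buf[17] << 8) | buf[16];
}

/* Returns 1 for a relocatable (ET_REL) file marked with ELFOSABI_AROS. */
int elf_is_aros_relocatable(const unsigned char *buf, size_t len)
{
    return elf_header_ok(buf, len) &&
           elf_file_type(buf) == ET_REL &&
           buf[EI_OSABI] == ELFOSABI_AROS;
}
```

Checking the OS/ABI byte in addition to e_type is one possible way to distinguish an AROS executable from an ordinary object (.o) file, which is the concern raised in the discussion.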

So, if you like this result, and agree with me about keeping a relocatable binary format, I can reimplement my backend for x86-64 (in a slightly architecturally better way than it was done) and provide it as a reference implementation. It will need to be ported to i386, PPC and ARM (the new enlightened collect-aros will not work with old compilers, and I think it's redundant to keep both collect-aros versions).

AmigaOS v4 executables: yes, they are ET_EXEC, and they have the 64KB gap. So AmigaOS v4 either uses virtual memory (this is likely), or wastes 64KB of RAM per loaded executable. In theory we could implement a similar virtual memory system, but: a) I think it would not fit the hosted environment. b) This would mandate virtual memory, and m68k binary compatibility would suffer. The system would not work without it. c) Why should we reimplement AmigaOS v4 internals just in order to use the ET_EXEC format? ET_REL suffices, and I succeeded in overcoming the linker's small limitations.
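The cost of page alignment on a system with 1:1 memory mapping can be estimated with a little arithmetic. The following C sketch uses a deliberately simplified model (the data section simply starts at the next page boundary after the code); real linker layout is more involved, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * align must be a power of two. */
uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Bytes of padding between the end of the code and the start of the
 * data when the data must begin on a page boundary. */
uint64_t padding_bytes(uint64_t code_end, uint64_t page_size)
{
    return align_up(code_end, page_size) - code_end;
}
```

In this model, code ending at 0x1234 needs 0xEDCC bytes of padding with a 64KB page size but only 0xDCC bytes with a 4KB page size. On a system without virtual memory that padding is real RAM, which is the waste described above.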

As one of the m68k maintainers, I say 'ET_REL' is great! Keep ET_REL!

I don't mind if it is ET_EXEC or ET_REL as long as there is a difference between an executable and an object (.o) file. Also, symbol name strings don't have to be in an executable; are they there with your new linker?

ABI Change

I thought the plan was that all new development would now go into ABIv1, not just ABI-related stuff (with some things backported to v0 as individual developers see fit). We did agree that ABI V1 would be the main development branch and I think it also means that everything that is done on ABI V0 has to be merged into ABI V1. I'm not sure it has to imply everything has to be committed to main trunk first before putting it in ABI V0 branch.

I do expect some hectic times ahead on main trunk; some of my commits will break everything but i386 hosted. So I think it would be better for people who just want to improve or add a driver that they could work in a somewhat less turbulent place. On the other hand, I do see that if the development is happening on ABI_V0 this branch will not always be stable. I am open to either solution.

Are the nightly builds now based on ABIv1 or ABIv0, given that I didn't do 'svn switch' or anything else?

If nothing has changed, main trunk will be built; this is thus the ABI V1 development branch.

When I merged the difference between previous and current version of nlist into trunk it created a svn:mergeinfo property. Merging happens always in the working copy. It requires an additional commit to bring the result in the repository.

In the end there should be no checks left for AROS_FLAVOUR and AROS_FLAVOUR_BINCOMPAT. IMO it is an ABI issue.

I would propose to change the libraries to not use register-based argument passing but just C argument passing, so no stubs are needed anymore and one can just use the Mesa source. This would be a bigger change. The bigger question actually is: what is *THE* correct way of creating shared libraries going forward? IMO ported libraries from external code should be able to be built with as few changes as possible to the source code; this means C argument passing.

Starting work on Mesa I knew next to nothing about shared libraries, so I just checked what dos/exec/graphics look like and duplicated the design. The register-based argument passing seemed important to me, as I thought this is a requirement for compiling shared libraries for m68k - am I correct?

It is not a requirement; arosc.library builds fine on m68k. But I do think it would be good if we could put arguments in registers automatically for C function calls (e.g. NOT use the SYSV m68k calling convention but another one). I think it would be ideal if all function calls on m68k - in a library or not - would use registers for arguments first before using the stack. I don't know if gcc supports that.

I am thinking about two things:

Implement a check in the nightly build that checks if there is an update to the upstream code to what is in the AROS vendor branch.
In the nightly build generate a diff of the code in the AROS tree to the vendor branch. Put it on the website so the upstream coders can see what changes we have done in their code to get it to work on AROS.

But as discussed in another thread, the first target is to get rid of as much code in contrib as possible and try to not have an AROS branch of the code.

Integrating

A rapid merge schedule, but only one feature in development at a time. The deadline for each feature is one month from the start. If the deadline cannot be met, the feature is moved to a branch, trunk is rolled back, and we move on to the next feature.

For example, suppose we merge in the DOS Packet changes tomorrow. Deadline for completion Jan 1, 2011. If we can't get a working AROS on all currently maintained ports, we branch that off, roll back, and move on to the next item on the milestone list.

There are 11 items on the ABI v1 list. If we take *one* task per month, have a hard deadline per ABI v1 task (and we need some to stick to this deadline!), and all work together on meeting that task, I think we can do this within 2011. Maybe even sooner.

Focus. That's what we need. Focus on the single, testable task at hand.

Let's not rush all the changes - that's untestable, and will lead to lots of frustration. Just one ABI change at a time, and we'll get there.

What I want to achieve is to make sure that for this development we don't follow the "when it's done" model. The cap should not be seen as "by day X you must do 100% of what is needed" but rather "you have until day X to do what you can - be sure to select the most important stuff first".

After ABI V1 is merged, we will effectively have two main lines. Some people will make changes to -stable, because they will want users to have those. Some people will make changes to trunk, because they will use ABI V1 code or will just want to go "future AROS". Some other people will even hold off doing changes and will wait "until AROS is back to normal". People will have the synchronization problems pretty much the same you are now having. That is why I feel it is important to have a clear message on when things will be "back to normal", even if it will not be 100% what we wanted to have.

Will you work on -stable or trunk?

How will you be synchronizing between -stable and trunk?

Will you even be synchronizing between these two paths?

Merging

What is a possibility is to work from now on in a branch for further development: branches/abiv1/trunk-DOS is made for DOS related changes that break backwards compatibility. Nightly for m68k could also be switched to this branch. I will first need to bring this branch up to date before one can start on it. Any objections to branching 'trunk' to 'stable' and merging V1 into trunk?

How about one of these approaches:

a) merge the branch into trunk (rom/dos) ifdef'ing the incompatible changes for amiga target

b) merge the branch into trunk (arch/amiga-m68k/dos) and continue incompatible development in arch

Moving the development of amiga-m68k into a branch will make the project dormant/unmaintained or even incompatible already in the mid-term, since no one will be interested in the extra effort of synchronizing trunk to this branch. Also, the promise of fixing AROS thanks to the amiga-m68k port will largely be nullified, because no one will be interested in the extra effort of moving the changes from branch to trunk.

I completely agree with you that having ABI V1 worked in trunk would be a bad idea since i386 AROS (the one that is most used by people) would be backwards-compatibility-broken all the time. The amiga-m68k port however is a completely different topic, since it is still at level of development and everyone expects it to be broken many times. That's why in my view things which are not nice for i386 are acceptable for amiga-m68k.

I also think cost of repetitive merging trunk to branch (merge + compile + check if everything works + fix bugs) is much higher than cost of one-time removal + fix of amiga-m68k specific dos.library.

Anyhow, the decision is up to Toni and how it is easier for him to work. I just wanted to make sure all possibilities are being considered before making the final call. :)

Would you consider changing your work model to something like:

a) commit all non-breaking changes into trunk

b) commit all breaking changes into linux-i386-abiv1 "architecture"

While you still would have to make modifications to your incompatible code based on other people's work, you would not have to merge changes into compatible files and, what is more important, you would make your work visible to other devs, so that people could see that there are code paths used by you and would take better care not to break things too much for you?

what will the nightly builds for i386 be based on: -stable or trunk? It is to be decided, but we could even do both.
what sources will be available for download on aros.org: -stable or trunk?
what will the nightly builds for other archs be based on: -stable, trunk or "at maintainers discretion"?
what is the current usability status of ABI V1 linux-i386: can it boot into Wanderer, are core/contrib apps running? It does boot into Wanderer and contrib compiles. There has been limited testing of the apps there, but gcc should be able to compile a hello world program. Gallium has not been converted to use the new rellibbase feature and is untested, as it is for native.
what is the current usability status of ABI V1 pc-i386: can it boot into Wanderer, are core/contrib apps running? Not compiled nor tested.
The abi page lists 11 topics for ABI V1. Currently 3 are implemented and 2 are in progress. Staf, can you update the status for the remaining 6? The things in progress are mostly completed. The next big task is the dos.library compatibility, but this is what started this whole discussion. The rest is not started and I would like it if other people would volunteer for some of them.

IMO they are not needed to be finished before merging into main trunk. I think most of the discussions only make sense if people can see the real code:

1 how to extend OS3.x libraries 
2 SysBase location 
3  ABI V1 reference doc 
4 varargs handling

Also for i386 native I would like somebody else doing the implementation. As code in arch/i386-pc is not compiled and not tested I do expect work to make it compatible with ABI V1 especially for inline asm, etc.

How will we distinguish pre-ABI V1 and ABI V1 binaries? I'm mostly interested in core components. Maybe we can have all .conf files modified to show a common major version (50.0? 60.0?). This big version number may break some old apps on m68k. AFAICR some progs fail to start if the version number is not equal to what they expect. For the executable format I would like to switch to a proper ELF program (with relocation data included though, as it is on OS4.x) and not the relocatable object we use now.

SVN Branches

The branches are available in the repository in branches/ABI_V1 - trunk-DOS: For changes to DOS. Currently it has changes made to BSTR/BPTR to also have it word based on i386 and not byte based. Later also the removal of the device based file system should be done in this branch.

trunk-Misc: Some misc changes, mostly order of structure fields (struct Node, mouseX, mouseY, etc.)
trunk-rellibbase: Use of %ebx to use as a base for relative addressing. This is used for passing the libbase to library functions. It allows to also pass the libbase to libraries that use C argument passing. It is not implemented yet but it should also be able to be used to generate pure binaries.
trunk-genmodule_pob: extension of the per-opener base libraries. A library with a per-opener base can now open other libraries with a per-opener base each time it is itself opened. The libbase of the child library has to be stored in the parent libbase, and an ..._offset global variable is used to access the libbase of the child library in the parent's libbase. The parent library has to be linked with a special link lib of the child library, e.g. -lmodname_rel (f.ex. -larosc_rel or uselibs=arosc_rel). With this change it should be possible to have libraries that use arosc and allocate memory from the heap of each of the programs that uses this library. This branch builds on trunk-rellibbase.
trunk-arosc_etask: This patch uses rellibbase in combination with a per-task libbase to convert arosc into a normal library using %build_module. The per-task or per-id libbase genmodule code was merged in main trunk but is only usable for functions with C argument passing when rellibbase is also available. Using rellibbase, part of ETask is also moved into the arosc libbase. The purpose is in the end to fully remove the need for ETask. This branch builds on trunk-genmodule_pob.
trunk-aroscsplit: This is the branch where I am splitting arosc as explained on the list. Currently arosstdc.library is split off from arosc.library. It contains ANSI-C functions. Next step is to transform current arosc.library to arosnix.library for POSIX functionality, possibly improving standard compliance of the code. This branch builds on trunk-arosc_etask.
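For the BPTR change mentioned under trunk-DOS, the difference between the two encodings can be sketched in C as follows. This is a simplified model that assumes the classic AmigaOS encoding, where a BPTR holds the address divided by 4; the type and function names are hypothetical stand-ins (the real AROS headers provide the BADDR() and MKBADDR() macros for this purpose).

```c
#include <stdint.h>

typedef uintptr_t bptr_t;   /* hypothetical stand-in for BPTR */

/* Word-based encoding: the BPTR stores the address divided by 4,
 * so the pointer must be 4-byte aligned. */
bptr_t mkbaddr_word(const void *p)
{
    return (uintptr_t)p >> 2;
}

void *baddr_word(bptr_t b)
{
    return (void *)(b << 2);
}

/* Byte-based encoding: the BPTR is just the plain address. */
bptr_t mkbaddr_byte(const void *p)
{
    return (uintptr_t)p;
}

void *baddr_byte(bptr_t b)
{
    return (void *)b;
}
```

Code that always goes through such conversion helpers keeps working under either encoding, whereas code that casts a BPTR directly to a pointer only works with the byte-based one.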

Currently only i386 hosted is tested and will work. So if you want to have it working for anything else you'll first need to fix it.

Changes may be committed to the branches. Please notify me when you do as the commits in branches are filtered out from the svn announce list. Also I need to merge the changes to higher branches and test them. As I am using SVK here at home I would prefer to do this merging myself to keep the svk properties in order.

The trunk-rellibbase change would need to be reflected in the compiler infrastructure since it would be generating the function calls. Also, if generating pure reentrant code, the same principle would apply.
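The idea of finding a libbase at an offset from a relative base can be modelled in plain C. In the sketch below a global pointer stands in for the relbase register (%ebx on i386), and a constant plays the role of the ..._offset variable. All structure and function names are hypothetical; this is not the code genmodule generates.

```c
#include <stddef.h>

struct ChildBase {              /* hypothetical child library base  */
    int heap_id;
};

struct ParentBase {             /* hypothetical parent library base */
    int               opener_id;
    struct ChildBase *child;    /* child libbase stored in the parent */
};

/* Plays the role of the "..._offset" global: where the child libbase
 * pointer lives inside the parent libbase. */
const size_t child_offset = offsetof(struct ParentBase, child);

/* Stand-in for the relbase register; a compiler supporting this ABI
 * would keep the value in a CPU register instead. */
void *relbase;

/* What a stub in a modname_rel link library would conceptually do:
 * fetch the child libbase relative to the current relbase. */
struct ChildBase *get_child_base(void)
{
    return *(struct ChildBase **)((char *)relbase + child_offset);
}

/* Example child function using C argument passing: it receives no
 * libbase argument and still reaches the right per-opener data. */
int child_heap_id(void)
{
    return get_child_base()->heap_id;
}
```

Switching relbase between two parent bases makes the same child_heap_id() call return each opener's own value, which is the property the per-opener libraries rely on.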

In case you haven't been following my posts, I think LLVM would make a good addition to the AROS compiler toolbox. It currently uses 3 calling conventions internally and allows for several system-specific ones. The C calling convention supports varargs and is the common one. FastCall uses all of the registers for calling and the stack for spilling, similar to how it is done now except there is no varargs support in this calling convention. Cold calling convention changes as few registers as possible and uses mostly stackframes so that things that aren't called very often won't interfere with the code that calls them very much. Cold calling doesn't support varargs either.

In addition to those calling conventions, system specific conventions are allowed. For this we'll need a library calling convention in order to make the libraries' base pointers get loaded in time for use. The same goes for the pure reentrant base-relative calling convention.

Also, I'll need to know how this will affect the x86_64 version of AROS. If it doesn't need the changes, I might actually be able to start there.

In order to do this I've looked up the following documentation on the LLVM site: http://llvm.org/releases/2.8/docs/CodeGenerator.html#x86 which tells (in brief) the features of the x86 backend on LLVM, and http://llvm.org/releases/2.8/docs/SystemLibrary.html which tells how to wrap the system-specific code in classes and #ifdef structures within the LLVM source tree.

Since I plan to support ABI v1, I'll be needing to collaborate with Staf on the progress made on that front. Among the things I need to know are:

Are the FS and GS segment pointers in use in AROS?

Not for i386 ABI V1 as it complicates things for the hosted version. I think Michal is using them for x86_64 native though.

How are the extensions handled (such as AVX, SSE, MMX)?

Don't know, probably something still to be discussed before finalizing ABI V1.

Have these three libraries:

- arosstdc.library: All C99 functions except those that need dos.library
- arosstdcdos.library: All C99 functions that need dos.library
- arosnixc.library: POSIX functions

arosstdc.library is part of the ROM as one of the first modules so that I could remove librom.a and this leads to arosstdcdos.library that can only be initialized after dos.library. arosstdcdos.library is still a disk-based library.

Moved all math functions which are currently in compiler/mlib into arosstdc.library as they are also part of C99. This means that arosstdc.library becomes bigger, maybe giving problems for the ROM for m68k. Given that m68k does not accept a .bss section anymore in ROM modules, I probably won't be able to put arosstdc.library in the ROM anymore if it needs to work also on m68k. I'll probably need to maintain my own branch where this has happened and librom.a is removed.

Although it seemed the way to go at the time, I now don't have a problem anymore with making a separate disk-based shared library arosstdm.library which will contain the math stuff. I prefer the former though.

So should the math stuff be part of arosstdc.library or in a separate library?

Does this imply that exec.library and kernel.resource can no longer use string functions, such as strlen, memset, etc.? In this version of the patch, yes, as it is initialized right after exec.library, as per the following list:

    &Kernel_ROMTag,            /* SingleTask,  127  */
    &HostLib_ROMTag,           /* SingleTask,  125  */
    &Expansion_ROMTag,         /* SingleTask,  110  */
    &Exec_resident,            /* SingleTask,  105  */
    &AROSStdC_ROMTag,          /* ColdStart,   104  */
    ...
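The consequence of this ordering can be illustrated with a small C sketch. It assumes the usual rule that resident modules with a higher priority are initialized first and ignores the flags column; the structure and function names are simplified stand-ins, not the real exec structures.

```c
#include <stdlib.h>
#include <string.h>

struct ResidentInfo {           /* hypothetical, simplified resident entry */
    const char *name;
    int         pri;
};

/* qsort comparator: higher priority sorts first. */
static int by_pri_desc(const void *a, const void *b)
{
    const struct ResidentInfo *ra = a;
    const struct ResidentInfo *rb = b;
    return (rb->pri > ra->pri) - (rb->pri < ra->pri);
}

/* Sort modules into the assumed initialization order. */
void sort_residents(struct ResidentInfo *mods, size_t n)
{
    qsort(mods, n, sizeof *mods, by_pri_desc);
}

/* Position of the named module in the list, or -1 if it is absent. */
int init_position(const struct ResidentInfo *mods, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(mods[i].name, name) == 0)
            return (int)i;
    return -1;
}
```

With the priorities from the list above, the C library module ends up after Kernel, HostLib, Expansion and Exec, so none of those could call into it during their own initialization.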

I considered splitting off the functions that don't need a libbase or exec.library into a separate resource that would come as the first module initialized, but I did leave everything together.

An alternative split would be to have:

- arosromc.resource: Functions not needing a libbase or exec.library
- arosstdc.library: Functions needing libbase, exec.library, dos.library

Another alternative is to still provide a mini librom.a that may _only_ be used by the modules coming before AROSStdC_ROMTag. I screened the C library and most functions can be put in a resource, and that's why my preference now is for an arosstdc.resource that is the first module in ROM. It is not that good an idea from the m68k point of view.

In the worst case, any module that runs before expansion may only have slow and precious chip memory available, if the hardware only has autoconfig fast RAM or accelerator fast RAM in non-standard locations (enabled by the card's diag boot ROM).

The biggest problem cases are Blizzard A1200 accelerators (which are very common). Fortunately, the A3000/A4000 usually have mainboard fast RAM, or the accelerator RAM is mapped to known mainboard RAM addresses.

All fast ram is guaranteed available after priority 105 RTF_COLDSTART, when diag init module is run and it should also be the first RTF_COLDSTART module for best compatibility.

Also adding extra modules to high priorities (105 or higher) or adjusting priorities can cause problems with some boards, for example CyberStormPPC because it adds multiple residents and assumes correct ordering with OS modules.

The split stays as it is now and we statically link the needed C functions into exec.library and the modules initialized before it.