Aros/Developer/Docs/Libraries/OOP

The case for a binary object model: the ability to use C++ and other programming languages from a VM that statically compiles at install time, especially if you want to ditch the JIT in the process.

Also, I'll be needing to wrap any usage of OS-specific structures or variables in a library that links at install time as well. This could simply be a hosted AROS though.

But Amiga-style libraries are not objects in any meaningful way. Object orientation in AmigaOS is not centered on libraries but on other structures. Moving towards more OO might be interesting, but adding interfaces to the libraries doesn't achieve that.

In order to do OOP as a binary object model, I got the source to Marius Schwarz' OOP4a. There were some serious flaws in it. It used linear searches to find the names of the commands and didn't support interface inheritance. I know how to do Java-style interfaces in a .library but the dependence on the linear name lookups made things unusable.

When I started making my Object.library implementation I decided that interface inheritance is preferable to multiple inheritance because it yields no dreaded diamond dependencies. I used hash tables to do name lookups instead of a linear search and wrote it in C rather than 68k Assembly. I'm not quite sure how AROS does its library headers without FD files but I realize that they have to be processor independent on AROS.
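To make the difference from a linear name search concrete, here is a minimal sketch of a hashed name-to-method table. It is not the Object.library source: the names MethodEntry, table_add and table_find, the table size, and the choice of the FNV-1a hash are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; not taken from Object.library or OOP4a. */
typedef int (*Method)(int);

struct MethodEntry {
    const char *name;   /* NULL marks an empty slot */
    Method      fn;
};

#define TABLE_SIZE 16   /* power of two, so the hash can be masked */

/* FNV-1a string hash, kept to 32 bits. */
static unsigned long hash_name(const char *s)
{
    unsigned long h = 2166136261UL;
    while (*s) {
        h ^= (unsigned char)*s++;
        h = (h * 16777619UL) & 0xFFFFFFFFUL;
    }
    return h;
}

/* Insert with linear probing; 0 on success, -1 if the table is full.
   Duplicate names are not detected in this sketch. */
static int table_add(struct MethodEntry *t, const char *name, Method fn)
{
    unsigned long i = hash_name(name) & (TABLE_SIZE - 1);
    for (int n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
        if (t[i].name == NULL) {
            t[i].name = name;
            t[i].fn = fn;
            return 0;
        }
    }
    return -1;
}

/* Look a method up by name; NULL when it is not present. */
static Method table_find(const struct MethodEntry *t, const char *name)
{
    unsigned long i = hash_name(name) & (TABLE_SIZE - 1);
    for (int n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
        if (t[i].name == NULL)
            return NULL;
        if (strcmp(t[i].name, name) == 0)
            return t[i].fn;
    }
    return NULL;
}

static int twice(int x)  { return 2 * x; }
static int negate(int x) { return -x; }
```

With a table that is mostly empty, a lookup touches one or two slots no matter how many methods the class exports, whereas a linear search costs one string comparison per method.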

This brings us back to the subject: a binary object model. OOP4a used subdirectories within libs:classes/ on 68k AmigaOS to indicate the inheritance hierarchy of the prototype class libraries that it came with. This kept the namespaces of the libraries separate so that no object would interfere with the name of another object (within reason). I was going to use a separate interface file for each interface in a libs:interfaces/ directory that would be a global repository of all interfaces other than the direct inheritance interface (referred to on OS 4 as "main"). The direct inheritance interface could be contained in the same directory as the library it defines even though my original plan was to use a hash table for the name lookup.

The way inheritance was implemented in OOP4a was for a parent link to be held in the library's global area in the positive offsets of A6. This meant that when a class library was opened, its parent directory would be accessed and the parent library opened, initiating a recursion that led up to the root class, where a null was encountered. In my Object.library implementation the Object.library class was the root class anyway, thus eliminating some special cases where a parent class might not have a name lookup, and so on.
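The parent-link scheme can be sketched as a chain of class records that ends in a null parent. This is only an illustration of the idea, not OOP4a's layout: struct Class, resolve and the three example classes are hypothetical, and each class carries a single method to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef const char *(*Method)(void);

/* Hypothetical class record; a real library would keep this in its base. */
struct Class {
    const struct Class *parent;   /* NULL only for the root class */
    const char *method_name;      /* one method per class, for brevity */
    Method method;
};

/* Walk from the most derived class towards the root until the name matches. */
static Method resolve(const struct Class *cls, const char *name)
{
    for (; cls != NULL; cls = cls->parent)
        if (strcmp(cls->method_name, name) == 0)
            return cls->method;
    return NULL;   /* fell off the root: nobody implements it */
}

static const char *root_to_string(void) { return "Object"; }
static const char *window_draw(void)    { return "Window.draw"; }
static const char *button_press(void)   { return "Button.press"; }

static const struct Class root_class   = { NULL,          "toString", root_to_string };
static const struct Class window_class = { &root_class,   "draw",     window_draw };
static const struct Class button_class = { &window_class, "press",    button_press };
```

Resolving "toString" on button_class visits three classes before it succeeds, which is the per-stage cost the discussion below worries about.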

oop.library is doing some of these things but I'm convinced it is the right way to do this. I do like Amiga LVO tables over hash tables to avoid the name binding during load. I could maybe live with one hash table for the directories and then LVO tables in each directory.

I think the OS4 library interfaces are actually doing this too. They allow one library to have several interfaces, each with its own name and LVO table.

It does require the developer to specify, when the library is compiled, which method or function goes at which place in the LVO table.
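A small sketch of what fixing the slot order at compile time looks like in plain C. The enum, the slot names and the table are hypothetical; an Amiga-style library would get the same information from an FD/SFD file rather than from an enum.

```c
#include <assert.h>

/* Slot numbers are part of the binary contract: new entries may only be
   appended, never reordered. All names here are illustrative. */
enum {
    SLOT_OPEN  = 0,
    SLOT_CLOSE = 1,
    SLOT_ADD   = 2,
    SLOT_COUNT
};

typedef long (*Slot)(long, long);

static long do_open(long a, long b)  { (void)a; (void)b; return 1; }
static long do_close(long a, long b) { (void)a; (void)b; return 0; }
static long do_add(long a, long b)   { return a + b; }

/* Designated initializers keep each function tied to its published slot. */
static const Slot main_table[SLOT_COUNT] = {
    [SLOT_OPEN]  = do_open,
    [SLOT_CLOSE] = do_close,
    [SLOT_ADD]   = do_add,
};

/* A caller compiled against the enum needs no name lookup at load time. */
static long call_slot(const Slot *table, int slot, long a, long b)
{
    return table[slot](a, b);
}
```

The call is a single indexed load plus an indirect jump, which is why the table approach costs nothing at load time.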

Recreating OS 4-style interface tables is easy. What bothers me about the interfaces is that you need a pointer to the interface in whatever register (preferably a different register for each interface on PPC and other register-rich processors). I'd assume that the base pointer for the library would be contained in the interface structure as well.
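A rough sketch of an interface record that carries the library base with it, so a call needs only the interface pointer. The struct layout and the names LibBase, Interface and get_interface are assumptions for illustration and do not reproduce the actual OS 4 structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct LibBase { int open_count; };   /* stand-in for a library base */

/* Hypothetical interface record: a name, a back-pointer, then the methods. */
struct Interface {
    const char *name;
    struct LibBase *base;             /* the library this interface belongs to */
    int (*Obtain)(struct Interface *self);
    int (*Release)(struct Interface *self);
};

/* Methods reach the library base through the interface they were called on. */
static int obtain(struct Interface *self)  { return ++self->base->open_count; }
static int release(struct Interface *self) { return --self->base->open_count; }

static struct LibBase demo_base = { 0 };

static struct Interface demo_ifaces[] = {
    { "main", &demo_base, obtain, release },
    { "pci",  &demo_base, obtain, release },
};

/* Find one of the library's interfaces by name; NULL if it has none. */
static struct Interface *get_interface(const char *name)
{
    size_t count = sizeof demo_ifaces / sizeof demo_ifaces[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(demo_ifaces[i].name, name) == 0)
            return &demo_ifaces[i];
    return NULL;
}
```

The name lookup happens once, when the interface is obtained. After that every call goes through the pointer, which is the value that has to live in a register.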

After having heard the objections to the hash table name lookup, I can tell that it would be slow to open such a library. Likewise the inheritance chain would be even slower since it would have to execute a name lookup for each stage of direct inheritance until it was resolved. This brings us to the big question: How would we store the interfaces if they were more than just a name? Fat pointers? Making the hash lookup faster by using the same hash for each stage of inheritance?
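One way to read "using the same hash for each stage of inheritance" is to hash the name once and reuse that value to probe every class table on the way up. The sketch below assumes that reading. The per-class table layout, the djb2 hash and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUCKETS 8

typedef int (*Method)(void);

struct Entry { unsigned long hash; const char *name; Method fn; };

struct Class {
    const struct Class *parent;    /* NULL for the root */
    struct Entry table[BUCKETS];   /* open addressing, NULL name = empty */
};

/* djb2 string hash. */
static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Sketch only: assumes the table never fills up. */
static void class_add(struct Class *c, const char *name, Method fn)
{
    unsigned long h = hash_name(name);
    unsigned long i = h % BUCKETS;
    while (c->table[i].name != NULL)
        i = (i + 1) % BUCKETS;
    c->table[i].hash = h;
    c->table[i].name = name;
    c->table[i].fn = fn;
}

/* Probe one class with a precomputed hash; strings are compared only
   when the stored hash already matches. */
static Method probe(const struct Class *c, unsigned long h, const char *name)
{
    unsigned long i = h % BUCKETS;
    for (int n = 0; n < BUCKETS && c->table[i].name != NULL; n++, i = (i + 1) % BUCKETS)
        if (c->table[i].hash == h && strcmp(c->table[i].name, name) == 0)
            return c->table[i].fn;
    return NULL;
}

static Method resolve(const struct Class *c, const char *name)
{
    unsigned long h = hash_name(name);   /* hashed once, not once per class */
    for (; c != NULL; c = c->parent) {
        Method m = probe(c, h, name);
        if (m != NULL)
            return m;
    }
    return NULL;
}

static int one(void) { return 1; }
static int two(void) { return 2; }
```

This removes the repeated hashing but still walks the chain, so it narrows the gap to an LVO table without closing it.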

A framework should not require a name-mangling scheme; instead, it should use subdirectories within libs: to organize the namespace dependencies and support Java-style interface inheritance as well. The downside is that I'm planning on using hash tables of names instead of requiring FD or SFD files like AmigaOS does for its libraries to initialize its vtables.

When I looked at the source to OOP.library I didn't like the way that it did everything at runtime. I was going to make Object.library to supplement OOP.library. Do you think that would be suitable or should it be added directly to Exec.library like OS 4 does it? My concern with doing it too much like OS 4 does it is that OS 4 has a utility to generate the C skeleton code from XML descriptors derived from an SFD file. This makes writing a compiler that generates libraries from a language other than C very tricky.

In order to allow LLVM to generate code for multiple flavors of AROS, I have to replace or wrap some functions of the C runtime library in other code to make sure that they don't make assumptions about variable sizes and structure sizes. In GCC, sizeof (an operator) and offsetof (a macro) are resolved at compile time for minimum overhead, but this gets in the way of portability, so they will need to be implemented as functions.
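A minimal sketch of what "sizeof and offsetof as functions" could look like. The struct Node and the accessor names are made up for the example. The idea is that the two small functions are compiled per target and linked in at install time, while the portable code only calls them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Target-specific part: only this side sees the real layout. */
struct Node { struct Node *next; short type; char name[12]; };

static size_t node_sizeof(void)        { return sizeof(struct Node); }
static size_t node_offsetof_type(void) { return offsetof(struct Node, type); }

/* Portable part: reads the field without baking in its offset. */
static short node_get_type(const void *node)
{
    short t;
    memcpy(&t, (const char *)node + node_offsetof_type(), sizeof t);
    return t;
}
```

The price is a function call where the compiler would otherwise have folded in a constant, which is the overhead the macro and operator forms avoid.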

If we wanted to take it a step or two farther and make it generate code for all OSs, we'd need to make no assumptions about the structures themselves, thus FILE * would have to be replaced by void * (or i8 * as LLVM represents it; a pointer to a byte) in every instance in the wrapper code only to be reverted to FILE * at second link time on the destination platform. For an early attempt at wrapping the C stdio.h headers in cross-platform LLVM code, you can look at the project the Mattathias team started at in the SVN browser.
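The FILE * to void * idea can be sketched as a thin wrapper in which the portable side only ever sees an opaque handle. The PortableFile type and the pf_ functions are hypothetical names and are not taken from the Mattathias sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Opaque handle: the portable code never learns what is behind it. */
typedef void *PortableFile;

/* Platform side: the only place where the handle is treated as FILE *. */
static PortableFile pf_open_temp(void)
{
    return tmpfile();
}

static int pf_puts(PortableFile f, const char *s)
{
    return fputs(s, (FILE *)f);
}

static void pf_rewind(PortableFile f)
{
    rewind((FILE *)f);
}

static char *pf_gets(PortableFile f, char *buf, int size)
{
    return fgets(buf, size, (FILE *)f);
}

static int pf_close(PortableFile f)
{
    return fclose((FILE *)f);
}
```

Only the wrapper bodies have to be rebuilt per platform. Code written against PortableFile makes no assumption about the size or contents of FILE.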

Clearly the LVO method is faster at load time, and load time linking can be slow - witness apps like OpenOffice where optimizing load time linking has been a significant part of speeding up the application startup on Linux. But I'm unsure how much it would matter in most cases...

There's also nothing that precludes supporting/using both direct offsets and offering name lookups. E.g. I'm plodding along on a Ruby compiler (in Ruby), and while I use C++ style vtables where possible, it also needs to support name lookups because if certain constructs are used in Ruby you can't determine at compile time the full set of method names that will be used, and you also need to support "send" which means it needs a name => method mapping. So I have the vtables, but will add a hash table that maps names to vtable slots for cases where it's not feasible to determine the offset at compile time.

E.g. adding a function pointer to the library definitions that can be used to query the library for meta-data such as name => lvo lookup at runtime would be a fairly non-invasive change that would only slow down apps that for whatever reason actually need name based lookup.
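A sketch of offering both paths side by side: a fixed slot for callers that know the offset, and an optional query hook for callers that only have a name. The QuerySlot member and the other names are hypothetical, and a linear scan stands in for the hash table to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*Method)(int);

enum { SLOT_INC = 0, SLOT_DOUBLE = 1, SLOT_COUNT };

struct Library {
    Method vtable[SLOT_COUNT];
    /* Hypothetical metadata hook: maps a name to a slot, or -1 if unknown. */
    int (*QuerySlot)(const char *name);
};

static int inc(int x)     { return x + 1; }
static int doubled(int x) { return x * 2; }

static int query_slot(const char *name)
{
    static const char *const names[SLOT_COUNT] = { "inc", "double" };
    for (int i = 0; i < SLOT_COUNT; i++)
        if (strcmp(names[i], name) == 0)
            return i;
    return -1;
}

static const struct Library demo_lib = { { inc, doubled }, query_slot };

/* Dynamic path, in the spirit of Ruby's "send": resolve the name, then
   call through the same vtable. Returns 0 on success, -1 if unknown. */
static int send_by_name(const struct Library *lib, const char *name,
                        int arg, int *result)
{
    int slot = lib->QuerySlot(name);
    if (slot < 0)
        return -1;
    *result = lib->vtable[slot](arg);
    return 0;
}
```

Statically compiled callers keep using demo_lib.vtable[SLOT_INC] directly and never pay for the lookup.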

As for the OOP method used in AmigaOS and AROS, that is slow on a whole other level... It made sense on Amiga hardware because it can be made very memory efficient, but there are approaches that can be made far faster and allow the same extensibility... I'm not sure if it matters though - one thing we really ought to have for both cases would be some real measurements and profiling.

Unfortunately, the practical use case was never introduced on OS4. What the OS4 interfaces try to mimic a little is COM, known from Windows, or XPCOM, known from the Mozilla suite. There, you have objects which can implement several interfaces, all of them inherited from IUnknown (nsISupports in XPCOM).

The COM method QueryInterface is replaced in exec by exec.library/GetInterface. The COM/XPCOM solution can, in contrast to the OS4 version, be called on *any* interface we want. COM-style objects would serve the same type of purpose on OS 4 if there were additional programming languages available. Unfortunately there are none. C++ classes need interfaces to implement themselves in a binary library format though.
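To show what "callable on any interface" means, here is a stripped-down COM-like sketch in C. It leaves out reference counting and GUIDs, identifies interfaces by plain strings, and uses hypothetical names (Iface, IPrint, ICount). It is not the COM, XPCOM or OS4 API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct Object;

/* Common header shared by every interface, loosely in the role of IUnknown. */
struct Iface {
    void *(*QueryInterface)(struct Iface *self, const char *name);
    struct Object *owner;
};

struct IPrint { struct Iface base; int (*Print)(struct IPrint *self); };
struct ICount { struct Iface base; int (*Count)(struct ICount *self); };

/* One object exposing two interfaces. */
struct Object {
    int value;
    struct IPrint print;
    struct ICount count;
};

/* Works from whichever interface it is called on, via the owner link. */
static void *query(struct Iface *self, const char *name)
{
    struct Object *o = self->owner;
    if (strcmp(name, "IPrint") == 0) return &o->print;
    if (strcmp(name, "ICount") == 0) return &o->count;
    return NULL;
}

static int print_impl(struct IPrint *self) { return self->base.owner->value; }
static int count_impl(struct ICount *self) { return self->base.owner->value + 1; }

static void object_init(struct Object *o, int value)
{
    o->value = value;
    o->print.base.QueryInterface = query;
    o->print.base.owner = o;
    o->print.Print = print_impl;
    o->count.base.QueryInterface = query;
    o->count.base.owner = o;
    o->count.Count = count_impl;
}
```

Starting from an IPrint pointer, a caller can ask for ICount and then get back to IPrint again without ever going through a central library call.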

This link has some C-like pseudo-code demonstrating the implementations and usages of an interface in an interface-inheritance chain.

This link is a higher-level pseudo-code if you want a guide to how the low-level C code works.

None at the moment. And, as far as I know, only a few system libraries use anything other than the "main" interface (the most famous example is expansion.library and its PCI interface).

I suggest reading this for inspiration:

Protocol Extension: A technique for Structuring Large Extensible Software Systems (Dr. Michael Franz, 1994)

Protocol Extension uses vtables, but allows them to be dynamically updated at runtime by propagating changes down the inheritance chain. Calls are far faster than Amiga/AROS BOOPSI-style dispatch - the vtable for the most derived classes always holds the right method pointer to call, so the cost compared to C++ virtual calls is just an extra indirection.
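A small sketch of the propagation idea as described above: setting a method on a class pushes the pointer down to every subclass that has not overridden that slot itself. This illustrates the principle and is not the algorithm from the paper. The fixed-size arrays and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS    4
#define MAX_SUBS 4

typedef int (*Method)(void);

struct Class {
    struct Class *subs[MAX_SUBS];   /* direct subclasses */
    int sub_count;
    Method vtable[SLOTS];           /* always holds the pointer to call */
    unsigned char own[SLOTS];       /* 1 where this class set the slot itself */
};

static void class_init(struct Class *c, struct Class *parent)
{
    static const struct Class empty = {0};
    *c = empty;
    if (parent != NULL) {
        for (int i = 0; i < SLOTS; i++)
            c->vtable[i] = parent->vtable[i];    /* start with inherited slots */
        parent->subs[parent->sub_count++] = c;   /* sketch: no bounds check */
    }
}

/* Push an inherited pointer down, stopping at classes that override it. */
static void propagate(struct Class *c, int slot, Method fn)
{
    for (int i = 0; i < c->sub_count; i++) {
        struct Class *sub = c->subs[i];
        if (!sub->own[slot]) {
            sub->vtable[slot] = fn;
            propagate(sub, slot, fn);
        }
    }
}

/* Add or replace a method at runtime. */
static void class_set_method(struct Class *c, int slot, Method fn)
{
    c->vtable[slot] = fn;
    c->own[slot] = 1;
    propagate(c, slot, fn);
}

static int v1(void) { return 1; }
static int v2(void) { return 2; }
static int v3(void) { return 3; }
```

All the work happens when a method is changed, so a call never has to walk the chain.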

The downside is that plain vtables would grow large if the class hierarchy is large since every class vtable needs a slot for every possible method. This can be mitigated by splitting the APIs into smaller interfaces that are not inherited from a single root class, in effect creating a shallow, sparse trie of method pointers.

You still have the lookup cost if you want name-based lookup of course, but for apps that can be compiled against a symbol table containing the static method IDs you get very good performance with flexibility as good as dispatcher-function-based OO systems like BOOPSI (as you can add/remove methods at runtime as long as the vtables are big enough, with a fallback to a dispatcher as the worst case for fully dynamic calls).
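The "vtable first, dispatcher as worst case" arrangement might look roughly like this. The method IDs, the invoke helper and the dispatcher signature are invented for the sketch. BOOPSI's real dispatcher takes a class, an object and a message instead.

```c
#include <assert.h>
#include <stddef.h>

#define VTABLE_SLOTS 2

typedef int (*Method)(int);

struct Class {
    Method vtable[VTABLE_SLOTS];
    /* Fallback for IDs without a slot; returns 0 if handled, -1 if not. */
    int (*dispatch)(int method_id, int arg, int *result);
};

/* Hypothetical method IDs; M_LATE_ADDED has no vtable slot. */
enum { M_SQUARE = 0, M_NEGATE = 1, M_LATE_ADDED = 100 };

static int square(int x) { return x * x; }
static int negate(int x) { return -x; }

/* Dispatcher-function style: one switch over method IDs. */
static int dispatcher(int method_id, int arg, int *result)
{
    switch (method_id) {
    case M_LATE_ADDED:
        *result = arg + 1000;
        return 0;
    default:
        return -1;
    }
}

/* Fast path through the vtable, slow path through the dispatcher. */
static int invoke(const struct Class *c, int method_id, int arg, int *result)
{
    if (method_id >= 0 && method_id < VTABLE_SLOTS && c->vtable[method_id] != NULL) {
        *result = c->vtable[method_id](arg);
        return 0;
    }
    return c->dispatch(method_id, arg, result);
}

static const struct Class demo_class = { { square, negate }, dispatcher };
```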

(Incidentally, Franz wrote his PhD on a method for architecture-independent binaries called Semantic Dictionary Encoding; more recently his claim to fame is trace trees, together with Andreas Gal, as used by TraceMonkey and LuaJIT.)

I think I may limit the subdirectory thing to namespace resolution. Other than that, most of what I had already written looks good. I planned on having LVOs as well as a name lookup originally. This saves me from having a separate file for the method names in the interface files. Also, unlike the Protocol Extension technique, I'll maintain a root class since it only defined a "toString" method anyway, though I may rename the method "DebugPrint".

I thought at least in part it was to cope with the 'problem' that AmigaOS libraries can only version forward and must remain 100% backward compatible. The only real way to reset the interfaces to remove deprecated snot is to put a version in the library name, and then you're stuck with lots of libraries. Of course, except for AmigaOS-only libraries there's no practical way to do anything else, so the usefulness of this idea probably isn't huge. I think it's used for things like MMU support too, which are system specific.

But as with system libraries in general it isn't really a solution intended for application programming which is why boopsi was created.

Note that the 'C-object' based mechanism (i.e. structs in structs) has lots of shortcomings too. This is what glib and gtk+ use, and after a lot of over-engineering they managed to address most problems - such as growing structure sizes breaking binary compatibility (store a separate private data chunk which is allocated and used more like the way boopsi allocates objects), adding interfaces (look it up in a cast at run-time called from a simple function), property handlers, events, and so on. But it's big, slow, hard to use, requires a ton of boilerplate, and in general just not very nice ... it's been a few years since I've used it but I don't think the core has changed much since then.

every public function (and most internal ones, sigh) has a 'cast' macro which checks the class type of the object using a tree scan, or a linear scan iirc for interfaces (actually most USERS of an object do another cast which did this same check too - so error messages point to the source ...) (and usually every member access either uses a macro too or calls another function which does, or uses the get/set interface)

private data requires a class lookup (same as boopsi for data pointer), and also means every object instantiation is at least 2 allocated memory blocks
IIRC every 'property' is keyed on a string which it has to look up in a global table to convert to a dynamic integer which then needs an if/then if/else tree to process since it isn't static (i.e. much worse than boopsi).
the event handling was stuff of nightmares, i don't think anybody deserves to know how that works.
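The "separate private data chunk" trick and its two-allocation cost can be sketched as follows. This only illustrates the pattern and is not glib code: Widget, WidgetPrivate and the functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct WidgetPrivate;                /* opaque to clients; free to grow */

struct Widget {
    int id;                          /* public part, frozen in the ABI */
    struct WidgetPrivate *priv;      /* second memory block */
};

/* Only the implementation sees this layout, so fields can be added
   without changing sizeof(struct Widget) for existing binaries. */
struct WidgetPrivate {
    int clicks;
};

static struct Widget *widget_new(int id)
{
    struct Widget *w = malloc(sizeof *w);           /* allocation 1 */
    if (w == NULL)
        return NULL;
    w->priv = calloc(1, sizeof *w->priv);           /* allocation 2 */
    if (w->priv == NULL) {
        free(w);
        return NULL;
    }
    w->id = id;
    return w;
}

static int widget_click(struct Widget *w)
{
    return ++w->priv->clicks;
}

static void widget_free(struct Widget *w)
{
    if (w != NULL) {
        free(w->priv);
        free(w);
    }
}
```

Every access to private state costs an extra pointer hop. In this sketch that is a plain member load, whereas the class lookup described above is more expensive still.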

Note that a lot of the mess was supposedly to support language bindings to things other than c.

It's possible to fix some of that with a simpler implementation but you're still stuck with some serious limitations since it is essentially a static binding. So it's still difficult to cope with versioning and binary compatibility and dynamic interfaces.

x86 is particularly good at executing crappy branchy code, and CPUs are so damn fast anyway i doubt moving to another outdated technique will really be worth it unless you gained a lot. And by the time you add all the necessary baggage I have a good feeling you probably won't. There are always trade-offs ...

Also depends on the problem being fixed. e.g. 'i want to write/call system libraries in/from c++' is different to 'i want a c-based application level object system'.

I am not satisfied with our current oop.library implementation. So I did not put any effort in it as I first want to improve the whole OO system. I find there is still too much happening during a method call. I have some ideas but no time to test if my ideas make any sense. If anyone wants to have a look I can try to put down my rough & vague ideas.

References

APTR OOP_NewObject(struct OOP_IClass *classPtr, UBYTE *classID, struct TagItem *tagList) 
OOP_AttrBase OOP_ObtainAttrBase(STRPTR interfaceID) 
OOP_MethodID OOP_GetMethodID(STRPTR interfaceID, ULONG methodOffset) 
void OOP_AddClass(OOP_Class *classPtr) 
void OOP_ReleaseAttrBase(STRPTR interfaceID) 
void OOP_DisposeObject(OOP_Object *obj) 
void OOP_RemoveClass(OOP_Class *classPtr)

OOP_AttrBase OOP_GetAttrBase(STRPTR interfaceID) 
IPTR OOP_GetAttr(OOP_Object *object, OOP_AttrID attrID, IPTR *storage) 
IPTR OOP_SetAttrs(OOP_Object *object, struct TagItem *attrList) 
BOOL OOP_ObtainAttrBases(struct OOP_ABDescr *abd) 
void OOP_ReleaseAttrBases(struct OOP_ABDescr *abd) 
LONG OOP_ParseAttrs(struct TagItem *tags, IPTR *storage, ULONG numattrs, OOP_AttrCheck *attrcheck, OOP_AttrBase attrbase) 
void *OOP_GetMethod(OOP_Object *obj, OOP_MethodID mid)