diff --git a/doc/technical/code/linkingStructure.txt b/doc/technical/code/linkingStructure.txt new file mode 100644 index 000000000..914da6756 --- /dev/null +++ b/doc/technical/code/linkingStructure.txt @@ -0,0 +1,142 @@ +Linking and Application Structure +================================= +:Date: Autumn 2014 +:Author: Ichthyostega +:toc: + +This page focusses on some quite intricate aspects of the code structure, +the build system organisation and the interplay of application parts on +a rather technical level. + +Arrangement of code +------------------- +Since ``code'' may denote several different entities, the place ``where'' +some piece of code is located differs according to the context in question. + +Visibility vs Timing: the translation unit +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +To start with, when it comes to building code in C/C++, the fundamental entity +is _a single translation unit_. Assembler code is emitted while the compiler +progresses through a translation unit. Each translation unit is self contained +and represents a path of definition and understanding. Each translation unit +starts anew at a state of complete ignorance, at the end leading to a fully +specified, coherent operational structure. + +Within this _definition of a coded structure_, there is an inherent tension +between the _absoluteness_ of a definition (a definition in mathematical sense +can not be changed, once given) and the _order of spelling out_ this definition. +When put in such an abstract way, all of this might seem self evident and trivial, +but let's just consider the following complications in practice... + +- Headers are included into multiple translation units. Which means, they appear + in several disjoint contexts, and must be written in a way independent of the + specific context. +- Macros, from the point of their definition onwards, change the way the compiler + ``sees'' the actual code. +- Namespaces are ``open'' -- meaning they can be re-opened several times and + populated with further definitions. Generally speaking, the actual contents of + any given namespace will be different in each and every translation unit. +- a Template is not in itself code, but a constructor function for actual code. + It needs to be instantiated with concrete type arguments to produce code. + And when this happens, the template instantiation picks up definitions + _as visible at that specific point_ in the path through the translation unit. + A template instantiation might even pick up specific definitions depending + on the actual parameters, and the current content of the namespace these + parameter types are defined in. So it very much matters at which point a + specific template instantiation is first mentioned. +- it is possible to generate globally visible (or statically visible) code + from a template instantiation. This may even happen several times when + compiling multiple translation units; the final linking stage will + silently remove such duplicate instantiations stemming from templates -- + and this resolution step just assumes that these duplicate code entities + are actually equivalent. Mind me, this is an assumption and can not be + easily verified by the compiler. With a bit of criminal energy (think + namespaces), it is very much possible to produce several instantiations + of, say, a static initialiser within a template class, which are in + fact different operations. Such a setup puts us at the mercy of the + random way in which the linker sees these instances. + +Now the quest is to make _good use_ of these various ways of defining things. +We want to write code which clearly conveys its meaning, without boring the +reader with tedious details not necessary to understand the main point in +question. And at the same time we want to write code which is easy to +understand, easy to write and can be altered, extended and maintained. +footnote:[Put blatantly, a ``simple clean language'' without any means of expression +would not be of much help. All the complexities of reality would creep into the usage +of our ``ideal'' language, and, even worse, be mixed up there with the entropy of +doing the same things several times in a different way.] + +Since it is really hard to reconcile all these conflicting goals, we are bound +to rely on *patterns of construction*, which are known to work out well in +this regard. + + +Code size and Code Bloat +~~~~~~~~~~~~~~~~~~~~~~~~ +Each piece of code incurs costs of various kinds + +- it needs to be understood by the reader. Otherwise it will die + sooner or later and from then on haunt the code base as a zombie. +- writing code produces bugs and defects at a largely constant rate. + The best code, the perfect code is code you _do not write_. +- actual implementation produces machine code, which occupies + space, needs to be loaded into memory (think caching) and performed. +- to the contrary, mere definitions are for free, _but_ -- +- even definitions consume compiler time (read: development cycle turnaround) +- and since we're developing with debug builds, each and every definition + produces debug information in each and every translation unit referring it. + +Thus, for every piece of code we must ask ourselves how _visible_ this code +is. And we must consider the dependencies the code incurs. It pays off to +turn something into a detail and ``push it into the backyard''. This explains +why we're using the frontend - backend split so frequently. + + +Source dependencies vs binary dependencies +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +To _use_ stuff while writing code, a definition or at least a declaration needs to +be brought into scope. This is fine as long as definitions are rather cheap, +omitting and hiding the details of implementation. The user does not need to understand +these details, and the compiler does not need to parse them. + +The situation is somewhat different when it comes to _binary dependencies_ though. +At execution time, there are just pieces of data, and functions able to process this +specific data. Thus, whenever a specific piece of data is to be used, the corresponding +functions need to be loaded and made available. Most of the time we're linking dynamically, +and thus the above means that a dynamic library providing those functions needs to be loaded. +This other dynamic library becomes a dependency of our executable or library; it is recorded +in the 'dynamic' section of the headers of our ELF binary (executable or library). Such a +'needed' dependency is recorded there in the form of a ``SONAME'': this is an unique, symbolic +ID denoting the library we're depending on. At runtime, its the responsibility of the system's +dynamic linker to translate these SONAMEs into actual libraries installed somewhere on the system, +to load those libraries and to map the respective memory pages into our current process' address +space, and finally to _relocate_ the references in our assembly code to point properly to the +functions of this library we're depending on. + +Application Layer structure and dependency structure +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The Lumiera application uses a layered architecture, where upper layers may depend on the services +of lower layers, but not vice versa. The top layer, the GUI is defined to be _strictly optional_. +As long as all we had to deal with was code in upper layers using and invoking services in lower +layers, there would not be much to worry. Yet to produce any tangible value, software has to +collaborate on shared data. So the naive ``natural'' form of architecture would be to build +everything around shared knowledge about the layout of this data. Unfortunately such an approach +endangers the most central property of software, namely to be ``soft'', to be able to adapt to +change. Inevitably, data centric architectures either grow into a rigid immobile structure, +or they breed an intangible insider culture with esoteric knowledge and obscure conventions +and incantations. The only known solution to this problem (incidentally a solution known +since millennia), is to rely on subsidiarity. ``Tell, don't ask'' + +This gets us into a tricky situation regarding binary dependencies. Subsidiarity leads to an +interaction pattern based on handshakes and exchanges, which leads to mutual dependency. One +side places a contract for offering some service, the other side reshapes its internal entities +to comply to that contract superficially. Generally speaking, to handle the entities involved +in each handshake, effectively we need the internal functions of both sides. Which is in +contradiction to a ``clean'' layer hierarchy. + +For a tangible example, lets assume the our backend has to do some work on behalf of the GUI; +so the backend offers a contract to outline the properties of stuff it can work on. In compliance +with this contract, the GUI hands some data entities to the backend to work on -- but by their +very nature, these data entities are and remain GUI entities. When the backend invokes compliant +operations on these entities, it effectively invokes functionality implemented in the GUI. Which +makes the backend _binary dependent on the GUI_.