DOC: extend the description of dependency pitfals and intricasies

Still not complete, but a complete outline now
2014-10-02 04:08:26 +02:00 · 2014-10-02 04:08:26 +02:00 · 073efdf6a4
commit 073efdf6a4
parent b5de8523b1
1 changed files with 97 additions and 22 deletions
--- a/doc/technical/code/linkingStructure.txt
+++ b/doc/technical/code/linkingStructure.txt
@ -25,8 +25,8 @@ specified, coherent operational structure.
 Within this _definition of a coded structure_, there is an inherent tension
 between the _absoluteness_ of a definition (a definition in mathematical sense
 can not be changed, once given) and the _order of spelling out_ this definition.
-When put in such an abstract way, all of this might seem self evident and trivial,
-but let's just consider the following complications in practice...
+When described in such an abstract way, these observations might be deemed self evident
+and trivial, but let's just consider the following complications in practice...

 - Headers are included into multiple translation units. Which means, they appear
  in several disjoint contexts, and must be written in a way independent of the
@ -34,8 +34,8 @@ but let's just consider the following complications in practice...
 - Macros, from the point of their definition onwards, change the way the compiler
  ``sees'' the actual code.
 - Namespaces are ``open'' -- meaning they can be re-opened several times and
-  populated with further definitions. Generally speaking, the actual contents of
-  any given namespace will be different in each and every translation unit.
+  populated with further definitions. The actual contents of any given namespace
+  will be slightly different in each and every translation unit.
 - a Template is not in itself code, but a constructor function for actual code.
  It needs to be instantiated with concrete type arguments to produce code.
  And when this happens, the template instantiation picks up definitions
@ -70,10 +70,13 @@ Since it is really hard to reconcile all these conflicting goals, we are bound
 to rely on *patterns of construction*, which are known to work out well in
 this regard.

+[yellow-background]#to be written#
+Import order, forward decls, placement of ctors, wrappers, PImpl
+

 Code size and Code Bloat
 ~~~~~~~~~~~~~~~~~~~~~~~~
-Each piece of code incurs costs of various kinds
+Each piece of code incurs cost of various kinds

 - it needs to be understood by the reader. Otherwise it will die
  sooner or later and from then on haunt the code base as a zombie.
@ -86,28 +89,28 @@ Each piece of code incurs costs of various kinds
 - and since we're developing with debug builds, each and every definition
  produces debug information in each and every translation unit referring it.

-Thus, for every piece of code we must ask ourselves how _visible_ this code
-is. And we must consider the dependencies the code incurs. It pays off to
+Thus, for every piece of code we must ask ourselves how much _visible_ this
+code is. And we must consider the dependencies the code incurs. It pays off to
 turn something into a detail and ``push it into the backyard''. This explains
 why we're using the frontend - backend split so frequently.


-Source dependencies vs binary dependencies
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Source and binary dependencies
+------------------------------
 To _use_ stuff while writing code, a definition or at least a declaration needs to
 be brought into scope. This is fine as long as definitions are rather cheap,
 omitting and hiding the details of implementation. The user does not need to understand
 these details, and the compiler does not need to parse them.

 The situation is somewhat different when it comes to _binary dependencies_ though.
-At execution time, there are just pieces of data, and functions able to process this
-specific data. Thus, whenever a specific piece of data is to be used, the corresponding
-functions need to be loaded and made available. Most of the time we're linking dynamically,
-and thus the above means that a dynamic library providing those functions needs to be loaded.
+At execution time, all we get is pieces of data, and functions able to process specific
+data. Thus, whenever some piece of data is to be used, the corresponding functions need
+to be loaded and made available. Most of the time we're linking dynamically,
+and thus the above means that a _dynamic library_ providing those functions needs to be loaded.
 This other dynamic library becomes a dependency of our executable or library; it is recorded
 in the 'dynamic' section of the headers of our ELF binary (executable or library). Such a
 'needed' dependency is recorded there in the form of a ``SONAME'': this is an unique, symbolic
-ID denoting the library we're depending on. At runtime, its the responsibility of the system's
+ID denoting the library we're depending on. At runtime, it is the responsibility of the system's
 dynamic linker to translate these SONAMEs into actual libraries installed somewhere on the system,
 to load those libraries and to map the respective memory pages into our current process' address
 space, and finally to _relocate_ the references in our assembly code to point properly to the
@ -121,8 +124,8 @@ As long as all we had to deal with was code in upper layers using and invoking s
 layers, there would not be much to worry. Yet to produce any tangible value, software has to
 collaborate on shared data. So the naive ``natural'' form of architecture would be to build
 everything around shared knowledge about the layout of this data. Unfortunately such an approach
-endangers the most central property of software, namely to be ``soft'', to be able to adapt to
-change. Inevitably, data centric architectures either grow into a rigid immobile structure,
+endangers the most central property of software, namely to be ``soft'', to adapt to change.
+Inevitably, data centric architectures either grow into a rigid immobile structure,
 or they breed an intangible insider culture with esoteric knowledge and obscure conventions
 and incantations. The only known solution to this problem (incidentally a solution known
 since millennia), is to rely on subsidiarity. ``Tell, don't ask''
@ -130,13 +133,85 @@ since millennia), is to rely on subsidiarity. ``Tell, don't ask''
 This gets us into a tricky situation regarding binary dependencies. Subsidiarity leads to an
 interaction pattern based on handshakes and exchanges, which leads to mutual dependency. One
 side places a contract for offering some service, the other side reshapes its internal entities
-to comply to that contract superficially. Generally speaking, to handle the entities involved
-in each handshake, effectively we need the internal functions of both sides. Which is in
-contradiction to a ``clean'' layer hierarchy.
+to comply to that contract superficially. Dealing with the entities involved in such a handshake
+effectively involves the internal functions of both sides. Which is in contradiction to a
+``clean'' layer hierarchy.

-For a tangible example, lets assume the our backend has to do some work on behalf of the GUI;
+For a more tangible example, lets assume our backend has to do some work on behalf of the GUI;
 so the backend offers a contract to outline the properties of stuff it can work on. In compliance
-with this contract, the GUI hands some data entities to the backend to work on -- but by their
-very nature, these data entities are and remain GUI entities. When the backend invokes compliant
+with this contract, the GUI hands over some data entities to the backend to work on -- but by their
+very nature, these data entities are and always remain GUI entities. When the backend invokes compliant
 operations on these entities, it effectively invokes functionality implemented in the GUI. Which
 makes the backend _binary dependent on the GUI_.
+
+While this problem can not be resolved in principle, there are ways to work around it, to the degree
+necessary to get hierarchically ordered binary dependencies -- which is what we need to make a lower
+layer operative, standalone, without the upper layer(s). The key is to introduce an _abstraction_,
+and then to _segregate_ along the realm of this abstraction, which needs to be chosen large enough
+in scope to cast the service and its contract entirely in terms of this abstraction, but at the same
+time it needs to be kept tight enough to prevent details of the client to leak into the abstraction.
+When this is achieved (which is the hard part), then any operations dealing with the abstraction solely
+can be migrated into the entity offering the service, while the client hides the extended knowledge about
+the nature of the manipulated data behind a builder function footnote:[frequently this leads to the
+``type erasure'' pattern, where specific knowledge about the nature of the fabricated entities -- thus
+a specific type -- is relinquished and dropped once fabrication is complete], but retains ownership
+on these entities, passing just a reference to the service implementation. This move ties the binary
+dependency on the client implementation to this factory function -- as long as _this factory_ remains
+within the client, the decoupling works and eliminates binary cross dependencies.
+
+This solution pattern can be found at various places within the code base; in support we link with
+strict dependency checking (Link flag `--no-undefined`), so every violation of the predefined
+hierarchical dependency order of our shared modules is spotted immediately during build.
+
+
+Finding dependencies at start-up
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+[yellow-background]#to be written#
+
+
+Transitive binary dependencies
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Binary dependencies can be recursive:
+When our code depends on some library, this library might in turn depend on other libraries.
+At runtime, the dynamic linker/loader will detect all these transitive dependencies and try to load
+all the required shared libraries; thus our binary is unable to start, unless all these dependencies
+are already present on the target system. It is the job of the packager to declare all necessary dependencies
+in the software package definition, so users can install them through the package manager of the distribution.
+
+There is a natural tendency to define those installation requirements too wide. For one, it is better
+to be on the safe side, otherwise users won't be able to run the executable at all. And on top of that,
+there is the general tendency towards frameworks, toolkit sets and library collections -- basically
+a setup which is known to work under a wide range of conditions. Using any of these typically means
+to add a _standard set of dependencies_, which is often way more than actually required to load and
+execute our code. One way to fight this kind of ``distribution dependency bloat'' is to link `--as-needed`.
+In this mode, the linker silently drops any binary dependency not necessary for _this concrete piece
+of code_ to work. This is just awesome, and indeed we set this toggle by default in our build process.
+But there are some issues to be aware of.
+
+Static registration magic
+^^^^^^^^^^^^^^^^^^^^^^^^^
+[yellow-background]#to be written#
+
+Relative dependency location
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Locating binary dependencies relative to the executable (as described above) is complicated when several
+of _our own dynamically linked modules_ depend on each other transitively. For example, a plug-in might
+depend on `liblumierabackend.so`, which in turn depends on `liblumierasupport.so`. Now, when we link
+`--as-needed`, the linker will add the direct dependency, but omit the transitive dependency on the
+support library. Which means, at runtime, that we'd need to find the support library _when we are
+about to load the backend library_. With the typical, external libraries already installed to the
+system this works, since the linker has built-in ``magic'' knowledge about the standard installation
+locations of system libraries. Not so for our own loadable components -- recall, the idea was to provide
+a self-contained directory tree, which can be relocated in the file system as appropriate, without the
+need to ``install'' the package officially. The GNU dynamic linker can handle this requirement, though,
+if we supply an additional, relative search information _with the library pulling in the transitive
+dependency_. In our example, `liblumierabackend.so` needs an additional search path to locate
+`liblumierasupport.so` _relative_ to the backend lib (and not relative to the executable). For this
+reason, our build system by default supplies such a search hint with every Lumiera lib or dynamic
+module -- assuming that our own shared libraries are installed into a subdirectory `modules` below
+the location of the executable; other dynamic modules (plug-ins) may be placed in sibling directories.
+So, to summarise, the build defines the following `RPATH` and `RUNPATH` specs:
+
+for executables::      `$ORIGIN/modules`
+for libs and modules:: `$ORIGIN/../modules`
+