LUMIERA.clone/doc/technical/howto/UsingBoost.txt

190 lines
12 KiB
Text

Using Boost Libraries
=====================
//Menu: label using Boost
_some arbitrary hints and notes regarding the use of link::http://www.boost.org[Boost Libraries] in Lumiera_
Notable Features
----------------
Some of the Boost Libraries are especially worth mentioning. You should familiarise yourself with
those features, as we're using them heavily throughout our code base. As it stands, the C/C\++ language(s)
are quite lacking and outdated, compared with today's programming standards. To fill some of these gaps
and to compensate for compiler deficiencies, some members of the C\++ committee and generally very
knowledgeable programmers created a set of C\++ libraries generally known as *Boost*. Some of these
especially worthy additions were proposed and subsequently included into the new C++11 standard.
C++11 Features
~~~~~~~~~~~~~~
In Lumiera, we heavily rely on some features proposed and matured in the Boost libraries,
and meanwhile included in the current language standard. These features are thus now provided
through the standard library accompanying the compiler. For sake of completeness, we'll mention
them here. In the past _we used the Boost-Implementation_ of these facilities, but we don't need
Boost for this purpose anymore.
.memory
The `<memory>` libraries (formerly `<boost/memory.hpp>` rsp. `<tr1/memory>`) define a family of smart-pointers
to serve several needs of basic memory management. In almost all cases, they're superior to using `std::auto_ptr`. +
When carefully combining these nifty templates with the RAII pattern, most concerns for memory
management, clean-up and error handling simply go away. (but please understand how to avoid
circular references and care for the implications of parallelism though)
.functional
The `function` template adds generic functor objects to C++. In combination with the `bind` function
(which binds or ties an existing function invocation into a functor object), this allows to ``erase'' (hide)
the difference between functions, function pointers and member functions at your interfaces and thus enables
building all sorts of closures, signals (generic callbacks) and notification services. Picking up on these
concepts might be mind bending at start, but it's certainly worth the effort (in terms of programmer
productivity)
.hashtables and hash functions
The `unordered_*` collection types amend a painful omission in the STL. To work properly, these collection
implementations need a way to calculate a _hash value_ for every key (rsp. entry in case of the Set-container).
The hash function to use can be defined as additional parameter; there are also some conventions to pick a
hash function automatically. Currently (2014), we have two options for the hash function implementation:
The `std::hash` or the `boost::hash` implementation.
(-> read more link:HashFunctions.html[here...])
.STATIC_ASSERT
a helper to check and enforce some conditions regarding types _at compile time_.
In case of assertion failure a compilation error is provoked, which should at least give a clue
towards the real problem guarded by the static assertion. It is good practice to place an extended
source code comment near the static assertion statement to help solving the actual issue.
Relevant Bosst extensions
~~~~~~~~~~~~~~~~~~~~~~~~~
.noncopyable
Inheriting from `boost::noncoypable` inhibits any copy, assignment and copy construction. It's a highly
recommended practice _by default to use that for every new class you create_ -- unless you know for sure
your class is going to have _value semantics_. The C++ language has kind of a ``fixation'' on value
semantics, passing objects by value, and the language adds a lot of magic on that behalf. Which might lead
to surprising results if you aren't aware of the fine details.
.type traits
Boost provides a very elaborate collection of type trait templates, allowing to ``detect'' several
properties of a given type at compile time. Since C++ has no reflection and only a very weak introspection
feature (RTTI, run time type information), using these type traits is often indispensable.
.enable_if
a simple but ingenious metaprogramming trick, allowing to control precisely in which cases the compiler
shall pick a specific class or function template specialisation. Basically this allows to control the
code generation, based on some type traits or other (metaprogramming) predicates you provide. Again,
since C++ is lacking introspection features, we're frequently forced to resort to metaprogramming
techniques, i.e to influence the way the compiler translates our code from within that very code.
.metaprogramming library
A very elaborate, and sometimes mind-bending library and framework. While heavily used within
Boost to build the more advanced features, it seems too complicated and esoteric for general purpose
and everyday use. Code written using the MPL tends to be very abstract and almost unreadable for
people without math background. In Lumiera, we _try to avoid using MPL directly._ Instead, we
supplement some metaprogramming helpers (type lists and tuples), written in a simple LISP style,
which -- hopefully -- should be decipherable without having to learn an abstract academic
terminology and framework.
.lambda
In a similar vein, the `boost::lambda` library might be worth exploring a bit, yet doesn't add
much value in practical use. It is stunning to see how this library adds the capability to define
real _lambda expressions_ on-the-fly, but since C++ was never designed for language extensions of
that kind, using lambda in practice is surprisingly tricky, sometimes even obscure and rarely
not worth the hassle. (An notable exception might be when it comes to defining parser combinators)
.operators
The `boost::operators` library allows to build families of types/objects with consistent
algebraic properties. Especially, it eases building _equality comparable_, _totally ordered_,
_additive_, _mulitplicative_ entities: You're just required to provide some basic operators
and the library will define all sorts of additional operations to care for the logical
consequences, removing the need for writing lots of boilerplate code.
.lexical_cast
Converting numerics to string and back without much ado, as you'd expect it from any decent language.
.boost::format
String formatting with `printf` style directives. Interpolating values into a template string for
formatted output -- but typesafe, using defined conversion operators and without the dangers of
the plain-C `printf` famility of functions. But beware: `boost::format` is implemented on top of
the C++ output stream operations (`<<` and manipulators), which in turn are implemented based
on `printf` -- you can expect it to be 5 to 10 times slower than the latter, and it has
quite some compilation overhead and size impact (-> see our own
link:http://git.lumiera.org/gitweb?p=lumiera/ichthyo;a=blob;f=src/lib/format-string.hpp;h=716aa0e3d23f09269973b7659910d74b3ee334ea;hb=37384f1b681f5bbfa7dc4d50b8588ed801fbddb3[formatting front-end]
to reduce this overhead)
.variant and any
These library provide a nice option for building data structures able to hold a mixture of
multiple types, especially types not directly related to each other. `boost::variant` is a
typeseafe union record, while `boost::any` is able to hold just any other type you provide
_at runtime_, still with some degree of type safety when retrieving the stored values.
Both libraries are compellingly simple to use, yet add some overhead in terms of size,
runtime, and compile time.
.regular expressions
Boost provides a full blown regular expression library, supporting roughly the feature set of
perl regular expressions. The usage and handling is somewhat brittle though, when compared
with perl, python, java, etc.
.program-options and filesystem
Same as the aforementioned, these two libraries just supply a familiar programming model for these tasks
(parsing the command line and navigating the filesystem) which can be considered quasi standard today,
and is available pretty much in the same style in Java, Python, Ruby, Perl and others.
Negative Impact
---------------
Most Boost libraries are _header only_ and all of them make heavy use of template related features of C++.
Thus, _every inclusion of a Boost library_ might lead to _increased compilation times._ We pay that penalty
per compilation unit (not per header). Yet still, using a boost library within a header frequently included
throughout the code base might dangerously leverage that effect.
debug mode
~~~~~~~~~~
Usually, when developing, we translate our code without optimisation and with full debugging informations
switched on. Unfortunately, C++ templates were never designed to serve as a functional metaprogramming language
to start with -- but that's exactly what we're (ab)using them for. The Boost libraries drive that to quite
some extreme. This leads to lots and lots of debugging information to be added to the object files,
mentioning each and every intermediary type created in the course of expanding the metaprogramming
facilities. Even seemingly simple things may result in object files becoming several megabytes large.
Fortunately, all of this overhead gets removed when _stripping_ your executable and libraries (or when
compiling without debug information). So this is solely an issue relevant for the developers, as it increases
compilation, linking and startup times.
runtime overhead and template bloat
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The core Boost libraries (the not-so experimental ones) have a reputation for being of very high
quality. The're written by experts with a deep level of understanding of the language, the usual
implementation and the performance implications. Mostly, those quite elaborate metaprogramming
techniques where chosen exactly to minimise runtime overhead or code size.
Since each instantiation of a template constitutes a completely new class, carelessly written
template code can lead to heavily bloated executables. Every instantiated _function_ and every
class with _virtual methods_ (i.e. with a VTable) adds to the weight. But this negative effect can
be balanced by the ability of reducing inline code. According to my own experience, I'd be much
more concerned _about my own code adding template bloat,_ then being concerned about the Boost
libraries (those people know very well what they're doing...)
some practical guidelines
~~~~~~~~~~~~~~~~~~~~~~~~~
Facilities like `boost::format`, `boost::variant`, `boost::any`, `boost::lambda` and the more elaborate
metaprogramming stuff add considerable weight. A good advice is to confine those features to the implementation
level: use them within individual translation units (`*.cpp`) where this makes sense, but don't express
general interfaces in terms of those library constructs.
Actually, for the most relevant of these, namely `boost::format` and `boost::variant`, we have either
created a lightweight front-end or our own simplified implementation in the support library, leading
to a significant reduction in overall code size.
Beyond those somewhat problematic entities, there used to be several incredibly useful tools from the
Boost library, which create only moderate overhead -- nothing to be really concerned about. Fortunately,
mostly these have meanwhile been adapted into the official standard library, and can thus be used without
creating a dependency on Boost.
- the _functional_ tools (`function`, `bind` and friends), the _hash functions_, _lexical_cast_ and the
_regular expressions_ create a moderate overhead. Probably fine in general purpose code, but you should
be aware that there is a price tag. About the same as with many STL other features.
- the `shared_ptr` `weak_ptr`, `intrusive_ptr` and `unique_ptr` are really indispensable and can be used
liberally everywhere. Same for `enable_if` and the `type_traits`. The impact of those features on
compilation times and code size is negligible and the runtime overhead is zero, compared to _performing
the same stuff manually_ (obviously a ref-counting pointer like `smart_ptr` has the size of two raw pointers
and incurs an overhead on each creation, copy and disposal of a variable -- but that's besides the point I'm
trying to make here, and generally unavoidable)