Indeed — this change set is kind of sad.
Because I still admire the design of the GAVL library,
and would love to use it for processing of raw video.
However, up to now, we never got to the point of actually
doing so. For the future, I am not sure if there remains
room to rely on lib-GAVL, since FFmpeg roughly covers
a similar ground (and a lot beyond that). And providing
a plug-in for FFmpeg is unavoidable, practically speaking.
So I still retain the nominal dependency on lib-GAVL
in the Build system (since it is still packaged in Debian).
But it is pointless to rely on this library just for an
external type-def `gavl_time_t`. We owe much to this
inspiration, but it can be expected that we'll wrap
these raw time-values into a dedicated marker type
soon, and we certainly won't be exposing any C-style
interface for time calculations in future, since
we do not want anyone to side-step the Lumiera
time handling framework in favour of working
„just with plain numbers“
NOTE: lib-GAVL hompage has moved to Github:
https://github.com/bplaum/gavl
* Lumiera source code always was copyrighted by individual contributors
* there is no entity "Lumiera.org" which holds any copyrights
* Lumiera source code is provided under the GPL Version 2+
== Explanations ==
Lumiera as a whole is distributed under Copyleft, GNU General Public License Version 2 or above.
For this to become legally effective, the ''File COPYING in the root directory is sufficient.''
The licensing header in each file is not strictly necessary, yet considered good practice;
attaching a licence notice increases the likeliness that this information is retained
in case someone extracts individual code files. However, it is not by the presence of some
text, that legally binding licensing terms become effective; rather the fact matters that a
given piece of code was provably copyrighted and published under a license. Even reformatting
the code, renaming some variables or deleting parts of the code will not alter this legal
situation, but rather creates a derivative work, which is likewise covered by the GPL!
The most relevant information in the file header is the notice regarding the
time of the first individual copyright claim. By virtue of this initial copyright,
the first author is entitled to choose the terms of licensing. All further
modifications are permitted and covered by the License. The specific wording
or format of the copyright header is not legally relevant, as long as the
intention to publish under the GPL remains clear. The extended wording was
based on a recommendation by the FSF. It can be shortened, because the full terms
of the license are provided alongside the distribution, in the file COPYING.
* most usages are drop-in replacements
* occasionally the other convenience functions can be used
* verify call-paths from core code to identify usages
* ensure reseeding for all tests involving some kind of randomness...
__Note__: some tests were not yet converted,
since their usage of randomness is actually not thread-safe.
This problem existed previously, since also `rand()` is not thread safe,
albeit in most cases it is possible to ignore this problem, as
''garbled internal state'' is also somehow „random“
...discovered by during investigation of latest Scheduler failures.
The root of the problems is that block overflow can potentially trigger
expansion of the allocation pool. Under some circumstances, this on-the fly
allocation requires a rotation of index slots, thereby invalidating
existing iterators.
While such behaviour is not uncommon with storage data structures (see std::vector),
in this case it turns out problematic because due to performance considerations,
a usage pattern emerged which exploits re-using existing storage »Slots« with known
deadline. This optimisation seems to have significant leverage on the
planning jobs, which happen to allocated and arrange a whole strike of
Activities with similar deadlines.
One of these problem situations can easily be fixed, since it is triggered
through the iterator itself, using a delegate function to request a storage expansion,
at which point the iterator is able to re-link and fix its internal index.
This solution also has no tangible performance implications in optimised code.
Unfortunately there remains one obscure corner case where such an pool expansion
could also have invalidated other iterators, which are then used later to
attach dependency relations; even a partial fix for that problem seems
to cause considerable performance cost of about -14% in optimised code.
...refine the handling of FrameRates close to the definition bounds
...implement the actual rule to scale allocator capacity on announcement
...hook up into the seedCalcStream() with a default of +25FPS
+ test coverage
Over time, a collection of microbenchmark helper functions was
extracted from occasional use -- including a variant to perform
parallelised microbenchmarks. While not used beyond sporadic experiments yet,
this framework seems a perfect fit for measuring the SyncBarrier performance.
There is only one catch:
- it uses the old Threadpool + POSIX thread support
- these require the Threadpool service to be started...
- which in turn prohibits using them for libary tests
And last but not least: this setup already requires a barrier.
==> switch the existing microbenchmark setup to c++17 threads preliminarily
(until the thread-wrapper has been reworked).
==> also introduce the new SyncBarrier here immediately
==> use this as a validation test of the setup + SyncBarrier
...regarding the kind of activity (the verb),
and also for some special case access of payload data;
deliberately asserting the correct verb, but no mandatory check,
since this whole Activity-Language is conceived as cohesive
and essentially sealed (not meant to be extended)
Further extensive testing with parameter variations,
using the test setup in `BlockFlow_test::storageFlow()`
- Tweaks to improve convergence under extreme overload;
sudden load peaks are now accomodated typically < 5 sec
- Make the test definition parametric, to simplify variations
- Extract the generic microbenchmark helper function
- Documentation
There seems to be a ''sweet spot'' for somewhat larger Epoch sizes around 500 slots.
At least in the test setup used here, which works with a load of 200 Frames / sec,
which is significantly over the typical value of 50fps (video + audio) for simple playback.
The optimisation of averaged allocation times can not be much improved **below 30ns**.
Overall, this can be considered a good result,
since this allocation scheme does way more than just allocate memory,
it also provides a means to track dependencies and lifecycle.
__For context__:
- we should strive at processing one frame in ~ 10ms
- for 10 Activity records per Frame, we currently use < 0.5 µs for
memory and dependency management in the scheduler
- this leaves enough room for the further administrative efforts
(priority queue, job planning, buffer management)
BUT -> +50% runtime in -O3 (+20ns)
Investigation seems to indicate
- that the increased (+1 Epochs, 10 -> 11) moving average
caused the Algo to perform worse (strong effect)
- that the Optimiser has problems with boost::rational, which however
yields only a minute effect (+5ns), and only on the critical path
The access via Meyers Singleton has no adverse effect,
rather the new setup gives a tiny benefit (46ns -> 37ns).
Surprisingly, the increased pre-allocation has no observable effect.
On the long run, there will be a central Render Engine parametrisation;
some parameters can even be expected to be dynamic; thus prepare the
BlockFlow allocator to fit in with this expectation
For comparison: use individual managment by refcount.
This supports the conclusion that BlockFlow is more than just a
custom allocator; it also supports a non-trivial lifetime management,
and this comes at a cost.
Playing around with various load patterns uncovers further weak spots
in the regulation mechanism. As a remedy, introduce a stronger feed-back
and especially set the target load factor from 100% -> 90%
to add some headroom to absorb intermittent load peaks
Presumably ''much more observation and fine-tuning'' will be necessary
under real-world load conditions (⟹ Ticket #1318 for later)
- BUG: must prevent the Epoch size to become excessive low
- Problem: feedback signal should not be overly aggressive
Fine-Tuning:
- Dose for Overflow-compensation is delicate
- Moving average and Overflow should be balanced
- ideally the compensatory actions should be one order of magnitude
slower than the characteristic regulation time
Improvement: perform Moving-Average calculations in doubles
...leading to PATHETICALLY bad timing comparison
...it seems clear that the Epoch-Step went to zero
(which was neither anticipated, nor protected against)
However, even individual heap allocations fare surprisingly well
under full optimisation; just they don't solve our problem with
tracking dependencies; the most simplest solution that would
also fulfil this requirement would be using shared_ptr
..as a heuristic to regulate optimal Epoch duration;
when Epochs are discarded, the effective fill factor can be used
to guess an Epoch duration time, which would (in hindsight)
lead to perfect usage of storage space
..using a simplistic implementation for now: scale down the
Epoch-stepping by 0.9 to increase capacity accordingly.
This is done on each separate overflow event, and will be
counterbalanced by the observation of Epoch fill ratio
performed later on clean-up of completed Epochs
further implementation makes clear that the AllocationHandle,
which is the primary usage front-end, has to rely both on
services of the underlying ExtentFamily allocator, as well
as on the BlockFlow itself for managing the Epoch spacing.
- fix a bug in IterExplorer: when iterating a »state core« directly,
the helper CoreYield passed the detected type through ValueTypeBindings.
This is logically wrong, because we never want to pick up some typedefs,
rather we always want to use the type directly returned from CORE::yield()
Here the iterator returns an Epoch&, which itself is again iterable
(it inherits from std::array<Activity, N>). However, it is clear
that we must not descent into such a "flatMap" style recursive expansion
- draft a simple scheme how to regulate Epoch lengths dynamically
- add diagnostics to pinpoint a given Activity and find out into which
Epoch it has been allocated; used to cover the allocator behaviour
- add preliminary deadline-check (directly instead of using the Activity)
- with this shortcut, now able to implement discarding obsoleted Epochs
- Iteration and use of the underlying `ExtentFamily` is also settled by now
💡 ''Implementation concept for the allocation scheme complete and validated''
..this is the most simple case, where no Epochs are opened yet
..add diagnostics to inspect alloc count and deadlines
..add accessors for the first/last underlying Extent
...continue to proceed test-driven
...scheduler internals turn out to be intricate and cohesive,
and thus the only hope is to adhere to strict testing discipline
- the idea is to use slot-0 in each extent for administrative metadata
- to that end, a specialised GATE-Activity is placed into slot-0
- decision to use the next-pointer for managing the next free slot
- thus we need the help of the underlying ExtentFamily for navigating Extents
Decision to refrain from any attempt to "fix" excessive memory usage,
caused by Epochs still blocked by pending IO operations. Rather, we
assume the engine uses sane parametrisation (possibly with dynamic adjustment)
Yet still there will be some safety limit, but when exceeding this limit,
the allocator will just throw, thereby killing the playback/render process
- decision how to handle the Extent storage (by forced-cast)
- decision to place the administrative record directly into the Extent
TODO not clear yet how to handle the implicit limitation for future deadlines