Since C++17 we can use the std::filesystem instead (and we ''do use it'' indeed)
- relocate the `/lib/file.hpp` header
- adapt the self-discovery of the executable to using std::filesystem
Furthermore, some recherche regarding XVideo and Video Output
- remove obsolete configuration settings
- walk through all settings according to the documentation
https://www.doxygen.nl/manual/config.html
- now try to use the new feature to rely on Clang for C++ parsing
- walk through the doxygen-warnings.txt and fix some obvious misspellings
and structural problems in the documentation comments.
With Debian-Trixie, we are now using Doxygen 1.9.8 —
which produces massively better results in various fine points.
However, there are still problems with automatic cross links,
especially from implementation to the corresponding test classes.
Some pre C++11 features are marked deprecated and will be rejected with C++20
Notably the old marker inferfaces for unary (and binary) functions are no longer needed, since function-like objects can be detected by traits or concepts nowadays
Moreover we can get rid of some boost(bind) usages and use a λ
BuilderDoxygen and BuilderGCH are external plug-ins,
not developed in this project and probably unmaintained.
TODO: decide how to fix or replace them...
With the ability to invoke a Render Node graph,
the development on branch `play` reached some kind of milestone
regarding the »Playback Vertical Slice«.
This is a good opportunity to update the reference platform
and upgrade the preview releases and packaging setup accordingly.
This will include adjustments to compile on recent compilers and
upgrade the build system to support Python-3.
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
Remove left-overs from the preceding prototypical implementation,
which is now obliterated by the change to a flexibly configured `FeedManifold`
with structured, typed storage for buffers and for parameter data.
The Render Node invocation sequence, as rearranged and reworked for the »Playback Vertical Slice«, now seems reasonably clear and settled.
Adding extensive documentation to describe the conventions and structures worked out thus far;
moreover, start makeover of old documentation in the !TiddlyWiki to remove concepts obviously obsoleted now...
The overall goal is eventually to arrive at something akin to a ''»Dummy Media-processing Library«''
* this will offer some „Functionality“
* it will work on different ''kinds'' or ''flavours'' of data
* it should provide operations that can be packaged into ''Nodes''
However — at the moment I have no clue how to get there...
And thus I'll start out with some rather obvious basic data manipulation functions,
and then try to give them meaningful names and descriptors. This in turn
will allow to build some multi-step processing netwaorks — which actually
is the near-term goal for the ''main effort'' (which is after all, to get
the Render Node code into some sufficient state of completion)...
There is an insidious problem when the Transformer takes references to internal state
within upstream iterators or state core. This problem only manifests when
a invariant based filtering or grouping operation is added after the Transformer,
because such an operation (notably Filter) will typically attempt to establish
the invariant from the constructor (to avoid dangling state). Unfortunately
doing so involves pulling data ''before the overall pipeline is moved into final location''
A workaround is to make the Transformer ''disengage'' on copy, so to provoke
a refresh and new pull in the new location after the copy / move / swap.
This only works if the transformer function as such is idempotent.
Indeed the solution worked out yesterday could be extracted and turned generic.
Some in-depth testing is necessary though, and possibly some qualifications to allow pass-through of references...
Moreover, last days I started collecting notes regarding problem solving patterns,
which I tend to use frequently, but which might not be obvious and thus can easily
be forgotten. In fact, I had encountered several cases, where I did invent some
roughly similar solution repeatedly, having forgotten about already settled matters.
Hopefully the habit of collecting notes and hints at a central location serves to remedy
Basically I am sick of writing for-loops in those cases
where the actual iteration is based on one or several data sources,
and I just need some damn index counter. Nothing against for-loops
in general — they have their valid uses — sometimes a for-loop is KISS
But in these typical cases, an iterator-based solution would be a
one-liner, when also exploiting the structured bindings of C++17
''I must admit that I want this for a loooooong time —''
...but always got intimidated again when thinking through the fine points.
Basically it „should be dead simple“ — as they say
Well — — it ''is'' simple, after getting the nasty aspects of tuple binding
and reference data types out of the way. Yesterday, while writing those
`TestFrame` test cases (which are again an example where you want to iterate
over two word sequences simultaneously and just compare them), I noticed that
last year I learned about the `std::apply`-to-fold-expression trick, and
that this solution pattern could be adapted to construct a tuple directly,
thereby circumventing most of the problems related to ''perfect forwarding''
So now we have a new util function `mapEach` (defined in `tuple-helper.hpp`)
and I have learned how to make this application completely generic.
As a second step, I implemented a proof-of-concept in `IterZip_test`,
which indeed was not really challenging, because the `IterExplorer`
is so very sophisticated by now and handles most cases with transparent
type-driven adaptors. A lot of work went into `IterExplorer` over the years,
and this pays off now.
The solution works as follows:
* apply the `lib::explore()` constructor function to the varargs
* package the resulting `IterExplorer` instantiations into a tuple
* build a »state core« implementation which just lifts out the three
iterator primitives onto this ''product type'' (i.e. the tuple)
* wrap it in yet another `IterExplorer`
* add a transformer function on top to extract a value-tuple for each ''yield'
As expected, works out-of-the-box, with all conceivable variants and wild
mixes of iterators, const, pointers, references, you name it....
PS: I changed the rendering of unsigned types in diagnostic output
to use the short notation, e.g. `uint` instead of `unsigned int`.
This dramatically improves the legibility of verification strings.
I changed the rendering of unsigned types in diagnostic output
to use the short notation, e.g. `uint` instead of `unsigned int`.
This dramatically improves the legibility of verification strings.
Moreover, I took the opportunigy to look through the existing page
with codeing style guides to explicitly write down some conventions
formed over years of usage.
I did not just »make up« those light heartedly, rather these conventions
are the result of a craftsman's ''attentive observation and self-reflection.''
* based on reproducible data in `TestFrame`
* using Murmur64A hash-chaining to »mark« with a parameter
This emulates the simplest case of 1:1 processing and can also be applied ''in-place''
- the starting point is the idea to build a dedicated ''turnout system''
- `StateAdapter`, `BuffTable` ⟶ `FeedManifold` and _Invocation_ will be fused
- actually, the `TurnoutSystem` will be ''pulled'' and orchestrate the invocation
- the structure is assumed to be recursive
The essence of the Node-Invocation, as developed 2009 / 2011 remains intact,
yet it will be organised along a clearer structure
The behaviour seems consistent and the schedule breaks at the expected point.
At first sight, concurrency seems slightly to low; detailed investigation
however shows that this is due to the structure of the load graph,
and in fact the run time comes close to optimal values.
the `BreakingPoint` tool conducts a binary search to find the ''stress factor''
where a given schedule breaks. There are some known deviations related to the
measurement setup, which unfortunately impact the interpretation of the
''stress factor'' scale. Earlier, an attempt was made, to watch those factors
empirically and work a ''form factor'' into the ''effective stress factor''
used to guide this measurement method.
Closer investigation with extended and elastic load patters now revealed
a strong tendency of the Scheduler to scale down the work resources when not
fully loaded. This may be mistaken by the above mentioned adjustments as a sign
of a structural limiation of the possible concurrency.
Thus, as a mitigation, those adjustments are now only performed at the
beginning of the measurement series, and also only when the stress factor
is high (implying that the scheduler is actually overloaded and thus has
no incentive for scaling down).
These observations indicate that the »Breaking Point« search must be taken
with a grain of salt: Especially when the test load does ''not'' contain
a high degree of inter dependencies, it will be ''stretched elastically''
rather than outright broken. And under such circumstances, this measurement
actually gauges the Scheduler's ability to comply to an established
load and computation goal.
- use parameters known to produce a clean linear model
- assert on properties of this linear model
Add extended documentation into the !TiddlyWiki,
with a textual account of the various findings,
also including some of the images and diagrams,
rendered as SVG
Investigate the behaviour over a wider range of job loads,
job count and worker pool sizes. Seemingly the processing
can not fully utilise the available worker pool capacity.
By inspection of trace-dumps, one impeding mechanism could
be identified: the »stickiness« of the contention mitigation.
Whenever a worker encounters repeated contention, it steps up
and adds more and more wait cycles to remove pressure from the
schedule coordination. As such this is fine and prevents further
degradation of performance by repeated atomic synchronisation.
However, this throttling was kept up needlessly after further
successful work-pulls. Since job times of several milliseconds
can be expected on average in media processing, such a long
retention would spread a performance degradation over a duration
of several frames. Thus, the scheme for step-down was changed
to decrease the throttling by a power series rather than just
documenting the level.
- repeated invocations of the same test setup for statistics
- the usual nasty 64-node graph with massive fork out
- limit concurrency to 4 cores
- tabulate data to look for clues regarding a trigger criteria
Hypothesis: The Scheduler slips off schedule when all of the
following three criteria are met:
- more than 55% glitches with Δ > 2ms
- σ > 2ms
- ∅Δ > 4ms
...the idea is to use the sum of node weights per level
to create a schedule, which more closely reflects the distribution
of actual computation time. Hopefully such a schedule can then be
squeezed or stretched by a time factor to find out a ''breaking point'',
at which the Scheduler is no longer able to keep up.
The last round of refactorings yielded significant improvements
- parallelisation now works as expected
- processing progresses closer to the schedule
- run time was reduced
The processing load for this test is tuned in a way to overload the
scheduler massively at the end -- the result must be correct non the less.
There was one notable glitch with an assertion failure from the memory manager.
Hopefully I can reproduce this by pressing and overloading the Scheduler more...
* added benchmark over synchronous execution as point of reference
* verified running times and execution pattern
* Scheduler **behaves as expected** for this example
Some test-runs performed excitingly smooth,
but in one case the processing was was drastically delayed,
due to heavy contention. The relevance of this incident is not clear yet,
since this test run uses a rather atypical load with very short actual work jobs.
Anyway, the dump-logs are documented with this commit.
At various places, concepts and drafts from the early stage of the
Lumiera Project are still reflected in the online documentation pages.
During the last months, development focussed on the Render Engine,
causing a shift in some parts of the design, and obsoleting other
parts altogether (notably we consider to use IO_URING for async IO)
As follow-up to the rework of thread-handling, likewise also
the implementation base for locking was switched over from direct
usage of POSIX primitives to the portable wrappers available in
the C++ standard library. All usages have been reviewed and
modernised to prefer λ-functions where possible.
With this series of changes, the old threadpool implementation
and a lot of further low-level support facilities are not used
any more and can be dismantled. Due to the integration efforts
spurred by the »Playback Vertical Slice«, several questions of
architecture could be decided over the last months. The design
of the Scheduler and Engine turned out different than previously
anticipated; notably the Scheduler now covers a wider array of
functionality, including some asynchronous messaging. This has
ramifications for the organisation of work tasks and threads,
and leads to a more deterministic memory management. Resource
management will be done on a higher level, partially superseding
some of the concepts from the early phase of the Lumiera project.
The EventLog seems to provide all the building blocks, but we need
some higher level special matchers (and maybe we also want to hide
some of the basic EventLog matchers). A soulution might be to wrap
the EventMatcher and delegate all follow-up builder calls.
This seems adequate, since the EventLog-Matcher is basically used as black box,
building up more elaborate matchers from the provided basic matchers...
Spent some time again to understand how EventLog matching works.
My feelings towards this piece of code are always the same: it is
somewhat too "tricky", but I am not aware of any other technique
to get this degree of elaborate chained matching on structured records,
short of building a dedicated matching engine from scratch.
The other alternative would be to use a flat textual log (instead of
the structured log records from EventLog), but then we'd have to
generate quite intricate regular expressions from the builder,
and I'm really doubtful it would be easier and clearer....
- introduce a new entity: RenderDrive
- it supersedes the CalcPlanCalculation, but is managed by CalcStream
- moreover, the RenderDrive will house a IterTreeExplorer-Pipeline
- define the concerns and relationships more clearly (see Drawing)
- prerequisite to disentangle the Job-planning "mechanics"
- decision: the Monad-style iteration framework will be abandoned
- the job-planning will be recast in terms of the iter-tree-explorer
- job-planning and frame dispatch will be disentangled
- the Scheduler will deliberately offer a high-level interface
- on this high-level, Scheduler will support dependency management
- the low-level implementation of the Scheduler will be based on Activity verbs