This is the version from 4.Apr
- add a subchapter about the current NLE landscape
- add a subchapter on tracks vs. trackless
- add a bit more information in the navigation subchapter.
===========================================================
Remark from Ichthyo(committer):
Converted from PDF to raw text with
pdftotext wouter250404v1.pdf WorkflowProposals.txt
Images extracted with
pdfimages -png wouter250404v1.pdf
...and then cropped in Gimp ("crop to content");
images 06 and 11 needed to be split in two parts
cropped in Gimp ("crop to content")
NOTE: Adding the images separately to the Website-repository,
in order to save storage space in the main repository,
because content, once added, can not be removed from Git....
- included the list of "Personas", quoting from Wouter's document
- add some details from the ensuing discussion
- close the document with a section "Conclusions" to list the
most relevant open points
Topic: proposals for Lumiera Workflow
Present:
- Wouter Verweijlen
- Benny Lyons
- Hermann Voßeler
Note: This commit creates a new subsection
for the discussion related to Wouter's »Lumiera Workflow Proposals«....
Starting with ''preview release'' `v0.pre.04`, branch and version tags
will be handled in accordance to the **Git-flow** naming scheme.
Notably this implies that from now on the version in-tree will indicate
the ''next expected release,'' adorned by a suffix to mark the preview.
To accommodate this transition to Git-flow
- the new branch `integration` will be introduced
- the version number will once (and the last time for this release)
be adjusted ''before'' forking the release branch
- branch `master` will transition to reflect the latest released state
- several existing branches will be discontinued, notably
`gui`, `steam`, `vault`, `release`, `play`
Starting with the upcoming ''preview release'', branches, branch names and tags
will be rearranged to follow the Git-flow pattern instead of the existing
ad-hoc organisation with a release branch.
The documentation provided here defines the actual naming conventions
and some fine points regarding the version number upgrades
and placement of release tags.
Furthermore, two helper-scripts are provided to automate version number updates
- `buildVersion.py` : extract current version from git tag and allow to bump version
- `setVersion` : manipulate all relevant files with `sed` to update the version info
This changeset removes various heuristics and marker-traits
by a constraint to tuple_like types. Furthermore, several usages
of `apply` can thereby be generalised to work on any tuple_like.
This generalisation is essential for the passing generic data blocks
via `FeedManifold` into the node invocation
This resolves an intricate problem related to metaprogramming with
variadic templates and function signatures. Due to exceptional complexity,
a direct solution was blocked for several years, and required a better
organisation of the support code involved; several workarounds were
developed, gradually leading to a transition path, which could now
be completed in an focused clean-up effort over the last week.
Metaprogramming with sequences of types is organised into three layers:
- simple tasks can be solved with the standard facilities of the language,
using pattern match with variadic template specialisations
- the ''type-sequence'' construct `Types<T...>` takes the centre stage
for the explicit definition of collections of types; it can be re-bound
to other variadic templates and supports simple direct manipulation
- for more elaborate and advanced processing tasks, a ''Loki-style type list''
can be obtained from a type-sequence, allowing to perform recursive
list processing task with a technique similar to LISP.
Indeed — this change set is kind of sad.
Because I still admire the design of the GAVL library,
and would love to use it for processing of raw video.
However, up to now, we never got to the point of actually
doing so. For the future, I am not sure if there remains
room to rely on lib-GAVL, since FFmpeg roughly covers
a similar ground (and a lot beyond that). And providing
a plug-in for FFmpeg is unavoidable, practically speaking.
So I still retain the nominal dependency on lib-GAVL
in the Build system (since it is still packaged in Debian).
But it is pointless to rely on this library just for an
external type-def `gavl_time_t`. We owe much to this
inspiration, but it can be expected that we'll wrap
these raw time-values into a dedicated marker type
soon, and we certainly won't be exposing any C-style
interface for time calculations in future, since
we do not want anyone to side-step the Lumiera
time handling framework in favour of working
„just with plain numbers“
NOTE: lib-GAVL hompage has moved to Github:
https://github.com/bplaum/gavl
Since C++17 we can use the std::filesystem instead (and we ''do use it'' indeed)
- relocate the `/lib/file.hpp` header
- adapt the self-discovery of the executable to using std::filesystem
Furthermore, some recherche regarding XVideo and Video Output
- remove obsolete configuration settings
- walk through all settings according to the documentation
https://www.doxygen.nl/manual/config.html
- now try to use the new feature to rely on Clang for C++ parsing
- walk through the doxygen-warnings.txt and fix some obvious misspellings
and structural problems in the documentation comments.
With Debian-Trixie, we are now using Doxygen 1.9.8 —
which produces massively better results in various fine points.
However, there are still problems with automatic cross links,
especially from implementation to the corresponding test classes.
Some pre C++11 features are marked deprecated and will be rejected with C++20
Notably the old marker inferfaces for unary (and binary) functions are no longer needed, since function-like objects can be detected by traits or concepts nowadays
Moreover we can get rid of some boost(bind) usages and use a λ
BuilderDoxygen and BuilderGCH are external plug-ins,
not developed in this project and probably unmaintained.
TODO: decide how to fix or replace them...
With the ability to invoke a Render Node graph,
the development on branch `play` reached some kind of milestone
regarding the »Playback Vertical Slice«.
This is a good opportunity to update the reference platform
and upgrade the preview releases and packaging setup accordingly.
This will include adjustments to compile on recent compilers and
upgrade the build system to support Python-3.
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
Remove left-overs from the preceding prototypical implementation,
which is now obliterated by the change to a flexibly configured `FeedManifold`
with structured, typed storage for buffers and for parameter data.
The Render Node invocation sequence, as rearranged and reworked for the »Playback Vertical Slice«, now seems reasonably clear and settled.
Adding extensive documentation to describe the conventions and structures worked out thus far;
moreover, start makeover of old documentation in the !TiddlyWiki to remove concepts obviously obsoleted now...
The overall goal is eventually to arrive at something akin to a ''»Dummy Media-processing Library«''
* this will offer some „Functionality“
* it will work on different ''kinds'' or ''flavours'' of data
* it should provide operations that can be packaged into ''Nodes''
However — at the moment I have no clue how to get there...
And thus I'll start out with some rather obvious basic data manipulation functions,
and then try to give them meaningful names and descriptors. This in turn
will allow to build some multi-step processing netwaorks — which actually
is the near-term goal for the ''main effort'' (which is after all, to get
the Render Node code into some sufficient state of completion)...
There is an insidious problem when the Transformer takes references to internal state
within upstream iterators or state core. This problem only manifests when
a invariant based filtering or grouping operation is added after the Transformer,
because such an operation (notably Filter) will typically attempt to establish
the invariant from the constructor (to avoid dangling state). Unfortunately
doing so involves pulling data ''before the overall pipeline is moved into final location''
A workaround is to make the Transformer ''disengage'' on copy, so to provoke
a refresh and new pull in the new location after the copy / move / swap.
This only works if the transformer function as such is idempotent.
Indeed the solution worked out yesterday could be extracted and turned generic.
Some in-depth testing is necessary though, and possibly some qualifications to allow pass-through of references...
Moreover, last days I started collecting notes regarding problem solving patterns,
which I tend to use frequently, but which might not be obvious and thus can easily
be forgotten. In fact, I had encountered several cases, where I did invent some
roughly similar solution repeatedly, having forgotten about already settled matters.
Hopefully the habit of collecting notes and hints at a central location serves to remedy
Basically I am sick of writing for-loops in those cases
where the actual iteration is based on one or several data sources,
and I just need some damn index counter. Nothing against for-loops
in general — they have their valid uses — sometimes a for-loop is KISS
But in these typical cases, an iterator-based solution would be a
one-liner, when also exploiting the structured bindings of C++17
''I must admit that I want this for a loooooong time —''
...but always got intimidated again when thinking through the fine points.
Basically it „should be dead simple“ — as they say
Well — — it ''is'' simple, after getting the nasty aspects of tuple binding
and reference data types out of the way. Yesterday, while writing those
`TestFrame` test cases (which are again an example where you want to iterate
over two word sequences simultaneously and just compare them), I noticed that
last year I learned about the `std::apply`-to-fold-expression trick, and
that this solution pattern could be adapted to construct a tuple directly,
thereby circumventing most of the problems related to ''perfect forwarding''
So now we have a new util function `mapEach` (defined in `tuple-helper.hpp`)
and I have learned how to make this application completely generic.
As a second step, I implemented a proof-of-concept in `IterZip_test`,
which indeed was not really challenging, because the `IterExplorer`
is so very sophisticated by now and handles most cases with transparent
type-driven adaptors. A lot of work went into `IterExplorer` over the years,
and this pays off now.
The solution works as follows:
* apply the `lib::explore()` constructor function to the varargs
* package the resulting `IterExplorer` instantiations into a tuple
* build a »state core« implementation which just lifts out the three
iterator primitives onto this ''product type'' (i.e. the tuple)
* wrap it in yet another `IterExplorer`
* add a transformer function on top to extract a value-tuple for each ''yield'
As expected, works out-of-the-box, with all conceivable variants and wild
mixes of iterators, const, pointers, references, you name it....
PS: I changed the rendering of unsigned types in diagnostic output
to use the short notation, e.g. `uint` instead of `unsigned int`.
This dramatically improves the legibility of verification strings.
I changed the rendering of unsigned types in diagnostic output
to use the short notation, e.g. `uint` instead of `unsigned int`.
This dramatically improves the legibility of verification strings.
Moreover, I took the opportunigy to look through the existing page
with codeing style guides to explicitly write down some conventions
formed over years of usage.
I did not just »make up« those light heartedly, rather these conventions
are the result of a craftsman's ''attentive observation and self-reflection.''
* based on reproducible data in `TestFrame`
* using Murmur64A hash-chaining to »mark« with a parameter
This emulates the simplest case of 1:1 processing and can also be applied ''in-place''
- the starting point is the idea to build a dedicated ''turnout system''
- `StateAdapter`, `BuffTable` ⟶ `FeedManifold` and _Invocation_ will be fused
- actually, the `TurnoutSystem` will be ''pulled'' and orchestrate the invocation
- the structure is assumed to be recursive
The essence of the Node-Invocation, as developed 2009 / 2011 remains intact,
yet it will be organised along a clearer structure
The behaviour seems consistent and the schedule breaks at the expected point.
At first sight, concurrency seems slightly to low; detailed investigation
however shows that this is due to the structure of the load graph,
and in fact the run time comes close to optimal values.
the `BreakingPoint` tool conducts a binary search to find the ''stress factor''
where a given schedule breaks. There are some known deviations related to the
measurement setup, which unfortunately impact the interpretation of the
''stress factor'' scale. Earlier, an attempt was made, to watch those factors
empirically and work a ''form factor'' into the ''effective stress factor''
used to guide this measurement method.
Closer investigation with extended and elastic load patters now revealed
a strong tendency of the Scheduler to scale down the work resources when not
fully loaded. This may be mistaken by the above mentioned adjustments as a sign
of a structural limiation of the possible concurrency.
Thus, as a mitigation, those adjustments are now only performed at the
beginning of the measurement series, and also only when the stress factor
is high (implying that the scheduler is actually overloaded and thus has
no incentive for scaling down).
These observations indicate that the »Breaking Point« search must be taken
with a grain of salt: Especially when the test load does ''not'' contain
a high degree of inter dependencies, it will be ''stretched elastically''
rather than outright broken. And under such circumstances, this measurement
actually gauges the Scheduler's ability to comply to an established
load and computation goal.
- use parameters known to produce a clean linear model
- assert on properties of this linear model
Add extended documentation into the !TiddlyWiki,
with a textual account of the various findings,
also including some of the images and diagrams,
rendered as SVG
Investigate the behaviour over a wider range of job loads,
job count and worker pool sizes. Seemingly the processing
can not fully utilise the available worker pool capacity.
By inspection of trace-dumps, one impeding mechanism could
be identified: the »stickiness« of the contention mitigation.
Whenever a worker encounters repeated contention, it steps up
and adds more and more wait cycles to remove pressure from the
schedule coordination. As such this is fine and prevents further
degradation of performance by repeated atomic synchronisation.
However, this throttling was kept up needlessly after further
successful work-pulls. Since job times of several milliseconds
can be expected on average in media processing, such a long
retention would spread a performance degradation over a duration
of several frames. Thus, the scheme for step-down was changed
to decrease the throttling by a power series rather than just
documenting the level.
- repeated invocations of the same test setup for statistics
- the usual nasty 64-node graph with massive fork out
- limit concurrency to 4 cores
- tabulate data to look for clues regarding a trigger criteria
Hypothesis: The Scheduler slips off schedule when all of the
following three criteria are met:
- more than 55% glitches with Δ > 2ms
- σ > 2ms
- ∅Δ > 4ms
...the idea is to use the sum of node weights per level
to create a schedule, which more closely reflects the distribution
of actual computation time. Hopefully such a schedule can then be
squeezed or stretched by a time factor to find out a ''breaking point'',
at which the Scheduler is no longer able to keep up.
The last round of refactorings yielded significant improvements
- parallelisation now works as expected
- processing progresses closer to the schedule
- run time was reduced
The processing load for this test is tuned in a way to overload the
scheduler massively at the end -- the result must be correct non the less.
There was one notable glitch with an assertion failure from the memory manager.
Hopefully I can reproduce this by pressing and overloading the Scheduler more...
* added benchmark over synchronous execution as point of reference
* verified running times and execution pattern
* Scheduler **behaves as expected** for this example
Some test-runs performed excitingly smooth,
but in one case the processing was was drastically delayed,
due to heavy contention. The relevance of this incident is not clear yet,
since this test run uses a rather atypical load with very short actual work jobs.
Anyway, the dump-logs are documented with this commit.
At various places, concepts and drafts from the early stage of the
Lumiera Project are still reflected in the online documentation pages.
During the last months, development focussed on the Render Engine,
causing a shift in some parts of the design, and obsoleting other
parts altogether (notably we consider to use IO_URING for async IO)
As follow-up to the rework of thread-handling, likewise also
the implementation base for locking was switched over from direct
usage of POSIX primitives to the portable wrappers available in
the C++ standard library. All usages have been reviewed and
modernised to prefer λ-functions where possible.
With this series of changes, the old threadpool implementation
and a lot of further low-level support facilities are not used
any more and can be dismantled. Due to the integration efforts
spurred by the »Playback Vertical Slice«, several questions of
architecture could be decided over the last months. The design
of the Scheduler and Engine turned out different than previously
anticipated; notably the Scheduler now covers a wider array of
functionality, including some asynchronous messaging. This has
ramifications for the organisation of work tasks and threads,
and leads to a more deterministic memory management. Resource
management will be done on a higher level, partially superseding
some of the concepts from the early phase of the Lumiera project.