- relocate some code into a dedicated translation unit to reduce #includes
- actually set the thread-ID (the old implementation had only a TODO at that point)
While it would be straight forward from an implementation POV
to just expose both variants on the API (as the C++ standard does),
it seems prudent to enforce the distinction, and to highlight the
auto-detaching behaviour as the preferred standard case.
Creating worker threads just for one computation and joining the results
seemed like a good idea 30 years ago; today we prefer Futures or asynchronous
messaging to achieve similar results in a robust and performant way.
ThreadJoinable can come in handy however for writing unit tests, were
the controlling master thread has to wait prior to perform verification.
So the old design seems well advised in this respect and will be retained
- cut the ties to the old POSIX-based custom threadpool framework
- remove operations deemed no longer necessary
- sync() obsoleted by the new SyncBarrier
- support anything std::invoke supports
...which is the technique used in the existing Threadpool framwork.
As expected, such a solution is significantly slower than the new
atomics-based implementation. Yet how much slower is still striking.
Timing measurements in concurrent usage situation.
Observed delay is in the order of magnitude of known scheduling leeway;
assuming thus no relevant overhead related to implementation technique
Over time, a collection of microbenchmark helper functions was
extracted from occasional use -- including a variant to perform
parallelised microbenchmarks. While not used beyond sporadic experiments yet,
this framework seems a perfect fit for measuring the SyncBarrier performance.
There is only one catch:
- it uses the old Threadpool + POSIX thread support
- these require the Threadpool service to be started...
- which in turn prohibits using them for libary tests
And last but not least: this setup already requires a barrier.
==> switch the existing microbenchmark setup to c++17 threads preliminarily
(until the thread-wrapper has been reworked).
==> also introduce the new SyncBarrier here immediately
==> use this as a validation test of the setup + SyncBarrier
Using the same building blocks, this operation can be generalised even more,
leading to a much cleaner implementation (also with better type deduction).
The feature actually used here, namely summing up all values,
can then be provided as a convenience shortcut, filling in std::plus
as a default reduction operator.
...first used as part of the test harness;
seemingly this is a generic and generally useful shortcut,
similar to algorithm::reduce (or some kind of fold-left operation)
Intended as replacement for the Mutex/ConditionVar based barrier
built into the exiting Lumiera thread handling framework and used
to ensure safe hand-over of a bound functor into the starting new
thread. The standard requires a comparable guarantee for the C++17
concurrency framework, expressed as a "synchronizes_with" assertion
along the lines of the Atomics framework.
While in most cases dedicated synchronisation is thus not required
anymore when swtiching to C++17, some special extended use cases
remain to be addressed, where the complete initialisation of
further support framework must be ensured.
With C++20 this would be easy to achieve with a std::latch, so we
need a simple workaround for the time being. After consideration of
the typical use case, I am aiming at a middle ground in terms of
performance, by using a yield-wait until satisfying the latch condition.
The investigation for #1279 leads to the following conclusions
- the features and the design of our custom thread-wrapper
almost entirely matches the design chosen meanwhile by the C++ committee
- the implementation provided by the standard library however uses
modern techniques (especially Atomics) and is more precisely worked out
than our custom implementation was.
- we do not need an *active* threadpool with work-assignment,
rather we'll use *active* workers and a *passive* pool,
which was easy to implement based on C++17 features
==> decision to drop our POSIX based custom implementation
and to retrofit the Thread-wrapper as a drop-in replacement
+++ start this refactoring by moving code into the Library
+++ create a copy of the Threadwrapper-code to build and test
the refactorings while the application itself still uses
existing code, until the transition is complete
...which however brings the problem that we can no longer block the destructor
of WorkForce by simply joining on all joinable threads (there is a race
between testing joinable() and invoking join(), which does not tolerate
non-joinable state.
There is a second problem: we need to detect and clean-up terminated workers,
even for just finding out how many workers are still active. Fortunately
doing so also solves the waiting problem in the destructor
While in principle it would be possible (and desirable)
to control worker behaviour exclusively through the Work-Functor's return code,
in practice we must concede that Exceptions can always happen from situations
beyond our control. And while it is necessary for the WorkForce-dtor to
join and block (we can not just pull away the resources from running threads),
the same destructor (when called out of order) must somehow be able
at least to ask the running threads to terminate.
Especially for unit tests this becomes an obnoxious problem -- otherwise
each test failure would cause the test runner to hang.
Thus adding an emergency halt, and also improve setup for tests
with a convenience function to inject a work-function-λ
- investigate consistency guarantees through acquire-release
==> turns out we do not need a fence, but it is tantamount
to have a guard variable and actually load and check
the value to ensure we indeed get a happens-before
- elaborate design of the WorkForce
+ no shared control variables necessary
+ no ability to forcibly shut-down the WorkForce
+ rather, all control will be exerted through the return value
of the Work-Functor
No new functionality, and implementation works as expected.
This test case covers an especially tricky setup, where a calculation
shall be triggered from an external event, while ensuring that the actual
processing can start only after also the regular time-bound scheduling
has taken place (this might be used to prevent an unexpectedly early
external signal to cause writing into an output buffer before the
defined window of data delivery)
...based on the new ability in the ActivityDetector, we can now assign
a custom λ, which deflects back the ctx.post() call into the ActivityLang
instance used for this test case.
While the previously seen behaviour was correct, it was not the call sequence
expected in the real implementation; with this change, on the main-chain
activation the post() now immediately dispatches the notification, which in turn
dispatches the rest of the chain, so that the JobFunctor is indeed
called in this second test case as expected
Up to now, the DiagnosticFun mock in ActivityDetector only
created an EventLog entry on invocation and was able to retunr
a canned result value. Yet for the job invocation scenario test,
it would be desirable to hook-in a λ with a fake implementation
into the ExecutionContext. As a further convenience, the
return value is now default initialised, instead of being
marked as uninitialised until invocation of "returning(val)"
...seems to work, but not really happy with the test setup,
since in real usage the post()-calls would dispatch, while here,
using the ActivityDetector, these calls just log invoation,
and thus the activation is not passed on
...regarding the kind of activity (the verb),
and also for some special case access of payload data;
deliberately asserting the correct verb, but no mandatory check,
since this whole Activity-Language is conceived as cohesive
and essentially sealed (not meant to be extended)
...to show in test that indeed the actual time is retrieved on each activation,
we can assign a λ -- which is rigged to increase the time on each access
It is not sufficient just to pass this "current time" as parameter
into the ActivityLang::dispatchChain(), since some Activities within
this chain will essentially be long-running (think rendering); thus
we need a real callback from within the chain. The obvious solution
is to make this part of the Execution Context, which is an abstraction
of the scheduler environment anyway
...turns out there is still a lot of leeway in the possible implementation,
and seemingly it is too early to decide which case to consider the default.
Thus I'll proceed with the drafted preliminary solution...
- on primary-chain, an inhibited Gate dispatches itself into future for re-check
- on Notification, activation happens if and only if this very notification opens the Gate
- provide a specifically wired requireDirectActivation() to allow enforcing a minimal start time
...assembled from parts already implemented
TODO
- need a way to access the »current scheduler time«
- need builder extension points to connect notifications
...this completes the basic setup
- Term builder mechanism working properly
- Memory allocator behaves sane
- the simple default wiring allows to invoke a Job
While the ''general direction'' seems clear, some in-depth
analysis was required to find out what information can reasonably
be expected to be available at this point.
The decision was made to shift the actual deadline calculation
into the Job-Planning altogether, assuming that a preliminary solution
based on data implicitly available there will be enough to implement
simple linear playback, while precise management of job start times
can be added in later, when observation of actual timing behaviour
is available...
Solved by special treatment of a notification, which happens
to decrement the latch to zero: in this case, the chain is
dispatched, but also the Gate is locked permanently to block
any further activations scheduled or forwareded otherwise