* most usages are drop-in replacements
* occasionally the other convenience functions can be used
* verify call-paths from core code to identify usages
* ensure reseeding for all tests involving some kind of randomness...
__Note__: some tests were not yet converted,
since their usage of randomness is actually not thread-safe.
This problem existed previously, since also `rand()` is not thread safe,
albeit in most cases it is possible to ignore this problem, as
''garbled internal state'' is also somehow „random“
Elaborate the draft to include all the elements used directly in the test case thus far;
the goal is to introduce some structuring and leave room for flexible confguration,
while implementing the actual binary search as library function over Lambdas.
My expectation is to write a series of individual test instances with varying parameters;
while it seems possible to add further performance test variations into that scheme later on.
- now there can not be any direct dispatch anymore when entering events
- thus there is no decision logic at entrance anymore
- rather the work-function implementation moved down into Layer-2
- so add a unit-test like coverage there (integration in SchedulerService_test)
This amounts to a rather massive refactoring, prompted by the enduring problems
observed when pressing the scheduler. All the various glitches and (fixed) crashes
are related to the way how planning-jobs enter the schedule items,
which is also closely tied to the difficulties getting the locking
for planning-jobs correct.
The solution pursued hereby is to reorder the main avenues into the
scheduler implementation. There is now a streamlined main entrance,
which **always** enqueues only, allowing to omit most checks and
coordination. On the other hand, the complete coordination and dispatch
of the work capacity is now shifted down into the SchedulerCommutator,
thereby linking all coordination and access control close together
into a single implementation facility.
If this works out as intended
- several repeated checks on the Grooming-Token could be omitted (performance)
- the planning-job would no longer be able to loose / drop the Token,
thereby running enforcedly single-threaded (as was the original intention)
- since all planning effectively originates from planning-jobs, this
would allow to omit many safety barriers and complexities at the
scheduler entrance avenue, since now all entries just go into the queue.
WIP: tests pass compiler, but must be adapted / reworked
The last round of refactorings yielded significant improvements
- parallelisation now works as expected
- processing progresses closer to the schedule
- run time was reduced
The processing load for this test is tuned in a way to overload the
scheduler massively at the end -- the result must be correct non the less.
There was one notable glitch with an assertion failure from the memory manager.
Hopefully I can reproduce this by pressing and overloading the Scheduler more...
The rework from yesterday turned out to be effective ... unfortunately
a bit to much: since now late follow-up notifications take precedence,
a single worker tends to process the complete chain depth-first, because
the first chain will be followed and processed, even before the worker
was able to post the tasks for the other branches. Thus this single
worker is the only one to get a chance to proceed.
After some consideration, I am now leaning towards a fundamental change,
instead of just fixing some unfavourable behaviour pattern: while the
language semantics remains the same, the scheduler should no longer
directly dispatch into the next chain **from λ-post**. That is, whenever
a POST / NOTIFY is issued from the Activity-chain, the scheduler goes
through prioritisation.
This has further ramifications: we do not need a self-inhibition mechanism
any more (since now NOTIFY picks up the schedule time of the target).
With these changes, processing seems to proceed more smoothly,
albeit still with lots of contention on the Grooming token,
at least in the example structure tested here.
use a feature of the Activity-Language prepared for this purpose:
self-Inhibition of the Chain. This prevents a prerequisite-NOTIFY
to trigger a complete chain of available tasks, before these tasks
have actually reached their nominal scheduling time.
This has the effect to align the computations much more strictly
with the defined schedule
The main (test) thread is kept in a blocking wait until the
planned schedule is completed. If however the schedule overruns,
the wake-up job could just be triggered prematurely.
This can easily be prevented by adding a dependency from the last
computation job to the wake-up job. If the computation somehow
flounders, the SAFETY_TIMEOUT (5s) will eventually raise
an exception to let the test fail cleanly (shutting down
the Scheduler automatically)
...this is an interesting test failure, which highlights inconsistencies
with handling of deadlines when processing follow-up from NOTIFY-triggers
There was also some fuzziness related to the ''meaning'' of λ-post,
leading to at least one superfluous POST invocation for each propagation;
fixing this does not solve the problem yet removes unnecessary overhead
and lock-contention
...playing around with the graph for the Scheduler integration test
...single threaded run time seemed to behave irregular
...but in fact it is very close to what can be expected
based on an ''averaged node weight''
Fortunately its very simple to add that into the existing node statistics
Basically this is all done and settled already: this is the `usageExample()`
from `TestChainLoadTest`. However, the focus is slightly different here:
We want a demonstration that the Scheduler can work flawlessly through
a massive load. Thus the plan is to use much more challenging parameters,
and then lean back and watch what happens....
...which turns out to be due to the DUMP-Statements,
which seem to create quite some contention on their own.
Test cases with very tight schedule will slip away then;
without print statement everything is GREEN now
...which can be deliberately attached (or not attached) to the
individual node invocation functor, allowing to study the effect
of actual load vs. zero-load and worker contention
The first complete integration test with Chain-Load
highlighted some difficulties with the overall load regulation:
- it works well in the standard case (but is possibly to eager to scale up)
- the scale-up sometimes needs several cycles to get "off the ground"
- when the first job is dispatched immediately instead of going
through the queue, the scheduler fails to boot up
The test case "scheduleRenderJob()" -- while deliberately operated
quite artificially with a disabled WorkForce (so the test can check
the contents in the queue and then progress manually -- led to discovery
of an open gap in the logic: in the (rare) case that a new task is
added ''from the outside'' without acquiring the Grooming-Token, then
the new task could sit in the entrace queue, in worst case for 50ms,
until the next Scheduler-»Tick« routinely sweeps this queue. Under
normal conditions however, each dispatch of another activity will
also sweep the entrance queue, yet if there happens to be no other
task right now, a new task could be stuck.
Thinking through this problem also helped to amend some aspects
of Grooming-Token handling and clarified the role of the API-functions.
...especially to prevent a deadline way too far into the future,
since this would provoke the BlockFlow (epoch based) memory manager
to run out of space.
Just based on gut feeling, I am now imposing a limit of 20seconds,
which, given current parametrisation, with a minimum spacing of 6.6ms
and 500 Activities per Block would at maximum require 360 MiB for
the Activities, or 3000 Blocks. With *that much* blocks, the
linear search would degrade horribly anyway...
WorkForce scales down automatically after 2 seconds when
workers fall idle; thus we need to step up automatically
with each new task.
Later we'll also add some capacity management to both the
LoadController and the Job-Planning, but for now this rather
crude approach should suffice.
NOTE: most of the cases in SchedulerService_test verify parts
of the component integration and thus need to bypass this
automatism, because the test code wants to invoke the
work-Function directly (without any interference
from running workers)
While building increasingly complex integration tests for the Scheduler,
it turns out helpful to be able to manipulate the "full concurreency"
as used by Scheduler, WorkForce and LoadController.
In the current test, I am facing a problem that new entries from the
threadsafe entrance queue are not propagated to the priority queue
soon enough; partly this is due to functionality still to be added
(scaling up when new tasks are passed in) -- but this will further
complicate the test setup.
The invocation structure is effectively determined by the
Activity-chain builder from the Activity-Language; but, taking
into account the complexity of the Scheduler code developed thus far,
it seems prudent to encapsulate the topic of "Activities" altogether
and expose only a convenience builder-API towards the Job-Planning
The problem with passing the deadline was just a blatant symptom
that something with the overall design was not quite right, leading
to mix-up of interfaces and implementation functions, and more and more
detail parameters spreading throughout the call chains.
The turning point was to realise the two conceptual levels
crossing and interconnected within the »Scheduler-Service«
- the Activity-Language describes the patterns of processing
- the Scheduler components handle time-bound events
So by turning the (previously private) queue entry into an
ActivationEvent, the design could be balanced.
This record becomes the common agens within the Scheduler,
and builds upon / layers on top of the common agens of the
Language, which is the Activity record.
This is the first kind of integration,
albeit still with a synthetic load.
- placed two excessive load peaks in the scheduling timeline
- verified load behaviour
- verified timings
- verified that the scheduler shuts down automatically when done
- sample distance to scheduler head whenever a worker asks for work
- moving average with N = worker-pool size and damp-factor 2
- multiply with the current concurrency fraction
as an aside, the header lib/test/microbenchmark.hpp
turns out to be prolific for this kind of investigation.
However, it is somewhat obnoxious that the »test subject«
must expose the signature <size_t(size_t)>.
Thus, with some metaprogramming magic, an generic adaptor
can be built to accept a range of typical alternatives,
and even the quite obvious signature void(void).
Since all these will be wrapped directly into a lambda,
the optimiser will remove these adaptations altogether.
- An important step towards a complete »Scheduler Service«
- Correct timing pattern could be verified in detail by tracing
- Spurred some further concept and design work regarding Load-control
- draft the duty cycle »tick«
- investigate corner cases of state updates and allocation managment
- implement start and forcible stop of the scheduler service
Obviously the better choice and a perfect fit for our requirements;
while the system-clock may jump and even move backwards on time service
adjustments, the steady clock just counts the ticks since last boot.
In libStdC++ both are implemented as int64_t and use nanoseconds resolution
- Ensure the grooming-token (lock) is reliably dropped
- also explicitly drop it prior to trageted sleeps
- properly signal when not able to acquire the token before dispatch
- amend tests broken by changes since yesterday
Notably the work-function is now completely covered, by adding
this last test, and the detailed investigations yesterday
ultimately unveiled nothing of concern; the times sum up.
Further reflection regarding the overall concept led me
to a surprising solution for the problem with priority classes.
...especially for the case »outgoing to sleep«
- reorganise switch-case to avoid falling through
- properly handle the tendedNext() predicate also in boundrary cases
- structure the decision logic clearer
- cover the new behaviour in test
Remark: when the queue falls empty, the scheduler now sends each
worker once into a targted re-shuffling delay, to ensure the
sleep-cycles are statistically evenly spaced
...there seemed to be an anomaly of 50...100µs
==> conclusion: this is due to the instrumentation code
- it largely caused by the EventLog, which was never meant
to be used in performance-critical code, and does hefty
heap allocations and string processing.
- moreover, there clearly is a cache-effect, adding a Factor 2
whenever some time passed since the last EventLog call
==> can be considered just an artifact of the test setup and
will have no impact on the scheduler
remark: this commit adds a lot of instrumentation code
To cover the visible behaviour of the work-Function,
we have to check an amalgam of timing delays and time differences.
This kind of test tends to be problematic, since timings are always
random and also machine dependent, and thus we need to produce pronounced effects
- organise by principles rather than implementing a mechanism
- keep the first version simple yet flexible
- conduct empiric research under synthetic load
Basic scheme:
- tend for next
- classify free capacity
- scattered targeted wait
The »Scheduler Service« will be assembled
from the components developed during the last months
- Layer-1
- Layer-2
- Activity-Language
- Block-Flow
- Work-Force