- repeated invocations of the same test setup for statistics - the usual nasty 64-node graph with massive fork out - limit concurrency to 4 cores - tabulate data to look for clues regarding a trigger criteria Hypothesis: The Scheduler slips off schedule when all of the following three criteria are met: - more than 55% glitches with Δ > 2ms - σ > 2ms - ∅Δ > 4ms
127 lines
6.1 KiB
Text
127 lines
6.1 KiB
Text
Scheduler Integration Testing
|
||
-----------------------------
|
||
:date: 2023-12-09
|
||
|
||
After successfully completing the unit tests for the newly built scheduler,
|
||
the next goal was to get some complete integration test runs. Moreover,
|
||
in preparation for Load Testing, the `TestChainLoad` was developed,
|
||
which is a generator for artificial yet reproducivle test calculations.
|
||
This component generates a graph of interconnected nodes, where each
|
||
node will be invoked as an individual render job. However, nodes can
|
||
have predecessors and successors; in order to calculate a node's hash value,
|
||
the hash values of all predecessors must have been calculated.
|
||
|
||
In order to see what is happening in the scheduler and the job functors
|
||
generated by `TestChainLoad::setupSchedule()`, some print statements were
|
||
added temporarily. Together, these generate a trace to indicate all planning,
|
||
scheduling and dispatching events, together with an abbreviated thread-ID
|
||
and a time relative to the first scheduler time encountered during this
|
||
test run. There is a changeset in the Git history which _removes_ all
|
||
these print statements -- and by reverting this git commit, it should be
|
||
possible to re-introduce this simplistic instrumentation.
|
||
|
||
This directory documents some Dump-Logs and detail findings from the
|
||
Debugging- and Investigation-Sessions done at that time.
|
||
|
||
.TestChainLoad_test::usageExample
|
||
Generally speaking, this test presents a slightly unusal load.
|
||
It is comprised of peaks -- one for _each level of the graph_ -- each
|
||
requiring typically 5 jobs to be calculated, all with the same start time.
|
||
The graph is composed of small join-trees, 2-3 nodes deep, which are started
|
||
interleaved. In the version used here, the actual calculations produce no
|
||
tangible CPU load and thus the job processing is absolutely dominated by
|
||
the scheduler management overhead. Several workers, pulling concurrently,
|
||
may create a slowdown due to contention on the »Grooming-Token«
|
||
|
||
SchedulerTestGraph.pdf::
|
||
Visualisation of the computation structure used for these tests
|
||
|
||
Dump-01::
|
||
example of a nice and clean run
|
||
|
||
Dump-02::
|
||
example of a run with insufficient planning pre-roll;
|
||
while some nodes miss their prerequisite calculations,
|
||
overall the Scheduler copes quite well.
|
||
+
|
||
- only two workers show up after the schedule of the first tick
|
||
- these manage to work through the initial backlog (...5000) in 900µs
|
||
- then the tick runs, and then the backlog is done after 1500µs
|
||
- now scheduling is ahead of schedule, causing mostly a single worker
|
||
to do the complete work.
|
||
- occasionally a second worker shows up, but the contention does not
|
||
cause much (if any) delay
|
||
- typically one _level_ is comprised of 5 jobs with the same start time,
|
||
and these are completed after ~300µs
|
||
|
||
Dump-03::
|
||
A test run with drastic failure
|
||
+
|
||
- seemingly, a larger fraction of the work capacity shows up simulatnously,
|
||
roughly at t ≈ 5000µs
|
||
- likewise by accident, at that point the planning is already mostly complete
|
||
- massive contention can be observed between the workers attempting to work
|
||
through the backlog, but (probably due to this contention) are never able
|
||
to catch up.
|
||
- final wake-up job is scheduled for t = 17000µs
|
||
- it is actually invoked at ≈ 27000µs (~10ms late), wakes up the main (test)
|
||
thread, which then aborts, since the final exit hash is still 0.
|
||
- notably there are very long stretches where a single thread repeatedly
|
||
attempts to acquire the GroomingToken, yet fails, while no other progress
|
||
can be seen in the log
|
||
- moreover the "now" times reported for various threads differ drastically,
|
||
and there is quite some contradiction to the order in which the dump strings
|
||
appear in the log. This _may hint at additional contention generated by the
|
||
dump output itself...._
|
||
|
||
Dump-04::
|
||
Example of a successful run with a computational load of 100µs per step.
|
||
This example was obtained form running `TestChainLoad_test::usageExample()`
|
||
with additional print-statements inserted into the code.
|
||
|
||
Dump-05::
|
||
Example of processing a more complicated and hightly interdependent graph.
|
||
This is a capture of SchedulerService_test::processSchedule(),
|
||
using a fixed level-spacing of 1ms, but a computational base-weight of 500µs.
|
||
+
|
||
- When looking at the graph, it is clear that this schedule is unfavourable,
|
||
dictating excess wait times at the beginning and overloading the Scheduler
|
||
(concurrency=8) towards the end.
|
||
- Yet in spite of these difficult conditions, parallelisation happens mostly
|
||
at those parts of the graph where it is possible
|
||
- Note moreover, that the actual computation time spent in those nodes with
|
||
a computational weight (suffix `.1` ... `.3`) is mostly in excess of the
|
||
configured times. For example, a node with weight 3 (≙ 3·500µs) often needs
|
||
2-3ms to complete. This is probably due to cache effects, since the weight
|
||
calibration is performed in a hot microbenchmark loop, while here, in a
|
||
more realistic setting, the cpu-load code is invoked intermingled with
|
||
management code, atomic exchanges, sleep-waits and OS scheduler activity.
|
||
|
||
Dump-06::
|
||
The same test setup, but after rework of the Scheduler entrance and
|
||
with better schedule focus and a contention mitigation scheme (28.12.23)
|
||
+
|
||
- refTime(singleThr): 31.8014ms
|
||
- runTime(Scheduler): 42.0706ms
|
||
|
||
|
||
Time-01::
|
||
Search for the »breaking Point« where the computation slips off schedule.
|
||
+
|
||
- using a load graph with 64 nodes in a single chain
|
||
- this chain progresses for the first half with no or low concurrency
|
||
- after node-23 the graph forks out massively up to 16 nodes in parallel
|
||
- concurrency is set to 4 and the worker pool limited accordingly
|
||
+
|
||
The overall density of the schedule (»stress factor«) is increased in small steps,
|
||
repeating each series 30 times and computing averages. The Scheduler is able to
|
||
follow initially, with very low standard derivation and low average delta to
|
||
the scheduled overall time. When approaching stress factor = 0.5, the number
|
||
of _glitches_ (with Δ > 2ms) increases, while the average delta and the
|
||
standard derivation exhibit a steep rise. This leads to the following
|
||
trigger criteria
|
||
+
|
||
- more than 55% glitches with Δ > 2ms
|
||
- σ > 2ms
|
||
- ∅Δ > 4ms
|
||
|