Scheduler Integration Testing
-----------------------------
:date: 2023-12-09

After successfully completing the unit tests for the newly built scheduler,
the next goal was to get some complete integration test runs. Moreover,
in preparation for Load Testing, the `TestChainLoad` was developed,
which is a generator for artificial yet reproducible test calculations.
This component generates a graph of interconnected nodes, where each
node will be invoked as an individual render job. However, nodes can
have predecessors and successors; in order to calculate a node's hash value,
the hash values of all predecessors must have been calculated.
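A rough sketch of this dependency rule (using invented names, not the actual
`TestChainLoad` implementation) could look like this:

[source,cpp]
----
#include <cstddef>
#include <vector>
#include <functional>

// Hypothetical sketch: a node's hash is folded from the hashes of all its
// predecessors, so the corresponding render job can only run after every
// predecessor job has completed.
struct Node
{
    std::vector<Node const*> pred;   // predecessor nodes in the generated graph
    size_t hash = 0;                 // 0 ≙ not yet calculated

    void calculate()
    {
        size_t h = 1;
        for (Node const* p : pred)                  // p->hash must already be final
            h = std::hash<size_t>{} (h ^ p->hash);  // fold in each predecessor
        hash = h;
    }
};
----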
In order to see what is happening in the scheduler and the job functors
generated by `TestChainLoad::setupSchedule()`, some print statements were
added temporarily. These generate a trace of all planning,
scheduling and dispatching events, each tagged with an abbreviated thread-ID
and a time relative to the first scheduler time encountered during this
test run. There is a changeset in the Git history which _removes_ all
these print statements -- and by reverting this git commit, it should be
possible to re-introduce this simplistic instrumentation.
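A minimal sketch of such an instrumentation helper, assuming an invented name
`TRACE_EVT()` rather than the code that was actually removed, could look like this:

[source,cpp]
----
#include <atomic>
#include <chrono>
#include <cstdio>
#include <functional>
#include <thread>

// Hypothetical sketch: print an event tag together with an abbreviated
// thread-ID and a time in µs relative to the first event seen in this run.
inline void TRACE_EVT (char const* tag)
{
    using Clock = std::chrono::steady_clock;
    static std::atomic<Clock::rep> start{0};

    Clock::rep now = Clock::now().time_since_epoch().count();
    Clock::rep expect = 0;
    start.compare_exchange_strong (expect, now);            // latch the first timestamp

    auto relTime = std::chrono::duration_cast<std::chrono::microseconds> (
                       Clock::duration{now - start.load()}).count();
    size_t thrID = std::hash<std::thread::id>{} (std::this_thread::get_id()) % 1000;

    std::printf ("%3zu | %6lld µs : %s\n", thrID, (long long)relTime, tag);
}
----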
This directory documents some Dump-Logs and detailed findings from the
Debugging- and Investigation-Sessions done at that time.

.TestChainLoad_test::usageExample
Generally speaking, this test presents a slightly unusual load.
It is comprised of peaks -- one for _each level of the graph_ -- each
requiring typically 5 jobs to be calculated, all with the same start time.
The graph is composed of small join-trees, 2-3 nodes deep, which are started
interleaved. In the version used here, the actual calculations produce no
tangible CPU load and thus the job processing is absolutely dominated by
the scheduler management overhead. Several workers, pulling concurrently,
may create a slowdown due to contention on the »Grooming-Token«.
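The following sketch shows the general shape of such a token hand-over, using
invented names and a plain atomic flag rather than the actual Scheduler code:

[source,cpp]
----
#include <atomic>
#include <thread>

// Hypothetical sketch of »Grooming-Token« style mutual exclusion:
// only the worker currently holding the token may groom / dispatch from
// the scheduler queue; every other worker has to back off and retry.
class GroomingToken
{
    std::atomic<std::thread::id> holder_{};

public:
    bool tryAcquire()
    {
        std::thread::id expect{};                     // "no current owner"
        return holder_.compare_exchange_strong (expect, std::this_thread::get_id(),
                                                std::memory_order_acquire);
    }
    void release()
    {
        holder_.store (std::thread::id{}, std::memory_order_release);
    }
};

// Workers failing to acquire the token just yield and retry; with several
// workers pulling concurrently, this retry traffic is the observed slowdown.
inline void groomOrYield (GroomingToken& token)
{
    if (token.tryAcquire())
    {
        /* ...pull and dispatch the next due job... */
        token.release();
    }
    else
        std::this_thread::yield();
}
----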
SchedulerTestGraph.pdf::
Visualisation of the computation structure used for these tests

Dump-01::
example of a nice and clean run

Dump-02::
example of a run with insufficient planning pre-roll;
while some nodes miss their prerequisite calculations,
overall the Scheduler copes quite well.
+
- only two workers show up after the schedule of the first tick
- these manage to work through the initial backlog (...5000) in 900µs
- then the tick runs, and then the backlog is done after 1500µs
- now processing is ahead of the schedule, causing mostly a single worker
  to do the complete work.
- occasionally a second worker shows up, but the contention does not
  cause much (if any) delay
- typically one _level_ is comprised of 5 jobs with the same start time,
  and these are completed after ~300µs

Dump-03::
A test run with drastic failure
+
- seemingly, a larger fraction of the work capacity shows up simultaneously,
  roughly at t ≈ 5000µs
- likewise by accident, at that point the planning is already mostly complete
- massive contention can be observed between the workers attempting to work
  through the backlog, but (probably due to this contention) they are never able
  to catch up.
- final wake-up job is scheduled for t = 17000µs
- it is actually invoked at ≈ 27000µs (~10ms late), wakes up the main (test)
  thread, which then aborts, since the final exit hash is still 0.
- notably there are very long stretches where a single thread repeatedly
  attempts to acquire the GroomingToken, yet fails, while no other progress
  can be seen in the log
- moreover the "now" times reported for various threads differ drastically,
  and partly contradict the order in which the dump strings
  appear in the log. This _may hint at additional contention generated by the
  dump output itself...._

Dump-04::
Example of a successful run with a computational load of 100µs per step.
This example was obtained from running `TestChainLoad_test::usageExample()`
with additional print-statements inserted into the code.

Dump-05::
Example of processing a more complicated and highly interdependent graph.
This is a capture of `SchedulerService_test::processSchedule()`,
using a fixed level-spacing of 1ms, but a computational base-weight of 500µs.
+
- When looking at the graph, it is clear that this schedule is unfavourable,
  dictating excess wait times at the beginning and overloading the Scheduler
  (concurrency=8) towards the end.
- Yet in spite of these difficult conditions, parallelisation happens mostly
  at those parts of the graph where it is possible.
- Note moreover that the actual computation time spent in those nodes with
  a computational weight (suffix `.1` ... `.3`) is mostly in excess of the
  configured times. For example, a node with weight 3 (≙ 3·500µs) often needs
  2-3ms to complete. This is probably due to cache effects, since the weight
  calibration is performed in a hot microbenchmark loop, while here, in a
  more realistic setting, the CPU-load code is invoked intermingled with
  management code, atomic exchanges, sleep-waits and OS scheduler activity.
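+
A sketch of the kind of calibrated busy-loop implied here, using invented names
and a placeholder calibration constant (not the actual load generation code):
+
[source,cpp]
----
#include <cstddef>

// Hypothetical sketch: CYCLES_PER_UNIT would be determined by a hot
// micro-benchmark loop, so that one weight unit burns roughly 500µs of pure
// computation. Invoked in between real scheduler management code, the same
// loop easily takes longer, due to cold caches and other interference.
const size_t CYCLES_PER_UNIT = 1'000'000;       // placeholder from calibration

inline volatile size_t cpuLoadSink = 0;         // defeats optimising the loop away

inline void burnCpu (unsigned weight)
{
    size_t acc = cpuLoadSink;
    for (size_t i = 0; i < weight * CYCLES_PER_UNIT; ++i)
        acc = acc * 48271u % 2147483647u;       // simple arithmetic work
    cpuLoadSink = acc;                          // publish, so the work is not elided
}
----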