...which can be deliberately attached (or not attached) to the
individual node invocation functor, allowing to study the effect
of actual load vs. zero-load and worker contention
Within Chain-Load, the infrastructure to add this crucial feature
is minimal: each node gets a `weight` parameter, which is assigned
using another RandomDraw-Rule (by default `weight==0`).
The actual computation load will be developed as a separate component
and tied in from the node calculation job functor.
...during development of the Chain-Load, it became clear that we'll often
need a collection of small trees rather than one huge graph. Thus a rule
for pruning nodes and finishing graphs was added. This has the consequence
that there might now be several exit nodes scattered all over the graph;
we still want one single global hash value to verify computations,
thus those exit hashes must now be picked up from the nodes and
combined into a single value.
All existing hash values hard coded into tests must be updated
...with this change, processing is ''ahead of schedule'' from the beginning,
which has the nice side effect that the problematic contention situation
with these very short computation jobs can not arise, and most of the schedule
is processed by a single worker.
Processing pattern is now pretty much as expected
This is a trick to get much better scheduling and timing guesses.
Instead of targeting a specific level, rather a fixed number of nodes
is processed in each chunk, yet still always processing complete levels.
The final level number to expect can be retrieved from the chain-load graph.
With this refactoring, we can now schedule a wake-up job precisely
after the expected completion of the last level
Scheduling a wake-up job behind the end of the planned schedule did the trick.
Sometimes there is ''strong contention'' immediately after full provision of the WorkForce,
but this seems to be as expected, since the »Jobs« currently used have no
actually relevant run time on their own. It is even more surprising that
the Capacity-control logic is able to cope with this situation in a matter
of just some milliseconds, bringing the average Lag at ~ 300µs
two rather obvious bugfixes
(well, after watching the Scheduler in action...)
- the first planning-chunk needs an offset
- the future to block on must be setup before any dispatch happens
- prime diagnostics with the first time invocation
- print timings relative to this first invocation
- DUMP output to watch the crucial scheduling operations
... so this (finally) is the missing cornerstone
... traverse the calculation graph and generate render jobs
... provide a chunk-wise pre-planning of the next batch
... use a future to block the (test) thread until completed
- test setup without actual scheduler
- wire the callbacks such to verify
+ all nodes are touched
+ levels are processed to completion
+ the planning chunk stops at the expected level
+ all node dependencies are properly reported through the callbacks
- decided to abstract the scheduler invocations as λ
- so this functor contains the bare loop logic
Investigation regarding hash-framework:
It turns out that boost::hash uses a different hash_combine,
than what we have extracted/duplicated in lib/hash-value.hpp
(either this was a mistake, or boost::hash did use this weaker
function at that time and supplied a dedicated 64bit implementation later)
Anyway, should use boost::hash for the time being
maybe also fix the duplicated impl in lib/hash-value.hpp
- use a ''special encoding'' to marshal the specific coordinates for this test setup
- use a fixed Frame-Grid to represent the ''time level''
- invoke hash calculation through a specialised JobFunctor subclass
The number of nodes was just defined as template argument
to get a cheap implementation through std::array...
But actually this number of nodes is ''not a characteristics of the type;''
we'd end up with a distinct JobFunctor type for each different test size,
which is plain nonsensical. Usage analysis reveals, now that the implementation
is ''basically complete,'' that all of the topology generation and statistic
calculation code does not integrate deeply with the node storage, but
rather just iterates over all nodes and uses the ''first'' and ''last'' node.
This can actually be achieved very easy with a heap-allocated plain array,
relying on the magic of lib::IterExplorer for all iteration and transformation.
- use a dedicated context "dropped off" the TestChainLoad instance
- encode the node-idx into the InvocationInstanceID
- build an invocation- and a planning-job-functor
- let planning progress over an lib::UninitialisedStorage array
- plant the ActivityTerm instances into that array as Scheduling progresses
Since Chain-Load shall be used for performance testing of the scheduler,
we need a catalogue of realistic load patterns. This extended effort
started with some parameter configurations and developed various graph
shapes with different degree of connectivity and concurrency, ranging
from a stable sequence of very short chains to large and excessively
interconnected dependency networks.
Through introduction of a ''pruning rule'', it is possible
to create exit nodes in the middle of the graph. With increased
intensity of pruning, it is possible to ''choke off'' the generation
and terminate the graph; in such a case a new seed node is injected
automatically. By combination with seed rules, an equilibrium of
graph start and graph termination can be achieved.
Following this path, it should be possible to produce a pattern,
which is random but overall stable and well suited to simulate
a realistic processing load.
However, finding proper parameters turns out quite hard in practice,
since the behaviour is essentially contingent and most combinations
either lead to uninteresting trivial small graph chunks, or to
large, interconnected and exponentially expanding networks
... special rule to generate a fixed expansion on each seed
... consecutive reductions join everything back into one chain
... can counterbalance expansions and reductions
...as it turns out, the solution embraced first was the cleanest way
to handle dynamic configuration of parameters; just it did not work
at that time, due to the reference binding problem in the Lambdas.
Meanwhile, the latter has been resolved by relying on the LazyInit
mechanism. Thus it is now possible to abandon the manipulation by
side effect and rather require the dynamic rule to return a
''pristine instance''.
With these adjustments, it is now possible to install a rule
which expands only for some kinds of nodes; this is used here
to crate a starting point for a **reduction rule** to kick in.
- present the weight centres relative to overall level count
- detect sub-graphs and add statistics per subgraph
- include an evaluation for ''all nodes''
- include number of levels and subgraphs
- iterate over all nodes and classify them
- group per level
- book in per level statistics into the Indicator records
- close global averages
...just coded, not yet tested...
The graph will be used to generate a computational load
for testing the Scheduler; thus we need to compute some
statistical indicators to characterise this load.
As starting point sum counts and averages will be aggregated,
accounting for particular characterisation of nodes per level.
It seams indicated to verify the generated connectivity
and the hash calculation and recalculation explicitly
at least for one example topology; choosing a topology
comprised of several sub-graphs, to also verify the
propagation of seed values to further start-nodes.
In order to avoid addressing nodes directly by index number,
those sub-graphs can be processed by ''grouping of nodes'';
all parts are congruent because topology is determined by
the node hashes and thus a regular pattern can be exploited.
To allow for easy processing of groups, I have developed a
simplistic grouping device within the IterExplorer framework.
- with the new pruning option, start-Nodes can now be anywhere
- introduce predicates to detect start-Nodes and exit-Nodes
- ensure each new seed node gets the global seed on graph construction
- provide functionality to re-propagate a seed and clear hashes
- provide functionality to recalculate the hashes over the graph
this is only a minor rearrangement in the Algorithm,
but allows to re-boot computation should node connectivity
go to zero. With current capabilities, this could not happen,
but I'm considering to add a »pruning« parameter to create the
possibility to generate multiple shorter chains instead of one
complete chain -- which more closely emulates reality for
Scheduler load patterns.
RandomDraw as a library component was extracted and (grossly) augmented
to cut down the complexity exposed to the user of TestChainLoad.
To control the generated topology, random-selected parameters
must be configured, defining a probability profile; while
this can be achieved with simple math, getting it correct
turned out surprisingly difficult.
...start with putting the topology generator to work
- turns out it is still challenging to write the ctrl-rules
- and one example tree looked odd in the visualisation
- which (on investigation) indicated unsound behaviour
...this is basically harmless, but involves an integer wrap-around
in a variable not used under this conditions (toReduce), but also
a rather accidental and no very logical round-up of the topology.
With this fix, the code branch here is no longer overloaded with two
distinct concerns, which I consider an improvement
by default, a linear chain without any forking is generated,
and the result hash is computed by hash-chaining from the seed.
Verify proper connections and validate computed hash
..as can be expected, had do chase down some quite hairy problems,
especially since consumption of the fixed amount of nodes is not
directly linked to the ''beat'' of the main loop and thus boundary
conditions and exhausted storage can happen basically anywhere.
Used a simple expansion rule and got a nod graph,
which looks coherent in DOT visualisation.
writing a control-value rule for topology generation typically
involves some modulus and then arthmetic operations to map
only part of the value range to the expected output range.
These calculations are generic, noisy and error-prone.
Thus introduce a helper type, which allows the client just
to mark up the target range of the provided value to map and
transform to the actually expected result range, including some
slight margin to absorb rounding errors. Moreover, all calculations
done in double, to avoid the perils of unsigned-wrap-around.
...these were developed driven by the immediate need
to visualise ''random generated computation patterns''
for ''Scheduler load testing.''
The abstraction level of this DSL is low
and structures closely match some clauses of the DOT language;
this approach may not yet be adequate to generate more complex
graph structures and was extracted as a starting point
for further refinements....
With all the preceding DSL work, this turns out to be surprisingly easy;
the only minor twist is the grouping of nodes into (time)levels,
which can be achieved with a "lagging" update from the loop body
Note: next step will be to extract the DSL helpers into a Library header
...using a pre-established example as starting point
It seems that building up this kind of generator code
from a set of free functions in a secluded namespace
is the way most suitable to the nature of the C++ language
..the idea is to generate a Graphviz-DOT diagram description
by traversing the internal data structures of TestChainLoad.
- refreshed my Graphviz knowledge
- work out a diagram scheme that can be easily generated
- explore ways to structure code generation as a DSL to keep it legible
...introduce statistical control functions (based on hash)
...add processing stage for current set of nodes
...process forking, reduction and injection of new nodes
- use a specialised class, layered on top of std::array
- use additional storage to mark filling degree
- check/fail on link owerflow directly there
We still use fixed size inline storage for the node links,
yet adding this comparatively small overhead in storage helps
getting the code simpler and adding links is now constant-complexity
A »Node« represents one junction point in the dependency graph,
knows his predecessors and successors and carries out one step
of the chained hash calculation.