After an extended break due to "real life issues"....
Pick up the investigation, with the goal to ascertain a valid definition
and understanding of all test parameters. A first step is to establish
a baseline ''without using a computational load''; this might be some kind
of base overhead of the scheduler.
However -- the way the test scaffolding was built, it is difficult to
create a feedback loop for the statistical test setup with binary search,
since it is not really clear how the single control parameter of the test algorithm,
the so called "stress factor", shall be interpreted and how it can be
combined with a base load.
An extended series of tests, while watching the observed value patterns qualitatively,
seems to corroborate the former results, indicating that the base expense
in my test setup (using a debug build) is at ~200µs / Node / core.
Yet the difficulty to interpret this result and arrive at a logical and generic model
prevents me from translating this into a measurement scheme, which can
be executed independently from a specific test setup and hardware
While the idea with capturing observation values is nice,
it definitively does not belong into a library impl of the
search algorithm, because this is usage specific and grossly
complicates the invocation.
Rather, observation data can be captured by side-effect
from the probe-λ holding the actual measurement run.
...based on the adapted time-factor sequence
implemented yesterday in TestChainLoad itself
- in this case, the TimeBase from the computation load is used as level speed
- this »base beat« is then modulated by the timing factor sequence
- working in an additional stress factor to press the schedule uniformly
- actual start time will be added as offset once the actual test commences
...so IterExplorer got yet another processing layer,
which uses the grouping mechanics developed yesterday,
but is freely configurable through λ-Functions.
At actual usage sit in TestChainLoad, now only the actual
aggregation computation must be supplied, and follow-up computations
can now be chained up easily as further transformation layers.
...causing the system to freeze due to excess memory allocation.
Fortunately it turned out this was not an error in the Scheduler core
or memory manager, but rather a sloppiness in the test scaffolding.
However, this incident highlights that the memory manager lacks some
sanity checks to prevent outright nonsensical allocation requests.
Moreover it became clear again that the allocation happens ''already before''
entering the Scheduler — and thus the existing sanity check comes too late.
Now I've used the same reasoning also for additional checks in the allocator,
limiting the Epoch increment to 3000 and the total memory allocation to 8GiB
Talking of Gibitbytes...
indeed we could use a shorthand notation for that purpose...
The last round of refactorings yielded significant improvements
- parallelisation now works as expected
- processing progresses closer to the schedule
- run time was reduced
The processing load for this test is tuned in a way to overload the
scheduler massively at the end -- the result must be correct non the less.
There was one notable glitch with an assertion failure from the memory manager.
Hopefully I can reproduce this by pressing and overloading the Scheduler more...
..initial gauging is a tricky subject,
since existing computer's performance spans a wide scale
Allowing
- pre calibration -98% .. +190%
- single run ±20%
- benchmark ±5%
...which can be deliberately attached (or not attached) to the
individual node invocation functor, allowing to study the effect
of actual load vs. zero-load and worker contention
...during development of the Chain-Load, it became clear that we'll often
need a collection of small trees rather than one huge graph. Thus a rule
for pruning nodes and finishing graphs was added. This has the consequence
that there might now be several exit nodes scattered all over the graph;
we still want one single global hash value to verify computations,
thus those exit hashes must now be picked up from the nodes and
combined into a single value.
All existing hash values hard coded into tests must be updated
Invent a special JobFunctor...
- can be created / bound from a λ
- self-manages its storage on the heap
- can be invoked once, then discards itself
Intention is to pass such one-time actions to the Scheduler
to cause some ad-hoc transitions tied to curren circumstances;
a notable example will be the callback after load-test completion.
... so this (finally) is the missing cornerstone
... traverse the calculation graph and generate render jobs
... provide a chunk-wise pre-planning of the next batch
... use a future to block the (test) thread until completed
- decided to abstract the scheduler invocations as λ
- so this functor contains the bare loop logic
Investigation regarding hash-framework:
It turns out that boost::hash uses a different hash_combine,
than what we have extracted/duplicated in lib/hash-value.hpp
(either this was a mistake, or boost::hash did use this weaker
function at that time and supplied a dedicated 64bit implementation later)
Anyway, should use boost::hash for the time being
maybe also fix the duplicated impl in lib/hash-value.hpp
- use a dedicated context "dropped off" the TestChainLoad instance
- encode the node-idx into the InvocationInstanceID
- build an invocation- and a planning-job-functor
- let planning progress over an lib::UninitialisedStorage array
- plant the ActivityTerm instances into that array as Scheduling progresses
Introduced as remedy for a long standing sloppiness:
Using a `char[]` together with `reinterpret_cast` in storage management helpers
bears danger of placing objects with wrong alignment; moreover, there are increasing
risks that modern code optimisers miss the ''backdoor access'' and might apply too
aggressive rewritings.
With C++17, there is a standard conformant way to express such a usage scheme.
* `lib::UninitialisedStorage` can now be used in a situation (e.g. as in `ExtentFamily`)
where a complete block of storage is allocated once and then subsequently used
to plant objects one by one
* moreover, I went over the code base and adapted the most relevant usages of
''placement-new into buffer'' to also include the `std::launder()` marker
... special rule to generate a fixed expansion on each seed
... consecutive reductions join everything back into one chain
... can counterbalance expansions and reductions
...as it turns out, the solution embraced first was the cleanest way
to handle dynamic configuration of parameters; just it did not work
at that time, due to the reference binding problem in the Lambdas.
Meanwhile, the latter has been resolved by relying on the LazyInit
mechanism. Thus it is now possible to abandon the manipulation by
side effect and rather require the dynamic rule to return a
''pristine instance''.
With these adjustments, it is now possible to install a rule
which expands only for some kinds of nodes; this is used here
to crate a starting point for a **reduction rule** to kick in.
It seams indicated to verify the generated connectivity
and the hash calculation and recalculation explicitly
at least for one example topology; choosing a topology
comprised of several sub-graphs, to also verify the
propagation of seed values to further start-nodes.
In order to avoid addressing nodes directly by index number,
those sub-graphs can be processed by ''grouping of nodes'';
all parts are congruent because topology is determined by
the node hashes and thus a regular pattern can be exploited.
To allow for easy processing of groups, I have developed a
simplistic grouping device within the IterExplorer framework.
- with the new pruning option, start-Nodes can now be anywhere
- introduce predicates to detect start-Nodes and exit-Nodes
- ensure each new seed node gets the global seed on graph construction
- provide functionality to re-propagate a seed and clear hashes
- provide functionality to recalculate the hashes over the graph
up to now, random values were completely determined by the
Node's hash, leading to completely symmetrical topology.
This is fine, but sometimes additional randomness is desirable,
while still keeping everything deterministic; the obvious solution
is to make the results optionally dependent on the invocation order,
which is simply to achieve with an additional state field. After some
tinkering, I decided to use the most simplistic solution, which is
just a multiplication with the state.
...so this was yet another digression, caused by the desire
somehow to salvage this problematic component design. Using a
DSL token fluently, while internally maintaining a complex and
totally open function based configuration is a bit of a stretch.
For context: I've engaged into writing a `LazyInit` helper component,
to resolve the inner contradiction between DSL use of `RandomDraw`
(implying value semantics) and the design of a processing pipeline,
which quite naturally leads to binding by reference into the enclosing
implementation.
In most cases, this change (to lazy on-demand initialisation) should be
transparent for the complete implementation code in `RandomDraw` -- with
one notable exception: when configuring an elaborate pipeline, especially
with dynamic changes of the probability profile during the simulation run,
then then obviously there is the desire to use the existing processing
pipeline from the reconfiguration function (in fact it would be quite
hard to explain why and where this should be avoided). `LazyInit` breaks
this usage scenario, since -- at the time the reconfiguration runs --
now the object is not initialised at all, but holds a »Trojan« functor,
which will trigger initialisation eventually.
After some headaches and grievances (why am I engaging into such an
elaborate solution for such an accidental and marginal topic...),
unfortunately it occurred to me that even this problem can be fixed,
with yet some further "minimal" adjustments to the scheme: the LazyInit
mechanism ''just needs to ensure'' that the init-functor ''sees the
same environment as in eager init'' -- that is, it must clear out the
»Trojan« first, and it ''could apply any previous pending init function''
fist. That is, with just a minimal change, we possibly build a chain
of init functors now, and apply them in given order, so each one
sees the state the previous one created -- as if this was just
direct eager object manipulation...
...this is a more realistic demo example, which mimics
some of the patterns present in RandomDraw. The test also
uses lambdas linking to the actual storage location, so that
the invocation would crash on a copy; LazyInit was invented
to safeguard against this, while still allowing leeway
during the initialisation phase in a DSL.
...oh my.
This is getting messy. I am way into danger territory now....
I've made a nifty cool design with automatically adapted functors;
yet at the end of the day, this does not bode well with a DSL usage,
where objects appear to be simple values from a users point of view.
- Helper function to find out of two objects are located
"close to each other" -- which can be used as heuristics
to distinguish heap vs. stack storage
- further investigation shows that libstdc++ applies the
small-object optimisation for functor up to »two slots«
in size -- but only if the copy-ctor is trivial. Thus
a lambda capturing a shared_ptr by value will *always*
be maintained in heap storage (and LazyInit must be
redesigned accordingly)...
- the verify_inlineStorage() unit test will now trigger
if some implementation does not apply small-object optimisation
under these minimal assumptions
the RandomDraw rules developed last days are meant to be used
with user-provided λ-adapters; employing these in a context
of a DSL runs danger of producing dangling references.
Attempting to resolve this fundamental problem through
late-initialisation, and then locking the component into
a fixed memory location prior to actual usage. Driven by
the goal of a self-contained component, some advanced
trickery is required -- which again indicates better
to write a library component with adequate test coverage.
...now using the reworked partial-application helper...
...bind to *this and then recursively re-invoke the adaptation process
...need also to copy-capture the previously existing mapping-function
first test seems to work now
Investigation in test setup reveals that the intended solution
for dynamic configuration of the RandomDraw can not possibly work.
The reason is: the processing function binds back into the object instance.
This implies that RandomDraw must be *non-copyable*.
So we have to go full circle.
We need a way to pass the current instance to the configuration function.
And the most obvious and clear way would be to pass it as function argument.
Which however requires to *partially apply* this function.
So -- again -- we have to resort to one of the functor utilities
written several years ago; and while doing so, we must modernise
these tools further, to support perfect forwarding and binding
of reference arguments.
- strive at complete branch coverage for the mapping function
- decide that the neutral value can deliberately lie outside
the value range, in which case the probability setting
controls the number of _value_ result incidents vs
neutral value result incidents.
- introduce a third path to define this case clearly
- implement the range setting Builder-API functions
- absorb boundrary and illegal cases
For sake of simplicity, since this whole exercise is a byproduct,
the mapping calculations are done in doubles. To get even distribution
of values and a good randomisation, it is thus necessary to break
down the size_t hash value in a first step (size_t can be 64bit
and random numbers would be subject to rounding errors otherwise)
The choice of this quantiser is tricky; it must be a power of two
to guarantee even distribution, and if chosen to close to the grid
of the result values, with lower probabilities we'd fail to cover
some of the possible result values. If chosen to large, then
of course we'd run danger of producing correlated numbers on
consecutive picks.
Attempting to use 4 bits of headroom above the log-2 of the
required value range. For example, 10-step values would use
a quantiser of 128, which looks like a good compromise.
The following tests will show how good this choice holds up.
This highly optimised function was introduced about one year ago
for handling of denomals with rational values (fractions), as
an interim solution until we'll switch to C++20.
Since this function uses an unrolled loop and basically
just does a logarithmic search for the highest set bit,
it can just be declared constexpr. Moreover, it is now
relocated into one of the basic utility headers
Remark: the primary "competitor" is the ilogb(double),
which can exploit hardware acceleration. For 64bit integers,
the ilog2() is only marginally faster according to my own
repeated invocation benchmarks.
The first step was to allow setting a minimum value,
which in theory could also be negative (at no point is the
code actually limited to unsigned values; this is rather
the default in practice).
But reconsidering this extensions, then you'd also want
the "neutral value" to be handled properly. Within context,
this means that the *probability* controls when values other
than the neutral value are produced; especially with p = 1.0
the neutral value shall not be produced at all
...since the Policy class now defines the function signature,
we can no longer assume that "input" is size_t. Rather, all
invocations must rely on the generic adaptaion scheme.
Getting this correct turns out rather tricky again;
best to rely on a generic function-composition.
Indeed I programmed such a helper several years ago,
with the caveat that at that time we used C++03 and
could not perfect-forward arguments. Today this problem
can be solved much more succinct using generic Lambdas.
to define this as a generic library component,
any reference to the actual data source moust be extracted
from the body of the implementation and supplied later
at usage site. In the actual case at hand the source
for randomness would be the node hash, and that is
absolutely an internal implementation detail.
The idea is to use some source of randomness to pick a
limited parameter value with controllable probability.
While the core of the implementation is nothing more
than some simple numeric adjustments, these turn out
to be rather intricate and obscure; the desire to
package these technicalities into a component
however necessitates to make invocations
at usage site self explanatory.
...these were developed driven by the immediate need
to visualise ''random generated computation patterns''
for ''Scheduler load testing.''
The abstraction level of this DSL is low
and structures closely match some clauses of the DOT language;
this approach may not yet be adequate to generate more complex
graph structures and was extracted as a starting point
for further refinements....