The Lumiera »Reference Platform« is now upgraded to Debian/Buster, which provides GCC-14 and Clang-20. Thus the compiler support for C++20 language features seems solid enough, and C++23, while still in ''experimental stage'', can be seen as a complement and addendum. This changeset
* upgrades the compile switches for the build system
* provides all the necessary adjustments to keep the code base compilable

Notable changes:
* λ-capture by value now requires explicit qualification of how to handle `this`
* comparison operators are now handled transparently by the core language, largely obsoleting boost::operators. This change entails several adjustments to implicit handling rules and causes lots of ambiguities — which typically pinpoint some long-standing design issues, especially related to MObjects and the ''time entities''. Most tweaks done here can be ''considered preliminary''
* unfortunately the upgraded standard ''fails'' to handle **tuple-like** entities in a satisfactory way — rather, an ''exposition-only'' concept is introduced, which applies solely to some containers from the STL, thereby breaking some very crucial code in the render entities, which was built upon the notion of ''tuple-like'' entities and the ''tuple protocol''. The solution is to abandon the STL in this respect and **provide an alternative implementation** of the `apply` function and related elements.
/*
  TEST-CHAIN-LOAD.hpp  -  produce a configurable synthetic computation load

  Copyright (C)
    2023,            Hermann Vosseler <Ichthyostega@web.de>

  **Lumiera** is free software; you can redistribute it and/or modify it
  under the terms of the GNU General Public License as published by the
  Free Software Foundation; either version 2 of the License, or (at your
  option) any later version. See the file COPYING for further details.

*/

/** @file test-chain-load.hpp
 ** Generate synthetic computation load for Scheduler performance tests.
 ** The [Scheduler](\ref scheduler.hpp) is a service to invoke Render Job instances
 ** concurrently in accordance with a time plan. To investigate the runtime and performance
 ** characteristics of the implementation, a well-defined artificial computation load is
 ** necessary, comprising the invocation of a large number of Jobs, each configured
 ** to carry out a reproducible computation. Data dependencies between jobs can be established
 ** to verify handling of dependent jobs and job completion messages within the scheduler.
 **
 ** # Random computation structure
 ** A system of connected hash values is used as computation load, akin to a blockchain.
 ** Each processing step is embodied into a node, with a hash value computed by combining
 ** all predecessor nodes. Connectivity is represented as bidirectional pointer links: each
 ** node knows its predecessors and successors (if any), while the maximum _fan out_ or
 ** _fan in_ and the total number of nodes is limited statically. All nodes are placed
 ** into a single pre-allocated memory block and always processed in ascending dependency
 ** order. The result hash from complete processing can thus be computed by a single linear
 ** pass over all nodes; yet alternatively each node can be _scheduled_ as an individual
 ** computation job, which obviously requires that its predecessor nodes have already
 ** been computed, otherwise the resulting hash will not match up with expectation.
 ** If significant per-node computation time is required, an artificial computational
 ** load can be generated, controlled by a weight setting computed for each node.
 **
 ** The topology of connectivity is generated randomly, yet completely deterministically, through
 ** configurable _control functions_ driven by each node's (hash)value. This way, each node
 ** can optionally fork out to several successor nodes, but can also reduce and combine its
 ** predecessor nodes; additionally, new chains can be spawned (to simulate the effect of
 ** data loading Jobs without predecessor) and chains can be deliberately pruned, possibly
 ** splitting the computation into several disjoint sub-graphs. In any case, the computation always
 ** begins with the _root node,_ establishes the node links and marks each open end as an
 ** _exit node_ — until all the nodes in the pre-allocated node space have been visited. Hash
 ** values of all exit nodes will be combined into one characteristic hash for the graph.
 ** The probabilistic rules controlling the topology can be configured using the lib::RandomDraw
 ** component, allowing either to set a fixed probability or to define elaborate dynamic
 ** configurations based on the graph height or node connectivity properties.
 ** - expansionRule: controls forking of the graph behind the current node
 ** - reductionRule: controls joining of the graph into a combining successor node
 ** - seedingRule:   controls injection of new start nodes in the middle of the graph
 ** - pruningRule:   controls insertion of exit nodes to cut off some chain immediately
 ** - weightRule:    controls assignment of a Node::weight to command the ComputationalLoad
 **
 ** ## Usage
 ** A TestChainLoad instance is created with a predetermined maximum fan factor and a fixed
 ** number of nodes, which are immediately allocated and initialised. Using _builder notation,_
 ** control functions can then be configured. The [topology generation](\ref TestChainLoad::buildTopology)
 ** then traverses the nodes, starting with the seed value from the root node, and establishes
 ** the complete node connectivity. After this priming, the expected result hash should be
 ** [retrieved](\ref TestChainLoad::getHash). The node structure can then be traversed or
 ** [scheduled as Render Jobs](\ref TestChainLoad::scheduleJobs).
 **
 ** A special use case is _not to build any topology,_ rather only #setWeight().
 ** All nodes will then be at level-0 and scheduled at t=0, causing the scheduler
 ** to process them best-effort in arbitrary order.
 **
 ** ## Test support
 ** A tool for generating roughly calibrated [computational load](\ref ComputationalLoad) is included,
 ** to be controlled by the Node::weight stepping. Load can either be generated by arithmetic (hash)
 ** computations, or by accessing and adding memory in a privately allocated data block. To make this
 ** load controllable, the instance is configured with a _time base_ setting, with sensible settings
 ** between 50µs and 100ms; moreover, a _calibration run_ is necessary once per runtime (static variables);
 ** the actual invocation uses a multiple of this base setting, as determined by the Node::weight.
 **
 ** This allows individual steps in the calculation graph to be rendered much more expensive than the
 ** associated organisational code and setup, and thus creates the foundation for building realistic
 ** test scenarios. For a given graph instance, a [runtime weight](\ref TestChainLoad::levelScheduleSequence)
 ** can be estimated heuristically, expressed in multiples of the Node::weight ≡ 1, taking into account
 ** the possible parallelisation _per level of nodes_ described by an idealised model.
 **
 ** For the actual test run, a ScheduleCtx is built, using an actual scheduler instance. Specialised
 ** render job functors are provided to perform incremental job planning and invocation of individual
 ** nodes in the graph as computation steps, optionally with a computation load. The scheduler is
 ** triggered by inserting the initial planning job in a _pre-roll phase,_ blocking the main thread
 ** until a callback job is invoked, which is set as final dependency behind the exit node of the
 ** complete graph, returning an observed runtime in microseconds from the nominal start point of
 ** the schedule.
 **
 ** ## Observation tools
 ** The generated topology can be visualised as a graph, using the Graphviz-DOT language.
 ** Nodes are rendered from bottom to top, organised into strata according to the time-level
 ** and showing predecessor -> successor connectivity. Seed nodes are distinguished by
 ** circular shape.
 **
 ** The complete graph can be [performed synchronously](\ref TestChainLoad::performGraphSynchronously),
 ** allowing one to observe a [baseline run-time](\ref TestChainLoad::calcRuntimeReference) when executing
 ** all nodes consecutively, using the configured load but without any time gaps. The run time
 ** in µs can be compared to the timings observed when performing the graph through the Scheduler.
 ** Moreover, statistics can be computed over the generated graph, allowing to draw some
 ** conclusions regarding node distribution and connectivity.
 **
 ** @see TestChainLoad_test
 ** @see SchedulerStress_test
 ** @see random-draw.hpp
 */


#ifndef VAULT_GEAR_TEST_TEST_CHAIN_LOAD_H
#define VAULT_GEAR_TEST_TEST_CHAIN_LOAD_H


#include "vault/common.hpp"
#include "lib/test/transiently.hpp"

#include "vault/gear/job.h"
#include "vault/gear/scheduler.hpp"
#include "vault/gear/special-job-fun.hpp"
#include "lib/uninitialised-storage.hpp"
#include "lib/test/microbenchmark.hpp"
#include "lib/incidence-count.hpp"
#include "lib/time/timevalue.hpp"
#include "lib/time/quantiser.hpp"
#include "lib/iter-explorer.hpp"
#include "lib/format-string.hpp"
#include "lib/format-cout.hpp"
#include "lib/hash-combine.hpp"
#include "lib/random-draw.hpp"
#include "lib/dot-gen.hpp"
#include "lib/util.hpp"

#include <functional>
#include <utility>
#include <future>
#include <memory>
#include <string>
#include <vector>
#include <tuple>
#include <array>
||
namespace vault{
|
||
namespace gear {
|
||
namespace test {
|
||
|
||
using util::_Fmt;
|
||
using util::min;
|
||
using util::max;
|
||
using util::isnil;
|
||
using util::limited;
|
||
using util::unConst;
|
||
using util::toString;
|
||
using util::isLimited;
|
||
using util::showHashLSB;
|
||
using lib::time::Time;
|
||
using lib::time::TimeValue;
|
||
using lib::time::FrameRate;
|
||
using lib::time::Duration;
|
||
using lib::test::benchmarkTime;
|
||
using lib::test::microBenchmark;
|
||
using lib::test::Transiently;
|
||
using lib::meta::_FunRet;
|
||
|
||
using std::string;
|
||
using std::function;
|
||
using std::make_pair;
|
||
using std::forward;
|
||
using std::string;
|
||
using std::swap;
|
||
using std::move;
|
||
using std::chrono_literals::operator ""s;
|
||
|
||
namespace err = lumiera::error;
|
||
namespace dot = lib::dot_gen;
|
||
|
||

  namespace { // Default definitions for structured load testing

    const size_t DEFAULT_FAN = 16;                  ///< default maximum connectivity per Node
    const size_t DEFAULT_SIZ = 256;                 ///< default node count for the complete load graph

    const auto SAFETY_TIMEOUT = 5s;                 ///< maximum time limit for test run, abort if exceeded
    const auto STANDARD_DEADLINE = 30ms;            ///< deadline to use for each individual computation job
    const size_t DEFAULT_CHUNKSIZE = 64;            ///< number of computation jobs to prepare in each planning round
    const double UPFRONT_PLANNING_BOOST = 2.6;      ///< factor to increase the computed pre-roll to ensure up-front planning
    const size_t GRAPH_BENCHMARK_RUNS = 5;          ///< repetition count for reference calculation of a complete node graph
    const size_t LOAD_BENCHMARK_RUNS = 500;         ///< repetition count for calibration benchmark for ComputationalLoad
    const double LOAD_SPEED_BASELINE = 100;         ///< initial assumption for calculation speed (without calibration)
    const microseconds LOAD_DEFAULT_TIME = 100us;   ///< default time delay produced by ComputationalLoad at `Node.weight==1`
    const size_t LOAD_DEFAULT_MEM_SIZE = 1000;      ///< default allocation base size used if ComputationalLoad.useAllocation
    const Duration SCHEDULE_LEVEL_STEP{_uTicks(1ms)};  ///< time budget to plan for the calculation of each »time level« of jobs
    const Duration SCHEDULE_NODE_STEP{Duration::NIL};  ///< additional time step to include in the plan for each job (node)
    const Duration SCHEDULE_PLAN_STEP{_uTicks(100us)}; ///< time budget to reserve for each node to be planned and scheduled
    const Offset SCHEDULE_WAKE_UP{_uTicks(10us)};      ///< tiny offset to place the final wake-up job behind any systematic schedule
    const bool SCHED_DEPENDS = false;               ///< explicitly schedule a dependent job (or rely on NOTIFY)
    const bool SCHED_NOTIFY = true;                 ///< explicitly set notify dispatch time to the dependency's start time

    inline uint
    defaultConcurrency()
    {
      return work::Config::getDefaultComputationCapacity();
    }

    inline double
    _uSec (microseconds ticks)
    {
      return std::chrono::duration<double, std::micro>{ticks}.count();
    }
  }

  struct LevelWeight
    {
      size_t level{0};
      size_t nodes{0};
      size_t endidx{0};
      size_t weight{0};
    };

  /**
   * Simplified model for the expense of a level of nodes, computed concurrently.
   * @remark assumptions of this model:
   *       - the weight factor describes the expense to compute a node
   *       - nodes on the same level can be parallelised without limitation
   *       - no consideration of stacking / ordering of tasks; rather the speed-up
   *         is applied as an average factor to the summed node weights of a level
   * @return guess for a compounded weight factor
   */
  inline double
  computeWeightFactor (LevelWeight const& lw, uint concurrency)
  {
    REQUIRE (0 < concurrency);
    double speedUp = lw.nodes? lw.nodes / std::ceil (double(lw.nodes)/concurrency)
                             : 1.0;
    ENSURE (1.0 <= speedUp);
    return lw.weight / speedUp;
  }

  struct Statistic;



  /***********************************************************************//**
   * A Generator for synthetic Render Jobs for Scheduler load testing.
   * Allocates a fixed set of #numNodes and generates connecting topology.
   * @tparam maxFan maximal fan-in/out from a node, also limits maximal parallel strands.
   * @see TestChainLoad_test
   */
  template<size_t maxFan =DEFAULT_FAN>
  class TestChainLoad
    : util::MoveOnly
    {

    public:
      /** Graph Data structure */
      struct Node
        : util::MoveOnly
        {
          using _Arr = std::array<Node*, maxFan>;
          using Iter = typename _Arr::iterator;
          using CIter = typename _Arr::const_iterator;

          /** Table with connections to other Node records */
          struct Tab : _Arr
            {
              Iter after = _Arr::begin();

              Iter  end()      { return after; }
              CIter end() const{ return after; }
              friend Iter  end (Tab      & tab){ return tab.end(); }
              friend CIter end (Tab const& tab){ return tab.end(); }

              Node* front()    { return empty()? nullptr : _Arr::front(); }
              Node* back()     { return empty()? nullptr : *(after-1);    }

              void clear()     { after = _Arr::begin(); }  ///< @warning pointer data in array not cleared

              size_t size() const { return unConst(this)->end()-_Arr::begin(); }
              bool empty()  const { return 0 == size(); }

              Iter
              add (Node* n)
                {
                  if (after != _Arr::end())
                    {
                      *after = n;
                      return after++;
                    }
                  NOTREACHED ("excess node linkage");
                }
            };

          size_t hash;
          size_t level{0}, weight{0};
          Tab pred{0}, succ{0};

          Node (size_t seed =0)
            : hash{seed}
            { }

          void
          clear()
            {
              hash = 0;
              level = weight = 0;
              pred.clear();
              succ.clear();
            }

          Node&
          addPred (Node* other)
            {
              REQUIRE (other);
              pred.add (other);
              other->succ.add (this);
              return *this;
            }

          Node&
          addSucc (Node* other)
            {
              REQUIRE (other);
              succ.add (other);
              other->pred.add (this);
              return *this;
            }
          Node& addPred (Node& other) { return addPred(&other); }
          Node& addSucc (Node& other) { return addSucc(&other); }

          size_t
          calculate()
            {
              for (Node* entry : pred)
                lib::hash::combine (hash, entry->hash);
              return hash;
            }

          friend bool isStart (Node const& n) { return isnil (n.pred); }
          friend bool isExit  (Node const& n) { return isnil (n.succ); }
          friend bool isInner (Node const& n) { return not (isStart(n) or isExit(n)); }
          friend bool isFork  (Node const& n) { return 1 < n.succ.size(); }
          friend bool isJoin  (Node const& n) { return 1 < n.pred.size(); }
          friend bool isLink  (Node const& n) { return 1 == n.pred.size() and 1 == n.succ.size(); }
          friend bool isKnot  (Node const& n) { return isFork(n) and isJoin(n); }

          friend bool isStart (Node const* n) { return n and isStart(*n); }
          friend bool isExit  (Node const* n) { return n and isExit (*n); }
          friend bool isInner (Node const* n) { return n and isInner(*n); }
          friend bool isFork  (Node const* n) { return n and isFork (*n); }
          friend bool isJoin  (Node const* n) { return n and isJoin (*n); }
          friend bool isLink  (Node const* n) { return n and isLink (*n); }
          friend bool isKnot  (Node const* n) { return n and isKnot (*n); }
        };


      /** link Node.hash to random parameter generation */
      class NodeControlBinding;

      /** Parameter values limited [0 .. maxFan] */
      using Param = lib::Limited<size_t, maxFan>;

      /** Topology is governed by rules for random params */
      using Rule = lib::RandomDraw<NodeControlBinding>;

    private:
      using NodeTab = typename Node::Tab;
      using NodeIT = lib::RangeIter<Node*>;

      std::unique_ptr<Node[]> nodes_;
      size_t numNodes_;

      Rule seedingRule_  {};
      Rule expansionRule_{};
      Rule reductionRule_{};
      Rule pruningRule_  {};
      Rule weightRule_   {};

      Node* frontNode() { return &nodes_[0]; }
      Node* afterNode() { return &nodes_[numNodes_]; }
      Node* backNode()  { return &nodes_[numNodes_-1]; }

    public:
      explicit
      TestChainLoad (size_t nodeCnt =DEFAULT_SIZ)
        : nodes_{new Node[nodeCnt]}
        , numNodes_{nodeCnt}
        {
          REQUIRE (1 < nodeCnt);
        }


      size_t size()     const { return numNodes_; }
      size_t topLevel() const { return unConst(this)->backNode()->level; }
      size_t getSeed()  const { return unConst(this)->frontNode()->hash; }


      auto
      allNodes()
        {
          return lib::explore (NodeIT{frontNode(),afterNode()});
        }
      auto
      allNodePtr()
        {
          return allNodes().asPtr();
        }

      auto
      allExitNodes()
        {
          return allNodes().filter([](Node& n){ return isExit(n); });
        }
      auto
      allExitHashes()  const
        {
          return unConst(this)->allExitNodes().transform([](Node& n){ return n.hash; });
        }

      /** global hash is the combination of all exit node hashes != 0 */
      size_t
      getHash()  const
        {
          auto combineHashes = [](size_t h, size_t hx){ lib::hash::combine(h,hx); return h; };
          return allExitHashes()
                    .filter([](size_t h){ return h != 0; })
                    .reduce(lib::iter_explorer::IDENTITY
                           ,combineHashes
                           );
        }


      /** @return the node's index number, based on its storage location */
      size_t nodeID (Node const* n) { return n - frontNode(); }
      size_t nodeID (Node const& n) { return nodeID (&n); }



      /* ===== topology control ===== */

      TestChainLoad&&
      seedingRule (Rule r)
        {
          seedingRule_ = move(r);
          return move(*this);
        }

      TestChainLoad&&
      expansionRule (Rule r)
        {
          expansionRule_ = move(r);
          return move(*this);
        }

      TestChainLoad&&
      reductionRule (Rule r)
        {
          reductionRule_ = move(r);
          return move(*this);
        }

      TestChainLoad&&
      pruningRule (Rule r)
        {
          pruningRule_ = move(r);
          return move(*this);
        }

      TestChainLoad&&
      weightRule (Rule r)
        {
          weightRule_ = move(r);
          return move(*this);
        }


      /** Abbreviation for starting rules */
      static Rule rule()           { return Rule(); }
      static Rule value (size_t v) { return Rule().fixedVal(v); }

      static Rule
      rule_atStart (uint v)
        {
          return Rule().mapping([v](Node* n)
                                  {
                                    return isStart(n)? Rule().fixedVal(v)
                                                     : Rule();
                                  });
        }

      static Rule
      rule_atJoin (uint v)
        {
          return Rule().mapping([v](Node* n)
                                  {
                                    return isJoin(n) ? Rule().fixedVal(v)
                                                     : Rule();
                                  });
        }

      static Rule
      rule_atLink (uint v)
        {
          return Rule().mapping([v](Node* n)
                                  { // NOTE: when applying these rules,
                                    //       successors are not yet wired...
                                    return not (isJoin(n) or isStart(n))
                                              ? Rule().fixedVal(v)
                                              : Rule();
                                  });
        }

      static Rule
      rule_atJoin_else (double p1, double p2, uint v=1)
        {
          return Rule().mapping([p1,p2,v](Node* n)
                                  {
                                    return isJoin(n) ? Rule().probability(p1).maxVal(v)
                                                     : Rule().probability(p2).maxVal(v);
                                  });
        }


      /** preconfigured topology: only unconnected seed/exit nodes */
      TestChainLoad&&
      configure_isolated_nodes()
        {
          pruningRule(value(1));
          weightRule(value(1));
          return move(*this);
        }

      /** preconfigured topology: isolated simple 2-step chains */
      TestChainLoad&&
      configureShape_short_chains2()
        {
          pruningRule(rule().probability(0.8));
          weightRule(value(1));
          return move(*this);
        }

      /** preconfigured topology: simple 3-step chains, starting interleaved */
      TestChainLoad&&
      configureShape_short_chains3_interleaved()
        {
          pruningRule(rule().probability(0.6));
          seedingRule(rule_atStart(1));
          weightRule(value(1));
          return move(*this);
        }

      /** preconfigured topology: simple interwoven 3-step graph segments */
      TestChainLoad&&
      configureShape_short_segments3_interleaved()
        {
          seedingRule(rule().probability(0.8).maxVal(1));
          reductionRule(rule().probability(0.75).maxVal(3));
          pruningRule(rule_atJoin(1));
          weightRule(value(1));
          return move(*this);
        }

      /** preconfigured topology: single graph with massive »load bursts«
       * @see TestChainLoad_test::showcase_StablePattern() */
      TestChainLoad&&
      configureShape_chain_loadBursts()
        {
          expansionRule(rule().probability(0.27).maxVal(4));
          reductionRule(rule().probability(0.44).maxVal(6).minVal(2));
          weightRule  (rule().probability(0.66).maxVal(3));
          setSeed(55);        // ◁─────── produces a prelude with parallel chains,
          return move(*this); //          then fork at level 17 followed by bursts of load.
        }



      /**
       * Use current configuration and seed to (re)build Node connectivity.
       * While working in-place, the wiring and thus the resulting hash values
       * are completely rewritten, progressing from start and controlled by
       * evaluating the _drawing rules_ on the current node, computing its hash.
       */
      TestChainLoad&&
      buildTopology()
        {
          NodeTab a,b,        // working data for generation
                 *curr{&a},   // the current set of nodes to carry on
                 *next{&b};   // the next set of nodes connected to current
          Node* node = frontNode();
          size_t level{0};

          // transient snapshot of rules (non-copyable, once engaged)
          Transiently originalExpansionRule{expansionRule_};
          Transiently originalReductionRule{reductionRule_};
          Transiently originalSeedingRule  {seedingRule_};
          Transiently originalPruningRule  {pruningRule_};
          Transiently originalWeightRule   {weightRule_};

          // prepare building blocks for the topology generation...
          auto moreNext  = [&]{ return next->size() < maxFan; };
          auto moreNodes = [&]{ return node <= backNode(); };
          auto spaceLeft = [&]{ return moreNext() and moreNodes(); };
          auto addNode   = [&](size_t seed =0)
                              {
                                Node* n = *next->add (node++);
                                n->clear();
                                n->level = level;
                                n->hash = seed;
                                return n;
                              };
          auto apply     = [&](Rule& rule, Node* n)
                              {
                                return rule(n);
                              };
          auto calcNode  = [&](Node* n)
                              {
                                n->calculate();
                                n->weight = apply(weightRule_,n);
                              };

          // visit all further nodes and establish links
          while (moreNodes())
            {
              curr->clear();
              swap (next, curr);
              size_t toReduce{0};
              Node* r = nullptr;
              REQUIRE (spaceLeft());
              for (Node* o : *curr)
                { // follow-up on all Nodes in current level...
                  calcNode(o);
                  if (apply (pruningRule_,o))
                    continue; // discontinue
                  size_t toSeed   = apply (seedingRule_, o);
                  size_t toExpand = apply (expansionRule_,o);
                  while (0 < toSeed and spaceLeft())
                    { // start a new chain from seed
                      addNode(this->getSeed());
                      --toSeed;
                    }
                  while (0 < toExpand and spaceLeft())
                    { // fork out secondary chain from o
                      Node* n = addNode();
                      o->addSucc(n);
                      --toExpand;
                    }
                  if (not toReduce)
                    { // carry-on chain from o
                      r = spaceLeft()? addNode():nullptr;
                      toReduce = apply (reductionRule_, o);
                    }
                  else
                    --toReduce;
                  if (r)  // connect chain from o...
                    r->addPred(o);
                  else    // space for successors is already exhausted
                    {     // can not carry-on, but must ensure no chain is broken
                      ENSURE (not next->empty());
                      if (o->succ.empty())
                        o->addSucc (next->back());
                    }
                }
              ENSURE (not isnil(next) or spaceLeft());
              if (isnil(next))  // ensure graph continues
                addNode(this->getSeed());
              ENSURE (not next->empty());
              ++level;
            }
          ENSURE (node > backNode());
          // all open nodes on last level become exit nodes
          for (Node* o : *next)
            calcNode(o);

          return move(*this);
        }


      /**
       * Set the overall seed value.
       * @note does not propagate the seed to consecutive start nodes
       */
      TestChainLoad&&
      setSeed (size_t seed = rani())
        {
          frontNode()->hash = seed;
          return move(*this);
        }


      /**
       * Set a fixed weight for all nodes.
       * @note no change to topology (works even without any topology)
       */
      TestChainLoad&&
      setWeight (size_t fixedNodeWeight =1)
        {
          for (Node& n : allNodes())
            n.weight = fixedNodeWeight;
          return move(*this);
        }


      /**
       * Recalculate all node hashes and propagate the seed value.
       */
      TestChainLoad&&
      recalculate()
        {
          size_t seed = this->getSeed();
          for (Node& n : allNodes())
            {
              n.hash = isStart(n)? seed : 0;
              n.calculate();
            }
          return move(*this);
        }


      /**
       * Clear node hashes and propagate the seed value.
       */
      TestChainLoad&&
      clearNodeHashes()
        {
          size_t seed = this->getSeed();
          for (Node& n : allNodes())
            n.hash = isStart(n)? seed : 0;
          return move(*this);
        }



      /* ===== Operators ===== */

      std::string
      generateTopologyDOT()
        {
          using namespace dot;

          Section nodes("Nodes");
          Section layers("Layers");
          Section topology("Topology");

          // Styles to distinguish the computation nodes
          Code BOTTOM{"shape=doublecircle"};
          Code SEED  {"shape=circle"};
          Code TOP   {"shape=box, style=rounded"};
          Code DEFAULT{};

          // prepare time-level zero
          size_t level(0);
          auto timeLevel = scope(level).rank("min ");

          for (Node& n : allNodes())
            {
              size_t i = nodeID(n);
              string tag{toString(i)+": "+showHashLSB(n.hash)};
              if (n.weight) tag +="."+toString(n.weight);
              nodes += node(i).label(tag)
                              .style(i==0         ? BOTTOM
                                    :isnil(n.pred)? SEED
                                    :isnil(n.succ)? TOP
                                    :               DEFAULT);
              for (Node* suc : n.succ)
                topology += connect (i, nodeID(*suc));

              if (level != n.level)
                {// switch to next time-level
                  layers += timeLevel;
                  ++level;
                  ENSURE (level == n.level);
                  timeLevel = scope(level).rank("same");
                }
              timeLevel.add (node(i));
            }
          layers += timeLevel; // close last layer

          // combine and render collected definitions as DOT-code
          return digraph (nodes, layers, topology);
        }

      TestChainLoad&&
      printTopologyDOT()
        {
          cout << "───═══───═══───═══───═══───═══───═══───═══───═══───═══───═══───\n"
               << generateTopologyDOT()
               << "───═══───═══───═══───═══───═══───═══───═══───═══───═══───═══───"
               << endl;
          return move(*this);
        }


      /**
       * Conduct a number of benchmark runs over processing the graph synchronously.
       * @return runtime in microseconds
       * @remark can be used as reference point to judge Scheduler performance;
       *       - additional parallelisation could be exploited: ∅w / floor(∅w/concurrency)
       *       - but the Scheduler also adds overhead and dispatch leeway
       */
      double
      calcRuntimeReference (microseconds timeBase =LOAD_DEFAULT_TIME
                           ,size_t sizeBase =0
                           ,size_t repeatCnt=GRAPH_BENCHMARK_RUNS
                           )
        {
          return microBenchmark ([&]{ performGraphSynchronously(timeBase,sizeBase); }
                                ,repeatCnt)
                    .first;  // ∅ runtime in µs
        }

      /** Emulate complete graph processing in a single-threaded loop. */
      TestChainLoad&& performGraphSynchronously (microseconds timeBase =LOAD_DEFAULT_TIME
                                                ,size_t sizeBase =0);

      TestChainLoad&&
      printRuntimeReference (microseconds timeBase =LOAD_DEFAULT_TIME
                            ,size_t sizeBase =0
                            ,size_t repeatCnt=GRAPH_BENCHMARK_RUNS
                            )
        {
          cout << _Fmt{"runtime ∅(%d) = %6.2fms (single-threaded)\n"}
                      % repeatCnt
                      % (1e-3 * calcRuntimeReference(timeBase,sizeBase,repeatCnt))
               << "───═══───═══───═══───═══───═══───═══───═══───═══───═══───═══───"
               << endl;
          return move(*this);
        }


      /** overall sum of configured node weights */
      size_t
      calcWeightSum()
        {
          return allNodes()
                    .transform([](Node& n){ return n.weight; })
                    .resultSum();
        }

      /** calculate node weights aggregated per level */
      auto
      allLevelWeights()
        {
          return allNodes()
                    .groupedBy([](Node& n){ return n.level; }
                              ,[this](LevelWeight& lw, Node const& n)
                                  {
                                    lw.level = n.level;
                                    lw.weight += n.weight;
                                    lw.endidx = nodeID(n);
                                    ++lw.nodes;
                                  }
                              );
        }

      /** sequence of the summed compounded weight factors _after_ each level */
      auto
      levelScheduleSequence (uint concurrency =1)
        {
          return allLevelWeights()
                    .transform([schedule=0.0, concurrency]
                               (LevelWeight const& lw) mutable
                                  {
                                    schedule += computeWeightFactor (lw, concurrency);
                                    return schedule;
                                  });
        }

      /** calculate the simplified/theoretic reduction of compounded weight through concurrency */
      double
      calcExpectedCompoundedWeight (uint concurrency =1)
        {
          return allLevelWeights()
                    .transform([concurrency](LevelWeight const& lw){ return computeWeightFactor (lw, concurrency); })
                    .resultSum();
        }



      Statistic computeGraphStatistics();
      TestChainLoad&& printTopologyStatistics();

      class ScheduleCtx;
      friend class ScheduleCtx; // accesses raw storage array

      ScheduleCtx setupSchedule (Scheduler& scheduler);
    };
  
  
  
  /**
   * Policy/Binding for generation of [random parameters](\ref TestChainLoad::Param)
   * by [»drawing«](\ref random-draw.hpp) based on the [node-hash](\ref TestChainLoad::Node).
   * Notably this policy template maps the ways to spell out [»Ctrl rules«](\ref TestChainLoad::Rule)
   * to configure the probability profile of the topology parameters _seeding, expansion, reduction
   * and pruning._ The RandomDraw component used to implement those rules provides a builder-DSL
   * and accepts λ-bindings in various forms to influence the mapping of Node hash into result parameters.
   */
  template<size_t maxFan>
  class TestChainLoad<maxFan>::NodeControlBinding
    : public std::function<Param(Node*)>
    {
    protected:
      /** by default use Node-hash directly as source of randomness */
      static size_t
      defaultSrc (Node* node)
        {
          return node? node->hash:0;
        }
      
      static size_t
      level (Node* node)
        {
          return node? node->level:0;
        }
      
      static double
      guessHeight (size_t level)
        { // heuristic guess for a »fully stable state«
          double expectedHeight = 2*maxFan;
          return level / expectedHeight;
        }
      
      
      
      /** Adaptor to handle further mapping functions */
      template<class SIG>
      struct Adaptor
        {
          static_assert (not sizeof(SIG), "Unable to adapt given functor.");
        };
      
      /** allow simple rules directly manipulating the hash value */
      template<typename RES>
      struct Adaptor<RES(size_t)>
        {
          template<typename FUN>
          static auto
          build (FUN&& fun)
            {
              return [functor=std::forward<FUN>(fun)]
                     (Node* node) -> _FunRet<FUN>
                        {
                          return functor (defaultSrc (node));
                        };
            }
        };
      
      /** allow rules additionally involving the height of the graph,
       *  which also represents time. 1.0 refers to _stable state generation,_
       *  guessed as height Level ≡ 2·maxFan . */
      template<typename RES>
      struct Adaptor<RES(size_t,double)>
        {
          template<typename FUN>
          static auto
          build (FUN&& fun)
            {
              return [functor=std::forward<FUN>(fun)]
                     (Node* node) -> _FunRet<FUN>
                        {
                          return functor (defaultSrc (node)
                                         ,guessHeight(level(node)));
                        };
            }
        };
      
      /** rules may also build solely on the (guessed) height. */
      template<typename RES>
      struct Adaptor<RES(double)>
        {
          template<typename FUN>
          static auto
          build (FUN&& fun)
            {
              return [functor=std::forward<FUN>(fun)]
                     (Node* node) -> _FunRet<FUN>
                        {
                          return functor (guessHeight(level(node)));
                        };
            }
        };
    };
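  /* Illustrative sketch (an assumption for this documentation, not part of the
   * original header): the three Adaptor specialisations above let »Ctrl rules«
   * be spelled out with λs of signature RES(size_t), RES(size_t,double) or
   * RES(double) — drawing from the node hash, hash plus guessed height, or the
   * height alone. The builder function `mapping(…)` used here is hypothetical;
   * the actual RandomDraw builder-DSL may name it differently.
   *
   *     rule().mapping([](size_t hash)          { return hash % maxFan; });
   *     rule().mapping([](size_t hash, double h){ return h < 1.0? hash % maxFan : 0; });
   *     rule().mapping([](double height)        { return height > 0.5? 1 : 0; });
   */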
  
  
  
  
  /* ========= Graph Statistics Evaluation ========= */
  
  struct StatKey
    : std::pair<size_t,string>
    {
      using std::pair<size_t,string>::pair;
      operator size_t const&() const { return this->first; }
      operator string const&() const { return this->second;}
    };
  const StatKey STAT_NODE{0,"node"};  ///< all nodes
  const StatKey STAT_SEED{1,"seed"};  ///< seed node
  const StatKey STAT_EXIT{2,"exit"};  ///< exit node
  const StatKey STAT_INNR{3,"innr"};  ///< inner node
  const StatKey STAT_FORK{4,"fork"};  ///< forking node
  const StatKey STAT_JOIN{5,"join"};  ///< joining node
  const StatKey STAT_LINK{6,"link"};  ///< 1:1 linking node
  const StatKey STAT_KNOT{7,"knot"};  ///< knot (joins and forks)
  const StatKey STAT_WGHT{8,"wght"};  ///< node weight
  
  const std::array KEYS = {STAT_NODE,STAT_SEED,STAT_EXIT,STAT_INNR,STAT_FORK,STAT_JOIN,STAT_LINK,STAT_KNOT,STAT_WGHT};
  const uint CAT = KEYS.size();
  const uint IDX_SEED = 1;  // index of STAT_SEED
  
  namespace {
    template<class NOD>
    inline auto
    prepareEvaluations()
    {
      return std::array<std::function<uint(NOD&)>, CAT>
        { [](NOD&  ){ return 1;         }
        , [](NOD& n){ return isStart(n);}
        , [](NOD& n){ return isExit(n); }
        , [](NOD& n){ return isInner(n);}
        , [](NOD& n){ return isFork(n); }
        , [](NOD& n){ return isJoin(n); }
        , [](NOD& n){ return isLink(n); }
        , [](NOD& n){ return isKnot(n); }
        , [](NOD& n){ return n.weight;  }
        };
    }
  }
  
  using VecU = std::vector<uint>;
  using LevelSums = std::array<uint, CAT>;
  
  /**
   * Distribution indicators for one kind of evaluation.
   * Evaluations over this kind of node are collected per (time)level.
   * This data is then counted, averaged and weighted.
   */
  struct Indicator
    {
      VecU data{};
      uint cnt{0};     ///< global sum over all levels
      double frac{0};  ///< fraction of all nodes
      double pS {0};   ///< average per segment
      double pL {0};   ///< average per level
      double pLW{0};   ///< average per level and level-width
      double cL {0};   ///< weight centre level for this indicator
      double cLW{0};   ///< weight centre level width-reduced
      double sL {0};   ///< weight centre on subgraph
      double sLW{0};   ///< weight centre on subgraph width-reduced
      
      void
      addPoint (uint levelID, uint sublevelID, uint width, uint items)
        {
          REQUIRE (levelID == data.size());  // ID is zero based
          REQUIRE (width > 0);
          data.push_back (items);
          cnt += items;
          pS  += items;
          pL  += items;
          pLW += items / double(width);
          cL  += levelID * items;
          cLW += levelID * items/double(width);
          sL  += sublevelID * items;
          sLW += sublevelID * items/double(width);
        }
      
      void
      closeAverages (uint nodes, uint levels, uint segments, double avgheight)
        {
          REQUIRE (levels == data.size());
          REQUIRE (levels > 0);
          frac = cnt / double(nodes);
          cL  = pL?  cL/pL   :0;    // weighted averages: normalise to weight sum
          cLW = pLW? cLW/pLW :0;
          sL  = pL?  sL/pL   :0;
          sLW = pLW? sLW/pLW :0;
          pS  /= segments;          // simple averages : normalise to number of segments
          pL  /= levels;            // simple averages : normalise to number of levels
          pLW /= levels;
          cL  /= levels-1;          // weight centres  : as fraction of maximum level-ID
          cLW /= levels-1;
          ASSERT (avgheight >= 1.0);
          if (avgheight > 1.0)
            {                       // likewise for weight centres relative to subgraph
              sL  /= avgheight-1;   // height is 1-based, while the contribution was 0-based
              sLW /= avgheight-1;
            }
          else
            sL = sLW = 0.5;
        }
    };
  
  /**
   * Statistic data calculated for a given chain-load topology
   */
  struct Statistic
    {
      uint nodes{0};
      uint levels{0};
      uint segments{1};
      uint maxheight{0};
      double avgheight{0};
      VecU width{};
      VecU sublevel{};
      
      std::array<Indicator, CAT> indicators;
      
      explicit
      Statistic (uint lvls)
        : nodes{0}
        , levels{0}
        {
          reserve (lvls);
        }
      
      void
      addPoint (uint levelWidth, uint sublevelID, LevelSums& particulars)
        {
          levels += 1;
          nodes += levelWidth;
          width.push_back (levelWidth);
          sublevel.push_back (sublevelID);
          ASSERT (levels == width.size());
          ASSERT (0 < levels);
          ASSERT (0 < levelWidth);
          for (uint i=0; i< CAT; ++i)
            indicators[i].addPoint (levels-1, sublevelID, levelWidth, particulars[i]);
        }
      
      void
      closeAverages (uint segs, uint maxSublevelID)
        {
          segments  = segs;
          maxheight = maxSublevelID + 1;
          avgheight = levels / double(segments);
          for (uint i=0; i< CAT; ++i)
            indicators[i].closeAverages (nodes,levels,segments,avgheight);
        }
      
    private:
      void
      reserve (uint lvls)
        {
          width.reserve (lvls);
          sublevel.reserve(lvls);
          for (uint i=0; i< CAT; ++i)
            {
              indicators[i] = Indicator{};
              indicators[i].data.reserve(lvls);
            }
        }
    };
  
  
  
  /**
   * Operator on TestChainLoad to evaluate current graph connectivity.
   * In a pass over the internal storage, all nodes are classified
   * and accounted into a set of categories, thereby evaluating
   * - the overall number of nodes and levels generated
   * - the number of nodes in each level (termed _level width_)
   * - the fraction of overall nodes falling into each category
   * - the average number of category members over the levels
   * - the density of members, normalised over level width
   * - the weight centre of this category's members
   * - the weight centre according to density
   */
  template<size_t maxFan>
  inline Statistic
  TestChainLoad<maxFan>::computeGraphStatistics()
  {
    auto totalLevels = uint(topLevel());
    auto classify = prepareEvaluations<Node>();
    Statistic stat(totalLevels);
    LevelSums particulars{0};
    size_t level{0},
           sublevel{0},
           maxsublevel{0};
    size_t segs{0};
    uint width{0};
    auto detectSubgraphs = [&]{ // to be called when a level is complete
                              if (width==1 and particulars[IDX_SEED]==1)
                                { // previous level actually started a new subgraph
                                  sublevel = 0;
                                  ++segs;
                                }
                              else
                                maxsublevel = max (sublevel,maxsublevel);
                            };
    
    for (Node& node : allNodes())
      {
        if (level != node.level)
          {// Level completed...
            detectSubgraphs();
            // record statistics for previous level
            stat.addPoint (width, sublevel, particulars);
            // switch to next time-level
            ++level;
            ++sublevel;
            ENSURE (level == node.level);
            particulars = LevelSums{0};
            width = 0;
          }
        // classify and account..
        ++width;
        for (uint i=0; i<CAT; ++i)
          particulars[i] += classify[i](node);
      }
    ENSURE (level == topLevel());
    detectSubgraphs();
    stat.addPoint (width, sublevel, particulars);
    stat.closeAverages (segs, maxsublevel);
    return stat;
  }
  
  
  
  /**
   * Print a tabular summary of graph characteristics
   * @remark explanation of indicators
   *       - »node« : accounting for all nodes
   *       - »seed« : seed nodes start a new subgraph or side chain
   *       - »exit« : exit nodes produce output and have no successor
   *       - »innr« : inner nodes have both predecessors and successors
   *       - »fork« : a node linked to more than one successor
   *       - »join« : a node consuming data from more than one predecessor
   *       - »link« : a node in a linear processing chain; one input, one output
   *       - »knot« : a node which both joins data and forks out to multiple successors
   *       - »LEVL« : the overall number of distinct _time levels_ in the graph
   *       - »SEGS« : the number of completely disjoint partial subgraphs
   *       - `frac` : the percentage of overall nodes falling into this category
   *       - `∅pS`  : averaged per segment (warning: see below)
   *       - `∅pL`  : averaged per level
   *       - `∅pLW` : count normalised to the width at that level and then averaged per level
   *       - `γL◆`  : weight centre of this kind of node, relative to the overall graph
   *       - `γLW◆` : the same, but using the level-width-normalised value
   *       - `γL⬙`  : weight centre, but relative to the current subgraph or segment
   *       - `γLW⬙` : same, but using the level-width-normalised value
   * Together, these values indicate how the simulated processing load
   * is structured over time, assuming that the _»Levels« are processed consecutively_
   * in temporal order. The graph can unfold or contract over time, and thus nodes can
   * be clustered irregularly, which can be seen from the weight centres; for that
   * reason, the width-normalised variants of the indicators are also accounted for,
   * since a wider graph also implies that there are more nodes of each kind per level,
   * even while the actual density of this kind did not increase.
   * @warning no comprehensive connectivity analysis is performed, and thus there is
   *       *no reliable indication of subgraphs*. The `SEGS` statistics may be misleading,
   *       since these count only completely severed and restarted graphs.
   */
  template<size_t maxFan>
  inline TestChainLoad<maxFan>&&
  TestChainLoad<maxFan>::printTopologyStatistics()
  {
    cout << "INDI: cnt frac  ∅pS  ∅pL ∅pLW  γL◆ γLW◆  γL⬙ γLW⬙\n";
    _Fmt line{"%4s: %3d %3.0f%% %5.1f %5.2f %4.2f %4.2f %4.2f %4.2f %4.2f\n"};
    Statistic stat = computeGraphStatistics();
    for (uint i=0; i< CAT; ++i)
      {
        Indicator& indi = stat.indicators[i];
        cout << line % KEYS[i]
                     % indi.cnt
                     % (indi.frac*100)
                     % indi.pS
                     % indi.pL
                     % indi.pLW
                     % indi.cL
                     % indi.cLW
                     % indi.sL
                     % indi.sLW
                     ;
      }
    cout << _Fmt{"LEVL: %3d\n"} % stat.levels;
    cout << _Fmt{"SEGS: %3d  h = ∅%3.1f / max.%2d\n"}
                % stat.segments
                % stat.avgheight
                % stat.maxheight;
    cout << "───═══───═══───═══───═══───═══───═══───═══───═══───═══───═══───"
         << endl;
    return move(*this);
  }
  
  
  
  
  
  /* ========= Configurable Computational Load ========= */
  
  /**
   * A calibratable CPU load to be invoked from a node job functor.
   * Two distinct methods for load generation are provided
   * - tight loop with arithmetics in registers
   * - repeatedly accessing and adding memory marked as `volatile`
   * The `timeBase` multiplied with the given `scaleStep` determines
   * the actual run time. When using the _memory method_ (`useAllocation`),
   * a heap block of `scaleStep*sizeBase` is used, and the number of
   * repetitions is chosen such as to match the given timing goal.
   * @note since performance depends on the platform, it is mandatory
   *       to invoke the [self calibration](\ref ComputationalLoad::calibrate)
   *       at least once prior to use. Performing the calibration with default
   *       base settings is acceptable, since mostly the overall expense grows
   *       linearly; obviously the calibration is more precise, however, when
   *       using the actual `timeBase` and `sizeBase` of the intended usage.
   *       The calibration watches processing speed in a microbenchmark with
   *       `LOAD_BENCHMARK_RUNS` repetitions; the result is stored in a
   *       static variable and can thus be reused.
   */
  class ComputationalLoad
    : util::MoveAssign
    {
      using Sink = volatile size_t;
      
      static double&
      computationSpeed (bool mem)  ///< in iterations/µs
        {
          static double cpuSpeed{LOAD_SPEED_BASELINE};
          static double memSpeed{LOAD_SPEED_BASELINE};
          return mem? memSpeed : cpuSpeed;
        }
      
    public:
      microseconds timeBase = LOAD_DEFAULT_TIME;
      size_t sizeBase = LOAD_DEFAULT_MEM_SIZE;
      bool useAllocation = false;
      
      /** cause a delay by computational load */
      double
      invoke (uint scaleStep =1)
        {
          if (scaleStep == 0 or timeBase < 1us)
            return 0;  // disabled
          return useAllocation? benchmarkTime ([this,scaleStep]{ causeMemProcessLoad (scaleStep); })
                              : benchmarkTime ([this,scaleStep]{ causeComputationLoad(scaleStep); });
        }
      
      /** @return averaged runtime in current configuration */
      double
      benchmark (uint scaleStep =1)
        {
          return microBenchmark ([&]{ invoke(scaleStep);}
                                ,LOAD_BENCHMARK_RUNS)
                    .first;  // ∅ runtime in µs
        }
      
      void
      calibrate()
        {
          TRANSIENTLY(useAllocation) = false;
          performIncrementalCalibration();
          useAllocation = true;
          performIncrementalCalibration();
        }
      
      void
      maybeCalibrate()
        {
          if (not isCalibrated())
            calibrate();
        }
      
      bool
      isCalibrated()  const
        {
          return computationSpeed(false) != LOAD_SPEED_BASELINE;
        }
      
      
    private:
      uint64_t
      roundsNeeded (uint scaleStep)
        {
          auto desiredMicros = scaleStep*timeBase.count();
          return uint64_t(desiredMicros*computationSpeed(useAllocation));
        }
      
      auto
      allocNeeded (uint scaleStep)
        {
          auto cnt = roundsNeeded(scaleStep);
          auto siz = max (scaleStep * sizeBase, 1u);
          auto rep = max (cnt/siz, 1u);
          // increase size to fit
          siz = cnt / rep;
          return make_pair (siz,rep);
        }
      
      void
      causeComputationLoad (uint scaleStep)
        {
          auto round = roundsNeeded (scaleStep);
          Sink sink;
          size_t scree{sink};
          for ( ; 0 < round; --round)
            lib::hash::combine (scree,scree);
          sink = scree;
          sink = sink + 1;
        }
      
      void
      causeMemProcessLoad (uint scaleStep)
        {
          auto [siz,round] = allocNeeded (scaleStep);
          lib::UninitialisedDynBlock<size_t> memBlock{siz};
          Sink sink;
          *memBlock.front() = sink+1;
          for ( ; 0 < round; --round)
            for (size_t i=0; i<memBlock.size()-1; ++i)
              memBlock[i+1] += memBlock[i];
          sink = *memBlock.back();
          sink = sink + 1;
        }
      
      double
      determineSpeed()
        {
          uint step4gauge = 1;
          double micros = benchmark (step4gauge);
          auto stepsDone = roundsNeeded (step4gauge);
          return stepsDone / micros;
        }
      
      void
      performIncrementalCalibration()
        {
          double& speed = computationSpeed(useAllocation);
          double prev{speed},delta;
          do {
              speed = determineSpeed();
              delta = abs(1.0 - speed/prev);
              prev = speed;
            }
          while (delta > 0.05);
        }
    };
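  /* Usage sketch for ComputationalLoad (illustrative only; the concrete
   * numbers are assumptions, but all members used exist in the class above):
   *
   *     ComputationalLoad load;
   *     load.timeBase = 500us;           // target delay per weight step
   *     load.useAllocation = true;       // select the memory-access method
   *     load.maybeCalibrate();           // one-time calibration per platform
   *     double micros = load.invoke(4);  // burn ≈ 4 × 500µs, return measured µs
   */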
  
  
  /** a »throw-away« render-job */
  inline SpecialJobFun
  onetimeCrunch (milliseconds runTime)
  {   // ensure calibration prior to use
    ComputationalLoad().maybeCalibrate();
    //
    return SpecialJobFun{
              [runTime](JobParameter) -> void
                {
                  ComputationalLoad crunch;
                  crunch.timeBase = runTime;
                  crunch.invoke();
                }};
  }
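  /* Usage sketch (illustrative): build a disposable job functor which burns
   * ≈5ms of calibrated CPU load when the render job is invoked; the exact way
   * the functor is handed to a job is an assumption for illustration.
   *
   *     SpecialJobFun crunch = onetimeCrunch (5ms);
   *     // ...pass `crunch` as JobFunctor when defining a test render job
   */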
  
  
  
  
  /**
   * @param timeBase time delay produced by ComputationalLoad at `Node.weight==1`;
   *        can be set to zero to disable the synthetic processing load on nodes
   * @param sizeBase allocation base size used; also causes a switch to memory-access based load
   * @see TestChainLoad::calcRuntimeReference() for a benchmark based on this processing
   */
  template<size_t maxFan>
  TestChainLoad<maxFan>&&
  TestChainLoad<maxFan>::performGraphSynchronously (microseconds timeBase, size_t sizeBase)
  {
    ComputationalLoad compuLoad;
    compuLoad.timeBase = timeBase;
    if (not sizeBase)
      {
        compuLoad.sizeBase = LOAD_DEFAULT_MEM_SIZE;
        compuLoad.useAllocation = false;
      }
    else
      {
        compuLoad.sizeBase = sizeBase;
        compuLoad.useAllocation = true;
      }
    compuLoad.maybeCalibrate();
    
    size_t seed = this->getSeed();
    for (Node& n : allNodes())
      {
        n.hash = isStart(n)? seed : 0;
        if (n.weight)
          compuLoad.invoke (n.weight);
        n.calculate();
      }
    return move(*this);
  }
  
  
  
  
  
  /* ========= Render Job generation and Scheduling ========= */
  
  /**
   * Baseclass: JobFunctor to invoke TestChainLoad
   */
  class ChainFunctor
    : public JobClosure
    {
      static lib::time::Grid&
      testGrid()  ///< Meyer's Singleton : a fixed 1-f/s quantiser
        {
          static lib::time::FixedFrameQuantiser gridOne{FrameRate::STEP};
          return gridOne;
        }
      
      /* === JobFunctor Interface === */
      
      string diagnostic()  const             =0;
      void invokeJobOperation (JobParameter) =0;
      
      JobKind
      getJobKind()  const
        {
          return TEST_JOB;
        }
      
      InvocationInstanceID
      buildInstanceID (HashVal)  const override
        {
          return InvocationInstanceID();
        }
      
      size_t
      hashOfInstance (InvocationInstanceID invoKey)  const override
        {
          std::hash<size_t> hashr;
          HashVal res = hashr (invoKey.frameNumber);
          return res;
        }
      
    public:
      /** package the node-index to invoke.
       * @note per convention for this test, this info will be
       *       packaged into the lower word of the InvocationInstanceID
       */
      static InvocationInstanceID
      encodeNodeID (size_t idx)
        {
          InvocationInstanceID invoKey;
          invoKey.code.w1 = idx;
          return invoKey;
        };
      
      static size_t
      decodeNodeID (InvocationInstanceID invoKey)
        {
          return size_t(invoKey.code.w1);
        };
      
      static Time
      encodeLevel (size_t level)
        {
          return Time{testGrid().timeOf (FrameCnt(level))};
        }
      
      static size_t
      decodeLevel (TimeValue nominalTime)
        {
          return testGrid().gridPoint (nominalTime);
        }
    };
  
  
  
  /**
   * Render JobFunctor to invoke the _calculation_ of a single Node.
   * The existing Node connectivity is used to retrieve the hash values
   * from predecessors — so these are expected to be calculated beforehand.
   * For setup, the start of the ChainLoad's Node array is required.
   * @tparam maxFan controls expected Node memory layout
   */
  template<size_t maxFan>
  class RandomChainCalcFunctor
    : public ChainFunctor
    {
      using Node = typename TestChainLoad<maxFan>::Node;
      using Watch = lib::IncidenceCount;
      
      Node* startNode_;
      ComputationalLoad* compuLoad_;
      Watch* watch_;
      
    public:
      RandomChainCalcFunctor(Node& startNode, ComputationalLoad* load =nullptr, Watch* watch =nullptr)
        : startNode_{&startNode}
        , compuLoad_{load}
        , watch_{watch}
        { }
      
      
      /** render job invocation to trigger one Node recalculation */
      void
      invokeJobOperation (JobParameter param)  override
        {
          if (watch_) watch_->markEnter();
          size_t nodeIdx = decodeNodeID (param.invoKey);
          size_t level = decodeLevel (TimeValue{param.nominalTime});
          Node& target = startNode_[nodeIdx];
          ASSERT (target.level == level);
          // invoke the »media calculation«
          if (compuLoad_ and target.weight)
            compuLoad_->invoke (target.weight);
          target.calculate();
          if (watch_) watch_->markLeave();
        }
      
      string diagnostic()  const override
        {
          return _Fmt{"ChainCalc(w:%d)◀%s"}
                     % maxFan
                     % util::showAdr(startNode_);
        }
    };
  
  
  /**
   * Render JobFunctor to perform chunk-wise planning of Node jobs
   * to calculate a complete Chain-Load graph step by step.
   */
  template<size_t maxFan>
  class RandomChainPlanFunctor
    : public ChainFunctor
    {
      using Node = typename TestChainLoad<maxFan>::Node;
      
      function<void(size_t,size_t)> scheduleCalcJob_;
      function<void(Node*,Node*)> markDependency_;
      function<void(size_t,size_t,size_t,bool)> continuation_;
      
      size_t maxCnt_;
      Node* nodes_;
      
      size_t currIdx_{0};  // Note: this test-JobFunctor is stateful
      
    public:
      template<class CAL, class DEP, class CON>
      RandomChainPlanFunctor(Node& nodeArray, size_t nodeCnt,
                             CAL&& schedule, DEP&& markDepend,
                             CON&& continuation)
        : scheduleCalcJob_{forward<CAL> (schedule)}
        , markDependency_{forward<DEP> (markDepend)}
        , continuation_{forward<CON> (continuation)}
        , maxCnt_{nodeCnt}
        , nodes_{&nodeArray}
        { }
      
      
      /** render job invocation to trigger one batch of scheduling;
       *  the installed callback-λ should actually place a job with
       *  RandomChainCalcFunctor for each node, and also inform the
       *  Scheduler about dependency relations between jobs. */
      void
      invokeJobOperation (JobParameter param)  override
        {
          size_t start{currIdx_};
          size_t reachedLevel{0};
          size_t targetNodeIDX = decodeNodeID (param.invoKey);
          for ( ; currIdx_<maxCnt_; ++currIdx_)
            {
              Node* n = &nodes_[currIdx_];
              if (currIdx_ <= targetNodeIDX)
                reachedLevel = n->level;
              else  // continue until end of current level
                if (n->level > reachedLevel)
                  break;
              scheduleCalcJob_(currIdx_, n->level);
              for (Node* pred: n->pred)
                markDependency_(pred,n);
            }
          ENSURE (currIdx_ > 0);
          continuation_(start, currIdx_-1, reachedLevel, currIdx_ < maxCnt_);
        }
      
      
      string diagnostic()  const override
        {
          return "ChainPlan";
        }
    };
  
  
  
  
  /**
   * Setup and wiring for a test run to schedule a computation structure
   * as defined by this TestChainLoad instance. This context is linked to
   * a concrete TestChainLoad and Scheduler instance and holds a memory block
   * with the actual schedules, which are dispatched in batches into the Scheduler.
   * It is *crucial* to keep this object *alive during the complete test run*,
   * which is achieved by a blocking wait on the callback triggered after
   * dispatching the last batch of calculation jobs. This process itself
   * is meant for test usage and is not thread-safe (while obviously the
   * actual scheduling and processing happens in the worker threads).
   * Yet the instance can be re-used to dispatch further test runs.
   */
  template<size_t maxFan>
  class TestChainLoad<maxFan>::ScheduleCtx
    : util::MoveOnly
    {
      TestChainLoad& chainLoad_;
      Scheduler& scheduler_;
      
      lib::UninitialisedDynBlock<ScheduleSpec> schedule_;
      
      FrameRate levelSpeed_{1, SCHEDULE_LEVEL_STEP};
      FrameRate planSpeed_{1, SCHEDULE_PLAN_STEP};
      TimeVar nodeExpense_{SCHEDULE_NODE_STEP};
      double stressFac_{1.0};
      bool schedNotify_{SCHED_NOTIFY};
      bool schedDepends_{SCHED_DEPENDS};
      uint blockLoadFac_{2};
      size_t chunkSize_{DEFAULT_CHUNKSIZE};
      TimeVar startTime_{Time::ANYTIME};
      microseconds deadline_{STANDARD_DEADLINE};
      microseconds preRoll_{guessPlanningPreroll()};
      ManifestationID manID_{};
      
      std::vector<TimeVar> startTimes_{};
      std::promise<void> signalDone_{};
      
      std::unique_ptr<ComputationalLoad> compuLoad_;
      std::unique_ptr<RandomChainCalcFunctor<maxFan>> calcFunctor_;
      std::unique_ptr<RandomChainPlanFunctor<maxFan>> planFunctor_;
      
      std::unique_ptr<lib::IncidenceCount> watchInvocations_;
      
      
      /* ==== Callbacks from job planning ==== */
      
      /** Callback: place a single job into the scheduler */
      void
      disposeStep (size_t idx, size_t level)
        {
          schedule_[idx] = scheduler_.defineSchedule(calcJob (idx,level))
                                     .manifestation(manID_)
                                     .startTime (jobStartTime(level, idx))
                                     .lifeWindow (deadline_);
          Node& n = chainLoad_.nodes_[idx];
          if (isnil (n.pred)
              or schedDepends_)
            schedule_[idx].post();
        }// Nodes with dependencies will be triggered by NOTIFY
         // and thus need not necessarily be scheduled explicitly.
      
      /** Callback: define a dependency between scheduled jobs */
      void
      setDependency (Node* pred, Node* succ)
        {
          size_t predIdx = chainLoad_.nodeID (pred);
          size_t succIdx = chainLoad_.nodeID (succ);
          bool unlimitedTime = not schedNotify_;
          schedule_[predIdx].linkToSuccessor (schedule_[succIdx], unlimitedTime);
        }
      
      /** continue planning: schedule a follow-up planning job */
      void
      continuation (size_t chunkStart, size_t lastNodeIDX, size_t levelDone, bool work_left)
        {
          if (work_left)
            {
              size_t nextChunkEndNode = calcNextChunkEnd (lastNodeIDX);
              scheduler_.continueMetaJob (calcPlanScheduleTime (lastNodeIDX+1)
                                         ,planningJob (nextChunkEndNode)
                                         ,manID_);
            }
          else
            {
              auto wakeUp = scheduler_.defineSchedule(wakeUpJob())
                                      .manifestation (manID_)
                                      .startTime(jobStartTime (levelDone+1, lastNodeIDX+1) + SCHEDULE_WAKE_UP)
                                      .lifeWindow(SAFETY_TIMEOUT)
                                      .post();
              for (size_t exitIDX : lastExitNodes (chunkStart))
                wakeUp.linkToPredecessor (schedule_[exitIDX]);
            }  // setup wait-dependency on last computations
        }
      
      
      std::future<void>
      performRun()
        {
          auto finished = attachNewCompletionSignal();
          size_t numNodes = chainLoad_.size();
          size_t firstChunkEndNode = calcNextChunkEnd(0);
          schedule_.allocate (numNodes);
          compuLoad_->maybeCalibrate();
          calcFunctor_.reset (new RandomChainCalcFunctor<maxFan>{chainLoad_.nodes_[0], compuLoad_.get(), watchInvocations_.get()});
          planFunctor_.reset (new RandomChainPlanFunctor<maxFan>{chainLoad_.nodes_[0], chainLoad_.numNodes_
                                                                ,[this](size_t i, size_t l){ disposeStep(i,l); }
                                                                ,[this](auto* p, auto* s) { setDependency(p,s);}
                                                                ,[this](size_t s,size_t n,size_t l, bool w)
                                                                       { continuation(s,n,l,w); }
                                                                });
          startTime_ = anchorSchedule();
          scheduler_.seedCalcStream (planningJob(firstChunkEndNode)
                                    ,manID_
                                    ,calcLoadHint());
          return finished;
        }
      
      
    public:
      ScheduleCtx (TestChainLoad& mother, Scheduler& scheduler)
        : chainLoad_{mother}
        , scheduler_{scheduler}
        , compuLoad_{new ComputationalLoad}
        , calcFunctor_{}
        , planFunctor_{}
        { }
      
      /** dispose one complete run of the graph into the scheduler
       * @return observed runtime in µs
       */
      double
      launch_and_wait()
        try {
            return benchmarkTime ([this]
                                    {
                                      awaitBlocking(
                                        performRun());
                                    })
                   - _uSec(preRoll_);  // timing accounted without pre-roll
          }
        ERROR_LOG_AND_RETHROW(test,"Scheduler testing")
      
      auto
      getScheduleSeq()
        {
          if (isnil (startTimes_))
            fillDefaultSchedule();
          
          return lib::explore(startTimes_)
                    .transform([&](Time jobTime) -> TimeVar
                                 {
                                   return jobTime - startTimes_.front();
                                 });
        }
      
      double
      getExpectedEndTime()
        {
          return _raw(startTimes_.back() - startTimes_.front()
                      + Duration{nodeExpense_}*(chainLoad_.size()/stressFac_));
        }
      
      auto
      getInvocationStatistic()
        {
          return watchInvocations_? watchInvocations_->evaluate()
                                  : lib::IncidenceCount::Statistic{};
        }
      
      double
      calcRuntimeReference()
        {
          microseconds timeBase = compuLoad_->timeBase;
          size_t sizeBase = compuLoad_->useAllocation? compuLoad_->sizeBase : 0;
          return chainLoad_.calcRuntimeReference (timeBase, sizeBase);
        }
      
      double getStressFac() { return stressFac_; }
      
      
      
      /* ===== Setters / builders for custom configuration ===== */
      
      ScheduleCtx&&
      withInstrumentation (bool doWatch =true)
        {
          if (doWatch)
            {
              watchInvocations_.reset (new lib::IncidenceCount);
              watchInvocations_->expectThreads (work::Config::COMPUTATION_CAPACITY)
                               .expectIncidents(chainLoad_.size());
            }
          else
            watchInvocations_.reset();
          return move(*this);
        }
      
      ScheduleCtx&&
      withPlanningStep (microseconds planningTime_per_node)
        {
          planSpeed_ = FrameRate{1, Duration{_uTicks(planningTime_per_node)}};
          preRoll_ = guessPlanningPreroll();
          return move(*this);
        }
      
      ScheduleCtx&&
      withChunkSize (size_t nodes_per_chunk)
        {
          chunkSize_ = nodes_per_chunk;
          preRoll_ = guessPlanningPreroll();
          return move(*this);
        }
      
      ScheduleCtx&&
      withPreRoll (microseconds planning_headstart)
        {
          preRoll_ = planning_headstart;
          return move(*this);
        }
      
      ScheduleCtx&&
      withUpfrontPlanning()
        {
          withChunkSize (chainLoad_.size());
          preRoll_ *= UPFRONT_PLANNING_BOOST;
          return move(*this);
        }
      
      ScheduleCtx&&
      withLevelDuration (microseconds fixedTime_per_level)
        {
          levelSpeed_ = FrameRate{1, Duration{_uTicks(fixedTime_per_level)}};
          return move(*this);
        }
      
      ScheduleCtx&&
      withBaseExpense (microseconds fixedTime_per_node)
        {
          nodeExpense_ = _uTicks(fixedTime_per_node);
          return move(*this);
        }
      
      ScheduleCtx&&
      withSchedDepends (bool explicitly)
        {
          schedDepends_ = explicitly;
          return move(*this);
        }
      
      ScheduleCtx&&
      withSchedNotify (bool doSetTime =true)
        {
          schedNotify_ = doSetTime;
          return move(*this);
        }
      
      /**
       * Establish a differentiated schedule per level, taking node weights into account.
       * @param stressFac   further proportional tightening of the schedule times
       * @param concurrency the nominally available concurrency, applied per level
       * @param formFac     further expenses to take into account (reducing the stressFac)
       */
      ScheduleCtx&&
      withAdaptedSchedule (double stressFac =1.0, uint concurrency=0, double formFac =1.0)
        {
          if (not concurrency)  // use hardware concurrency (#cores) by default
            concurrency = defaultConcurrency();
          ENSURE (isLimited (1u, concurrency, 3*defaultConcurrency()));
          REQUIRE (formFac > 0.0);
          stressFac /= formFac;
          withLevelDuration (compuLoad_->timeBase);
          fillAdaptedSchedule (stressFac, concurrency);
          return move(*this);
        }
|
||
|
||
double
|
||
determineEmpiricFormFactor (uint concurrency=0)
|
||
{
|
||
if (not watchInvocations_) return 1.0;
|
||
auto stat = watchInvocations_->evaluate();
|
||
if (0 == stat.activationCnt) return 1.0;
|
||
// looks like we have actual measurement data...
|
||
ENSURE (0.0 < stat.avgConcurrency);
|
||
if (not concurrency)
|
||
concurrency = defaultConcurrency();
|
||
double worktimeRatio = 1 - stat.timeAtConc(0) / stat.coveredTime;
|
||
double workConcurrency = stat.avgConcurrency / worktimeRatio;
|
||
double weightSum = chainLoad_.calcWeightSum();
|
||
double expectedCompoundedWeight = chainLoad_.calcExpectedCompoundedWeight(concurrency);
|
||
double expectedConcurrency = weightSum / expectedCompoundedWeight;
|
||
double formFac = 1 / (workConcurrency / expectedConcurrency);
|
||
double expectedNodeTime = _uSec(compuLoad_->timeBase) * weightSum / chainLoad_.size();
|
||
double realAvgNodeTime = stat.activeTime / stat.activationCnt;
|
||
formFac *= realAvgNodeTime / expectedNodeTime;
|
||
return formFac;
|
||
}
|
||
|
||
ScheduleCtx&&
|
||
withJobDeadline (microseconds deadline_after_start)
|
||
{
|
||
deadline_ = deadline_after_start;
|
||
return move(*this);
|
||
}
|
||
|
||
ScheduleCtx&&
|
||
withAnnouncedLoadFactor (uint factor_on_levelSpeed)
|
||
{
|
||
blockLoadFac_ = factor_on_levelSpeed;
|
||
return move(*this);
|
||
}
|
||
|
||
ScheduleCtx&&
|
||
withManifestation (ManifestationID manID)
|
||
{
|
||
manID_ = manID;
|
||
return move(*this);
|
||
}
|
||
|
||
ScheduleCtx&&
|
||
withLoadTimeBase (microseconds timeBase =LOAD_DEFAULT_TIME)
|
||
{
|
||
compuLoad_->timeBase = timeBase;
|
||
return move(*this);
|
||
}
|
||
|
||
ScheduleCtx&&
|
||
deactivateLoad()
|
||
{
|
||
compuLoad_->timeBase = 0us;
|
||
return move(*this);
|
||
}
|
||
|
||
ScheduleCtx&&
|
||
withLoadMem (size_t sizeBase =LOAD_DEFAULT_MEM_SIZE)
|
||
{
|
||
if (not sizeBase)
|
||
{
|
||
compuLoad_->sizeBase = LOAD_DEFAULT_MEM_SIZE;
|
||
compuLoad_->useAllocation =false;
|
||
}
|
||
else
|
||
{
|
||
compuLoad_->sizeBase = sizeBase;
|
||
compuLoad_->useAllocation =true;
|
||
}
|
||
return move(*this);
|
||
}
|
||
|
||
private:
|
||
/** push away any existing wait state and attach new clean state */
|
||
std::future<void>
|
||
attachNewCompletionSignal()
|
||
{
|
||
std::promise<void> notYetTriggered;
|
||
signalDone_.swap (notYetTriggered);
|
||
return signalDone_.get_future();
|
||
}
|
||
|
||
void
|
||
awaitBlocking(std::future<void> signal)
|
||
{
|
||
if (std::future_status::timeout == signal.wait_for (SAFETY_TIMEOUT))
|
||
throw err::Fatal("Timeout on Scheduler test exceeded.");
|
||
}

          Job
          calcJob (size_t idx, size_t level)
            {
              return Job{*calcFunctor_
                        , calcFunctor_->encodeNodeID(idx)
                        , calcFunctor_->encodeLevel(level)
                        };
            }

          Job
          planningJob (size_t endNodeIDX)
            {
              return Job{*planFunctor_
                        , planFunctor_->encodeNodeID(endNodeIDX)
                        , Time::ANYTIME
                        };
            }

          Job
          wakeUpJob()
            {
              SpecialJobFun wakeUpFun{[this](JobParameter)
                                        {
                                          signalDone_.set_value();
                                        }};
              return Job{ wakeUpFun
                        , InvocationInstanceID()
                        , Time::ANYTIME
                        };
            }

          microseconds
          guessPlanningPreroll()
            {
              return microseconds(_raw(Time{chunkSize_ / planSpeed_}));
            }

          FrameRate
          calcLoadHint()
            {
              return FrameRate{levelSpeed_ * blockLoadFac_};
            }

          size_t
          calcNextChunkEnd (size_t lastNodeIDX)
            {
              lastNodeIDX += chunkSize_;
              return min (lastNodeIDX, chainLoad_.size()-1);
            }                           // prevent out-of-bound access

          Time
          anchorSchedule()
            {
              Time anchor = RealClock::now() + _uTicks(preRoll_);
              if (isnil (startTimes_))
                fillDefaultSchedule();
              size_t numPoints = chainLoad_.topLevel()+2;
              ENSURE (startTimes_.size() == numPoints);
              Offset base{startTimes_.front(), anchor};
              for (size_t level=0; level<numPoints; ++level)
                startTimes_[level] += base;
              return anchor;
            }

          void
          fillDefaultSchedule()
            {
              size_t numPoints = chainLoad_.topLevel()+2;
              stressFac_ = 1.0;
              startTimes_.clear();
              startTimes_.reserve (numPoints);
              for (size_t level=0; level<numPoints; ++level)
                startTimes_.push_back (level / levelSpeed_);
            }

          void
          fillAdaptedSchedule (double stressFact, uint concurrency)
            {
              REQUIRE (stressFact > 0);
              stressFac_ = stressFact;
              size_t numPoints = chainLoad_.topLevel()+2;
              startTimes_.clear();
              startTimes_.reserve (numPoints);
              startTimes_.push_back (Time::ZERO);
              chainLoad_.levelScheduleSequence (concurrency)
                        .transform([&](double scheduleFact){ return (scheduleFact/stressFac_) * Offset{1,levelSpeed_}; })
                        .effuse(startTimes_);
            }

          Time
          jobStartTime (size_t level, size_t nodeIDX =0)
            {
              ENSURE (level < startTimes_.size());
              return startTimes_[level]
                   + nodeExpense_ * (nodeIDX/stressFac_);
            }

          auto
          lastExitNodes (size_t lastChunkStartIDX)
            {
              return chainLoad_.allExitNodes()
                               .transform([&](Node& n){ return chainLoad_.nodeID(n); })
                               .filter([=](size_t idx){ return idx >= lastChunkStartIDX; });
            }                           // index of all Exit-Nodes within last planning-chunk...

          Time
          calcPlanScheduleTime (size_t lastNodeIDX)
            {/* must be at least 1 level ahead,
                because dependencies are defined backwards;
                the chain-load graph only defines dependencies over one level,
                thus the first level in the next chunk must still be able to attach
                dependencies to the last row of the preceding chunk, implying that
                those still need to be ahead of schedule, and not yet dispatched.
              */
              lastNodeIDX = min (lastNodeIDX, chainLoad_.size()-1);  // prevent out-of-bound access
              size_t nextChunkLevel = chainLoad_.nodes_[lastNodeIDX].level;
              nextChunkLevel = nextChunkLevel>2? nextChunkLevel-2 : 0;
              return jobStartTime(nextChunkLevel) - _uTicks(preRoll_);
            }
        };


      /**
       * Establish and configure the context used for scheduling computations.
       * @note clears hashes and re-propagates the seed in the node graph beforehand.
       */
      template<size_t maxFan>
      typename TestChainLoad<maxFan>::ScheduleCtx
      TestChainLoad<maxFan>::setupSchedule (Scheduler& scheduler)
      {
        clearNodeHashes();
        return ScheduleCtx{*this, scheduler};
      }



}}} // namespace vault::gear::test
#endif /*VAULT_GEAR_TEST_TEST_CHAIN_LOAD_H*/