/* STRESS-TEST-RIG.hpp - setup for stress and performance investigation
   Copyright (C) 2024, Hermann Vosseler

   **Lumiera** is free software; you can redistribute it and/or modify it
   under the terms of the GNU General Public License as published by the
   Free Software Foundation; either version 2 of the License, or (at your
   option) any later version. See the file COPYING for further details.
*/


/** @file stress-test-rig.hpp
 ** A test bench to conduct performance measurement series. Outfitted especially
 ** to determine runtime behaviour of the Scheduler and associated parts of the
 ** Lumiera Engine through systematic execution of load scenarios.
 **
 ** # Scheduler Stress Testing
 **
 ** The point of departure for any stress testing is to show that the subject will
 ** break in controlled ways only. For the Scheduler this can easily be achieved by
 ** overloading it until job deadlines are broken. Much more challenging, however,
 ** is the task of determining the boundary of regular scheduler operation. This realm
 ** can be defined by the ability of the scheduler to follow and conform to the
 ** timings set out explicitly in the schedule. Obviously, short and localised
 ** load peaks can be accommodated, yet once a persistent backlog builds up,
 ** the schedule starts to slip and the calculation process will flounder.
 **
 ** A method to determine such a _»breaking point«_ in a systematic way is based on
 ** building a [synthetic calculation load](\ref test-chain-load.hpp) and establishing
 ** the timings of a test schedule based on a simplified model of expected computation
 ** expense. By scaling and condensing these schedule timings, a loss of control can
 ** be provoked and detected by statistical observation: since the process of
 ** scheduling contains an essentially random component, persistent overload will be
 ** indicated by an increasing variance of the overall runtime, and by a departure
 ** from the nominal runtime of the executed schedule.
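To illustrate this statistical criterion in isolation, the following standalone sketch (not part of this header; the struct name is illustrative) flags persistent overload when the failure rate, the spread of the deltas, and the average delta of a series of observed runtimes all exceed their limits. The thresholds mirror the defaults defined further below (`FAIL_LIMIT`, `TRIGGER_FAIL`, `TRIGGER_SDEV`, `TRIGGER_DELTA`):

```cpp
#include <cmath>
#include <vector>

// Illustrative sketch: detect persistent overload by watching how observed
// runtimes spread around the nominal (expected) schedule time.
struct OverloadCheck
  {
    double FAIL_LIMIT    = 2.0;   // ms : delta beyond which a single run counts as failed
    double TRIGGER_FAIL  = 0.55;  // criterion-1: fraction of failed runs
    double TRIGGER_SDEV  = 2.0;   // ms : criterion-2: spread of the deltas
    double TRIGGER_DELTA = 4.0;   // ms : criterion-3: average delta

    bool
    isOverloaded (std::vector<double> const& runTimes, double nominalTime)
      {
        double n = runTimes.size();
        double failRate=0, sumDelta=0, sumSqr=0;
        for (double t : runTimes)
          {
            double delta = t - nominalTime;
            sumDelta += delta;
            sumSqr   += delta*delta;
            if (delta > FAIL_LIMIT) failRate += 1;
          }
        double avgDelta = sumDelta / n;
        double spread   = std::sqrt (sumSqr / n);   // RMS of the delta, as used by the rig
        failRate /= n;
        return failRate > TRIGGER_FAIL
           and spread   > TRIGGER_SDEV
           and avgDelta > TRIGGER_DELTA;
      }
  };
```

In regular operation the slip stays well below the limits, so the predicate only trips once all three symptoms coincide, which is what makes the detection reproducible.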
 **
 ** Another, complementary observation method is to inject a defined and homogeneous
 ** load peak into the scheduler and then watch the time it takes to process, the
 ** processing overhead and the achieved degree of concurrency. The actual observation
 ** using this measurement setup attempts to establish a single _control parameter_
 ** as free variable, allowing one to look for correlations and to build a linear
 ** regression model to characterise a supposed functional dependency. Simply put,
 ** given a number of fixed-size jobs (not further correlated) as input, this
 ** approach yields a »number of jobs per time unit« and a »socket overhead« —
 ** thereby distilling a _behaviour model_ to describe the actual stochastic data.
 **
 ** ## Setup
 ** To perform this test scheme, an operational Scheduler is required, and an instance
 ** of the TestChainLoad must be provided, configured with the desired load properties.
 ** Moreover, the actual measurement setup requires performing several test executions,
 ** controlling some parameters in accordance with the observation scheme. The control
 ** parameters and the specifics of the actual setup should be clearly visible, while
 ** hiding the complexities of measurement execution.
 **
 ** This can be achieved by a »Toolbench«, which is a framework with building blocks,
 ** providing a pre-arranged _measurement rig_ for the various kinds of measurement setup.
 ** The implementation code is arranged as a »sandwich« structure...
 ** - StressTestRig, which is also the framework class, acts as _bottom layer_ to
 **   provide an anchor point and some common definitions implying an invocation scheme
 **   + first a TestChainLoad topology is constructed, based on test parameters
 **   + this is used to create a TestChainLoad::ScheduleCtx, which is then
 **     outfitted specifically for each test run
 ** - the _middle layer_ is a custom `Setup` class, which inherits from the bottom
 **   layer and fills in the actual topology and configuration for the desired test
 ** - the test performance is then initiated by layering a specific _test tool_ on
 **   top of the compound, which in turn picks up the parametrisation from the Setup
 **   and the base configuration, visible as base class (template param) \a CONF
 **
 ** Together, this leads to the following code scheme, which aims to simplify experimentation:
 ** \code
 ** using StressRig = StressTestRig<16>;
 **
 ** struct Setup : StressRig
 **   {
 **     uint CONCURRENCY = 4;
 **     //// more definitions
 **
 **     auto testLoad()
 **       {....define a Test-Chain-Load topology....}
 **
 **     auto testSetup (TestLoad& testLoad)
 **       { return StressRig::testSetup(testLoad)
 **                          .withLoadTimeBase(500us)
 **                          // ....more customisation here
 **       }
 **   };
 **
 ** auto result = StressRig::with<Setup>()
 **                         .perform<bench::BreakingPoint>();
 ** \endcode
 **
 ** ## Breaking Point search
 ** The bench::BreakingPoint tool typically uses a complex interwoven job plan, which is
 ** tightened until the timing breaks. The _stressFactor_ of the generated schedule will be
 ** the active parameter of this test, performing a _binary search_ for the _breaking point._
 ** The measurement attempts to narrow down to the point of massive failure, when the ability
 ** to somehow cope with the schedule breaks down completely.
 ** Based on watching the Scheduler in operation, the detection was linked to three conditions,
 ** which typically will be triggered together, and within a narrow and reproducible parameter range:
 ** - an individual run counts as _accidentally failed_ when the execution slips
 **   away by more than 2ms with respect to the defined overall schedule. When more
 **   than 55% of all observed runs are considered failed, the first condition is met
 ** - moreover, the observed ''standard deviation'' must also surpass the same limit
 **   of > 2ms, which indicates that the Scheduling mechanism is under substantial
 **   strain; in regular operation, the slip is rather ~ 200µs.
 ** - the third condition is that the ''averaged delta'' has surpassed 4ms,
 **   which is 2 times the basic failure indicator.
 **
 ** ## Parameter Correlation
 ** As a complement, the bench::ParameterRange tool is provided to run a specific Scheduler setup
 ** while varying a single control parameter within defined limits. This produces a set of (x,y) data,
 ** which can be used to search for correlations or to build a linear regression model describing the
 ** Scheduler's behaviour as a function of the control parameter. The typical use case would be to use
 ** the input length (number of Jobs) as control parameter, leading to a model for Scheduling expense.
 **
 ** ## Observation tools
 ** The TestChainLoad, together with its helpers and framework, already offers some tools to visualise
 ** the generated topology, to calculate statistics, and to watch a performance run with instrumentation.
 ** In addition, the individual tools provide some debugging output to observe the measurement scheme.
 ** Result data is either a tuple of values (in the case of bench::BreakingPoint), or a table of result
 ** data as function of the control parameter (for bench::ParameterRange). Result data, when converted
 ** to CSV, can be visualised as a Gnuplot diagram.
 **
 ** @see TestChainLoad_test
 ** @see SchedulerStress_test
 ** @see binary-search.hpp
 ** @see gnuplot-gen.hpp
 */


#ifndef VAULT_GEAR_TEST_STRESS_TEST_RIG_H
#define VAULT_GEAR_TEST_STRESS_TEST_RIG_H


#include "test-chain-load.hpp"
#include "lib/binary-search.hpp"
#include "lib/test/transiently.hpp"
#include "vault/gear/scheduler.hpp"
#include "lib/time/timevalue.hpp"
#include "lib/meta/function.hpp"
#include "lib/format-string.hpp"
#include "lib/format-cout.hpp"
#include "lib/gnuplot-gen.hpp"
#include "lib/stat/statistic.hpp"
#include "lib/stat/data.hpp"
#include "lib/random.hpp"
#include "lib/util.hpp"

#include <array>
#include <string>
#include <tuple>
#include <utility>
#include <vector>


namespace vault{
namespace gear {
namespace test {

  using std::make_tuple;
  using std::forward;


  /**
   * Configurable template framework for running Scheduler Stress tests.
   * Use it to build a custom setup class, which is then [injected](\ref StressTestRig::with)
   * to [perform](\ref StressTestRig::Launcher::perform) a _specific measurement tool_.
   * Several tools and detailed customisations are available in `namespace bench`
   * - bench::BreakingPoint conducts a binary search to _break a schedule_
   * - bench::ParameterRange performs a randomised series of parametrised test runs
   */
  template<size_t maxFan>
  class StressTestRig
    : util::NonCopyable
    {
    public:
      using TestLoad  = TestChainLoad<maxFan>;
      using TestSetup = typename TestLoad::ScheduleCtx;

      /***********************************************************************//**
       * Entrance Point: build a stress test measurement setup using a dedicated
       * \a TOOL class; takes the configuration \a CONF as template parameter,
       * which is assumed to inherit (indirectly) from StressRig.
       * @tparam CONF specialised subclass of StressRig with customisation
       * @return a builder to configure and then launch the actual test
       */
      template<class CONF>
      static auto
      with()
        {
          return Launcher<CONF>{};
        }


      /* ======= default configuration (inherited) ======= */

      uint CONCURRENCY = work::Config::getDefaultComputationCapacity();
      bool INSTRUMENTATION = true;
      double EPSILON       = 0.01;         ///< error bound to abort binary search
      double UPPER_STRESS  = 1.7;          ///< starting point for the upper limit, likely to fail
      double FAIL_LIMIT    = 2.0;          ///< delta-limit when to count a run as failure
      double TRIGGER_FAIL  = 0.55;         ///< %-fact: criterion-1 failures above this rate
      double TRIGGER_SDEV  = FAIL_LIMIT;   ///< in ms : criterion-2 standard deviation
      double TRIGGER_DELTA = 2*FAIL_LIMIT; ///< in ms : criterion-3 average delta above this limit
      bool showRuns = false;               ///< print a line for each individual run
      bool showStep = true;                ///< print a line for each binary search step
      bool showRes  = true;                ///< print result data
      bool showRef  = true;                ///< calculate single threaded reference time

      static constexpr uint REPETITIONS{20};

      BlockFlowAlloc bFlow{};
      EngineObserver watch{};
      Scheduler scheduler{bFlow, watch};

    protected:
      /** Extension point: build the computation topology for this test */
      auto
      testLoad (size_t nodes =64)
        {
          return TestLoad{nodes};
        }

      /** (optional) extension point: base configuration of the test ScheduleCtx
       * @warning the actual setup \a CONF is layered, beware of shadowing.
       */
      auto
      testSetup (TestLoad& testLoad)
        {
          return testLoad.setupSchedule(scheduler)
                         .withLevelDuration(200us)
                         .withJobDeadline(500ms)
                         .withUpfrontPlanning();
        }

      template<class CONF>
      struct Launcher
        : CONF
        {
          template<template<class> class TOOL, typename...ARGS>
          auto
          perform (ARGS&& ...args)
            {
              return TOOL<CONF>{}.perform (std::forward<ARGS> (args)...);
            }
        };
    };



  namespace bench { ///< Specialised tools to investigate scheduler performance

    using util::_Fmt;
    using util::min;
    using util::max;
    using std::vector;
    using std::declval;


    /**************************************************//**
     * Specific stress test scheme to determine the
     * »breaking point« where the Scheduler starts to slip
     */
    template<class CONF>
    class BreakingPoint
      : public CONF
      {
        using TestLoad  = typename CONF::TestLoad;
        using TestSetup = typename TestLoad::ScheduleCtx;

        struct Res
          {
            double stressFac{0};
            double percentOff{0};
            double stdDev{0};
            double avgDelta{0};
            double avgTime{0};
            double expTime{0};
          };

        double adjustmentFac{1.0};   ///< adjustment applied to adapt the schedule

        /** prepare the ScheduleCtx for a specifically parametrised test series */
        void
        configureTest (TestSetup& testSetup, double stressFac)
          {
            testSetup.withInstrumentation(CONF::INSTRUMENTATION)  // side-effect: clear existing statistics
                     .withAdaptedSchedule(stressFac, CONF::CONCURRENCY, adjustmentFac);
          }

        /** perform a repetition of test runs and compute statistics */
        Res
        runProbes (TestSetup& testSetup, double stressFac)
          {
            auto sqr = [](auto n){ return n*n; };
            Res res;
            auto& [sf,pf,sdev,avgD,avgT,expT] = res;
            sf = stressFac;
            std::array<double, CONF::REPETITIONS> runTime;
            for (uint i=0; i<CONF::REPETITIONS; ++i)
              {
                runTime[i] = testSetup.launch_and_wait() / 1000;
                avgT += runTime[i];
              }
            avgT /= CONF::REPETITIONS;
            expT  = testSetup.getExpectedEndTime() / 1000;
            avgD  = avgT - expT;
            for (uint i=0; i<CONF::REPETITIONS; ++i)
              {
                double delta = runTime[i] - expT;
                sdev += sqr(delta);
                bool fail = (delta > CONF::FAIL_LIMIT);
                if (fail) ++ pf;
                showRun(i, delta, runTime[i], runTime[i] > avgT, fail);
              }
            pf /= CONF::REPETITIONS;
            sdev = sqrt (sdev/CONF::REPETITIONS);
            showStep(res);
            return res;
          }

        /** criterion to decide if this test series constitutes a slipped schedule */
        bool
        decideBreakPoint (Res& res)
          {
            return res.percentOff > 0.99
                or (res.percentOff > CONF::TRIGGER_FAIL
                    and res.stdDev   > CONF::TRIGGER_SDEV
                    and res.avgDelta > CONF::TRIGGER_DELTA);
          }

        /**
         * invoke a binary search to produce a sequence of test series,
         * with the goal of narrowing down the stress factor where the Schedule slips away.
         */
        template<class FUN>
        Res
        conductBinarySearch (FUN&& runTestCase, vector<Res> const& results)
          {
            double breakPoint = lib::binarySearch_upper (forward<FUN> (runTestCase)
                                                        , 0.0, CONF::UPPER_STRESS
                                                        , CONF::EPSILON);
            uint s = results.size();
            ENSURE (s >= 2);
            Res res;
            auto& [sf,pf,sdev,avgD,avgT,expT] = res;
            // average data over the last three steps investigated for smoothing
            uint points = min (results.size(), 3u);
            for (uint i=results.size()-points; i<s; ++i)
              {
                Res const& step = results[i];
                pf   += step.percentOff;
                sdev += step.stdDev;
                avgD += step.avgDelta;
                avgT += step.avgTime;
                expT += step.expTime;
              }
            pf   /= points;
            sdev /= points;
            avgD /= points;
            avgT /= points;
            expT /= points;
            sf = breakPoint;
            return res;
          }


        _Fmt fmtRun_ {"%3d|  Δ=%6.2f  t=%6.2f %s %s"};                                       // idx % ∅Δ % t % >avg? % fail?
        _Fmt fmtStep_{"%4.2f| : ∅Δ=%4.1f±%-4.2f ∅t=%4.1f %s %%%-3.0f -- expect:%4.1fms"};    // stress % ∅Δ % σ % ∅t % fail % percentOff % t-expect
        _Fmt fmtResSDv_{"%9s= %5.2f ±%4.2f%s"};
        _Fmt fmtResVal_{"%9s: %5.2f%s"};

        void
        showRun (uint i, double delta, double t, bool over, bool fail)
          {
            if (CONF::showRuns)
              cout << fmtRun_ % i % delta % t % (over? "+":"-") % (fail? "●":"○") << endl;
          }

        void
        showStep (Res& res)
          {
            if (CONF::showStep)
              cout << fmtStep_ % res.stressFac % res.avgDelta % res.stdDev % res.avgTime
                               % (decideBreakPoint(res)?
                                  "—◆—":"—◇—")
                               % (100*res.percentOff)
                               % res.expTime
                   << endl;
          }

        void
        showRes (Res& res)
          {
            if (CONF::showRes)
              {
                cout << fmtResVal_ % "stressFac" % res.stressFac % ""                << endl;
                cout << fmtResSDv_ % "∅delta"    % res.avgDelta  % res.stdDev % "ms" << endl;
                cout << fmtResVal_ % "∅time"     % res.avgTime   % "ms"              << endl;
                cout << fmtResVal_ % "expected"  % res.expTime   % "ms"              << endl;
              }
          }

        void
        showRef (TestSetup& testSetup)
          {
            if (CONF::showRef)
              cout << fmtResVal_ % "refTime"
                                 % (testSetup.calcRuntimeReference() / 1000)
                                 % "ms" << endl;
          }


      public:
        /** perform the complete »breaking point« search scheme
         * @return a tuple `[stress-factor, ∅delta, ∅run-time]` */
        auto
        perform()
          {
            TRANSIENTLY(work::Config::COMPUTATION_CAPACITY) = CONF::CONCURRENCY;
            TestLoad testLoad = CONF::testLoad().buildTopology();
            TestSetup testSetup = CONF::testSetup (testLoad);
            vector<Res> observations;
            auto performEvaluation = [&](double stressFac)
                                        {
                                          configureTest (testSetup, stressFac);
                                          auto res = runProbes (testSetup, stressFac);
                                          observations.push_back (res);
                                          return decideBreakPoint(res);
                                        };
            Res res = conductBinarySearch (move(performEvaluation), observations);
            showRes (res);
            showRef (testSetup);
            return make_tuple (res.stressFac, res.avgDelta, res.avgTime);
          }
      };



    /**************************************************//**
     * Specific test scheme to perform a Scheduler setup
     * over a given control parameter range to determine
     * correlations
     */
    template<class CONF>
    class ParameterRange
      : public CONF
      {
        using TestLoad  = typename CONF::TestLoad;
        using TestSetup = typename TestLoad::ScheduleCtx;

        // Type binding for data evaluation
        using Param = typename CONF::Param;
        using Table = typename CONF::Table;

        void
        runTest (Param param, Table& data)
          {
            TestLoad testLoad = CONF::testLoad(param).buildTopology();
            TestSetup testSetup = CONF::testSetup (testLoad)
                                      .withInstrumentation();   // Note: by default Schedule with CONF::LEVEL_STEP
            double millis = testSetup.launch_and_wait() / 1000;
            auto stat = testSetup.getInvocationStatistic();
            CONF::collectResult (data, param, millis, stat);
          }

      public:
        /**
         * Launch a measurement sequence running the Scheduler with a
         * varying parameter value to investigate (x,y) correlations.
         * @return a result Table with one data row per test invocation
         */
        Table
        perform (Param lower, Param upper)
          {
            TRANSIENTLY(work::Config::COMPUTATION_CAPACITY) = CONF::CONCURRENCY;
            Param dist = upper - lower;
            uint cnt = CONF::REPETITIONS;
            vector<Param> points;
            points.reserve (cnt);
            Param minP{upper}, maxP{lower};
            for (uint i=0; i<cnt; ++i)
              {                          // draw a random parameter value within [lower..upper]
                Param point = lower + Param(dist * lib::defaultGen.uni());
                minP = min (point, minP);
                maxP = max (point, maxP);
                points.push_back (point);
              }
            // ensure the limits of the parameter range are covered
            if (maxP < upper) points[0]     = upper;
            if (minP > lower) points[cnt-1] = lower;
            Table results;
            for (Param point : points)
              runTest (point, results);
            return results;
          }
      };



    /* ====== Preconfigured ParamRange-Evaluations ====== */

    using lib::stat::Column;
    using lib::stat::DataTable;
    using lib::stat::DataSpan;
    using lib::stat::CSVData;

    using IncidenceStat = lib::IncidenceCount::Statistic;


    /**
     * Calculate a linear regression model for two table columns
     * @return a tuple `(socket,gradient,Vector(predicted),Vector(deltas),correlation,maxDelta,stdev)`
     */
    template<typename D>
    inline auto
    linearRegression (Column<D> const& x, Column<D> const& y)
      {
        lib::stat::RegressionData points;
        size_t cnt = min (x.data.size(), y.data.size());
        points.reserve (cnt);
        for (size_t i=0; i < cnt; ++i)
          points.emplace_back (x.data[i], y.data[i]);
        return lib::stat::computeLinearRegression (points);
      }


    /**
     * Mix-in for setup of a #ParameterRange evaluation to watch
     * the processing of a single load peak, using the number of
     * added jobs as independent parameter.
     * @remark inject this definition (by inheritance) into the
     *         Setup, which should then also define a TestChainLoad
     *         graph with an overall size controlled by the #Param
     * @see SchedulerStress_test#watch_expenseFunction()
     */
    struct LoadPeak_ParamRange_Evaluation
      {
        using Param = size_t;

        struct DataRow
          {
            Column<Param>  param  {"load size"};     // independent variable / control parameter
            Column<double> time   {"result time"};
            Column<double> conc   {"concurrency"};
            Column<double> jobtime{"avg jobtime"};
            Column<double> impeded{"avg impeded"};

            auto allColumns()
              { return std::tie(param
                               ,time
                               ,conc
                               ,jobtime
                               ,impeded
                               );
              }
          };
        using Table = DataTable<DataRow>;

        void
        collectResult (Table& data, Param param, double millis, bench::IncidenceStat const& stat)
          {
            (void)millis;
            data.newRow();
            data.param   = param;
            data.time    = stat.coveredTime / 1000;
            data.conc    = stat.avgConcurrency;
            data.jobtime = stat.activeTime / stat.activationCnt;
            data.impeded = (stat.timeAtConc(1) + stat.timeAtConc(0)) / stat.activationCnt;
          }

        static double
        avgConcurrency (Table const& results)
          {
            return lib::stat::average (DataSpan (results.conc.data));
          }

        static string
        renderGnuplot (Table const& results)
          {
            using namespace lib::gnuplot_gen;
            string csv = results.renderCSV();
            Param maxParam = * std::max_element (results.param.data.begin(), results.param.data.end());
            Param xtics = maxParam > 500? 50
                        : maxParam > 200? 20
                        : maxParam > 100? 10
                        :                  5;
            return scatterRegression(
                     ParamRecord().set (KEY_CSVData,  csv)
                                  .set (KEY_TermSize, "600,600")
                                  .set (KEY_Xtics,    int64_t(xtics))
                                  .set (KEY_Xlabel,   "load size ⟶ number of jobs")
                                  .set (KEY_Ylabel,   "active time ⟶ ms")
                                  .set (KEY_Y2label,  "concurrent threads ⟶")
                                  .set (KEY_Y3label,  "avg job time ⟶ µs")
                   );
          }
      };
    //
  }// namespace bench
}}}// namespace vault::gear::test
#endif /*VAULT_GEAR_TEST_STRESS_TEST_RIG_H*/