/*
STRESS-TEST-RIG.hpp - setup for stress and performance investigation
  Copyright (C)  2024,  Hermann Vosseler <Ichthyostega@web.de>

  **Lumiera** is free software; you can redistribute it and/or modify it
  under the terms of the GNU General Public License as published by the
  Free Software Foundation; either version 2 of the License, or (at your
  option) any later version. See the file COPYING for further details.
*/
/** @file stress-test-rig.hpp
** A test bench to conduct performance measurement series. Outfitted especially
** to determine runtime behaviour of the Scheduler and associated parts of the
** Lumiera Engine through systematic execution of load scenarios.
**
** # Scheduler Stress Testing
**
** The point of departure for any stress testing is to show that the subject will
** break in controlled ways only. For the Scheduler this can easily be achieved by
** overloading it until job deadlines are broken. Much more challenging, however, is
** the task of determining the boundary of regular scheduler operation. This realm
** can be defined by the ability of the scheduler to follow and conform to the
** timings set out explicitly in the schedule. Obviously, short and localised
** load peaks can be accommodated, yet once a persistent backlog builds up,
** the schedule starts to slip and the calculation process will flounder.
**
** A method to determine such a _»breaking point«_ in a systematic way is based on
** building a [synthetic calculation load](\ref test-chain-load.hpp) and establishing
** the timings of a test schedule based on a simplified model of expected computation
** expense. By scaling and condensing these schedule timings, a loss of control can
** be provoked, and determined by statistical observation: since the process of
** scheduling contains an essentially random component, persistent overload will be
** indicated by an increasing variance of the overall runtime, and a departure from
** the nominal runtime of the executed schedule.
**
** Another, complementary observation method is to inject a defined and homogeneous
** load peak into the scheduler and then watch the time it takes to process, the
** processing overhead and the achieved degree of concurrency. The actual observation
** using this measurement setup attempts to establish a single _control parameter_
** as free variable, which allows searching for correlations and building a linear
** regression model to characterise a supposed functional dependency. Simply put,
** given a number of fixed-size jobs (not further correlated) as input, this
** approach yields a »number of jobs per time unit« and a »socket overhead«,
** thereby distilling a _behaviour model_ to describe the actual stochastic data.
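** In the simplest case, such a behaviour model takes the linear form
** `time(n) ≈ socket + gradient·n`, where the gradient captures the average
** expense per job, while the socket term represents a fixed setup overhead.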
**
** ## Setup
** To perform this test scheme, an operational Scheduler is required, and an instance
** of the TestChainLoad must be provided, configured with desired load properties.
** Moreover, the actual measurement setup requires performing several test executions,
** controlling some parameters in accordance with the observation scheme. The control
** parameters and the specifics of the actual setup should be clearly visible, while
** hiding the complexities of measurement execution.
**
** This can be achieved by a »Toolbench«, which is a framework with building blocks,
** providing a pre-arranged _measurement rig_ for the various kinds of measurement setup.
** The implementation code is arranged as a »sandwich« structure...
** - StressTestRig, which is also the framework class, acts as _bottom layer_ to
** provide an anchor point and some common definitions, implying an invocation scheme
** + first a TestChainLoad topology is constructed, based on test parameters
** + this is used to create a TestChainLoad::SchedulerCtx, which is then
** outfitted specifically for each test run
** - the _middle layer_ is a custom `Setup` class, which inherits from the bottom
** layer and fills in the actual topology and configuration for the desired test
** - the actual test is then performed by layering a specific _test tool_ on
** top of the compound, which in turn picks up the parametrisation from the Setup
** and base configuration, visible as base class (template param) \a CONF
** Together, this leads to the following code scheme, which aims to simplify experimentation:
** \code
** using StressRig = StressTestRig<16>;
**
** struct Setup : StressRig
** {
** uint CONCURRENCY = 4;
** //// more definitions
**
** auto testLoad()
** {....define a Test-Chain-Load topology....}
**
** auto testSetup (TestLoad& testLoad)
** { return StressRig::testSetup(testLoad)
** .withLoadTimeBase(500us)
** // ....more customisation here
** }
** };
**
** auto result = StressRig::with<Setup>()
** .perform<bench::SpecialToolClass>();
** \endcode
**
** ## Breaking Point search
** The bench::BreakingPoint tool typically uses a complex interwoven job plan, which is
** tightened until the timing breaks. The _stressFactor_ of the generated schedule will be
** the active parameter of this test, performing a _binary search_ for the _breaking point._
** The measurement attempts to narrow down the point of massive failure, where the ability
** to somehow cope with the schedule breaks down completely. Based on watching the Scheduler
** in operation, the detection was linked to three conditions, which typically will be
** triggered together, and within a narrow and reproducible parameter range:
** - an individual run counts as _accidentally failed_ when the execution slips
** away by more than 2ms with respect to the defined overall schedule. When more
** than 55% of all observed runs are considered failed, the first condition is met
** - moreover, the observed ''standard deviation'' must also surpass the same limit
** of > 2ms, which indicates that the Scheduling mechanism is under substantial
** strain; in regular operation, the slip is rather ~ 200µs.
** - the third condition is that the ''averaged delta'' has surpassed 4ms,
** which is 2 times the basic failure indicator.
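**
** A minimal invocation sketch (assuming a `Setup` class as outlined above); the
** result tuple corresponds to what bench::BreakingPoint yields from `perform()`:
** \code
** auto [stressFac, delta, runTime] = StressRig::with<Setup>()
**                                              .perform<bench::BreakingPoint>();
** \endcode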
**
** ## Parameter Correlation
** As a complement, the bench::ParameterRange tool is provided to run a specific Scheduler setup
** while varying a single control parameter within defined limits. This produces a set of (x,y) data,
** which can be used to search for correlations or build a linear regression model to describe the
** Scheduler's behaviour as function of the control parameter. The typical use case would be to use
** the input length (number of Jobs) as control parameter, leading to a model for Scheduling expense.
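**
** A usage sketch (assuming a `Setup` which mixes in bench::LoadPeak_ParamRange_Evaluation
** and whose `testLoad(Param)` builds a topology sized by the given parameter); the
** measurement yields a result table, rendered here as CSV:
** \code
** auto results = StressRig::with<Setup>()
**                          .perform<bench::ParameterRange> (50, 500);
** cout << results.renderCSV();
** \endcode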
**
** ## Observation tools
** The TestChainLoad, together with its helpers and framework, already offers some tools to visualise
** the generated topology, to calculate statistics, and to watch a performance run with instrumentation.
** In addition, the individual tools provide some debugging output to watch the measurement scheme.
** Result data is either a tuple of values (in case of bench::BreakingPoint), or a table of result
** data as function of the control parameter (for bench::ParameterRange). Result data, when converted
** to CSV, can be visualised as a Gnuplot diagram.
** @see TestChainLoad_test
** @see SchedulerStress_test
** @see binary-search.hpp
** @see gnuplot-gen.hpp
*/
#ifndef VAULT_GEAR_TEST_STRESS_TEST_RIG_H
#define VAULT_GEAR_TEST_STRESS_TEST_RIG_H
#include "test-chain-load.hpp"
#include "lib/binary-search.hpp"
#include "lib/test/transiently.hpp"
#include "vault/gear/scheduler.hpp"
#include "lib/time/timevalue.hpp"
#include "lib/meta/function.hpp"
#include "lib/format-string.hpp"
#include "lib/format-cout.hpp"
#include "lib/gnuplot-gen.hpp"
#include "lib/stat/statistic.hpp"
#include "lib/stat/data.hpp"
#include "lib/random.hpp"
#include "lib/util.hpp"
#include <algorithm>
#include <utility>
#include <vector>
#include <tuple>
#include <array>
namespace vault{
namespace gear {
namespace test {
using std::make_tuple;
using std::forward;
/**
* Configurable template framework for running Scheduler Stress tests
* Use to build a custom setup class, which is then [injected](\ref StressTestRig::with)
* to [perform](\ref StressTestRig::Launcher::perform) a _specific measurement tool._
* Several tools and detailed customisations are available in `namespace bench`
* - bench::BreakingPoint conducts a binary search to _break a schedule_
* - bench::ParameterRange performs a randomised series of parametrised test runs
*/
template<size_t maxFan =DEFAULT_FAN>
class StressTestRig
: util::NonCopyable
{
public:
using TestLoad = TestChainLoad<maxFan>;
using TestSetup = typename TestLoad::ScheduleCtx;
/***********************************************************************//**
* Entrance point: build a stress test measurement setup, to be performed with a
* dedicated \a TOOL class; takes the configuration \a CONF as template parameter,
* which is assumed to inherit (indirectly) from StressRig.
* @tparam CONF specialised subclass of StressRig with customisation
* @return a builder to configure and then launch the actual test
*/
template<class CONF>
static auto
with()
{
return Launcher<CONF>{};
}
/* ======= default configuration (inherited) ======= */
uint CONCURRENCY = work::Config::getDefaultComputationCapacity();
bool INSTRUMENTATION = true;
double EPSILON = 0.01; ///< error bound to abort binary search
double UPPER_STRESS = 1.7; ///< starting point for the upper limit, likely to fail
double FAIL_LIMIT = 2.0; ///< delta-limit when to count a run as failure
double TRIGGER_FAIL = 0.55; ///< %-fact: criterion-1 failures above this rate
double TRIGGER_SDEV = FAIL_LIMIT; ///< in ms : criterion-2 standard deviation
double TRIGGER_DELTA = 2*FAIL_LIMIT; ///< in ms : criterion-3 average delta above this limit
bool showRuns = false; ///< print a line for each individual run
bool showStep = true; ///< print a line for each binary search step
bool showRes = true; ///< print result data
bool showRef = true; ///< calculate single threaded reference time
static uint constexpr REPETITIONS{20};
BlockFlowAlloc bFlow{};
EngineObserver watch{};
Scheduler scheduler{bFlow, watch};
protected:
/** Extension point: build the computation topology for this test */
auto
testLoad(size_t nodes =64)
{
return TestLoad{nodes};
}
/** (optional) extension point: base configuration of the test ScheduleCtx
* @warning the actual setup \a CONF is layered, beware of shadowing. */
auto
testSetup (TestLoad& testLoad)
{
return testLoad.setupSchedule(scheduler)
.withLevelDuration(200us)
.withJobDeadline(500ms)
.withUpfrontPlanning();
}
template<class CONF>
struct Launcher : CONF
{
template<template<class> class TOOL, typename...ARGS>
auto
perform (ARGS&& ...args)
{
return TOOL<CONF>{}.perform (std::forward<ARGS> (args)...);
}
};
};
namespace bench { ///< Specialised tools to investigate scheduler performance
using util::_Fmt;
using util::min;
using util::max;
using std::vector;
using std::declval;
/**************************************************//**
* Specific stress test scheme to determine the
* »breaking point« where the Scheduler starts to slip
*/
template<class CONF>
class BreakingPoint
: public CONF
{
using TestLoad = typename CONF::TestLoad;
using TestSetup = typename TestLoad::ScheduleCtx;
struct Res
{
double stressFac{0};
double percentOff{0};
double stdDev{0};
double avgDelta{0};
double avgTime{0};
double expTime{0};
};
/** prepare the ScheduleCtx for a specifically parametrised test series */
void
configureTest (TestSetup& testSetup, double stressFac)
{
testSetup.withInstrumentation(CONF::INSTRUMENTATION) // side-effect: clear existing statistics
.withAdaptedSchedule(stressFac, CONF::CONCURRENCY, adjustmentFac);
}
/** perform a repetition of test runs and compute statistics */
Res
runProbes (TestSetup& testSetup, double stressFac)
{
auto sqr = [](auto n){ return n*n; };
Res res;
auto& [sf,pf,sdev,avgD,avgT,expT] = res;
sf = stressFac;
std::array<double, CONF::REPETITIONS> runTime;
for (uint i=0; i<CONF::REPETITIONS; ++i)
{
runTime[i] = testSetup.launch_and_wait() / 1000;
avgT += runTime[i];
maybeAdaptScaleEmpirically (testSetup, stressFac);
}
expT = testSetup.getExpectedEndTime() / 1000;
avgT /= CONF::REPETITIONS;
avgD = (avgT-expT); // can be < 0
for (uint i=0; i<CONF::REPETITIONS; ++i)
{
sdev += sqr (runTime[i] - avgT);
double delta = (runTime[i] - expT);
bool fail = (delta > CONF::FAIL_LIMIT);
if (fail)
++ pf;
showRun(i, delta, runTime[i], runTime[i] > avgT, fail);
}
pf /= CONF::REPETITIONS;
sdev = sqrt (sdev/CONF::REPETITIONS);
showStep(res);
return res;
}
/** criterion to decide if this test series constitutes a slipped schedule */
bool
decideBreakPoint (Res& res)
{
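// breakage: either virtually every run slipped, or all three trigger criteria are met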
return res.percentOff > 0.99
or( res.percentOff > CONF::TRIGGER_FAIL
and res.stdDev > CONF::TRIGGER_SDEV
and res.avgDelta > CONF::TRIGGER_DELTA);
}
/**
* invoke a binary search to produce a sequence of test series
* with the goal to narrow down the stressFact where the Schedule slips away.
*/
template<class FUN>
Res
conductBinarySearch (FUN&& runTestCase, vector<Res> const& results)
{
double breakPoint = lib::binarySearch_upper (forward<FUN> (runTestCase)
, 0.0, CONF::UPPER_STRESS
, CONF::EPSILON);
uint s = results.size();
ENSURE (s >= 2);
Res res;
auto& [sf,pf,sdev,avgD,avgT,expT] = res;
// average data over the last three steps investigated for smoothing
uint points = min (results.size(), 3u);
for (uint i=results.size()-points; i<results.size(); ++i)
{
Res const& resx = results[i];
pf += resx.percentOff;
sdev += resx.stdDev;
avgD += resx.avgDelta;
avgT += resx.avgTime;
expT += resx.expTime;
}
pf /= points;
sdev /= points;
avgD /= points;
avgT /= points;
expT /= points;
sf = breakPoint;
return res;
}
/** adaptive scale correction based on observed behaviour */
double adjustmentFac{1.0};
size_t gaugeProbes = 3 * CONF::REPETITIONS;
/**
* Attempt to factor out some observable properties, which are considered circumstantial
* and not a direct result of scheduling overheads. The artificial computational load is
* known to drift towards larger values than calibrated; moreover the actual concurrency
* achieved can deviate from the heuristic assumptions built into the testing schedule.
* The latter is problematic to some degree however, since the Scheduler is bound to
* scale down capacity when idle. To strike a reasonable balance, this adjustment of
* the measurement scale is done only initially, and when the stress factor is high
* and some degree of pressure on the scheduler can thus be assumed.
*/
void
maybeAdaptScaleEmpirically (TestSetup& testSetup, double stressFac)
{
if (not gaugeProbes) return;
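// adjust only under pressure: pow(stressFac,9) effectively mutes low-stress probes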
double gain = util::limited (0, pow(stressFac, 9), 1);
if (gain < 0.2) return;
//
double formFac = testSetup.determineEmpiricFormFactor (CONF::CONCURRENCY);
adjustmentFac = gain*formFac + (1-gain)*adjustmentFac;
testSetup.withAdaptedSchedule(stressFac, CONF::CONCURRENCY, adjustmentFac);
--gaugeProbes;
}
_Fmt fmtRun_ {"....·%-2d: Δ=%4.1f t=%4.1f %s %s"}; // i % Δ % t % t>avg? % fail?
_Fmt fmtStep_{ "%4.2f| : ∅Δ=%4.1f±%-4.2f ∅t=%4.1f %s %%%-3.0f -- expect:%4.1fms"};// stress % ∅Δ % σ % ∅t % fail % percentOff % t-expect
_Fmt fmtResSDv_{"%9s= %5.2f ±%4.2f%s"};
_Fmt fmtResVal_{"%9s: %5.2f%s"};
void
showRun(uint i, double delta, double t, bool over, bool fail)
{
if (CONF::showRuns)
cout << fmtRun_ % i % delta % t % (over? "+":"-") % (fail? "✖":"")
<< endl;
}
void
showStep(Res& res)
{
if (CONF::showStep)
cout << fmtStep_ % res.stressFac % res.avgDelta % res.stdDev % res.avgTime
% (decideBreakPoint(res)? "—◆—":"—◇—")
% (100*res.percentOff) % res.expTime
<< endl;
}
void
showRes(Res& res)
{
if (CONF::showRes)
{
cout << fmtResVal_ % "stressFac" % res.stressFac % "" <<endl;
cout << fmtResVal_ % "fail" %(res.percentOff * 100) % '%' <<endl;
cout << fmtResSDv_ % "delta" % res.avgDelta % res.stdDev % "ms"<<endl;
cout << fmtResVal_ % "runTime" % res.avgTime % "ms"<<endl;
cout << fmtResVal_ % "expected" % res.expTime % "ms"<<endl;
}
}
void
showRef(TestSetup& testSetup)
{
if (CONF::showRef)
cout << fmtResVal_ % "refTime"
% (testSetup.calcRuntimeReference() /1000)
% "ms" << endl;
}
public:
/**
* Launch a measurement sequence to determine the »breaking point«
* for the configured test load and parametrisation of the Scheduler.
* @return a tuple `[stress-factor, delta, run-time]`
*/
auto
perform()
{
TRANSIENTLY(work::Config::COMPUTATION_CAPACITY) = CONF::CONCURRENCY;
TestLoad testLoad = CONF::testLoad().buildTopology();
TestSetup testSetup = CONF::testSetup (testLoad);
vector<Res> observations;
auto performEvaluation = [&](double stressFac)
{
configureTest (testSetup, stressFac);
auto res = runProbes (testSetup, stressFac);
observations.push_back (res);
return decideBreakPoint(res);
};
Res res = conductBinarySearch (std::move (performEvaluation), observations);
showRes (res);
showRef (testSetup);
return make_tuple (res.stressFac, res.avgDelta, res.avgTime);
}
};
/**************************************************//**
* Specific test scheme to perform a Scheduler setup
* over a given control parameter range to determine
* correlations
*/
template<class CONF>
class ParameterRange
: public CONF
{
using TestLoad = typename CONF::TestLoad;
using TestSetup = typename TestLoad::ScheduleCtx;
// Type binding for data evaluation
using Param = typename CONF::Param;
using Table = typename CONF::Table;
void
runTest (Param param, Table& data)
{
TestLoad testLoad = CONF::testLoad(param).buildTopology();
TestSetup testSetup = CONF::testSetup (testLoad)
.withInstrumentation(); // Note: by default Schedule with CONF::LEVEL_STEP
double millis = testSetup.launch_and_wait() / 1000;
auto stat = testSetup.getInvocationStatistic();
CONF::collectResult (data, param, millis, stat);
}
public:
/**
* Launch a measurement sequence running the Scheduler with a
* varying parameter value to investigate (x,y) correlations.
* @return a result table, with one row of collected data for each test run
*/
Table
perform (Param lower, Param upper)
{
TRANSIENTLY(work::Config::COMPUTATION_CAPACITY) = CONF::CONCURRENCY;
Param dist = upper - lower;
uint cnt = CONF::REPETITIONS;
vector<Param> points;
points.reserve (cnt);
Param minP{upper}, maxP{lower};
for (uint i=0; i<cnt; ++i)
{
auto random = lib::defaultGen.uni(); // [0 .. 1.0[
Param pos = lower + Param(floor (random*dist + 0.5));
points.push_back(pos);
minP = min (pos, minP);
maxP = max (pos, maxP);
}
// ensure the bounds participate in test
if (maxP < upper) points[cnt-2] = upper;
if (minP > lower) points[cnt-1] = lower;
Table results;
for (Param point : points)
runTest (point, results);
return results;
}
};
/* ====== Preconfigured ParamRange-Evaluations ====== */
using lib::stat::Column;
using lib::stat::DataTable;
using lib::stat::DataSpan;
using lib::stat::CSVData;
using IncidenceStat = lib::IncidenceCount::Statistic;
/**
* Calculate a linear regression model for two table columns
* @return a tuple `(socket,gradient,Vector(predicted),Vector(deltas),correlation,maxDelta,stdev)`
*/
template<typename F, typename G>
inline auto
linearRegression (Column<F> const& x, Column<G> const& y)
{
lib::stat::RegressionData points;
size_t cnt = min (x.data.size(), y.data.size());
points.reserve (cnt);
for (size_t i=0; i < cnt; ++i)
points.emplace_back (x.data[i], y.data[i]);
return lib::stat::computeLinearRegression (points);
}
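/* Example (sketch): using the result table from a ParameterRange measurement with
 * the LoadPeak_ParamRange_Evaluation mix-in defined below, a regression of run time
 * over load size would read:
 *     auto [socket, gradient, predicted, deltas, correlation, maxDelta, stdev]
 *            = linearRegression (results.param, results.time);
 */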
/**
* Mix-in for setup of a #ParameterRange evaluation to watch
* the processing of a single load peak, using the number of
* added jobs as independent parameter.
* @remark inject this definition (by inheritance) into the
* Setup, which should then also define a TestChainLoad
* graph with an overall size controlled by the #Param
* @see SchedulerStress_test#watch_expenseFunction()
*/
struct LoadPeak_ParamRange_Evaluation
{
using Param = size_t;
struct DataRow
{
Column<Param> param {"load size"}; // independent variable / control parameter
Column<double> time {"result time"};
Column<double> conc {"concurrency"};
Column<double> jobtime {"avg jobtime"};
Column<double> impeded {"avg impeded"};
auto allColumns()
{ return std::tie(param
,time
,conc
,jobtime
,impeded
);
}
};
using Table = DataTable<DataRow>;
void
collectResult(Table& data, Param param, double millis, bench::IncidenceStat const& stat)
{
(void)millis;
data.newRow();
data.param = param;
data.time = stat.coveredTime / 1000;
data.conc = stat.avgConcurrency;
data.jobtime = stat.activeTime / stat.activationCnt;
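// time spent at concurrency ≤ 1, taken as indicator of impeded parallelism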
data.impeded = (stat.timeAtConc(1) + stat.timeAtConc(0))/stat.activationCnt;
}
static double
avgConcurrency (Table const& results)
{
return lib::stat::average (DataSpan<double> (results.conc.data));
}
static string
renderGnuplot (Table const& results)
{
using namespace lib::gnuplot_gen;
string csv = results.renderCSV();
Param maxParam = * std::max_element (results.param.data.begin(), results.param.data.end());
Param xtics = maxParam > 500? 50
: maxParam > 200? 20
: maxParam > 100? 10
: 5;
return scatterRegression(
ParamRecord().set (KEY_CSVData, csv)
.set (KEY_TermSize, "600,600")
.set (KEY_Xtics, int64_t(xtics))
.set (KEY_Xlabel, "load size ⟶ number of jobs")
.set (KEY_Ylabel, "active time ⟶ ms")
.set (KEY_Y2label, "concurrent threads ⟶")
.set (KEY_Y3label, "avg job time ⟶ µs")
);
}
};
//
}// namespace bench
}}}// namespace vault::gear::test
#endif /*VAULT_GEAR_TEST_STRESS_TEST_RIG_H*/