Scheduler-test: implement contention mitigation scheme

While my basic assessment remains that contention will not play a significant
role in the expected real-world usage scenario, testing with a tighter
schedule and rather short jobs (500µs) exhibits phases of massive contention,
leading to significant slow-down of the test.

The major problem seems to be that extended phases of contention effectively
cause several workers to remain in an active spinning loop for multiple
microseconds, while continuously re-reading the atomic lock.

Thus an adaptive scheme is introduced: after repeated contention events,
workers now throttle themselves, with polling delays stepped up
exponentially to at most 2ms. This turns out to be surprisingly effective
and completely removes the observed delays in the test setup.
Fischlurch 2023-12-20 20:19:10 +01:00
parent 84b92c2ee3
commit 707fbc2933
9 changed files with 933 additions and 850 deletions


@ -149,7 +149,7 @@ namespace gear {
enum Proc {PASS ///< pass on the activation down the chain
,SKIP ///< skip rest of the Activity chain for good
,WAIT ///< nothing to do; wait and re-check for work later
,KILL ///< obliterate the complete Activity-Term and all its dependencies
,KICK ///< back pressure; get out of the way but be back soon
,HALT ///< abandon this play / render process
};
@ -628,7 +628,6 @@ namespace gear {
* @return activity::Proc indication how to proceed with execution
* - activity::PASS continue with regular processing of `next`
* - activity::SKIP ignore the rest of the chain, look for new work
* - activity::KILL abort this complete Activity term (play change)
* - activity::HALT serious problem, stop the Scheduler
*/
template<class EXE>


@ -250,7 +250,7 @@ namespace gear {
* - activity::PASS continue processing in regular operation
* - activity::WAIT nothing to do now, check back later
* - activity::HALT serious problem, cease processing
* - activity::SKIP to contend (spin) on GroomingToken
* - activity::KICK to contend (spin) on GroomingToken
* @note Attempts to acquire the GroomingToken for immediate
* processing, but not for just enqueuing planned tasks.
* Never drops the GroomingToken explicitly (unless when
@ -264,7 +264,7 @@ namespace gear {
,SchedulerInvocation& layer1
)
{
if (!event) return activity::SKIP;
if (!event) return activity::KICK;
Time now = executionCtx.getSchedTime();
sanityCheck (event, now);


@ -670,6 +670,7 @@ namespace gear {
* @return how to proceed further with this worker
* - activity::PASS indicates to proceed or call back immediately
* - activity::SKIP causes to exit this round, yet call back again
* - activity::KICK signals contention (not emitted here)
* - activity::WAIT exits and places the worker into sleep mode
* @note as part of the regular work processing, this function may
* place the current thread into a short-term targeted sleep.


@ -62,5 +62,38 @@ namespace gear {
return util::max (std::thread::hardware_concurrency()
, MINIMAL_CONCURRENCY);
}
/**
* This is part of the weak level of anti-contention measures.
* When a worker is kicked out from processing due to contention, the immediate
reaction is to try again; if this happens repeatedly, however, increasingly strong
* delays are interspersed. Within the _weak zone,_ a short spinning wait is performed,
* and then the thread requests a `yield()` from the OS scheduler; this cycle is repeated.
*/
void
work::performRandomisedSpin (size_t stepping, size_t randFact)
{
size_t degree = CONTEND_SOFT_FACTOR * (1+randFact) * stepping;
for (volatile size_t i=0; i<degree; ++i) {/*SPIN*/}
}
/**
* Calculate the delay time for a stronger anti-contention wait.
* If the contention persists, the worker must back out temporarily to allow other workers
* to catch up. The delay time is stepped up quickly to a saturation level, where the
* worker sleeps in the microseconds range; this level is chosen as a balance between
* retaining some reactivity and not incurring additional load. The stepping of the
* anti-contention measures is »sticky« to some degree, because it is not reset to
* zero once contention ends, but rather stepped down gradually.
*/
microseconds
work::steppedRandDelay (size_t stepping, size_t randFact)
{
REQUIRE (stepping > 0);
uint factor = 1u << (stepping-1);
return (CONTEND_WAIT + 10us*randFact) * factor;
}
}} // namespace vault::gear


@ -35,7 +35,7 @@
** Some parameters and configuration is provided to the workers, notably a _work functor_
** invoked actively to »pull« work. The return value from this `doWork()`-function governs
** the worker's behaviour, either by prompting to pull further work, by sending a worker
** into a sleep cycle, or even asking the worker to terminate.
** into a sleep cycle, by performing contention mitigation, or even asking the worker to terminate.
**
** @warning concurrency and synchronisation in the Scheduler (which maintains and operates
** WorkForce) is based on the assumption that _all maintenance and organisational
@ -80,7 +80,20 @@ namespace gear {
namespace {
const double MAX_OVERPROVISIONING = 3.0; ///< safety guard to prevent catastrophic overprovisioning
const double MAX_OVERPROVISIONING = 3.0; ///< safety guard to prevent catastrophic over-provisioning
const size_t CONTEND_SOFT_LIMIT = 3; ///< zone for soft anti-contention measures, counting continued contention events
const size_t CONTEND_STARK_LIMIT = CONTEND_SOFT_LIMIT + 5; ///< zone for stark measures, performing a sleep with exponential stepping
const size_t CONTEND_SATURATION = CONTEND_STARK_LIMIT + 4; ///< upper limit for the contention event count
const size_t CONTEND_SOFT_FACTOR = 100; ///< base counter for a spinning wait loop
const size_t CONTEND_RANDOM_STEP = 11; ///< stepping for randomisation of anti-contention measures
const microseconds CONTEND_WAIT = 100us; ///< base time unit for the exponentially stepped-up sleep delay in case of contention
inline size_t
thisThreadHash()
{
return std::hash<std::thread::id>{} (std::this_thread::get_id());
}
}
namespace work { ///< Details of WorkForce (worker pool) implementation
@ -101,13 +114,18 @@ namespace gear {
{
static size_t COMPUTATION_CAPACITY;
const milliseconds IDLE_WAIT = 20ms;
const size_t DISMISS_CYCLES = 100;
const milliseconds IDLE_WAIT = 20ms; ///< wait period when a worker _falls idle_
const size_t DISMISS_CYCLES = 100; ///< number of idle cycles after which the worker terminates
static size_t getDefaultComputationCapacity();
};
void performRandomisedSpin (size_t,size_t);
microseconds steppedRandDelay(size_t,size_t);
using Launch = lib::Thread::Launch;
/*************************************//**
@ -151,10 +169,15 @@ namespace gear {
activity::Proc res = CONF::doWork();
if (emergency.load (std::memory_order_relaxed))
break;
if (res == activity::KICK)
res = contentionWait();
else
if (kickLevel_)
--kickLevel_;
if (res == activity::WAIT)
res = idleWait();
else
idleCycles = 0;
idleCycles_ = 0;
if (res != activity::PASS)
break;
}
@ -172,8 +195,8 @@ namespace gear {
activity::Proc
idleWait()
{
++idleCycles;
if (idleCycles < CONF::DISMISS_CYCLES)
++idleCycles_;
if (idleCycles_ < CONF::DISMISS_CYCLES)
{
sleep_for (CONF::IDLE_WAIT);
return activity::PASS;
@ -181,7 +204,33 @@ namespace gear {
else // idle beyond threshold => terminate worker
return activity::HALT;
}
size_t idleCycles{0};
size_t idleCycles_{0};
activity::Proc
contentionWait()
{
if (not randFact_)
randFact_ = thisThreadHash() % CONTEND_RANDOM_STEP;
if (kickLevel_ <= CONTEND_SOFT_LIMIT)
for (uint i=0; i<kickLevel_; ++i)
{
performRandomisedSpin (kickLevel_,randFact_);
std::this_thread::yield();
}
else
{
auto stepping = util::min (kickLevel_, CONTEND_STARK_LIMIT) - CONTEND_SOFT_LIMIT;
std::this_thread::sleep_for (steppedRandDelay(stepping,randFact_));
}
if (kickLevel_ < CONTEND_SATURATION)
++kickLevel_;
return activity::PASS;
}
size_t kickLevel_{0};
size_t randFact_{0};
};
}//(End)namespace work


@ -487,8 +487,8 @@ namespace test {
auto myself = std::this_thread::get_id();
CHECK (not sched.holdsGroomingToken (myself));
// no effect when empty / no Activity given
CHECK (activity::SKIP == sched.postDispatch (ActivationEvent(), detector.executionCtx, queue));
// no effect when empty / no Activity given (usually this can happen due to lock contention)
CHECK (activity::KICK == sched.postDispatch (ActivationEvent(), detector.executionCtx, queue));
CHECK (not sched.holdsGroomingToken (myself));
// Activity immediately dispatched when on time and GroomingToken can be acquired


@ -120,6 +120,7 @@ namespace test {
verify_pullWork();
verify_workerHalt();
verify_workerSleep();
verify_workerRetard();
verify_workerDismiss();
verify_finalHook();
verify_detectError();
@ -228,6 +229,31 @@ namespace test {
/** @test a worker can be retarded and throttled in case of contention.
*/
void
verify_workerRetard()
{
atomic<uint> check{0};
{ // ▽▽▽▽ regular work-cycles without delay
WorkForce wof{setup ([&]{ ++check; return activity::PASS; })};
wof.incScale();
sleep_for(5ms);
}
uint cyclesPASS{check};
check = 0;
{ // ▽▽▽▽ signals »contention«
WorkForce wof{setup ([&]{ ++check; return activity::KICK; })};
wof.incScale();
sleep_for(5ms);
}
uint cyclesKICK{check};
CHECK (cyclesKICK < cyclesPASS);
CHECK (cyclesKICK < 50);
}
/** @test when a worker is sent into sleep-cycles for an extended time,
* the worker terminates itself.
*/


@ -7302,7 +7302,7 @@ The primary scaling effects exploited to achieve this level of performance are t
The way other parts of the system are built requires us to obtain guaranteed knowledge of some job's termination. It is possible to obtain that knowledge with some limited delay, but it needs to be absolutely reliable (violations leading to segfault). The requirements stated above assume this can be achieved through //jobs with guaranteed execution.// Alternatively we could consider installing specific callbacks -- in this case the scheduler itself has to guarantee the invocation of these callbacks, even if the corresponding job fails or is never invoked. It doesn't seem there is any other option.
</pre>
</div>
<div title="SchedulerWorker" creator="Ichthyostega" modifier="Ichthyostega" created="202309041605" modified="202310280149" tags="Rendering operational spec draft" changecount="14">
<div title="SchedulerWorker" creator="Ichthyostega" modifier="Ichthyostega" created="202309041605" modified="202312201342" tags="Rendering operational spec draft" changecount="15">
<pre>The Scheduler //maintains a ''Work Force'' (a pool of workers) to perform the next [[render activities|RenderActivity]] continuously.//
Each worker runs in a dedicated thread; the Activities are arranged in a way to avoid blocking those worker threads
* IO operations are performed asynchronously {{red{planned as of 9/23}}}
@ -7317,6 +7317,7 @@ Moreover, the actual computation tasks, which can be parallelised, are at least
The behaviour of individual workers is guided solely by the return-value flag from the work-functor. Consequently, no shared flags and no direct synchronisation whatsoever is required //within the {{{WorkForce}}} implementation.// -- notwithstanding the fact that the implementation //within the work-functor// obviously needs some concurrency coordination to produce these return values, since the whole point is to invoke this functor concurrently. The following aspects of worker behaviour can be directed:
* returning {{{activity::PASS}}} instructs the worker to re-invoke the work-functor in the same thread immediately
* returning {{{activity::WAIT}}} requests an //idle-wait cycle//
* returning {{{activity::KICK}}} signals //contention,// causing a short back-off
* any other value, notably {{{activity::HALT}}} causes the worker to terminate
* likewise, an exception from anywhere within the worker shall terminate the worker and activate a »disaster mode«
Essentially this implies that //the workers themselves// (not some steering master) perform the management code leading to the aforementioned state directing return codes.

File diff suppressed because it is too large Load diff