With the Scheduler testing effort [[#1344|https://issues.lumiera.org/ticket/1344]], several goals are pursued:
* by exposing the new scheduler implementation to excessive overload, its robustness can be assessed and defects can be spotted
* with the help of a systematic, calibrated load, characteristic performance limits and breaking points can be established
Further measurement runs with other parameter values fit well in between the two.
!!!Stationary Processing
The ultimate goal of //load- and stress testing// is to establish a notion of //full load// and to demonstrate adequate performance under //nominal load conditions.// Thus, after investigating overheads and the breaking point of a complex schedule, a measurement setup was established with load patterns deemed „realistic“ -- based on knowledge of typical media processing demands encountered in video editing. Such a setup entails small dependency trees of jobs loaded with computation times around 5ms, interleaving several challenges up to the available level of concurrency. To determine viable parameter bounds, the //breaking-point// measurement method can be applied to an extended graph of this structure, to find out at which level the computations exhaust the system's abilities to such a degree that it cannot proceed any faster.
<html><img title="Load topology for stationary processing with 8 cores" src="dump/2024-04-08.Scheduler-LoadTest/Topo-20.svg" style="width:100%"/></html>
This pattern can be processed
* with 8 workers in overall 192ms
* processing 256 Nodes, each loaded with 5ms
* since this graph has 35 levels, ∅ 5.6ms are required per level
* on average, concurrency reaches 5.4 (some nodes have to wait for dependencies)
This research again revealed the tendency of the given Scheduler implementation to ''scale-down capacity unless overloaded''.
Using the breaking-point method with such a fine-grained and rather homogeneous schedule can thus be problematic, since a search for the limit will inevitably involve running several probes //below the limit// -- which can cause the scheduler to reduce the number of workers to a level that just fills the available time. Depending on the path taken, the search can thus find a breaking point corresponding to a throttled capacity, while a search path through parameter ranges of overload will reveal the ability to follow a much tighter schedule. While this is an inherent problem of the measurement approach, it can be mitigated to some degree by limiting the empiric adaptation of the parameter scale to the initial phase of the measurement, while ensuring this initial phase starts from overload territory.
<html><img title="Load topology for stationary processing with 8 cores" src="dump/2024-04-08.Scheduler-LoadTest/Topo-21.svg" style="float:right; width: 80ex; margin-left:2ex"/></html>
For comparison, another, similar load pattern was used, which however consists entirely of interleaved 4-step linear chains. Each level can thus be handled with a maximum of 4 workers; actually there are 66 levels, with ∅ 3.88 Nodes/Level, due to the ramp-up and ramp-down towards the ends.
| !observation|!4 workers|!8 workers|
| breaking point|stress 1.01 |stress 0.8 |
| run time|340ms |234ms |
| ≙ per level|5.15ms |3.5ms |
| ∅ concurrency|3.5 |5.3 |
These observations indicate ''adequate handling without tangible overhead''.
When limited to 4 workers, the concurrency of ∅ 3.5 is only slightly below the average of 3.88 Nodes/Level, and the time per level is near optimal, taking into account the fact (established by the overload measurements) that the actual job load tends to be slightly above the calibrated value of 5ms. The setup with 8 workers shows that further workers can be used to accommodate a tighter schedule, but then the symptoms of //breaking the schedule// are already reached at a nominally lower stress value, and only 5.3 of 8 workers will be active on average -- this graph simply does not offer more work load locally, since 75% of all Nodes have a predecessor.

<html><div style="clear: both"/></html>
The Scheduler //maintains a ''Work Force'' (a pool of workers) to perform the next [[render activities|RenderActivity]] continuously.//
Related working notes (from //thinkPad.ichthyo.mm//):
...I must also bear in mind that not every machine has 8 cores. It therefore seems more sensible to limit the concurrency to 4, and then see where we end up
Graph-3, however, does contain multi-level Dependency-Tries time and again -- so I cannot decide whether the observed ∅ concurrency = 5.4 is caused by dependency-wait, or actually resulted from the Scheduler throttling down its capacity. Possibly both, in interplay.
That is pretty good.
...since the actual load is rather around 5.5ms, these average values can only be explained by the additional free workers pitching in at times; yet they cannot be kept fully busy permanently, since the graph does not actually yield that much load. Nevertheless, this still results in a schedule compacted to 60%
All of this indicates that the scheduling is largely efficient