From c934e7f079f65d30a1f04868c93564a092393ae8 Mon Sep 17 00:00:00 2001 From: Ichthyostega Date: Wed, 17 Apr 2024 21:04:03 +0200 Subject: [PATCH] Scheduler-test: reduce impact of scale adjustments on breakpoint-search MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The `BreakingPoint` tool conducts a binary search to find the ''stress factor'' at which a given schedule breaks. There are some known deviations related to the measurement setup, which unfortunately impact the interpretation of the ''stress factor'' scale. Earlier, an attempt was made to watch those factors empirically and to work a ''form factor'' into the ''effective stress factor'' used to guide this measurement method. Closer investigation with extended and elastic load patterns has now revealed a strong tendency of the Scheduler to scale down the work resources when not fully loaded. The above-mentioned adjustments may mistake this behaviour for a sign of a structural limitation of the possible concurrency. Thus, as a mitigation, those adjustments are now performed only at the beginning of the measurement series, and only when the stress factor is high (implying that the Scheduler is actually overloaded and thus has no incentive to scale down). These observations indicate that the »Breaking Point« search must be taken with a grain of salt: especially when the test load does ''not'' contain a high degree of interdependencies, it will be ''stretched elastically'' rather than outright broken. Under such circumstances, this measurement actually gauges the Scheduler's ability to comply with an established load and computation goal. --- .../{Topo-10 => Topo-10.dot} | 0 .../2024-04-08.Scheduler-LoadTest/Topo-20.dot | 475 ++++++++ .../2024-04-08.Scheduler-LoadTest/Topo-20.svg | 1022 +++++++++++++++++ .../2024-04-08.Scheduler-LoadTest/index.txt | 13 +- src/vault/gear/scheduler.hpp | 2 +- tests/vault/gear/scheduler-stress-test.cpp | 6 +- tests/vault/gear/stress-test-rig.hpp | 43 +- tests/vault/gear/test-chain-load.hpp | 43 +- wiki/renderengine.html | 12 +- wiki/thinkPad.ichthyo.mm | 234 ++-- 10 files changed, 1731 insertions(+), 119 deletions(-) rename doc/devel/dump/2024-04-08.Scheduler-LoadTest/{Topo-10 => Topo-10.dot} (100%) create mode 100644 doc/devel/dump/2024-04-08.Scheduler-LoadTest/Topo-20.dot create mode 100644 doc/devel/dump/2024-04-08.Scheduler-LoadTest/Topo-20.svg diff --git a/doc/devel/dump/2024-04-08.Scheduler-LoadTest/Topo-10 b/doc/devel/dump/2024-04-08.Scheduler-LoadTest/Topo-10.dot similarity index 100% rename from doc/devel/dump/2024-04-08.Scheduler-LoadTest/Topo-10 rename to doc/devel/dump/2024-04-08.Scheduler-LoadTest/Topo-10.dot diff --git a/doc/devel/dump/2024-04-08.Scheduler-LoadTest/Topo-20.dot b/doc/devel/dump/2024-04-08.Scheduler-LoadTest/Topo-20.dot new file mode 100644 index 000000000..c01e775f4 --- /dev/null +++ b/doc/devel/dump/2024-04-08.Scheduler-LoadTest/Topo-20.dot @@ -0,0 +1,475 @@ +digraph { + // Nodes + N0[label="0: 37", shape=doublecircle ] + N1[label="1: 37", shape=circle ] + N2[label="2: 37", shape=circle ] + N3[label="3: 4F", shape=box, style=rounded ] + N4[label="4: 37", shape=circle ] + N5[label="5: 37", shape=circle ] + N6[label="6: 4F", shape=box, style=rounded ] + N7[label="7: 37", shape=circle ] + N8[label="8: 37", shape=circle ] + N9[label="9: 4F", shape=box, style=rounded ] + N10[label="10: 37", shape=circle ] + N11[label="11: 37", shape=circle ] + N12[label="12: 4F", shape=box, style=rounded ] + N13[label="13: 37", shape=circle ] + N14[label="14: 
37", shape=circle ] + N15[label="15: 4F", shape=box, style=rounded ] + N16[label="16: 37", shape=circle ] + N17[label="17: 13" ] + N18[label="18: 37", shape=circle ] + N19[label="19: 37", shape=circle ] + N20[label="20: 4F", shape=box, style=rounded ] + N21[label="21: 37", shape=circle ] + N22[label="22: 37", shape=circle ] + N23[label="23: 4F", shape=box, style=rounded ] + N24[label="24: 37", shape=circle ] + N25[label="25: 61", shape=box, style=rounded ] + N26[label="26: 37", shape=circle ] + N27[label="27: 37", shape=circle ] + N28[label="28: 4F", shape=box, style=rounded ] + N29[label="29: 37", shape=circle ] + N30[label="30: 37", shape=circle ] + N31[label="31: 4F", shape=box, style=rounded ] + N32[label="32: 37", shape=circle ] + N33[label="33: 40" ] + N34[label="34: 37", shape=circle ] + N35[label="35: 37", shape=circle ] + N36[label="36: 4F", shape=box, style=rounded ] + N37[label="37: 37", shape=circle ] + N38[label="38: 37", shape=circle ] + N39[label="39: 4F", shape=box, style=rounded ] + N40[label="40: 37", shape=circle ] + N41[label="41: 3A" ] + N42[label="42: 37", shape=circle ] + N43[label="43: 37", shape=circle ] + N44[label="44: 4F", shape=box, style=rounded ] + N45[label="45: 37", shape=circle ] + N46[label="46: 37", shape=circle ] + N47[label="47: 4F", shape=box, style=rounded ] + N48[label="48: 37", shape=circle ] + N49[label="49: F9", shape=box, style=rounded ] + N50[label="50: 37", shape=circle ] + N51[label="51: 37", shape=circle ] + N52[label="52: 4F", shape=box, style=rounded ] + N53[label="53: 37", shape=circle ] + N54[label="54: 37", shape=circle ] + N55[label="55: 4F", shape=box, style=rounded ] + N56[label="56: 37", shape=circle ] + N57[label="57: 40" ] + N58[label="58: 37", shape=circle ] + N59[label="59: 37", shape=circle ] + N60[label="60: 4F", shape=box, style=rounded ] + N61[label="61: 37", shape=circle ] + N62[label="62: 37", shape=circle ] + N63[label="63: 4F", shape=box, style=rounded ] + N64[label="64: 37", shape=circle ] + N65[label="65: 3A" ] + N66[label="66: 37", shape=circle ] + N67[label="67: 37", shape=circle ] + N68[label="68: 4F", shape=box, style=rounded ] + N69[label="69: 37", shape=circle ] + N70[label="70: 37", shape=circle ] + N71[label="71: 4F", shape=box, style=rounded ] + N72[label="72: 37", shape=circle ] + N73[label="73: F9", shape=box, style=rounded ] + N74[label="74: 37", shape=circle ] + N75[label="75: 37", shape=circle ] + N76[label="76: 4F", shape=box, style=rounded ] + N77[label="77: 37", shape=circle ] + N78[label="78: 37", shape=circle ] + N79[label="79: 4F", shape=box, style=rounded ] + N80[label="80: 37", shape=circle ] + N81[label="81: 40" ] + N82[label="82: 37", shape=circle ] + N83[label="83: 37", shape=circle ] + N84[label="84: 4F", shape=box, style=rounded ] + N85[label="85: 37", shape=circle ] + N86[label="86: 37", shape=circle ] + N87[label="87: 4F", shape=box, style=rounded ] + N88[label="88: 37", shape=circle ] + N89[label="89: 3A" ] + N90[label="90: 37", shape=circle ] + N91[label="91: 37", shape=circle ] + N92[label="92: 4F", shape=box, style=rounded ] + N93[label="93: 37", shape=circle ] + N94[label="94: 37", shape=circle ] + N95[label="95: 4F", shape=box, style=rounded ] + N96[label="96: 37", shape=circle ] + N97[label="97: F9", shape=box, style=rounded ] + N98[label="98: 37", shape=circle ] + N99[label="99: 37", shape=circle ] + N100[label="100: 4F", shape=box, style=rounded ] + N101[label="101: 37", shape=circle ] + N102[label="102: 37", shape=circle ] + N103[label="103: 4F", shape=box, style=rounded ] + 
N104[label="104: 37", shape=circle ] + N105[label="105: 40" ] + N106[label="106: 37", shape=circle ] + N107[label="107: 37", shape=circle ] + N108[label="108: 4F", shape=box, style=rounded ] + N109[label="109: 37", shape=circle ] + N110[label="110: 37", shape=circle ] + N111[label="111: 4F", shape=box, style=rounded ] + N112[label="112: 37", shape=circle ] + N113[label="113: 3A" ] + N114[label="114: 37", shape=circle ] + N115[label="115: 37", shape=circle ] + N116[label="116: 4F", shape=box, style=rounded ] + N117[label="117: 37", shape=circle ] + N118[label="118: 37", shape=circle ] + N119[label="119: 4F", shape=box, style=rounded ] + N120[label="120: 37", shape=circle ] + N121[label="121: F9", shape=box, style=rounded ] + N122[label="122: 37", shape=circle ] + N123[label="123: 37", shape=circle ] + N124[label="124: 4F", shape=box, style=rounded ] + N125[label="125: 37", shape=circle ] + N126[label="126: 37", shape=circle ] + N127[label="127: 4F", shape=box, style=rounded ] + N128[label="128: 37", shape=circle ] + N129[label="129: 40" ] + N130[label="130: 37", shape=circle ] + N131[label="131: 37", shape=circle ] + N132[label="132: 4F", shape=box, style=rounded ] + N133[label="133: 37", shape=circle ] + N134[label="134: 37", shape=circle ] + N135[label="135: 4F", shape=box, style=rounded ] + N136[label="136: 37", shape=circle ] + N137[label="137: 3A" ] + N138[label="138: 37", shape=circle ] + N139[label="139: 37", shape=circle ] + N140[label="140: 4F", shape=box, style=rounded ] + N141[label="141: 37", shape=circle ] + N142[label="142: 37", shape=circle ] + N143[label="143: 4F", shape=box, style=rounded ] + N144[label="144: 37", shape=circle ] + N145[label="145: F9", shape=box, style=rounded ] + N146[label="146: 37", shape=circle ] + N147[label="147: 37", shape=circle ] + N148[label="148: 4F", shape=box, style=rounded ] + N149[label="149: 37", shape=circle ] + N150[label="150: 37", shape=circle ] + N151[label="151: 4F", shape=box, style=rounded ] + N152[label="152: 37", shape=circle ] + N153[label="153: 40" ] + N154[label="154: 37", shape=circle ] + N155[label="155: 37", shape=circle ] + N156[label="156: 4F", shape=box, style=rounded ] + N157[label="157: 37", shape=circle ] + N158[label="158: 37", shape=circle ] + N159[label="159: 4F", shape=box, style=rounded ] + N160[label="160: 37", shape=circle ] + N161[label="161: 3A" ] + N162[label="162: 37", shape=circle ] + N163[label="163: 37", shape=circle ] + N164[label="164: 4F", shape=box, style=rounded ] + N165[label="165: 37", shape=circle ] + N166[label="166: 37", shape=circle ] + N167[label="167: 4F", shape=box, style=rounded ] + N168[label="168: 37", shape=circle ] + N169[label="169: F9", shape=box, style=rounded ] + N170[label="170: 37", shape=circle ] + N171[label="171: 37", shape=circle ] + N172[label="172: 4F", shape=box, style=rounded ] + N173[label="173: 37", shape=circle ] + N174[label="174: 37", shape=circle ] + N175[label="175: 4F", shape=box, style=rounded ] + N176[label="176: 37", shape=circle ] + N177[label="177: 40" ] + N178[label="178: 37", shape=circle ] + N179[label="179: 37", shape=circle ] + N180[label="180: 4F", shape=box, style=rounded ] + N181[label="181: 37", shape=circle ] + N182[label="182: 37", shape=circle ] + N183[label="183: 4F", shape=box, style=rounded ] + N184[label="184: 37", shape=circle ] + N185[label="185: 3A" ] + N186[label="186: 37", shape=circle ] + N187[label="187: 37", shape=circle ] + N188[label="188: 4F", shape=box, style=rounded ] + N189[label="189: 37", shape=circle ] + N190[label="190: 37", 
shape=circle ] + N191[label="191: 4F", shape=box, style=rounded ] + N192[label="192: 37", shape=circle ] + N193[label="193: F9", shape=box, style=rounded ] + N194[label="194: 37", shape=circle ] + N195[label="195: 37", shape=circle ] + N196[label="196: 4F", shape=box, style=rounded ] + N197[label="197: 37", shape=circle ] + N198[label="198: 37", shape=circle ] + N199[label="199: 4F", shape=box, style=rounded ] + N200[label="200: 37", shape=circle ] + N201[label="201: 40" ] + N202[label="202: 37", shape=circle ] + N203[label="203: 37", shape=circle ] + N204[label="204: 4F", shape=box, style=rounded ] + N205[label="205: 37", shape=circle ] + N206[label="206: 37", shape=circle ] + N207[label="207: 4F", shape=box, style=rounded ] + N208[label="208: 37", shape=circle ] + N209[label="209: 3A" ] + N210[label="210: 37", shape=circle ] + N211[label="211: 37", shape=circle ] + N212[label="212: 4F", shape=box, style=rounded ] + N213[label="213: 37", shape=circle ] + N214[label="214: 37", shape=circle ] + N215[label="215: 4F", shape=box, style=rounded ] + N216[label="216: 37", shape=circle ] + N217[label="217: F9", shape=box, style=rounded ] + N218[label="218: 37", shape=circle ] + N219[label="219: 37", shape=circle ] + N220[label="220: 4F", shape=box, style=rounded ] + N221[label="221: 37", shape=circle ] + N222[label="222: 37", shape=circle ] + N223[label="223: 4F", shape=box, style=rounded ] + N224[label="224: 37", shape=circle ] + N225[label="225: 40" ] + N226[label="226: 37", shape=circle ] + N227[label="227: 37", shape=circle ] + N228[label="228: 4F", shape=box, style=rounded ] + N229[label="229: 37", shape=circle ] + N230[label="230: 37", shape=circle ] + N231[label="231: 4F", shape=box, style=rounded ] + N232[label="232: 37", shape=circle ] + N233[label="233: 3A" ] + N234[label="234: 37", shape=circle ] + N235[label="235: 37", shape=circle ] + N236[label="236: 4F", shape=box, style=rounded ] + N237[label="237: 37", shape=circle ] + N238[label="238: 37", shape=circle ] + N239[label="239: 4F", shape=box, style=rounded ] + N240[label="240: 37", shape=circle ] + N241[label="241: F9", shape=box, style=rounded ] + N242[label="242: 37", shape=circle ] + N243[label="243: 37", shape=circle ] + N244[label="244: 4F", shape=box, style=rounded ] + N245[label="245: 37", shape=circle ] + N246[label="246: 37", shape=circle ] + N247[label="247: 4F", shape=box, style=rounded ] + N248[label="248: 37", shape=circle ] + N249[label="249: 40" ] + N250[label="250: 37", shape=circle ] + N251[label="251: 37", shape=circle ] + N252[label="252: 4F", shape=box, style=rounded ] + N253[label="253: 37", shape=circle ] + N254[label="254: 37", shape=circle ] + N255[label="255: 52", shape=box, style=rounded ] + + // Layers + { /*0*/ rank=min N0 } + { /*1*/ rank=same N1 N2 N3 } + { /*2*/ rank=same N4 N5 N6 N7 N8 N9 } + { /*3*/ rank=same N10 N11 N12 N13 N14 N15 N16 N17 } + { /*4*/ rank=same N18 N19 N20 N21 N22 N23 N24 N25 } + { /*5*/ rank=same N26 N27 N28 N29 N30 N31 N32 N33 } + { /*6*/ rank=same N34 N35 N36 N37 N38 N39 N40 N41 } + { /*7*/ rank=same N42 N43 N44 N45 N46 N47 N48 N49 } + { /*8*/ rank=same N50 N51 N52 N53 N54 N55 N56 N57 } + { /*9*/ rank=same N58 N59 N60 N61 N62 N63 N64 N65 } + { /*10*/ rank=same N66 N67 N68 N69 N70 N71 N72 N73 } + { /*11*/ rank=same N74 N75 N76 N77 N78 N79 N80 N81 } + { /*12*/ rank=same N82 N83 N84 N85 N86 N87 N88 N89 } + { /*13*/ rank=same N90 N91 N92 N93 N94 N95 N96 N97 } + { /*14*/ rank=same N98 N99 N100 N101 N102 N103 N104 N105 } + { /*15*/ rank=same N106 N107 N108 N109 N110 N111 N112 N113 } + { 
/*16*/ rank=same N114 N115 N116 N117 N118 N119 N120 N121 } + { /*17*/ rank=same N122 N123 N124 N125 N126 N127 N128 N129 } + { /*18*/ rank=same N130 N131 N132 N133 N134 N135 N136 N137 } + { /*19*/ rank=same N138 N139 N140 N141 N142 N143 N144 N145 } + { /*20*/ rank=same N146 N147 N148 N149 N150 N151 N152 N153 } + { /*21*/ rank=same N154 N155 N156 N157 N158 N159 N160 N161 } + { /*22*/ rank=same N162 N163 N164 N165 N166 N167 N168 N169 } + { /*23*/ rank=same N170 N171 N172 N173 N174 N175 N176 N177 } + { /*24*/ rank=same N178 N179 N180 N181 N182 N183 N184 N185 } + { /*25*/ rank=same N186 N187 N188 N189 N190 N191 N192 N193 } + { /*26*/ rank=same N194 N195 N196 N197 N198 N199 N200 N201 } + { /*27*/ rank=same N202 N203 N204 N205 N206 N207 N208 N209 } + { /*28*/ rank=same N210 N211 N212 N213 N214 N215 N216 N217 } + { /*29*/ rank=same N218 N219 N220 N221 N222 N223 N224 N225 } + { /*30*/ rank=same N226 N227 N228 N229 N230 N231 N232 N233 } + { /*31*/ rank=same N234 N235 N236 N237 N238 N239 N240 N241 } + { /*32*/ rank=same N242 N243 N244 N245 N246 N247 N248 N249 } + { /*33*/ rank=same N250 N251 N252 N253 N254 N255 } + + // Topology + N0 -> N3 + N1 -> N6 + N2 -> N9 + N4 -> N12 + N5 -> N15 + N7 -> N17 + N8 -> N17 + N10 -> N20 + N11 -> N23 + N13 -> N25 + N14 -> N25 + N16 -> N25 + N17 -> N25 + N18 -> N28 + N19 -> N31 + N21 -> N33 + N22 -> N33 + N24 -> N33 + N26 -> N36 + N27 -> N39 + N29 -> N41 + N30 -> N41 + N32 -> N41 + N33 -> N41 + N34 -> N44 + N35 -> N47 + N37 -> N49 + N38 -> N49 + N40 -> N49 + N41 -> N49 + N42 -> N52 + N43 -> N55 + N45 -> N57 + N46 -> N57 + N48 -> N57 + N50 -> N60 + N51 -> N63 + N53 -> N65 + N54 -> N65 + N56 -> N65 + N57 -> N65 + N58 -> N68 + N59 -> N71 + N61 -> N73 + N62 -> N73 + N64 -> N73 + N65 -> N73 + N66 -> N76 + N67 -> N79 + N69 -> N81 + N70 -> N81 + N72 -> N81 + N74 -> N84 + N75 -> N87 + N77 -> N89 + N78 -> N89 + N80 -> N89 + N81 -> N89 + N82 -> N92 + N83 -> N95 + N85 -> N97 + N86 -> N97 + N88 -> N97 + N89 -> N97 + N90 -> N100 + N91 -> N103 + N93 -> N105 + N94 -> N105 + N96 -> N105 + N98 -> N108 + N99 -> N111 + N101 -> N113 + N102 -> N113 + N104 -> N113 + N105 -> N113 + N106 -> N116 + N107 -> N119 + N109 -> N121 + N110 -> N121 + N112 -> N121 + N113 -> N121 + N114 -> N124 + N115 -> N127 + N117 -> N129 + N118 -> N129 + N120 -> N129 + N122 -> N132 + N123 -> N135 + N125 -> N137 + N126 -> N137 + N128 -> N137 + N129 -> N137 + N130 -> N140 + N131 -> N143 + N133 -> N145 + N134 -> N145 + N136 -> N145 + N137 -> N145 + N138 -> N148 + N139 -> N151 + N141 -> N153 + N142 -> N153 + N144 -> N153 + N146 -> N156 + N147 -> N159 + N149 -> N161 + N150 -> N161 + N152 -> N161 + N153 -> N161 + N154 -> N164 + N155 -> N167 + N157 -> N169 + N158 -> N169 + N160 -> N169 + N161 -> N169 + N162 -> N172 + N163 -> N175 + N165 -> N177 + N166 -> N177 + N168 -> N177 + N170 -> N180 + N171 -> N183 + N173 -> N185 + N174 -> N185 + N176 -> N185 + N177 -> N185 + N178 -> N188 + N179 -> N191 + N181 -> N193 + N182 -> N193 + N184 -> N193 + N185 -> N193 + N186 -> N196 + N187 -> N199 + N189 -> N201 + N190 -> N201 + N192 -> N201 + N194 -> N204 + N195 -> N207 + N197 -> N209 + N198 -> N209 + N200 -> N209 + N201 -> N209 + N202 -> N212 + N203 -> N215 + N205 -> N217 + N206 -> N217 + N208 -> N217 + N209 -> N217 + N210 -> N220 + N211 -> N223 + N213 -> N225 + N214 -> N225 + N216 -> N225 + N218 -> N228 + N219 -> N231 + N221 -> N233 + N222 -> N233 + N224 -> N233 + N225 -> N233 + N226 -> N236 + N227 -> N239 + N229 -> N241 + N230 -> N241 + N232 -> N241 + N233 -> N241 + N234 -> N244 + N235 -> N247 + N237 -> N249 + N238 -> N249 + N240 -> 
N249 + N242 -> N252 + N243 -> N255 + N245 -> N255 + N246 -> N255 + N248 -> N255 + N249 -> N255 +} + diff --git a/doc/devel/dump/2024-04-08.Scheduler-LoadTest/Topo-20.svg b/doc/devel/dump/2024-04-08.Scheduler-LoadTest/Topo-20.svg new file mode 100644 index 000000000..93d0ac67a --- /dev/null +++ b/doc/devel/dump/2024-04-08.Scheduler-LoadTest/Topo-20.svg @@ -0,0 +1,1022 @@ + + + + + + +%3 + + + +N0 + + +0: 37 + + + +N3 + +3: 4F + + + +N0->N3 + + + + + +N1 + +1: 37 + + + +N6 + +6: 4F + + + +N1->N6 + + + + + +N2 + +2: 37 + + + +N9 + +9: 4F + + + +N2->N9 + + + + + +N4 + +4: 37 + + + +N12 + +12: 4F + + + +N4->N12 + + + + + +N5 + +5: 37 + + + +N15 + +15: 4F + + + +N5->N15 + + + + + +N7 + +7: 37 + + + +N17 + +17: 13 + + + +N7->N17 + + + + + +N8 + +8: 37 + + + +N8->N17 + + + + + +N10 + +10: 37 + + + +N20 + +20: 4F + + + +N10->N20 + + + + + +N11 + +11: 37 + + + +N23 + +23: 4F + + + +N11->N23 + + + + + +N13 + +13: 37 + + + +N25 + +25: 61 + + + +N13->N25 + + + + + +N14 + +14: 37 + + + +N14->N25 + + + + + +N16 + +16: 37 + + + +N16->N25 + + + + + +N17->N25 + + + + + +N18 + +18: 37 + + + +N28 + +28: 4F + + + +N18->N28 + + + + + +N19 + +19: 37 + + + +N31 + +31: 4F + + + +N19->N31 + + + + + +N21 + +21: 37 + + + +N33 + +33: 40 + + + +N21->N33 + + + + + +N22 + +22: 37 + + + +N22->N33 + + + + + +N24 + +24: 37 + + + +N24->N33 + + + + + +N26 + +26: 37 + + + +N36 + +36: 4F + + + +N26->N36 + + + + + +N27 + +27: 37 + + + +N39 + +39: 4F + + + +N27->N39 + + + + + +N29 + +29: 37 + + + +N41 + +41: 3A + + + +N29->N41 + + + + + +N30 + +30: 37 + + + +N30->N41 + + + + + +N32 + +32: 37 + + + +N32->N41 + + + + + +N33->N41 + + + + + +N34 + +34: 37 + + + +N44 + +44: 4F + + + +N34->N44 + + + + + +N35 + +35: 37 + + + +N47 + +47: 4F + + + +N35->N47 + + + + + +N37 + +37: 37 + + + +N49 + +49: F9 + + + +N37->N49 + + + + + +N38 + +38: 37 + + + +N38->N49 + + + + + +N40 + +40: 37 + + + +N40->N49 + + + + + +N41->N49 + + + + + +N42 + +42: 37 + + + +N52 + +52: 4F + + + +N42->N52 + + + + + +N43 + +43: 37 + + + +N55 + +55: 4F + + + +N43->N55 + + + + + +N45 + +45: 37 + + + +N57 + +57: 40 + + + +N45->N57 + + + + + +N46 + +46: 37 + + + +N46->N57 + + + + + +N48 + +48: 37 + + + +N48->N57 + + + + + +N50 + +50: 37 + + + +N60 + +60: 4F + + + +N50->N60 + + + + + +N51 + +51: 37 + + + +N63 + +63: 4F + + + +N51->N63 + + + + + +N53 + +53: 37 + + + +N65 + +65: 3A + + + +N53->N65 + + + + + +N54 + +54: 37 + + + +N54->N65 + + + + + +N56 + +56: 37 + + + +N56->N65 + + + + + +N57->N65 + + + + + +N58 + +58: 37 + + + +N68 + +68: 4F + + + +N58->N68 + + + + + +N59 + +59: 37 + + + +N71 + +71: 4F + + + +N59->N71 + + + + + +N61 + +61: 37 + + + +N73 + +73: F9 + + + +N61->N73 + + + + + +N62 + +62: 37 + + + +N62->N73 + + + + + +N64 + +64: 37 + + + +N64->N73 + + + + + +N65->N73 + + + + + +N66 + +66: 37 + + + +N76 + +76: 4F + + + +N66->N76 + + + + + +N67 + +67: 37 + + + +N79 + +79: 4F + + + +N67->N79 + + + + + +N69 + +69: 37 + + + +N81 + +81: 40 + + + +N69->N81 + + + + + +N70 + +70: 37 + + + +N70->N81 + + + + + +N72 + +72: 37 + + + +N72->N81 + + + + + +N74 + +74: 37 + + + +N84 + +84: 4F + + + +N74->N84 + + + + + +N75 + +75: 37 + + + +N87 + +87: 4F + + + +N75->N87 + + + + + +N77 + +77: 37 + + + +N89 + +89: 3A + + + +N77->N89 + + + + + +N78 + +78: 37 + + + +N78->N89 + + + + + +N80 + +80: 37 + + + +N80->N89 + + + + + +N81->N89 + + + + + +N82 + +82: 37 + + + +N92 + +92: 4F + + + +N82->N92 + + + + + +N83 + +83: 37 + + + +N95 + +95: 4F + + + +N83->N95 + + + + + +N85 + +85: 37 + + + +N97 + +97: F9 + + + +N85->N97 + + + + + +N86 + +86: 37 + + + +N86->N97 + + + + + +N88 + +88: 37 
+ + + +N88->N97 + + + + + +N89->N97 + + + + + +N90 + +90: 37 + + + +N98 + +98: 94 + + + +N90->N98 + + + + + +N91 + +91: 37 + + + +N91->N98 + + + + + +N93 + +93: 37 + + + +N93->N98 + + + + + +N94 + +94: 37 + + + +N94->N98 + + + + + +N96 + +96: 37 + + + +N96->N98 + + + + + diff --git a/doc/devel/dump/2024-04-08.Scheduler-LoadTest/index.txt b/doc/devel/dump/2024-04-08.Scheduler-LoadTest/index.txt index 893b78cc4..66f2e6e33 100644 --- a/doc/devel/dump/2024-04-08.Scheduler-LoadTest/index.txt +++ b/doc/devel/dump/2024-04-08.Scheduler-LoadTest/index.txt @@ -21,7 +21,7 @@ _Gnuplot_ script. Raw measurement data is stored as CSV (see 'csv.hpp'). Breaking Point Testing ---------------------- -Topo-10:: +Topo-10.dot:: Topology of the processing load used as typical example for _breaking a schedule._ This Graph with 64 nodes is generated by the pre-configured rules `configureShape_chain_loadBursts()`; it starts with a single linear, yet »bursts« @@ -134,6 +134,13 @@ reproducibly slower (at least on my machine). Below 90 jobs, also the spread of value is larger, as is the spread of time in _impeded state,_ which is defined as less than two workers processing active job content at a given time. - - + +Stationary Load +--------------- +The goal for this setup is to demonstrate stable processing over an extended period of time. + +Topo-20.dot:: + Topology used to emulate a realistic load. + It is comprised of small yet interleaved dependency tries, + filling each level up to the maximum capacity (limited here to 8 workers). diff --git a/src/vault/gear/scheduler.hpp b/src/vault/gear/scheduler.hpp index 2ed91070b..390f9ef58 100644 --- a/src/vault/gear/scheduler.hpp +++ b/src/vault/gear/scheduler.hpp @@ -140,7 +140,7 @@ namespace gear { const auto IDLE_WAIT = 20ms; ///< sleep-recheck cycle for workers deemed _idle_ const size_t DISMISS_CYCLES = 100; ///< number of wait cycles before an idle worker terminates completely Offset DUTY_CYCLE_PERIOD{FSecs(1,20)}; ///< period of the regular scheduler »tick« for state maintenance. - Offset DUTY_CYCLE_TOLERANCE{FSecs(1,10)}; ///< maximum slip tolerated on duty-cycle start before triggering Scheduler-emergency + Offset DUTY_CYCLE_TOLERANCE{FSecs(2,10)}; ///< maximum slip tolerated on duty-cycle start before triggering Scheduler-emergency Offset FUTURE_PLANNING_LIMIT{FSecs{20}}; ///< limit timespan of deadline into the future (~360 MiB max) } diff --git a/tests/vault/gear/scheduler-stress-test.cpp b/tests/vault/gear/scheduler-stress-test.cpp index e4513f0e6..eb5ef856b 100644 --- a/tests/vault/gear/scheduler-stress-test.cpp +++ b/tests/vault/gear/scheduler-stress-test.cpp @@ -484,8 +484,8 @@ cout << "time="< %5.3f => %5.3f"} + % gaugeProbes % gain % stressFac% formFac % afak%adjustmentFac % (stressFac/adjustmentFac) <avg? % fail? 
_Fmt fmtStep_{ "%4.2f| : ∅Δ=%4.1f±%-4.2f ∅t=%4.1f %s %%%-3.0f -- expect:%4.1fms"};// stress % ∅Δ % σ % ∅t % fail % pecentOff % t-expect diff --git a/tests/vault/gear/test-chain-load.hpp b/tests/vault/gear/test-chain-load.hpp index 1406b1d17..e5d9594e6 100644 --- a/tests/vault/gear/test-chain-load.hpp +++ b/tests/vault/gear/test-chain-load.hpp @@ -1971,30 +1971,27 @@ namespace test { return move(*this); } - ScheduleCtx&& - adaptEmpirically (double stressFac =1.0, uint concurrency=0) + double + determineEmpiricFormFactor (uint concurrency=0) { - if (watchInvocations_) - { - auto stat = watchInvocations_->evaluate(); - if (0 < stat.activationCnt) - {// looks like we have actual measurement data - ENSURE (0.0 < stat.avgConcurrency); - if (not concurrency) - concurrency = defaultConcurrency(); - double worktimeRatio = 1 - stat.timeAtConc(0) / stat.coveredTime; - double workConcurrency = stat.avgConcurrency / worktimeRatio; - double weightSum = chainLoad_.calcWeightSum(); - double expectedCompoundedWeight = chainLoad_.calcExpectedCompoundedWeight(concurrency); - double expectedConcurrency = weightSum / expectedCompoundedWeight; - double formFac = 1 / (workConcurrency / expectedConcurrency); - double expectedNodeTime = _uSec(compuLoad_->timeBase) * weightSum / chainLoad_.size(); - double realAvgNodeTime = stat.activeTime / stat.activationCnt; - formFac *= realAvgNodeTime / expectedNodeTime; - return withAdaptedSchedule (stressFac, concurrency, formFac); - } - } - return move(*this); + if (not watchInvocations_) return 1.0; + auto stat = watchInvocations_->evaluate(); + if (0 == stat.activationCnt) return 1.0; + // looks like we have actual measurement data... + ENSURE (0.0 < stat.avgConcurrency); + if (not concurrency) + concurrency = defaultConcurrency(); + double worktimeRatio = 1 - stat.timeAtConc(0) / stat.coveredTime; + double workConcurrency = stat.avgConcurrency / worktimeRatio; + double weightSum = chainLoad_.calcWeightSum(); + double expectedCompoundedWeight = chainLoad_.calcExpectedCompoundedWeight(concurrency); + double expectedConcurrency = weightSum / expectedCompoundedWeight; + double formFac = 1 / (workConcurrency / expectedConcurrency); + double expectedNodeTime = _uSec(compuLoad_->timeBase) * weightSum / chainLoad_.size(); + double realAvgNodeTime = stat.activeTime / stat.activationCnt; + formFac *= realAvgNodeTime / expectedNodeTime; +cout<<"∅conc:"< +
With the Scheduler testing effort [[#1344|https://issues.lumiera.org/ticket/1344]], several goals are pursued
 * by exposing the new scheduler implementation to excessive overload, its robustness can be assessed and defects can be spotted
 * with the help of a systematic, calibrated load, characteristic performance limits and breaking points can be established
@@ -7452,17 +7452,15 @@ The example presented to the right uses a similar setup (''8 workers''), but red
 
 As net effect, most of the load peaks are just handled by two workers, especially for larger load sizes; most of the available processing capacity remains unused for such short running payloads. Moreover, on average a significant amount of time is spent with partially blocked or impeded operation (&rarr; light green circles), since administrative work must be done non-concurrently. Depending on the perspective, this can be seen as a weakness -- or as the result of a deliberate trade-off made by the choice of active work-pulling and a passive Scheduler.
 
-The actual average in-job time (&rarr; dark green dots) is offset significantly here, and closer to 400µs -- which is also confirmed by the gradient of the linear model (0.4ms / 2 Threads ≙ 0.2ms/job). With shorter load sizes below 90 jobs, increased variance can be observerd, and measurements can no longer be subsumed under a single linear relation -- in fact, data points seem to be arranged into several groups with differing, yet mostly linear correlation, which also explains the negative socket value of the overall computed model; using only the data points with > 90 jobs would yield a model with slightly lower gradient but a positive offset of ~2ms.
+The actual average in-job time (&rarr; dark green dots) is offset significantly here, and closer to 400µs -- which is also confirmed by the gradient of the linear model (0.4ms / 2 Threads ≙ 0.2ms/job). With shorter load sizes below 90 jobs, increased variance can be observed, and measurements can no longer be subsumed under a single linear relation -- in fact, data points seem to be arranged into several groups with differing, yet mostly linear correlation, which also explains the negative socket value of the overall computed model; using only the data points with > 90 jobs would yield a model with slightly lower gradient but a positive offset of ~2ms.
 <html><div style="clear: both"/></html>
 Further measurement runs with other parameter values fit well in between the two extremes presented above. It can be concluded that this Scheduler implementation strongly favours larger job sizes starting with several milliseconds, when it comes to processing through a extended homogenous work load without much job interdependencies. Such larger lot sizes can be handled efficiently and close to expected limits, while very small jobs massively degrade the available performance. This can be attributed both to the choice of a randomised capacity distribution, and of pull processing without a central manager.
 
 !!!Stationary Processing
+The ultimate goal of //load- and stress testing// is to establish a notion of //full load// and to demonstrate adequate performance under //nominal load conditions.// Thus, after investigating overheads and the breaking point of a complex schedule, a measurement setup was established with load patterns deemed „realistic“ -- based on knowledge of typical media processing demands encountered in video editing. Such a setup entails small dependency trees of jobs loaded with computation times around 5ms, interleaved up to the available level of concurrency. To determine viable parameter bounds, the //breaking-point// measurement method can be applied to an extended graph with this structure, to find the level at which the computation saturates the system's capacity and cannot proceed any faster; a conceptual sketch of this search is given below the figure.
+<html><img title="Load topology for stationary processing with 8 cores"  src="dump/2024-04-08.Scheduler-LoadTest/Topo-20.svg"  style="width:100%"/></html>
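As an editorial illustration, the following sketch captures the bisection idea behind the //breaking-point// search. It is ''not'' the actual `BreakingPoint` tool from `stress-test-rig.hpp`; the predicate `breaksUnder` is a hypothetical stand-in for running the instrumented schedule at a given stress factor and applying the break criterion.
{{{
#include <functional>

// Conceptual sketch of the »breaking point« bisection: search for the
// lowest stress factor at which the given schedule breaks.
// `breaksUnder` is a hypothetical stand-in for an actual measurement run.
double
findBreakingPoint (std::function<bool(double)> breaksUnder,
                   double lo =0.1, double hi =10.0, double epsilon =0.01)
{
    while (hi - lo > epsilon)
      {
        double mid = (lo + hi) / 2;
        if (breaksUnder (mid))
            hi = mid;   // schedule breaks ⟹ breaking point at or below mid
        else
            lo = mid;   // schedule holds ⟹ breaking point above mid
      }
    return (lo + hi) / 2;
}
}}}
Note that such a bisection silently assumes the break predicate to be monotonic in the stress factor -- precisely the assumption which the scale-down behaviour discussed next can undermine.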
 
-lorem ipsum
-lorem ipsum nebbich
-ja luia sog I
-
-
+This research again revealed the tendency of the given Scheduler implementation to ''scale down capacity unless overloaded''. Using the breaking-point method with such a fine-grained and rather homogeneous schedule can be problematic, since a search for the limit will inevitably involve running several probes //below the limit// -- which can cause the Scheduler to reduce the number of workers to a level that just fills the available time. Depending on the path taken, the search can thus find a breaking point corresponding to a throttled capacity, whereas a search path through parameter ranges of overload will reveal the ability to follow a much tighter schedule. While this problem is inherent to the measurement approach, it can be mitigated to some degree by limiting the empirical adaptation of the parameter scale to the initial phase of the measurement, and by ensuring this initial phase starts from overload territory; the sketch below illustrates this gating.
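As an editorial illustration, a minimal sketch of this gating, assuming the logic described above; `GAUGE_PROBES` and `OVERLOAD_STRESS` are hypothetical placeholder names and values, not the actual test-rig parameters.
{{{
// Simplified sketch: the empirical scale adjustment is folded in only
// during the first few probes of a measurement series, and only while the
// stress factor indicates genuine overload; otherwise the Scheduler's
// scale-down would be misread as a structural concurrency limit.
const unsigned GAUGE_PROBES    = 3;    // hypothetical: length of initial gauging phase
const double   OVERLOAD_STRESS = 1.0;  // hypothetical: below this, throttling is likely

double
maybeAdjustScale (unsigned probeNr, double stressFac,
                  double adjustmentFac, double empiricFormFac)
{
    if (probeNr < GAUGE_PROBES and stressFac > OVERLOAD_STRESS)
        adjustmentFac *= empiricFormFac;   // fold in the empirical observation
    return adjustmentFac;                  // otherwise keep the scale stable
}
}}}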
 
diff --git a/wiki/thinkPad.ichthyo.mm b/wiki/thinkPad.ichthyo.mm index ba6746799..31912de05 100644 --- a/wiki/thinkPad.ichthyo.mm +++ b/wiki/thinkPad.ichthyo.mm @@ -87696,23 +87696,18 @@ Date:   Thu Apr 20 18:53:17 2023 +0200

...I had tested this fix only superficially, and in doing so overlooked that an assertion can trigger (and very likely will trigger eventually, as soon as the repair mechanism covers a larger distance). This is not a bug in the actual repair/reLink mechanism, though; that mechanism works precisely, as I could verify once more in detail with the debugger.


once a deadline has been overrun, any further access to the Extent is to be considered _undefined behaviour_. This also applies to the AllocatorHandle obtained earlier for a specific deadline; it may well be used further (as long as the deadline still lies in the future). Concrete case: attaching another dependency later on. If the anchor of this dependency has already been executed or invalidated by that point, you have only yourself to blame! @@ -116289,9 +116284,7 @@ std::cout << tmpl.render({"what", "World"}) << s

and thus should not enter into the form factor as a deviation of the job times @@ -116302,9 +116295,7 @@ std::cout << tmpl.render({"what", "World"}) << s

well... that one is abysmal @@ -116314,9 +116305,7 @@ std::cout << tmpl.render({"what", "World"}) << s

as a reminder: within a series we perform a kind of convergence towards an effective form factor. The results of one run are used to readjust the next run; the externally given (nominal) stress factor remains, but the actual density is optimised so that it then effectively corresponds to this factor. In the course of this adaptation the schedule apparently gets compacted somewhat each time, and the achieved concurrency drops (from slightly above 2 down to 1.6 most recently) @@ -116327,9 +116316,7 @@ std::cout << tmpl.render({"what", "World"}) << s
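(Editorial note: a hypothetical sketch of this run-to-run convergence. The relation effective stress = stressFac / adjustmentFac is gleaned from the diagnostic format string added by this patch; the update rule and the measureFormFactor callback are illustrative stand-ins for the real empirical measurement.)
{{{
#include <functional>

// Hypothetical sketch: the nominal stress factor stays fixed, while an
// adjustment factor is re-gauged after each run, so that the schedule
// density effectively matches the nominal factor.
double
convergeAdjustment (double nominalStress,
                    std::function<double(double)> measureFormFactor,
                    unsigned maxRuns =5)
{
    double adjustmentFac = 1.0;
    for (unsigned run = 0; run < maxRuns; ++run)
      {
        double effectiveStress = nominalStress / adjustmentFac;
        adjustmentFac *= measureFormFactor (effectiveStress);  // re-gauge for next run
      }
    return adjustmentFac;
}
}}}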

at least according to the observations now available from the param-range setup @@ -116347,10 +116334,138 @@ std::cout << tmpl.render({"what", "World"}) << s

+ with this, the utilisation now goes up reasonably well


+ it is and remains a compromise....


+ I am trying here to make a very specific measurement method usable in a halfway generic fashion, yet in doing so I am already putting a dangerous amount of prior assumptions about the Scheduler into the measurement process


+ capacity is normally distributed randomly into an active zone; only when we fall behind the schedule are all workers deployed


+ ...if the schedule is spread out due to previously observed low parallelism, then a layer finishes processing early, and the workers get distributed beyond the start point of the next level. Thus capacity ramps up only slowly there as well, and after a few rounds a small number of active workers has crystallised. The subsequent readjustment then ensures precisely that the schedule is generous enough for these few workers to cope.


+ this setup does not observe the »breaking point«


+ -- but rather the attainment of a load target


+ Originally the »breaking point« method was developed on a complex load pattern, which degenerates very markedly under overload, insofar as central prerequisite nodes are then reached only late, so the entire schedule gets drastically delayed. Nothing of that kind is given here. Rather, the runtime simply stretches elastically and proportionally when a schedule, once laid out without any buffer, is fulfilled exactly. Since the search approaches from the side of low load, it never builds up enough pressure to drive the concurrency upwards.


+ relaxed ⟹ low concurrency ⟹ this point is classified as stricter and the schedule is relaxed even further ⟹ if we were now to return to the previous test point, the test goal (= schedule broken) might no longer be met there at all

@@ -116360,13 +116475,13 @@ std::cout << tmpl.render({"what", "World"}) << s extent-family.hpp:356


...back then, however, from an entirely different context, which in the meantime has been (or should have been) resolved by a rework in the Scheduler. @@ -116377,9 +116492,7 @@ std::cout << tmpl.render({"what", "World"}) << s

»investigateWorkProcessing« @@ -116405,9 +116518,7 @@ std::cout << tmpl.render({"what", "World"}) << s

0000000609: PRECONDITION: extent-family.hpp:356: thread_9: access: (isValidPos (idx)) @@ -116497,9 +116608,7 @@ std::cout << tmpl.render({"what", "World"}) << s

linkToPredecessor() @@ -116520,9 +116629,7 @@ std::cout << tmpl.render({"what", "World"}) << s

occurs reproducibly from Load=4ms upward @@ -116554,9 +116661,7 @@ std::cout << tmpl.render({"what", "World"}) << s

the allocation was made on the predecessor term @@ -116599,16 +116704,13 @@ std::cout << tmpl.render({"what", "World"}) << s

...that is somewhat creative anyway

@@ -116620,16 +116722,13 @@ std::cout << tmpl.render({"what", "World"}) << s

this one is tied to the deadline

@@ -116643,16 +116742,13 @@ std::cout << tmpl.render({"what", "World"}) << s

The allocator as such is robust; the deadline merely describes a usage contract; it is stored in the Gate of the block, but for the allocator it only matters for finding the matching block. The further usage pattern must be guaranteed on a higher level.

@@ -116675,9 +116771,7 @@ std::cout << tmpl.render({"what", "World"}) << s

And this holds independently of whether the calibration was done with short or long times, single- or multi-threaded. The deviation occurs only in the real load context, and is (visually, judging by the diagrams) correlated with the degree of contention and irregularity in the execution. It tends to decrease for longer test runs, but typically converges -- even for very large loads and very long runs -- towards an offset of ~ +1ms @@ -116687,9 +116781,7 @@ std::cout << tmpl.render({"what", "World"}) << s

And this is to be expected for purely logical reasons already. When setting up the heuristic for the test schedule, I deliberately refrained from any optimal arrangement of the computation paths (no solving a box-stacking problem!). On top of that come the actual limitations of the worker pool. From this results a characteristic deviation between a theoretically computed concurrency speed-up (as factored into the schedule) and the empirically observed average concurrency. This is interpreted as a form factor. @@ -116703,9 +116795,7 @@ std::cout << tmpl.render({"what", "World"}) << s

between load size and the runtime for completely processing the generated load peak @@ -116717,9 +116807,7 @@ std::cout << tmpl.render({"what", "World"}) << s

gradient very close to the expected value @@ -116727,9 +116815,7 @@ std::cout << tmpl.render({"what", "World"}) << s

when applying the empirically observed effective concurrency and the real average job time @@ -116739,9 +116825,7 @@ std::cout << tmpl.render({"what", "World"}) << s

this means: the actually observed socket offset depends on the length of the job load and on the concurrency: fundamentally, the whole worker-pool size has to be added once at the beginning and once at the end -- at reduced concurrency. This already follows from a purely logical consideration: »full load« can only be constituted once the first worker fetches its second job. Analogously, spin-down begins when the first worker falls idle.
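(Editorial note: a minimal cost model matching this reasoning; the names and the simple two-job-lengths socket term are illustrative assumptions, not measured values.)
{{{
// Minimal linear cost model for the observed relation
//     runtime ≈ socket + loadSize · avgJobTime / concurrency
// The socket term reflects the argument above: »full load« exists only once
// the first worker fetches its second job, and spin-down starts as soon as
// the first worker falls idle, so both ends run at reduced concurrency.
double
estimateRuntime (unsigned loadSize, double avgJobTime, unsigned workers)
{
    double steadyPhase = loadSize * avgJobTime / workers;  // fully concurrent part
    double socket      = 2 * avgJobTime;                   // ramp-up + spin-down (assumed)
    return socket + steadyPhase;
}
}}}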