After the individual tests, we calculate the averaged delta over the
whole test suite to detect changes to the overall timings. As it turned out,
using error propagation for the calculation of the averaged delta
yields the right tolerance band to ignore random fluctuations but
trigger an alarm on real changes.
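
As an illustration of the error propagation, here is a minimal sketch in
Python. It assumes that each test case reports a timing delta together with
its own tolerance and that the individual errors are independent, so that
they add in quadrature. The names `averaged_delta` and `is_alarm` and the
parameter `factor` are hypothetical and not taken from the test suite.

```python
import math

def averaged_delta(deltas, sigmas):
    """Return (mean delta, propagated tolerance) over all test cases.

    Assumption: the per-test errors are independent, thus the error of
    the sum is the square root of the summed squares, and the error of
    the mean is that value divided by the number of tests.
    """
    if not deltas or len(deltas) != len(sigmas):
        raise ValueError("need one tolerance per delta")
    n = len(deltas)
    mean = sum(deltas) / n
    tolerance = math.sqrt(sum(s * s for s in sigmas)) / n
    return mean, tolerance

def is_alarm(deltas, sigmas, factor=1.0):
    """True when the averaged delta leaves the propagated tolerance band.

    `factor` is a hypothetical tuning knob to widen or narrow the band.
    """
    mean, tolerance = averaged_delta(deltas, sigmas)
    return abs(mean) > factor * tolerance
```

Deltas that scatter around zero cancel out in the mean and stay within the
band, while a shift common to all tests exceeds it, because the propagated
tolerance shrinks with the number of tests.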
Moreover, add several further timing test cases
to verify that the calibration via the "platform model" works as intended.
Since the platform calibration inevitably incurs some additional error band,
a linear regression over the time series of measurements can additionally be used
to spot ongoing systematic changes below this general error band, while
leveling out local statistical fluctuations.
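
The regression idea can be sketched as follows, again in Python and only as
an illustration. It assumes the measurements are equally spaced in time (one
value per run) and fits an ordinary least-squares line. The names
`linear_trend` and `has_trend` and the parameter `factor` are hypothetical.

```python
import math

def linear_trend(values):
    """Least-squares fit of values against their index 0..n-1.

    Returns (slope, intercept, slope_error), where slope_error is the
    standard error of the slope estimated from the residuals.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three measurements")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual = sum((y - (intercept + slope * x)) ** 2
                   for x, y in enumerate(values))
    slope_error = math.sqrt(residual / (n - 2) / sxx)
    return slope, intercept, slope_error

def has_trend(values, factor=2.0):
    """True when the slope is significant compared to its own error.

    `factor` is a hypothetical threshold in units of the standard error.
    """
    slope, _, slope_error = linear_trend(values)
    return abs(slope) > factor * slope_error
```

Since the slope is estimated from the whole series, local fluctuations
largely cancel, and a slow drift can become visible even when each single
measurement stays inside the general error band.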