SchedulerRequirements
=====================

// please don't remove the //word: comments

[grid="all"]
`------------`-----------------------
*State* _Draft_
*Date* _Mi 09 Jan 2013 12:04:03 CET_
*Proposed by* Ichthyostega <prg@ichthyostega.de>
-------------------------------------

********************************************************************************
.Abstract
Define the expected core properties and requirements of the Scheduler service.
********************************************************************************

The rendering and playback subsystem relies on a Scheduler component to invoke
individual frame rendering and resource fetching jobs. This RfC summarises the
general assumptions and requirements other parts of the application rely on.

Description
-----------
//description: add a detailed description:

The *Scheduler* is responsible for getting the individual _render jobs_ to run.
The basic idea is that individual render jobs _should never block_ -- and thus
the calculation of a single frame might be split into several atomic jobs,
including resource fetching. This expected usage should be considered together
with the data exchange protocol defined for data output through the `OutputSlot`
instances; moreover, the extended data of the low-level model can be hot-swapped
while rendering goes on, which necessitates releasing blocks of superseded
model data at well-defined points. Combining all these known usage constraints
leads to the following requirements for the scheduler:

ordering of jobs::
  the scheduler has to ensure all prerequisites of a given job are met

job time window::
  when it is not possible to run a job within the defined target time window,
  it must not be run any more, but rather be marked as a failure

failure propagation::
  when a job fails, either due to a job-internal error or a timing glitch,
  the effect of this failure needs to propagate reliably; we need a mechanism
  for dependent jobs to receive a notification of such a failure state

conditional scheduling::
  we need to provide some way to tie the activity of jobs to external conditions,
  notable examples being the availability of cached data, or the arrival of data
  loaded from storage

superseding of planned jobs::
  changes in playback modes require us to ``change the plan on-the-fly'' --
  essentially this means we need to 'supersede' a group of already planned jobs.
  Moreover, we need certain ordering guarantees to ensure the resulting switch
  in the effective output data happens once and without glitches.

The scheduler interface and its specification establish a kind of micro-language
to encode the patterns of behaviour prompted by the playback control and the
interpretation of the render node model. Together these basic requirements
help to address some relevant themes, elaborated in the following sections.

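To make these requirements more tangible, here is a minimal, purely illustrative C++
sketch of how a job descriptor and a scheduler facade reflecting these constraints might
look. All names and signatures are hypothetical and do not correspond to the actual
Lumiera interfaces.

[source,C++]
----
// Hypothetical sketch only -- names and signatures are illustrative,
// not the actual Lumiera scheduler interface.
#include <cstdint>
#include <cstddef>
#include <functional>
#include <vector>

using TimePoint = std::int64_t;        // e.g. microseconds since some epoch
using JobID     = std::size_t;

enum class JobResult { DONE, FAILED, TIMED_OUT, SUPERSEDED };

struct JobSpec
  {
    std::function<JobResult()> action;   // atomic, non-blocking operation
    TimePoint start;                     // earliest point the job may run
    TimePoint deadline;                  // beyond this point the job counts as failed
    std::vector<JobID> prerequisites;    // jobs that must have completed successfully
  };

class SchedulerFacade
  {
  public:
    virtual ~SchedulerFacade() { }

    // add a group of jobs atomically, retaining ordering by dependency and job time
    virtual std::vector<JobID> schedule (std::vector<JobSpec> const& group)        =0;

    // supersede a group of already planned jobs ("change the plan on-the-fly")
    virtual void supersede (std::vector<JobID> const& plannedGroup)                =0;

    // register a callback to be invoked when the given job fails or times out
    virtual void onFailure (JobID job, std::function<void(JobResult)> callback)    =0;
  };
----

The essential point of this sketch is that ordering, time window, failure propagation
and superseding all appear as first-class elements of the interface, rather than being
handled implicitly through blocking calls.
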
dependency on prerequisites
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Render tasks don't exist in isolation; they depend on prerequisites, both preceding
calculations and the availability of data. Since our primary decision is to avoid
blocking waits, these prerequisites need to be modelled as other jobs, which leads
to dependencies and conditional scheduling.

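Reusing the hypothetical `JobSpec`/`SchedulerFacade` declarations from the sketch above,
decomposing one frame into a non-blocking fetch job plus a calculation job depending on
it might look roughly as follows; the lambdas and timing figures are stand-ins for real
operations.

[source,C++]
----
// Illustration only, building on the hypothetical JobSpec / SchedulerFacade above.
void
planFrame (SchedulerFacade& scheduler, TimePoint frameDeadline)
{
  JobSpec fetch;
  fetch.action   = []{ /* trigger asynchronous I/O, never block */ return JobResult::DONE; };
  fetch.start    = frameDeadline - 20000;    // allow 20 ms for I/O (made-up figure)
  fetch.deadline = frameDeadline -  5000;

  std::vector<JobID> fetchIDs = scheduler.schedule ({fetch});

  JobSpec calc;
  calc.action        = []{ /* calculate the frame from prefetched data */ return JobResult::DONE; };
  calc.start         = frameDeadline - 5000;
  calc.deadline      = frameDeadline;
  calc.prerequisites = fetchIDs;             // the calculation must wait for the fetch job

  scheduler.schedule ({calc});
}
----
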
detecting termination
~~~~~~~~~~~~~~~~~~~~~
The way other parts of the system are built requires us to obtain guaranteed
knowledge of a specific job's termination. More precisely, we need to find out
when a ``stream of calculations'' has left a well defined domain -- and this can
be modelled by the activation of specific marker jobs. It is possible to obtain
that knowledge with some timing leeway, but in the end, this information needs
to arrive with absolute reliability (violations leading to segfault).

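In terms of the hypothetical sketch above, such a marker could be expressed as an
ordinary job whose only purpose is to signal that every job of a given group has
terminated. This merely illustrates the idea, not the actual mechanism.

[source,C++]
----
// Illustration only: a marker job depending on a whole group of jobs.
// Its action fires a notification once the "stream of calculations" has left the domain.
JobID
addTerminationMarker (SchedulerFacade& scheduler,
                      std::vector<JobID> const& group,
                      std::function<void()> notify,
                      TimePoint deadline)
{
  JobSpec marker;
  marker.action        = [notify]{ notify(); return JobResult::DONE; };
  marker.start         = 0;          // may run as soon as its prerequisites are fulfilled
  marker.deadline      = deadline;
  marker.prerequisites = group;      // activated only after every job in the group

  return scheduler.schedule ({marker}).front();
}
----
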
job scheduling modes
~~~~~~~~~~~~~~~~~~~~
The scheduler offers various modes of treatment on a per-job basis. The default is to
handle jobs time-based, with a moderate latency. Alternatively, jobs can be handled as
_background_ jobs, as _freewheeling_ jobs (maximum usage of performance and bandwidth),
or as _low-latency timed_ jobs.

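As a rough illustration (again with hypothetical names, not the actual interface), these
modes could surface as a simple per-job attribute:

[source,C++]
----
// Hypothetical per-job scheduling mode, mirroring the modes described above.
enum class SchedulingMode
  {
    TIMED,          // default: time based, moderate latency
    BACKGROUND,     // no pressing deadline, runs when resources are available
    FREEWHEELING,   // maximum usage of performance and bandwidth
    LOW_LATENCY     // timed, with minimal scheduling latency
  };

// a JobSpec (see sketch above) might then simply carry such a mode flag:
//   spec.mode = SchedulingMode::BACKGROUND;
----
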
latency, reliability and precision
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By involving a scheduler component we employ an asynchronous calculation model.
This both allows and requires us to define special guarantees regarding various
properties of job execution.

- it is acknowledged that every scheduling involves some *latency* -- which needs to be
  included in any calculation of deadlines. The latency limits the minimum time window
  we can target for scheduling an operation (a small sketch follows below this list)
- it is acknowledged that timing specifications include some degree of fuzziness -- but
  it is possible to give guarantees regarding correctness. The defined state transitions
  and notifications will happen *reliably*
- it is acknowledged that the behaviour of the scheduler is non-deterministic (in the
  sense this term is used in computer science). Yet we will still impose some ordering
  guarantees, which will be observed with *precision*: both the _adding_ and the
  _superseding_ of a group of jobs happen in a transactional way, to retain the ordering
  according to dependency and job time.

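To illustrate how the latency figure enters into deadline calculations, here is a sketch
with made-up names; the actual accounting will depend on the scheduler implementation.

[source,C++]
----
// Illustration: the scheduling latency limits the latest point at which
// a job may still be handed to the scheduler and meet its deadline.
#include <cstdint>

using TimePoint = std::int64_t;   // microseconds, as in the sketches above
using Duration  = std::int64_t;

TimePoint
latestSchedulingPoint (TimePoint deadline,
                       Duration expectedJobRuntime,
                       Duration schedulingLatency)
{
  return deadline - expectedJobRuntime - schedulingLatency;
}
----
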
Tasks
~~~~~
// List what needs to be done to implement this Proposal:
// * first step [yellow-background]#WIP#
* define the job interface ([green]#✔ done#)

* define a protocol for job state handling [red yellow-background]#TBD#
* define the representation of dependencies and the notifications in practice [red yellow-background]#TBD#
* verify the proposed requirements by a scheduler implementation sketch [red yellow-background]#TBD#


Discussion
~~~~~~~~~~

Pros
^^^^
// add a fact list/enumeration which make this suitable:
* the entity ``job with defined properties'' serves as an interface
* open and complex patterns of behaviour can be built on top
* a proper scheduler replaces several other mechanisms (threaded output backend,
  producer-consumer queue with locking, GUI animation services)
* providing an _atomic execution service_ allows us to control various aspects of execution explicitly
* in the end, this enables us to scale and to use various kinds of hardware


Cons
^^^^
// fact list of the known/considered bad implications:
* there is no ``for-loop'' to base any playback control structures on
* compliance with externally imposed deadlines and memory management are challenging

Alternatives
^^^^^^^^^^^^
//alternatives: explain alternatives and tell why they are not viable:

. use a synchronous player with buffering
. use a simplistic scheduler with entirely atomic jobs

We do not want (1), since it is tied to an obsolete hardware model and lacks the ability to be
adapted to the new kinds of hardware available today or to be expected in the near future. We do
not want (2), since it essentially doesn't solve any problem, but rather pushes complexity into
the higher layers (Session, Stage), which lack the information about individual jobs and timing.


Rationale
---------
//rationale: Give a concise summary why it should be done *this* way:
We use a scheduler to gain flexibility in controlling various aspects of computation and I/O usage.
Moreover, we turn the scheduler into an interface between the Vault and the Steam-Layer; while the
exact outfitting of the individual jobs highly depends on internals of the Session and Engine models,
the properties of _actual job execution_, closely related to system programming, are akin
to the Vault. The actual requirements outlined in this RfC are derived from the internals
of the player implementation, while _the way_ these requirements are defined, and especially
the aspects _omitted_ from the specification, are derived from knowledge regarding the possible
scheduler and vault layer implementation.


//Conclusion
//----------
//conclusion: When approbate (this proposal becomes a Final)
// write some conclusions about its process:


Comments
--------
//comments: append below

.State -> Draft
This RfC emerged from the work on the _player implementation_, which is the immediate
client built on top of the scheduler service. At FrOSCon 2013 and the following developer meeting,
we had an extended discussion covering various aspects of the possible scheduler implementation.
The goal is to settle on an interface definition, so the player and engine implementation
can be developed independently of the scheduler implementation.

Ichthyostega:: 'Do 19 Sep 2013 21:31:07 CEST' ~<prg@ichthyostega.de>~


//endof_comments:

''''
Back to link:/documentation/devel/rfc.html[Lumiera Design Process overview]