2012-12-12 Lumiera Developers Meeting
|
|
=====================================
|
|
:Author: Ichthyo
|
|
:Date: 2012-12-21
|
|
|
|
Dec 20, 2012 on #lumiera 20:00 - 23:23 UTC +
|
|
|
|
__Participants__
|
|
|
|
* cehteh
|
|
* ichthyo
|
|
* Benny
|
|
* Hendrik
|
|
|
|
_Summary written by ichthyo_
|
|
|
|
|
|
|
|
Doxygen woes
|
|
------------
|
|
_Hendrik_ pointed out an example where the handling and presentation
|
|
of extracted documentation was confusing. It turned out that Doxygen didn't recognise
|
|
some documentation comments and thus retained those within the pretty printed source.
|
|
Basically, this was known and documented behaviour, but confusing nonetheless.
|
|
|
|
_ichthyo_ slightly tweaked the configuration. Moreover, currently he creates and uploads
|
|
the API-doc manually and irregularly, so the content on the website is quite outdated at
|
|
times. Automatic publishing was previously done by builddrone; _cehteh_ promised to finish
|
|
and install an improved version...
|
|
|
|
We all agree that we somehow dislike Doxygen, but aren't aware of reasonable alternatives.
|
|
|
|
Conclusion
|
|
~~~~~~~~~~
|
|
|
|
* _ichthyo_ will fix the comments not recognised by Doxygen
|
|
* we reconfirm that we do _not_ want to create all our documentation based on Doxygen
|
|
|
|
|
|
|
|
FrOSCon aftermath
|
|
-----------------
|
|
The visits, hiking together, and the meeting at FrOSCon were refreshing and reassuring.
|
|
In the end, all went well. Everyone survived the after-FrOSCon party and Benny's car is
|
|
fixed and working again.
|
|
|
|
_Benny_ proposes to create a page with some pictures, just to retain some traces of this
|
|
event. _Ichthyo_ is a bit reluctant, since he didn't pay particular attention to documenting
the event this time, but he promises to check which usable images he has.
|
|
|
|
Conclusion
|
|
~~~~~~~~~~
|
|
|
|
* create a page with some images
|
|
* conclusion about FrOSCon? ``it was fun'' ...
|
|
|
|
|
|
|
|
Scheduler: Interface and requirements
|
|
-------------------------------------
|
|
_Benny_ showed interest in working on this topic. The first step would be to build or use
|
|
a textbook priority queue implementation as a starting point. Some time ago, _cehteh_ included
|
|
a suitable implementation in his link:http://git.pipapo.org/?p=cehsrc;a=summary[cehlib], a
|
|
collection of basic C library routines, mostly extracted from Lumiera's library. _ichthyo_
|
|
integrates this priority queue into the Lumiera tree.
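
To illustrate what such a textbook starting point might look like, here is a minimal sketch of a binary min-heap ordered by start time. It is _not_ the cehlib implementation (which is plain C); the names `PlannedJob` and `JobHeap` and the time unit are assumptions made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical job record: only the start time matters for the ordering.
struct PlannedJob {
    std::int64_t startTime;   // scheduled time (assumed unit: microseconds)
    int id;                   // illustrative identifier
};

// Textbook binary min-heap: the job with the earliest startTime sits at index 0.
class JobHeap {
    std::vector<PlannedJob> heap_;

    static std::size_t parent(std::size_t i) { return (i - 1) / 2; }

    // Move the element at index i up until its parent is not later than it.
    void siftUp(std::size_t i) {
        while (i > 0 && heap_[i].startTime < heap_[parent(i)].startTime) {
            std::swap(heap_[i], heap_[parent(i)]);
            i = parent(i);
        }
    }

    // Move the element at index i down until both children are not earlier.
    void siftDown(std::size_t i) {
        for (;;) {
            std::size_t smallest = i;
            std::size_t left = 2 * i + 1;
            std::size_t right = 2 * i + 2;
            if (left < heap_.size() && heap_[left].startTime < heap_[smallest].startTime)
                smallest = left;
            if (right < heap_.size() && heap_[right].startTime < heap_[smallest].startTime)
                smallest = right;
            if (smallest == i) return;
            std::swap(heap_[i], heap_[smallest]);
            i = smallest;
        }
    }

public:
    bool empty() const { return heap_.empty(); }
    std::size_t size() const { return heap_.size(); }

    void push(PlannedJob job) {
        heap_.push_back(job);
        siftUp(heap_.size() - 1);
    }

    // Removes and returns the earliest job. Precondition: !empty()
    PlannedJob pop() {
        PlannedJob top = heap_.front();
        heap_.front() = heap_.back();
        heap_.pop_back();
        if (!heap_.empty()) siftDown(0);
        return top;
    }
};
```

Both `push` and `pop` take logarithmic time, which is the usual reason to prefer a heap over a sorted list for a time-ordered scheduler queue.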
|
|
|
|
The rest of the meeting was an extended discussion touching on and affirming the most relevant
issues and considerations for the scheduler implementation to come.
|
|
|
|
- the core scheduler has to be kept rather simple
|
|
- the actual job function is wrapped into an extended function, which is tightly integrated
|
|
with the scheduler's implementation. This approach allows us to implement more elaborate
|
|
strategies, without increasing the complexity of the actual scheduler.
|
|
- handling of dependencies between jobs is considered one of the tricky requirements
|
|
- the intention is to pre-process and transform prerequisites into lists of dependent jobs
|
|
- support for prerequisites prompts us to provide some mechanism for ``conditionals''
|
|
- notably, the data required for processing will become available asynchronously.
|
|
- thus, the scheduler must include some form of _polling_ to detect when prerequisites
|
|
are finally met and unblock dependent jobs as a consequence.
|
|
- our scheduling works strictly ordered by time. There is no throttling. But we provide
|
|
_multiple_ scheduler queues, which use _different_ ``thread classes''
|
|
- a given job can be in multiple queues at the same time; first invocation wins.
|
|
- we intend to employ _work stealing_
|
|
- some special kinds of scheduling are not time bound (e.g. background rendering,
|
|
``freewheeling'' rendering). But we use time bound delivery as the fundamental
|
|
model and treat these as corner cases
|
|
- ``the scheduler'' as a service and sub-system encompasses more than just the
|
|
low-level implementation of a priority queue. We need an integrated _manager_
|
|
or _controller_ to provide the more high-level services required by the
|
|
``player'' subsystem in Proc-Layer.
|
|
- we need a mechanism to obsolete or supersede jobs, which are already planned,
|
|
but not yet triggered. The reason lies in the interactive nature of the Player.
|
|
- the implementation needs to be worked out; this is an internal detail of the
|
|
scheduler (seen as a subsystem), but it is likely not going to be implemented in the
|
|
low-level scheduler queue. One promising implementation approach is to
|
|
use special ``group leader'' marker jobs.
|
|
- when jobs are superseded, the switch from the old to the new version should
|
|
happen in a clean way; there are several options for achieving that in practice
|
|
- jobs will not only be constrained by their deadline; rather, we'll allow defining
|
|
a _time window_, during which a job must be triggered for regular execution.
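
The time window mentioned in the last point could be modelled roughly as follows. This is only a sketch under assumptions: the names `TimeWindow`, `Timing` and `classify`, and the use of integer microseconds, are invented here and were not decided in the meeting.

```cpp
#include <cassert>
#include <cstdint>

using Time = std::int64_t;   // assumed: microseconds on some monotonic clock

// Possible outcomes when a job is picked up at a given point in time.
enum class Timing { TooEarly, InTime, Expired };

// Hypothetical description of the window in which a job may be triggered.
struct TimeWindow {
    Time earliest;   // the job must not be triggered before this point
    Time deadline;   // triggering after this point no longer counts as regular
};

// Classify the current time relative to the window (both bounds inclusive).
Timing classify(const TimeWindow& window, Time now) {
    if (now < window.earliest) return Timing::TooEarly;
    if (now > window.deadline) return Timing::Expired;
    return Timing::InTime;
}
```

The result of such a check could then be handed to the job function as its state parameter, so that the decision what to do with an expired job stays outside the low-level queue.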
|
|
|
|
_see below for a slightly shortened transcript of these discussions_
|
|
|
|
Conclusion
|
|
~~~~~~~~~~
|
|
* The scheduler service has a high-level interface
|
|
* there are multiple really simple, low-level scheduler queues
|
|
* some kind of manager or controller connects both levels
|
|
* superseding of planned jobs can be implemented through ``group leader'' jobs
|
|
* the link:{rfc}/SchedulerRequirements.html[RfC] should be augmented accordingly
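
The layering summarised in this conclusion can be sketched as follows: a high-level entry point takes a job together with its rough class, and a manager routes it into one of several simple, time-ordered queues. All names (`SchedulerManager`, `JobKind`, `QueuedJob`, `runNext`) are hypothetical, `std::priority_queue` merely stands in for the real low-level queue, and thread classes are left out entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Rough job class as stated by the client (hypothetical name).
enum class JobKind { TimeBound, Background };

struct QueuedJob {
    std::int64_t time;            // scheduled time (assumed unit: microseconds)
    std::function<void()> run;    // the wrapped job function
};

// Comparator that puts the earliest time on top of std::priority_queue.
struct EarliestOnTop {
    bool operator()(const QueuedJob& a, const QueuedJob& b) const {
        return a.time > b.time;
    }
};

using LowLevelQueue =
    std::priority_queue<QueuedJob, std::vector<QueuedJob>, EarliestOnTop>;

// The "manager": the only place that knows which low-level queue to use.
class SchedulerManager {
    LowLevelQueue timeBound_;
    LowLevelQueue background_;

    // Runs the top job of q if it is due; returns false otherwise.
    static bool runDue(LowLevelQueue& q, std::int64_t now) {
        if (q.empty() || q.top().time > now) return false;
        QueuedJob job = q.top();   // copy, because top() only gives const access
        q.pop();
        job.run();
        return true;
    }

public:
    // High-level interface: the client states only the rough class of the job.
    void add(JobKind kind, std::int64_t time, std::function<void()> fn) {
        LowLevelQueue& q = (kind == JobKind::TimeBound) ? timeBound_ : background_;
        q.push(QueuedJob{time, std::move(fn)});
    }

    // Background jobs only get a turn when no time-bound job is due.
    bool runNext(std::int64_t now) {
        return runDue(timeBound_, now) || runDue(background_, now);
    }
};
```

Keeping the routing decision in the manager means the queues themselves stay trivially simple, which matches the first two points of the conclusion.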
|
|
|
|
|
|
|
|
Next meeting
|
|
------------
|
|
|
|
The next meeting will be on Thursday, October 10, at 20:00 UTC
|
|
|
|
|
|
''''
|
|
|
|
++++
|
|
<br/>
|
|
<br/>
|
|
<br/>
|
|
<br/>
|
|
++++
|
|
|
|
|
|
[[irctranscript]]
|
|
IRC Transcript
|
|
--------------
|
|
- xref:dependencies[dependent jobs and conditionals]
|
|
- xref:schedulingmodes[various modes of scheduling]
|
|
- xref:architecture[questions of architecture]
|
|
- xref:superseding[aborting/superseding of jobs]
|
|
- xref:cleanswitch[clean switch when superseding planned jobs]
|
|
|
|
|
|
.-- Discussion of details --
|
|
[caption="☉Transcript☉ "]
|
|
----------------------------
|
|
[2013-09-12 22:55:00] <ichthyo> bennI_ you said you might look into that topic, as time permits, of course
|
|
[2013-09-12 22:49:29] <cehteh> .. scheduler .. shall I explain what I have in mind for the backend lowest level?
|
|
[2013-09-12 22:55:12] <cehteh> a low level job should be very very simple, that is: a single pointer to a
|
|
'job function' and a list of prerequisite jobs (and maybe little more)
|
|
[2013-09-12 22:55:57] <cehteh> all things like rescheduling, aborting, etc. are on the lowest level handled over
|
|
this single job function which gets a parameter about the state/condition on which
|
|
its run (in time, aborting, expired, ....)
|
|
[2013-09-12 22:57:22] <cehteh> anything more, especially dispatching on different functions to handle the actual state
|
|
should be implemented on a level above (and maybe already in C++ by Proc) then
|
|
[2013-09-12 22:57:41] <bennI_> but the job function is just a call back function, defined elsewhere
|
|
[2013-09-12 22:57:48] <cehteh> yes
|
|
[2013-09-12 22:58:16] <bennI_> so it's the jobs that are being scheduled
|
|
[2013-09-12 22:58:26] <cehteh> yes
|
|
[2013-09-12 22:58:58] <ichthyo> basically yes, but as you said, this low-level job function also has to handle
|
|
the state and maybe dispatch to the right high-level function. A different function
|
|
for working, than for aborting, for example
|
|
[2013-09-12 22:59:45] <cehteh> yes, but want to leave that out of the scheduler itself, that's handled on a higher level
|
|
[2013-09-12 22:59:45] <ichthyo> so that is kind of a thin layer on top of the basic scheduler
|
|
----------------------------
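
The low-level job described in this part of the discussion could look roughly like the following sketch: a single job function which receives the state as a parameter, plus a list of prerequisites, with the dispatch on that state done in a thin layer above. The names `Job`, `JobState`, `RenderTask` and `renderJobFunction`, as well as the opaque `context` pointer, are assumptions for illustration only.

```cpp
#include <cassert>
#include <vector>

// Condition under which the scheduler invokes the job function.
enum class JobState { InTime, Aborting, Expired };

struct Job;
using JobFunction = void (*)(Job& self, JobState state);

// Deliberately tiny low-level job: one function pointer and its prerequisites.
struct Job {
    JobFunction fn;                    // the single entry point
    std::vector<Job*> prerequisites;   // jobs that have to complete first
    void* context;                     // assumed: opaque data owned by the layer above
};

// Hypothetical higher-level object; the counters only make the effect visible.
struct RenderTask {
    int rendered = 0;
    int aborted = 0;
    int expired = 0;
};

// Thin layer on top of the basic scheduler: dispatch on the state.
void renderJobFunction(Job& self, JobState state) {
    RenderTask& task = *static_cast<RenderTask*>(self.context);
    switch (state) {
        case JobState::InTime:   ++task.rendered; break;   // do the actual work
        case JobState::Aborting: ++task.aborted;  break;   // clean up instead
        case JobState::Expired:  ++task.expired;  break;   // too late, skip
    }
}
```

The scheduler itself would only ever call `job.fn(job, state)`; everything about what "aborting" or "expired" means stays in the layer that supplied the function.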
|
|
|
|
|
|
|
|
|
|
[[dependencies]]
|
|
.-- dependent jobs and conditionals --
|
|
[caption="☉Transcript☉ "]
|
|
----------------------------
|
|
[2013-09-12 22:59:46] <bennI_> what about the dependent jobs?
|
|
[2013-09-12 23:00:03] <ichthyo> thats the most important question I think, since *that* is something special
|
|
[2013-09-12 23:00:19] <cehteh> yes dependencies need to be handled by the scheduler
|
|
[2013-09-12 23:00:52] <ichthyo> well... at least the scheduler needs to poll them in some way
|
|
[2013-09-12 23:01:01] <ichthyo> poll or notify or re-try or the like
|
|
[2013-09-12 23:01:03] <cehteh> one question: shall jobs be in the priority queue even if their dependencies
|
|
are not yet satisfied? ... I'd tend to say yes
|
|
|
|
[2013-09-12 23:01:11] <bennI_> so we're not going to have a scheduler with simple jobs
|
|
[2013-09-12 23:01:31] <cehteh> the scheduler must maintain lists of dependencies. Nodes for these lists are likely
|
|
to be allocated by the small object allocator I've written some time ago;
|
|
because 2 different jobs can depend on a single other job and other more complex
|
|
cross dependencies
|
|
...
|
|
|
|
[2013-09-12 23:02:54] <cehteh> dependencies are the results of jobs
|
|
[2013-09-12 23:03:04] <ichthyo> so you propose to pre-process those prerequisites and rather store them
|
|
as dependencies internally?
|
|
[2013-09-12 23:03:11] <cehteh> a 'job' might be a no-op if the resource is available
|
|
[2013-09-12 23:03:04] <bennI_> but these prerequisites are IN the scheduler
|
|
[2013-09-12 23:03:10] <bennI_> not in the higher level?
|
|
[2013-09-12 23:03:41] <ichthyo> bennI_ the scheduler needs only to be aware that there is some kind of dependency
|
|
[2013-09-12 23:03:43] <cehteh> yes the scheduler needs to be aware of dependencies so anything needs to be abstracted somehow,
|
|
that's why I'd like to say anything is a 'job', even if that's technically not completely true
|
|
because something might be a 'singleton instance' and no job needs to be run to create it
|
|
[2013-09-12 23:04:21] <ichthyo> on that level indeed, yes
|
|
[2013-09-12 23:04:45] <ichthyo> any more fancy functionality is encapsulated within that simple job abstraction
|
|
[2013-09-12 23:06:18] <cehteh> as long as some resource (which can become a prerequisite/dependency of any other job) exists
|
|
it has some ultra-lightweight job structure associated with it
|
|
|
|
[2013-09-12 23:06:45] <ichthyo> now, for example, lets consider the loading of data from a file.
|
|
How does this work in practice? do we get a callback when the data arrives?
|
|
and where do we get that callback? guess in another thread.
|
|
and then, how do we instruct the scheduler so that the jobs dependent on the
|
|
arrival of that data can now become active?
|
|
[2013-09-12 23:08:07] <cehteh> note that we do memory mapping, we never really load data as in calling read()
|
|
but we might create prefetch jobs which complete when data is in memory.
|
|
The actual loading is done by the kernel
|
|
[2013-09-12 23:08:24] <ichthyo> yes, my understanding too
|
|
[2013-09-12 23:08:38] <ichthyo> but it is asynchronous, right?
|
|
[2013-09-12 23:08:42] <cehteh> yes
|
|
[2013-09-12 23:08:58] <bennI_> from the scheduler's perspective, it's just a callback, so it is defined elsewhere
|
|
i.e., data loading and details are implemented elsewhere in the callback itself
|
|
|
|
[2013-09-12 23:09:43] <ichthyo> But... there is a problem: not the scheduler invokes that callback,
|
|
someone else (triggered by the kernel) invokes this callback, and
|
|
this must result in the unblocking of the dependent jobs, right?
|
|
[2013-09-12 23:10:35] <bennI_> the scheduler just has the callback waiting, then when all conditions are met
|
|
it just gets scheduled
|
|
[2013-09-12 23:10:21] <cehteh> nope
|
|
[2013-09-12 23:10:54] <cehteh> note: we need a 'polling' state for a job
|
|
[2013-09-12 23:10:57] <ichthyo> so the scheduler polls a variable, signalling that the data is there (or isn't yet)
|
|
[2013-09-12 23:11:14] <cehteh> yes
|
|
[2013-09-12 23:11:22] <cehteh> but even if not we can ignore the fact; then our job might block,
|
|
in practice that should happen *very* rarely
|
|
[2013-09-12 23:11:37] <ichthyo> we could abstract that as a very simple conditional
|
|
the scheduler is polling some conditional
|
|
[2013-09-12 23:11:47] <bennI_> unless the presence of the data for the job is a precondition
|
|
[2013-09-12 23:12:28] <ichthyo> bennI_: yes, the presence of the data *is* a precondition for a render job
|
|
thus the scheduler must not start the render job, unless the data is already there
|
|
[2013-09-12 23:13:11] <bennI_> so the job cannot be 'scheduled' until data is present -- this is one of the preconditions
|
|
[2013-09-12 23:12:47] <cehteh> loading data: have a job calling posix_madvise(..WILLNEED) soon enough
|
|
before calling the job which needs the data
|
|
if we can not afford blocking, then we can poll the data with mincore()
|
|
and if the data is not there abort the job or take some other action.
|
|
I prolly add some (rather small budget) memory locking to the backend too
|
|
[2013-09-12 23:14:13] <ichthyo> OK, but then this would require some kind of re-trying or polling of jobs.
|
|
do we want this? I mean a job can start processing, once the data is there
|
|
and we are in the pre defined time window
|
|
[2013-09-12 23:15:08] <cehteh> then if *really* needed you can make a job which locks the data in ram
|
|
(that is really loading it and only completes when its loaded)
|
|
this way you can avoid polling too. But that's rather something we should
|
|
do only for really important things
|
|
[2013-09-12 23:15:39] <ichthyo> urghs, that would block the thread, right? so polling sounds more sane
|
|
[2013-09-12 23:16:22] <cehteh> blocking the thread is no issue as we have a thread pool and this thread pool
|
|
should later be aware that some threads might be blocking.
|
|
(I planned to add some thread class for that)
|
|
[2013-09-12 23:16:36] <bennI_> what about having 2 jobs: one is the load, the other is a precondition,
|
|
i.e., the presence of the data
|
|
[2013-09-12 23:16:52] <cehteh> bennI_: yes exactly like that
|
|
[2013-09-12 23:16:53] <ichthyo> bennI_ yes that was what I was thinking too
|
|
[2013-09-12 23:17:05] <ichthyo> one job to prepare / trigger the loading
|
|
one job to verify the data is there (this is a conditional)
|
|
[2013-09-12 23:17:19] <cehteh> but either one can block .. we are free to decide which one blocks
|
|
[2013-09-12 23:17:28] <bennI_> so we have two stupid jobs, where one can only happen if the other happens
|
|
[2013-09-12 23:17:29] <ichthyo> and then the actual calculation job
|
|
[2013-09-12 23:18:22] <cehteh> well the scheduler on the lowest level should be unaware of all that
|
|
.. just dead dumb scheduling. All logic is rolled on higher levels
|
|
that allows us for more smart things depending on the actual use case,
|
|
different strategies for different things
|
|
[2013-09-12 23:17:50] <bennI_> we could even have only ONE job: it has a linked lists of jobs
|
|
but why not -- instead of ONE job, it can have a linked list of small jobs
|
|
once it's scheduled, maybe only one job gets run, i.e. data gets loaded.
|
|
the next schedule of the same job, loads the actual file, then the dependency
|
|
[2013-09-12 23:20:36] <cehteh> bennI_: first is a priqueue which schedules jobs by time
|
|
maybe aside of that there is a list of jobs ready to run
|
|
and a list of jobs not ready to run b/c their dependencies are not satisfied
|
|
[2013-09-12 23:22:21] <cehteh> (mhm ready to run? maybe not, just run it!)
|
|
[2013-09-12 23:22:49] <bennI_> ok, so the jobs themselves are going to be a bit more complex
|
|
I think we'll need a bit of trial and error; we cannot possibly envisage
|
|
everything right now. Maybe some of the jobs will not be a simple linked list,
|
|
but a tree
|
|
[2013-09-12 23:22:36] <cehteh> OK plan:
|
|
scheduler first picks job from the priqueue and checks if all dependencies are met
|
|
either way it calls the job function, either with "in time" or with "missing dependencies"
|
|
[2013-09-12 23:23:44] <bennI_> dumping some branches altogether?
|
|
[2013-09-12 23:24:30] <cehteh> then if state is "missing dependencies" the job function *might* insert the job
|
|
into the list of blocked jobs (or cancel the job or call anything else)
|
|
whenever a job completes, it might release one or more jobs from the 'blocked' list,
|
|
maybe even move them to a delayed list, and then the scheduler (after handling the priqueue)
|
|
picks any jobs from this delayed list up and runs them
|
|
[2013-09-12 23:25:02] <ichthyo> OK
|
|
[2013-09-12 23:25:39] <bennI_> do ALL jobs get equally scheduled
|
|
or do dependent jobs only enter in a linked list of jobs for one scheduled job,
|
|
if you know what I mean...
|
|
[2013-09-12 23:27:16] <cehteh> bennI_: jobs are scheduled by the priqueue which is ordered by time .. plus maybe some small
|
|
integer so you can give a order to jobs ought to be run at the same time
|
|
we have no job priorities otherwise, but we will have (at least) 2 schedulers
|
|
one is for the hard jobs which must be run in time, and one is for background task
|
|
and one (but these rather needs special attention) will be some realtime scheduler
|
|
which does the hard timing stuff, like switching video frames or such, but that's
|
|
not part of the normal processing
|
|
[2013-09-12 23:28:03] <bennI_> yes, but some jobs are not jobs in themselves, only depend on other jobs
|
|
[2013-09-12 23:28:16] <bennI_> do these su, dependent jobs get priqued?
|
|
[2013-09-12 23:29:23] <ichthyo> bennI_: I think this is a question of design. Both approaches will work, but it depends
|
|
on the implementer of the scheduler to prefer one approach over the other
|
|
[2013-09-12 23:30:12] <cehteh> bennI_: my idea was that each job might be in both queues (in-time and background)
|
|
so for example when the system is reasonably idle a job might be first scheduled
|
|
by the 'background' scheduler because its normal scheduled time is way ahead
|
|
----------------------------
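
The plan outlined by _cehteh_ in the transcript above (pick from the priqueue, check dependencies, call the job function with either "in time" or "missing dependencies", park blocked jobs and release them when a prerequisite completes) might be sketched like this. It is a single-threaded illustration under assumptions: `DepJob`, `Condition` and `MiniScheduler` are made-up names, `invoke` stands in for the real job function, and parking a job on the blocked list is just one of the policies mentioned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <queue>
#include <utility>
#include <vector>

enum class Condition { InTime, MissingDependencies };

struct DepJob {
    std::int64_t time;                   // scheduled time
    std::vector<DepJob*> dependencies;   // jobs whose results this job needs
    bool done;
    explicit DepJob(std::int64_t t, std::vector<DepJob*> deps = {})
        : time(t), dependencies(std::move(deps)), done(false) {}
};

struct EarlierFirst {
    bool operator()(const DepJob* a, const DepJob* b) const { return a->time > b->time; }
};

class MiniScheduler {
    std::priority_queue<DepJob*, std::vector<DepJob*>, EarlierFirst> priqueue_;
    std::vector<DepJob*> blocked_;   // waiting for dependencies
    std::deque<DepJob*> delayed_;    // released, run after the priqueue was handled

    static bool ready(const DepJob& job) {
        return std::all_of(job.dependencies.begin(), job.dependencies.end(),
                           [](const DepJob* dep) { return dep->done; });
    }

    // Move every blocked job whose dependencies are now met to the delayed list.
    void release() {
        auto firstReady = std::stable_partition(
            blocked_.begin(), blocked_.end(),
            [](DepJob* job) { return !ready(*job); });
        delayed_.insert(delayed_.end(), firstReady, blocked_.end());
        blocked_.erase(firstReady, blocked_.end());
    }

    // Stand-in for the job function: it decides what a missing dependency means.
    void invoke(DepJob& job, Condition condition) {
        if (condition == Condition::MissingDependencies) {
            blocked_.push_back(&job);   // policy chosen here: wait for the data
            return;
        }
        job.done = true;                // the "work"
        release();
    }

public:
    void add(DepJob& job) { priqueue_.push(&job); }
    std::size_t blockedCount() const { return blocked_.size(); }

    // One scheduler pass: first the due jobs from the priqueue, then the delayed list.
    void runDue(std::int64_t now) {
        while (!priqueue_.empty() && priqueue_.top()->time <= now) {
            DepJob* job = priqueue_.top();
            priqueue_.pop();
            invoke(*job, ready(*job) ? Condition::InTime
                                     : Condition::MissingDependencies);
        }
        while (!delayed_.empty()) {
            DepJob* job = delayed_.front();
            delayed_.pop_front();
            invoke(*job, Condition::InTime);
        }
    }
};
```

Note that the low-level loop never interprets the dependency itself; it only reports the condition, which keeps the "dead dumb scheduling" property asked for in the discussion.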
|
|
|
|
|
|
|
|
|
|
[[schedulingmodes]]
|
|
.-- various modes of scheduling --
|
|
[caption="☉Transcript☉ "]
|
|
----------------------------
|
|
[2013-09-12 23:30:23] <ichthyo> cehteh: we certainly get a third (or better a fourth) mode of operation:
|
|
the "freewheeling" calculation. This is what we use on final rendering.
|
|
Do any calculation as soon as possible, but without any timing constraints
|
|
[2013-09-12 23:31:31] <cehteh> ichthyo: the freewheeling is rather simple: just put jobs with time==0
|
|
into the background scheduler
|
|
or?
|
|
[2013-09-12 23:32:10] <ichthyo> OK, so it can be implemented as a special case of background scheduling,
|
|
where we just use all available resources to 100%
|
|
[2013-09-12 23:32:36] <cehteh> or not time=0 but time==now
|
|
[2013-09-12 23:32:59] <cehteh> because other jobs should eventually be handled too
|
|
anyways I think that's not becoming complicated
|
|
[2013-09-12 23:33:42] * ichthyo nods
|
|
[2013-09-12 23:34:24] <ichthyo> only the meaning is a bit different, not so much the implementation
|
|
background == throttle resource usage
|
|
freewheeling == get me all the power that is available
|
|
[2013-09-12 23:36:52] <cehteh> background wont throttle, it just doesn't get scheduled when there are
|
|
more important things to do
|
|
[2013-09-12 23:37:19] <ichthyo> mhm, might be, not entirely sure
|
|
[2013-09-12 23:37:35] <cehteh> there is no point in putting something into the background queue
|
|
if we never need the stuff. But it should have some safety margin there
|
|
and maybe a different thread class (OS priority for the thread)
|
|
[2013-09-12 23:38:11] <ichthyo> for example: user works with the application and plays back,
|
|
but at the same time, excess resources are used for pre-rendering
|
|
[2013-09-12 23:38:19] <cehteh> yes
|
|
[2013-09-12 23:38:21] <ichthyo> but without affecting the responsiveness.
|
|
thus we don't want to use 100% of the IO bandwidth for background
|
|
[2013-09-12 23:38:53] <cehteh> then schedule less or more sparse background jobs
|
|
[2013-09-12 23:39:35] <cehteh> but we can't really throttle IO in a more direct way, since that's the obligation
|
|
of the kernel and we can only hint it
|
|
[2013-09-12 23:39:54] <ichthyo> ok, but someone has to do that.
|
|
Proc certainly can't do that, it can only give you a whole bunch of jobs
|
|
for background rendering
|
|
[2013-09-12 23:40:07] <cehteh> the job function might be aware if its scheduled because of in-time or
|
|
background queue and adjust itself
|
|
[2013-09-12 23:40:57] <cehteh> we only schedule background jobs if nothing else is to do
|
|
[2013-09-12 23:41:18] <ichthyo> what means "nothing else"?
|
|
for example, if we're waiting for data to arrive, can we meanwhile schedule
|
|
background calculation (non-IO) jobs?
|
|
[2013-09-12 23:41:45] <cehteh> and if I/O becomes the bottleneck some logic to throttle background jobs
|
|
might be implemented on the higher level job functions...
|
|
[2013-09-12 23:42:12] <cehteh> I abstracted "thread classes" in the worker pool remember
|
|
[2013-09-12 23:42:23] * ichthyo nods
|
|
[2013-09-12 23:42:33] <cehteh> these are not just priorities but define rather the purpose of a thread
|
|
we can have "background IO" thread classes there .. and if a job gets scheduled
|
|
from the background queue it might query the IO load and rather abort/delay the job
|
|
|
|
[2013-09-12 23:43:51] <ichthyo> another thing worth mentioning
|
|
[2013-09-12 23:43:55] <ichthyo> the planning of new jobs happens within jobs
|
|
there are some special planning jobs from time to time
|
|
but that certainly remains opaque for the scheduler
|
|
|
|
[2013-09-12 23:44:15] <cehteh> the scheduler should not be aware/responsible for any other resource management
|
|
[2013-09-12 23:44:38] <bennI_> you can feed a nice to the scheduler, but the scheduler only uses the nice value
|
|
to reshuffle. Someone else must decide when to issue a nice
|
|
[2013-09-12 23:45:47] <cehteh> bennI_: there is no 'nice' value our schedulers are time based
|
|
niceness is defined by the 'thread classes' which can define even more
|
|
(io priority, aborting policies,...) -- even (hard) realtime and OS scheduling policies
|
|
----------------------------
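
The idea that one job may sit in both the in-time and the background queue, with the first invocation winning, could be realised with a single atomic flag, as in the sketch below. `SharedJob`, `Origin` and `invokeFrom` are hypothetical names; how the real queues would reference the shared job was not settled in the discussion.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// Tells the job function which queue triggered it, so it can adjust itself.
enum class Origin { InTimeQueue, BackgroundQueue };

class SharedJob {
    std::atomic<bool> claimed_{false};
    std::function<void(Origin)> work_;

public:
    explicit SharedJob(std::function<void(Origin)> work) : work_(std::move(work)) {}

    // Called by whichever queue reaches the job first.
    // Returns true if this call actually ran the work, false if it was a no-op.
    bool invokeFrom(Origin origin) {
        if (claimed_.exchange(true)) return false;   // the other queue was faster
        work_(origin);
        return true;
    }
};
```

Because the second invocation degrades to a cheap no-op, the job never needs to be removed from the other queue, which fits the wish to avoid expensive removals from a priority queue.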
|
|
|
|
|
|
|
|
|
|
[[architecture]]
|
|
.-- questions of architecture --
|
|
[caption="☉Transcript☉ "]
|
|
----------------------------
|
|
[2013-09-12 23:45:31] <ichthyo> as I see it: we have several low-level priqueues.
|
|
And we have a high-level scheduler interface. Some facility in between decides
|
|
which low-level queue to use. Proc will only state the rough class of a job,
|
|
i.e. timebound, or background. Thus these details are kept out of the low-level
|
|
(actual) scheduler (implementation), but are configured close to the scheduler,
|
|
when we add jobs
|
|
[2013-09-12 23:48:16] <cehteh> the creator of a job (proc in most cases) tells what purpose a job has
|
|
(numbercrunching, io, user-interface, foo, bar): that's a 'threadclass'
|
|
the actual implementations of threadclasses are defined elsewhere
|
|
|
|
[2013-09-12 23:49:09] <ichthyo> from Proc-Layer's perspective, all these details happen within the
|
|
scheduler as a black box. Proc only states a rough class (or purpose) of the job.
|
|
When the job is added, this gets translated into using a suitable thread class,
|
|
and we'll put the job in the right low-level scheduler queue
|
|
[2013-09-12 23:49:44] <cehteh> actually for each job you can do 2 of such things .. as i mentioned earlier,
|
|
each job can be in both queues, so you can define time+threadclass for background
|
|
and for in-time scheduler ... with a little caveat about priority inversion,
|
|
threadclass should be the same when you insert something in both queues,
|
|
only for freewheeling or other special jobs they might differ
|
|
[2013-09-12 23:51:52] <cehteh> bennI_: btw implementation detail I didnt tell you yet..
|
|
you know "work stealing schedulers" ?
|
|
we absolutely want that :)
|
|
[2013-09-12 23:52:09] <ichthyo> yes ;-)
|
|
its a kind of load balancing -- a very simple and clever one
|
|
[2013-09-12 23:52:11] <cehteh> that is: each OS thread has its own scheduler.
|
|
each job running on a thread which creates new jobs puts these on the scheduler
|
|
of its own thread and only if a thread has nothing else to do, and the system
|
|
is not loaded, then it steals jobs from other threads. That gives far more locality
|
|
and much less contention.
|
|
[2013-09-12 23:56:23] <cehteh> another detail: we need to figure out if we need a pool of threads for each
|
|
threadclass OR if switching a thread to another threadclass is more performant.
|
|
[2013-09-12 23:56:45] <ichthyo> *this* probably needs some experimentation
|
|
[2013-09-12 23:56:49] <cehteh> yes
|
|
|
|
|
|
[2013-09-12 23:57:32] <ichthyo> OK, so please let me recap one thing: how we handle prerequisites
|
|
[2013-09-12 23:58:00] <ichthyo> namely, (1) we translate them into dependent jobs (jobs that follow)
|
|
and (2) we have conditional jobs, which are polled regularly,
|
|
until they signal that a condition is met (or their deadline has passed)
|
|
|
|
[2013-09-12 23:58:59] <cehteh> ichthyo: I think job creation is a (at least) 2 step process .. first you create
|
|
the job structure, fill in any data (jobs it depends on). These 'jobs it depends on'
|
|
might be just created of course .. and then when done you unleash it to the scheduler
|
|
[2013-09-12 23:59:39] <ichthyo> indeed
|
|
[2013-09-12 23:59:52] <ichthyo> on high-level, I hand over jobs as "transactions" or batches
|
|
then, the scheduler-frontend might do some preprocessing and re-grouping
|
|
and finally hands over the right data structures to the low-level interface
|
|
|
|
[2013-09-13 00:00:35] <cehteh> after you've given it to the scheduler, shall the scheduler dispose of it when done
|
|
(I mean really done) or give it back to you
|
|
[2013-09-13 00:00:49] <ichthyo> never give it back
|
|
[2013-09-13 00:00:52] <cehteh> OK
|
|
[2013-09-13 00:00:56] <ichthyo> it is really point-and-shoot
|
|
[2013-09-13 00:00:58] <ichthyo> BUT -- there is a catch
|
|
mind me
|
|
we need to be able to "change the plan"
|
|
[2013-09-13 00:01:18] <cehteh> there must be no catch .. if there is one, then you have to get it back :)
|
|
[2013-09-13 00:01:33] <ichthyo> for example
|
|
[2013-09-13 00:01:37] <cehteh> yes
|
|
[2013-09-13 00:01:46] <cehteh> but that's completely on your side
|
|
[2013-09-13 00:01:52] <ichthyo> no
|
|
[2013-09-13 00:02:00] <ichthyo> lets assume Proc has given the jobs for the next second to the scheduler
|
|
that is 25 frames * 3 jobs per frame * number of channels.
|
|
then, 20 ms later, the User in the GUI hits the pause button
|
|
[2013-09-13 00:02:52] <ichthyo> now we need a way to "call back" *exactly those* jobs,
|
|
no other jobs (other timelines)
|
|
[2013-09-13 00:03:19] <ichthyo> so we need a "scope", and we need to be able to "cancel" or "retarget"
|
|
the jobs already given to the scheduler. But never *individual* jobs,
|
|
always whole groups of jobs
|
|
[2013-09-13 00:03:00] <cehteh> you prolly create a higher level "render-job" class.
|
|
Now, if you want to be able to abort or move it then you have a flag there
|
|
(and/or maybe a backpointer to the low level job)
|
|
[2013-09-13 00:04:05] <ichthyo> no
|
|
[2013-09-13 00:04:10] <cehteh> wait
|
|
[2013-09-13 00:04:20] <ichthyo> I am absolutely sure I don't keep any pointer to the low level job
|
|
since I don't have any place to manage that.
|
|
it is really point and shoot
|
|
[2013-09-13 00:04:39] <cehteh> yes
|
|
[2013-09-13 00:05:15] <cehteh> you don't need to manage .. this is just a tag
|
|
[2013-09-13 00:05:24] <ichthyo> but some kind of flag or tag would work indeed, yes
|
|
[2013-09-13 00:06:02] <cehteh> your job function just handles that
|
|
if (self->job == myself) .... else oops_i_got_dumped()
|
|
of course self->job needs to be protected by some mutex
|
|
[2013-09-13 00:07:14] <ichthyo> I think that is an internal detail of the scheduler (as a subsystem)
|
|
[2013-09-13 00:07:29] <cehteh> now when you reschedule you just create a new (low level)job .. and tag
|
|
the higher level job with that job and unleash it
|
|
[2013-09-13 00:07:34] <ichthyo> the *scheduler* wraps the actual job into a job function, of course
|
|
[2013-09-13 00:08:15] <cehteh> so this self->job is just a tag about the owner, no need to manage
|
|
you only need to check for equality and maybe some special case like NULL for aborts
|
|
no need to release or manage it
|
|
[2013-09-13 00:08:57] <ichthyo> well yes. but that is not Proc
|
|
not Proc or the Player is doing that, but the scheduler (in the wider sense) is doing that
|
|
since in my understanding, only the scheduler has access to the jobs, after they
|
|
have been handed over
|
|
[2013-09-13 00:09:41] <cehteh> well proc creates the job
|
|
[2013-09-13 00:09:47] <cehteh> yes
|
|
[2013-09-13 00:10:12] <cehteh> but you certainly augment the low level job structure with some higher level data
|
|
[2013-09-13 00:10:30] <ichthyo> yes
|
|
[2013-09-13 00:10:49] <ichthyo> and the scheduler itself will certainly also put some metadata into the job descriptor struct
|
|
[2013-09-13 00:10:50] <bennI_> then throw them at the scheduler and say goodbye?
|
|
[2013-09-13 00:10:53] <cehteh> there you just add a mutex and a tag (job pointer)
|
|
[2013-09-13 00:11:16] <ichthyo> I'd like to see that entirely as an implementation detail within the scheduler
|
|
[2013-09-13 00:11:23] <cehteh> the job descriptor must become very small
|
|
[2013-09-13 00:11:28] <ichthyo> since it highly depends on thread management and the like
|
|
[2013-09-13 00:11:50] <cehteh> no need
|
|
[2013-09-13 00:12:15] <ichthyo> essentially, Proc will try to keep out of that discussion
|
|
[2013-09-13 00:12:16] <cehteh> it can become an implementation detail of some middle layer .. above the scheduler
|
|
----------------------------
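
The tagging scheme proposed by _cehteh_ in the transcript above (`if (self->job == myself) ...`) might be sketched as follows. The higher-level descriptor remembers which low-level job currently represents it; the pointer is used purely as a tag for comparison and is never dereferenced or managed. `RenderJob`, `LowLevelJob`, `retarget` and `jobFunction` are illustrative names, and whether this lives in Proc or in a middle layer of the scheduler was left open.

```cpp
#include <cassert>
#include <mutex>

struct LowLevelJob;

// Hypothetical higher-level descriptor carrying the ownership tag.
struct RenderJob {
    std::mutex lock;                        // protects `current`
    const LowLevelJob* current = nullptr;   // tag only; nullptr means "aborted"
    int executed = 0;                       // only here to make the effect visible
};

struct LowLevelJob {
    RenderJob* owner;   // back reference to the higher-level descriptor
};

// Rescheduling or aborting: just re-tag; the old low-level job is left in its queue.
void retarget(RenderJob& render, const LowLevelJob* replacement) {
    std::lock_guard<std::mutex> guard(render.lock);
    render.current = replacement;
}

// Job function: does the work only if this low-level job is still the tagged one.
bool jobFunction(const LowLevelJob& myself) {
    RenderJob& self = *myself.owner;
    std::lock_guard<std::mutex> guard(self.lock);
    if (self.current != &myself) return false;   // superseded or aborted: no-op
    ++self.executed;
    return true;
}
```

A superseded job thus still gets scheduled, but turns into a no-op when it runs. The sketch assumes the higher-level descriptor outlives all low-level jobs that refer to it.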
|
|
|
|
|
|
|
|
[[superseding]]
|
|
.-- how to handle aborting/superseding --
|
|
[caption="☉Transcript☉ "]
|
|
----------------------------
|
|
[2013-09-13 00:12:19] <bennI_> how is the proc going to say 'stop some job'
|
|
[2013-09-13 00:12:28] <ichthyo> yes that's the question. That's what I'm getting at
|
|
[2013-09-13 00:12:49] <ichthyo> Proc will certainly never ask you to stop a specific job
|
|
this is sure.
|
|
[2013-09-13 00:12:54] <bennI_> Proc doesn't have a handle or the like
|
|
[2013-09-13 00:13:09] <ichthyo> BUT -- Proc will ask you to re-target / abort or whatever all jobs within a certain scope
|
|
and this scope is given with the job definition as a tag
|
|
[2013-09-13 00:13:15] <cehteh> only proc knows to stop something but you need some grip on it .. and of course that are
|
|
the proc own datastructures (higher level job descriptors)
|
|
[2013-09-13 00:13:34] <ichthyo> as said -- proc tags each job with e.g. some number XXX
|
|
and then it might come to the scheduler and say:
|
|
please halt all jobs with number XXX
|
|
[2013-09-13 00:14:09] <cehteh> the 'tag' can be the actual job handle .. even if you don't own it any more
|
|
[2013-09-13 00:14:32] <bennI_> 'IT' ?
|
|
the proc?
|
|
[2013-09-13 00:14:44] <cehteh> why number .. why not the low level job pointer?
|
|
that is guaranteed to be a valid unique number for each job
|
|
[2013-09-13 00:15:06] <ichthyo> cehteh: since Proc never retains knowledge regarding individual jobs
|
|
[2013-09-13 00:15:17] <cehteh> uhm -- when you create a job it does
|
|
[2013-09-13 00:15:24] <bennI_> 'cause the Proc layer wants to give the job to the scheduler and say goodbye,
|
|
I know nothing about you anymore
|
|
[2013-09-13 00:15:35] <ichthyo> exactly. Proc finds out what needs to be calculated, creates those job descriptors,
|
|
tags them as a group and then throws it over the wall
|
|
[2013-09-13 00:15:52] <bennI_> but proc can't throw it over the wall
|
|
[2013-09-13 00:16:10] <bennI_> it has a vested interest in the job, i.e. abort!!!
|
|
[2013-09-13 00:16:05] <cehteh> OK but I see this "tags these as groups" as layer above the dead simple scheduler
|
|
I don't really like the idea to implement that on the lowest level, but the job-function
|
|
can add a layer above and handle this
|
|
[2013-09-13 00:17:19] <ichthyo> no, you don't need to implement it on the lowest level, of course
|
|
but basically it is an internal detail of "the scheduler"
|
|
[2013-09-13 00:17:50] <cehteh> nah .. "the manager" :)
|
|
[2013-09-13 00:17:57] <ichthyo> :-D
|
|
[2013-09-13 00:18:01] <cehteh> the scheduler doesn't care
|
|
[2013-09-13 00:18:07] <bennI_> who handles the layer?
|
|
[2013-09-13 00:18:18] <ichthyo> anyway, then "the manager" is also part of "the scheduler" damn it ;-)
|
|
[2013-09-13 00:18:26] <bennI_> is it the WALL where the proc throws over the job
|
|
[2013-09-13 00:18:31] <cehteh> it just schedules .. and if a job is canceled then the job-function
|
|
has to figure that out and make a no-op
|
|
[2013-09-13 00:18:28] <ichthyo> Backend handles that layer. "The scheduler" is a high level thing
|
|
it contains multiple low-level schedulers, priqueues and all the management stuff,
|
|
and "the scheduler" as a subsystem can arrange this stuff in an optimal way.
|
|
No one else can.
|
|
[2013-09-13 00:18:49] <bennI_> stop -- I see this as a slight problem
|
|
[2013-09-13 00:19:20] <bennI_> ...and all the management stuff?
|
|
[2013-09-13 00:19:15] <cehteh> bennI_: we never wanted to remove jobs from the priority queue,
|
|
because that's relatively expensive
|
|
[2013-09-13 00:19:43] <bennI_> yes, removing jobs is not really a scheduler's job
|
|
[2013-09-13 00:20:07] <ichthyo> true, no doubt
|
|
[2013-09-13 00:20:18] <ichthyo> but as a client I just expect this service
|
|
[2013-09-13 00:20:23] <bennI_> it's that mystical 'layer' or manager?
|
|
[2013-09-13 00:20:29] <ichthyo> yes
|
|
[2013-09-13 00:20:43] <ichthyo> and this mystical manager needs internal knowledge how the scheduler works
|
|
[2013-09-13 00:20:45] <cehteh> so the logic is all in the job function ..
|
|
[2013-09-13 00:20:47] <bennI_> as a client you can expect the manager to do this
|
|
[2013-09-13 00:20:55] <bennI_> but the manager does not belong to the scheduler
|
|
[2013-09-13 00:21:00] <ichthyo> but the client doesn't need internal knowledge how the scheduler works
|
|
[2013-09-13 00:21:11] <ichthyo> thus, clearly, the manager belongs to the scheduler, not the client
|
|
[2013-09-13 00:22:43] <bennI_> this will not be directly implemented within the scheduler
|
|
[2013-09-13 00:23:15] <ichthyo> bennI_: absolutely, this is not in the low-level scheduler.
|
|
But it is closer to the low-level scheduler, than it is to the player
|
|
|
|
|
|
[2013-09-13 00:21:26] <bennI_> ok, WHO wants to stop or abort jobs?
|
|
[2013-09-13 00:21:46] <cehteh> proc :>
|
|
[2013-09-13 00:21:48] <ichthyo> the player
|
|
[2013-09-13 00:22:11] <ichthyo> more precisely: the player doesn't want to stop jobs, but he wants to change the playback mode
|
|
[2013-09-13 00:22:29] <cehteh> ichthyo: do you really need a 'scope' or can you have a 'leader' which you abort
|
|
This leader is practically implemented as a job which is already finished but others wait on it
|
|
[2013-09-13 00:22:44] <ichthyo> such a leader would likely be a solution
|
|
[2013-09-13 00:23:35] <cehteh> ichthyo: agreed
|
|
[2013-09-13 00:24:03] <ichthyo> cehteh: actually I really don't care *how* it is implemented.
|
|
Proc can try to support that feature with giving the right information
|
|
grouping or tagging would be one option
|
|
[2013-09-13 00:24:26] <ichthyo> just look at the requirement from the player:
|
|
consider the pause button or loop playing, while the user moves the loop boundaries
|
|
[2013-09-13 00:24:50] <ichthyo> or think of scrubbing, where the user drags the "playhead" marker while it moves
|
|
[2013-09-13 00:24:19] <cehteh> I opt for the leader implementation
|
|
because that needs no complicated scope lookup and can be implemented with the facilities
|
|
already there. But that still means that someone has to manage these leaders,
|
|
i.e. a small structure above the low level jobs (struct{mutex; enum state})
|
|
and these leaders then have an identity/handle/pointer you need to care for
|
|
[2013-09-13 00:27:00] <ichthyo> let's say, *someone* has to care
|
|
[2013-09-13 00:27:12] <cehteh> ahh
|
|
|
|
[2013-09-13 00:27:50] <bennI_> I think we're going to need an intermediate layer between the job creator and the scheduler
|
|
[2013-09-13 00:27:59] <ichthyo> yes, my thinking too
|
|
[2013-09-13 00:28:13] <bennI_> 'cause not only the job creator has access to the jobs,
|
|
someone else will also want to kill jobs, which is not the job creator
|
|
and how is the job killer supposed to know WHICH tag, or handle, of a job to kill
|
|
[2013-09-13 00:29:25] <cehteh> we use a strict ownership paradigm
|
|
if someone else wants to operate on something it has to be its owner
|
|
[2013-09-13 00:30:22] <ichthyo> yes, and thus these management tasks need to be done within "the scheduler" in the wider sense
|
|
[2013-09-13 00:30:31] <cehteh> but that's not really a problem here
|
|
Proc creates jobs and this (slightly special) leader job and hands it over to the player
|
|
[2013-09-13 00:30:52] <ichthyo> wait, the other way round
|
|
[2013-09-13 00:30:53] <cehteh> or other way around the player creates this leader and asks Proc to fill out the rest for it
|
|
[2013-09-13 00:30:58] <bennI_> but the killer, who is not the creator, doesn't own the job
|
|
but the scheduler KNOWS NOTHING about jobs, only dependencies
|
|
[2013-09-13 00:31:13] <cehteh> but someone knows the leader; you just 'kill' the leader
|
|
----------------------------
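
The "leader" idea from this exchange can be illustrated with a minimal C++ sketch. It only shows the principle discussed above: a small structure (mutex plus state) sits above the low-level jobs, and a job whose leader was cancelled turns itself into a no-op when the scheduler invokes it, so nothing has to be removed from the priority queue. All names here (`Leader`, `LeaderState`, `Job`, `invoke`) are hypothetical and chosen for illustration; they are not taken from the Lumiera code base.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>

// Hypothetical names throughout; this is not Lumiera's actual API.
enum class LeaderState { ACTIVE, CANCELLED };

// Small structure "above" the low-level jobs: a mutex plus a state.
class Leader {
    mutable std::mutex mtx_;
    LeaderState state_ = LeaderState::ACTIVE;
public:
    // "Killing" the leader cancels every job that refers to it.
    void cancel() {
        std::lock_guard<std::mutex> lock(mtx_);
        state_ = LeaderState::CANCELLED;
    }
    bool isCancelled() const {
        std::lock_guard<std::mutex> lock(mtx_);
        return state_ == LeaderState::CANCELLED;
    }
};

// A job keeps a shared handle to its leader; the scheduler merely invokes it
// and needs no knowledge about groups or scopes.
struct Job {
    std::shared_ptr<const Leader> leader;
    std::function<void()> work;

    // Returns true when the work ran, false when the job became a no-op
    // because its leader had been cancelled in the meantime.
    bool invoke() const {
        assert(leader && "every job must refer to a leader");
        if (leader->isCancelled())
            return false;
        work();
        return true;
    }
};
```

In this sketch whoever holds the `Leader` handle (the player, or a management layer within "the scheduler" in the wider sense) calls `cancel()`; the jobs already queued stay where they are and simply do nothing once they are activated.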
|
|
|
|
|
|
|
|
[[cleanswitch]]
|
|
.-- clean switch when superseding planned jobs --
|
|
[caption="☉Transcript☉ "]
|
|
----------------------------
|
|
[2013-09-13 00:31:44] <ichthyo> the player only requests "all jobs in this timeline and for this play process"
|
|
to be superseded by new jobs, and this is expressed by some tag or number or handle
|
|
or whatever (the player doesn't care)
|
|
[2013-09-13 00:32:38] <ichthyo> so please note, it is not even just killing, effectively it is superseding,
|
|
but this is probably irrelevant for the scheduler, since the scheduler just
|
|
sees new jobs coming in afterwards
|
|
|
|
[2013-09-13 00:34:24] <ichthyo> unfortunately there is one other, nasty detail:
|
|
we need that switch for the superseding to happen in a clean manner.
|
|
[2013-09-13 00:34:42] <ichthyo> This doesn't need to happen immediately, nor does it need to happen even at the same time
|
|
in each channel. But it can't be that the old version for a frame job and the new version
|
|
of a frame job will both be triggered. It must be the old version, and then a clean switch,
|
|
and from then on the new version, otherwise we'll get lots of flickering and crappy noise
|
|
on the sound tracks
|
|
[2013-09-13 00:36:16] <cehteh> eh?
|
|
[2013-09-13 00:36:29] <ichthyo> yes, new and old can't be interleaved
|
|
[2013-09-13 00:36:38] <cehteh> that never happens
|
|
[2013-09-13 00:36:45] <ichthyo> ok, then fine...
|
|
[2013-09-13 00:36:49] <cehteh> because of the 'functional' model
|
|
[2013-09-13 00:37:03] <cehteh> you never render into the same buffer if the buffer is still in use
|
|
in the worst case, the invalidated job already runs and the actual job is out of luck
|
|
[2013-09-13 00:37:34] <ichthyo> well, we talked a thousand times about that:
|
|
this doesn't work for the output, since we don't and never can manage the output buffers
|
|
[2013-09-13 00:38:02] <cehteh> I think that will work for output as well
|
|
[2013-09-13 00:38:16] <ichthyo> I know, cehteh that you want to think that ;-)
|
|
[2013-09-13 00:38:22] <cehteh> even if I can't, they are abstracted and interlocked
|
|
[2013-09-13 00:38:44] <ichthyo> but actually it is the other way round. You use some library for output, and this
|
|
just gives *us* some buffer managed by the library, and then we have to ensure
|
|
that our jobs exactly match the time window and dispose the data into this buffer
|
|
[2013-09-13 00:39:15] <cehteh> yes but we can have only one job at a time writing to that buffer
|
|
[2013-09-13 00:39:28] <ichthyo> this is the nasty corner case where our nice concept collides with the rest of the world ;-)
|
|
[2013-09-13 00:39:32] <cehteh> these jobs are atomic -- at least we should make these atomic
|
|
[2013-09-13 00:40:00] <ichthyo> yes, that's important
|
|
[2013-09-13 00:40:21] <cehteh> even if that means rendering an invalidated frame that's better than rendering garbage
|
|
[2013-09-13 00:40:40] <ichthyo> of course
|
|
[2013-09-13 00:41:17] <ichthyo> but anyway, it is easy for the scheduler to ensure that either the old version runs,
|
|
or that all jobs belonging to the old version are marked as cancelled and only then
|
|
the processing of the new jobs takes place
|
|
that is kind of a transactional switch
|
|
such is really easy for the implementation of the scheduler to ensure.
|
|
But it is near impossible for anyone else in the system to ensure that
|
|
[2013-09-13 00:42:22] <cehteh> I really see no problem .. of course I would like if all buffers are under our control,
|
|
but even if not, or if we need to make a memcpy .. still this resource is abstracted
|
|
and only one writer can be there and all readers are blocked until the writer job is finished
|
|
[2013-09-13 00:43:35] <cehteh> reader in this case might be a hard-realtime buffer-flip by the player
|
|
[2013-09-13 00:44:15] <ichthyo> I also think this isn't really a problem, but something to be aware of.
|
|
Moreover at some point we need to tell the output mechanism where the data is
|
|
and there are two possibilities:
|
|
(1) within the callback which is activated by the output library,
|
|
we copy the data from an intermediary buffer
|
|
or
|
|
(2) our jobs immediately render into the address given by the output mechanism
|
|
[2013-09-13 00:45:45] <ichthyo> (1) looks simpler, but incurs an additional memcopy -- not really much of a problem
|
|
[2013-09-13 00:46:33] <ichthyo> but for (1), when such a switch happens, at the moment when the output library prompts us
|
|
to deliver, we need to know from *which* internal buffer to get the data
|
|
[2013-09-13 00:46:38] <cehteh> i'd aim for both varieties .. and make that somehow configurable
|
|
[2013-09-13 00:46:48] <ichthyo> yes, that would be ideal
|
|
[2013-09-13 00:46:49] <cehteh> nothing needs to be fixed there
|
|
ideally we might even mmap output buffers directly on the graphics card memory
|
|
and manage that with our backend and tell the output lib (opengl) what to display.
|
|
I really want to have this very flexible
|
|
[2013-09-13 00:47:14] * ichthyo thinks the same
|
|
----------------------------
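
The "transactional switch" mentioned in this exchange might look roughly like the following C++ sketch. It assumes that jobs carry a group tag and that one management object inside "the scheduler" decides which groups may still run; superseding then marks the old group as cancelled and opens the replacement group within a single critical section, so no observer ever sees both as valid. The names `PlaybackGroups`, `GroupTag`, `open`, `supersede` and `mayRun` are invented for this illustration. The sketch does not cover a job that already passed the check and is still running; that case relies on the "one writer per buffer" rule discussed above.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>

// Hypothetical management facade; names are illustrative assumptions.
using GroupTag = std::uint64_t;

class PlaybackGroups {
    std::mutex mtx_;
    std::set<GroupTag> active_;   // groups whose jobs may still be activated
    GroupTag nextTag_ = 1;
public:
    // Start a new group of planned jobs and return its tag.
    GroupTag open() {
        std::lock_guard<std::mutex> lock(mtx_);
        GroupTag tag = nextTag_++;
        active_.insert(tag);
        return tag;
    }

    // Cancel the old group and open its replacement in ONE critical section:
    // the tag for the new jobs only exists after the old group is cancelled.
    GroupTag supersede(GroupTag old) {
        std::lock_guard<std::mutex> lock(mtx_);
        assert(active_.count(old) && "can only supersede an active group");
        active_.erase(old);
        GroupTag fresh = nextTag_++;
        active_.insert(fresh);
        return fresh;
    }

    // Checked when a job is activated; jobs of cancelled groups become no-ops.
    bool mayRun(GroupTag tag) {
        std::lock_guard<std::mutex> lock(mtx_);
        return active_.count(tag) > 0;
    }
};
```

The point of the design is the one made in the discussion: such a switch is cheap to guarantee inside the scheduler subsystem, while a client like the player only hands over a tag and never needs to know how the jobs are queued.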
|
|
|
|
|
|
.-- define jobs by time window --
|
|
[caption="☉Transcript☉ "]
|
|
----------------------------
|
|
[2013-09-13 00:47:39] <ichthyo> this leads to another small detail: we really need a *time window* for
|
|
the activation of jobs, i.e. a start time, and a deadline
|
|
start time == not activate this job before time xxx
|
|
and deadline == mark this job as failed if it can't be started before this deadline
|
|
do you think such a start or minimum time is a problem for the scheduler implementation ?
|
|
it is kind of an additional pre-condition
|
|
The reason is simple. If we get our scheduling to work very precisely,
|
|
we can dispose of a lot of other handover and blocking mechanisms
|
|
[2013-09-13 00:51:40] <cehteh> I was thinking about that too
|
|
Initially I once had the idea to have the in-time scheduler scheduled by start time
|
|
and the background scheduler by "not after" -- but prolly both schedulers should
|
|
just have time+span, making them both the same.
|
|
[2013-09-13 00:53:03] <ichthyo> fine
|
|
----------------------------
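
As an illustration of the time window (start time plus span) agreed on here, the following C++ sketch classifies a job at the moment the scheduler considers it. It is an assumption-laden example, not the actual scheduler interface: `TimeWindow`, `Activation` and `classify` are hypothetical names, and the deadline is taken to be `start + span`.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical types for illustration; not the actual Lumiera scheduler API.
using Clock = std::chrono::steady_clock;
using TimePoint = Clock::time_point;

enum class Activation { TOO_EARLY, RUN, FAILED };

// A job is defined by "time + span": it must not start before `start`
// and counts as failed if it could not be started before `start + span`.
struct TimeWindow {
    TimePoint start;
    Clock::duration span;

    TimePoint deadline() const { return start + span; }
};

// Decide what to do with a job when the scheduler looks at it at time `now`.
inline Activation classify(const TimeWindow& w, TimePoint now) {
    assert(w.span >= Clock::duration::zero());
    if (now < w.start)
        return Activation::TOO_EARLY;   // pre-condition not yet met, keep waiting
    if (now >= w.deadline())
        return Activation::FAILED;      // could not be started before the deadline
    return Activation::RUN;
}
```

With a single representation like this, the in-time scheduler and the background scheduler could share one job definition and differ only in how generous the span is.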
|
|
|
|
|