From b91b964d21425cc942a589160d93560d88fc4736 Mon Sep 17 00:00:00 2001
From: Ichthyostega
Date: Mon, 16 Sep 2013 04:03:15 +0200
Subject: [PATCH] DOC: 9/2013 meeting summary and IRC transcript

---
 doc/devel/meeting_summary/2013-09-12.txt | 679 +++++++++++++++++++++++
 1 file changed, 679 insertions(+)
 create mode 100644 doc/devel/meeting_summary/2013-09-12.txt

diff --git a/doc/devel/meeting_summary/2013-09-12.txt b/doc/devel/meeting_summary/2013-09-12.txt
new file mode 100644
index 000000000..d59a70687
--- /dev/null
+++ b/doc/devel/meeting_summary/2013-09-12.txt
@@ -0,0 +1,679 @@
+2013-09-12 Lumiera Developers Meeting
+=====================================
+:Author: Ichthyo
+:Date: 2013-09-16
+
+Sep 12, 2013 on #lumiera 20:00 - 23:23 UTC
+
+
+__Participants__
+
+ * cehteh
+ * ichthyo
+ * Benny
+ * Hendrik
+
+_Summary written by ichthyo_
+
+
+
+Doxygen woes
+------------
+_Hendrik_ pointed out an example where the handling and presentation
+of extracted documentation was confusing. It turned out that Doxygen didn't recognise
+some documentation comments and thus retained those within the pretty printed source.
+Basically this was known and documented behaviour, but confusing nonetheless.
+
+_ichthyo_ slightly tweaked the configuration. Moreover, he currently creates and uploads
+the API-doc manually and irregularly, so the content on the website is quite outdated at
+times. Automatic publishing was previously done by builddrone; _cehteh_ promised to finish
+and install an improved version...
+
+We all agree that we somehow dislike Doxygen, but aren't aware of reasonable alternatives.
+
+Conclusion
+~~~~~~~~~~
+
+ * _ichthyo_ will fix the comments not recognised by Doxygen
+ * we reconfirm that we do _not_ want to create all our documentation based on Doxygen
+
+
+
+FrOSCon aftermath
+-----------------
+The visits, the hiking together, and the meeting at FrOSCon were refreshing and reassuring.
+In the end, all went well.
Everyone survived the after-froscon party and Benny's car is
+fixed and working again.
+
+_Benny_ proposes to create a page with some pictures, just to retain some traces of this
+event. _Ichthyo_ is a bit reluctant, since he didn't care especially about documentation
+this time, but he promises to check what usable images he's got.
+
+Conclusion
+~~~~~~~~~~
+
+ * create a page with some images
+ * conclusion about FrOSCon? ``it was fun'' ...
+
+
+
+Scheduler: Interface and requirements
+-------------------------------------
+_Benny_ showed interest in working on this topic. The first step would be to build or use
+a textbook priority queue implementation as a starting point. Some time ago, _cehteh_ included
+a suitable implementation in his link:http://git.pipapo.org/?p=cehsrc;a=summary[cehlib], a
+collection of basic C library routines, mostly extracted from Lumiera's library. _ichthyo_
+will integrate this priority queue into the Lumiera tree.
+
+The rest of the meeting was an extended discussion, touching and affirming the most relevant
+issues and considerations regarding the expected scheduler implementation.
+
+- the core scheduler has to be kept rather simple
+- the actual job function is wrapped into an extended function, which is tightly integrated
+  with the scheduler's implementation. This approach allows implementing more elaborate
+  strategies without increasing the complexity of the actual scheduler.
+- handling of dependencies between jobs is considered one of the tricky requirements
+- the intention is to pre-process and transform prerequisites into lists of dependent jobs
+- prerequisites require us to build in some mechanisms for ``conditionals''
+- notably, the data required for processing will become available asynchronously.
+- thus, the scheduler must include some form of _polling_ to detect when prerequisites
+  are finally met, and unblock dependent jobs as a consequence.
+- our scheduling works strictly ordered by time. There is no throttling.
But we provide
+  _multiple_ scheduler queues, which use _different_ ``thread classes''
+- a given job can be in multiple queues at the same time; the first invocation wins.
+- we intend to employ _work stealing_
+- some special kinds of scheduling are not time bound (e.g. background rendering,
+  ``freewheeling'' rendering). But we use time bound delivery as the fundamental
+  model and treat these as corner cases
+- ``the scheduler'' as a service and sub-system encompasses more than just the
+  low-level implementation of a priority queue. We need an integrated _manager_
+  or _controller_ to provide the more high-level services required by the
+  ``player'' subsystem in Proc-Layer.
+- we need a mechanism to obsolete or supersede jobs which are already planned,
+  but not yet triggered. The reason lies in the interactive nature of the Player.
+- the implementation needs to be worked out; this is an internal detail of the
+  scheduler (seen as a subsystem), but likely it is not implemented in the
+  low-level scheduler queue. One promising implementation approach is to
+  use special ``group leader'' marker jobs.
+- when jobs are superseded, the switch from the old to the new version should
+  happen in a clean way; there are several options for how to achieve that in practice
+- jobs will not only be defined by their deadline; rather, we'll allow defining
+  a _time window_ during which a job must be triggered for regular execution.
+
+_see below for a slightly shortened transcript of these discussions_
+
+Conclusion
+~~~~~~~~~~
+ * The scheduler service has a high-level interface
+ * there are multiple really simple, low-level scheduler queues
+ * some kind of manager or controller connects both levels
+ * superseding of planned jobs can be implemented through ``group leader'' jobs
+ * the link:{rfc}/SchedulerRequirements.html[RfC] should be augmented accordingly
+
+
+
+Next meeting
+------------
+
+The next meeting will be on Thursday, October 10, 20:00 UTC
+
+
+''''
+
+++++
+
+
+
+++++ + + +[[irctranscript]] +IRC Transcript +-------------- +- xref:dependencies[dependant jobs and conditionals] +- xref:schedulingmodes[various modes of scheduling] +- xref:architecture[questions of architecture] +- xref:superseding[aborting/superseding of jobs] +- xref:cleanswitch[clean switch when superseding planned jobs] + + +.-- Discussion of details -- +[caption="☉Transcript☉ "] +---------------------------- +[2013-09-12 22:55:00] bennI_ you said you might look into that topic, as time permits, of course +[2013-09-12 22:49:29] .. scheduler .. shall I explain what I have in mind for the backend lowest level? +[2013-09-12 22:55:12] a low level job should be very very simple, that is: a single pointer to a + 'job function' and a list of prerequisite jobs (and maybe little more) +[2013-09-12 22:55:57] all things like rescheduling, aborting, etc. are on the lowest level handled over + this single job function which gets a parameter about the state/condition on which + its run (in time, aborting, expired, ....) +[2013-09-12 22:57:22] anything more, especially dispatching on different functions to handle the actual state + should be implemented on a level above (and maybe already in C++ by Proc) then +[2013-09-12 22:57:41] but the job function is just a call back function, defined elsewhere +[2013-09-12 22:57:48] yes +[2013-09-12 22:58:16] so it's the jobs that are being scheduled +[2013-09-12 22:58:26] yes +[2013-09-12 22:58:58] basically yes, but as you said, this low-level job function also has to handle + the state and maybe dispatch to the right high-level function. 
A different function + for working, than for aborting, for example +[2013-09-12 22:59:45] yes, but want to leave that out of the scheduler itself, that's handled on a higher level +[2013-09-12 22:59:45] so that is kind of a thin layer on top of the basic scheduler +---------------------------- + + + + +[[dependencies]] +.-- dependant jobs and conditionals -- +[caption="☉Transcript☉ "] +---------------------------- +[2013-09-12 22:59:46] what about the dependent jobs? +[2013-09-12 23:00:03] thats the most important question I think, since *that* is something special +[2013-09-12 23:00:19] yes dependencies need to be handled by the scheduler +[2013-09-12 23:00:52] well... at least the scheduler needs to poll them in some way +[2013-09-12 23:01:01] poll or notify or re-try or the like +[2013-09-12 23:01:03] one question: shall jobs be in the priority queue even if their dependencies + are not yet satisfied? ... I'd tend to say yes + +[2013-09-12 23:01:11] so we're not going to have a scheduler with simple jobs +[2013-09-12 23:01:31] the scheduler must maintain lists of dependencies. Nodes for these lists are likely + to be allocated by the small object allocator I've written some time ago; + because 2 different jobs can depend on a single other job and other more complex + cross dependencies +... + +[2013-09-12 23:02:54] dependencies are the results of jobs +[2013-09-12 23:03:04] so you propose to pre-process that prerequisites and rather store them + as dependencies internally? +[2013-09-12 23:03:11] a 'job' might be a no-op if the resource is available +[2013-09-12 23:03:04] but these prerequisites are IN the scheduler +[2013-09-12 23:03:10] not in the higher level? 
+[2013-09-12 23:03:41] bennI_ the scheduler needs only to be aware that there is some kind of dependency +[2013-09-12 23:03:43] yes the scheduler needs to be aware of dependencies so anything needs to be abstracted somehow, + that's why I'd like to say anything is a 'job', even if that's technically not completely true + because something might be a 'singleton instance' and no job needs to be run to create it +[2013-09-12 23:04:21] on that level indeed, yes +[2013-09-12 23:04:45] any more fancy functionality is encapsulated within that simple job abstraction +[2013-09-12 23:06:18] as long some resource (which can become a prerequisite/dependency of any other job) exists + it has some ultra-lightweight job structure associated with it + +[2013-09-12 23:06:45] now, for example, lets consider the loading of data from a file. + How does this work in practice? do we get a callback when the data arrives? + and where do we get that callback? guess in another thread. + and then, how do we instruct the scheduler so that the jobs dependant on the + arrival of that data can now become active? +[2013-09-12 23:08:07] note that we do memory mapping, we never really load data as in calling read() + but we might create prefetch jobs which complete when data is in memory. + The actual loading is done by the kernel +[2013-09-12 23:08:24] yes, my understanding too +[2013-09-12 23:08:38] but it is asynchronous, right? +[2013-09-12 23:08:42] yes +[2013-09-12 23:08:58] from the schedulers perspective, it's juts a callback, so it is defined elsewhere + i.e., data loading and details are implemented elsewhere in the callback itself + +[2013-09-12 23:09:43] But... there is a problem: not the scheduler invokes that callback, + someone else (triggered by the kernel) invokes this callback, and + this must result in the unblocking of the dependant jobs, right? 
+[2013-09-12 23:10:35] the scheduler just has the callback waiting, then when all conditions are met + it just gets scheduled +[2013-09-12 23:10:21] nope +[2013-09-12 23:10:54] note: we need a 'polling' state for a job +[2013-09-12 23:10:57] so the scheduler polls a variable, signalling that the data is there (or isn't yet) +[2013-09-12 23:11:14] yes +[2013-09-12 23:11:22] but even if not we can ignore the fact; then our job might block, + in practice that should happen *very* rarely +[2013-09-12 23:11:37] we could abstract that as a very simple conditional + the scheduler is polling some conditional +[2013-09-12 23:11:47] unless the presence of the data for the job is a precondition +[2013-09-12 23:12:28] bennI_: yes, the presence of the data *is* a precondition for a render job + thus the scheduler must not start the render job, unless the data is already there +[2013-09-12 23:13:11] so the job cannot be 'scheduled' until data is present -- this is one of the preconditions +[2013-09-12 23:12:47] loading data: have a job calling posix_memadvice(..WILLNEED) soon enough + before calling the job which needs the data + if we can not afford blocking, then we can poll the data with mincore() + and if the data is not there abort the job or take some other action. + I prolly add some (rather small budget) memory locking to the backend too +[2013-09-12 23:14:13] OK, but then this would require some kind of re-trying or polling of jobs. + do we want this? I mean a job can start processing, once the data is there + and we are in the pre defined time window +[2013-09-12 23:15:08] then if *really* needed you can make a job which locks the data in ram + (that is really loading it and only completes when its loaded) + this way you can avoid polling too. But that's rather something we should + do only for really important things +[2013-09-12 23:15:39] urghs, that would block the thread, right? 
so polling sounds more sane +[2013-09-12 23:16:22] blocking the thread is no issue as we have a thread pool and this thread pool + should later be aware that some threads might be blocking. + (I planned to add some thread class for that) +[2013-09-12 23:16:36] what about having 2 jobs: one is the load, the other is a precondition, + i.e., the presence of the data +[2013-09-12 23:16:52] bennI_: yes exactly like that +[2013-09-12 23:16:53] bennI_ yes that was what I was thinking too +[2013-09-12 23:17:05] one job to prepare / trigger the loading + one job to verify the data is there (this is a conditional) +[2013-09-12 23:17:19] but either one can block .. we a free to decide which one blocks +[2013-09-12 23:17:28] so we have two stupid jobs, where onne can only happen if the other happens +[2013-09-12 23:17:29] and then the actual calculation job +[2013-09-12 23:18:22] well the scheduler on the lowest level should be unaware of all that + .. just dead dumb scheduling. All logic is rolled on higher levels + that allows us for more smart things depending on the actual use case, + different strategies for different things +[2013-09-12 23:17:50] we could even have only ONE job: it has a linked lists of jobs + but why not -- instead of ONE job, it can have a linked list of small jobs + once it's scheduled, maybe only one job gets run, i.e. data gets loaded. + the next schedule of the same job, loads the actual file, then the dependency +[2013-09-12 23:20:36] bennI_: first is a priqueue which schedules jobs by time + maybe aside of that there is a list of jobs ready to run + and a list of jobs not ready to run b/c their dependencies are not satisfied +[2013-09-12 23:22:21] (mhm ready to run? maybe not, just run it!) +[2013-09-12 23:22:49] ok, so the jobs themselves are going to be a bit more complex + I think we'll need a bit of trial and error; we cannot possibly envisage + everything right now. 
Maybe some of the jobs will not be a simple linked list, + but a tree +[2013-09-12 23:22:36] OK plan: + scheduler first picks job from the priqueue and checks if all dependencies are met + either way it calls the job function, either with "in time" or with "missing dependencies" +[2013-09-12 23:23:44] dumping some branches altogether? +[2013-09-12 23:24:30] then if state is "missing dependencies" the job function *might* insert the job + into the list of blocked jobs (or cancel the job or call anything else) + whenever a job completes, it might release one or more jobs from the 'blocked' list, + maybe even move them to a delayed list, and then the scheduler (after handling the priqueue) + picks any jobs from this delayed list up and runs them +[2013-09-12 23:25:02] OK +[2013-09-12 23:25:39] do ALL jobs get equally scheduled + or do dependent jobs only enter in a linked list of jobs for one scheduled job, + if you know what I mean... +[2013-09-12 23:27:16] bennI_: jobs are scheduled by the priqueue which is ordered by time .. plus maybe some small + integer so you can give a order to jobs ought to be run at the same time + we have no job priorities otherwise, but we will have (at least) 2 schedulers + one is for the hard jobs which must be run in time, and one is for background task + and one (but these rather needs special attention) will be some realtime scheduler + which does the hard timing stuff, like switching video frames or such, but that's + not part of the normal processing +[2013-09-12 23:28:03] yes, but some jobs are not jobs in themselves, only depend on other jobs +[2013-09-12 23:28:16] do these su, dependent jobs get priqued? +[2013-09-12 23:29:23] bennI_: I think this is a question of design. 
Both approaches will work, but it depends + on the implementer of the scheduler to prefer one approach over the other +[2013-09-12 23:30:12] bennI_: my idea was that each job might be in both queues (in-time and background) + so for example when the system is reasonable idle a job might be first scheduled + by the 'background' scheduler because its normal scheduled time is way ahead +---------------------------- + + + + +[[schedulingmodes]] +.-- various modes of scheduling -- +[caption="☉Transcript☉ "] +---------------------------- +[2013-09-12 23:30:23] cehteh: we certainly get a third (or better a fourth) mode of operation: + the "freewheeling" calculation. This is what we use on final rendering. + Do any calculation as soon as possible, but without any timing constraints +[2013-09-12 23:31:31] ichthyo: the freewheeling is rather simple: just put jobs with time==0 + into the background scheduler + or? +[2013-09-12 23:32:10] OK, so it can be implemented as a special case of background scheduling, + where we just use all available resources to 100% +[2013-09-12 23:32:36] or not time=0 but time==now +[2013-09-12 23:32:59] because other jobs should eventually be handled too + anyways I think that's not becoming complicated +[2013-09-12 23:33:42] * ichthyo nods +[2013-09-12 23:34:24] only the meaning is a bit different, not so much the implementation + background == throttle resource usage + freewheeling == get me all the power that is available +[2013-09-12 23:36:52] background wont throttle, it just doesn't get scheduled when there are + more important things to do +[2013-09-12 23:37:19] mhm, might be, not entirely sure +[2013-09-12 23:37:35] there is no point in putting something into the background queue + if we never need the stuff. 
But it should have some safety margin there + and maybe a different thread class (OS priority for the thread) +[2013-09-12 23:38:11] for example: user works with the application and plays back, + but at the same time, excess resources are used for pre-rendering +[2013-09-12 23:38:19] yes +[2013-09-12 23:38:21] but without affecting the responsiveness. + thus we don't want to use 100% of the IO bandwidth for background +[2013-09-12 23:38:53] then schedule less or more sparse background jobs +[2013-09-12 23:39:35] but we cant really throttle IO in a more direct way, since that's obligation + of the kernel and we can only hint it +[2013-09-12 23:39:54] ok, but someone has to do that. + Proc certainly can't do that, it can only give you a whole bunch of jobs + for background rendering +[2013-09-12 23:40:07] the job function might be aware if its scheduled because of in-time or + background queue and adjust itself +[2013-09-12 23:40:57] we only schedule background jobs if nothing else is to do +[2013-09-12 23:41:18] what means "nothing else"? + for example, if we're waiting for data to arrive, can we meanwhile schedule + background calculation (non-IO) jobs? +[2013-09-12 23:41:45] and if I/O becomes the bottleneck some logic to throttle background jobs + might be implemented on the higher level job functions... +[2013-09-12 23:42:12] I abstracted "thread classes" in the worker pool remember +[2013-09-12 23:42:23] * ichthyo nods +[2013-09-12 23:42:33] these are not just priorities but define rather the purpose of a thread + we can have "background IO" thread classes there .. 
and if a job gets scheduled + from the background queue it might query the IO load and rather abort/delay the job + +[2013-09-12 23:43:51] another thing worth mentioning +[2013-09-12 23:43:55] the planning of new jobs happens within jobs + there are some special planning jobs from time to time + but that certainly remains opaque for the scheduler + +[2013-09-12 23:44:15] the scheduler should not be aware/responsible for any other resource managemnt +[2013-09-12 23:44:38] you can feed a nice to the scheduler, but the scheduler only uses the nice value + to reshuffle. Someone else must decide when to issue a nice +[2013-09-12 23:45:47] bennI_: there is no 'nice' value our schedulers are time based + niceness is defined by the 'thread classes' which can define even more + (io priority, aborting policies,...) -- even (hard) realtime and OS scheduling policies +---------------------------- + + + + +[[architecture]] +.-- questions of architecture -- +[caption="☉Transcript☉ "] +---------------------------- +[2013-09-12 23:45:31] as I see it: we have several low-level priqueues. + And we have a high-level scheduler interface. Some facility in between decides + which low-level queue to use. Proc will only state the rough class of a job, + i.e. timebound, or background. Thus these details are kept out of the low-level + (actual) scheduler (implementation), but are configured close to the scheduler, + when we add jobs +[2013-09-12 23:48:16] the creator of a job (proc in most case) tells what purpose a job has + (numbercrunching, io, user-interface, foo, bar): that's a 'threadclass' + the actual implementation of threadclasses are defined elsewhere + +[2013-09-12 23:49:09] from Proc-Layer's perspective, all these details happen within the + scheduler as a black box. Proc only states a rough class (or purpose) of the job. 
+ When the job is added, this gets translated into using a suitable thread class, + and we'll put the job in the right low-level scheduler queue +[2013-09-12 23:49:44] actually for each job you can do 2 of such things .. as i mentioned earlier, + each job can be in both queues, so you can define time+threadclass for background + and for in-time scheduler ... with a little caveat about priority inversion, + threadclass should be the same when you insert something in both queues, + only for freewheeling or other special jobs they might differ +[2013-09-12 23:51:52] bennI_: btw implementation detail I didnt tell you yet.. + you know "work stealing schedulers" ? + we absolutely want that :) +[2013-09-12 23:52:09] yes ;-) + its a kind of load balancing -- a very simple and clever one +[2013-09-12 23:52:11] that is: each OS thread has its own scheduler. + each job running on a thread which creates new thread puts these on the scheduler + of its own thread and only if a thread has nothing else to do, and the system + is not loaded, then it steals jobs from other threads. That gives far more locality + and much less contention. +[2013-09-12 23:56:23] another detail: we need to figure out if we need a pool of threads for each + threadclass OR if switching a thread to another threadclass is more performant. +[2013-09-12 23:56:45] *this* probably needs some experimentation +[2013-09-12 23:56:49] yes + + +[2013-09-12 23:57:32] OK, so please let me recap one thing: how we handle prerequisites +[2013-09-12 23:58:00] namely, (1) we translate them into dependent jobs (jobs that follow) + and (2) we have conditional jobs, which are polled regularly, + until they signal that a condition is met (or their deadline has passed) + +[2013-09-12 23:58:59] ichthyo: I think job creation is a (at least) 2 step process .. first you create + the job structure, fill in any data (jobs it depends on). These 'jobs it depends on' + might be just created of course .. 
and then when done you unleash it to the scheduler +[2013-09-12 23:59:39] indeed +[2013-09-12 23:59:52] on high-level, I hand over jobs as "transactions" or batches + then, the scheduler-frontend might do some preprocessing and re-grouping + and finally hands over the right data structures to the low-level interface + +[2013-09-13 00:00:35] after you given it to the scheduler, shall the scheduler dispose it when done + (I mean really done) or give it back to you +[2013-09-13 00:00:49] never give it back +[2013-09-13 00:00:52] OK +[2013-09-13 00:00:56] it is really point-and-shot +[2013-09-13 00:00:58] BUT -- there is a catch + mind me + we need to be able to "change the plan" +[2013-09-13 00:01:18] there must be no catch .. if there is one, then you have to get it back :) +[2013-09-13 00:01:33] for example +[2013-09-13 00:01:37] yes +[2013-09-13 00:01:46] but that's completely on your side +[2013-09-13 00:01:52] no +[2013-09-13 00:02:00] lets assume Proc has given the jobs for the next second to the scheduler + that is 25 frames * 3 jobs per frame * number of channels. + then, 20 ms later, the User in the GUI hits the pause button +[2013-09-13 00:02:52] now we need a way to "call back" *exactly those* jobs, + no other jobs (other timelines) +[2013-09-13 00:03:19] so we need a "scope", and we need to be able to "cancel" or "retarget" + the jobs already given to the scheduler. But never *individual* jobs, + always whole groups of jobs +[2013-09-13 00:03:00] you prolly create a higher level "render-job" class. + Now, if you want to be able to abort or move it then you have a flag there + (and/or maybe a backpointer to the low level job) +[2013-09-13 00:04:05] no +[2013-09-13 00:04:10] wait +[2013-09-13 00:04:20] I am absolutely sure I don't keep any pointer to the low level job + since I don't have any place to manage that. + it is really point and shot +[2013-09-13 00:04:39] yes +[2013-09-13 00:05:15] you don't need to manage .. 
this is just a tag +[2013-09-13 00:05:24] but some kind of flag or tag would work indeed, yes +[2013-09-13 00:06:02] your job function just handles that + if (self->job == myself) .... else oops_i_got_dumped() + of course self->job needs to be protected by some mutex +[2013-09-13 00:07:14] I think that is an internal detail of the scheduler (as a subsystem) +[2013-09-13 00:07:29] now when you reschedule you just create a new (low level)job .. and tag + the higher level job with that job and unleash it +[2013-09-13 00:07:34] the *scheduler* wraps the actual job into a job function, of course +[2013-09-13 00:08:15] so this self->job is just a tag about the owner, no need to manage + you only need to check for equality and maybe some special case like NULL for aborts + no need to release or manage it +[2013-09-13 00:08:57] well yes. but that is not Proc + not Proc or the Player is doing that, but the scheduler (in the wider sense) is doing that + since in my understanding, only the scheduler has access to the jobs, after they + have been handed over +[2013-09-13 00:09:41] well proc creates the job +[2013-09-13 00:09:47] yes +[2013-09-13 00:10:12] but you certainly augment the low level job structure with some higher level data +[2013-09-13 00:10:30] yes +[2013-09-13 00:10:49] and the scheduler itself will certainly also put some metadata into the job descriptor struct +[2013-09-13 00:10:50] then throw them at the scheduler and say goodby? +[2013-09-13 00:10:53] there you just add a mutex and a tag (job pointer) +[2013-09-13 00:11:16] I'd like to see that entirely as an implementation detail within the scheduler +[2013-09-13 00:11:23] the job descriptor must become very small +[2013-09-13 00:11:28] since it highly depends on thread management and the like +[2013-09-13 00:11:50] no need +[2013-09-13 00:12:15] essentially, Proc will try to keep out of that discussion +[2013-09-13 00:12:16] it can become an implementation detail of some middle layer .. 
above the scheduler +---------------------------- + + + +[[superseding]] +.-- how to handle aborting/superseding -- +[caption="☉Transcript☉ "] +---------------------------- +[2013-09-13 00:12:19] how is the proc going to say 'stop some job' +[2013-09-13 00:12:28] yes that's the question. That's what I'm getting at +[2013-09-13 00:12:49] Proc will certainly never ask you to stop a specific job + this is sure. +[2013-09-13 00:12:54] Proc doesn't have a handle or the like +[2013-09-13 00:13:09] BUT -- Proc will ask you to re-target / abort or whatever all jobs within a certain scope + and this scope is given with the job definition as a tag +[2013-09-13 00:13:15] only proc knows to stop something but you need some grip on it .. and of course that are + the proc own datastructures (higher level job descriptors) +[2013-09-13 00:13:34] as said -- proc tags each job with e.g. some number XXX + and then it might come to the scheduler and say: + please halt all jobs with number XXX +[2013-09-13 00:14:09] the 'tag' can be the actual job handle .. even if you don't own it any more +[2013-09-13 00:14:32] 'IT' ? + the proc? +[2013-09-13 00:14:44] why number .. why not the low level job pointer? + that is guaranteed to be a valid unique number for each job +[2013-09-13 00:15:06] cehteh: since Proc never retains knowledge regarding individual jobs +[2013-09-13 00:15:17] uhm -- when you create a job it does +[2013-09-13 00:15:24] 'cause the Proc layer wants to give the job to the scheduler and say goodby, + I know nothing about you anymore +[2013-09-13 00:15:35] exactly. Proc finds out what needs to be calculated, creates those job descriptors, + tags them as a group and then throws it over the wall +[2013-09-13 00:15:52] but proc can't throw it over the wall +[2013-09-13 00:16:10] it has a vested interessted in the job, i.e. abort!!! 
+[2013-09-13 00:16:05] OK but I see this "tags these as groups" as layer above the dead simple scheduler + I dont really like the idea to implement that on the lowest level, but the job-function + can add a layer above and handle this +[2013-09-13 00:17:19] no, you don't need to implement it on the lowest level, of course + but basically its is an internal detail of "the scheduler" +[2013-09-13 00:17:50] nah .. "the manager" :) +[2013-09-13 00:17:57] :-D +[2013-09-13 00:18:01] the scheduler doesnt care +[2013-09-13 00:18:07] who handles the layer? +[2013-09-13 00:18:18] anyway, then "the manager" is also part of "the scheduler" damn it ;-) +[2013-09-13 00:18:26] is it the WALL where the proc thows over the job +[2013-09-13 00:18:31] it just schedules .. and if a job is canceled then the job-function + has to figure that out and make a no-op +[2013-09-13 00:18:28] Backend handles that layer. "The scheduler" is a high level thing + it contains multiple low-level schedulers, priqueues and all the management stuff, + and "the scheduler" as a subsystem can arrange this stuff in an optimal way. + No one else can. +[2013-09-13 00:18:49] stop -- I see this as a slight problem +[2013-09-13 00:19:20] ...and all the management stuff? +[2013-09-13 00:19:15] bennI_: we never wanted to remove jobs from the priority queue, + because that's relative expensive +[2013-09-13 00:19:43] yes, removing jobs is not really a schedulers job +[2013-09-13 00:20:07] true, no doubt +[2013-09-13 00:20:18] but as a client I just expect this service +[2013-09-13 00:20:23] its that mystically 'layer' or manager? +[2013-09-13 00:20:29] yes +[2013-09-13 00:20:43] and this mystical manager needs internal knowledge how the scheduler works +[2013-09-13 00:20:45] so the logic is all in the job function .. 
+[2013-09-13 00:20:47] as a client you can expect the manager to do this +[2013-09-13 00:20:55] but the manager belongs not to the scheduler +[2013-09-13 00:21:00] but the client doesn't need internal knowledge how the scheduler works +[2013-09-13 00:21:11] thus, clearly, the manager belongs to the scheduler, not the client +[2013-09-13 00:22:43] this will not be directly implemented within the scheduler +[2013-09-13 00:23:15] bennI_: absolutely, this is not in the low-level scheduler. + But it is closer to the low-level scheduler, than it is to the player + + +[2013-09-13 00:21:26] ok, WHO wants to stop or abort jobs? +[2013-09-13 00:21:46] proc :> +[2013-09-13 00:21:48] the player +[2013-09-13 00:22:11] more precisely: the player doesn't want to stop jobs, but he wants to change the playback mode +[2013-09-13 00:22:29] ichthyo: do you really need a 'scope' or can you have a 'leader' which you abort + This leader is practically implemented as a job which is already finished but others wait on it +[2013-09-13 00:22:44] such a leader would likely be a solution +[2013-09-13 00:23:35] ichthyo: agreed +[2013-09-13 00:24:03] cehteh: actually I really don't care *how* it is implemented. + Proc can try to support that feature with giving the right information + grouping or tagging would be one option +[2013-09-13 00:24:26] just look at the requirement from the player: + consider the pause button or loop playing, while the user moves the loop boundaries +[2013-09-13 00:24:50] or think at scrubbing where the user drags the "playhead" marker while it move +[2013-09-13 00:24:19] I opt for the leader implementation + because that needs no complicated scope lookup and can be implemented with the facilities + already there. But that still means that someone has to manage this leaders, + i.e. 
a small structure above the low level jobs (struct{mutex; enum state})
    and these leaders then have an identity/handle/pointer you need to care for
+[2013-09-13 00:27:00] let's say, *someone* has to care
+[2013-09-13 00:27:12] ahh

+[2013-09-13 00:27:50] I think we're going to need an intermediate layer between the job creator and the scheduler
+[2013-09-13 00:27:59] yes, my thinking too
+[2013-09-13 00:28:13] 'cause not only the job creator has access to the jobs,
    someone else will also want to kill jobs, which is not the job creator
    and how is the job killer supposed to know WHICH tag, or handle, of a job to kill
+[2013-09-13 00:29:25] we use a strict ownership paradigm
    if someone wants to operate on something it has to be its owner
+[2013-09-13 00:30:22] yes, and thus these management tasks need to be done within "the scheduler" in the wider sense
+[2013-09-13 00:30:31] but that's not really a problem here
    Proc creates jobs and this (slightly special) leader job and hands it over to the player
+[2013-09-13 00:30:52] wait, the other way round
+[2013-09-13 00:30:53] or, the other way around: the player creates this leader and asks Proc to fill out the rest for it
+[2013-09-13 00:30:58] but the killer, who is not the creator, doesn't own the job
    but the scheduler KNOWS NOTHING about jobs, only dependencies
+[2013-09-13 00:31:13] but someone knows the leader; you just 'kill' the leader
+----------------------------



[[cleanswitch]]
.-- clean switch when superseding planned jobs --
[caption="☉Transcript☉ "]
----------------------------
[2013-09-13 00:31:44] the player only requests "all jobs in this timeline and for this play process"
    to be superseded by new jobs, and this is expressed by some tag or number or handle
    or whatever (the player doesn't care)
+[2013-09-13 00:32:38] so please note, it is not even just killing, effectively it is superseding,
    but this is probably irrelevant for the scheduler, since the scheduler just
    sees new
jobs coming in afterwards

+[2013-09-13 00:34:24] unfortunately there is one other, nasty detail:
    we need that switch for the superseding to happen in a clean manner.
+[2013-09-13 00:34:42] This doesn't need to happen immediately, nor does it need to happen even at the same time
    in each channel. But it can't be that the old version of a frame job and the new version
    of a frame job will both be triggered. It must be the old version, and then a clean switch,
    and from then on the new version, otherwise we'll get lots of flickering and crappy noise
    on the sound tracks
+[2013-09-13 00:36:16] eh?
+[2013-09-13 00:36:29] yes, new and old can't be interleaved
+[2013-09-13 00:36:38] that never happens
+[2013-09-13 00:36:45] ok, then fine...
+[2013-09-13 00:36:49] because of the 'functional' model
+[2013-09-13 00:37:03] you never render into the same buffer if the buffer is still in use
    in the worst case, the invalidated job already runs and the actual job is out of luck
+[2013-09-13 00:37:34] well, we talked a thousand times about that:
    this doesn't work for the output, since we don't and never can manage the output buffers
+[2013-09-13 00:38:02] I think that will work for output as well
+[2013-09-13 00:38:16] I know, cehteh, that you want to think that ;-)
+[2013-09-13 00:38:22] even if I can't, they are abstracted and interlocked
+[2013-09-13 00:38:44] but actually it is the other way round.
You use some library for output, and this
    just gives *us* some buffer managed by the library, and then we have to ensure
    that our jobs exactly match the time window and dispose the data into this buffer
+[2013-09-13 00:39:15] yes but we can have only one job at a time writing to that buffer
+[2013-09-13 00:39:28] this is the nasty corner case where our nice concept collides with the rest of the world ;-)
+[2013-09-13 00:39:32] these jobs are atomic -- at least we should make them atomic
+[2013-09-13 00:40:00] yes, that's important
+[2013-09-13 00:40:21] even if that means rendering an invalidated frame, that's better than rendering garbage
+[2013-09-13 00:40:40] of course
+[2013-09-13 00:41:17] but anyway, it is easy for the scheduler to ensure that either the old version runs,
    or that all jobs belonging to the old version are marked as cancelled and only then
    the processing of the new jobs takes place
    that is kind of a transactional switch
    such is really easy for the implementation of the scheduler to ensure.
    But it is near impossible for anyone else in the system to ensure that
+[2013-09-13 00:42:22] I really see no problem .. of course I would like it if all buffers are under our control,
    but even if not, or if we need to make a memcpy .. still this resource is abstracted
    and only one writer can be there and all readers are blocked until the writer job is finished
+[2013-09-13 00:43:35] reader in this case might be a hard-realtime buffer-flip by the player
+[2013-09-13 00:44:15] I also think this isn't really a problem, but something to be aware of.
Moreover at some point we need to tell the output mechanism where the data is
    and there are two possibilities:
    (1) within the callback which is activated by the output library,
    we copy the data from an intermediary buffer
    or
    (2) our jobs immediately render into the address given by the output mechanism
+[2013-09-13 00:45:45] (1) looks simpler, but incurs an additional memcpy -- not really much of a problem
+[2013-09-13 00:46:33] but for (1), when such a switch happens, at the moment when the output library prompts us
    to deliver, we need to know from *which* internal buffer to get the data
+[2013-09-13 00:46:38] I'd aim for both varieties .. and make that somehow configurable
+[2013-09-13 00:46:48] yes, that would be ideal
+[2013-09-13 00:46:49] nothing needs to be fixed there
    ideally we might even mmap output buffers directly on the graphics card memory
    and manage that with our backend and tell the output lib (opengl) what to display.
    I really want to have this very flexible
+[2013-09-13 00:47:14] * ichthyo thinks the same
----------------------------


.-- define jobs by time window --
[caption="☉Transcript☉ "]
----------------------------
[2013-09-13 00:47:39] this leads to another small detail: we really need a *time window* for
    the activation of jobs, i.e. a start time, and a deadline
    start time == do not activate this job before time xxx
    and deadline == mark this job as failed if it can't be started before this deadline
    do you think such a start or minimum time is a problem for the scheduler implementation?
    it is kind of an additional pre-condition
    The reason is simple.
If we get our scheduling to work very precisely,
    we can dispose of a lot of other handover and blocking mechanisms
+[2013-09-13 00:51:40] I was thinking about that too
    Initially I once had the idea to have the in-time scheduler scheduled by start time
    and the background scheduler by "not after" -- but probably both schedulers should
    just have time+span, making them both the same.
+[2013-09-13 00:53:03] fine
----------------------------