
Threads, Signals and important management tasks
===============================================

// please don't remove the //word: comments

[grid="all"]
`------------`-----------------------
*State*         _Final_
*Date*          _Sat Jul 24 21:59:02 2010_
*Proposed by*   Christian Thaeter <ct@pipapo.org>
-------------------------------------

[abstract]
******************************************************************************
Handling signals in a multithreaded application is a little special. I define
here how we are going to implement this.
******************************************************************************


Description
-----------
//description: add a detailed description:

By default, POSIX delivers a signal sent to the process to any thread which
does not block it, and the signal is handled there. This is quite unfortunate
because that thread might be in a time-constrained situation, hold some locks
or have a special priority. The common way to handle this is to block (most)
signals in all threads except one dedicated signal handling thread. Moreover
it makes sense that the initial thread does this signal handling.
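
The practice described above can be sketched with `pthread_sigmask()` and
`sigwait()`. This is only a minimal illustration, not Lumiera code: the
function names and the selection of signals are assumptions made for the
example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <signal.h>

/* Signals routed to the dedicated thread; this selection is an assumption. */
static void fill_managed_set(sigset_t *set)
{
    sigemptyset(set);
    sigaddset(set, SIGTERM);
    sigaddset(set, SIGINT);
    sigaddset(set, SIGHUP);
    sigaddset(set, SIGUSR1);
    sigaddset(set, SIGUSR2);
}

/* Block the managed signals in the calling thread. Threads created
   afterwards inherit the mask, so call this before spawning any thread. */
int block_managed_signals(void)
{
    sigset_t set;
    fill_managed_set(&set);
    return pthread_sigmask(SIG_BLOCK, &set, NULL);
}

/* Body of the dedicated thread: wait synchronously for one managed signal,
   store its number in the int pointed to by arg and return. A real
   implementation would dispatch the signal and loop. */
void *signal_thread(void *arg)
{
    sigset_t set;
    int signo = 0;

    assert(arg != NULL);
    fill_managed_set(&set);
    if (sigwait(&set, &signo) == 0)
        *(int *)arg = signo;
    return NULL;
}
```

Because the signals stay blocked everywhere, no asynchronous handler ever
interrupts a worker; the dedicated thread receives them as ordinary return
values of `sigwait()` and may take locks or log freely.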

For Lumiera I propose to follow this practice and extend it a little by
dedicating a thread to some management tasks. These are:
 * signal handling, see below.
 * resource management (resource-collector), waiting on a condition variable or
   message queue to execute actions.
 * a watchdog for threads, which is not part of the application schedulers but
   wakes up periodically (infrequently, every so many seconds) and checks
   whether any thread got stuck (threads.h defines a deadline API which threads
   may use). We may add a flag to threads defining what to do with a given
   thread when it gets stuck (emergency shutdown or just cancel the thread).
   Generally threads should not get stuck, but we have to be prepared for
   rogue plugins and programming errors.
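
One pass of such a watchdog could look roughly like the sketch below. The
record layout, the policy flag and the function name are hypothetical and do
not reflect the actual deadline API in threads.h; the sketch only shows the
periodic check and the per-thread policy.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical policy flag: what to do with a stuck thread. */
enum stuck_policy { STUCK_CANCEL_THREAD, STUCK_EMERGENCY_SHUTDOWN };

/* Hypothetical per-thread record; not the layout used by threads.h. */
struct watched_thread {
    const char       *name;
    time_t            deadline;  /* 0 means no deadline is armed */
    enum stuck_policy policy;
};

/* One watchdog pass: count the threads whose deadline lies before `now`
   and report whether any of them demands an emergency shutdown. */
size_t watchdog_scan(const struct watched_thread *threads, size_t n,
                     time_t now, int *need_shutdown)
{
    size_t stuck = 0;

    *need_shutdown = 0;
    for (size_t i = 0; i < n; ++i) {
        if (threads[i].deadline != 0 && threads[i].deadline < now) {
            ++stuck;
            if (threads[i].policy == STUCK_EMERGENCY_SHUTDOWN)
                *need_shutdown = 1;
        }
    }
    return stuck;
}
```

The management thread would call this every few seconds with the current
time and then cancel the stuck threads or start the emergency shutdown,
depending on the result.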


.Signals which need to be handled

These are mostly proposals about how the application shall react to signals,
and comments about possible signals.

 SIGTERM::
        Sent on computer shutdown to all running apps. When running with the
        GUI we have likely lost the X server connection beforehand; this case
        needs to be handled from the GUI. Nevertheless in any case (most
        importantly when running headless) we should do a fast application
        shutdown; no data/work should get lost, and a checkpoint in the log is
        created. One caveat is that Lumiera might have to sync a lot of data
        to disk. This means that the usual timeout from SIGTERM to SIGKILL in
        a normal shutdown might not be sufficient, and there is nothing we can
        do about that. Users have to configure their system to extend this
        timeout (alternative: see SIGUSR1 below).

 SIGINT::
        This is the CTRL-C case from the terminal; in most cases it means that
        the user wants to break the application immediately. We trigger an
        emergency shutdown. Recent actions are logged already, so no work gets
        lost, but no checkpoint is created in the log, so one has to
        explicitly recover the interrupted state.

 SIGBUS::
        Will be raised by I/O errors in mapped memory. This is a kind of
        exceptional signal which might be handled in individual threads. When
        the cause of the error is traceable, the job/thread working on this
        data goes into an erroneous mode; otherwise we can only do an
        emergency shutdown.

 SIGFPE::
        Floating point exception, division by zero or something similar. Might
        be allowed to be handled by each thread. In the global handler we may
        just ignore it or do an emergency shutdown. TBD.

 SIGHUP::
        For daemons this signal is usually used to re-read configuration data.
        We shall do so too when running headless. When running with the GUI
        this might act like either SIGTERM or SIGINT. Possibly this can be
        made configurable.

 SIGSEGV::
        Should not be handled; at the time a SEGV appears we are in an
        undefined state and anything we do may make things worse.

 SIGUSR1::
        First user-defined signal. Sync all data to disk and generate a
        checkpoint. The application may block until this is completed. This can
        be used in preparation for a shutdown or periodically to create some
        safe-points.

 SIGUSR2::
        Second user defined signal. Produce diagnostics, to terminal and file.

 SIGXCPU::
        CPU time limit exceeded. Emergency Shutdown.

 SIGXFSZ::
        File size limit exceeded. Emergency Shutdown.
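
The reactions proposed in this list can be condensed into a small dispatch
function for the management thread. The sketch below is an illustration
only: the enum, the function name and the `headless` parameter are
assumptions, and the choice to treat SIGHUP like SIGTERM when running with
the GUI is just one of the two options left open above.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <signal.h>

/* Hypothetical action codes for the management thread. */
enum sig_action {
    ACT_NONE,                /* not handled by the management thread */
    ACT_FAST_SHUTDOWN,       /* shut down quickly, write a checkpoint */
    ACT_EMERGENCY_SHUTDOWN,  /* stop immediately, no checkpoint */
    ACT_RELOAD_CONFIG,       /* re-read the configuration data */
    ACT_CHECKPOINT,          /* sync data to disk, write a checkpoint */
    ACT_DIAGNOSTICS          /* produce diagnostics */
};

/* Map a received signal to the proposed reaction. `headless` is nonzero
   when the application runs without the GUI. */
enum sig_action action_for_signal(int signo, int headless)
{
    switch (signo) {
    case SIGTERM: return ACT_FAST_SHUTDOWN;
    case SIGINT:
    case SIGXCPU:
    case SIGXFSZ: return ACT_EMERGENCY_SHUTDOWN;
    /* Assumption: with the GUI, SIGHUP acts like SIGTERM. */
    case SIGHUP:  return headless ? ACT_RELOAD_CONFIG : ACT_FAST_SHUTDOWN;
    case SIGUSR1: return ACT_CHECKPOINT;
    case SIGUSR2: return ACT_DIAGNOSTICS;
    /* SIGSEGV is not handled; SIGBUS and SIGFPE may be left to the
       individual threads. */
    default:      return ACT_NONE;
    }
}
```

Keeping the mapping in one function separates the policy, which is still
under discussion for some signals, from the mechanics of waiting for
signals.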


Tasks
~~~~~
// List what would need to be done to implement this Proposal in a few words:
// * item ...

We have appstate::maybeWait() which already does such a loop. It needs to be
extended by the things proposed above.


Discussion
~~~~~~~~~~


Pros
^^^^
// add just a fact list/enumeration which make this suitable:
//  * foo
//  * bar ...


Cons
^^^^
// fact list of the known/considered bad implications:


Alternatives
^^^^^^^^^^^^
//alternatives: if possible explain/link alternatives and tell why they are not


Rationale
---------
//rationale: Describe why it should be done *this* way:

This is rather common practice. I describe this here for Documentation purposes
and to point out which details are not yet covered.

//Conclusion
//----------
//conclusion: When approbate (this proposal becomes a Final) write some


Comments
--------
//comments: append below

.State -> Final
ichthyo wants this to be a dedicated thread (own subsystem) instead of running
in the initial thread. Fixed this in the proposal above; this makes it accepted.
    Do 14 Apr 2011 03:40:41 CEST Christian Thaeter <ct@pipapo.org>


//endof_comments: