Threads, Signals and important management tasks
===============================================

// please don't remove the //word: comments

[grid="all"]
`------------`-----------------------
*State*         _Final_
*Date*          _Sat Jul 24 21:59:02 2010_
*Proposed by*   Christian Thaeter <ct@pipapo.org>
-------------------------------------

[abstract]
******************************************************************************
Handling signals in a multithreaded application is a little special. I define
here how we are going to implement it.
******************************************************************************


Description
-----------
//description: add a detailed description:

By default, POSIX delivers a process-directed signal to an arbitrary thread
that does not block it, and the handler runs there. This is quite unfortunate
because that thread might be in a time-constrained situation, hold some locks
or have some special priority. The common way to handle this is to block
(most) signals in all threads except one dedicated signal handling thread.
Moreover, it makes sense for the initial thread to do this signal handling.
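
The blocking pattern described above can be sketched in C roughly as follows.
This is only an illustration under assumptions, not Lumiera code: the function
names `managed_signals`, `block_managed_signals` and `signal_thread` are
hypothetical, the set of signals is an arbitrary selection, and the waiting is
done in a separately created thread. Only `pthread_sigmask()` and `sigwait()`
are actual POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Signals routed to the dedicated thread; this selection is illustrative. */
static void managed_signals(sigset_t *set)
{
    sigemptyset(set);
    sigaddset(set, SIGTERM);
    sigaddset(set, SIGINT);
    sigaddset(set, SIGHUP);
    sigaddset(set, SIGUSR1);
    sigaddset(set, SIGUSR2);
}

/* Call in the initial thread before any other thread is created:
 * threads created afterwards inherit the blocked mask. */
static int block_managed_signals(void)
{
    sigset_t set;
    managed_signals(&set);
    return pthread_sigmask(SIG_BLOCK, &set, NULL);
}

/* Body of the signal handling thread (hypothetical): waits synchronously
 * for one signal and stores its number in the int that arg points to.
 * A real implementation would dispatch and call sigwait() again. */
static void *signal_thread(void *arg)
{
    sigset_t set;
    int signo = 0;

    assert(arg != NULL);
    managed_signals(&set);
    if (sigwait(&set, &signo) == 0)
        *(int *)arg = signo;
    return NULL;
}
```

Because the signals stay blocked everywhere, no asynchronous handler ever
interrupts a worker; the waiting thread receives them as ordinary return
values from `sigwait()` and may take locks or log freely while reacting.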

For Lumiera I propose to follow this practice and extend it a little by
dedicating a thread to some management tasks. These are:

* signal handling, see below.
* resource management (resource-collector), waiting on a condition variable or
  message queue to execute actions.
* a watchdog for threads, which is not part of the application schedulers but
  wakes up periodically (infrequently, every so many seconds) and checks
  whether any thread got stuck (threads.h defines a deadline API which threads
  may use). We may add a flag to threads defining what to do with a given
  thread when it gets stuck (emergency shutdown or just cancelling the
  thread). Generally threads should not get stuck, but we have to be prepared
  for rogue plugins and programming errors.
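
One pass of such a watchdog could look roughly like the sketch below. All
names in it (`watched_thread`, `stuck_policy`, `is_stuck`, `watchdog_scan`)
are hypothetical and only illustrate the idea; the deadline API in threads.h
is not reproduced here, and locking around the records is left out for
brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-thread flag: what to do when the thread got stuck. */
enum stuck_policy { STUCK_CANCEL_THREAD, STUCK_EMERGENCY_SHUTDOWN };

/* Hypothetical watchdog record, one per supervised thread. */
struct watched_thread {
    struct timespec deadline;  /* time by which the thread must report back */
    bool armed;                /* false while the thread has set no deadline */
    enum stuck_policy policy;
};

/* True when an armed deadline lies strictly before `now`. */
static bool is_stuck(const struct watched_thread *t, const struct timespec *now)
{
    if (!t->armed)
        return false;
    if (t->deadline.tv_sec != now->tv_sec)
        return t->deadline.tv_sec < now->tv_sec;
    return t->deadline.tv_nsec < now->tv_nsec;
}

/* One watchdog pass: returns the number of stuck threads and reports
 * whether any of them asks for an emergency shutdown. */
static size_t watchdog_scan(const struct watched_thread *threads, size_t n,
                            const struct timespec *now, bool *need_shutdown)
{
    size_t stuck = 0;

    assert(need_shutdown != NULL);
    *need_shutdown = false;
    for (size_t i = 0; i < n; ++i) {
        if (!is_stuck(&threads[i], now))
            continue;
        ++stuck;
        if (threads[i].policy == STUCK_EMERGENCY_SHUTDOWN)
            *need_shutdown = true;
    }
    return stuck;
}
```

The management thread would call `watchdog_scan()` every few seconds with the
current `CLOCK_MONOTONIC` time, so that wall-clock adjustments cannot make a
healthy thread look stuck.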


.Signals which need to be handled

These are mostly proposals for how the application shall react to signals, and
comments about possible signals.

SIGTERM::
  Sent on computer shutdown to all running applications. When running with a
  GUI we have likely lost the X server connection before that; this needs to
  be handled by the GUI. Nevertheless, in any case (most importantly when
  running headless) we should do a fast application shutdown: no data or work
  should get lost, and a checkpoint is created in the log. One caveat is that
  Lumiera may have to sync a lot of data to disk. This means that the usual
  timeout between SIGTERM and SIGKILL during a normal shutdown might not be
  sufficient, and there is nothing we can do about that. The user has to
  configure the system to extend this timeout (alternative: see SIGUSR1
  below).

SIGINT::
  This is the CTRL-C case from a terminal; in most cases it means that the
  user wants to break the application immediately. We trigger an emergency
  shutdown. Recent actions are already logged, so no work gets lost, but no
  checkpoint is created in the log, so one has to explicitly recover the
  interrupted state.

SIGBUS::
  Raised by I/O errors in mapped memory. This is a kind of exceptional signal
  which might be handled in individual threads. When the cause of the error is
  traceable, the job/thread that worked on this data goes into an erroneous
  mode; otherwise we can only do an emergency shutdown.

SIGFPE::
  Floating point exception, division by zero or something similar. Individual
  threads might be allowed to handle it. In the global handler we may just
  ignore it or do an emergency shutdown. TBD.

SIGHUP::
  For daemons this signal is usually used to re-read configuration data. We
  shall do so too when running headless. When running with a GUI, this might
  act like either SIGTERM or SIGINT; possibly this can be made configurable.

SIGSEGV::
  Should not be handled. By the time a SEGV appears we are in an undefined
  state, and anything we do may make things worse.

SIGUSR1::
  First user-defined signal. Sync all data to disk and generate a checkpoint.
  The application may block until this is completed. This can be used in
  preparation for a shutdown, or periodically to create safe-points.

SIGUSR2::
  Second user-defined signal. Produce diagnostics, to terminal and file.

SIGXCPU::
  CPU time limit exceeded. Emergency shutdown.

SIGXFSZ::
  File size limit exceeded. Emergency shutdown.
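
The reactions listed above can be condensed into a small dispatch function for
the management thread. The sketch below is an assumption-laden illustration:
the `reaction` enum and `reaction_for()` are hypothetical names, and for
SIGHUP with a GUI it arbitrarily picks the SIGTERM-like behaviour, which the
proposal leaves open. SIGBUS, SIGFPE and SIGSEGV fall through to `REACT_NONE`
because they are either left to individual threads or not handled at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical set of reactions of the management thread. */
enum reaction {
    REACT_NONE,                /* not dispatched by the management thread */
    REACT_FAST_SHUTDOWN,       /* save all work, write a checkpoint, exit */
    REACT_EMERGENCY_SHUTDOWN,  /* exit at once, no checkpoint */
    REACT_RELOAD_CONFIG,       /* re-read configuration data */
    REACT_CHECKPOINT,          /* sync to disk, write a checkpoint, go on */
    REACT_DIAGNOSTICS          /* dump diagnostics to terminal and file */
};

/* Map a signal number to the proposed reaction. */
static enum reaction reaction_for(int signo, bool headless)
{
    assert(signo > 0);  /* valid signal numbers are positive */
    switch (signo) {
    case SIGTERM: return REACT_FAST_SHUTDOWN;
    case SIGINT:  return REACT_EMERGENCY_SHUTDOWN;
    case SIGHUP:  /* GUI case is undecided; SIGTERM-like is assumed here */
        return headless ? REACT_RELOAD_CONFIG : REACT_FAST_SHUTDOWN;
    case SIGUSR1: return REACT_CHECKPOINT;
    case SIGUSR2: return REACT_DIAGNOSTICS;
    case SIGXCPU:
    case SIGXFSZ: return REACT_EMERGENCY_SHUTDOWN;
    default:      return REACT_NONE;
    }
}
```

Keeping the mapping in one pure function makes the policy easy to review and
to change, for example once the SIGHUP behaviour becomes configurable.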


Tasks
~~~~~
// List what would need to be done to implement this Proposal in a few words:
// * item ...

We have appstate::maybeWait(), which already does such a loop. It needs to be
extended by the things proposed above.


Discussion
~~~~~~~~~~

Pros
^^^^
// add just a fact list/enumeration which make this suitable:
// * foo
// * bar ...


Cons
^^^^
// fact list of the known/considered bad implications:


Alternatives
^^^^^^^^^^^^
//alternatives: if possible explain/link alternatives and tell why they are not


Rationale
---------
//rationale: Describe why it should be done *this* way:

This is rather common practice. I describe it here for documentation purposes
and to point out which details are not yet covered.

//Conclusion
//----------
//conclusion: When approbate (this proposal becomes a Final) write some


Comments
--------
//comments: append below

.State -> Final
ichthyo wants this to be a dedicated thread (own subsystem) instead of running
in the initial thread. Fixed this in the proposal above; this makes it accepted.
Thu 14 Apr 2011 03:40:41 CEST Christian Thaeter <ct@pipapo.org>


//endof_comments: