
Resource Management: Profiling
==============================

// please don't remove the //word: comments

[grid="all"]
`------------`-----------------------
*State*         _Idea_
*Date*          _Fri Jul 23 19:34:29 2010_
*Proposed by*   Christian Thaeter <ct@pipapo.org>
-------------------------------------

[abstract]
******************************************************************************
From the beginning we planned some kind of 'profiling' to adapt dynamically
to workload and machine capabilities. I describe here how statistical data can
be gathered in a generic way. This will later work together with other
components that tune the system automatically.
******************************************************************************


Description
-----------
//description: add a detailed description:

I just introduce some ideas about the planned profiling framework here. Nothing
is defined or matured yet; this is certainly subject to further discussion and
refinement.

.Requirements/Evaluation
 generic::
        Profiling should be sufficiently abstracted to have a single set of
        data structures and algorithms that work on a broad range of subjects
        being profiled. Moreover, the profiling core just offers unitless
        counters; semantics will be added on top of that at a higher level.

 least possible overhead::
        Profiling itself must not cost much: it must not block and should avoid
        expensive operations. Simple integer arithmetic without divisions is
        suggested.

 accurate::
        We may sample data in a stochastic way to reduce the overhead;
        nevertheless, data which gets sampled must be accurately stored and
        processed without rounding losses and drifts.

 transient values::
        It's quite common that some values are far off in either the maximum
        or the minimum direction; the system should adapt to this and recover
        from such false alarms. Workload also changes over time, so we need to
        find some way to measure the current/recent workload; a grand total
        over the whole application runtime is rather uninteresting. At the
        same time it is important that we adapt slowly enough not to get into
        an oscillating cycle.

 active or passive system::
        Profiling can be purely passive, collecting data and letting some other
        component analyze it, or active, triggering some action when some
        limits are reached. I am still a bit undecided and keep it open for
        both.

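The 'least possible overhead' and 'accurate' requirements could be combined by
deciding cheaply, per call, whether a value gets sampled at all. The following
is only a sketch under my own assumptions: the names `profile_rand_next` and
`profile_should_sample` are hypothetical, and xorshift32 is just one cheap
generator that needs neither divisions nor locks when each thread keeps its
own state. Values which pass the check would then be integrated exactly.

[source,C]
------------------------------------------------------------------------------
#include <stdint.h>

/* One step of a xorshift32 generator; *state must never be zero.
   Hypothetical helper, not part of any existing API. */
static uint32_t
profile_rand_next (uint32_t* state)
{
  uint32_t x = *state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  return *state = x;
}

/* Returns 1 for roughly one call out of 2^log2_rate, else 0.
   Only shifts, xor and a mask are used; log2_rate must be below 32. */
static int
profile_should_sample (uint32_t* state, unsigned log2_rate)
{
  uint32_t mask = (UINT32_C (1) << log2_rate) - 1;
  return (profile_rand_next (state) & mask) == 0;
}
------------------------------------------------------------------------------

A caller would seed `state` once with any non-zero value and skip the whole
profiling step whenever `profile_should_sample()` returns 0.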

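For the 'transient values' requirement, one possible way to follow the recent
workload slowly enough is a running average where a new sample only gets a
weight of 1/2^k. This is a sketch, not a decided design: `PROFILE_AVG_SHIFT`
and the `profile_slow_avg` names are assumptions of mine. The average is kept
scaled by 2^k, so the fractional part is carried along instead of being
rounded away on every update, and only additions and shifts are needed. It
assumes that `>>` on negative values is an arithmetic shift, which C leaves
implementation-defined.

[source,C]
------------------------------------------------------------------------------
#include <stdint.h>

typedef int64_t profile_value;

/* hypothetical tuning knob: a new sample is weighted with 1/8 */
#define PROFILE_AVG_SHIFT 3

struct profile_slow_avg
{
  profile_value scaled;         /* running average << PROFILE_AVG_SHIFT */
};

/* integrate one sampled value, no division involved */
static void
profile_slow_avg_sample (struct profile_slow_avg* self, profile_value current)
{
  self->scaled += current - (self->scaled >> PROFILE_AVG_SHIFT);
}

/* read the running average back in the unit of the samples */
static profile_value
profile_slow_avg_get (const struct profile_slow_avg* self)
{
  return self->scaled >> PROFILE_AVG_SHIFT;
}
------------------------------------------------------------------------------
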
.Brainstorming in Code
[source,C]
------------------------------------------------------------------------------

typedef int64_t profile_value;

struct profile
{
  ProfileVTable vtable;

  /*
   Using trylock for sampling makes it never contend on the lock but some
   samples are lost. Should be ok.
  */
  mutex_t lock;                         /* with trylock? */


  /* statistics / running averages */

  /* n being a small number 2-5 or so */
  profile_value max[n];                 /* n maximum values seen so far,
                                           decreased by recovery */
  profile_value min[n];                 /* n minimum values seen so far,
                                           increased by recovery */

  /* store sum & count, but average calculation implies a division and will be
     only done on demand */
  profile_value count;                  /* count profile calls */
  profile_value sum;                    /* sum up all calls,
                                           average = sum/count */

  /* current is the sampled value to be integrated */

  /* trend is calculated before the new run_average */
  profile_value trend;                  /* trend = (trend +
                                           (run_average-current))>>1 */

  /* we may need some slower diverging formula for running average */
  profile_value run_average;            /* run_average = (run_average +
                                           current)>>1 */


  /* active limits, define what's good and what's bad, calls back to vtable
     when limit is hit */
  profile_value max_limit;
  profile_value min_limit;
  /* do we want limits for trends too? */

  /* we count how often we hit limits, a hit/miss ratio will give a good value
     for optimization */
  profile_value hit_cnt;
  profile_value high_miss_cnt;
  profile_value low_miss_cnt;

  /* recovery state */
  int rec_init;
  int rec_current;
  int rec_percent;


  void* extra;
};
------------------------------------------------------------------------------
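
The update order given in the comments for `trend` and `run_average` can be
written down as a small function. This is a minimal sketch: the reduced struct
`profile_avg` and the function name are hypothetical, only the two formulas
are taken from the brainstorming above. It assumes that `>>` on negative
values is an arithmetic shift, which C leaves implementation-defined.

[source,C]
------------------------------------------------------------------------------
#include <stdint.h>

typedef int64_t profile_value;

/* hypothetical subset of struct profile, just the two running values */
struct profile_avg
{
  profile_value trend;
  profile_value run_average;
};

/* integrate one sampled value using additions and shifts only */
static void
profile_avg_sample (struct profile_avg* self, profile_value current)
{
  /* trend comes first because it needs the old run_average */
  self->trend = (self->trend + (self->run_average - current)) >> 1;
  self->run_average = (self->run_average + current) >> 1;
}
------------------------------------------------------------------------------

With the sign convention of the formula as written, a negative `trend` means
the sampled values are currently above the running average. Starting from
zero and sampling 100 twice gives a `run_average` of 50 and then 75, while
`trend` stays at -50.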

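The trylock idea and the on-demand average could look roughly like this. I
assume POSIX threads here (`pthread_mutex_t` in place of the unspecified
`mutex_t`); the struct `profile_counter` and both function names are
hypothetical. A sample which finds the lock taken is simply dropped, so the
sampling path never blocks, and the division happens only when someone asks
for the average.

[source,C]
------------------------------------------------------------------------------
#include <pthread.h>
#include <stdint.h>

typedef int64_t profile_value;

/* hypothetical subset of struct profile */
struct profile_counter
{
  pthread_mutex_t lock;
  profile_value count;          /* number of integrated samples */
  profile_value sum;            /* sum of all integrated samples */
};

/* returns 1 when the sample was integrated, 0 when it was dropped
   because someone else held the lock */
static int
profile_counter_sample (struct profile_counter* self, profile_value current)
{
  if (pthread_mutex_trylock (&self->lock) != 0)
    return 0;
  ++self->count;
  self->sum += current;
  pthread_mutex_unlock (&self->lock);
  return 1;
}

/* on demand only; this reader may block, the sampling path never does */
static profile_value
profile_counter_average (struct profile_counter* self)
{
  profile_value average;
  pthread_mutex_lock (&self->lock);
  average = self->count ? self->sum / self->count : 0;
  pthread_mutex_unlock (&self->lock);
  return average;
}
------------------------------------------------------------------------------

Since dropped samples are counted neither in `count` nor in `sum`, the average
stays exact for the values which were actually integrated.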

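For the 'active' variant, checking a value against the limits and counting
hits and misses might be sketched as below. The interpretation is mine: a
'hit' is a value inside `[min_limit, max_limit]`, a 'miss' is a value outside,
and the callback is invoked on every miss. The layout of the callback table is
hypothetical, since the proposal only names a `ProfileVTable` without
defining it; `profile_limits_count_alarm` is just an example callback.

[source,C]
------------------------------------------------------------------------------
#include <stdint.h>

typedef int64_t profile_value;

struct profile_limits;

/* hypothetical callback table, stands in for the undefined ProfileVTable */
struct profile_limits_vtable
{
  void (*limit_exceeded) (struct profile_limits* self, profile_value current);
};

/* hypothetical subset of struct profile */
struct profile_limits
{
  const struct profile_limits_vtable* vtable;
  profile_value max_limit;
  profile_value min_limit;
  profile_value hit_cnt;        /* values within the limits */
  profile_value high_miss_cnt;  /* values above max_limit */
  profile_value low_miss_cnt;   /* values below min_limit */
};

/* classify one value, count it and call back when a limit is exceeded */
static void
profile_limits_check (struct profile_limits* self, profile_value current)
{
  if (current > self->max_limit)
    ++self->high_miss_cnt;
  else if (current < self->min_limit)
    ++self->low_miss_cnt;
  else
    {
      ++self->hit_cnt;
      return;
    }
  if (self->vtable && self->vtable->limit_exceeded)
    self->vtable->limit_exceeded (self, current);
}

/* example callback which only counts how often it was triggered */
static int profile_limits_alarms;

static void
profile_limits_count_alarm (struct profile_limits* self, profile_value current)
{
  (void) self;
  (void) current;
  ++profile_limits_alarms;
}

static const struct profile_limits_vtable profile_limits_counting =
  { profile_limits_count_alarm };
------------------------------------------------------------------------------
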
Tasks
~~~~~
// List what would need to be done to implement this Proposal in a few words:
// * item ...


Discussion
~~~~~~~~~~

Pros
^^^^
// add just a fact list/enumeration which make this suitable:
//  * foo
//  * bar ...


Cons
^^^^
// fact list of the known/considered bad implications:


Alternatives
^^^^^^^^^^^^
//alternatives: if possible explain/link alternatives and tell why they are not
//  viable:


Rationale
---------
//rationale: Describe why it should be done *this* way:


//Conclusion
//----------
//conclusion: When approved (this proposal becomes a Final) write some
//  conclusions about its process:


Comments
--------
//comments: append below


//endof_comments: