Design Documents: Vault
=======================

What follows is a summary of Lumiera's *Data Handling Backend*.

This is the foundation layer responsible for any high performance or high volume
data access. Within Lumiera, there are two main kinds of data handling:

* The Session and the object models manipulated through the GUI are kept in memory.
  They are backed by a _storage backend,_ which provides database-like storage and
  especially logging, replaying and ``Undo'' of all ongoing modifications.
* Media data is handled _frame wise_ -- as described below.

The vault layer (``backend'') uses *memory mapping* to make data available to the program.
This is somewhat different to the more common open/read/write/close file access,
while giving superior performance and much better memory utilization.
The Vault Layer must be able to handle more data than will fit into memory,
or even into the address space on 32 bit architectures. Moreover, a project might access more files
than the OS can keep open simultaneously; thus, for the _Files used by the Vault,_ it needs a
*FilehandleCache* to manage file handles dynamically.

Which parts of a file are actually mapped to physical RAM is managed by the kernel;
the Vault in turn keeps a *FileMapCache* to manage the *FileMaps* we've set up.
In the end, the application itself only requests *Data Frames* from the Vault.

To minimize latency and optimize CPU utilization we have a *Prefetch thread* which operates
a *Scheduler* to render and cache frames which are _expected to be consumed soon_. The intention
is to manage the rendering _just in time_.

The prefetcher keeps *Statistics* for optimizing performance.


Accessing Files
---------------

+FileDescriptor+ is the superclass of all possible file types. It holds a weak reference to a
+FileHandle+, which is managed within the +FilehandleCache+. On creation, only the existence of the
file (when reading) or write access (for new files) is checked. The +FileDescriptor+ stores some
generic metadata about the underlying file and its intended use, but the file is actually opened only on demand.

The _content of files is memory mapped_ into the process address space.
This is managed by +FileMap+ entries and a +FileMapCache+.

File Handles
~~~~~~~~~~~~
A +FilehandleCache+ stores a finite maximum number of +FileHandles+ as an MRU list.
FileHandles are managed by the +FilehandleCache+; basically they just store the underlying OS file
handles and are managed in a lazy/weak way: (re)opened when needed and aging in the cache when not needed.
Since the number of open file handles is limited, aged ones will be closed and reused when the system
needs to open another file.
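
The aging behaviour described above can be illustrated with a minimal sketch. It is not Lumiera's implementation: the class layout, the method names +acquire+, +is_open+ and +close_all+, and the use of an ordered dictionary are assumptions made purely for illustration.

```python
import os
import tempfile
from collections import OrderedDict


class FilehandleCache:
    """Keeps at most `max_open` OS file descriptors open (hypothetical sketch)."""

    def __init__(self, max_open):
        self.max_open = max_open
        self._open = OrderedDict()  # path -> fd, least recently used first

    def acquire(self, path):
        """Return an open descriptor for `path`, (re)opening it on demand."""
        fd = self._open.get(path)
        if fd is not None:
            self._open.move_to_end(path)  # mark as most recently used
            return fd
        if len(self._open) >= self.max_open:
            # Limit reached: close the handle that has aged the longest.
            _, aged_fd = self._open.popitem(last=False)
            os.close(aged_fd)
        fd = os.open(path, os.O_RDONLY)
        self._open[path] = fd
        return fd

    def is_open(self, path):
        return path in self._open

    def close_all(self):
        while self._open:
            _, fd = self._open.popitem()
            os.close(fd)


def demo():
    """Open three files through a cache that allows only two open handles."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(3):
            path = os.path.join(tmp, f"clip{i}.dat")
            with open(path, "wb") as f:
                f.write(b"x")
            paths.append(path)
        cache = FilehandleCache(max_open=2)
        for path in paths:
            cache.acquire(path)
        state = [cache.is_open(path) for path in paths]
        cache.close_all()
        return state
```

In +demo()+ the first file's handle is closed when the third file is opened; a later +acquire+ on the first path would simply reopen it.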

File Mapping
~~~~~~~~~~~~
The +FileMapCache+ keeps a list of +FileMaps+ which are currently not in use and subject to aging.
Each +FileMap+ object contains many +Frames+. The actual layout depends on the type of the File.
Mappings need to be _page aligned_ while Frames can be anywhere within a file and dynamically sized.
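
To illustrate the alignment constraint, the following sketch maps only the pages covering one frame at an arbitrary file offset. The helper names +aligned_window+ and +read_frame+ are hypothetical, and Python's +mmap+ module stands in for the lower-level calls the Vault would use.

```python
import mmap
import os
import tempfile


def aligned_window(frame_offset, frame_size, page=mmap.ALLOCATIONGRANULARITY):
    """Return (map_offset, map_length, delta) for a mapping covering the frame.

    map_offset is rounded down to a page boundary; delta is the position
    of the frame inside the mapping.
    """
    map_offset = (frame_offset // page) * page
    delta = frame_offset - map_offset
    return map_offset, delta + frame_size, delta


def read_frame(path, frame_offset, frame_size):
    """Map only the pages covering one frame and copy the frame out."""
    map_offset, map_length, delta = aligned_window(frame_offset, frame_size)
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), map_length, access=mmap.ACCESS_READ,
                       offset=map_offset) as m:
            return bytes(m[delta:delta + frame_size])


def demo():
    """Read a 5 byte frame that starts 10 bytes into the second page."""
    page = mmap.ALLOCATIONGRANULARITY
    payload = bytes(range(256)) * ((2 * page) // 256 + 1)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        path = f.name
    try:
        offset = page + 10
        return read_frame(path, offset, 5) == payload[offset:offset + 5]
    finally:
        os.remove(path)
```

A real FileMap would typically cover many frames and stay mapped while in use; copying a single frame out of a short-lived mapping is done here only to keep the example small.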

All established ++FileMap++s are managed together in a central +FileMapCache+.
Actually, +FileMap+ objects are transparent to the application. The upper layers will just
request Frames by position and size. Thus, the +File+ entities associate a filename with the underlying
low level File Descriptor and access

Frames
~~~~~~
+Frames+ are the smallest data blocks handled by the Vault. The application tells the Vault Layer to make
Files available and from then on just requests Frames. Actually, those Frames are (references to) blocks
of contiguous memory. They can be anything, depending on the usage of the File (video frames, encoder frames,
blocks of sound samples). Frames are referenced by a smart-pointer-like object which manages the lifetime
and caching behavior.

Each frame reference can be in one of three states:

readonly::
  the backing +FileMap+ is checked out from the aging list, frames can be read
  
readwrite::
  the backing +FileMap+ is checked out from the aging list, frames can be read and written
  
weak::
  the +FileMap+ object is checked back into the aging list, the frame can't be accessed but we can
  try to transform a weak reference into a readonly or readwrite reference
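
The three states and the check-out / check-in cycle can be modelled roughly as follows. This is a sketch under assumptions: the classes +AgingList+ and +FrameRef+, the +users+ counter and the +dropped+ flag are invented for illustration and say nothing about how the Vault actually tracks its mappings.

```python
from enum import Enum


class State(Enum):
    WEAK = "weak"
    READONLY = "readonly"
    READWRITE = "readwrite"


class FileMap:
    """Stand-in for a mapped file region (hypothetical)."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.users = 0        # strong references currently checked out
        self.dropped = False  # set once the aging list discarded the mapping


class AgingList:
    """Holds FileMaps nobody is using; the oldest ones may be dropped."""

    def __init__(self):
        self.unused = []

    def check_out(self, fmap):
        if fmap.dropped:
            return False      # mapping is gone, the upgrade fails
        if fmap in self.unused:
            self.unused.remove(fmap)
        fmap.users += 1
        return True

    def check_in(self, fmap):
        fmap.users -= 1
        if fmap.users == 0:
            self.unused.append(fmap)

    def drop_oldest(self):
        if self.unused:
            self.unused.pop(0).dropped = True


class FrameRef:
    """Reference to `size` bytes at `start` inside a FileMap."""

    def __init__(self, aging, fmap, start, size):
        self.aging, self.fmap = aging, fmap
        self.start, self.size = start, size
        self.state = State.WEAK

    def upgrade(self, state):
        """Try to turn this reference into a readonly or readwrite one."""
        if self.state is State.WEAK and not self.aging.check_out(self.fmap):
            return False
        self.state = state
        return True

    def weaken(self):
        if self.state is not State.WEAK:
            self.aging.check_in(self.fmap)
            self.state = State.WEAK

    def read(self):
        if self.state is State.WEAK:
            raise PermissionError("weak reference: frame not accessible")
        return bytes(self.fmap.data[self.start:self.start + self.size])

    def write(self, payload):
        if self.state is not State.READWRITE:
            raise PermissionError("frame is not writable")
        if len(payload) > self.size:
            raise ValueError("payload larger than frame")
        self.fmap.data[self.start:self.start + len(payload)] = payload
```

A weak reference whose +FileMap+ has meanwhile been dropped cannot be upgraded; in that case the caller would have to request the frame anew.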


Frames can be addressed uniquely. Whenever a frame is not available and the Vault can't serve a cached
version of the frame, a (probably recursive) rendering request will be issued.
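
The fallback from cache lookup to a recursive rendering request might look like the sketch below. Everything here is hypothetical: frames are addressed by a +(stream, number)+ tuple, a "recipe" names a function plus its input frames, and rendering happens synchronously, whereas the text describes issuing a request instead.

```python
class FrameProvider:
    """Serves frames from a cache and renders missing ones (hypothetical)."""

    def __init__(self, sources, recipes):
        self.cache = dict(sources)   # frame address -> frame data
        self.recipes = recipes       # frame address -> (function, input addresses)
        self.render_requests = []    # log of the rendering requests issued

    def get(self, address):
        if address in self.cache:
            return self.cache[address]
        # Not cached: issue a rendering request, which may recurse
        # into further requests for the frames it depends on.
        func, inputs = self.recipes[address]
        self.render_requests.append(address)
        result = func(*(self.get(dep) for dep in inputs))
        self.cache[address] = result
        return result


def demo():
    """Render an output frame that depends on an uncached intermediate frame."""
    sources = {("raw", 0): 3}
    recipes = {
        ("scaled", 0): (lambda x: x * 2, [("raw", 0)]),
        ("out", 0): (lambda x: x + 1, [("scaled", 0)]),
    }
    provider = FrameProvider(sources, recipes)
    first = provider.get(("out", 0))
    second = provider.get(("out", 0))  # now served from the cache
    return first, second, provider.render_requests
```

The second lookup in +demo()+ issues no further request, since the rendered frame was placed into the cache.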

Prefetching
~~~~~~~~~~~
There are two important points when we want to access data with low latency:

. We handle much more data than will fit into most computers' RAM.
  The data which is backed by files has to be paged in and available when needed.
  The +Prefetch+ thread manages page hinting to the kernel (+posix_madvise()+ etc.)
. Intermediate Frames must eventually be rendered to the cache.
  The Vault Layer will send +Renderjobs+ to the +Scheduler+.
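
The page hinting from the first point can be sketched as follows. Note the assumptions: Python's +mmap.madvise()+ (available since Python 3.8 on platforms providing the call) wraps +madvise()+, a close relative of the +posix_madvise()+ mentioned above, and the helper name +hint_willneed+ is hypothetical.

```python
import mmap
import tempfile


def hint_willneed(mapping, offset, length):
    """Ask the kernel to page in [offset, offset + length) ahead of use.

    Returns True if a hint was issued, False where madvise is unavailable.
    """
    if not (hasattr(mapping, "madvise") and hasattr(mmap, "MADV_WILLNEED")):
        return False
    page = mmap.PAGESIZE
    start = (offset // page) * page  # the start must be a multiple of PAGESIZE
    mapping.madvise(mmap.MADV_WILLNEED, start, offset + length - start)
    return True


def demo():
    """Hint a 2000 byte range inside a four page mapping, then touch it."""
    size = 4 * mmap.PAGESIZE
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * size)
        f.flush()
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as m:
            hinted = hint_willneed(m, mmap.PAGESIZE + 100, 2000)
            first_byte = m[mmap.PAGESIZE + 100]
    return hinted, first_byte
```

The hint is advisory only: the data can be read with or without it, and the kernel is free to ignore the advice.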

Whenever something queries a +Frame+ from the Vault, it provides hints about what it is doing.
These hints contain:

* Timing constraints
 - When will the +Frame+ be needed?
 - Could we drop the request if it won't be available (rendered) in time?
* Priority of this job (as soon as possible, or just in time?)
* Action (playing forward, playing backward, tweaking, playback speed, recursive rendering of dependent frames)

.Notes
* The Vault Layer will try to render related frames in groups.
* This means that following frames are scheduled with lower priority.
* Whenever the program really requests them the priority will be adjusted.
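
How such hints could drive the scheduling of frame groups is sketched below. This is an illustration under assumptions, not the actual Scheduler: the +Priority+ levels, the +RenderJob+ fields, the group size and the frame duration are invented, and only the timing and priority hints are modelled (the action hint is left out).

```python
import heapq
from dataclasses import dataclass, field
from enum import IntEnum


class Priority(IntEnum):
    ASAP = 0
    JUST_IN_TIME = 1
    SPECULATIVE = 2  # following frames of a group, not yet really requested


@dataclass(order=True)
class RenderJob:
    priority: int                  # jobs are ordered by priority, then deadline
    deadline: float
    frame: int = field(compare=False)
    droppable: bool = field(compare=False, default=True)


class Scheduler:
    def __init__(self):
        self._heap = []
        self._jobs = {}  # frame number -> the currently valid job for it

    def _push(self, job):
        self._jobs[job.frame] = job
        heapq.heappush(self._heap, job)

    def request(self, frame, deadline, priority, group=4, frame_duration=0.04):
        """Schedule `frame`, plus the following frames with lower priority."""
        self._push(RenderJob(priority, deadline, frame))
        for k in range(1, group):
            following = frame + k
            if following not in self._jobs:
                self._push(RenderJob(Priority.SPECULATIVE,
                                     deadline + k * frame_duration, following))

    def next_job(self, now):
        """Pop the most urgent job that is still valid and still in time."""
        while self._heap:
            job = heapq.heappop(self._heap)
            if self._jobs.get(job.frame) is not job:
                continue  # superseded by a request with adjusted priority
            del self._jobs[job.frame]
            if job.droppable and job.deadline < now:
                continue  # too late to be useful, drop the request
            return job
        return None


def demo():
    """Request frame 10, then really request frame 12 from its group."""
    scheduler = Scheduler()
    scheduler.request(10, deadline=1.0, priority=Priority.JUST_IN_TIME)
    scheduler.request(12, deadline=1.08, priority=Priority.ASAP, group=1)
    order = []
    while (job := scheduler.next_job(now=0.0)) is not None:
        order.append(job.frame)
    return order
```

In +demo()+ frame 12 was first scheduled speculatively as part of the group around frame 10; the explicit request replaces that job with a higher-priority one, so it is served first.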
 
 
-> more about link:Scheduler.html[the Scheduling of calculation jobs]