SystematicMetadata
==================

// please don't remove the //word: comments

[options="autowidth"]
|====================================
|*State*       | _Idea_
|*Date*        | _Mon 08 Oct 2012 04:39:16 CEST_
|*Proposed by* | Ichthyostega <prg@ichthyostega.de>
|====================================

********************************************************************************
.Abstract
[red]#TODO# _give a short summary of this proposal_
********************************************************************************

Lumiera is a metadata processing application: _Data_ is _media data_, and everything
else is _metadata_. Since our basic decision is to rely on existing libraries for
handling data, the ``metadata part'' is what _we are building anew._

This RfC describes a fundamental approach towards metadata handling.


Description
-----------
//description: add a detailed description:
Metadata is conceived as a huge uniform tree. This tree is conceptual -- it is
never represented as a whole. In the implemented system, we only ever see parts
of this virtual tree being cast into concrete data representations. These parts
are like islands of explicitly defined and typed structure, yet they never need
to span the whole virtual model, and thus there never needs to be a universal
model data structure definition. The data structure becomes an implementation detail.

Parts of the system talk to each other by _describing_ some subtree of metadata.
This description is transferred _in the form of a tree diff:_ the receiver pulls
a sequence of diff verbs from a diff iterator, and interpreting these verbs
walks the receiver down the tree in question and expands it. Sub-scopes are
``opened'' and populated, similar to populating a filesystem. It is up to the
receiver to assemble this information into a suitable representation. One
receiver might invoke an object factory, while another serialises the data into
an external textual or binary representation.
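
The pull-based exchange described above can be illustrated with a minimal Python sketch.
The verb names `set`, `open` and `close`, the sample clip data and all function names are
assumptions invented for this illustration; they are not the verbs of the actual Diff System.
Both receivers pull from the same kind of iterator, and each one decides on its own representation.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Hypothetical verb vocabulary, invented for this sketch only:
#   ("set", key, value)   assign a primitive value in the current scope
#   ("open", key, None)   enter (and create if needed) a sub-scope
#   ("close", None, None) leave the current sub-scope
Verb = Tuple[str, Any, Any]


def describe_clip() -> Iterator[Verb]:
    """Sender side: describe a subtree as a lazy sequence of diff verbs."""
    yield ("set", "name", "intro")
    yield ("open", "source", None)
    yield ("set", "path", "intro.mov")
    yield ("set", "fps", 25)
    yield ("close", None, None)
    yield ("set", "muted", False)


def build_dict(diff: Iterable[Verb]) -> Dict[str, Any]:
    """Receiver A: pull the verbs and assemble nested dictionaries."""
    root: Dict[str, Any] = {}
    stack = [root]                      # innermost open scope is last
    for verb, key, value in diff:
        if verb == "set":
            stack[-1][key] = value
        elif verb == "open":
            stack.append(stack[-1].setdefault(key, {}))
        elif verb == "close":
            stack.pop()
        else:
            raise ValueError(f"unknown verb: {verb!r}")
    return root


def render_text(diff: Iterable[Verb]) -> str:
    """Receiver B: serialise the same verbs into indented text."""
    lines: List[str] = []
    depth = 0
    for verb, key, value in diff:
        if verb == "set":
            lines.append("  " * depth + f"{key} = {value!r}")
        elif verb == "open":
            lines.append("  " * depth + f"{key}:")
            depth += 1
        elif verb == "close":
            depth -= 1
        else:
            raise ValueError(f"unknown verb: {verb!r}")
    return "\n".join(lines)
```

Neither receiver knows about the other, and the sender does not prescribe a data structure;
the only shared contract in this sketch is the small verb vocabulary.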


Abstract Metadata Model
~~~~~~~~~~~~~~~~~~~~~~~
The conceptual model for metadata is close to what the JSON format uses: +
There are primitive values such as +null+, string, number and boolean. Compound values
can be arrays or records, the latter being a sub-scope populated with key-value pairs.

We might consider some extensions:

 * having data values similar to BSON of MongoDB: integrals, floats, timestamps
 * introducing two _special magic keys_ for records: `"type"` and `"id"`
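
As a rough illustration of this model, the following Python sketch checks whether a value
fits the JSON-like shape and reads the two magic keys from a record. The function names
`is_metadata` and `identity` are assumptions made up for this example, and the BSON-like
extensions (timestamps and so on) are left out.

```python
from typing import Any, Optional, Tuple

# Primitive values of the conceptual model: null, string, number, boolean.
PRIMITIVES = (type(None), str, int, float, bool)


def is_metadata(value: Any) -> bool:
    """Return True if `value` is a primitive, an array or a record of such values."""
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, list):                       # array
        return all(is_metadata(item) for item in value)
    if isinstance(value, dict):                       # record: a sub-scope of key-value pairs
        return all(isinstance(key, str) and is_metadata(item)
                   for key, item in value.items())
    return False


def identity(record: dict) -> Tuple[Optional[str], Optional[str]]:
    """Read the two special keys of a record; either may be absent."""
    return record.get("type"), record.get("id")
```

For instance, `identity({"type": "clip", "id": "c1", "fps": 25})` yields `("clip", "c1")`,
while a record without these keys yields `(None, None)`.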


Sources and Overlays
~~~~~~~~~~~~~~~~~~~~
Metadata is delivered from _sources_, which can be _layered_. Similarly, on the
receiving side, there can be multiple _writeable layers_, with a routing strategy
to decide which writeable layer receives a given metadata element. This routing
is implemented within a pipeline connecting sender and receiver; if the default
routing strategy isn't sufficient, we can control the routing by introducing
a meta-tree in some separate branch, thereby making the metadata self-referential.
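
A minimal sketch of layering and routing, restricted to a single flat scope for brevity:
reading uses Python's standard `collections.ChainMap` to let one source overlay another,
and writing goes through a small router. The layer names, the `ui.` key prefix and the
class `RoutedWriter` are assumptions chosen for this illustration only.

```python
from collections import ChainMap
from typing import Any, Callable, Dict

# Read side: two hypothetical sources; the project layer overlays the defaults.
defaults: Dict[str, Any] = {"fps": 25, "ui.theme": "dark"}
project: Dict[str, Any] = {"fps": 30}
layered = ChainMap(project, defaults)     # lookup falls through to later layers


def default_route(key: str) -> str:
    """Assumed default strategy: UI-related keys go to the user layer."""
    return "user" if key.startswith("ui.") else "project"


class RoutedWriter:
    """Write side: a strategy decides which writeable layer receives an element."""

    def __init__(self, layers: Dict[str, Dict[str, Any]],
                 strategy: Callable[[str], str] = default_route) -> None:
        self.layers = layers
        self.strategy = strategy

    def put(self, key: str, value: Any) -> None:
        self.layers[self.strategy(key)][key] = value


user_prefs: Dict[str, Any] = {}
writer = RoutedWriter({"project": project, "user": user_prefs})
writer.put("ui.theme", "light")           # routed to the user layer
writer.put("fps", 24)                     # routed to the project layer
```

Since the strategy is just a callable, it could equally be built from data read out of a
separate metadata branch, which is one way to read the self-referential set-up mentioned above.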


Some points to note
~~~~~~~~~~~~~~~~~~~
- this concept doesn't say anything about the actual meaning of the metadata elements,
  since that is always determined by the receiver, based on the current context.
- likewise, this concept doesn't state anything about the actual interactions, the
  involved parts and how the interaction is initiated and configured; this is considered
  an external topic, which needs to be solved within the applicable context (e.g. the
  session has a specific protocol for retrieving a persisted session snapshot)
- there is no separate _system configuration_ -- configuration appears just as a
  local record of key-value pairs, which is interpreted according to the context.
- in a similar vein, this concept deliberately doesn't state anything regarding the
  handling of _defaults_, since these are so highly dependent on the actual context.


Tasks
~~~~~
// List what needs to be done to implement this Proposal:
// * first step ([green]#✔ done#)
 * define the interaction API [yellow-background]#WIP#
 * scrutinise this concept to find the pitfalls [yellow-background]#WIP#
 * build a demonstration prototype, where the receiver fabricates an object [red yellow-background]#TBD#
   ** the unit tests related to the _Diff System_ could be counted as such a demonstration +
      Ichthyostega:: '2025-09-13'

Discussion
~~~~~~~~~~

Pros
^^^^
- the basic implementation is strikingly simple, much simpler than building
  a huge data structure or any kind of serialisation/deserialisation scheme
- parts can be combined in an open fashion, we don't need a final concept up-front
- even complex routing and overlaying strategies become manageable, since they can be
  treated in isolation, local for a given scope and apart from the storage representation
- library implementations for textual representations can be integrated.


Cons
^^^^
- the theoretical view is challenging and rather uncommon
- a naive implementation holds the whole data tree in memory twice
- how the coherent ``islands'' are combined is only a matter of invocation order
  and thus dangerously flexible


Alternatives
^^^^^^^^^^^^
//alternatives: explain alternatives and tell why they are not viable:
The classical alternative is to define a common core data structure, which
needs to be finalised quickly. Isolated functional modules will then be written
to work on that common data set, which leads to a high degree of coupling.
Since this approach effectively doesn't scale well, what happens in practice is
that several independent storage and exchange systems start to exist in parallel,
e.g. system configuration, persisted object model, plug-in parameters, presentation
state.


Rationale
---------
//rationale: Give a concise summary why it should be done *this* way:
Basically, common (meta)data can take on many shapes between two extremes:

- the _precise typed structure_, which also is a contract
- the _open dynamic structure_, which leaves the contract implicit

The concept detailed in this RfC tries to reconcile those extremes by avoiding
a global concrete representation; +
this way the actual interaction -- with the necessity
of defining a contract -- is turned into a local problem.
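
To make the idea of a _local_ contract more tangible, here is a small Python sketch in which
a receiver casts an open, dynamic record into its own typed structure. The names `ClipSettings`
and `clip_from_record` and the chosen fields are hypothetical; the point is only that the typed
contract lives with this one receiver and is never imposed on the metadata as a whole.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class ClipSettings:
    """Typed structure private to one receiver (hypothetical example)."""
    name: str
    fps: int


def clip_from_record(record: Mapping[str, Any]) -> ClipSettings:
    """Enforce the local contract; keys this receiver does not know are ignored."""
    name = record["name"]
    fps = record["fps"]
    # bool is a subclass of int in Python, so it is rejected explicitly
    if not isinstance(name, str) or isinstance(fps, bool) or not isinstance(fps, int):
        raise TypeError("record does not satisfy the local ClipSettings contract")
    return ClipSettings(name=name, fps=fps)
```

Another receiver is free to interpret the very same record differently, for example by
writing it out verbatim.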


//Conclusion
//----------
//conclusion: When approved (this proposal becomes a Final)
//            write some conclusions about its process:


Comments
--------
//comments: append below

This RfC seems to be more like a vision statement; it contains some interesting
ideas, but not much of an actual proposal. At that time, presumably I had hoped
to spur further discussion or provoke some objection, in order to clarify what
we should be aiming at.

_During the following years,_ many of the ideas spelled out first in this text
found their way into the *Diff System*, now used as a foundation for connecting the
GUI to the Session core in Steam Layer. In particular, the discussion of ``Alternatives''
seems to indicate that an essential motivation for this RfC was to find a viable
alternative to building the whole Application around a _central data model_ --
which is in essence what I later transformed into a practical concept with the
aforementioned ``Diff System''.

-> see the design page regarding link:{ldoc}/design/architecture/ETD.html[»External Tree Description«]

[underline]#Bottom line#: not sure what to do with this RfC; the concepts explained therein
still seem highly relevant and central to what Lumiera is intended to become;
but this text does not fit into the format of an RfC, nor is there a community
of developers to discuss such a design vision appropriately.

Ichthyostega:: '2025-09-13'


//endof_comments:

''''
Back to link:/x/DesignProcess.html[Lumiera Design Process overview]