planning towards a tree diff language

Before engaging in the implementation of lib::Record, I prefer to conduct a round of planning, to get a clearer view of the requirements we'll meet when extending our existing list diff to tree structures.
2015-06-08 01:58:39 +02:00 · 2015-06-08 01:58:39 +02:00 · 8e27416594
commit 8e27416594
parent cecb5db972
2 changed files with 36 additions and 23 deletions
--- a/src/lib/diff/gen-node.hpp
+++ b/src/lib/diff/gen-node.hpp
@@ -24,12 +24,12 @@
 /** @file gen-node.hpp
 ** Generic building block for tree shaped (meta)data structures.
 ** A representation built from GenNode elements is intended to support
- ** introspection of data structures and exchange of mutations in the
- ** form of \link diff-language.hpp diff messages. \endlink
+ ** (limited) introspection of data structures and exchange of mutations
+ ** in the form of \link diff-language.hpp diff messages. \endlink
 ** 
 ** Despite of the name, GenNode is \em not meant to be an universal
 ** data representation; rather it is limited to embody a fixed hard
- ** wired set of data elements, able to stand-in for attributes
+ ** wired set of data types, able to stand-in for attributes
 ** and sub scope contents of the lumiera high-level data model.
 ** 
 ** \par Anatomy of a GenNode
@@ -44,13 +44,13 @@
 ** will be referred by a suitable reference representation (PlacementID).
 ** The DataCap is what creates the polymorphic nature, where the common
 ** interface is mostly limited to managemental tasks (copying of values,
- ** external representation). Besides, there are special flavours of
- ** the DataCap to represent \em sub-collections of GenNode elements.
- ** Especially, the \ref Record type is a kind of collection suitable
- ** to represent object-like structures, since it both holds several
- ** \am attributes referable by-name, and a (ordered) collection
- ** of elements treated as children within the scope of the
- ** given record.
+ ** external representation).
+ ** 
+ ** To represent object-like structures and for building trees, a special
+ ** kind of data type is placed into the DataCap. This type, Record<GenNode>
+ ** is recursive and has the ability to hold both a set of attributes
+ ** addressable by-name and an (ordered) collection of elements treated
+ ** as children within the scope of the given record.
 ** 
 ** \par Requirements
 ** 
@@ -122,6 +122,7 @@ namespace diff{
  
  class GenNode;
  
+  using Rec = Record<GenNode>;
  using DataValues = meta::Types<int
                                ,int64_t
                                ,short
@@ -130,10 +131,11 @@ namespace diff{
                                ,double
                                ,string
                                ,time::Time
+                                ,time::Offset
                                ,time::Duration
                                ,time::TimeSpan
                                ,hash::LuidH
-                                ,Record<GenNode>
+                                ,Rec
                                >;
  
  
@@ -161,42 +163,42 @@ namespace diff{
  
  template<>
  inline bool
-  Record<GenNode>::isAttribute (GenNode const& v)
+  Rec::isAttribute (GenNode const& v)
  {
    return false; ////TODO
  }
  
  template<>
  inline bool
-  Record<GenNode>::isTypeID (GenNode const& v)
+  Rec::isTypeID (GenNode const& v)
  {
    return false; ////TODO
  }
  
  template<>
  inline string
-  Record<GenNode>::extractTypeID (GenNode const& v)
+  Rec::extractTypeID (GenNode const& v)
  {
    return "todo"; ////TODO
  }
  
  template<>
  inline GenNode
-  Record<GenNode>::buildTypeAttribute (string const& typeID)
+  Rec::buildTypeAttribute (string const& typeID)
  {
    return GenNode(); ///TODO
  }
  
  template<>
  inline string
-  Record<GenNode>::extractKey (GenNode const& v)
+  Rec::extractKey (GenNode const& v)
  {
    return "todo"; ////TODO
  }
  
  template<>
  inline GenNode
-  Record<GenNode>::extractVal (GenNode const& v)
+  Rec::extractVal (GenNode const& v)
  {
    return GenNode(); ///TODO
  }
--- a/wiki/renderengine.html
+++ b/wiki/renderengine.html
@@ -8018,7 +8018,7 @@ Within the context of GuiModelUpdate, we discern two distinct situations necessi
 the second case is what poses the real challenge in terms of writing well organised code. Since in that case, the receiver side has to translate generic diff verbs into operations on hard wired language level data structures -- structures, we can not control, predict or limit beforhand. We deal with this situation by introducing a specific intermediary, the &amp;rarr; TreeMutator.
 </pre>
 </div>
-<div title="TreeDiffModel" creator="Ichthyostega" modifier="Ichthyostega" created="201410270313" modified="201505310130" tags="Model GuiPattern spec draft" changecount="54">
+<div title="TreeDiffModel" creator="Ichthyostega" modifier="Ichthyostega" created="201410270313" modified="201506072246" tags="Model GuiPattern spec draft" changecount="61">
 <pre>for the purpose of handling updates in the GUI timeline display efficiently, we need to determine and represent //structural differences//
 This leads to what could be considered the very opposite of data-centric programming. Instead of embody »the truth« into a central data model with predefined layout, we base our achitecture on a set of actors and their collaboration. In the mentioned example this would be the high-level view in the Session, the Builder, the UI-Bus and the presentation elements within the timeline view. Underlying to each such collaboration is a shared conception of data. There is no need to //actually represent that data// -- it can be conceived to exist in a more descriptive, declarative [[external tree description (ETD)|ExternalTreeDescription]]. In fact, what we //do represent// is a ''diff'' against such an external rendering.

@@ -8065,13 +8065,24 @@ Thus, for our specific usage scenario, the foremost relevant question is //how t
 __Implementation note__:The representation chosen here uses terms of constant size for the individual diff steps; in most cases, the argument is redundant and can be used for verification when applying the diff -- with the exception of the {{{ins}}} term, where it actually encodes additional information. Especially the {{{find}}}-representation is a compromise, since we encode as &quot;search for the term a~~5~~ and insert it at curent position&quot;. The more obvious rendering -- &quot;push term a~~4~~ back by +1 steps&quot; -- requires an additional integer argument not neccesary for any of the other diff verbs, defeating a fixed size value implementation.

 !!!extension to tree changes
-Basically we could send messages for recursive descent right after each {{{pick}}} token -- yet, while minimal, such a representation would be unreadable, and requires a dedicated stack storage on both sides. Thus we arrange for the //recursive treatment of children// to be sent //postfix,// after the messages for the current node. Recursive descent is indicated by explicit (and slightly redundant) //bracketing tokens://
-*{{{open}}}(node-ID)  : recurse into the designated node, which must be present already as result of the preceding changes
-*{{{close}}}(node-ID)  : close the current node context and return one step up; the node-ID is given for verification, but can be used to restore the working position at parent level
-In addition, we might consider to introduce up/down folding primitives
-*{{{fold}}}(//num//, node-ID) : pick the next //num// elements and fold them down into a new child with given node-ID
+Diff description and diff handling can be applied to tree-like data structures as well. Some usages of textual comparison (e.g. diffing of programming language texts) are effectively working on tree structures -- yet they do not build on the structure of the diffed data explicitly. But if we represent the data structures symbolically, the change from text diffing to data structure diffing is marginal. The only relevant change is to handle embedded recursive diff descriptions of the child nodes. As it stands, each node or &quot;object&quot; can be represented as a list of properties plus the attachment of child nodes. This list can be treated with the methods developed for a stream of text tokens.
+
+Basically the transition from text diffing to changes on data structures is achieved by exchanging the //type of the tokens.// Instead of words, or lines of text, we now use //data elements.// To do so, we introduce a symbolic ExternalTreeDescription of tree-like core data structures. The elementary token element used in this tree diff, the GenNode, embodies either simple plain data elements (numbers, strings, booleans, id-hashes, time values) -- or it describes a //recursive data element,// given as {{{Record&lt;GenNode&gt;}}}. Such a recursive data element describes object-like entities as a sequence of metadata, named attributes and ordered child-nodes -- it is handled in two phases: the first step is to treat the presence and ordering of child data elements, insertions and deletes. The second phase opens for each structurally changed child data element a recursive bracketing construct, as indicated  by explicit (and slightly redundant) //bracketing tokens://
+*{{{mut}}}(node-ID)  : recurse into the designated node, which must be present already as result of the preceding changes. The following diff tokens describe //mutations// of the child
+*{{{emu}}}(node-ID)  : close the current node context and return one step up; the node-ID is given for verification, but can be used to restore the working position at parent level
+In addition, in a future extension, we might consider to introduce up/down folding primitives
+*{{{fold}}}(node-ID) : pick the following elements and fold them down into a new child with given node-ID. The downfolding continues until the next {{{emu}}} token
 *{{{lift}}}(node-ID) : remove the next child node, which must be node-ID, and insert its children at current position

+Since the bracketing construct for mutation of child structures bears the ID of the parent, a certain degree of leeway is introduced. In theory, we could always open such a bracketing construct right after the {{{pick}}} token accepting the parent -- yet, while minimal, such a strictly depth-first representation would be hard to read -- so we allow to group the recursive treatment of children //post-fix,// after the messages for the current node. In a similar vein, we introduce another token to describe a //short-cut://
+*{{{after}}}(node-ID) : fast-forward through the sequence of elements at current level until the position after the designated element.
+To complement this language construct, we define some special, magical (meta) element-~IDs
+*{{{_CHILD_}}} : marks an //unnamed// ID. Mostly, the implementation exploits this specific marker to distinguish between nodes which are (named) attributes of an object, and real children
+*{{{_THIS_}}} : can be used to refer to the immediately preceding element without knowing its name. Typically used to open a {{{mut(_THIS_)}}} ... {{{emu(_THIS_)}}} bracket to populate a newly inserted object
+*{{{_ATTRIBS_}}} : can be used to jump {{{after(_ATTRIBS_)}}} when mutating the contents of an object. So the following diff verbs will immediately start working on the children
+*{{{_END_}}} : likewise can be used to jump {{{after(_END_)}}} to start appending new elements without caring for the existing current content.
+All these additional language constructs aren't strictly necessary, but widen the usability of the language, also to cover the description of incomplete or fuzzy diffs.
+
 !!!deriving conventional representations
 On receiving the terms of this &quot;diff language&quot;, it is possible to generate the well known and more conventional diff representations,
 i.e. a ''unified diff'' or the ''predicate notation'' used above to describe the list diffing algorithm, just by accumulating changes.
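
Illustration (not part of the commit): the tree diff verbs planned in the TreeDiffModel text above can be tried out with a small interpreter. The following is a minimal sketch under stated assumptions, not the Lumiera implementation. Node, Step, Diff, applyScope and applyDiff are hypothetical names invented for this example; the real token element would be a GenNode holding a Record<GenNode>. The verb name del is assumed for the "deletes" mentioned in the text, attributes are left out, and only the magic IDs _THIS_ and _END_ are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tree element; NOT the real GenNode / Record<GenNode>.
struct Node
{
    std::string id;
    std::vector<Node> children;
};

// Verbs named in the wiki text, plus an assumed `del` for deletions.
enum class Verb { ins, del, pick, after, mut, emu };

struct Step { Verb verb; std::string id; };
using Diff = std::vector<Step>;

// Apply diff steps to the children of `scope`, starting at diff[pos].
// Returns once the matching emu(closeID) is consumed or the diff is exhausted.
inline void
applyScope (Node& scope, Diff const& diff, std::size_t& pos, std::string const& closeID)
{
    std::vector<Node> old = std::move (scope.children);
    scope.children.clear();
    std::size_t src = 0;             // next unconsumed element of the old sequence

    auto expect = [&](std::string const& id)
        {   // the ID argument is redundant and serves for verification
            if (src >= old.size() || old[src].id != id)
                throw std::runtime_error ("diff mismatch at " + id);
        };

    bool closed = false;
    while (!closed && pos < diff.size())
    {
        Step const& step = diff[pos++];
        switch (step.verb)
        {
        case Verb::ins:              // insert a new (empty) element
            scope.children.push_back (Node{step.id, {}});
            break;
        case Verb::del:              // drop the next old element
            expect (step.id);
            ++src;
            break;
        case Verb::pick:             // accept the next old element as-is
            expect (step.id);
            scope.children.push_back (std::move (old[src++]));
            break;
        case Verb::after:            // fast-forward up to and including step.id;
            while (src < old.size()) // _END_ matches nothing, so all is accepted
            {
                bool reached = (old[src].id == step.id);
                scope.children.push_back (std::move (old[src++]));
                if (reached) break;
            }
            break;
        case Verb::mut:              // recurse into a node already in the result
        {
            Node* child = nullptr;
            if (step.id == "_THIS_")
            {
                if (!scope.children.empty()) child = &scope.children.back();
            }
            else
                for (Node& n : scope.children)
                    if (n.id == step.id) child = &n;
            if (!child) throw std::runtime_error ("mut: no such node " + step.id);
            applyScope (*child, diff, pos, step.id);
            break;
        }
        case Verb::emu:              // close the current scope, verify the ID
            if (step.id != closeID)
                throw std::runtime_error ("emu mismatch at " + step.id);
            closed = true;
            break;
        }
    }
    assert (src <= old.size());
    // Leniency chosen for this sketch only: old elements never mentioned are kept.
    for (; src < old.size(); ++src)
        scope.children.push_back (std::move (old[src]));
}

// Convenience entry point: apply a complete diff to a copy of the root node.
inline Node
applyDiff (Node root, Diff const& diff)
{
    std::size_t pos = 0;
    applyScope (root, diff, pos, "");
    return root;
}
```

With this sketch, the sequence pick(a), ins(x), mut(_THIS_), ins(x1), emu(_THIS_), pick(b), del(c), mut(b), after(_END_), ins(b2), emu(b) applied to a root with children a, b (holding b1) and c would first settle the ordering at the current level and then populate x and extend b in the grouped, post-fix mutation brackets. Keeping unmentioned elements at the end of a scope is a simplification of this example; a strict implementation might instead reject such a diff.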