I assumed that, since GenNode is composed of copyable and
assignable types, the standard implementation will do.
But I overlooked the run time type check on the opaque
payload type within lib::Variant. When a type mismatch
is detected, the default implementation has already
assigned and thus altered the IDs.
So we need to roll our own implementation, and to add
insult to injury, we can't use the copy-and-swap idiom either.
This is actually a STL library feature, and was added precisely
for the reason encountered here: if we want logarithmic search,
we'll have to construct a new GenNode object, just to have something
for the set to invoke the comparison operator.
C++14 introduced the convention that the Comparator of the set
may define a marker type `is_transparent` alongside with a generic
comparison operator. But, as is obvious from the source code of
our GNU Standard library implementation, our std::set has no such
overload to make use of that feature
http://en.cppreference.com/w/cpp/container/set/findhttp://stackoverflow.com/questions/20317413/what-are-transparent-comparators
The only good thing is that, just 10 minutes ago, I felt like
a complete moron because I'm writing a unit test for such a simple
storage class. ;-)
...when the Test-Nexus processes a command binding message.
In the real system of course we do not want to log every bind message.
The challenge here is the fact that command binding as such
is opaque, and the types of the data within the bind message
are opaque as well. Finally I settled on the compromise
to log them as strings, but only the DataCap part;
most value types applicable within GenNode
have a string representation to match.
the rationale is that I deliberately do not want to provide
a mechanism to iterate "over all contents in stringified form".
Because this could be seen as an invitation to process GenNode-
datastructures in an imperative way. Please recall we do not
want that. Users shall either *match* contents (using a visitor),
or they are required to know the type of the contents beforehand.
Both cases favour structural and type based programming over
dynamic run-time based inspection of contents
The actual task prompting me to add this iteration mechanism
is that I want to build a diagnostic, which allows to verify
that a binding message was sent over the bus with some
specific parameter values.
...no need to enclose empty sections when there are no
attributes or no children. Makes test code way more readable.
TestEventLog_test PASS as far as implemented
some tests rely on additional diagnostics code being linked in,
which happens, when lib/format-util.hpp is included prior to
the instantiation of lib::diff::Record rsp. lib::Variant.
The reason why i opended this can of worms was to avoid includion
of this formatting and diagnostics code into such basic headers
as lib/variant.hpp or lib/diff/gen-node.hpp
Now it turns out, that on some platforms the linker will use
a later instantiation of lib::Variant::Buff<GenNode>::operator string
in spite of a complete instantiation of this virtual function
being available already in liblumierasupport.so
But the real reason is that -- with this trickery -- we're violating
the single definition rule, so we get what we deserved.
TODO (Ticket #973): at a later point in development we have to re-assess,
the precise impact of including lib/format-util.hpp into
lib/diff/gen-node.hpp
Right now I expect GenNode to be used pervasively, so I am
reluctant to make that header too heavyweight.
because otherwise we'd need to send a whole subtree
over the wire and then descend into it just to find an element.
This too is a ripple effect of making '==' deep
well... this was quite a piece of work
Added some documentation, but a complete documentation,
preferably to the website, would be desirable, as would
be a more complete test covering the negative corner cases
yeah, working with open fire is dangerous...
For performace reasons I've undercut the premise
to make GenNode / Record immutable. Now I'm dealing with
raw storage layout together with this quite hairy distinction
between "attribute scope" and "child scope"
In hindsight, it might have been better to implement Record
as a single list, and to maintain a shortcut pointer to jump
to the start of the attributes.
while implementing this, I've discovered a conceptual error:
we allow to accept attributes, even when we've already entered
the child scope. This means that we can not predictable get back
at the "last" (i.e. the currently touched) element, because this
might be such an attribute. So a really correct implementation
would have to memorise the "current" element, which is really
tricky, given the various ways of touching elements in our
diff language.
In the end I've decided to ignore this problem (maybe a better
solution would have been to disallow those "late" attributes?)
My reasoning is that attributes are unlikely to be full records,
rather just values, and values are never mutated. (but note
that it is definitively possible to have an record as attribute!)
...while I must admit that I'm a bit doubtful about that
language feature, but it does come in handy when manually
writing diff messages. The reason is the automatic naming
of child objects, which makes it often hard to refer to
a child after the fact, since the name can not be
reconstructed systematically.
Obviously the downside of this "anonymous pick / delete"
is that we allow to pick (accept) or even delete just
any child, which happens to sit there, without being
able to detect a synchronisation mismatch between
sender and receiver.
i.e. flat match, not deep equality.
This allows to send just an Ref (with the ID) over the
wire to refer to an complete object to be picked, moved
or deleted on the receiver side.
in the first version, I defined equality to just compare the IDs
But that didn't seem right, or what one would expect by the concept
of equality (this is a long standing discussion with persistent
object-relationally mapped data).
So I changed the semantics of equaility to be "deep".
As this means possiblty to visit a whole tree depth-first,
it seems reasonable to provide the shallow "identity-comparison" likewise.
And the most reaonable choice is to use the "matches(object)" API
for that, since, in case of objects, the matches was defined
as full equality, which now seems redundant.
Thus: from now on: obj.matches(otherObj)
means they share the same IDs
The Ref-GenNode is just a specifically constructed GenNode,
and intended to be sliced down to an ordinary GenNode
immediately after construction. It seems, GCC didn't "get that"
and instead emitted an recursive invocation of the same ctor,
which obviously leads to stack overflow.
Problem solved by explicitly coding the copy initialisation,
after the full definition of Ref is available.
the type is the only meta attribute supported by now,
thus the decision was to handle this manually, instead of
introducing a full scope for meta attributes. Unfortunately
this leads to an assymetry: while it is possible to send an
attribute named "type", which will be intercepted and used
as a new type ID, the type will not show up when iterating
or searching through attributes.
When applying a diff, the only possibility is to *insert*
a new type attribute, and we need to check and handle this
likewise manually.
It is difficult to reconcile our general architecture for the
linearised diff representation with the processing of recursive,
tree-like data structures. The natural and most clean way to
deal with trees is to use recursion, i.e. the processor stack.
But in our case, this means we'd have to peek into the next
token of the language and then forward the diff iterator
into a recursive call on the nested scope. Essentially, this
breaks the separation between receiving a token sequence and
interpretation for a concrete target data structure.
For this reason, it is preferrable to make the stack an
internal state of the concrete interpreter. The downside of
this approach is the quite confusing data storage management;
we try to make the role of the storage elements a bit more
clear through descriptive accessor functions.
implement the list handling primitives analogous to the
implementation of list-diff-applicator -- just again with
the additional twist to keep the attribute and child scopes
separated.
...so now the stage is set. We can reimplement
the handling of the list diff cases here in the context
of tree diff application. The additional twist of course
being the distinction between attribute and child scope
each language token of our "linearised diff representation"
carries a payload data element, which typically is the piece
of data to be altered (added, mutated, etc).
Basically, these elements have value semantics and are
"sent over wire", and thus it seems natural when the
language interpreter functions accept that piece of payload
by-value. But since we're now sending GenNode elements as
parameter data in our diff, which typically are of the
size of 10 data elements (640 bit on a 64bit machine),
it seems more resonable to pass these argument elements
by const& through the interpreter function. This still
means we can (and will indeed) copy the mutated data
values when applying the diff, but we're able to
relay the data more efficiently to the point where
it's consumed.
this boils down to the two alternatives
- manipulate the target data structure
- build an altered copy
since our goal is to handle large tree structures efficiently,
the decision was cast in favour of data manipulation
so basically it's time to explicate the way
our diff language will actually be written.
Similar to the list diff case, it's a linear sequence
of verb tokens, but in this case, the payload value
in each token is a GenNode. This is the very reason
why GenNode was conceived as value object with an
opaque DataCap payload
initially the intention was to include a "bracketing construct"
into the values returned by the iterator. After considering
the various implementation and representation approaches,
it seems more appropriate just to expose a measure for the
depth-in-tree through the iterator itself, leaving any concerns
about navigation and structure reconstruction to the usage site.
As rationale we consider the full tree reconstruction as a very
specialised use case, and as such the normal "just iteration" usage
should not pay for this in terms of iterator size and implementation
complexity. Once a "level" measure is exposed, the usage site
can do precisely the same, with the help of the
HierarchyOrientationIndicator.
Whooa!
Templates are powerful.
programming this way is really fun.
under the assumption that the parts are logical,
all conceivable combinations of theses parts are bound to be correct
it passes compilation, but the test still fails, since
I've changed the expected semantics of the iteration,
in the light of the insights I've gained during
re-investigation of the IterExplorer.
What I now actually intend is rather to embed a
HierarchyOrientationIndicator into the iterator,
instead of returning a special "bracket" marker
reference to indicate return from a nested scope.
Only a Record payload constitutes a nested scope.
For all other (primitive) values, we return an empty iterator.
When used within ScopeExplorer, this implementation will just
lead to exposing any simple value once, while delving into
iteration of nested scopes
The only substantial change (besides compilation fixes) is
to confine the iteration to *const access*
This is a good thing; the whole Record/GenNode structure
was designed to represent immutable data, necessitating
a dedicated *Mutator* for any reshaping.
Initially I intended just to supply an addapter to use
the monadic IterExplorer for this recursive expansion
of GenNode contents. Investigating this approach was
relevant to highlight the minimum requirements for
such an evaluation mechanics: since our GenNode
is an hierarchical structure without back-links,
we are bound to use a stack at some point. And
since an Iterator is a materialised continuation,
we can not use the processor stack and are forced
to represent this stack in memory.
Yet, on second thought, we do not need the full power
of the IterExplorer monad; especially we do not need
to bind arbitrary functions into the monad, just one
single scope exploring function, implemented as
Variant visitor. Based on these observations, we can
"inline" the monad structure into a double nested
iterator, where the outer capsule carries a stack
of scopes to be explored.
horay!
seems like madness?
well -- found and squashed a bug: equality on RecordRef
implicitly converted to GenNode(RecordRef), which always
generates new (distinct) IDs and so never succeeds. What
we really want is equality test on the references
There is no generic implementation for these functions, since
they are highly dependent on the payload used within Record<TY>
Here we use Record<GenNode>, which turns the whole setup into an
recursive data type; we especially rely on the fact that each
GenNode has an embedded symbolic ID, and we use this ID to encode
the 'key' for named attributes
initially my intention was to use the ID for equality test.
But on a second thought, this seemed like a bad idea, since
it confuses the concepts of equality and identity.
Note: at the moment, I do not know if we even need an equality test,
so it is provided here rather for sake of completeness. And this
means even more that we want an 'equality' implementation that
does what one would naively expect: compare the object identity
*and* compare the contents.
not entirely sure about the design, but lets try this approach:
they can be "cloned" and likewise move-assigned, but we do not
allow the regular assignment, because this would enable to use
references like pointers (what we deliberately do not want)
especially setting (changing) attributes turned out to be tricky,
since in case of a GenNode this would mean to re-bind the hash ID;
we can not possibly do that properly without knowing the type of the payload,
and by design this payload type is opaque (erased).
As resort, I changed the semantics of the assign operation:
now it rather builds a new payload element, with a given initialiser.
In case of the strings, this ends up being the same operation,
while in case of GenNode, this is now something entirely different:
we can now build a new GenNode "in place" of the old one, and both
will have the same symbolic ID (attribute key). Incidentally,
our Variant implementation will reject such a re-building operatinon
when this means to change the (opaque) payload type.
in addition, I created a new API function on the Mutator,
allowing to move-in a complete attribute object. Actually this
new function became the working implementation. This way, it is
still possible to emplace a new attribute efficiently (consider
this to be a whole object graph!). But only, if the key (ID)
embedded in the attribute object is already what is the intended
key for this attribute. This way, we elegantly circumvent the
problem of having to re-bind a hash ID without knowing the type seed
initially, the intention was to inject the type as a magic attribute.
But this turned out to make the implementation brittle, asymmetric
and either quite demanding, or inefficient.
The only sane approach would be to introduce a third collection,
the metadata attributes. Then it would be possible to handle these
automatically, but expose them through the iterator.
In the end I decided against it, just the type attribute
allone does not justify that effort. So now the type is an
special magic field and kept apart from any object data.
this solves the problem how to deal with value access
- for the simple default (string) implementation,
we use a 'key = val' syntax and thus have to split strings,
which means we need to return contents by value
- for the actual relevant use case we have GenNode entries,
which may recursively hold further Records. For dealing
with diff messages over this data struture, its a good
idea to allow for const& value access (otherwise we'd
end up copying large subtrees for trivial operaions)
OMG, what was all this about?
OK... this cant possibly work this way.
At least we need to trim after splitting the attributes.
But this is not enough, we want the value, which implies
to make the type flexible (since we cant return a const& to
a substring extracted on-the-fly)
Note: not fixing all relevant warnings.
Especially, the "-Woverloaded-virtual" of Clang defeats the whole purpose
of generated generic interfaces. For example, our Variant type is instantiated
with a list of types the variant can hold. Through metaprogramming, this
instantiation generates also an embedded Visitor interface, which has
virtual 'handle(TY)' functions for all the types in question
The client now may implement, or even partially implement this Visitor,
to retrieve specific data out of given Variant instance with unknown conent.
To complain that some other virtual overload is now shaddowed is besides the point,
so we might consider to disable this warning altogether
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56402
The lambda definition captures the this pointer,
but the ctor of the lamda does not initialise this capture.
In our case, we're lucky, as we don't use the "this" pointer;
otherwise, we'd get a crash a runtime.
Fixed since GCC-4.7.3 --> it's *really* time to upgrade to Debian/Jessie
after sleeping a night over this, it seems obvios
that we do not want to start the build proces "implicitly",
starting from a Record<GenNode>. Rather, we always want
the user to plant a dedicated Mutator object, which then
can remain noncopyable and is passed by reference through
the whole builder chain. Movin innards of *this object*
are moved away a the end of the chain does not pose much risk.
especially I've now decided how to handle const-ness:
We're open to all forms of const-ness, the actual usage decides.
const GenNode will only expose a const& to the data values
still TODO is the object builder notation for diff::Record
forwarding equality to the embedded EntryID
Basically, two GenNodes are equal when they have the same "identity"
Ironically, this is the usual twist with database entities
on a second thought, this "workaround" does not look so bad,
due to the C++11 feature to request the default implementation explicitly.
Maybe we'll never need a generic solution for these cases
I decided to allow for an 'unbound' reference to allow
default construction of elements involving record references.
I am aware of the implications, but I place the focus
on the value nature of GenNode elements; the RecordRef
was introduced only as a means to cary out diff comparisons
and similar computations.
before engaging into the implementation of lib::Record,
I prefer to conduct a round of planning, to get a clearer
view about the requirements we'll meet when extending
our existing list diff to tree structures
Initially, I considered to build an index table like
collection of ordered attributes. But since our actual
use case is Record<GenNode>, this was ruled out in favour
of just a vector<GenNode>, where the keys are embedded
right within the nameID-Field of GenNode.
A decisive factor was the observation, that this design
is basically forced to encode the attribute keys somehow
into the attribute values, because otherwise the whole
collection like initialisation and iteration would break
down. Thus, a fully generic implementation is not possible,
and a pseudo generic implementation just for the purpose of
writing unit tests would be overkill.
Basically this decision means that Record requires an
explicit specialisation to implement the attribute-key
binding for each value type to use.
The actual trick to make it work is to use decltype on the function operator
http://stackoverflow.com/questions/7943525/is-it-possible-to-figure-out-the-parameter-type-and-return-type-of-a-lambda/7943765#7943765
In addition, we now pick up the functor by template type and
store it under that very type. For one, this cuts the size
of the generated class by a factor of two. And it gives the
compiler the ability to inline a closure as much as is possible,
especially when the created Binder / Mutator lives in the same
reference frame the closure taps into.
to carry out that rather obvious step, I was bound to consider
all the implications of choosing a given layout and handling pattern
for our external structure representation.
Finally, I settled upon the following decisions
- the value space represented within the DataCap is flat, not further structured
- the distinction between "attribute" and "nested object" is merely conceptual
and will be enforced solely by the diff detection / representation protocol
- basically, a nested subtree may appear as an attribute; the difference
between attributes and children lies solely in the way of access and referral:
by-name vs. positional
- it is pointless to save space for the representation of the discriminator ID
- but we can omit any further explicit type tag, because
- we do *not* support programming by switch-on-type, and thus
- we do *not* support full introspection, only a passive type-safety check
- this is *not* a limitation, since we acknowledge that GenNode is a *Monad*
- and the partial function needed within any flatMap implementation
maps naturally onto our Variant-Visitor; thus
- the DataCap can basically just *be* a Variant
- and GenNode has just to supply the neccessary shaffolding
to turn that into a full fledged Monad implementation, including
direct construction by wrapping a value and flatMap with tree walk
After some reconsideration, I decide to stick to the approach with the closures,
but to use a metaprotramming technique to build an inheritance chain.
While I can not decide on the real world impact of storing all those closures,
in theory this approach should enable the compiler to remove all of the
storage overhead. Since, when storing the result into an auto variable
right within scope (as demonstrated in the test), the compiler
sees the concrete type and might be able to boil down the actual
generated virtual function implementations, thereby inlining the
given closures.
Whereas, on the other hand, if we'd go the obvious conventional route
and place the closures into a Map allocated on the stack, I wouldn't
expect the compiler to do data flow analysis to prove this allocation
is not necessary and inline it away.
NOTE: there is now guarantee this inlining trick will ever work.
And, moreover, we don't know anything regarding the runtime effect.
The whole picture is way more involved as it might seem at first sight.
Even if we go the completely conventional route and require every
participating object to supply an implementation of some kind of
"Serializable" interface, we'll end up with a (hand written!)
implementation class for each participating setup, which takes
up space in the code segment of the executable. While the closure
based approach chosen here, consumes data segment (or heap) space
per instance for the functors (or function pointers) representing
the closures, plus code segment space for the closures, but the
latter with a way higher potential for inlining, since the closure
code and the generated virtual functions are necessarily emitted
within the same compilation unit and within a local (inline, not
publickly exposed) scope.
so yes, it is complicated, and inevitably involves three layers
of indirection. The alternative seems to bind the GUI direcly to
the Session interface -- is there a middle gound?
For the messages from GUI to Proc, we have our commands, based
on PlacementRef entities. But for feeding model updates to the
GUI, whatever I consider, I end up either with diff messages or
an synchronised access to Session attributes, which ties the
responsiveness of the GUI to the Builder operation.
- we use a GenNode element
- this holds a polymorphic value known as DataCap
- besides simple attribute values, this may hold collections of GenNode sub elements
- a special kind of GenNode collection, the Record, is used to represent objects
The purpose of this setup is to enable an external model representation
which is only loosely coupled to the interndal data representation
through the exchange of (tree)diff messages
sans the implementation of the index lookup table(s)
The algorithm is KISS, a variant of insertion sort, i.e.
worst time quadratic, but known to perform well on small data sets.
The mere generation of the diff description is O(n log n), since
we do not verify that we can "find" out of order elements. We leave
this to the consumer of the diff, which at this point has to scan
into the rest of the data sequence (leading to quadratic complexity)
finally....
The problem is that the C++ "dependent types" defeat the typical
DSL usage, where you define some helper function in a generic
language setup class and mix this language in as superclass.
This is, C++ requires us to refer explicitly to any dependent type,
since, due to possible template specialisations, the parser
can't know if a given symbol is a inherited type or a field.
As a solution, we place the token constructor functors into a
static struct "token", which allows to write e.g. token.insert(xyz)
As decided in beb57cde
this changeset switches our basic list diff language to work
in the style of an insertion sort. Rather than 'pushing back'
out-of-order elements, we scan and bring forward missing elements.
Later, when passing the original location of the elements
fetched this way, a 'skip' verb will help to clean up
possible leftowers, so implementation is possible
(and indeed acomplished) without shifting any other elements.