this is the tiny bit of operational functionality needed on top:
whenever we're reconfiguring the predicate, we need to re-trigger
the evaluation (and clear the cached value)
n.b.: I've verified in debugger that the closure is
allocated on the heap and the functors are passed by value
after looking into our various iterator tools,
it seems obvious that our filtering iterator implementation
has almost all of the required behaviour; we only need to
add a hook to rewrite and extend the filtering functor,
which can now nicely done with a lambda closure.
This means all memory management, if necessary, is
pushed into std::function and the automated memory
management for closures provided by the runtime.
..while we should note at this point that the whole techique
of hijacking std::hash is superfluous now, since the standard libray
does no longer define a static assertion which defeats SFINAE
some tests rely on additional diagnostics code being linked in,
which happens, when lib/format-util.hpp is included prior to
the instantiation of lib::diff::Record rsp. lib::Variant.
The reason why i opended this can of worms was to avoid includion
of this formatting and diagnostics code into such basic headers
as lib/variant.hpp or lib/diff/gen-node.hpp
Now it turns out, that on some platforms the linker will use
a later instantiation of lib::Variant::Buff<GenNode>::operator string
in spite of a complete instantiation of this virtual function
being available already in liblumierasupport.so
But the real reason is that -- with this trickery -- we're violating
the single definition rule, so we get what we deserved.
TODO (Ticket #973): at a later point in development we have to re-assess,
the precise impact of including lib/format-util.hpp into
lib/diff/gen-node.hpp
Right now I expect GenNode to be used pervasively, so I am
reluctant to make that header too heavyweight.
because otherwise we'd need to send a whole subtree
over the wire and then descend into it just to find an element.
This too is a ripple effect of making '==' deep
well... this was quite a piece of work
Added some documentation, but a complete documentation,
preferably to the website, would be desirable, as would
be a more complete test covering the negative corner cases
yeah, working with open fire is dangerous...
For performace reasons I've undercut the premise
to make GenNode / Record immutable. Now I'm dealing with
raw storage layout together with this quite hairy distinction
between "attribute scope" and "child scope"
In hindsight, it might have been better to implement Record
as a single list, and to maintain a shortcut pointer to jump
to the start of the attributes.
while implementing this, I've discovered a conceptual error:
we allow to accept attributes, even when we've already entered
the child scope. This means that we can not predictable get back
at the "last" (i.e. the currently touched) element, because this
might be such an attribute. So a really correct implementation
would have to memorise the "current" element, which is really
tricky, given the various ways of touching elements in our
diff language.
In the end I've decided to ignore this problem (maybe a better
solution would have been to disallow those "late" attributes?)
My reasoning is that attributes are unlikely to be full records,
rather just values, and values are never mutated. (but note
that it is definitively possible to have an record as attribute!)
...while I must admit that I'm a bit doubtful about that
language feature, but it does come in handy when manually
writing diff messages. The reason is the automatic naming
of child objects, which makes it often hard to refer to
a child after the fact, since the name can not be
reconstructed systematically.
Obviously the downside of this "anonymous pick / delete"
is that we allow to pick (accept) or even delete just
any child, which happens to sit there, without being
able to detect a synchronisation mismatch between
sender and receiver.
i.e. flat match, not deep equality.
This allows to send just an Ref (with the ID) over the
wire to refer to an complete object to be picked, moved
or deleted on the receiver side.
in the first version, I defined equality to just compare the IDs
But that didn't seem right, or what one would expect by the concept
of equality (this is a long standing discussion with persistent
object-relationally mapped data).
So I changed the semantics of equaility to be "deep".
As this means possiblty to visit a whole tree depth-first,
it seems reasonable to provide the shallow "identity-comparison" likewise.
And the most reaonable choice is to use the "matches(object)" API
for that, since, in case of objects, the matches was defined
as full equality, which now seems redundant.
Thus: from now on: obj.matches(otherObj)
means they share the same IDs
The Ref-GenNode is just a specifically constructed GenNode,
and intended to be sliced down to an ordinary GenNode
immediately after construction. It seems, GCC didn't "get that"
and instead emitted an recursive invocation of the same ctor,
which obviously leads to stack overflow.
Problem solved by explicitly coding the copy initialisation,
after the full definition of Ref is available.
the type is the only meta attribute supported by now,
thus the decision was to handle this manually, instead of
introducing a full scope for meta attributes. Unfortunately
this leads to an assymetry: while it is possible to send an
attribute named "type", which will be intercepted and used
as a new type ID, the type will not show up when iterating
or searching through attributes.
When applying a diff, the only possibility is to *insert*
a new type attribute, and we need to check and handle this
likewise manually.
It is difficult to reconcile our general architecture for the
linearised diff representation with the processing of recursive,
tree-like data structures. The natural and most clean way to
deal with trees is to use recursion, i.e. the processor stack.
But in our case, this means we'd have to peek into the next
token of the language and then forward the diff iterator
into a recursive call on the nested scope. Essentially, this
breaks the separation between receiving a token sequence and
interpretation for a concrete target data structure.
For this reason, it is preferrable to make the stack an
internal state of the concrete interpreter. The downside of
this approach is the quite confusing data storage management;
we try to make the role of the storage elements a bit more
clear through descriptive accessor functions.
implement the list handling primitives analogous to the
implementation of list-diff-applicator -- just again with
the additional twist to keep the attribute and child scopes
separated.
...so now the stage is set. We can reimplement
the handling of the list diff cases here in the context
of tree diff application. The additional twist of course
being the distinction between attribute and child scope
each language token of our "linearised diff representation"
carries a payload data element, which typically is the piece
of data to be altered (added, mutated, etc).
Basically, these elements have value semantics and are
"sent over wire", and thus it seems natural when the
language interpreter functions accept that piece of payload
by-value. But since we're now sending GenNode elements as
parameter data in our diff, which typically are of the
size of 10 data elements (640 bit on a 64bit machine),
it seems more resonable to pass these argument elements
by const& through the interpreter function. This still
means we can (and will indeed) copy the mutated data
values when applying the diff, but we're able to
relay the data more efficiently to the point where
it's consumed.
this boils down to the two alternatives
- manipulate the target data structure
- build an altered copy
since our goal is to handle large tree structures efficiently,
the decision was cast in favour of data manipulation
so basically it's time to explicate the way
our diff language will actually be written.
Similar to the list diff case, it's a linear sequence
of verb tokens, but in this case, the payload value
in each token is a GenNode. This is the very reason
why GenNode was conceived as value object with an
opaque DataCap payload
while it's still not really clear how we'll use this helper
and if we need it at all -- some weeks ago I changed its
semantics to be strictly based on the delta to a reference level.
Now this means, we could go below level zero, but this doesn't
make any sense in the context of navigating a tree. Actually,
our test case triggered this situation, which caused the
reference level to wrap around, since it is stored in an
unsigned variable.
Thus I'll add a precondition to keep the level positive,
and I'll change the test to comply.
Initially I've deliberately omitted those, to nudge towards
using time quantisation and TCode formatting for any external
representation of time values.
While this recommendation is still valid, the overloaded
string conversion turns out to be helpful for unit testing
and diagnostics in compound data structures.
See Record<GenNode>