write down a first draft for a definiton section,
to describe the fundamental parts involved, when
applying a diff message onto implementation defined
data structures
After a break of tree weeks, I found it difficult to find may way
amidst all those various levels of abstraction. In addition to this
definition, we'll probably also need a high level overview of the
whole diff system operation.
...all of this implementation boils down to slightly adjusting
the code written for the test-mutation-target. Insofar it pays off now
having implemented this diagnostic and demonstration first.
Moreover I'm implementing this basic scheme of "diff application"
roughly the fourth time, thus things kindof fall into place now.
What's really hard is all those layers of abstraction in between.
Lesson learned (after being off for three weeks, due to LAC and
other obligations): I really need to document the meaning of the
closures, and I need to document the "abstract operational semantics"
of diff application, otherwise no one will be able to provide
the correct closures.
while I still keep my stance not to allow reflection and
switch-on-type, access to the internal / semantic type of
an embedded record seems a valid compromise to allow
to deal with collections of object-like children
of mixed kind.
Indirectly (and quite intentional) this also opens a loophole
to detect if a given GenNode might constitute a nested scope,
but with the for the actual nested element indeed to cary
a type symbol. Effectively this limits the use of this shortcut
to situations where the handling context does have some pre-established
knowledge about what types *might* be expected. This is precisely
the kind of constraint I intend to uphold: I do not want the
false notion of "total flexibility", as is conveyed by introspection.
since we're moving elements around to apply the diff,
dangerous situation might arise in case anyone takes a copy
of the mutator. Thus we effectively limit the possible
usage pattern and only allow to build an anonymous
TreeMutator subclass through the Builder-DSL.
The concrete "onion layers" of the TreeMutator are now limited
- to be created by the chaining operations of the Builder DSl
- to be moved into target location, retaining ownership.
I still feel somewhat queasy with this whole situation!
We need to return the product of the DSL/Builder by value,
but we also want to swap away the current contents before
starting the mutation, and we do not want a stateful lifecycle
for the mutator implementation. Which means, we need to swap
right at construction, and then we copy -- TADAAA!
Thus I'm going for the solution to disallow copying of the
mutator, yet to allow moving, and to change the builder
to move its product into place. Probably should even push
this policy up into the base class (TreeMutator) to set
everyone straight.
Looks like this didn't show up with the test dummy implementation
just because in this case the src buffer also lived within th
TestMutationTarget, which is assumed to sit where it is, so
effectively we moved around only pointers.
the whole implementation will very much be based on
my experiences with the TestMutationTarget and TestWireTap.
Insofar it was a good idea to implement this test dummy first,
as a prototype. Basically what emerges here is a standard pattern
how to implement a tree mutator:
- the TreeMutator will be a one-way-off "throwaway" object.
- its lifecylce starts with sucking away the previous contents
- consuming the diff moves contents back in place
- thus the mutator always attaches onto a target by reference
and needs the ability to manipulate the target
the collection binding can be configured with various
lambdas to supply the basic building blocks of the generated binding.
Since we allow picking up basically anything (functors,
function pointers, function objects, lamdas), and since
we speculate on inlining optimisation of lambdas, we can not
enforce a specific signature in the builder functions.
But at least we can static_assert on the effective signature
at the point where we're generating the actual binding configuration
we can't generate a static assertion so easily here.
Problem is, when forming this type, we don't know if
the user will override and provide a custom binding
in some chained call within the nested DSL.
Might still be able to come up with some clever trick,
like e.g. returing an unsuitable marker type from these
dummy default implementations and then, later on, when
actually building the collection binding, to detect
those marker types and rise a static assert at that point.
This would at least give us a better error message,
and in theory, it should always be possible to
detect this kind of misuse at compile time
...through the use of partial specialisation and SFINAE.
There are some rather specific (yet expectedly not uncommon) cases,
where we'd be able to provide a sensible default for the
- match predicate
- new element constructor
of the binding. While in all other cases, the user
has to provide an explicit implementation for these
crucial building blocks anyway.
...but does not compile, since all of the fallback functions
will be instantiated, even while in fact we're overriding them
right away with something that *can* be compiled.
this prompts me to reconsider and question the basic approach
with closures for binding, while in fact what I am doing here
is to implement an ABC.
- the test will use some really private data types,
valid only within the scope of the test function.
- invoking the builder for real got me into problems
with the aggregate initialisation I'd used.
Maybe it's the function pointers? Anyway, working
around that by definint a telescope ctor
when setting up a binding to child elements within a STL collection,
all the variable elements are preconfigured to a more or less
disabled and inactive state.
the concern is for the structure of the builder to be
incomprehensible and completely buried within the
implementation details of the various binding layers
most of the mutation primitives return bool(true)
when /any/ layer or part of the TreeMuator was able
to cope with the diff verb.
This is based on the assumption to configure the
TreeMutator in such a way that at most one facility
will actually handle and apply a given verb. That is,
we'll assume that the TreeMutator acutally wraps and
adapts *one* custom data structure, to which the
diff has to be applied.
The TestWireTap is special, insofar it indeed targets
a *second* data structure, albeit not a "real" one,
just a dest and diagnostics dummy.
the first part of the unit test (now passing)
is able to demonstrate the full set of diff operations
just by binding to a TestMutationTarget.
Now, after verifying the design of those primmitive operations,
we can now proceed with bindings to "real" data structures
when implementing the assignment and mutation primitives
it became clear that the original approach of just storing
a log or string rendered elements does not work: for
assignment, we need to locate an element by ID
this one went through unnoticed, because the situation
is not covered in unit-test. The tests written thus fare
are more like a proof-of-concept. I didn't want to spend
weeks on writing extensive coverage of all corner cases,
at least not before all aspects of the tree diff protocol
are settled. Seemingly this backfires already
now the full API for the "mutation primitives" is shaped.
Of course the actual implementation is missing, but that
should be low hanging fuit by now.
What still requires some thinking though is how to implement
the selector, so we'll actually get a onion shaped decorator
basically we'll establish a collaboration where both sides
know only the interface (contract) of the partner; a safe margin
for allocation size has to be established through metaprogramming (TODO)
what's problematic is that we leave back waste in the
internal buffer holding the source. Thus it doesn't make
sense to check if this buffer is empty. Rather the
Mutator must offer an predicate emptySrc().
This will be relevant for other implementations as well
while the original name, 'replace', conveys the intention,
this more standard name 'swap' reveals what is done
and thus opens a wider array of possible usage
now this feels like making progress again,
even when just writing stubs ;-)
Moreover, it became clear that the "typing" of typed child collections
will always be ad hoc, and thus needs to be ensured on a case by case
base. As a consequence, all mutation primitives must carry the
necessary information for the internal selector to decide if this
primitive is applicable to a given decorator layer. Because
otherwise it is not possible to uphold the concept of a single,
abstracted "source position", where in fact each typed sub-collection
of children (and thus each "onion layer" in the decorator chain)
maintains its own private position
after sleeping one night over the problem, this seems to be
the most natural solution, since the possibility of assignment
naturally arises from the fact that, for tree diff, we have
to distinguish between the *identity* of an element node and
its payload (which could be recursive). Thus, IFF the payoad
is an assignable value, why not allow to assign it. Doing so
elegnatly solves the problem with assignment of attributes
Signed-off-by: Ichthyostega <prg@ichthyostega.de>
I assumed that, since GenNode is composed of copyable and
assignable types, the standard implementation will do.
But I overlooked the run time type check on the opaque
payload type within lib::Variant. When a type mismatch
is detected, the default implementation has already
assigned and thus altered the IDs.
So we need to roll our own implementation, and to add
insult to injury, we can't use the copy-and-swap idiom either.
This is actually a STL library feature, and was added precisely
for the reason encountered here: if we want logarithmic search,
we'll have to construct a new GenNode object, just to have something
for the set to invoke the comparison operator.
C++14 introduced the convention that the Comparator of the set
may define a marker type `is_transparent` alongside with a generic
comparison operator. But, as is obvious from the source code of
our GNU Standard library implementation, our std::set has no such
overload to make use of that feature
http://en.cppreference.com/w/cpp/container/set/findhttp://stackoverflow.com/questions/20317413/what-are-transparent-comparators
The only good thing is that, just 10 minutes ago, I felt like
a complete moron because I'm writing a unit test for such a simple
storage class. ;-)
...when the Test-Nexus processes a command binding message.
In the real system of course we do not want to log every bind message.
The challenge here is the fact that command binding as such
is opaque, and the types of the data within the bind message
are opaque as well. Finally I settled on the compromise
to log them as strings, but only the DataCap part;
most value types applicable within GenNode
have a string representation to match.
the rationale is that I deliberately do not want to provide
a mechanism to iterate "over all contents in stringified form".
Because this could be seen as an invitation to process GenNode-
datastructures in an imperative way. Please recall we do not
want that. Users shall either *match* contents (using a visitor),
or they are required to know the type of the contents beforehand.
Both cases favour structural and type based programming over
dynamic run-time based inspection of contents
The actual task prompting me to add this iteration mechanism
is that I want to build a diagnostic, which allows to verify
that a binding message was sent over the bus with some
specific parameter values.
...no need to enclose empty sections when there are no
attributes or no children. Makes test code way more readable.
TestEventLog_test PASS as far as implemented
some tests rely on additional diagnostics code being linked in,
which happens, when lib/format-util.hpp is included prior to
the instantiation of lib::diff::Record rsp. lib::Variant.
The reason why i opended this can of worms was to avoid includion
of this formatting and diagnostics code into such basic headers
as lib/variant.hpp or lib/diff/gen-node.hpp
Now it turns out, that on some platforms the linker will use
a later instantiation of lib::Variant::Buff<GenNode>::operator string
in spite of a complete instantiation of this virtual function
being available already in liblumierasupport.so
But the real reason is that -- with this trickery -- we're violating
the single definition rule, so we get what we deserved.
TODO (Ticket #973): at a later point in development we have to re-assess,
the precise impact of including lib/format-util.hpp into
lib/diff/gen-node.hpp
Right now I expect GenNode to be used pervasively, so I am
reluctant to make that header too heavyweight.
because otherwise we'd need to send a whole subtree
over the wire and then descend into it just to find an element.
This too is a ripple effect of making '==' deep
well... this was quite a piece of work
Added some documentation, but a complete documentation,
preferably to the website, would be desirable, as would
be a more complete test covering the negative corner cases
yeah, working with open fire is dangerous...
For performace reasons I've undercut the premise
to make GenNode / Record immutable. Now I'm dealing with
raw storage layout together with this quite hairy distinction
between "attribute scope" and "child scope"
In hindsight, it might have been better to implement Record
as a single list, and to maintain a shortcut pointer to jump
to the start of the attributes.
while implementing this, I've discovered a conceptual error:
we allow to accept attributes, even when we've already entered
the child scope. This means that we can not predictable get back
at the "last" (i.e. the currently touched) element, because this
might be such an attribute. So a really correct implementation
would have to memorise the "current" element, which is really
tricky, given the various ways of touching elements in our
diff language.
In the end I've decided to ignore this problem (maybe a better
solution would have been to disallow those "late" attributes?)
My reasoning is that attributes are unlikely to be full records,
rather just values, and values are never mutated. (but note
that it is definitively possible to have an record as attribute!)
...while I must admit that I'm a bit doubtful about that
language feature, but it does come in handy when manually
writing diff messages. The reason is the automatic naming
of child objects, which makes it often hard to refer to
a child after the fact, since the name can not be
reconstructed systematically.
Obviously the downside of this "anonymous pick / delete"
is that we allow to pick (accept) or even delete just
any child, which happens to sit there, without being
able to detect a synchronisation mismatch between
sender and receiver.
i.e. flat match, not deep equality.
This allows to send just an Ref (with the ID) over the
wire to refer to an complete object to be picked, moved
or deleted on the receiver side.
in the first version, I defined equality to just compare the IDs
But that didn't seem right, or what one would expect by the concept
of equality (this is a long standing discussion with persistent
object-relationally mapped data).
So I changed the semantics of equaility to be "deep".
As this means possiblty to visit a whole tree depth-first,
it seems reasonable to provide the shallow "identity-comparison" likewise.
And the most reaonable choice is to use the "matches(object)" API
for that, since, in case of objects, the matches was defined
as full equality, which now seems redundant.
Thus: from now on: obj.matches(otherObj)
means they share the same IDs
The Ref-GenNode is just a specifically constructed GenNode,
and intended to be sliced down to an ordinary GenNode
immediately after construction. It seems, GCC didn't "get that"
and instead emitted an recursive invocation of the same ctor,
which obviously leads to stack overflow.
Problem solved by explicitly coding the copy initialisation,
after the full definition of Ref is available.
the type is the only meta attribute supported by now,
thus the decision was to handle this manually, instead of
introducing a full scope for meta attributes. Unfortunately
this leads to an assymetry: while it is possible to send an
attribute named "type", which will be intercepted and used
as a new type ID, the type will not show up when iterating
or searching through attributes.
When applying a diff, the only possibility is to *insert*
a new type attribute, and we need to check and handle this
likewise manually.
It is difficult to reconcile our general architecture for the
linearised diff representation with the processing of recursive,
tree-like data structures. The natural and most clean way to
deal with trees is to use recursion, i.e. the processor stack.
But in our case, this means we'd have to peek into the next
token of the language and then forward the diff iterator
into a recursive call on the nested scope. Essentially, this
breaks the separation between receiving a token sequence and
interpretation for a concrete target data structure.
For this reason, it is preferrable to make the stack an
internal state of the concrete interpreter. The downside of
this approach is the quite confusing data storage management;
we try to make the role of the storage elements a bit more
clear through descriptive accessor functions.
implement the list handling primitives analogous to the
implementation of list-diff-applicator -- just again with
the additional twist to keep the attribute and child scopes
separated.
...so now the stage is set. We can reimplement
the handling of the list diff cases here in the context
of tree diff application. The additional twist of course
being the distinction between attribute and child scope
each language token of our "linearised diff representation"
carries a payload data element, which typically is the piece
of data to be altered (added, mutated, etc).
Basically, these elements have value semantics and are
"sent over wire", and thus it seems natural when the
language interpreter functions accept that piece of payload
by-value. But since we're now sending GenNode elements as
parameter data in our diff, which typically are of the
size of 10 data elements (640 bit on a 64bit machine),
it seems more resonable to pass these argument elements
by const& through the interpreter function. This still
means we can (and will indeed) copy the mutated data
values when applying the diff, but we're able to
relay the data more efficiently to the point where
it's consumed.
this boils down to the two alternatives
- manipulate the target data structure
- build an altered copy
since our goal is to handle large tree structures efficiently,
the decision was cast in favour of data manipulation
so basically it's time to explicate the way
our diff language will actually be written.
Similar to the list diff case, it's a linear sequence
of verb tokens, but in this case, the payload value
in each token is a GenNode. This is the very reason
why GenNode was conceived as value object with an
opaque DataCap payload
initially the intention was to include a "bracketing construct"
into the values returned by the iterator. After considering
the various implementation and representation approaches,
it seems more appropriate just to expose a measure for the
depth-in-tree through the iterator itself, leaving any concerns
about navigation and structure reconstruction to the usage site.
As rationale we consider the full tree reconstruction as a very
specialised use case, and as such the normal "just iteration" usage
should not pay for this in terms of iterator size and implementation
complexity. Once a "level" measure is exposed, the usage site
can do precisely the same, with the help of the
HierarchyOrientationIndicator.
Whooa!
Templates are powerful.
programming this way is really fun.
under the assumption that the parts are logical,
all conceivable combinations of theses parts are bound to be correct
it passes compilation, but the test still fails, since
I've changed the expected semantics of the iteration,
in the light of the insights I've gained during
re-investigation of the IterExplorer.
What I now actually intend is rather to embed a
HierarchyOrientationIndicator into the iterator,
instead of returning a special "bracket" marker
reference to indicate return from a nested scope.
Only a Record payload constitutes a nested scope.
For all other (primitive) values, we return an empty iterator.
When used within ScopeExplorer, this implementation will just
lead to exposing any simple value once, while delving into
iteration of nested scopes
The only substantial change (besides compilation fixes) is
to confine the iteration to *const access*
This is a good thing; the whole Record/GenNode structure
was designed to represent immutable data, necessitating
a dedicated *Mutator* for any reshaping.
Initially I intended just to supply an addapter to use
the monadic IterExplorer for this recursive expansion
of GenNode contents. Investigating this approach was
relevant to highlight the minimum requirements for
such an evaluation mechanics: since our GenNode
is an hierarchical structure without back-links,
we are bound to use a stack at some point. And
since an Iterator is a materialised continuation,
we can not use the processor stack and are forced
to represent this stack in memory.
Yet, on second thought, we do not need the full power
of the IterExplorer monad; especially we do not need
to bind arbitrary functions into the monad, just one
single scope exploring function, implemented as
Variant visitor. Based on these observations, we can
"inline" the monad structure into a double nested
iterator, where the outer capsule carries a stack
of scopes to be explored.
horay!
seems like madness?
well -- found and squashed a bug: equality on RecordRef
implicitly converted to GenNode(RecordRef), which always
generates new (distinct) IDs and so never succeeds. What
we really want is equality test on the references
There is no generic implementation for these functions, since
they are highly dependent on the payload used within Record<TY>
Here we use Record<GenNode>, which turns the whole setup into an
recursive data type; we especially rely on the fact that each
GenNode has an embedded symbolic ID, and we use this ID to encode
the 'key' for named attributes
initially my intention was to use the ID for equality test.
But on a second thought, this seemed like a bad idea, since
it confuses the concepts of equality and identity.
Note: at the moment, I do not know if we even need an equality test,
so it is provided here rather for sake of completeness. And this
means even more that we want an 'equality' implementation that
does what one would naively expect: compare the object identity
*and* compare the contents.
not entirely sure about the design, but lets try this approach:
they can be "cloned" and likewise move-assigned, but we do not
allow the regular assignment, because this would enable to use
references like pointers (what we deliberately do not want)
especially setting (changing) attributes turned out to be tricky,
since in case of a GenNode this would mean to re-bind the hash ID;
we can not possibly do that properly without knowing the type of the payload,
and by design this payload type is opaque (erased).
As resort, I changed the semantics of the assign operation:
now it rather builds a new payload element, with a given initialiser.
In case of the strings, this ends up being the same operation,
while in case of GenNode, this is now something entirely different:
we can now build a new GenNode "in place" of the old one, and both
will have the same symbolic ID (attribute key). Incidentally,
our Variant implementation will reject such a re-building operatinon
when this means to change the (opaque) payload type.
in addition, I created a new API function on the Mutator,
allowing to move-in a complete attribute object. Actually this
new function became the working implementation. This way, it is
still possible to emplace a new attribute efficiently (consider
this to be a whole object graph!). But only, if the key (ID)
embedded in the attribute object is already what is the intended
key for this attribute. This way, we elegantly circumvent the
problem of having to re-bind a hash ID without knowing the type seed
initially, the intention was to inject the type as a magic attribute.
But this turned out to make the implementation brittle, asymmetric
and either quite demanding, or inefficient.
The only sane approach would be to introduce a third collection,
the metadata attributes. Then it would be possible to handle these
automatically, but expose them through the iterator.
In the end I decided against it, just the type attribute
allone does not justify that effort. So now the type is an
special magic field and kept apart from any object data.
this solves the problem how to deal with value access
- for the simple default (string) implementation,
we use a 'key = val' syntax and thus have to split strings,
which means we need to return contents by value
- for the actual relevant use case we have GenNode entries,
which may recursively hold further Records. For dealing
with diff messages over this data struture, its a good
idea to allow for const& value access (otherwise we'd
end up copying large subtrees for trivial operaions)
OMG, what was all this about?
OK... this cant possibly work this way.
At least we need to trim after splitting the attributes.
But this is not enough, we want the value, which implies
to make the type flexible (since we cant return a const& to
a substring extracted on-the-fly)
Note: not fixing all relevant warnings.
Especially, the "-Woverloaded-virtual" of Clang defeats the whole purpose
of generated generic interfaces. For example, our Variant type is instantiated
with a list of types the variant can hold. Through metaprogramming, this
instantiation generates also an embedded Visitor interface, which has
virtual 'handle(TY)' functions for all the types in question
The client now may implement, or even partially implement this Visitor,
to retrieve specific data out of given Variant instance with unknown conent.
To complain that some other virtual overload is now shaddowed is besides the point,
so we might consider to disable this warning altogether
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56402
The lambda definition captures the this pointer,
but the ctor of the lamda does not initialise this capture.
In our case, we're lucky, as we don't use the "this" pointer;
otherwise, we'd get a crash a runtime.
Fixed since GCC-4.7.3 --> it's *really* time to upgrade to Debian/Jessie
after sleeping a night over this, it seems obvios
that we do not want to start the build proces "implicitly",
starting from a Record<GenNode>. Rather, we always want
the user to plant a dedicated Mutator object, which then
can remain noncopyable and is passed by reference through
the whole builder chain. Movin innards of *this object*
are moved away a the end of the chain does not pose much risk.
especially I've now decided how to handle const-ness:
We're open to all forms of const-ness, the actual usage decides.
const GenNode will only expose a const& to the data values
still TODO is the object builder notation for diff::Record
forwarding equality to the embedded EntryID
Basically, two GenNodes are equal when they have the same "identity"
Ironically, this is the usual twist with database entities
on a second thought, this "workaround" does not look so bad,
due to the C++11 feature to request the default implementation explicitly.
Maybe we'll never need a generic solution for these cases
I decided to allow for an 'unbound' reference to allow
default construction of elements involving record references.
I am aware of the implications, but I place the focus
on the value nature of GenNode elements; the RecordRef
was introduced only as a means to cary out diff comparisons
and similar computations.
before engaging into the implementation of lib::Record,
I prefer to conduct a round of planning, to get a clearer
view about the requirements we'll meet when extending
our existing list diff to tree structures
Initially, I considered to build an index table like
collection of ordered attributes. But since our actual
use case is Record<GenNode>, this was ruled out in favour
of just a vector<GenNode>, where the keys are embedded
right within the nameID-Field of GenNode.
A decisive factor was the observation, that this design
is basically forced to encode the attribute keys somehow
into the attribute values, because otherwise the whole
collection like initialisation and iteration would break
down. Thus, a fully generic implementation is not possible,
and a pseudo generic implementation just for the purpose of
writing unit tests would be overkill.
Basically this decision means that Record requires an
explicit specialisation to implement the attribute-key
binding for each value type to use.
The actual trick to make it work is to use decltype on the function operator
http://stackoverflow.com/questions/7943525/is-it-possible-to-figure-out-the-parameter-type-and-return-type-of-a-lambda/7943765#7943765
In addition, we now pick up the functor by template type and
store it under that very type. For one, this cuts the size
of the generated class by a factor of two. And it gives the
compiler the ability to inline a closure as much as is possible,
especially when the created Binder / Mutator lives in the same
reference frame the closure taps into.