yeah, working with open fire is dangerous...
For performace reasons I've undercut the premise
to make GenNode / Record immutable. Now I'm dealing with
raw storage layout together with this quite hairy distinction
between "attribute scope" and "child scope"
In hindsight, it might have been better to implement Record
as a single list, and to maintain a shortcut pointer to jump
to the start of the attributes.
while implementing this, I've discovered a conceptual error:
we allow to accept attributes, even when we've already entered
the child scope. This means that we can not predictable get back
at the "last" (i.e. the currently touched) element, because this
might be such an attribute. So a really correct implementation
would have to memorise the "current" element, which is really
tricky, given the various ways of touching elements in our
diff language.
In the end I've decided to ignore this problem (maybe a better
solution would have been to disallow those "late" attributes?)
My reasoning is that attributes are unlikely to be full records,
rather just values, and values are never mutated. (but note
that it is definitively possible to have an record as attribute!)
...while I must admit that I'm a bit doubtful about that
language feature, but it does come in handy when manually
writing diff messages. The reason is the automatic naming
of child objects, which makes it often hard to refer to
a child after the fact, since the name can not be
reconstructed systematically.
Obviously the downside of this "anonymous pick / delete"
is that we allow to pick (accept) or even delete just
any child, which happens to sit there, without being
able to detect a synchronisation mismatch between
sender and receiver.
i.e. flat match, not deep equality.
This allows to send just an Ref (with the ID) over the
wire to refer to an complete object to be picked, moved
or deleted on the receiver side.
in the first version, I defined equality to just compare the IDs
But that didn't seem right, or what one would expect by the concept
of equality (this is a long standing discussion with persistent
object-relationally mapped data).
So I changed the semantics of equaility to be "deep".
As this means possiblty to visit a whole tree depth-first,
it seems reasonable to provide the shallow "identity-comparison" likewise.
And the most reaonable choice is to use the "matches(object)" API
for that, since, in case of objects, the matches was defined
as full equality, which now seems redundant.
Thus: from now on: obj.matches(otherObj)
means they share the same IDs
The Ref-GenNode is just a specifically constructed GenNode,
and intended to be sliced down to an ordinary GenNode
immediately after construction. It seems, GCC didn't "get that"
and instead emitted an recursive invocation of the same ctor,
which obviously leads to stack overflow.
Problem solved by explicitly coding the copy initialisation,
after the full definition of Ref is available.
the type is the only meta attribute supported by now,
thus the decision was to handle this manually, instead of
introducing a full scope for meta attributes. Unfortunately
this leads to an assymetry: while it is possible to send an
attribute named "type", which will be intercepted and used
as a new type ID, the type will not show up when iterating
or searching through attributes.
When applying a diff, the only possibility is to *insert*
a new type attribute, and we need to check and handle this
likewise manually.
It is difficult to reconcile our general architecture for the
linearised diff representation with the processing of recursive,
tree-like data structures. The natural and most clean way to
deal with trees is to use recursion, i.e. the processor stack.
But in our case, this means we'd have to peek into the next
token of the language and then forward the diff iterator
into a recursive call on the nested scope. Essentially, this
breaks the separation between receiving a token sequence and
interpretation for a concrete target data structure.
For this reason, it is preferrable to make the stack an
internal state of the concrete interpreter. The downside of
this approach is the quite confusing data storage management;
we try to make the role of the storage elements a bit more
clear through descriptive accessor functions.
implement the list handling primitives analogous to the
implementation of list-diff-applicator -- just again with
the additional twist to keep the attribute and child scopes
separated.
...so now the stage is set. We can reimplement
the handling of the list diff cases here in the context
of tree diff application. The additional twist of course
being the distinction between attribute and child scope
each language token of our "linearised diff representation"
carries a payload data element, which typically is the piece
of data to be altered (added, mutated, etc).
Basically, these elements have value semantics and are
"sent over wire", and thus it seems natural when the
language interpreter functions accept that piece of payload
by-value. But since we're now sending GenNode elements as
parameter data in our diff, which typically are of the
size of 10 data elements (640 bit on a 64bit machine),
it seems more resonable to pass these argument elements
by const& through the interpreter function. This still
means we can (and will indeed) copy the mutated data
values when applying the diff, but we're able to
relay the data more efficiently to the point where
it's consumed.
this boils down to the two alternatives
- manipulate the target data structure
- build an altered copy
since our goal is to handle large tree structures efficiently,
the decision was cast in favour of data manipulation
so basically it's time to explicate the way
our diff language will actually be written.
Similar to the list diff case, it's a linear sequence
of verb tokens, but in this case, the payload value
in each token is a GenNode. This is the very reason
why GenNode was conceived as value object with an
opaque DataCap payload
while it's still not really clear how we'll use this helper
and if we need it at all -- some weeks ago I changed its
semantics to be strictly based on the delta to a reference level.
Now this means, we could go below level zero, but this doesn't
make any sense in the context of navigating a tree. Actually,
our test case triggered this situation, which caused the
reference level to wrap around, since it is stored in an
unsigned variable.
Thus I'll add a precondition to keep the level positive,
and I'll change the test to comply.
Initially I've deliberately omitted those, to nudge towards
using time quantisation and TCode formatting for any external
representation of time values.
While this recommendation is still valid, the overloaded
string conversion turns out to be helpful for unit testing
and diagnostics in compound data structures.
See Record<GenNode>
initially the intention was to include a "bracketing construct"
into the values returned by the iterator. After considering
the various implementation and representation approaches,
it seems more appropriate just to expose a measure for the
depth-in-tree through the iterator itself, leaving any concerns
about navigation and structure reconstruction to the usage site.
As rationale we consider the full tree reconstruction as a very
specialised use case, and as such the normal "just iteration" usage
should not pay for this in terms of iterator size and implementation
complexity. Once a "level" measure is exposed, the usage site
can do precisely the same, with the help of the
HierarchyOrientationIndicator.
Whooa!
Templates are powerful.
programming this way is really fun.
under the assumption that the parts are logical,
all conceivable combinations of theses parts are bound to be correct
it passes compilation, but the test still fails, since
I've changed the expected semantics of the iteration,
in the light of the insights I've gained during
re-investigation of the IterExplorer.
What I now actually intend is rather to embed a
HierarchyOrientationIndicator into the iterator,
instead of returning a special "bracket" marker
reference to indicate return from a nested scope.
Only a Record payload constitutes a nested scope.
For all other (primitive) values, we return an empty iterator.
When used within ScopeExplorer, this implementation will just
lead to exposing any simple value once, while delving into
iteration of nested scopes
The only substantial change (besides compilation fixes) is
to confine the iteration to *const access*
This is a good thing; the whole Record/GenNode structure
was designed to represent immutable data, necessitating
a dedicated *Mutator* for any reshaping.
seemingly the operator-> was not yet used in any real scenario.
The whole point with IterAdapter is that it uses an opaque "location type",
which is owned by the controlling container. In many cases this will
actually be just a pointer into the container storage, but we
must not assume it is this way. Thus the only way to obtain a
(language) pointer is to dereference the "position type" and
take the address of the result
Initially I intended just to supply an addapter to use
the monadic IterExplorer for this recursive expansion
of GenNode contents. Investigating this approach was
relevant to highlight the minimum requirements for
such an evaluation mechanics: since our GenNode
is an hierarchical structure without back-links,
we are bound to use a stack at some point. And
since an Iterator is a materialised continuation,
we can not use the processor stack and are forced
to represent this stack in memory.
Yet, on second thought, we do not need the full power
of the IterExplorer monad; especially we do not need
to bind arbitrary functions into the monad, just one
single scope exploring function, implemented as
Variant visitor. Based on these observations, we can
"inline" the monad structure into a double nested
iterator, where the outer capsule carries a stack
of scopes to be explored.
This helper was drafted for the Job / JobPlanning and Scheduler
interface in 2013, but seemingly not yet put into action. While
in the original use case, we have a genuine measuerment for the
tree depth (given by the depth of the processing stack), in other
use cases we want to use to offset embedded within the indicator
itself for keeping track of the depth. Thus I add a second
mark operation, which usess the current offset to set a new
reference level. This has the consequence that the offset
has now to reflect the new reference point immediately
Since C++ is not a real functional programming language and
has unsafe unmanaged pointers, it is not difficult to produce
dangling references within an extended evaluation pipeline
involving transient objects and pass-by-reference.
In the initial implementation, I built in a safeguard copy
into the signature of the Explorer function, to make sure even
a transiently dressed-up input value gets materialised before
proceeding with the source sequence. Unfortunately this safeguard
turns out as a roadblock now; we might as well take the input
by reference and return an "expanded" state by value. We might
even want to do the full "expansion" on referred state, when
we're able to ensure the source values remain in memory
until consumption.
Thus now the full power of decision is placed on the signature
of the explorer function. The expansion strategies of IterExplorer
will no longer attempt to "sanitise" the signature of the passed-in
function to prevent desaster; I've added some warnings into the
documentation to highlight that danger. Basically, if you want
to be clever, then you're bound to read and understand inticacies
of the implementation.
If in doubt, use values and copying. C++ is optimised for that.