since this is meant as a policy implementation, reduce it to the bare operation;
the actual container storage handling logic shall be implemented in the container
and based on those primitive and configurable base operations
...still fighting to get the design of the `AllocationPolicy`
settled to work well with `AllocationCluster` while also allowing
to handle data types which are (not) trivially copyable.
This changeset attempts to turn the logic round: now we capture
an ''move exclusion flag'' and otherwise allow the Policy to
decide on its own, based on the ''element type''
- verifies if new element can just fit in
- otherwise ensure the storage adjustments are basically possible
- throw exception in case the new element can not be accommodated
- else request possible storage adjustments
- and finally let the allocator place the new element
Draft skeleton of the logic for element creation.
This turns out to be a rather challenging piece of code,
since we have to rely on logical reasoning about properties
of the element types in order to decide if and how these
elements can be emplaced, including the possibility to
re-allocate and move existing data to a new location.
- if we know the exact element type, we can handle any
copyable or movable object
- however, if the container is filled with a mixture of types,
we can not re-allocate or grow dynamically, unless all data
is trivially copyable (and can thus be handled through memmove)
- moreover we must ensure the ability to invoke the proper destructor
In-depth analysis of storage management revealed a misconception
with respect to possible storage optimisations, requiring more
metadata fields to handle all corner cases correctly.
It seems prudent to avoid any but the most obvious optimisations
and wait for real-world usage for a better understanding of the
prevalent access patterns. However, in preparation for any future
optimisations, all access coordination and storage metadata is
now relocated into the `ArrayBucket`, and thus resides within the
managed allocation, allowing for localised layout optimisations.
To place this into context: the expected prevalent use case is
for the »Render Nodes Network«, which relies on `AllocationCluster`
for storage management; most nodes will have only a single predecessor
or successor, leading to a large number of lib::Several intsances
populated with a single data element. In such a scenario, it is
indeed rather wasteful to allocate four »slot« of metadata for
each container instance; even more so since most of this
metadata is not even required in such a scenario.
...which basically ''seems doable'' now, yet turns up several unsolved problems
- need a way to handle excess storage for the raw allocation
- generally should relocate all metadata into the ArrayBucket
- mismatch at various APIs; must re-think where to pass size explicitly
- unclear yet how and where to pass the actual element type to create
...turns out to be rather challenging, due to the far reaching requirements
* the default case (heap allocation) ''must work out-of-the box''
* optionally a C++ standard conformant `Allocator` can be adapted
* which works correct even in case this allocator is ''not a monostate''
* **essential requirement** is to pass an `AllocationCluster` reference directly
* need a ''generic extension point'' to adapt to similar elaborate custom schemes
__Note__: especially we want to create a direct collaboration between the allocation policy and the underlying allocator to allow support for a dedicate ''realloc operation''
- code spelled out as intended, according to generic scheme
- can now encode the »unmanaged« case directly as `null`-deleter,
because in all other cases a deleter function is mandatory now
- add default constructor to `ArrayBucket`, detailing the default spread
even while at first sight only a ''deleter instance'' is required,
it seems prudent to rearrange the code in accordance to the prospective
Allocator / Object Factory concept, and rather try to incorporate
the specifics of the memory layout into this generic view, thereby
abstracting the actual allocator away.
This can be achieved by using a standard-allocator for `std::byte`
as the base allocator and treat each individual element allocator
as a specialised cross-allocator (assuming that this cross adaptation
is actually trivial in almost all cases)
The fundamental decision is that we want to have a single generic front-end,
meaning that we must jump dynamically into a configured deleter function.
And on top of that comes the additional requirement that ''some allocators''
are in fact tied to a specific instance, while other allocators are monostate.
However, we can distinguish both by probing if the allocator can be default constructed,
and if a default constructed allocator is equivalent to the currently used alloctor instance.
If this test fails, we must indeed maintain a single allocator instance,
and (to avoid overengineering for this rather special use case) we will
place this allocator instance into heap memory then, with a self-cleanup mechanism
On the other hand, all monostate allocators can be handled through static trapolines.
- the basic decision is to implement ''realloc'' similar to `std::vector`
- however the situation is complicated by the desire to allow arbitrary element types
- ⟹ must build a strategy based on the properties of the target type
- the completely dynamic growth is only possibly for trivially-movable types
- can introduce a dedicated ''element type'' though, and store a trampolin handler
- create by forwarding allocator arguments to policy
- builder-Op to append from iterator
- decide to collapse the ArrayBucket class, since
access is going through unsafe pointer arithmetic anyway
- favour dynamic polymorphism
- use additional memory for management data alongside the element allocation
- encode a flag and a deleter pointer to enable ownership of the allocation
- inherit base container privately into builder, so the build ends with a slice
Some decisions
- use a single template with policy base
- population via separate builder class
- implemented similar to vector (start/end)
- but able to hold larger (subclass) objects
- basically works out-of-the-box now
- the hard wired fixed Extent size is a serious limitation
- however, this is not the intended primary use, rather complementary
...this is an important detail: quite commonly, a custom allocator
is actually implemented as monostate, to avoid bloating every client container
with a backlink pointer; by inheriting the `StdFactory` adapter from the
allocator, the empty-base optimisation can be exploited.
In the standard case thus LinkedElements is the same size as a single
pointer, which is already exploited at several places in the code base.
Notably `AllocationCluster` uses a »virtual overlay« to dress-up the
position pointer as `LinkedElements`, allowing to delegate most of the
administration and memory management to existing and verified code.
With this adjustments, `LinkedElements` pass the tests again
and the rework of `AllocationCluster` is considered complete.
This is the first validation of the new design:
the policy to take ownership can be reimplemented simply
by delegating to the adaptor for a C++ standard allocator
The following structure can be expected, after __switching to C++20__
* Concept **Allocator** deals with the bare memory allocation
* Concept **Factory** handles object creation and disposal by delegation
* Concept **Handle** is a ready-made functor for dependency-injection
Right now, an implementation of the ''prospective Factory Concept''
can be provided, by delegating through `std::allocator_traits` to a given
`std::allocator` or compatible object
By default, LinkedElements uses a policy OwningHeapAllocated;
while retaining this interface, this policy should be recast
to rely on a standard compliant allocator, with a default
fallback to `std::allocator<T>`
This way, a single policy would serve all the cases where
objects are actually owned and managed by `LinkedElements`,
and most special policies would be redundant.
This turns out to be quite tedious and technical however,
since the newer standard mandates to use std::allocator_traits
as front-end, and moreover the standard allocators are always
tied to one specific target type, while `LinkedElements` is
deliberately used to maintain a polymorphic sequence.
since the calculation to find the current block start
has been recast as a private method, it is now possible to
calculate the allocation statistics without mutating the pos pointer.
To enable such usages, add a wrapper for `LinkedElements` to expose
an element-pointer temporarily as a immutable `LinkedElements` list,
allowing to iterate or use subscript and size information functions
...what I've implemented yesterday is effectively the same functionality
as provided automatically by the C++ object system when using a virtual destructor.
Thus a much cleaner solution is to turn `Destructor` into a interface
and let C++ do all the hard work.
Verified in test: works as intended
This is the first draft, implementing the invocation explicitly
through a trampoline function. While it seems to work,
the formulation can probably be simplified....
These diagnostics helpers must rely on low-level trickery,
since the implementation strives at avoiding unnecessary storage overhead.
Since `AllocationCluster` is move-only (for good reasons) and `StorageManager`
can not be constructed independently, a »backdoor« is created by
forced cast, relying on the known memory layout
- rather accept hard-wired limits than making the implementation excessively generic
- by exploiting the layout, the administrative overhead can be reduced significantly
- the trick with the "virtual managment overlay" allows to hand-off most of the
clean-up work to C++ destructor invocation
- it is important to verify these low-level arrangements explicitly by unit-test
* this is pure old-style low-level trickery
* using a layout trick, the `AllocationCluster`
can be operated with the bare minimum of overhead
* this trick relies on the memory layout of `lib::LinkedElements`
...due to the decision to use a much simpler allocation scheme
to increase probability for actual savings, after switching the API
and removing all trading related aspects, a lot of further code is obsoleted
Notably this raises the difficult question,
whether to ensure **invocation of destructors**.
Not invoking dtors ''breaks one of the most fundamental contracts''
of the C++ language — yet the infrastructure to invoke dtors in such
a heterogeneous cluster of allocations creates a hugely significant
overhead and is bound to poison the caches (objects to be deallocated
typically sit in cold memory pages).
What makes this decision especially daunting is the fact that the
low-level-Model can be expected to be one of the largest systemic
data structures (letting aside the media buffers).
I am leaning towards a compromise: turn down this decision
towards the user of the `AllocationCluster`
After some analysis, it became clear that the existing code for
`AllocationCluster` (while in itself valid) will likely miss the point
for the expected usage in the low-level Model: most segments of the
model will be rather small, and thus there is not enough potential for
amortisation when using such a per-type and per-segment scheme;
a rather simplistic linear allocator will be sufficient.
On the other hand, with the current C++ standard it is easy to provide
a complient allocator implementation for STL containers, and thus the
interface should be retro-fitted accordingly.
At the time of the initial design attempts, I naively created a
classic interface to describe an fixed container allocated ''elsewhere.''
Meanwhile the C++ language has evolved and this whole idea looks
much more as if it could be a ''Concept'' (C++20). Moreover, having
several implementations of such a container interface is deemed inadequate,
since it would necessitate ''at least two indirections'' — while
going the Concept + Template route would allow to work without any
indirection, given our current understanding that the `ProcNode` itself
is ''not an interface'' — rather a building block.
- the starting point is the idea to build a dedicated ''turnout system''
- `StateAdapter`, `BuffTable` ⟶ `FeedManifold` and _Invocation_ will be fused
- actually, the `TurnoutSystem` will be ''pulled'' and orchestrate the invocation
- the structure is assumed to be recursive
The essence of the Node-Invocation, as developed 2009 / 2011 remains intact,
yet it will be organised along a clearer structure
Within the existing body of code, there are two unfinished attempts
towards building a node invocation and management of data buffers.
The first attempt was entirely driven from the angle of invoking a
processing function, while the second one draws from a wider scope
and can be considered the solution to build upon regarding data buffers
in general. However, the results of the first approach are well suited
for their specific purpose, so both solutions will be combined.
Thus the arrangement of data feeds going in and out of the render node
shall be renamed into `BuffTable` -> `FeedManifold`
...which seems to be basically fine thus far
...beyond some renaming and rearranging
''it turns out that the final, crucial links,
necessary to tie all together, are yet to be developed''
Facing quite some difficulties here, since there are (at least)
two abandoned past efforts towards building a render node network
in the code base; the structure and architecture decisions from these
previous attempts seem largely valid still, yet on a technical level,
the style of construction evolved considerably in the meantime. Moreover,
these old fragments of code, written during the early stages of the
project, were lacking clear goals and anchor points at places;
the situation is quite different now in this respect.
Sticking to well proven practice, the rework will be driven by a test setup,
and will progress over three steps with increasing levels of integration.
The initial effort of building a Scheduler can now be **considered complete**
Reaching this milestone required considerable time and effort, including
an extended series of tests to weld out obvious design and implementation flaws.
While the assessment of the new Scheduler's limitation and traits is ''far from complete,''
some basic achievements could be confirmed through this extended testing effort:
* the Scheduler is able to follow a given schedule effectively,
until close up to the load limit
* the ''stochastic load management'' causes some latency on isolated events,
in the order of magnitude < 5ms
* the Scheduler is susceptible to degradation through Contention
* as mitigation, the Scheduler prefers to reduce capacity in such a situation
* operating the Scheduler effectively thus requires a minimum job size of 2ms
* the ability for sustained operation under full nominal load has been confirmed
by performing **test sequences with over 80 seconds**
* beyond the mentioned latency (<5ms) and a typical turnaround of 100µs per job
(for debug builds), **no further significant overhead** was found.
Design, Implementation and Testing were documented extensively in the [https://lumiera.org/wiki/renderengine.html#Scheduler%20SchedulerProcessing%20SchedulerTest%20SchedulerWorker%20SchedulerMemory%20RenderActivity%20JobPlanningPipeline%20PlayProcess%20Rendering »TiddlyWiki« #Scheduler]
This test completes the stress-testing effort
and summarises the findings
* Scheduler performs within relevant parameter range without significant overhead
* Scheduler can operate with full load in stable state, with 100% correct result
The behaviour seems consistent and the schedule breaks at the expected point.
At first sight, concurrency seems slightly to low; detailed investigation
however shows that this is due to the structure of the load graph,
and in fact the run time comes close to optimal values.