Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
/*
|
|
|
|
|
|
Parse(Test) - verify parsing textual specifications
|
|
|
|
|
|
|
|
|
|
|
|
Copyright (C)
|
|
|
|
|
|
2024, Hermann Vosseler <Ichthyostega@web.de>
|
|
|
|
|
|
|
|
|
|
|
|
**Lumiera** is free software; you can redistribute it and/or modify it
|
|
|
|
|
|
under the terms of the GNU General Public License as published by the
|
|
|
|
|
|
Free Software Foundation; either version 2 of the License, or (at your
|
|
|
|
|
|
option) any later version. See the file COPYING for further details.
|
|
|
|
|
|
|
|
|
|
|
|
* *****************************************************************/
|
|
|
|
|
|
|
|
|
|
|
|
/** @file parse-test.cpp
|
|
|
|
|
|
** unit test \ref Parse_test
|
|
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
#include "lib/test/run.hpp"
|
|
|
|
|
|
#include "lib/test/test-helper.hpp"
|
|
|
|
|
|
#include "lib/parse.hpp"
|
|
|
|
|
|
//#include "lib/format-util.hpp"
|
|
|
|
|
|
#include "lib/meta/tuple-helper.hpp"
|
|
|
|
|
|
#include "lib/test/diagnostic-output.hpp"//////////////////TODO
|
|
|
|
|
|
//#include "lib/util.hpp"
|
|
|
|
|
|
|
|
|
|
|
|
//#include <vector>
|
|
|
|
|
|
//#include <memory>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
namespace util {
|
|
|
|
|
|
namespace parse{
|
|
|
|
|
|
namespace test {
|
|
|
|
|
|
|
2025-01-21 03:53:29 +01:00
|
|
|
|
using lib::test::showType;
|
Library: change DSL scheme to handle optional and repeated better
It seemed that using postfix-decorating operators would be a good fit
for the DSL. Exploring this idea further showed however, that such a scheme
is indeed a good fit from the implementation side, but leads to rather confusing
and hard to grasp DSL statements for many non-trivial syntax definition.
The reason is: such a postfix-decorator will by default work on ''everything defined''
up to that point; this is too much in many cases.
The other alternative would be a function-style definition, which has the benefit
to take the sub-clause directly as argument (so the scope is always explicit).
The downside is that argument arrangement is a bit more tricky for the repetition
combinator (there can be mis-matches, since we take the »SPEC« as free-template argument)
And, moreover, with function-style, having more top-level entrance points would
be helpful. Overall, no fundamental roadblock, just more technicalities in the setup
of the DSL functions.
With that re-arrangd structure, an optional combinator could be easily integrated,
and a solution was provided to pick up the parser function from a sub-expression
defined as Syntax object.
2025-01-23 19:48:30 +01:00
|
|
|
|
using lib::meta::typeSymbol;
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
using lib::meta::is_Tuple;
|
Library: change DSL scheme to handle optional and repeated better
It seemed that using postfix-decorating operators would be a good fit
for the DSL. Exploring this idea further showed however, that such a scheme
is indeed a good fit from the implementation side, but leads to rather confusing
and hard to grasp DSL statements for many non-trivial syntax definition.
The reason is: such a postfix-decorator will by default work on ''everything defined''
up to that point; this is too much in many cases.
The other alternative would be a function-style definition, which has the benefit
to take the sub-clause directly as argument (so the scope is always explicit).
The downside is that argument arrangement is a bit more tricky for the repetition
combinator (there can be mis-matches, since we take the »SPEC« as free-template argument)
And, moreover, with function-style, having more top-level entrance points would
be helpful. Overall, no fundamental roadblock, just more technicalities in the setup
of the DSL functions.
With that re-arrangd structure, an optional combinator could be easily integrated,
and a solution was provided to pick up the parser function from a sub-expression
defined as Syntax object.
2025-01-23 19:48:30 +01:00
|
|
|
|
using std::decay_t;
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
using std::get;
|
|
|
|
|
|
// using util::join;
|
|
|
|
|
|
// using util::isnil;
|
|
|
|
|
|
// using std::vector;
|
|
|
|
|
|
// using std::shared_ptr;
|
|
|
|
|
|
// using std::make_shared;
|
|
|
|
|
|
|
|
|
|
|
|
// using LERR_(ITER_EXHAUST);
|
|
|
|
|
|
// using LERR_(INDEX_BOUNDS);
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
namespace { // test fixture
|
|
|
|
|
|
|
|
|
|
|
|
// const uint NUM_ELMS = 10;
|
|
|
|
|
|
|
|
|
|
|
|
// using Numz = vector<uint>;
|
|
|
|
|
|
|
|
|
|
|
|
} // (END)fixture
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
/************************************************************************//**
|
|
|
|
|
|
* @test verify helpers and shortcuts for simple recursive descent parsing
|
|
|
|
|
|
* of structured data and specifications.
|
|
|
|
|
|
*
|
|
|
|
|
|
* @see parse.hpp
|
|
|
|
|
|
* @see proc-node.cpp "usage example"
|
|
|
|
|
|
*/
|
|
|
|
|
|
class Parse_test : public Test
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
|
|
|
|
virtual void
|
|
|
|
|
|
run (Arg)
|
|
|
|
|
|
{
|
|
|
|
|
|
simpleBlah();
|
|
|
|
|
|
acceptTerminal();
|
|
|
|
|
|
acceptSequential();
|
2025-01-19 23:11:25 +01:00
|
|
|
|
acceptAlternatives();
|
2025-01-22 16:42:28 +01:00
|
|
|
|
acceptIterWithDelim();
|
Library: change DSL scheme to handle optional and repeated better
It seemed that using postfix-decorating operators would be a good fit
for the DSL. Exploring this idea further showed however, that such a scheme
is indeed a good fit from the implementation side, but leads to rather confusing
and hard to grasp DSL statements for many non-trivial syntax definition.
The reason is: such a postfix-decorator will by default work on ''everything defined''
up to that point; this is too much in many cases.
The other alternative would be a function-style definition, which has the benefit
to take the sub-clause directly as argument (so the scope is always explicit).
The downside is that argument arrangement is a bit more tricky for the repetition
combinator (there can be mis-matches, since we take the »SPEC« as free-template argument)
And, moreover, with function-style, having more top-level entrance points would
be helpful. Overall, no fundamental roadblock, just more technicalities in the setup
of the DSL functions.
With that re-arrangd structure, an optional combinator could be easily integrated,
and a solution was provided to pick up the parser function from a sub-expression
defined as Syntax object.
2025-01-23 19:48:30 +01:00
|
|
|
|
acceptOptionally();
|
|
|
|
|
|
acceptBracketed();
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
/** @test TODO just blah. */
|
|
|
|
|
|
void
|
|
|
|
|
|
simpleBlah ()
|
|
|
|
|
|
{
|
|
|
|
|
|
}
|
|
|
|
|
|
|
2025-01-18 22:18:44 +01:00
|
|
|
|
|
|
|
|
|
|
/** @test define a terminal symbol to match by parse. */
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
void
|
|
|
|
|
|
acceptTerminal()
|
|
|
|
|
|
{
|
|
|
|
|
|
// set up a parser function to accept some token as terminal
|
|
|
|
|
|
auto parse = Parser{"hello (\\w+) world"};
|
|
|
|
|
|
string toParse{"hello vile world of power"};
|
|
|
|
|
|
auto eval = parse (toParse);
|
|
|
|
|
|
CHECK (eval.result);
|
2025-01-18 22:18:44 +01:00
|
|
|
|
auto res = *eval.result; // ◁——————————— the »result model« of a terminal parse is the RegExp-Matcher
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
CHECK (res.ready() and not res.empty());
|
|
|
|
|
|
CHECK (res.size() == "2"_expect );
|
|
|
|
|
|
CHECK (res.position() == "0"_expect );
|
|
|
|
|
|
CHECK (res.str() == "hello vile world"_expect );
|
|
|
|
|
|
CHECK (res[1] == "vile"_expect );
|
|
|
|
|
|
CHECK (res.suffix() == " of power"_expect );
|
|
|
|
|
|
|
2025-01-18 22:18:44 +01:00
|
|
|
|
auto syntax = Syntax{move (parse)}; // Build a syntax clause from the simple terminal symbol parser
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
CHECK (not syntax.hasResult());
|
|
|
|
|
|
syntax.parse (toParse);
|
2025-01-18 22:18:44 +01:00
|
|
|
|
CHECK (syntax.success()); // Syntax clause holds an implicit state from the last parse
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
CHECK (syntax.getResult()[1] == "vile"_expect);
|
|
|
|
|
|
|
|
|
|
|
|
// shorthand notation to start building a syntax
|
|
|
|
|
|
auto syntax2 = accept ("(\\w+) world");
|
|
|
|
|
|
CHECK (not syntax2.hasResult());
|
|
|
|
|
|
syntax2.parse (toParse);
|
|
|
|
|
|
CHECK (not syntax2.success());
|
Library: change DSL scheme to handle optional and repeated better
It seemed that using postfix-decorating operators would be a good fit
for the DSL. Exploring this idea further showed however, that such a scheme
is indeed a good fit from the implementation side, but leads to rather confusing
and hard to grasp DSL statements for many non-trivial syntax definition.
The reason is: such a postfix-decorator will by default work on ''everything defined''
up to that point; this is too much in many cases.
The other alternative would be a function-style definition, which has the benefit
to take the sub-clause directly as argument (so the scope is always explicit).
The downside is that argument arrangement is a bit more tricky for the repetition
combinator (there can be mis-matches, since we take the »SPEC« as free-template argument)
And, moreover, with function-style, having more top-level entrance points would
be helpful. Overall, no fundamental roadblock, just more technicalities in the setup
of the DSL functions.
With that re-arrangd structure, an optional combinator could be easily integrated,
and a solution was provided to pick up the parser function from a sub-expression
defined as Syntax object.
2025-01-23 19:48:30 +01:00
|
|
|
|
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
string bye{"cruel world"};
|
|
|
|
|
|
syntax2.parse (bye);
|
|
|
|
|
|
CHECK (syntax2.success());
|
|
|
|
|
|
CHECK (syntax2.getResult()[1] == "cruel"_expect);
|
|
|
|
|
|
|
Library: change DSL scheme to handle optional and repeated better
It seemed that using postfix-decorating operators would be a good fit
for the DSL. Exploring this idea further showed however, that such a scheme
is indeed a good fit from the implementation side, but leads to rather confusing
and hard to grasp DSL statements for many non-trivial syntax definition.
The reason is: such a postfix-decorator will by default work on ''everything defined''
up to that point; this is too much in many cases.
The other alternative would be a function-style definition, which has the benefit
to take the sub-clause directly as argument (so the scope is always explicit).
The downside is that argument arrangement is a bit more tricky for the repetition
combinator (there can be mis-matches, since we take the »SPEC« as free-template argument)
And, moreover, with function-style, having more top-level entrance points would
be helpful. Overall, no fundamental roadblock, just more technicalities in the setup
of the DSL functions.
With that re-arrangd structure, an optional combinator could be easily integrated,
and a solution was provided to pick up the parser function from a sub-expression
defined as Syntax object.
2025-01-23 19:48:30 +01:00
|
|
|
|
// Going full circle: extract Parser definition from syntax
|
|
|
|
|
|
auto parse2 = Parser{syntax2};
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
CHECK (eval.result->str(1) == "vile");
|
|
|
|
|
|
eval = parse2 (toParse);
|
|
|
|
|
|
CHECK (not eval.result);
|
|
|
|
|
|
eval = parse2 (bye);
|
|
|
|
|
|
CHECK (eval.result->str(1) == "cruel");
|
|
|
|
|
|
}
|
|
|
|
|
|
|
2025-01-18 22:18:44 +01:00
|
|
|
|
|
2025-01-22 02:21:39 +01:00
|
|
|
|
/** @test define a sequence of syntax structures to match by parse.
|
|
|
|
|
|
* - first demonstrate explicitly how the consecutive parsing works
|
|
|
|
|
|
* and how both models are combined into a product model (tuple)
|
|
|
|
|
|
* - demonstrate how leading whitespace is skipped automatically
|
|
|
|
|
|
* - then perform the same parse with a Syntax clause build with
|
|
|
|
|
|
* the `seq()` builder-DSL
|
|
|
|
|
|
* - extend this Syntax by adding a further sequential clause.
|
|
|
|
|
|
*/
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
void
|
|
|
|
|
|
acceptSequential()
|
2025-01-22 02:21:39 +01:00
|
|
|
|
{ //_______________________________________________
|
2025-01-18 22:18:44 +01:00
|
|
|
|
// Demonstration: how sequence combinator works....
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
auto term1 = buildConnex ("hello");
|
|
|
|
|
|
auto term2 = buildConnex ("world");
|
|
|
|
|
|
auto parseSeq = [&](StrView toParse)
|
|
|
|
|
|
{
|
|
|
|
|
|
using R1 = decltype(term1)::Result;
|
|
|
|
|
|
using R2 = decltype(term2)::Result;
|
|
|
|
|
|
using ProductResult = std::tuple<R1,R2>;
|
|
|
|
|
|
using ProductEval = Eval<ProductResult>;
|
|
|
|
|
|
auto eval1 = term1.parse (toParse);
|
|
|
|
|
|
if (eval1.result)
|
|
|
|
|
|
{
|
2025-01-18 00:20:24 +01:00
|
|
|
|
uint end1 = eval1.consumed;
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
StrView restInput = toParse.substr(end1);
|
|
|
|
|
|
auto eval2 = term2.parse (restInput);
|
|
|
|
|
|
if (eval2.result)
|
|
|
|
|
|
{
|
2025-01-18 00:20:24 +01:00
|
|
|
|
uint consumedOverall = end1 + eval2.consumed;
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
return ProductEval{ProductResult{move(*eval1.result)
|
2025-01-18 00:20:24 +01:00
|
|
|
|
,move(*eval2.result)}
|
|
|
|
|
|
,consumedOverall
|
|
|
|
|
|
};
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
return ProductEval{std::nullopt};
|
|
|
|
|
|
};
|
|
|
|
|
|
string s1{"hello millions"};
|
2025-01-18 22:18:44 +01:00
|
|
|
|
string s2{"hello world"};
|
|
|
|
|
|
string s3{" hello world trade "};
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
|
|
|
|
|
|
auto e1 = parseSeq(s1);
|
2025-01-18 22:18:44 +01:00
|
|
|
|
CHECK (not e1.result); // Syntax 'hello'>>'world' does not accept "hello millions"
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
auto e2 = parseSeq(s2);
|
|
|
|
|
|
CHECK ( e2.result);
|
|
|
|
|
|
|
2025-01-22 01:11:05 +01:00
|
|
|
|
using SeqRes = decltype(e2)::Result; // Note: the result type depends on the actual syntax construction
|
2025-01-18 22:18:44 +01:00
|
|
|
|
CHECK (is_Tuple<SeqRes>()); // Result model from sequence is the tuple of terminal results
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
auto& [r1,r2] = *e2.result;
|
|
|
|
|
|
CHECK (r1.str() == "hello"_expect);
|
|
|
|
|
|
CHECK (r2.str() == "world"_expect);
|
|
|
|
|
|
|
2025-01-18 22:18:44 +01:00
|
|
|
|
CHECK (term2.parse(" world").result); // Note: leading whitespace skipped by the basic terminal parsers
|
|
|
|
|
|
CHECK (term2.parse("\n \t world ").result);
|
|
|
|
|
|
CHECK (not term2.parse(" old ").result);
|
|
|
|
|
|
|
|
|
|
|
|
|
2025-01-22 02:21:39 +01:00
|
|
|
|
//___________________________________________________
|
2025-01-18 22:18:44 +01:00
|
|
|
|
// DSL parse clause builder: a sequence of terminals...
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
auto syntax = accept("hello").seq("world");
|
|
|
|
|
|
|
2025-01-18 22:18:44 +01:00
|
|
|
|
// Perform the same parse as demonstrated above....
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
CHECK (not syntax.hasResult());
|
|
|
|
|
|
syntax.parse(s1);
|
|
|
|
|
|
CHECK (not syntax.success());
|
|
|
|
|
|
syntax.parse(s2);
|
|
|
|
|
|
CHECK (syntax);
|
|
|
|
|
|
SeqRes seqModel = syntax.getResult();
|
|
|
|
|
|
CHECK (get<0>(seqModel).str() == "hello"_expect);
|
|
|
|
|
|
CHECK (get<1>(seqModel).str() == "world"_expect);
|
2025-01-18 00:20:24 +01:00
|
|
|
|
|
2025-01-18 22:18:44 +01:00
|
|
|
|
|
|
|
|
|
|
// can build extended clause from existing one
|
2025-01-18 00:20:24 +01:00
|
|
|
|
auto syntax2 = syntax.seq("trade");
|
|
|
|
|
|
CHECK (not syntax2.hasResult());
|
|
|
|
|
|
syntax2.parse(s2);
|
|
|
|
|
|
CHECK (not syntax2.success());
|
|
|
|
|
|
syntax2.parse(s3);
|
|
|
|
|
|
CHECK (syntax2.success());
|
2025-01-18 22:18:44 +01:00
|
|
|
|
auto seqModel2 = syntax2.getResult(); // Note: model of consecutive sequence is flattened into a single tuple
|
2025-01-18 00:20:24 +01:00
|
|
|
|
CHECK (get<0>(seqModel2).str() == "hello"_expect);
|
|
|
|
|
|
CHECK (get<1>(seqModel2).str() == "world"_expect);
|
|
|
|
|
|
CHECK (get<2>(seqModel2).str() == "trade"_expect);
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
}
|
2025-01-19 23:11:25 +01:00
|
|
|
|
|
|
|
|
|
|
|
2025-01-22 02:21:39 +01:00
|
|
|
|
|
2025-01-22 16:42:28 +01:00
|
|
|
|
/** @test define alternative syntax structures to match by parse.
|
2025-01-21 03:53:29 +01:00
|
|
|
|
* - first demonstrate how a model with alternative branches can be
|
|
|
|
|
|
* populated and gradually extended while searching for a match.
|
2025-01-22 02:21:39 +01:00
|
|
|
|
* - then show explicitly the logic to check and select branches
|
|
|
|
|
|
* and construct the corresponding sum-model (variant)
|
2025-01-21 03:53:29 +01:00
|
|
|
|
*/
|
2025-01-19 23:11:25 +01:00
|
|
|
|
void
|
|
|
|
|
|
acceptAlternatives()
|
2025-01-22 01:11:05 +01:00
|
|
|
|
{ //_______________________________
|
|
|
|
|
|
// Demonstrate Alt-Model mechanics
|
2025-01-20 23:55:42 +01:00
|
|
|
|
using R1 = char;
|
|
|
|
|
|
using R2 = string;
|
|
|
|
|
|
using R3 = double;
|
|
|
|
|
|
|
2025-01-22 01:11:05 +01:00
|
|
|
|
// build Model-Alternatives incrementally
|
2025-01-20 23:55:42 +01:00
|
|
|
|
using A1 = AltModel<R1>;
|
2025-01-21 03:53:29 +01:00
|
|
|
|
CHECK (showType<A1>() == "parse::AltModel<char>"_expect);
|
2025-01-22 01:11:05 +01:00
|
|
|
|
|
2025-01-20 23:55:42 +01:00
|
|
|
|
using A2 = A1::Additionally<R2>;
|
2025-01-21 03:53:29 +01:00
|
|
|
|
CHECK (showType<A2>() == "parse::AltModel<char, string>"_expect);
|
|
|
|
|
|
|
2025-01-22 01:11:05 +01:00
|
|
|
|
// create instance to represent this second branch...
|
|
|
|
|
|
A2 model2 = A2::mark_right ("seduced");
|
2025-01-21 03:53:29 +01:00
|
|
|
|
CHECK (sizeof(A2) >= sizeof(string)+sizeof(size_t));
|
|
|
|
|
|
CHECK (model2.SIZ == sizeof(string));
|
2025-01-22 01:11:05 +01:00
|
|
|
|
CHECK (model2.TOP == 1);
|
2025-01-21 03:53:29 +01:00
|
|
|
|
CHECK (model2.selected() == 1);
|
2025-01-22 01:11:05 +01:00
|
|
|
|
CHECK (model2.get<1>() == "seduced");
|
2025-01-21 03:53:29 +01:00
|
|
|
|
|
2025-01-20 23:55:42 +01:00
|
|
|
|
using A3 = A2::Additionally<R3>;
|
2025-01-22 01:11:05 +01:00
|
|
|
|
A3 model3 = A3::mark_left (move (model2));
|
2025-01-21 03:53:29 +01:00
|
|
|
|
CHECK (showType<A3>() == "parse::AltModel<char, string, double>"_expect);
|
|
|
|
|
|
CHECK (sizeof(A3) == sizeof(A2));
|
2025-01-22 01:11:05 +01:00
|
|
|
|
CHECK (model3.TOP == 2);
|
2025-01-21 03:53:29 +01:00
|
|
|
|
CHECK (model3.selected() == 1);
|
2025-01-22 01:11:05 +01:00
|
|
|
|
CHECK (model3.get<1>() == "seduced");
|
2025-01-21 03:53:29 +01:00
|
|
|
|
|
2025-01-20 23:55:42 +01:00
|
|
|
|
auto res = move(model3);
|
2025-01-21 03:53:29 +01:00
|
|
|
|
CHECK (showType<decltype(res)>() == "parse::AltModel<char, string, double>"_expect);
|
|
|
|
|
|
CHECK (sizeof(res) == sizeof(A2));
|
2025-01-22 01:11:05 +01:00
|
|
|
|
CHECK (res.selected() == 1);
|
|
|
|
|
|
CHECK (res.get<1>() == "seduced");
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
//_____________________________________________
|
|
|
|
|
|
// Demonstration: how branch combinator works....
|
|
|
|
|
|
auto term1 = buildConnex ("brazen");
|
|
|
|
|
|
auto term2 = buildConnex ("bragging");
|
|
|
|
|
|
auto parseAlt = [&](StrView toParse)
|
|
|
|
|
|
{
|
|
|
|
|
|
using R1 = decltype(term1)::Result;
|
|
|
|
|
|
using R2 = decltype(term2)::Result;
|
|
|
|
|
|
using SumResult = AltModel<R1,R2>;
|
|
|
|
|
|
using SumEval = Eval<SumResult>;
|
|
|
|
|
|
auto eval1 = term1.parse (toParse);
|
|
|
|
|
|
if (eval1.result)
|
|
|
|
|
|
{
|
|
|
|
|
|
uint endBranch1 = eval1.consumed;
|
|
|
|
|
|
return SumEval{SumResult::mark_left (move(*eval1.result))
|
|
|
|
|
|
,endBranch1
|
|
|
|
|
|
};
|
|
|
|
|
|
}
|
|
|
|
|
|
auto eval2 = term2.parse (toParse);
|
|
|
|
|
|
if (eval2.result)
|
|
|
|
|
|
{
|
|
|
|
|
|
uint endBranch2 = eval2.consumed;
|
|
|
|
|
|
return SumEval{SumResult::mark_right (move(*eval2.result))
|
|
|
|
|
|
,endBranch2
|
|
|
|
|
|
};
|
|
|
|
|
|
}
|
|
|
|
|
|
return SumEval{std::nullopt};
|
|
|
|
|
|
};
|
|
|
|
|
|
string s1{"decent contender"};
|
|
|
|
|
|
string s2{"brazen dicktator"};
|
|
|
|
|
|
|
|
|
|
|
|
auto e1 = parseAlt(s1);
|
|
|
|
|
|
CHECK (not e1.result); // does not compute....
|
|
|
|
|
|
auto e2 = parseAlt(s2); // one hell of a match!
|
|
|
|
|
|
CHECK ( e2.result);
|
|
|
|
|
|
CHECK (e2.result->selected() == 0); // Selector-ID of the first matching branch (here #0)
|
|
|
|
|
|
CHECK (e2.result->get<0>().str() == "brazen"); // We know that branch#0 holds a RegExp-Matcher (from term1)
|
|
|
|
|
|
CHECK (e2.result->get<0>().suffix() == " dicktator");
|
2025-01-22 02:21:39 +01:00
|
|
|
|
CHECK (e2.consumed == 6);
|
|
|
|
|
|
CHECK (s2.substr(e2.consumed) == " dicktator");
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
//________________________________________________
|
|
|
|
|
|
// DSL parse clause builder: alternative branches...
|
|
|
|
|
|
auto syntax = accept("brazen").alt("bragging");
|
|
|
|
|
|
|
|
|
|
|
|
// Perform the same parse as demonstrated above....
|
|
|
|
|
|
CHECK (not syntax.hasResult());
|
|
|
|
|
|
syntax.parse(s1);
|
|
|
|
|
|
CHECK (not syntax.success());
|
|
|
|
|
|
syntax.parse(s2);
|
|
|
|
|
|
CHECK (syntax);
|
|
|
|
|
|
auto altModel = syntax.getResult();
|
|
|
|
|
|
CHECK (altModel.selected() == 0);
|
|
|
|
|
|
CHECK (altModel.get<0>().str() == "brazen");
|
|
|
|
|
|
|
|
|
|
|
|
// can build extended clause from existing one
|
|
|
|
|
|
auto syntax2 = syntax.alt("smarmy (\\w+)");
|
|
|
|
|
|
CHECK (not syntax2.hasResult());
|
|
|
|
|
|
syntax2.parse(s1);
|
|
|
|
|
|
CHECK (not syntax2.success());
|
|
|
|
|
|
syntax2.parse(s2);
|
|
|
|
|
|
CHECK (syntax2.success());
|
2025-01-22 16:42:28 +01:00
|
|
|
|
CHECK (syntax2.getResult().N == 2); // Note: further branch has been folded into an extended AltModel
|
2025-01-22 02:21:39 +01:00
|
|
|
|
CHECK (syntax2.getResult().selected() == 0); // ... string s2 still matched the same branch (#0)
|
|
|
|
|
|
CHECK (syntax2.getResult().get<0>().str() == "brazen");
|
|
|
|
|
|
|
|
|
|
|
|
syntax2.parse("smarmy saviour");
|
|
|
|
|
|
CHECK (syntax2.success());
|
|
|
|
|
|
auto altModel2 = syntax2.getResult();
|
|
|
|
|
|
CHECK (syntax2.getResult().selected() == 2); // ... but another string can match the added branch #2
|
|
|
|
|
|
CHECK (syntax2.getResult().get<2>().str() == "smarmy saviour");
|
|
|
|
|
|
CHECK (syntax2.getResult().get<2>().str(1) == "saviour");
|
|
|
|
|
|
} // Note: syntax for this branch #2 captured an additional word
|
2025-01-22 16:42:28 +01:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Library: change DSL scheme to handle optional and repeated better
It seemed that using postfix-decorating operators would be a good fit
for the DSL. Exploring this idea further showed however, that such a scheme
is indeed a good fit from the implementation side, but leads to rather confusing
and hard to grasp DSL statements for many non-trivial syntax definition.
The reason is: such a postfix-decorator will by default work on ''everything defined''
up to that point; this is too much in many cases.
The other alternative would be a function-style definition, which has the benefit
to take the sub-clause directly as argument (so the scope is always explicit).
The downside is that argument arrangement is a bit more tricky for the repetition
combinator (there can be mis-matches, since we take the »SPEC« as free-template argument)
And, moreover, with function-style, having more top-level entrance points would
be helpful. Overall, no fundamental roadblock, just more technicalities in the setup
of the DSL functions.
With that re-arrangd structure, an optional combinator could be easily integrated,
and a solution was provided to pick up the parser function from a sub-expression
defined as Syntax object.
2025-01-23 19:48:30 +01:00
|
|
|
|
/** @test define repetitive sequence with delimiter
|
2025-01-22 16:42:28 +01:00
|
|
|
|
* - demonstrate how actually to accept such a flexible sequence
|
Library: change DSL scheme to handle optional and repeated better
It seemed that using postfix-decorating operators would be a good fit
for the DSL. Exploring this idea further showed however, that such a scheme
is indeed a good fit from the implementation side, but leads to rather confusing
and hard to grasp DSL statements for many non-trivial syntax definition.
The reason is: such a postfix-decorator will by default work on ''everything defined''
up to that point; this is too much in many cases.
The other alternative would be a function-style definition, which has the benefit
to take the sub-clause directly as argument (so the scope is always explicit).
The downside is that argument arrangement is a bit more tricky for the repetition
combinator (there can be mis-matches, since we take the »SPEC« as free-template argument)
And, moreover, with function-style, having more top-level entrance points would
be helpful. Overall, no fundamental roadblock, just more technicalities in the setup
of the DSL functions.
With that re-arrangd structure, an optional combinator could be easily integrated,
and a solution was provided to pick up the parser function from a sub-expression
defined as Syntax object.
2025-01-23 19:48:30 +01:00
|
|
|
|
* - cover integration into the syntax clause DSL
|
|
|
|
|
|
* - repetition count and delimiter
|
2025-01-22 16:42:28 +01:00
|
|
|
|
*/
|
|
|
|
|
|
void
|
|
|
|
|
|
acceptIterWithDelim()
|
|
|
|
|
|
{ //_______________________________________________
|
|
|
|
|
|
// Demonstration: how repetitive sequence works....
|
|
|
|
|
|
auto sep = buildConnex (",");
|
|
|
|
|
|
auto term = buildConnex ("\\w+");
|
|
|
|
|
|
auto parseSeq = [&](StrView toParse)
|
|
|
|
|
|
{
|
|
|
|
|
|
using Res = decltype(term)::Result;
|
|
|
|
|
|
using IterResult = std::vector<Res>;
|
|
|
|
|
|
using IterEval = Eval<IterResult>;
|
|
|
|
|
|
uint consumed{0};
|
|
|
|
|
|
IterResult results;
|
|
|
|
|
|
auto hasResults = [&]{ return not results.empty(); };
|
|
|
|
|
|
while (true)
|
|
|
|
|
|
{
|
|
|
|
|
|
uint offset{0};
|
|
|
|
|
|
if (hasResults())
|
|
|
|
|
|
{
|
|
|
|
|
|
auto delim = sep.parse (toParse);
|
|
|
|
|
|
if (not delim.result)
|
|
|
|
|
|
break;
|
|
|
|
|
|
offset += delim.consumed;
|
|
|
|
|
|
}
|
|
|
|
|
|
auto eval = term.parse (toParse.substr(offset));
|
|
|
|
|
|
if (not eval.result)
|
|
|
|
|
|
break;
|
|
|
|
|
|
offset += eval.consumed;
|
|
|
|
|
|
results.emplace_back (move(*eval.result));
|
|
|
|
|
|
toParse = toParse.substr(offset);
|
|
|
|
|
|
consumed += offset;
|
|
|
|
|
|
}
|
|
|
|
|
|
return hasResults()? IterEval{move(results), consumed}
|
|
|
|
|
|
: IterEval{std::nullopt};
|
|
|
|
|
|
};
|
2025-01-22 22:31:25 +01:00
|
|
|
|
string s1{"seid umschlungen, Millionen"};
|
2025-01-22 16:42:28 +01:00
|
|
|
|
string s2{"beguile, extort, profit"};
|
|
|
|
|
|
|
|
|
|
|
|
auto e1 = parseSeq(s1);
|
|
|
|
|
|
CHECK (e1.result);
|
|
|
|
|
|
CHECK (e1.result->size() == 1);
|
2025-01-22 22:31:25 +01:00
|
|
|
|
CHECK (e1.result->at(0).str() == "seid");
|
2025-01-22 16:42:28 +01:00
|
|
|
|
CHECK (e1.result->at(0).suffix() == " umschlungen, Millionen");
|
|
|
|
|
|
CHECK (e1.consumed == 4);
|
|
|
|
|
|
|
|
|
|
|
|
auto e2 = parseSeq(s2);
|
|
|
|
|
|
CHECK (e2.result);
|
|
|
|
|
|
CHECK (e2.result->size() == 3);
|
|
|
|
|
|
CHECK (e2.result->at(0).str() == "beguile");
|
|
|
|
|
|
CHECK (e2.result->at(1).str() == "extort" );
|
|
|
|
|
|
CHECK (e2.result->at(2).str() == "profit" );
|
|
|
|
|
|
CHECK (e2.result->at(0).suffix() == ", extort, profit");
|
|
|
|
|
|
CHECK (e2.result->at(1).suffix() == ", profit");
|
|
|
|
|
|
CHECK (e2.result->at(2).suffix() == "" );
|
|
|
|
|
|
CHECK (e2.consumed == s2.length());
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
//______________________________________________
|
|
|
|
|
|
// DSL parse clause builder: iterative sequence...
|
Library: change DSL scheme to handle optional and repeated better
It seemed that using postfix-decorating operators would be a good fit
for the DSL. Exploring this idea further showed however, that such a scheme
is indeed a good fit from the implementation side, but leads to rather confusing
and hard to grasp DSL statements for many non-trivial syntax definition.
The reason is: such a postfix-decorator will by default work on ''everything defined''
up to that point; this is too much in many cases.
The other alternative would be a function-style definition, which has the benefit
to take the sub-clause directly as argument (so the scope is always explicit).
The downside is that argument arrangement is a bit more tricky for the repetition
combinator (there can be mis-matches, since we take the »SPEC« as free-template argument)
And, moreover, with function-style, having more top-level entrance points would
be helpful. Overall, no fundamental roadblock, just more technicalities in the setup
of the DSL functions.
With that re-arrangd structure, an optional combinator could be easily integrated,
and a solution was provided to pick up the parser function from a sub-expression
defined as Syntax object.
2025-01-23 19:48:30 +01:00
|
|
|
|
auto syntax1 = accept_repeated(",", term);
|
2025-01-22 22:31:25 +01:00
|
|
|
|
|
|
|
|
|
|
// Perform the same parse as demonstrated above....
|
|
|
|
|
|
CHECK (not syntax1.hasResult());
|
|
|
|
|
|
syntax1.parse(s1);
|
|
|
|
|
|
CHECK (syntax1.success());
|
|
|
|
|
|
auto res1 = syntax1.getResult();
|
|
|
|
|
|
CHECK (res1.size() == 1);
|
|
|
|
|
|
CHECK (res1.get(0).str() == "seid");
|
|
|
|
|
|
|
|
|
|
|
|
syntax1.parse(s2);
|
|
|
|
|
|
CHECK (syntax1.success());
|
|
|
|
|
|
res1 = syntax1.getResult();
|
|
|
|
|
|
CHECK (res1.size() == 3);
|
|
|
|
|
|
CHECK (res1[0].str() == "beguile");
|
|
|
|
|
|
CHECK (res1[1].str() == "extort" );
|
|
|
|
|
|
CHECK (res1[2].str() == "profit" );
|
|
|
|
|
|
|
Library: change DSL scheme to handle optional and repeated better
It seemed that using postfix-decorating operators would be a good fit
for the DSL. Exploring this idea further showed however, that such a scheme
is indeed a good fit from the implementation side, but leads to rather confusing
and hard to grasp DSL statements for many non-trivial syntax definition.
The reason is: such a postfix-decorator will by default work on ''everything defined''
up to that point; this is too much in many cases.
The other alternative would be a function-style definition, which has the benefit
to take the sub-clause directly as argument (so the scope is always explicit).
The downside is that argument arrangement is a bit more tricky for the repetition
combinator (there can be mis-matches, since we take the »SPEC« as free-template argument)
And, moreover, with function-style, having more top-level entrance points would
be helpful. Overall, no fundamental roadblock, just more technicalities in the setup
of the DSL functions.
With that re-arrangd structure, an optional combinator could be easily integrated,
and a solution was provided to pick up the parser function from a sub-expression
defined as Syntax object.
2025-01-23 19:48:30 +01:00
|
|
|
|
auto syntax2 = accept_repeated(1,2,",", term);
|
|
|
|
|
|
auto syntax3 = accept_repeated( 4,",", term);
|
2025-01-22 22:31:25 +01:00
|
|
|
|
syntax2.parse(s2);
|
|
|
|
|
|
syntax3.parse(s2);
|
|
|
|
|
|
CHECK ( syntax2);
|
|
|
|
|
|
CHECK (not syntax3);
|
|
|
|
|
|
CHECK (syntax2.getResult().size() == 2);
|
|
|
|
|
|
CHECK (s2.substr(syntax2.consumed()) == ", profit");
|
|
|
|
|
|
|
|
|
|
|
|
auto sx = s2 + " , \tdump";
|
|
|
|
|
|
syntax3.parse(sx);
|
|
|
|
|
|
CHECK (syntax3);
|
|
|
|
|
|
CHECK (syntax3.getResult().size() == 4);
|
|
|
|
|
|
CHECK (syntax3.getResult()[0].str() == "beguile");
|
|
|
|
|
|
CHECK (syntax3.getResult()[1].str() == "extort" );
|
|
|
|
|
|
CHECK (syntax3.getResult()[2].str() == "profit" );
|
|
|
|
|
|
CHECK (syntax3.getResult()[3].str() == "dump" );
|
|
|
|
|
|
|
Library: change DSL scheme to handle optional and repeated better
It seemed that using postfix-decorating operators would be a good fit
for the DSL. Exploring this idea further showed however, that such a scheme
is indeed a good fit from the implementation side, but leads to rather confusing
and hard to grasp DSL statements for many non-trivial syntax definition.
The reason is: such a postfix-decorator will by default work on ''everything defined''
up to that point; this is too much in many cases.
The other alternative would be a function-style definition, which has the benefit
to take the sub-clause directly as argument (so the scope is always explicit).
The downside is that argument arrangement is a bit more tricky for the repetition
combinator (there can be mis-matches, since we take the »SPEC« as free-template argument)
And, moreover, with function-style, having more top-level entrance points would
be helpful. Overall, no fundamental roadblock, just more technicalities in the setup
of the DSL functions.
With that re-arrangd structure, an optional combinator could be easily integrated,
and a solution was provided to pick up the parser function from a sub-expression
defined as Syntax object.
2025-01-23 19:48:30 +01:00
|
|
|
|
auto syntax4 = accept_repeated(term);
|
2025-01-22 22:31:25 +01:00
|
|
|
|
syntax4.parse(s1);
|
|
|
|
|
|
CHECK (syntax4.success());
|
|
|
|
|
|
CHECK (syntax4.getResult().size() == 2);
|
|
|
|
|
|
CHECK (syntax4.getResult()[0].str() == "seid");
|
|
|
|
|
|
CHECK (syntax4.getResult()[1].str() == "umschlungen" );
|
|
|
|
|
|
CHECK (s1.substr(syntax4.consumed()) == ", Millionen");
|
2025-01-22 16:42:28 +01:00
|
|
|
|
}
|
Library: change DSL scheme to handle optional and repeated better
It seemed that using postfix-decorating operators would be a good fit
for the DSL. Exploring this idea further showed however, that such a scheme
is indeed a good fit from the implementation side, but leads to rather confusing
and hard to grasp DSL statements for many non-trivial syntax definition.
The reason is: such a postfix-decorator will by default work on ''everything defined''
up to that point; this is too much in many cases.
The other alternative would be a function-style definition, which has the benefit
to take the sub-clause directly as argument (so the scope is always explicit).
The downside is that argument arrangement is a bit more tricky for the repetition
combinator (there can be mis-matches, since we take the »SPEC« as free-template argument)
And, moreover, with function-style, having more top-level entrance points would
be helpful. Overall, no fundamental roadblock, just more technicalities in the setup
of the DSL functions.
With that re-arrangd structure, an optional combinator could be easily integrated,
and a solution was provided to pick up the parser function from a sub-expression
defined as Syntax object.
2025-01-23 19:48:30 +01:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
/** @test define compound syntax with optional sub-clause
|
|
|
|
|
|
* - use the DSL to construct a complex syntax
|
|
|
|
|
|
* - by default, several parts are implicitly sequenced
|
|
|
|
|
|
* - here we combine repeated parts with an optional clause
|
|
|
|
|
|
* - which in turn is again a compound syntax clause
|
|
|
|
|
|
* - the produced model reflects the structure of this syntax
|
|
|
|
|
|
* - result model of the optional clause is wrapped into `std::optional`
|
|
|
|
|
|
* - terminal elements produce a `std::smatch` (RegExp matcher object)
|
|
|
|
|
|
*/
|
|
|
|
|
|
void
|
|
|
|
|
|
acceptOptionally()
|
|
|
|
|
|
{
|
|
|
|
|
|
auto syntax = accept_repeated(",", "\\w+") // first we look for comma separated words
|
|
|
|
|
|
.opt(accept("and") // then (implicitly sequenced) an optional clause
|
|
|
|
|
|
.repeat("\\w+")); // ... comprising "and" followed by several words
|
|
|
|
|
|
using Model = decay_t<decltype(syntax.getResult())>;
|
|
|
|
|
|
|
|
|
|
|
|
string s1{"fearmongering, scapegoating, intimidation"};
|
|
|
|
|
|
string s2{"charisma and divine blessing"};
|
|
|
|
|
|
|
|
|
|
|
|
CHECK (not syntax.hasResult());
|
|
|
|
|
|
syntax.parse(s1);
|
|
|
|
|
|
CHECK (syntax.success());
|
|
|
|
|
|
|
|
|
|
|
|
Model res1 = syntax.getResult();
|
|
|
|
|
|
CHECK (typeSymbol(res1) == "SeqModel");
|
|
|
|
|
|
CHECK (typeSymbol(res1.get<0>()) == "IterModel");
|
|
|
|
|
|
CHECK (typeSymbol(res1.get<1>()) == "optional");
|
|
|
|
|
|
|
|
|
|
|
|
CHECK (res1.N == 2); // 2-component tuple at top
|
|
|
|
|
|
CHECK (res1.get<0>().size() == 3); // sequence in 1st component matched 3 elements
|
|
|
|
|
|
CHECK (res1.get<0>()[0].str() == "fearmongering"); // elements in the sequence...
|
|
|
|
|
|
CHECK (res1.get<0>()[1].str() == "scapegoating");
|
|
|
|
|
|
CHECK (res1.get<0>()[2].str() == "intimidation");
|
|
|
|
|
|
CHECK (res1.get<1>() == std::nullopt); // the optional clause did not match
|
|
|
|
|
|
|
|
|
|
|
|
syntax.parse(s2);
|
|
|
|
|
|
CHECK (syntax.success());
|
|
|
|
|
|
|
|
|
|
|
|
Model res2 = syntax.getResult();
|
|
|
|
|
|
CHECK (typeSymbol(res2) == "SeqModel"); // Syntax SeqModel
|
|
|
|
|
|
CHECK (typeSymbol(res2.get<0>()) == "IterModel"); // repeat(word) opt IterModel optional
|
|
|
|
|
|
CHECK (typeSymbol(res2.get<1>()) == "optional"); // | |
|
|
|
|
|
|
CHECK (typeSymbol(*res2.get<1>()) == "SeqModel"); // Syntax SeqModel
|
|
|
|
|
|
CHECK (typeSymbol(res2.get<1>()->get<0>()) == "match_results"); // "and" repeat(word) Terminal IterModel
|
|
|
|
|
|
CHECK (typeSymbol(res2.get<1>()->get<1>()) == "IterModel"); //
|
|
|
|
|
|
|
|
|
|
|
|
CHECK (res2.get<0>().size() == 1);
|
|
|
|
|
|
CHECK (res2.get<0>()[0].str() == "charisma");
|
|
|
|
|
|
CHECK (res2.get<1>() != std::nullopt);
|
|
|
|
|
|
CHECK (res2.get<1>()->N == 2);
|
|
|
|
|
|
CHECK (res2.get<1>()->get<0>().str() == "and");
|
|
|
|
|
|
CHECK (res2.get<1>()->get<1>().size() == 2 );
|
|
|
|
|
|
CHECK (res2.get<1>()->get<1>()[0].str() == "divine" );
|
|
|
|
|
|
CHECK (res2.get<1>()->get<1>()[1].str() == "blessing" );
|
|
|
|
|
|
|
|
|
|
|
|
string s3{s1+" , "+s2};
|
|
|
|
|
|
syntax.parse(s3);
|
|
|
|
|
|
CHECK (syntax.success());
|
|
|
|
|
|
|
|
|
|
|
|
Model res3 = syntax.getResult();
|
|
|
|
|
|
CHECK (typeSymbol(res3) == "SeqModel");
|
|
|
|
|
|
CHECK (res3.get<0>().size() == 4);
|
|
|
|
|
|
CHECK (res3.get<0>()[0].str() == "fearmongering");
|
|
|
|
|
|
CHECK (res3.get<0>()[1].str() == "scapegoating");
|
|
|
|
|
|
CHECK (res3.get<0>()[2].str() == "intimidation");
|
|
|
|
|
|
CHECK (res3.get<0>()[3].str() == "charisma");
|
|
|
|
|
|
CHECK (res3.get<1>() != std::nullopt);
|
|
|
|
|
|
CHECK (res3.get<1>()->N == 2);
|
|
|
|
|
|
CHECK (res3.get<1>()->get<0>().str() == "and");
|
|
|
|
|
|
CHECK (res3.get<1>()->get<1>().size() == 2);
|
|
|
|
|
|
CHECK (res3.get<1>()->get<1>()[0].str() == "divine");
|
|
|
|
|
|
CHECK (res3.get<1>()->get<1>()[1].str() == "blessing");
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
2025-01-24 01:41:55 +01:00
|
|
|
|
/** @test define syntax with bracketed sub-expressions */
|
Library: change DSL scheme to handle optional and repeated better
It seemed that using postfix-decorating operators would be a good fit
for the DSL. Exploring this idea further showed however, that such a scheme
is indeed a good fit from the implementation side, but leads to rather confusing
and hard to grasp DSL statements for many non-trivial syntax definition.
The reason is: such a postfix-decorator will by default work on ''everything defined''
up to that point; this is too much in many cases.
The other alternative would be a function-style definition, which has the benefit
to take the sub-clause directly as argument (so the scope is always explicit).
The downside is that argument arrangement is a bit more tricky for the repetition
combinator (there can be mis-matches, since we take the »SPEC« as free-template argument)
And, moreover, with function-style, having more top-level entrance points would
be helpful. Overall, no fundamental roadblock, just more technicalities in the setup
of the DSL functions.
With that re-arrangd structure, an optional combinator could be easily integrated,
and a solution was provided to pick up the parser function from a sub-expression
defined as Syntax object.
2025-01-23 19:48:30 +01:00
|
|
|
|
void
|
|
|
|
|
|
acceptBracketed()
|
|
|
|
|
|
{
|
2025-01-24 01:41:55 +01:00
|
|
|
|
string word{"\\w+"};
|
|
|
|
|
|
|
|
|
|
|
|
CHECK (not accept(word).bracket(word) .parse("so sad"));
|
|
|
|
|
|
CHECK ( accept(word).bracketOpt(word).parse("so sad"));
|
|
|
|
|
|
CHECK ( accept(word).bracketOpt(word).parse("so (sad)"));
|
|
|
|
|
|
|
|
|
|
|
|
CHECK (accept_bracket(word).parse(" ( again ) ").getResult().str() == "again");
|
|
|
|
|
|
|
|
|
|
|
|
CHECK (not accept_bracket(word) .parse("(again"));
|
|
|
|
|
|
CHECK (not accept_bracketOpt(word).parse("(again"));
|
|
|
|
|
|
CHECK ( accept_bracketOpt(word).parse("again)")); // just stops before the trailing ')'
|
|
|
|
|
|
CHECK ( accept_bracketOpt(word).parse("again)").consumed() == 5);
|
|
|
|
|
|
CHECK ( accept_bracketOpt(word).parse(" again")); // backtracks also over the whitespace
|
|
|
|
|
|
|
|
|
|
|
|
CHECK (not accept_bracket("[]",word).parse("(again)"));
|
|
|
|
|
|
CHECK (not accept_bracket("[]",word).parse("[again)"));
|
|
|
|
|
|
CHECK (not accept_bracket("[]",word).parse("(again]"));
|
|
|
|
|
|
CHECK ( accept_bracket("[]",word).parse("[again]"));
|
|
|
|
|
|
CHECK ( accept_bracket("a","n","...").parse("again")); // arbitrary expressions for open / close
|
|
|
|
|
|
CHECK (not accept_bracket("a","n","...").parse(" gain")); // opening expression "a" missing
|
|
|
|
|
|
CHECK (not accept_bracket("a","n", word).parse("again")); // "\\w+" consumes eagerly => closing expression not found
|
Library: change DSL scheme to handle optional and repeated better
It seemed that using postfix-decorating operators would be a good fit
for the DSL. Exploring this idea further showed however, that such a scheme
is indeed a good fit from the implementation side, but leads to rather confusing
and hard to grasp DSL statements for many non-trivial syntax definition.
The reason is: such a postfix-decorator will by default work on ''everything defined''
up to that point; this is too much in many cases.
The other alternative would be a function-style definition, which has the benefit
to take the sub-clause directly as argument (so the scope is always explicit).
The downside is that argument arrangement is a bit more tricky for the repetition
combinator (there can be mis-matches, since we take the »SPEC« as free-template argument)
And, moreover, with function-style, having more top-level entrance points would
be helpful. Overall, no fundamental roadblock, just more technicalities in the setup
of the DSL functions.
With that re-arrangd structure, an optional combinator could be easily integrated,
and a solution was provided to pick up the parser function from a sub-expression
defined as Syntax object.
2025-01-23 19:48:30 +01:00
|
|
|
|
}
|
Library: need support for specification parsing
Unfortunately, there are some common syntactic structures, which can not easily be dissected by regular expressions alone, since they entail nested subexpressions. While it is possible to get beyond those fundamental limitations with some trickery, doing so remains precisely that, ''trickery.''
After fighting some inner conflicts, since ''I do know how to write a parser'' —
in the end I have brought myself to just do it.
And indeed, as you'd might expect, I have looked into existing library solutions,
and I would not like to have any one of them as part of the project.
* I do not want a ''parser engine'' or ''parser generator''
* I want the directness of recursive-descent, but combined with Regular Expressions as terminal
* I want to see the structure of the used grammar at the definition site of the custom parser function
* I want deep integration of ''model bindings'' into the parse process, i.e. binding-λ
* I do not want to write model-dissecting or pattern-matching code after the parse
* I do not want to expose ''Monads'' as an interface, since they tend to spread unhealthy structure to surrounding code
* I do not want to leak technicalities of the parse mechanics into the using code
* I do not want to impose hard to remember specific conventions onto the user
Thus I've set the following aims:
* The usage should require only a single header include (ideally header-only)
* The entrance point should be a small number of DSL-starter functions
* The parser shall be implemented by recursive-descent, using the parser-combinator technique
* But I want that wrapped into a DSL, to be able to control what is (not) provided or exposed.
* I want a stateful, applicative logic, since parsing, by its very nature, is stateful!
* I want complete compile-time typing, visible to the optimiser, without a virtual »Parser« interface
And last but not least, ''I do not want to create a ticket, since I do not know if those goals can be achieved...''
2025-01-17 18:40:44 +01:00
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
|
|
LAUNCHER (Parse_test, "unit common");
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
}}} // namespace util::parse::test
|
|
|
|
|
|
|