- [/==============================================================================
- Copyright (C) 2001-2018 Joel de Guzman
- Distributed under the Boost Software License, Version 1.0. (See accompanying
- file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- I would like to thank Rainbowverse, llc (https://primeorbial.com/)
- for sponsoring this work and donating it to the community.
- ===============================================================================/]
- [section:annotation Annotations - Decorating the ASTs]
- As a prerequisite to understanding this tutorial, please review the previous
- [tutorial_employee employee example]. This example builds on top of that
- example.
- Stop and think about it... We have actually been generating ASTs (abstract syntax
- trees) in our previous examples. We parsed a single structure and generated
- an in-memory representation of it in the form of a struct: the struct
- employee. If we changed the implementation to parse one or more employees,
- the result would be a std::vector<employee>. We can go on and add more
- hierarchy: teams, departments, corporations, etc. We can have an AST
- representation of it all.
- This example shows how to annotate the AST with iterator positions, using a
- client-supplied `on_success` handler, so that the source code remains
- accessible during post-processing. It shows how to get the position in the
- input stream that corresponds to a given element in the AST.
- In addition, this example shows how to "inject" client data, using the
- `with` directive, that the `on_success` handler can access through the
- parser's context when it is called during the parse traversal.
- The full cpp file for this example can be found here:
- [@../../../example/x3/annotation.cpp annotation.cpp]
- [heading The AST]
- First, we'll update our previous employee struct, this time separating the
- person into its own struct. We now have two structs: `person` and
- `employee`. Note that both now inherit from `x3::position_tagged`, which
- provides the positional information we can use to locate the AST element in
- the input stream at any time.
- namespace client { namespace ast
- {
-     struct person : x3::position_tagged
-     {
-         person(
-             std::string const& first_name = ""
-           , std::string const& last_name = ""
-         )
-           : first_name(first_name)
-           , last_name(last_name)
-         {}
-         std::string first_name, last_name;
-     };
-     struct employee : x3::position_tagged
-     {
-         int age;
-         person who;
-         double salary;
-     };
- }}
- Like before, we need to tell __fusion__ about our structs to make them
- first-class fusion citizens that the grammar can utilize:
- BOOST_FUSION_ADAPT_STRUCT(client::ast::person,
-     first_name, last_name
- )
- BOOST_FUSION_ADAPT_STRUCT(client::ast::employee,
-     age, who, salary
- )
- [heading x3::position_cache]
- Before we proceed, let me introduce a helper class called the
- `position_cache`. It is a simple class that collects iterator ranges that
- point to where each element in the AST is located in the input stream. Given
- an AST, you can query the `position_cache` about the AST's position. For example:
- auto pos = positions.position_of(my_ast);
- Where `my_ast` is the AST and `positions` is the `position_cache`,
- `position_of` returns an iterator range that points to the start and end
- (`pos.begin()` and `pos.end()`) positions where the AST was parsed from.
- `positions.begin()` and `positions.end()` point to the start and end of the
- entire input stream.
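To make the idea concrete, here is a minimal sketch of what such a cache could look like, written in plain C++ without X3. The names `position_tagged_sketch` and `position_cache_sketch` are hypothetical stand-ins, and the real `x3::position_cache` may differ in detail. The sketch only illustrates the principle: iterators are stored in a container and the AST node keeps two indices into it.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for x3::position_tagged: two indices into the cache.
struct position_tagged_sketch
{
    int id_first = -1;   // index of the start iterator, -1 if not annotated
    int id_last = -1;    // index of the end iterator, -1 if not annotated
};

// Hypothetical stand-in for x3::position_cache (illustration only).
template <typename Iterator>
class position_cache_sketch
{
public:
    position_cache_sketch(Iterator first, Iterator last)
      : first_(first), last_(last) {}

    // Record where `node` was parsed from and store the indices in the node.
    void annotate(position_tagged_sketch& node, Iterator first, Iterator last)
    {
        node.id_first = static_cast<int>(positions_.size());
        positions_.push_back(first);
        node.id_last = static_cast<int>(positions_.size());
        positions_.push_back(last);
    }

    // Return the [start, end) iterators recorded for `node`.
    std::pair<Iterator, Iterator>
    position_of(position_tagged_sketch const& node) const
    {
        assert(node.id_first >= 0 && node.id_last >= 0); // must be annotated
        return { positions_[node.id_first], positions_[node.id_last] };
    }

    // Start and end of the entire input.
    Iterator begin() const { return first_; }
    Iterator end() const { return last_; }

private:
    std::vector<Iterator> positions_;
    Iterator first_, last_;
};
```

Because the node stores only two integers, annotating an AST element stays cheap no matter how large the input is.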
- [heading on_success]
- The `on_success` handler gives you everything you want from semantic actions without
- the visual clutter. Declarative code can and should be free from imperative
- code. `on_success` as a concept and mechanism is an important departure from
- how things are done in Spirit's previous version: Qi.
- As demonstrated in the previous [tutorial_employee employee example], the
- preferred way to extract data from an input source is by having the parser
- collect the data for us into C++ structs as it traverses the input stream.
- Ideally, Spirit X3 grammars are fully attributed and declared in such a way
- that you do not have to add any imperative code and there should be no need
- for semantic actions at all. The parser simply works as declared and you get
- your data back as a result.
- However, there are certain cases where there's no way to avoid introducing
- imperative code. But semantic actions mess up our clean declarative grammars.
- If we care to keep our code clean, `on_success` handlers are alternative
- callback hooks to client code that are executed by the parser after a
- successful parse without polluting the grammar. Like semantic actions,
- `on_success` handlers have access to the AST, the iterators, and context.
- But, unlike semantic actions, `on_success` handlers are cleanly separated
- from the actual grammar.
- [heading Annotation Handler]
- As discussed, we annotate the AST with its position in the input stream with
- our `on_success` handler:
- // tag used to get the position cache from the context
- struct position_cache_tag;
- struct annotate_position
- {
-     template <typename T, typename Iterator, typename Context>
-     inline void on_success(Iterator const& first, Iterator const& last
-       , T& ast, Context const& context)
-     {
-         auto& position_cache = x3::get<position_cache_tag>(context).get();
-         position_cache.annotate(ast, first, last);
-     }
- };
- `position_cache_tag` is a special tag we will use to get a reference to the
- actual `position_cache`, client data that we will inject at the very start, when
- we call parse. More on that later.
- Our `on_success` handler gets a reference to the actual `position_cache` and
- calls its `annotate` member function, passing in the AST and the iterators.
- `position_cache.annotate(ast, first, last)` annotates the AST with
- information required by `x3::position_tagged`.
- [heading The Parser]
- Now we'll write a parser for our employee. To simplify, inputs will be of the
- form:
- { age, "forename", "surname", salary }
- [#__tutorial_annotated_employee_parser__]
- Here we go:
- namespace parser
- {
-     using x3::int_;
-     using x3::double_;
-     using x3::lexeme;
-     using ascii::char_;
-     struct quoted_string_class;
-     struct person_class;
-     struct employee_class;
-     x3::rule<quoted_string_class, std::string> const quoted_string = "quoted_string";
-     x3::rule<person_class, ast::person> const person = "person";
-     x3::rule<employee_class, ast::employee> const employee = "employee";
-     auto const quoted_string_def = lexeme['"' >> +(char_ - '"') >> '"'];
-     auto const person_def = quoted_string >> ',' >> quoted_string;
-     auto const employee_def =
-             '{'
-         >>  int_ >> ','
-         >>  person >> ','
-         >>  double_
-         >>  '}'
-         ;
-     auto const employees = employee >> *(',' >> employee);
-     BOOST_SPIRIT_DEFINE(quoted_string, person, employee);
- }
- [heading Rule Declarations]
- struct quoted_string_class;
- struct person_class;
- struct employee_class;
- x3::rule<quoted_string_class, std::string> const quoted_string = "quoted_string";
- x3::rule<person_class, ast::person> const person = "person";
- x3::rule<employee_class, ast::employee> const employee = "employee";
- Go back and review the original [link __tutorial_employee_parser__ employee parser].
- What has changed?
- * We split the single employee rule into three smaller rules: `quoted_string`,
- `person` and `employee`.
- * We're using forward declared rule classes: `quoted_string_class`, `person_class`,
- and `employee_class`.
- [heading Rule Classes]
- Like before, in this example, the rule classes, `quoted_string_class`,
- `person_class`, and `employee_class` provide statically known IDs for the
- rules required by X3 to perform its tasks. In addition to that, the rule
- class can also be extended to have some user-defined customization hooks that
- are called:
- * On success: After a rule successfully parses an input.
- * On error: After a rule fails to parse.
- These hooks are enabled by deriving the rule class from a client-supplied
- handler, such as our `annotate_position` handler above:
- struct person_class : annotate_position {};
- struct employee_class : annotate_position {};
- The code above tells X3 to check whether the rule class has an `on_success` or
- `on_error` member function and to call it when the corresponding event occurs.
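How a library can "check" a class for a hook may be easier to see in code. The following is a minimal sketch in plain C++11, not X3's actual implementation: `has_on_success`, `call_on_success`, `notify_success`, `counting_handler` and the two example classes are hypothetical names, and the "context" is just a pointer to a counter. It shows compile-time detection of an `on_success` member followed by tag dispatch.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical handler: the "context" in this sketch is a pointer to a counter.
struct counting_handler
{
    template <typename Iterator, typename T, typename Context>
    void on_success(Iterator const&, Iterator const&, T&, Context const& context)
    {
        ++*context;
    }
};

struct plain_class {};                       // rule class without hooks
struct hooked_class : counting_handler {};   // rule class inheriting on_success

// Primary template: assume the rule class has no usable on_success.
template <typename ID, typename Iterator, typename T, typename Context
  , typename = void>
struct has_on_success : std::false_type {};

// Chosen only when ID().on_success(first, last, attr, context) compiles.
template <typename ID, typename Iterator, typename T, typename Context>
struct has_on_success<ID, Iterator, T, Context,
    decltype(std::declval<ID&>().on_success(
        std::declval<Iterator const&>(), std::declval<Iterator const&>(),
        std::declval<T&>(), std::declval<Context const&>()), void())>
  : std::true_type {};

// Hook present: call it.
template <typename ID, typename Iterator, typename T, typename Context>
void call_on_success(Iterator const& first, Iterator const& last
  , T& attr, Context const& context, std::true_type)
{
    ID().on_success(first, last, attr, context);
}

// Hook absent: nothing to do.
template <typename ID, typename Iterator, typename T, typename Context>
void call_on_success(Iterator const&, Iterator const&
  , T&, Context const&, std::false_type)
{
}

// What an (imaginary) rule would call after its definition matched.
template <typename ID, typename Iterator, typename T, typename Context>
void notify_success(Iterator const& first, Iterator const& last
  , T& attr, Context const& context)
{
    call_on_success<ID>(first, last, attr, context
      , has_on_success<ID, Iterator, T, Context>());
}

inline void hook_example()
{
    int calls = 0;
    int* context = &calls;
    int attr = 0;
    char const* first = "abc";
    char const* last = first + 3;

    notify_success<hooked_class>(first, last, attr, context);
    assert(calls == 1);   // hook found and called
    notify_success<plain_class>(first, last, attr, context);
    assert(calls == 1);   // no hook, nothing happened
}
```

Because the decision is made at compile time, a rule class without hooks costs nothing at run time.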
- [#__tutorial_with_directive__]
- [heading The with Directive]
- For any parser `p`, one can inject supplementary data that semantic actions
- and handlers can access later on when they are called. The general syntax is:
- with<tag>(data)[p]
- For our particular example, we use it to inject the `position_cache` into the
- parse so that our `annotate_position` `on_success` handler can access it:
- auto const parser =
-     // we pass our position_cache to the parser so we can access
-     // it later in our on_success handlers
-     with<position_cache_tag>(std::ref(positions))
-     [
-         employees
-     ];
- Typically this is done just before calling `x3::parse` or `x3::phrase_parse`.
- `with` is a very lightweight operation. You can inject as much data
- as you want, even with multiple nested `with` directives:
- with<tag1>(data1)
- [
-     with<tag2>(data2)[p]
- ]
- Multiple `with` directives can (perhaps not obviously) be injected from
- outside the called function. Here's an outline:
- template <typename Parser>
- void bar(Parser const& p)
- {
-     // Inject data2
-     auto const parser = with<tag2>(data2)[p];
-     x3::parse(first, last, parser);
- }
- void foo()
- {
-     // Inject data1
-     auto const parser = with<tag1>(data1)[my_parser];
-     bar(parser);
- }
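Why nesting works from anywhere becomes clearer if the context is pictured as a linked chain of tagged references. The sketch below is plain C++11 and only loosely modelled on that idea. `context_sketch`, `with_sketch`, `lookup` and `get_data` are hypothetical names, not X3 API. Each link holds a reference to its data and to the enclosing link, and lookup by tag walks outwards at compile time.

```cpp
#include <cassert>
#include <string>

struct empty_context {};   // end of the chain

// One link: associates Tag with a reference to a value, plus the outer links.
template <typename Tag, typename T, typename Next>
struct context_sketch
{
    T& value;           // the injected data (referenced, not copied)
    Next const& next;   // the enclosing context
};

// Wrap an existing context in a new link carrying `value` under `Tag`.
template <typename Tag, typename T, typename Next>
context_sketch<Tag, T, Next> with_sketch(T& value, Next const& next)
{
    return context_sketch<Tag, T, Next>{value, next};
}

// Left undefined on purpose: asking for an unknown tag fails to compile.
template <typename Tag, typename Context>
struct lookup;

// Match: the innermost link carries the tag we want.
template <typename Tag, typename T, typename Next>
struct lookup<Tag, context_sketch<Tag, T, Next>>
{
    typedef T& type;
    static type call(context_sketch<Tag, T, Next> const& ctx)
    {
        return ctx.value;
    }
};

// No match: continue with the enclosing link.
template <typename Tag, typename Other, typename T, typename Next>
struct lookup<Tag, context_sketch<Other, T, Next>>
{
    typedef typename lookup<Tag, Next>::type type;
    static type call(context_sketch<Other, T, Next> const& ctx)
    {
        return lookup<Tag, Next>::call(ctx.next);
    }
};

template <typename Tag, typename Context>
typename lookup<Tag, Context>::type get_data(Context const& ctx)
{
    return lookup<Tag, Context>::call(ctx);
}

struct tag1;
struct tag2;

inline void with_example()
{
    int data1 = 1;
    std::string data2 = "two";
    empty_context root;
    auto outer = with_sketch<tag1>(data1, root);    // injected first
    auto inner = with_sketch<tag2>(data2, outer);   // injected later, wraps outer

    assert(get_data<tag2>(inner) == "two");  // found in the innermost link
    assert(get_data<tag1>(inner) == 1);      // found one link further out
    get_data<tag1>(inner) = 42;              // the link refers to data1 itself
    assert(data1 == 42);
}
```

In this picture, a function that adds another `with` simply puts one more link in front of whatever context it was handed.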
- [heading Let's Parse]
- Now we have the complete parse mechanism with support for annotations:
- using iterator_type = std::string::const_iterator;
- using position_cache = boost::spirit::x3::position_cache<std::vector<iterator_type>>;
- std::vector<client::ast::employee>
- parse(std::string const& input, position_cache& positions)
- {
-     using boost::spirit::x3::ascii::space;
-     std::vector<client::ast::employee> ast;
-     iterator_type iter = input.begin();
-     iterator_type const end = input.end();
-     using boost::spirit::x3::with;
-     // Our parser
-     using client::parser::employees;
-     using client::parser::position_cache_tag;
-     auto const parser =
-         // we pass our position_cache to the parser so we can access
-         // it later in our on_success handlers
-         with<position_cache_tag>(std::ref(positions))
-         [
-             employees
-         ];
-     bool r = phrase_parse(iter, end, parser, space, ast);
-     // ... Some error checking here
-     return ast;
- }
- Let's walk through the code.
- First, we have type aliases for 1) the iterator type we are using for the
- parser, `iterator_type`, and 2) the `position_cache` type. The latter is a
- template that accepts the type of container it will hold, in this case a
- `std::vector<iterator_type>`.
- The main parse function accepts an input, a `std::string`, and a reference to a
- `position_cache`, and returns an AST: `std::vector<client::ast::employee>`.
- Inside the parse function, we first create an AST where parsed data will be
- stored:
- std::vector<client::ast::employee> ast;
- Finally, we create a parser, injecting a reference to the `position_cache`,
- and call `phrase_parse`:
- using client::parser::employees;
- using client::parser::position_cache_tag;
- auto const parser =
-     // we pass our position_cache to the parser so we can access
-     // it later in our on_success handlers
-     with<position_cache_tag>(std::ref(positions))
-     [
-         employees
-     ];
- bool r = phrase_parse(iter, end, parser, space, ast);
- On successful parse, the AST, `ast`, will contain the actual parsed data.
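The error checking elided in the parse function usually comes down to two questions: did the parser match, and did it consume the whole input? Here is a minimal sketch, assuming `r`, `iter` and `end` play the roles they have in the code above. The `parse_status` enum and the `classify` helper are hypothetical names, not part of X3.

```cpp
#include <cassert>
#include <string>

enum class parse_status { ok, no_match, trailing_input };

// `matched` is the bool returned by the parse call, `iter` is the iterator
// the parse call advanced, and `end` is the end of the input.
template <typename Iterator>
parse_status classify(bool matched, Iterator iter, Iterator end)
{
    if (!matched)
        return parse_status::no_match;         // the grammar did not match
    if (iter != end)
        return parse_status::trailing_input;   // matched a prefix only
    return parse_status::ok;
}

// Hand-picked values standing in for the results of a real parse call.
inline void classify_example()
{
    std::string const input = "{ 23 } junk";
    auto const stopped_at = input.begin() + 6;   // pretend the parser stopped here

    assert(classify(false, input.begin(), input.end()) == parse_status::no_match);
    assert(classify(true, stopped_at, input.end()) == parse_status::trailing_input);
    assert(classify(true, input.end(), input.end()) == parse_status::ok);
}
```

In the parse function this would be called as `classify(r, iter, end)`. What to do for each outcome (throw, return an empty result, report the unparsed tail) is left to the client.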
- [heading Getting The Source Positions]
- Now that we have our main parse function, let's take a sample input to
- parse and show how we can obtain the position of an AST element returned
- after a successful parse.
- Given this input:
- std::string input = R"(
- {
- 23,
- "Amanda",
- "Stefanski",
- 1000.99
- },
- {
- 35,
- "Angie",
- "Chilcote",
- 2000.99
- },
- {
- 43,
- "Dannie",
- "Dillinger",
- 3000.99
- },
- {
- 22,
- "Dorene",
- "Dole",
- 2500.99
- },
- {
- 38,
- "Rossana",
- "Rafferty",
- 5000.99
- }
- )";
- We call our parse function after instantiating a `position_cache` object that
- will hold the source stream positions:
- position_cache positions{input.begin(), input.end()};
- auto ast = parse(input, positions);
- We now have an AST, `ast`, that contains the parsed results. Let us get the
- source positions of the 2nd employee:
- auto pos = positions.position_of(ast[1]); // zero based of course!
- `pos` is an iterator range that contains iterators to the start and end of
- `ast[1]` in the input stream.
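Since the range consists of ordinary iterators into the input, it can be turned into something human-readable with standard algorithms. The helpers below (`line_of`, `column_of`, `text_of`) are hypothetical names for this sketch and not part of X3. They assume random access iterators such as those of `std::string`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// 1-based line number of `pos`, counting newlines from the start of the input.
template <typename Iterator>
std::size_t line_of(Iterator input_begin, Iterator pos)
{
    return 1 + static_cast<std::size_t>(std::count(input_begin, pos, '\n'));
}

// 1-based column of `pos` within its line.
template <typename Iterator>
std::size_t column_of(Iterator input_begin, Iterator pos)
{
    std::size_t column = 1;
    while (pos != input_begin && *(pos - 1) != '\n')
    {
        --pos;
        ++column;
    }
    return column;
}

// The source text covered by the iterator range [first, last).
template <typename Iterator>
std::string text_of(Iterator first, Iterator last)
{
    return std::string(first, last);
}

// A hand-made range standing in for the result of positions.position_of(...).
inline void source_position_example()
{
    std::string const input = "\n{\n  23\n}";
    auto const first = input.begin() + 5;   // points at '2'
    auto const last = input.begin() + 7;    // one past '3'

    assert(line_of(input.begin(), first) == 3);
    assert(column_of(input.begin(), first) == 3);
    assert(text_of(first, last) == "23");
}
```

With the tutorial's variables, this would be called as `line_of(input.cbegin(), pos.begin())` and `text_of(pos.begin(), pos.end())`. Note `cbegin()`, so that both arguments have the same const iterator type.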
- [heading Config]
- If you have read the previous [tutorial_minimal Program Structure] tutorial, where
- we separated various logical modules of the parser into separate cpp and
- header files, you may be wondering how to provide the context configuration
- information (see [link tutorial_configuration Config Section]). We need to
- supplement the context like this:
- using phrase_context_type = x3::phrase_parse_context<x3::ascii::space_type>::type;
- typedef x3::context<
-     position_cache_tag
-   , std::reference_wrapper<position_cache>
-   , phrase_context_type>
- context_type;
- [endsect]