123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211 |
- [/==============================================================================
- Copyright (C) 2015 Joel de Guzman
- Distributed under the Boost Software License, Version 1.0. (See accompanying
- file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- ===============================================================================/]
- [section:rexpr RExpressions - Recursive ASTs!]
- In this example, we'll explore more on how to create heierarchical ASTs.
- We will parse a minimalistic JSON-like language and compile the results
- into our data structures in the form of a tree.
- /rexpr/ is a parser for RExpressions, a language resembling a minimal subset
- of json, limited to a dictionary (composed of key=value pairs) where the value
- can itself be a string or a recursive dictionary.
- Here's an Example:
- {
- "color" = "blue"
- "size" = "29 cm."
- "position" = {
- "x" = "123"
- "y" = "456"
- }
- }
- This simple parser for X3 is intended as a minimal starting point. It is
- minimal, yet complete with all things necessary parts that make up rules (and
- grammars), except error handling and reporting.
- The example can be found here:
- [@../../../example/x3/rexpr/rexpr_min/rexpr.cpp rexpr.cpp]
- The same parser, but complete with error handling and reporting, better file
- organization, separate compilation of grammars, and a full test harness, can
- be found in this directory:
- [@../../../example/x3/rexpr/rexpr_full/ rexpr/rexpr_full/]
- [note rexpr_full is the canonical structure proposed as best practice on how
- parsers are written using Spirit X3.]
- [heading The AST]
- Here's the AST:
- namespace client { namespace ast
- {
- namespace x3 = boost::spirit::x3;
- struct rexpr;
- struct rexpr_value : x3::variant<
- std::string
- , x3::forward_ast<rexpr>
- >
- {
- using base_type::base_type;
- using base_type::operator=;
- };
- typedef std::map<std::string, rexpr_value> rexpr_map;
- typedef std::pair<std::string, rexpr_value> rexpr_key_value;
- struct rexpr
- {
- rexpr_map entries;
- };
- }}
- `x3::variant` is a support utility in Spirit X3 that extends __boost_variant__.
- Typically, you use __boost_variant__ right out of the box and refer to a
- particular template instantiation using a typedef. For example:
- typedef boost::variant<std::string, int> my_variant;
- Instead of doing that, we create a `class` (or `struct` in the case) that
- subclasses from `x3::variant`. By making the variant a subclass, you have a
- distinct type in your namespace. You also have control of the constructors
- and assignment operators, as well as giving you the freedom to add more
- functionality.
- `rexpr_value` has two variant elements. It can either be a `std::string` or a
- `rexpr`. Since `rexpr` recursively contains a `rexpr_value`, it has to be
- forward declared. This recursive data structure requires `rexpr` to be wrapped
- in a `x3::forward_ast` as shown in the declaration.
- We need to tell fusion about our `rexpr` struct to make it a first-class fusion
- citizen:
- BOOST_FUSION_ADAPT_STRUCT(
- client::ast::rexpr,
- entries
- )
- So essentially, a `rexpr_value` value is either a `std::string` or a
- `std::map<std::string, rexpr_value>`.
- [heading Walking the AST]
- We add a utility to print out the AST:
- int const tabsize = 4;
- struct rexpr_printer
- {
- typedef void result_type;
- rexpr_printer(int indent = 0)
- : indent(indent) {}
- void operator()(rexpr const& ast) const
- {
- std::cout << '{' << std::endl;
- for (auto const& entry : ast.entries)
- {
- tab(indent+tabsize);
- std::cout << '"' << entry.first << "\" = ";
- boost::apply_visitor(rexpr_printer(indent+tabsize), entry.second);
- }
- tab(indent);
- std::cout << '}' << std::endl;
- }
- void operator()(std::string const& text) const
- {
- std::cout << '"' << text << '"' << std::endl;
- }
- void tab(int spaces) const
- {
- for (int i = 0; i < spaces; ++i)
- std::cout << ' ';
- }
- int indent;
- };
- Traversing the AST is a recursive excercise. `rexpr_printer` is a function
- object and the main entry point is `void operator()(rexpr const& ast)`. Notice
- how it recursively calls an instance of itself for each of the key value pairs,
- using `boost::apply_visitor`. Before and after iterating through all the
- elements in the map, the braces are printed to surround the entire body, taking
- care of correct indentation at each point in the recursive traversal.
- The `operator()` for `std::string` should be self explanatory. It simply prints
- the text inside double quotes.
- [heading The Grammar]
- namespace client { namespace parser
- {
- namespace x3 = boost::spirit::x3;
- namespace ascii = boost::spirit::x3::ascii;
- using x3::lit;
- using x3::lexeme;
- using ascii::char_;
- using ascii::string;
- x3::rule<class rexpr_value, ast::rexpr_value>
- rexpr_value = "rexpr_value";
- x3::rule<class rexpr, ast::rexpr>
- rexpr = "rexpr";
- x3::rule<class rexpr_key_value, ast::rexpr_key_value>
- rexpr_key_value = "rexpr_key_value";
- auto const quoted_string =
- lexeme['"' >> *(char_ - '"') >> '"'];
- auto const rexpr_value_def =
- quoted_string | rexpr;
- auto const rexpr_key_value_def =
- quoted_string >> '=' >> rexpr_value;
- auto const rexpr_def =
- '{' >> *rexpr_key_value >> '}';
- BOOST_SPIRIT_DEFINE(rexpr_value, rexpr, rexpr_key_value);
- }}
- We declare three rules `rexpr_value`, `rexpr` and `rexpr_key_value`. Same deal
- as before. The attributes declared for each rule are the same structures we
- declared in our AST above. These are the structures our rules will produce.
- So, going top-down (from the start rule), `rexpr` is defined as zero or more
- `rexpr_key_value`s enclosed inside the curly braces. That's the kleene star in
- action there: `*rexpr_key_value`.
- [note Take note of the convention: we use `rexpr` name as the rule name and
- `rexpr_def` as the rule definition name.]
- To make the grammar clean, we capture the `quoted_string` production using
- C++ `auto`.
- `rexpr_value` is a `quoted_string` followed by the assignment operator `'='`,
- followed by a `rexpr_value`. `rexpr_value` is a `quoted_string` /OR/ a `rexpr`.
- This uses the alternative operator `|`, and maps nicely to the variant AST the
- rule is associated with.
- Finally, we associate the rules and their definitions:
- BOOST_SPIRIT_DEFINE(rexpr_value, rexpr, rexpr_key_value);
- [endsect]
|