rexpr.qbk 6.7 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211
  1. [/==============================================================================
  2. Copyright (C) 2015 Joel de Guzman
  3. Distributed under the Boost Software License, Version 1.0. (See accompanying
  4. file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
  5. ===============================================================================/]
  6. [section:rexpr RExpressions - Recursive ASTs!]
  7. In this example, we'll explore more on how to create heierarchical ASTs.
  8. We will parse a minimalistic JSON-like language and compile the results
  9. into our data structures in the form of a tree.
  10. /rexpr/ is a parser for RExpressions, a language resembling a minimal subset
  11. of json, limited to a dictionary (composed of key=value pairs) where the value
  12. can itself be a string or a recursive dictionary.
  13. Here's an Example:
  14. {
  15. "color" = "blue"
  16. "size" = "29 cm."
  17. "position" = {
  18. "x" = "123"
  19. "y" = "456"
  20. }
  21. }
  22. This simple parser for X3 is intended as a minimal starting point. It is
  23. minimal, yet complete with all things necessary parts that make up rules (and
  24. grammars), except error handling and reporting.
  25. The example can be found here:
  26. [@../../../example/x3/rexpr/rexpr_min/rexpr.cpp rexpr.cpp]
  27. The same parser, but complete with error handling and reporting, better file
  28. organization, separate compilation of grammars, and a full test harness, can
  29. be found in this directory:
  30. [@../../../example/x3/rexpr/rexpr_full/ rexpr/rexpr_full/]
  31. [note rexpr_full is the canonical structure proposed as best practice on how
  32. parsers are written using Spirit X3.]
  33. [heading The AST]
  34. Here's the AST:
  35. namespace client { namespace ast
  36. {
  37. namespace x3 = boost::spirit::x3;
  38. struct rexpr;
  39. struct rexpr_value : x3::variant<
  40. std::string
  41. , x3::forward_ast<rexpr>
  42. >
  43. {
  44. using base_type::base_type;
  45. using base_type::operator=;
  46. };
  47. typedef std::map<std::string, rexpr_value> rexpr_map;
  48. typedef std::pair<std::string, rexpr_value> rexpr_key_value;
  49. struct rexpr
  50. {
  51. rexpr_map entries;
  52. };
  53. }}
  54. `x3::variant` is a support utility in Spirit X3 that extends __boost_variant__.
  55. Typically, you use __boost_variant__ right out of the box and refer to a
  56. particular template instantiation using a typedef. For example:
  57. typedef boost::variant<std::string, int> my_variant;
  58. Instead of doing that, we create a `class` (or `struct` in the case) that
  59. subclasses from `x3::variant`. By making the variant a subclass, you have a
  60. distinct type in your namespace. You also have control of the constructors
  61. and assignment operators, as well as giving you the freedom to add more
  62. functionality.
  63. `rexpr_value` has two variant elements. It can either be a `std::string` or a
  64. `rexpr`. Since `rexpr` recursively contains a `rexpr_value`, it has to be
  65. forward declared. This recursive data structure requires `rexpr` to be wrapped
  66. in a `x3::forward_ast` as shown in the declaration.
  67. We need to tell fusion about our `rexpr` struct to make it a first-class fusion
  68. citizen:
  69. BOOST_FUSION_ADAPT_STRUCT(
  70. client::ast::rexpr,
  71. entries
  72. )
  73. So essentially, a `rexpr_value` value is either a `std::string` or a
  74. `std::map<std::string, rexpr_value>`.
  75. [heading Walking the AST]
  76. We add a utility to print out the AST:
  77. int const tabsize = 4;
  78. struct rexpr_printer
  79. {
  80. typedef void result_type;
  81. rexpr_printer(int indent = 0)
  82. : indent(indent) {}
  83. void operator()(rexpr const& ast) const
  84. {
  85. std::cout << '{' << std::endl;
  86. for (auto const& entry : ast.entries)
  87. {
  88. tab(indent+tabsize);
  89. std::cout << '"' << entry.first << "\" = ";
  90. boost::apply_visitor(rexpr_printer(indent+tabsize), entry.second);
  91. }
  92. tab(indent);
  93. std::cout << '}' << std::endl;
  94. }
  95. void operator()(std::string const& text) const
  96. {
  97. std::cout << '"' << text << '"' << std::endl;
  98. }
  99. void tab(int spaces) const
  100. {
  101. for (int i = 0; i < spaces; ++i)
  102. std::cout << ' ';
  103. }
  104. int indent;
  105. };
  106. Traversing the AST is a recursive excercise. `rexpr_printer` is a function
  107. object and the main entry point is `void operator()(rexpr const& ast)`. Notice
  108. how it recursively calls an instance of itself for each of the key value pairs,
  109. using `boost::apply_visitor`. Before and after iterating through all the
  110. elements in the map, the braces are printed to surround the entire body, taking
  111. care of correct indentation at each point in the recursive traversal.
  112. The `operator()` for `std::string` should be self explanatory. It simply prints
  113. the text inside double quotes.
  114. [heading The Grammar]
  115. namespace client { namespace parser
  116. {
  117. namespace x3 = boost::spirit::x3;
  118. namespace ascii = boost::spirit::x3::ascii;
  119. using x3::lit;
  120. using x3::lexeme;
  121. using ascii::char_;
  122. using ascii::string;
  123. x3::rule<class rexpr_value, ast::rexpr_value>
  124. rexpr_value = "rexpr_value";
  125. x3::rule<class rexpr, ast::rexpr>
  126. rexpr = "rexpr";
  127. x3::rule<class rexpr_key_value, ast::rexpr_key_value>
  128. rexpr_key_value = "rexpr_key_value";
  129. auto const quoted_string =
  130. lexeme['"' >> *(char_ - '"') >> '"'];
  131. auto const rexpr_value_def =
  132. quoted_string | rexpr;
  133. auto const rexpr_key_value_def =
  134. quoted_string >> '=' >> rexpr_value;
  135. auto const rexpr_def =
  136. '{' >> *rexpr_key_value >> '}';
  137. BOOST_SPIRIT_DEFINE(rexpr_value, rexpr, rexpr_key_value);
  138. }}
  139. We declare three rules `rexpr_value`, `rexpr` and `rexpr_key_value`. Same deal
  140. as before. The attributes declared for each rule are the same structures we
  141. declared in our AST above. These are the structures our rules will produce.
  142. So, going top-down (from the start rule), `rexpr` is defined as zero or more
  143. `rexpr_key_value`s enclosed inside the curly braces. That's the kleene star in
  144. action there: `*rexpr_key_value`.
  145. [note Take note of the convention: we use `rexpr` name as the rule name and
  146. `rexpr_def` as the rule definition name.]
  147. To make the grammar clean, we capture the `quoted_string` production using
  148. C++ `auto`.
  149. `rexpr_value` is a `quoted_string` followed by the assignment operator `'='`,
  150. followed by a `rexpr_value`. `rexpr_value` is a `quoted_string` /OR/ a `rexpr`.
  151. This uses the alternative operator `|`, and maps nicely to the variant AST the
  152. rule is associated with.
  153. Finally, we associate the rules and their definitions:
  154. BOOST_SPIRIT_DEFINE(rexpr_value, rexpr, rexpr_key_value);
  155. [endsect]