- [/==============================================================================
- Copyright (C) 2001-2015 Joel de Guzman
- Copyright (C) 2001-2011 Hartmut Kaiser
- Distributed under the Boost Software License, Version 1.0. (See accompanying
- file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- ===============================================================================/]
- [section:roman Roman Numerals]
- This example demonstrates:
- * The Symbol Table
- * Non-terminal rules
- [heading Symbol Table]
- The symbol table holds a dictionary of symbols where each symbol is a sequence
- of characters. The template class can work efficiently with 8-, 16-, 32- and even
- 64-bit characters. Mutable data of type T are associated with each symbol.
- Traditionally, symbol table management is maintained separately outside the BNF
- grammar through semantic actions. Contrary to standard practice, the Spirit
- symbol table class `symbols` is itself a parser, and a `symbols` object may be
- used anywhere in the EBNF grammar specification. It is an example of a dynamic
- parser. A dynamic parser is characterized by its ability to modify its behavior
- at run time. Initially, an empty symbols object matches nothing. At any time,
- symbols may be added or removed, thus dynamically altering its behavior.
- Each entry in a symbol table may have an associated mutable data slot. In this
- regard, one can view the symbol table as an associative container (or map) of
- key-value pairs where the keys are strings.
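- The "dynamic parser" idea above can be modelled in a few lines of standard C++.
- This is only a conceptual sketch, not how X3 implements `symbols`: the class
- name `toy_symbols`, its member functions, and the linear scan over a `std::map`
- are all assumptions made for illustration. It assumes that when several stored
- symbols match, the longest one wins.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for a symbol-table parser (not X3's implementation).
class toy_symbols
{
public:
    void add(std::string key, unsigned value) { table_[std::move(key)] = value; }
    void remove(std::string const& key) { table_.erase(key); }

    // Try to match the longest stored symbol at the front of `input`.
    // On success, consume the matched characters and return the symbol's data.
    // On failure, leave `input` untouched and return an empty optional.
    std::optional<unsigned> parse(std::string_view& input) const
    {
        std::size_t best_len = 0;
        std::optional<unsigned> best;
        for (auto const& [key, value] : table_)
        {
            if (key.size() > best_len && input.substr(0, key.size()) == key)
            {
                best_len = key.size();
                best = value;
            }
        }
        if (best)
            input.remove_prefix(best_len);
        return best;
    }

private:
    std::map<std::string, unsigned> table_;
};

// Behavior changes at run time as symbols are added and removed.
void example_usage()
{
    toy_symbols sym;
    std::string_view in = "CCX";
    assert(!sym.parse(in));              // an empty table matches nothing

    sym.add("C", 100);
    sym.add("CC", 200);
    auto v = sym.parse(in);              // the longer symbol "CC" is chosen
    assert(v && *v == 200 && in == "X");

    sym.remove("CC");
    in = "CCX";
    v = sym.parse(in);                   // only "C" is left to match
    assert(v && *v == 100 && in == "CX");
}
```

- The value returned by `parse` plays the role of the data slot associated with
- the matched symbol.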
- The symbols class expects one template parameter to specify the data type
- associated with each symbol: its attribute. X3 provides versions of the
- symbols class in several namespaces, one for each supported character
- encoding: ascii, standard, standard_wide, iso8859_1, and unicode. The default
- symbol parser type in the main x3 namespace uses the standard encoding.
- Here's a parser for roman hundreds (100..900) using the symbol table. Keep in
- mind that the data associated with each slot is the parser's attribute (which is
- passed to attached semantic actions).
-     struct hundreds_ : x3::symbols<unsigned>
-     {
-         hundreds_()
-         {
-             add
-                 ("C"    , 100)
-                 ("CC"   , 200)
-                 ("CCC"  , 300)
-                 ("CD"   , 400)
-                 ("D"    , 500)
-                 ("DC"   , 600)
-                 ("DCC"  , 700)
-                 ("DCCC" , 800)
-                 ("CM"   , 900)
-             ;
-         }
-     } hundreds;
- Here's a parser for roman tens (10..90):
-     struct tens_ : x3::symbols<unsigned>
-     {
-         tens_()
-         {
-             add
-                 ("X"    , 10)
-                 ("XX"   , 20)
-                 ("XXX"  , 30)
-                 ("XL"   , 40)
-                 ("L"    , 50)
-                 ("LX"   , 60)
-                 ("LXX"  , 70)
-                 ("LXXX" , 80)
-                 ("XC"   , 90)
-             ;
-         }
-     } tens;
- and, finally, for ones (1..9):
-     struct ones_ : x3::symbols<unsigned>
-     {
-         ones_()
-         {
-             add
-                 ("I"    , 1)
-                 ("II"   , 2)
-                 ("III"  , 3)
-                 ("IV"   , 4)
-                 ("V"    , 5)
-                 ("VI"   , 6)
-                 ("VII"  , 7)
-                 ("VIII" , 8)
-                 ("IX"   , 9)
-             ;
-         }
-     } ones;
- Now we can use `hundreds`, `tens` and `ones` anywhere in our parser expressions.
- They are all parsers.
- [heading Rules]
- Up until now, we've been inlining our parser expressions, passing them directly
- to the `phrase_parse` function. The expression evaluates into a temporary,
- unnamed parser which is passed into the `phrase_parse` function, used, and then
- destroyed. This is fine for small parsers. When the expressions get complicated,
- you'd want to break the expressions into smaller easier-to-understand pieces,
- name them, and refer to them from other parser expressions by name.
- A parser expression can be assigned to what is called a "rule". There are
- various ways to declare rules. The simplest form is:
-     rule<ID> const r = "some-name";
- [heading Rule ID]
- At the very least, the rule needs an identification tag. This ID can be any
- struct or class type and need not be defined. Forward declaration would suffice.
- In subsequent tutorials, we will see that the rule ID can carry additional
- functionality for error handling and annotation.
- [heading Rule Name]
- The name is optional, but it is useful for debugging and error handling, as
- we'll see later. Notice that rule `r` is declared `const`. Rules are
- immutable and are best declared as `const`. Rules are lightweight and can be
- passed around by value. A rule's only member variable is a `std::string`: its
- name.
- [note Unlike Qi (Spirit V2), X3 rules can be used with both `phrase_parse` and
- `parse` without having to specify the skip parser]
- [heading Rule Attributes]
- For our next example, there's one more rule form you should know about:
-     rule<ID, Attribute> const r = "some-name";
- The Attribute parameter specifies the attribute type of the rule. You've seen
- that our parsers can have an attribute. Recall that the `double_` parser has
- an attribute of `double`. To be precise, these are /synthesized/ attributes.
- The parser "synthesizes" the attribute value. If you think of a parser as a
- function, its synthesized attribute is the function's return value.
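- To make the function analogy concrete, here is a hand-written stand-in for
- `double_`. It is a sketch only: `parse_double` is a hypothetical name, it is
- not part of X3, and it relies on `std::strtod` (so it needs a null-terminated
- input and accepts whatever `std::strtod` accepts). The point is that the
- parsed value comes back as the function's result, just as a synthesized
- attribute comes back from a parser.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>

// Hypothetical analogue of `double_`: on success, advance `first` past the
// number and return the parsed value (the "synthesized attribute").
// On failure, leave `first` untouched and return an empty optional.
std::optional<double> parse_double(char const*& first)
{
    char* end = nullptr;
    double value = std::strtod(first, &end);
    if (end == first)
        return std::nullopt;    // no characters were converted
    first = end;
    return value;
}

void example_usage()
{
    char const* input = "3.5,rest";
    std::optional<double> attr = parse_double(input);
    assert(attr && *attr == 3.5);   // the attribute is the return value
    assert(*input == ',');          // the input position has advanced

    char const* bad = "abc";
    assert(!parse_double(bad) && *bad == 'a');
}
```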
- [heading Rule Definition]
- After having declared a rule, you need a definition for the rule. Example:
-     auto const r_def = double_ >> *(',' >> double_);
- By convention, rule definitions have a `_def` suffix. Like rules, rule definitions
- are immutable and are best declared as `const`.
- [#__tutorial_spirit_define__]
- [heading BOOST_SPIRIT_DEFINE]
- Now that we have a rule and its definition, we tie the rule to its
- definition using the `BOOST_SPIRIT_DEFINE` macro:
-     BOOST_SPIRIT_DEFINE(r);
- Behind the scenes, what's actually happening is that we are defining a `parse_rule`
- function in the client namespace that tells X3 how to invoke the rule. For example,
- given a rule named `my_rule` and a corresponding definition named `my_rule_def`,
- `BOOST_SPIRIT_DEFINE(my_rule)` expands to this code:
-     template <typename Iterator, typename Context>
-     inline bool parse_rule(
-         decltype(my_rule)
-       , Iterator& first, Iterator const& last
-       , Context const& context, decltype(my_rule)::attribute_type& attr)
-     {
-         using boost::spirit::x3::unused;
-         static auto const def_ = my_rule_def;
-         return def_.parse(first, last, context, unused, attr);
-     }
- And so for each rule defined using `BOOST_SPIRIT_DEFINE`, there is an
- overloaded `parse_rule` function. At parse time, Spirit X3 recursively calls
- the appropriate `parse_rule` function.
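- The dispatch mechanism can be illustrated without X3. In the sketch below,
- every name (`toy_rule`, `digits`, `digit_pair`, and the simplified
- `parse_rule` signature) is hypothetical and chosen for illustration. Each rule
- object has its own type because of its ID, so ordinary overload resolution
- picks the matching `parse_rule`, and one rule's definition can call another
- rule's overload.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// The ID parameter gives every rule object a distinct type.
template <typename ID> struct toy_rule {};

toy_rule<class digits_tag> const digits{};
toy_rule<class pair_tag> const digit_pair{};

// Overload for `digits`: one or more decimal digits.
inline bool parse_rule(decltype(digits), std::string_view& in)
{
    std::size_t n = 0;
    while (n < in.size() && std::isdigit(static_cast<unsigned char>(in[n])))
        ++n;
    if (n == 0)
        return false;
    in.remove_prefix(n);
    return true;
}

// Overload for `digit_pair`: digits ',' digits.
// It reaches the other rule through that rule's own overload.
inline bool parse_rule(decltype(digit_pair), std::string_view& in)
{
    std::string_view work = in;
    if (!parse_rule(digits, work))
        return false;
    if (work.empty() || work.front() != ',')
        return false;
    work.remove_prefix(1);
    if (!parse_rule(digits, work))
        return false;
    in = work;                  // commit only when the whole rule matched
    return true;
}

void example_usage()
{
    std::string_view in = "12,34;";
    assert(parse_rule(digit_pair, in) && in == ";");

    in = "12;";
    assert(!parse_rule(digit_pair, in) && in == "12;");
}
```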
- [note `BOOST_SPIRIT_DEFINE` is variadic and may be used for one or more rules.
- Example: `BOOST_SPIRIT_DEFINE(r1, r2, r3);`]
- [heading Grammars]
- Unlike Qi (Spirit V2), X3 discards the notion of a grammar as a concrete
- entity for encapsulating rules. In X3, a grammar is simply a logical group of
- rules that work together, typically with a single top-level start rule which
- serves as the main entry point. X3 grammars are grouped using namespaces.
- The roman numeral grammar is a very nice and simple example of a grammar:
-     namespace parser
-     {
-         using x3::eps;
-         using x3::lit;
-         using x3::_val;
-         using x3::_attr;
-         using ascii::char_;
-         auto set_zero = [](auto& ctx){ _val(ctx) = 0; };
-         auto add1000 = [](auto& ctx){ _val(ctx) += 1000; };
-         auto add = [](auto& ctx){ _val(ctx) += _attr(ctx); };
-         x3::rule<class roman, unsigned> const roman = "roman";
-         auto const roman_def =
-             eps                 [set_zero]
-             >>
-             (
-                 -(+lit('M')     [add1000])
-                 >>  -hundreds   [add]
-                 >>  -tens       [add]
-                 >>  -ones       [add]
-             )
-         ;
-         BOOST_SPIRIT_DEFINE(roman);
-     }
- Things to take notice of:
- * The start rule's attribute is `unsigned`.
- * `_val(ctx)` gets a reference to the rule's synthesized attribute.
- * `_attr(ctx)` gets a reference to the parser's synthesized attribute.
- * `eps` is a special Spirit parser that consumes no input but is always
- successful. We use it to initialize the rule's synthesized
- attribute to zero before anything else. The actual parser starts at
- `+lit('M')`, parsing roman thousands. Using `eps` this way is good
- for doing pre- and post-initialization.
- * The rule `roman` and the definition `roman_def` are const objects.
- * The rule's ID is `class roman`. C++ allows you to declare the class
- directly in the template argument list, as you can see in the example:
-     x3::rule<class roman, unsigned> const roman = "roman";
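- To see how the attribute is accumulated, the grammar can be mirrored step by
- step in plain C++. This is a sketch for following the arithmetic, not a
- replacement for the X3 parser: `entry`, `take_longest`, and `parse_roman` are
- hypothetical names, and the sketch assumes that each symbol table picks its
- longest matching entry.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

struct entry { std::string_view key; unsigned value; };

constexpr entry hundreds[] = {
    {"C", 100}, {"CC", 200}, {"CCC", 300}, {"CD", 400}, {"D", 500},
    {"DC", 600}, {"DCC", 700}, {"DCCC", 800}, {"CM", 900}};
constexpr entry tens[] = {
    {"X", 10}, {"XX", 20}, {"XXX", 30}, {"XL", 40}, {"L", 50},
    {"LX", 60}, {"LXX", 70}, {"LXXX", 80}, {"XC", 90}};
constexpr entry ones[] = {
    {"I", 1}, {"II", 2}, {"III", 3}, {"IV", 4}, {"V", 5},
    {"VI", 6}, {"VII", 7}, {"VIII", 8}, {"IX", 9}};

// Optional group: consume the longest matching entry and return its value,
// or consume nothing and return 0 when no entry matches.
template <std::size_t N>
unsigned take_longest(entry const (&table)[N], std::string_view& in)
{
    entry const* best = nullptr;
    for (entry const& e : table)
        if (in.substr(0, e.key.size()) == e.key
            && (!best || e.key.size() > best->key.size()))
            best = &e;
    if (!best)
        return 0;
    in.remove_prefix(best->key.size());
    return best->value;
}

std::optional<unsigned> parse_roman(std::string_view in)
{
    unsigned result = 0;                        // eps [set_zero]
    while (!in.empty() && in.front() == 'M')    // -(+lit('M') [add1000])
    {
        result += 1000;
        in.remove_prefix(1);
    }
    result += take_longest(hundreds, in);       // -hundreds [add]
    result += take_longest(tens, in);           // -tens [add]
    result += take_longest(ones, in);           // -ones [add]
    if (!in.empty())
        return std::nullopt;                    // leftover input: not a full match
    return result;
}

void example_usage()
{
    assert(parse_roman("MCMXCIV") == 1994u);    // 1000 + 900 + 90 + 4
    assert(parse_roman("MMXXIV") == 2024u);     // 2000 + 20 + 4
    assert(!parse_roman("IIII"));               // "III" matches, "I" is left over
}
```

- Because every group is optional, this sketch accepts the empty string with a
- result of 0.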
- [heading Let's Parse!]
-     bool r = parse(iter, end, roman, result);
-     if (r && iter == end)
-     {
-         std::cout << "-------------------------\n";
-         std::cout << "Parsing succeeded\n";
-         std::cout << "result = " << result << std::endl;
-         std::cout << "-------------------------\n";
-     }
-     else
-     {
-         std::string rest(iter, end);
-         std::cout << "-------------------------\n";
-         std::cout << "Parsing failed\n";
-         std::cout << "stopped at: \"" << rest << "\"\n";
-         std::cout << "-------------------------\n";
-     }
- `roman` is our roman numeral parser. This time around we are using the
- no-skipping version of the parse functions. We do not want to skip any spaces!
- We are also passing in an attribute, `unsigned result`, which will receive the
- parsed value.
- The full cpp file for this example can be found here:
- [@../../../example/x3/roman.cpp roman.cpp]
- [endsect]