123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225 |
- [/==============================================================================
- Copyright (C) 2001-2011 Joel de Guzman
- Copyright (C) 2001-2011 Hartmut Kaiser
- Distributed under the Boost Software License, Version 1.0. (See accompanying
- file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- ===============================================================================/]
- [section Employee - Parsing into structs]
- It's a common question in the __spirit_list__: How do I parse and place
- the results into a C++ struct? Of course, at this point, you already
- know various ways to do it, using semantic actions. There are many ways
- to skin a cat. Spirit2, being fully attributed, makes it even easier.
- The next example demonstrates some features of Spirit2 that make this
- easy. In the process, you'll learn about:
- * More about attributes
- * Auto rules
- * Some more built-in parsers
- * Directives
- [import ../../example/qi/employee.cpp]
- First, let's create a struct representing an employee:
- [tutorial_employee_struct]
- Then, we need to tell __fusion__ about our employee struct to make it a first-class
- fusion citizen that the grammar can utilize. If you don't know fusion yet,
- it is a __boost__ library for working with heterogeneous collections of data,
- commonly referred to as tuples. Spirit uses fusion extensively as part of its
- infrastructure.
- In fusion's view, a struct is just a form of a tuple. You can adapt any struct
- to be a fully conforming fusion tuple:
- [tutorial_employee_adapt_struct]
- Now we'll write a parser for our employee. Inputs will be of the form:
- employee{ age, "surname", "forename", salary }
- Here goes:
- [tutorial_employee_parser]
- The full cpp file for this example can be found here: [@../../example/qi/employee.cpp]
- Let's walk through this one step at a time (not necessarily from top to bottom).
- template <typename Iterator>
- struct employee_parser : grammar<Iterator, employee(), space_type>
- `employee_parser` is a grammar. Like before, we make it a template so that we can
- reuse it for different iterator types. The grammar's signature is:
- employee()
- meaning, the parser generates employee structs. `employee_parser` skips white
- spaces using `space_type` as its skip parser.
- employee_parser() : employee_parser::base_type(start)
- Initializes the base class.
- rule<Iterator, std::string(), space_type> quoted_string;
- rule<Iterator, employee(), space_type> start;
- Declares two rules: `quoted_string` and `start`. `start` has the same template
- parameters as the grammar itself. `quoted_string` has a `std::string` attribute.
- [heading Lexeme]
- lexeme['"' >> +(char_ - '"') >> '"'];
- `lexeme` inhibits space skipping from the open brace to the closing brace.
- The expression parses quoted strings.
- +(char_ - '"')
- parses one or more chars, except the double quote. It stops when it sees
- a double quote.
- [heading Difference]
- The expression:
- a - b
- parses `a` but not `b`. Its attribute is just `A`; the attribute of `a`. `b`'s
- attribute is ignored. Hence, the attribute of:
- char_ - '"'
- is just `char`.
- [heading Plus]
- +a
- is similar to Kleene star. Rather than match everything, `+a` matches one or more.
- Like it's related function, the Kleene star, its attribute is a `std::vector<A>`
- where `A` is the attribute of `a`. So, putting all these together, the attribute
- of
- +(char_ - '"')
- is then:
- std::vector<char>
- [heading Sequence Attribute]
- Now what's the attribute of
- '"' >> +(char_ - '"') >> '"'
- ?
- Well, typically, the attribute of:
- a >> b >> c
- is:
- fusion::vector<A, B, C>
- where `A` is the attribute of `a`, `B` is the attribute of `b` and `C` is the
- attribute of `c`. What is `fusion::vector`? - a tuple.
- [note If you don't know what I am talking about, see: [@http://tinyurl.com/6xun4j
- Fusion Vector]. It might be a good idea to have a look into __fusion__ at this
- point. You'll definitely see more of it in the coming pages.]
- [heading Attribute Collapsing]
- Some parsers, especially those very little literal parsers you see, like `'"'`,
- do not have attributes.
- Nodes without attributes are disregarded. In a sequence, like above, all nodes
- with no attributes are filtered out of the `fusion::vector`. So, since `'"'` has
- no attribute, and `+(char_ - '"')` has a `std::vector<char>` attribute, the
- whole expression's attribute should have been:
- fusion::vector<std::vector<char> >
- But wait, there's one more collapsing rule: If the attribute is followed by a
- single element `fusion::vector`, The element is stripped naked from its container.
- To make a long story short, the attribute of the expression:
- '"' >> +(char_ - '"') >> '"'
- is:
- std::vector<char>
- [heading Auto Rules]
- It is typical to see rules like:
- r = p[_val = _1];
- If you have a rule definition such as the above, where the attribute of the RHS
- (right hand side) of the rule is compatible with the attribute of the LHS (left
- hand side), then you can rewrite it as:
- r %= p;
- The attribute of `p` automatically uses the attribute of `r`.
- So, going back to our `quoted_string` rule:
- quoted_string %= lexeme['"' >> +(char_ - '"') >> '"'];
- is a simplified version of:
- quoted_string = lexeme['"' >> +(char_ - '"') >> '"'][_val = _1];
- The attribute of the `quoted_string` rule: `std::string` *is compatible* with
- the attribute of the RHS: `std::vector<char>`. The RHS extracts the parsed
- attribute directly into the rule's attribute, in-situ.
- [note `r %= p` and `r = p` are equivalent if there are no semantic actions
- associated with `p`. ]
- [heading Finally]
- We're down to one rule, the start rule:
- start %=
- lit("employee")
- >> '{'
- >> int_ >> ','
- >> quoted_string >> ','
- >> quoted_string >> ','
- >> double_
- >> '}'
- ;
- Applying our collapsing rules above, the RHS has an attribute of:
- fusion::vector<int, std::string, std::string, double>
- These nodes do not have an attribute:
- * `lit("employee")`
- * `'{'`
- * `','`
- * `'}'`
- [note In case you are wondering, `lit("employee")` is the same as "employee". We
- had to wrap it inside `lit` because immediately after it is `>> '{'`. You can't
- right-shift a `char[]` and a `char` - you know, C++ syntax rules.]
- Recall that the attribute of `start` is the `employee` struct:
- [tutorial_employee_struct]
- Now everything is clear, right? The `struct employee` *IS* compatible with
- `fusion::vector<int, std::string, std::string, double>`. So, the RHS of `start`
- uses start's attribute (a `struct employee`) in-situ when it does its work.
- [endsect]
|