123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241 |
- [/==============================================================================
- Copyright (C) 2001-2011 Joel de Guzman
- Copyright (C) 2001-2011 Hartmut Kaiser
- Distributed under the Boost Software License, Version 1.0. (See accompanying
- file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- ===============================================================================/]
- [section Mini XML - ASTs!]
- Stop and think about it... We've come very close to generating an AST
- (abstract syntax tree) in our last example. We parsed a single structure and
- generated an in-memory representation of it in the form of a struct: the
- `struct employee`. If we changed the implementation to parse one or more
- employees, the result would be a `std::vector<employee>`. We can go on and add
- more hierarchy: teams, departments, corporations. Then we'll have an AST
- representation of it all.
- In this example (actually two examples), we'll now explore how to create
- ASTs. We will parse a minimalistic XML-like language and compile the results
- into our data structures in the form of a tree.
- Along the way, we'll see new features:
- * Inherited attributes
- * Variant attributes
- * Local Variables
- * Not Predicate
- * Lazy Lit
- The full cpp files for these examples can be found here:
- [@../../example/qi/mini_xml1.cpp] and here: [@../../example/qi/mini_xml2.cpp]
- There are a couple of sample toy-xml files in the mini_xml_samples subdirectory:
- [@../../example/qi/mini_xml_samples/1.toyxml],
- [@../../example/qi/mini_xml_samples/2.toyxml], and
- [@../../example/qi/mini_xml_samples/3.toyxml] for testing purposes.
- The example [@../../example/qi/mini_xml_samples/4.toyxml] has an error in it.
- [import ../../example/qi/mini_xml1.cpp]
- [import ../../example/qi/mini_xml2.cpp]
- [heading First Cut]
- Without further delay, here's the first version of the XML grammar:
- [tutorial_xml1_grammar]
- Going bottom up, let's examine the `text` rule:
- rule<Iterator, std::string(), space_type> text;
- and its definition:
- text = lexeme[+(char_ - '<') [_val += _1]];
- The semantic action collects the chars and appends them (via +=) to the
- `std::string` attribute of the rule (represented by the placeholder `_val`).
- [heading Alternates]
- rule<Iterator, mini_xml_node(), space_type> node;
- and its definition:
- node = (xml | text) [_val = _1];
- We'll see a `mini_xml_node` structure later. Looking at the rule
- definition, we see some alternation going on here. An xml `node` is
- either an `xml` OR `text`. Hmmm... hold on to that thought...
- rule<Iterator, std::string(), space_type> start_tag;
- Again, with an attribute of `std::string`. Then, it's definition:
- start_tag =
- '<'
- >> !char_('/')
- >> lexeme[+(char_ - '>') [_val += _1]]
- >> '>'
- ;
- [heading Not Predicate]
- `start_tag` is similar to the `text` rule apart from the added `'<'` and `'>'`.
- But wait, to make sure that the `start_tag` does not parse `end_tag`s too, we
- add: `!char_('/')`. This is a "Not Predicate":
- !p
- It will try the parser, `p`. If it is successful, fail; otherwise, pass. In
- other words, it negates the result of `p`. Like the `eps`, it does not consume
- any input though. It will always rewind the iterator position to where it
- was upon entry. So, the expression:
- !char_('/')
- basically says: we should not have a `'/'` at this point.
- [heading Inherited Attribute]
- The `end_tag`:
- rule<Iterator, void(std::string), space_type> end_tag;
- Ohh! Now we see an inherited attribute there: `std::string`. The `end_tag` does
- not have a synthesized attribute. Let's see its definition:
- end_tag =
- "</"
- >> lit(_r1)
- >> '>'
- ;
- `_r1` is yet another __phoenix__ placeholder for the first inherited attribute
- (we have only one, use `_r2`, `_r3`, etc. if you have more).
- [heading A Lazy Lit]
- Check out how we used `lit` here, this time, not with a literal string, but with
- the value of the first inherited attribute, which is specified as `std::string` in
- our rule declaration.
- Finally, our `xml` rule:
- rule<Iterator, mini_xml(), space_type> xml;
- `mini_xml` is our attribute here. We'll see later what it is. Let's see its
- definition:
- xml =
- start_tag [at_c<0>(_val) = _1]
- >> *node [push_back(at_c<1>(_val), _1)]
- >> end_tag(at_c<0>(_val))
- ;
- Those who know __fusion__ now will notice `at_c<0>` and `at_c<1>`. This gives us
- a hint that `mini_xml` is a sort of a tuple - a fusion sequence. `at_c<N>` here
- is a lazy version of the tuple accessors, provided by __phoenix__.
- [heading How it all works]
- So, what's happening?
- # Upon parsing `start_tag`, the parsed start-tag string is placed in
- `at_c<0>(_val)`.
- # Then we parse zero or more `node`s. At each step, we `push_back` the result
- into `at_c<1>(_val)`.
- # Finally, we parse the `end_tag` giving it an inherited attribute: `at_c<0>(_val)`.
- This is the string we obtained from the `start_tag`. Investigate `end_tag` above.
- It will fail to parse if it gets something different from what we got from the
- `start_tag`. This ensures that our tags are balanced.
- To give the last item some more light, what happens is this:
- end_tag(at_c<0>(_val))
- calls:
- end_tag =
- "</"
- >> lit(_r1)
- >> '>'
- ;
- passing in `at_c<0>(_val)`, the string from start tag. This is referred to in the
- `end_tag` body as `_r1`.
- [heading The Structures]
- Let's see our structures. It will definitely be hierarchical: xml is
- hierarchical. It will also be recursive: xml is recursive.
- [tutorial_xml1_structures]
- [heading Of Alternates and Variants]
- So that's what a `mini_xml_node` looks like. We had a hint that it is either
- a `string` or a `mini_xml`. For this, we use __boost_variant__. `boost::recursive_wrapper`
- wraps `mini_xml`, making it a recursive data structure.
- Yep, you got that right: the attribute of an alternate:
- a | b
- is a
- boost::variant<A, B>
- where `A` is the attribute of `a` and `B` is the attribute of `b`.
- [heading Adapting structs again]
- `mini_xml` is no brainier. It is a plain ol' struct. But as we've seen in our
- employee example, we can adapt that to be a __fusion__ sequence:
- [tutorial_xml1_adapt_structures]
- [heading One More Take]
- Here's another version. The AST structure remains the same, but this time,
- you'll see that we make use of auto-rules making the grammar
- semantic-action-less. Here it is:
- [tutorial_xml2_grammar]
- This one shouldn't be any more difficult to understand after going through the
- first xml parser example. The rules are almost the same, except that, we got rid
- of semantic actions and used auto-rules (see the employee example if you missed
- that). There is some new stuff though. It's all in the `xml` rule:
- [heading Local Variables]
- rule<Iterator, mini_xml(), locals<std::string>, space_type> xml;
- Wow, we have four template parameters now. What's that `locals` guy doing there?
- Well, it declares that the rule `xml` will have one local variable: a `string`.
- Let's see how this is used in action:
- xml %=
- start_tag[_a = _1]
- >> *node
- >> end_tag(_a)
- ;
- # Upon parsing `start_tag`, the parsed start-tag string is placed in
- the local variable specified by (yet another) __phoenix__ placeholder:
- `_a`. We have only one local variable. If we had more, these are designated
- by `_b`..`_z`.
- # Then we parse zero or more `node`s.
- # Finally, we parse the `end_tag` giving it an inherited attribute: `_a`, our
- local variable.
- There are no actions involved in stuffing data into our `xml` attribute. It's
- all taken care of thanks to the auto-rule.
- [endsect]
|