123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205 |
- [/==============================================================================
- Copyright (C) 2001-2011 Joel de Guzman
- Copyright (C) 2001-2011 Hartmut Kaiser
- Distributed under the Boost Software License, Version 1.0. (See accompanying
- file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- ===============================================================================/]
- [section Introduction]
- Boost Spirit is an object-oriented, recursive-descent parser and
- output generation library for C++. It allows you to write grammars and
- format descriptions using a format similar to Extended Backus Naur
- Form (EBNF)[footnote [@http://www.cl.cam.ac.uk/%7Emgk25/iso-14977.pdf
- ISO-EBNF]] directly in C++. These inline grammar
- specifications can mix freely with other C++ code and, thanks to the
- generative power of C++ templates, are immediately executable. In
- retrospect, conventional compiler-compilers or parser-generators have
- to perform an additional translation step from the source EBNF code to
- C or C++ code.
- The syntax and semantics of the libraries' API directly form domain-specific
- embedded languages (DSEL). In fact, Spirit exposes 3 different DSELs to the
- user:
- * one for creating parser grammars,
- * one for the specification of the required tokens to be used for parsing,
- * and one for the description of the required output formats.
- Since the target input grammars and output formats are written entirely in C++
- we do not need any separate tools to compile, preprocess or integrate those
- into the build process. __spirit__ allows seamless integration of the parsing
- and output generation process with other C++ code. This often allows for
- simpler and more efficient code.
- Both the created parsers and generators are fully attributed, which allows you
- to easily build and handle hierarchical data structures in memory. These data
- structures resemble the structure of the input data and can directly be used
- to generate arbitrarily-formatted output.
- The [link spirit.spiritstructure figure] below depicts the overall structure
- of the Boost Spirit library. The library consists of 4 major parts:
- * __classic__: This is the almost-unchanged code base taken from the
- former Boost Spirit V1.8 distribution. It has been moved into the namespace
- boost::spirit::classic. A special compatibility layer has been added to
- ensure complete compatibility with existing code using Spirit V1.8.
- * __qi__: This is the parser library allowing you to build recursive
- descent parsers. The exposed domain-specific language can be used to describe
- the grammars to implement, and the rules for storing the parsed information.
- * __lex__: This is the library usable to create tokenizers (lexers). The
- domain-specific language exposed by __lex__ allows you to define regular
- expressions used to match tokens (create token definitions), associate these
- regular expressions with code to be executed whenever they are matched, and
- to add the token definitions to the lexical analyzer.
- * __karma__: This is the generator library allowing you to create code for
- recursive descent, data type-driven output formatting. The exposed
- domain-specific language is almost equivalent to the parser description language
- used in __qi__, except that it is used to describe the required output
- format to generate from a given data structure.
- [fig spiritstructure.png..The overall structure of the Boost Spirit library..spirit.spiritstructure]
- The three components, __qi__, __karma__ and __lex__, are designed to be used
- either stand alone, or together. The general methodology is to use the token
- sequence generated by __lex__ as the input for a parser generated by __qi__.
- On the opposite side of the equation, the hierarchical data structures generated
- by __qi__ are used for the output generators created using __karma__.
- However, there is nothing to stop you from using any of these components all
- by themselves.
- The [link spirit.spiritkarmaflow figure] below shows the typical data flow of
- some input being converted to some internal representation. After some
- (optional) transformation these data are converted back into some different,
- external representation. The picture highlights Spirit's place in this data
- transformation flow.
- [fig spiritkarmaflow.png..The place of __qi__ and __karma__ in a data transformation flow of a typical application..spirit.spiritkarmaflow]
- [heading A Quick Overview of Parsing with __qi__]
- __qi__ is Spirit's sublibrary dealing with generating parsers based on a given
- target grammar (essentially a format description of the input data to read).
- A simple EBNF grammar snippet:
- group ::= '(' expression ')'
- factor ::= integer | group
- term ::= factor (('*' factor) | ('/' factor))*
- expression ::= term (('+' term) | ('-' term))*
- is approximated using facilities of Spirit's /Qi/ sublibrary as seen in this
- code snippet:
- group = '(' >> expression >> ')';
- factor = integer | group;
- term = factor >> *(('*' >> factor) | ('/' >> factor));
- expression = term >> *(('+' >> term) | ('-' >> term));
- Through the magic of expression templates, this is perfectly valid and
- executable C++ code. The production rule `expression` is, in fact, an object that
- has a member function `parse` that does the work given a source code written in
- the grammar that we have just declared. Yes, it's a calculator. We shall
- simplify for now by skipping the type declarations and the definition of the
- rule `integer` invoked by `factor`. Now, the production rule `expression` in our
- grammar specification, traditionally called the `start` symbol, can recognize
- inputs such as:
- 12345
- -12345
- +12345
- 1 + 2
- 1 * 2
- 1/2 + 3/4
- 1 + 2 + 3 + 4
- 1 * 2 * 3 * 4
- (1 + 2) * (3 + 4)
- (-1 + 2) * (3 + -4)
- 1 + ((6 * 200) - 20) / 6
- (1 + (2 + (3 + (4 + 5))))
- Certainly we have modified the original EBNF syntax. This is done to
- conform to C++ syntax rules. Most notably we see the abundance of
- shift >> operators. Since there are no 'empty' operators in C++, it is
- simply not possible to write something like:
- a b
- as seen in math syntax, for example, to mean multiplication or, in our case,
- as seen in EBNF syntax to mean sequencing (b should follow a). __qi__
- uses the shift `>>` operator instead for this purpose. We take the `>>` operator,
- with arrows pointing to the right, to mean "is followed by". Thus we write:
- a >> b
- The alternative operator `|` and the parentheses `()` remain as is. The
- assignment operator `=` is used in place of EBNF's `::=`. Last but not least,
- the Kleene star `*`, which in this case is a postfix operator in EBNF becomes a
- prefix. Instead of:
- a* //... in EBNF syntax,
- we write:
- *a //... in Spirit.
- since there are no postfix stars, `*`, in C/C++. Finally, we terminate each
- rule with the ubiquitous semi-colon, `;`.
- [heading A Quick Overview of Output Generation with __karma__]
- Spirit not only allows you to describe the structure of the input, it also enables
- the specification of the output format for your data in a similar way, and based
- on a single syntax and compatible semantics.
- Let's assume we need to generate a textual representation from a simple data
- structure such as a `std::vector<int>`. Conventional code probably would look like:
- std::vector<int> v (initialize_and_fill());
- std::vector<int>::iterator end = v.end();
- for (std::vector<int>::iterator it = v.begin(); it != end; ++it)
- std::cout << *it << std::endl;
- which is not very flexible and quite difficult to maintain when it comes to
- changing the required output format. Spirit's sublibrary /Karma/ allows you to
- specify output formats for arbitrary data structures in a very flexible way.
- The following snippet is the /Karma/ format description used to create the
- same output as the traditional code above:
- *(int_ << eol)
- Here are some more examples of format descriptions for different output
- representations of the same `std::vector<int>`:
- [table Different output formats for `std::vector<int>`
- [ [Format] [Example] [Description] ]
- [ [`'[' << *(int_ << ',') << ']'`] [`[1,8,10,]`] [Comma separated list of integers] ]
- [ [`*('(' << int_ << ')' << ',')`] [`(1),(8),(10),`] [Comma separated list of integers in parenthesis] ]
- [ [`*hex`] [`18a`] [A list of hexadecimal numbers] ]
- [ [`*(double_ << ',')`] [`1.0,8.0,10.0,`] [A list of floating point numbers] ]
- ]
- We will see later in this documentation how it is possible to avoid printing
- the trailing `','`.
- Overall, the syntax is similar to __qi__ with the exception that we use the `<<`
- operator for output concatenation. This should be easy to understand as it
- follows the conventions used in the Standard's I/O streams.
- Another important feature of __karma__ allows you to fully decouple the data
- type from the output format. You can use the same output format with different
- data types as long as these conform conceptually. The next table gives some
- related examples.
- [table Different data types usable with the output format `*(int_ << eol)`
- [ [Data type] [Description] ]
- [ [`int i[4]`] [C style arrays] ]
- [ [`std::vector<int>`] [Standard vector] ]
- [ [`std::list<int>`] [Standard list] ]
- [ [`boost::array<long, 20>`] [Boost array] ]
- ]
- [endsect]
|