[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer_quickstart2 Quickstart 2 - A better word counter using __lex__]

People familiar with __flex__ will probably complain that the example from the
section __sec_lex_quickstart_1__ is overly complex and not written to leverage
the possibilities provided by this tool. In particular, the previous example
did not directly use the lexer actions to count the lines, words, and
characters. So the example provided in this step of the tutorial will show how
to use semantic actions in __lex__. Even though this example still counts
textual elements, its purpose is to introduce new concepts and configuration
options along the way (for the full example code see here:
[@../../example/lex/word_count_lexer.cpp word_count_lexer.cpp]).

[import ../example/lex/word_count_lexer.cpp]

[heading Prerequisites]

In addition to the single required `#include` specific to /Spirit.Lex/, this
example needs to include a couple of header files from the __phoenix__
library. This example shows how to attach functors to token definitions, which
could be done using any C++ technique resulting in a callable object. Using
__phoenix__ for this task simplifies things and avoids adding dependencies on
other libraries (__phoenix__ is already in use for __spirit__ anyway).

[wcl_includes]

To make all the code below more readable we introduce the following namespaces.

[wcl_namespaces]

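Roughly, the two snippets referenced above amount to something like the
following. This is an illustrative sketch only; the snippets imported from
[@../../example/lex/word_count_lexer.cpp word_count_lexer.cpp] are
authoritative, and the exact set of __phoenix__ headers used there may differ:

    // Illustrative sketch only -- see word_count_lexer.cpp for the
    // authoritative list of headers and namespace aliases.
    #include <boost/spirit/include/lex_lexertl.hpp>       // the lexer itself
    #include <boost/spirit/include/phoenix_operator.hpp>  // Phoenix operators used in the actions
    #include <boost/spirit/include/phoenix_statement.hpp>
    #include <boost/spirit/include/phoenix_algorithm.hpp>
    #include <boost/spirit/include/phoenix_core.hpp>

    namespace lex = boost::spirit::lex;
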
To give a preview of what to expect from this example, here is the flex program
which has been used as the starting point. The useful code is directly included
inside the actions associated with each of the token definitions.

[wcl_flex_version]

[heading Semantic Actions in __lex__]

__lex__ uses a very similar way of associating actions with the token
definitions (which should look familiar to anybody acquainted with __spirit__
as well): the operations to execute are specified inside a pair of `[]`
brackets. In order to be able to attach semantic actions to the token
definitions, an instance of a `token_def<>` is defined for each of them.

[wcl_token_definition]

The semantics of the code shown above are as follows. The code inside the `[]`
brackets will be executed whenever the corresponding token has been matched by
the lexical analyzer. This is very similar to __flex__, where the action code
associated with a token definition gets executed after the recognition of a
matching input sequence. The code above uses function objects constructed using
__phoenix__, but it is possible to insert any C++ function or function object
as long as it exposes the proper interface. For more details please refer to
the section __sec_lex_semactions__.

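For example, instead of the __phoenix__ expressions a hand written function
object could be attached. The following is a minimal sketch only: the name
`count_words` is made up for illustration, and its parameter list merely
mirrors the interface described in __sec_lex_semactions__:

    // Hypothetical sketch: a plain function object used as a lexer semantic
    // action (the name count_words is made up; it is not part of the example).
    struct count_words
    {
        explicit count_words(std::size_t& w) : w_(w) {}

        // The parameter list follows the interface expected from lexer
        // semantic actions; the exact types are supplied by the lexer.
        template <typename Iterator, typename Pass, typename IdType, typename Context>
        void operator()(Iterator& start, Iterator& end, Pass& matched
          , IdType& id, Context& ctx) const
        {
            ++w_;       // count one more word
        }

        std::size_t& w_;
    };

    // ... which could then be attached just like the Phoenix expressions:
    // this->self = word [count_words(w)] | ... ;
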
[heading Associating Token Definitions with the Lexer]

If you compare this code to the code from __sec_lex_quickstart_1__ with regard
to the way token definitions are associated with the lexer, you will notice a
different syntax being used here. In the previous example we have been using
the `self.add()` style of the API, while here we directly assign the token
definitions to `self`, combining the different token definitions using the `|`
operator. Here is the code snippet again:

    this->self
        =   word  [++ref(w), ref(c) += distance(_1)]
        |   eol   [++ref(c), ++ref(l)]
        |   any   [++ref(c)]
        ;

This gives us a very powerful and natural way of building the lexical
analyzer. If translated into English this may be read as: the lexical analyzer
will recognize ('`=`') tokens as defined by any of ('`|`') the token
definitions `word`, `eol`, and `any`.

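For comparison, the same token definitions could have been associated with the
lexer using the `add()` style API known from __sec_lex_quickstart_1__. The
following sketch is illustrative only and omits the semantic actions:

    // Illustrative sketch only: the add() style from Quickstart 1, shown
    // here without the semantic actions attached to the token definitions.
    this->self.add
        (word)
        (eol)
        (any)
    ;
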
A second difference to the previous example is that we do not explicitly
specify any token ids to use for the separate tokens. Using semantic actions to
trigger some useful work has freed us from the need to define those. To ensure
every token gets assigned an id, the __lex__ library internally assigns unique
numbers to the token definitions, starting with the constant defined by
`boost::spirit::lex::min_token_id`.

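If explicit token ids should be needed after all, they can still be supplied
while associating the token definitions with the lexer. The enumeration below
is a hypothetical illustration (it is not part of this example):

    // Hypothetical illustration: explicit token ids defined relative to
    // lex::min_token_id, mirroring the style used in Quickstart 1.
    enum token_ids
    {
        ID_WORD = lex::min_token_id + 1,
        ID_EOL,
        ID_ANY
    };

    // this->self.add(word, ID_WORD)(eol, ID_EOL)(any, ID_ANY);
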
[heading Pulling everything together]

In order to execute the code defined above we still need to create an instance
of the lexer type, feed it some input, and create a pair of iterators allowing
us to iterate over the token sequence as created by the lexer. This code shows
how to achieve these steps:

[wcl_main]

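As a side note, if direct access to the token iterators is not required, the
iteration can typically be replaced by a call to the `lex::tokenize()`
convenience function. The following is a minimal sketch only, assuming the
counters `l`, `w`, and `c` are the data members updated by the semantic
actions shown above:

    // Minimal sketch (not part of word_count_lexer.cpp): letting the
    // lex::tokenize() convenience function drive the lexer instead of
    // iterating over the token sequence by hand.
    std::string str ("Our hiking boots are ready.\nLet's pack!\n");
    char const* first = str.c_str();
    char const* last = &first[str.size()];

    if (lex::tokenize(first, last, word_count_lexer)) {
        std::cout << "lines: " << word_count_lexer.l
                  << ", words: " << word_count_lexer.w
                  << ", characters: " << word_count_lexer.c << "\n";
    }
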
[endsect]