[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]
[section:lexer_quickstart2 Quickstart 2 - A better word counter using __lex__]

People familiar with __flex__ will probably complain that the example from the
section __sec_lex_quickstart_1__ is overly complex and not written to leverage
the possibilities provided by this tool. In particular, the previous example
did not directly use the lexer actions to count the lines, words, and
characters. So the example provided in this step of the tutorial will show how
to use semantic actions in __lex__. Even though this example still counts
textual elements, the purpose is to introduce new concepts and configuration
options along the way (for the full example code see here:
[@../../example/lex/word_count_lexer.cpp word_count_lexer.cpp]).

[import ../example/lex/word_count_lexer.cpp]

[heading Prerequisites]

In addition to the single `#include` required for /Spirit.Lex/, this example
needs to include a couple of header files from the __phoenix__ library. The
example shows how to attach functors to token definitions, which could be done
using any type of C++ technique resulting in a callable object. Using
__phoenix__ for this task simplifies things and avoids adding dependencies on
other libraries (__phoenix__ is already in use for __spirit__ anyway).

[wcl_includes]

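The list of headers boils down to something along the following lines (a
sketch only, not authoritative; the snippet above shows the exact list used by
the example):

    // Spirit.Lex, using the lexertl based lexer implementation
    #include <boost/spirit/include/lex_lexertl.hpp>

    // Phoenix headers needed to build the semantic actions shown below
    #include <boost/spirit/include/phoenix_core.hpp>        // phoenix::ref()
    #include <boost/spirit/include/phoenix_operator.hpp>    // ++, +=, ','
    #include <boost/spirit/include/phoenix_statement.hpp>
    #include <boost/spirit/include/phoenix_algorithm.hpp>
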
To make all the code below more readable, we introduce the following
namespaces.

[wcl_namespaces]

To give a preview of what to expect from this example, here is the flex
program which has been used as the starting point. The useful code is directly
included inside the actions associated with each of the token definitions.

[wcl_flex_version]

[heading Semantic Actions in __lex__]

__lex__ uses a very similar way of associating actions with the token
definitions (which should look familiar to anybody acquainted with __spirit__
as well): specifying the operations to execute inside of a pair of `[]`
brackets. In order to be able to attach semantic actions to token definitions,
an instance of a `token_def<>` is defined for each of them.

[wcl_token_definition]

The semantics of the shown code are as follows. The code inside the `[]`
brackets will be executed whenever the corresponding token has been matched by
the lexical analyzer. This is very similar to __flex__, where the action code
associated with a token definition gets executed after the recognition of a
matching input sequence. The code above uses function objects constructed
using __phoenix__, but it is possible to insert any C++ function or function
object as long as it exposes the proper interface. For more details, please
refer to the section __sec_lex_semactions__.

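To illustrate that point, here is a minimal sketch of a hand written function
object doing the same work as the Phoenix expression attached to `word` above.
The name `count_word` is invented for this illustration, and the shown
`operator()` signature follows the interface described in
__sec_lex_semactions__:

    // a hand written function object usable as a semantic action
    // ('count_word' is illustrative only, it is not part of the example)
    struct count_word
    {
        std::size_t& w;     // reference to the word counter
        std::size_t& c;     // reference to the character counter

        count_word(std::size_t& w_, std::size_t& c_) : w(w_), c(c_) {}

        template <typename Iterator, typename IdType, typename Context>
        void operator()(Iterator& first, Iterator& last
          , BOOST_SCOPED_ENUM(boost::spirit::lex::pass_flags)& /*matched*/
          , IdType& /*id*/, Context& /*ctx*/) const
        {
            ++w;                                // count one more word
            c += std::distance(first, last);    // add its length in characters
        }
    };

    // attached just like a Phoenix actor: word[count_word(w, c)]
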
[heading Associating Token Definitions with the Lexer]

If you compare this code to the code from __sec_lex_quickstart_1__ with regard
to the way token definitions are associated with the lexer, you will notice a
different syntax being used here. In the previous example we used the
`self.add()` style of the API, while here we directly assign the token
definitions to `self`, combining the different token definitions using the `|`
operator. Here is the code snippet again:

    this->self
        =   word  [++ref(w), ref(c) += distance(_1)]
        |   eol   [++ref(c), ++ref(l)]
        |   any   [++ref(c)]
        ;

This gives us a very powerful and natural way of building the lexical
analyzer. Translated into English, this may be read as: the lexical analyzer
will recognize ('`=`') tokens as defined by any of ('`|`') the token
definitions `word`, `eol`, and `any`.

A second difference from the previous example is that we do not explicitly
specify any token ids to use for the separate tokens. Using semantic actions
to trigger some useful work has freed us from the need to define them. To
ensure every token gets assigned an id, the __lex__ library internally assigns
unique numbers to the token definitions, starting with the constant defined by
`boost::spirit::lex::min_token_id`.

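The automatically assigned id can be queried from each `token_def<>` via its
`id()` member function once the definitions have been associated with the
lexer. A small sketch (assuming a suitable `lexer_type` typedef as introduced
in the next section; the concrete values in the comments assume the ids are
assigned in definition order, which is an implementation detail):

    word_count_tokens<lexer_type> wc;       // ids are assigned on construction

    std::cout << wc.word.id() << std::endl; // lex::min_token_id
    std::cout << wc.eol.id()  << std::endl; // lex::min_token_id + 1
    std::cout << wc.any.id()  << std::endl; // lex::min_token_id + 2
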
[heading Pulling everything together]

In order to execute the code defined above we still need to instantiate the
lexer type, feed it from some input sequence, and create a pair of iterators
allowing us to iterate over the token sequence as created by the lexer. This
code shows how to achieve these steps:

[wcl_main]

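Condensed to its essentials, the driver code has the following general shape
(a sketch; the input string is made up here, and `actor_lexer<>` is the lexer
to use whenever semantic actions are attached to token definitions; refer to
[@../../example/lex/word_count_lexer.cpp word_count_lexer.cpp] for the exact
code):

    // the token and lexer types to use
    typedef lex::lexertl::token<char const*> token_type;
    typedef lex::lexertl::actor_lexer<token_type> lexer_type;

    // create an instance of the lexer holding the counters
    word_count_tokens<lexer_type> word_count_lexer;

    // some input to tokenize (illustrative)
    std::string str("Our hiking boots are ready.\nLet's pack!\n");
    char const* first = str.c_str();
    char const* last = &first[str.size()];

    // create the pair of iterators and iterate over all tokens; the
    // semantic actions do the counting as a side effect
    lexer_type::iterator_type iter = word_count_lexer.begin(first, last);
    lexer_type::iterator_type end = word_count_lexer.end();

    while (iter != end && token_is_valid(*iter))
        ++iter;

    std::cout << "lines: "        << word_count_lexer.l
              << ", words: "      << word_count_lexer.w
              << ", characters: " << word_count_lexer.c << std::endl;
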
[endsect]