[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer_semantic_actions Lexer Semantic Actions]
The main task of a lexer is normally to recognize tokens in the input.
Traditionally, this has been complemented by the ability to execute arbitrary
code whenever a certain token is detected. __lex__ has been designed to support
this mode of operation as well. We borrow from the concept of semantic actions
for parsers (__qi__) and generators (__karma__). Lexer semantic actions may be
attached to any token definition. These are C++ functions or function objects
that are called whenever a token definition successfully recognizes a portion
of the input. Given a token definition `D` and a C++ function `f`, you can make
the lexer call `f` whenever `D` matches input by attaching `f`:

    D[f]

The expression above links `f` to the token definition `D`. The required
prototype of `f` is:

    void f (Iterator& start, Iterator& end, pass_flag& matched, Idtype& id, Context& ctx);
[variablelist where:
    [[`Iterator& start`]    [This is the iterator pointing to the beginning of
                             the matched range in the underlying input sequence.
                             The type of this iterator is the same as the one
                             specified while defining the type of the
                             `lexertl::actor_lexer<...>` (its first template
                             parameter). The semantic action is allowed to
                             change the value of this iterator, thereby
                             influencing the matched input sequence.]]
    [[`Iterator& end`]      [This is the iterator pointing to the end of the
                             matched range in the underlying input sequence.
                             The type of this iterator is the same as the one
                             specified while defining the type of the
                             `lexertl::actor_lexer<...>` (its first template
                             parameter). The semantic action is allowed to
                             change the value of this iterator, thereby
                             influencing the matched input sequence.]]
    [[`pass_flag& matched`] [This value is pre-initialized to `pass_normal`.
                             If the semantic action sets it to `pass_fail`,
                             the lexer behaves as if the token had not been
                             matched in the first place. If the semantic
                             action sets it to `pass_ignore`, the lexer
                             ignores the current token and tries to match
                             the next token in the input.]]
    [[`Idtype& id`]         [This is the token id of type `Idtype` (most of
                             the time this will be a `std::size_t`) for the
                             matched token. The semantic action is allowed to
                             change the value of this token id, influencing
                             the id of the created token.]]
    [[`Context& ctx`]       [This is a reference to a lexer-specific,
                             unspecified type, providing the context for the
                             current lexer state. It can be used to access
                             different internal data items and is needed for
                             lexer state control from inside a semantic
                             action.]]
]
When using a C++ function as the semantic action, the following prototypes are
allowed as well:

    void f (Iterator& start, Iterator& end, pass_flag& matched, Idtype& id);
    void f (Iterator& start, Iterator& end, pass_flag& matched);
    void f (Iterator& start, Iterator& end);
    void f ();
[important In order to use lexer semantic actions, you need to use the type
           `lexertl::actor_lexer<>` as your lexer class (instead of the
           type `lexertl::lexer<>` as described in earlier examples).]
[heading The context of a lexer semantic action]

The last parameter passed to any lexer semantic action is a reference to an
unspecified type (see the `Context` type in the table above). This type is
unspecified because it depends on the token type returned by the lexer. It is
implemented in the internals of the iterator type exposed by the lexer.
Nevertheless, any context type is expected to expose a couple of functions
allowing you to influence the behavior of the lexer. The following table gives
an overview and a short description of the available functionality.
[table Functions exposed by any context passed to a lexer semantic action
    [[Name]                             [Description]]
    [[`Iterator const& get_eoi() const`]
     [The function `get_eoi()` may be used to access the end iterator of
      the input stream the lexer has been initialized with.]]
    [[`void more()`]
     [The function `more()` tells the lexer that the next time it matches a
      rule, the corresponding token should be appended onto the current token
      value rather than replacing it.]]
    [[`Iterator const& less(Iterator const& it, int n)`]
     [The function `less()` returns an iterator positioned at the nth input
      character beyond the current token start iterator (i.e. by passing the
      return value to the parameter `end` it is possible to return all but the
      first n characters of the current token back to the input stream).]]
    [[`bool lookahead(std::size_t id)`]
     [The function `lookahead()` can be used to implement lookahead for lexer
      engines not supporting constructs like flex's `a/b`
      (match `a`, but only when followed by `b`). It invokes the lexer on the
      input following the current token without actually moving forward in the
      input stream. The function returns whether the lexer was able to match a
      token with the given token id `id`.]]
    [[`std::size_t get_state() const` and `void set_state(std::size_t state)`]
     [The functions `get_state()` and `set_state()` may be used to introspect
      and change the current lexer state.]]
    [[`token_value_type get_value() const` and `void set_value(Value const&)`]
     [The functions `get_value()` and `set_value()` may be used to introspect
      and change the current token value.]]
]
[heading Lexer Semantic Actions Using Phoenix]

Even though it is possible to write your own function object implementations
(e.g. using Boost.Lambda or Boost.Bind), the preferred way of defining lexer
semantic actions is to use __phoenix__. In this case you can access the
parameters described above by using the predefined __spirit__ placeholders:
[table Predefined Phoenix placeholders for lexer semantic actions
    [[Placeholder]      [Description]]
    [[`_start`]
     [Refers to the iterator pointing to the beginning of the matched input
      sequence. Any modifications to this iterator value will be reflected in
      the generated token.]]
    [[`_end`]
     [Refers to the iterator pointing past the end of the matched input
      sequence. Any modifications to this iterator value will be reflected in
      the generated token.]]
    [[`_pass`]
     [References the value signaling the outcome of the semantic action. It
      is pre-initialized to `lex::pass_flags::pass_normal`. If it is set to
      `lex::pass_flags::pass_fail`, the lexer will behave as if no token had
      been matched; if it is set to `lex::pass_flags::pass_ignore`, the lexer
      will ignore the current match and proceed trying to match tokens from
      the input.]]
    [[`_tokenid`]
     [Refers to the token id of the token to be generated. Any modifications
      to this value will be reflected in the generated token.]]
    [[`_val`]
     [Refers to the value the next token will be initialized from. Any
      modifications to this value will be reflected in the generated token.]]
    [[`_state`]
     [Refers to the lexer state the input has been matched in. Any
      modifications to this value will be reflected in the lexer itself (the
      next match will start in the new state). The token currently being
      generated is not affected by changes to this variable.]]
    [[`_eoi`]
     [References the end iterator of the overall lexer input. This value
      cannot be changed.]]
]
The context object passed as the last parameter to any lexer semantic action is
not directly accessible from __phoenix__ expressions. Instead, we provide
predefined Phoenix functions allowing you to invoke the different support
functions mentioned above. The following table lists the available support
functions and describes their functionality:

[table Support functions usable from Phoenix expressions inside lexer semantic actions
    [[Plain function]       [Phoenix function]      [Description]]
    [[`ctx.more()`]
     [`more()`]
     [The function `more()` tells the lexer that the next time it matches a
      rule, the corresponding token should be appended onto the current token
      value rather than replacing it.]]
    [[`ctx.less(it, n)`]
     [`less(n)`]
     [The function `less()` takes a single integer parameter `n` and returns an
      iterator positioned at the nth input character beyond the current token
      start iterator (i.e. by assigning the return value to the placeholder
      `_end` it is possible to return all but the first `n` characters of the
      current token back to the input stream).]]
    [[`ctx.lookahead(id)`]
     [`lookahead(std::size_t)` or `lookahead(token_def)`]
     [The function `lookahead()` takes a single parameter specifying the token
      to match in the input. It can be used, for instance, to implement
      lookahead for lexer engines not supporting constructs like flex's `a/b`
      (match `a`, but only when followed by `b`). It invokes the lexer on the
      input following the current token without actually moving forward in the
      input stream. The function returns whether the lexer was able to match
      the specified token.]]
]
[endsect]