123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107 |
- [/==============================================================================
- Copyright (C) 2001-2015 Joel de Guzman
- Copyright (C) 2001-2011 Hartmut Kaiser
- Distributed under the Boost Software License, Version 1.0. (See accompanying
- file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- ===============================================================================/]
- [section Parsing Expression Grammar]
- Parsing Expression Grammars (PEG) [footnote Bryan Ford: Parsing Expression
- Grammars: A Recognition-Based Syntactic Foundation,
- [@http://pdos.csail.mit.edu/~baford/packrat/popl04/]] are a derivative of
- Extended Backus-Naur Form (EBNF) [footnote Richard E. Pattis: EBNF: A Notation
- to Describe Syntax, [@http://www.cs.cmu.edu/~pattis/misc/ebnf.pdf]]
- with a different interpretation, designed to represent a recursive descent
- parser. A PEG can be directly represented as a recursive-descent parser.
- Like EBNF, PEG is a formal grammar for describing a formal language in
- terms of a set of rules used to recognize strings of this language.
- Unlike EBNF, PEGs have an exact interpretation. There is only one valid
- parse tree (see __ast__) for each PEG grammar.
- [heading Sequences]
- Sequences are represented by juxtaposition like in EBNF:
- a b
- The PEG expression above states that, in order for this to succeed,
- `b` must follow `a`. Here's the syntax diagram:
- [:__sd_sequence__]
- Here's a trivial example:
- 'x' digit
- which means the character `x` must be followed by a digit.
- [note In __x3__, we use the `>>` for sequences since C++ does not
- allow juxtaposition.]
- [heading Alternatives]
- Alternatives are represented in PEG using the slash:
- a / b
- [note In __x3__, we use the `|` for alternatives just as in EBNF.]
- Alternatives allow for choices. The expression above reads: try to match
- `a`. If `a` succeeds, success, if not try to match `b`. This is a bit of
- a deviation from the usual EBNF interpretation where you simply match
- `a` *or* `b`. Here's the syntax diagram:
- [:__sd_choice__]
- PEGs allow for ambiguity in the alternatives. In the expression above,
- both `a` or `b` can both match an input string. However, only the first
- matching alternative is valid. As noted, there can only be one valid
- parse tree. [/FIXME: $$$ explain more about this $$$]
- [heading Loops]
- Again, like EBNF, PEG uses the regular-expression Kleene star and the
- plus loops:
- a*
- a+
- [note __x3__ uses the prefix star and plus since there is no postfix star or
- plus in C++.]
- Here are the syntax diagrams:
- [:__sd_kleene__]
- [:__sd_plus__]
- The first, called the Kleene star, matches zero or more of its subject
- `a`. The second, plus, matches one ore more of its subject `a`.
- Unlike EBNF, PEGs have greedy loops. It will match as much as it can
- until its subject fails to match without regard to what follows. The following
- is a classic example of a fairly common EBNF/regex expression failing to match
- in PEG:
- alnum* digit
- In PEG, alnum will eat as much alpha-numeric characters as it can
- leaving nothing more left behind. Thus, the trailing digit will get
- nothing. Loops are simply implemented in recursive descent code as for/while
- loops making them extremely efficient. That is a definite advantage. On the
- other hand, those who are familiar with EBNF and regex behavior might find the
- behavior a major gotcha. PEG provides a couple of other mechanisms to circumvent
- this. We will see more of these other mechanisms shortly.
- [heading Difference]
- In some cases, you may want to restrict a certain expression. You can think of a
- PEG expression as a match for a potentially infinite set of strings. The
- difference operator allows you to restrict this set:
- a - b
- The expression reads: match `a` but not `b`.
- [endsect]
|