123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596 |
- [/==============================================================================
- Copyright (C) 2001-2015 Joel de Guzman
- Copyright (C) 2001-2011 Hartmut Kaiser
- Distributed under the Boost Software License, Version 1.0. (See accompanying
- file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- ===============================================================================/]
- [section Syntax Diagram]
- In the next section, we will deal with Parsing Expression Grammars
- (PEG) [footnote Bryan Ford: Parsing Expression Grammars: A Recognition-Based
- Syntactic Foundation, [@http://pdos.csail.mit.edu/~baford/packrat/popl04/]],
- a variant of Extended Backus-Naur Form (EBNF) [footnote Richard E. Pattis: EBNF:
- A Notation to Describe Syntax, [@http://www.cs.cmu.edu/~pattis/misc/ebnf.pdf]]
- with a different interpretation. It is easier to understand PEG using Syntax
- Diagrams. Syntax diagrams represent a grammar graphically. It was used
- extensively by Niklaus Wirth [footnote Niklaus Wirth: The Programming Language
- Pascal. (July 1973)] in the "Pascal User Manual". Syntax Diagrams are easily
- understandable by programmers due to their similarity to flow charts. The
- isomorphism of the diagrams and functions make them ideal for representing
- __rd__ parsers which are essentially mutually recursive functions.
- [heading Elements]
- All diagrams have one entry and one exit point. Arrows connect all possible
- paths through the grammar from the entry point to the exit point.
- [:__sd_start_stop__]
- Terminals are represented by round boxes. Terminals are atomic and
- usually represent plain characters, strings or tokens.
- [:__sd_terminals__]
- Non-terminals are represented by boxes. Diagrams are modularized using
- named non-terminals. A complex diagram can be broken down into a set of
- non-terminals. Non-terminals also allow recursion (i.e. a non-terminal can call
- itself).
- [:__sd_non_terminals__]
- [heading Constructs]
- The most basic composition is the Sequence. B follows A:
- [:__sd_sequence__]
- The ordered choice henceforth we will call /alternatives/. In PEG, ordered
- choice and alternatives are not quite the same. PEG allows ambiguity of choice
- where one or more branches can succeed. In PEG, in case of ambiguity, the first
- one always wins.
- [:__sd_choice__]
- The optional (zero-or-one):
- [:__sd_optional__]
- Now, the loops. We have the zero-or-more and one-or-more:
- [:__sd_kleene__]
- [:__sd_plus__]
- Take note that, as in PEG, these loops behave greedily. If there is
- another 'A' just before the end-point, it will always fail because the
- preceding loop has already exhausted all 'A's and there is nothing more left.
- This is a crucial difference between PEG and general Context Free Grammars
- (CFGs). This behavior is quite obvious with syntax diagrams as they resemble
- flow-charts.
- [heading Predicates]
- Now, the following are Syntax Diagram versions of PEG predicates. These are not
- traditionally found in Syntax Diagrams. These are special extensions we invented
- to closely follow PEGs.
- First, we introduce a new element, the Predicate:
- [:__sd_predicate__]
- This is similar to the conditionals in flow charts where the 'No' branch is
- absent and always signals a failed parse.
- We have two versions of the predicate, the /And-Predicate/ and the
- /Not-Predicate/:
- [:__sd_and_predicate__]
- [:__sd_not_predicate__]
- The /And-Predicate/ tries the predicate, P, and succeeds if P succeeds,
- or otherwise fail. The opposite is true with the /Not-Predicate/. It
- tries the predicate, P, and fails if P succeeds, or otherwise succeeds. Both
- versions do a look-ahead but do not consume any input regardless if P succeeds
- or not.
- [endsect]
|