peg.qbk 3.9 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113
  1. [/==============================================================================
  2. Copyright (C) 2001-2011 Joel de Guzman
  3. Copyright (C) 2001-2011 Hartmut Kaiser
  4. Distributed under the Boost Software License, Version 1.0. (See accompanying
  5. file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
  6. ===============================================================================/]
  7. [section Parsing Expression Grammar]
  8. Parsing Expression Grammars (PEG) [footnote Bryan Ford: Parsing Expression
  9. Grammars: A Recognition-Based Syntactic Foundation,
  10. [@http://pdos.csail.mit.edu/~baford/packrat/popl04/]] are a derivative of
  11. Extended Backus-Naur Form (EBNF) [footnote Richard E. Pattis: EBNF: A Notation
  12. to Describe Syntax, [@http://www.cs.cmu.edu/~pattis/misc/ebnf.pdf]]
  13. with a different interpretation, designed to represent a recursive descent
  14. parser. A PEG can be directly represented as a recursive-descent parser.
  15. Like EBNF, PEG is a formal grammar for describing a formal language in
  16. terms of a set of rules used to recognize strings of this language.
  17. Unlike EBNF, PEGs have an exact interpretation. There is only one valid
  18. parse tree (see __ast__) for each PEG grammar.
  19. [heading Sequences]
  20. Sequences are represented by juxtaposition like in EBNF:
  21. a b
  22. The PEG expression above states that, in order for this to succeed,
  23. `b` must follow `a`. Here's the syntax diagram:
  24. [:__sd_sequence__]
  25. Here's a trivial example:
  26. 'x' digit
  27. which means the character `x` must be followed by a digit.
  28. [note In __qi__, we use the `>>` for sequences since C++ does not
  29. allow juxtaposition, while in __karma__ we use the `<<` instead.]
  30. [heading Alternatives]
  31. Alternatives are represented in PEG using the slash:
  32. a / b
  33. [note In __qi__ and __karma__, we use the `|` for alternatives just as in EBNF.]
  34. Alternatives allow for choices. The expression above reads: try to match
  35. `a`. If `a` succeeds, success, if not try to match `b`. This is a bit of
  36. a deviation from the usual EBNF interpretation where you simply match
  37. `a` *or* `b`. Here's the syntax diagram:
  38. [:__sd_choice__]
  39. PEGs allow for ambiguity in the alternatives. In the expression above,
  40. both `a` or `b` can both match an input string. However, only the first
  41. matching alternative is valid. As noted, there can only be one valid
  42. parse tree. [/FIXME: $$$ explain more about this $$$]
  43. [heading Loops]
  44. Again, like EBNF, PEG uses the regular-expression Kleene star and the
  45. plus loops:
  46. a*
  47. a+
  48. [note __qi__ and __karma__ use the prefix star and plus since there is no
  49. postfix star or plus in C++.]
  50. Here are the syntax diagrams:
  51. [:__sd_kleene__]
  52. [:__sd_plus__]
  53. The first, called the Kleene star, matches zero or more of its subject
  54. `a`. The second, plus, matches one ore more of its subject `a`.
  55. Unlike EBNF, PEGs have greedy loops. It will match as much as it can
  56. until its subject fails to match without regard to what follows. The
  57. following is a classic example of a fairly common EBNF/regex expression
  58. failing to match in PEG:
  59. alnum* digit
  60. In PEG, alnum will eat as much alpha-numeric characters as it can
  61. leaving nothing more left behind. Thus, the trailing digit will get
  62. nothing. Loops are simply implemented in recursive descent code as
  63. for/while loops making them extremely efficient. That is a definite
  64. advantage. On the other hand, those who are familiar with EBNF and regex
  65. behavior might find the behavior a major gotcha. PEG provides a couple
  66. of other mechanisms to circumvent this. We will see more of these other
  67. mechanisms shortly.
  68. [heading Difference]
  69. In some cases, you may want to restrict a certain expression. You can
  70. think of a PEG expression as a match for a potentially infinite set of
  71. strings. The difference operator allows you to restrict this set:
  72. a - b
  73. The expression reads: match `a` but not `b`.
  74. [note There is no difference operator in __karma__, as the concept does not
  75. make sense in the context of output generation.]
  76. [endsect]