parsing_expression_grammar.html 12 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203
  1. <html>
  2. <head>
  3. <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
  4. <title>Parsing Expression Grammar</title>
  5. <link rel="stylesheet" href="../../../../../../../doc/src/boostbook.css" type="text/css">
  6. <meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
  7. <link rel="home" href="../../index.html" title="Spirit X3 3.0.4">
  8. <link rel="up" href="../abstracts.html" title="Abstracts">
  9. <link rel="prev" href="syntax_diagram.html" title="Syntax Diagram">
  10. <link rel="next" href="primitive_attributes.html" title="Attributes of Primitive Components">
  11. </head>
  12. <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
  13. <table cellpadding="2" width="100%"><tr>
  14. <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../boost.png"></td>
  15. <td align="center"><a href="../../../../../../../index.html">Home</a></td>
  16. <td align="center"><a href="../../../../../../../libs/libraries.htm">Libraries</a></td>
  17. <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
  18. <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
  19. <td align="center"><a href="../../../../../../../more/index.htm">More</a></td>
  20. </tr></table>
  21. <hr>
  22. <div class="spirit-nav">
  23. <a accesskey="p" href="syntax_diagram.html"><img src="../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../abstracts.html"><img src="../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="primitive_attributes.html"><img src="../../../../../../../doc/src/images/next.png" alt="Next"></a>
  24. </div>
  25. <div class="section">
  26. <div class="titlepage"><div><div><h3 class="title">
  27. <a name="spirit_x3.abstracts.parsing_expression_grammar"></a><a class="link" href="parsing_expression_grammar.html" title="Parsing Expression Grammar">Parsing
  28. Expression Grammar</a>
  29. </h3></div></div></div>
  30. <p>
  31. Parsing Expression Grammars (PEG) <a href="#ftn.spirit_x3.abstracts.parsing_expression_grammar.f0" class="footnote" name="spirit_x3.abstracts.parsing_expression_grammar.f0"><sup class="footnote">[5]</sup></a> are a derivative of Extended Backus-Naur Form (EBNF) <a href="#ftn.spirit_x3.abstracts.parsing_expression_grammar.f1" class="footnote" name="spirit_x3.abstracts.parsing_expression_grammar.f1"><sup class="footnote">[6]</sup></a> with a different interpretation, designed to represent a recursive
  32. descent parser. A PEG can be directly represented as a recursive-descent
  33. parser.
  34. </p>
  35. <p>
  36. Like EBNF, PEG is a formal grammar for describing a formal language in terms
  37. of a set of rules used to recognize strings of this language. Unlike EBNF,
  38. PEGs have an exact interpretation. There is only one valid parse tree (see
  39. Abstract Syntax Tree) for each PEG grammar.
  40. </p>
  41. <h5>
  42. <a name="spirit_x3.abstracts.parsing_expression_grammar.h0"></a>
  43. <span class="phrase"><a name="spirit_x3.abstracts.parsing_expression_grammar.sequences"></a></span><a class="link" href="parsing_expression_grammar.html#spirit_x3.abstracts.parsing_expression_grammar.sequences">Sequences</a>
  44. </h5>
  45. <p>
  46. Sequences are represented by juxtaposition like in EBNF:
  47. </p>
  48. <pre class="programlisting"><span class="identifier">a</span> <span class="identifier">b</span>
  49. </pre>
  50. <p>
  51. The PEG expression above states that, in order for this to succeed, <code class="computeroutput"><span class="identifier">b</span></code> must follow <code class="computeroutput"><span class="identifier">a</span></code>.
  52. Here's the syntax diagram:
  53. </p>
  54. <div class="blockquote"><blockquote class="blockquote"><p>
  55. <span class="inlinemediaobject"><img src="../.././images/sequence.png" alt="sequence"></span>
  56. </p></blockquote></div>
  57. <p>
  58. Here's a trivial example:
  59. </p>
  60. <pre class="programlisting"><span class="char">'x'</span> <span class="identifier">digit</span>
  61. </pre>
  62. <p>
  63. which means the character <code class="computeroutput"><span class="identifier">x</span></code>
  64. must be followed by a digit.
  65. </p>
  66. <div class="note"><table border="0" summary="Note">
  67. <tr>
  68. <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../images/note.png"></td>
  69. <th align="left">Note</th>
  70. </tr>
  71. <tr><td align="left" valign="top"><p>
  72. In <span class="emphasis"><em>Spirit.X3</em></span>, we use the <code class="computeroutput"><span class="special">&gt;&gt;</span></code>
  73. for sequences since C++ does not allow juxtaposition.
  74. </p></td></tr>
  75. </table></div>
  76. <h5>
  77. <a name="spirit_x3.abstracts.parsing_expression_grammar.h1"></a>
  78. <span class="phrase"><a name="spirit_x3.abstracts.parsing_expression_grammar.alternatives"></a></span><a class="link" href="parsing_expression_grammar.html#spirit_x3.abstracts.parsing_expression_grammar.alternatives">Alternatives</a>
  79. </h5>
  80. <p>
  81. Alternatives are represented in PEG using the slash:
  82. </p>
  83. <pre class="programlisting"><span class="identifier">a</span> <span class="special">/</span> <span class="identifier">b</span>
  84. </pre>
  85. <div class="note"><table border="0" summary="Note">
  86. <tr>
  87. <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../images/note.png"></td>
  88. <th align="left">Note</th>
  89. </tr>
  90. <tr><td align="left" valign="top"><p>
  91. In <span class="emphasis"><em>Spirit.X3</em></span>, we use the <code class="computeroutput"><span class="special">|</span></code>
  92. for alternatives just as in EBNF.
  93. </p></td></tr>
  94. </table></div>
  95. <p>
  96. Alternatives allow for choices. The expression above reads: try to match
  97. <code class="computeroutput"><span class="identifier">a</span></code>. If <code class="computeroutput"><span class="identifier">a</span></code>
  98. succeeds, success, if not try to match <code class="computeroutput"><span class="identifier">b</span></code>.
  99. This is a bit of a deviation from the usual EBNF interpretation where you
  100. simply match <code class="computeroutput"><span class="identifier">a</span></code> <span class="bold"><strong>or</strong></span> <code class="computeroutput"><span class="identifier">b</span></code>.
  101. Here's the syntax diagram:
  102. </p>
  103. <div class="blockquote"><blockquote class="blockquote"><p>
  104. <span class="inlinemediaobject"><img src="../.././images/alternative.png" alt="alternative"></span>
  105. </p></blockquote></div>
  106. <p>
  107. PEGs allow for ambiguity in the alternatives. In the expression above, both
  108. <code class="computeroutput"><span class="identifier">a</span></code> or <code class="computeroutput"><span class="identifier">b</span></code>
  109. can both match an input string. However, only the first matching alternative
  110. is valid. As noted, there can only be one valid parse tree.
  111. </p>
  112. <h5>
  113. <a name="spirit_x3.abstracts.parsing_expression_grammar.h2"></a>
  114. <span class="phrase"><a name="spirit_x3.abstracts.parsing_expression_grammar.loops"></a></span><a class="link" href="parsing_expression_grammar.html#spirit_x3.abstracts.parsing_expression_grammar.loops">Loops</a>
  115. </h5>
  116. <p>
  117. Again, like EBNF, PEG uses the regular-expression Kleene star and the plus
  118. loops:
  119. </p>
  120. <pre class="programlisting"><span class="identifier">a</span><span class="special">*</span>
  121. <span class="identifier">a</span><span class="special">+</span>
  122. </pre>
  123. <div class="note"><table border="0" summary="Note">
  124. <tr>
  125. <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../images/note.png"></td>
  126. <th align="left">Note</th>
  127. </tr>
  128. <tr><td align="left" valign="top"><p>
  129. <span class="emphasis"><em>Spirit.X3</em></span> uses the prefix star and plus since there
  130. is no postfix star or plus in C++.
  131. </p></td></tr>
  132. </table></div>
  133. <p>
  134. Here are the syntax diagrams:
  135. </p>
  136. <div class="blockquote"><blockquote class="blockquote"><p>
  137. <span class="inlinemediaobject"><img src="../.././images/kleene.png" alt="kleene"></span>
  138. </p></blockquote></div>
  139. <div class="blockquote"><blockquote class="blockquote"><p>
  140. <span class="inlinemediaobject"><img src="../.././images/plus.png" alt="plus"></span>
  141. </p></blockquote></div>
  142. <p>
  143. The first, called the Kleene star, matches zero or more of its subject <code class="computeroutput"><span class="identifier">a</span></code>. The second, plus, matches one ore more
  144. of its subject <code class="computeroutput"><span class="identifier">a</span></code>.
  145. </p>
  146. <p>
  147. Unlike EBNF, PEGs have greedy loops. It will match as much as it can until
  148. its subject fails to match without regard to what follows. The following
  149. is a classic example of a fairly common EBNF/regex expression failing to
  150. match in PEG:
  151. </p>
  152. <pre class="programlisting"><span class="identifier">alnum</span><span class="special">*</span> <span class="identifier">digit</span>
  153. </pre>
  154. <p>
  155. In PEG, alnum will eat as much alpha-numeric characters as it can leaving
  156. nothing more left behind. Thus, the trailing digit will get nothing. Loops
  157. are simply implemented in recursive descent code as for/while loops making
  158. them extremely efficient. That is a definite advantage. On the other hand,
  159. those who are familiar with EBNF and regex behavior might find the behavior
  160. a major gotcha. PEG provides a couple of other mechanisms to circumvent this.
  161. We will see more of these other mechanisms shortly.
  162. </p>
  163. <h5>
  164. <a name="spirit_x3.abstracts.parsing_expression_grammar.h3"></a>
  165. <span class="phrase"><a name="spirit_x3.abstracts.parsing_expression_grammar.difference"></a></span><a class="link" href="parsing_expression_grammar.html#spirit_x3.abstracts.parsing_expression_grammar.difference">Difference</a>
  166. </h5>
  167. <p>
  168. In some cases, you may want to restrict a certain expression. You can think
  169. of a PEG expression as a match for a potentially infinite set of strings.
  170. The difference operator allows you to restrict this set:
  171. </p>
  172. <pre class="programlisting"><span class="identifier">a</span> <span class="special">-</span> <span class="identifier">b</span>
  173. </pre>
  174. <p>
  175. The expression reads: match <code class="computeroutput"><span class="identifier">a</span></code>
  176. but not <code class="computeroutput"><span class="identifier">b</span></code>.
  177. </p>
  178. <div class="footnotes">
  179. <br><hr style="width:100; text-align:left;margin-left: 0">
  180. <div id="ftn.spirit_x3.abstracts.parsing_expression_grammar.f0" class="footnote"><p><a href="#spirit_x3.abstracts.parsing_expression_grammar.f0" class="para"><sup class="para">[5] </sup></a>
  181. Bryan Ford: Parsing Expression Grammars: A Recognition-Based Syntactic
  182. Foundation, <a href="http://pdos.csail.mit.edu/~baford/packrat/popl04/" target="_top">http://pdos.csail.mit.edu/~baford/packrat/popl04/</a>
  183. </p></div>
  184. <div id="ftn.spirit_x3.abstracts.parsing_expression_grammar.f1" class="footnote"><p><a href="#spirit_x3.abstracts.parsing_expression_grammar.f1" class="para"><sup class="para">[6] </sup></a>
  185. Richard E. Pattis: EBNF: A Notation to Describe Syntax, <a href="http://www.cs.cmu.edu/~pattis/misc/ebnf.pdf" target="_top">http://www.cs.cmu.edu/~pattis/misc/ebnf.pdf</a>
  186. </p></div>
  187. </div>
  188. </div>
  189. <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
  190. <td align="left"></td>
  191. <td align="right"><div class="copyright-footer">Copyright &#169; 2001-2018 Joel de Guzman,
  192. Hartmut Kaiser<p>
  193. Distributed under the Boost Software License, Version 1.0. (See accompanying
  194. file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
  195. </p>
  196. </div></td>
  197. </tr></table>
  198. <hr>
  199. <div class="spirit-nav">
  200. <a accesskey="p" href="syntax_diagram.html"><img src="../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../abstracts.html"><img src="../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="primitive_attributes.html"><img src="../../../../../../../doc/src/images/next.png" alt="Next"></a>
  201. </div>
  202. </body>
  203. </html>