123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117 |
- <html>
- <head>
- <title>The Lazy Parsers</title>
- <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
- <link rel="stylesheet" href="theme/style.css" type="text/css">
- </head>
- <body>
- <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
- <tr>
- <td width="10">
- </td>
- <td width="85%"> <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>The
- Lazy Parser</b></font></td>
- <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td>
- </tr>
- </table>
- <br>
- <table border="0">
- <tr>
- <td width="10"></td>
- <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
- <td width="30"><a href="dynamic_parsers.html"><img src="theme/l_arr.gif" border="0"></a></td>
- <td width="30"><a href="select_parser.html"><img src="theme/r_arr.gif" border="0"></a></td>
- </tr>
- </table>
- <p>Closures are cool. It allows us to inject stack based local variables anywhere
- in our parse descent hierarchy. Typically, we store temporary variables, generated
- by our semantic actions, in our closure variables, as a means to pass information
- up and down the recursive descent.</p>
- <p>Now imagine this... Having in mind that closure variables can be just about
- any type, we can store a parser, a rule, or a pointer to a parser or rule, in
- a closure variable. <em>Yeah, right, so what?...</em> Ok, hold on... What if
- we can use this closure variable to initiate a parse? Think about it for a second.
- Suddenly we'll have some powerful dynamic parsers! Suddenly we'll have a full
- round trip from to <a href="../phoenix/index.html">Phoenix</a> and Spirit and
- back! <a href="../phoenix/index.html">Phoenix</a> semantic actions choose the
- right Spirit parser and Spirit parsers choose the right <a href="../phoenix/index.html">Phoenix</a>
- semantic action. Oh MAN, what a honky cool idea, I might say!!</p>
- <h2>lazy_p</h2>
- <p>This is the idea behind the <tt>lazy_p</tt> parser. The <tt>lazy_p</tt> syntax
- is:</p>
- <pre> lazy_p<span class="special">(</span>actor<span class="special">)</span></pre>
- <p>where actor is a <a href="../phoenix/index.html">Phoenix</a> expression that
- returns a Spirit parser. This returned parser is used in the parsing process.
- </p>
- <p>Example: </p>
- <pre> lazy_p<span class="special">(</span>phoenix<span class="special">::</span>val<span class="special">(</span>int_p<span class="special">))[</span>assign_a<span class="special">(</span>result<span class="special">)]</span>
- </pre>
- <p>Semantic actions attached to the <tt>lazy_p</tt> parser expects the same signature
- as that of the returned parser (<tt>int_p</tt>, in our example above).</p>
- <h2>lazy_p example</h2>
- <p>To give you a better glimpse (see the <tt><a href="../example/intermediate/lazy_parser.cpp">lazy_parser.cpp</a></tt>),
- say you want to parse inputs such as:</p>
- <pre> <span class=identifier>dec
- </span><span class="special">{</span><span class=identifier><br> 1 2 3<br> bin
- </span><span class="special">{</span><span class=identifier><br> 1 10 11<br> </span><span class="special">}</span><span class=identifier><br> 4 5 6<br> </span><span class="special">}</span></pre>
- <p>where <tt>bin {...}</tt> and <tt>dec {...}</tt> specifies the numeric format
- (binary or decimal) that we are expecting to read. If we analyze the input,
- we want a grammar like:</p>
- <pre><code><font color="#000000"><span class=special> </span><span class=identifier>base </span><span class="special">=</span><span class=identifier> </span><span class="string">"bin"</span><span class=identifier> </span><span class="special">|</span><span class=identifier> </span><span class="string">"dec"</span><span class="special">;</span><span class=identifier>
- block </span><span class=special>= </span><span class="identifier">base</span><span class=special> >> </span><span class="literal">'{'</span><span class=special> >> *</span><span class="identifier">block_line</span><span class=special> >> </span><span class="literal">'}'</span><span class=special>;
- </span>block_line <span class=special>= </span><span class="identifier">number</span><span class=special> | </span><span class=identifier>block</span><span class=special>;</span></font></code></pre>
- <p>We intentionally left out the <code><font color="#000000"><span class="identifier"><tt>number</tt></span></font></code>
- rule. The tricky part is that the way <tt>number</tt> rule behaves depends on
- the result of the <tt>base</tt> rule. If <tt>base</tt> got a <em>"bin"</em>,
- then number should parse binary numbers. If <tt>base</tt> got a <em>"dec"</em>,
- then number should parse decimal numbers. Typically we'll have to rewrite our
- grammar to accomodate the different parsing behavior:</p>
- <pre><code><font color="#000000"><span class=identifier> block </span><span class=special>=
- </span><span class=identifier>"bin"</span> <span class=special>>> </span><span class="literal">'{'</span><span class=special> >> *</span>bin_line<span class=special> >> </span><span class="literal">'}'</span><span class=special>
- | </span><span class=identifier>"dec"</span> <span class=special>>> </span><span class="literal">'{'</span><span class=special> >> *</span>dec_line<span class=special> >> </span><span class="literal">'}'</span><span class=special>
- ;
- </span>bin_line <span class=special>= </span><span class="identifier">bin_p</span><span class=special> | </span><span class=identifier>block</span><span class=special>;
- </span>dec_line <span class=special>= </span><span class="identifier">int_p</span><span class=special> | </span><span class=identifier>block</span><span class=special>;</span></font></code></pre>
- <p>while this is fine, the redundancy makes us want to find a better solution;
- after all, we'd want to make full use of Spirit's dynamic parsing capabilities.
- Apart from that, there will be cases where the set of parsing behaviors for
- our <tt>number</tt> rule is not known when the grammar is written. We'll only
- be given a map of string descriptors and corresponding rules [e.g. (("dec",
- int_p), ("bin", bin_p) ... etc...)].</p>
- <p>The basic idea is to have a rule for binary and decimal numbers. That's easy
- enough to do (see <a href="numerics.html">numerics</a>). When <tt>base</tt>
- is being parsed, in your semantic action, store a pointer to the selected base
- in a closure variable (e.g. <tt>block.int_rule</tt>). Here's an example:</p>
- <pre><code><font color="#000000"><span class=special> </span><span class=identifier>base
- </span><span class="special">=</span><span class=identifier> str_p</span><span class="special">(</span><span class="string">"bin"</span><span class="special">)[</span><span class=identifier>block.int_rule</span> = <span class="special">&</span>var<span class="special">(</span><span class="identifier">bin_rule</span><span class="special">)]
- | </span><span class=identifier>str_p</span><span class="special">(</span><span class="string">"dec"</span><span class="special">)[</span><span class=identifier>block.int_rule</span> = <span class="special">&</span>var<span class="special">(</span><span class="identifier">dec_rule</span><span class="special">)]
- ;</span></font></code></pre>
- <p>With this setup, your number rule will now look something like:</p>
- <pre><code><font color="#000000"><span class=special> </span><span class=identifier>number </span><span class="special">=</span><span class=identifier> lazy_p</span><span class="special">(*</span><span class=identifier>block.int_rule</span><span class="special">);</span></font></code></pre>
- <p>The <tt><a href="../example/intermediate/lazy_parser.cpp">lazy_parser.cpp</a></tt>
- does it a bit differently, ingeniously using the <a href="symbols.html">symbol
- table</a> to dispatch the correct rule, but in essence, both strategies are
- similar. This technique, using the symbol table, is detailed in the Techiques section: <a href="techniques.html#nabialek_trick">nabialek_trick</a>. Admitedly, when you add up all the rules, the resulting grammar is
- more complex than the hard-coded grammar above. Yet, for more complex grammar
- patterns with a lot more rules to choose from, the additional setup is well
- worth it.</p>
- <table border="0">
- <tr>
- <td width="10"></td>
- <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
- <td width="30"><a href="dynamic_parsers.html"><img src="theme/l_arr.gif" border="0"></a></td>
- <td width="30"><a href="select_parser.html"><img src="theme/r_arr.gif" border="0"></a></td>
- </tr>
- </table>
- <br>
- <hr size="1">
- <p class="copyright">Copyright © 2003 Joel de Guzman<br>
- Copyright © 2003 Vaclav Vesely<br>
- <br>
- <font size="2">Use, modification and distribution is subject to the Boost Software
- License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
- http://www.boost.org/LICENSE_1_0.txt)</font></p>
- <p class="copyright"> </p>
- </body>
- </html>
|