123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162 |
- <html>
- <head>
- <title>Rationale</title>
- <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
- <link rel="stylesheet" href="theme/style.css" type="text/css">
- </head>
- <body>
- <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
- <tr>
- <td width="10">
- </td>
- <td width="85%"> <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Rationale</b></font></td>
- <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td>
- </tr>
- </table>
- <br>
- <table border="0">
- <tr>
- <td width="10"></td>
- <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
- <td width="30"><a href="faq.html"><img src="theme/l_arr.gif" border="0"></a></td>
- <td width="30"><a href="acknowledgments.html"><img src="theme/r_arr.gif" border="0"></a></td>
- </tr>
- </table>
- <p><img src="theme/lens.gif" width="15" height="16"> <strong>Virtual functions:
- From static to dynamic C++</strong></p>
- <p>Rules straddle the border between static and dynamic C++. In effect, a rule
- transforms compile-time polymorphism (using templates) into run-time polymorphism
- (using virtual functions). This is necessary due to C++'s inability to automatically
- declare a variable of a type deduced from an arbitrarily complex expression
- in the right-hand side (rhs) of an assignment. Basically, we want to do something
- like:</p>
- <pre><code><font color="#000000"> <span class=identifier>T </span><span class=identifier>rule </span><span class=special>= </span><span class=identifier>an_arbitrarily_complex_expression</span><span class=special>;</span></font></code></pre>
- <p>without having to know or care about the resulting type of the right-hand side
- (rhs) of the assignment expression. Apart from this, we also need a facility
- to forward declare an unknown type:</p>
- <pre><code><font color="#000000"><span class=special> </span><span class=identifier>T </span><span class=identifier>rule</span><span class=special>;
- </span><span class=special>...
- </span><span class=identifier>rule </span><span class=special>= </span><span class=identifier>a </span><span class=special>| </span><span class=identifier>b</span><span class=special>;</span></font></code></pre>
- <p>These limitations lead us to this implementation of rules. This comes at the
- expense of the overhead of a virtual function call, once through each invocation
- of a rule.</p>
- <p><img src="theme/lens.gif" width="15" height="16"> <strong>Multiple declaration
- </strong> </p>
- <p>Some BNF variants allow multiple declarations of a <tt>rule</tt>. The declarations
- are taken as alternatives. Example:</p>
- <pre>
- <span class=identifier><code>r </code></span><code><span class=special>= </span><span class=identifier>a</span><span class=special>; </span><span class=identifier>
- r </span><span class=special>= </span><span class=identifier>b</span><span class=special>;</span></code></pre>
- <p> is equivalent to: </p>
- <pre>
- <span class=identifier><code>r </code></span><code><span class=special>= </span><span class=identifier>a </span><span class=special>| </span><span class=identifier>b</span><span class=special>;</span></code></pre>
- <p>Spirit v1.3 allowed this behavior. However, the current version of Spirit <b>no
- longer</b> allows this because experience shows that this behavior leads to
- unwanted gotchas (for instance, it does not allow rules to be held in containers).
- In the current release of Spirit, a second assignment to a rule will simply
- redefine it. The old definition is destructed. This follows more closely C++
- semantics and is more in line with what the user expects the rule to behave.</p>
- <p><img src="theme/lens.gif" width="15" height="16"> <b>Sequencing Syntax</b>
- <br>
- <br>
- The comma operator as in a, b seems to be a better candidate, syntax-wise. But
- then the problem is with its precedence. It has the lowest precedence in C/C++,
- which makes it virtually useless. <br>
- <br>
- Bjarne Stroustrup, in his article <a href="references.html#generalized_overloading">"Generalizing
- Overloading for C++2000"</a> talks about overloading whitespace. Such a
- feature would allow juxtapositioning of parser objects exactly as we do in (E)BNF
- (e.g. a b | c instead of a >> b | c). Unfortunately, the article was dated
- April 1, 1998. Oh well.</p>
- <p><img src="theme/lens.gif" width="15" height="16"> <b>Forward iterators</b><br>
- <br>
- In general, the scanner expects at least a standard conforming forward iterator.
- Forward iterators are needed for backtracking where the iterator needs to be
- saved and restored later. Generally speaking, Spirit is a backtracking parser.
- The implication of this is that at some point, the iterator position will have
- to be saved to allow the parser to backtrack to a previous point. Thus, for
- backtracking to work, the framework requires at least a forward iterator.<br>
- <br>
- Some parsers might require more specialized iterators (bi-directional or even
- random access). Perhaps in the future, deterministic parsers when added to the
- framework, will perform no backtracking and will need just a single token lookahead,
- hence will require input iterators only.</p>
- <p><img src="theme/lens.gif" width="15" height="16"><b> Why are subrules important?</b><br>
- <br>
- Subrules open up the oportunity to do aggressive meta programming as well because
- they do not rely on virtual functions. The virtual function is the meta-programmer's
- hell. Not only does it slow down the program due to the virtual function indirect
- call, it is also an opaque wall where no metaprogram can get past. It kills
- all meta-information beyond the virtual function call. Worse, the virtual function
- cannot be templated. Which means that its arguments have to be tied to a actual
- types. Many problems stem from this limitation. <br>
- <br>
- While Spirit is a currently classified as a non-deterministic recursive-descent
- parser, Doug Gregor first noted that other parsing techniques apart from top-down
- recursive descent may be applied. For instance, apart from non-deterministic
- recursive descent, deterministic LL(1) and LR(1) can theoretically be implemented
- using the same expression template front end. Spirit rules use virtual functions
- to encode the RHS parser expression in an opaque abstract parser type. While
- it serves its purpose well, the rule's virtual functions are the stumbling blocks
- to more advanced metaprogramming. Subrules are free from virtual functions.</p>
- <p><img src="theme/lens.gif" width="15" height="16"><b> <a name="exhaustive_rd"></a>Exhaustive
- backtracking and greedy RD</b></p>
- <p>Spirit doesn't do exhaustive backtracking like regular expressions are expected
- to. For example:</p>
- <pre> <span class="special">*</span>chlit_p<span class="special">(</span><span class="quotes">'a'</span><span class="special">) >></span> chlit_p<span class="special">(</span><span class="quotes">'a'</span><span class="special">);</span></pre>
- <p>will always fail to match because Spirit's Kleene star does not back off when
- the rest of the rule fails to match. </p>
- <p>Actually, there's a solution to this greedy RD problem. Such a scheme is discussed
- in section 6.6.2 of <a
- href="http://www.cs.vu.nl/%7Edick/PTAPG.html">Parsing Techniques: A Practical
- Guide</a>. The trick involves passing a <em>tail</em> parser (in addition to
- the scanner) to each parser. The start parser will then simply be: <tt>start
- >> end_p;</tt> (end_p is the start's tail). </p>
- <p>Spirit is greedy --using straight forward, naive RD. It is certainly possible
- to implement the fully backtracking scheme presented above, but there will be
- also certainly be a performance hit. The scheme will always try to match all
- possible parser paths (full parser hierarchy traversal) until it reaches a point
- of certainty, that the whole thing matches or fails to match. </p>
- <table border="0" width="80%" align="center">
- <tr>
- <td class="note_box"><p><img src="theme/note.gif" width="16" height="16"><b>Backtracking
- and Greedy RD </b><br>
- <br>
- Spirit is quite consistent and intuitive about when it backtracks and
- to where, although it may not be obvious to those coming from different
- backgrounds. In general, any (sub)parser will, given the same input, always
- match the same portion of the input (or fail to match the input at all).
- This means that Spirit is inherently greedy. Spirit will only backtrack
- when a (sub)parser fails to match the input, and it will always backtrack
- to the next choice point upward (not backward) in the parser structure.
- In other words abb|ab will match "ab", as will a(bb|b), but
- (ab|a)b won't because the (ab|a) subparser will always match the 'b' after
- the 'a' if it is available.</p>
- <p>--Rainer Deyke</p>
- </td>
- </tr>
- </table>
- <p>There's a strong preference on "simplicity with all the knobs when you
- need them" approach, right now. On the other hand, the flexibility of Spirit
- makes it possible to have different optional schemes available. It might be
- possible to implement an exhaustive backtracking RD scheme as an optional feature
- in the future. </p>
- <table border="0">
- <tr>
- <td width="10"></td>
- <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
- <td width="30"><a href="faq.html"><img src="theme/l_arr.gif" border="0"></a></td>
- <td width="30"><a href="acknowledgments.html"><img src="theme/r_arr.gif" border="0"></a></td>
- </tr>
- </table>
- <br>
- <hr size="1">
- <p class="copyright">Copyright © 1998-2003 Joel de Guzman<br>
- <br>
- <font size="2">Use, modification and distribution is subject to the Boost Software
- License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
- http://www.boost.org/LICENSE_1_0.txt)</font></p>
- <p class="copyright"> </p>
- </body>
- </html>
|