123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373 |
- <html>
- <head>
- <title>Techniques</title>
- <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
- <link rel="stylesheet" href="theme/style.css" type="text/css">
- </head>
- <body>
- <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
- <tr>
- <td width="10">
- </td>
- <td width="85%"> <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Techniques</b></font></td>
- <td width="112"><a href="http://spirit.sf.net"><img src="theme/spirit.gif" width="112" height="48" align="right" border="0"></a></td>
- </tr>
- </table>
- <br>
- <table border="0">
- <tr>
- <td width="10"></td>
- <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
- <td width="30"><a href="style_guide.html"><img src="theme/l_arr.gif" border="0"></a></td>
- <td width="30"><a href="faq.html"><img src="theme/r_arr.gif" border="0"></a></td>
- </tr>
- </table>
- <ul>
- <li><a href="#templatized_functors">Templatized Functors</a></li>
- <li><a href="#multiple_scanner_support">Rule With Multiple Scanners</a></li>
- <li><a href="#no_rules">Look Ma' No Rules!</a></li>
- <li><a href="#typeof">typeof</a></li>
- <li><a href="#nabialek_trick">Nabialek trick</a></li>
- </ul>
- <h3><a name="templatized_functors"></a> Templatized Functors</h3>
- <p>For the sake of genericity, it is often better to make the functor's member
- <tt>operator()</tt> a template. That way, we do not have to concern ourselves
- with the type of the argument to expect as long as the behavior is appropriate.
- For instance, rather than hard-coding <tt>char const*</tt> as the argument of
- a generic semantic action, it is better to make it a template member function.
- That way, it can accept any type of iterator:</p>
- <pre><code><font color="#000000"><span class=special> </span><span class=keyword>struct </span><span class=identifier>my_functor
- </span><span class=special>{
- </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>IteratorT</span><span class=special>>
- </span><span class=keyword>void </span><span class=keyword>operator</span><span class=special>()(</span><span class=identifier>IteratorT </span><span class=identifier>first</span><span class=special>, </span><span class=identifier>IteratorT </span><span class=identifier>last</span><span class=special>) </span><span class=keyword>const</span><span class=special>;
- </span><span class=special>};</span></font></code></pre>
- <p>Take note that this is only possible with functors. It is not possible to pass
- in template functions as semantic actions unless you cast it to the correct
- function signature; in which case, you <em>monomorphize</em> the function. This
- clearly shows that functors are superior to plain functions.</p>
- <h3><b><a name="multiple_scanner_support" id="multiple_scanner_support"></a> Rule
- With Multiple Scanners</b></h3>
- <p>As of v1.8.0, rules can use one or more scanner types. There are cases, for
- instance, where we need a rule that can work on the phrase and character levels.
- Rule/scanner mismatch has been a source of confusion and is the no. 1 <a href="faq.html#scanner_business">FAQ</a>.
- To address this issue, we now have <a href="rule.html#multiple_scanner_support">multiple
- scanner support</a>. </p>
- <p>Here is an example of a grammar with a rule <tt>r</tt> that can be called with
- 3 types of scanners (phrase-level, lexeme, and lower-case). See the <a href="rule.html">rule</a>,
- <a href="grammar.html">grammar</a>, <a href="scanner.html#lexeme_scanner">lexeme_scanner</a>
- and <a href="scanner.html#as_lower_scanner">as_lower_scanner </a>for more information.
- </p>
- <p>Here's the grammar (see <a href="../example/techniques/multiple_scanners.cpp">multiple_scanners.cpp</a>):
- </p>
- <pre><span class=special> </span><span class=keyword>struct </span><span class=identifier>my_grammar </span><span class=special>: </span><span class=identifier>grammar</span><span class=special><</span><span class=identifier>my_grammar</span><span class=special>>
- </span><span class=special>{
- </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>>
- </span><span class=keyword>struct </span><span class=identifier>definition
- </span><span class=special>{
- </span><span class=identifier>definition</span><span class=special>(</span><span class=identifier>my_grammar </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>self</span><span class=special>)
- </span><span class=special>{
- </span><span class=identifier>r </span><span class=special>= </span><span class=identifier>lower_p</span><span class=special>;
- </span><span class=identifier>rr </span><span class=special>= </span><span class=special>+(</span><span class=identifier>lexeme_d</span><span class=special>[</span><span class=identifier>r</span><span class=special>] </span><span class=special>>> </span><span class=identifier>as_lower_d</span><span class=special>[</span><span class=identifier>r</span><span class=special>] </span><span class=special>>> </span><span class=identifier>r</span><span class=special>);
- </span><span class=special>}
- </span><span class=keyword>typedef </span><span class=identifier>scanner_list</span><span class=special><
- </span><span class=identifier>ScannerT
- </span><span class=special>, </span><span class=keyword>typename </span><span class=identifier>lexeme_scanner</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>>::</span><span class=identifier>type
- </span><span class=special>, </span><span class=keyword>typename </span><span class=identifier>as_lower_scanner</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>>::</span><span class=identifier>type
- </span><span class=special>> </span><span class=identifier>scanners</span><span class=special>;
- </span><span class=identifier>rule</span><span class=special><</span><span class=identifier>scanners</span><span class=special>> </span><span class=identifier>r</span><span class=special>;
- </span><span class=identifier>rule</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>> </span><span class=identifier>rr</span><span class=special>;
- </span><span class=identifier>rule</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>> </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>start</span><span class=special>() </span><span class=keyword>const </span><span class=special>{ </span><span class=keyword>return </span><span class=identifier>rr</span><span class=special>; </span><span class=special>}
- </span><span class=special>};
- </span><span class=special>};</span></pre>
- <p>By default support for multiple scanners is disabled. The macro
- <tt>BOOST_SPIRIT_RULE_SCANNERTYPE_LIMIT</tt> must be defined to the
- maximum number of scanners allowed in a scanner_list. The value must
- be greater than 1 to enable multiple scanners. Given the
- example above, to define a limit of three scanners for the list, the
- following line must be inserted into the source file before the
- inclusion of Spirit headers:
- </p>
- <pre><span class=special> </span><span class=preprocessor>#define </span><span class=identifier>BOOST_SPIRIT_RULE_SCANNERTYPE_LIMIT</span> <span class=literal>3</span></pre>
- <h3><span class=special></span><b> <a name="no_rules" id="no_rules"></a> Look
- Ma' No Rules</b></h3>
- <p>You use grammars and you use lots of 'em? Want a fly-weight, no-cholesterol,
- super-optimized grammar? Read on...</p>
- <p>I have a love-hate relationship with rules. I guess you know the reasons why.
- A lot of problems stem from the limitation of rules. Dynamic polymorphism and
- static polymorphism in C++ do not mix well. There is no notion of virtual template
- functions in C++; at least not just yet. Thus, the <strong>rule is tied to a
- specific scanner type</strong>. This results in problems such as the <a href="faq.html#scanner_business">scanner
- business</a>, our no. 1 FAQ. Apart from that, the virtual functions in rules
- slow down parsing, kill all meta-information, and kills inlining, hence bloating
- the generated code, especially for very tiny rules such as:</p>
- <pre> r <span class="special">=</span> ch_p<span class="special">(</span><span class="quotes">'x'</span><span class="special">) >></span> uint_p<span class="special">;</span></pre>
- <p> The rule's limitation is the main reason why the grammar is designed the way
- it is now, with a nested template definition class. The rule's limitation is
- also the reason why subrules exists. But do we really need rules? Of course!
- Before C++ adopts some sort of auto-type deduction, such as that proposed by
- David Abrahams in clc++m:</p>
- <pre>
- <code><span class=keyword>auto </span><span class=identifier>r </span><span class=special>= ...</span><span class=identifier>definition </span><span class=special>...</span></code></pre>
- <p> we are tied to the rule as RHS placeholders. However.... in some occasions
- we can get by without rules! For instance, rather than writing:</p>
- <pre>
- <code><span class=identifier>rule</span><span class=special><> </span><span class=identifier>x </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'x'</span><span class=special>);</span></code></pre>
- <p> It's better to write:</p>
- <pre>
- <code><span class=identifier>chlit</span><span class=special><> </span><span class=identifier>x </span><span class=special>= </span><span class=identifier>ch_p</span><span class=special>(</span><span class=literal>'x'</span><span class=special>);</span></code></pre>
- <p> That's trivial. But what if the rule is rather complicated? Ok, let's proceed
- stepwise... I'll investigate a simple skip_parser based on the C grammar from
- Hartmut Kaiser. Basically, the grammar is written as (see <a href="../example/techniques/no_rules/no_rule1.cpp">no_rule1.cpp</a>):</p>
- <pre><code> <span class=keyword>struct </span><span class=identifier>skip_grammar </span><span class=special>: </span><span class=identifier>grammar</span><span class=special><</span><span class=identifier>skip_grammar</span><span class=special>>
- {
- </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>>
- </span><span class=keyword>struct </span><span class=identifier>definition
- </span><span class=special>{
- </span><span class=identifier>definition</span><span class=special>(</span><span class=identifier>skip_grammar </span><span class=keyword>const</span><span class=special>& /*</span><span class=identifier>self</span><span class=special>*/)
- {
- </span><span class=identifier>skip
- </span><span class=special>= </span><span class=identifier>space_p
- </span><span class=special>| </span><span class=string>"//" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=literal>'\n'</span><span class=special>) >> </span><span class=literal>'\n'
- </span><span class=special>| </span><span class=string>"/*" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=string>"*/"</span><span class=special>) >> </span><span class=string>"*/"
- </span><span class=special>;
- }
- </span><span class=identifier>rule</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>> </span><span class=identifier>skip</span><span class=special>;
- </span><span class=identifier>rule</span><span class=special><</span><span class=identifier>ScannerT</span><span class=special>> </span><span class=keyword>const</span><span class=special>&
- </span><span class=identifier>start</span><span class=special>() </span><span class=keyword>const </span><span class=special>{ </span><span class=keyword>return </span><span class=identifier>skip</span><span class=special>; }
- };
- };</span></code></pre>
- <p> Ok, so far so good. Can we do better? Well... since there are no recursive
- rules there (in fact there's only one rule), you can expand the type of rule's
- RHS as the rule type (see <a href="../example/techniques/no_rules/no_rule2.cpp">no_rule2.cpp</a>):</p>
- <pre><code><span class=special> </span><span class=keyword>struct </span><span class=identifier>skip_grammar </span><span class=special>: </span><span class=identifier>grammar</span><span class=special><</span><span class=identifier>skip_grammar</span><span class=special>>
- {
- </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>>
- </span><span class=keyword>struct </span><span class=identifier>definition
- </span><span class=special>{
- </span> <span class=identifier>definition</span><span class=special>(</span><span class=identifier>skip_grammar </span><span class=keyword>const</span><span class=special>& /*</span><span class=identifier>self</span><span class=special>*/)
- : </span><span class=identifier>skip</span><span class=special>
- ( </span><span class=identifier>space_p
- </span><span class=special>| </span><span class=string>"//" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=literal>'\n'</span><span class=special>) >> </span><span class=literal>'\n'
- </span><span class=special>| </span><span class=string>"/*" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=string>"*/"</span><span class=special>) >> </span><span class=string>"*/"
- </span><span class=special>)
- {
- }
- </span><span class=keyword>typedef
- </span><span class=identifier>alternative</span><span class=special><</span><span class=identifier>alternative</span><span class=special><</span><span class=identifier>space_parser</span><span class=special>, </span><span class=identifier>sequence</span><span class=special><</span><span class=identifier>sequence</span><span class=special><
- </span><span class=identifier>strlit</span><span class=special><</span><span class=keyword>const </span><span class=keyword>char</span><span class=special>*>, </span><span class=identifier>kleene_star</span><span class=special><</span><span class=identifier>difference</span><span class=special><</span><span class=identifier>anychar_parser</span><span class=special>,
- </span><span class=identifier>chlit</span><span class=special><</span><span class=keyword>char</span><span class=special>> > > >, </span><span class=identifier>chlit</span><span class=special><</span><span class=keyword>char</span><span class=special>> > >, </span><span class=identifier>sequence</span><span class=special><</span><span class=identifier>sequence</span><span class=special><
- </span><span class=identifier>strlit</span><span class=special><</span><span class=keyword>const </span><span class=keyword>char</span><span class=special>*>, </span><span class=identifier>kleene_star</span><span class=special><</span><span class=identifier>difference</span><span class=special><</span><span class=identifier>anychar_parser</span><span class=special>,
- </span><span class=identifier>strlit</span><span class=special><</span><span class=keyword>const </span><span class=keyword>char</span><span class=special>*> > > >, </span><span class=identifier>strlit</span><span class=special><</span><span class=keyword>const </span><span class=keyword>char</span><span class=special>*> > >
- </span><span class=identifier>skip_t</span><span class=special>;
- </span><span class=special> </span><span class=identifier>skip_t </span><span class=identifier>skip</span><span class=special>;
- </span><span class=identifier>skip_t </span><span class=keyword>const</span><span class=special>&
- </span><span class=identifier>start</span><span class=special>() </span><span class=keyword>const </span><span class=special>{ </span><span class=keyword>return </span><span class=identifier>skip</span><span class=special>; }
- };
- };</span></code></pre>
- <p> Ughhh! How did I do that? How was I able to get at the complex typedef? Am
- I insane? Well, not really... there's a trick! What you do is define the typedef
- <tt>skip_t</tt> first as int:</p>
- <pre>
- <code><span class=keyword>typedef </span><span class=keyword>int </span><span class=identifier>skip_t</span><span class=special>;</span></code></pre>
- <p> Try to compile. Then, the compiler will generate an obnoxious error message
- such as:</p>
- <pre>
- <code><span class=string>"cannot convert boost::spirit::alternative<... blah blah...to int"</span><span class=special>.</span></code></pre>
- <p> <strong>THERE YOU GO!</strong> You got it's type! I just copy and paste the
- correct type (removing explicit qualifications, if preferred).</p>
- <p> Can we still go further? Yes. Remember that the grammar was designed for rules.
- The nested template definition class is needed to get around the rule's limitations.
- Without rules, I propose a new class called <tt>sub_grammar</tt>, the grammar's
- low-fat counterpart:</p>
- <pre><code><span class=special> </span><span class=keyword>namespace </span><span class=identifier>boost </span><span class=special>{ </span><span class=keyword>namespace </span><span class=identifier>spirit
- </span><span class=special>{
- </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>DerivedT</span><span class=special>>
- </span><span class=keyword>struct </span><span class=identifier>sub_grammar </span><span class=special>: </span><span class=identifier>parser</span><span class=special><</span><span class=identifier>DerivedT</span><span class=special>>
- {
- </span><span class=keyword>typedef </span><span class=identifier>sub_grammar </span><span class=identifier>self_t</span><span class=special>;
- </span><span class=keyword>typedef </span><span class=identifier>DerivedT </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>embed_t</span><span class=special>;
- </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>>
- </span><span class=keyword>struct </span><span class=identifier>result
- </span><span class=special>{
- </span><span class=keyword>typedef </span><span class=keyword>typename </span><span class=identifier>parser_result</span><span class=special><
- </span><span class=keyword>typename </span><span class=identifier>DerivedT</span><span class=special>::</span><span class=identifier>start_t</span><span class=special>, </span><span class=identifier>ScannerT</span><span class=special>>::</span><span class=identifier>type
- </span><span class=identifier>type</span><span class=special>;
- };
- </span><span class=identifier>DerivedT </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>derived</span><span class=special>() </span><span class=keyword>const
- </span><span class=special>{ </span><span class=keyword>return </span><span class=special>*</span><span class=keyword>static_cast</span><span class=special><</span><span class=identifier>DerivedT </span><span class=keyword>const</span><span class=special>*>(</span><span class=keyword>this</span><span class=special>); }
- </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>ScannerT</span><span class=special>>
- </span><span class=keyword>typename </span><span class=identifier>parser_result</span><span class=special><</span><span class=identifier>self_t</span><span class=special>, </span><span class=identifier>ScannerT</span><span class=special>>::</span><span class=identifier>type
- </span><span class=identifier>parse</span><span class=special>(</span><span class=identifier>ScannerT </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>scan</span><span class=special>) </span><span class=keyword>const
- </span><span class=special>{
- </span><span class=keyword>return </span><span class=identifier>derived</span><span class=special>().</span><span class=identifier>start</span><span class=special>.</span><span class=identifier>parse</span><span class=special>(</span><span class=identifier>scan</span><span class=special>);
- }
- };
- }}</span></code></pre>
- <p>With the <tt>sub_grammar</tt> class, we can define our skipper grammar this
- way (see <a href="../example/techniques/no_rules/no_rule3.cpp">no_rule3.cpp</a>):</p>
- <pre><code><span class=special> </span><span class=keyword>struct </span><span class=identifier>skip_grammar </span><span class=special>: </span><span class=identifier>sub_grammar</span><span class=special><</span><span class=identifier>skip_grammar</span><span class=special>>
- {
- </span><span class=keyword>typedef
- </span><span class=identifier>alternative</span><span class=special><</span><span class=identifier>alternative</span><span class=special><</span><span class=identifier>space_parser</span><span class=special>, </span><span class=identifier>sequence</span><span class=special><</span><span class=identifier>sequence</span><span class=special><
- </span><span class=identifier>strlit</span><span class=special><</span><span class=keyword>const </span><span class=keyword>char</span><span class=special>*>, </span><span class=identifier>kleene_star</span><span class=special><</span><span class=identifier>difference</span><span class=special><</span><span class=identifier>anychar_parser</span><span class=special>,
- </span><span class=identifier>chlit</span><span class=special><</span><span class=keyword>char</span><span class=special>> > > >, </span><span class=identifier>chlit</span><span class=special><</span><span class=keyword>char</span><span class=special>> > >, </span><span class=identifier>sequence</span><span class=special><</span><span class=identifier>sequence</span><span class=special><
- </span><span class=identifier>strlit</span><span class=special><</span><span class=keyword>const </span><span class=keyword>char</span><span class=special>*>, </span><span class=identifier>kleene_star</span><span class=special><</span><span class=identifier>difference</span><span class=special><</span><span class=identifier>anychar_parser</span><span class=special>,
- </span><span class=identifier>strlit</span><span class=special><</span><span class=keyword>const </span><span class=keyword>char</span><span class=special>*> > > >, </span><span class=identifier>strlit</span><span class=special><</span><span class=keyword>const </span><span class=keyword>char</span><span class=special>*> > >
- </span><span class=identifier>start_t</span><span class=special>;
- </span><span class=identifier>skip_grammar</span><span class=special>()
- : </span><span class=identifier>start
- </span><span class=special>(
- </span><span class=identifier>space_p
- </span><span class=special>| </span><span class=string>"//" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=literal>'\n'</span><span class=special>) >> </span><span class=literal>'\n'
- </span><span class=special>| </span><span class=string>"/*" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=string>"*/"</span><span class=special>) >> </span><span class=string>"*/"
- </span><span class=special>)
- {}
- </span><span class=identifier>start_t </span><span class=identifier>start</span><span class=special>;
- };</span></code></pre>
- <p>But what for, you ask? You can simply use the <tt>start_t</tt> type above as-is.
- It's already a parser! We can just type:</p>
- <pre>
- <code><span class=identifier>skipper_t </span><span class=identifier>skipper </span><span class=special>=
- </span><span class=identifier>space_p
- </span><span class=special>| </span><span class=string>"//" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=literal>'\n'</span><span class=special>) >> </span><span class=literal>'\n' </span><br> <span class=special>| </span><span class=string>"/*" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=string>"*/"</span><span class=special>) >> </span><span class=string>"*/"</span>
- <span class=special> ;</span></code></pre>
- <p> and use <tt>skipper</tt> just as we would any parser? Well, a subtle difference
- is that <tt>skipper</tt>, used this way will be embedded <strong>by value </strong>when<strong>
- </strong>you compose more complex parsers using it. That is, if we use <tt>skipper</tt>
- inside another production, the whole thing will be stored in the composite.
- Heavy!</p>
- <p> The proposed <tt>sub_grammar</tt> OTOH will be held by reference. Note:</p>
- <pre><code> <span class=keyword>typedef </span><span class=identifier>DerivedT </span><span class=keyword>const</span><span class=special>& </span><span class=identifier>embed_t</span><span class=special>;</span></code></pre>
- <p>The proposed <tt>sub_grammar</tt> does not have the inherent limitations of
- rules, is very lighweight, and should be blazingly fast (can be fully inlined
- and does not use virtual functions). Perhaps this class will be part of a future
- spirit release. </p>
- <table width="80%" border="0" align="center">
- <tr>
- <td class="note_box"><img src="theme/note.gif" width="16" height="16"> <strong>The
- no-rules result</strong><br> <br>
- So, how much did we save? On MSVCV7.1, the original code: <a href="../example/techniques/no_rules/no_rule1.cpp">no_rule1.cpp</a>
- compiles to <strong>28k</strong>. Eliding rules, <a href="../example/techniques/no_rules/no_rule2.cpp">no_rule2.cpp</a>,
- we got <strong>24k</strong>. Not bad, we shaved off 4k amounting to a 14%
- reduction. But you'll be in for a surprise. The last version, using the
- sub-grammar: <a href="../example/techniques/no_rules/no_rule3.cpp">no_rule3.cpp</a>,
- compiles to <strong>5.5k</strong>! That's a whopping 80% reduction.<br>
- <br>
- <table width="100%" border="1">
- <tr>
- <td><a href="../example/techniques/no_rules/no_rule1.cpp">no_rule1.cpp</a></td>
- <td><strong>28k</strong></td>
- <td>standard rule and grammar</td>
- </tr>
- <tr>
- <td><a href="../example/techniques/no_rules/no_rule2.cpp">no_rule2.cpp</a></td>
- <td><strong>24k</strong></td>
- <td>standard grammar, no rule</td>
- </tr>
- <tr>
- <td><a href="../example/techniques/no_rules/no_rule3.cpp">no_rule3.cpp</a></td>
- <td><strong>5.5k</strong></td>
- <td>sub_grammar, no rule, no grammar</td>
- </tr>
- </table> </td>
- </tr>
- </table>
- <h3><b> <a name="typeof" id="typeof"></a> typeof</b></h3>
- <p>Some compilers already support the <tt>typeof</tt> keyword. Examples are g++
- and Metrowerks CodeWarrior. Someday, <tt>typeof</tt> will become commonplace.
- It is worth noting that we can use <tt>typeof</tt> to define non-recursive rules
- without using the rule class. To give an example, we'll use the skipper example
- above; this time using <tt>typeof</tt>. First, to avoid redundancy, we'll introduce
- a macro <tt>RULE</tt>: </p>
- <pre><code> <span class=preprocessor>#define </span><span class=identifier>RULE</span><span class=special>(</span><span class=identifier>name</span><span class=special>, </span><span class=identifier>definition</span><span class=special>) </span><span class="keyword">typeof</span><span class=special>(</span><span class=identifier>definition</span><span class=special>) </span><span class=identifier>name </span><span class=special>= </span><span class=identifier>definition</span></code></pre>
- <p>Then, simply:</p>
- <pre><code><span class=identifier> </span><span class=identifier>RULE</span><span class=special>(
- </span><span class=identifier>skipper</span><span class=special>,
- ( </span><span class=identifier>space_p
- </span><span class=special>| </span><span class=string>"//" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=literal>'\n'</span><span class=special>) >> </span><span class=literal>'\n'
- </span><span class=special>| </span><span class=string>"/*" </span><span class=special>>> *(</span><span class=identifier>anychar_p </span><span class=special>- </span><span class=string>"*/"</span><span class=special>) >> </span><span class=string>"*/"
- </span><span class=special>)
- );</span></code></pre>
- <p>(see <a href="../example/techniques/typeof.cpp">typeof.cpp</a>)</p>
- <p>That's it! Now you can use skipper just as you would any parser. Be reminded,
- however, that <tt>skipper</tt> above will be embedded by value when<strong>
- </strong>you compose more complex parsers using it (see <tt>sub_grammar</tt> rationale above). You can use the <tt>sub_grammar</tt> class to avoid this problem.</p>
- <h3><a name="nabialek_trick"></a> Nabialek trick</h3>
- <p>This technique, I'll call the <strong><em>"Nabialek trick" </em></strong>(from the name of its inventor, Sam Nabialek), can improve the rule dispatch from linear non-deterministic to deterministic. The trick applies to grammars where a keyword (operator, etc), precedes a production. There are lots of grammars similar to this:</p>
- <pre> <span class=identifier>r </span><span class=special>=
- </span><span class=identifier>keyword1 </span><span class=special>>> </span><span class=identifier>production1
- </span><span class=special>| </span><span class=identifier>keyword2 </span><span class=special>>> </span><span class=identifier>production2
- </span><span class=special>| </span><span class=identifier>keyword3 </span><span class=special>>> </span><span class=identifier>production3
- </span><span class=special>| </span><span class=identifier>keyword4 </span><span class=special>>> </span><span class=identifier>production4
- </span><span class=special>| </span><span class=identifier>keyword5 </span><span class=special>>> </span><span class=identifier>production5
- </span><span class=comment>/*** etc ***/
- </span><span class=special>;</span></pre>
- <p>The cascaded alternatives are tried one at a time through trial and error until something matches. The Nabialek trick takes advantage of the <a href="symbols.html">symbol table</a>'s search properties to optimize the dispatching of the alternatives. For an example, see <a href="../example/techniques/nabialek.cpp">nabialek.cpp</a>. The grammar works as follows. There are two rules (<tt>one</tt> and <tt>two</tt>). When "one" is recognized, rule <tt>one</tt> is invoked. When "two" is recognized, rule <tt>two</tt> is invoked. Here's the grammar:</p>
- <pre><span class=special> </span><span class=identifier>one </span><span class=special>= </span><span class=identifier>name</span><span class=special>;
- </span><span class=identifier>two </span><span class=special>= </span><span class=identifier>name </span><span class=special>>> </span><span class=literal>',' </span><span class=special>>> </span><span class=identifier>name</span><span class=special>;
-
- </span><span class=identifier>continuations</span><span class=special>.</span><span class=identifier>add
- </span><span class=special>(</span><span class=string>"one"</span><span class=special>, &</span><span class=identifier>one</span><span class=special>)
- </span><span class=special>(</span><span class=string>"two"</span><span class=special>, &</span><span class=identifier>two</span><span class=special>)
- </span><span class=special>;
-
- </span><span class=identifier>line </span><span class=special>= </span><span class=identifier>continuations</span><span class=special>[</span><span class=identifier>set_rest</span><span class=special><</span><span class=identifier>rule_t</span><span class=special>>(</span><span class=identifier>rest</span><span class=special>)] </span><span class=special>>> </span><span class=identifier>rest</span><span class=special>;</span></pre>
- <p>where continuations is a <a href="symbols.html">symbol table</a> with pointer to rule_t slots. one, two, name, line and rest are rules:</p>
- <pre><span class=special> </span><span class=identifier>rule_t </span><span class=identifier>name</span><span class=special>;
- </span><span class=identifier>rule_t </span><span class=identifier>line</span><span class=special>;
- </span><span class=identifier>rule_t </span><span class=identifier>rest</span><span class=special>;
- </span><span class=identifier>rule_t </span><span class=identifier>one</span><span class=special>;
- </span><span class=identifier>rule_t </span><span class=identifier>two</span><span class=special>;
-
- </span><span class=identifier>symbols</span><span class=special><</span><span class=identifier>rule_t</span><span class=special>*> </span><span class=identifier>continuations</span><span class=special>;</span></pre>
- <p>set_rest, the semantic action attached to continuations is:</p>
- <pre><span class=special> </span><span class=keyword>template </span><span class=special><</span><span class=keyword>typename </span><span class=identifier>Rule</span><span class=special>>
- </span><span class=keyword>struct </span><span class=identifier>set_rest
- </span><span class=special>{
- </span><span class=identifier>set_rest</span><span class=special>(</span><span class=identifier>Rule</span><span class=special>& </span><span class=identifier>the_rule</span><span class=special>)
- </span><span class=special>: </span><span class=identifier>the_rule</span><span class=special>(</span><span class=identifier>the_rule</span><span class=special>) </span><span class=special>{}
- </span><span class=keyword>void </span><span class=keyword>operator</span><span class=special>()(</span><span class=identifier>Rule</span><span class=special>* </span><span class=identifier>newRule</span><span class=special>) </span><span class=keyword>const
- </span><span class=special>{ </span><span class=identifier>m_theRule </span><span class=special>= </span><span class=special>*</span><span class=identifier>newRule</span><span class=special>; </span><span class=special>}
- </span><span class=identifier>Rule</span><span class=special>& </span><span class=identifier>the_rule</span><span class=special>;
- </span><span class=special>};</span></pre>
- <p>Notice how the rest <tt>rule</tt> gets set dynamically when the set_rule action is called. The dynamic grammar parses inputs such as:</p>
- <p> "one only"<br>
- "one again"<br>
- "two first, second"</p>
- <p>The cool part is that the <tt>rest</tt> rule is set (by the <tt>set_rest</tt> action) depending on what the symbol table got. If it got a <em>"one"</em> then rest = one. If it got <em>"two"</em>, then rest = two. Very nifty! This technique should be very fast, especially when there are lots of keywords. It would be nice to add special facilities to make this easy to use. I imagine:</p>
- <pre><span class=special> </span><span class=identifier>r </span><span class=special>= </span><span class=identifier>keywords </span><span class=special>>> </span><span class=identifier>rest</span><span class=special>;</span></pre>
- <p>where <tt>keywords</tt> is a special parser (based on the symbol table) that automatically sets its RHS (rest) depending on the acquired symbol. This, I think, is mighty cool! Someday perhaps... </p>
- <p><img src="theme/note.gif" width="16" height="16"> Also, see the <a href="switch_parser.html">switch parser</a> for another deterministic parsing trick for character/token prefixes. </p>
- <span class=special></span>
- <table border="0">
- <tr>
- <td width="10"></td>
- <td width="30"><a href="../index.html"><img src="theme/u_arr.gif" border="0"></a></td>
- <td width="30"><a href="style_guide.html"><img src="theme/l_arr.gif" border="0"></a></td>
- <td width="30"><a href="faq.html"><img src="theme/r_arr.gif" border="0"></a></td>
- </tr>
- </table>
- <br>
- <hr size="1">
- <p class="copyright">Copyright © 1998-2003 Joel de Guzman<br>
- <br>
- <font size="2">Use, modification and distribution is subject to the Boost Software
- License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
- http://www.boost.org/LICENSE_1_0.txt)</font></p>
- <p class="copyright"> </p>
- </body>
- </html>
|