123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462 |
- <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
- <html>
- <head>
- <meta content=
- "HTML Tidy for Windows (vers 1st February 2003), see www.w3.org"
- name="generator">
- <title>
- Quick Start
- </title>
- <meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
- <link rel="stylesheet" href="theme/style.css" type="text/css">
- </head>
- <body>
- <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
- <tr>
- <td width="10"></td>
- <td width="85%">
- <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Quick
- Start</b></font>
- </td>
- <td width="112">
- <a href="http://spirit.sf.net"><img src="theme/spirit.gif"
- width="112" height="48" align="right" border="0"></a>
- </td>
- </tr>
- </table><br>
- <table border="0">
- <tr>
- <td width="10"></td>
- <td width="30">
- <a href="../index.html"><img src="theme/u_arr.gif" border="0"></a>
- </td>
- <td width="30">
- <a href="introduction.html"><img src="theme/l_arr.gif" border="0">
- </a>
- </td>
- <td width="30">
- <a href="basic_concepts.html"><img src="theme/r_arr.gif" border="0">
- </a>
- </td>
- </tr>
- </table>
- <h2>
- <b>Why would you want to use Spirit?</b>
- </h2>
- <p>
- Spirit is designed to be a practical parsing tool. At the very least, the
- ability to generate a fully-working parser from a formal EBNF
- specification inlined in C++ significantly reduces development time.
- While it may be practical to use a full-blown, stand-alone parser such as
- YACC or ANTLR when we want to develop a computer language such as C or
- Pascal, it is certainly overkill to bring in the big guns when we wish to
- write extremely small micro-parsers. At that end of the spectrum,
- programmers typically approach the job at hand not as a formal parsing
- task but through ad hoc hacks using primitive tools such as
- <tt>scanf</tt>. True, there are tools such as regular-expression
- libraries (such as <a href=
- "http://www.boost.org/libs/regex/index.html">boost regex</a>) or scanners
- (such as <a href="http://www.boost.org/libs/tokenizer/index.html">boost
- tokenizer</a>), but these tools do not scale well when we need to write
- more elaborate parsers. Attempting to write even a moderately-complex
- parser using these tools leads to code that is hard to understand and
- maintain.
- </p>
- <p>
- One prime objective is to make the tool easy to use. When one thinks of a
- parser generator, the usual reaction is "it must be big and complex with
- a steep learning curve." Not so. Spirit is designed to be fully scalable.
- The framework is structured in layers. This permits learning on an
- as-needed basis, after only learning the minimal core and basic concepts.
- </p>
- <p>
- For development simplicity and ease in deployment, the entire framework
- consists of only header files, with no libraries to link against or
- build. Just put the spirit distribution in your include path, compile and
- run. Code size? -very tight. In the quick start example that we shall
- present in a short while, the code size is dominated by the instantiation
- of the <tt>std::vector</tt> and <tt>std::iostream</tt>.
- </p>
- <h2>
- <b>Trivial Example #1</b></h2>
- <p>Create a parser that will parse
- a floating-point number.
- </p>
- <pre><code><font color="#000000"> </font></code><span class="identifier">real_p</span>
- </pre>
- <p>
- (You've got to admit, that's trivial!) The above code actually generates
- a Spirit <tt>real_parser</tt> (a built-in parser) which parses a floating
- point number. Take note that parsers that are meant to be used directly
- by the user end with "<tt>_p</tt>" in their names as a Spirit convention.
- Spirit has many pre-defined parsers and consistent naming conventions
- help you keep from going insane!
- </p>
- <h2>
- <b>Trivial Example #2</b></h2>
- <p>
- Create a parser that will accept a line consisting of two floating-point
- numbers.
- </p>
-
- <pre><code><font color="#000000"> </font></code><code><span class=
- "identifier">real_p</span> <span class=
- "special">>></span> <span class="identifier">real_p</span></code>
- </pre>
- <p>
- Here you see the familiar floating-point numeric parser
- <code><tt>real_p</tt></code> used twice, once for each number. What's
- that <tt class="operators">>></tt> operator doing in there? Well,
- they had to be separated by something, and this was chosen as the
- "followed by" sequence operator. The above program creates a parser from
- two simpler parsers, glueing them together with the sequence operator.
- The result is a parser that is a composition of smaller parsers.
- Whitespace between numbers can implicitly be consumed depending on how
- the parser is invoked (see below).
- </p>
- <p>
- Note: when we combine parsers, we end up with a "bigger" parser, But it's
- still a parser. Parsers can get bigger and bigger, nesting more and more,
- but whenever you glue two parsers together, you end up with one bigger
- parser. This is an important concept.
- </p>
- <h2>
- <b>Trivial Example #3</b></h2>
- <p>
- Create a parser that will accept an arbitrary number of floating-point
- numbers. (Arbitrary means anything from zero to infinity)
- </p>
-
- <pre><code><font color="#000000"> </font></code><code><span class=
- "special">*</span><span class="identifier">real_p</span></code>
- </pre>
- <p>
- This is like a regular-expression Kleene Star, though the syntax might
- look a bit odd for a C++ programmer not used to seeing the <tt class=
- "operators">*</tt> operator overloaded like this. Actually, if you know
- regular expressions it may look odd too since the star is <b>before</b>
- the expression it modifies. C'est la vie. Blame it on the fact that we
- must work with the syntax rules of C++.
- </p>
- <p>
- Any expression that evaluates to a parser may be used with the Kleene
- Star. Keep in mind, though, that due to C++ operator precedence rules you
- may need to put the expression in parentheses for complex expressions.
- The Kleene Star is also known as a Kleene Closure, but we call it the
- Star in most places.
- </p>
- <h3>
- <b><a name="list_of_numbers"></a> Example #4 [ A Just Slightly Less Trivial Example</b>
- ] </h3>
- <p>
- This example will create a parser that accepts a comma-delimited list of numbers and put the numbers in a vector.
- </p>
- <h4><strong> Step 1. Create the parser</strong></h4>
- <pre><code><font color="#000000"> </font></code><code><span class=
- "identifier">real_p</span> <span class=
- "special">>></span> <span class="special">*(</span><span class=
- "identifier">ch_p</span><span class="special">(</span><span class=
- "literal">','</span><span class="special">)</span> <span class=
- "special">>></span> <span class=
- "identifier">real_p</span><span class="special">)</span></code>
- </pre>
- <p>
- Notice <tt>ch_p(',')</tt>. It is a literal character parser that can
- recognize the comma <tt>','</tt>. In this case, the Kleene Star is
- modifying a more complex parser, namely, the one generated by the
- expression:
- </p>
-
- <pre><code><font color="#000000"> </font></code><code><span class=
- "special">(</span><span class="identifier">ch_p</span><span class=
- "special">(</span><span class="literal">','</span><span class=
- "special">)</span> <span class="special">>></span> <span class=
- "identifier">real_p</span><span class="special">)</span></code>
- </pre>
- <p>
- Note that this is a case where the parentheses are necessary. The Kleene
- star encloses the complete expression above.
- </p>
- <h4>
- <b><strong>Step 2. </strong>Using a Parser (now that it's created)</b></h4>
- <p>
- Now that we have created a parser, how do we use it? Like the result of
- any C++ temporary object, we can either store it in a variable, or call
- functions directly on it.
- </p>
- <p>
- We'll gloss over some low-level C++ details and just get to the good
- stuff.
- </p>
- <p>
- If <b><tt>r</tt></b> is a rule (don't worry about what rules exactly are
- for now. This will be discussed later. Suffice it to say that the rule is
- a placeholder variable that can hold a parser), then we store the parser
- as a rule like this:
- </p>
-
- <pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class=
- "identifier">r</span> <span class="special">=</span> <span class=
- "identifier">real_p</span> <span class=
- "special">>> *(</span><span class=
- "identifier">ch_p</span><span class="special">(</span><span class=
- "literal">','</span><span class="special">) >></span> <span class=
- "identifier">real_p</span><span class="special">);</span></font></code>
- </pre>
- <p>
- Not too exciting, just an assignment like any other C++ expression you've
- used for years. The cool thing about storing a parser in a rule is this:
- rules are parsers, and now you can refer to it <b>by name</b>. (In this
- case the name is <tt><b>r</b></tt>). Notice that this is now a full
- assignment expression, thus we terminate it with a semicolon,
- "<tt>;</tt>".
- </p>
- <p>
- That's it. We're done with defining the parser. So the next step is now
- invoking this parser to do its work. There are a couple of ways to do
- this. For now, we shall use the free <tt>parse</tt> function that takes
- in a <tt>char const*</tt>. The function accepts three arguments:
- </p>
- <blockquote>
- <p>
- <img src="theme/bullet.gif" width="12" height="12"> The null-terminated
- <tt>const char*</tt> input<br>
- <img src="theme/bullet.gif" width="12" height="12"> The parser
- object<br>
- <img src="theme/bullet.gif" width="12" height="12"> Another parser
- called the <b>skip parser</b>
- </p>
- </blockquote>
- <p>
- In our example, we wish to skip spaces and tabs. Another parser named
- <tt>space_p</tt> is included in Spirit's repertoire of predefined
- parsers. It is a very simple parser that simply recognizes whitespace. We
- shall use <tt>space_p</tt> as our skip parser. The skip parser is the one
- responsible for skipping characters in between parser elements such as
- the <tt>real_p</tt> and the <tt>ch_p</tt>.
- </p>
- <p>
- Ok, so now let's parse!
- </p>
-
- <pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class=
- "identifier">r</span> <span class="special">=</span> <span class=
- "identifier">real_p</span> <span class=
- "special">>></span> <span class="special">*(</span><span class=
- "identifier">ch_p</span><span class="special">(</span><span class=
- "literal">','</span><span class="special">)</span> <span class=
- "special">>></span> <span class=
- "identifier">real_p</span><span class="special">);
- </span> <span class="identifier"> parse</span><span class=
- "special">(</span><span class="identifier">str</span><span class=
- "special">,</span> <span class="identifier">r</span><span class=
- "special">,</span> <span class="identifier">space_p</span><span class=
- "special">)</span> <span class=
- "comment">// Not a full statement yet, patience...</span></font></code>
- </pre>
- <p>
- The parse function returns an object (called <tt>parse_info</tt>) that
- holds, among other things, the result of the parse. In this example, we
- need to know:
- </p>
- <blockquote>
- <p>
- <img src="theme/bullet.gif" width="12" height="12"> Did the parser
- successfully recognize the input <tt>str</tt>?<br>
- <img src="theme/bullet.gif" width="12" height="12"> Did the parser
- <b>fully</b> parse and consume the input up to its end?
- </p>
- </blockquote>
- <p>
- To get a complete picture of what we have so far, let us also wrap this
- parser inside a function:
- </p>
-
- <pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class=
- "keyword">bool
- </span> <span class="identifier"> parse_numbers</span><span class=
- "special">(</span><span class="keyword">char</span> <span class=
- "keyword">const</span><span class="special">*</span> <span class=
- "identifier">str</span><span class="special">)
- {
- </span> <span class="keyword"> return</span> <span class=
- "identifier">parse</span><span class="special">(</span><span class=
- "identifier">str</span><span class="special">,</span> <span class=
- "identifier">real_p</span> <span class=
- "special">>></span> <span class="special">*(</span><span class=
- "literal">','</span> <span class="special">>></span> <span class=
- "identifier">real_p</span><span class="special">),</span> <span class=
- "identifier">space_p</span><span class="special">).</span><span class=
- "identifier">full</span><span class="special">;
- }</span></font></code>
- </pre>
- <p>
- Note in this case we dropped the named rule and inlined the parser
- directly in the call to parse. Upon calling parse, the expression
- evaluates into a temporary, unnamed parser which is passed into the
- parse() function, used, and then destroyed.
- </p>
- <table border="0" width="80%" align="center">
- <tr>
- <td class="note_box">
- <img src="theme/note.gif" width="16" height="16"><b>char and wchar_t
- operands</b><br>
- <br>
- The careful reader may notice that the parser expression has
- <tt class="quotes">','</tt> instead of <tt>ch_p(',')</tt> as the
- previous examples did. This is ok due to C++ syntax rules of
- conversion. There are <tt>>></tt> operators that are overloaded
- to accept a <tt>char</tt> or <tt>wchar_t</tt> argument on its left or
- right (but not both). An operator may be overloaded if at least one
- of its parameters is a user-defined type. In this case, the
- <tt>real_p</tt> is the 2nd argument to <tt>operator<span class=
- "operators">>></span></tt>, and so the proper overload of
- <tt class="operators">>></tt> is used, converting
- <tt class="quotes">','</tt> into a character literal parser.<br>
- <br>
- The problem with omiting the <tt>ch_p</tt> call should be obvious:
- <tt>'a' >> 'b'</tt> is <b>not</b> a spirit parser, it is a
- numeric expression, right-shifting the ASCII (or another encoding)
- value of <tt class="quotes">'a'</tt> by the ASCII value of
- <tt class="quotes">'b'</tt>. However, both <tt>ch_p('a') >>
- 'b'</tt> and <tt>'a' >> ch_p('b')</tt> are Spirit sequence
- parsers for the letter <tt class="quotes">'a'</tt> followed by
- <tt class="quotes">'b'</tt>. You'll get used to it, sooner or
- later.
- </td>
- </tr>
- </table>
- <p>
- Take note that the object returned from the parse function has a member
- called <tt>full</tt> which returns true if both of our requirements above
- are met (i.e. the parser fully parsed the input).
- </p>
- <h4>
- <b> Step 3. Semantic Actions</b></h4>
- <p>
- Our parser above is really nothing but a recognizer. It answers the
- question <i class="quotes">"did the input match our grammar?"</i>, but it
- does not remember any data, nor does it perform any side effects.
- Remember: we want to put the parsed numbers into a vector. This is done
- in an <b>action</b> that is linked to a particular parser. For example,
- whenever we parse a real number, we wish to store the parsed number after
- a successful match. We now wish to extract information from the parser.
- Semantic actions do this. Semantic actions may be attached to any point
- in the grammar specification. These actions are C++ functions or functors
- that are called whenever a part of the parser successfully recognizes a
- portion of the input. Say you have a parser <b>P</b>, and a C++ function
- <b>F</b>, you can make the parser call <b>F</b> whenever it matches an
- input by attaching <b>F</b>:
- </p>
-
- <pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class=
- "identifier">P</span><span class="special">[&</span><span class=
- "identifier">F</span><span class="special">]</span></font></code>
- </pre>
- <p>
- Or if <b>F</b> is a function object (a functor):
- </p>
-
- <pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class=
- "identifier">P</span><span class="special">[</span><span class=
- "identifier">F</span><span class="special">]</span></font></code>
- </pre>
- <p>
- The function/functor signature depends on the type of the parser to which
- it is attached. The parser <tt>real_p</tt> passes a single argument: the
- parsed number. Thus, if we were to attach a function <b>F</b> to
- <tt>real_p</tt>, we need <b>F</b> to be declared as:
- </p>
-
- <pre><code> </code><code><span class=
- "keyword">void</span> <span class="identifier">F</span><span class=
- "special">(</span><span class="keyword">double</span> <span class=
- "identifier">n</span><span class="special">);</span></code></pre>
- <p>
- For our example however, again, we can take advantage of some predefined
- semantic functors and functor generators (<img src="theme/lens.gif"
- width="15" height="16"> A functor generator is a function that returns
- a functor). For our purpose, Spirit has a functor generator
- <tt>push_back_a(c)</tt>. In brief, this semantic action, when called,
- <b>appends</b> the parsed value it receives from the parser it is
- attached to, to the container <tt>c</tt>.
- </p>
- <p>
- Finally, here is our complete comma-separated list parser:
- </p>
-
- <pre><code><font color="#000000"> </font></code><code><font color="#000000"><span class=
- "keyword">bool
- </span> <span class="identifier">parse_numbers</span><span class=
- "special">(</span><span class="keyword">char</span> <span class=
- "keyword">const</span><span class="special">*</span> <span class=
- "identifier">str</span><span class="special">,</span> <span class=
- "identifier">vector</span><span class="special"><</span><span class=
- "keyword">double</span><span class=
- "special">>&</span> <span class="identifier">v</span><span class=
- "special">)
- {
- </span> <span class="keyword">return</span> <span class=
- "identifier">parse</span><span class="special">(</span><span class=
- "identifier">str</span><span class="special">,
- </span> <span class="comment"> // Begin grammar
- </span> <span class="special"> (
- </span> <span class="identifier">real_p</span><span class=
- "special">[</span><span class="identifier">push_back_a</span><span class=
- "special">(</span><span class="identifier">v</span><span class=
- "special">)]</span> <span class="special">>></span> <span class=
- "special">*(</span><span class="literal">','</span> <span class=
- "special">>></span> <span class=
- "identifier">real_p</span><span class="special">[</span><span class=
- "identifier">push_back_a</span><span class="special">(</span><span class=
- "identifier">v</span><span class="special">)])
- )
- </span> <span class="special"> ,
- </span> <span class="comment"> // End grammar
- </span> <span class="identifier"> space_p</span><span class=
- "special">).</span><span class="identifier">full</span><span class="special">;
- }</span></font></code>
- </pre>
- <p>
- This is the same parser as above. This time with appropriate semantic
- actions attached to strategic places to extract the parsed numbers and
- stuff them in the vector <tt>v</tt>. The parse_numbers function returns
- true when successful.
- </p>
- <p>
- <img src="theme/lens.gif" width="15" height="16"> The full source code
- can be <a href="../example/fundamental/number_list.cpp">viewed here</a>.
- This is part of the Spirit distribution.
- </p>
- <table border="0">
- <tr>
- <td width="10"></td>
- <td width="30">
- <a href="../index.html"><img src="theme/u_arr.gif" border="0"></a>
- </td>
- <td width="30">
- <a href="introduction.html"><img src="theme/l_arr.gif" border="0">
- </a>
- </td>
- <td width="30">
- <a href="basic_concepts.html"><img src="theme/r_arr.gif" border="0">
- </a>
- </td>
- </tr>
- </table><br>
- <hr size="1">
- <p class="copyright">
- Copyright © 1998-2003 Joel de Guzman<br>
- Copyright © 2002 Chris Uzdavinis<br>
- <br>
- <font size="2">Use, modification and distribution is subject to the
- Boost Software License, Version 1.0. (See accompanying file
- LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)</font>
- </p>
- <blockquote>
-
- </blockquote>
- </body>
- </html>
|