basic_concepts.html 15 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354
  1. <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
  2. <html>
  3. <head>
  4. <meta content=
  5. "HTML Tidy for Windows (vers 1st February 2003), see www.w3.org"
  6. name="generator">
  7. <title>
  8. Basic Concepts
  9. </title>
  10. <meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
  11. <link rel="stylesheet" href="theme/style.css" type="text/css">
  12. </head>
  13. <body>
  14. <table width="100%" border="0" background="theme/bkd2.gif" cellspacing="2">
  15. <tr>
  16. <td width="10"></td>
  17. <td width="85%">
  18. <font size="6" face="Verdana, Arial, Helvetica, sans-serif"><b>Basic
  19. Concepts</b></font>
  20. </td>
  21. <td width="112">
  22. <a href="http://spirit.sf.net"><img src="theme/spirit.gif"
  23. width="112" height="48" align="right" border="0"></a>
  24. </td>
  25. </tr>
  26. </table><br>
  27. <table border="0">
  28. <tr>
  29. <td width="10"></td>
  30. <td width="30">
  31. <a href="../index.html"><img src="theme/u_arr.gif" border="0"></a>
  32. </td>
  33. <td width="30">
  34. <a href="quick_start.html"><img src="theme/l_arr.gif" border="0"></a>
  35. </td>
  36. <td width="30">
  37. <a href="organization.html"><img src="theme/r_arr.gif" border="0">
  38. </a>
  39. </td>
  40. </tr>
  41. </table>
  42. <p>
  43. There are a few fundamental concepts that need to be understood well: 1)
  44. The <strong>Parser</strong>, 2) <strong>Match</strong>, 3) The
  45. <strong>Scanner</strong>, and 4) <strong>Semantic Actions</strong>. These
  46. basic concepts interact with one another, and the functionalities of each
  47. interweave throughout the framework to make it one coherent whole.
  48. </p>
  49. <table width="48%" border="0" align="center">
  50. <tr>
  51. <td height="211">
  52. <img src="theme/intro1.png">
  53. </td>
  54. </tr>
  55. </table>
  56. <h2>
  57. The Parser
  58. </h2>
  59. <p>
  60. Central to the framework is the parser. The parser does the actual work
  61. of recognizing a linear input stream of data read sequentially from start
  62. to end by the scanner. The parser attempts to match the input following a
  63. well-defined set of specifications known as grammar rules. The parser
  64. reports the success or failure to its client through a match object. When
  65. successful, the parser calls a client-supplied semantic action. Finally,
  66. the semantic action extracts structural information depending on the data
  67. passed by the parser and the hierarchical context of the parser it is
  68. attached to.
  69. </p>
  70. <p>
  71. Parsers come in different flavors. The Spirit framework comes bundled
  72. with an extensive set of pre-defined parsers that perform various parsing
  73. tasks from the trivial to the complex. The parser, as a concept, has a
  74. public conceptual interface contract. Following the contract, anyone can
  75. write a conforming parser that will play along well with the framework's
  76. predefined components. We shall provide a blueprint detailing the
  77. conceptual interface of the parser later.
  78. </p>
  79. <p>
  80. Clients of the framework generally do not need to write their own
  81. hand-coded parsers at all. Spirit has an immense repertoire of
  82. pre-defined parsers covering all aspects of syntax and semantic analysis.
  83. We shall examine this repertoire of parsers in the following sections. In
  84. the rare case where a specific functionality is not available, it is
  85. extremely easy to write a user-defined parser. The ease in writing a
  86. parser entity is the main reason for Spirit's extensibility.
  87. </p>
  88. <h2>
  89. Primitives and Composites
  90. </h2>
  91. <p>
  92. Spirit parsers fall into two categories: <b>primitives</b> and
  93. <b>composites</b>. These two categories are more or less synonymous to
  94. terminals and non-terminals in parsing lingo. Primitives are
  95. non-decomposable atomic units. Composites on the other hand are parsers
  96. that are composed of other parsers which can in turn be a primitive or
  97. another composite. To illustrate, consider the Spirit expression:
  98. </p>
  99. <pre><code><font color="#000000"> </font></code><code><span class="identifier">real_p</span> <span class=
  100. "special">&gt;&gt;</span> <span class="special">*(</span><span class=
  101. "literal">','</span> <span class="special">&gt;&gt;</span> <span class=
  102. "identifier">real_p</span><span class="special">)</span></code>
  103. </pre>
  104. <p>
  105. <tt><tt>real_p</tt></tt> is a primitive parser that can parse real
  106. numbers. The quoted comma <tt class="quotes">','</tt> in the expression
  107. is a shortcut and is equivalent to <tt>ch_p<span class=
  108. "operators">(</span><span class="quotes">','</span><span class=
  109. "operators">)</span></tt>, which is another primitive parser that
  110. recognizes single characters.
  111. </p>
  112. <p>
  113. The expression above corresponds to the following parse tree:
  114. </p>
  115. <table width="29%" border="0" align="center">
  116. <tr>
  117. <td>
  118. <img src="theme/intro7.png">
  119. </td>
  120. </tr>
  121. </table>
  122. <p>
  123. The expression:
  124. </p>
  125. <pre><code><font color="#000000"> </font></code><span class=
  126. "literal">','</span> <span class="special">&gt;&gt;</span> <span class=
  127. "identifier">real_p</span>
  128. </pre>
  129. <p>
  130. composes a <b>sequence</b> parser. The <tt>sequence</tt> parser is a
  131. composite parser comprising two parsers: the one on its left hand side
  132. (lhs), <tt>ch_p<span class="operators">(</span><span class=
  133. "quotes">','</span><span class="operators">)</span></tt> ; and the other
  134. on its right hand side (rhs), <tt>real_p</tt>. This composite parser,
  135. when called, calls its lhs and rhs in sequence and reports a successful
  136. match only if both are successful.
  137. </p>
  138. <table width="14%" border="0" align="center">
  139. <tr>
  140. <td>
  141. <img src="theme/intro2.png">
  142. </td>
  143. </tr>
  144. </table>
  145. <p>
  146. The <tt>sequence</tt> parser is a binary composite. It is composed of two
  147. parsers. There are unary composites as well. Unary composites hold only a
  148. single subject. Like the binary composite, the unary composite may change
  149. the behavior of its embedded subject. One particular example is the
  150. <b>Kleene star</b>. The Kleene star, when called to parse, calls its sole
  151. subject zero or more times. "Zero or more" implies that the Kleene star
  152. always returns a successful match, possibly matching the null string: "".
  153. </p>
  154. <p>
  155. The expression:
  156. </p>
  157. <pre><code><font color="#000000"> </font></code><code><span class=
  158. "special">*(</span><span class="literal">','</span> <span class=
  159. "special">&gt;&gt;</span> <span class=
  160. "identifier">real_p</span><span class="special">)</span></code>
  161. </pre>
  162. <p>
  163. wraps the whole sequence composite above inside a <tt>kleene_star</tt>.
  164. </p>
  165. <table width="17%" border="0" align="center">
  166. <tr>
  167. <td>
  168. <img src="theme/intro3.png">
  169. </td>
  170. </tr>
  171. </table>
  172. <p>
  173. Finally, the full expression composes a <tt>real_p</tt> primitive parser
  174. and the <tt>kleene_star</tt> we have above into another higher level
  175. <tt>sequence</tt> parser composite.
  176. </p>
  177. <table width="34%" border="0" align="center">
  178. <tr>
  179. <td>
  180. <img src="theme/intro4.png">
  181. </td>
  182. </tr>
  183. </table>
  184. <p>
  185. A few simple classes, when composed and structured in a hierarchy, form a
  186. very powerful object-oriented recursive-descent parsing engine. These
  187. classes provide the infrastructure needed for the construction of
  188. more-complex parsers. The final parser composite is a non-deterministic
  189. recursive-descent parser with infinite look-ahead.
  190. </p>
  191. <p>
  192. Top-down descent traverses the hierarchy. The outer <tt>sequence</tt>
  193. calls the leftmost <tt>real_p</tt> parser. If successful, the
  194. <tt>kleene_star</tt> is called next. The <tt>kleene_star</tt> calls the
  195. inner <tt>sequence</tt> repeatedly in a loop until it fails to match, or
  196. the input is exhausted. Inside, <tt>ch_p(',')</tt> and then
  197. <tt>real_p</tt> are called in sequence. The following diagram illustrates
  198. what is happening, somewhat reminiscent of Pascal syntax diagrams.
  199. </p>
  200. <table width="37%" border="0" align="center">
  201. <tr>
  202. <td>
  203. <img src="theme/intro5.png">
  204. </td>
  205. </tr>
  206. </table>
  207. <p>
  208. The flexibility of object embedding and composition combined with
  209. recursion opens up a unique approach to parsing. Subclasses are free to
  210. form aggregates and algorithms of arbitrary complexity. Complex parsers
  211. can be created with the composition of only a few primitive classes.
  212. </p>
  213. <p>
  214. The framework is designed to be fully open-ended and extensible. New
  215. primitives or composites, from the trivial to the complex, may be added
  216. any time. Composition happens (statically) at compile time. This is
  217. possible through the expressive flexibility of C++ expression templates
  218. and template meta-programming.
  219. </p>
  220. <p>
  221. The result is a composite composed of primitives and smaller composites.
  222. This embedding strategy gives us the ability to build hierarchical
  223. structures that fully model EBNF expressions of arbitrary complexity.
  224. Later on, we shall see more primitive and composite building blocks.
  225. </p>
  226. <h2>
  227. The Scanner
  228. </h2>
  229. <p>
  230. Like the parser, the scanner is also an abstract concept. The task of the
  231. scanner is to feed the sequential input data stream to the parser. The
  232. scanner is composed of two STL conforming forward iterators, first and
  233. last, where first is held by reference and last, by value. The first
  234. iterator is held by reference to allow re-positioning by the parser. A
  235. set of policies governs how the scanner behaves. Parsers extract data
  236. from the scanner and position the iterator appropriately through its
  237. member functions.
  238. </p>
  239. <p>
  240. Knowledge of the intricacies of these policies is not required at all in
  241. most cases. However, knowledge of the scanner's basic API is required to
  242. write fully-conforming Spirit parsers. The scanner's API will be outlined
  243. in a separate section. In addition, for the power users and the
  244. adventurous among us, a full section will be devoted to covering the
  245. scanner policies. The scanner policies make Spirit very flexible and
  246. extensible. For instance, some of the policies may be modified to filter
  247. data. A practical example is a scanner policy that does not distinguish
  248. upper and lower case whereby making it useful for parsing case
  249. insensitive input. Another example is a scanner policy that strips white
  250. spaces from the input.
  251. </p>
  252. <h2>
  253. The Match
  254. </h2>
  255. <p>
  256. The parser has a conceptual parse member function taking in a scanner and
  257. returning a match object. The primary function of the match object is to
  258. report parsing success (or failure) back to the parser's caller; i.e., it
  259. evaluates to true if the parse function is successful, false otherwise.
  260. If the parse is successful, the match object may also be queried to
  261. report the number of characters matched (using <tt>match.length()</tt>).
  262. The length is non-negative if the match is successful, and the typical
  263. length of a parse failure is -1. A zero length is perfectly valid and
  264. still represents a successful match.
  265. </p>
  266. <p>
  267. Parsers may have attribute data associated with it. For example, the
  268. real_p parser has a numeric datum associated with it. This attribute is
  269. the parsed number. This attribute is passed on to the returned match
  270. object. The match object may be queried to get this attribute. This datum
  271. is valid only when the match is successful.
  272. </p>
  273. <h2>
  274. Semantic Actions
  275. </h2>
  276. <p>
  277. A composite parser forms a hierarchy. Parsing proceeds from the topmost
  278. parent parser which delegates and apportions the parsing task to its
  279. children recursively to its children's children and so on until a
  280. primitive is reached. By attaching semantic actions to various points in
  281. this hierarchy, in effect we can transform the flat linear input stream
  282. into a structured representation. This is essentially what parsers do.
  283. </p>
  284. <p>
  285. Recall our example above:
  286. </p>
  287. <pre><code><font color="#000000"> </font></code><code><span class=
  288. "identifier">real_p</span> <span class=
  289. "special">&gt;&gt;</span> <span class="special">*(</span><span class=
  290. "literal">','</span> <span class="special">&gt;&gt;</span> <span class=
  291. "identifier">real_p</span><span class="special">)</span></code>
  292. </pre>
  293. <p>
  294. By hooking a function (or functor) into the real_p parsers, we can
  295. extract the numbers from the input:
  296. </p>
  297. <pre><code><font color="#000000"> </font></code><span class=
  298. "identifier">real_p</span><span class="special">[&amp;</span><span class=
  299. "identifier">f</span><span class="special">]</span> <span class=
  300. "special">&gt;&gt;</span> <span class="special">*(</span><span class=
  301. "literal">','</span> <span class="special">&gt;&gt;</span> <span class=
  302. "identifier">real_p</span><span class="special">[&amp;</span><span class=
  303. "identifier">f</span><span class="special">])</span>
  304. </pre>
  305. <table width="41%" border="0" align="center">
  306. <tr>
  307. <td>
  308. <img src="theme/intro6.png">
  309. </td>
  310. </tr>
  311. </table>
  312. <p> where <tt>f</tt> is a function that takes in a single argument. The <tt><span class="operators">[&amp;</span>f<span class=
  313. "operators">]</span></tt> hooks the parser with the function such that when
  314. <tt>real_p</tt> recognizes a valid number, the function <tt>f</tt> is called.
  315. It is up to the function then to do what is appropriate. For example, it can
  316. stuff the numbers in a vector. Or perhaps, if the grammar is changed slightly
  317. by replacing <tt class="quotes">','</tt> with <tt class="quotes">'+'</tt>, then
  318. we have a primitive calculator that computes sums. The function <tt>f</tt> then
  319. can then be made to add all incoming numbers.<br>
  320. </p>
  321. <table border="0">
  322. <tr>
  323. <td width="10"></td>
  324. <td width="30">
  325. <a href="../index.html"><img src="theme/u_arr.gif" border="0"></a>
  326. </td>
  327. <td width="30">
  328. <a href="quick_start.html"><img src="theme/l_arr.gif" border="0"></a>
  329. </td>
  330. <td width="30">
  331. <a href="organization.html"><img src="theme/r_arr.gif" border="0">
  332. </a>
  333. </td>
  334. </tr>
  335. </table><br>
  336. <hr size="1">
  337. <p class="copyright">
  338. Copyright &copy; 1998-2003 Joel de Guzman<br>
  339. <br>
  340. <font size="2">Use, modification and distribution is subject to the Boost
  341. Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or
  342. copy at http://www.boost.org/LICENSE_1_0.txt)</font>
  343. </p>
  344. <p>&nbsp;
  345. </p>
  346. </body>
  347. </html>