  1. <html>
  2. <head>
  3. <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
  4. <title>Boyer-Moore-Horspool Search</title>
  5. <link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css">
  6. <meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
  7. <link rel="home" href="../../index.html" title="The Boost Algorithm Library">
  8. <link rel="up" href="../../algorithm/Searching.html" title="Searching Algorithms">
  9. <link rel="prev" href="../../algorithm/Searching.html" title="Searching Algorithms">
  10. <link rel="next" href="KnuthMorrisPratt.html" title="Knuth-Morris-Pratt Search">
  11. </head>
  12. <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
  13. <table cellpadding="2" width="100%"><tr>
  14. <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td>
  15. <td align="center"><a href="../../../../../../index.html">Home</a></td>
  16. <td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td>
  17. <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
  18. <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
  19. <td align="center"><a href="../../../../../../more/index.htm">More</a></td>
  20. </tr></table>
  21. <hr>
  22. <div class="spirit-nav">
  23. <a accesskey="p" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="KnuthMorrisPratt.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
  24. </div>
  25. <div class="section">
  26. <div class="titlepage"><div><div><h3 class="title">
  27. <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool"></a><a class="link" href="BoyerMooreHorspool.html" title="Boyer-Moore-Horspool Search">Boyer-Moore-Horspool
  28. Search</a>
  29. </h3></div></div></div>
  30. <h5>
  31. <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h0"></a>
  32. <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.overview"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.overview">Overview</a>
  33. </h5>
  34. <p>
  35. The header file 'boyer_moore_horspool.hpp' contains an implementation of
  36. the Boyer-Moore-Horspool algorithm for searching sequences of values.
  37. </p>
  38. <p>
  39. The Boyer-Moore-Horspool search algorithm was published by Nigel Horspool
  40. in 1980. It is a refinement of the Boyer-Moore algorithm that trades worst-case
  41. time for space: it uses less space for internal tables than Boyer-Moore, but has
  42. poorer worst-case performance.
  43. </p>
  44. <p>
  45. Unlike <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">search</span></code>,
  46. the Boyer-Moore-Horspool algorithm cannot be used with comparison predicates.
  47. </p>
  48. <h5>
  49. <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h1"></a>
  50. <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.interface"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.interface">Interface</a>
  51. </h5>
  52. <p>
  53. Nomenclature: I refer to the sequence being searched for as the "pattern",
  54. and the sequence being searched in as the "corpus".
  55. </p>
  56. <p>
  57. For flexibility, the Boyer-Moore-Horspool algorithm has two interfaces: an
  58. object-based interface and a procedural one. The object-based interface builds
  59. the skip table in the constructor, and uses operator () to perform the search.
  60. The procedural interface builds the table and does the search all in one
  61. step. If you are going to be searching for the same pattern in multiple corpora,
  62. then you should use the object interface, and only build the table once.
  63. </p>
  64. <p>
  65. Here is the object interface:
  66. </p>
  67. <pre class="programlisting"><span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">patIter</span><span class="special">&gt;</span>
  68. <span class="keyword">class</span> <span class="identifier">boyer_moore_horspool</span> <span class="special">{</span>
  69. <span class="keyword">public</span><span class="special">:</span>
  70. <span class="identifier">boyer_moore_horspool</span> <span class="special">(</span> <span class="identifier">patIter</span> <span class="identifier">first</span><span class="special">,</span> <span class="identifier">patIter</span> <span class="identifier">last</span> <span class="special">);</span>
  71. <span class="special">~</span><span class="identifier">boyer_moore_horspool</span> <span class="special">();</span>
  72. <span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">corpusIter</span><span class="special">&gt;</span>
  73. <span class="identifier">pair</span><span class="special">&lt;</span><span class="identifier">corpusIter</span><span class="special">,</span> <span class="identifier">corpusIter</span><span class="special">&gt;</span> <span class="keyword">operator</span> <span class="special">()</span> <span class="special">(</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_first</span><span class="special">,</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_last</span> <span class="special">);</span>
  74. <span class="special">};</span>
  75. </pre>
  76. <p>
  77. </p>
  78. <p>
  79. and here is the corresponding procedural interface:
  80. </p>
  81. <p>
  82. </p>
  83. <pre class="programlisting"><span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">patIter</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">corpusIter</span><span class="special">&gt;</span>
  84. <span class="identifier">pair</span><span class="special">&lt;</span><span class="identifier">corpusIter</span><span class="special">,</span> <span class="identifier">corpusIter</span><span class="special">&gt;</span> <span class="identifier">boyer_moore_horspool_search</span> <span class="special">(</span>
  85. <span class="identifier">corpusIter</span> <span class="identifier">corpus_first</span><span class="special">,</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_last</span><span class="special">,</span>
  86. <span class="identifier">patIter</span> <span class="identifier">pat_first</span><span class="special">,</span> <span class="identifier">patIter</span> <span class="identifier">pat_last</span> <span class="special">);</span>
  87. </pre>
  88. <p>
  89. </p>
  90. <p>
  91. Each of the functions is passed two pairs of iterators. The first two define
  92. the corpus and the second two define the pattern. Note that the two pairs
  93. need not be of the same type, but they do need to "point" at the
  94. same type. In other words, <code class="computeroutput"><span class="identifier">patIter</span><span class="special">::</span><span class="identifier">value_type</span></code>
  95. and <code class="computeroutput"><span class="identifier">corpusIter</span><span class="special">::</span><span class="identifier">value_type</span></code> need to be the same type.
  96. </p>
  97. <p>
  98. The return value of the function is a pair of iterators delimiting the first
  99. occurrence of the pattern in the corpus. If the pattern is empty, it returns an empty
  100. range at the start of the corpus (<code class="computeroutput"><span class="identifier">corpus_first</span></code>,
  101. <code class="computeroutput"><span class="identifier">corpus_first</span></code>). If the pattern
  102. is not found, it returns an empty range at the end of the corpus (<code class="computeroutput"><span class="identifier">corpus_last</span></code>, <code class="computeroutput"><span class="identifier">corpus_last</span></code>).
  103. </p>
  104. <h5>
  105. <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h2"></a>
  106. <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.compatibility_note"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.compatibility_note">Compatibility
  107. Note</a>
  108. </h5>
  109. <p>
  110. Earlier versions of this searcher returned only a single iterator. As explained
  111. in <a href="https://cplusplusmusings.wordpress.com/2016/02/01/sometimes-you-get-things-wrong/" target="_top">https://cplusplusmusings.wordpress.com/2016/02/01/sometimes-you-get-things-wrong/</a>,
  112. this was a suboptimal interface choice, and has been changed, starting in
  113. the 1.62.0 release. Old code that is expecting a single iterator return value
  114. can be updated by replacing the return value of the searcher's <code class="computeroutput"><span class="keyword">operator</span> <span class="special">()</span></code>
  115. with the <code class="computeroutput"><span class="special">.</span><span class="identifier">first</span></code>
  116. field of the pair.
  117. </p>
  118. <p>
  119. Instead of:
  120. </p>
  121. <pre class="programlisting"><span class="identifier">iterator</span> <span class="identifier">foo</span> <span class="special">=</span> <span class="identifier">searcher</span><span class="special">(</span><span class="identifier">a</span><span class="special">,</span> <span class="identifier">b</span><span class="special">);</span>
  122. </pre>
  123. <p>
  124. </p>
  125. <p>
  126. you now write:
  127. </p>
  128. <pre class="programlisting"><span class="identifier">iterator</span> <span class="identifier">foo</span> <span class="special">=</span> <span class="identifier">searcher</span><span class="special">(</span><span class="identifier">a</span><span class="special">,</span> <span class="identifier">b</span><span class="special">).</span><span class="identifier">first</span><span class="special">;</span>
  129. </pre>
  130. <p>
  131. </p>
  132. <h5>
  133. <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h3"></a>
  134. <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.performance"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.performance">Performance</a>
  135. </h5>
  136. <p>
  137. The average execution time of the Boyer-Moore-Horspool algorithm is linear in the
  138. size of the corpus being searched; it can have a significantly lower constant
  139. factor than many other search algorithms: it doesn't need to check every
  140. element of the corpus, but rather skips over some of them.
  141. Generally the algorithm gets faster as the pattern being searched for becomes
  142. longer. Its efficiency derives from the fact that with each unsuccessful
  143. attempt to find a match between the pattern and the corpus,
  144. it uses the information gained from that attempt to rule out as many positions
  145. of the corpus as possible where the pattern cannot match.
  146. </p>
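<p>
To make the skipping behaviour concrete, here is a minimal textbook sketch of a
Horspool search over 8-bit characters using only the standard library. It is not the
Boost implementation; the function name <code class="computeroutput">horspool_find</code>
and the restriction to <code class="computeroutput">std::string</code> are assumptions
made for illustration.
</p>

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper, not part of Boost: a textbook Horspool search that
// returns [match_begin, match_end), following the convention described here.
std::pair<std::string::const_iterator, std::string::const_iterator>
horspool_find(const std::string& corpus, const std::string& pattern) {
    const std::size_t m = pattern.size();
    const std::size_t n = corpus.size();
    if (m == 0) return {corpus.begin(), corpus.begin()};  // empty pattern
    if (m > n) return {corpus.end(), corpus.end()};       // cannot fit

    // Skip table: one entry per possible byte value, defaulting to the
    // full pattern length.
    std::array<std::size_t, 256> skip;
    skip.fill(m);
    // For every pattern element except the last, record its distance
    // from the end of the pattern (later occurrences overwrite earlier ones).
    for (std::size_t i = 0; i + 1 < m; ++i)
        skip[static_cast<unsigned char>(pattern[i])] = m - 1 - i;

    std::size_t pos = 0;
    while (pos <= n - m) {
        // Compare the current window right to left.
        std::size_t j = m;
        while (j > 0 && corpus[pos + j - 1] == pattern[j - 1]) --j;
        if (j == 0) {
            auto first = corpus.begin() + static_cast<std::ptrdiff_t>(pos);
            return {first, first + static_cast<std::ptrdiff_t>(m)};
        }
        // Shift by the table entry for the corpus element aligned with the
        // last pattern element; the positions skipped over cannot match.
        pos += skip[static_cast<unsigned char>(corpus[pos + m - 1])];
    }
    return {corpus.end(), corpus.end()};
}
```

<p>
A longer pattern allows larger default shifts, which is why the search tends to get
faster as the pattern grows.
</p>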
  147. <h5>
  148. <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h4"></a>
  149. <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.memory_use"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.memory_use">Memory
  150. Use</a>
  151. </h5>
  152. <p>
  153. The algorithm uses an internal table that has one entry for each member of the
  154. "alphabet" in the pattern. For (8-bit) character types, this table
  155. contains 256 entries.
  156. </p>
  157. <h5>
  158. <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h5"></a>
  159. <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.complexity"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.complexity">Complexity</a>
  160. </h5>
  161. <p>
  162. The worst-case performance is <span class="emphasis"><em>O(m x n)</em></span>, where <span class="emphasis"><em>m</em></span>
  163. is the length of the pattern and <span class="emphasis"><em>n</em></span> is the length of
  164. the corpus. The average time is <span class="emphasis"><em>O(n)</em></span>. The best case
  165. performance is sub-linear, and is, in fact, identical to Boyer-Moore, but
  166. the initialization is quicker and the internal loop is simpler than Boyer-Moore.
  167. </p>
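<p>
The gap between the best and worst cases can be checked with a small instrumented
sketch. The function below is a hypothetical textbook Horspool loop for 8-bit
characters (not the Boost code) that counts element comparisons instead of returning
a match position.
</p>

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical instrumented Horspool loop (not Boost code): returns the
// number of element comparisons made while searching corpus for pattern.
std::size_t horspool_comparisons(const std::string& corpus,
                                 const std::string& pattern) {
    const std::size_t m = pattern.size();
    const std::size_t n = corpus.size();
    if (m == 0 || m > n) return 0;

    // Skip table built from every pattern element except the last.
    std::array<std::size_t, 256> skip;
    skip.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i)
        skip[static_cast<unsigned char>(pattern[i])] = m - 1 - i;

    std::size_t comparisons = 0;
    for (std::size_t pos = 0; pos <= n - m;
         pos += skip[static_cast<unsigned char>(corpus[pos + m - 1])]) {
        // Compare the window right to left, counting each comparison.
        std::size_t j = m;
        while (j > 0) {
            ++comparisons;
            if (corpus[pos + j - 1] != pattern[j - 1]) break;
            --j;
        }
        if (j == 0) break;  // match found, stop searching
    }
    return comparisons;
}
```

<p>
With a corpus of 20 <code class="computeroutput">'a'</code> characters, the pattern
<code class="computeroutput">"baaaa"</code> (m = 5) forces a shift of one after five
comparisons at each of the 16 alignments, giving 5 x 16 = 80 comparisons, which is the
<span class="emphasis"><em>O(m x n)</em></span> behaviour. The pattern
<code class="computeroutput">"bbbbb"</code> fails on the first comparison and shifts by
five each time, giving only 4 comparisons, which is the sub-linear best case.
</p>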
  168. <h5>
  169. <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h6"></a>
  170. <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.exception_safety"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.exception_safety">Exception
  171. Safety</a>
  172. </h5>
  173. <p>
  174. Both the object-oriented and procedural versions of the Boyer-Moore-Horspool
  175. algorithm take their parameters by value and do not use any information other
  176. than what is passed in. Therefore, both interfaces provide the strong exception
  177. guarantee.
  178. </p>
  179. <h5>
  180. <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h7"></a>
  181. <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.notes"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.notes">Notes</a>
  182. </h5>
  183. <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
  184. <li class="listitem">
  185. When using the object-based interface, the pattern must remain unchanged
  186. during the searches; i.e., from the time the object is constructed
  187. until the final call to operator () returns.
  188. </li>
  189. <li class="listitem">
  190. The Boyer-Moore-Horspool algorithm requires random-access iterators for
  191. both the pattern and the corpus.
  192. </li>
  193. </ul></div>
  194. <h5>
  195. <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h8"></a>
  196. <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.customization_points"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.customization_points">Customization
  197. points</a>
  198. </h5>
  199. <p>
  200. The Boyer-Moore-Horspool object takes a traits template parameter which enables
  201. the caller to customize how the precomputed table is stored. This table,
  202. called the skip table, contains (logically) one entry for every possible
  203. value that the pattern can contain. When searching 8-bit character data,
  204. this table contains 256 elements. The traits class defines the table to be
  205. used.
  206. </p>
  207. <p>
  208. The default traits class uses a <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">array</span></code>
  209. for small 'alphabets' and a <code class="computeroutput"><span class="identifier">tr1</span><span class="special">::</span><span class="identifier">unordered_map</span></code>
  210. for larger ones. The array-based skip table gives excellent performance,
  211. but could be prohibitively large when the 'alphabet' of elements to be searched
  212. grows. The unordered_map-based version grows only with the number of unique
  213. elements in the pattern, but makes many more heap allocations, and gives
  214. slower lookup performance.
  215. </p>
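<p>
The trade-off between the two storage strategies can be sketched with the standard
library alone. The class names, member functions, and the
<code class="computeroutput">fill_skip_table</code> helper below are all hypothetical
and chosen for illustration; they are not the library's traits or skip table
interface.
</p>

```cpp
#include <array>
#include <cstddef>
#include <unordered_map>

// Hypothetical array-backed skip table for 8-bit elements: constant-time
// lookup, but storage is always one slot per possible value (256 here).
class array_skip_table {
public:
    explicit array_skip_table(std::size_t default_shift) {
        table_.fill(default_shift);
    }
    void insert(unsigned char key, std::size_t shift) { table_[key] = shift; }
    std::size_t operator[](unsigned char key) const { return table_[key]; }
private:
    std::array<std::size_t, 256> table_;
};

// Hypothetical hash-backed skip table for larger alphabets: storage grows
// with the number of distinct keys inserted; lookups go through a hash.
template <typename Key>
class map_skip_table {
public:
    explicit map_skip_table(std::size_t default_shift)
        : default_(default_shift) {}
    void insert(Key key, std::size_t shift) { table_[key] = shift; }
    std::size_t operator[](Key key) const {
        auto it = table_.find(key);
        return it == table_.end() ? default_ : it->second;
    }
    std::size_t stored_entries() const { return table_.size(); }
private:
    std::size_t default_;
    std::unordered_map<Key, std::size_t> table_;
};

// Fill either table from a pattern given by random-access iterators:
// every element except the last maps to its distance from the pattern end.
template <typename Table, typename Iter>
void fill_skip_table(Table& table, Iter first, Iter last) {
    const std::size_t m = static_cast<std::size_t>(last - first);
    for (std::size_t i = 0; i + 1 < m; ++i)
        table.insert(first[i], m - 1 - i);
}
```

<p>
For a four-element pattern, the array version still holds 256 slots, whereas the
map version stores one entry per distinct element among the first three and falls
back to the default shift for everything else.
</p>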
  216. <p>
  217. To use a different skip table, you should define your own skip table object
  218. and your own traits class, and use them to instantiate the Boyer-Moore-Horspool
  219. object. The interface to these objects is described TBD.
  220. </p>
  221. </div>
  222. <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
  223. <td align="left"></td>
  224. <td align="right"><div class="copyright-footer">Copyright &#169; 2010-2012 Marshall Clow<p>
  225. Distributed under the Boost Software License, Version 1.0. (See accompanying
  226. file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
  227. </p>
  228. </div></td>
  229. </tr></table>
  230. <hr>
  231. <div class="spirit-nav">
  232. <a accesskey="p" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="KnuthMorrisPratt.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
  233. </div>
  234. </body>
  235. </html>