123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236 |
- <html>
- <head>
- <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
- <title>Boyer-Moore-Horspool Search</title>
- <link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css">
- <meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
- <link rel="home" href="../../index.html" title="The Boost Algorithm Library">
- <link rel="up" href="../../algorithm/Searching.html" title="Searching Algorithms">
- <link rel="prev" href="../../algorithm/Searching.html" title="Searching Algorithms">
- <link rel="next" href="KnuthMorrisPratt.html" title="Knuth-Morris-Pratt Search">
- </head>
- <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
- <table cellpadding="2" width="100%"><tr>
- <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td>
- <td align="center"><a href="../../../../../../index.html">Home</a></td>
- <td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td>
- <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
- <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
- <td align="center"><a href="../../../../../../more/index.htm">More</a></td>
- </tr></table>
- <hr>
- <div class="spirit-nav">
- <a accesskey="p" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="KnuthMorrisPratt.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
- </div>
- <div class="section">
- <div class="titlepage"><div><div><h3 class="title">
- <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool"></a><a class="link" href="BoyerMooreHorspool.html" title="Boyer-Moore-Horspool Search">Boyer-Moore-Horspool
- Search</a>
- </h3></div></div></div>
- <h5>
- <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h0"></a>
- <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.overview"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.overview">Overview</a>
- </h5>
- <p>
- The header file 'boyer_moore_horspool.hpp' contains an implementation of
- the Boyer-Moore-Horspool algorithm for searching sequences of values.
- </p>
- <p>
- The Boyer-Moore-Horspool search algorithm was published by Nigel Horspool
- in 1980. It is a refinement of the Boyer-Moore algorithm that trades space
- for time. It uses less space for internal tables than Boyer-Moore, and has
- poorer worst-case performance.
- </p>
- <p>
- The Boyer-Moore-Horspool algorithm cannot be used with comparison predicates
- like <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">search</span></code>.
- </p>
- <h5>
- <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h1"></a>
- <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.interface"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.interface">Interface</a>
- </h5>
- <p>
- Nomenclature: I refer to the sequence being searched for as the "pattern",
- and the sequence being searched in as the "corpus".
- </p>
- <p>
- For flexibility, the Boyer-Moore-Horspool algorithm has two interfaces; an
- object-based interface and a procedural one. The object-based interface builds
- the tables in the constructor, and uses operator () to perform the search.
- The procedural interface builds the table and does the search all in one
- step. If you are going to be searching for the same pattern in multiple corpora,
- then you should use the object interface, and only build the tables once.
- </p>
- <p>
- Here is the object interface:
- </p>
- <pre class="programlisting"><span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">patIter</span><span class="special">></span>
- <span class="keyword">class</span> <span class="identifier">boyer_moore_horspool</span> <span class="special">{</span>
- <span class="keyword">public</span><span class="special">:</span>
- <span class="identifier">boyer_moore_horspool</span> <span class="special">(</span> <span class="identifier">patIter</span> <span class="identifier">first</span><span class="special">,</span> <span class="identifier">patIter</span> <span class="identifier">last</span> <span class="special">);</span>
- <span class="special">~</span><span class="identifier">boyer_moore_horspool</span> <span class="special">();</span>
- <span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">corpusIter</span><span class="special">></span>
- <span class="identifier">pair</span><span class="special"><</span><span class="identifier">corpusIter</span><span class="special">,</span> <span class="identifier">corpusIter</span><span class="special">></span> <span class="keyword">operator</span> <span class="special">()</span> <span class="special">(</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_first</span><span class="special">,</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_last</span> <span class="special">);</span>
- <span class="special">};</span>
- </pre>
- <p>
- </p>
- <p>
- and here is the corresponding procedural interface:
- </p>
- <p>
- </p>
- <pre class="programlisting"><span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">patIter</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">corpusIter</span><span class="special">></span>
- <span class="identifier">pair</span><span class="special"><</span><span class="identifier">corpusIter</span><span class="special">,</span> <span class="identifier">corpusIter</span><span class="special">></span> <span class="identifier">boyer_moore_horspool_search</span> <span class="special">(</span>
- <span class="identifier">corpusIter</span> <span class="identifier">corpus_first</span><span class="special">,</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_last</span><span class="special">,</span>
- <span class="identifier">patIter</span> <span class="identifier">pat_first</span><span class="special">,</span> <span class="identifier">patIter</span> <span class="identifier">pat_last</span> <span class="special">);</span>
- </pre>
- <p>
- </p>
- <p>
- Each of the functions is passed two pairs of iterators. The first two define
- the corpus and the second two define the pattern. Note that the two pairs
- need not be of the same type, but they do need to "point" at the
- same type. In other words, <code class="computeroutput"><span class="identifier">patIter</span><span class="special">::</span><span class="identifier">value_type</span></code>
- and <code class="computeroutput"><span class="identifier">curpusIter</span><span class="special">::</span><span class="identifier">value_type</span></code> need to be the same type.
- </p>
- <p>
- The return value of the function is a pair of iterators pointing to the position
- of the pattern in the corpus. If the pattern is empty, it returns at empty
- range at the start of the corpus (<code class="computeroutput"><span class="identifier">corpus_first</span></code>,
- <code class="computeroutput"><span class="identifier">corpus_first</span></code>). If the pattern
- is not found, it returns at empty range at the end of the corpus (<code class="computeroutput"><span class="identifier">corpus_last</span></code>, <code class="computeroutput"><span class="identifier">corpus_last</span></code>).
- </p>
- <h5>
- <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h2"></a>
- <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.compatibility_note"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.compatibility_note">Compatibility
- Note</a>
- </h5>
- <p>
- Earlier versions of this searcher returned only a single iterator. As explained
- in <a href="https://cplusplusmusings.wordpress.com/2016/02/01/sometimes-you-get-things-wrong/" target="_top">https://cplusplusmusings.wordpress.com/2016/02/01/sometimes-you-get-things-wrong/</a>,
- this was a suboptimal interface choice, and has been changed, starting in
- the 1.62.0 release. Old code that is expecting a single iterator return value
- can be updated by replacing the return value of the searcher's <code class="computeroutput"><span class="keyword">operator</span> <span class="special">()</span></code>
- with the <code class="computeroutput"><span class="special">.</span><span class="identifier">first</span></code>
- field of the pair.
- </p>
- <p>
- Instead of:
- </p>
- <pre class="programlisting"><span class="identifier">iterator</span> <span class="identifier">foo</span> <span class="special">=</span> <span class="identifier">searcher</span><span class="special">(</span><span class="identifier">a</span><span class="special">,</span> <span class="identifier">b</span><span class="special">);</span>
- </pre>
- <p>
- </p>
- <p>
- you now write:
- </p>
- <pre class="programlisting"><span class="identifier">iterator</span> <span class="identifier">foo</span> <span class="special">=</span> <span class="identifier">searcher</span><span class="special">(</span><span class="identifier">a</span><span class="special">,</span> <span class="identifier">b</span><span class="special">).</span><span class="identifier">first</span><span class="special">;</span>
- </pre>
- <p>
- </p>
- <h5>
- <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h3"></a>
- <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.performance"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.performance">Performance</a>
- </h5>
- <p>
- The execution time of the Boyer-Moore-Horspool algorithm is linear in the
- size of the string being searched; it can have a significantly lower constant
- factor than many other search algorithms: it doesn't need to check every
- character of the string to be searched, but rather skips over some of them.
- Generally the algorithm gets faster as the pattern being searched for becomes
- longer. Its efficiency derives from the fact that with each unsuccessful
- attempt to find a match between the search string and the text it is searching,
- it uses the information gained from that attempt to rule out as many positions
- of the text as possible where the string cannot match.
- </p>
- <h5>
- <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h4"></a>
- <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.memory_use"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.memory_use">Memory
- Use</a>
- </h5>
- <p>
- The algorithm an internal table that has one entry for each member of the
- "alphabet" in the pattern. For (8-bit) character types, this table
- contains 256 entries.
- </p>
- <h5>
- <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h5"></a>
- <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.complexity"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.complexity">Complexity</a>
- </h5>
- <p>
- The worst-case performance is <span class="emphasis"><em>O(m x n)</em></span>, where <span class="emphasis"><em>m</em></span>
- is the length of the pattern and <span class="emphasis"><em>n</em></span> is the length of
- the corpus. The average time is <span class="emphasis"><em>O(n)</em></span>. The best case
- performance is sub-linear, and is, in fact, identical to Boyer-Moore, but
- the initialization is quicker and the internal loop is simpler than Boyer-Moore.
- </p>
- <h5>
- <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h6"></a>
- <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.exception_safety"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.exception_safety">Exception
- Safety</a>
- </h5>
- <p>
- Both the object-oriented and procedural versions of the Boyer-Moore-Horspool
- algorithm take their parameters by value and do not use any information other
- than what is passed in. Therefore, both interfaces provide the strong exception
- guarantee.
- </p>
- <h5>
- <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h7"></a>
- <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.notes"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.notes">Notes</a>
- </h5>
- <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
- <li class="listitem">
- When using the object-based interface, the pattern must remain unchanged
- for during the searches; i.e, from the time the object is constructed
- until the final call to operator () returns.
- </li>
- <li class="listitem">
- The Boyer-Moore-Horspool algorithm requires random-access iterators for
- both the pattern and the corpus.
- </li>
- </ul></div>
- <h5>
- <a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h8"></a>
- <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.customization_points"></a></span><a class="link" href="BoyerMooreHorspool.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.customization_points">Customization
- points</a>
- </h5>
- <p>
- The Boyer-Moore-Horspool object takes a traits template parameter which enables
- the caller to customize how the precomputed table is stored. This table,
- called the skip table, contains (logically) one entry for every possible
- value that the pattern can contain. When searching 8-bit character data,
- this table contains 256 elements. The traits class defines the table to be
- used.
- </p>
- <p>
- The default traits class uses a <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">array</span></code>
- for small 'alphabets' and a <code class="computeroutput"><span class="identifier">tr1</span><span class="special">::</span><span class="identifier">unordered_map</span></code>
- for larger ones. The array-based skip table gives excellent performance,
- but could be prohibitively large when the 'alphabet' of elements to be searched
- grows. The unordered_map based version only grows as the number of unique
- elements in the pattern, but makes many more heap allocations, and gives
- slower lookup performance.
- </p>
- <p>
- To use a different skip table, you should define your own skip table object
- and your own traits class, and use them to instantiate the Boyer-Moore-Horspool
- object. The interface to these objects is described TBD.
- </p>
- </div>
- <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
- <td align="left"></td>
- <td align="right"><div class="copyright-footer">Copyright © 2010-2012 Marshall Clow<p>
- Distributed under the Boost Software License, Version 1.0. (See accompanying
- file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
- </p>
- </div></td>
- </tr></table>
- <hr>
- <div class="spirit-nav">
- <a accesskey="p" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../../algorithm/Searching.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="KnuthMorrisPratt.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
- </div>
- </body>
- </html>
|