parallel.html 33 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533
  1. <html>
  2. <head>
  3. <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
  4. <title>3.- Parallel Algorithms</title>
  5. <link rel="stylesheet" href="../../../../../doc/src/boostbook.css" type="text/css">
  6. <meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
  7. <link rel="home" href="../index.html" title="Boost.Sort">
  8. <link rel="up" href="../index.html" title="Boost.Sort">
  9. <link rel="prev" href="single_thread/windows_single/complex_benchmarks.html" title="Complex (Several Types)">
  10. <link rel="next" href="parallel/sample_sort.html" title="3.2.- Sample_Sort">
  11. </head>
  12. <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
  13. <table cellpadding="2" width="100%"><tr>
  14. <td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../boost.png"></td>
  15. <td align="center"><a href="../../../../../index.html">Home</a></td>
  16. <td align="center"><a href="../../../../../libs/libraries.htm">Libraries</a></td>
  17. <td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
  18. <td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
  19. <td align="center"><a href="../../../../../more/index.htm">More</a></td>
  20. </tr></table>
  21. <hr>
  22. <div class="spirit-nav">
  23. <a accesskey="p" href="single_thread/windows_single/complex_benchmarks.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="parallel/sample_sort.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
  24. </div>
  25. <div class="section">
  26. <div class="titlepage"><div><div><h2 class="title" style="clear: both">
  27. <a name="sort.parallel"></a><a class="link" href="parallel.html" title="3.- Parallel Algorithms">3.- Parallel Algorithms</a>
  28. </h2></div></div></div>
  29. <div class="toc"><dl class="toc">
  30. <dt><span class="section"><a href="parallel.html#sort.parallel.block_indirect_sort">3.1- block_indirect_sort</a></span></dt>
  31. <dd><dl>
  32. <dt><span class="section"><a href="parallel.html#sort.parallel.block_indirect_sort.block_description">Description</a></span></dt>
  33. <dt><span class="section"><a href="parallel.html#sort.parallel.block_indirect_sort.block_benchmark">Benchmark</a></span></dt>
  34. <dt><span class="section"><a href="parallel.html#sort.parallel.block_indirect_sort.block_programming">Programming</a></span></dt>
  35. <dt><span class="section"><a href="parallel.html#sort.parallel.block_indirect_sort.block_internal">Internal
  36. Description</a></span></dt>
  37. <dt><span class="section"><a href="parallel.html#sort.parallel.block_indirect_sort.design_process">Design
  38. Process</a></span></dt>
  39. </dl></dd>
  40. <dt><span class="section"><a href="parallel/sample_sort.html">3.2.- Sample_Sort</a></span></dt>
  41. <dd><dl>
  42. <dt><span class="section"><a href="parallel/sample_sort.html#sort.parallel.sample_sort.sample_description">Description</a></span></dt>
  43. <dt><span class="section"><a href="parallel/sample_sort/sample_programming.html">Programming</a></span></dt>
  44. </dl></dd>
  45. <dt><span class="section"><a href="parallel/parallel_stable_sort.html">3.3.- Parallel_stable_sort</a></span></dt>
  46. <dd><dl>
  47. <dt><span class="section"><a href="parallel/parallel_stable_sort.html#sort.parallel.parallel_stable_sort.parallel_description">Description</a></span></dt>
  48. <dt><span class="section"><a href="parallel/parallel_stable_sort/parallel_programming.html">Programming</a></span></dt>
  49. </dl></dd>
  50. <dt><span class="section"><a href="parallel/linux_parallel.html">3.4- Linux Benchmarks</a></span></dt>
  51. <dt><span class="section"><a href="parallel/windows_parallel.html">3.5- Windows Benchmark</a></span></dt>
  52. </dl></div>
  53. <div class="blockquote"><blockquote class="blockquote">
  54. <h5>
  55. <a name="sort.parallel.h0"></a>
  56. <span class="phrase"><a name="sort.parallel.parallel_algorithms"></a></span><a class="link" href="parallel.html#sort.parallel.parallel_algorithms"><span class="underline">Parallel Algorithms</span></a>
  57. </h5>
  58. <div class="blockquote"><blockquote class="blockquote">
  59. <p>
  60. This table provides a brief description of the parallel algorithms of the
  61. library.
  62. </p>
  63. <p>
  64. <span class="bold"><strong>
  65. <pre class="programlisting"> | | | |
  66. Algorithm |Stable | Additional memory |Best, average, and worst case |
  67. ----------------------+-------+------------------------+------------------------------+
  68. block_indirect_sort | no |block_size * num_threads| N, N LogN , N LogN |
  69. sample_sort | yes | N | N, N LogN , N LogN |
  70. parallel_stable_sort | yes | N / 2 | N, N LogN , N LogN |
  71. | | | |
  72. </pre>
  73. </strong></span>
  74. </p>
  75. <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
  76. <li class="listitem">
  77. <span class="bold"><strong>Sample_sort</strong></span> is a implementation of
  78. the <span class="bold"><strong><a href="https://en.wikipedia.org/wiki/Samplesort" target="_top">Samplesort
  79. algorithm</a></strong></span> done by Francisco Tapia.
  80. </li>
  81. <li class="listitem">
  82. <span class="bold"><strong>Parallel_stable_sort</strong></span> is based on the
  83. samplesort algorithm, but using a half of the memory used by sample_sort,
  84. conceived and implemented by Francisco Tapia.
  85. </li>
  86. <li class="listitem">
  87. <span class="bold"><strong>Block_indirect_sort</strong></span> is a novel parallel
  88. sort algorithm, very fast, with low additional memory consumption,
  89. conceived and implemented by Francisco Tapia.
  90. </li>
  91. </ul></div>
  92. <p>
  93. The <span class="bold"><strong>block_size</strong></span> is an internal parameter
  94. of the algorithm, which in order to achieve the highest speed, changes
  95. according to the size of the objects to sort according to the next table.
  96. The strings use a block_size of 128.
  97. </p>
  98. <p>
  99. <span class="bold"><strong>
  100. <pre class="programlisting"> | | | | | | | |
  101. object size (bytes) | 1 - 15 | 16 - 31 | 32 - 63 | 64 - 127|128 - 255|256 - 511| 512 - |
  102. --------------------------------+--------+---------+---------+---------+---------+---------+----------+
  103. block_size (number of elements) | 4096 | 2048 | 1024 | 768 | 512 | 256 | 128 |
  104. | | | | | | | |
  105. </pre>
  106. </strong></span>
  107. </p>
  108. </blockquote></div>
  109. <h5>
  110. <a name="sort.parallel.h1"></a>
  111. <span class="phrase"><a name="sort.parallel.thread_specification"></a></span><a class="link" href="parallel.html#sort.parallel.thread_specification"><span class="underline">Thread Specification</span></a>
  112. </h5>
  113. <div class="blockquote"><blockquote class="blockquote">
  114. <p>
  115. The parallel algorithms have a integer parameter indicating the <span class="bold"><strong>number of threads</strong></span> to use in the sorting process,
  116. which always is the last value in the call.
  117. </p>
  118. <p>
  119. The default value (if left unspecified) is the number of HW threads of
  120. the machine where the program is running provided by std::thread::hardware_concurrency().
  121. If the number is 1 or 0, the algorithm runs with only 1 thread.
  122. </p>
  123. <p>
  124. The number of threads is not a fixed number, but is calculated in each
  125. execution. The number of threads passed can be greater than the number
  126. of hardware threads on the machine. We can pass 100 threads in a machine
  127. with 4 HW threads, and in the same mode we can pass a function as (std::thread::hardware_concurrency()
  128. / 4 ). If this value is 0, the program is executed with 1 thread.
  129. </p>
  130. </blockquote></div>
  131. <h5>
  132. <a name="sort.parallel.h2"></a>
  133. <span class="phrase"><a name="sort.parallel.programming"></a></span><a class="link" href="parallel.html#sort.parallel.programming"><span class="underline">Programming</span></a>
  134. </h5>
  135. <div class="blockquote"><blockquote class="blockquote">
  136. <p>
  137. You only need to include the file boost/sort/sort.hpp to use these algorithms.
  138. </p>
  139. <p>
  140. All the algorithms run in the namespace boost::sort
  141. </p>
  142. <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">sort</span><span class="special">/</span><span class="identifier">sort</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">&gt;</span>
  143. </pre>
  144. <p>
  145. The parallel algorithms have 4 invocation formats:
  146. </p>
  147. <pre class="programlisting"><span class="identifier">algorithm</span> <span class="special">(</span> <span class="identifier">first</span> <span class="identifier">iterator</span><span class="special">,</span> <span class="identifier">last</span> <span class="identifier">iterator</span><span class="special">,</span> <span class="identifier">comparison</span> <span class="identifier">object</span><span class="special">,</span> <span class="identifier">number</span> <span class="identifier">of</span> <span class="identifier">threads</span> <span class="special">)</span>
  148. <span class="identifier">algorithm</span> <span class="special">(</span> <span class="identifier">first</span> <span class="identifier">iterator</span><span class="special">,</span> <span class="identifier">last</span> <span class="identifier">iterator</span><span class="special">,</span> <span class="identifier">comparison</span> <span class="identifier">object</span> <span class="special">)</span>
  149. <span class="identifier">algorithm</span> <span class="special">(</span> <span class="identifier">first</span> <span class="identifier">iterator</span><span class="special">,</span> <span class="identifier">last</span> <span class="identifier">iterator</span><span class="special">,</span> <span class="identifier">number</span> <span class="identifier">of</span> <span class="identifier">threads</span> <span class="special">)</span>
  150. <span class="identifier">algorithm</span> <span class="special">(</span> <span class="identifier">first</span> <span class="identifier">iterator</span><span class="special">,</span> <span class="identifier">last</span> <span class="identifier">iterator</span> <span class="special">)</span>
  151. </pre>
  152. <p>
  153. These algorithms need a <span class="bold"><strong>C++11 compliant compiler</strong></span>,
  154. but don't need any other code or library. With older compilers correct
  155. operation isn't guaranteed.
  156. </p>
  157. <p>
  158. If the number of threads is unspecified, use the result of std::thread::hardware_concurrency()
  159. </p>
  160. <p>
  161. These algorithms use a <span class="bold"><strong>comparison object</strong></span>,
  162. in the same way as the standard library sort algorithms. If you don't define
  163. it, the comparison object defaults to std::less, which uses the &lt; operator
  164. internally for comparisons.
  165. </p>
  166. <p>
  167. These algorithms are <span class="bold"><strong>exception safe</strong></span>, meaning
  168. that, the exceptions generated by the algorithms guarantee the integrity
  169. of the objects to sort, but not their relative order. If the exception
  170. is generated inside the objects (in the move or copy constructors) the
  171. results are undefined.
  172. </p>
  173. </blockquote></div>
  174. </blockquote></div>
  175. <div class="section">
  176. <div class="titlepage"><div><div><h3 class="title">
  177. <a name="sort.parallel.block_indirect_sort"></a><a class="link" href="parallel.html#sort.parallel.block_indirect_sort" title="3.1- block_indirect_sort">3.1- block_indirect_sort</a>
  178. </h3></div></div></div>
  179. <div class="section">
  180. <div class="titlepage"><div><div><h4 class="title">
  181. <a name="sort.parallel.block_indirect_sort.block_description"></a><a class="link" href="parallel.html#sort.parallel.block_indirect_sort.block_description" title="Description">Description</a>
  182. </h4></div></div></div>
  183. <div class="blockquote"><blockquote class="blockquote">
  184. <p>
  185. <span class="bold"><strong>BLOCK_INDIRECT_SORT</strong></span> is a new unstable
  186. parallel sort, conceived and implemented by Francisco Jose Tapia for
  187. the Boost Library.
  188. </p>
  189. <p>
  190. The most important characteristics of this algorithm are the <span class="bold"><strong>speed</strong></span> and the <span class="bold"><strong>low memory
  191. consumption</strong></span>.
  192. </p>
  193. <p>
  194. <span class="bold"><strong>
  195. <pre class="programlisting"> | | | |
  196. Algorithm |Stable | Additional memory |Best, average, and worst case |
  197. ----------------------+-------+------------------------+------------------------------+
  198. block_indirect_sort | no |block_size * num_threads| N, N LogN, N LogN |
  199. | | | |
  200. </pre>
  201. </strong></span>
  202. </p>
  203. <p>
  204. The block_size is an internal parameter of the algorithm, which in order
  205. to achieve the highest speed, changes according with the size of the
  206. objects to sort according to the next table.The strings use a block_size
  207. of 128.
  208. </p>
  209. <p>
  210. <span class="bold"><strong>
  211. <pre class="programlisting"> | | | | | | | |
  212. object size (bytes) | 1 - 15 | 16 - 31 | 32 - 63 | 64 - 127|128 - 255|256 - 511| 512 - |
  213. --------------------------------+--------+---------+---------+---------+---------+---------+----------+
  214. block_size (number of elements) | 4096 | 2048 | 1024 | 768 | 512 | 256 | 128 |
  215. | | | | | | | |
  216. </pre>
  217. </strong></span>
  218. </p>
  219. </blockquote></div>
  220. </div>
  221. <p>
  222. <br>
  223. </p>
  224. <div class="section">
  225. <div class="titlepage"><div><div><h4 class="title">
  226. <a name="sort.parallel.block_indirect_sort.block_benchmark"></a><a class="link" href="parallel.html#sort.parallel.block_indirect_sort.block_benchmark" title="Benchmark">Benchmark</a>
  227. </h4></div></div></div>
  228. <div class="blockquote"><blockquote class="blockquote"><p>
  229. Sorting 100 000 000 64 bits numbers, the measured memory used was: <span class="bold"><strong>
  230. <pre class="programlisting"> | | |
  231. Algorithm | Time (secs) | Memory used |
  232. ----------------------------------+-------------+-------------+
  233. Open MP Parallel Sort | 1.1990 | 1564 MB |
  234. Threading Building Blocks (TBB) | 1.6411 | 789 MB |
  235. Block Indirect Sort | 0.9270 | 790 MB |
  236. | | |
  237. </pre>
  238. </strong></span>
  239. </p></blockquote></div>
  240. </div>
  241. <p>
  242. <br>
  243. </p>
  244. <div class="section">
  245. <div class="titlepage"><div><div><h4 class="title">
  246. <a name="sort.parallel.block_indirect_sort.block_programming"></a><a class="link" href="parallel.html#sort.parallel.block_indirect_sort.block_programming" title="Programming">Programming</a>
  247. </h4></div></div></div>
  248. <div class="blockquote"><blockquote class="blockquote">
  249. <h5>
  250. <a name="sort.parallel.block_indirect_sort.block_programming.h0"></a>
  251. <span class="phrase"><a name="sort.parallel.block_indirect_sort.block_programming.thread_specification"></a></span><a class="link" href="parallel.html#sort.parallel.block_indirect_sort.block_programming.thread_specification"><span class="underline">Thread specification</span></a>
  252. </h5>
  253. <div class="blockquote"><blockquote class="blockquote">
  254. <p>
  255. This algorithm has an integer parameter indicating the <span class="bold"><strong>number
  256. of threads</strong></span> to use in the sorting process, which always is
  257. the last value in the call. The default value (if left unspecified)
  258. is the number of HW threads of the machine where the program is running
  259. provided by std::thread::hardware_concurrency().
  260. </p>
  261. <p>
  262. If the number is 1 or 0, the algorithm runs with only 1 thread.
  263. </p>
  264. <p>
  265. The number of threads is not a fixed number, but is calculated in each
  266. execution. The number of threads passed can be greater than the number
  267. of hardware threads on the machine. We can pass 100 threads in a machine
  268. with 4 HW threads, and in the same mode we can pass a function as (std::thread::hardware_concurrency()
  269. / 4 ). If this value is 0, the program is executed with 1 thread.
  270. </p>
  271. </blockquote></div>
  272. <h5>
  273. <a name="sort.parallel.block_indirect_sort.block_programming.h1"></a>
  274. <span class="phrase"><a name="sort.parallel.block_indirect_sort.block_programming.programming"></a></span><a class="link" href="parallel.html#sort.parallel.block_indirect_sort.block_programming.programming"><span class="underline">Programming</span></a>
  275. </h5>
  276. <div class="blockquote"><blockquote class="blockquote">
  277. <p>
  278. You only need to include the file boost/sort/sort.hpp to use this algorithm
  279. </p>
  280. <p>
  281. The algorithm runs in the namespace boost::sort
  282. </p>
  283. <pre class="programlisting"><span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">sort</span><span class="special">/</span><span class="identifier">sort</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">&gt;</span>
  284. <span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">class</span> <span class="identifier">iter_t</span><span class="special">&gt;</span>
  285. <span class="keyword">void</span> <span class="identifier">block_indirect_sort</span> <span class="special">(</span><span class="identifier">iter_t</span> <span class="identifier">first</span><span class="special">,</span> <span class="identifier">iter_t</span> <span class="identifier">last</span><span class="special">);</span>
  286. <span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">class</span> <span class="identifier">iter_t</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">compare</span><span class="special">&gt;</span>
  287. <span class="keyword">void</span> <span class="identifier">block_indirect_sort</span> <span class="special">(</span><span class="identifier">iter_t</span> <span class="identifier">first</span><span class="special">,</span> <span class="identifier">iter_t</span> <span class="identifier">last</span><span class="special">,</span> <span class="identifier">compare</span> <span class="identifier">comp</span><span class="special">);</span>
  288. <span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">class</span> <span class="identifier">iter_t</span><span class="special">&gt;</span>
  289. <span class="keyword">void</span> <span class="identifier">block_indirect_sort</span> <span class="special">(</span><span class="identifier">iter_t</span> <span class="identifier">first</span><span class="special">,</span> <span class="identifier">iter_t</span> <span class="identifier">last</span><span class="special">,</span> <span class="identifier">uint32_t</span> <span class="identifier">num_thread</span><span class="special">);</span>
  290. <span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">class</span> <span class="identifier">iter_t</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">compare</span><span class="special">&gt;</span>
  291. <span class="keyword">void</span> <span class="identifier">block_indirect_sort</span> <span class="special">(</span><span class="identifier">iter_t</span> <span class="identifier">first</span><span class="special">,</span> <span class="identifier">iter_t</span> <span class="identifier">last</span><span class="special">,</span> <span class="identifier">compare</span> <span class="identifier">comp</span><span class="special">,</span> <span class="identifier">uint32_t</span> <span class="identifier">num_thread</span><span class="special">);</span>
  292. </pre>
  293. <p>
  294. This algorithm needs a <span class="bold"><strong>C++11 compliant compiler</strong></span>.
  295. You don't need any other code or library. With older compilers correct
  296. operation is not guaranteed.
  297. </p>
  298. <p>
  299. If the number of threads is unspecified, use the result of std::thread::hardware_concurrency()
  300. </p>
  301. <p>
  302. This algorithm uses a <span class="bold"><strong>comparison object</strong></span>,
  303. in the same way as the standard library sort algorithms. If not defined,
  304. the comparison object is std::less, which uses the &lt; operator internally.
  305. </p>
  306. <p>
  307. This algorithm is <span class="bold"><strong>exception safe</strong></span>,
  308. meaning that, the exceptions generated by the algorithm guarantee the
  309. integrity of the objects to sort, but not their relative order. If
  310. the exception is generated inside the objects (in the move or in the
  311. copy constructor.. ) the results can be unpredictable.
  312. </p>
  313. </blockquote></div>
  314. </blockquote></div>
  315. </div>
  316. <p>
  317. <br>
  318. </p>
  319. <div class="section">
  320. <div class="titlepage"><div><div><h4 class="title">
  321. <a name="sort.parallel.block_indirect_sort.block_internal"></a><a class="link" href="parallel.html#sort.parallel.block_indirect_sort.block_internal" title="Internal Description">Internal
  322. Description</a>
  323. </h4></div></div></div>
  324. <div class="blockquote"><blockquote class="blockquote">
  325. <p>
  326. There are two primary categories of parallelization in sorting algorithms.
  327. </p>
  328. <h5>
  329. <a name="sort.parallel.block_indirect_sort.block_internal.h0"></a>
  330. <span class="phrase"><a name="sort.parallel.block_indirect_sort.block_internal.subdivision_algorithms"></a></span><a class="link" href="parallel.html#sort.parallel.block_indirect_sort.block_internal.subdivision_algorithms"><span class="underline">Subdivision Algorithms</span></a>
  331. </h5>
  332. <div class="blockquote"><blockquote class="blockquote">
  333. <p>
  334. Filter the data and generate two or more parts. Each part obtained
  335. is filtered and divided by other threads. This filter and division
  336. process is done until the size of the part is smaller than a predefined
  337. size, then is sorted by a single thread.
  338. </p>
  339. <p>
  340. The algorithm most frequently used in the filter and sorting is quick
  341. sort
  342. </p>
  343. <p>
  344. These algorithms are fast with a small number of threads, but are inefficient
  345. with a great number of threads. Examples of this category are
  346. </p>
  347. <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
  348. <li class="listitem">
  349. Intel Threading Building Blocks (TBB)
  350. </li>
  351. <li class="listitem">
  352. Microsoft PPL Parallel Sort.
  353. </li>
  354. </ul></div>
  355. </blockquote></div>
  356. <h5>
  357. <a name="sort.parallel.block_indirect_sort.block_internal.h1"></a>
  358. <span class="phrase"><a name="sort.parallel.block_indirect_sort.block_internal.merging_algorithms"></a></span><a class="link" href="parallel.html#sort.parallel.block_indirect_sort.block_internal.merging_algorithms"><span class="underline">Merging Algorithms</span></a>
  359. </h5>
  360. <div class="blockquote"><blockquote class="blockquote">
  361. <p>
  362. Divide the data in parts, and each part is sorted by a thread. When
  363. the parts are sorted, they are merged to obtain the final results.
  364. These algorithms need additional memory for the merge, usually the
  365. same size as the data.
  366. </p>
  367. <p>
  368. With a small number of threads, these algorithms have similar speed
  369. to the subdivision algorithms, but with many threads are much faster.
  370. </p>
  371. <p>
  372. Examples of this category are
  373. </p>
  374. <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
  375. <li class="listitem">
  376. GCC Parallel Sort (based on OpenMP)
  377. </li>
  378. <li class="listitem">
  379. Microsoft PPL Parallel Buffered Sort
  380. </li>
  381. </ul></div>
  382. <p>
  383. This generates an <span class="bold"><strong>undesirable duality</strong></span>.
  384. With a small number of threads the optimal algorithm is not the optimal
  385. for a big number of threads.
  386. </p>
  387. <p>
  388. For this reason, the SW designed for a <span class="bold"><strong>small
  389. machine</strong></span> is <span class="bold"><strong>inadequate</strong></span> for
  390. a <span class="bold"><strong>big machine</strong></span> and vice versa.
  391. </p>
  392. <p>
  393. Using only <span class="bold"><strong>merging algorithms</strong></span>, has
  394. the <span class="bold"><strong>problem of the additional memory</strong></span>
  395. used, usually of the same size as the data.
  396. </p>
  397. </blockquote></div>
  398. <h5>
  399. <a name="sort.parallel.block_indirect_sort.block_internal.h2"></a>
  400. <span class="phrase"><a name="sort.parallel.block_indirect_sort.block_internal.new_parallel_sort_algorithm_bloc"></a></span><a class="link" href="parallel.html#sort.parallel.block_indirect_sort.block_internal.new_parallel_sort_algorithm_bloc"><span class="underline">New Parallel Sort Algorithm (Block Indirect Sort)</span></a>
  401. </h5>
  402. <div class="blockquote"><blockquote class="blockquote">
  403. <p>
  404. This algorithm, named Block Indirect Sort, created for processors connected
  405. with shared memory, is a hybrid algorithm.
  406. </p>
  407. <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
  408. <li class="listitem">
  409. With small number of threads, it is a subdivision algorithm.
  410. </li>
  411. <li class="listitem">
  412. With many threads it is a merging algorithm, with a small auxiliary
  413. memory ( block_size * number of threads).
  414. </li>
  415. </ul></div>
  416. <p>
  417. This algorithm <span class="bold"><strong>eliminates the duality</strong></span>.
  418. The same code has <span class="bold"><strong>optimal performance</strong></span>
  419. with a small and a big number of threads.
  420. </p>
  421. <p>
  422. The number of threads to use is evaluated in each execution. When the
  423. program runs with a <span class="bold"><strong>small number of threads</strong></span>
  424. the algorithm internally uses a <span class="bold"><strong>subdivision algorithm</strong></span>
  425. and has similar performance to TBB, and when run with <span class="bold"><strong>many
  426. threads</strong></span>, it internally uses the <span class="bold"><strong>new
  427. algorithm</strong></span> and has the performance of GCC Parallel Sort,
  428. with the additional advantage of <span class="bold"><strong>reduced memory
  429. consumption</strong></span>.
  430. </p>
  431. </blockquote></div>
  432. </blockquote></div>
  433. </div>
  434. <p>
  435. <br>
  436. </p>
  437. <div class="section">
  438. <div class="titlepage"><div><div><h4 class="title">
  439. <a name="sort.parallel.block_indirect_sort.design_process"></a><a class="link" href="parallel.html#sort.parallel.block_indirect_sort.design_process" title="Design Process">Design
  440. Process</a>
  441. </h4></div></div></div>
  442. <div class="blockquote"><blockquote class="blockquote">
  443. <h5>
  444. <a name="sort.parallel.block_indirect_sort.design_process.h0"></a>
  445. <span class="phrase"><a name="sort.parallel.block_indirect_sort.design_process.initial_idea"></a></span><a class="link" href="parallel.html#sort.parallel.block_indirect_sort.design_process.initial_idea"><span class="underline">Initial Idea</span></a>
  446. </h5>
  447. <div class="blockquote"><blockquote class="blockquote">
  448. <p>
  449. The <span class="bold"><strong>initial idea</strong></span>, was to build a
  450. <span class="bold"><strong>merge algorithm</strong></span>, to be <span class="bold"><strong>fast
  451. with many threads, with a low additional memory</strong></span>.
  452. </p>
  453. <p>
  454. This algortihm is <span class="bold"><strong>not based in any other idea
  455. or algorithm</strong></span>. The technique used in the algorithm (indirect
  456. blocks) <span class="bold"><strong>is new, and had been conceived and implemented
  457. for this algorithm</strong></span>.
  458. </p>
  459. <p>
  460. As sample of their results, we can see the the sorting 100 000 000
  461. 64 bits numbers, ramdomly generated, in a machine with 12 threads.
  462. </p>
  463. <p>
  464. <span class="bold"><strong>
  465. <pre class="programlisting"> | | |
  466. Algorithm | Time (secs) | Memory used |
  467. ----------------------------------+-------------+-------------+
  468. Open MP Parallel Sort | 1.1990 | 1564 MB |
  469. Threading Building Blocks (TBB) | 1.6411 | 789 MB |
  470. Block Indirect Sort | 0.9270 | 790 MB |
  471. | | |
  472. </pre>
  473. </strong></span>
  474. </p>
  475. <p>
  476. The best words about this algorithm are expressed by their <span class="bold"><strong><a class="link" href="parallel/linux_parallel.html" title="3.4- Linux Benchmarks">Benchmarks
  477. results</a></strong></span>
  478. </p>
  479. </blockquote></div>
  480. <h5>
  481. <a name="sort.parallel.block_indirect_sort.design_process.h1"></a>
  482. <span class="phrase"><a name="sort.parallel.block_indirect_sort.design_process.design_process"></a></span><a class="link" href="parallel.html#sort.parallel.block_indirect_sort.design_process.design_process"><span class="underline">Design process</span></a>
  483. </h5>
  484. <div class="blockquote"><blockquote class="blockquote">
  485. <p>
  486. The process had been <span class="bold"><strong>long and very hard</strong></span>,
  487. mainly, by the uncertainty about if the ideas are correct and run so
  488. fast as need for to be useful. This is complicated by the fact that
  489. we can&#8217;t be sure of the efficiency until the last part of the code
  490. is done and the first benchmark has run.
  491. </p>
  492. <p>
  493. But it had been a <span class="bold"><strong>very exciting process</strong></span>,
  494. each time a problem is resolved, a new internal algorithm is designed,
  495. tested &#8230;, and see, step by step, the advance of the process.
  496. </p>
  497. <p>
  498. I discovered <span class="bold"><strong>many new problems</strong></span> during
  499. this process, unknown until now, which forced me to design <span class="bold"><strong>new internal algorithms</strong></span> to resolve them, and
  500. divide the work in many parts to execute in parallel mode. Due to this,
  501. you can find many nice algorithms inside the sorting algorithm written
  502. to resolve and parallelize the internal problems.
  503. </p>
  504. <p>
  505. If you are interested in a detailed description of the algorithm, you
  506. can find it here : <span class="bold"><strong><a href="../../papers/block_indirect_sort_en.pdf" target="_top">block_indirect_sort_en.pdf</a></strong></span>.
  507. </p>
  508. </blockquote></div>
  509. </blockquote></div>
  510. </div>
  511. </div>
  512. </div>
  513. <table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
  514. <td align="left"></td>
  515. <td align="right"><div class="copyright-footer">Copyright &#169; 2014-2017 Steven
  516. Ross, Francisco Tapia, Orson Peters<p>
  517. Distributed under the <a href="http://boost.org/LICENSE_1_0.txt" target="_top">Boost
  518. Software License, Version 1.0</a>.
  519. </p>
  520. </div></td>
  521. </tr></table>
  522. <hr>
  523. <div class="spirit-nav">
  524. <a accesskey="p" href="single_thread/windows_single/complex_benchmarks.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="parallel/sample_sort.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
  525. </div>
  526. </body>
  527. </html>