[section:collectives Collective operations]

[link mpi.tutorial.point_to_point Point-to-point operations] are the
core message passing primitives in Boost.MPI. However, many
message-passing applications also require higher-level communication
algorithms that combine or summarize the data stored on many different
processes. These algorithms support many common tasks such as
"broadcast this value to all processes", "compute the sum of the
values on all processors" or "find the global minimum."
[section:broadcast Broadcast]

The [funcref boost::mpi::broadcast `broadcast`] algorithm is
by far the simplest collective operation. It broadcasts a value from a
single process to all other processes within a [classref
boost::mpi::communicator communicator]. For instance, the
following program broadcasts "Hello, World!" from process 0 to every
other process. (`hello_world_broadcast.cpp`)
  #include <boost/mpi.hpp>
  #include <iostream>
  #include <string>
  #include <boost/serialization/string.hpp>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::string value;
    if (world.rank() == 0) {
      value = "Hello, World!";
    }

    broadcast(world, value, 0);

    std::cout << "Process #" << world.rank() << " says " << value
              << std::endl;
    return 0;
  }
Running this program with seven processes will produce a result such
as:

[pre
Process #0 says Hello, World!
Process #2 says Hello, World!
Process #1 says Hello, World!
Process #4 says Hello, World!
Process #3 says Hello, World!
Process #5 says Hello, World!
Process #6 says Hello, World!
]

[endsect:broadcast]
[section:gather Gather]

The [funcref boost::mpi::gather `gather`] collective gathers
the values produced by every process in a communicator into a vector
of values on the "root" process (specified by an argument to
`gather`). The /i/th element in the vector will correspond to the
value gathered from the /i/th process. For instance, in the following
program each process computes its own random number. All of these
random numbers are gathered at process 0 (the "root" in this case),
which prints out the values that correspond to each processor.
(`random_gather.cpp`)
  #include <boost/mpi.hpp>
  #include <iostream>
  #include <vector>
  #include <cstdlib>
  #include <ctime>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::srand(time(0) + world.rank());
    int my_number = std::rand();
    if (world.rank() == 0) {
      std::vector<int> all_numbers;
      gather(world, my_number, all_numbers, 0);
      for (int proc = 0; proc < world.size(); ++proc)
        std::cout << "Process #" << proc << " thought of "
                  << all_numbers[proc] << std::endl;
    } else {
      gather(world, my_number, 0);
    }

    return 0;
  }
Executing this program with seven processes will result in output such
as the following. Although the random values will change from one run
to the next, the order of the processes in the output will remain the
same because only process 0 writes to `std::cout`.

[pre
Process #0 thought of 332199874
Process #1 thought of 20145617
Process #2 thought of 1862420122
Process #3 thought of 480422940
Process #4 thought of 1253380219
Process #5 thought of 949458815
Process #6 thought of 650073868
]
The `gather` operation collects values from every process into a
vector at one process. If instead the values from every process need
to be collected into identical vectors on every process, use the
[funcref boost::mpi::all_gather `all_gather`] algorithm,
which is semantically equivalent to calling `gather` followed by a
`broadcast` of the resulting vector.
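For instance, here is a minimal sketch (not one of the tutorial's
example files) of the random-number example rewritten with
`all_gather`, so that every process receives the complete vector:

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <vector>
  #include <cstdlib>
  #include <ctime>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::srand(time(0) + world.rank());
    int my_number = std::rand();

    // No root argument: every process receives the gathered vector.
    std::vector<int> all_numbers;
    all_gather(world, my_number, all_numbers);

    // Each process now holds the value from every other process.
    std::cout << "Process #" << world.rank() << " sees "
              << all_numbers.size() << " values." << std::endl;
    return 0;
  }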
[endsect:gather]
[section:scatter Scatter]

The [funcref boost::mpi::scatter `scatter`] collective scatters
the values from a vector in the "root" process in a communicator into
values in all the processes of the communicator.
The /i/th element in the vector will correspond to the
value received by the /i/th process. For instance, in the following
program, the root process produces a vector of random numbers and
sends one value to each process, which will print it.
(`random_scatter.cpp`)
  #include <boost/mpi.hpp>
  #include <boost/mpi/collectives.hpp>
  #include <iostream>
  #include <cstdlib>
  #include <ctime>
  #include <algorithm>
  #include <vector>

  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    std::srand(time(0) + world.rank());

    std::vector<int> all;
    int mine = -1;
    if (world.rank() == 0) {
      all.resize(world.size());
      std::generate(all.begin(), all.end(), std::rand);
    }
    mpi::scatter(world, all, mine, 0);
    for (int r = 0; r < world.size(); ++r) {
      world.barrier();
      if (r == world.rank()) {
        std::cout << "Rank " << r << " got " << mine << '\n';
      }
    }
    return 0;
  }
Executing this program with seven processes will result in output such
as the following. Although the random values will change from one run
to the next, the order of the processes in the output will remain the
same because of the barrier.

[pre
Rank 0 got 1409381269
Rank 1 got 17045268
Rank 2 got 440120016
Rank 3 got 936998224
Rank 4 got 1827129182
Rank 5 got 1951746047
Rank 6 got 2117359639
]

[endsect:scatter]
[section:reduce Reduce]

The [funcref boost::mpi::reduce `reduce`] collective
summarizes the values from each process into a single value at the
user-specified "root" process. The Boost.MPI `reduce` operation is
similar in spirit to the STL _accumulate_ operation, because it takes
a sequence of values (one per process) and combines them via a
function object. For instance, we can randomly generate values in each
process and then compute the minimum value over all processes via a
call to [funcref boost::mpi::reduce `reduce`]
(`random_min.cpp`):
  #include <boost/mpi.hpp>
  #include <iostream>
  #include <cstdlib>
  #include <ctime>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::srand(time(0) + world.rank());
    int my_number = std::rand();

    if (world.rank() == 0) {
      int minimum;
      reduce(world, my_number, minimum, mpi::minimum<int>(), 0);
      std::cout << "The minimum value is " << minimum << std::endl;
    } else {
      reduce(world, my_number, mpi::minimum<int>(), 0);
    }

    return 0;
  }
The use of `mpi::minimum<int>` indicates that the minimum value
should be computed. `mpi::minimum<int>` is a binary function object
that compares its two parameters via `<` and returns the smaller
value. Any associative binary function or function object will work
provided it is stateless. For instance, to concatenate strings with
`reduce` one could use the function object `std::plus<std::string>`
(`string_cat.cpp`):
  #include <boost/mpi.hpp>
  #include <iostream>
  #include <string>
  #include <functional>
  #include <boost/serialization/string.hpp>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::string names[10] = { "zero ", "one ", "two ", "three ",
                              "four ", "five ", "six ", "seven ",
                              "eight ", "nine " };

    std::string result;
    reduce(world,
           world.rank() < 10? names[world.rank()]
                            : std::string("many "),
           result, std::plus<std::string>(), 0);

    if (world.rank() == 0)
      std::cout << "The result is " << result << std::endl;

    return 0;
  }
In this example, we compute a string for each process and then perform
a reduction that concatenates all of the strings together into one
long string. Executing this program with seven processors yields the
following output:

[pre
The result is zero one two three four five six
]
[h4 Binary operations for reduce]
Any kind of binary function object can be used with `reduce`. There
are many such function objects in the C++ standard `<functional>`
header and the Boost.MPI header `<boost/mpi/operations.hpp>`, or you
can create your own function object. Function objects used with
`reduce` must be associative, i.e. `f(x, f(y, z))` must be equivalent
to `f(f(x, y), z)`. If they are also commutative (i.e., `f(x, y) ==
f(y, x)`), Boost.MPI can use a more efficient implementation of
`reduce`. To state that a function object is commutative, you will
need to specialize the class [classref boost::mpi::is_commutative
`is_commutative`]. For instance, we could modify the previous example
by telling Boost.MPI that string concatenation is commutative:
  namespace boost { namespace mpi {

    template<>
    struct is_commutative<std::plus<std::string>, std::string>
      : mpl::true_ { };

  } } // end namespace boost::mpi
By adding this code prior to `main()`, Boost.MPI will assume that
string concatenation is commutative and employ a different parallel
algorithm for the `reduce` operation. Using this algorithm, the
program outputs the following when run with seven processes:

[pre
The result is zero one four five six two three
]
Note how the numbers in the resulting string are in a different order:
this is a direct result of Boost.MPI reordering operations. The result
in this case differed from the non-commutative result because string
concatenation is not commutative: `f("x", "y")` is not the same as
`f("y", "x")`, because argument order matters. For truly commutative
operations (e.g., integer addition), the more efficient commutative
algorithm will produce the same result as the non-commutative
algorithm. Boost.MPI also performs direct mappings from function
objects in `<functional>` to `MPI_Op` values predefined by MPI (e.g.,
`MPI_SUM`, `MPI_MAX`); if you have your own function objects that can
take advantage of this mapping, see the class template [classref
boost::mpi::is_mpi_op `is_mpi_op`].
[warning Due to underlying MPI limitations, the operation must be
stateless.]
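As an illustration of this mapping, the following sketch defines a
hypothetical function object `bitwise_or_int` (Boost.MPI already
provides an equivalent `mpi::bitwise_or<T>`; this user-defined version
is purely for exposition) and specializes `is_mpi_op` so that reducing
`int` values with it maps onto the predefined `MPI_BOR` operation:

  #include <boost/mpi.hpp>

  // Hypothetical user-defined function object: bitwise OR of two ints.
  struct bitwise_or_int
  {
    int operator()(int x, int y) const { return x | y; }
  };

  namespace boost { namespace mpi {

    // Specialization pattern assumed from <boost/mpi/operations.hpp>:
    // derive from mpl::true_ and supply a static op() returning the
    // corresponding MPI_Op, so reduce() can hand the predefined
    // operation directly to the underlying MPI implementation.
    template<>
    struct is_mpi_op<bitwise_or_int, int> : mpl::true_
    {
      static MPI_Op op() { return MPI_BOR; }
    };

  } } // end namespace boost::mpi

With this specialization in place, a call such as `reduce(world,
my_number, result, bitwise_or_int(), 0)` should use MPI's built-in
`MPI_BOR` reduction rather than a user-defined operation wrapper.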
[h4 All process variant]

Like [link mpi.tutorial.collectives.gather `gather`], `reduce` has an "all"
variant called [funcref boost::mpi::all_reduce `all_reduce`]
that performs the reduction operation and broadcasts the result to all
processes. This variant is useful, for instance, in establishing
global minimum or maximum values.

The following code (`global_min.cpp`) shows a broadcasting version of
the `random_min.cpp` example:
  #include <boost/mpi.hpp>
  #include <iostream>
  #include <cstdlib>
  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    std::srand(world.rank());
    int my_number = std::rand();

    int minimum;
    mpi::all_reduce(world, my_number, minimum, mpi::minimum<int>());

    if (world.rank() == 0) {
      std::cout << "The minimum value is " << minimum << std::endl;
    }

    return 0;
  }
In that example we provide both input and output values, requiring
twice as much space, which can be a problem depending on the size
of the transmitted data.
If there is no need to preserve the input value, the output value
can be omitted. In that case the input value will be overwritten with
the output value, and Boost.MPI is able, in some situations, to
implement the operation with a more space-efficient solution (using
the `MPI_IN_PLACE` flag of the MPI C mapping), as in the following
example (`in_place_global_min.cpp`):
  #include <boost/mpi.hpp>
  #include <iostream>
  #include <cstdlib>
  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    std::srand(world.rank());
    int my_number = std::rand();

    // Single in/out argument: my_number is overwritten with the result.
    mpi::all_reduce(world, my_number, mpi::minimum<int>());

    if (world.rank() == 0) {
      std::cout << "The minimum value is " << my_number << std::endl;
    }

    return 0;
  }
[endsect:reduce]

[endsect:collectives]