[/============================================================================
 Boost.odeint
 Copyright 2013 Karsten Ahnert
 Copyright 2013 Pascal Germroth
 Copyright 2013 Mario Mulansky

 Use, modification and distribution is subject to the Boost Software License,
 Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
 http://www.boost.org/LICENSE_1_0.txt)
=============================================================================/]

[section Parallel computation with OpenMP and MPI]

Parallelization is a key feature of modern numerical libraries, given the vast
number of cores available nowadays, even on laptops.
odeint currently supports parallelization with OpenMP and MPI, as described in
the following sections.
However, it should be made clear from the beginning that the difficulty of
efficiently distributing ODE integration across many cores/machines lies in the
parallelization of the system function, which remains the user's
responsibility.
Simply using a parallel odeint backend without parallelizing the system function
will bring you almost no performance gains.

[section OpenMP]

[import ../examples/openmp/phase_chain.cpp]

odeint's OpenMP support is implemented as an external backend, which needs to be
included manually. Depending on the compiler, some additional flags may be
needed, e.g. [^-fopenmp] for GCC.

[phase_chain_openmp_header]

In the easiest parallelization approach with OpenMP we use a standard `vector`
as the state type:

[phase_chain_vector_state]

We initialize the state with some random data:

[phase_chain_init]

Now we have to configure the stepper to use the OpenMP backend.
This is done by explicitly providing the `openmp_range_algebra` as a template
parameter to the stepper.
This algebra requires the state type to be a model of Random Access Range and
will be used by the algebra from multiple threads.

[phase_chain_stepper]

In addition to providing the stepper with OpenMP parallelization we also need
a parallelized system function to exploit the available cores.
Here this is shown for a simple one-dimensional chain of phase oscillators with
nearest-neighbor coupling:

[phase_chain_rhs]

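The imported snippet above shows the actual example code. As a rough sketch of
what such a right-hand side looks like, the following plain C++ function (the
names `chain_rhs` and `omega` and the exact boundary handling are our
illustrative choices, not the library example itself) computes
['d__phi_i/dt = __omega_i + sin(__phi_i+1 - __phi_i) + sin(__phi_i-1 - __phi_i)];
the `#pragma omp parallel for` over the loop is the only user-side
parallelization needed, and is simply ignored when compiled without OpenMP:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical right-hand side of a phase-oscillator chain with
// nearest-neighbor coupling; the chain ends couple to one neighbor only.
// The pragma distributes loop iterations over the available threads.
void chain_rhs( const std::vector<double> &phi ,
                std::vector<double> &dphidt ,
                const std::vector<double> &omega )
{
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>( phi.size() );
    #pragma omp parallel for schedule(runtime)
    for( std::ptrdiff_t i = 0 ; i < n ; ++i )
    {
        double coupling = 0.0;
        if( i > 0 )     coupling += std::sin( phi[i-1] - phi[i] );
        if( i + 1 < n ) coupling += std::sin( phi[i+1] - phi[i] );
        dphidt[i] = omega[i] + coupling;
    }
}
```

With identical phases all coupling terms vanish, so `dphidt` equals `omega`.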
[note In the OpenMP backends the system function will always be called
sequentially from the thread used to start the integration.]

Finally, we perform the integration by using one of odeint's integrate
functions.
As you can see, the parallelization is completely hidden in the stepper and the
system function.
OpenMP will take care of distributing the work among the threads and joining
them automatically.

[phase_chain_integrate]

After integrating, the data can be accessed immediately and processed further.
Note that you can specify the OpenMP scheduling by calling `omp_set_schedule`
at the beginning of your program:

[phase_chain_scheduling]

See [github_link examples/openmp/phase_chain.cpp
openmp/phase_chain.cpp] for the complete example.

[heading Split state]

[import ../examples/openmp/phase_chain_omp_state.cpp]

For advanced cases odeint offers another approach to using OpenMP that allows
for more precise control of the parallelization.
For example, for odd-sized data where OpenMP's thread boundaries don't match
cache lines and hurt performance, it might be advisable to copy the data from
the contiguous `vector<T>` into separate, individually aligned, vectors.
For this, odeint provides the `openmp_state<T>` type, essentially an alias for
`vector<vector<T>>`.
Here, the initialization is done with a `vector<double>`, but then we use
odeint's `split` function to fill an `openmp_state`.
The splitting is done such that the sizes of the individual regions differ by
at most 1, making the computation as uniform as possible.

[phase_chain_state_init]

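That size rule can be sketched as follows (the helper name `chunk_sizes` is
ours, not part of odeint, and which chunks receive the extra elements is an
implementation detail; the guaranteed property is only the at-most-one
difference):

```cpp
#include <cstddef>
#include <vector>

// Illustrative helper: the region sizes produced when splitting n elements
// into k chunks so that the sizes differ by at most one. Each chunk gets
// n / k elements; the remainder n % k is spread one element per chunk.
std::vector<std::size_t> chunk_sizes( std::size_t n , std::size_t k )
{
    std::vector<std::size_t> sizes( k , n / k );
    for( std::size_t i = 0 ; i < n % k ; ++i )
        ++sizes[i];   // distribute the remainder, one element per chunk
    return sizes;
}
```

For example, splitting 10 elements over 3 regions yields sizes 4, 3, 3.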
Of course, the system function has to be changed to deal with the
`openmp_state`.
Note that each sub-region of the state is computed in a single task, but at the
borders read access to the neighbouring regions is required.

[phase_chain_state_rhs]

Using the `openmp_state<T>` state type automatically selects `openmp_algebra`,
which executes odeint's internal computations in parallel regions.
Hence, no manual configuration of the stepper is necessary.
At the end of the integration, we use `unsplit` to concatenate the sub-regions
back together into a single vector.

[phase_chain_state_integrate]

[note You don't actually need to use `openmp_state<T>` for advanced use cases:
`openmp_algebra` is simply an alias for `openmp_nested_algebra<range_algebra>`,
which supports any model of Random Access Range as the outer, parallel state
type and will use the given algebra on its elements.]

See [github_link examples/openmp/phase_chain_omp_state.cpp
openmp/phase_chain_omp_state.cpp] for the complete example.

[endsect]

[section MPI]

[import ../examples/mpi/phase_chain.cpp]

To expand the parallel computation across multiple machines we can use MPI.
The system function implementation is similar to the OpenMP variant with split
data; the main difference lies in the programming model. While OpenMP uses a
fork/join model where everything not explicitly parallelized is executed only
in the main thread, in MPI's model each node enters the `main()` method
independently, diverging based on its rank and synchronizing through
message passing and explicit barriers.
odeint's MPI support is implemented as an external backend, too.
Depending on the MPI implementation, the code might need to be compiled with a
wrapper such as [^mpic++].

[phase_chain_mpi_header]

Instead of reading another thread's data, we asynchronously send and receive
the relevant data from neighbouring nodes, performing some computation in the
interim to hide the latency.

[phase_chain_mpi_rhs]

Analogous to `openmp_state<T>` we use `mpi_state< InnerState<T> >`, which
automatically selects `mpi_nested_algebra` and the appropriate MPI-oblivious
inner algebra (since our inner state is a `vector`, the inner algebra will be
`range_algebra` as in the OpenMP example).

[phase_chain_state]

In the main program we construct a `communicator`, which tells us the `size` of
the cluster and the current node's `rank` within it.
We generate the input data on the master node only, avoiding unnecessary work
on the other nodes.
Instead of simply copying chunks, `split` acts as an MPI collective function
here and sends/receives regions from the master to each slave.
The input argument is ignored on the slaves, but the master node receives
a region in its output and will participate in the computation.

[phase_chain_mpi_init]

Now that `x_split` contains (only) the local chunk for each node, we start the
integration.
To print the result on the master node, we send the processed data back using
`unsplit`.

[phase_chain_mpi_integrate]

[note `mpi_nested_algebra::for_each`[~N] doesn't use any MPI constructs; it
simply calls the inner algebra on the local chunk. The system function is not
guarded by any barriers either, so if you don't place any manually (for example
in parameter studies where the elements are completely independent), you might
see the nodes diverge, returning from this call at different times.]

See [github_link examples/mpi/phase_chain.cpp
mpi/phase_chain.cpp] for the complete example.

[endsect]

[section Concepts]

[section MPI State]

As used by `mpi_nested_algebra`.

[heading Notation]

[variablelist
[[`InnerState`] [The inner state type]]
[[`State`] [The MPI-state type]]
[[`state`] [Object of type `State`]]
[[`world`] [Object of type `boost::mpi::communicator`]]
]

[heading Valid Expressions]

[table
[[Name] [Expression] [Type] [Semantics]]
[[Construct a state with a communicator]
[`State(world)`] [`State`] [Constructs the State.]]
[[Construct a state with the default communicator]
[`State()`] [`State`] [Constructs the State.]]
[[Get the current node's inner state]
[`state()`] [`InnerState`] [Returns a (const) reference.]]
[[Get the communicator]
[`state.world`] [`boost::mpi::communicator`] [See __boost_mpi.]]
]

[heading Models]

* `mpi_state<InnerState>`

[endsect]

[section OpenMP Split State]

As used by `openmp_nested_algebra`, essentially a Random Access Container with
`ValueType = InnerState`.

[heading Notation]

[variablelist
[[`InnerState`] [The inner state type]]
[[`State`] [The split state type]]
[[`state`] [Object of type `State`]]
]

[heading Valid Expressions]

[table
[[Name] [Expression] [Type] [Semantics]]
[[Construct a state for `n` chunks]
[`State(n)`] [`State`] [Constructs underlying `vector`.]]
[[Get a chunk]
[`state[i]`] [`InnerState`] [Accesses underlying `vector`.]]
[[Get the number of chunks]
[`state.size()`] [`size_type`] [Returns size of underlying `vector`.]]
]

[heading Models]

* `openmp_state<ValueType>` with `InnerState = vector<ValueType>`

[endsect]

[section Splitter]

[heading Notation]

[variablelist
[[`Container1`] [The contiguous-data container type]]
[[`x`] [Object of type `Container1`]]
[[`Container2`] [The chunked-data container type]]
[[`y`] [Object of type `Container2`]]
]

[heading Valid Expressions]

[table
[[Name] [Expression] [Type] [Semantics]]
[[Copy chunks of input to output elements]
[`split(x, y)`] [`void`]
[Calls `split_impl<Container1, Container2>::split(x, y)`, splits `x` into
`y.size()` chunks.]]
[[Join chunks of input elements to output]
[`unsplit(y, x)`] [`void`]
[Calls `unsplit_impl<Container2, Container1>::unsplit(y, x)`, assumes `x`
is of the correct size ['__sigma `y[i].size()`], does not resize `x`.]]
]

[heading Models]

* defined for `Container1` = __boost_range and `Container2` = `openmp_state`
* defined for `Container1` = __boost_range and `Container2` = `mpi_state`

To implement splitters for containers incompatible with __boost_range,
specialize the `split_impl` and `unsplit_impl` types:

```
template< class Container1 , class Container2 , class Enabler = void >
struct split_impl {
    static void split( const Container1 &from , Container2 &to );
};

template< class Container2 , class Container1 , class Enabler = void >
struct unsplit_impl {
    static void unsplit( const Container2 &from , Container1 &to );
};
```

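A sketch of such a specialization follows. It uses simplified stand-ins for the
primary templates above and a hypothetical chunked container `chunked_state`;
in real code the specializations must be placed in the namespace of odeint's
own `split_impl`/`unsplit_impl` templates:

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Simplified stand-ins for the primary templates shown above.
template< class Container1 , class Container2 , class Enabler = void >
struct split_impl {
    static void split( const Container1 &from , Container2 &to );
};

template< class Container2 , class Container1 , class Enabler = void >
struct unsplit_impl {
    static void unsplit( const Container2 &from , Container1 &to );
};

// Hypothetical chunked container: a fixed number of deques.
struct chunked_state {
    explicit chunked_state( std::size_t n ) : chunks( n ) {}
    std::vector< std::deque<double> > chunks;
};

// Copy a vector into the chunks, sizes differing by at most one.
template<>
struct split_impl< std::vector<double> , chunked_state > {
    static void split( const std::vector<double> &from , chunked_state &to )
    {
        const std::size_t k = to.chunks.size() , n = from.size();
        std::size_t begin = 0;
        for( std::size_t i = 0 ; i < k ; ++i ) {
            const std::size_t len = n / k + ( i < n % k ? 1 : 0 );
            to.chunks[i].assign( from.begin() + begin ,
                                 from.begin() + begin + len );
            begin += len;
        }
    }
};

// Concatenate the chunks back; `to` must already have the correct size.
template<>
struct unsplit_impl< chunked_state , std::vector<double> > {
    static void unsplit( const chunked_state &from , std::vector<double> &to )
    {
        std::size_t begin = 0;
        for( std::size_t i = 0 ; i < from.chunks.size() ; ++i ) {
            std::copy( from.chunks[i].begin() , from.chunks[i].end() ,
                       to.begin() + begin );
            begin += from.chunks[i].size();
        }
    }
};
```

Splitting a 7-element vector into 3 chunks then gives sizes 3, 2, 2, and
`unsplit` restores the original order.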
[endsect]

[endsect]

[endsect]