Oliver Kowalke 2013. Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) Fiber: C++ library to cooperatively schedule and synchronize micro-threads.
<link linkend="fiber.overview">Overview</link> Boost.Fiber provides a framework for micro-/userland-threads (fibers) scheduled cooperatively. The API contains classes and functions to manage and synchronize fibers similarly to the standard thread support library. Each fiber has its own stack. A fiber can save the current execution state, including all registers and CPU flags, the instruction pointer, and the stack pointer, and later restore this state. The idea is to have multiple execution paths running on a single thread using cooperative scheduling (versus threads, which are preemptively scheduled). The running fiber decides explicitly when it should yield to allow another fiber to run (context switching). Boost.Fiber internally uses call/cc from Boost.Context; the classes in this library manage, schedule and, when needed, synchronize those execution contexts. A context switch between threads usually costs thousands of CPU cycles on x86, whereas a fiber switch takes fewer than a hundred cycles. A fiber runs on a single thread at any point in time. In order to use the classes and functions described here, you can either include the specific headers specified by the descriptions of each class or function, or include the master library header: #include <boost/fiber/all.hpp> which includes all the other headers in turn. The namespaces used are: namespace boost::fibers namespace boost::this_fiber Fibers and Threads Control is cooperatively passed between fibers launched on a given thread. At a given moment, on a given thread, at most one fiber is running. Spawning additional fibers on a given thread does not distribute your program across more hardware cores, though it can make more effective use of the core on which it's running. On the other hand, a fiber may safely access any resource exclusively owned by its parent thread without explicitly needing to defend that resource against concurrent access by other fibers on the same thread. 
You are already guaranteed that no other fiber on that thread is concurrently touching that resource. This can be particularly important when introducing concurrency in legacy code. You can safely spawn fibers running old code, using asynchronous I/O to interleave execution. In effect, fibers provide a natural way to organize concurrent code based on asynchronous I/O. Instead of chaining together completion handlers, code running on a fiber can make what looks like a normal blocking function call. That call can cheaply suspend the calling fiber, allowing other fibers on the same thread to run. When the operation has completed, the suspended fiber resumes, without having to explicitly save or restore its state. Its local stack variables persist across the call. A fiber can be migrated from one thread to another, though the library does not do this by default. It is possible for you to supply a custom scheduler that migrates fibers between threads. You may specify custom fiber properties to help your scheduler decide which fibers are permitted to migrate. Please see Migrating fibers between threads and Customization for more details. Boost.Fiber allows fibers to be multiplexed across multiple cores (see numa::work_stealing). A fiber launched on a particular thread continues running on that thread unless migrated. It might be unblocked (see Blocking below) by some other thread, but that only transitions the fiber from blocked to ready on its current thread — it does not cause the fiber to resume on the thread that unblocked it. thread-local storage Unless migrated, a fiber may access thread-local storage; however, that storage will be shared among all fibers running on the same thread. For fiber-local storage, please see fiber_specific_ptr. BOOST_FIBERS_NO_ATOMICS The fiber synchronization objects provided by this library will, by default, safely synchronize fibers running on different threads. 
However, this level of synchronization can be removed (for performance) by building the library with BOOST_FIBERS_NO_ATOMICS defined. When the library is built with that macro, you must ensure that all the fibers referencing a particular synchronization object are running in the same thread. Please see Synchronization. Blocking Normally, when this documentation states that a particular fiber blocks (or equivalently, suspends), it means that it yields control, allowing other fibers on the same thread to run. The synchronization mechanisms provided by Boost.Fiber have this behavior. A fiber may, of course, use normal thread synchronization mechanisms; however a fiber that invokes any of these mechanisms will block its entire thread, preventing any other fiber from running on that thread in the meantime. For instance, when a fiber wants to wait for a value from another fiber in the same thread, using std::future would be unfortunate: std::future::get() would block the whole thread, preventing the other fiber from delivering its value. Use future<> instead. Similarly, a fiber that invokes a normal blocking I/O operation will block its entire thread. Fiber authors are encouraged to consistently use asynchronous I/O. Boost.Asio and other asynchronous I/O facilities can straightforwardly be adapted for Boost.Fiber: see Integrating Fibers with Asynchronous Callbacks. Boost.Fiber depends upon Boost.Context. Boost version 1.61.0 or greater is required. This library requires C++11!
<anchor id="implementation"/><link linkend="fiber.overview.implementations__fcontext_t__ucontext_t_and_winfiber">Implementations: fcontext_t, ucontext_t and WinFiber</link> Boost.Fiber uses call/cc from Boost.Context as a building block. fcontext_t The implementation uses fcontext_t by default. fcontext_t is based on assembler and is not available for all platforms. It provides much better performance than ucontext_t (a context switch takes two orders of magnitude fewer CPU cycles; see section performance) and WinFiber. ucontext_t As an alternative, ucontext_t can be used by compiling with BOOST_USE_UCONTEXT and the b2 property context-impl=ucontext. ucontext_t might be available on a broader range of POSIX platforms but has some disadvantages (for instance, it has been deprecated since POSIX.1-2003 and is not C99-conformant). call/cc supports segmented stacks only with ucontext_t as its implementation. WinFiber With BOOST_USE_WINFIB and the b2 property context-impl=winfib, Win32 fibers are used as the implementation for call/cc. Because the TIB (thread information block) is not fully documented on MSDN, it is possible that not all required TIB parts are swapped. The first call of call/cc converts the thread into a Windows fiber by invoking ConvertThreadToFiber(). If desired, ConvertFiberToThread() has to be called explicitly by the user in order to release the resources allocated by ConvertThreadToFiber() (e.g. after using Boost.Context).
On Windows, using fcontext_t: turn off global program optimization (/GL) and change /EHsc (the compiler assumes that functions declared as extern "C" never throw a C++ exception) to /EHs (the compiler assumes that functions declared as extern "C" may throw an exception).
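For example, the ucontext_t backend described above might be selected at build time roughly as follows. This is a hedged sketch: the Boost root path is a placeholder, and the toolset and exact property names are assumptions to be checked against the Boost.Build documentation for your installation.

```shell
# Sketch: building only the fiber library with the ucontext_t backend.
# /path/to/boost-root is a placeholder for your Boost source tree.
cd /path/to/boost-root
./b2 toolset=gcc context-impl=ucontext define=BOOST_USE_UCONTEXT --with-fiber
```

Code that uses the library must then also be compiled with BOOST_USE_UCONTEXT defined, so that headers and library agree on the context implementation.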
<link linkend="fiber.fiber_mgmt">Fiber management</link> Synopsis #include <boost/fiber/all.hpp> namespace boost { namespace fibers { class fiber; bool operator<( fiber const& l, fiber const& r) noexcept; void swap( fiber & l, fiber & r) noexcept; template< typename SchedAlgo, typename ... Args > void use_scheduling_algorithm( Args && ... args) noexcept; bool has_ready_fibers() noexcept; namespace algo { struct algorithm; template< typename PROPS > struct algorithm_with_properties; class round_robin; class shared_work; }} namespace this_fiber { fibers::fiber::id get_id() noexcept; void yield(); template< typename Clock, typename Duration > void sleep_until( std::chrono::time_point< Clock, Duration > const& abs_time); template< typename Rep, typename Period > void sleep_for( std::chrono::duration< Rep, Period > const& rel_time); template< typename PROPS > PROPS & properties(); }} Tutorial Each fiber represents a micro-thread which will be launched and managed cooperatively by a scheduler. Objects of type fiber are move-only. boost::fibers::fiber f1; // not-a-fiber void f() { boost::fibers::fiber f2( some_fn); f1 = std::move( f2); // f2 moved to f1 } Launching A new fiber is launched by passing an object of a callable type that can be invoked with no parameters. If the object must not be copied or moved, then std::ref can be used to pass in a reference to the function object. In this case, the user must ensure that the referenced object outlives the newly-created fiber. struct callable { void operator()(); }; boost::fibers::fiber copies_are_safe() { callable x; return boost::fibers::fiber( x); } // x is destroyed, but the newly-created fiber has a copy, so this is OK boost::fibers::fiber oops() { callable x; return boost::fibers::fiber( std::ref( x) ); } // x is destroyed, but the newly-created fiber still has a reference // this leads to undefined behaviour The spawned fiber does not immediately start running. 
It is enqueued in the list of ready-to-run fibers, and will run when the scheduler gets around to it. Exceptions An exception escaping from the function or callable object passed to the fiber constructor calls std::terminate(). If you need to know which exception was thrown, use future<> or packaged_task<>. Detaching A fiber can be detached by explicitly invoking the fiber::detach() member function. After fiber::detach() is called on a fiber object, that object represents not-a-fiber. The fiber object may then safely be destroyed. boost::fibers::fiber( some_fn).detach(); Boost.Fiber provides a number of ways to wait for a running fiber to complete. You can coordinate even with a detached fiber using a mutex, or condition_variable, or any of the other synchronization objects provided by the library. If a detached fiber is still running when the thread’s main fiber terminates, the thread will not shut down. Joining In order to wait for a fiber to finish, the fiber::join() member function of the fiber object can be used. fiber::join() will block until the fiber object has completed. void some_fn() { ... } boost::fibers::fiber f( some_fn); ... f.join(); If the fiber has already completed, then fiber::join() returns immediately and the joined fiber object becomes not-a-fiber. Destruction When a fiber object representing a valid execution context (the fiber is fiber::joinable()) is destroyed, the program terminates. If you intend the fiber to outlive the fiber object that launched it, use the fiber::detach() method. { boost::fibers::fiber f( some_fn); } // std::terminate() will be called { boost::fibers::fiber f(some_fn); f.detach(); } // okay, program continues Fiber IDs Objects of class fiber::id can be used to identify fibers. Each running fiber has a unique fiber::id obtainable from the corresponding fiber by calling the fiber::get_id() member function. 
Objects of class fiber::id can be copied, and used as keys in associative containers: the full range of comparison operators is provided. They can also be written to an output stream using the stream insertion operator, though the output format is unspecified. Each instance of fiber::id either refers to some fiber, or not-a-fiber. Instances that refer to not-a-fiber compare equal to each other, but not equal to any instances that refer to an actual fiber. The comparison operators on fiber::id define a total ordering over all non-equal fiber::id values. Enumeration launch launch specifies whether control passes immediately into a newly-launched fiber. enum class launch { dispatch, post }; dispatch Effects: A fiber launched with launch == dispatch is entered immediately. In other words, launching a fiber with dispatch suspends the caller (the previously-running fiber) until the fiber scheduler has a chance to resume it later. post Effects: A fiber launched with launch == post is passed to the fiber scheduler as ready, but it is not yet entered. The caller (the previously-running fiber) continues executing. The newly-launched fiber will be entered when the fiber scheduler has a chance to resume it later. Note: If launch is not explicitly specified, post is the default.
<anchor id="class_fiber"/><link linkend="fiber.fiber_mgmt.fiber">Class <code><phrase role="identifier">fiber</phrase></code></link> #include <boost/fiber/fiber.hpp> namespace boost { namespace fibers { class fiber { public: class id; constexpr fiber() noexcept; template< typename Fn, typename ... Args > fiber( Fn &&, Args && ...); template< typename Fn, typename ... Args > fiber( launch, Fn &&, Args && ...); template< typename StackAllocator, typename Fn, typename ... Args > fiber( std::allocator_arg_t, StackAllocator &&, Fn &&, Args && ...); template< typename StackAllocator, typename Fn, typename ... Args > fiber( launch, std::allocator_arg_t, StackAllocator &&, Fn &&, Args && ...); ~fiber(); fiber( fiber const&) = delete; fiber & operator=( fiber const&) = delete; fiber( fiber &&) noexcept; fiber & operator=( fiber &&) noexcept; void swap( fiber &) noexcept; bool joinable() const noexcept; id get_id() const noexcept; void detach(); void join(); template< typename PROPS > PROPS & properties(); }; bool operator<( fiber const&, fiber const&) noexcept; void swap( fiber &, fiber &) noexcept; template< typename SchedAlgo, typename ... Args > void use_scheduling_algorithm( Args && ...) noexcept; bool has_ready_fibers() noexcept; }} Default constructor constexpr fiber() noexcept; Effects: Constructs a fiber instance that refers to not-a-fiber. Postconditions: this->get_id() == fiber::id() Throws: Nothing Constructor template< typename Fn, typename ... Args > fiber( Fn && fn, Args && ... args); template< typename Fn, typename ... Args > fiber( launch policy, Fn && fn, Args && ... args); template< typename StackAllocator, typename Fn, typename ... Args > fiber( std::allocator_arg_t, StackAllocator && salloc, Fn && fn, Args && ... args); template< typename StackAllocator, typename Fn, typename ... Args > fiber( launch policy, std::allocator_arg_t, StackAllocator && salloc, Fn && fn, Args && ... args); Preconditions: Fn must be copyable or movable. 
Effects: fn is copied or moved into internal storage for access by the new fiber. If launch is specified (or defaulted) to post, the new fiber is marked ready and will be entered at the next opportunity. If launch is specified as dispatch, the calling fiber is suspended and the new fiber is entered immediately. Postconditions: *this refers to the newly created fiber of execution. Throws: fiber_error if an error occurs. Note: StackAllocator is required to allocate a stack for the internal execution context. If StackAllocator is not explicitly passed, the default stack allocator depends on BOOST_USE_SEGMENTED_STACKS: if defined, you will get a segmented_stack, else a fixedsize_stack. See also: std::allocator_arg_t, Stack allocation Move constructor fiber( fiber && other) noexcept; Effects: Transfers ownership of the fiber managed by other to the newly constructed fiber instance. Postconditions: other.get_id() == fiber::id() and get_id() returns the value of other.get_id() prior to the construction Throws: Nothing Move assignment operator fiber & operator=( fiber && other) noexcept; Effects: Transfers ownership of the fiber managed by other (if any) to *this. Postconditions: other.get_id() == fiber::id() and get_id() returns the value of other.get_id() prior to the assignment. Throws: Nothing Destructor ~fiber(); Effects: If the fiber is fiber::joinable(), calls std::terminate. Destroys *this. Note: The programmer must ensure that the destructor is never executed while the fiber is still fiber::joinable(). Even if you know that the fiber has completed, you must still call either fiber::join() or fiber::detach() before destroying the fiber object. Member function joinable() bool joinable() const noexcept; Returns: true if *this refers to a fiber of execution, which may or may not have completed; otherwise false. Throws: Nothing Member function join() void join(); Preconditions: the fiber is fiber::joinable(). 
Effects: Waits for the referenced fiber of execution to complete. Postconditions: The fiber of execution referenced on entry has completed. *this no longer refers to any fiber of execution. Throws: fiber_error Error Conditions: resource_deadlock_would_occur: if this->get_id() == boost::this_fiber::get_id(). invalid_argument: if the fiber is not fiber::joinable(). Member function detach() void detach(); Preconditions: the fiber is fiber::joinable(). Effects: The fiber of execution becomes detached, and no longer has an associated fiber object. Postconditions: *this no longer refers to any fiber of execution. Throws: fiber_error Error Conditions: invalid_argument: if the fiber is not fiber::joinable(). Member function get_id() fiber::id get_id() const noexcept; Returns: If *this refers to a fiber of execution, an instance of fiber::id that represents that fiber. Otherwise returns a default-constructed fiber::id. Throws: Nothing See also: this_fiber::get_id() Templated member function properties() template< typename PROPS > PROPS & properties(); Preconditions: *this refers to a fiber of execution. use_scheduling_algorithm() has been called from this thread with a subclass of algorithm_with_properties<> with the same template argument PROPS. Returns: a reference to the scheduler properties instance for *this. Throws: std::bad_cast if use_scheduling_algorithm() was called with an algorithm_with_properties subclass with a template parameter other than PROPS. Note: algorithm_with_properties<> provides a way for a user-coded scheduler to associate extended properties, such as priority, with a fiber instance. This method allows access to those user-provided properties. See also: Customization Member function swap() void swap( fiber & other) noexcept; Effects: Exchanges the fiber of execution associated with *this and other, so *this becomes associated with the fiber formerly associated with other, and vice-versa. 
Postconditions: this->get_id() returns the same value as other.get_id() prior to the call. other.get_id() returns the same value as this->get_id() prior to the call. Throws: Nothing Non-member function swap() void swap( fiber & l, fiber & r) noexcept; Effects: Same as l.swap( r). Throws: Nothing Non-member function operator<() bool operator<( fiber const& l, fiber const& r) noexcept; Returns: true if l.get_id() < r.get_id() is true, false otherwise. Throws: Nothing. Non-member function use_scheduling_algorithm() template< typename SchedAlgo, typename ... Args > void use_scheduling_algorithm( Args && ... args) noexcept; Effects: Directs Boost.Fiber to use SchedAlgo, which must be a concrete subclass of algorithm, as the scheduling algorithm for all fibers in the current thread. Pass any required SchedAlgo constructor arguments as args. Note: If you want a given thread to use a non-default scheduling algorithm, make that thread call use_scheduling_algorithm() before any other Boost.Fiber entry point. If no scheduler has been set for the current thread by the time Boost.Fiber needs to use it, the library will create a default round_robin instance for this thread. Throws: Nothing See also: Scheduling, Customization Non-member function has_ready_fibers() bool has_ready_fibers() noexcept; Returns: true if scheduler has fibers ready to run. Throws: Nothing Note: Can be used for work-stealing to find an idle scheduler.
<anchor id="class_id"/><link linkend="fiber.fiber_mgmt.id">Class fiber::id</link> #include <boost/fiber/fiber.hpp> namespace boost { namespace fibers { class id { public: constexpr id() noexcept; bool operator==( id const&) const noexcept; bool operator!=( id const&) const noexcept; bool operator<( id const&) const noexcept; bool operator>( id const&) const noexcept; bool operator<=( id const&) const noexcept; bool operator>=( id const&) const noexcept; template< typename charT, class traitsT > friend std::basic_ostream< charT, traitsT > & operator<<( std::basic_ostream< charT, traitsT > &, id const&); }; }} Constructor constexpr id() noexcept; Effects: Represents an instance of not-a-fiber. Throws: Nothing. Member function operator==() bool operator==( id const& other) const noexcept; Returns: true if *this and other represent the same fiber, or both represent not-a-fiber, false otherwise. Throws: Nothing. Member function operator!=() bool operator!=( id const& other) const noexcept; Returns: ! (other == * this) Throws: Nothing. Member function operator<() bool operator<( id const& other) const noexcept; Returns: true if *this != other is true and the implementation-defined total order of fiber::id values places *this before other, false otherwise. Throws: Nothing. Member function operator>() bool operator>( id const& other) const noexcept; Returns: other < * this Throws: Nothing. Member function operator<=() bool operator<=( id const& other) const noexcept; Returns: ! (other < * this) Throws: Nothing. Member function operator>=() bool operator>=( id const& other) const noexcept; Returns: ! (* this < other) Throws: Nothing. operator<< template< typename charT, class traitsT > std::basic_ostream< charT, traitsT > & operator<<( std::basic_ostream< charT, traitsT > & os, id const& other); Effects: Writes the representation of other to stream os. The representation is unspecified. Returns: os
<link linkend="fiber.fiber_mgmt.this_fiber">Namespace this_fiber</link> In general, this_fiber operations may be called from the main fiber — the fiber on which function main() is entered — as well as from an explicitly-launched thread’s thread-function. That is, in many respects the main fiber on each thread can be treated like an explicitly-launched fiber. namespace boost { namespace this_fiber { fibers::fiber::id get_id() noexcept; void yield() noexcept; template< typename Clock, typename Duration > void sleep_until( std::chrono::time_point< Clock, Duration > const&); template< typename Rep, typename Period > void sleep_for( std::chrono::duration< Rep, Period > const&); template< typename PROPS > PROPS & properties(); }} Non-member function this_fiber::get_id() #include <boost/fiber/operations.hpp> namespace boost { namespace this_fiber { fibers::fiber::id get_id() noexcept; }} Returns: An instance of fiber::id that represents the currently executing fiber. Throws: Nothing. Non-member function this_fiber::sleep_until() #include <boost/fiber/operations.hpp> namespace boost { namespace this_fiber { template< typename Clock, typename Duration > void sleep_until( std::chrono::time_point< Clock, Duration > const& abs_time); }} Effects: Suspends the current fiber until the time point specified by abs_time has been reached. Throws: timeout-related exceptions. Note: The current fiber will not resume before abs_time, but there are no guarantees about how soon after abs_time it might resume. Note: timeout-related exceptions are as defined in the C++ Standard, section 30.2.4 Timing specifications [thread.req.timing]: A function that takes an argument which specifies a timeout will throw if, during its execution, a clock, time point, or time duration throws an exception. Such exceptions are referred to as timeout-related exceptions. 
Non-member function this_fiber::sleep_for() #include <boost/fiber/operations.hpp> namespace boost { namespace this_fiber { template< class Rep, class Period > void sleep_for( std::chrono::duration< Rep, Period > const& rel_time); }} Effects: Suspends the current fiber until the time duration specified by rel_time has elapsed. Throws: timeout-related exceptions. Note: The current fiber will not resume before rel_time has elapsed, but there are no guarantees about how soon after that it might resume. Non-member function this_fiber::yield() #include <boost/fiber/operations.hpp> namespace boost { namespace this_fiber { void yield() noexcept; }} Effects: Relinquishes execution control, allowing other fibers to run. Throws: Nothing. Note: A fiber that calls yield() is not suspended: it is immediately passed to the scheduler as ready to run. Non-member function this_fiber::properties() #include <boost/fiber/operations.hpp> namespace boost { namespace this_fiber { template< typename PROPS > PROPS & properties(); }} Preconditions: use_scheduling_algorithm() has been called from this thread with a subclass of algorithm_with_properties<> with the same template argument PROPS. Returns: a reference to the scheduler properties instance for the currently running fiber. Throws: std::bad_cast if use_scheduling_algorithm() was called with an algorithm_with_properties subclass with a template parameter other than PROPS. Note: algorithm_with_properties<> provides a way for a user-coded scheduler to associate extended properties, such as priority, with a fiber instance. This function allows access to those user-provided properties. Note: The first time this function is called from the main fiber of a thread, it may internally yield, permitting other fibers to run. See also: Customization
<anchor id="scheduling"/><link linkend="fiber.scheduling">Scheduling</link> The fibers in a thread are coordinated by a fiber manager. Fibers trade control cooperatively, rather than preemptively: the currently-running fiber retains control until it invokes some operation that passes control to the manager. Each time a fiber suspends (or yields), the fiber manager consults a scheduler to determine which fiber will run next. Boost.Fiber provides the fiber manager, but the scheduler is a customization point. (See Customization.) Each thread has its own scheduler. Different threads in a process may use different schedulers. By default, Boost.Fiber implicitly instantiates round_robin as the scheduler for each thread. You are explicitly permitted to code your own algorithm subclass. For the most part, your algorithm subclass need not defend against cross-thread calls: the fiber manager intercepts and defers such calls. Most algorithm methods are only ever directly called from the thread whose fibers it manages — with exceptions as documented below. Your algorithm subclass is engaged on a particular thread by calling use_scheduling_algorithm(): void thread_fn() { boost::fibers::use_scheduling_algorithm< my_fiber_scheduler >(); ... } A scheduler class must implement the interface algorithm. Boost.Fiber provides the schedulers round_robin, work_stealing, numa::work_stealing and shared_work. void thread( std::uint32_t thread_count) { // thread registers itself at work-stealing scheduler boost::fibers::use_scheduling_algorithm< boost::fibers::algo::work_stealing >( thread_count); ... 
} // count of logical cpus std::uint32_t thread_count = std::thread::hardware_concurrency(); // start worker-threads first std::vector< std::thread > threads; for ( std::uint32_t i = 1 /* count start-thread */; i < thread_count; ++i) { // spawn thread threads.emplace_back( thread, thread_count); } // start-thread registers itself at work-stealing scheduler boost::fibers::use_scheduling_algorithm< boost::fibers::algo::work_stealing >( thread_count); ... The example spawns as many threads as std::thread::hardware_concurrency() returns. Each thread runs a work_stealing scheduler. Each instance of this scheduler needs to know how many threads run the work-stealing scheduler in the program. If the local queue of one thread runs out of ready fibers, the thread tries to steal a ready fiber from another thread running this scheduler. Class algorithm algorithm is the abstract base class defining the interface that a fiber scheduler must implement. #include <boost/fiber/algo/algorithm.hpp> namespace boost { namespace fibers { namespace algo { struct algorithm { virtual ~algorithm(); virtual void awakened( context *) noexcept = 0; virtual context * pick_next() noexcept = 0; virtual bool has_ready_fibers() const noexcept = 0; virtual void suspend_until( std::chrono::steady_clock::time_point const&) noexcept = 0; virtual void notify() noexcept = 0; }; }}} Member function awakened() virtual void awakened( context * f) noexcept = 0; Effects: Informs the scheduler that fiber f is ready to run. Fiber f might be newly launched, or it might have been blocked but has just been awakened, or it might have called this_fiber::yield(). Note: This method advises the scheduler to add fiber f to its collection of fibers ready to run. A typical scheduler implementation places f into a queue. See also: round_robin Member function pick_next() virtual context * pick_next() noexcept = 0; Returns: the fiber which is to be resumed next, or nullptr if there is no ready fiber. 
Note: This is where the scheduler actually specifies the fiber which is to run next. A typical scheduler implementation chooses the head of the ready queue. See also: round_robin Member function has_ready_fibers() virtual bool has_ready_fibers() const noexcept = 0; Returns: true if scheduler has fibers ready to run. Member function suspend_until() virtual void suspend_until( std::chrono::steady_clock::time_point const& abs_time) noexcept = 0; Effects: Informs the scheduler that no fiber will be ready until time-point abs_time. Note: This method allows a custom scheduler to yield control to the containing environment in whatever way makes sense. The fiber manager is stating that suspend_until() need not return until abs_time — or algorithm::notify() is called — whichever comes first. The interaction with notify() means that, for instance, calling std::this_thread::sleep_until(abs_time) would be too simplistic. round_robin::suspend_until() uses a std::condition_variable to coordinate with round_robin::notify(). Note: Given that notify() might be called from another thread, your suspend_until() implementation — like the rest of your algorithm implementation — must guard any data it shares with your notify() implementation. Member function notify() virtual void notify() noexcept = 0; Effects: Requests the scheduler to return from a pending call to algorithm::suspend_until(). Note: Alone among the algorithm methods, notify() may be called from another thread. Your notify() implementation must guard any data it shares with the rest of your algorithm implementation. Class round_robin This class implements algorithm, scheduling fibers in round-robin fashion. 
#include <boost/fiber/algo/round_robin.hpp> namespace boost { namespace fibers { namespace algo { class round_robin : public algorithm { virtual void awakened( context *) noexcept; virtual context * pick_next() noexcept; virtual bool has_ready_fibers() const noexcept; virtual void suspend_until( std::chrono::steady_clock::time_point const&) noexcept; virtual void notify() noexcept; }; }}} Member function awakened() virtual void awakened( context * f) noexcept; Effects: Enqueues fiber f onto a ready queue. Throws: Nothing. Member function pick_next() virtual context * pick_next() noexcept; Returns: the fiber at the head of the ready queue, or nullptr if the queue is empty. Throws: Nothing. Note: Placing ready fibers onto the tail of a queue, and returning them from the head of that queue, shares the thread between ready fibers in round-robin fashion. Member function has_ready_fibers() virtual bool has_ready_fibers() const noexcept; Returns: true if the scheduler has fibers ready to run. Throws: Nothing. Member function suspend_until() virtual void suspend_until( std::chrono::steady_clock::time_point const& abs_time) noexcept; Effects: Informs round_robin that no ready fiber will be available until time-point abs_time. This implementation blocks in std::condition_variable::wait_until(). Throws: Nothing. Member function notify() virtual void notify() noexcept; Effects: Wakes up a pending call to round_robin::suspend_until(); some fibers might be ready. This implementation wakes suspend_until() via std::condition_variable::notify_all(). Throws: Nothing. Class work_stealing This class implements algorithm; if the local ready-queue runs out of ready fibers, ready fibers are stolen from other schedulers. The victim scheduler (from which a ready fiber is stolen) is selected at random. Worker-threads are stored in a static variable; dynamically adding or removing worker threads is not supported. 
#include <boost/fiber/algo/work_stealing.hpp> namespace boost { namespace fibers { namespace algo { class work_stealing : public algorithm { public: work_stealing( std::uint32_t thread_count, bool suspend = false); work_stealing( work_stealing const&) = delete; work_stealing( work_stealing &&) = delete; work_stealing & operator=( work_stealing const&) = delete; work_stealing & operator=( work_stealing &&) = delete; virtual void awakened( context *) noexcept; virtual context * pick_next() noexcept; virtual bool has_ready_fibers() const noexcept; virtual void suspend_until( std::chrono::steady_clock::time_point const&) noexcept; virtual void notify() noexcept; }; }}} Constructor work_stealing( std::uint32_t thread_count, bool suspend = false); Effects: Constructs the work-stealing scheduling algorithm. thread_count represents the number of threads running this algorithm. Throws: system_error Note: If suspend is set to true, the scheduler suspends if no ready fiber could be stolen. The scheduler will be woken up if a sleeping fiber times out or if it is notified from remote (another thread or fiber scheduler). Member function awakened() virtual void awakened( context * f) noexcept; Effects: Enqueues fiber f onto the shared ready queue. Throws: Nothing. Member function pick_next() virtual context * pick_next() noexcept; Returns: the fiber at the head of the ready queue, or nullptr if the queue is empty. Throws: Nothing. Note: Placing ready fibers onto the tail of the shared queue, and returning them from the head of that queue, shares the thread between ready fibers in round-robin fashion. Member function has_ready_fibers() virtual bool has_ready_fibers() const noexcept; Returns: true if the scheduler has fibers ready to run. Throws: Nothing. Member function suspend_until() virtual void suspend_until( std::chrono::steady_clock::time_point const& abs_time) noexcept; Effects: Informs work_stealing that no ready fiber will be available until time-point abs_time. 
This implementation blocks in std::condition_variable::wait_until(). Throws: Nothing. Member function notify() virtual void notify() noexcept; Effects: Wakes up a pending call to work_stealing::suspend_until(); some fibers might be ready. This implementation wakes suspend_until() via std::condition_variable::notify_all(). Throws: Nothing. Class shared_work Because of the non-locality of data, shared_work is less performant than work_stealing. This class implements algorithm, scheduling fibers in round-robin fashion. Ready fibers are shared between all instances (running on different threads) of shared_work, thus the work is distributed equally over all threads. Worker threads are stored in a static variable; dynamically adding or removing worker threads is not supported. #include <boost/fiber/algo/shared_work.hpp> namespace boost { namespace fibers { namespace algo { class shared_work : public algorithm { virtual void awakened( context *) noexcept; virtual context * pick_next() noexcept; virtual bool has_ready_fibers() const noexcept; virtual void suspend_until( std::chrono::steady_clock::time_point const&) noexcept; virtual void notify() noexcept; }; }}} Member function awakened() virtual void awakened( context * f) noexcept; Effects: Enqueues fiber f onto the shared ready queue. Throws: Nothing. Member function pick_next() virtual context * pick_next() noexcept; Returns: the fiber at the head of the ready queue, or nullptr if the queue is empty. Throws: Nothing. Note: Placing ready fibers onto the tail of the shared queue, and returning them from the head of that queue, shares the thread between ready fibers in round-robin fashion. Member function has_ready_fibers() virtual bool has_ready_fibers() const noexcept; Returns: true if the scheduler has fibers ready to run. Throws: Nothing. 
Member function suspend_until() virtual void suspend_until( std::chrono::steady_clock::time_point const& abs_time) noexcept; Effects: Informs shared_work that no ready fiber will be available until time-point abs_time. This implementation blocks in std::condition_variable::wait_until(). Throws: Nothing. Member function notify() virtual void notify() noexcept; Effects: Wakes up a pending call to shared_work::suspend_until(); some fibers might be ready. This implementation wakes suspend_until() via std::condition_variable::notify_all(). Throws: Nothing. Custom Scheduler Fiber Properties A scheduler class directly derived from algorithm can use any information available from context to implement the algorithm interface. But a custom scheduler might need to track additional properties for a fiber. For instance, a priority-based scheduler would need to track a fiber’s priority. Boost.Fiber provides a mechanism by which your custom scheduler can associate custom properties with each fiber. Class fiber_properties A custom fiber properties class must be derived from fiber_properties. #include <boost/fiber/properties.hpp> namespace boost { namespace fibers { class fiber_properties { public: fiber_properties( context *) noexcept; virtual ~fiber_properties(); protected: void notify() noexcept; }; }} Constructor fiber_properties( context * f) noexcept; Effects: Constructs the base-class component of a custom subclass. Throws: Nothing. Note: Your subclass constructor must accept a context* and pass it to the base-class fiber_properties constructor. Member function notify() void notify() noexcept; Effects: Passes control to the custom algorithm_with_properties<> subclass’s algorithm_with_properties::property_change() method. Throws: Nothing. Note: A custom scheduler’s algorithm_with_properties::pick_next() method might dynamically select from the ready fibers, or algorithm_with_properties::awakened() might instead insert each ready fiber into some form of ready queue for pick_next(). 
In the latter case, if application code modifies a fiber property (e.g. priority) that should affect that fiber’s relationship to other ready fibers, the custom scheduler must be given the opportunity to reorder its ready queue. The custom property subclass should implement an access method to modify such a property; that access method should call notify() once the new property value has been stored. This passes control to the custom scheduler’s property_change() method, allowing the custom scheduler to reorder its ready queue appropriately. Of course, if you define a property which does not affect the behavior of the pick_next() method, you need not call notify() when that property is modified. Template algorithm_with_properties<> A custom scheduler that depends on a custom properties class PROPS should be derived from algorithm_with_properties<PROPS>. PROPS should be derived from fiber_properties. #include <boost/fiber/algorithm.hpp> namespace boost { namespace fibers { namespace algo { template< typename PROPS > struct algorithm_with_properties { virtual void awakened( context *, PROPS &) noexcept = 0; virtual context * pick_next() noexcept; virtual bool has_ready_fibers() const noexcept; virtual void suspend_until( std::chrono::steady_clock::time_point const&) noexcept = 0; virtual void notify() noexcept = 0; PROPS & properties( context *) noexcept; virtual void property_change( context *, PROPS &) noexcept; virtual fiber_properties * new_properties( context *); }; }}} Member function awakened() virtual void awakened( context * f, PROPS & properties) noexcept; Effects: Informs the scheduler that fiber f is ready to run, like algorithm::awakened(). Passes the fiber’s associated PROPS instance. Throws: Nothing. Note: An algorithm_with_properties<> subclass must override this method instead of algorithm::awakened(). 
Member function pick_next() virtual context * pick_next() noexcept; Returns: the fiber which is to be resumed next, or nullptr if there is no ready fiber. Throws: Nothing. Note: same as algorithm::pick_next() Member function has_ready_fibers() virtual bool has_ready_fibers() const noexcept; Returns: true if scheduler has fibers ready to run. Throws: Nothing. Note: same as algorithm::has_ready_fibers() Member function suspend_until() virtual void suspend_until( std::chrono::steady_clock::time_point const& abs_time) noexcept = 0; Effects: Informs the scheduler that no fiber will be ready until time-point abs_time. Note: same as algorithm::suspend_until() Member function notify() virtual void notify() noexcept = 0; Effects: Requests the scheduler to return from a pending call to algorithm_with_properties::suspend_until(). Note: same as algorithm::notify() Member function properties() PROPS& properties( context * f) noexcept; Returns: the PROPS instance associated with fiber f. Throws: Nothing. Note: The fiber’s associated PROPS instance is already passed to algorithm_with_properties::awakened() and algorithm_with_properties::property_change(). However, every algorithm subclass is expected to track a collection of ready context instances. This method allows your custom scheduler to retrieve the fiber_properties subclass instance for any context in its collection. Member function property_change() virtual void property_change( context * f, PROPS & properties) noexcept; Effects: Notify the custom scheduler of a possibly-relevant change to a property belonging to fiber f. properties contains the new values of all relevant properties. Throws: Nothing. Note: This method is only called when a custom fiber_properties subclass explicitly calls fiber_properties::notify(). Member function new_properties() virtual fiber_properties * new_properties( context * f); Returns: A new instance of fiber_properties subclass PROPS. 
Note: By default, algorithm_with_properties<>::new_properties() simply returns new PROPS(f), placing the PROPS instance on the heap. Override this method to allocate PROPS some other way. The returned fiber_properties pointer must point to the PROPS instance to be associated with fiber f. Class context While you are free to treat context* as an opaque token, certain context members may be useful to a custom scheduler implementation. Of particular note is the fact that context contains a hook to participate in a boost::intrusive::list typedef’ed as boost::fibers::scheduler::ready_queue_t. This hook is reserved for use by algorithm implementations. (For instance, round_robin contains a ready_queue_t instance to manage its ready fibers.) See context::ready_is_linked(), context::ready_link(), context::ready_unlink(). Your algorithm implementation may use any container you desire to manage passed context instances. ready_queue_t avoids some of the overhead of typical STL containers. #include <boost/fiber/context.hpp> namespace boost { namespace fibers { enum class type { none = unspecified, main_context = unspecified, // fiber associated with thread's stack dispatcher_context = unspecified, // special fiber for maintenance operations worker_context = unspecified, // fiber not special to the library pinned_context = unspecified // fiber must not be migrated to another thread }; class context { public: class id; static context * active() noexcept; context( context const&) = delete; context & operator=( context const&) = delete; id get_id() const noexcept; void detach() noexcept; void attach( context *) noexcept; bool is_context( type) const noexcept; bool is_terminated() const noexcept; bool ready_is_linked() const noexcept; bool remote_ready_is_linked() const noexcept; bool wait_is_linked() const noexcept; template< typename List > void ready_link( List &) noexcept; template< typename List > void remote_ready_link( List &) noexcept; template< typename List > void 
wait_link( List &) noexcept; void ready_unlink() noexcept; void remote_ready_unlink() noexcept; void wait_unlink() noexcept; void suspend() noexcept; void schedule( context *) noexcept; }; bool operator<( context const& l, context const& r) noexcept; }} Static member function active() static context * active() noexcept; Returns: Pointer to instance of current fiber. Throws: Nothing Member function get_id() context::id get_id() const noexcept; Returns: If *this refers to a fiber of execution, an instance of fiber::id that represents that fiber. Otherwise returns a default-constructed fiber::id. Throws: Nothing See also: fiber::get_id() Member function attach() void attach( context * f) noexcept; Precondition: this->get_scheduler() == nullptr Effects: Attach fiber f to scheduler running *this. Postcondition: this->get_scheduler() != nullptr Throws: Nothing Note: A typical call: boost::fibers::context::active()->attach(f); Note: f must not be the running fiber’s context. It must not be blocked or terminated. It must not be a pinned_context. It must be currently detached. It must not currently be linked into an algorithm implementation’s ready queue. Most of these conditions are implied by f being owned by an algorithm implementation: that is, it has been passed to algorithm::awakened() but has not yet been returned by algorithm::pick_next(). Typically a pick_next() implementation would call attach() with the context* it is about to return. It must first remove f from its ready queue. You should never pass a pinned_context to attach() because you should never have called its detach() method in the first place. Member function detach() void detach() noexcept; Precondition: (this->get_scheduler() != nullptr) && ! this->is_context(pinned_context) Effects: Detach fiber *this from its scheduler running *this. Throws: Nothing Postcondition: this->get_scheduler() == nullptr Note: This method must be called on the thread with which the fiber is currently associated. 
*this must not be the running fiber’s context. It must not be blocked or terminated. It must not be a pinned_context. It must not be detached already. It must not already be linked into an algorithm implementation’s ready queue. Most of these conditions are implied by *this being passed to algorithm::awakened(); an awakened() implementation must, however, test for pinned_context. It must call detach() before linking *this into its ready queue. Note: In particular, it is erroneous to attempt to migrate a fiber from one thread to another by calling both detach() and attach() in the algorithm::pick_next() method. pick_next() is called on the intended destination thread. detach() must be called on the fiber’s original thread. You must call detach() in the corresponding awakened() method. Note: Unless you intend to make a fiber available for potential migration to a different thread, you should call neither detach() nor attach() with its context. Member function is_context() bool is_context( type t) const noexcept; Returns: true if *this is of the specified type. Throws: Nothing Note: type::worker_context here means any fiber not special to the library. For type::main_context the context is associated with the main fiber of the thread: the one implicitly created by the thread itself, rather than one explicitly created by Boost.Fiber. For type::dispatcher_context the context is associated with a dispatching fiber, responsible for dispatching awakened fibers to a scheduler’s ready-queue. The dispatching fiber is an implementation detail of the fiber manager. The context of the main or dispatching fiber — any fiber for which is_context(pinned_context) is true — must never be passed to context::detach(). Member function is_terminated() bool is_terminated() const noexcept; Returns: true if *this is no longer a valid context. Throws: Nothing Note: The context has returned from its fiber-function and is no longer considered a valid context. 
Member function ready_is_linked() bool ready_is_linked() const noexcept; Returns: true if *this is stored in an algorithm implementation’s ready-queue. Throws: Nothing Note: Specifically, this method indicates whether context::ready_link() has been called on *this. ready_is_linked() has no information about participation in any other containers. Member function remote_ready_is_linked() bool remote_ready_is_linked() const noexcept; Returns: true if *this is stored in the fiber manager’s remote-ready-queue. Throws: Nothing Note: A context signaled as ready by another thread is first stored in the fiber manager’s remote-ready-queue. This is the mechanism by which the fiber manager protects an algorithm implementation from cross-thread algorithm::awakened() calls. Member function wait_is_linked() bool wait_is_linked() const noexcept; Returns: true if *this is stored in the wait-queue of some synchronization object. Throws: Nothing Note: The context of a fiber waiting on a synchronization object (e.g. mutex, condition_variable etc.) is stored in the wait-queue of that synchronization object. Member function ready_link() template< typename List > void ready_link( List & lst) noexcept; Effects: Stores *this in ready-queue lst. Throws: Nothing Note: Argument lst must be a doubly-linked list from Boost.Intrusive, e.g. an instance of boost::fibers::scheduler::ready_queue_t. Specifically, it must be a boost::intrusive::list compatible with the list_member_hook stored in the context object. Member function remote_ready_link() template< typename List > void remote_ready_link( List & lst) noexcept; Effects: Stores *this in remote-ready-queue lst. Throws: Nothing Note: Argument lst must be a doubly-linked list from Boost.Intrusive. Member function wait_link() template< typename List > void wait_link( List & lst) noexcept; Effects: Stores *this in wait-queue lst. Throws: Nothing Note: Argument lst must be a doubly-linked list from Boost.Intrusive. 
Member function ready_unlink() void ready_unlink() noexcept; Effects: Removes *this from ready-queue: undoes the effect of context::ready_link(). Throws: Nothing Member function remote_ready_unlink() void remote_ready_unlink() noexcept; Effects: Removes *this from remote-ready-queue. Throws: Nothing Member function wait_unlink() void wait_unlink() noexcept; Effects: Removes *this from wait-queue. Throws: Nothing Member function suspend() void suspend() noexcept; Effects: Suspends the running fiber (the fiber associated with *this) until some other fiber passes this to context::schedule(). *this is marked as not-ready, and control passes to the scheduler to select another fiber to run. Throws: Nothing Note: This is a low-level API potentially useful for integration with other frameworks. It is not intended to be directly invoked by a typical application program. Note: The burden is on the caller to arrange for a call to schedule() with a pointer to this at some future time. Member function schedule() void schedule( context * ctx ) noexcept; Effects: Mark the fiber associated with context *ctx as being ready to run. This does not immediately resume that fiber; rather it passes the fiber to the scheduler for subsequent resumption. If the scheduler is idle (has not returned from a call to algorithm::suspend_until()), algorithm::notify() is called to wake it up. Throws: Nothing Note: This is a low-level API potentially useful for integration with other frameworks. It is not intended to be directly invoked by a typical application program. Note: It is explicitly supported to call schedule(ctx) from a thread other than the one on which *ctx is currently suspended. The corresponding fiber will be resumed on its original thread in due course. Non-member function operator<() bool operator<( context const& l, context const& r) noexcept; Returns: true if l.get_id() < r.get_id() is true, false otherwise. Throws: Nothing.
<anchor id="stack"/><link linkend="fiber.stack">Stack allocation</link> A fiber internally uses an execution context (from Boost.Context) which manages a set of registers and a stack. The memory used by the stack is allocated/deallocated via a stack_allocator which is required to model the stack-allocator concept. A stack_allocator can be passed to fiber::fiber() or to fibers::async(). stack-allocator concept A stack_allocator must satisfy the stack-allocator concept requirements shown in the following table, in which a is an object of a stack_allocator type, sctx is a stack_context, and size is a std::size_t:

expression           return type    notes
a(size)                             creates a stack allocator
a.allocate()         stack_context  creates a stack
a.deallocate( sctx)  void           deallocates the stack created by a.allocate()

The implementation of allocate() might include logic to protect against exceeding the context's available stack size rather than leaving it as undefined behaviour. Calling deallocate() with a stack_context not obtained from allocate() results in undefined behaviour. The memory for the stack is not required to be aligned; alignment takes place inside the execution context. See also Boost.Context stack allocation. In particular, traits_type methods are as described for boost::context::stack_traits. Class protected_fixedsize_stack Boost.Fiber provides the class protected_fixedsize_stack which models the stack-allocator concept. It appends a guard page at the end of each stack to protect against exceeding the stack. If the guard page is accessed (read or write operation), a segmentation fault/access violation is generated by the operating system. Using protected_fixedsize_stack is expensive: launching a new fiber with a stack of this type incurs the overhead of setting the memory protection; once allocated, this stack is just as efficient to use as fixedsize_stack. The appended guard page is not mapped to physical memory; only virtual addresses are used. 
#include <boost/fiber/protected_fixedsize_stack.hpp> namespace boost { namespace fibers { struct protected_fixedsize_stack { protected_fixedsize_stack(std::size_t size = traits_type::default_size()); stack_context allocate(); void deallocate( stack_context &); }; }} Member function allocate() stack_context allocate(); Preconditions: traits_type::minimum_size() <= size and traits_type::is_unbounded() || ( size <= traits_type::maximum_size() ). Effects: Allocates memory of at least size bytes and stores a pointer to the stack and its actual size in sctx. Depending on the architecture (the stack grows downwards/upwards) the stored address is the highest/lowest address of the stack. Member function deallocate() void deallocate( stack_context & sctx); Preconditions: sctx.sp is valid, traits_type::minimum_size() <= sctx.size and traits_type::is_unbounded() || ( sctx.size <= traits_type::maximum_size() ). Effects: Deallocates the stack space. Class pooled_fixedsize_stack Boost.Fiber provides the class pooled_fixedsize_stack which models the stack-allocator concept. In contrast to protected_fixedsize_stack it does not append a guard page at the end of each stack. The memory is managed internally by boost::pool<>. #include <boost/fiber/pooled_fixedsize_stack.hpp> namespace boost { namespace fibers { struct pooled_fixedsize_stack { pooled_fixedsize_stack(std::size_t stack_size = traits_type::default_size(), std::size_t next_size = 32, std::size_t max_size = 0); stack_context allocate(); void deallocate( stack_context &); }; }} Constructor pooled_fixedsize_stack(std::size_t stack_size, std::size_t next_size, std::size_t max_size); Preconditions: traits_type::is_unbounded() || ( traits_type::maximum_size() >= stack_size) and 0 < next_size. Effects: Constructs the pooled stack allocator; each stack subsequently created by allocate() will contain at least stack_size bytes. 
Argument next_size determines the number of stacks to request from the system the first time that *this needs to allocate system memory. The third argument max_size controls how much memory might be allocated for stacks — a value of zero means no upper limit. Member function allocate() stack_context allocate(); Preconditions: traits_type::is_unbounded() || ( traits_type::maximum_size() >= stack_size). Effects: Allocates memory of at least stack_size bytes and stores a pointer to the stack and its actual size in sctx. Depending on the architecture (the stack grows downwards/upwards) the stored address is the highest/lowest address of the stack. Member function deallocate() void deallocate( stack_context & sctx); Preconditions: sctx.sp is valid, traits_type::is_unbounded() || ( traits_type::maximum_size() >= sctx.size). Effects: Deallocates the stack space. This stack allocator is not thread-safe. Class fixedsize_stack Boost.Fiber provides the class fixedsize_stack which models the stack-allocator concept. In contrast to protected_fixedsize_stack it does not append a guard page at the end of each stack. The memory is simply managed by std::malloc() and std::free(). #include <boost/fiber/fixedsize_stack.hpp> namespace boost { namespace fibers { struct fixedsize_stack { fixedsize_stack(std::size_t size = traits_type::default_size()); stack_context allocate(); void deallocate( stack_context &); }; }} Member function allocate() stack_context allocate(); Preconditions: traits_type::minimum_size() <= size and traits_type::is_unbounded() || ( traits_type::maximum_size() >= size). Effects: Allocates memory of at least size bytes and stores a pointer to the stack and its actual size in sctx. Depending on the architecture (the stack grows downwards/upwards) the stored address is the highest/lowest address of the stack. 
Member function deallocate() void deallocate( stack_context & sctx); Preconditions: sctx.sp is valid, traits_type::minimum_size() <= sctx.size and traits_type::is_unbounded() || ( traits_type::maximum_size() >= sctx.size). Effects: Deallocates the stack space. Class segmented_stack Boost.Fiber supports usage of a segmented_stack, i.e. a stack which grows on demand. The fiber is created with a minimal stack size which will be increased as required. Class segmented_stack models the stack-allocator concept. In contrast to protected_fixedsize_stack and fixedsize_stack it creates a stack which grows on demand. Segmented stacks are currently only supported by gcc from version 4.7 and clang from version 3.4 onwards. In order to use a segmented_stack, Boost.Fiber must be built with the property segmented-stacks, e.g. toolset=gcc segmented-stacks=on, and BOOST_USE_SEGMENTED_STACKS must be defined at the b2/bjam command line. Segmented stacks can only be used with callcc() using property context-impl=ucontext. #include <boost/fiber/segmented_stack.hpp> namespace boost { namespace fibers { struct segmented_stack { segmented_stack(std::size_t stack_size = traits_type::default_size()); stack_context allocate(); void deallocate( stack_context &); }; }} Member function allocate() stack_context allocate(); Preconditions: traits_type::minimum_size() <= size and traits_type::is_unbounded() || ( traits_type::maximum_size() >= size). Effects: Allocates memory of at least size bytes and stores a pointer to the stack and its actual size in sctx. Depending on the architecture (the stack grows downwards/upwards) the stored address is the highest/lowest address of the stack. Member function deallocate() void deallocate( stack_context & sctx); Preconditions: sctx.sp is valid, traits_type::minimum_size() <= sctx.size and traits_type::is_unbounded() || ( traits_type::maximum_size() >= sctx.size). Effects: Deallocates the stack space. 
If the library is compiled for segmented stacks, segmented_stack is the only available stack allocator.
<link linkend="fiber.stack.valgrind">Support for valgrind</link> Running programs that switch stacks under valgrind causes problems. Passing the property valgrind=on on the b2 command line lets valgrind treat the fiber stacks' memory regions as stack space, which suppresses the errors.
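A build invocation from the Boost root might look like the following; the toolset name is an assumption, so adjust it to your environment:

```shell
# Rebuild Boost.Fiber with valgrind support enabled.
./b2 toolset=gcc valgrind=on
```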
<anchor id="synchronization"/><link linkend="fiber.synchronization">Synchronization</link> In general, Boost.Fiber synchronization objects can neither be moved nor copied. A synchronization object acts as a mutually-agreed rendezvous point between different fibers. If such an object were copied somewhere else, the new copy would have no consumers. If such an object were moved somewhere else, leaving the original instance in an unspecified state, existing consumers would behave strangely. The fiber synchronization objects provided by this library will, by default, safely synchronize fibers running on different threads. However, this level of synchronization can be removed (for performance) by building the library with BOOST_FIBERS_NO_ATOMICS defined. When the library is built with that macro, you must ensure that all the fibers referencing a particular synchronization object are running in the same thread.
<link linkend="fiber.synchronization.mutex_types">Mutex Types</link> Class mutex #include <boost/fiber/mutex.hpp> namespace boost { namespace fibers { class mutex { public: mutex(); ~mutex(); mutex( mutex const& other) = delete; mutex & operator=( mutex const& other) = delete; void lock(); bool try_lock(); void unlock(); }; }} mutex provides an exclusive-ownership mutex. At most one fiber can own the lock on a given instance of mutex at any time. Multiple concurrent calls to lock(), try_lock() and unlock() shall be permitted. Any fiber blocked in lock() is suspended until the owning fiber releases the lock by calling unlock(). Member function lock() void lock(); Precondition: The calling fiber doesn't own the mutex. Effects: The current fiber blocks until ownership can be obtained. Throws: lock_error Error Conditions: resource_deadlock_would_occur: if boost::this_fiber::get_id() already owns the mutex. Member function try_lock() bool try_lock(); Precondition: The calling fiber doesn't own the mutex. Effects: Attempt to obtain ownership for the current fiber without blocking. Returns: true if ownership was obtained for the current fiber, false otherwise. Throws: lock_error Error Conditions: resource_deadlock_would_occur: if boost::this_fiber::get_id() already owns the mutex. Member function unlock() void unlock(); Precondition: The current fiber owns *this. Effects: Releases a lock on *this by the current fiber. Throws: lock_error Error Conditions: operation_not_permitted: if boost::this_fiber::get_id() does not own the mutex. 
Class timed_mutex #include <boost/fiber/timed_mutex.hpp> namespace boost { namespace fibers { class timed_mutex { public: timed_mutex(); ~timed_mutex(); timed_mutex( timed_mutex const& other) = delete; timed_mutex & operator=( timed_mutex const& other) = delete; void lock(); bool try_lock(); void unlock(); template< typename Clock, typename Duration > bool try_lock_until( std::chrono::time_point< Clock, Duration > const& timeout_time); template< typename Rep, typename Period > bool try_lock_for( std::chrono::duration< Rep, Period > const& timeout_duration); }; }} timed_mutex provides an exclusive-ownership mutex. At most one fiber can own the lock on a given instance of timed_mutex at any time. Multiple concurrent calls to lock(), try_lock(), try_lock_until(), try_lock_for() and unlock() shall be permitted. Member function lock() void lock(); Precondition: The calling fiber doesn't own the mutex. Effects: The current fiber blocks until ownership can be obtained. Throws: lock_error Error Conditions: resource_deadlock_would_occur: if boost::this_fiber::get_id() already owns the mutex. Member function try_lock() bool try_lock(); Precondition: The calling fiber doesn't own the mutex. Effects: Attempt to obtain ownership for the current fiber without blocking. Returns: true if ownership was obtained for the current fiber, false otherwise. Throws: lock_error Error Conditions: resource_deadlock_would_occur: if boost::this_fiber::get_id() already owns the mutex. Member function unlock() void unlock(); Precondition: The current fiber owns *this. Effects: Releases a lock on *this by the current fiber. Throws: lock_error Error Conditions: operation_not_permitted: if boost::this_fiber::get_id() does not own the mutex. Templated member function try_lock_until() template< typename Clock, typename Duration > bool try_lock_until( std::chrono::time_point< Clock, Duration > const& timeout_time); Precondition: The calling fiber doesn't own the mutex. 
Effects: Attempt to obtain ownership for the current fiber. Blocks until ownership can be obtained, or the specified time is reached. If the specified time has already passed, behaves as timed_mutex::try_lock(). Returns: true if ownership was obtained for the current fiber, false otherwise. Throws: lock_error, timeout-related exceptions. Error Conditions: resource_deadlock_would_occur: if boost::this_fiber::get_id() already owns the mutex. Templated member function try_lock_for() template< typename Rep, typename Period > bool try_lock_for( std::chrono::duration< Rep, Period > const& timeout_duration); Precondition: The calling fiber doesn't own the mutex. Effects: Attempt to obtain ownership for the current fiber. Blocks until ownership can be obtained, or the specified time is reached. If the specified time has already passed, behaves as timed_mutex::try_lock(). Returns: true if ownership was obtained for the current fiber, false otherwise. Throws: lock_error, timeout-related exceptions. Error Conditions: resource_deadlock_would_occur: if boost::this_fiber::get_id() already owns the mutex. Class recursive_mutex #include <boost/fiber/recursive_mutex.hpp> namespace boost { namespace fibers { class recursive_mutex { public: recursive_mutex(); ~recursive_mutex(); recursive_mutex( recursive_mutex const& other) = delete; recursive_mutex & operator=( recursive_mutex const& other) = delete; void lock(); bool try_lock() noexcept; void unlock(); }; }} recursive_mutex provides an exclusive-ownership recursive mutex. At most one fiber can own the lock on a given instance of recursive_mutex at any time. Multiple concurrent calls to lock(), try_lock() and unlock() shall be permitted. A fiber that already has exclusive ownership of a given recursive_mutex instance can call lock() or try_lock() to acquire an additional level of ownership of the mutex. 
unlock() must be called once for each level of ownership acquired by a single fiber before ownership can be acquired by another fiber. Member function lock() void lock(); Effects: The current fiber blocks until ownership can be obtained. Throws: Nothing Member function try_lock() bool try_lock() noexcept; Effects: Attempt to obtain ownership for the current fiber without blocking. Returns: true if ownership was obtained for the current fiber, false otherwise. Throws: Nothing. Member function unlock() void unlock(); Effects: Releases a lock on *this by the current fiber. Throws: lock_error Error Conditions: operation_not_permitted: if boost::this_fiber::get_id() does not own the mutex. Class recursive_timed_mutex #include <boost/fiber/recursive_timed_mutex.hpp> namespace boost { namespace fibers { class recursive_timed_mutex { public: recursive_timed_mutex(); ~recursive_timed_mutex(); recursive_timed_mutex( recursive_timed_mutex const& other) = delete; recursive_timed_mutex & operator=( recursive_timed_mutex const& other) = delete; void lock(); bool try_lock() noexcept; void unlock(); template< typename Clock, typename Duration > bool try_lock_until( std::chrono::time_point< Clock, Duration > const& timeout_time); template< typename Rep, typename Period > bool try_lock_for( std::chrono::duration< Rep, Period > const& timeout_duration); }; }} recursive_timed_mutex provides an exclusive-ownership recursive mutex. At most one fiber can own the lock on a given instance of recursive_timed_mutex at any time. Multiple concurrent calls to lock(), try_lock(), try_lock_for(), try_lock_until() and unlock() shall be permitted. A fiber that already has exclusive ownership of a given recursive_timed_mutex instance can call lock(), try_lock(), try_lock_for() or try_lock_until() to acquire an additional level of ownership of the mutex. unlock() must be called once for each level of ownership acquired by a single fiber before ownership can be acquired by another fiber. 
Member function lock() void lock(); Effects: The current fiber blocks until ownership can be obtained. Throws: Nothing Member function try_lock() bool try_lock() noexcept; Effects: Attempt to obtain ownership for the current fiber without blocking. Returns: true if ownership was obtained for the current fiber, false otherwise. Throws: Nothing. Member function unlock() void unlock(); Effects: Releases a lock on *this by the current fiber. Throws: lock_error Error Conditions: operation_not_permitted: if boost::this_fiber::get_id() does not own the mutex. Templated member function try_lock_until() template< typename Clock, typename Duration > bool try_lock_until( std::chrono::time_point< Clock, Duration > const& timeout_time); Effects: Attempt to obtain ownership for the current fiber. Blocks until ownership can be obtained, or the specified time is reached. If the specified time has already passed, behaves as recursive_timed_mutex::try_lock(). Returns: true if ownership was obtained for the current fiber, false otherwise. Throws: Timeout-related exceptions. Templated member function try_lock_for() template< typename Rep, typename Period > bool try_lock_for( std::chrono::duration< Rep, Period > const& timeout_duration); Effects: Attempt to obtain ownership for the current fiber. Blocks until ownership can be obtained, or the specified time is reached. If the specified time has already passed, behaves as recursive_timed_mutex::try_lock(). Returns: true if ownership was obtained for the current fiber, false otherwise. Throws: Timeout-related exceptions.
<link linkend="fiber.synchronization.conditions">Condition Variables</link> Synopsis enum class cv_status { no_timeout, timeout }; class condition_variable; class condition_variable_any; The class condition_variable provides a mechanism for a fiber to wait for notification from another fiber. When the fiber awakens from the wait, it checks whether the appropriate condition is now true and continues if so. If the condition is not true, the fiber calls wait again to resume waiting. In the simplest case, this condition is just a boolean variable: boost::fibers::condition_variable cond; boost::fibers::mutex mtx; bool data_ready = false; void process_data(); void wait_for_data_to_process() { { std::unique_lock< boost::fibers::mutex > lk( mtx); while ( ! data_ready) { cond.wait( lk); } } // release lk process_data(); } Notice that lk is passed to condition_variable::wait(): wait() will atomically add the fiber to the set of fibers waiting on the condition variable and unlock the mutex. When the fiber is awakened, the mutex will be locked again before the call to wait() returns. This allows other fibers to acquire the mutex in order to update the shared data, and ensures that the data associated with the condition is correctly synchronized. wait_for_data_to_process() could equivalently be written: void wait_for_data_to_process() { { std::unique_lock< boost::fibers::mutex > lk( mtx); // make condition_variable::wait() perform the loop cond.wait( lk, [](){ return data_ready; }); } // release lk process_data(); } In the meantime, another fiber sets data_ready to true, and then calls either condition_variable::notify_one() or condition_variable::notify_all() on the condition_variable cond to wake one waiting fiber or all the waiting fibers respectively. 
void retrieve_data(); void prepare_data(); void prepare_data_for_processing() { retrieve_data(); prepare_data(); { std::unique_lock< boost::fibers::mutex > lk( mtx); data_ready = true; } cond.notify_one(); } Note that the same mutex is locked before the shared data is updated, but that the mutex does not have to be locked across the call to condition_variable::notify_one(). Locking is important because the synchronization objects provided by Boost.Fiber can be used to synchronize fibers running on different threads. Boost.Fiber provides both condition_variable and condition_variable_any. boost::fibers::condition_variable can only wait on std::unique_lock< boost::fibers::mutex > while boost::fibers::condition_variable_any can wait on user-defined lock types. No Spurious Wakeups Neither condition_variable nor condition_variable_any are subject to spurious wakeup: condition_variable::wait() can only wake up when condition_variable::notify_one() or condition_variable::notify_all() is called. Even so, it is prudent to use one of the wait( lock, predicate ) overloads. Consider a set of consumer fibers processing items from a std::queue. The queue is continually populated by a set of producer fibers. The consumer fibers might reasonably wait on a condition_variable as long as the queue remains empty(). Because producer fibers might push() items to the queue in bursts, they call condition_variable::notify_all() rather than condition_variable::notify_one(). But a given consumer fiber might well wake up from condition_variable::wait() and find the queue empty(), because other consumer fibers might already have processed all pending items. (See also spurious wakeup.) Enumeration cv_status A timed wait operation might return because of timeout or not. enum class cv_status { no_timeout, timeout }; no_timeout Effects: The condition variable was awakened with notify_one or notify_all. timeout Effects: The condition variable was awakened by timeout. 
Class condition_variable_any #include <boost/fiber/condition_variable.hpp> namespace boost { namespace fibers { class condition_variable_any { public: condition_variable_any(); ~condition_variable_any(); condition_variable_any( condition_variable_any const&) = delete; condition_variable_any & operator=( condition_variable_any const&) = delete; void notify_one() noexcept; void notify_all() noexcept; template< typename LockType > void wait( LockType &); template< typename LockType, typename Pred > void wait( LockType &, Pred); template< typename LockType, typename Clock, typename Duration > cv_status wait_until( LockType &, std::chrono::time_point< Clock, Duration > const&); template< typename LockType, typename Clock, typename Duration, typename Pred > bool wait_until( LockType &, std::chrono::time_point< Clock, Duration > const&, Pred); template< typename LockType, typename Rep, typename Period > cv_status wait_for( LockType &, std::chrono::duration< Rep, Period > const&); template< typename LockType, typename Rep, typename Period, typename Pred > bool wait_for( LockType &, std::chrono::duration< Rep, Period > const&, Pred); }; }} Constructor condition_variable_any() Effects: Creates the object. Throws: Nothing. Destructor ~condition_variable_any() Precondition: All fibers waiting on *this have been notified by a call to notify_one or notify_all (though the respective calls to wait, wait_for or wait_until need not have returned). Effects: Destroys the object. Member function notify_one() void notify_one() noexcept; Effects: If any fibers are currently blocked waiting on *this in a call to wait, wait_for or wait_until, unblocks one of those fibers. Throws: Nothing. Note: It is arbitrary which waiting fiber is resumed. Member function notify_all() void notify_all() noexcept; Effects: If any fibers are currently blocked waiting on *this in a call to wait, wait_for or wait_until, unblocks all of those fibers. Throws: Nothing. 
Note: This is why a waiting fiber must also check for the desired program state using a mechanism external to the condition_variable_any, and retry the wait until that state is reached. A fiber waiting on a condition_variable_any might well wake up a number of times before the desired state is reached. Templated member function wait() template< typename LockType > void wait( LockType & lk); template< typename LockType, typename Pred > void wait( LockType & lk, Pred pred); Precondition: lk is locked by the current fiber, and either no other fiber is currently waiting on *this, or the execution of the mutex() member function on the lk objects supplied in the calls to wait in all the fibers currently waiting on *this would return the same value as lk.mutex() for this call to wait. Effects: Atomically calls lk.unlock() and blocks the current fiber. The fiber will unblock when notified by a call to this->notify_one() or this->notify_all(). When the fiber is unblocked (for whatever reason), the lock is reacquired by invoking lk.lock() before the call to wait returns. The lock is also reacquired by invoking lk.lock() if the function exits with an exception. The member function accepting pred is shorthand for: while ( ! pred() ) { wait( lk); } Postcondition: lk is locked by the current fiber. Throws: fiber_error if an error occurs. Note: The Precondition is a bit dense. It merely states that all the fibers concurrently calling wait on *this must wait on lk objects governing the same mutex. Three distinct objects are involved in any condition_variable_any::wait() call: the condition_variable_any itself, the mutex coordinating access between fibers and a local lock object (e.g. std::unique_lock). In general, you can partition the lifespan of a given condition_variable_any instance into periods with one or more fibers waiting on it, separated by periods when no fibers are waiting on it. 
When more than one fiber is waiting on that condition_variable_any, all must pass lock objects referencing the same mutex instance. Templated member function wait_until() template< typename LockType, typename Clock, typename Duration > cv_status wait_until( LockType & lk, std::chrono::time_point< Clock, Duration > const& abs_time); template< typename LockType, typename Clock, typename Duration, typename Pred > bool wait_until( LockType & lk, std::chrono::time_point< Clock, Duration > const& abs_time, Pred pred); Precondition: lk is locked by the current fiber, and either no other fiber is currently waiting on *this, or the execution of the mutex() member function on the lk objects supplied in the calls to wait, wait_for or wait_until in all the fibers currently waiting on *this would return the same value as lk.mutex() for this call to wait_until. Effects: Atomically calls lk.unlock() and blocks the current fiber. The fiber will unblock when notified by a call to this->notify_one() or this->notify_all(), or when the system time is equal to or later than the specified abs_time. When the fiber is unblocked (for whatever reason), the lock is reacquired by invoking lk.lock() before the call to wait_until returns. The lock is also reacquired by invoking lk.lock() if the function exits with an exception. The member function accepting pred is shorthand for: while ( ! pred() ) { if ( cv_status::timeout == wait_until( lk, abs_time) ) return pred(); } return true; That is, even if wait_until() times out, it can still return true if pred() returns true at that time. Postcondition: lk is locked by the current fiber. Throws: fiber_error if an error occurs or timeout-related exceptions. Returns: The overload without pred returns cv_status::no_timeout if awakened by notify_one() or notify_all(), or cv_status::timeout if awakened because the system time is past abs_time. 
Returns: The overload accepting pred returns false if the call is returning because the time specified by abs_time was reached and the predicate returns false, true otherwise. Note: See Note for condition_variable_any::wait(). Templated member function wait_for() template< typename LockType, typename Rep, typename Period > cv_status wait_for( LockType & lk, std::chrono::duration< Rep, Period > const& rel_time); template< typename LockType, typename Rep, typename Period, typename Pred > bool wait_for( LockType & lk, std::chrono::duration< Rep, Period > const& rel_time, Pred pred); Precondition: lk is locked by the current fiber, and either no other fiber is currently waiting on *this, or the execution of the mutex() member function on the lk objects supplied in the calls to wait, wait_for or wait_until in all the fibers currently waiting on *this would return the same value as lk.mutex() for this call to wait_for. Effects: Atomically calls lk.unlock() and blocks the current fiber. The fiber will unblock when notified by a call to this->notify_one() or this->notify_all(), or when a time interval equal to or greater than the specified rel_time has elapsed. When the fiber is unblocked (for whatever reason), the lock is reacquired by invoking lk.lock() before the call to wait_for returns. The lock is also reacquired by invoking lk.lock() if the function exits with an exception. The wait_for() member function accepting pred is shorthand for: while ( ! pred() ) { if ( cv_status::timeout == wait_for( lk, rel_time) ) { return pred(); } } return true; (except of course that rel_time is adjusted for each iteration). The point is that, even if wait_for() times out, it can still return true if pred() returns true at that time. Postcondition: lk is locked by the current fiber. Throws: fiber_error if an error occurs or timeout-related exceptions. 
Returns: The overload without pred returns cv_status::no_timeout if awakened by notify_one() or notify_all(), or cv_status::timeout if awakened because at least rel_time has elapsed. Returns: The overload accepting pred returns false if the call is returning because at least rel_time has elapsed and the predicate returns false, true otherwise. Note: See Note for condition_variable_any::wait(). Class condition_variable #include <boost/fiber/condition_variable.hpp> namespace boost { namespace fibers { class condition_variable { public: condition_variable(); ~condition_variable(); condition_variable( condition_variable const&) = delete; condition_variable & operator=( condition_variable const&) = delete; void notify_one() noexcept; void notify_all() noexcept; void wait( std::unique_lock< mutex > &); template< typename Pred > void wait( std::unique_lock< mutex > &, Pred); template< typename Clock, typename Duration > cv_status wait_until( std::unique_lock< mutex > &, std::chrono::time_point< Clock, Duration > const&); template< typename Clock, typename Duration, typename Pred > bool wait_until( std::unique_lock< mutex > &, std::chrono::time_point< Clock, Duration > const&, Pred); template< typename Rep, typename Period > cv_status wait_for( std::unique_lock< mutex > &, std::chrono::duration< Rep, Period > const&); template< typename Rep, typename Period, typename Pred > bool wait_for( std::unique_lock< mutex > &, std::chrono::duration< Rep, Period > const&, Pred); }; }} Constructor condition_variable() Effects: Creates the object. Throws: Nothing. Destructor ~condition_variable() Precondition: All fibers waiting on *this have been notified by a call to notify_one or notify_all (though the respective calls to wait, wait_for or wait_until need not have returned). Effects: Destroys the object. 
Member function notify_one() void notify_one() noexcept; Effects: If any fibers are currently blocked waiting on *this in a call to wait, wait_for or wait_until, unblocks one of those fibers. Throws: Nothing. Note: It is arbitrary which waiting fiber is resumed. Member function notify_all() void notify_all() noexcept; Effects: If any fibers are currently blocked waiting on *this in a call to wait, wait_for or wait_until, unblocks all of those fibers. Throws: Nothing. Note: This is why a waiting fiber must also check for the desired program state using a mechanism external to the condition_variable, and retry the wait until that state is reached. A fiber waiting on a condition_variable might well wake up a number of times before the desired state is reached. Templated member function wait() void wait( std::unique_lock< mutex > & lk); template< typename Pred > void wait( std::unique_lock< mutex > & lk, Pred pred); Precondition: lk is locked by the current fiber, and either no other fiber is currently waiting on *this, or the execution of the mutex() member function on the lk objects supplied in the calls to wait in all the fibers currently waiting on *this would return the same value as lk.mutex() for this call to wait. Effects: Atomically calls lk.unlock() and blocks the current fiber. The fiber will unblock when notified by a call to this->notify_one() or this->notify_all(). When the fiber is unblocked (for whatever reason), the lock is reacquired by invoking lk.lock() before the call to wait returns. The lock is also reacquired by invoking lk.lock() if the function exits with an exception. The member function accepting pred is shorthand for: while ( ! pred() ) { wait( lk); } Postcondition: lk is locked by the current fiber. Throws: fiber_error if an error occurs. Note: The Precondition is a bit dense. It merely states that all the fibers concurrently calling wait on *this must wait on lk objects governing the same mutex. 
Three distinct objects are involved in any condition_variable::wait() call: the condition_variable itself, the mutex coordinating access between fibers and a local lock object (e.g. std::unique_lock). In general, you can partition the lifespan of a given condition_variable instance into periods with one or more fibers waiting on it, separated by periods when no fibers are waiting on it. When more than one fiber is waiting on that condition_variable, all must pass lock objects referencing the same mutex instance. Templated member function wait_until() template< typename Clock, typename Duration > cv_status wait_until( std::unique_lock< mutex > & lk, std::chrono::time_point< Clock, Duration > const& abs_time); template< typename Clock, typename Duration, typename Pred > bool wait_until( std::unique_lock< mutex > & lk, std::chrono::time_point< Clock, Duration > const& abs_time, Pred pred); Precondition: lk is locked by the current fiber, and either no other fiber is currently waiting on *this, or the execution of the mutex() member function on the lk objects supplied in the calls to wait, wait_for or wait_until in all the fibers currently waiting on *this would return the same value as lk.mutex() for this call to wait_until. Effects: Atomically calls lk.unlock() and blocks the current fiber. The fiber will unblock when notified by a call to this->notify_one() or this->notify_all(), or when the system time is equal to or later than the specified abs_time. When the fiber is unblocked (for whatever reason), the lock is reacquired by invoking lk.lock() before the call to wait_until returns. The lock is also reacquired by invoking lk.lock() if the function exits with an exception. The member function accepting pred is shorthand for: while ( ! pred() ) { if ( cv_status::timeout == wait_until( lk, abs_time) ) return pred(); } return true; That is, even if wait_until() times out, it can still return true if pred() returns true at that time. 
Postcondition: lk is locked by the current fiber. Throws: fiber_error if an error occurs or timeout-related exceptions. Returns: The overload without pred returns cv_status::no_timeout if awakened by notify_one() or notify_all(), or cv_status::timeout if awakened because the system time is past abs_time. Returns: The overload accepting pred returns false if the call is returning because the time specified by abs_time was reached and the predicate returns false, true otherwise. Note: See Note for condition_variable::wait(). Templated member function wait_for() template< typename Rep, typename Period > cv_status wait_for( std::unique_lock< mutex > & lk, std::chrono::duration< Rep, Period > const& rel_time); template< typename Rep, typename Period, typename Pred > bool wait_for( std::unique_lock< mutex > & lk, std::chrono::duration< Rep, Period > const& rel_time, Pred pred); Precondition: lk is locked by the current fiber, and either no other fiber is currently waiting on *this, or the execution of the mutex() member function on the lk objects supplied in the calls to wait, wait_for or wait_until in all the fibers currently waiting on *this would return the same value as lk.mutex() for this call to wait_for. Effects: Atomically calls lk.unlock() and blocks the current fiber. The fiber will unblock when notified by a call to this->notify_one() or this->notify_all(), or when a time interval equal to or greater than the specified rel_time has elapsed. When the fiber is unblocked (for whatever reason), the lock is reacquired by invoking lk.lock() before the call to wait_for returns. The lock is also reacquired by invoking lk.lock() if the function exits with an exception. The wait_for() member function accepting pred is shorthand for: while ( ! pred() ) { if ( cv_status::timeout == wait_for( lk, rel_time) ) { return pred(); } } return true; (except of course that rel_time is adjusted for each iteration). 
The point is that, even if wait_for() times out, it can still return true if pred() returns true at that time. Postcondition: lk is locked by the current fiber. Throws: fiber_error if an error occurs or timeout-related exceptions. Returns: The overload without pred returns cv_status::no_timeout if awakened by notify_one() or notify_all(), or cv_status::timeout if awakened because at least rel_time has elapsed. Returns: The overload accepting pred returns false if the call is returning because at least rel_time has elapsed and the predicate returns false, true otherwise. Note: See Note for condition_variable::wait().
<link linkend="fiber.synchronization.barriers">Barriers</link> A barrier, also known as a rendezvous, is a synchronization point between multiple contexts of execution (fibers). The barrier is configured for a particular number of fibers (n), and as fibers reach the barrier they must wait until all n fibers have arrived. Once the n-th fiber has reached the barrier, all the waiting fibers can proceed, and the barrier is reset. The fact that the barrier automatically resets is significant. Consider a case in which you launch some number of fibers and want to wait only until the first of them has completed. You might be tempted to use a barrier(2) as the synchronization mechanism, making each new fiber call its barrier::wait() method, then calling wait() in the launching fiber to wait until the first other fiber completes. That will in fact unblock the launching fiber. The unfortunate part is that it will continue blocking the remaining fibers. Consider the following scenario:

1. Fiber main launches fibers A, B, C and D, then calls barrier::wait().
2. Fiber C finishes first and likewise calls barrier::wait(). Fiber main is unblocked, as desired.
3. Fiber B calls barrier::wait(). Fiber B is blocked!
4. Fiber A calls barrier::wait(). Fibers A and B are unblocked.
5. Fiber D calls barrier::wait(). Fiber D is blocked indefinitely.

(See also when_any, simple completion.) It is unwise to tie the lifespan of a barrier to any one of its participating fibers. Although conceptually all waiting fibers awaken simultaneously, because of the nature of fibers, in practice they will awaken one by one in indeterminate order. (The current implementation wakes fibers in FIFO order: the first to call wait() wakes first, and so forth. But it is perilous to rely on the order in which the various fibers will reach the wait() call.) The rest of the waiting fibers will still be blocked in wait(), which must, before returning, access data members in the barrier object. 
Class barrier #include <boost/fiber/barrier.hpp> namespace boost { namespace fibers { class barrier { public: explicit barrier( std::size_t); barrier( barrier const&) = delete; barrier & operator=( barrier const&) = delete; bool wait(); }; }} Instances of barrier are not copyable or movable. Constructor explicit barrier( std::size_t initial); Effects: Construct a barrier for initial fibers. Throws: fiber_error Error Conditions: invalid_argument: if initial is zero. Member function wait() bool wait(); Effects: Block until initial fibers have called wait on *this. When the initial-th fiber calls wait, all waiting fibers are unblocked, and the barrier is reset. Returns: true for exactly one fiber from each batch of waiting fibers, false otherwise. Throws: fiber_error
<link linkend="fiber.synchronization.channels">Channels</link> A channel is a model to communicate and synchronize threads of execution (the smallest ordered sequence of instructions that can be managed independently by a scheduler) via message passing. Enumeration channel_op_status Channel operations return the state of the channel. enum class channel_op_status { success, empty, full, closed, timeout }; success Effects: Operation was successful. empty Effects: channel is empty, operation failed. full Effects: channel is full, operation failed. closed Effects: channel is closed, operation failed. timeout Effects: The operation did not become ready before the specified timeout elapsed.
<link linkend="fiber.synchronization.channels.buffered_channel">Buffered Channel</link> Boost.Fiber provides a bounded, buffered channel (MPMC queue) suitable to synchronize fibers (running on the same or different threads) via asynchronous message passing. typedef boost::fibers::buffered_channel< int > channel_t; void send( channel_t & chan) { for ( int i = 0; i < 5; ++i) { chan.push( i); } chan.close(); } void recv( channel_t & chan) { int i; while ( boost::fibers::channel_op_status::success == chan.pop(i) ) { std::cout << "received " << i << std::endl; } } channel_t chan{ 2 }; boost::fibers::fiber f1( std::bind( send, std::ref( chan) ) ); boost::fibers::fiber f2( std::bind( recv, std::ref( chan) ) ); f1.join(); f2.join(); Class buffered_channel supports range-for syntax: typedef boost::fibers::buffered_channel< int > channel_t; void foo( channel_t & chan) { chan.push( 1); chan.push( 1); chan.push( 2); chan.push( 3); chan.push( 5); chan.push( 8); chan.push( 12); chan.close(); } void bar( channel_t & chan) { for ( unsigned int value : chan) { std::cout << value << " "; } std::cout << std::endl; } Template buffered_channel<> #include <boost/fiber/buffered_channel.hpp> namespace boost { namespace fibers { template< typename T > class buffered_channel { public: typedef T value_type; class iterator; explicit buffered_channel( std::size_t capacity); buffered_channel( buffered_channel const& other) = delete; buffered_channel & operator=( buffered_channel const& other) = delete; void close() noexcept; channel_op_status push( value_type const& va); channel_op_status push( value_type && va); template< typename Rep, typename Period > channel_op_status push_wait_for( value_type const& va, std::chrono::duration< Rep, Period > const& timeout_duration); template< typename Rep, typename Period > channel_op_status push_wait_for( value_type && va, std::chrono::duration< Rep, Period > const& timeout_duration); template< typename Clock, typename Duration > channel_op_status push_wait_until( value_type const& va, 
std::chrono::time_point< Clock, Duration > const& timeout_time); template< typename Clock, typename Duration > channel_op_status push_wait_until( value_type && va, std::chrono::time_point< Clock, Duration > const& timeout_time); channel_op_status try_push( value_type const& va); channel_op_status try_push( value_type && va); channel_op_status pop( value_type & va); value_type value_pop(); template< typename Rep, typename Period > channel_op_status pop_wait_for( value_type & va, std::chrono::duration< Rep, Period > const& timeout_duration); template< typename Clock, typename Duration > channel_op_status pop_wait_until( value_type & va, std::chrono::time_point< Clock, Duration > const& timeout_time); channel_op_status try_pop( value_type & va); }; template< typename T > buffered_channel< T >::iterator begin( buffered_channel< T > & chan); template< typename T > buffered_channel< T >::iterator end( buffered_channel< T > & chan); }} Constructor explicit buffered_channel( std::size_t capacity); Preconditions: 2<=capacity && 0==(capacity & (capacity-1)) Effects: The constructor constructs an object of class buffered_channel with an internal buffer of size capacity. Throws: fiber_error Error Conditions: invalid_argument: if 0==capacity || 0!=(capacity & (capacity-1)). Notes: Because one slot of the internal ring buffer is reserved, the channel can hold at most capacity - 1 elements; with that many elements it is considered full, and a push(), push_wait_for() or push_wait_until() blocks only while the channel is full. Member function close() void close() noexcept; Effects: Deactivates the channel. No values can be put after calling this->close(). Fibers blocked in this->pop(), this->pop_wait_for() or this->pop_wait_until() will return closed. Fibers blocked in this->value_pop() will receive an exception. Throws: Nothing. Note: close() is like closing a pipe. It informs waiting consumers that no more values will arrive. 
Member function push() channel_op_status push( value_type const& va); channel_op_status push( value_type && va); Effects: If channel is closed, returns closed. Otherwise enqueues the value in the channel, wakes up a fiber blocked on this->pop(), this->value_pop(), this->pop_wait_for() or this->pop_wait_until() and returns success. If the channel is full, the fiber is blocked. Throws: Exceptions thrown by copy- or move-operations. Member function try_push() channel_op_status try_push( value_type const& va); channel_op_status try_push( value_type && va); Effects: If channel is closed, returns closed. Otherwise enqueues the value in the channel, wakes up a fiber blocked on this->pop(), this->value_pop(), this->pop_wait_for() or this->pop_wait_until() and returns success. If the channel is full, it doesn't block and returns full. Throws: Exceptions thrown by copy- or move-operations. Member function pop() channel_op_status pop( value_type & va); Effects: Dequeues a value from the channel. If the channel is empty, the fiber gets suspended until at least one new item is push()ed (return value success and va contains dequeued value) or the channel gets close()d (return value closed). Throws: Exceptions thrown by copy- or move-operations. Member function value_pop() value_type value_pop(); Effects: Dequeues a value from the channel. If the channel is empty, the fiber gets suspended until at least one new item is push()ed or the channel gets close()d (which throws an exception). Throws: fiber_error if *this is closed or by copy- or move-operations. Error conditions: std::errc::operation_not_permitted Member function try_pop() channel_op_status try_pop( value_type & va); Effects: If channel is empty, returns empty. If channel is closed, returns closed. Otherwise it returns success and va contains the dequeued value. Throws: Exceptions thrown by copy- or move-operations. 
Member function pop_wait_for() template< typename Rep, typename Period > channel_op_status pop_wait_for( value_type & va, std::chrono::duration< Rep, Period > const& timeout_duration) Effects: Accepts std::chrono::duration and internally computes a timeout time as (system time + timeout_duration). If channel is not empty, immediately dequeues a value from the channel. Otherwise the fiber gets suspended until at least one new item is push()ed (return value success and va contains dequeued value), or the channel gets close()d (return value closed), or the system time reaches the computed timeout time (return value timeout). Throws: timeout-related exceptions or by copy- or move-operations. Member function pop_wait_until() template< typename Clock, typename Duration > channel_op_status pop_wait_until( value_type & va, std::chrono::time_point< Clock, Duration > const& timeout_time) Effects: Accepts a std::chrono::time_point< Clock, Duration >. If channel is not empty, immediately dequeues a value from the channel. Otherwise the fiber gets suspended until at least one new item is push()ed (return value success and va contains dequeued value), or the channel gets close()d (return value closed), or the system time reaches the passed time_point (return value timeout). Throws: timeout-related exceptions or by copy- or move-operations. Non-member function begin( buffered_channel< T > &) template< typename T > buffered_channel< T >::iterator begin( buffered_channel< T > &); Returns: Returns a range-iterator (input-iterator). Non-member function end( buffered_channel< T > &) template< typename T > buffered_channel< T >::iterator end( buffered_channel< T > &); Returns: Returns an end range-iterator (input-iterator).
<link linkend="fiber.synchronization.channels.unbuffered_channel">Unbuffered Channel</link> Boost.Fiber provides the template unbuffered_channel, suitable for synchronizing fibers (running on the same or different threads) via synchronous message passing. A fiber waiting to consume a value will block until the value is produced. If a fiber attempts to send a value through an unbuffered channel and no fiber is waiting to receive the value, the channel will block the sending fiber. The unbuffered channel acts as a rendezvous point. typedef boost::fibers::unbuffered_channel< int > channel_t; void send( channel_t & chan) { for ( int i = 0; i < 5; ++i) { chan.push( i); } chan.close(); } void recv( channel_t & chan) { int i; while ( boost::fibers::channel_op_status::success == chan.pop(i) ) { std::cout << "received " << i << std::endl; } } channel_t chan; boost::fibers::fiber f1( std::bind( send, std::ref( chan) ) ); boost::fibers::fiber f2( std::bind( recv, std::ref( chan) ) ); f1.join(); f2.join(); Range-for syntax is supported: typedef boost::fibers::unbuffered_channel< int > channel_t; void foo( channel_t & chan) { chan.push( 1); chan.push( 1); chan.push( 2); chan.push( 3); chan.push( 5); chan.push( 8); chan.push( 12); chan.close(); } void bar( channel_t & chan) { for ( unsigned int value : chan) { std::cout << value << " "; } std::cout << std::endl; } Template unbuffered_channel<> #include <boost/fiber/unbuffered_channel.hpp> namespace boost { namespace fibers { template< typename T > class unbuffered_channel { public: typedef T value_type; class iterator; unbuffered_channel(); unbuffered_channel( unbuffered_channel const& other) = delete; unbuffered_channel & operator=( unbuffered_channel const& other) = delete; void close() noexcept; channel_op_status push( value_type const& va); channel_op_status push( value_type && va); template< typename Rep, typename Period > channel_op_status push_wait_for( value_type const& va, std::chrono::duration< Rep, Period > const& 
timeout_duration); template< typename Rep, typename Period > channel_op_status push_wait_for( value_type && va, std::chrono::duration< Rep, Period > const& timeout_duration); template< typename Clock, typename Duration > channel_op_status push_wait_until( value_type const& va, std::chrono::time_point< Clock, Duration > const& timeout_time); template< typename Clock, typename Duration > channel_op_status push_wait_until( value_type && va, std::chrono::time_point< Clock, Duration > const& timeout_time); channel_op_status pop( value_type & va); value_type value_pop(); template< typename Rep, typename Period > channel_op_status pop_wait_for( value_type & va, std::chrono::duration< Rep, Period > const& timeout_duration); template< typename Clock, typename Duration > channel_op_status pop_wait_until( value_type & va, std::chrono::time_point< Clock, Duration > const& timeout_time); }; template< typename T > unbuffered_channel< T >::iterator begin( unbuffered_channel< T > & chan); template< typename T > unbuffered_channel< T >::iterator end( unbuffered_channel< T > & chan); }} Constructor unbuffered_channel(); Effects: The constructor constructs an object of class unbuffered_channel. Member function close() void close() noexcept; Effects: Deactivates the channel. No values can be put after calling this->close(). Fibers blocked in this->pop(), this->pop_wait_for() or this->pop_wait_until() will return closed. Fibers blocked in this->value_pop() will receive an exception. Throws: Nothing. Note: close() is like closing a pipe. It informs waiting consumers that no more values will arrive. Member function push() channel_op_status push( value_type const& va); channel_op_status push( value_type && va); Effects: If channel is closed, returns closed. Otherwise enqueues the value in the channel, wakes up a fiber blocked on this->pop(), this->value_pop(), this->pop_wait_for() or this->pop_wait_until() and returns success. Throws: Exceptions thrown by copy- or move-operations. 
Member function pop() channel_op_status pop( value_type & va); Effects: Dequeues a value from the channel. If the channel is empty, the fiber gets suspended until at least one new item is push()ed (return value success and va contains dequeued value) or the channel gets close()d (return value closed). Throws: Exceptions thrown by copy- or move-operations. Member function value_pop() value_type value_pop(); Effects: Dequeues a value from the channel. If the channel is empty, the fiber gets suspended until at least one new item is push()ed or the channel gets close()d (which throws an exception). Throws: fiber_error if *this is closed or by copy- or move-operations. Error conditions: std::errc::operation_not_permitted Member function pop_wait_for() template< typename Rep, typename Period > channel_op_status pop_wait_for( value_type & va, std::chrono::duration< Rep, Period > const& timeout_duration) Effects: Accepts std::chrono::duration and internally computes a timeout time as (system time + timeout_duration). If channel is not empty, immediately dequeues a value from the channel. Otherwise the fiber gets suspended until at least one new item is push()ed (return value success and va contains dequeued value), or the channel gets close()d (return value closed), or the system time reaches the computed timeout time (return value timeout). Throws: timeout-related exceptions or by copy- or move-operations. Member function pop_wait_until() template< typename Clock, typename Duration > channel_op_status pop_wait_until( value_type & va, std::chrono::time_point< Clock, Duration > const& timeout_time) Effects: Accepts a std::chrono::time_point< Clock, Duration >. If channel is not empty, immediately dequeues a value from the channel. 
Otherwise the fiber gets suspended until at least one new item is push()ed (return value success and va contains dequeued value), or the channel gets close()d (return value closed), or the system time reaches the passed time_point (return value timeout). Throws: timeout-related exceptions or by copy- or move-operations. Non-member function begin( unbuffered_channel< T > &) template< typename T > unbuffered_channel< T >::iterator begin( unbuffered_channel< T > &); Returns: Returns a range-iterator (input-iterator). Non-member function end( unbuffered_channel< T > &) template< typename T > unbuffered_channel< T >::iterator end( unbuffered_channel< T > &); Returns: Returns an end range-iterator (input-iterator).
<link linkend="fiber.synchronization.futures">Futures</link> Overview The futures library provides a means of handling asynchronous future values, whether those values are generated by another fiber, or on a single fiber in response to external stimuli, or on-demand. This is done through the provision of four class templates: future<> and shared_future<> which are used to retrieve the asynchronous results, and promise<> and packaged_task<> which are used to generate the asynchronous results. An instance of future<> holds the one and only reference to a result. Ownership can be transferred between instances using the move constructor or move-assignment operator, but at most one instance holds a reference to a given asynchronous result. When the result is ready, it is returned from future::get() by rvalue-reference to allow the result to be moved or copied as appropriate for the type. On the other hand, many instances of shared_future<> may reference the same result. Instances can be freely copied and assigned, and shared_future::get() returns a const reference so that multiple calls to shared_future::get() are safe. You can move an instance of future<> into an instance of shared_future<>, thus transferring ownership of the associated asynchronous result, but not vice-versa. fibers::async() is a simple way of running asynchronous tasks. A call to async() spawns a fiber and returns a future<> that will deliver the result of the fiber function. Creating asynchronous values You can set the value in a future with either a promise<> or a packaged_task<>. A packaged_task<> is a callable object with void return that wraps a function or callable object returning the specified type. When the packaged_task<> is invoked, it invokes the contained function in turn, and populates a future with the contained function's return value. This is an answer to the perennial question: How do I return a value from a fiber? 
Package the function you wish to run as a packaged_task<> and pass the packaged task to the fiber constructor. The future retrieved from the packaged task can then be used to obtain the return value. If the function throws an exception, that is stored in the future in place of the return value. int calculate_the_answer_to_life_the_universe_and_everything() { return 42; } boost::fibers::packaged_task<int()> pt(calculate_the_answer_to_life_the_universe_and_everything); boost::fibers::future<int> fi=pt.get_future(); boost::fibers::fiber(std::move(pt)).detach(); // launch task on a fiber fi.wait(); // wait for it to finish assert(fi.is_ready()); assert(fi.has_value()); assert(!fi.has_exception()); assert(fi.get()==42); A promise<> is a bit more low level: it just provides explicit functions to store a value or an exception in the associated future. A promise can therefore be used where the value might come from more than one possible source. boost::fibers::promise<int> pi; boost::fibers::future<int> fi; fi=pi.get_future(); pi.set_value(42); assert(fi.is_ready()); assert(fi.has_value()); assert(!fi.has_exception()); assert(fi.get()==42);
<link linkend="fiber.synchronization.futures.future">Future</link> A future provides a mechanism to access the result of an asynchronous operation. shared state Behind a promise<> and its future<> lies an unspecified object called their shared state. The shared state is what will actually hold the async result (or the exception). The shared state is instantiated along with the promise<>. Aside from its originating promise<>, a future<> holds a unique reference to a particular shared state. However, multiple shared_future<> instances can reference the same underlying shared state. As packaged_task<> and fibers::async() are implemented using promise<>, discussions of shared state apply to them as well. Enumeration future_status Timed wait-operations (future::wait_for() and future::wait_until()) return the state of the future. enum class future_status { ready, timeout, deferred // not supported yet }; ready Effects: The shared state is ready. timeout Effects: The shared state did not become ready before timeout has passed. Deferred futures are not supported. Template future<> A future<> contains a shared state which is not shared with any other future. 
#include <boost/fiber/future/future.hpp> namespace boost { namespace fibers { template< typename R > class future { public: future() noexcept; future( future const& other) = delete; future & operator=( future const& other) = delete; future( future && other) noexcept; future & operator=( future && other) noexcept; ~future(); bool valid() const noexcept; shared_future< R > share(); R get(); // member only of generic future template R & get(); // member only of future< R & > template specialization void get(); // member only of future< void > template specialization std::exception_ptr get_exception_ptr(); void wait() const; template< class Rep, class Period > future_status wait_for( std::chrono::duration< Rep, Period > const& timeout_duration) const; template< typename Clock, typename Duration > future_status wait_until( std::chrono::time_point< Clock, Duration > const& timeout_time) const; }; }} Default constructor future() noexcept; Effects: Creates a future with no shared state. After construction false == valid(). Throws: Nothing. Move constructor future( future && other) noexcept; Effects: Constructs a future with the shared state of other. After construction false == other.valid(). Throws: Nothing. Destructor ~future(); Effects: Destroys the future; ownership is abandoned. Note: ~future() does not block the calling fiber. Consider a sequence such as: instantiate promise<> obtain its future<> via promise::get_future() launch fiber, capturing promise<> destroy future<> call promise::set_value() The final set_value() call succeeds, but the value is silently discarded: no additional future<> can be obtained from that promise<>. Member function operator=() future & operator=( future && other) noexcept; Effects: Moves the shared state of other to this. After the assignment, false == other.valid(). Throws: Nothing. Member function valid() bool valid() const noexcept; Effects: Returns true if future contains a shared state. Throws: Nothing. 
Member function share() shared_future< R > share(); Effects: Move the state to a shared_future<>. Returns: a shared_future<> containing the shared state formerly belonging to *this. Postcondition: false == valid() Throws: future_error with error condition future_errc::no_state. Member function get() R get(); // member only of generic future template R & get(); // member only of future< R & > template specialization void get(); // member only of future< void > template specialization Precondition: true == valid() Returns: Waits until promise::set_value() or promise::set_exception() is called. If promise::set_value() is called, returns the value. If promise::set_exception() is called, throws the indicated exception. Postcondition: false == valid() Throws: future_error with error condition future_errc::no_state, future_errc::broken_promise. Any exception passed to promise::set_exception(). Member function get_exception_ptr() std::exception_ptr get_exception_ptr(); Precondition: true == valid() Returns: Waits until promise::set_value() or promise::set_exception() is called. If set_value() is called, returns a default-constructed std::exception_ptr. If set_exception() is called, returns the passed std::exception_ptr. Throws: future_error with error condition future_errc::no_state. Note: get_exception_ptr() does not invalidate the future. After calling get_exception_ptr(), you may still call future::get(). Member function wait() void wait(); Effects: Waits until promise::set_value() or promise::set_exception() is called. Throws: future_error with error condition future_errc::no_state. Templated member function wait_for() template< class Rep, class Period > future_status wait_for( std::chrono::duration< Rep, Period > const& timeout_duration) const; Effects: Waits until promise::set_value() or promise::set_exception() is called, or timeout_duration has passed. Result: A future_status is returned indicating the reason for returning. 
Throws: future_error with error condition future_errc::no_state or timeout-related exceptions. Templated member function wait_until() template< typename Clock, typename Duration > future_status wait_until( std::chrono::time_point< Clock, Duration > const& timeout_time) const; Effects: Waits until promise::set_value() or promise::set_exception() is called, or timeout_time has passed. Result: A future_status is returned indicating the reason for returning. Throws: future_error with error condition future_errc::no_state or timeout-related exceptions. Template shared_future<> A shared_future<> contains a shared state which might be shared with other shared_future<> instances. #include <boost/fiber/future/future.hpp> namespace boost { namespace fibers { template< typename R > class shared_future { public: shared_future() noexcept; ~shared_future(); shared_future( shared_future const& other); shared_future( future< R > && other) noexcept; shared_future( shared_future && other) noexcept; shared_future & operator=( shared_future && other) noexcept; shared_future & operator=( future< R > && other) noexcept; shared_future & operator=( shared_future const& other) noexcept; bool valid() const noexcept; R const& get(); // member only of generic shared_future template R & get(); // member only of shared_future< R & > template specialization void get(); // member only of shared_future< void > template specialization std::exception_ptr get_exception_ptr(); void wait() const; template< class Rep, class Period > future_status wait_for( std::chrono::duration< Rep, Period > const& timeout_duration) const; template< typename Clock, typename Duration > future_status wait_until( std::chrono::time_point< Clock, Duration > const& timeout_time) const; }; }} Default constructor shared_future(); Effects: Creates a shared_future with no shared state. After construction false == valid(). Throws: Nothing. 
Move constructor shared_future( future< R > && other) noexcept; shared_future( shared_future && other) noexcept; Effects: Constructs a shared_future with the shared state of other. After construction false == other.valid(). Throws: Nothing. Copy constructor shared_future( shared_future const& other) noexcept; Effects: Constructs a shared_future with the shared state of other. After construction other.valid() is unchanged. Throws: Nothing. Destructor ~shared_future(); Effects: Destroys the shared_future; ownership is abandoned if not shared. Note: ~shared_future() does not block the calling fiber. Member function operator=() shared_future & operator=( future< R > && other) noexcept; shared_future & operator=( shared_future && other) noexcept; shared_future & operator=( shared_future const& other) noexcept; Effects: Moves or copies the shared state of other to this. After the assignment, the state of other.valid() depends on which overload was invoked: unchanged for the overload accepting shared_future const&, otherwise false. Throws: Nothing. Member function valid() bool valid() const noexcept; Effects: Returns true if shared_future contains a shared state. Throws: Nothing. Member function get() R const& get(); // member only of generic shared_future template R & get(); // member only of shared_future< R & > template specialization void get(); // member only of shared_future< void > template specialization Precondition: true == valid() Returns: Waits until promise::set_value() or promise::set_exception() is called. If promise::set_value() is called, returns the value. If promise::set_exception() is called, throws the indicated exception. Postcondition: false == valid() Throws: future_error with error condition future_errc::no_state, future_errc::broken_promise. Any exception passed to promise::set_exception(). 
Member function get_exception_ptr() std::exception_ptr get_exception_ptr(); Precondition: true == valid() Returns: Waits until promise::set_value() or promise::set_exception() is called. If set_value() is called, returns a default-constructed std::exception_ptr. If set_exception() is called, returns the passed std::exception_ptr. Throws: future_error with error condition future_errc::no_state. Note: get_exception_ptr() does not invalidate the shared_future. After calling get_exception_ptr(), you may still call shared_future::get(). Member function wait() void wait(); Effects: Waits until promise::set_value() or promise::set_exception() is called. Throws: future_error with error condition future_errc::no_state. Templated member function wait_for() template< class Rep, class Period > future_status wait_for( std::chrono::duration< Rep, Period > const& timeout_duration) const; Effects: Waits until promise::set_value() or promise::set_exception() is called, or timeout_duration has passed. Result: A future_status is returned indicating the reason for returning. Throws: future_error with error condition future_errc::no_state or timeout-related exceptions. Templated member function wait_until() template< typename Clock, typename Duration > future_status wait_until( std::chrono::time_point< Clock, Duration > const& timeout_time) const; Effects: Waits until promise::set_value() or promise::set_exception() is called, or timeout_time has passed. Result: A future_status is returned indicating the reason for returning. Throws: future_error with error condition future_errc::no_state or timeout-related exceptions. Non-member function fibers::async() #include <boost/fiber/future/async.hpp> namespace boost { namespace fibers { template< class Function, class ... Args > future< std::result_of_t< std::decay_t< Function >( std::decay_t< Args > ... ) > > async( Function && fn, Args && ... args); template< class Function, class ... 
Args > future< std::result_of_t< std::decay_t< Function >( std::decay_t< Args > ... ) > > async( launch policy, Function && fn, Args && ... args); template< typename StackAllocator, class Function, class ... Args > future< std::result_of_t< std::decay_t< Function >( std::decay_t< Args > ... ) > > async( launch policy, std::allocator_arg_t, StackAllocator salloc, Function && fn, Args && ... args); template< typename StackAllocator, typename Allocator, class Function, class ... Args > future< std::result_of_t< std::decay_t< Function >( std::decay_t< Args > ... ) > > async( launch policy, std::allocator_arg_t, StackAllocator salloc, Allocator alloc, Function && fn, Args && ... args); }} Effects: Executes fn in a fiber and returns an associated future<>. Result: future< std::result_of_t< std::decay_t< Function >( std::decay_t< Args > ... ) > > representing the shared state associated with the asynchronous execution of fn. Throws: fiber_error or future_error if an error occurs. Notes: The overloads accepting std::allocator_arg_t use the passed StackAllocator when constructing the launched fiber. The overloads accepting launch use the passed launch when constructing the launched fiber. The default launch is post, as for the fiber constructor. Deferred futures are not supported.
<anchor id="class_promise"/><link linkend="fiber.synchronization.futures.promise">Template <code><phrase role="identifier">promise</phrase><phrase role="special"><></phrase></code></link> A promise<> provides a mechanism to store a value (or exception) that can later be retrieved from the corresponding future<> object. promise<> and future<> communicate via their underlying shared state. #include <boost/fiber/future/promise.hpp> namespace boost { namespace fibers { template< typename R > class promise { public: promise(); template< typename Allocator > promise( std::allocator_arg_t, Allocator); promise( promise &&) noexcept; promise & operator=( promise &&) noexcept; promise( promise const&) = delete; promise & operator=( promise const&) = delete; ~promise(); void swap( promise &) noexcept; future< R > get_future(); void set_value( R const&); // member only of generic promise template void set_value( R &&); // member only of generic promise template void set_value( R &); // member only of promise< R & > template void set_value(); // member only of promise< void > template void set_exception( std::exception_ptr p); }; template< typename R > void swap( promise< R > &, promise< R > &) noexcept; }} Default constructor promise(); Effects: Creates a promise with an empty shared state. Throws: Exceptions caused by memory allocation. Constructor template< typename Allocator > promise( std::allocator_arg_t, Allocator alloc); Effects: Creates a promise with an empty shared state by using alloc. Throws: Exceptions caused by memory allocation. See also: std::allocator_arg_t Move constructor promise( promise && other) noexcept; Effects: Creates a promise by moving the shared state from other. Postcondition: other contains no valid shared state. Throws: Nothing. 
Destructor ~promise(); Effects: Destroys *this and abandons the shared state if shared state is ready; otherwise stores future_error with error condition future_errc::broken_promise as if by promise::set_exception(): the shared state is set ready. Member function operator=() promise & operator=( promise && other) noexcept; Effects: Transfers the ownership of shared state to *this. Postcondition: other contains no valid shared state. Throws: Nothing. Member function swap() void swap( promise & other) noexcept; Effects: Swaps the shared state between other and *this. Throws: Nothing. Member function get_future() future< R > get_future(); Returns: A future<> with the same shared state. Throws: future_error with future_errc::future_already_retrieved or future_errc::no_state. Member function set_value() void set_value( R const& value); // member only of generic promise template void set_value( R && value); // member only of generic promise template void set_value( R & value); // member only of promise< R & > template void set_value(); // member only of promise< void > template Effects: Store the result in the shared state and marks the state as ready. Throws: future_error with future_errc::future_already_satisfied or future_errc::no_state. Member function set_exception() void set_exception( std::exception_ptr); Effects: Store an exception pointer in the shared state and marks the state as ready. Throws: future_error with future_errc::future_already_satisfied or future_errc::no_state. Non-member function swap() template< typename R > void swap( promise< R > & l, promise< R > & r) noexcept; Effects: Same as l.swap( r).
<anchor id="class_packaged_task"/><link linkend="fiber.synchronization.futures.packaged_task">Template <code><phrase role="identifier">packaged_task</phrase><phrase role="special"><></phrase></code></link> A packaged_task<> wraps a callable target that returns a value so that the return value can be computed asynchronously. Conventional usage of packaged_task<> is like this: Instantiate packaged_task<> with template arguments matching the signature of the callable. Pass the callable to the constructor. Call packaged_task::get_future() and capture the returned future<> instance. Launch a fiber to run the new packaged_task<>, passing any arguments required by the original callable. Call fiber::detach() on the newly-launched fiber. At some later point, retrieve the result from the future<>. This is, in fact, pretty much what fibers::async() encapsulates. #include <boost/fiber/future/packaged_task.hpp> namespace boost { namespace fibers { template< class R, typename ... Args > class packaged_task< R( Args ... ) > { public: packaged_task() noexcept; template< typename Fn > explicit packaged_task( Fn &&); template< typename Fn, typename Allocator > packaged_task( std::allocator_arg_t, Allocator const&, Fn &&); packaged_task( packaged_task &&) noexcept; packaged_task & operator=( packaged_task &&) noexcept; packaged_task( packaged_task const&) = delete; packaged_task & operator=( packaged_task const&) = delete; ~packaged_task(); void swap( packaged_task &) noexcept; bool valid() const noexcept; future< R > get_future(); void operator()( Args ...); void reset(); }; template< typename Signature > void swap( packaged_task< Signature > &, packaged_task< Signature > &) noexcept; }} Default constructor packaged_task() packaged_task() noexcept; Effects: Constructs an object of class packaged_task with no shared state. Throws: Nothing. 
Templated constructor packaged_task() template< typename Fn > explicit packaged_task( Fn && fn); template< typename Fn, typename Allocator > packaged_task( std::allocator_arg_t, Allocator const& alloc, Fn && fn); Effects: Constructs an object of class packaged_task with a shared state and copies or moves the callable target fn to internal storage. Throws: Exceptions caused by memory allocation. Note: The signature of Fn should have a return type convertible to R. See also: std::allocator_arg_t Move constructor packaged_task( packaged_task && other) noexcept; Effects: Creates a packaged_task by moving the shared state from other. Postcondition: other contains no valid shared state. Throws: Nothing. Destructor ~packaged_task(); Effects: Destroys *this and abandons the shared state if shared state is ready; otherwise stores future_error with error condition future_errc::broken_promise as if by promise::set_exception(): the shared state is set ready. Member function operator=() packaged_task & operator=( packaged_task && other) noexcept; Effects: Transfers the ownership of shared state to *this. Postcondition: other contains no valid shared state. Throws: Nothing. Member function swap() void swap( packaged_task & other) noexcept; Effects: Swaps the shared state between other and *this. Throws: Nothing. Member function valid() bool valid() const noexcept; Effects: Returns true if *this contains a shared state. Throws: Nothing. Member function get_future() future< R > get_future(); Returns: A future<> with the same shared state. Throws: future_error with future_errc::future_already_retrieved or future_errc::no_state. Member function operator()() void operator()( Args ... args); Effects: Invokes the stored callable target. Any exception thrown by the callable target fn is stored in the shared state as if by promise::set_exception(). Otherwise, the value returned by fn is stored in the shared state as if by promise::set_value(). 
Throws: future_error with future_errc::no_state. Member function reset() void reset(); Effects: Resets the shared state and abandons the result of previous executions. A new shared state is constructed. Throws: future_error with future_errc::no_state. Non-member function swap() template< typename Signature > void swap( packaged_task< Signature > & l, packaged_task< Signature > & r) noexcept; Effects: Same as l.swap( r).
<link linkend="fiber.fls">Fiber local storage</link> Synopsis Fiber local storage allows a separate instance of a given data item for each fiber. Cleanup at fiber exit When a fiber exits, the objects associated with each fiber_specific_ptr instance are destroyed. By default, the object pointed to by a pointer p is destroyed by invoking delete p, but this can be overridden for a specific instance of fiber_specific_ptr by providing a cleanup routine func to the constructor. In this case, the object is destroyed by invoking func(p). The cleanup functions are called in an unspecified order. Class fiber_specific_ptr #include <boost/fiber/fss.hpp> namespace boost { namespace fibers { template< typename T > class fiber_specific_ptr { public: typedef T element_type; fiber_specific_ptr(); explicit fiber_specific_ptr( void(*fn)(T*) ); ~fiber_specific_ptr(); fiber_specific_ptr( fiber_specific_ptr const&) = delete; fiber_specific_ptr & operator=( fiber_specific_ptr const&) = delete; T * get() const noexcept; T * operator->() const noexcept; T & operator*() const noexcept; T * release(); void reset( T *); }; }} Constructor fiber_specific_ptr(); explicit fiber_specific_ptr( void(*fn)(T*) ); Requires: delete this->get() is well-formed; fn(this->get()) does not throw Effects: Construct a fiber_specific_ptr object for storing a pointer to an object of type T specific to each fiber. When reset() is called, or the fiber exits, fiber_specific_ptr calls fn(this->get()). If the no-arguments constructor is used, the default delete-based cleanup function will be used to destroy the fiber-local objects. Throws: fiber_error if an error occurs. Destructor ~fiber_specific_ptr(); Requires: All the fiber specific instances associated to this fiber_specific_ptr (except maybe the one associated to this fiber) must be nullptr. Effects: Calls this->reset() to clean up the associated value for the current fiber, and destroys *this. Remarks: The requirement is an implementation restriction. 
If the destructor promised to delete instances for all fibers, the implementation would be forced to maintain a list of all the fibers having an associated specific ptr, which is against the goal of fiber specific data. In general, a fiber_specific_ptr should outlive the fibers that use it. Care needs to be taken to ensure that any fibers still running after an instance of fiber_specific_ptr has been destroyed do not call any member functions on that instance. Member function get() T * get() const noexcept; Returns: The pointer associated with the current fiber. Throws: Nothing. The initial value associated with an instance of fiber_specific_ptr is nullptr for each fiber. Member function operator->() T * operator->() const noexcept; Requires: this->get() is not nullptr. Returns: this->get() Throws: Nothing. Member function operator*() T & operator*() const noexcept; Requires: this->get() is not nullptr. Returns: *(this->get()) Throws: Nothing. Member function release() T * release(); Effects: Return this->get() and store nullptr as the pointer associated with the current fiber without invoking the cleanup function. Postcondition: this->get()==nullptr Throws: Nothing. Member function reset() void reset( T * new_value); Effects: If this->get()!=new_value and this->get() is not nullptr, invoke delete this->get() or fn(this->get()) as appropriate. Store new_value as the pointer associated with the current fiber. Postcondition: this->get()==new_value Throws: Exception raised during cleanup of previous value.
<anchor id="migration"/><link linkend="fiber.migration">Migrating fibers between threads</link> Overview Each fiber owns a stack and manages its execution state, including all registers and CPU flags, the instruction pointer and the stack pointer. That means, in general, a fiber is not bound to a specific thread. The main fiber on each thread, that is, the fiber on which the thread is launched, cannot migrate to any other thread. Boost.Fiber also implicitly creates a dispatcher fiber for each thread; this cannot migrate either. Of course, it would be problematic to migrate a fiber that relies on thread-local storage. Migrating a fiber from a logical CPU with a heavy workload to another logical CPU with a lighter workload might speed up the overall execution. Note that in the case of NUMA architectures, it is not always advisable to migrate data between threads. Suppose fiber f is running on logical CPU cpu0, which belongs to NUMA node node0. The data of f are allocated on the physical memory located at node0. Migrating the fiber from cpu0 to another logical CPU cpuX, which is part of a different NUMA node nodeX, might reduce the performance of the application due to increased latency of memory access. Only fibers that are contained in the algorithm’s ready queue can migrate between threads. You cannot migrate a running fiber, nor one that is blocked. You cannot migrate a fiber if its context::is_context() method returns true for pinned_context. In Boost.Fiber a fiber is migrated by invoking context::detach() on the thread from which the fiber migrates and context::attach() on the thread to which the fiber migrates. Thus, fiber migration is accomplished by sharing state between instances of a user-coded algorithm implementation running on different threads. The fiber’s original thread calls algorithm::awakened(), passing the fiber’s context*. The awakened() implementation calls context::detach(). 
At some later point, when the same or a different thread calls algorithm::pick_next(), the pick_next() implementation selects a ready fiber and calls context::attach() on it before returning it. As stated above, a context for which is_context(pinned_context) == true must never be passed to either context::detach() or context::attach(). It may only be returned from pick_next() called by the same thread that passed that context to awakened(). Example of work sharing In the example work_sharing.cpp multiple worker fibers are created on the main thread. Each fiber gets a character as parameter at construction. This character is printed out ten times. Between each iteration the fiber calls this_fiber::yield(). That puts the fiber in the ready queue of the fiber-scheduler shared_ready_queue, running in the current thread. The next fiber ready to be executed is dequeued from the shared ready queue and resumed by shared_ready_queue running on any participating thread. All instances of shared_ready_queue share one global concurrent queue, used as ready queue. This mechanism shares all worker fibers between all instances of shared_ready_queue, thus between all participating threads. Setup of threads and fibers In main() the fiber-scheduler is installed and the worker fibers and the threads are launched. boost::fibers::use_scheduling_algorithm< boost::fibers::algo::shared_work >(); for ( char c : std::string("abcdefghijklmnopqrstuvwxyz")) { boost::fibers::fiber([c](){ whatevah( c); }).detach(); ++fiber_count; } boost::fibers::detail::thread_barrier b( 4); std::thread threads[] = { std::thread( thread, & b), std::thread( thread, & b), std::thread( thread, & b) }; b.wait(); { lock_type lk( mtx_count); cnd_count.wait( lk, [](){ return 0 == fiber_count; } ); } BOOST_ASSERT( 0 == fiber_count); for ( std::thread & t : threads) { t.join(); } Install the scheduling algorithm boost::fibers::algo::shared_work in the main thread too, so each new fiber gets launched into the shared pool. 
Launch a number of worker fibers; each worker fiber picks up a character that is passed as parameter to fiber-function whatevah. Each worker fiber gets detached. Increment fiber counter for each new fiber. Launch a couple of threads that join the work sharing. sync with other threads: allow them to start processing lock_type is typedef'ed as std::unique_lock< std::mutex > Suspend main fiber and resume worker fibers in the meanwhile. Main fiber gets resumed (e.g. returns from condition_variable_any::wait()) if all worker fibers are complete. Releasing the lock of mtx_count is required before joining the threads, otherwise the other threads would be blocked inside condition_variable::wait() and would never return (deadlock). wait for threads to terminate The start of the threads is synchronized with a barrier. The main fiber of each thread (including the main thread) is suspended until all worker fibers are complete. When the main fiber returns from condition_variable::wait(), the thread terminates: the main thread joins all other threads. void thread( boost::fibers::detail::thread_barrier * b) { std::ostringstream buffer; buffer << "thread started " << std::this_thread::get_id() << std::endl; std::cout << buffer.str() << std::flush; boost::fibers::use_scheduling_algorithm< boost::fibers::algo::shared_work >(); b->wait(); lock_type lk( mtx_count); cnd_count.wait( lk, [](){ return 0 == fiber_count; } ); BOOST_ASSERT( 0 == fiber_count); } Install the scheduling algorithm boost::fibers::algo::shared_work in order to join the work sharing. sync with other threads: allow them to start processing Suspend main fiber and resume worker fibers in the meanwhile. Main fiber gets resumed (e.g. returns from condition_variable_any::wait()) if all worker fibers are complete. Each worker fiber executes function whatevah() with character me as parameter. The fiber yields in a loop and prints out a message if it was migrated to another thread. 
void whatevah( char me) { try { std::thread::id my_thread = std::this_thread::get_id(); { std::ostringstream buffer; buffer << "fiber " << me << " started on thread " << my_thread << '\n'; std::cout << buffer.str() << std::flush; } for ( unsigned i = 0; i < 10; ++i) { boost::this_fiber::yield(); std::thread::id new_thread = std::this_thread::get_id(); if ( new_thread != my_thread) { my_thread = new_thread; std::ostringstream buffer; buffer << "fiber " << me << " switched to thread " << my_thread << '\n'; std::cout << buffer.str() << std::flush; } } } catch ( ... ) { } lock_type lk( mtx_count); if ( 0 == --fiber_count) { lk.unlock(); cnd_count.notify_all(); } } get ID of initial thread loop ten times yield to other fibers get ID of current thread test if fiber was migrated to another thread Decrement fiber counter for each completed fiber. Notify all fibers waiting on cnd_count. Scheduling fibers The fiber scheduler shared_ready_queue is like round_robin, except that it shares a common ready queue among all participating threads. A thread participates in this pool by executing use_scheduling_algorithm() before any other Boost.Fiber operation. The important point about the ready queue is that it’s a class static, common to all instances of shared_ready_queue. Fibers that are enqueued via algorithm::awakened() (fibers that are ready to be resumed) are thus available to all threads. It is required to reserve a separate, scheduler-specific queue for the thread’s main fiber and dispatcher fibers: these may not be shared between threads! When we’re passed either of these fibers, push it there instead of in the shared queue: it would be Bad News for thread B to retrieve and attempt to execute thread A’s main fiber. [awakened_ws] When algorithm::pick_next() gets called inside one thread, a fiber is dequeued from rqueue_ and will be resumed in that thread. [pick_next_ws] The source code above is found in work_sharing.cpp.
<anchor id="callbacks"/><link linkend="fiber.callbacks">Integrating Fibers with Asynchronous Callbacks</link>
<link linkend="fiber.callbacks.overview">Overview</link> One of the primary benefits of Boost.Fiber is the ability to use asynchronous operations for efficiency, while at the same time structuring the calling code as if the operations were synchronous. Asynchronous operations provide completion notification in a variety of ways, but most involve a callback function of some kind. This section discusses tactics for interfacing Boost.Fiber with an arbitrary async operation. For purposes of illustration, consider the following hypothetical API: class AsyncAPI { public: // constructor acquires some resource that can be read and written AsyncAPI(); // callbacks accept an int error code; 0 == success typedef int errorcode; // write callback only needs to indicate success or failure template< typename Fn > void init_write( std::string const& data, Fn && callback); // read callback needs to accept both errorcode and data template< typename Fn > void init_read( Fn && callback); // ... other operations ... }; The significant points about each of init_write() and init_read() are: The AsyncAPI method only initiates the operation. It returns immediately, while the requested operation is still pending. The method accepts a callback. When the operation completes, the callback is called with relevant parameters (error code, data if applicable). We would like to wrap these asynchronous methods in functions that appear synchronous by blocking the calling fiber until the operation completes. This lets us use the wrapper function’s return value to deliver relevant data. promise<> and future<> are your friends here.
<link linkend="fiber.callbacks.return_errorcode">Return Errorcode</link> The AsyncAPI::init_write() callback passes only an errorcode. If we simply want the blocking wrapper to return that errorcode, this is an extremely straightforward use of promise<> and future<>: AsyncAPI::errorcode write_ec( AsyncAPI & api, std::string const& data) { boost::fibers::promise< AsyncAPI::errorcode > promise; boost::fibers::future< AsyncAPI::errorcode > future( promise.get_future() ); // In general, even though we block waiting for future::get() and therefore // won't destroy 'promise' until promise::set_value() has been called, we // are advised that with threads it's possible for ~promise() to be // entered before promise::set_value() has returned. While that shouldn't // happen with fibers::promise, a robust way to deal with the lifespan // issue is to bind 'promise' into our lambda. Since promise is move-only, // use initialization capture. #if ! defined(BOOST_NO_CXX14_INITIALIZED_LAMBDA_CAPTURES) api.init_write( data, [promise=std::move( promise)]( AsyncAPI::errorcode ec) mutable { promise.set_value( ec); }); #else // defined(BOOST_NO_CXX14_INITIALIZED_LAMBDA_CAPTURES) api.init_write( data, std::bind([](boost::fibers::promise< AsyncAPI::errorcode > & promise, AsyncAPI::errorcode ec) { promise.set_value( ec); }, std::move( promise), std::placeholders::_1) ); #endif // BOOST_NO_CXX14_INITIALIZED_LAMBDA_CAPTURES return future.get(); } All we have to do is: Instantiate a promise<> of correct type. Obtain its future<>. Arrange for the callback to call promise::set_value(). Block on future::get(). This tactic for resuming a pending fiber works even if the callback is called on a different thread than the one on which the initiating fiber is running. In fact, the example program’s dummy AsyncAPI implementation illustrates that: it simulates async I/O by launching a new thread that sleeps briefly and then calls the relevant callback.
<link linkend="fiber.callbacks.success_or_exception">Success or Exception</link> A wrapper more aligned with modern C++ practice would use an exception, rather than an errorcode, to communicate failure to its caller. This is straightforward to code in terms of write_ec(): void write( AsyncAPI & api, std::string const& data) { AsyncAPI::errorcode ec = write_ec( api, data); if ( ec) { throw make_exception("write", ec); } } The point is that since each fiber has its own stack, you need not repeat messy boilerplate: normal encapsulation works.
<link linkend="fiber.callbacks.return_errorcode_or_data">Return Errorcode or Data</link> Things get a bit more interesting when the async operation’s callback passes multiple data items of interest. One approach would be to use std::pair<> to capture both: std::pair< AsyncAPI::errorcode, std::string > read_ec( AsyncAPI & api) { typedef std::pair< AsyncAPI::errorcode, std::string > result_pair; boost::fibers::promise< result_pair > promise; boost::fibers::future< result_pair > future( promise.get_future() ); // We promise that both 'promise' and 'future' will survive until our // lambda has been called. #if ! defined(BOOST_NO_CXX14_INITIALIZED_LAMBDA_CAPTURES) api.init_read([promise=std::move( promise)]( AsyncAPI::errorcode ec, std::string const& data) mutable { promise.set_value( result_pair( ec, data) ); }); #else // defined(BOOST_NO_CXX14_INITIALIZED_LAMBDA_CAPTURES) api.init_read( std::bind([]( boost::fibers::promise< result_pair > & promise, AsyncAPI::errorcode ec, std::string const& data) mutable { promise.set_value( result_pair( ec, data) ); }, std::move( promise), std::placeholders::_1, std::placeholders::_2) ); #endif // BOOST_NO_CXX14_INITIALIZED_LAMBDA_CAPTURES return future.get(); } Once you bundle the interesting data in std::pair<>, the code is effectively identical to write_ec(). You can call it like this: std::tie( ec, data) = read_ec( api);
<anchor id="Data_or_Exception"/><link linkend="fiber.callbacks.data_or_exception">Data or Exception</link> But a more natural API for a function that obtains data is to return only the data on success, throwing an exception on error. As with write() above, it’s certainly possible to code a read() wrapper in terms of read_ec(). But since a given application is unlikely to need both, let’s code read() from scratch, leveraging promise::set_exception(): std::string read( AsyncAPI & api) { boost::fibers::promise< std::string > promise; boost::fibers::future< std::string > future( promise.get_future() ); // Both 'promise' and 'future' will survive until our lambda has been // called. #if ! defined(BOOST_NO_CXX14_INITIALIZED_LAMBDA_CAPTURES) api.init_read([&promise]( AsyncAPI::errorcode ec, std::string const& data) mutable { if ( ! ec) { promise.set_value( data); } else { promise.set_exception( std::make_exception_ptr( make_exception("read", ec) ) ); } }); #else // defined(BOOST_NO_CXX14_INITIALIZED_LAMBDA_CAPTURES) api.init_read( std::bind([]( boost::fibers::promise< std::string > & promise, AsyncAPI::errorcode ec, std::string const& data) mutable { if ( ! ec) { promise.set_value( data); } else { promise.set_exception( std::make_exception_ptr( make_exception("read", ec) ) ); } }, std::move( promise), std::placeholders::_1, std::placeholders::_2) ); #endif // BOOST_NO_CXX14_INITIALIZED_LAMBDA_CAPTURES return future.get(); } future::get() will do the right thing, either returning std::string or throwing an exception.
<link linkend="fiber.callbacks.success_error_virtual_methods">Success/Error Virtual Methods</link> One classic approach to completion notification is to define an abstract base class with success() and error() methods. Code wishing to perform async I/O must derive a subclass, override each of these methods and pass the async operation a pointer to a subclass instance. The abstract base class might look like this: // every async operation receives a subclass instance of this abstract base // class through which to communicate its result struct Response { typedef std::shared_ptr< Response > ptr; // called if the operation succeeds virtual void success( std::string const& data) = 0; // called if the operation fails virtual void error( AsyncAPIBase::errorcode ec) = 0; }; Now the AsyncAPI operation might look more like this: // derive Response subclass, instantiate, pass Response::ptr void init_read( Response::ptr); We can address this by writing a one-size-fits-all PromiseResponse: class PromiseResponse: public Response { public: // called if the operation succeeds virtual void success( std::string const& data) { promise_.set_value( data); } // called if the operation fails virtual void error( AsyncAPIBase::errorcode ec) { promise_.set_exception( std::make_exception_ptr( make_exception("read", ec) ) ); } boost::fibers::future< std::string > get_future() { return promise_.get_future(); } private: boost::fibers::promise< std::string > promise_; }; Now we can simply obtain the future<> from that PromiseResponse and wait on its get(): std::string read( AsyncAPI & api) { // Because init_read() requires a shared_ptr, we must allocate our // PromiseResponse on the heap, even though we know its lifespan. auto promisep( std::make_shared< PromiseResponse >() ); boost::fibers::future< std::string > future( promisep->get_future() ); // Both 'promisep' and 'future' will survive until the callback has been // called. 
api.init_read( promisep); return future.get(); } The source code above is found in adapt_callbacks.cpp and adapt_method_calls.cpp.
<anchor id="callbacks_asio"/><link linkend="fiber.callbacks.then_there_s____boost_asio__">Then There’s <ulink url="http://www.boost.org/doc/libs/release/libs/asio/index.html">Boost.Asio</ulink></link> Since the simplest form of Boost.Asio asynchronous operation completion token is a callback function, we could apply the same tactics for Asio as for our hypothetical AsyncAPI asynchronous operations. Fortunately we need not. Boost.Asio incorporates a mechanism by which the caller can customize the notification behavior of any async operation. (This mechanism has been proposed as a conventional way to allow the caller of an arbitrary async function to specify completion handling: N4045.) Therefore we can construct a completion token which, when passed to a Boost.Asio async operation, requests blocking for the calling fiber. A typical Asio async function (per N4045) might look something like this: template < ..., class CompletionToken > deduced_return_type async_something( ... , CompletionToken&& token) { // construct handler_type instance from CompletionToken handler_type<CompletionToken, ...>::type handler(token); // construct async_result instance from handler_type async_result<decltype(handler)> result(handler); // ... arrange to call handler on completion ... // ... initiate actual I/O operation ... return result.get(); } We will engage that mechanism, which is based on specializing Asio’s handler_type<> template for the CompletionToken type and the signature of the specific callback. The remainder of this discussion will refer back to async_something() as the Asio async function under consideration. The implementation described below uses lower-level facilities than promise and future because the promise mechanism interacts badly with io_service::stop(). It produces broken_promise exceptions. boost::fibers::asio::yield is a completion token of this kind. 
yield is an instance of yield_t: class yield_t { public: yield_t() = default; /** * @code * static yield_t yield; * boost::system::error_code myec; * func(yield[myec]); * @endcode * @c yield[myec] returns an instance of @c yield_t whose @c ec_ points * to @c myec. The expression @c yield[myec] "binds" @c myec to that * (anonymous) @c yield_t instance, instructing @c func() to store any * @c error_code it might produce into @c myec rather than throwing @c * boost::system::system_error. */ yield_t operator[]( boost::system::error_code & ec) const { yield_t tmp; tmp.ec_ = & ec; return tmp; } //private: // ptr to bound error_code instance if any boost::system::error_code * ec_{ nullptr }; }; yield_t is in fact only a placeholder, a way to trigger Boost.Asio customization. It can bind a boost::system::error_code for use by the actual handler. yield is declared as: // canonical instance thread_local yield_t yield{}; Asio customization is engaged by specializing boost::asio::handler_type<> for yield_t: // Handler type specialisation for fibers::asio::yield. // When 'yield' is passed as a completion handler which accepts only // error_code, use yield_handler<void>. yield_handler will take care of the // error_code one way or another. template< typename ReturnType > struct handler_type< fibers::asio::yield_t, ReturnType( boost::system::error_code) > { typedef fibers::asio::detail::yield_handler< void > type; }; (There are actually four different specializations in detail/yield.hpp, one for each of the four Asio async callback signatures we expect.) The above directs Asio to use yield_handler as the actual handler for an async operation to which yield is passed. There’s a generic yield_handler<T> implementation and a yield_handler<void> specialization. Let’s start with the <void> specialization: // yield_handler<void> is like yield_handler<T> without value_. In fact it's // just like yield_handler_base. 
template<> class yield_handler< void >: public yield_handler_base { public: explicit yield_handler( yield_t const& y) : yield_handler_base{ y } { } // nullary completion callback void operator()() { ( * this)( boost::system::error_code() ); } // inherit operator()(error_code) overload from base class using yield_handler_base::operator(); }; async_something(), having consulted the handler_type<> traits specialization, instantiates a yield_handler<void> to be passed as the actual callback for the async operation. yield_handler’s constructor accepts the yield_t instance (the yield object passed to the async function) and passes it along to yield_handler_base: // This class encapsulates common elements between yield_handler<T> (capturing // a value to return from asio async function) and yield_handler<void> (no // such value). See yield_handler<T> and its <void> specialization below. Both // yield_handler<T> and yield_handler<void> are passed by value through // various layers of asio functions. In other words, they're potentially // copied multiple times. So key data such as the yield_completion instance // must be stored in our async_result<yield_handler<>> specialization, which // should be instantiated only once. class yield_handler_base { public: yield_handler_base( yield_t const& y) : // capture the context* associated with the running fiber ctx_{ boost::fibers::context::active() }, // capture the passed yield_t yt_( y ) { } // completion callback passing only (error_code) void operator()( boost::system::error_code const& ec) { BOOST_ASSERT_MSG( ycomp_, "Must inject yield_completion* " "before calling yield_handler_base::operator()()"); BOOST_ASSERT_MSG( yt_.ec_, "Must inject boost::system::error_code* " "before calling yield_handler_base::operator()()"); // If originating fiber is busy testing state_ flag, wait until it // has observed (completed != state_). 
yield_completion::lock_t lk{ ycomp_->mtx_ }; yield_completion::state_t state = ycomp_->state_; // Notify a subsequent yield_completion::wait() call that it need not // suspend. ycomp_->state_ = yield_completion::complete; // set the error_code bound by yield_t * yt_.ec_ = ec; // unlock the lock that protects state_ lk.unlock(); // If ctx_ is still active, e.g. because the async operation // immediately called its callback (this method!) before the asio // async function called async_result_base::get(), we must not set it // ready. if ( yield_completion::waiting == state) { // wake the fiber fibers::context::active()->schedule( ctx_); } } //private: boost::fibers::context * ctx_; yield_t yt_; // We depend on this pointer to yield_completion, which will be injected // by async_result. yield_completion::ptr_t ycomp_{}; }; yield_handler_base stores a copy of the yield_t instance — which, as shown above, contains only an error_code*. It also captures the context* for the currently-running fiber by calling context::active(). You will notice that yield_handler_base has one more data member (ycomp_) that is initialized to nullptr by its constructor — though its operator()() method relies on ycomp_ being non-null. More on this in a moment. Having constructed the yield_handler<void> instance, async_something() goes on to construct an async_result specialized for the handler_type<>::type: in this case, async_result<yield_handler<void>>. It passes the yield_handler<void> instance to the new async_result instance. // Without the need to handle a passed value, our yield_handler<void> // specialization is just like async_result_base. 
template<> class async_result< boost::fibers::asio::detail::yield_handler< void > > : public boost::fibers::asio::detail::async_result_base { public: typedef void type; explicit async_result( boost::fibers::asio::detail::yield_handler< void > & h): boost::fibers::asio::detail::async_result_base{ h } { } }; Naturally that leads us straight to async_result_base: // Factor out commonality between async_result<yield_handler<T>> and // async_result<yield_handler<void>> class async_result_base { public: explicit async_result_base( yield_handler_base & h) : ycomp_{ new yield_completion{} } { // Inject ptr to our yield_completion instance into this // yield_handler<>. h.ycomp_ = this->ycomp_; // if yield_t didn't bind an error_code, make yield_handler_base's // error_code* point to an error_code local to this object so // yield_handler_base::operator() can unconditionally store through // its error_code* if ( ! h.yt_.ec_) { h.yt_.ec_ = & ec_; } } void get() { // Unless yield_handler_base::operator() has already been called, // suspend the calling fiber until that call. ycomp_->wait(); // The only way our own ec_ member could have a non-default value is // if our yield_handler did not have a bound error_code AND the // completion callback passed a non-default error_code. if ( ec_) { throw_exception( boost::system::system_error{ ec_ } ); } } private: // If yield_t does not bind an error_code instance, store into here. boost::system::error_code ec_{}; yield_completion::ptr_t ycomp_; }; This is how yield_handler_base::ycomp_ becomes non-null: async_result_base’s constructor injects a pointer back to its own yield_completion member. Recall that the canonical yield_t instance yield initializes its error_code* member ec_ to nullptr. If this instance is passed to async_something() (ec_ is still nullptr), the copy stored in yield_handler_base will likewise have null ec_. 
async_result_base’s constructor sets yield_handler_base’s yield_t’s ec_ member to point to its own error_code member. The stage is now set. async_something() initiates the actual async operation, arranging to call its yield_handler<void> instance on completion. Let’s say, for the sake of argument, that the actual async operation’s callback has signature void(error_code). But since it’s an async operation, control returns at once to async_something(). async_something() calls async_result<yield_handler<void>>::get(), and will return its return value. async_result<yield_handler<void>>::get() inherits async_result_base::get(). async_result_base::get() immediately calls yield_completion::wait(). // Bundle a completion state flag with a spinlock to protect it. struct yield_completion { enum state_t { init, waiting, complete }; typedef fibers::detail::spinlock mutex_t; typedef std::unique_lock< mutex_t > lock_t; typedef boost::intrusive_ptr< yield_completion > ptr_t; std::atomic< std::size_t > use_count_{ 0 }; mutex_t mtx_{}; state_t state_{ init }; void wait() { // yield_handler_base::operator()() will set state_ `complete` and // attempt to wake a suspended fiber. It would be Bad if that call // happened between our detecting (complete != state_) and suspending. lock_t lk{ mtx_ }; // If state_ is already set, we're done here: don't suspend. 
if ( complete != state_) { state_ = waiting; // suspend(unique_lock<spinlock>) unlocks the lock in the act of // resuming another fiber fibers::context::active()->suspend( lk); } } friend void intrusive_ptr_add_ref( yield_completion * yc) noexcept { BOOST_ASSERT( nullptr != yc); yc->use_count_.fetch_add( 1, std::memory_order_relaxed); } friend void intrusive_ptr_release( yield_completion * yc) noexcept { BOOST_ASSERT( nullptr != yc); if ( 1 == yc->use_count_.fetch_sub( 1, std::memory_order_release) ) { std::atomic_thread_fence( std::memory_order_acquire); delete yc; } } }; Supposing that the pending async operation has not yet completed, yield_completion::state_ will not yet be complete, and wait() will call context::suspend() on the currently-running fiber. Other fibers will now have a chance to run. Some time later, the async operation completes. It calls yield_handler<void>::operator()(error_code const&) with an error_code indicating either success or failure. We’ll consider both cases. yield_handler<void> explicitly inherits operator()(error_code const&) from yield_handler_base. yield_handler_base::operator()(error_code const&) first sets yield_completion::state_ to complete. This way, if async_something()’s async operation completes immediately — if yield_handler_base::operator() is called even before async_result_base::get() — the calling fiber will not suspend. The actual error_code produced by the async operation is then stored through the stored yield_t::ec_ pointer. If async_something()’s caller used (e.g.) yield[my_ec] to bind a local error_code instance, the actual error_code value is stored into the caller’s variable. Otherwise, it is stored into async_result_base::ec_. If the stored fiber context yield_handler_base::ctx_ is not already running, it is marked as ready to run by passing it to context::schedule(). Control then returns from yield_handler_base::operator(): the callback is done. In due course, that fiber is resumed. 
Control returns from context::suspend() to yield_completion::wait(), which returns to async_result_base::get(). If the original caller passed yield[my_ec] to async_something() to bind a local error_code instance, then yield_handler_base::operator() stored its error_code to the caller’s my_ec instance, leaving async_result_base::ec_ initialized to success. If the original caller passed yield to async_something() without binding a local error_code variable, then yield_handler_base::operator() stored its error_code into async_result_base::ec_. If in fact that error_code is success, then all is well. Otherwise — the original caller did not bind a local error_code and yield_handler_base::operator() was called with an error_code indicating error — async_result_base::get() throws system_error with that error_code. The case in which async_something()’s completion callback has signature void() is similar. yield_handler<void>::operator()() invokes the machinery above with a success error_code. A completion callback with signature void(error_code, T) (that is: in addition to error_code, callback receives some data item) is handled somewhat differently. For this kind of signature, handler_type<>::type specifies yield_handler<T> (for T other than void). A yield_handler<T> reserves a value_ pointer to a value of type T: // asio uses handler_type<completion token type, signature>::type to decide // what to instantiate as the actual handler. Below, we specialize // handler_type< yield_t, ... > to indicate yield_handler<>. So when you pass // an instance of yield_t as an asio completion token, asio selects // yield_handler<> as the actual handler class. 
template< typename T > class yield_handler: public yield_handler_base { public: // asio passes the completion token to the handler constructor explicit yield_handler( yield_t const& y) : yield_handler_base{ y } { } // completion callback passing only value (T) void operator()( T t) { // just like callback passing success error_code (*this)( boost::system::error_code(), std::move(t) ); } // completion callback passing (error_code, T) void operator()( boost::system::error_code const& ec, T t) { BOOST_ASSERT_MSG( value_, "Must inject value ptr " "before calling yield_handler<T>::operator()()"); // move the value to async_result<> instance BEFORE waking up a // suspended fiber * value_ = std::move( t); // forward the call to base-class completion handler yield_handler_base::operator()( ec); } //private: // pointer to destination for eventual value // this must be injected by async_result before operator()() is called T * value_{ nullptr }; }; This pointer is initialized to nullptr. When async_something() instantiates async_result<yield_handler<T>>: // asio constructs an async_result<> instance from the yield_handler specified // by handler_type<>::type. A particular asio async method constructs the // yield_handler, constructs this async_result specialization from it, then // returns the result of calling its get() method. template< typename T > class async_result< boost::fibers::asio::detail::yield_handler< T > > : public boost::fibers::asio::detail::async_result_base { public: // type returned by get() typedef T type; explicit async_result( boost::fibers::asio::detail::yield_handler< T > & h) : boost::fibers::asio::detail::async_result_base{ h } { // Inject ptr to our value_ member into yield_handler<>: result will // be stored here. 
h.value_ = & value_; } // asio async method returns result of calling get() type get() { boost::fibers::asio::detail::async_result_base::get(); return std::move( value_); } private: type value_{}; }; this async_result<> specialization reserves a member of type T to receive the passed data item, and sets yield_handler<T>::value_ to point to its own data member. async_result<yield_handler<T>> overrides get(). The override calls async_result_base::get(), so the calling fiber suspends as described above. yield_handler<T>::operator()(error_code, T) stores its passed T value into async_result<yield_handler<T>>::value_. Then it passes control to yield_handler_base::operator()(error_code) to deal with waking the original fiber as described above. When async_result<yield_handler<T>>::get() resumes, it returns the stored value_ to async_something() and ultimately to async_something()’s caller. The case of a callback signature void(T) is handled by having yield_handler<T>::operator()(T) engage the void(error_code, T) machinery, passing a success error_code. The source code above is found in yield.hpp and detail/yield.hpp.
<anchor id="nonblocking"/><link linkend="fiber.nonblocking">Integrating Fibers with Nonblocking I/O</link> Overview Nonblocking I/O is distinct from asynchronous I/O. A true async I/O operation promises to initiate the operation and notify the caller on completion, usually via some sort of callback (as described in Integrating Fibers with Asynchronous Callbacks). In contrast, a nonblocking I/O operation refuses to start at all if it would be necessary to block, returning an error code such as EWOULDBLOCK. The operation is performed only when it can complete immediately. In effect, the caller must repeatedly retry the operation until it stops returning EWOULDBLOCK. In a classic event-driven program, it can be something of a headache to use nonblocking I/O. At the point where the nonblocking I/O is attempted, a return value of EWOULDBLOCK requires the caller to pass control back to the main event loop, arranging to retry again on the next iteration. Worse, a nonblocking I/O operation might partially succeed. That means that the relevant business logic must continue receiving control on every main loop iteration until all required data have been processed: a doubly-nested loop, implemented as a callback-driven state machine. Boost.Fiber can simplify this problem immensely. Once you have integrated with the application's main loop as described in Sharing a Thread with Another Main Loop, waiting for the next main-loop iteration is as simple as calling this_fiber::yield(). Example Nonblocking API For purposes of illustration, consider this API: class NonblockingAPI { public: NonblockingAPI(); // nonblocking operation: may return EWOULDBLOCK int read( std::string & data, std::size_t desired); ... 
}; Polling for Completion We can build a low-level wrapper around NonblockingAPI::read() that shields its caller from ever having to deal with EWOULDBLOCK: // guaranteed not to return EWOULDBLOCK int read_chunk( NonblockingAPI & api, std::string & data, std::size_t desired) { int error; while ( EWOULDBLOCK == ( error = api.read( data, desired) ) ) { // not ready yet -- try again on the next iteration of the // application's main loop boost::this_fiber::yield(); } return error; } Filling All Desired Data Given read_chunk(), we can straightforwardly iterate until we have all desired data: // keep reading until desired length, EOF or error // may return both partial data and nonzero error int read_desired( NonblockingAPI & api, std::string & data, std::size_t desired) { // we're going to accumulate results into 'data' data.clear(); std::string chunk; int error = 0; while ( data.length() < desired && ( error = read_chunk( api, chunk, desired - data.length() ) ) == 0) { data.append( chunk); } return error; } (Of course there are more efficient ways to accumulate string data. That's not the point of this example.) 
Wrapping it Up Finally, we can define a relevant exception: // exception class augmented with both partially-read data and errorcode class IncompleteRead : public std::runtime_error { public: IncompleteRead( std::string const& what, std::string const& partial, int ec) : std::runtime_error( what), partial_( partial), ec_( ec) { } std::string get_partial() const { return partial_; } int get_errorcode() const { return ec_; } private: std::string partial_; int ec_; }; and write a simple read() function that either returns all desired data or throws IncompleteRead: // read all desired data or throw IncompleteRead std::string read( NonblockingAPI & api, std::size_t desired) { std::string data; int ec( read_desired( api, data, desired) ); // for present purposes, EOF isn't a failure if ( 0 == ec || EOF == ec) { return data; } // oh oh, partial read std::ostringstream msg; msg << "NonblockingAPI::read() error " << ec << " after " << data.length() << " of " << desired << " characters"; throw IncompleteRead( msg.str(), data, ec); } Once we can transparently wait for the next main-loop iteration using this_fiber::yield(), ordinary encapsulation Just Works. The source code above is found in adapt_nonblocking.cpp.
<anchor id="when_any"/><link linkend="fiber.when_any">when_any / when_all functionality</link> Overview A bit of wisdom from the early days of computing still holds true today: prefer to model program state using the instruction pointer rather than with Boolean flags. In other words, if the program must do something and then do something almost the same, but with minor changes... perhaps parts of that something should be broken out as smaller separate functions, rather than introducing flags to alter the internal behavior of a monolithic function. To that we would add: prefer to describe control flow using C++ native constructs such as function calls, if, while, for, do et al. rather than as chains of callbacks. One of the great strengths of Boost.Fiber is the flexibility it confers on the coder to restructure an application from chains of callbacks to straightforward C++ statement sequence, even when code in that fiber is in fact interleaved with code running in other fibers. There has been much recent discussion about the benefits of when_any and when_all functionality. When dealing with asynchronous and possibly unreliable services, these are valuable idioms. But of course when_any and when_all are closely tied to the use of chains of callbacks. This section presents recipes for achieving the same ends, in the context of a fiber that wants to do something when one or more other independent activities have completed. Accordingly, these are wait_something() functions rather than when_something() functions. The expectation is that the calling fiber asks to launch those independent activities, then waits for them, then sequentially proceeds with whatever processing depends on those results. The function names shown (e.g. wait_first_simple()) are for illustrative purposes only, because all these functions have been bundled into a single source file. 
Presumably, if (say) wait_first_success() best suits your application needs, you could introduce that variant with the name wait_any(). The functions presented in this section accept variadic argument lists of task functions. Corresponding wait_something() functions accepting a container of task functions are left as an exercise for the interested reader. Those should actually be simpler. Most of the complexity would arise from overloading the same name for both purposes. All the source code for this section is found in wait_stuff.cpp. Example Task Function We found it convenient to model an asynchronous task using this function: template< typename T > T sleeper_impl( T item, int ms, bool thrw = false) { std::ostringstream descb, funcb; descb << item; std::string desc( descb.str() ); funcb << " sleeper(" << item << ")"; Verbose v( funcb.str() ); boost::this_fiber::sleep_for( std::chrono::milliseconds( ms) ); if ( thrw) { throw std::runtime_error( desc); } return item; } with type-specific sleeper() front ends for std::string, double and int. Verbose simply prints a message to std::cout on construction and destruction. Basically: sleeper() prints a start message; sleeps for the specified number of milliseconds; if thrw is passed as true, throws a string description of the passed item; else returns the passed item. On the way out, sleeper() produces a stop message. This function will feature in the example calls to the various functions presented below.
<link linkend="fiber.when_any.when_any">when_any</link>
<anchor id="wait_first_simple_section"/><link linkend="fiber.when_any.when_any.when_any__simple_completion">when_any, simple completion</link> The simplest case is when you only need to know that the first of a set of asynchronous tasks has completed — but you don't need to obtain a return value, and you're confident that they will not throw exceptions. For this we introduce a Done class to wrap a bool variable with a condition_variable and a mutex: // Wrap canonical pattern for condition_variable + bool flag struct Done { private: boost::fibers::condition_variable cond; boost::fibers::mutex mutex; bool ready = false; public: typedef std::shared_ptr< Done > ptr; void wait() { std::unique_lock< boost::fibers::mutex > lock( mutex); cond.wait( lock, [this](){ return ready; }); } void notify() { { std::unique_lock< boost::fibers::mutex > lock( mutex); ready = true; } // release mutex cond.notify_one(); } }; The pattern we follow throughout this section is to pass a std::shared_ptr<> to the relevant synchronization object to the various tasks' fiber functions. This eliminates nagging questions about the lifespan of the synchronization object relative to the last of the fibers. wait_first_simple() uses that tactic for Done: template< typename ... Fns > void wait_first_simple( Fns && ... functions) { // Use shared_ptr because each function's fiber will bind it separately, // and we're going to return before the last of them completes. auto done( std::make_shared< Done >() ); wait_first_simple_impl( done, std::forward< Fns >( functions) ... ); done->wait(); } wait_first_simple_impl() is an ordinary recursion over the argument pack, capturing Done::ptr for each new fiber: // Degenerate case: when there are no functions to wait for, return // immediately. void wait_first_simple_impl( Done::ptr) { } // When there's at least one function to wait for, launch it and recur to // process the rest. template< typename Fn, typename ... 
Fns > void wait_first_simple_impl( Done::ptr done, Fn && function, Fns && ... functions) { boost::fibers::fiber( [done, function](){ function(); done->notify(); }).detach(); wait_first_simple_impl( done, std::forward< Fns >( functions) ... ); } The body of the fiber's lambda is extremely simple, as promised: call the function, notify Done when it returns. The first fiber to do so allows wait_first_simple() to return — which is why it's useful to have std::shared_ptr<Done> manage the lifespan of our Done object rather than declaring it as a stack variable in wait_first_simple(). This is how you might call it: wait_first_simple( [](){ sleeper("wfs_long", 150); }, [](){ sleeper("wfs_medium", 100); }, [](){ sleeper("wfs_short", 50); }); In this example, control resumes after wait_first_simple() when sleeper("wfs_short", 50) completes — even though the other two sleeper() fibers are still running.
<link linkend="fiber.when_any.when_any.when_any__return_value">when_any, return value</link> It seems more useful to add the ability to capture the return value from the first of the task functions to complete. Again, we assume that none will throw an exception. One tactic would be to adapt our Done class to store the first of the return values, rather than a simple bool. However, we choose instead to use a buffered_channel<>. We'll only need to enqueue the first value, so we'll buffered_channel::close() it once we've retrieved that value. Subsequent push() calls will return closed. // Assume that all passed functions have the same return type. The return type // of wait_first_value() is the return type of the first passed function. It is // simply invalid to pass NO functions. template< typename Fn, typename ... Fns > typename std::result_of< Fn() >::type wait_first_value( Fn && function, Fns && ... functions) { typedef typename std::result_of< Fn() >::type return_t; typedef boost::fibers::buffered_channel< return_t > channel_t; auto chanp( std::make_shared< channel_t >( 64) ); // launch all the relevant fibers wait_first_value_impl< return_t >( chanp, std::forward< Fn >( function), std::forward< Fns >( functions) ... ); // retrieve the first value return_t value( chanp->value_pop() ); // close the channel: no subsequent push() has to succeed chanp->close(); return value; } The meat of the wait_first_value_impl() function is as you might expect: template< typename T, typename Fn > void wait_first_value_impl( std::shared_ptr< boost::fibers::buffered_channel< T > > chan, Fn && function) { boost::fibers::fiber( [chan, function](){ // Ignore channel_op_status returned by push(): // might be closed; we simply don't care. chan->push( function() ); }).detach(); } It calls the passed function, pushes its return value and ignores the push() result. 
You might call it like this: std::string result = wait_first_value( [](){ return sleeper("wfv_third", 150); }, [](){ return sleeper("wfv_second", 100); }, [](){ return sleeper("wfv_first", 50); }); std::cout << "wait_first_value() => " << result << std::endl; assert(result == "wfv_first");
<link linkend="fiber.when_any.when_any.when_any__produce_first_outcome__whether_result_or_exception">when_any, produce first outcome, whether result or exception</link> We may not be running in an environment in which we can guarantee no exception will be thrown by any of our task functions. In that case, the above implementations of wait_first_something() would be naïve: as mentioned in the section on Fiber Management, an uncaught exception in one of our task fibers would cause std::terminate() to be called. Let's at least ensure that such an exception would propagate to the fiber awaiting the first result. We can use future<> to transport either a return value or an exception. Therefore, we will change wait_first_value()'s buffered_channel<> to hold future< T > items instead of simply T. Once we have a future<> in hand, all we need do is call future::get(), which will either return the value or rethrow the exception. template< typename Fn, typename ... Fns > typename std::result_of< Fn() >::type wait_first_outcome( Fn && function, Fns && ... functions) { // In this case, the value we pass through the channel is actually a // future -- which is already ready. future can carry either a value or an // exception. typedef typename std::result_of< Fn() >::type return_t; typedef boost::fibers::future< return_t > future_t; typedef boost::fibers::buffered_channel< future_t > channel_t; auto chanp(std::make_shared< channel_t >( 64) ); // launch all the relevant fibers wait_first_outcome_impl< return_t >( chanp, std::forward< Fn >( function), std::forward< Fns >( functions) ... ); // retrieve the first future future_t future( chanp->value_pop() ); // close the channel: no subsequent push() has to succeed chanp->close(); // either return value or throw exception return future.get(); } So far so good — but there's a timing issue. How should we obtain the future<> to buffered_channel::push() on the queue? We could call fibers::async(). 
That would certainly produce a future<> for the task function. The trouble is that it would return too quickly! We only want future<> items for completed tasks on our channel. In fact, we only want the future<> for the one that completes first. If each fiber launched by wait_first_outcome() were to push() the result of calling async(), the channel would only ever report the result of the leftmost task item — not the one that completes most quickly. Calling future::get() on the future returned by async() wouldn't be right. You can only call get() once per future<> instance! And if there were an exception, it would be rethrown inside the helper fiber at the producer end of the channel, rather than propagated to the consumer end. We could call future::wait(). That would block the helper fiber until the future<> became ready, at which point we could push() it to be retrieved by wait_first_outcome(). That would work — but there's a simpler tactic that avoids creating an extra fiber. We can wrap the task function in a packaged_task<>. While one naturally thinks of passing a packaged_task<> to a new fiber — that is, in fact, what async() does — in this case, we're already running in the helper fiber at the producer end of the channel! We can simply call the packaged_task<>. On return from that call, the task function has completed, meaning that the future<> obtained from the packaged_task<> is certain to be ready. At that point we can simply push() it to the channel. template< typename T, typename CHANP, typename Fn > void wait_first_outcome_impl( CHANP chan, Fn && function) { boost::fibers::fiber( // Use std::bind() here for C++11 compatibility. C++11 lambda capture // can't move a move-only Fn type, but bind() can. Let bind() move the // channel pointer and the function into the bound object, passing // references into the lambda. 
std::bind( []( CHANP & chan, typename std::decay< Fn >::type & function) { // Instantiate a packaged_task to capture any exception thrown by // function. boost::fibers::packaged_task< T() > task( function); // Immediately run this packaged_task on same fiber. We want // function() to have completed BEFORE we push the future. task(); // Pass the corresponding future to consumer. Ignore // channel_op_status returned by push(): might be closed; we // simply don't care. chan->push( task.get_future() ); }, chan, std::forward< Fn >( function) )).detach(); } Calling it might look like this: std::string result = wait_first_outcome( [](){ return sleeper("wfos_first", 50); }, [](){ return sleeper("wfos_second", 100); }, [](){ return sleeper("wfos_third", 150); }); std::cout << "wait_first_outcome(success) => " << result << std::endl; assert(result == "wfos_first"); std::string thrown; try { result = wait_first_outcome( [](){ return sleeper("wfof_first", 50, true); }, [](){ return sleeper("wfof_second", 100); }, [](){ return sleeper("wfof_third", 150); }); } catch ( std::exception const& e) { thrown = e.what(); } std::cout << "wait_first_outcome(fail) threw '" << thrown << "'" << std::endl; assert(thrown == "wfof_first");
<link linkend="fiber.when_any.when_any.when_any__produce_first_success">when_any, produce first success</link> One scenario for when_any functionality is when we're redundantly contacting some number of possibly-unreliable web services. Not only might they be slow — any one of them might produce a failure rather than the desired result. In such a case, wait_first_outcome() isn't the right approach. If one of the services produces an error quickly, while another follows up with a real answer, we don't want to prefer the error just because it arrived first! Given the buffered_channel< future< T > > we already constructed for wait_first_outcome(), though, we can readily recast the interface function to deliver the first successful result. That does beg the question: what if all the task functions throw an exception? In that case we'd probably better know about it. The C++ Parallelism Draft Technical Specification proposes a std::exception_list exception capable of delivering a collection of std::exception_ptrs. Until that becomes universally available, let's fake up an exception_list of our own: class exception_list : public std::runtime_error { public: exception_list( std::string const& what) : std::runtime_error( what) { } typedef std::vector< std::exception_ptr > bundle_t; // N4407 proposed std::exception_list API typedef bundle_t::const_iterator iterator; std::size_t size() const noexcept { return bundle_.size(); } iterator begin() const noexcept { return bundle_.begin(); } iterator end() const noexcept { return bundle_.end(); } // extension to populate void add( std::exception_ptr ep) { bundle_.push_back( ep); } private: bundle_t bundle_; }; Now we can build wait_first_success(), using wait_first_outcome_impl(). Instead of retrieving only the first future<> from the channel, we must now loop over future<> items. Of course we must limit that iteration! If we launch only count producer fibers, the (count+1)st buffered_channel::pop() call would block forever. 
Given a ready future<>, we can distinguish failure by calling future::get_exception_ptr(). If the future<> in fact contains a result rather than an exception, get_exception_ptr() returns nullptr. In that case, we can confidently call future::get() to return that result to our caller. If the std::exception_ptr is not nullptr, though, we collect it into our pending exception_list and loop back for the next future<> from the channel. If we fall out of the loop — if every single task fiber threw an exception — we throw the exception_list exception into which we've been collecting those std::exception_ptrs. template< typename Fn, typename ... Fns > typename std::result_of< Fn() >::type wait_first_success( Fn && function, Fns && ... functions) { std::size_t count( 1 + sizeof ... ( functions) ); // In this case, the value we pass through the channel is actually a // future -- which is already ready. future can carry either a value or an // exception. typedef typename std::result_of< typename std::decay< Fn >::type() >::type return_t; typedef boost::fibers::future< return_t > future_t; typedef boost::fibers::buffered_channel< future_t > channel_t; auto chanp( std::make_shared< channel_t >( 64) ); // launch all the relevant fibers wait_first_outcome_impl< return_t >( chanp, std::forward< Fn >( function), std::forward< Fns >( functions) ... ); // instantiate exception_list, just in case exception_list exceptions("wait_first_success() produced only errors"); // retrieve up to 'count' results -- but stop there! for ( std::size_t i = 0; i < count; ++i) { // retrieve the next future future_t future( chanp->value_pop() ); // retrieve exception_ptr if any std::exception_ptr error( future.get_exception_ptr() ); // if no error, then yay, return value if ( ! 
error) { // close the channel: no subsequent push() has to succeed chanp->close(); // show caller the value we got return future.get(); } // error is non-null: collect exceptions.add( error); } // We only arrive here when every passed function threw an exception. // Throw our collection to inform caller. throw exceptions; } A call might look like this: std::string result = wait_first_success( [](){ return sleeper("wfss_first", 50, true); }, [](){ return sleeper("wfss_second", 100); }, [](){ return sleeper("wfss_third", 150); }); std::cout << "wait_first_success(success) => " << result << std::endl; assert(result == "wfss_second");
<link linkend="fiber.when_any.when_any.when_any__heterogeneous_types">when_any, heterogeneous types</link> We would be remiss to ignore the case in which the various task functions have distinct return types. That means that the value returned by the first of them might have any one of those types. We can express that with Boost.Variant. To keep the example simple, we'll revert to pretending that none of them can throw an exception. That makes wait_first_value_het() strongly resemble wait_first_value(). We can actually reuse wait_first_value_impl(), merely passing boost::variant<T0, T1, ...> as the channel's value type rather than the common T! Naturally this could be extended to use wait_first_success() semantics instead. // No need to break out the first Fn for interface function: let the compiler // complain if empty. // Our functions have different return types, and we might have to return any // of them. Use a variant, expanding std::result_of<Fn()>::type for each Fn in // parameter pack. template< typename ... Fns > boost::variant< typename std::result_of< Fns() >::type ... > wait_first_value_het( Fns && ... functions) { // Use buffered_channel<boost::variant<T1, T2, ...>>; see remarks above. typedef boost::variant< typename std::result_of< Fns() >::type ... > return_t; typedef boost::fibers::buffered_channel< return_t > channel_t; auto chanp( std::make_shared< channel_t >( 64) ); // launch all the relevant fibers wait_first_value_impl< return_t >( chanp, std::forward< Fns >( functions) ... 
); // retrieve the first value return_t value( chanp->value_pop() ); // close the channel: no subsequent push() has to succeed chanp->close(); return value; } It might be called like this: boost::variant< std::string, double, int > result = wait_first_value_het( [](){ return sleeper("wfvh_third", 150); }, [](){ return sleeper(3.14, 100); }, [](){ return sleeper(17, 50); }); std::cout << "wait_first_value_het() => " << result << std::endl; assert(boost::get< int >( result) == 17);
<link linkend="fiber.when_any.when_any.when_any__a_dubious_alternative">when_any, a dubious alternative</link> Certain topics in C++ can arouse strong passions, and exceptions are no exception. We cannot resist mentioning — for purely informational purposes — that when you need only the first result from some number of concurrently-running fibers, it would be possible to pass a shared_ptr<promise<>> to the participating fibers, then cause the initiating fiber to call future::get() on its future<>. The first fiber to call promise::set_value() on that shared promise will succeed; subsequent set_value() calls on the same promise instance will throw future_error. Use this information at your own discretion. Beware the dark side.
<link linkend="fiber.when_any.when_all_functionality">when_all functionality</link>
<link linkend="fiber.when_any.when_all_functionality.when_all__simple_completion">when_all, simple completion</link> For the case in which we must wait for all task functions to complete — but we don't need results (or expect exceptions) from any of them — we can write wait_all_simple() that looks remarkably like wait_first_simple(). The difference is that instead of our Done class, we instantiate a barrier and call its barrier::wait(). We initialize the barrier with (count+1) because we are launching count fibers, plus the wait() call within wait_all_simple() itself. template< typename ... Fns > void wait_all_simple( Fns && ... functions) { std::size_t count( sizeof ... ( functions) ); // Initialize a barrier(count+1) because we'll immediately wait on it. We // don't want to wake up until 'count' more fibers wait on it. Even though // we'll stick around until the last of them completes, use shared_ptr // anyway because it's easier to be confident about lifespan issues. auto barrier( std::make_shared< boost::fibers::barrier >( count + 1) ); wait_all_simple_impl( barrier, std::forward< Fns >( functions) ... ); barrier->wait(); } As stated above, the only difference between wait_all_simple_impl() and wait_first_simple_impl() is that the former calls barrier::wait() rather than Done::notify(): template< typename Fn, typename ... Fns > void wait_all_simple_impl( std::shared_ptr< boost::fibers::barrier > barrier, Fn && function, Fns && ... functions) { boost::fibers::fiber( std::bind( []( std::shared_ptr< boost::fibers::barrier > & barrier, typename std::decay< Fn >::type & function) mutable { function(); barrier->wait(); }, barrier, std::forward< Fn >( function) )).detach(); wait_all_simple_impl( barrier, std::forward< Fns >( functions) ... 
); } You might call it like this: wait_all_simple( [](){ sleeper("was_long", 150); }, [](){ sleeper("was_medium", 100); }, [](){ sleeper("was_short", 50); }); Control will not return from the wait_all_simple() call until the last of its task functions has completed.
<link linkend="fiber.when_any.when_all_functionality.when_all__return_values">when_all, return values</link> As soon as we want to collect return values from all the task functions, we can see right away how to reuse wait_first_value()'s buffered_channel<T> for the purpose. All we have to do is avoid closing it after the first value! But in fact, collecting multiple values raises an interesting question: do we really want to wait until the slowest of them has arrived? Wouldn't we rather process each result as soon as it becomes available? Fortunately we can present both APIs. Let's define wait_all_values_source() to return shared_ptr<buffered_channel<T>>. Given wait_all_values_source(), it's straightforward to implement wait_all_values(): template< typename Fn, typename ... Fns > std::vector< typename std::result_of< Fn() >::type > wait_all_values( Fn && function, Fns && ... functions) { std::size_t count( 1 + sizeof ... ( functions) ); typedef typename std::result_of< Fn() >::type return_t; typedef std::vector< return_t > vector_t; vector_t results; results.reserve( count); // get channel std::shared_ptr< boost::fibers::buffered_channel< return_t > > chan = wait_all_values_source( std::forward< Fn >( function), std::forward< Fns >( functions) ... ); // fill results vector return_t value; while ( boost::fibers::channel_op_status::success == chan->pop(value) ) { results.push_back( value); } // return vector to caller return results; } It might be called like this: std::vector< std::string > values = wait_all_values( [](){ return sleeper("wav_late", 150); }, [](){ return sleeper("wav_middle", 100); }, [](){ return sleeper("wav_early", 50); }); As you can see from the loop in wait_all_values(), instead of requiring its caller to count values, we define wait_all_values_source() to buffered_channel::close() the channel when done. But how do we do that? Each producer fiber is independent. It has no idea whether it is the last one to buffered_channel::push() a value. 
We can address that problem with a counting façade for the channel. In fact, our façade need only support the producer end of the channel. [wait_nqueue] Armed with nchannel<>, we can implement wait_all_values_source(). It starts just like wait_first_value(). The difference is that we wrap the buffered_channel<T> with an nchannel<T> to pass to the producer fibers. Then, of course, instead of popping the first value, closing the channel and returning it, we simply return the shared_ptr<buffered_channel<T>>. // Return a shared_ptr<buffered_channel<T>> from which the caller can // retrieve each new result as it arrives, until 'closed'. template< typename Fn, typename ... Fns > std::shared_ptr< boost::fibers::buffered_channel< typename std::result_of< Fn() >::type > > wait_all_values_source( Fn && function, Fns && ... functions) { std::size_t count( 1 + sizeof ... ( functions) ); typedef typename std::result_of< Fn() >::type return_t; typedef boost::fibers::buffered_channel< return_t > channel_t; // make the channel auto chanp( std::make_shared< channel_t >( 64) ); // and make an nchannel facade to close it after 'count' items auto ncp( std::make_shared< nchannel< return_t > >( chanp, count) ); // pass that nchannel facade to all the relevant fibers wait_all_values_impl< return_t >( ncp, std::forward< Fn >( function), std::forward< Fns >( functions) ... 
); // then return the channel for consumer return chanp; } For example: std::shared_ptr< boost::fibers::buffered_channel< std::string > > chan = wait_all_values_source( [](){ return sleeper("wavs_third", 150); }, [](){ return sleeper("wavs_second", 100); }, [](){ return sleeper("wavs_first", 50); }); std::string value; while ( boost::fibers::channel_op_status::success == chan->pop(value) ) { std::cout << "wait_all_values_source() => '" << value << "'" << std::endl; } wait_all_values_impl() really is just like wait_first_value_impl() except for the use of nchannel<T> rather than buffered_channel<T>: template< typename T, typename Fn > void wait_all_values_impl( std::shared_ptr< nchannel< T > > chan, Fn && function) { boost::fibers::fiber( [chan, function](){ chan->push(function()); }).detach(); }
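To make the close-after-N idea concrete outside a fiber context, here is a minimal sketch of a channel plus counting facade, using std::thread and std::condition_variable as stand-ins for Boost.Fiber's primitives. The class names mirror the example code, but the implementation here is illustrative, not the library's.

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal stand-in for buffered_channel<T>: push/pop plus close().
template< typename T >
class channel {
public:
    void push( T value) {
        { std::lock_guard< std::mutex > lk( mtx_); q_.push( std::move( value) ); }
        cnd_.notify_one();
    }
    void close() {
        { std::lock_guard< std::mutex > lk( mtx_); closed_ = true; }
        cnd_.notify_all();
    }
    // Returns false once the channel is closed and drained.
    bool pop( T & value) {
        std::unique_lock< std::mutex > lk( mtx_);
        cnd_.wait( lk, [this]{ return closed_ || ! q_.empty(); });
        if ( q_.empty() ) return false;
        value = std::move( q_.front() ); q_.pop();
        return true;
    }
private:
    std::mutex mtx_;
    std::condition_variable cnd_;
    std::queue< T > q_;
    bool closed_ = false;
};

// nchannel facade: forwards push() to the wrapped channel and closes it
// after the Nth push, so the consumer's pop() loop terminates by itself.
template< typename T >
class nchannel {
public:
    nchannel( std::shared_ptr< channel< T > > cp, std::size_t limit)
        : chan_( std::move( cp) ), limit_( limit) {}
    void push( T value) {
        chan_->push( std::move( value) );
        std::lock_guard< std::mutex > lk( mtx_);
        if ( ++count_ == limit_) chan_->close();
    }
private:
    std::shared_ptr< channel< T > > chan_;
    std::mutex mtx_;
    std::size_t count_ = 0;
    std::size_t limit_;
};
```

Three producer threads pushing through the nchannel let the consumer drain the channel with a plain pop() loop and no item counting on the consumer side.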
<link linkend="fiber.when_any.when_all_functionality.when_all_until_first_exception">when_all until first exception</link> Naturally, just as with wait_first_outcome(), we can elaborate wait_all_values() and wait_all_values_source() by passing future< T > instead of plain T. wait_all_until_error() pops that future< T > and calls its future::get(): template< typename Fn, typename ... Fns > std::vector< typename std::result_of< Fn() >::type > wait_all_until_error( Fn && function, Fns && ... functions) { std::size_t count( 1 + sizeof ... ( functions) ); typedef typename std::result_of< Fn() >::type return_t; typedef typename boost::fibers::future< return_t > future_t; typedef std::vector< return_t > vector_t; vector_t results; results.reserve( count); // get channel std::shared_ptr< boost::fibers::buffered_channel< future_t > > chan( wait_all_until_error_source( std::forward< Fn >( function), std::forward< Fns >( functions) ... ) ); // fill results vector future_t future; while ( boost::fibers::channel_op_status::success == chan->pop( future) ) { results.push_back( future.get() ); } // return vector to caller return results; } For example: std::string thrown; try { std::vector< std::string > values = wait_all_until_error( [](){ return sleeper("waue_late", 150); }, [](){ return sleeper("waue_middle", 100, true); }, [](){ return sleeper("waue_early", 50); }); } catch ( std::exception const& e) { thrown = e.what(); } std::cout << "wait_all_until_error(fail) threw '" << thrown << "'" << std::endl; Naturally this complicates the API for wait_all_until_error_source(). The caller must both retrieve a future< T > and call its get() method. It would, of course, be possible to return a façade over the consumer end of the channel that would implicitly perform the get() and return a simple T (or throw). The implementation is just as you would expect. Notice, however, that we can reuse wait_first_outcome_impl(), passing the nchannel<T> rather than the buffered_channel<T>. 
// Return a shared_ptr<buffered_channel<future<T>>> from which the caller can // get() each new result as it arrives, until 'closed'. template< typename Fn, typename ... Fns > std::shared_ptr< boost::fibers::buffered_channel< boost::fibers::future< typename std::result_of< Fn() >::type > > > wait_all_until_error_source( Fn && function, Fns && ... functions) { std::size_t count( 1 + sizeof ... ( functions) ); typedef typename std::result_of< Fn() >::type return_t; typedef boost::fibers::future< return_t > future_t; typedef boost::fibers::buffered_channel< future_t > channel_t; // make the channel auto chanp( std::make_shared< channel_t >( 64) ); // and make an nchannel facade to close it after 'count' items auto ncp( std::make_shared< nchannel< future_t > >( chanp, count) ); // pass that nchannel facade to all the relevant fibers wait_first_outcome_impl< return_t >( ncp, std::forward< Fn >( function), std::forward< Fns >( functions) ... ); // then return the channel for consumer return chanp; } For example: typedef boost::fibers::future< std::string > future_t; std::shared_ptr< boost::fibers::buffered_channel< future_t > > chan = wait_all_until_error_source( [](){ return sleeper("wauess_third", 150); }, [](){ return sleeper("wauess_second", 100); }, [](){ return sleeper("wauess_first", 50); }); future_t future; while ( boost::fibers::channel_op_status::success == chan->pop( future) ) { std::string value( future.get() ); std::cout << "wait_all_until_error_source(success) => '" << value << "'" << std::endl; }
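The key behavior here is that calling get() on a future that holds an exception rethrows it to the caller. A minimal stand-alone sketch, with std::async and std::future standing in for the fiber variants, shows how draining a sequence of futures propagates the first stored exception, just as wait_all_until_error() does:

```cpp
#include <future>
#include <stdexcept>
#include <string>
#include <vector>

// Drain a sequence of futures into a result vector. The first future whose
// task threw causes get() to rethrow, abandoning the remaining results --
// the same behavior the wait_all_until_error() loop exhibits.
std::vector< int > collect( std::vector< std::future< int > > & futures) {
    std::vector< int > results;
    for ( auto & f : futures) {
        results.push_back( f.get() ); // rethrows if the task threw
    }
    return results;
}
```

If the second of three tasks throws, collect() returns nothing: the caller sees the exception instead.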
<link linkend="fiber.when_any.when_all_functionality.wait_all__collecting_all_exceptions">wait_all, collecting all exceptions</link> Given wait_all_until_error_source(), it might be more reasonable to make a wait_all_...() that collects all errors instead of presenting only the first: template< typename Fn, typename ... Fns > std::vector< typename std::result_of< Fn() >::type > wait_all_collect_errors( Fn && function, Fns && ... functions) { std::size_t count( 1 + sizeof ... ( functions) ); typedef typename std::result_of< Fn() >::type return_t; typedef typename boost::fibers::future< return_t > future_t; typedef std::vector< return_t > vector_t; vector_t results; results.reserve( count); exception_list exceptions("wait_all_collect_errors() exceptions"); // get channel std::shared_ptr< boost::fibers::buffered_channel< future_t > > chan( wait_all_until_error_source( std::forward< Fn >( function), std::forward< Fns >( functions) ... ) ); // fill results and/or exceptions vectors future_t future; while ( boost::fibers::channel_op_status::success == chan->pop( future) ) { std::exception_ptr exp = future.get_exception_ptr(); if ( ! exp) { results.push_back( future.get() ); } else { exceptions.add( exp); } } // if there were any exceptions, throw if ( exceptions.size() ) { throw exceptions; } // no exceptions: return vector to caller return results; } The implementation is a simple variation on wait_first_success(), using the same exception_list exception class.
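For readers who have not seen the earlier section: a minimal sketch of an exception_list-like aggregate is below. It is illustrative, not the exact class from the examples; the essential points are that it is itself throwable (derives from std::runtime_error) and carries a bundle of std::exception_ptr objects that the catcher can iterate and rethrow individually.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Throwable aggregate of exception_ptrs, in the spirit of the example code's
// exception_list: collect every failure, then throw them all at once.
class exception_list : public std::runtime_error {
public:
    typedef std::vector< std::exception_ptr > bundle_t;

    exception_list( std::string const& what) : std::runtime_error( what) {}

    void add( std::exception_ptr ep) { bundle_.push_back( ep); }
    std::size_t size() const noexcept { return bundle_.size(); }
    bundle_t::const_iterator begin() const noexcept { return bundle_.begin(); }
    bundle_t::const_iterator end() const noexcept { return bundle_.end(); }

private:
    bundle_t bundle_;
};
```

The catcher can walk the list and std::rethrow_exception() each entry to inspect the individual failures.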
<link linkend="fiber.when_any.when_all_functionality.when_all__heterogeneous_types">when_all, heterogeneous types</link> But what about the case when we must wait for all results of different types? We can present an API that is frankly quite cool. Consider a sample struct: struct Data { std::string str; double inexact; int exact; friend std::ostream& operator<<( std::ostream& out, Data const& data); ... }; Let's fill its members from task functions all running concurrently: Data data = wait_all_members< Data >( [](){ return sleeper("wams_left", 100); }, [](){ return sleeper(3.14, 150); }, [](){ return sleeper(17, 50); }); std::cout << "wait_all_members<Data>(success) => " << data << std::endl; Note that for this case, we abandon the notion of capturing the earliest result first, and so on: we must fill exactly the passed struct in left-to-right order. That permits a beautifully simple implementation: // Explicitly pass Result. This can be any type capable of being initialized // from the results of the passed functions, such as a struct. template< typename Result, typename ... Fns > Result wait_all_members( Fns && ... functions) { // Run each of the passed functions on a separate fiber, passing all their // futures to helper function for processing. return wait_all_members_get< Result >( boost::fibers::async( std::forward< Fns >( functions) ) ... ); } template< typename Result, typename ... Futures > Result wait_all_members_get( Futures && ... futures) { // Fetch the results from the passed futures into Result's initializer // list. It's true that the get() calls here will block the implicit // iteration over futures -- but that doesn't matter because we won't be // done until the slowest of them finishes anyway. As results are // processed in argument-list order rather than order of completion, the // leftmost get() to throw an exception will cause that exception to // propagate to the caller. return Result{ futures.get() ... 
}; } It is tempting to try to implement wait_all_members() as a one-liner like this: return Result{ boost::fibers::async(functions).get()... }; The trouble with this tactic is that it would serialize all the task functions. The runtime makes a single pass through functions, calling fibers::async() for each and then immediately calling future::get() on its returned future<>. That blocks the implicit loop. The above is almost equivalent to writing: return Result{ functions()... }; in which, of course, there is no concurrency at all. Passing the argument pack through a function-call boundary (wait_all_members_get()) forces the runtime to make two passes: one in wait_all_members() to collect the future<>s from all the async() calls, the second in wait_all_members_get() to fetch each of the results. As noted in comments, within the wait_all_members_get() parameter pack expansion pass, the blocking behavior of get() becomes irrelevant. Along the way, we will hit the get() for the slowest task function; after that every subsequent get() will complete in trivial time. By the way, we could also use this same API to fill a vector or other collection: // If we don't care about obtaining results as soon as they arrive, and we // prefer a result vector in passed argument order rather than completion // order, wait_all_members() is another possible implementation of // wait_all_until_error(). auto strings = wait_all_members< std::vector< std::string > >( [](){ return sleeper("wamv_left", 150); }, [](){ return sleeper("wamv_middle", 100); }, [](){ return sleeper("wamv_right", 50); }); std::cout << "wait_all_members<vector>() =>"; for ( std::string const& str : strings) { std::cout << " '" << str << "'"; } std::cout << std::endl;
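The two-pass trick can be demonstrated without Boost.Fiber at all, using std::async with std::launch::async as a stand-in for fibers::async. In this sketch each task spins until all three tasks have started; the program can only complete if the tasks genuinely overlap, which proves that the futures were all launched before any get() call. (The task/started names are illustrative, not from the library.)

```cpp
#include <atomic>
#include <future>
#include <utility>
#include <vector>

// Second pass: fetch each result. Braced-init-list expansion guarantees
// left-to-right get() order.
template< typename Result, typename ... Futures >
Result wait_all_members_get( Futures && ... futures) {
    return Result{ futures.get() ... };
}

// First pass: launch every task, collecting all futures before any get().
// std::async with std::launch::async stands in for boost::fibers::async.
template< typename Result, typename ... Fns >
Result wait_all_members( Fns && ... functions) {
    return wait_all_members_get< Result >(
        std::async( std::launch::async, std::forward< Fns >( functions) ) ... );
}

std::atomic< int > started{ 0 };

// Each task refuses to finish until all three are running concurrently.
// A serialized (one-pass) implementation would therefore deadlock here.
int task( int value) {
    ++started;
    while ( started.load() < 3) { /* spin until all tasks have started */ }
    return value;
}
```

Because the one-liner `Result{ std::async(functions).get()... }` would start and finish each task before launching the next, it could never satisfy the `started == 3` condition; the two-pass version completes immediately.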
<anchor id="integration"/><link linkend="fiber.integration">Sharing a Thread with Another Main Loop</link>
<link linkend="fiber.integration.overview">Overview</link> As always with cooperative concurrency, it is important not to let any one fiber monopolize the processor too long: that could starve other ready fibers. This section discusses a couple of solutions.
<link linkend="fiber.integration.event_driven_program">Event-Driven Program</link> Consider a classic event-driven program, organized around a main loop that fetches and dispatches incoming I/O events. You are introducing Boost.Fiber because certain asynchronous I/O sequences are logically sequential, and for those you want to write and maintain code that looks and acts sequential. You are launching fibers on the application’s main thread because certain of their actions will affect its user interface, and the application’s UI framework permits UI operations only on the main thread. Or perhaps those fibers need access to main-thread data, and it would be too expensive in runtime (or development time) to robustly defend every such data item with thread synchronization primitives. You must ensure that the application’s main loop itself doesn’t monopolize the processor: that the fibers it launches will get the CPU cycles they need. The solution is the same as for any fiber that might claim the CPU for an extended time: introduce calls to this_fiber::yield(). The most straightforward approach is to call yield() on every iteration of your existing main loop. In effect, this unifies the application’s main loop with Boost.Fiber’s internal main loop. yield() allows the fiber manager to run any fibers that have become ready since the previous iteration of the application’s main loop. When these fibers have had a turn, control passes to the thread’s main fiber, which returns from yield() and resumes the application’s main loop.
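As a toy model of the unified loop described above (not Boost.Fiber's actual machinery), we can stand in for fibers with plain step functions: each main-loop iteration dispatches one event, then gives every ready "fiber" one turn, which is the role yield() plays in the real integration.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Toy model of a unified main loop. "Fibers" are step functions returning
// true while they still have work. Each iteration dispatches one pending
// event, then lets every ready fiber run one step (the yield() point).
std::vector< std::string > run_main_loop(
        std::deque< std::string > events,
        std::vector< std::function< bool() > > fibers) {
    std::vector< std::string > log;
    while ( ! events.empty() || ! fibers.empty() ) {
        if ( ! events.empty() ) {                 // dispatch one I/O event
            log.push_back( "event:" + events.front() );
            events.pop_front();
        }
        // the yield() point: every ready fiber gets a turn
        for ( auto it = fibers.begin(); it != fibers.end(); ) {
            if ( (*it)() ) ++it;
            else it = fibers.erase( it);          // fiber finished
        }
    }
    return log;
}
```

The point of the model: neither the event dispatch nor the fibers can starve the other, because control alternates on every iteration.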
<anchor id="embedded_main_loop"/><link linkend="fiber.integration.embedded_main_loop">Embedded Main Loop</link> More challenging is when the application’s main loop is embedded in some other library or framework. Such an application will typically, after performing all necessary setup, pass control to some form of run() function from which control does not return until application shutdown. A Boost.Asio program might call io_service::run() in this way. In general, the trick is to arrange to pass control to this_fiber::yield() frequently. You could use an Asio timer for that purpose. You could instantiate the timer, arranging to call a handler function when the timer expires. The handler function could call yield(), then reset the timer and arrange to wake up again on its next expiration. Since, in this thought experiment, we always pass control to the fiber manager via yield(), the calling fiber is never blocked. Therefore there is always at least one ready fiber. Therefore the fiber manager never calls algorithm::suspend_until(). Using io_service::post() instead of setting a timer for some nonzero interval would be unfriendly to other threads. When all I/O is pending and all fibers are blocked, the io_service and the fiber manager would simply spin the CPU, passing control back and forth to each other. Using a timer allows tuning the responsiveness of this thread relative to others.
<link linkend="fiber.integration.deeper_dive_into___boost_asio__">Deeper Dive into <ulink url="http://www.boost.org/doc/libs/release/libs/asio/index.html">Boost.Asio</ulink></link> By now the alert reader is thinking: but surely, with Asio in particular, we ought to be able to do much better than periodic polling pings! This turns out to be surprisingly tricky. We present a possible approach in examples/asio/round_robin.hpp. One consequence of using Boost.Asio is that you must always let Asio suspend the running thread. Since Asio is aware of pending I/O requests, it can arrange to suspend the thread in such a way that the OS will wake it on I/O completion. No one else has sufficient knowledge. So the fiber scheduler must depend on Asio for suspension and resumption. It requires Asio handler calls to wake it. One dismaying implication is that we cannot support multiple threads calling io_service::run() on the same io_service instance. The reason is that Asio provides no way to constrain a particular handler to be called only on a specified thread. A fiber scheduler instance is locked to a particular thread: that instance cannot manage any other thread’s fibers. Yet if we allow multiple threads to call io_service::run() on the same io_service instance, a fiber scheduler which needs to sleep can have no guarantee that it will reawaken in a timely manner. It can set an Asio timer, as described above — but that timer’s handler may well execute on a different thread! Another implication is that since an Asio-aware fiber scheduler (not to mention boost::fibers::asio::yield) depends on handler calls from the io_service, it is the application’s responsibility to ensure that io_service::stop() is not called until every fiber has terminated. It is easier to reason about the behavior of the presented asio::round_robin scheduler if we require that after initial setup, the thread’s main fiber is the fiber that calls io_service::run(), so let’s impose that requirement. 
Naturally, the first thing we must do on each thread using a custom fiber scheduler is call use_scheduling_algorithm(). However, since asio::round_robin requires an io_service instance, we must first declare that. std::shared_ptr< boost::asio::io_service > io_svc = std::make_shared< boost::asio::io_service >(); boost::fibers::use_scheduling_algorithm< boost::fibers::asio::round_robin >( io_svc); use_scheduling_algorithm() instantiates asio::round_robin, which naturally calls its constructor: round_robin( std::shared_ptr< boost::asio::io_service > const& io_svc) : io_svc_( io_svc), suspend_timer_( * io_svc_) { // We use add_service() very deliberately. This will throw // service_already_exists if you pass the same io_service instance to // more than one round_robin instance. boost::asio::add_service( * io_svc_, new service( * io_svc_) ); io_svc_->post([this]() mutable { asio::round_robin binds the passed io_service pointer and initializes a boost::asio::steady_timer: std::shared_ptr< boost::asio::io_service > io_svc_; boost::asio::steady_timer suspend_timer_; Then it calls boost::asio::add_service() with a nested service struct: struct service : public boost::asio::io_service::service { static boost::asio::io_service::id id; std::unique_ptr< boost::asio::io_service::work > work_; service( boost::asio::io_service & io_svc) : boost::asio::io_service::service( io_svc), work_{ new boost::asio::io_service::work( io_svc) } { } virtual ~service() {} service( service const&) = delete; service & operator=( service const&) = delete; void shutdown_service() override final { work_.reset(); } }; ... [asio_rr_service_bottom] The service struct has a couple of roles. Its foremost role is to manage a std::unique_ptr<boost::asio::io_service::work>. We want the io_service instance to continue its main loop even when there is no pending Asio I/O. 
But when boost::asio::io_service::service::shutdown_service() is called, we discard the io_service::work instance so the io_service can shut down properly. Its other purpose is to post() a lambda (not yet shown). Let’s walk further through the example program before coming back to explain that lambda. The service constructor returns to asio::round_robin’s constructor, which returns to use_scheduling_algorithm(), which returns to the application code. Once it has called use_scheduling_algorithm(), the application may now launch some number of fibers: // server tcp::acceptor a( * io_svc, tcp::endpoint( tcp::v4(), 9999) ); boost::fibers::fiber( server, io_svc, std::ref( a) ).detach(); // client const unsigned iterations = 2; const unsigned clients = 3; boost::fibers::barrier b( clients); for ( unsigned i = 0; i < clients; ++i) { boost::fibers::fiber( client, io_svc, std::ref( a), std::ref( b), iterations).detach(); } Since we don’t specify a launch, these fibers are ready to run, but have not yet been entered. Having set everything up, the application calls io_service::run(): io_svc->run(); Now what? Because this io_service instance owns an io_service::work instance, run() does not immediately return. But — none of the fibers that will perform actual work has even been entered yet! Without that initial post() call in service’s constructor, nothing would happen. The application would hang right here. So, what should the post() handler execute? Simply this_fiber::yield()? That would be a promising start. But we have no guarantee that any of the other fibers will initiate any Asio operations to keep the ball rolling. For all we know, every other fiber could reach a similar boost::this_fiber::yield() call first. Control would return to the post() handler, which would return to Asio, and... the application would hang. The post() handler could post() itself again. 
But as discussed in the previous section, once there are actual I/O operations in flight — once we reach a state in which no fiber is ready — that would cause the thread to spin. We could, of course, set an Asio timer — again as previously discussed. But in this deeper dive, we’re trying to do a little better. The key to doing better is that since we’re in a fiber, we can run an actual loop — not just a chain of callbacks. We can wait for something to happen by calling io_service::run_one() — or we can execute already-queued Asio handlers by calling io_service::poll(). Here’s the body of the lambda passed to the post() call. while ( ! io_svc_->stopped() ) { if ( has_ready_fibers() ) { // run all pending handlers in round_robin while ( io_svc_->poll() ); // block this fiber till all pending (ready) fibers are processed // == round_robin::suspend_until() has been called std::unique_lock< boost::fibers::mutex > lk( mtx_); cnd_.wait( lk); } else { // run one handler inside io_service // if no handler available, block this thread if ( ! io_svc_->run_one() ) { break; } } } We want this loop to exit once the io_service instance has been stopped(). As long as there are ready fibers, we interleave running ready Asio handlers with running ready fibers. If there are no ready fibers, we wait by calling run_one(). Once any Asio handler has been called — no matter which — run_one() returns. That handler may have transitioned some fiber to ready state, so we loop back to check again. (We won’t describe awakened(), pick_next() or has_ready_fibers(), as these are just like round_robin::awakened(), round_robin::pick_next() and round_robin::has_ready_fibers().) That leaves suspend_until() and notify(). Doubtless you have been asking yourself: why are we calling io_service::run_one() in the lambda loop? Why not call it in suspend_until(), whose very API was designed for just such a purpose? 
Under normal circumstances, when the fiber manager finds no ready fibers, it calls algorithm::suspend_until(). Why test has_ready_fibers() in the lambda loop? Why not leverage the normal mechanism? The answer is: it matters who’s asking. Consider the lambda loop shown above. The only Boost.Fiber APIs it engages are has_ready_fibers() and this_fiber::yield(). yield() does not block the calling fiber: the calling fiber does not become unready. It is immediately passed back to algorithm::awakened(), to be resumed in its turn when all other ready fibers have had a chance to run. In other words: during a yield() call, there is always at least one ready fiber. As long as this lambda loop is still running, the fiber manager does not call suspend_until() because it always has a fiber ready to run. However, the lambda loop itself can detect the case when no other fibers are ready to run: the running fiber is not ready but running. That said, suspend_until() and notify() are in fact called during orderly shutdown processing, so let’s try a plausible implementation. void suspend_until( std::chrono::steady_clock::time_point const& abs_time) noexcept { // Set a timer so at least one handler will eventually fire, causing // run_one() to eventually return. if ( (std::chrono::steady_clock::time_point::max)() != abs_time) { // Each expires_at(time_point) call cancels any previous pending // call. 
We could inadvertently spin like this: // dispatcher calls suspend_until() with earliest wake time // suspend_until() sets suspend_timer_ // lambda loop calls run_one() // some other asio handler runs before timer expires // run_one() returns to lambda loop // lambda loop yields to dispatcher // dispatcher finds no ready fibers // dispatcher calls suspend_until() with SAME wake time // suspend_until() sets suspend_timer_ to same time, canceling // previous async_wait() // lambda loop calls run_one() // asio calls suspend_timer_ handler with operation_aborted // run_one() returns to lambda loop... etc. etc. // So only actually set the timer when we're passed a DIFFERENT // abs_time value. suspend_timer_.expires_at( abs_time); suspend_timer_.async_wait([](boost::system::error_code const&){ this_fiber::yield(); }); } cnd_.notify_one(); } As you might expect, suspend_until() sets an asio::steady_timer to expires_at() the passed std::chrono::steady_clock::time_point. Usually. As indicated in comments, we avoid setting suspend_timer_ multiple times to the same time_point value since every expires_at() call cancels any previous async_wait() call. There is a chance that we could spin. Reaching suspend_until() means the fiber manager intends to yield the processor to Asio. Cancelling the previous async_wait() call would fire its handler, causing run_one() to return, potentially causing the fiber manager to call suspend_until() again with the same time_point value... Given that we suspend the thread by calling io_service::run_one(), what’s important is that our async_wait() call will cause a handler to run, which will cause run_one() to return. It’s not so important specifically what that handler does. void notify() noexcept { // Something has happened that should wake one or more fibers BEFORE // suspend_timer_ expires. Reset the timer to cause it to fire // immediately, causing the run_one() call to return. 
In theory we // could use cancel() because we don't care whether suspend_timer_'s // handler is called with operation_aborted or success. However -- // cancel() doesn't change the expiration time, and we use // suspend_timer_'s expiration time to decide whether it's already // set. If suspend_until() set some specific wake time, then notify() // canceled it, then suspend_until() was called again with the same // wake time, it would match suspend_timer_'s expiration time and we'd // refrain from setting the timer. So instead of simply calling // cancel(), reset the timer, which cancels the pending sleep AND sets // a new expiration time. This will cause us to spin the loop twice -- // once for the operation_aborted handler, once for timer expiration // -- but that shouldn't be a big problem. suspend_timer_.async_wait([](boost::system::error_code const&){ this_fiber::yield(); }); suspend_timer_.expires_at( std::chrono::steady_clock::now() ); } Since an expires_at() call cancels any previous async_wait() call, we can make notify() simply call steady_timer::expires_at(). That should cause the io_service to call the async_wait() handler with operation_aborted. The comments in notify() explain why we call expires_at() rather than cancel(). This boost::fibers::asio::round_robin implementation is used in examples/asio/autoecho.cpp. It seems possible that you could put together a more elegant Fiber / Asio integration. But as noted at the outset: it’s tricky.
<anchor id="speculation"/><link linkend="fiber.speculation">Speculative execution</link> Hardware transactional memory With the help of hardware transactional memory, multiple logical processors execute a critical region speculatively, i.e. without explicit synchronization. If the transactional execution completes successfully, then all memory operations performed within the transactional region are committed without any inter-thread serialization. When the optimistic execution fails, the processor aborts the transaction and discards all modifications performed. In non-transactional code a single lock serializes access to a critical region. With transactional memory, multiple logical processors start a transaction and update the memory (the data) inside the critical region. Unless some logical processors try to update the same data, the transactions always succeed. Intel Transactional Synchronization Extensions (TSX) TSX is Intel's implementation of hardware transactional memory in modern Intel processors intel.com: Intel Transactional Synchronization Extensions . In TSX the hardware keeps track of which cache-lines have been read from and which have been written to in a transaction. The cache-line size (64 bytes) and the n-way set associative cache determine the maximum size of memory in a transaction. For instance, if a transaction modifies 9 cache-lines on a processor with an 8-way set associative cache, the transaction will always be aborted. TSX is enabled if the property htm=tsx is specified at the b2 command line and BOOST_USE_TSX is applied to the compiler. A TSX transaction will be aborted if the floating point state is modified inside a critical region. As a consequence, floating point operations, e.g. store/load of floating point related registers during a fiber (context) switch, are disabled. TSX cannot be used together with MSVC at this time! Boost.Fiber uses TSX-enabled spinlocks to protect critical regions (see section Tuning).
<anchor id="numa"/><link linkend="fiber.numa">NUMA</link> Modern micro-processors contain integrated memory controllers that are connected via channels to the memory. Memory access can be organized in two ways: Uniform Memory Access (UMA) and Non-Uniform Memory Access (NUMA). In contrast to UMA, which provides a centralized pool of memory (and thus does not scale beyond a certain number of processors), a NUMA architecture divides the memory into local and remote memory relative to the micro-processor. Local memory is directly attached to the processor's integrated memory controller. Memory connected to the memory controller of another micro-processor (multi-socket systems) is considered remote memory. If a memory controller accesses remote memory, it has to traverse the interconnect (on x86, Intel's QuickPath Interconnect (QPI) or AMD's HyperTransport) and connect to the remote memory controller. Thus accessing remote memory adds latency overhead compared to local memory access. Because of the different memory locations, a NUMA system experiences non-uniform memory access times. As a consequence, the best performance is achieved by keeping memory access local. NUMA support in Boost.Fiber Because only a subset of the NUMA functionality is exposed by several operating systems, Boost.Fiber provides only a minimalistic NUMA API. In order to enable NUMA support, the b2 property numa=on must be specified and the program must be linked against the additional library libboost_fiber_numa.so. On Windows, MinGW using the pthread implementation is not supported. Supported functionality/operating systems:

                          AIX   FreeBSD   HP/UX   Linux   Solaris   Windows
pin thread                 +       +        +       +        +         +
logical CPUs/NUMA nodes    +       +        +       +        +         +

Windows organizes logical cpus in groups of 64; Boost.Fiber maps {group-id, cpu-id} to a scalar equivalent to the cpu ID of Linux (64 * group ID + cpu ID). 
NUMA node distance is supported only on Linux. Tested on: AIX 7.2, FreeBSD 11, Arch Linux (4.10.13), OpenIndiana HIPSTER, Windows 10.
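The distance matrix introduced below can be used directly for placement decisions. A small sketch (plain C++, independent of Boost.Fiber's API) picks the nearest remote node from one row of the matrix, exploiting the conventions documented here: the index equals the node ID and the self-distance is 10, the lowest value.

```cpp
#include <cstdint>
#include <vector>

// Given one row of the NUMA distance matrix (as numa::node::distance holds
// it: index == node ID, self-distance == 10), return the ID of the nearest
// remote node -- a natural fallback order for remote memory access.
std::uint32_t nearest_remote_node( std::vector< std::uint32_t > const& distance,
                                   std::uint32_t self) {
    std::uint32_t best = self;
    for ( std::uint32_t id = 0; id < distance.size(); ++id) {
        if ( id == self) continue;
        if ( best == self || distance[id] < distance[best]) best = id;
    }
    return best;
}
```

On the two-node example system shown below (distances {10, 21} and {21, 10}), each node's nearest remote node is simply the other one; the function becomes interesting on four-socket and larger topologies.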
In order to keep the memory access as local as possible, the NUMA topology must be evaluated. std::vector< boost::fibers::numa::node > topo = boost::fibers::numa::topology(); for ( auto n : topo) { std::cout << "node: " << n.id << " | "; std::cout << "cpus: "; for ( auto cpu_id : n.logical_cpus) { std::cout << cpu_id << " "; } std::cout << "| distance: "; for ( auto d : n.distance) { std::cout << d << " "; } std::cout << std::endl; } std::cout << "done" << std::endl; output: node: 0 | cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23 | distance: 10 21 node: 1 | cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31 | distance: 21 10 done The example shows that the system consists of 2 NUMA-nodes, each containing 16 logical cpus. The distance measures the cost of accessing the memory of another NUMA-node. A NUMA-node always has a distance of 10 to itself (the lowest possible value). The position in the array corresponds to the NUMA-node ID. Some work-loads benefit from pinning threads to logical cpus. For instance, the scheduling algorithm numa::work_stealing pins the thread that runs the fiber scheduler to a logical cpu. This prevents the operating system scheduler from moving the thread to another logical cpu that might run other fiber scheduler(s), or from migrating the thread to a logical cpu belonging to another NUMA-node. void thread( std::uint32_t cpu_id, std::uint32_t node_id, std::vector< boost::fibers::numa::node > const& topo) { // thread registers itself at work-stealing scheduler boost::fibers::use_scheduling_algorithm< boost::fibers::algo::numa::work_stealing >( cpu_id, node_id, topo); ... 
} // evaluate the NUMA topology std::vector< boost::fibers::numa::node > topo = boost::fibers::numa::topology(); // start-thread runs on NUMA-node `0` auto node = topo[0]; // start-thread is pinned to the first cpu ID in the list of logical cpus of NUMA-node `0` auto start_cpu_id = * node.logical_cpus.begin(); // start worker-threads first std::vector< std::thread > threads; for ( auto & node : topo) { for ( std::uint32_t cpu_id : node.logical_cpus) { // exclude start-thread if ( start_cpu_id != cpu_id) { // spawn thread threads.emplace_back( thread, cpu_id, node.id, std::cref( topo) ); } } } // start-thread registers itself on work-stealing scheduler boost::fibers::use_scheduling_algorithm< boost::fibers::algo::numa::work_stealing >( start_cpu_id, node.id, topo); ... The example evaluates the NUMA topology with boost::fibers::numa::topology() and spawns a thread for each logical cpu. Each spawned thread installs the NUMA-aware work-stealing scheduler. The scheduler pins the thread to the logical cpu that was specified at construction. If the local queue of one thread runs out of ready fibers, the thread tries to steal a ready fiber from another thread running on a logical cpu belonging to the same NUMA-node (local memory access). If no fiber could be stolen, the thread tries to steal fibers from logical cpus that are part of other NUMA-nodes (remote memory access). 
Synopsis #include <boost/fiber/numa/pin_thread.hpp> #include <boost/fiber/numa/topology.hpp> namespace boost { namespace fibers { namespace numa { struct node { std::uint32_t id; std::set< std::uint32_t > logical_cpus; std::vector< std::uint32_t > distance; }; bool operator<( node const&, node const&) noexcept; std::vector< node > topology(); void pin_thread( std::uint32_t); void pin_thread( std::uint32_t, std::thread::native_handle_type); }}} #include <boost/fiber/numa/algo/work_stealing.hpp> namespace boost { namespace fibers { namespace numa { namespace algo { class work_stealing; }}}} Class numa::node #include <boost/fiber/numa/topology.hpp> namespace boost { namespace fibers { namespace numa { struct node { std::uint32_t id; std::set< std::uint32_t > logical_cpus; std::vector< std::uint32_t > distance; }; bool operator<( node const&, node const&) noexcept; }}} Data member id std::uint32_t id; Effects: ID of the NUMA-node Data member logical_cpus std::set< std::uint32_t > logical_cpus; Effects: set of logical cpu IDs belonging to the NUMA-node Data member distance std::vector< std::uint32_t > distance; Effects: The distance between NUMA-nodes describes the cost of accessing the remote memory. Note: A NUMA-node has a distance of 10 to itself; remote NUMA-nodes have a distance > 10. The index in the array corresponds to the ID id of the NUMA-node. At the moment only Linux returns the correct distances; on all other operating systems remote NUMA-nodes get a default value of 20. Non-member function operator<() bool operator<( node const& lhs, node const& rhs) noexcept; Returns: true if lhs != rhs is true and the implementation-defined total order of node::id values places lhs before rhs, false otherwise. Throws: Nothing. Non-member function numa::topology() #include <boost/fiber/numa/topology.hpp> namespace boost { namespace fibers { namespace numa { std::vector< node > topology(); }}} Effects: Evaluates the NUMA topology. 
Returns: a vector of NUMA-nodes describing the NUMA architecture of the system (each element represents a NUMA-node). Throws: system_error Non-member function numa::pin_thread() #include <boost/fiber/numa/pin_thread.hpp> namespace boost { namespace fibers { namespace numa { void pin_thread( std::uint32_t cpu_id); void pin_thread( std::uint32_t cpu_id, std::thread::native_handle_type h); }}} Effects: The first version pins the current thread to the logical cpu with ID cpu_id, i.e. the operating system scheduler will not migrate the thread to another logical cpu. The second version pins the thread with the native ID h to the logical cpu with ID cpu_id. Throws: system_error Class numa::work_stealing This class implements the algorithm interface; the thread running this scheduler is pinned to the given logical cpu. If the local ready-queue runs out of ready fibers, ready fibers are stolen from other schedulers that run on logical cpus belonging to the same NUMA-node (local memory access). If no ready fibers can be stolen from the local NUMA-node, the algorithm selects schedulers running on other NUMA-nodes (remote memory access). The victim scheduler (from which a ready fiber is stolen) is selected at random. 
#include <boost/fiber/numa/algo/work_stealing.hpp> namespace boost { namespace fibers { namespace numa { namespace algo { class work_stealing : public algorithm { public: work_stealing( std::uint32_t cpu_id, std::uint32_t node_id, std::vector< boost::fibers::numa::node > const& topo, bool suspend = false); work_stealing( work_stealing const&) = delete; work_stealing( work_stealing &&) = delete; work_stealing & operator=( work_stealing const&) = delete; work_stealing & operator=( work_stealing &&) = delete; virtual void awakened( context *) noexcept; virtual context * pick_next() noexcept; virtual bool has_ready_fibers() const noexcept; virtual void suspend_until( std::chrono::steady_clock::time_point const&) noexcept; virtual void notify() noexcept; }; }}}} Constructor work_stealing( std::uint32_t cpu_id, std::uint32_t node_id, std::vector< boost::fibers::numa::node > const& topo, bool suspend = false); Effects: Constructs the work-stealing scheduling algorithm. The thread is pinned to the logical cpu with ID cpu_id. If the local ready-queue runs out of ready fibers, ready fibers are stolen from other schedulers using topo (which represents the NUMA topology of the system). Throws: system_error Note: If suspend is set to true, the scheduler suspends if no ready fiber could be stolen. The scheduler will be woken up if a sleeping fiber times out or if it is notified from remote (another thread or fiber scheduler). Member function awakened() virtual void awakened( context * f) noexcept; Effects: Enqueues fiber f onto the shared ready queue. Throws: Nothing. Member function pick_next() virtual context * pick_next() noexcept; Returns: the fiber at the head of the ready queue, or nullptr if the queue is empty. Throws: Nothing. Note: Placing ready fibers onto the tail of the shared queue, and returning them from the head of that queue, shares the thread between ready fibers in round-robin fashion. 
Member function has_ready_fibers() virtual bool has_ready_fibers() const noexcept; Returns: true if the scheduler has fibers ready to run. Throws: Nothing. Member function suspend_until() virtual void suspend_until( std::chrono::steady_clock::time_point const& abs_time) noexcept; Effects: Informs work_stealing that no ready fiber will be available until time-point abs_time. This implementation blocks in std::condition_variable::wait_until(). Throws: Nothing. Member function notify() virtual void notify() noexcept; Effects: Wakes up a pending call to work_stealing::suspend_until(), as some fibers might be ready. This implementation wakes suspend_until() via std::condition_variable::notify_all(). Throws: Nothing.
<link linkend="fiber.gpu_computing">GPU computing</link>
<anchor id="cuda"/><link linkend="fiber.gpu_computing.cuda">CUDA</link> CUDA (Compute Unified Device Architecture) is a platform for parallel computing on NVIDIA GPUs. The application programming interface of CUDA gives access to the GPU's instruction set and computation resources (execution of compute kernels). Synchronization with CUDA streams CUDA operations such as compute kernels or memory transfers (between host and device) can be grouped/queued in CUDA streams and are executed on the GPU. Boost.Fiber enables a fiber to sleep (suspend) until a CUDA stream has completed its operations, and resumes the fiber when the CUDA stream has finished. This enables applications to run other fibers on the CPU without needing to spawn additional OS threads. __global__ void kernel( int size, int * a, int * b, int * c) { int idx = threadIdx.x + blockIdx.x * blockDim.x; if ( idx < size) { int idx1 = (idx + 1) % 256; int idx2 = (idx + 2) % 256; float as = (a[idx] + a[idx1] + a[idx2]) / 3.0f; float bs = (b[idx] + b[idx1] + b[idx2]) / 3.0f; c[idx] = (as + bs) / 2; } } boost::fibers::fiber f([]{ cudaStream_t stream; cudaStreamCreate( & stream); int size = 1024 * 1024; int full_size = 20 * size; int * host_a, * host_b, * host_c; cudaHostAlloc( & host_a, full_size * sizeof( int), cudaHostAllocDefault); cudaHostAlloc( & host_b, full_size * sizeof( int), cudaHostAllocDefault); cudaHostAlloc( & host_c, full_size * sizeof( int), cudaHostAllocDefault); int * dev_a, * dev_b, * dev_c; cudaMalloc( & dev_a, size * sizeof( int) ); cudaMalloc( & dev_b, size * sizeof( int) ); cudaMalloc( & dev_c, size * sizeof( int) ); std::minstd_rand generator; std::uniform_int_distribution<> distribution(1, 6); for ( int i = 0; i < full_size; ++i) { host_a[i] = distribution( generator); host_b[i] = distribution( generator); } for ( int i = 0; i < full_size; i += size) { cudaMemcpyAsync( dev_a, host_a + i, size * sizeof( int), cudaMemcpyHostToDevice, stream); cudaMemcpyAsync( dev_b, host_b + i, size * 
sizeof( int), cudaMemcpyHostToDevice, stream); kernel<<< size / 256, 256, 0, stream >>>( size, dev_a, dev_b, dev_c); cudaMemcpyAsync( host_c + i, dev_c, size * sizeof( int), cudaMemcpyDeviceToHost, stream); } auto result = boost::fibers::cuda::waitfor_all( stream); // suspend fiber till CUDA stream has finished BOOST_ASSERT( stream == std::get< 0 >( result) ); BOOST_ASSERT( cudaSuccess == std::get< 1 >( result) ); std::cout << "f1: GPU computation finished" << std::endl; cudaFreeHost( host_a); cudaFreeHost( host_b); cudaFreeHost( host_c); cudaFree( dev_a); cudaFree( dev_b); cudaFree( dev_c); cudaStreamDestroy( stream); }); f.join(); Synopsis #include <boost/fiber/cuda/waitfor.hpp> namespace boost { namespace fibers { namespace cuda { std::tuple< cudaStream_t, cudaError_t > waitfor_all( cudaStream_t st); std::vector< std::tuple< cudaStream_t, cudaError_t > > waitfor_all( cudaStream_t ... st); }}} Non-member function cuda::waitfor() #include <boost/fiber/cuda/waitfor.hpp> namespace boost { namespace fibers { namespace cuda { std::tuple< cudaStream_t, cudaError_t > waitfor_all( cudaStream_t st); std::vector< std::tuple< cudaStream_t, cudaError_t > > waitfor_all( cudaStream_t ... st); }}} Effects: Suspends the active fiber until the CUDA stream(s) have finished their operations. Returns: a tuple of the stream reference and the CUDA stream status (the variadic overload returns a vector of such tuples)
<anchor id="hip"/><link linkend="fiber.gpu_computing.hip">ROCm/HIP</link> HIP is part of the ROCm (Radeon Open Compute) platform for parallel computing on AMD and NVIDIA GPUs. The application programming interface of HIP gives access to the GPU's instruction set and computation resources (execution of compute kernels). Synchronization with ROCm/HIP streams HIP operations such as compute kernels or memory transfers (between host and device) can be grouped/queued in HIP streams and are executed on the GPU. Boost.Fiber enables a fiber to sleep (suspend) until a HIP stream has completed its operations, and resumes the fiber when the HIP stream has finished. This enables applications to run other fibers on the CPU without needing to spawn additional OS threads. __global__ void kernel( int size, int * a, int * b, int * c) { int idx = threadIdx.x + blockIdx.x * blockDim.x; if ( idx < size) { int idx1 = (idx + 1) % 256; int idx2 = (idx + 2) % 256; float as = (a[idx] + a[idx1] + a[idx2]) / 3.0f; float bs = (b[idx] + b[idx1] + b[idx2]) / 3.0f; c[idx] = (as + bs) / 2; } } boost::fibers::fiber f([]{ hipStream_t stream; hipStreamCreate( & stream); int size = 1024 * 1024; int full_size = 20 * size; int * host_a, * host_b, * host_c; hipHostMalloc( & host_a, full_size * sizeof( int), hipHostMallocDefault); hipHostMalloc( & host_b, full_size * sizeof( int), hipHostMallocDefault); hipHostMalloc( & host_c, full_size * sizeof( int), hipHostMallocDefault); int * dev_a, * dev_b, * dev_c; hipMalloc( & dev_a, size * sizeof( int) ); hipMalloc( & dev_b, size * sizeof( int) ); hipMalloc( & dev_c, size * sizeof( int) ); std::minstd_rand generator; std::uniform_int_distribution<> distribution(1, 6); for ( int i = 0; i < full_size; ++i) { host_a[i] = distribution( generator); host_b[i] = distribution( generator); } for ( int i = 0; i < full_size; i += size) { hipMemcpyAsync( dev_a, host_a + i, size * sizeof( int), hipMemcpyHostToDevice, stream); hipMemcpyAsync( dev_b, host_b + i, size * sizeof( 
int), hipMemcpyHostToDevice, stream); hipLaunchKernel(kernel, dim3(size / 256), dim3(256), 0, stream, size, dev_a, dev_b, dev_c); hipMemcpyAsync( host_c + i, dev_c, size * sizeof( int), hipMemcpyDeviceToHost, stream); } auto result = boost::fibers::hip::waitfor_all( stream); // suspend fiber till HIP stream has finished BOOST_ASSERT( stream == std::get< 0 >( result) ); BOOST_ASSERT( hipSuccess == std::get< 1 >( result) ); std::cout << "f1: GPU computation finished" << std::endl; hipHostFree( host_a); hipHostFree( host_b); hipHostFree( host_c); hipFree( dev_a); hipFree( dev_b); hipFree( dev_c); hipStreamDestroy( stream); }); f.join(); Synopsis #include <boost/fiber/hip/waitfor.hpp> namespace boost { namespace fibers { namespace hip { std::tuple< hipStream_t, hipError_t > waitfor_all( hipStream_t st); std::vector< std::tuple< hipStream_t, hipError_t > > waitfor_all( hipStream_t ... st); }}} Non-member function hip::waitfor() #include <boost/fiber/hip/waitfor.hpp> namespace boost { namespace fibers { namespace hip { std::tuple< hipStream_t, hipError_t > waitfor_all( hipStream_t st); std::vector< std::tuple< hipStream_t, hipError_t > > waitfor_all( hipStream_t ... st); }}} Effects: Suspends the active fiber until the HIP stream(s) have finished their operations. Returns: a tuple of the stream reference and the HIP stream status (the variadic overload returns a vector of such tuples)
<anchor id="worker"/><link linkend="fiber.worker">Running with worker threads</link> Keep workers running If a worker thread is used but no fiber is created and no part of the framework (like this_fiber::yield()) is touched, then no fiber scheduler is instantiated. auto worker = std::thread( []{ // fiber scheduler not instantiated }); worker.join(); If use_scheduling_algorithm<>() is invoked, the fiber scheduler is created. If the worker thread simply returns, it destroys the scheduler and terminates. auto worker = std::thread( []{ // fiber scheduler created boost::fibers::use_scheduling_algorithm<my_fiber_scheduler>(); }); worker.join(); In order to keep the worker thread running, the fiber associated with the thread stack (called the main fiber) is blocked. For instance, the main fiber might wait on a condition_variable. For a graceful shutdown the condition_variable is signalled and the main fiber returns. The scheduler is destroyed once all fibers of the worker thread have terminated. boost::fibers::mutex mtx; boost::fibers::condition_variable_any cv; auto worker = std::thread( [&mtx,&cv]{ mtx.lock(); // suspend till signalled cv.wait(mtx); mtx.unlock(); }); // signal termination cv.notify_all(); worker.join(); Processing tasks Tasks can be transferred via channels. The worker thread runs a pool of fibers that dequeue and execute tasks from the channel. Termination is signalled by closing the channel. using task = std::function<void()>; boost::fibers::buffered_channel<task> ch{1024}; auto worker = std::thread( [&ch]{ // create pool of fibers for (int i=0; i<10; ++i) { boost::fibers::fiber{ [&ch]{ task tsk; // dequeue and process tasks while (boost::fibers::channel_op_status::closed!=ch.pop(tsk)){ tsk(); } }}.detach(); } task tsk; // dequeue and process tasks while (boost::fibers::channel_op_status::closed!=ch.pop(tsk)){ tsk(); } }); // feed channel with tasks ch.push([]{ ... }); ... 
// signal termination ch.close(); worker.join(); An alternative is to use a work-stealing scheduler. With this kind of scheduling algorithm, a worker thread steals fibers from the ready-queues of other worker threads if its own ready-queue is empty. Wait till all worker threads have registered the work-stealing scheduling algorithm. boost::fibers::mutex mtx; boost::fibers::condition_variable_any cv; // start worker-thread first auto worker = std::thread( [&mtx,&cv]{ boost::fibers::use_scheduling_algorithm<boost::fibers::algo::work_stealing>(2); mtx.lock(); // suspend main-fiber from the worker thread cv.wait(mtx); mtx.unlock(); }); boost::fibers::use_scheduling_algorithm<boost::fibers::algo::work_stealing>(2); // create fibers with tasks boost::fibers::fiber f{[]{ ... }}; ... // signal termination cv.notify_all(); worker.join(); Because the TIB (thread information block) on Windows is not fully documented in the MSDN, it is possible that not all required TIB parts are swapped. Using the WinFiber implementation might be an alternative (see the documentation about the implementations fcontext_t, ucontext_t and WinFiber of Boost.Context).
<link linkend="fiber.performance">Performance</link> Performance measurements were taken using std::chrono::high_resolution_clock, with overhead corrections. The code was compiled with gcc-6.3.1, using build options: variant = release, optimization = speed. Tests were executed on dual Intel XEON E5 2620v4 2.2GHz, 16C/32T, 64GB RAM, running Linux (x86_64). Measurements headed 1C/1T were run in a single-threaded process. The microbenchmark skynet from Alexander Temerev was ported and used for performance measurements. At the root the test spawns 10 threads-of-execution (ToE), e.g. actor/goroutine/fiber etc. Each spawned ToE spawns 10 additional ToEs, and so on, until 1,000,000 ToEs are created. ToEs return their ordinal numbers (0 ... 999,999), which are summed on the previous level and sent back upstream, until reaching the root. The test was run 10-20 times, producing a range of values for each measurement. time per actor/erlang process/goroutine (other languages) (average over 1,000,000): Haskell (stack-1.4.0/ghc-8.0.1): 0.05 µs - 0.06 µs | Go (go1.8.1): 0.42 µs - 0.49 µs | Erlang (erts-8.3): 0.63 µs - 0.73 µs
Pthreads are created with a stack size of 8kB while std::threads use the system default (1MB - 2MB). The microbenchmark could not be run with 1,000,000 threads because of resource exhaustion (pthread and std::thread). Instead the test runs with only 10,000 threads. time per thread (average over 10,000 - unable to spawn 1,000,000 threads): pthread: 54 µs - 73 µs | std::thread: 52 µs - 73 µs | std::async: 106 µs - 122 µs
The test utilizes 16 cores with Symmetric MultiThreading enabled (32 logical CPUs). The fiber stacks are allocated by fixedsize_stack. As the benchmark shows, the memory allocation algorithm is significant for performance in a multithreaded environment. The tests use glibc's memory allocation algorithm (based on ptmalloc2) as well as Google's TCmalloc (via linkflags="-ltcmalloc"). Tais B. Ferreira, Rivalino Matias, Autran Macedo, Lucio B. Araujo: An Experimental Study on Memory Allocators in Multicore and Multithreaded Applications, PDCAT '11 Proceedings of the 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies, pages 92-98. In the work_stealing scheduling algorithm, each thread has its own local queue. Fibers that are ready to run are pushed to and popped from the local queue. If the queue runs out of ready fibers, fibers are stolen from the local queues of other participating threads. time per fiber (average over 1,000,000): fiber (16C/32T, work stealing, tcmalloc): 0.05 µs - 0.09 µs | fiber (1C/1T, round robin, tcmalloc): 1.69 µs - 1.79 µs
<anchor id="tuning"/><link linkend="fiber.tuning">Tuning</link> Disable synchronization With BOOST_FIBERS_NO_ATOMICS defined at the compiler's command line, synchronization between fibers (in different threads) is disabled. This is acceptable if the application is single-threaded and/or fibers are not synchronized between threads. Memory allocation The memory allocation algorithm is significant for performance in a multithreaded environment, especially for Boost.Fiber where fiber stacks are allocated on the heap. The default user-level memory allocator (UMA) of glibc is ptmalloc2, but it can be replaced by another UMA that fits the concrete work-load better. For instance, Google's TCmalloc enables better performance in the skynet microbenchmark than glibc's default memory allocator. Scheduling strategies The fibers in a thread are coordinated by a fiber manager. Fibers trade control cooperatively, rather than preemptively. Depending on the work-load, several strategies of scheduling the fibers are possible (see 1024cores.net: Task Scheduling Strategies); they can be implemented on behalf of algorithm. 
work-stealing: ready fibers are held in a local queue; when the fiber-scheduler's local queue runs out of ready fibers, it randomly selects another fiber-scheduler and tries to steal a ready fiber from the victim (implemented in work_stealing and numa::work_stealing) work-requesting: ready fibers are held in a local queue; when the fiber-scheduler's local queue runs out of ready fibers, it randomly selects another fiber-scheduler and requests a ready fiber; the victim fiber-scheduler sends a ready fiber back work-sharing: ready fibers are held in a global queue; fiber-schedulers concurrently push and pop ready fibers to/from the global queue (implemented in shared_work) work-distribution: fibers that became ready are proactively distributed to idle fiber-schedulers or fiber-schedulers with low load work-balancing: a dedicated (helper) fiber-scheduler periodically collects information about all fiber-schedulers running in other threads and re-distributes ready fibers among them TTAS locks Boost.Fiber internally uses spinlocks to protect critical regions if fibers running on different threads interact. Spinlocks are implemented as TTAS (test-test-and-set) locks, i.e. the spinlock tests the lock before calling an atomic exchange. This strategy helps to reduce the cache-line invalidations triggered by acquiring/releasing the lock. Spin-wait loop A lock is considered under contention if a thread repeatedly fails to acquire the lock because some other thread was faster. Waiting for a short time lets other threads finish before trying to enter the critical section again. While busy waiting on the lock, relaxing the CPU (via the pause/yield mnemonic) gives the CPU a hint that the code is in a spin-wait loop. 
prevents expensive pipeline flushes (speculatively executed load and compare instructions are not pushed to the pipeline) another hardware thread (simultaneous multithreading) can get a time slice it does delay a few CPU cycles, but this is necessary to prevent starvation It is obvious that this strategy is useless on single-core systems, because the lock can only be released if the thread gives up its time slice in order to let other threads run. The macro BOOST_FIBERS_SPIN_SINGLE_CORE replaces the CPU hints (pause/yield mnemonic) by informing the operating system (via std::this_thread::yield()) that the thread gives up its time slice, and the operating system switches to another thread. Exponential back-off The macro BOOST_FIBERS_RETRY_THRESHOLD determines how many times the CPU iterates in the spin-wait loop before yielding the thread or blocking in futex-wait. The spinlock tracks how many times the thread failed to acquire the lock. The higher the contention, the longer the thread should back off. A binary exponential backoff algorithm together with a randomized contention window is utilized for this purpose. BOOST_FIBERS_CONTENTION_WINDOW_THRESHOLD determines the upper limit of the contention window (expressed as the exponent for the basis of two). Speculative execution (hardware transactional memory) Boost.Fiber uses spinlocks to protect critical regions that can be used together with transactional memory (see section Speculative execution). TSX is enabled if the property htm=tsx is specified at the b2 command line and BOOST_USE_TSX is applied to the compiler. A TSX transaction will be aborted if the floating point state is modified inside a critical region. As a consequence, floating point operations, e.g. store/load of floating point related registers during a fiber (context) switch, are disabled. NUMA systems Modern multi-socket systems are usually designed as NUMA systems. A suitable fiber scheduler like numa::work_stealing reduces remote memory access (latency). 
Parameters Parameters that might be defined at the compiler's command line:
Parameter | Default value | Effect on Boost.Fiber
BOOST_FIBERS_NO_ATOMICS | - | no multithreading support, all atomics removed, no synchronization between fibers running in different threads
BOOST_FIBERS_SPINLOCK_STD_MUTEX | - | std::mutex used inside spinlock
BOOST_FIBERS_SPINLOCK_TTAS | + | spinlock with test-test-and-set on a shared variable
BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE | - | spinlock with test-test-and-set on a shared variable, adaptive retries while busy waiting
BOOST_FIBERS_SPINLOCK_TTAS_FUTEX | - | spinlock with test-test-and-set on a shared variable, suspend on futex after a certain number of retries
BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE_FUTEX | - | spinlock with test-test-and-set on a shared variable, adaptive retries while busy waiting, suspend on futex after a certain number of retries
BOOST_FIBERS_SPINLOCK_TTAS + BOOST_USE_TSX | - | spinlock with test-test-and-set and speculative execution (Intel TSX required)
BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE + BOOST_USE_TSX | - | spinlock with test-test-and-set on a shared variable, adaptive retries while busy waiting and speculative execution (Intel TSX required)
BOOST_FIBERS_SPINLOCK_TTAS_FUTEX + BOOST_USE_TSX | - | spinlock with test-test-and-set on a shared variable, suspend on futex after a certain number of retries and speculative execution (Intel TSX required)
BOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE_FUTEX + BOOST_USE_TSX | - | spinlock with test-test-and-set on a shared variable, adaptive retries while busy waiting, suspend on futex after a certain number of retries and speculative execution (Intel TSX required)
BOOST_FIBERS_SPIN_SINGLE_CORE | - | on single-core machines with multiple threads, yield the thread (std::this_thread::yield()) after collisions
BOOST_FIBERS_RETRY_THRESHOLD | 64 | max number of retries while busy spinning, then use the fallback
BOOST_FIBERS_CONTENTION_WINDOW_THRESHOLD | 16 | max size of the contention window, expressed as the exponent for the basis of two
BOOST_FIBERS_SPIN_BEFORE_SLEEP0 | 32 | max number of retries that relax the processor before the thread sleeps for 0s
BOOST_FIBERS_SPIN_BEFORE_YIELD | 64 | max number of retries where the thread sleeps for 0s before yielding the thread (std::this_thread::yield())
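These parameters are plain preprocessor macros, so they are passed with -D at the compiler's command line. A hypothetical compile line as a sketch (the file name, the chosen values and the linked libraries are illustrative, not taken from this documentation):

```shell
# define tuning macros at the compiler's command line (illustrative values)
g++ -std=c++11 -O2 \
    -DBOOST_FIBERS_SPINLOCK_TTAS_ADAPTIVE \
    -DBOOST_FIBERS_RETRY_THRESHOLD=128 \
    app.cpp -lboost_fiber -lboost_context -lpthread
```

When building Boost.Fiber itself, the same macros would have to be applied consistently to the library build.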
<anchor id="custom"/><link linkend="fiber.custom">Customization</link> Overview As noted in the Scheduling section, by default Boost.Fiber uses its own round_robin scheduler for each thread. To control the way Boost.Fiber schedules ready fibers on a particular thread, in general you must follow several steps. This section discusses those steps, whereas Scheduling serves as a reference for the classes involved. The library's fiber manager keeps track of suspended (blocked) fibers. Only when a fiber becomes ready to run is it passed to the scheduler. Of course, if there are fewer than two ready fibers, the scheduler's job is trivial. Only when there are two or more ready fibers does the particular scheduler implementation start to influence the overall sequence of fiber execution. In this section we illustrate a simple custom scheduler that honors an integer fiber priority. We will implement it such that a fiber with higher priority is preferred over a fiber with lower priority. Any fibers with equal priority values are serviced on a round-robin basis. The full source code for the examples below is found in priority.cpp. Custom Property Class The first essential point is that we must associate an integer priority with each fiber. A previous version of the Fiber library implicitly tracked an int priority for each fiber, even though the default scheduler ignored it. This has been dropped, since the library now supports arbitrary scheduler-specific fiber properties. One might suggest deriving a custom fiber subclass to store such properties. There are a couple of reasons for the present mechanism. Boost.Fiber provides a number of different ways to launch a fiber. (Consider fibers::async().) Higher-level libraries might introduce additional such wrapper functions. A custom scheduler must associate its custom properties with every fiber in the thread, not only the ones explicitly launched by instantiating a custom fiber subclass. 
Consider a large existing program that launches fibers in many different places in the code. We discover a need to introduce a custom scheduler for a particular thread. If supporting that scheduler's custom properties required a particular fiber subclass, we would have to hunt down and modify every place that launches a fiber on that thread. The fiber class is actually just a handle to internal context data. A subclass of fiber would not add data to context. The present mechanism allows you to drop in a custom scheduler with its attendant custom properties without altering the rest of your application. Instead of deriving your custom fiber-properties subclass from fiber, you must derive it from fiber_properties. class priority_props : public boost::fibers::fiber_properties { public: priority_props( boost::fibers::context * ctx): fiber_properties( ctx), priority_( 0) { } int get_priority() const { return priority_; } // Call this method to alter priority, because we must notify // priority_scheduler of any change. void set_priority( int p) { // Of course, it's only worth reshuffling the queue and all if we're // actually changing the priority. if ( p != priority_) { priority_ = p; notify(); } } // The fiber name of course is solely for purposes of this example // program; it has nothing to do with implementing scheduler priority. // This is a public data member -- not requiring set/get access methods -- // because we need not inform the scheduler of any change. std::string name; private: int priority_; }; Your subclass constructor must accept a context* and pass it to the fiber_properties constructor. Provide read access methods at your own discretion. It's important to call notify() on any change in a property that can affect the scheduler's behavior. Therefore, such modifications should only be performed through an access method. A property that does not affect the scheduler does not need access methods. 
Custom Scheduler Class Now we can derive a custom scheduler from algorithm_with_properties<>, specifying our custom property class priority_props as the template parameter. class priority_scheduler : public boost::fibers::algo::algorithm_with_properties< priority_props > { private: typedef boost::fibers::scheduler::ready_queue_type rqueue_t; rqueue_t rqueue_; std::mutex mtx_{}; std::condition_variable cnd_{}; bool flag_{ false }; public: priority_scheduler() : rqueue_() { } // For a subclass of algorithm_with_properties<>, it's important to // override the correct awakened() overload. virtual void awakened( boost::fibers::context * ctx, priority_props & props) noexcept { int ctx_priority = props.get_priority(); // With this scheduler, fibers with higher priority values are // preferred over fibers with lower priority values. But fibers with // equal priority values are processed in round-robin fashion. So when // we're handed a new context*, put it at the end of the fibers // with that same priority. In other words: search for the first fiber // in the queue with LOWER priority, and insert before that one. rqueue_t::iterator i( std::find_if( rqueue_.begin(), rqueue_.end(), [ctx_priority,this]( boost::fibers::context & c) { return properties( &c ).get_priority() < ctx_priority; })); // Now, whether or not we found a fiber with lower priority, // insert this new fiber here. rqueue_.insert( i, * ctx); } virtual boost::fibers::context * pick_next() noexcept { // if ready queue is empty, just tell caller if ( rqueue_.empty() ) { return nullptr; } boost::fibers::context * ctx( & rqueue_.front() ); rqueue_.pop_front(); return ctx; } virtual bool has_ready_fibers() const noexcept { return ! rqueue_.empty(); } virtual void property_change( boost::fibers::context * ctx, priority_props & props) noexcept { // Although our priority_props class defines multiple properties, only // one of them (priority) actually calls notify() when changed. 
The // point of a property_change() override is to reshuffle the ready // queue according to the updated priority value. // 'ctx' might not be in our queue at all, if caller is changing the // priority of (say) the running fiber. If it's not there, no need to // move it: we'll handle it next time it hits awakened(). if ( ! ctx->ready_is_linked()) { return; } // Found ctx: unlink it ctx->ready_unlink(); // Here we know that ctx was in our ready queue, but we've unlinked // it. We happen to have a method that will (re-)add a context* to the // right place in the ready queue. awakened( ctx, props); } void suspend_until( std::chrono::steady_clock::time_point const& time_point) noexcept { if ( (std::chrono::steady_clock::time_point::max)() == time_point) { std::unique_lock< std::mutex > lk( mtx_); cnd_.wait( lk, [this](){ return flag_; }); flag_ = false; } else { std::unique_lock< std::mutex > lk( mtx_); cnd_.wait_until( lk, time_point, [this](){ return flag_; }); flag_ = false; } } void notify() noexcept { std::unique_lock< std::mutex > lk( mtx_); flag_ = true; lk.unlock(); cnd_.notify_all(); } }; See ready_queue_t. You must override the algorithm_with_properties::awakened() method. This is how your scheduler receives notification of a fiber that has become ready to run. props is the instance of priority_props associated with the passed fiber ctx. You must override the algorithm_with_properties::pick_next() method. This is how your scheduler actually advises the fiber manager of the next fiber to run. You must override algorithm_with_properties::has_ready_fibers() to inform the fiber manager of the state of your ready queue. Overriding algorithm_with_properties::property_change() is optional. This override handles the case in which the running fiber changes the priority of another ready fiber: a fiber already in our queue. In that case, move the updated fiber within the queue. 
Your property_change() override must be able to handle the case in which the passed ctx is not in your ready queue. It might be running, or it might be blocked.

Our example priority_scheduler doesn't override algorithm_with_properties::new_properties(): we're content with allocating priority_props instances on the heap.

Replace Default Scheduler

You must call use_scheduling_algorithm() at the start of each thread on which you want Boost.Fiber to use your custom scheduler rather than its own default round_robin. Specifically, you must call use_scheduling_algorithm() before performing any other Boost.Fiber operations on that thread.

    int main( int argc, char *argv[]) {
        // make sure we use our priority_scheduler rather than default round_robin
        boost::fibers::use_scheduling_algorithm< priority_scheduler >();
        ...
    }

Use Properties

The running fiber can access its own fiber_properties subclass instance by calling this_fiber::properties(). Although properties<>() is a nullary function, you must pass, as a template parameter, the fiber_properties subclass.

    boost::this_fiber::properties< priority_props >().name = "main";

Given a fiber instance still connected with a running fiber (that is, not fiber::detach()ed), you may access that fiber's properties using fiber::properties(). As with boost::this_fiber::properties<>(), you must pass your fiber_properties subclass as the template parameter.

    template< typename Fn >
    boost::fibers::fiber launch( Fn && func, std::string const& name, int priority) {
        boost::fibers::fiber fiber( func);
        priority_props & props( fiber.properties< priority_props >() );
        props.name = name;
        props.set_priority( priority);
        return fiber;
    }

Launching a new fiber schedules that fiber as ready, but does not immediately enter its fiber-function. The current fiber retains control until it blocks (or yields, or terminates) for some other reason. As shown in the launch() function above, it is reasonable to launch a fiber and immediately set relevant properties, such as its priority. Your custom scheduler can then make use of this information the next time the fiber manager calls algorithm_with_properties::pick_next().
<link linkend="fiber.rationale">Rationale</link>

preprocessor defines

BOOST_FIBERS_NO_ATOMICS: no std::atomic<> is used; inter-thread synchronization is disabled

BOOST_FIBERS_SPINLOCK_STD_MUTEX: use std::mutex as the spinlock instead of the default XCHG-spinlock with backoff

BOOST_FIBERS_SPIN_BACKOFF: limit that determines when to use std::this_thread::yield() instead of the pause/yield mnemonic during busy waiting (applies only to the XCHG-spinlock)

BOOST_FIBERS_SINGLE_CORE: always call std::this_thread::yield() without backoff during busy waiting (applies only to the XCHG-spinlock)
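As a hedged illustration (the exact build setup will vary), such a macro can be supplied as an ordinary compiler define when building code that uses Boost.Fiber, or via b2's define= feature when building Boost itself; the file name app.cpp below is a placeholder:

```shell
g++ -std=c++11 -DBOOST_FIBERS_SPINLOCK_STD_MUTEX app.cpp -lboost_fiber -lboost_context
b2 define=BOOST_FIBERS_SPINLOCK_STD_MUTEX
```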
distinction between coroutines and fibers

The fiber library extends the coroutine library by adding a scheduler and synchronization mechanisms.

A coroutine yields; a fiber blocks. When a coroutine yields, it passes control directly to its caller (or, in the case of symmetric coroutines, to a designated other coroutine). When a fiber blocks, it implicitly passes control to the fiber scheduler. Coroutines have no scheduler because they need no scheduler. See 'N4024: Distinguishing coroutines and fibers'.

what about transactional memory

GCC supports transactional memory since version 4.7. Unfortunately, tests show that transactional memory is slower (ca. 4x) than spinlocks using atomics. Once transactional memory is improved (supporting hybrid TM), the spinlocks will be replaced by __transaction_atomic{} statements surrounding the critical sections.

synchronization between fibers running in different threads

Synchronization classes from Boost.Thread block the entire thread. In contrast, the synchronization classes from Boost.Fiber block only specific fibers, so that the scheduler can still keep the thread busy running other fibers in the meantime. The synchronization classes from Boost.Fiber are designed to be thread-safe, i.e. it is possible to synchronize fibers running in different threads as well as fibers running in the same thread. (However, there is a build option to disable cross-thread fiber synchronization support; see this description.)

spurious wakeup

Spurious wakeup can happen when using std::condition_variable: the condition variable appears to have been signaled while the awaited condition may still be false. Spurious wakeup can happen repeatedly, and occurs because, on some multiprocessor systems, making std::condition_variable wakeup completely predictable would slow down all std::condition_variable operations (see David R. Butenhof, "Programming with POSIX Threads"). condition_variable is not subject to spurious wakeup.
Nonetheless it is prudent to test the business-logic condition in a wait() loop or, equivalently, to use one of the wait( lock, predicate ) overloads. See also No Spurious Wakeups.

migrating fibers between threads

Support for migrating fibers between threads has been integrated. The user-defined scheduler must call context::detach() on a fiber-context on the source thread and context::attach() on the destination thread, passing the fiber-context to migrate. (For more information about custom schedulers, see Customization.) The examples work_sharing and work_stealing in the directory examples can be used as blueprints. See also Migrating fibers between threads.

support for Boost.Asio

Support for Boost.Asio's async-result is not part of the official API. However, to integrate with a boost::asio::io_service, see Sharing a Thread with Another Main Loop. To interface smoothly with an arbitrary Asio async I/O operation, see Then There's Boost.Asio.

tested compilers

The library was tested with GCC-5.1.1, Clang-3.6.0 and MSVC-14.0 in C++11 mode.

supported architectures

Boost.Fiber depends on Boost.Context; the list of supported architectures can be found here.
<link linkend="fiber.acknowledgements">Acknowledgments</link> I'd like to thank Agustín Bergé, Eugene Yakubovich, Giovanni Piero Deretta and especially Nat Goodspeed.