123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295 |
- [library Boost.Lockfree
- [quickbook 1.4]
- [authors [Blechmann, Tim]]
- [copyright 2008-2011 Tim Blechmann]
- [category algorithms]
- [purpose
- lockfree concurrent data structures
- ]
- [id lockfree]
- [dirname lockfree]
- [license
- Distributed under the Boost Software License, Version 1.0.
- (See accompanying file LICENSE_1_0.txt or copy at
- [@http://www.boost.org/LICENSE_1_0.txt])
- ]
- ]
- [c++]
- [/ Images ]
- [def _note_ [$images/note.png]]
- [def _alert_ [$images/caution.png]]
- [def _detail_ [$images/note.png]]
- [def _tip_ [$images/tip.png]]
- [/ Links ]
- [def _lockfree_ [^boost.lockfree]]
- [section Introduction & Motivation]
- [h2 Introduction & Terminology]
- The term *non-blocking* denotes concurrent data structures, which do not use traditional synchronization primitives like
- guards to ensure thread-safety. Maurice Herlihy and Nir Shavit (compare [@http://books.google.com/books?id=pFSwuqtJgxYC
- "The Art of Multiprocessor Programming"]) distinguish between 3 types of non-blocking data structures, each having different
- properties:
- * data structures are *wait-free*, if every concurrent operation is guaranteed to be finished in a finite number of
- steps. It is therefore possible to give worst-case guarantees for the number of operations.
- * data structures are *lock-free*, if some concurrent operations are guaranteed to be finished in a finite number of
- steps. While it is in theory possible that some operations never make any progress, it is very unlikely to happen in
- practical applications.
- * data structures are *obstruction-free*, if a concurrent operation is guaranteed to be finished in a finite number of
- steps, unless another concurrent operation interferes.
- Some data structures can only be implemented in a lock-free manner, if they are used under certain restrictions. The
- relevant aspects for the implementation of _lockfree_ are the number of producer and consumer threads. *Single-producer*
- (*sp*) or *multiple producer* (*mp*) means that only a single thread or multiple concurrent threads are allowed to add
- data to a data structure. *Single-consumer* (*sc*) or *Multiple-consumer* (*mc*) denote the equivalent for the removal
- of data from the data structure.
- [h2 Properties of Non-Blocking Data Structures]
- Non-blocking data structures do not rely on locks and mutexes to ensure thread-safety. The synchronization is done completely in
- user-space without any direct interaction with the operating system [footnote Spinlocks do not
- directly interact with the operating system either. However it is possible that the owning thread is preempted by the
- operating system, which violates the lock-free property.]. This implies that they are not prone to issues like priority
- inversion (a low-priority thread needs to wait for a high-priority thread).
- Instead of relying on guards, non-blocking data structures require *atomic operations* (specific CPU instructions executed
- without interruption). This means that any thread either sees the state before or after the operation, but no
- intermediate state can be observed. Not all hardware supports the same set of atomic instructions. If it is not
- available in hardware, it can be emulated in software using guards. However this has the obvious drawback of losing the
- lock-free property.
- [h2 Performance of Non-Blocking Data Structures]
- When discussing the performance of non-blocking data structures, one has to distinguish between *amortized* and
- *worst-case* costs. The definition of 'lock-free' and 'wait-free' only mention the upper bound of an operation. Therefore
- lock-free data structures are not necessarily the best choice for every use case. In order to maximise the throughput of an
- application one should consider high-performance concurrent data structures [footnote
- [@http://threadingbuildingblocks.org/ Intel's Thread Building Blocks library] provides many efficient concurrent data structures,
- which are not necessarily lock-free.].
- Lock-free data structures will be a better choice in order to optimize the latency of a system or to avoid priority inversion,
- which may be necessary in real-time applications. In general we advise to consider if lock-free data structures are necessary or if
- concurrent data structures are sufficient. In any case we advice to perform benchmarks with different data structures for a
- specific workload.
- [h2 Sources of Blocking Behavior]
- Apart from locks and mutexes (which we are not using in _lockfree_ anyway), there are three other aspects, that could violate
- lock-freedom:
- [variablelist
- [[Atomic Operations]
- [Some architectures do not provide the necessary atomic operations in natively in hardware. If this is not
- the case, they are emulated in software using spinlocks, which by itself is blocking.
- ]
- ]
- [[Memory Allocations]
- [Allocating memory from the operating system is not lock-free. This makes it impossible to implement true
- dynamically-sized non-blocking data structures. The node-based data structures of _lockfree_ use a memory pool to allocate the
- internal nodes. If this memory pool is exhausted, memory for new nodes has to be allocated from the operating system. However
- all data structures of _lockfree_ can be configured to avoid memory allocations (instead the specific calls will fail).
- This is especially useful for real-time systems that require lock-free memory allocations.
- ]
- ]
- [[Exception Handling]
- [The C++ exception handling does not give any guarantees about its real-time behavior. We therefore do
- not encourage the use of exceptions and exception handling in lock-free code.]
- ]
- ]
- [h2 Data Structures]
- _lockfree_ implements three lock-free data structures:
- [variablelist
- [[[classref boost::lockfree::queue]]
- [a lock-free multi-produced/multi-consumer queue]
- ]
- [[[classref boost::lockfree::stack]]
- [a lock-free multi-produced/multi-consumer stack]
- ]
- [[[classref boost::lockfree::spsc_queue]]
- [a wait-free single-producer/single-consumer queue (commonly known as ringbuffer)]
- ]
- ]
- [h3 Data Structure Configuration]
- The data structures can be configured with [@boost:/libs/parameter/doc/html/index.html Boost.Parameter]-style templates:
- [variablelist
- [[[classref boost::lockfree::fixed_sized]]
- [Configures the data structure as *fixed sized*. The internal nodes are stored inside an array and they are addressed by
- array indexing. This limits the possible size of the queue to the number of elements that can be addressed by the index
- type (usually 2**16-2), but on platforms that lack double-width compare-and-exchange instructions, this is the best way
- to achieve lock-freedom.
- ]
- ]
- [[[classref boost::lockfree::capacity]]
- [Sets the *capacity* of a data structure at compile-time. This implies that a data structure is fixed-sized.
- ]
- ]
- [[[classref boost::lockfree::allocator]]
- [Defines the allocator. _lockfree_ supports stateful allocator and is compatible with [@boost:/libs/interprocess/index.html Boost.Interprocess] allocators.]
- ]
- ]
- [endsect]
- [section Examples]
- [h2 Queue]
- The [classref boost::lockfree::queue boost::lockfree::queue] class implements a multi-writer/multi-reader queue. The
- following example shows how integer values are produced and consumed by 4 threads each:
- [import ../examples/queue.cpp]
- [queue_example]
- The program output is:
- [pre
- produced 40000000 objects.
- consumed 40000000 objects.
- ]
- [h2 Stack]
- The [classref boost::lockfree::stack boost::lockfree::stack] class implements a multi-writer/multi-reader stack. The
- following example shows how integer values are produced and consumed by 4 threads each:
- [import ../examples/stack.cpp]
- [stack_example]
- The program output is:
- [pre
- produced 4000000 objects.
- consumed 4000000 objects.
- ]
- [h2 Waitfree Single-Producer/Single-Consumer Queue]
- The [classref boost::lockfree::spsc_queue boost::lockfree::spsc_queue] class implements a wait-free single-producer/single-consumer queue. The
- following example shows how integer values are produced and consumed by 2 separate threads:
- [import ../examples/spsc_queue.cpp]
- [spsc_queue_example]
- The program output is:
- [pre
- produced 10000000 objects.
- consumed 10000000 objects.
- ]
- [endsect]
- [section Rationale]
- [section Data Structures]
- The implementations are implementations of well-known data structures. The queue is based on
- [@http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.3574 Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms by Michael Scott and Maged Michael],
- the stack is based on [@http://books.google.com/books?id=YQg3HAAACAAJ Systems programming: coping with parallelism by R. K. Treiber]
- and the spsc_queue is considered as 'folklore' and is implemented in several open-source projects including the linux kernel. All
- data structures are discussed in detail in [@http://books.google.com/books?id=pFSwuqtJgxYC "The Art of Multiprocessor Programming" by Herlihy & Shavit].
- [endsect]
- [section Memory Management]
- The lock-free [classref boost::lockfree::queue] and [classref boost::lockfree::stack] classes are node-based data structures,
- based on a linked list. Memory management of lock-free data structures is a non-trivial problem, because we need to avoid that
- one thread frees an internal node, while another thread still uses it. _lockfree_ uses a simple approach not returning any memory
- to the operating system. Instead they maintain a *free-list* in order to reuse them later. This is done for two reasons:
- first, depending on the implementation of the memory allocator freeing the memory may block (so the implementation would not
- be lock-free anymore), and second, most memory reclamation algorithms are patented.
- [endsect]
- [section ABA Prevention]
- The ABA problem is a common problem when implementing lock-free data structures. The problem occurs when updating an atomic
- variable using a =compare_exchange= operation: if the value A was read, thread 1 changes it to say C and tries to update
- the variable, it uses =compare_exchange= to write C, only if the current value is A. This might be a problem if in the meanwhile
- thread 2 changes the value from A to B and back to A, because thread 1 does not observe the change of the state. The common way to
- avoid the ABA problem is to associate a version counter with the value and change both atomically.
- _lockfree_ uses a =tagged_ptr= helper class which associates a pointer with an integer tag. This usually requires a double-width
- =compare_exchange=, which is not available on all platforms. IA32 did not provide the =cmpxchg8b= opcode before the pentium
- processor and it is also lacking on many RISC architectures like PPC. Early X86-64 processors also did not provide a =cmpxchg16b=
- instruction.
- On 64bit platforms one can work around this issue, because often not the full 64bit address space is used. On X86_64 for example,
- only 48bit are used for the address, so we can use the remaining 16bit for the ABA prevention tag. For details please consult the
- implementation of the =boost::lockfree::detail::tagged_ptr= class.
- For lock-free operations on 32bit platforms without double-width =compare_exchange=, we support a third approach: by using a
- fixed-sized array to store the internal nodes we can avoid the use of 32bit pointers, but instead 16bit indices into the array
- are sufficient. However this is only possible for fixed-sized data structures, that have an upper bound of internal nodes.
- [endsect]
- [section Interprocess Support]
- The _lockfree_ data structures have basic support for [@boost:/libs/interprocess/index.html Boost.Interprocess]. The only
- problem is the blocking emulation of lock-free atomics, which in the current implementation is not guaranteed to be interprocess-safe.
- [endsect]
- [endsect]
- [xinclude autodoc.xml]
- [section Appendices]
- [section Supported Platforms & Compilers]
- _lockfree_ has been tested on the following platforms:
- * g++ 4.4, 4.5 and 4.6, linux, x86 & x86_64
- * clang++ 3.0, linux, x86 & x86_64
- [endsect]
- [section Future Developments]
- * More data structures (set, hash table, dequeue)
- * Backoff schemes (exponential backoff or elimination)
- [endsect]
- [section References]
- # [@http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.3574 Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms by Michael Scott and Maged Michael],
- In Symposium on Principles of Distributed Computing, pages 267–275, 1996.
- # [@http://books.google.com/books?id=pFSwuqtJgxYC M. Herlihy & Nir Shavit. The Art of Multiprocessor Programming], Morgan Kaufmann Publishers, 2008
- [endsect]
- [endsect]
|