123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503 |
- [/
- Copyright (c) 2009-2009 Joachim Faulhaber
- Distributed under the Boost Software License, Version 1.0.
- (See accompanying file LICENSE_1_0.txt or copy at
- http://www.boost.org/LICENSE_1_0.txt)
- ]
- [section Projects]
- ['*Projects*] are examples on the usage of interval containers
- that go beyond small toy snippets of code. The code presented
- here addresses more serious applications that approach the
- quality of real world programming. At the same time it aims to
- guide the reader more deeply into various aspects of
- the library. In order not to overburden the reader with implementation
- details, the code in ['*projects*] tries to be ['*minimal*]. It has a focus on
- the main aspects of the projects and is not intended to be complete
- and mature like the library code itself. Cause it's minimal,
- project code lives in `namespace mini`.
- [/
- [section Overview]
- [table Overview over Icl Examples
- [[level] [example] [classes] [features]]
- [[basic] [[link boost_icl.examples.large_bitset Large Bitset]]
- [__itv_map__][The most simple application of an interval map:
- Counting the overlaps of added intervals.]]
- ]
- [endsect][/ Overview IN Projects]
- ]
- [section Large Bitset][/ IN Projects]
- Bitsets are just sets. Sets of unsigned integrals,
- to be more precise.
- The prefix ['*bit*] usually only indicates,
- that the representation of those sets is organized in a compressed
- form that exploits the fact, that we can switch on an off single
- bits in machine words. Bitsets are therefore known to be very small
- and thus efficient.
- The efficiency of bitsets is usually coupled to the
- precondition that the range of values of elements
- is relatively small, like [0..32) or [0..64), values that can
- be typically represented in single or a small number of
- machine words. If we wanted to represent a set containing
- two values {1, 1000000}, we would be much better off using
- other sets like e.g. an `std::set`.
- Bitsets compress well, if elements spread over narrow ranges
- only. Interval sets compress well, if many elements are clustered
- over intervals. They can span large sets very efficiently then.
- In project ['*Large Bitset*] we want to ['*combine the bit compression
- and the interval compression*] to achieve a set implementation,
- that is capable of spanning large chunks of contiguous elements
- using intervals and also to represent more narrow ['nests] of varying
- bit sequences using bitset compression.
- As we will see, this can be achieved using only a small
- amount of code because most of the properties we need
- are provided by an __itv_map__ of `bitsets`:
- ``
- typedef interval_map<IntegralT, SomeBitSet<N>, partial_absorber,
- std::less, inplace_bit_add, inplace_bit_and> IntervalBitmap;
- ``
- Such an `IntervalBitmap` represents `k*N` bits for every segment.
- ``
- [a, a+k)->'1111....1111' // N bits associated: Represents a total of k*N bits.
- ``
- For the interval `[a, a+k)` above all bits are set.
- But we can also have individual ['nests] or ['clusters]
- of bitsequences.
- ``
- [b, b+1)->'01001011...1'
- [b+1,b+2)->'11010001...0'
- . . .
- ``
- and we can span intervals of equal bit sequences that represent
- periodic patterns.
- ``
- [c,d)->'010101....01' // Every second bit is set in range [c,d)
- [d,e)->'001100..0011' // Every two bits alterate in range [d,e)
- [e,f)->'bit-sequence' // 'bit-sequence' reoccurs every N bits in range [e,f)
- ``
- An `IntervalBitmap` can represent
- `N*(2^M)` elements, if `M` is the number of bits of the
- integral type `IntegralT`. Unlike bitsets, that usually
- represent ['*unsigned*] integral numbers, large_bitset may
- range over negative numbers as well.
- There are fields where such large
- bitsets implementations are needed. E.g. for the compact
- representation of large file allocation tables.
- What remains to be done for project *Large Bitset* is
- to code a wrapper `class large_bitset` around `IntervalBitmap`
- so that `large_bitset` looks and feels like a usual
- set class.
- [section Using large_bitset]
- To quicken your appetite for a look at the implementation
- here are a few use cases first. Within the examples that follow,
- we will use `nat`[^['k]] for unsigned integrals
- and `bits`[^['k]] for bitsets containing [^['k]] bits.
- Let's start large.
- In the first example . . .
- [import ../example/large_bitset_/large_bitset.cpp]
- [large_bitset_test_large_set_all]
- . . . we are testing the limits. First we set all bits and
- then we switch off the very last bit.
- [large_bitset_test_large_erase_last]
- Program output (/a little beautified/):
- ``
- ----- Test function test_large() -----------------------------------------------
- We have just turned on the awesome amount of 18,446,744,073,709,551,616 bits ;-)
- [ 0, 288230376151711744) -> 1111111111111111111111111111111111111111111111111111111111111111
- ---- Let's swich off the very last bit -----------------------------------------
- [ 0, 288230376151711743) -> 1111111111111111111111111111111111111111111111111111111111111111
- [288230376151711743, 288230376151711744) -> 1111111111111111111111111111111111111111111111111111111111111110
- ---- Venti is plenty ... let's do something small: A tall ----------------------
- ``
- More readable is a smaller version of `large_bitset`.
- In function `test_small()` we apply a few more operations . . .
- [large_bitset_test_small]
- . . . producing this output:
- ``
- ----- Test function test_small() -----------
- -- Switch on all bits in range [0,64] ------
- [0,8)->11111111
- [8,9)->10000000
- --------------------------------------------
- -- Turn off bits: 25,27,28 -----------------
- [0,3)->11111111
- [3,4)->10100111
- [4,8)->11111111
- [8,9)->10000000
- --------------------------------------------
- -- Flip bits in range [24,30) --------------
- [0,3)->11111111
- [3,4)->01011011
- [4,8)->11111111
- [8,9)->10000000
- --------------------------------------------
- -- Remove the first 10 bits ----------------
- [1,2)->00111111
- [2,3)->11111111
- [3,4)->01011011
- [4,8)->11111111
- [8,9)->10000000
- -- Remove even bits in range [0,72) --------
- [1,2)->00010101
- [2,3)->01010101
- [3,4)->01010001
- [4,8)->01010101
- -- Set odd bits in range [0,72) --------
- [0,9)->01010101
- --------------------------------------------
- ``
- Finally, we present a little /picturesque/ example,
- that demonstrates that `large_bitset` can also serve
- as a self compressing bitmap, that we can 'paint' with.
- [large_bitset_test_picturesque]
- Note that we have two `large_bitsets` for the /outline/
- and the /interior/. Both parts are compressed but we can
- compose both by `operator +`, because the right /positions/
- are provided. This is the program output:
- ``
- ----- Test function test_picturesque() -----
- -------- empty face: 3 intervals -----
- ********
- * *
- * *
- * *
- * *
- ******
- -------- compressed smile: 2 intervals -----
- * *
- ****
- -------- staring bitset: 6 intervals -----
- ********
- * *
- * * * *
- * *
- * **** *
- ******
- --------------------------------------------
- ``
- So, may be you are curious how this class template
- is coded on top of __itv_map__ using only about 250 lines
- of code. This is shown in the sections that follow.
- [endsect][/ Usage of a large_bitset]
- [section The interval_bitmap]
- To begin, let's look at the basic data type again, that
- will be providing the major functionality:
- ``
- typedef interval_map<DomainT, BitSetT, partial_absorber,
- std::less, inplace_bit_add, inplace_bit_and> IntervalBitmap;
- ``
- `DomainT` is supposed to be an integral
- type, the bitset type `BitSetT` will be a wrapper class around an
- unsigned integral type. `BitSetT` has to implement bitwise operators
- that will be called by the functors `inplace_bit_add<BitSetT>`
- and `inplace_bit_and<BitSetT>`.
- The type trait of interval_map is `partial_absorber`, which means
- that it is /partial/ and that empty `BitSetTs` are not stored in
- the map. This is desired and keeps the `interval_map` minimal, storing
- only bitsets, that contain at least one bit switched on.
- Functor template `inplace_bit_add` for parameter `Combine` indicates that we
- do not expect `operator +=` as addition but the bitwise operator
- `|=`. For template parameter `Section` which is instaniated by
- `inplace_bit_and` we expect the bitwise `&=` operator.
- [endsect][/ section The interval_bitmap]
- [section A class implementation for the bitset type]
- The code of the project is enclosed in a `namespace mini`.
- The name indicates, that the implementation is a /minimal/
- example implementation. The name of the bitset class will
- be `bits` or `mini::bits` if qualified.
- To be used as a codomain parameter of class template __itv_map__,
- `mini::bits` has to implement all the functions that are required
- for a codomain_type in general, which are the default constructor `bits()`
- and an equality `operator==`. Moreover `mini::bits` has to implement operators
- required by the instantiations for parameter `Combine` and `Section`
- which are `inplace_bit_add` and `inplace_bit_and`. From functors
- `inplace_bit_add` and `inplace_bit_and` there are inverse functors
- `inplace_bit_subtract` and `inplace_bit_xor`. Those functors
- use operators `|= &= ^=` and `~`. Finally if we want to apply
- lexicographical and subset comparison on large_bitset, we also
- need an `operator <`. All the operators that we need can be implemented
- for `mini::bits` on a few lines:
- [import ../example/large_bitset_/bits.hpp]
- [mini_bits_class_bits]
- Finally there is one important piece of meta information, we have
- to provide: `mini::bits` has to be recognized as a `Set` by the
- icl code. Otherwise we can not exploit the fact that a map of
- sets is model of `Set` and the resulting large_bitset would not
- behave like a set. So we have to say that `mini::bits` shall be sets:
- [mini_bits_is_set]
- This is done by adding a partial template specialization to
- the type trait template `icl::is_set`.
- For the extension of this type trait template and the result
- values of inclusion_compare we need these `#includes` for the
- implementation of `mini::bits`:
-
- [mini_bits_includes]
-
- [endsect][/ A class implementation for the bitset type]
- [section Implementation of a large bitset]
- Having finished our `mini::bits` implementation, we can start to
- code the wrapper class that hides the efficient interval map of mini::bits
- and exposes a simple and convenient set behavior to the world of users.
- Let's start with the required `#include`s this time:
- [import ../example/large_bitset_/large_bitset.hpp]
- [large_bitset_includes]
- Besides `boost/icl/interval_map.hpp` and `bits.hpp` the most important
- include here is `boost/operators.hpp`. We use this library in order
- to further minimize the code and to provide pretty extensive operator
- functionality using very little code.
- For a short and concise naming of the most important unsigned integer
- types and the corresponding `mini::bits` we define this:
- [large_bitset_natural_typedefs]
- [section large_bitset: Public interface][/ ------------------------------------]
- And now let's code `large_bitset`.
- [large_bitset_class_template_header]
- The first template parameter `DomainT` will be instantiated with
- an integral type that defines the kind of numbers that can
- be elements of the set. Since we want to go for a large set we
- use `nat64` as default which is a 64 bit unsigned integer ranging
- from `0` to `2^64-1`. As bitset parameter we also choose a 64-bit
- default. Parameters `Combine` and `Interval` are necessary to
- be passed to dependent type expressions. An allocator can be
- chosen, if desired.
- The nested list of private inheritance contains groups of template
- instantiations from
- [@http://www.boost.org/doc/libs/1_39_0/libs/utility/operators.htm Boost.Operator],
- that provides derivable operators from more fundamental once.
- Implementing the fundamental operators, we get the derivable ones
- /for free/. Below is a short overview of what we get using Boost.Operator,
- where __S stands for `large_bitset`, __i for it's `interval_type`
- and __e for it's `domain_type` or `element_type`.
- [table
- [[Group ][fundamental] [derivable ]]
- [[Equality, ordering ][`==`] [`!=` ]]
- [[ ][`<` ] [`> <= >=` ]]
- [[Set operators (__S x __S)][`+= |= -= &= ^=` ][`+ | - & ^` ]]
- [[Set operators (__S x __e)][`+= |= -= &= ^=` ][`+ | - & ^` ]]
- [[Set operators (__S x __i)][`+= |= -= &= ^=` ][`+ | - & ^` ]]
- ]
- There is a number of associated types
- [large_bitset_associated_types]
- most importantly the implementing `interval_bitmap_type` that is used
- for the implementing container.
- [large_bitmap_impl_map]
- In order to use
- [@http://www.boost.org/doc/libs/1_39_0/libs/utility/operators.htm Boost.Operator]
- we have to implement the fundamental operators as class members. This can be
- done quite schematically.
- [large_bitset_operators]
- As we can see, the seven most important operators that work on the
- class type `large_bitset` can be directly implemented by propagating
- the operation to the implementing `_map`
- of type `interval_bitmap_type`. For the operators that work on segment and
- element types, we use member functions `add`, `subtract`, `intersect` and
- `flip`. As we will see only a small amount of adaper code is needed
- to couple those functions with the functionality of the implementing
- container.
- Member functions `add`, `subtract`, `intersect` and `flip`,
- that allow to combine ['*intervals*] to `large_bitsets` can
- be uniformly implemented using a private function
- `segment_apply` that applies /addition/, /subtraction/,
- /intersection/ or /symmetric difference/, after having
- translated the interval's borders into the right bitset
- positions.
- [large_bitset_fundamental_functions]
- In the sample programs, that we will present to demonstrate
- the capabilities of `large_bitset` we will need a few additional
- functions specifically output functions in two different
- flavors.
- [large_bitset_demo_functions]
- * The first one, `show_segments()` shows the container
- content as it is implemented, in the compressed form.
- * The second function `show_matrix` shows the complete
- matrix of bits that is represented by the container.
- [endsect][/ large_bitset: Public interface]
- [section large_bitset: Private implementation] [/ -------------------------------------]
- In order to implement operations like the addition of
- an element say `42` to
- the large bitset, we need to translate the /value/ to the
- /position/ of the associated *bit* representing `42` in the
- interval container of bitsets. As an example, suppose we
- use a
- ``
- large_bitset<nat, mini::bits8> lbs;
- ``
- that carries small bitsets of 8 bits only.
- The first four interval of `lbs` are assumed to
- be associated with some bitsets. Now we want to add the interval
- `[a,b]==[5,27]`. This will result in the following situation:
- ``
- [0,1)-> [1,2)-> [2,3)-> [3,4)->
- [00101100][11001011][11101001][11100000]
- + [111 11111111 11111111 1111] [5,27] as bitset
- a b
-
- => [0,1)-> [1,3)-> [3,4)->
- [00101111][11111111][11110000]
- ``
- So we have to convert values 5 and 27 into a part that
- points to the interval and a part that refers to the
- position within the interval, which is done by a
- /division/ and a /modulo/ computation. (In order to have a
- consistent representation of the bitsequences across the containers,
- within this project, bitsets are denoted with the
- ['*least significant bit on the left!*])
- ``
- A = a/8 = 5/8 = 0 // refers to interval
- B = b/8 = 27/8 = 3
- R = a%8 = 5%8 = 5 // refers to the position in the associated bitset.
- S = b%8 = 27%8 = 3
- ``
- All /division/ and /modulo operations/ needed here are always done
- using a divisor `d` that is a power of `2`: `d = 2^x`. Therefore
- division and modulo can be expressed by bitset operations.
- The constants needed for those bitset computations are defined here:
-
- [large_bitset_impl_constants]
- Looking at the example again, we can see that we have to identify the
- positions of the beginning and ending of the interval [5,27] that is
- to insert, and then ['*subdivide*] that range of bitsets into three partitions.
- # The bitset where the interval starts.
- # the bitset where the interval ends
- # The bitsets that are completely overlapped by the interval
- ``
- combine interval [5,27] to large_bitset lbs w.r.t. some operation o
- [0,1)-> [1,2)-> [2,3)-> [3,4)->
- [00101100][11001011][11101001][11100000]
- o [111 11111111 11111111 1111]
- a b
- subdivide:
- [first! ][mid_1] . . .[mid_n][ !last]
- [00000111][1...1] . . .[1...1][11110000]
- ``
- After subdividing, we perform the operation `o` as follows:
- # For the first bitset: Set all bits from ther starting bit (`!`)
- to the end of the bitset to `1`. All other bits are `0`. Then
- perform operation `o`: `_map o= ([0,1)->00000111)`
-
- # For the last bitset: Set all bits from the beginning of the bitset
- to the ending bit (`!`) to `1`. All other bits are `0`. Then
- perform operation `o`: `_map o= ([3,4)->11110000)`
-
- # For the range of bitsets in between the staring and ending one,
- perform operation `o`: `_map o= ([1,3)->11111111)`
- The algorithm, that has been outlined and illustrated by the
- example, is implemented by the private member function
- `segment_apply`. To make the combiner operation a variable
- in this algorithm, we use a /pointer to member function type/
- [large_bitset_segment_combiner]
- as first function argument. We will pass member functions `combine_` here,
- ``
- combine_(first_of_interval, end_of_interval, some_bitset);
- ``
- that take the beginning and ending of an interval and a bitset and combine
- them to the implementing `interval_bitmap_type _map`. Here are these functions:
- [large_bitmap_combiners]
- Finally we can code function `segment_apply`, that does the partitioning
- and subsequent combining:
- [large_bitset_segment_apply]
- The functions that help filling bitsets to and from a given bit are
- implemented here:
- [large_bitset_bitset_filler]
- This completes the implementation of class template `large_bitset`.
- Using only a small amount of mostly schematic code,
- we have been able to provide a pretty powerful, self compressing
- and generally usable set type for all integral domain types.
- [endsect][/ large_bitset: Private implementation]
- [endsect][/ Implementation of a large bitset]
- [endsect][/ Large Bitsets IN Projects]
- [endsect][/ Projects]
|