123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550 |
- [/
- Boost.Optional
- Copyright (c) 2003-2007 Fernando Luis Cacciola Carballal
- Distributed under the Boost Software License, Version 1.0.
- (See accompanying file LICENSE_1_0.txt or copy at
- http://www.boost.org/LICENSE_1_0.txt)
- ]
- [section Definitions]
- [section Introduction]
- This section provides definitions of terms used in the Numeric Conversion library.
- [blurb [*Notation]
- [_underlined text] denotes terms defined in the C++ standard.
- [*bold face] denotes terms defined here but not in the standard.
- ]
- [endsect]
- [section Types and Values]
- As defined by the [_C++ Object Model] (§1.7) the [_storage] or memory on which a
- C++ program runs is a contiguous sequence of [_bytes] where each byte is a
- contiguous sequence of bits.
- An [_object] is a region of storage (§1.8) and has a type (§3.9).
- A [_type] is a discrete set of values.
- An object of type `T` has an [_object representation] which is the sequence of
- bytes stored in the object (§3.9/4)
- An object of type `T` has a [_value representation] which is the set of
- bits that determine the ['value] of an object of that type (§3.9/4).
- For [_POD] types (§3.9/10), this bitset is given by the object representation,
- but not all the bits in the storage need to participate in the value
- representation (except for character types): for example, some bits might
- be used for padding or there may be trap-bits.
- __SPACE__
- The [*typed value] that is held by an object is the value which is determined
- by its value representation.
- An [*abstract value] (untyped) is the conceptual information that is
- represented in a type (i.e. the number π).
- The [*intrinsic value] of an object is the binary value of the sequence of
- unsigned characters which form its object representation.
- __SPACE__
- ['Abstract] values can be [*represented] in a given type.
- To [*represent] an abstract value `V` in a type `T` is to obtain a typed value
- `v` which corresponds to the abstract value `V`.
- The operation is denoted using the `rep()` operator, as in: `v=rep(V)`.
- `v` is the [*representation] of `V` in the type `T`.
- For example, the abstract value π can be represented in the type
- `double` as the `double value M_PI` and in the type `int` as the
- `int value 3`
- __SPACE__
- Conversely, ['typed values] can be [*abstracted].
- To [*abstract] a typed value `v` of type `T` is to obtain the abstract value `V`
- whose representation in `T` is `v`.
- The operation is denoted using the `abt()` operator, as in: `V=abt(v)`.
- `V` is the [*abstraction] of `v` of type `T`.
- Abstraction is just an abstract operation (you can't do it); but it is
- defined nevertheless because it will be used to give the definitions in the
- rest of this document.
- [endsect]
- [section C++ Arithmetic Types]
- The C++ language defines [_fundamental types] (§3.9.1). The following subsets of
- the fundamental types are intended to represent ['numbers]:
- [variablelist
- [[[_signed integer types] (§3.9.1/2):][
- `{signed char, signed short int, signed int, signed long int}`
- Can be used to represent general integer numbers (both negative and positive).
- ]]
- [[[_unsigned integer types] (§3.9.1/3):][
- `{unsigned char, unsigned short int, unsigned int, unsigned long int}`
- Can be used to represent positive integer numbers with modulo-arithmetic.
- ]]
- [[[_floating-point types] (§3.9.1/8):][
- `{float,double,long double}`
- Can be used to represent real numbers.
- ]]
- [[[_integral or integer types] (§3.9.1/7):][
- `{{signed integers},{unsigned integers}, bool, char and wchar_t}`
- ]]
- [[[_arithmetic types] (§3.9.1/8):][
- `{{integer types},{floating types}}`
- ]]
- ]
- The integer types are required to have a ['binary] value representation.
- Additionally, the signed/unsigned integer types of the same base type
- (`short`, `int` or `long`) are required to have the same value representation,
- that is:
- int i = -3 ; // suppose value representation is: 10011 (sign bit + 4 magnitude bits)
- unsigned int u = i ; // u is required to have the same 10011 as its value representation.
- In other words, the integer types signed/unsigned X use the same value
- representation but a different ['interpretation] of it; that is, their
- ['typed values] might differ.
- Another consequence of this is that the range for signed X is always a smaller
- subset of the range of unsigned X, as required by §3.9.1/3.
- [note
- Always remember that unsigned types, unlike signed types, have modulo-arithmetic;
- that is, they do not overflow.
- This means that:
- [*-] Always be extra careful when mixing signed/unsigned types
- [*-] Use unsigned types only when you need modulo arithmetic or very very large
- numbers. Don't use unsigned types just because you intend to deal with
- positive values only (you can do this with signed types as well).
- ]
- [endsect]
- [section Numeric Types]
- This section introduces the following definitions intended to integrate
- arithmetic types with user-defined types which behave like numbers.
- Some definitions are purposely broad in order to include a vast variety of
- user-defined number types.
- Within this library, the term ['number] refers to an abstract numeric value.
- A type is [*numeric] if:
- * It is an arithmetic type, or,
- * It is a user-defined type which
- * Represents numeric abstract values (i.e. numbers).
- * Can be converted (either implicitly or explicitly) to/from at least one arithmetic type.
- * Has [link boost_numericconversion.definitions.range_and_precision range] (possibly unbounded)
- and [link boost_numericconversion.definitions.range_and_precision precision] (possibly dynamic or
- unlimited).
- * Provides an specialization of `std::numeric_limits`.
- A numeric type is [*signed] if the abstract values it represent include negative numbers.
- A numeric type is [*unsigned] if the abstract values it represent exclude negative numbers.
- A numeric type is [*modulo] if it has modulo-arithmetic (does not overflow).
- A numeric type is [*integer] if the abstract values it represent are whole numbers.
- A numeric type is [*floating] if the abstract values it represent are real numbers.
- An [*arithmetic value] is the typed value of an arithmetic type
- A [*numeric value] is the typed value of a numeric type
- These definitions simply generalize the standard notions of arithmetic types and
- values by introducing a superset called [_numeric]. All arithmetic types and values are
- numeric types and values, but not vice versa, since user-defined numeric types are not
- arithmetic types.
- The following examples clarify the differences between arithmetic and numeric
- types (and values):
- // A numeric type which is not an arithmetic type (is user-defined)
- // and which is intended to represent integer numbers (i.e., an 'integer' numeric type)
- class MyInt
- {
- MyInt ( long long v ) ;
- long long to_builtin();
- } ;
- namespace std {
- template<> numeric_limits<MyInt> { ... } ;
- }
- // A 'floating' numeric type (double) which is also an arithmetic type (built-in),
- // with a float numeric value.
- double pi = M_PI ;
- // A 'floating' numeric type with a whole numeric value.
- // NOTE: numeric values are typed valued, hence, they are, for instance,
- // integer or floating, despite the value itself being whole or including
- // a fractional part.
- double two = 2.0 ;
- // An integer numeric type with an integer numeric value.
- MyInt i(1234);
- [endsect]
- [section Range and Precision]
- Given a number set `N`, some of its elements are representable in a numeric type `T`.
- The set of representable values of type `T`, or numeric set of `T`, is a set of numeric
- values whose elements are the representation of some subset of `N`.
- For example, the interval of `int` values `[INT_MIN,INT_MAX]` is the set of representable
- values of type `int`, i.e. the `int` numeric set, and corresponds to the representation
- of the elements of the interval of abstract values `[abt(INT_MIN),abt(INT_MAX)]` from
- the integer numbers.
- Similarly, the interval of `double` values `[-DBL_MAX,DBL_MAX]` is the `double`
- numeric set, which corresponds to the subset of the real numbers from `abt(-DBL_MAX)` to
- `abt(DBL_MAX)`.
- __SPACE__
- Let [*`next(x)`] denote the lowest numeric value greater than x.
- Let [*`prev(x)`] denote the highest numeric value lower then x.
- Let [*`v=prev(next(V))`] and [*`v=next(prev(V))`] be identities that relate a numeric
- typed value `v` with a number `V`.
- An ordered pair of numeric values `x`,`y` s.t. `x<y` are [*consecutive] iff `next(x)==y`.
- The abstract distance between consecutive numeric values is usually referred to as a
- [_Unit in the Last Place], or [*ulp] for short. A ulp is a quantity whose abstract
- magnitude is relative to the numeric values it corresponds to: If the numeric set
- is not evenly distributed, that is, if the abstract distance between consecutive
- numeric values varies along the set -as is the case with the floating-point types-,
- the magnitude of 1ulp after the numeric value `x` might be (usually is) different
- from the magnitude of a 1ulp after the numeric value y for `x!=y`.
- Since numbers are inherently ordered, a [*numeric set] of type `T` is an ordered sequence
- of numeric values (of type `T`) of the form:
- REP(T)={l,next(l),next(next(l)),...,prev(prev(h)),prev(h),h}
- where `l` and `h` are respectively the lowest and highest values of type `T`, called
- the boundary values of type `T`.
- __SPACE__
- A numeric set is discrete. It has a [*size] which is the number of numeric values in the set,
- a [*width] which is the abstract difference between the highest and lowest boundary values:
- `[abt(h)-abt(l)]`, and a [*density] which is the relation between its size and width:
- `density=size/width`.
- The integer types have density 1, which means that there are no unrepresentable integer
- numbers between `abt(l)` and `abt(h)` (i.e. there are no gaps). On the other hand,
- floating types have density much smaller than 1, which means that there are real numbers
- unrepresented between consecutive floating values (i.e. there are gaps).
- __SPACE__
- The interval of [_abstract values] `[abt(l),abt(h)]` is the range of the type `T`,
- denoted `R(T)`.
- A range is a set of abstract values and not a set of numeric values. In other
- documents, such as the C++ standard, the word `range` is ['sometimes] used as synonym
- for `numeric set`, that is, as the ordered sequence of numeric values from `l` to `h`.
- In this document, however, a range is an abstract interval which subtends the
- numeric set.
- For example, the sequence `[-DBL_MAX,DBL_MAX]` is the numeric set of the type
- `double`, and the real interval `[abt(-DBL_MAX),abt(DBL_MAX)]` is its range.
- Notice, for instance, that the range of a floating-point type is ['continuous]
- unlike its numeric set.
- This definition was chosen because:
- * [*(a)] The discrete set of numeric values is already given by the numeric set.
- * [*(b)] Abstract intervals are easier to compare and overlap since only boundary
- values need to be considered.
- This definition allows for a concise definition of `subranged` as given in the last section.
- The width of a numeric set, as defined, is exactly equivalent to the width of a range.
- __SPACE__
- The [*precision] of a type is given by the width or density of the numeric set.
- For integer types, which have density 1, the precision is conceptually equivalent
- to the range and is determined by the number of bits used in the value representation:
- The higher the number of bits the bigger the size of the numeric set, the wider the
- range, and the higher the precision.
- For floating types, which have density <<1, the precision is given not by the width
- of the range but by the density. In a typical implementation, the range is determined
- by the number of bits used in the exponent, and the precision by the number of bits
- used in the mantissa (giving the maximum number of significant digits that can be
- exactly represented). The higher the number of exponent bits the wider the range,
- while the higher the number of mantissa bits, the higher the precision.
- [endsect]
- [section Exact, Correctly Rounded and Out-Of-Range Representations]
- Given an abstract value `V` and a type `T` with its corresponding range `[abt(l),abt(h)]`:
- If `V < abt(l)` or `V > abt(h)`, `V` is [*not representable] (cannot be represented) in
- the type `T`, or, equivalently, it's representation in the type `T` is [*out of range],
- or [*overflows].
- * If `V < abt(l)`, the [*overflow is negative].
- * If `V > abt(h)`, the [*overflow is positive].
- If `V >= abt(l)` and `V <= abt(h)`, `V` is [*representable] (can be represented) in the
- type `T`, or, equivalently, its representation in the type `T` is [*in range], or
- [*does not overflow].
- Notice that a numeric type, such as a C++ unsigned type, can define that any `V` does
- not overflow by always representing not `V` itself but the abstract value
- `U = [ V % (abt(h)+1) ]`, which is always in range.
- Given an abstract value `V` represented in the type `T` as `v`, the [*roundoff] error
- of the representation is the abstract difference: `(abt(v)-V)`.
- Notice that a representation is an ['operation], hence, the roundoff error corresponds
- to the representation operation and not to the numeric value itself
- (i.e. numeric values do not have any error themselves)
- * If the roundoff is 0, the representation is [*exact], and `V` is exactly representable
- in the type `T`.
- * If the roundoff is not 0, the representation is [*inexact], and `V` is inexactly
- representable in the type `T`.
- If a representation `v` in a type `T` -either exact or inexact-, is any of the adjacents
- of `V` in that type, that is, if `v==prev` or `v==next`, the representation is
- faithfully rounded. If the choice between `prev` and `next` matches a given
- [*rounding direction], it is [*correctly rounded].
- All exact representations are correctly rounded, but not all inexact representations are.
- In particular, C++ requires numeric conversions (described below) and the result of
- arithmetic operations (not covered by this document) to be correctly rounded, but
- batch operations propagate roundoff, thus final results are usually incorrectly
- rounded, that is, the numeric value `r` which is the computed result is neither of
- the adjacents of the abstract value `R` which is the theoretical result.
- Because a correctly rounded representation is always one of adjacents of the abstract
- value being represented, the roundoff is guaranteed to be at most 1ulp.
- The following examples summarize the given definitions. Consider:
- * A numeric type `Int` representing integer numbers with a
- ['numeric set]: `{-2,-1,0,1,2}` and
- ['range]: `[-2,2]`
- * A numeric type `Cardinal` representing integer numbers with a
- ['numeric set]: `{0,1,2,3,4,5,6,7,8,9}` and
- ['range]: `[0,9]` (no modulo-arithmetic here)
- * A numeric type `Real` representing real numbers with a
- ['numeric set]: `{-2.0,-1.5,-1.0,-0.5,-0.0,+0.0,+0.5,+1.0,+1.5,+2.0}` and
- ['range]: `[-2.0,+2.0]`
- * A numeric type `Whole` representing real numbers with a
- ['numeric set]: `{-2.0,-1.0,0.0,+1.0,+2.0}` and
- ['range]: `[-2.0,+2.0]`
- First, notice that the types `Real` and `Whole` both represent real numbers,
- have the same range, but different precision.
- * The integer number `1` (an abstract value) can be exactly represented
- in any of these types.
- * The integer number `-1` can be exactly represented in `Int`, `Real` and `Whole`,
- but cannot be represented in `Cardinal`, yielding negative overflow.
- * The real number `1.5` can be exactly represented in `Real`, and inexactly
- represented in the other types.
- * If `1.5` is represented as either `1` or `2` in any of the types (except `Real`),
- the representation is correctly rounded.
- * If `0.5` is represented as `+1.5` in the type `Real`, it is incorrectly rounded.
- * `(-2.0,-1.5)` are the `Real` adjacents of any real number in the interval
- `[-2.0,-1.5]`, yet there are no `Real` adjacents for `x < -2.0`, nor for `x > +2.0`.
- [endsect]
- [section Standard (numeric) Conversions]
- The C++ language defines [_Standard Conversions] (§4) some of which are conversions
- between arithmetic types.
- These are [_Integral promotions] (§4.5), [_Integral conversions] (§4.7),
- [_Floating point promotions] (§4.6), [_Floating point conversions] (§4.8) and
- [_Floating-integral conversions] (§4.9).
- In the sequel, integral and floating point promotions are called [*arithmetic promotions],
- and these plus integral, floating-point and floating-integral conversions are called
- [*arithmetic conversions] (i.e, promotions are conversions).
- Promotions, both Integral and Floating point, are ['value-preserving], which means that
- the typed value is not changed with the conversion.
- In the sequel, consider a source typed value `s` of type `S`, the source abstract
- value `N=abt(s)`, a destination type `T`; and whenever possible, a result typed value
- `t` of type `T`.
- Integer to integer conversions are always defined:
- * If `T` is unsigned, the abstract value which is effectively represented is not
- `N` but `M=[ N % ( abt(h) + 1 ) ]`, where `h` is the highest unsigned typed
- value of type `T`.
- * If `T` is signed and `N` is not directly representable, the result `t` is
- [_implementation-defined], which means that the C++ implementation is required to
- produce a value `t` even if it is totally unrelated to `s`.
- Floating to Floating conversions are defined only if `N` is representable;
- if it is not, the conversion has [_undefined behavior].
- * If `N` is exactly representable, `t` is required to be the exact representation.
- * If `N` is inexactly representable, `t` is required to be one of the two
- adjacents, with an implementation-defined choice of rounding direction;
- that is, the conversion is required to be correctly rounded.
- Floating to Integer conversions represent not `N` but `M=trunc(N)`, were
- `trunc()` is to truncate: i.e. to remove the fractional part, if any.
- * If `M` is not representable in `T`, the conversion has [_undefined behavior]
- (unless `T` is `bool`, see §4.12).
- Integer to Floating conversions are always defined.
- * If `N` is exactly representable, `t` is required to be the exact representation.
- * If `N` is inexactly representable, `t` is required to be one of the
- two adjacents, with an implementation-defined choice of rounding direction;
- that is, the conversion is required to be correctly rounded.
- [endsect]
- [section Subranged Conversion Direction, Subtype and Supertype]
- Given a source type `S` and a destination type `T`, there is a
- [*conversion direction] denoted: `S->T`.
- For any two ranges the following ['range relation] can be defined:
- A range `X` can be ['entirely contained] in a range `Y`, in which case
- it is said that `X` is enclosed by `Y`.
- [: [*Formally:] `R(S)` is enclosed by `R(T)` iif `(R(S) intersection R(T)) == R(S)`.]
- If the source type range, `R(S)`, is not enclosed in the target type range,
- `R(T)`; that is, if `(R(S) & R(T)) != R(S)`, the conversion direction is said
- to be [*subranged], which means that `R(S)` is not entirely contained in `R(T)`
- and therefore there is some portion of the source range which falls outside
- the target range. In other words, if a conversion direction `S->T` is subranged,
- there are values in `S` which cannot be represented in `T` because they are
- out of range.
- Notice that for `S->T`, the adjective subranged applies to `T`.
- Examples:
- Given the following numeric types all representing real numbers:
- * `X` with numeric set `{-2.0,-1.0,0.0,+1.0,+2.0}` and range `[-2.0,+2.0]`
- * `Y` with numeric set `{-2.0,-1.5,-1.0,-0.5,0.0,+0.5,+1.0,+1.5,+2.0}` and range `[-2.0,+2.0]`
- * `Z` with numeric set `{-1.0,0.0,+1.0}` and range `[-1.0,+1.0]`
- For:
- [variablelist
- [[(a) X->Y:][
- `R(X) & R(Y) == R(X)`, then `X->Y` is not subranged.
- Thus, all values of type `X` are representable in the type `Y`.
- ]]
- [[(b) Y->X:][
- `R(Y) & R(X) == R(Y)`, then `Y->X` is not subranged.
- Thus, all values of type `Y` are representable in the type `X`, but in this case,
- some values are ['inexactly] representable (all the halves).
- (note: it is to permit this case that a range is an interval of abstract values and
- not an interval of typed values)
- ]]
- [[(b) X->Z:][
- `R(X) & R(Z) != R(X)`, then `X->Z` is subranged.
- Thus, some values of type `X` are not representable in the type `Z`, they fall
- out of range `(-2.0 and +2.0)`.
- ]]
- ]
- It is possible that `R(S)` is not enclosed by `R(T)`, while neither is `R(T)` enclosed
- by `R(S)`; for example, `UNSIG=[0,255]` is not enclosed by `SIG=[-128,127]`;
- neither is `SIG` enclosed by `UNSIG`.
- This implies that is possible that a conversion direction is subranged both ways.
- This occurs when a mixture of signed/unsigned types are involved and indicates that
- in both directions there are values which can fall out of range.
- Given the range relation (subranged or not) of a conversion direction `S->T`, it
- is possible to classify `S` and `T` as [*supertype] and [*subtype]:
- If the conversion is subranged, which means that `T` cannot represent all possible
- values of type `S`, `S` is the supertype and `T` the subtype; otherwise, `T` is the
- supertype and `S` the subtype.
- For example:
- [: `R(float)=[-FLT_MAX,FLT_MAX]` and `R(double)=[-DBL_MAX,DBL_MAX]` ]
- If `FLT_MAX < DBL_MAX`:
- * `double->float` is subranged and `supertype=double`, `subtype=float`.
- * `float->double` is not subranged and `supertype=double`, `subtype=float`.
- Notice that while `double->float` is subranged, `float->double` is not,
- which yields the same supertype,subtype for both directions.
- Now consider:
- [: `R(int)=[INT_MIN,INT_MAX]` and `R(unsigned int)=[0,UINT_MAX]` ]
- A C++ implementation is required to have `UINT_MAX > INT_MAX` (§3.9/3), so:
- * 'int->unsigned' is subranged (negative values fall out of range)
- and `supertype=int`, `subtype=unsigned`.
- * 'unsigned->int' is ['also] subranged (high positive values fall out of range)
- and `supertype=unsigned`, `subtype=int`.
- In this case, the conversion is subranged in both directions and the
- supertype,subtype pairs are not invariant (under inversion of direction).
- This indicates that none of the types can represent all the values of the other.
- When the supertype is the same for both `S->T` and `T->S`, it is effectively
- indicating a type which can represent all the values of the subtype.
- Consequently, if a conversion `X->Y` is not subranged, but the opposite `(Y->X)` is,
- so that the supertype is always `Y`, it is said that the direction `X->Y` is [*correctly
- rounded value preserving], meaning that all such conversions are guaranteed to
- produce results in range and correctly rounded (even if inexact).
- For example, all integer to floating conversions are correctly rounded value preserving.
- [endsect]
- [endsect]
|