123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632 |
- <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
- "http://www.w3.org/TR/html4/loose.dtd">
- <html>
- <head>
- <meta http-equiv="Content-Language" content="en-us">
- <meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
- <link rel="stylesheet" type="text/css" href="../../../../boost.css">
- <title>Rounding Policies</title>
- </head>
- <body lang="en">
- <h1>Rounding Policies</h1>
- <p>In order to be as general as possible, the library uses a class to
- compute all the necessary functions rounded upward or downward. This class
- is the first parameter of <code>policies</code>, it is also the type named
- <code>rounding</code> in the policy definition of
- <code>interval</code>.</p>
- <p>By default, it is <code>interval_lib::rounded_math<T></code>. The
- class <code>interval_lib::rounded_math</code> is already specialized for
- the standard floating types (<code>float</code> , <code>double</code> and
- <code>long double</code>). So if the base type of your intervals is not one
- of these, a good solution would probably be to provide a specialization of
- this class. But if the default specialization of
- <code>rounded_math<T></code> for <code>float</code>,
- <code>double</code>, or <code>long double</code> is not what you seek, or
- you do not want to specialize
- <code>interval_lib::rounded_math<T></code> (say because you prefer to
- work in your own namespace) you can also define your own rounding policy
- and pass it directly to <code>interval_lib::policies</code>.</p>
- <h2>Requirements</h2>
- <p>Here comes what the class is supposed to provide. The domains are
- written next to their respective functions (as you can see, the functions
- do not have to worry about invalid values, but they have to handle infinite
- arguments).</p>
- <pre>
- /* Rounding requirements */
- struct rounding {
- // default constructor, destructor
- rounding();
- ~rounding();
- // mathematical operations
- T add_down(T, T); // [-∞;+∞][-∞;+∞]
- T add_up (T, T); // [-∞;+∞][-∞;+∞]
- T sub_down(T, T); // [-∞;+∞][-∞;+∞]
- T sub_up (T, T); // [-∞;+∞][-∞;+∞]
- T mul_down(T, T); // [-∞;+∞][-∞;+∞]
- T mul_up (T, T); // [-∞;+∞][-∞;+∞]
- T div_down(T, T); // [-∞;+∞]([-∞;+∞]-{0})
- T div_up (T, T); // [-∞;+∞]([-∞;+∞]-{0})
- T sqrt_down(T); // ]0;+∞]
- T sqrt_up (T); // ]0;+∞]
- T exp_down(T); // [-∞;+∞]
- T exp_up (T); // [-∞;+∞]
- T log_down(T); // ]0;+∞]
- T log_up (T); // ]0;+∞]
- T cos_down(T); // [0;2π]
- T cos_up (T); // [0;2π]
- T tan_down(T); // ]-π/2;π/2[
- T tan_up (T); // ]-π/2;π/2[
- T asin_down(T); // [-1;1]
- T asin_up (T); // [-1;1]
- T acos_down(T); // [-1;1]
- T acos_up (T); // [-1;1]
- T atan_down(T); // [-∞;+∞]
- T atan_up (T); // [-∞;+∞]
- T sinh_down(T); // [-∞;+∞]
- T sinh_up (T); // [-∞;+∞]
- T cosh_down(T); // [-∞;+∞]
- T cosh_up (T); // [-∞;+∞]
- T tanh_down(T); // [-∞;+∞]
- T tanh_up (T); // [-∞;+∞]
- T asinh_down(T); // [-∞;+∞]
- T asinh_up (T); // [-∞;+∞]
- T acosh_down(T); // [1;+∞]
- T acosh_up (T); // [1;+∞]
- T atanh_down(T); // [-1;1]
- T atanh_up (T); // [-1;1]
- T median(T, T); // [-∞;+∞][-∞;+∞]
- T int_down(T); // [-∞;+∞]
- T int_up (T); // [-∞;+∞]
- // conversion functions
- T conv_down(U);
- T conv_up (U);
- // unprotected rounding class
- typedef ... unprotected_rounding;
- };
- </pre>
- <p>The constructor and destructor of the rounding class have a very
- important semantic requirement: they are responsible for setting and
- resetting the rounding modes of the computation on T. For instance, if T is
- a standard floating point type and floating point computation is performed
- according to the Standard IEEE 754, the constructor can save the current
- rounding state, each <code>_up</code> (resp. <code>_down</code>) function
- will round up (resp. down), and the destructor will restore the saved
- rounding state. Indeed this is the behavior of the default rounding
- policy.</p>
- <p>The meaning of all the mathematical functions up until
- <code>atanh_up</code> is clear: each function returns number representable
- in the type <code>T</code> which is a lower bound (for <code>_down</code>)
- or upper bound (for <code>_up</code>) on the true mathematical result of
- the corresponding function. The function <code>median</code> computes the
- average of its two arguments rounded to its nearest representable number.
- The functions <code>int_down</code> and <code>int_up</code> compute the
- nearest integer smaller or bigger than their argument. Finally,
- <code>conv_down</code> and <code>conv_up</code> are responsible of the
- conversions of values of other types to the base number type: the first one
- must round down the value and the second one must round it up.</p>
- <p>The type <code>unprotected_rounding</code> allows to remove all
- controls. For reasons why one might to do this, see the <a href=
- "#Protection">protection</a> paragraph below.</p>
- <h2>Overview of the provided classes</h2>
- <p>A lot of classes are provided. The classes are organized by level. At
- the bottom is the class <code>rounding_control</code>. At the next level
- come <code>rounded_arith_exact</code>, <code>rounded_arith_std</code> and
- <code>rounded_arith_opp</code>. Then there are
- <code>rounded_transc_dummy</code>, <code>rounded_transc_exact</code>,
- <code>rounded_transc_std</code> and <code>rounded_transc_opp</code>. And
- finally are <code>save_state</code> and <code>save_state_nothing</code>.
- Each of these classes provide a set of members that are required by the
- classes of the next level. For example, a <code>rounded_transc_...</code>
- class needs the members of a <code>rounded_arith_...</code> class.</p>
- <p>When they exist in two versions <code>_std</code> and <code>_opp</code>,
- the first one does switch the rounding mode each time, and the second one
- tries to keep it oriented toward plus infinity. The main purpose of the
- <code>_opp</code> version is to speed up the computations through the use
- of the "opposite trick" (see the <a href="#perf">performance notes</a>).
- This version requires the rounding mode to be upward before entering any
- computation functions of the class. It guarantees that the rounding mode
- will still be upward at the exit of the functions.</p>
- <p>Please note that it is really a very bad idea to mix the
- <code>_opp</code> version with the <code>_std</code> since they do not have
- compatible properties.</p>
- <p>There is a third version named <code>_exact</code> which computes the
- functions without changing the rounding mode. It is an "exact" version
- because it is intended for a base type that produces exact results.</p>
- <p>The last version is the <code>_dummy</code> version. It does not do any
- computations but still produces compatible results.</p>
- <p>Please note that it is possible to use the "exact" version for an
- inexact base type, e.g. <code>float</code> or <code>double</code>. In that
- case, the inclusion property is no longer guaranteed, but this can be
- useful to speed up the computation when the inclusion property is not
- desired strictly. For instance, in computer graphics, a small error due to
- floating-point roundoff is acceptable as long as an approximate version of
- the inclusion property holds.</p>
- <p>Here comes what each class defines. Later, when they will be described
- more thoroughly, these members will not be repeated. Please come back here
- in order to see them. Inheritance is also used to avoid repetitions.</p>
- <pre>
- template <class T>
- struct rounding_control
- {
- typedef ... rounding_mode;
- void set_rounding_mode(rounding_mode);
- void get_rounding_mode(rounding_mode&);
- void downward ();
- void upward ();
- void to_nearest();
- T to_int(T);
- T force_rounding(T);
- };
- template <class T, class Rounding>
- struct rounded_arith_... : Rounding
- {
- void init();
- T add_down(T, T);
- T add_up (T, T);
- T sub_down(T, T);
- T sub_up (T, T);
- T mul_down(T, T);
- T mul_up (T, T);
- T div_down(T, T);
- T div_up (T, T);
- T sqrt_down(T);
- T sqrt_up (T);
- T median(T, T);
- T int_down(T);
- T int_up (T);
- };
- template <class T, class Rounding>
- struct rounded_transc_... : Rounding
- {
- T exp_down(T);
- T exp_up (T);
- T log_down(T);
- T log_up (T);
- T cos_down(T);
- T cos_up (T);
- T tan_down(T);
- T tan_up (T);
- T asin_down(T);
- T asin_up (T);
- T acos_down(T);
- T acos_up (T);
- T atan_down(T);
- T atan_up (T);
- T sinh_down(T);
- T sinh_up (T);
- T cosh_down(T);
- T cosh_up (T);
- T tanh_down(T);
- T tanh_up (T);
- T asinh_down(T);
- T asinh_up (T);
- T acosh_down(T);
- T acosh_up (T);
- T atanh_down(T);
- T atanh_up (T);
- };
- template <class Rounding>
- struct save_state_... : Rounding
- {
- save_state_...();
- ~save_state_...();
- typedef ... unprotected_rounding;
- };
- </pre>
- <h2>Synopsis</h2>
- <pre>
- namespace boost {
- namespace numeric {
- namespace interval_lib {
- <span style="color: #FF0000">/* basic rounding control */</span>
- template <class T> struct rounding_control;
- <span style="color: #FF0000">/* arithmetic functions rounding */</span>
- template <class T, class Rounding = rounding_control<T> > struct rounded_arith_exact;
- template <class T, class Rounding = rounding_control<T> > struct rounded_arith_std;
- template <class T, class Rounding = rounding_control<T> > struct rounded_arith_opp;
- <span style="color: #FF0000">/* transcendental functions rounding */</span>
- template <class T, class Rounding> struct rounded_transc_dummy;
- template <class T, class Rounding = rounded_arith_exact<T> > struct rounded_transc_exact;
- template <class T, class Rounding = rounded_arith_std<T> > struct rounded_transc_std;
- template <class T, class Rounding = rounded_arith_opp<T> > struct rounded_transc_opp;
- <span style="color: #FF0000">/* rounding-state-saving classes */</span>
- template <class Rounding> struct save_state;
- template <class Rounding> struct save_state_nothing;
- <span style="color: #FF0000">/* default policy for type T */</span>
- template <class T> struct rounded_math;
- template <> struct rounded_math<float>;
- template <> struct rounded_math<double>;
- <span style=
- "color: #FF0000">/* some metaprogramming to convert a protected to unprotected rounding */</span>
- template <class I> struct unprotect;
- } // namespace interval_lib
- } // namespace numeric
- } // namespace boost
- </pre>
- <h2>Description of the provided classes</h2>
- <p>We now describe each class in the order they appear in the definition of
- a rounding policy (this outermost-to-innermost order is the reverse order
- from the synopsis).</p>
- <h3 id="Protection">Protection control</h3>
- <p>Protection refers to the fact that the interval operations will be
- surrounded by rounding mode controls. Unprotecting a class means to remove
- all the rounding controls. Each rounding policy provides a type
- <code>unprotected_rounding</code>. The required type
- <code>unprotected_rounding</code> gives another rounding class that enables
- to work when nested inside rounding. For example, the first three lines
- below should all produce the same result (because the first operation is
- the rounding constructor, and the last is its destructor, which take care
- of setting the rounding modes); and the last line is allowed to have an
- undefined behavior (since no rounding constructor or destructor is ever
- called).</p>
- <pre>
- T c; { rounding rnd; c = rnd.add_down(a, b); }
- T c; { rounding rnd1; { rounding rnd2; c = rnd2.add_down(a, b); } }
- T c; { rounding rnd1; { rounding::unprotected_rounding rnd2; c = rnd2.add_down(a, b); } }
- T d; { rounding::unprotected_rounding rnd; d = rnd.add_down(a, b); }
- </pre>
- <p>Naturally <code>rounding::unprotected_rounding</code> may simply be
- <code>rounding</code> itself. But it can improve performance if it is a
- simplified version with empty constructor and destructor. In order to avoid
- undefined behaviors, in the library, an object of type
- <code>rounding::unprotected_rounding</code> is guaranteed to be created
- only when an object of type <code>rounding</code> is already alive. See the
- <a href="#perf">performance notes</a> for some additional details.</p>
- <p>The support library defines a metaprogramming class template
- <code>unprotect</code> which takes an interval type <code>I</code> and
- returns an interval type <code>unprotect<I>::type</code> where the
- rounding policy has been unprotected. Some information about the types:
- <code>interval<T, interval_lib::policies<Rounding, _>
- >::traits_type::rounding</code> <b>is</b> the same type as
- <code>Rounding</code>, and <code>unprotect<interval<T,
- interval_lib::policies<Rounding, _> > >::type</code> <b>is</b>
- the same type as <code>interval<T,
- interval_lib::policies<Rounding::unprotected, _> ></code>.</p>
- <h3>State saving</h3>
- <p>First comes <code>save_state</code>. This class is responsible for
- saving the current rounding mode and calling init in its constructor, and
- for restoring the saved rounding mode in its destructor. This class also
- defines the <code>unprotected_rounding</code> type.</p>
- <p>If the rounding mode does not require any state-saving or
- initialization, <code>save_state_nothing</code> can be used instead of
- <code>save_state</code>.</p>
- <h3>Transcendental functions</h3>
- <p>The classes <code>rounded_transc_exact</code>,
- <code>rounded_transc_std</code> and <code>rounded_transc_opp</code> expect
- the std namespace to provide the functions exp log cos tan acos asin atan
- cosh sinh tanh acosh asinh atanh. For the <code>_std</code> and
- <code>_opp</code> versions, all these functions should respect the current
- rounding mode fixed by a call to downward or upward.</p>
- <p><strong>Please note:</strong> Unfortunately, the latter is rarely the
- case. It is the reason why a class <code>rounded_transc_dummy</code> is
- provided which does not depend on the functions from the std namespace.
- There is no magic, however. The functions of
- <code>rounded_transc_dummy</code> do not compute anything. They only return
- valid values. For example, <code>cos_down</code> always returns -1. In this
- way, we do verify the inclusion property for the default implementation,
- even if this has strictly no value for the user. In order to have useful
- values, another policy should be used explicitely, which will most likely
- lead to a violation of the inclusion property. In this way, we ensure that
- the violation is clearly pointed out to the user who then knows what he
- stands against. This class could have been used as the default
- transcendental rounding class, but it was decided it would be better for
- the compilation to fail due to missing declarations rather than succeed
- thanks to valid but unusable functions.</p>
- <h3>Basic arithmetic functions</h3>
- <p>The classes <code>rounded_arith_std</code> and
- <code>rounded_arith_opp</code> expect the operators + - * / and the
- function <code>std::sqrt</code> to respect the current rounding mode.</p>
- <p>The class <code>rounded_arith_exact</code> requires
- <code>std::floor</code> and <code>std::ceil</code> to be defined since it
- can not rely on <code>to_int</code>.</p>
- <h3>Rounding control</h3>
- <p>The functions defined by each of the previous classes did not need any
- explanation. For example, the behavior of <code>add_down</code> is to
- compute the sum of two numbers rounded downward. For
- <code>rounding_control</code>, the situation is a bit more complex.</p>
- <p>The basic function is <code>force_rounding</code> which returns its
- argument correctly rounded accordingly to the current rounding mode if it
- was not already the case. This function is necessary to handle delayed
- rounding. Indeed, depending on the way the computations are done, the
- intermediate results may be internally stored in a more precise format and
- it can lead to a wrong rounding. So the function enforces the rounding.
- <a href="#extreg">Here</a> is an example of what happens when the rounding
- is not enforced.</p>
- <p>The function <code>get_rounding_mode</code> returns the current rounding
- mode, <code>set_rounding_mode</code> sets the rounding mode back to a
- previous value returned by <code>get_rounding_mode</code>.
- <code>downward</code>, <code>upward</code> and <code>to_nearest</code> sets
- the rounding mode in one of the three directions. This rounding mode should
- be global to all the functions that use the type <code>T</code>. For
- example, after a call to <code>downward</code>,
- <code>force_rounding(x+y)</code> is expected to return the sum rounded
- toward -∞.</p>
- <p>The function <code>to_int</code> computes the nearest integer
- accordingly to the current rounding mode.</p>
- <p>The non-specialized version of <code>rounding_control</code> does not do
- anything. The functions for the rounding mode are empty, and
- <code>to_int</code> and <code>force_rounding</code> are identity functions.
- The <code>pi_</code> constant functions return suitable integers (for
- example, <code>pi_up</code> returns <code>T(4)</code>).</p>
- <p>The class template <code>rounding_control</code> is specialized for
- <code>float</code>, <code>double</code> and <code>long double</code> in
- order to best use the floating point unit of the computer.</p>
- <h2>Template class <tt>rounded_math</tt></h2>
- <p>The default policy (aka <code>rounded_math<T></code>) is simply
- defined as:</p>
- <pre>
- template <class T> struct rounded_math<T> : save_state_nothing<rounded_arith_exact<T> > {};
- </pre>
- <p>and the specializations for <code>float</code>, <code>double</code> and
- <code>long double</code> use <code>rounded_arith_opp</code>, as in:</p>
- <pre>
- template <> struct rounded_math<float> : save_state<rounded_arith_opp<float> > {};
- template <> struct rounded_math<double> : save_state<rounded_arith_opp<double> > {};
- template <> struct rounded_math<long double> : save_state<rounded_arith_opp<long double> > {};
- </pre>
- <h2 id="perf">Performance Issues</h2>
- <p>This paragraph deals mostly with the performance of the library with
- intervals using the floating-point unit (FPU) of the computer. Let's
- consider the sum of [<i>a</i>,<i>b</i>] and [<i>c</i>,<i>d</i>] as an
- example. The result is [<code>down</code>(<i>a</i>+<i>c</i>),
- <code>up</code>(<i>b</i>+<i>d</i>)], where <code>down</code> and
- <code>up</code> indicate the rounding mode needed.</p>
- <h3>Rounding Mode Switch</h3>
- <p>If the FPU is able to use a different rounding mode for each operation,
- there is no problem. For example, it's the case for the Alpha processor:
- each floating-point instruction can specify a different rounding mode.
- However, the IEEE-754 Standard does not require such a behavior. So most of
- the FPUs only provide some instructions to set the rounding mode for all
- subsequent operations. And generally, these instructions need to flush the
- pipeline of the FPU.</p>
- <p>In this situation, the time needed to sum [<i>a</i>,<i>b</i>] and
- [<i>c</i>,<i>d</i>] is far worse than the time needed to calculate
- <i>a</i>+<i>b</i> and <i>c</i>+<i>d</i> since the two additions cannot be
- parallelized. Consequently, the objective is to diminish the number of
- rounding mode switches.</p>
- <p>If this library is not used to provide exact computations, but only for
- pair arithmetic, the solution is quite simple: do not use rounding. In that
- case, doing the sum [<i>a</i>,<i>b</i>] and [<i>c</i>,<i>d</i>] will be as
- fast as computing <i>a</i>+<i>b</i> and <i>c</i>+<i>d</i>. Everything is
- perfect.</p>
- <p>However, if exact computations are required, such a solution is totally
- unthinkable. So, are we penniless? No, there is still a trick available.
- Indeed, down(<i>a</i>+<i>c</i>) = -up(-<i>a</i>-<i>c</i>) if the unary
- minus is an exact operation. It is now possible to calculate the whole sum
- with the same rounding mode. Generally, the cost of the mode switching is
- worse than the cost of the sign changes.</p>
- <h4>Speeding up consecutive operations</h4>
- <p>The interval addition is not the only operation; most of the interval
- operations can be computed by setting the rounding direction of the FPU
- only once. So the operations of the floating point rounding policy assume
- that the direction is correctly set. This assumption is usually not true in
- a program (the user and the standard library expect the rounding direction
- to be to nearest), so these operations have to be enclosed in a shell that
- sets the floating point environment. This protection is done by the
- constructor and destructor of the rounding policy.</p>
- <p>Les us now consider the case of two consecutive interval additions:
- [<i>a</i>,<i>b</i>] + [<i>c</i>,<i>d</i>] + [<i>e</i>,<i>f</i>]. The
- generated code should look like:</p>
- <pre>
- init_rounding_mode(); // rounding object construction during the first addition
- t1 = -(-a - c);
- t2 = b + d;
- restore_rounding_mode(); // rounding object destruction
- init_rounding_mode(); // rounding object construction during the second addition
- x = -(-t1 - e);
- y = t2 + f;
- restore_rounding_mode(); // rounding object destruction
- // the result is the interval [x,y]
- </pre>
- <p>Between the two operations, the rounding direction is restored, and then
- initialized again. Ideally, compilers should be able to optimize this
- useless code away. But unfortunately they are not, and this slows the code
- down by an order of magnitude. In order to avoid this bottleneck, the user
- can tell to the interval operations that they do not need to be protected
- anymore. It will then be up to the user to protect the interval
- computations. The compiler will then be able to generate such a code:</p>
- <pre>
- init_rounding_mode(); // done by the user
- x = -(-a - c - e);
- y = b + d + f;
- restore_rounding_mode(); // done by the user
- </pre>
- <p>The user will have to create a rounding object. And as long as this
- object is alive, unprotected versions of the interval operations can be
- used. They are selected by using an interval type with a specific rounding
- policy. If the initial interval type is <code>I</code>, then
- <code>I::traits_type::rounding</code> is the type of the rounding object,
- and <code>interval_lib::unprotect<I>::type</code> is the type of the
- unprotected interval type.</p>
- <p>Because the rounding mode of the FPU is changed during the life of the
- rounding object, any arithmetic floating point operation that does not
- involve the interval library can lead to unexpected results. And
- reciprocally, using unprotected interval operation when no rounding object
- is alive will produce intervals that are not guaranteed anymore to contain
- the real result.</p>
- <h4 id="perfexp">Example</h4>
- <p>Here is an example of Horner's scheme to compute the value of a polynom.
- The rounding mode switches are disabled for the whole computation.</p>
- <pre>
- // I is an interval class, the polynom is a simple array
- template<class I>
- I horner(const I& x, const I p[], int n) {
- // save and initialize the rounding mode
- typename I::traits_type::rounding rnd;
- // define the unprotected version of the interval type
- typedef typename boost::numeric::interval_lib::unprotect<I>::type R;
- const R& a = x;
- R y = p[n - 1];
- for(int i = n - 2; i >= 0; i--) {
- y = y * a + (const R&)(p[i]);
- }
- return y;
- // restore the rounding mode with the destruction of rnd
- }
- </pre>
- <p>Please note that a rounding object is specially created in order to
- protect all the interval computations. Each interval of type I is converted
- in an interval of type R before any operations. If this conversion is not
- done, the result is still correct, but the interest of this whole
- optimization has disappeared. Whenever possible, it is good to convert to
- <code>const R&</code> instead of <code>R</code>: indeed, the function
- could already be called inside an unprotection block so the types
- <code>R</code> and <code>I</code> would be the same interval, no need for a
- conversion.</p>
- <h4>Uninteresting remark</h4>
- <p>It was said at the beginning that the Alpha processors can use a
- specific rounding mode for each operation. However, due to the instruction
- format, the rounding toward plus infinity is not available. Only the
- rounding toward minus infinity can be used. So the trick using the change
- of sign becomes essential, but there is no need to save and restore the
- rounding mode on both sides of an operation.</p>
- <h3 id="extreg">Extended Registers</h3>
- <p>There is another problem besides the cost of the rounding mode switch.
- Some FPUs use extended registers (for example, float computations will be
- done with double registers, or double computations with long double
- registers). Consequently, many problems can arise.</p>
- <p>The first one is due to to the extended precision of the mantissa. The
- rounding is also done on this extended precision. And consequently, we
- still have down(<i>a</i>+<i>b</i>) = -up(-<i>a</i>-<i>b</i>) in the
- extended registers. But back to the standard precision, we now have
- down(<i>a</i>+<i>b</i>) < -up(-<i>a</i>-<i>b</i>) instead of an
- equality. A solution could be not to use this method. But there still are
- other problems, with the comparisons between numbers for example.</p>
- <p>Naturally, there is also a problem with the extended precision of the
- exponent. To illustrate this problem, let <i>m</i> be the biggest number
- before +<i>inf</i>. If we calculate 2*[<i>m</i>,<i>m</i>], the answer
- should be [<i>m</i>,<i>inf</i>]. But due to the extended registers, the FPU
- will first store [<i>2m</i>,<i>2m</i>] and then convert it to
- [<i>inf</i>,<i>inf</i>] at the end of the calculus (when the rounding mode
- is toward +<i>inf</i>). So the answer is no more accurate.</p>
- <p>There is only one solution: to force the FPU to convert the extended
- values back to standard precision after each operation. Some FPUs provide
- an instruction able to do this conversion (for example the PowerPC
- processors). But for the FPUs that do not provide it (the x86 processors),
- the only solution is to write the values to memory and read them back. Such
- an operation is obviously very expensive.</p>
- <h2>Some Examples</h2>
- <p>Here come several cases:</p>
- <ul>
- <li>if you need precise computations with the <code>float</code> or
- <code>double</code> types, use the default
- <code>rounded_math<T></code>;</li>
- <li>for fast wide intervals without any rounding nor precision, use
- <code>save_state_nothing<rounded_transc_exact<T>
- ></code>;</li>
- <li>for an exact type (like int or rational with a little help for
- infinite and NaN values) without support for transcendental functions,
- the solution could be
- <code>save_state_nothing<rounded_transc_dummy<T,
- rounded_arith_exact<T> > ></code> or directly
- <code>save_state_nothing<rounded_arith_exact<T>
- ></code>;</li>
- <li>if it is a more complex case than the previous ones, the best thing
- is probably to directly define your own policy.</li>
- </ul>
- <hr>
- <p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src=
- "../../../../doc/images/valid-html401.png" alt="Valid HTML 4.01 Transitional"
- height="31" width="88"></a></p>
- <p>Revised
- <!--webbot bot="Timestamp" s-type="EDITED" s-format="%Y-%m-%d" startspan -->2006-12-24<!--webbot bot="Timestamp" endspan i-checksum="12172" --></p>
- <p><i>Copyright © 2002 Guillaume Melquiond, Sylvain Pion, Hervé
- Brönnimann, Polytechnic University<br>
- Copyright © 2004-2005 Guillaume Melquiond, ENS Lyon</i></p>
- <p><i>Distributed under the Boost Software License, Version 1.0. (See
- accompanying file <a href="../../../../LICENSE_1_0.txt">LICENSE_1_0.txt</a>
- or copy at <a href=
- "http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>
- </body>
- </html>
|