- [template perf[name value] [value]]
- [template para[text] '''<para>'''[text]'''</para>''']
- [mathpart perf Performance]
- [section:perf_over2 Performance Overview]
- [performance_overview]
- [endsect]
- [section:interp Interpreting these Results]
- In all of the following tables, the best performing
- result in each row is assigned a relative value of "1" and shown
- in bold, so a score of "2" means ['"twice as slow as the best
- performing result"]. Actual timings in nanoseconds per function call
- are also shown in parentheses. To make the results easier to read, they
- are color-coded as follows: the best result, and everything within 20% of
- it, is shown in green; anything more than twice as slow as the best result is red;
- and results in between are blue.
- Results were obtained on a system
- with an Intel Core i7 4710MQ with 16GB of RAM, running
- either Windows 8.1 or Xubuntu Linux.
- [caution As usual with performance results, these should be taken with a large pinch
- of salt: relative performance is known to shift quite a bit depending
- upon the architecture of the particular test system used. Furthermore,
- our performance results were obtained using our own test data:
- these test values are designed to provide good coverage of our code and to exercise
- all the appropriate corner cases. They do not necessarily represent
- "typical" usage: whatever that may be!
- ]
- [endsect] [/section:interp Interpreting these Results]
- [section:getting_best Getting the Best Performance from this Library: Compiler and Compiler Options]
- By far the most important thing you can do when using this library
- is to turn on your compiler's optimisation options. As the following
- table shows, the penalty for using the library in debug mode can be
- quite large. In addition, switching to 64-bit code gives a small but noticeable
- improvement in performance, as does switching to a different compiler
- (Intel C++ 15 in this example).
- [table_Compiler_Option_Comparison_on_Windows_x64]
- [endsect] [/section:getting_best Getting the Best Performance from this Library: Compiler and Compiler Options]
- [section:tradoffs Trading Accuracy for Performance]
- There are a number of [link policy Policies] that can be used to trade accuracy for performance:
- * Internal promotion: by default, functions with `float` arguments are evaluated at `double` precision
- internally to ensure full precision in the result. Similarly, `double` precision functions are
- evaluated at `long double` precision internally by default. Changing these defaults can give a significant
- speed advantage at the expense of accuracy. Note also that evaluating using `float` internally may result in
- numerical instability for some of the more complex algorithms, so we suggest you use this option with care.
- * Target accuracy: just because you choose to evaluate at `double` precision doesn't mean you necessarily want
- to target full 16-digit accuracy. If you wish, you can change the default (full machine precision) to whatever
- is "good enough" for your particular use case.
- For example, suppose you want to evaluate `double` precision functions at `double` precision internally, you
- can change the global default by passing `-DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false` on the command line, or
- at the point of call via something like this:
- double val = boost::math::erf(my_argument, boost::math::policies::make_policy(boost::math::policies::promote_double<false>()));
- However, an easier option might be:
- #include <boost/math/special_functions.hpp> // Or any individual special function header
- namespace math{
- namespace precise{
- //
- // Define a Policy for accurate evaluation - this is the same as the default, unless
- // someone has changed the global defaults.
- //
- typedef boost::math::policies::policy<> accurate_policy;
- //
- // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS to declare
- // functions that use the above policy. Note no trailing
- // ";" required on the macro call:
- //
- BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(accurate_policy)
- }
- namespace fast{
- //
- // Define a Policy for fast evaluation:
- //
- using namespace boost::math::policies;
- typedef policy<promote_double<false> > fast_policy;
- //
- // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS:
- //
- BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(fast_policy)
- }
- }
- And now one can call:
- math::precise::tgamma(x);
- for the accurate version of tgamma, and:
- math::fast::tgamma(x);
- for the faster version.
- Had we wished to change the target precision (to 9 decimal digits) as well as the evaluation type used, we might have done:
- namespace math{
- namespace fast{
- //
- // Define a Policy for fast evaluation:
- //
- using namespace boost::math::policies;
- typedef policy<promote_double<false>, digits10<9> > fast_policy;
- //
- // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS:
- //
- BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(fast_policy)
- }
- }
- One can do a similar thing with the distribution classes:
- #include <boost/math/distributions.hpp> // or any individual distribution header
- namespace math{ namespace fast{
- //
- // Define a policy for fastest possible evaluation:
- //
- using namespace boost::math::policies;
- typedef policy<promote_float<false> > fast_float_policy;
- //
- // Invoke BOOST_MATH_DECLARE_DISTRIBUTIONS
- //
- BOOST_MATH_DECLARE_DISTRIBUTIONS(float, fast_float_policy)
- }} // namespaces
- //
- // And use:
- //
- float p_val = cdf(math::fast::normal(1.0f, 3.0f), 0.25f);
- Here's how these options change the relative performance of the distributions on Linux:
- [table_Distribution_performance_comparison_for_different_performance_options_with_GNU_C_version_5_1_0_on_linux]
- [endsect] [/section:tradoffs Trading Accuracy for Performance]
- [section:multiprecision Cost of High-Precision Non-built-in Floating-point]
- Using user-defined floating-point like __multiprecision has a very high run-time cost.
- To give some flavour of this:
- [table:linpack_time Linpack Benchmark
- [[Floating-point type] [Speed (MFLOPS)]]
- [[double] [2727]]
- [[__float128] [35]]
- [[multiprecision::float128] [35]]
- [[multiprecision::cpp_bin_float_quad] [6]]
- ]
- [endsect] [/section:multiprecision Cost of High-Precision Non-built-in Floating-point]
- [section:tuning Performance Tuning Macros]
- There are a small number of performance tuning options
- that are determined by configuration macros. These should be set
- in boost/math/tools/user.hpp, or else reported to the Boost development
- mailing list so that the appropriate option for a given compiler and
- OS platform can be set automatically in our configuration setup.
- [table
- [[Macro][Meaning]]
- [[BOOST_MATH_POLY_METHOD]
- [Determines how polynomials and most rational functions
- are evaluated. Define to one
- of the values 0, 1, 2 or 3: see below for the meaning of these values.]]
- [[BOOST_MATH_RATIONAL_METHOD]
- [Determines how symmetrical rational functions are evaluated: mostly
- this only affects how the Lanczos approximation is evaluated, and how
- the `evaluate_rational` function behaves. Define to one
- of the values 0, 1, 2 or 3: see below for the meaning of these values.
- ]]
- [[BOOST_MATH_MAX_POLY_ORDER]
- [The maximum order of polynomial or rational function that will
- be evaluated by a method other than 0 (a simple "for" loop).
- ]]
- [[BOOST_MATH_INT_TABLE_TYPE(RT, IT)]
- [Many of the coefficients to the polynomials and rational functions
- used by this library are integers. Normally these are stored as tables
- as integers, but if mixed integer / floating point arithmetic is much
- slower than regular floating point arithmetic then they can be stored
- as tables of floating point values instead. If mixed arithmetic is slow
- then add:
- #define BOOST_MATH_INT_TABLE_TYPE(RT, IT) RT
- to boost/math/tools/user.hpp; otherwise the default of:
- #define BOOST_MATH_INT_TABLE_TYPE(RT, IT) IT
- as set in boost/math/config.hpp, is fine, and may well result in smaller
- code.
- ]]
- ]
- The values to which `BOOST_MATH_POLY_METHOD` and `BOOST_MATH_RATIONAL_METHOD`
- may be set are as follows:
- [table
- [[Value][Effect]]
- [[0][The polynomial or rational function is evaluated using Horner's
- method, and a simple for-loop.
- Note that if the order of the polynomial
- or rational function is a runtime parameter, or the order is
- greater than the value of `BOOST_MATH_MAX_POLY_ORDER`, then
- this method is always used, irrespective of the value
- of `BOOST_MATH_POLY_METHOD` or `BOOST_MATH_RATIONAL_METHOD`.]]
- [[1][The polynomial or rational function is evaluated without
- the use of a loop, and using Horner's method. This only occurs
- if the order of the polynomial is known at compile time and is less
- than or equal to `BOOST_MATH_MAX_POLY_ORDER`. ]]
- [[2][The polynomial or rational function is evaluated without
- the use of a loop, and using a second order Horner's method.
- In theory this permits two operations to occur in parallel
- for polynomials, and four in parallel for rational functions.
- This only occurs
- if the order of the polynomial is known at compile time and is less
- than or equal to `BOOST_MATH_MAX_POLY_ORDER`.]]
- [[3][The polynomial or rational function is evaluated without
- the use of a loop, and using a second order Horner's method.
- In theory this permits two operations to occur in parallel
- for polynomials, and four in parallel for rational functions.
- This differs from method "2" in that the code is carefully ordered
- to make the parallelisation more obvious to the compiler: rather than
- relying on the compiler's optimiser to spot the parallelisation
- opportunities.
- This only occurs
- if the order of the polynomial is known at compile time and is less
- than or equal to `BOOST_MATH_MAX_POLY_ORDER`.]]
- ]
- The performance test suite generates a report for your particular compiler showing which method is likely to work best.
- The following tables show the results for MSVC-14.0 and GCC-5.1.0 (Linux). There's not much to choose between
- the various methods, but generally the loop-unrolled methods perform better. Interestingly, ordering the code
- to try to "second guess" possible optimisations seems not to be such a good idea (method 3 below).
- [table_Polynomial_Method_Comparison_with_Microsoft_Visual_C_version_14_0_on_Windows_x64]
- [table_Rational_Method_Comparison_with_Microsoft_Visual_C_version_14_0_on_Windows_x64]
- [table_Polynomial_Method_Comparison_with_GNU_C_version_5_1_0_on_linux]
- [table_Rational_Method_Comparison_with_GNU_C_version_5_1_0_on_linux]
- [endsect] [/section:tuning Performance Tuning Macros]
- [section:comp_compilers Comparing Different Compilers]
- By running our performance test suite multiple times, we can compare the effect of different compilers: as
- might be expected, the differences are generally small compared to, say, disabling internal use of `long double`.
- However, there are still gains to be made, particularly from some of the commercial offerings:
- [table_Compiler_Comparison_on_Windows_x64]
- [table_Compiler_Comparison_on_linux]
- [endsect] [/section:comp_compilers Comparing Different Compilers]
- [section:comparisons Comparisons to Other Open Source Libraries]
- We've run our performance tests both for our own code, and against other
- open source implementations of the same functions. The results are
- presented below to give you a rough idea of how they all compare.
- In order to give a more-or-less level playing field, our test data
- was screened against all the libraries being tested, and any
- unsupported domains removed; likewise for any test cases that gave large errors
- or unexpected non-finite values.
- [caution
- You should exercise extreme caution when interpreting
- these results: relative performance may vary by platform and by compiler option settings,
- and the tests use data that gives good code coverage of /our/ code, but which may skew the
- results towards the corner cases. Finally, remember that different
- libraries make different choices with regard to performance versus
- numerical stability.
- ]
- The first results compare standard library functions to Boost equivalents with MSVC-14.0:
- [table_Library_Comparison_with_Microsoft_Visual_C_version_14_0_on_Windows_x64]
- On Linux with GCC, we can also compare to the TR1 functions, and to GSL and RMath:
- [table_Library_Comparison_with_GNU_C_version_5_3_0_on_linux]
- And finally we can compare the statistical distributions to GSL, RMath and DCDFLIB:
- [table_Distribution_performance_comparison_with_GNU_C_version_5_3_0_on_linux]
- [endsect] [/section:comparisons Comparisons to Other Open Source Libraries]
- [section:perf_test_app The Performance Test Applications]
- Under ['boost-path]\/libs\/math\/reporting\/performance you will find
- some reasonably comprehensive performance test applications for this library.
- In order to generate the tables you will have seen in this documentation (or others
- for your specific compiler) you need to invoke `bjam` in this directory, using a C++11
- capable compiler. Note that
- the results extend/overwrite whatever is already present in
- ['boost-path]\/libs\/math\/reporting\/performance\/doc\/performance_tables.qbk;
- you may want to delete this file before you begin so as to make a fresh start for
- your particular system.
- The programs produce results in Boost's Quickbook format which is not terribly
- human readable. If you configure your user-config.jam to be able to build Docbook
- documentation, then you will also get a full summary of all the data in HTML format
- in ['boost-path]\/libs\/math\/reporting\/performance\/html\/index.html. Assuming
- you're on a 'nix-like platform, the procedure to do this is to first install the
- `xsltproc`, `Docbook DTD`, and `Docbook XSL` packages. Then:
- * Copy ['boost-path]\/tools\/build\/example\/user-config.jam to your home directory.
- * Add `using xsltproc ;` to the end of the file (note the space surrounding each token, including the final ";", this is important!)
- This assumes that `xsltproc` is in your path.
- * Add `using boostbook : path-to-xsl-stylesheets : path-to-dtd ;` to the end of the file. The `path-to-dtd` should point
- to version 4.2.x of the Docbook DTD, while `path-to-xsl-stylesheets` should point to the folder containing the latest XSLT stylesheets.
- Both paths should use all forward slashes even on Windows.
- At this point you should be able to run the tests and generate the HTML summary; if GSL, RMath or libstdc++ are
- present in the compiler's path they will be tested automatically. For DCDFLIB you will need to place the C
- source in ['boost-path]\/libs\/math\/reporting\/performance\/third_party\/dcdflib.
- If you want to compare multiple compilers, or multiple options for one compiler, then you will
- need to invoke `bjam` multiple times, once for each compiler. Note that in order to test
- multiple configurations of the same compiler, each has to be given a unique name in the test
- program, otherwise they all edit the same table cells. Suppose you want to test GCC with
- and without the -ffast-math option, in this case bjam would be invoked first as:
- bjam toolset=gcc -a cxxflags=-std=gnu++11
- This runs the tests using the default optimisation options (-O3); we can then run again
- using -ffast-math:
- bjam toolset=gcc -a cxxflags='-std=gnu++11 -ffast-math' define=COMPILER_NAME='"GCC with -ffast-math"'
- In the command line above, the -a flag forces a full rebuild, and the preprocessor define COMPILER_NAME needs to be set
- to a string literal describing the compiler configuration, hence the double quotes: one set for the command line, one for the
- compiler.
- [endsect] [/section:perf_test_app The Performance Test Applications]
- [endmathpart] [/mathpart perf Performance]
- [/
- Copyright 2006 John Maddock and Paul A. Bristow.
- Distributed under the Boost Software License, Version 1.0.
- (See accompanying file LICENSE_1_0.txt or copy at
- http://www.boost.org/LICENSE_1_0.txt).
- ]