123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149 |
- [/
- Copyright (c) 2019 Nick Thompson
- Use, modification and distribution are subject to the
- Boost Software License, Version 1.0. (See accompanying file
- LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- ]
- [section:anderson_darling The Anderson-Darling Test]
- [heading Synopsis]
- ```
- #include <boost/math/statistics/anderson_darling.hpp>
- namespace boost{ namespace math { namespace { statistics {
- template<class RandomAccessContainer>
- auto anderson_darling_normality_statistic(RandomAccessContainer const & v,
- typename RandomAccessContainer::value_type mu = std::numeric_limits<typename RandomAccessContainer::value_type>::quiet_NaN(),
- typename RandomAccessContainer::value_type sd = std::numeric_limits<typename RandomAccessContainer::value_type>::quiet_NaN());
- }}}
- ```
- [heading Background]
- The Anderson-Darling test for normality asks if a given sequence of numbers are drawn from a normal distribution by computing an integral over the empirical cumulative distribution function.
- The test statistic /A/[super 2] is given by
- [$../graphs/anderson_darling_definition.svg]
- where /F/[sub /n/] is the empirical cumulative distribution and /F/ is the CDF of the normal distribution.
- The value returned by the routine is /A/[super 2].
- If /A/[super 2]\/n converges to zero as /n/ goes to infinity, then the hypothesis that the data is normally distributed is supported by the test.
- If /A/[super 2]\/n converges to a finite positive value as /n/ goes to infinity, then the hypothesis is not supported by the test.
- An example usage is demonstrated below:
- ```
- #include <vector>
- #include <random>
- #include <iostream>
- #include <boost/math/statistics/anderson_darling.hpp>
- using boost::math::statistics::anderson_darling_normality_statistic;
- std::random_device rd;
- std::normal_distribution<double> dis(0, 1);
- std::vector<double> v(8192);
- for (auto & x : v) { x = dis(rd); }
- std::sort(v.begin(), v.end());
- double presumed_mean = 0;
- double presumed_standard_deviation = 0;
- double Asq = anderson_darling_normality_statistic(v, presumed_mean, presumed_standard_deviation);
- std::cout << "A^2/n = " << Asq/v.size() << "\n";
- 5.39e-05 // should be small . . .
- // Now use an incorrect hypothesis:
- presumed_mean = 4;
- Asq = anderson_darling_normality_statistic(v, presumed_mean, presumed_standard_deviation);
- std::cout << "A^2/n = " << Asq/v.size() << "\n";
- 7.41 // should be somewhat large . . .
- ```
- The Anderson-Darling normality requires sorted data.
- If the data are not sorted an exception is thrown.
- If you simply wish to know whether or not data is normally distributed, and not whether it is normally distributed with a presumed mean and variance,
- then you can call the function without the final two arguments, and the mean and variance will be estimated from the data themselves:
- ```
- double Asq = anderson_darling_normality_statistic(v);
- ```
- The following graph demonstrates the convergence of the test statistic.
- Each data point represents a vector of length /n/ which is filled with normally distributed data.
- The test statistic is computed over this vector, divided by /n/, and passed to the natural logarithm.
- This exhibits the (admittedly slow) convergence of the integral to zero when the hypothesis is true.
- [$../graphs/anderson_darling_simulation.svg]
- [heading Performance]
- ```
- ---------------------------------------------------------------
- Benchmark Time
- ---------------------------------------------------------------
- AndersonDarlingNormalityTest<float>/8 224 ns bytes_per_second=136.509M/s
- AndersonDarlingNormalityTest<float>/16 435 ns bytes_per_second=140.254M/s
- AndersonDarlingNormalityTest<float>/32 898 ns bytes_per_second=135.995M/s
- AndersonDarlingNormalityTest<float>/64 1773 ns bytes_per_second=137.675M/s
- AndersonDarlingNormalityTest<float>/128 3455 ns bytes_per_second=141.338M/s
- AndersonDarlingNormalityTest<float>/256 7001 ns bytes_per_second=139.488M/s
- AndersonDarlingNormalityTest<float>/512 13996 ns bytes_per_second=139.551M/s
- AndersonDarlingNormalityTest<float>/1024 28129 ns bytes_per_second=138.868M/s
- AndersonDarlingNormalityTest<float>/2048 55723 ns bytes_per_second=140.206M/s
- AndersonDarlingNormalityTest<float>/4096 112008 ns bytes_per_second=139.501M/s
- AndersonDarlingNormalityTest<float>/8192 224643 ns bytes_per_second=139.11M/s
- AndersonDarlingNormalityTest<float>/16384 450320 ns bytes_per_second=138.791M/s
- AndersonDarlingNormalityTest<float>/32768 896409 ns bytes_per_second=139.45M/s
- AndersonDarlingNormalityTest<float>/65536 1797800 ns bytes_per_second=139.058M/s
- AndersonDarlingNormalityTest<float>/131072 3604995 ns bytes_per_second=138.698M/s
- AndersonDarlingNormalityTest<float>/262144 7235625 ns bytes_per_second=138.207M/s
- AndersonDarlingNormalityTest<float>/524288 14502815 ns bytes_per_second=137.904M/s
- AndersonDarlingNormalityTest<float>/1048576 29058087 ns bytes_per_second=137.659M/s
- AndersonDarlingNormalityTest<float>/2097152 58470439 ns bytes_per_second=136.824M/s
- AndersonDarlingNormalityTest<float>/4194304 117476365 ns bytes_per_second=136.201M/s
- AndersonDarlingNormalityTest<float>/8388608 239887895 ns bytes_per_second=133.397M/s
- AndersonDarlingNormalityTest<float>/16777216 488787211 ns bytes_per_second=130.94M/s
- AndersonDarlingNormalityTest<float>_BigO 28.96 N 28.96 N
- AndersonDarlingNormalityTest<double>/8 470 ns bytes_per_second=129.733M/s
- AndersonDarlingNormalityTest<double>/16 911 ns bytes_per_second=133.989M/s
- AndersonDarlingNormalityTest<double>/32 1773 ns bytes_per_second=137.723M/s
- AndersonDarlingNormalityTest<double>/64 3368 ns bytes_per_second=144.966M/s
- AndersonDarlingNormalityTest<double>/128 6627 ns bytes_per_second=147.357M/s
- AndersonDarlingNormalityTest<double>/256 12458 ns bytes_per_second=156.777M/s
- AndersonDarlingNormalityTest<double>/512 23060 ns bytes_per_second=169.395M/s
- AndersonDarlingNormalityTest<double>/1024 44529 ns bytes_per_second=175.45M/s
- AndersonDarlingNormalityTest<double>/2048 88735 ns bytes_per_second=176.087M/s
- AndersonDarlingNormalityTest<double>/4096 175583 ns bytes_per_second=177.978M/s
- AndersonDarlingNormalityTest<double>/8192 348042 ns bytes_per_second=179.577M/s
- AndersonDarlingNormalityTest<double>/16384 701439 ns bytes_per_second=178.206M/s
- AndersonDarlingNormalityTest<double>/32768 1394597 ns bytes_per_second=179.262M/s
- AndersonDarlingNormalityTest<double>/65536 2777943 ns bytes_per_second=179.994M/s
- AndersonDarlingNormalityTest<double>/131072 5571455 ns bytes_per_second=179.487M/s
- AndersonDarlingNormalityTest<double>/262144 11161456 ns bytes_per_second=179.193M/s
- AndersonDarlingNormalityTest<double>/524288 22048950 ns bytes_per_second=181.417M/s
- AndersonDarlingNormalityTest<double>/1048576 44094409 ns bytes_per_second=181.429M/s
- AndersonDarlingNormalityTest<double>/2097152 88300185 ns bytes_per_second=181.199M/s
- AndersonDarlingNormalityTest<double>/4194304 176140378 ns bytes_per_second=181.678M/s
- AndersonDarlingNormalityTest<double>/8388608 352102955 ns bytes_per_second=181.769M/s
- AndersonDarlingNormalityTest<double>/16777216 706160246 ns bytes_per_second=181.267M/s
- AndersonDarlingNormalityTest<double>_BigO 42.06 N
- ```
- [heading Caveats]
- Some authors, including [@https://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm NIST], give the following definition of the Anderson-Darling test statistic:
- [$../graphs/alternative_anderson_darling_definition.svg]
- This is an approximation to the quadrature sum we use as our definition.
- Boost.Math /does not compute this quantity/.
- (However, with a sufficiently large amount of data the two definitions seem to agree to two digits, so the importance of making a clear distinction between the two is unclear.)
- Our computation of the Anderson-Darling test statistic agrees with Mathematica.
- [endsect]
- [/section:anderson_darling]
|