123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120 |
- [sect:optim Optimisation Examples]
- [h4 Poisson Distribution - Optimization and Accuracy is quite complicated.
- The general formula for calculating the CDF uses the incomplete gamma thus:
- return gamma_q(k+1, mean);
- But the case of small integral k is *very* common, so it is worth considering optimisation.
- The first obvious step is to use a finite sum of each PDF (Probability *density* function)
- for each value of k to build up the CDF (*cumulative* distribution function).
- This could be done using the PDF function for the distribution,
- for which there are two equivalent formulae:
- return exp(-mean + log(mean) * k - lgamma(k+1));
-
- return gamma_p_derivative(k+1, mean);
- The pdf would probably be more accurate using `gamma_p_derivative`.
- The reason is that the expression:
- -mean + log(mean) * k - lgamma(k+1)
- Will produce a value much smaller than the largest of the terms, so you get
- cancellation error: and then when you pass the result to `exp()` which
- converts the absolute error in its argument to a relative error in the
- result (explanation available if required), you effectively amplify the
- error further still.
- `gamma_p_derivative` is just a thin wrapper around some of the internals of
- the incomplete gamma, it does its utmost to avoid issues like this, because
- this function is responsible for virtually all of the error in the result.
- Hopefully further advances in the future might improve things even further
- (what is really needed is an 'accurate' `pow(1+x)` function, but that's a whole
- other story!).
- But calling `pdf` function makes repeated, redundant, checks on the value of `mean` and `k`,
- result += pdf(dist, i);
-
- so it may be faster to substitute the formula for the pdf in a summation loop
- result += exp(-mean) * pow(mean, i) / unchecked_factorial(i);
- (simplified by removing casting from RealType).
- Of course, mean is unchanged during this summation,
- so exp(mean) should only be calculated once, outside the loop.
- Optimising compilers 'might' do this, but one can easily ensure this.
- Obviously too, k must be small enough that unchecked_factorial is OK:
- 34 is an obvious choice as the limit for 32-bit float.
- For larger k, the number of iterations is like to be uneconomic.
- Only experiment can determine the optimum value of k
- for any particular RealType (float, double...)
- But also note that
- The incomplete gamma already optimises this case
- (argument "a" is a small int),
- although only when the result q (1-p) would be < 0.5.
- And moreover, in the above series, each term can be calculated
- from the previous one much more efficiently:
- cdf = sum from 0 to k of C[k]
- with:
- C[0] = exp(-mean)
- C[N+1] = C[N] * mean / (N+1)
- So hopefully that's:
- {
- RealType result = exp(-mean);
- RealType term = result;
- for(int i = 1; i <= k; ++i)
- { // cdf is sum of pdfs.
- term *= mean / i;
- result += term;
- }
- return result;
- }
- This is exactly the same finite sum as used by `gamma_p/gamma_q` internally.
- As explained previously it's only used when the result
- p > 0.5 or 1-p = q < 0.5.
- The slight danger when using the sum directly like this, is that if
- the mean is small and k is large then you're calculating a value ~1, so
- conceivably you might overshoot slightly. For this and other reasons in the
- case when p < 0.5 and q > 0.5 `gamma_p/gamma_q` use a different (infinite but
- rapidly converging) sum, so that danger isn't present since you always
- calculate the smaller of p and q.
- So... it's tempting to suggest that you just call `gamma_p/gamma_q` as
- required. However, there is a slight benefit for the k = 0 case because you
- avoid all the internal logic inside `gamma_p/gamma_q` trying to figure out
- which method to use etc.
- For the incomplete beta function, there are no simple finite sums
- available (that I know of yet anyway), so when there's a choice between a
- finite sum of the PDF and an incomplete beta call, the finite sum may indeed
- win out in that case.
- [endsect] [/sect:optim Optimisation Examples]
- [/
- Copyright 2006 John Maddock and Paul A. Bristow.
- Distributed under the Boost Software License, Version 1.0.
- (See accompanying file LICENSE_1_0.txt or copy at
- http://www.boost.org/LICENSE_1_0.txt).
- ]
|