123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163 |
- [/
- Copyright (c) 2019 Nick Thompson
- Use, modification and distribution are subject to the
- Boost Software License, Version 1.0. (See accompanying file
- LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- ]
- [section:diff Lanczos Smoothing Derivatives]
- [heading Synopsis]
- ``
- #include <boost/math/differentiation/lanczos_smoothing.hpp>
- namespace boost::math::differentiation {
- template <class Real, size_t order=1>
- class discrete_lanczos_derivative {
- public:
- discrete_lanczos_derivative(Real spacing,
- size_t n = 18,
- size_t approximation_order = 3);
- template<class RandomAccessContainer>
- Real operator()(RandomAccessContainer const & v, size_t i) const;
- template<class RandomAccessContainer>
- void operator()(RandomAccessContainer const & v, RandomAccessContainer & dvdt) const;
- template<class RandomAccessContainer>
- RandomAccessContainer operator()(RandomAccessContainer const & v) const;
- Real get_spacing() const;
- };
- } // namespaces
- ``
- [heading Description]
- The `discrete_lanczos_derivative` class calculates a finite-difference approximation to the derivative of a noisy sequence of equally-spaced values /v/.
- A basic usage is
- std::vector<double> v(500);
- // fill v with noisy data.
- double spacing = 0.001;
- using boost::math::differentiation::discrete_lanczos_derivative;
- auto lanczos = discrete_lanczos_derivative(spacing);
- // Compute dvdt at index 30:
- double dvdt30 = lanczos(v, 30);
- // Compute derivative of entire vector:
- std::vector<double> dvdt = lanczos(v);
- Noise-suppressing second derivatives can also be computed.
- The syntax is as follows:
- std::vector<double> v(500);
- // fill v with noisy data.
- auto lanczos = lanczos_derivative<double, 2>(spacing);
- // evaluate second derivative at a point:
- double d2vdt2 = lanczos(v, 18);
- // evaluate second derivative of entire vector:
- std::vector<double> d2vdt2 = lanczos(v);
- For memory conscious programmers, you can reuse the memory space for the derivative as follows:
- std::vector<double> v(500);
- std::vector<double> dvdt(500);
- // . . . define spacing, create and instance of discrete_lanczos_derivative
- // populate dvdt, perhaps in a loop:
- lanczos(v, dvdt);
- If the data has variance \u03C3[super 2],
- then the variance of the computed derivative is roughly \u03C3[super 2]/p/[super 3] /n/[super -3] \u0394 /t/[super -2],
- i.e., it increases cubically with the approximation order /p/, linearly with the data variance,
- and decreases at the cube of the filter length /n/.
- In addition, we must not forget the discretization error which is /O/(\u0394 /t/[super /p/]).
- You can play around with the approximation order /p/ and the filter length /n/:
- size_t n = 12;
- size_t p = 2;
- auto lanczos = lanczos_derivative(spacing, n, p);
- double dvdt = lanczos(v, i);
- If /p=2n/, then the discrete Lanczos derivative is not smoothing:
- It reduces to the standard /2n+1/-point finite-difference formula.
- For /p>2n/, an assertion is hit as the filter is undefined.
- In our tests with AWGN, we have found the error decreases monotonically with /n/,
- as is expected from the theory discussed above.
- So the choice of /n/ is simple:
- As high as possible given your speed requirements (larger /n/ implies a longer filter and hence more compute),
- balanced against the danger of overfitting and averaging over non-stationarity.
- The choice of approximation order /p/ for a given /n/ is more difficult.
- If your signal is believed to be a polynomial,
- it does not make sense to set /p/ to larger than the polynomial degree-
- though it may be sensible to take /p/ less than this.
- For a sinusoidal signal contaminated with AWGN, we ran a few tests showing that for SNR = 1,
- p = n/8 gave the best results,
- for SNR = 10, p = n/7 was the best, and for SNR = 100, p = n/6 was the most reasonable choice.
- For SNR = 0.1, the method appears to be useless.
- The user is urged to use these results with caution: they have no theoretical backing and are extrapolated from a single case.
- The filters are (regrettably) computed at runtime-the vast number of combinations of approximation order and filter length makes the number of filters that must be stored excessive for compile-time data.
- The constructor call computes the filters.
- Since each filter has length /2n+1/ and there are /n/ filters, whose element each consist of /p/ summands,
- the complexity of the constructor call is O(/n/[super 2]/p/).
- This is not cheap-though for most cases small /p/ and /n/ not too large (< 20) is desired.
- However, for concreteness, on the author's 2.7GHz Intel laptop CPU, the /n=16/, /p=3/ filter takes 9 microseconds to compute.
- This is far from negligible, and as such the filters may be used with multiple data, and even shared between threads:
- std::vector<double> v1(500);
- std::vector<double> v2(500);
- // fill v1, v2 with noisy data.
- auto lanczos = lanczos_derivative(spacing);
- std::vector<double> dv1dt = lanczos(v1); // threadsafe
- std::vector<double> dv2dt = lanczos(v2); // threadsafe
- // need to use a different spacing?
- lanczos.reset_spacing(0.02); // not threadsafe
- The implementation follows [@https://doi.org/10.1080/00207160.2012.666348 McDevitt, 2012],
- who vastly expanded the ideas of Lanczos to create a very general framework for numerically differentiating noisy equispaced data.
- [heading Example]
- We have extracted some data from the [@https://www.gw-openscience.org/data/ LIGO signal] and differentiated it
- using the (/n/, /p/) = (60, 4) Lanczos smoothing derivative, as well as using the (/n/, /p/) = (4, 8) (nonsmoothing) derivative.
- [graph ligo_derivative]
- The original data is in orange, the smoothing derivative in blue, and the non-smoothing standard finite difference formula is in gray.
- (Each time series has been rescaled to fit in the same graph.)
- We can see that the smoothing derivative tracks the increase and decrease in the trend well, whereas the standard finite difference formula produces nonsense and amplifies noise.
- [heading Caveats]
- The computation of the filters is ill-conditioned for large /p/.
- In order to mitigate this problem, we have computed the filters in higher precision and cast the results to the desired type.
- However, this simply pushes the problem to larger /p/.
- In practice, this is not a problem, as large /p/ corresponds to less powerful denoising, but keep it in mind.
- In addition, the `-ffast-math` flag has a very large effect on the speed of these functions.
- In our benchmarks, they were 50% faster with the flag enabled, which is much larger than the usual performance increases we see by turning on this flag.
- Hence, if the default performance is not sufficient, this flag is available, though it comes with its own problems.
- This requires C++17 `if constexpr`.
- [heading References]
- * Corless, Robert M., and Nicolas Fillion. ['A graduate introduction to numerical methods.] AMC 10 (2013): 12.
- * Lanczos, Cornelius. ['Applied analysis.] Courier Corporation, 1988.
- * Timothy J. McDevitt (2012): ['Discrete Lanczos derivatives of noisy data], International Journal of Computer Mathematics, 89:7, 916-931
- [endsect]
|