123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208 |
- [/
- Copyright 2018 Nick Thompson
- Distributed under the Boost Software License, Version 1.0.
- (See accompanying file LICENSE_1_0.txt or copy at
- http://www.boost.org/LICENSE_1_0.txt).
- ]
- [section:signal_statistics Signal Statistics]
- [heading Synopsis]
- ``
- #include <boost/math/statistics/signal_statistics.hpp>
- namespace boost::math::statistics {
- template<class Container>
- auto absolute_gini_coefficient(Container & c);
- template<class ForwardIterator>
- auto absolute_gini_coefficient(ForwardIterator first, ForwardIterator last);
- template<class Container>
- auto sample_absolute_gini_coefficient(Container & c);
- template<class ForwardIterator>
- auto sample_absolute_gini_coefficient(ForwardIterator first, ForwardIterator last);
- template<class Container>
- auto hoyer_sparsity(Container const & c);
- template<class ForwardIterator>
- auto hoyer_sparsity(ForwardIterator first, ForwardIterator last);
- template<class Container>
- auto oracle_snr(Container const & signal, Container const & noisy_signal);
- template<class Container>
- auto oracle_snr_db(Container const & signal, Container const & noisy_signal);
- template<class ForwardIterator>
- auto m2m4_snr_estimator(ForwardIterator first, ForwardIterator last, decltype(*first) estimated_signal_kurtosis=1, decltype(*first) estimated_noise_kurtosis=3);
- template<class Container>
- auto m2m4_snr_estimator(Container const & noisy_signal, typename Container::value_type estimated_signal_kurtosis=1, typename Container::value_type estimate_noise_kurtosis=3);
- template<class ForwardIterator>
- auto m2m4_snr_estimator_db(ForwardIterator first, ForwardIterator last, decltype(*first) estimated_signal_kurtosis=1, decltype(*first) estimated_noise_kurtosis=3);
- template<class Container>
- auto m2m4_snr_estimator_db(Container const & noisy_signal,typename Container::value_type estimated_signal_kurtosis=1, typename Container::value_type estimate_noise_kurtosis=3);
- }
- ``
- [heading Description]
- The file `boost/math/statistics/signal_statistics.hpp` is a set of facilities for computing quantities commonly used in signal analysis.
- Our examples use `std::vector<double>` to hold the data, but this not required.
- In general, you can store your data in an Eigen array, and Armadillo vector, `std::array`, and for many of the routines, a `std::forward_list`.
- These routines are usable in float, double, long double, and Boost.Multiprecision precision, as well as their complex extensions whenever the computation is well-defined.
- [heading Absolute Gini Coefficient]
- The Gini coefficient, first used to measure wealth inequality, is also one of the best measures of the sparsity of an expansion in a basis.
- A sparse expansion has most of its norm concentrated in just a few coefficients, making the connection with wealth inequality obvious.
- See [@https://arxiv.org/pdf/0811.4706.pdf Hurley and Rickard] for details.
- However, for measuring sparsity, the phase of the numbers is irrelevant, so we provide the `absolute_gini_coefficient`:
- using boost::math::statistics::sample_absolute_gini_coefficient;
- using boost::math::statistics::absolute_gini_coefficient;
- std::vector<std::complex<double>> v{{0,1}, {0,0}, {0,0}, {0,0}};
- double abs_gini = sample_absolute_gini_coefficient(v);
- // now abs_gini = 1; maximally unequal
- std::vector<std::complex<double>> w{{0,1}, {1,0}, {0,-1}, {-1,0}};
- abs_gini = absolute_gini_coefficient(w);
- // now abs_gini = 0; every element of the vector has equal magnitude
- std::vector<double> u{-1, 1, -1};
- abs_gini = absolute_gini_coefficient(u);
- // now abs_gini = 0
- // Alternative call useful for computing over subset of the input:
- abs_gini = absolute_gini_coefficient(u.begin(), u.begin() + 1);
- The sample Gini coefficient returns unity for a vector which has only one nonzero coefficient.
- The population Gini coefficient of a vector with one non-zero element is dependent on the length of the input.
- The sample Gini coefficient lacks one desirable property of the population Gini coefficient,
- namely that "cloning" a vector has the same Gini coefficient; though cloning holds to very high accuracy with the sample Gini coefficient and can easily be recovered by a rescaling.
- If sorting the input data is too much expense for a sparsity measure (is it going to be perfect anyway?),
- consider calculating the Hoyer sparsity instead.
- [heading Hoyer Sparsity]
- The Hoyer sparsity measures a normalized ratio of the \u2113[super 1] and \u2113[super 2] norms.
- As the name suggests, it is used to measure the sparsity of an expansion in some basis.
- The Hoyer sparsity computes ([radic]/N/ - \u2113[super 1](v)/\u2113[super 2](v))/([radic]N -1).
- For details, see [@http://www.jmlr.org/papers/volume5/hoyer04a/hoyer04a.pdf Hoyer] as well as [@https://arxiv.org/pdf/0811.4706.pdf Hurley and Rickard].
- A few special cases will serve to clarify the intended use:
- If /v/ has only one nonzero coefficient, the Hoyer sparsity attains its maxima of 1.
- If the coefficients of /v/ all have the same magnitude, then the Hoyer sparsity attains its minima of zero.
- If the elements of /v/ are uniformly distributed on an interval [0, /b/], then the Hoyer sparsity is approximately 0.133.
- Usage:
- std::vector<Real> v{1,0,0};
- Real hs = boost::math::statistics::hoyer_sparsity(v);
- // hs = 1
- std::vector<Real> v{1,-1,1};
- Real hs = boost::math::statistics::hoyer_sparsity(v.begin(), v.end());
- // hs = 0
- The container must be forward iterable and the contents are not modified.
- Accepts real, complex, and integer inputs.
- If the input is an integral type, the output is a double precision float.
- [heading Oracle Signal-to-noise ratio]
- The function `oracle_snr` computes the ratio \u2016 /s/ \u2016[sub 2][super 2] / \u2016 /s/ - /x/ \u2016[sub 2][super 2], where /s/ is signal and /x/ is a noisy signal.
- The function `oracle_snr_db` computes 10`log`[sub 10](\u2016 /s/ \u2016[super 2] / \u2016 /s/ - /x/ \u2016[super 2]).
- The functions are so named because in general, one does not know how to decompose a real signal /x/ into /s/ + /w/ and as such /s/ is regarded as oracle information.
- Hence this function is mainly useful for unit testing other SNR estimators.
- Usage:
- std::vector<double> signal(500, 3.2);
- std::vector<double> noisy_signal(500);
- // fill 'noisy_signal' signal + noise
- double snr_db = boost::math::statistics::oracle_snr_db(signal, noisy_signal);
- double snr = boost::math::statistics::oracle_snr(signal, noisy_signal);
- The input can be real, complex, or integral.
- Integral inputs produce double precision floating point outputs.
- The input data is not modified and must satisfy the requirements of a `RandomAccessContainer`.
- [heading /M/[sub 2]/M/[sub 4] SNR Estimation]
- Estimates the SNR of a noisy signal via the /M/[sub 2]/M/[sub 4] method.
- See [@https://doi.org/10.1109/26.871393 Pauluzzi and N.C. Beaulieu] and [@https://doi.org/10.1109/ISIT.1994.394869 Matzner and Englberger] for details.
- std::vector<double> noisy_signal(512);
- // fill noisy_signal with data contaminated by Gaussian white noise:
- double est_snr_db = boost::math::statistics::m2m4_snr_estimator_db(noisy_signal);
- The /M/[sub 2]/M/[sub 4] SNR estimator is an "in-service" estimator, meaning that the estimate is made using the noisy, data-bearing signal, and does not require a background estimate.
- This estimator has been found to be work best between roughly -3 and 15db, tending to overestimate the noise below -3db, and underestimate the noise above 15db.
- See [@https://www.mdpi.com/2078-2489/8/3/75/pdf Xue et al] for details.
- The /M/[sub 2]/M/[sub 4] SNR estimator, by default, assumes that the kurtosis of the signal is 1 and the kurtosis of the noise is 3, the latter corresponding to Gaussian noise.
- These parameters, however, can be overridden:
- std::vector<double> noisy_signal(512);
- // fill noisy_signal with the data:
- double signal_kurtosis = 1.5;
- // Noise is assumed to follow Laplace distribution, which has kurtosis of 6:
- double noise_kurtosis = 6;
- double est_snr = boost::math::statistics::m2m4_snr_estimator_db(noisy_signal, signal_kurtosis, noise_kurtosis);
- Now, technically the method is a "blind SNR estimator", meaning that the no /a-priori/ information about the signal is required to use the method.
- However, the performance of the method is /vastly/ better if you can come up with a better estimate of the signal and noise kurtosis.
- How can we do this? Suppose we know that the SNR is much greater than 1.
- Then we can estimate the signal kurtosis simply by using the noisy signal kurtosis.
- If the SNR is much less than one, this method breaks down as the noisy signal kurtosis will tend to the noise kurtosis-though in this limit we have an excellent estimator of the noise kurtosis!
- In addition, if you have a model of what your signal should look like, you can precompute the signal kurtosis.
- For example, sinusoids have a kurtosis of 1.5.
- See [@http://www.jcomputers.us/vol8/jcp0808-21.pdf here] for a study which uses estimates of this sort to improve the performance of the /M/[sub 2]/M/[sub 4] estimator.
- /Nota bene/: The traditional definition of SNR is /not/ mean invariant.
- By this we mean that if a constant is added to every sample of a signal, the SNR is changed.
- For example, adding DC bias to a signal changes its SNR.
- For most use cases, this is really not what you intend; for example a signal consisting of zeros plus Gaussian noise has an SNR of zero,
- whereas a signal with a constant DC bias and random Gaussian noise might have a very large SNR.
- The /M/[sub 2]/M/[sub 4] SNR estimator is computed from mean-invariant quantities,
- and hence it should really be compared to the mean-invariant SNR.
- /Nota bene/: This computation requires the solution of a system of quadratic equations involving the noise kurtosis, the signal kurtosis, and the second and fourth moments of the data.
- There is no guarantee that a solution of this system exists for all value of these parameters, in fact nonexistence can easily be demonstrated for certain data.
- If there is no solution to the system, then failure is communicated by returning NaNs.
- This happens distressingly often; if a user is aware of any blind SNR estimators which do not suffer from this drawback, please open a github ticket and let us know.
- The author has not managed to fully characterize the conditions under which a real solution with /S > 0/ and /N >0/ exists.
- However, a very intuitive example demonstrates why nonexistence can occur.
- Suppose the signal and noise kurtosis are equal.
- Then the method has no way to distinguish between the signal and the noise, and the solution is non-unique.
- [heading References]
- * Mallat, Stephane. ['A wavelet tour of signal processing: the sparse way.] Academic press, 2008.
- * Hurley, Niall, and Scott Rickard. ['Comparing measures of sparsity.] IEEE Transactions on Information Theory 55.10 (2009): 4723-4741.
- * Jensen, Arne, and Anders la Cour-Harbo. ['Ripples in mathematics: the discrete wavelet transform.] Springer Science & Business Media, 2001.
- * D. R. Pauluzzi and N. C. Beaulieu, ['A comparison of SNR estimation techniques for the AWGN channel,] IEEE Trans. Communications, Vol. 48, No. 10, pp. 1681-1691, 2000.
- * Hoyer, Patrik O. ['Non-negative matrix factorization with sparseness constraints.], Journal of machine learning research 5.Nov (2004): 1457-1469.
- [endsect]
- [/section:signal_statistics Signal Statistics]
|