negative_binomial_example1.cpp 22 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513
  1. // negative_binomial_example1.cpp
  2. // Copyright Paul A. Bristow 2007, 2010.
  3. // Use, modification and distribution are subject to the
  4. // Boost Software License, Version 1.0.
  5. // (See accompanying file LICENSE_1_0.txt
  6. // or copy at http://www.boost.org/LICENSE_1_0.txt)
  7. // Example 1 of using negative_binomial distribution.
  8. //[negative_binomial_eg1_1
  9. /*`
  10. Based on [@http://en.wikipedia.org/wiki/Negative_binomial_distribution
  11. a problem by Dr. Diane Evans,
  12. Professor of Mathematics at Rose-Hulman Institute of Technology].
  13. Pat is required to sell candy bars to raise money for the 6th grade field trip.
  14. There are thirty houses in the neighborhood,
  15. and Pat is not supposed to return home until five candy bars have been sold.
  16. So the child goes door to door, selling candy bars.
  17. At each house, there is a 0.4 probability (40%) of selling one candy bar
  18. and a 0.6 probability (60%) of selling nothing.
  19. What is the probability mass (density) function (pdf) for selling the last (fifth)
  20. candy bar at the nth house?
  21. The Negative Binomial(r, p) distribution describes the probability of k failures
  22. and r successes in k+r Bernoulli(p) trials with success on the last trial.
  23. (A [@http://en.wikipedia.org/wiki/Bernoulli_distribution Bernoulli trial]
  24. is one with only two possible outcomes, success of failure,
  25. and p is the probability of success).
  26. See also [@ http://en.wikipedia.org/wiki/Bernoulli_distribution Bernoulli distribution]
  27. and [@http://www.math.uah.edu/stat/bernoulli/Introduction.xhtml Bernoulli applications].
  28. In this example, we will deliberately produce a variety of calculations
  29. and outputs to demonstrate the ways that the negative binomial distribution
  30. can be implemented with this library: it is also deliberately over-commented.
  31. First we need to #define macros to control the error and discrete handling policies.
  32. For this simple example, we want to avoid throwing
  33. an exception (the default policy) and just return infinity.
  34. We want to treat the distribution as if it was continuous,
  35. so we choose a discrete_quantile policy of real,
  36. rather than the default policy integer_round_outwards.
  37. */
  38. #define BOOST_MATH_OVERFLOW_ERROR_POLICY ignore_error
  39. #define BOOST_MATH_DISCRETE_QUANTILE_POLICY real
  40. /*`
  41. After that we need some includes to provide easy access to the negative binomial distribution,
  42. [caution It is vital to #include distributions etc *after* the above #defines]
  43. and we need some std library iostream, of course.
  44. */
  45. #include <boost/math/distributions/negative_binomial.hpp>
  46. // for negative_binomial_distribution
  47. using boost::math::negative_binomial; // typedef provides default type is double.
  48. using ::boost::math::pdf; // Probability mass function.
  49. using ::boost::math::cdf; // Cumulative density function.
  50. using ::boost::math::quantile;
  51. #include <iostream>
  52. using std::cout; using std::endl;
  53. using std::noshowpoint; using std::fixed; using std::right; using std::left;
  54. #include <iomanip>
  55. using std::setprecision; using std::setw;
  56. #include <limits>
  57. using std::numeric_limits;
  58. //] [negative_binomial_eg1_1]
  59. int main()
  60. {
  61. cout <<"Selling candy bars - using the negative binomial distribution."
  62. << "\nby Dr. Diane Evans,"
  63. "\nProfessor of Mathematics at Rose-Hulman Institute of Technology,"
  64. << "\nsee http://en.wikipedia.org/wiki/Negative_binomial_distribution\n"
  65. << endl;
  66. cout << endl;
  67. cout.precision(5);
  68. // None of the values calculated have a useful accuracy as great this, but
  69. // INF shows wrongly with < 5 !
  70. // https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=240227
  71. //[negative_binomial_eg1_2
  72. /*`
  73. It is always sensible to use try and catch blocks because defaults policies are to
  74. throw an exception if anything goes wrong.
  75. A simple catch block (see below) will ensure that you get a
  76. helpful error message instead of an abrupt program abort.
  77. */
  78. try
  79. {
  80. /*`
  81. Selling five candy bars means getting five successes, so successes r = 5.
  82. The total number of trials (n, in this case, houses visited) this takes is therefore
  83. = sucesses + failures or k + r = k + 5.
  84. */
  85. double sales_quota = 5; // Pat's sales quota - successes (r).
  86. /*`
  87. At each house, there is a 0.4 probability (40%) of selling one candy bar
  88. and a 0.6 probability (60%) of selling nothing.
  89. */
  90. double success_fraction = 0.4; // success_fraction (p) - so failure_fraction is 0.6.
  91. /*`
  92. The Negative Binomial(r, p) distribution describes the probability of k failures
  93. and r successes in k+r Bernoulli(p) trials with success on the last trial.
  94. (A [@http://en.wikipedia.org/wiki/Bernoulli_distribution Bernoulli trial]
  95. is one with only two possible outcomes, success of failure,
  96. and p is the probability of success).
  97. We therefore start by constructing a negative binomial distribution
  98. with parameters sales_quota (required successes) and probability of success.
  99. */
  100. negative_binomial nb(sales_quota, success_fraction); // type double by default.
  101. /*`
  102. To confirm, display the success_fraction & successes parameters of the distribution.
  103. */
  104. cout << "Pat has a sales per house success rate of " << success_fraction
  105. << ".\nTherefore he would, on average, sell " << nb.success_fraction() * 100
  106. << " bars after trying 100 houses." << endl;
  107. int all_houses = 30; // The number of houses on the estate.
  108. cout << "With a success rate of " << nb.success_fraction()
  109. << ", he might expect, on average,\n"
  110. "to need to visit about " << success_fraction * all_houses
  111. << " houses in order to sell all " << nb.successes() << " bars. " << endl;
  112. /*`
  113. [pre
  114. Pat has a sales per house success rate of 0.4.
  115. Therefore he would, on average, sell 40 bars after trying 100 houses.
  116. With a success rate of 0.4, he might expect, on average,
  117. to need to visit about 12 houses in order to sell all 5 bars.
  118. ]
  119. The random variable of interest is the number of houses
  120. that must be visited to sell five candy bars,
  121. so we substitute k = n - 5 into a negative_binomial(5, 0.4)
  122. and obtain the __pdf of the distribution of houses visited.
  123. Obviously, the best possible case is that Pat makes sales on all the first five houses.
  124. We calculate this using the pdf function:
  125. */
  126. cout << "Probability that Pat finishes on the " << sales_quota << "th house is "
  127. << pdf(nb, 5 - sales_quota) << endl; // == pdf(nb, 0)
  128. /*`
  129. Of course, he could not finish on fewer than 5 houses because he must sell 5 candy bars.
  130. So the 5th house is the first that he could possibly finish on.
  131. To finish on or before the 8th house, Pat must finish at the 5th, 6th, 7th or 8th house.
  132. The probability that he will finish on *exactly* ( == ) on any house
  133. is the Probability Density Function (pdf).
  134. */
  135. cout << "Probability that Pat finishes on the 6th house is "
  136. << pdf(nb, 6 - sales_quota) << endl;
  137. cout << "Probability that Pat finishes on the 7th house is "
  138. << pdf(nb, 7 - sales_quota) << endl;
  139. cout << "Probability that Pat finishes on the 8th house is "
  140. << pdf(nb, 8 - sales_quota) << endl;
  141. /*`
  142. [pre
  143. Probability that Pat finishes on the 6th house is 0.03072
  144. Probability that Pat finishes on the 7th house is 0.055296
  145. Probability that Pat finishes on the 8th house is 0.077414
  146. ]
  147. The sum of the probabilities for these houses is the Cumulative Distribution Function (cdf).
  148. We can calculate it by adding the individual probabilities.
  149. */
  150. cout << "Probability that Pat finishes on or before the 8th house is sum "
  151. "\n" << "pdf(sales_quota) + pdf(6) + pdf(7) + pdf(8) = "
  152. // Sum each of the mass/density probabilities for houses sales_quota = 5, 6, 7, & 8.
  153. << pdf(nb, 5 - sales_quota) // 0 failures.
  154. + pdf(nb, 6 - sales_quota) // 1 failure.
  155. + pdf(nb, 7 - sales_quota) // 2 failures.
  156. + pdf(nb, 8 - sales_quota) // 3 failures.
  157. << endl;
  158. /*`[pre
  159. pdf(sales_quota) + pdf(6) + pdf(7) + pdf(8) = 0.17367
  160. ]
  161. Or, usually better, by using the negative binomial *cumulative* distribution function.
  162. */
  163. cout << "\nProbability of selling his quota of " << sales_quota
  164. << " bars\non or before the " << 8 << "th house is "
  165. << cdf(nb, 8 - sales_quota) << endl;
  166. /*`[pre
  167. Probability of selling his quota of 5 bars on or before the 8th house is 0.17367
  168. ]*/
  169. cout << "\nProbability that Pat finishes exactly on the 10th house is "
  170. << pdf(nb, 10 - sales_quota) << endl;
  171. cout << "\nProbability of selling his quota of " << sales_quota
  172. << " bars\non or before the " << 10 << "th house is "
  173. << cdf(nb, 10 - sales_quota) << endl;
  174. /*`
  175. [pre
  176. Probability that Pat finishes exactly on the 10th house is 0.10033
  177. Probability of selling his quota of 5 bars on or before the 10th house is 0.3669
  178. ]*/
  179. cout << "Probability that Pat finishes exactly on the 11th house is "
  180. << pdf(nb, 11 - sales_quota) << endl;
  181. cout << "\nProbability of selling his quota of " << sales_quota
  182. << " bars\non or before the " << 11 << "th house is "
  183. << cdf(nb, 11 - sales_quota) << endl;
  184. /*`[pre
  185. Probability that Pat finishes on the 11th house is 0.10033
  186. Probability of selling his quota of 5 candy bars
  187. on or before the 11th house is 0.46723
  188. ]*/
  189. cout << "Probability that Pat finishes exactly on the 12th house is "
  190. << pdf(nb, 12 - sales_quota) << endl;
  191. cout << "\nProbability of selling his quota of " << sales_quota
  192. << " bars\non or before the " << 12 << "th house is "
  193. << cdf(nb, 12 - sales_quota) << endl;
  194. /*`[pre
  195. Probability that Pat finishes on the 12th house is 0.094596
  196. Probability of selling his quota of 5 candy bars
  197. on or before the 12th house is 0.56182
  198. ]
  199. Finally consider the risk of Pat not selling his quota of 5 bars
  200. even after visiting all the houses.
  201. Calculate the probability that he /will/ sell on
  202. or before the last house:
  203. Calculate the probability that he would sell all his quota on the very last house.
  204. */
  205. cout << "Probability that Pat finishes on the " << all_houses
  206. << " house is " << pdf(nb, all_houses - sales_quota) << endl;
  207. /*`
  208. Probability of selling his quota of 5 bars on the 30th house is
  209. [pre
  210. Probability that Pat finishes on the 30 house is 0.00069145
  211. ]
  212. when he'd be very unlucky indeed!
  213. What is the probability that Pat exhausts all 30 houses in the neighborhood,
  214. and *still* doesn't sell the required 5 candy bars?
  215. */
  216. cout << "\nProbability of selling his quota of " << sales_quota
  217. << " bars\non or before the " << all_houses << "th house is "
  218. << cdf(nb, all_houses - sales_quota) << endl;
  219. /*`
  220. [pre
  221. Probability of selling his quota of 5 bars
  222. on or before the 30th house is 0.99849
  223. ]
  224. So the risk of failing even after visiting all the houses is 1 - this probability,
  225. ``1 - cdf(nb, all_houses - sales_quota``
  226. But using this expression may cause serious inaccuracy,
  227. so it would be much better to use the complement of the cdf:
  228. So the risk of failing even at, or after, the 31th (non-existent) houses is 1 - this probability,
  229. ``1 - cdf(nb, all_houses - sales_quota)``
  230. But using this expression may cause serious inaccuracy.
  231. So it would be much better to use the __complement of the cdf (see __why_complements).
  232. */
  233. cout << "\nProbability of failing to sell his quota of " << sales_quota
  234. << " bars\neven after visiting all " << all_houses << " houses is "
  235. << cdf(complement(nb, all_houses - sales_quota)) << endl;
  236. /*`
  237. [pre
  238. Probability of failing to sell his quota of 5 bars
  239. even after visiting all 30 houses is 0.0015101
  240. ]
  241. We can also use the quantile (percentile), the inverse of the cdf, to
  242. predict which house Pat will finish on. So for the 8th house:
  243. */
  244. double p = cdf(nb, (8 - sales_quota));
  245. cout << "Probability of meeting sales quota on or before 8th house is "<< p << endl;
  246. /*`
  247. [pre
  248. Probability of meeting sales quota on or before 8th house is 0.174
  249. ]
  250. */
  251. cout << "If the confidence of meeting sales quota is " << p
  252. << ", then the finishing house is " << quantile(nb, p) + sales_quota << endl;
  253. cout<< " quantile(nb, p) = " << quantile(nb, p) << endl;
  254. /*`
  255. [pre
  256. If the confidence of meeting sales quota is 0.17367, then the finishing house is 8
  257. ]
  258. Demanding absolute certainty that all 5 will be sold,
  259. implies an infinite number of trials.
  260. (Of course, there are only 30 houses on the estate,
  261. so he can't ever be *certain* of selling his quota).
  262. */
  263. cout << "If the confidence of meeting sales quota is " << 1.
  264. << ", then the finishing house is " << quantile(nb, 1) + sales_quota << endl;
  265. // 1.#INF == infinity.
  266. /*`[pre
  267. If the confidence of meeting sales quota is 1, then the finishing house is 1.#INF
  268. ]
  269. And similarly for a few other probabilities:
  270. */
  271. cout << "If the confidence of meeting sales quota is " << 0.
  272. << ", then the finishing house is " << quantile(nb, 0.) + sales_quota << endl;
  273. cout << "If the confidence of meeting sales quota is " << 0.5
  274. << ", then the finishing house is " << quantile(nb, 0.5) + sales_quota << endl;
  275. cout << "If the confidence of meeting sales quota is " << 1 - 0.00151 // 30 th
  276. << ", then the finishing house is " << quantile(nb, 1 - 0.00151) + sales_quota << endl;
  277. /*`
  278. [pre
  279. If the confidence of meeting sales quota is 0, then the finishing house is 5
  280. If the confidence of meeting sales quota is 0.5, then the finishing house is 11.337
  281. If the confidence of meeting sales quota is 0.99849, then the finishing house is 30
  282. ]
  283. Notice that because we chose a discrete quantile policy of real,
  284. the result can be an 'unreal' fractional house.
  285. If the opposite is true, we don't want to assume any confidence, then this is tantamount
  286. to assuming that all the first sales_quota trials will be successful sales.
  287. */
  288. cout << "If confidence of meeting quota is zero\n(we assume all houses are successful sales)"
  289. ", then finishing house is " << sales_quota << endl;
  290. /*`
  291. [pre
  292. If confidence of meeting quota is zero (we assume all houses are successful sales), then finishing house is 5
  293. If confidence of meeting quota is 0, then finishing house is 5
  294. ]
  295. We can list quantiles for a few probabilities:
  296. */
  297. double ps[] = {0., 0.001, 0.01, 0.05, 0.1, 0.5, 0.9, 0.95, 0.99, 0.999, 1.};
  298. // Confidence as fraction = 1-alpha, as percent = 100 * (1-alpha[i]) %
  299. cout.precision(3);
  300. for (unsigned i = 0; i < sizeof(ps)/sizeof(ps[0]); i++)
  301. {
  302. cout << "If confidence of meeting quota is " << ps[i]
  303. << ", then finishing house is " << quantile(nb, ps[i]) + sales_quota
  304. << endl;
  305. }
  306. /*`
  307. [pre
  308. If confidence of meeting quota is 0, then finishing house is 5
  309. If confidence of meeting quota is 0.001, then finishing house is 5
  310. If confidence of meeting quota is 0.01, then finishing house is 5
  311. If confidence of meeting quota is 0.05, then finishing house is 6.2
  312. If confidence of meeting quota is 0.1, then finishing house is 7.06
  313. If confidence of meeting quota is 0.5, then finishing house is 11.3
  314. If confidence of meeting quota is 0.9, then finishing house is 17.8
  315. If confidence of meeting quota is 0.95, then finishing house is 20.1
  316. If confidence of meeting quota is 0.99, then finishing house is 24.8
  317. If confidence of meeting quota is 0.999, then finishing house is 31.1
  318. If confidence of meeting quota is 1, then finishing house is 1.#INF
  319. ]
  320. We could have applied a ceil function to obtain a 'worst case' integer value for house.
  321. ``ceil(quantile(nb, ps[i]))``
  322. Or, if we had used the default discrete quantile policy, integer_outside, by omitting
  323. ``#define BOOST_MATH_DISCRETE_QUANTILE_POLICY real``
  324. we would have achieved the same effect.
  325. The real result gives some suggestion which house is most likely.
  326. For example, compare the real and integer_outside for 95% confidence.
  327. [pre
  328. If confidence of meeting quota is 0.95, then finishing house is 20.1
  329. If confidence of meeting quota is 0.95, then finishing house is 21
  330. ]
  331. The real value 20.1 is much closer to 20 than 21, so integer_outside is pessimistic.
  332. We could also use integer_round_nearest policy to suggest that 20 is more likely.
  333. Finally, we can tabulate the probability for the last sale being exactly on each house.
  334. */
  335. cout << "\nHouse for " << sales_quota << "th (last) sale. Probability (%)" << endl;
  336. cout.precision(5);
  337. for (int i = (int)sales_quota; i < all_houses+1; i++)
  338. {
  339. cout << left << setw(3) << i << " " << setw(8) << cdf(nb, i - sales_quota) << endl;
  340. }
  341. cout << endl;
  342. /*`
  343. [pre
  344. House for 5 th (last) sale. Probability (%)
  345. 5 0.01024
  346. 6 0.04096
  347. 7 0.096256
  348. 8 0.17367
  349. 9 0.26657
  350. 10 0.3669
  351. 11 0.46723
  352. 12 0.56182
  353. 13 0.64696
  354. 14 0.72074
  355. 15 0.78272
  356. 16 0.83343
  357. 17 0.874
  358. 18 0.90583
  359. 19 0.93039
  360. 20 0.94905
  361. 21 0.96304
  362. 22 0.97342
  363. 23 0.98103
  364. 24 0.98655
  365. 25 0.99053
  366. 26 0.99337
  367. 27 0.99539
  368. 28 0.99681
  369. 29 0.9978
  370. 30 0.99849
  371. ]
  372. As noted above, using a catch block is always a good idea, even if you do not expect to use it.
  373. */
  374. }
  375. catch(const std::exception& e)
  376. { // Since we have set an overflow policy of ignore_error,
  377. // an overflow exception should never be thrown.
  378. std::cout << "\nMessage from thrown exception was:\n " << e.what() << std::endl;
  379. /*`
  380. For example, without a ignore domain error policy, if we asked for ``pdf(nb, -1)`` for example, we would get:
  381. [pre
  382. Message from thrown exception was:
  383. Error in function boost::math::pdf(const negative_binomial_distribution<double>&, double):
  384. Number of failures argument is -1, but must be >= 0 !
  385. ]
  386. */
  387. //] [/ negative_binomial_eg1_2]
  388. }
  389. return 0;
  390. } // int main()
  391. /*
  392. Output is:
  393. Selling candy bars - using the negative binomial distribution.
  394. by Dr. Diane Evans,
  395. Professor of Mathematics at Rose-Hulman Institute of Technology,
  396. see http://en.wikipedia.org/wiki/Negative_binomial_distribution
  397. Pat has a sales per house success rate of 0.4.
  398. Therefore he would, on average, sell 40 bars after trying 100 houses.
  399. With a success rate of 0.4, he might expect, on average,
  400. to need to visit about 12 houses in order to sell all 5 bars.
  401. Probability that Pat finishes on the 5th house is 0.01024
  402. Probability that Pat finishes on the 6th house is 0.03072
  403. Probability that Pat finishes on the 7th house is 0.055296
  404. Probability that Pat finishes on the 8th house is 0.077414
  405. Probability that Pat finishes on or before the 8th house is sum
  406. pdf(sales_quota) + pdf(6) + pdf(7) + pdf(8) = 0.17367
  407. Probability of selling his quota of 5 bars
  408. on or before the 8th house is 0.17367
  409. Probability that Pat finishes exactly on the 10th house is 0.10033
  410. Probability of selling his quota of 5 bars
  411. on or before the 10th house is 0.3669
  412. Probability that Pat finishes exactly on the 11th house is 0.10033
  413. Probability of selling his quota of 5 bars
  414. on or before the 11th house is 0.46723
  415. Probability that Pat finishes exactly on the 12th house is 0.094596
  416. Probability of selling his quota of 5 bars
  417. on or before the 12th house is 0.56182
  418. Probability that Pat finishes on the 30 house is 0.00069145
  419. Probability of selling his quota of 5 bars
  420. on or before the 30th house is 0.99849
  421. Probability of failing to sell his quota of 5 bars
  422. even after visiting all 30 houses is 0.0015101
  423. Probability of meeting sales quota on or before 8th house is 0.17367
  424. If the confidence of meeting sales quota is 0.17367, then the finishing house is 8
  425. quantile(nb, p) = 3
  426. If the confidence of meeting sales quota is 1, then the finishing house is 1.#INF
  427. If the confidence of meeting sales quota is 0, then the finishing house is 5
  428. If the confidence of meeting sales quota is 0.5, then the finishing house is 11.337
  429. If the confidence of meeting sales quota is 0.99849, then the finishing house is 30
  430. If confidence of meeting quota is zero
  431. (we assume all houses are successful sales), then finishing house is 5
  432. If confidence of meeting quota is 0, then finishing house is 5
  433. If confidence of meeting quota is 0.001, then finishing house is 5
  434. If confidence of meeting quota is 0.01, then finishing house is 5
  435. If confidence of meeting quota is 0.05, then finishing house is 6.2
  436. If confidence of meeting quota is 0.1, then finishing house is 7.06
  437. If confidence of meeting quota is 0.5, then finishing house is 11.3
  438. If confidence of meeting quota is 0.9, then finishing house is 17.8
  439. If confidence of meeting quota is 0.95, then finishing house is 20.1
  440. If confidence of meeting quota is 0.99, then finishing house is 24.8
  441. If confidence of meeting quota is 0.999, then finishing house is 31.1
  442. If confidence of meeting quota is 1, then finishing house is 1.#J
  443. House for 5th (last) sale. Probability (%)
  444. 5 0.01024
  445. 6 0.04096
  446. 7 0.096256
  447. 8 0.17367
  448. 9 0.26657
  449. 10 0.3669
  450. 11 0.46723
  451. 12 0.56182
  452. 13 0.64696
  453. 14 0.72074
  454. 15 0.78272
  455. 16 0.83343
  456. 17 0.874
  457. 18 0.90583
  458. 19 0.93039
  459. 20 0.94905
  460. 21 0.96304
  461. 22 0.97342
  462. 23 0.98103
  463. 24 0.98655
  464. 25 0.99053
  465. 26 0.99337
  466. 27 0.99539
  467. 28 0.99681
  468. 29 0.9978
  469. 30 0.99849
  470. */