chapter twelve

12 A better abstraction for randomness

In this chapter

  • The two philosophies of probability
  • A better way to represent discrete distributions
  • A categorical distribution is an unfair die
  • Four algorithms for sampling categorical distributions
  • Filtering elements out of a categorical distribution

In my university days I paid little attention to my only required statistics class. I found the jargon confusing and foresaw little use for statistical reasoning in my future career. I was half right: the jargon is confusing. But since I ended up implementing programming tools for data scientists, my weak knowledge of probabilistic reasoning needed to be fixed in a hurry.

In this chapter we’ll start with a refresher on statistics jargon and come up with a better way to represent randomness than the standard random-number-generating functions. We use high-level methods such as Where, SelectMany, Join, and GroupBy to manipulate sequences without writing tedious loops; we could similarly benefit by working with random quantities at a higher level of abstraction.

Writing readable, fluent code that generates random numbers is great, but the work in this chapter will prepare us for a much more fabulous power: we can automatically apply Bayes’ Theorem to correctly compute the impact of new real-world evidence when a program must make a decision. We’ll look at that in the next chapter. We have some work to do first: we need a way to efficiently simulate rolling an unfair die.

12.1 What are “probabilities”?

12.1.1 What are “discrete probability distributions”?

12.2 Generating uniform samples with Random

12.3 IDistribution<T> and IDiscreteDistribution<T>

12.4 Flipping an unfair coin with Bernoulli

12.5 Improving the ecosystem with extension methods

12.6 Representing unfair die rolls by adding a projection

12.7 Categorical algorithm one: make a big list

12.8 Categorical algorithm two: climb a ladder

12.9 Categorical algorithm three: rejecting rejection sampling

12.10 Categorical algorithm four: the alias algorithm

12.11 Filtering out a category

12.12 Summary