12 A better abstraction for randomness
In this chapter
- The two philosophies of probability
- A better way to represent discrete distributions
- A categorical distribution is an unfair die
- Four algorithms for sampling categorical distributions
- Filtering elements out of a categorical distribution
In my university days I paid little attention to my only required statistics class. I found the jargon confusing and foresaw little use for statistical reasoning in my future career. I was half right: the jargon is confusing. But I ended up implementing programming tools for data scientists, so I had to shore up my weak grasp of probabilistic reasoning in a hurry.
In this chapter we’ll start with a refresher on statistics jargon and then come up with a better way to represent randomness than the standard random-number-generating functions. We already use high-level methods such as Where, SelectMany, Join and GroupBy to manipulate sequences without writing tedious loops; we could benefit in the same way from working with random quantities at a higher level of abstraction.
Writing readable, fluent code that generates random numbers is great, but the work in this chapter will prepare us for a much more fabulous power: we can automatically apply Bayes’ Theorem to correctly compute the impact of new real-world evidence when a program must make a decision. We’ll look at that in the next chapter. We have some work to do first: we need a way to efficiently simulate rolling an unfair die.