chapter eight

8 Using probability to its maximum: The naive Bayes model

In this chapter

what is Bayes’ theorem?
dependent and independent events
the prior and posterior probabilities
calculating conditional probabilities based on events
using the naive Bayes model to predict whether an email is spam or ham, based on the words in the email
coding the naive Bayes algorithm in Python

Naive Bayes is an important machine learning model used for classification. It is a purely probabilistic model, which means its prediction is a number between 0 and 1 that indicates the probability that the label is positive. The main component of the naive Bayes model is Bayes’ theorem.
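To make the pieces named in this outline concrete (the prior, the posterior, and the "naive" step of multiplying word probabilities), here is a minimal Python sketch. It assumes a tiny made-up dataset of eight emails, each reduced to the set of words it contains, and a hypothetical function name, naive_bayes_spam_probability. It is an illustration, not the chapter's implementation. For a single word it computes Bayes’ theorem, P(spam | word) = P(word | spam) P(spam) / [P(word | spam) P(spam) + P(word | ham) P(ham)]. It applies no smoothing, so a word that never appears in spam (or in ham) makes that side's score zero.

```python
from fractions import Fraction

# Hypothetical toy dataset: (set of words in the email, is_spam)
emails = [
    ({"lottery", "win"}, True),
    ({"lottery", "sale"}, True),
    ({"win", "sale"}, True),
    ({"meeting", "lottery"}, False),
    ({"meeting", "lunch"}, False),
    ({"lunch", "sale"}, False),
    ({"meeting", "report"}, False),
    ({"report", "lunch"}, False),
]


def naive_bayes_spam_probability(words, emails):
    """Return P(spam | email contains every word in `words`).

    Assumes the words are independent given the class (the naive
    assumption). No smoothing is applied; if both scores end up zero,
    this raises ZeroDivisionError.
    """
    spam = [w for w, is_spam in emails if is_spam]
    ham = [w for w, is_spam in emails if not is_spam]

    # Priors: the fraction of all emails that are spam / ham
    score_spam = Fraction(len(spam), len(emails))
    score_ham = Fraction(len(ham), len(emails))

    for word in words:
        # P(word | spam): fraction of spam emails that contain the word
        score_spam *= Fraction(sum(word in e for e in spam), len(spam))
        # P(word | ham): fraction of ham emails that contain the word
        score_ham *= Fraction(sum(word in e for e in ham), len(ham))

    # Normalize so the two posteriors add up to 1
    return score_spam / (score_spam + score_ham)


print(naive_bayes_spam_probability(["lottery"], emails))          # 2/3
print(naive_bayes_spam_probability(["lottery", "sale"], emails))  # 20/23
```

With one word, the prior 3/8 and the likelihoods 2/3 (spam) and 1/5 (ham) give a posterior of 2/3. Adding a second word multiplies in its likelihoods, which is exactly the simplification that makes the model "naive"; here it raises the spam probability to 20/23, roughly 0.87. Exact fractions are used only so the arithmetic is easy to check by hand.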

Sick or healthy? A story with Bayes’ theorem as the hero

Prelude to Bayes’ theorem: The prior, the event, and the posterior

Use case: Spam-detection model

Finding the prior: The probability that any email is spam

Finding the posterior: The probability that an email is spam, knowing that it contains a particular word

What the math just happened? Turning ratios into probabilities

What about two words? The naive Bayes algorithm

What about more than two words?

Building a spam-detection model with real data

Data preprocessing

Finding the priors

Finding the posteriors with Bayes’ theorem

Implementing the naive Bayes algorithm

Further work

Summary

Exercises