chapter eight

8 Using probability to its maximum: The naive Bayes model

 

In this chapter

  • what is Bayes’ theorem?
  • dependent and independent events
  • the prior and posterior probabilities
  • calculating conditional probabilities based on events
  • using the naive Bayes model to predict whether an email is spam or ham, based on the words in the email
  • coding the naive Bayes algorithm in Python

Naive Bayes is an important machine learning model used for classification. The naive Bayes model is a purely probabilistic model, which means its prediction is a number between 0 and 1 indicating the probability that the label is positive (for example, that an email is spam). The main component of the naive Bayes model is Bayes’ theorem.
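To make the idea of a probabilistic prediction concrete, here is a minimal sketch of how word counts can be turned into a probability between 0 and 1. It assumes the "naive" simplification that words appear independently of each other once we know whether an email is spam or ham. The function name `spam_probability` and all the counts are hypothetical, made up only for illustration.

```python
# Minimal naive Bayes sketch for spam detection (hypothetical counts, no smoothing).

def spam_probability(words, spam_counts, ham_counts, n_spam, n_ham):
    """Return P(spam | email contains all of `words`) under the naive assumption
    that words occur independently of each other given the class."""
    total = n_spam + n_ham
    # Start from the priors: the fraction of emails that are spam or ham.
    score_spam = n_spam / total
    score_ham = n_ham / total
    for w in words:
        # Multiply in P(word | class), estimated as a ratio of counts.
        score_spam *= spam_counts.get(w, 0) / n_spam
        score_ham *= ham_counts.get(w, 0) / n_ham
    # Normalize so the two class scores sum to 1 (this step is Bayes' theorem).
    return score_spam / (score_spam + score_ham)


# Made-up data: 25 spam and 75 ham emails, and how many of each contain a word.
spam_counts = {"lottery": 20, "sale": 15}
ham_counts = {"lottery": 5, "sale": 10}

print(round(spam_probability(["lottery"], spam_counts, ham_counts, 25, 75), 3))          # 0.8
print(round(spam_probability(["lottery", "sale"], spam_counts, ham_counts, 25, 75), 3))  # 0.947
```

Because this sketch applies no smoothing, a word that never appears in one class drives that class's score to zero, and if it drives both scores to zero the final division fails; practical implementations usually add a small pseudocount to every word count.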

Sick or healthy? A story with Bayes’ theorem as the hero

Prelude to Bayes’ theorem: The prior, the event, and the posterior

Use case: Spam-detection model

Finding the prior: The probability that any email is spam

Finding the posterior: The probability that an email is spam, knowing that it contains a particular word

What the math just happened? Turning ratios into probabilities

What about two words? The naive Bayes algorithm

What about more than two words?

Building a spam-detection model with real data

Data preprocessing

Finding the priors

Finding the posteriors with Bayes’ theorem

Implementing the naive Bayes algorithm

Further work

Summary

Exercises