10 Faster decision making with machine learning and PySpark

 

This chapter covers

  • An introduction to machine learning
  • Training and applying decision tree classifiers in parallel with PySpark
  • Matching problems and appropriate machine learning algorithms
  • Training and applying random forest classifiers with PySpark

In chapter 9, we saw some of Spark’s raw data transformation options and used them in the Map and Reduce style we’ve been exploring throughout the book. That chapter showed how we can write Python and take advantage of Spark, one of the most popular distributed computing frameworks. One reason Spark is so popular is its built-in machine learning capabilities.

Machine learning refers to the design, training, application, and study of judgmental algorithms that adjust themselves based on input data. A familiar example of machine learning is the spam filter. Spam filter designers feed examples of spam and not-spam emails into their spam filters, which either are or contain machine learning algorithms. From those examples, the spam filter learns to judge whether or not a new email is spam.

Figure 10.1 Spam filters are machine learning algorithms that learn to judge emails as spam or not spam by looking at lots of spam emails and not-spam emails.
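To make the idea of a self-adjusting judgmental algorithm concrete, here is a minimal sketch in plain Python. It is not the approach this chapter builds with PySpark: the `TinySpamFilter` class, its method names, and the four training emails are all hypothetical, and the scoring is a simple smoothed word-count comparison in the spirit of naive Bayes. The point is only that the "judgment" changes as the algorithm sees more labeled examples.

```python
import math
from collections import Counter


class TinySpamFilter:
    """Hypothetical toy filter that adjusts word counts from labeled emails."""

    def __init__(self):
        # One word counter per label; these counts are what "learning" updates.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.email_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Adjust the stored counts using one labeled email."""
        self.word_counts[label].update(text.lower().split())
        self.email_counts[label] += 1

    def spam_score(self, text):
        """Return a log-odds score; positive means 'more like spam'."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        spam_total = sum(self.word_counts["spam"].values())
        ham_total = sum(self.word_counts["ham"].values())
        # Start from how common each label was in training (add-one smoothed).
        score = math.log(
            (self.email_counts["spam"] + 1) / (self.email_counts["ham"] + 1)
        )
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from producing log(0).
            p_spam = (self.word_counts["spam"][word] + 1) / (spam_total + len(vocab))
            p_ham = (self.word_counts["ham"][word] + 1) / (ham_total + len(vocab))
            score += math.log(p_spam / p_ham)
        return score

    def is_spam(self, text):
        """Make the judgment: is this email more like spam than not?"""
        return self.spam_score(text) > 0


# Made-up training emails, purely for illustration.
spam_filter = TinySpamFilter()
spam_filter.train("win free money now", "spam")
spam_filter.train("free prize claim now", "spam")
spam_filter.train("meeting notes attached", "ham")
spam_filter.train("lunch at noon tomorrow", "ham")

print(spam_filter.is_spam("claim your free money"))
print(spam_filter.is_spam("notes from the meeting"))
```

With these four training emails, the first message scores as spam and the second does not. Training on more labeled emails shifts the counts, and with them the filter's judgments, without anyone rewriting the scoring code.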

10.1   What is machine learning?

 
 
 

10.1.1   Machine learning as self-adjusting judgmental algorithms

 
 

10.1.2   Common applications of machine learning

 
 
 
 

10.2   Machine learning basics with decision tree classifiers

 
 
 
 

10.2.1   Designing decision tree classifiers

 
 

10.2.2   Implementing a decision tree in PySpark

 

10.3   Fast random forest classifications in PySpark

 
 

10.3.1   Understanding random forest classifiers

 
 
 

10.3.2   Implementing a random forest classifier

 
 
 

10.4   Exercises

 
 
 

10.4.1   ML question

 

10.4.3   Decision trees on Iris dataset

 
 

10.4.5   Other classifiers 

 
 
 

10.5   Summary

 
 
 