Chapter 10. Example: digital display advertising
This chapter covers
- Visualizing and preparing a real-world dataset
- Building a predictive model of the probability that users will click a digital display advertisement
- Comparing the performance of several algorithms in both training and prediction phases
- Scaling by dimension reduction and parallel processing
Chapter 9 presented techniques that enable you to scale your machine-learning workflow. In this chapter, you’ll apply those techniques to a large-scale real-world problem: optimizing an online advertising campaign. We begin with a short introduction to the complex world of online advertising, the data that drives it, and some of the ways it’s used by advertisers to maximize return on advertising spend (ROAS). Then we show how to put some of the techniques in chapter 9 to use in this archetypal big-data application.
We employ several datasets in our example. Unfortunately, only a few large datasets of this type are available to the public. The primary dataset in our example isn’t available for download, and even if it were, it would be too large to process on a personal computer.
10.12. Terms from this chapter
Word | Definition |
---|---|
recommender | A class of ML algorithms used to predict users’ affinities for various items. |
collaborative filtering | Recommender algorithms that work by characterizing users via their item preferences, and items by the preferences of common users. |
ensemble method | An ML strategy in which multiple models’ independent predictions are combined. |
ensemble effect | The tendency of multiple combined models to yield better predictive performance than the individual components. |
k-nearest neighbors | An algorithm that bases predictions on the nearest observations in the training space. |
Euclidean distance | One of many ways of measuring distances in feature space. In two-dimensional space, it’s the familiar distance formula. |
random forest | An ensemble learning method that fits multiple decision tree classifiers or regressors to subsets of the training data and features and makes predictions based on the combined model. |
bagging | The process of repeated sampling with replacement used by random forests and other algorithms. |
stacking | Use of a machine-learning algorithm, often logistic regression, to combine the predictions of other algorithms to create a final “consensus” prediction. |
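Two of the terms above, Euclidean distance and k-nearest neighbors, can be illustrated with a minimal sketch in plain Python. The function names (`euclidean_distance`, `knn_predict`), the toy points, and the `click`/`no_click` labels are hypothetical and chosen only for illustration; they aren't taken from the chapter's datasets, and a production workflow would more likely use a library implementation.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Rank training indices by their distance to the query point.
    order = sorted(
        range(len(train_points)),
        key=lambda i: euclidean_distance(train_points[i], query),
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy two-dimensional data (hypothetical): two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["no_click", "no_click", "no_click",
          "click", "click", "click"]

print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
print(knn_predict(points, labels, (4.5, 5.2)))     # click
```

In two dimensions, `euclidean_distance` reduces to the familiar distance formula, and the prediction for a new point depends only on the labels of its `k` closest neighbors in the training data.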