Chapter 10. Example: digital display advertising


This chapter covers

  • Visualizing and preparing a real-world dataset
  • Building a predictive model of the probability that users will click a digital display advertisement
  • Comparing the performance of several algorithms in both training and prediction phases
  • Scaling by dimension reduction and parallel processing

Chapter 9 presented techniques that enable you to scale your machine-learning workflow. In this chapter, you’ll apply those techniques to a large-scale real-world problem: optimizing an online advertising campaign. We begin with a short introduction to the complex world of online advertising, the data that drives it, and some of the ways it’s used by advertisers to maximize return on advertising spend (ROAS). Then we show how to put some of the techniques in chapter 9 to use in this archetypal big-data application.

We employ several datasets in our example. Unfortunately, only a few large datasets of this type are available to the public. The primary dataset in our example isn’t available for download, and even if it were, it would be too large to process on a personal computer.

10.1. Display advertising

10.2. Digital advertising data

10.3. Feature engineering and modeling strategy

10.4. Size and shape of the data

10.5. Singular value decomposition

10.6. Resource estimation and optimization

10.7. Modeling

10.8. K-nearest neighbors

10.9. Random forests

10.10. Other real-world considerations

10.11. Summary

10.12. Terms from this chapter



recommender: A class of ML algorithms used to predict users’ affinities for various items.
collaborative filtering: Recommender algorithms that work by characterizing users via their item preferences, and items by the preferences of common users.
ensemble method: An ML strategy in which the independent predictions of multiple models are combined.
ensemble effect: The tendency of multiple combined models to yield better predictive performance than their individual components.
k-nearest neighbors: An algorithm that bases predictions on the nearest observations in the training space.
Euclidean distance: One of many ways of measuring distances in feature space. In two-dimensional space, it’s the familiar straight-line distance formula.
random forest: An ensemble learning method that fits multiple decision-tree classifiers or regressors to subsets of the training data and features and makes predictions from the combined model.
bagging: Short for bootstrap aggregating; the process of repeatedly sampling the training data with replacement and fitting a model to each sample, used by random forests and other algorithms.
stacking: Use of a machine-learning algorithm, often logistic regression, to combine the predictions of other algorithms into a final “consensus” prediction.
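To make the k-nearest neighbors and Euclidean distance entries concrete, here is a minimal sketch in plain Python. The function names and the toy click/no-click data are illustrative, not from the chapter’s dataset:

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points in feature space
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    # Rank training points by their Euclidean distance to the query point
    ranked = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], query))
    # Majority vote among the labels of the k nearest neighbors
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)

# Toy data: two clusters of user-feature vectors, labeled 0 (no click) and 1 (click)
train_X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
train_y = [0, 0, 1, 1]
print(knn_predict(train_X, train_y, (0.2, 0.1), k=3))  # prints 0
```

Note that prediction requires comparing the query to every training point, which is why the chapter treats k-nearest neighbors as a case study in prediction-phase scaling.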

10.13. Recap and conclusion