Chapter 17. Case study: Shop It To Me


This chapter covers

  • Speed and scale considerations for the classifier system
  • How the training pipeline was constructed
  • Restructuring classification for very high throughput

So far, the chapters on classification have presented an overview of classification, along with detailed explanations of what it takes to design and train a Mahout classifier, evaluate the trained models in order to tune performance to the desired level, and deploy the classifier into large-scale systems. This chapter puts all those topics into practice in a case study drawn from Shop It To Me (http://www.shopittome.com/), a real-world online marketing company that selected Mahout as its approach to classification. You’ll examine the issues its small engineering team faced in building and deploying a high-performance Mahout-based classifier, and see the solutions they came up with.

Chapter 16 focused on the scale requirements of the very large systems that are best served by Mahout classification. The case study in this chapter likewise deals with large data sets, but it also offers a look at a system whose speed requirements are extreme, even by Mahout standards. Meeting these scale and, above all, speed requirements pushed the team to devise some substantially innovative solutions. Overall, the system described in this chapter shows how you can push Mahout classifiers further than seems possible at first glance.

17.1. Why Shop It To Me chose Mahout

17.2. General structure of the email marketing system

17.3. Training the model

17.4. Speeding up classification

17.5. Summary