Chapter 16. Deploying a classifier

This chapter covers

  • Specifying the speed and size requirements of a classifier
  • Building a large-scale classifier
  • Delivering a high-speed classifier
  • Building and deploying a working classification server

This chapter examines the issues you face in putting a classification system into real-world production. Previous chapters discussed the advantages of using Mahout classifiers with large data sets, how the classifiers are trained, how they work internally, and how you can evaluate them. Those discussions purposely omitted many of the ways the real world intrudes when you need to put a classifier into production. In this chapter, we examine these real-world aspects as we explain how to deploy a high-speed classifier. We include tips on optimizing feature extraction and vector encoding, describe how to balance load and speed requirements, and explain how to build a training pipeline for very large systems of the kind Mahout classification is suited to. Finally, we walk through the deployment of a Thrift-based server that provides fully functional, load-balanced classification.

16.1. Process for deployment in huge systems

16.2. Determining scale and speed requirements

16.3. Building a training pipeline for large systems

16.4. Integrating a Mahout classifier

16.5. Example: a Thrift-based classification server

16.6. Summary