Chapter 9. Scaling machine-learning workflows
This chapter covers
- Determining when to scale up workflows for model accuracy and prediction throughput
- Avoiding unnecessary investments in complex scaling strategies and heavy infrastructure
- Ways to scale linear ML algorithms to large amounts of training data
- Approaches to scaling nonlinear ML algorithms—usually a much greater challenge
- Decreasing latency and increasing throughput of predictions
In real-world machine-learning applications, scalability is often a primary concern. Some ML-based systems must crunch new data and produce predictions within milliseconds, after which the predictions become useless (think of real-time applications such as stock trading or clickstream analysis). Other machine-learning applications need to scale during model training, learning from gigabytes or terabytes of data (think of learning a model from an internet-scale image corpus).
In previous chapters, you worked mostly with data that's small enough to fit, process, and model on a single machine. For many real-world problems, a single machine is sufficient, but plenty of applications require scaling to multiple machines, and sometimes to hundreds of machines in the cloud. This chapter is about choosing a scaling strategy and learning about the technologies involved.
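As a small preview of the training-side scaling techniques covered later, the sketch below uses scikit-learn's `SGDClassifier` with `partial_fit`, which updates a linear model one mini-batch at a time so the full dataset never has to fit in memory. The synthetic batch generator is a hypothetical stand-in for reading chunks of a dataset too large to load at once; the labeling rule and all parameter values are illustrative assumptions, not the book's example.

```python
# Sketch of out-of-core (streaming) training for a linear model.
# Assumption: the batch generator stands in for lazily reading
# mini-batches of a dataset too large to hold in memory.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def batches(n_batches=20, batch_size=500, n_features=10):
    # Yield (X, y) mini-batches; in practice these would be read
    # incrementally from disk or over the network.
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        # Illustrative linearly separable labeling rule.
        y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
        yield X, y

clf = SGDClassifier(random_state=0)
for X, y in batches():
    # classes must be supplied so the first call knows all labels.
    clf.partial_fit(X, y, classes=[0, 1])

# Evaluate on a held-out batch drawn from the same distribution.
X_test = rng.normal(size=(1000, 10))
y_test = (X_test[:, 0] + 0.5 * X_test[:, 1] > 0).astype(int)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

Because each `partial_fit` call touches only one batch, memory use stays constant no matter how large the stream is; this is the simplest of the linear-model scaling strategies the chapter goes on to discuss.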