1 Introduction to distributed machine learning systems
This chapter covers
- Handling the growing scale of large-scale machine learning applications.
- Establishing patterns for building scalable and reliable distributed systems.
- Leveraging existing distributed systems patterns to build reusable patterns that make distributed machine learning systems more scalable and reliable.
Machine learning systems are becoming increasingly important. Recommendation systems learn from user feedback and interactions to suggest items of potential interest in the right context; anomalous event detection systems monitor assets to help avoid downtime caused by extreme conditions; and fraud detection systems protect financial institutions from security attacks and fraudulent behavior.
There is increasing demand for large-scale distributed machine learning systems. If you are a data analyst, data scientist, or software engineer with basic knowledge of and hands-on experience in building machine learning models in Python, and you want to take a step further and learn how to build something more robust, scalable, and reliable, then this is the right book for you. While experience in production environments or distributed systems is not a requirement, we expect readers to have at least some exposure to machine learning applications running in production and to have been writing Python and Bash scripts for at least one year.