1 Introduction to distributed machine learning systems


This chapter covers

  • Handling the growing scale in large-scale machine learning applications
  • Establishing patterns to build scalable and reliable distributed systems
  • Using patterns in distributed systems and building reusable patterns

Machine learning systems are becoming increasingly important. Recommendation systems learn from user feedback and interactions to recommend items of potential interest in the right context; anomalous event detection systems help monitor assets to prevent downtime caused by extreme conditions; and fraud detection systems protect financial institutions from security attacks and malicious, fraudulent behavior.

There is increasing demand for building large-scale distributed machine learning systems. If you are a data analyst, data scientist, or software engineer with basic knowledge of and hands-on experience in building machine learning models in Python, and you want to go a step further by learning how to build something more robust, scalable, and reliable, this book is the right one to read. Although experience with production environments or distributed systems is not required, I expect readers to have at least some exposure to machine learning applications running in production and to have written Python and Bash scripts for at least one year.

1.1 Large-scale machine learning

1.1.1 The growing scale

1.1.2 What can we do?

1.2 Distributed systems

1.2.1 What is a distributed system?

1.2.2 The complexity and patterns

1.3 Distributed machine learning systems

1.3.1 What is a distributed machine learning system?

1.3.2 Are there similar patterns?

1.4 What we will learn in this book