- Using model serving to generate predictions or make inferences on new data with previously trained machine learning models.
- Handling a growing number of model serving requests and scaling horizontally with replicated model serving services.
- Processing large model serving requests by leveraging the sharded services pattern.
- Assessing model serving systems and determining whether event-driven design would be beneficial for improving resource efficiency.
4.1 What is model serving?
4.2 Replicated services pattern: Handling the growing number of serving requests
4.2.1 Problem
4.2.2 Solution
4.2.3 Discussion
4.2.4 Exercises
4.3 Sharded services pattern: Processing large model serving requests with high-resolution videos
4.3.1 Problem
4.3.2 Solution
4.3.3 Discussion
4.3.4 Exercises
4.4 Event-driven processing pattern: Responding to model serving requests based on events
4.4.1 Problem
4.4.2 Solution
4.4.3 Discussion
4.4.4 Exercises
4.5 References
4.6 Summary