4 Model serving patterns

 

This chapter covers

  • Using model serving to generate predictions or make inferences on new data with previously trained machine learning models.
  • Handling the growing number of model serving requests and achieving horizontal scaling with the help of replicated model serving services.
  • Processing large model serving requests by leveraging the sharded services pattern.
  • Assessing model serving systems and determining whether event-driven design would be beneficial for improving resource efficiency.

4.1 What is model serving?

4.2 Replicated services pattern: Handling the growing number of serving requests

4.2.1 Problem

4.2.2 Solution

4.2.3 Discussion

4.2.4 Exercises

4.3 Sharded services pattern: Processing large model serving requests with high-resolution videos

4.3.1 Problem

4.3.2 Solution

4.3.3 Discussion

4.3.4 Exercises

4.4 Event-driven processing pattern: Responding to model serving requests based on events

4.4.1 Problem

4.4.2 Solution

4.4.3 Discussion

4.4.4 Exercises

4.5 References

4.6 Summary