4 Running Inference
This chapter covers
- Generating different types of content
- Calculating the cost of running inference with a large model
- Identifying areas for performance improvements and cost savings
Chapters 2 and 3 provided examples of data preparation and fine-tuning for SLMs. This chapter introduces you to the SLM inference space, offers tips for estimating how much you will spend on GPU power, and points out where to look for potential performance and cost improvements.
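As a preview of the kind of cost reasoning this chapter develops, here is a minimal back-of-the-envelope sketch: given a throughput (tokens per second on a single GPU) and an hourly GPU rental rate, you can estimate the dollar cost of serving a workload. All figures below are hypothetical placeholders, not benchmarks from any specific model or provider.

```python
def estimate_inference_cost(
    num_requests: int,
    tokens_per_request: int,
    tokens_per_second: float,
    gpu_hourly_rate: float,
) -> float:
    """Rough dollar cost of serving a workload on one GPU.

    Assumes a steady, fully utilized GPU; real deployments add
    overhead for batching, idle time, and prompt processing.
    """
    total_tokens = num_requests * tokens_per_request
    gpu_hours = total_tokens / tokens_per_second / 3600
    return gpu_hours * gpu_hourly_rate


# Hypothetical workload: 7,200 requests of 500 tokens each,
# at 1,000 tokens/s on a GPU rented at $2.00/hour.
cost = estimate_inference_cost(7_200, 500, 1_000.0, 2.00)
print(f"Estimated cost: ${cost:.2f}")  # 3.6M tokens -> 1 GPU-hour -> $2.00
```

Later sections refine this picture; the key point is that inference cost scales with total tokens generated, so both throughput improvements and shorter outputs translate directly into savings.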