9 Optimizing cost and quality
This chapter covers
- Model choice and tuning
- Prompt engineering
- Fine-tuning models
Analyzing data with large language models is a great way to burn money quickly. If you’ve been using GPT-4 (or a similarly large model) for a while, you’ve probably noticed how fast the fees pile up, forcing you to recharge your account regularly. But do we always need to use the largest (and most expensive) model? Can’t we make smaller models perform almost as well? How can we get the most bang for our buck?
This chapter is about saving money when using language models on large data sets. Fortunately, we have quite a few options for doing so. First, we have lots of choices when it comes to large language models. Selecting a model that is as small (or, rather, as cheap) as possible while still performing well on our analysis task can go a long way toward balancing our budget. Second, models typically expose various tuning parameters, which control everything from the overall text generation strategy to the way specific tokens are prioritized or penalized. Optimizing those settings can turn small models into GPT-4 alternatives for certain tasks. Third, we can use prompt engineering to tweak the way we ask the model our questions, sometimes leading to surprisingly different results! Finally, we can fine-tune a model on examples of our specific task, specializing a cheaper model for the job at hand.
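To preview what these levers look like in code, here is a minimal sketch using OpenAI's Python client. The model name, the prompt, and the token ID passed to logit_bias are illustrative placeholders, not recommendations; we'll look at how to choose them deliberately throughout this chapter.

```python
from openai import OpenAI

# Assumes the OPENAI_API_KEY environment variable is set.
client = OpenAI()

response = client.chat.completions.create(
    # Lever 1: model choice. Pick a smaller, cheaper model than GPT-4.
    model="gpt-3.5-turbo",
    # Lever 2: tuning parameters. Temperature 0 makes generation greedy
    # (deterministic), which usually suits analysis tasks.
    temperature=0,
    # logit_bias de-prioritizes specific tokens (values from -100 to 100).
    # The token ID "1843" is purely illustrative; actual IDs depend on the
    # model's tokenizer.
    logit_bias={"1843": -100},
    # Lever 3: prompt engineering, i.e., how we phrase the question.
    messages=[
        {
            "role": "user",
            "content": "Classify the sentiment of this review as positive "
                       "or negative: 'The product broke after one day.'",
        }
    ],
)
print(response.choices[0].message.content)
```

Setting the temperature to 0 trades output diversity for reproducibility, which is usually what we want when extracting structured answers from data.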