3 Large language model operations: Building a platform for LLMs


This chapter covers

  • An overview of large language model operations
  • Deployment challenges
  • Large language model best practices
  • Required large language model infrastructure
Before anything else, preparation is the key to success.
—Alexander Graham Bell

As we learned in the last chapter, when it comes to transformers and natural language processing (NLP), bigger is better, especially when the model is linguistically informed. However, bigger models bring bigger challenges: regardless of their linguistic efficacy, their sheer size requires us to scale up our operations and infrastructure. In this chapter, we'll look at exactly what those challenges are, what we can do to minimize them, and what architecture we can set up to help solve them.

3.1 Introduction to large language model operations

3.2 Operations challenges with large language models

3.2.1 Long download times

3.2.2 Longer deploy times

3.2.3 Latency

3.2.4 Managing GPUs

3.2.5 Peculiarities of text data

3.2.6 Token limits create bottlenecks

3.2.7 Hallucinations cause confusion

3.2.8 Bias and ethical considerations

3.2.9 Security concerns

3.2.10 Controlling costs

3.3 LLMOps essentials

3.3.1 Compression

3.3.2 Distributed computing

3.4 LLM operations infrastructure

3.4.1 Data infrastructure

3.4.2 Experiment trackers

3.4.3 Model registry

3.4.4 Feature stores

3.4.5 Vector databases

3.4.6 Monitoring system

3.4.7 GPU-enabled workstations
