7 Optimizing Performance for Foundational Models
This chapter covers
- Understanding the challenges of compute and memory
- Evaluating and refining performance of foundational models
- Distributed computing approaches to achieving optimization
- Strategies for optimizing performance with Step Functions and AWS Lambda
Generative AI models, known for their immense scale and complexity, have unlocked unprecedented capabilities in areas like natural language processing and generative art. While these models drive advanced end-to-end solutions, they also present challenges in computational efficiency and resource utilization, making optimization essential.
This chapter delves into the multifaceted task of optimizing performance for foundational models, providing insight into the underlying challenges and the strategies that can be employed to address them. We will explore how AWS services complement the optimization process, in particular Step Functions, a serverless orchestration service that coordinates multiple AWS services into serverless workflows, and AWS Lambda, a serverless compute service that lets you run code without provisioning or managing servers. We will ground these concepts in a running scenario: performing a book review with Amazon Bedrock. By understanding these strategies, you will know when to apply which tools to get the best performance from your LLMs.