7 Optimizing Performance for Foundational Models
This chapter covers
- Understanding the challenges of compute and memory
- Evaluating and refining performance of foundational models
- Distributed computing approaches to achieving optimization
- Strategies for optimizing performance with Step Functions and AWS Lambda
Generative AI models, known for their immense scale and complexity, have unlocked unprecedented capabilities in areas like natural language processing and generative art. While these models drive advanced end-to-end solutions, they also present challenges in computational efficiency and resource utilization, making optimization essential.
This chapter delves into the multifaceted task of optimizing performance for foundational models, providing insight into the underlying challenges and the strategies that can be employed to address them. We will explore how AWS services complement the optimization process, in particular Step Functions, a serverless orchestration service that coordinates multiple AWS services into serverless workflows, and AWS Lambda, a serverless compute service that lets you run code without provisioning or managing servers. We will ground these concepts in a running scenario: performing a book review with Amazon Bedrock. By understanding these strategies, you will know when to apply which tools to get the best performance from your LLMs.