9 Creating an LLM project: Reimplementing Llama 2
This chapter covers
- Implementing Meta’s Llama 2 model
- Training a simple LLM
- Improving the model to prepare it for production
- Serving the model to a production endpoint you can share with your friends
For the first major project in the book, we wanted to start from scratch. We’ve been showing you how to work with LLMs from end to end, and in this chapter we’ll put it all together. This includes pretraining a model, roughly following a research paper. We won’t dive too deeply into the actual research; in fact, we’ll take several shortcuts, as the research itself isn’t the focus of this book. We will, however, showcase how to train the model, prepare it for serving with quantization, fine-tune it with Low-Rank Adaptation (LoRA) for a specific purpose or task, and deploy it to a production environment you can showcase to your friends.
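As a small preview of the fine-tuning step, LoRA's core idea is to freeze a pretrained weight matrix and learn only a low-rank update to it. This minimal NumPy sketch (a standalone illustration with made-up layer sizes, not the chapter's actual implementation) shows why the trainable parameter count shrinks so dramatically:

```python
import numpy as np

# Hypothetical layer dimensions and LoRA rank (illustrative only)
d_out, d_in, r = 512, 512, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection; zero-init so the update starts at zero

x = rng.standard_normal(d_in)
y = W @ x + B @ (A @ x)                  # LoRA forward pass: base output plus low-rank correction

full_params = W.size                     # parameters if we fine-tuned W directly
lora_params = A.size + B.size            # parameters LoRA actually trains
print(f"full: {full_params}, LoRA: {lora_params}, ratio: {lora_params / full_params:.4f}")
```

With rank 8 on a 512x512 layer, the trainable parameters drop from 262,144 to 8,192, about 3% of the original, which is what makes fine-tuning a large model feasible on modest hardware.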
This chapter will be very dense, but by this point you should be more than prepared to meet the challenge, as it is mainly a data scientist-focused project taken to production. We chose this project so that you can bring together all the lessons you’ve learned throughout the book in one place and leave with end-to-end, hands-on experience.