chapter fourteen

14 Project: Keeping family traditions alive with Airflow and Generative AI

This chapter covers

The concept of Retrieval-Augmented Generation (RAG).
Implementing Airflow tasks to populate a vector database with your content.
Retrieving relevant documents from a vector database using vector similarity search.
Using a large language model (LLM) to generate content based on your knowledge base.

In recent years, the Generative AI (GenAI) revolution has reshaped the way we create text, audio, and image content. GenAI systems have emerged as powerful tools capable of generating coherent, contextually relevant text that closely mimics human writing, opening new possibilities across sectors ranging from marketing and copywriting to education and customer service.

As we navigate this new era, the demand for high-quality, curated data has never been greater. Organizations and individuals alike are recognizing the importance of preparing and organizing their data, and of building pipelines that make it available to fuel GenAI applications.

Having high-quality data is paramount to building a good GenAI system or product, because poor input data inevitably leads to poor results. Fortunately, Airflow can play an important role in ensuring high-quality input data by automating the processes involved in data preparation. In this chapter, we'll explore Airflow's role in building robust GenAI solutions with an example use case involving family recipes.
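The retrieve-then-generate idea behind RAG, listed at the start of this chapter, can be previewed with a toy, self-contained sketch before we bring in Airflow and a real vector database. It assumes a bag-of-words term-count vector as a stand-in for a real embedding model and an in-memory dictionary as a stand-in for the vector database; `RECIPES`, `embed`, `retrieve`, and `build_prompt` are hypothetical names for illustration, not the chapter's code, and the actual LLM call is left out.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a vector database: recipe id -> recipe text.
RECIPES = {
    "grandmas-apple-pie": "Apple pie with cinnamon, butter crust and sliced apples.",
    "tomato-soup": "Creamy tomato soup with basil and garlic.",
    "pancakes": "Fluffy pancakes with flour, milk, eggs and butter.",
}


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def retrieve(question: str, k: int = 1) -> list[str]:
    """R: return the ids of the k recipes most similar to the question."""
    query = embed(question)
    ranked = sorted(
        RECIPES,
        key=lambda rid: cosine_similarity(query, embed(RECIPES[rid])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, k: int = 1) -> str:
    """A: augment the question with retrieved recipes; G would send this to an LLM."""
    context = "\n".join(f"- {RECIPES[rid]}" for rid in retrieve(question, k))
    return (
        "Answer using only the family recipes below.\n"
        f"Recipes:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("How do I make apple pie?"))
```

Running the script prints a prompt whose context contains only the apple pie recipe, since it is the only entry sharing words with the question. A real system would replace `embed` with an embedding model and the dictionary with a vector database, which is the part Airflow helps keep populated and up to date.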

14.1 Use case: bringing family recipes to life

14.2 Fine-tuning an existing LLM

14.3 RAG to the rescue

14.4 Uploading recipes to the Recipe Vault

14.5 Preprocessing the recipes with the DockerOperator

14.6 Creating a collection to store our recipes

14.6.1 Defining how to vectorize our text

14.6.2 Creating a schema for the collection

14.6.3 Preparing our collection of recipes

14.7 Updating and creating new records in the vector database

14.8 Deleting outdated records from the vector database

14.9 Adding recipes to the vector database

14.10 RAG in action

14.10.1 The R is for retrieving

14.10.2 Structuring our questions with prompt templates

14.10.3 Searching for recipes

14.11 Summary