13 Project: Keeping family traditions alive with Airflow and Generative AI

 

This chapter covers

  • The concept of Retrieval-Augmented Generation (RAG).
  • Implementing Airflow tasks to populate a vector database with your content.
  • Retrieving relevant documents from a vector database using vector similarity search.
  • Using a large language model (LLM) to generate content based on your knowledge base.
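The retrieval and generation steps in the list above can be shown without any external services. The sketch below is minimal and hypothetical. The recipe names, the three-dimensional "embeddings," and the helper functions `cosine_similarity`, `retrieve`, and `build_prompt` are invented for illustration. A real pipeline would get its vectors from an embedding model and store them in a vector database, not a Python dictionary. The sketch ranks stored documents by cosine similarity to a query vector, then adds the best matches to the user's question. The result is the kind of prompt an LLM would receive in a RAG setup.

```python
import math

# Hypothetical knowledge base: each recipe is paired with a hand-made vector.
# In a real pipeline an embedding model would produce these vectors; the
# three-dimensional values here are invented purely for illustration.
RECIPES = {
    "Grandma's apple pie": [0.9, 0.1, 0.0],
    "Sunday beef stew": [0.1, 0.9, 0.2],
    "Lemon drizzle cake": [0.8, 0.0, 0.3],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector, k=2):
    """Retrieval step: names of the k recipes most similar to the query."""
    ranked = sorted(
        RECIPES,
        key=lambda name: cosine_similarity(query_vector, RECIPES[name]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context_names):
    """Augmentation step: prepend the retrieved recipes to the question."""
    context = "\n".join(f"- {name}" for name in context_names)
    return (
        "Answer using only the recipes below.\n"
        f"Recipes:\n{context}\n"
        f"Question: {question}"
    )


# Assumed embedding of the question "What dessert can I bake?" (made up).
query = [0.85, 0.05, 0.2]
top = retrieve(query)
# This prompt is what would be sent to an LLM for the generation step.
print(build_prompt("What dessert can I bake?", top))
```

With these toy vectors, the two dessert recipes rank highest and the beef stew is left out of the prompt. Cosine similarity compares the direction of two vectors rather than their magnitude. That makes it a common choice for comparing text embeddings, though not the only one.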

In recent years, the Generative AI (GenAI) revolution has reshaped the way we create text, audio, and images. GenAI systems have emerged as powerful tools that generate coherent, contextually relevant text closely mimicking human writing. This opens new possibilities across many sectors, from marketing and copywriting to education and customer service.

As we navigate this new era, the demand for high-quality, curated data has never been greater. Organizations and individuals alike recognize the importance of preparing and organizing their data, and of building pipelines that make that data accessible to GenAI applications.

In this chapter, we will explore Apache Airflow’s role in the GenAI landscape. Airflow orchestrates GenAI data pipelines by automating the data preparation steps, such as preprocessing content and loading it into a vector database. This keeps the knowledge base that a GenAI application draws on up to date without manual effort.

13.1 Fine-tuning an existing LLM

13.2 RAG to the rescue

13.3 Uploading recipes to the Recipe Vault

13.4 Preprocessing the recipes with the DockerOperator

13.5 Creating a collection to store our recipes

13.5.1 Defining how to vectorize our text

13.5.2 Creating a schema for the collection

13.5.3 Preparing our collection of recipes

13.6 Updating and creating new records in the vector database

13.7 Deleting outdated records from the vector database

13.8 Adding recipes to the vector database

13.9 RAG in action

13.9.1 The R is for retrieving

13.9.2 Structuring our questions with prompt templates

13.9.3 Searching for recipes

13.10 Summary