1 Intro to enterprise RAG
This chapter covers
- Introducing Retrieval Augmented Generation (RAG)
- Understanding the difference between Naive RAG and Enterprise RAG
- Exploring why businesses need Enterprise RAG
- Showcasing real-world use cases of RAG in action
NOTE
If you are already familiar with RAG, you may want to skip to Chapter 3.
Have you ever wished for a magic helper who could give you the exact information you need, exactly when you need it, without any hassle? Let me tell you about my assistant, Raginald. He's not just any helper; he's incredibly quick and always knows precisely where to find what I'm looking for.
One day, I casually asked, "Raginald, could you please tell me how much Product XYZ costs?" Without missing a beat, he dashed downstairs to the archives. He ran through a maze of dusty shelves filled with old books. He reached for one specific volume, flipped it open to the exact page, and found the price of Product XYZ. Moments later, he was back at my side, beaming with pride. "The price for Product XYZ is two dollars and forty-five cents," he said with a confident smile.
Wouldn't it be amazing if getting information was always that easy? Imagine having access to any data you need in seconds, without the usual stress. That's where Retrieval Augmented Generation, or RAG, comes into the picture. It's like having a digital Raginald who can access all your company's information instantly, without any physical limitations.
So, what exactly is Retrieval Augmented Generation? Simply put, RAG is an advanced AI technology that combines the conversational skills of chatbots with real-time data retrieval from various sources. Think of it as a super-efficient assistant who understands your questions in plain English and knows exactly where to find the answers. You can ask, "What's the current status of Project GKR?" or "How many units of Product XYZ did we sell last quarter?" The AI understands your question, searches through your databases, and gives you an accurate answer—all in seconds.
But RAG doesn't just answer simple questions; it goes further by providing detailed, specific information tailored to your needs. It can adjust how it responds, adapting to whatever language you choose. It connects with various data sources—like databases, PDF documents, and apps like Slack—so no matter where your information is stored, RAG can find it and present it to you in the best way possible.
In RAG, an AI is hooked up to a search engine. Whenever a user asks a question, the AI interprets it, searches a database for relevant information, and then delivers the answer as a well-written sentence or paragraph. Take a look at the figure below to see how a RAG system interacts with an LLM.

But not all RAG setups are created equal. Many beginners start with what I like to call “Naive RAG,” which works like this: First, you convert the user’s question into an embedding, a numerical representation that captures the underlying meaning of the question rather than just its exact wording. These numerical codes act like “semantic fingerprints,” allowing a system to understand not only what you’re asking but also how it relates to other pieces of information.
Next, you compare that question embedding to a bunch of embeddings stored in a vector database—a specialized data store designed to hold and quickly search through these numerical representations. Unlike traditional databases that focus on exact keyword matches, vector databases retrieve information based on conceptual similarity. So even if you don’t use the exact words from the stored documents, the system can still find relevant content. The closest matching text is then passed back to the LLM, along with your question, so the model can craft an answer using the provided context. It’s a straightforward process, and you’ll find tons of how-to videos on YouTube walking you through this kind of setup.
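To make the mechanics concrete, here is a minimal sketch of the Naive RAG loop in Python. The embed() helper is a stand-in for a real embedding model, the in-memory list plays the role of a vector database, and the final prompt would normally be sent to an LLM; everything here is illustrative rather than production code.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model. A hash-seeded random
    vector lets the sketch run without any API calls."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(128)
    return vec / np.linalg.norm(vec)

# A list plays the role of the vector database: each entry pairs a
# text chunk with its embedding.
documents = [
    "Product XYZ costs $2.45 and is not flammable.",
    "Product ABC ships in 3-5 business days.",
]
index = [(doc, embed(doc)) for doc in documents]

def naive_rag(question: str) -> str:
    query_vec = embed(question)
    # Similarity search: pick the stored chunk closest to the question.
    best_doc, _ = max(index, key=lambda pair: float(pair[1] @ query_vec))
    # In a real system this prompt goes to an LLM, which writes the answer.
    return f"Answer using this context:\n{best_doc}\n\nQuestion: {question}"

print(naive_rag("Is Product XYZ flammable?"))
```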
In practice, though, Naive RAG often stumbles in a business setting. It might serve up the wrong data or churn out inaccurate answers when queries get complicated or your dataset is huge. Enter Enterprise RAG, which goes well beyond that simple approach. It systematically handles complex real-world challenges, like querying multiple databases, keeping data fresh, and clarifying ambiguous user questions. In short, it makes retrieval more efficient, more accurate, and far more user-friendly, even at enterprise scale. Finally, we'll look at real-world examples where RAG makes a significant impact, from small mom-and-pop stores to large corporations, healthcare providers, and educational institutions. By the end of this chapter, you'll see how RAG can transform the way you interact with data, making your work life not just easier but also more productive and enjoyable.
Throughout this book, we'll guide you step-by-step on how to build your own Enterprise RAG system. By the end, you'll have the knowledge and tools to implement a scalable RAG solution tailored to your business needs, and you will be empowered to harness the full potential of AI-driven data retrieval.

1.1 A brief intro to RAG
Now that we've introduced the concept of RAG and how it can serve as your digital Raginald, let's see how it can transform your daily operations. Imagine a customer calls your company asking about a product: "Is Product XYZ flammable?"
Without RAG, you'd have a big task ahead of you. First, you'd need to figure out which of your company's many databases holds the product information, a big job by itself. Then, you'd have to write a precise search query, making sure to spell the product name exactly right; any mistake could mean no results. Once you've pulled up the product information, you'd need to manually look through it all to see if Product XYZ is flammable. After all that work, which could easily take substantial time and effort because of messy data and multiple databases, you'd finally have an answer for the customer.
To illustrate this process, take a look at figure 1.1. It shows the traditional method of handling such a query without a RAG system. You'll notice how time-consuming and complex the steps are, involving manually writing search queries and combing through data.
Figure 1.1 Traditional manual workflow for retrieving answers, requiring database queries, corrections, and manual review. This process is time-consuming and requires significant effort.

But what if you tried using Naive RAG to help? As we learned earlier, Naive RAG works by taking a question, turning it into numbers (we call this embedding), and then searching for similar information in your data. It then uses a language model to try to answer your question based on what it finds. Sounds helpful, right? But here's the catch: Naive RAG can sometimes be worse than not using RAG at all. It often pulls up the wrong information because it doesn't use the right search methods. Even when it finds the right information, it might mix things up or "hallucinate," giving you an answer that's not accurate. The majority of online tutorials teach this kind of RAG, and businesses are not getting great results with it.
For our current example, you might ask the Naive RAG, "I like dogs. Dogs are cute, dogs are funny, dogs are my favorite thing in the world. Speaking of dentists, do you sell knitted hats for cats?" The Naive RAG incorrectly focuses on the word "dog" and retrieves a record for a book named "How to Teach Your Dog About A.I.", but no information that could help answer your question. Stumped, the Naive RAG hallucinates an answer: "Yes, we have knitted hats for cats. They are made of alpaca hair and cost $479 each." This is even worse than returning no information, because incorrect information can sound plausible.
Refer to figure 1.2 to see how Naive RAG attempts to streamline the information retrieval process. While it seems more efficient on the surface, the flowchart does not show Naive RAG's potential pitfalls, like inaccurate data retrieval and misinformation due to the lack of contextual understanding.
Figure 1.2 Basic RAG process with embedding, vector search, and a large language model. This simple approach is efficient but prone to errors and lacks context handling.

Now, let's see how the scenario plays out with Enterprise RAG. You simply type into the chatbot: "I like dogs. Dogs are cute, dogs are funny, dogs are my favorite thing in the world. Speaking of dentists, do you sell knitted hats for cats?" Behind the scenes, the AI gets to work. It uses natural language processing to understand your question, even though you used indirect language. The AI then smartly turns your question into an optimized search, checking all the relevant databases at once. For example, it might focus on "knitted hats for cats" to make sure it searches effectively. In about ten to thirty seconds, the chatbot responds in a full sentence: "Yes, we sell knitted hats for cats. Each hat costs around $20 and they are made from wool yarn."
In this example, Enterprise RAG cut your search time by a whopping 90%, from five minutes to just half a minute. Figure 1.3 illustrates how Enterprise RAG enhances the entire RAG process. The flowchart shows a more capable, reliable system that reduces manual effort and increases accuracy by making full use of AI’s advanced capabilities.
Figure 1.3 Enterprise RAG pipeline improves speed, accuracy, and scalability by incorporating validation, query rewriting, and asynchronous agents, reducing response times to around 30 seconds.

This setup isn’t just about speeding things up—it’s about making everything more accurate, less complex, and accessible for everyone in your company, no matter their technical expertise.
Here’s how it works. When a user submits a question, the system kicks things off with Input Validation. This step checks that the question makes sense and follows any needed format or content guidelines, which helps avoid misunderstandings and errors down the line. For instance, it can filter out overly lengthy or too-short queries, setting up a smoother process from the beginning.
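As a rough illustration, input validation can start as a few guard clauses. The length limits below are placeholders, not recommendations; a real system would tune them and likely add content checks.

```python
MAX_CHARS = 1_000   # arbitrary cap; tune for your own system
MIN_CHARS = 3       # filters out empty or accidental inputs

def validate_input(question: str) -> str:
    """Reject questions that are empty, too short, or too long before
    they reach the more expensive stages of the pipeline."""
    question = question.strip()
    if len(question) < MIN_CHARS:
        raise ValueError("Question is too short to be meaningful.")
    if len(question) > MAX_CHARS:
        raise ValueError("Question exceeds the maximum allowed length.")
    return question
```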
Next, we head into the Question Triage stage. Here, the system categorizes the question based on what it’s about and how complex it is, deciding on the best route for retrieving information. If someone asks about a specific product, that might set off a sequence where agents search through a product database. A question about an order might go down a different path entirely. This triage step is crucial because it ensures the question follows the right path, making the entire retrieval process more efficient.
Once categorized, the question may go through Query Rewriting. This step is like giving the question a quick makeover to better match the language and keywords of the databases it’s searching. If a user asks, “What’s the warranty on my X24 blender?” this stage might rephrase it to “X24 blender warranty” to perform a more precise search. Query rewriting is especially helpful when users ask questions informally, boosting the chances of finding spot-on answers.
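As a sketch, query rewriting can be a single LLM call. This example assumes the OpenAI Python client with an API key in the environment; the model name is a placeholder, and any capable model works.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def rewrite_query(user_question: str) -> str:
    """Compress a conversational question into a keyword-style
    search query using an LLM."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable model works
        messages=[
            {"role": "system",
             "content": "Rewrite the user's question as a short search "
                        "query containing only the essential keywords."},
            {"role": "user", "content": user_question},
        ],
    )
    return response.choices[0].message.content.strip()

# rewrite_query("What's the warranty on my X24 blender?")
# might return something like "X24 blender warranty"
```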
Now, with a refined query, the system activates Asynchronous Agents backed by a High-Quality Language Model to start the search. These agents conduct Enterprise Search across multiple sources at the same time to bring back results quickly and accurately. The system performs a search using an AI Search Index, which combines keyword search and vector search. And if nothing relevant turns up, the system prompts the user to rephrase or clarify their question, keeping the search focused and avoiding delays.
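Here is a sketch of that fan-out using asyncio, with two hypothetical sources whose sleep calls stand in for real database and search-index latency.

```python
import asyncio

async def search_product_db(query: str) -> list[str]:
    await asyncio.sleep(0.1)  # stands in for a real database query
    return [f"product-db result for '{query}'"]

async def search_document_index(query: str) -> list[str]:
    await asyncio.sleep(0.2)  # stands in for a keyword + vector search
    return [f"search-index result for '{query}'"]

async def enterprise_search(query: str) -> list[str]:
    # Fan out to every source at once rather than one after another.
    per_source = await asyncio.gather(
        search_product_db(query),
        search_document_index(query),
    )
    hits = [hit for source in per_source for hit in source]
    if not hits:
        # In the full pipeline this becomes a request to rephrase.
        raise LookupError("No results; ask the user to clarify.")
    return hits

print(asyncio.run(enterprise_search("knitted hats for cats")))
```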
When the data comes in, it is sorted in the Order and Filter Results stage. This step ensures that only relevant data is passed along, and orders it so that the system's answers are more consistent.
Finally, we reach the Writer Agent. This agent pulls together the most relevant information and crafts it into a clear, polished response. It’s like having an editor on standby, ensuring the answer is both accurate and easy to understand, ready for the user’s immediate needs.
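A minimal sketch of these last two stages follows. The min_score and top_k defaults are arbitrary placeholders, and the writer agent is reduced to prompt assembly; in practice that assembled prompt goes to an LLM, which writes the final answer.

```python
def order_and_filter(hits: list[tuple[str, float]],
                     min_score: float = 0.5,
                     top_k: int = 5) -> list[str]:
    """Drop low-relevance hits and keep a stable, score-sorted order so
    the writer agent always sees consistent context."""
    kept = [(text, score) for text, score in hits if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept[:top_k]]

def writer_prompt(question: str, context: list[str]) -> str:
    """Assemble the prompt the writer agent would send to an LLM."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return ("Using only the context below, answer the question in a "
            "clear, friendly way.\n\n"
            f"Context:\n{joined}\n\nQuestion: {question}")
```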
From end to end, this system is designed to be flexible and scalable. It handles both structured data, like SQL records, and unstructured information, such as documents, allowing it to grow alongside your business. Enterprise RAG evolves with you, adapting as new data sources are added and your information needs expand.
Think of this RAG system as your digital assistant, pulling the information you need instantly so your team can stay focused on meaningful work. It transforms data retrieval from a cumbersome task into a seamless experience, boosting productivity, enhancing customer service, and improving communication across your organization. By integrating Enterprise RAG, you unlock the full potential of your data, making it accessible to everyone and taking your business’s efficiency to new heights.

1.2 The difference between naive RAG and enterprise RAG
Let’s dive a little deeper into the difference between Naive RAG and Enterprise RAG, so we can understand why, in a business environment, Naive RAG just won’t cut it. Remember our trusty assistant, Raginald? He's the one who dashes off to fetch any piece of information I need, returning with the answer in a few seconds. We've talked about how Retrieval Augmented Generation, or RAG, makes this possible in a simple setting. But what happens when we try to scale this up for an entire company? Well, that's when things get a bit more interesting.
In its simplest form, RAG is pretty straightforward. It's like having Raginald at your beck and call. If I ask him, "What's the price of product XYZ?" he knows exactly where to find that information and delivers it promptly. There are countless tutorials online showing how to implement this basic version of RAG. You don't need to be a tech wizard to get it up and running. It's accessible, efficient, and works well for very simple tasks. Figure 1.4 illustrates the simplicity of the Naive RAG pipeline.
Figure 1.4 A naive RAG pipeline with limited steps for retrieving answers. Suitable for simple queries but insufficient for handling complex or large-scale enterprise needs.

Now, let's take this concept and apply it to a large company with vast amounts of data, multiple departments, and diverse customer needs. Suddenly, our simple setup starts to look a bit overwhelmed. Imagine Raginald trying to manage not just a single bookshelf but an entire library filled with countless volumes. Here's where he starts to sweat. Each step in the process comes with its own set of challenges, turning a once-simple task into a complex operation. These considerations are depicted in Figure 1.5, highlighting the critical questions to address when building an enterprise-grade RAG system.
Figure 1.5 Key questions for designing enterprise RAG systems, addressing user input limits, database performance, context accuracy, and feedback management for better scalability and reliability.

Consider the variety of questions that come into a global company. Questions come in all shapes and sizes, and handling this diversity is a significant challenge. For instance, customers might ask questions in different languages. A company operating internationally can't expect all its customers to speak the same language, so the system needs to understand and process multilingual queries. Imagine a customer from Spain asking about a product in Spanish, while another from Japan inquires in Japanese. The RAG needs to comprehend both and provide accurate responses.
Efficient and accurate data retrieval is another hurdle. With vast databases containing millions of records, retrieving the right information quickly is critical. The system needs to know which databases to search and must ensure that the data is up-to-date. Customers today expect near-instant responses; slow retrieval times can lead to frustration and a poor user experience.
Accuracy is paramount. Providing incorrect or irrelevant information can harm the customer's trust in the company. Imagine if a customer asks about the availability of a product, and the system incorrectly tells them it's out of stock when it's actually available. Such errors can lead to lost sales and damage the company's reputation.
Trusting the Large Language Model (LLM) that powers the AI adds another layer of complexity. LLMs are powerful but not infallible. Sometimes they generate information that sounds plausible but isn't true, a phenomenon known as "hallucination" in AI terms. For instance, the AI might confidently state that a product has certain features it doesn't actually have. Just like people, AI can make mistakes, and companies need to anticipate and minimize these errors. This is especially important because there can be legal implications; if the AI provides incorrect information that leads to customer dissatisfaction or harm, the company might be held liable. For example, Air Canada was legally required to honor a promise its AI chatbot made to a customer (https://www.bbc.com/travel/article/20240222-air-canada-chatbot-misinformation-what-travellers-should-know).
Delivering the final answer to the user is the last part of the process, but it's just as important as the earlier steps. Ensuring that the AI provides consistent answers over time and across similar queries helps build trust and reliability.
Sometimes, the AI won't have an answer, and it needs a graceful way to communicate this. It's also important for the AI to communicate to the user its capabilities and limitations so that users have realistic expectations.
Managing costs is another critical aspect of scaling RAG for enterprise use. Implementing features like agents or rewriting queries can enhance the accuracy of responses but often come at an additional expense. Monitoring their usage and optimizing workflows helps control costs. Backend compute and search services can become expensive, especially at scale. Regularly assessing usage patterns and employing cost-effective infrastructure can make a significant difference in managing the budget.
Proper safeguards, or guardrails, are essential to maintain the integrity of the RAG system. The AI must be instructed to avoid using inappropriate language or making statements that could lead to negative publicity. Imagine if the AI accidentally used profanity or made an insensitive remark—it could make the company go viral for all the wrong reasons. Guardrails need to be in place to prevent the system from saying anything that could harm the company's reputation.
Protecting sensitive information is also critical. The AI should not accidentally leak confidential data, such as personal customer information or proprietary company details. This includes having strict access controls and setting policies for how data can be retrieved and shared.
Bringing all these elements together, scaling Retrieval Augmented Generation for enterprise use isn't just about making the system bigger; it's about making it smarter, more reliable, and more user-friendly. The journey from a Naive RAG setup to an enterprise-level solution is filled with challenges, but the rewards are significant. Companies can provide faster, more accurate customer service, make data-driven decisions more efficiently, and stay competitive in a rapidly evolving market.
As for Raginald, he's not just running up and down the stairs anymore. In this enterprise scenario, he's now equipped with a jetpack, multilingual dictionaries, and a network of fellow assistants—all working together to keep the company’s data retrieval operations running smoothly. He's become a symbol of the advanced, efficient, and responsive AI systems that modern enterprises need to succeed.
So, while the challenges are many, the potential benefits of scaling RAG are enormous. It's about embracing technology to enhance human capabilities, much like giving Raginald that jetpack. With thoughtful implementation and a focus on addressing the complexities, companies can transform their operations and set themselves up for success in the age of AI.

1.3 Why businesses need enterprise RAG
We've spent some time with our reliable assistant, Raginald. While having a real-life assistant like Raginald sounds amazing, it's not exactly practical—or budget-friendly—for most businesses. That's where Enterprise RAG steps in. Enterprise RAG isn't just about speed; it's about empowering your employees with instant access to the information they need to perform their jobs effectively. With Enterprise RAG, answers are just a question away, enabling employees to focus on what they do best.
This instant access to information leads to more streamlined operations. Quick access to data means decisions can be made faster, projects can move forward sooner, and opportunities aren't missed because of delays in information retrieval. Picture an investor evaluating the latest news about a stock: with a RAG system, they can pull together all the relevant headlines and market data in moments, allowing them to make a buy-or-sell decision far more promptly (and accurately). That kind of speed can be the difference between a winning trade and a missed chance.
Moreover, when everyone in an organization has access to the information they need, collaboration becomes smoother, and productivity soars. Teams can share insights effortlessly, coordinate tasks efficiently, and align their efforts toward common goals. For instance, a marketing department can instantly access customer feedback collected by the customer service team, enabling them to adjust their strategies in real time. This level of collaboration enhances innovation and drives business growth.
Time is of the essence, especially when it comes to customer service. The quicker you can resolve issues, the happier your customers will be, and the more efficient your operations become. Consider a scenario where a customer contacts support with a complex issue. Traditionally, the support agent might need to search through multiple databases, consult with other departments, or put the customer on hold to find the necessary information. This process can be time-consuming and frustrating for both the agent and the customer.
With Enterprise RAG, the support agent can access all relevant information almost instantly. Companies that have implemented RAG often see significant reductions in customer service resolution times; LinkedIn's customer support team, for example, reduced its time-per-issue by 28.6% (https://www.evidentlyai.com/blog/rag-examples). This leads to happier customers who feel valued and heard, and it boosts the morale and productivity of support teams, who can now handle inquiries more efficiently. Faster resolution times not only improve customer satisfaction but also allow support teams to focus on more complex issues that require human insight.
RAG is breaking down barriers by being the first AI application that's universally useful across all types of businesses. Whether you're a large corporation handling vast amounts of data across multiple departments, a small mom-and-pop store looking to improve customer interactions, or a medium-sized business aiming to scale operations flexibly, RAG provides tools that can adapt to your specific needs.
By drawing information from multiple sources, RAG provides a comprehensive and accurate response, reducing the risk of missing critical details. This versatility means that employees spend less time gathering information from different systems and more time making meaningful use of the data. It fosters a more informed workforce, capable of making better decisions and driving innovation.
However, before diving in, it's important to consider a few key factors—including the cost of implementation. Setting up a RAG system isn't just about installing software; it involves expenses related to renting or acquiring the necessary hardware and software, as well as ongoing operational costs. For instance, you will need to invest in cloud computing resources to handle the processing demands of advanced AI models.
Additionally, the technical skills required to set up and maintain the system are an important consideration. Implementing RAG often requires expertise in AI, machine learning, and data engineering. This could mean hiring new staff with the right skills or investing in training for your existing team.
It's also important to acknowledge that implementing RAG isn't a plug-and-play solution. Your company will need to invest in data preparation, software development, and ongoing maintenance to ensure the system runs effectively. AI technology evolves rapidly, and keeping your RAG system up-to-date may require regular updates, which come with their own costs.
Weighing these considerations carefully—including the financial investment and resource allocation—can help you determine if RAG is the right fit for your business. When implemented thoughtfully and with a clear understanding of the costs involved, Enterprise RAG offers a path to greater efficiency and success. It's not just a technological advancement; it's a strategic asset that can transform your business operations and position you for long-term growth.
Imagine a world where every employee has a digital assistant as efficient and enthusiastic as Raginald. That's the promise of Enterprise RAG—a scalable, cost-effective solution that brings the benefits of instant information retrieval to your entire organization. Whether you're a startup looking to make your mark or an established company aiming to streamline operations, Enterprise RAG offers significant benefits that can outweigh the initial costs, leading to greater efficiency and success.

1.4 Example use cases
To bring the concept to life, we'll look at real-world examples where RAG makes a significant impact across various industries. Imagine you run a small coffee roastery with just a few popular bean varieties. One morning, you ask your RAG system a straightforward question: “Did we run out of Ethiopian beans yet, and how soon should we reorder?” In seconds, the system consults your purchase logs stored in a PDF inventory sheet, along with sales data from your records, and responds, “You brought in 20 bags of Ethiopian beans last week, and you’ve sold 15—mostly over the past three days. Your supplier typically takes a week to deliver, so we might want to reorder soon.” Instead of manually checking documents or spreadsheets, you get an instant, reliable answer that helps you stay ahead of demand and keep your customers happy.
Scaling up to a large corporation, suppose you're an executive at a major beverage company, keen to stay ahead of industry trends. You ask, "What questions were asked of our competitor during their last earnings call?" The RAG chatbot efficiently compiles the information: "During the last earnings call, analysts asked about their plans for international expansion, the impact of rising prices on production costs, and their strategy for embracing sustainable packaging." Armed with this insight, you can tailor your strategies, address similar concerns proactively, and perhaps even outmaneuver your competition.
In a busy hospital, doctors need quick access to patient data and medical research. A physician might ask, "What's the latest on Patient Smith's test results, and are there any new studies on her condition?" RAG responds promptly: "Patient Smith's latest tests show improved kidney function and stabilized blood pressure. A recent study published this month suggests a new treatment protocol that could be beneficial for her condition." This rapid retrieval of patient information and relevant research can enhance patient care and potentially save lives.
In the fast-paced business world, staying on top of financial performance is crucial. A business owner might inquire, "Summarize the key takeaways from my sales, P&L, and revenue reports." RAG delivers: "Sales have increased 8% compared to last quarter, with strong growth in the online segment. However, profit margins are slightly down due to rising supply costs. Revenue overall shows steady improvement, but inventory costs need closer attention." With this summary, you can make informed decisions quickly, focusing on strategic adjustments without having to comb through all the detailed financial reports.
Educators can also benefit from RAG. Imagine a university professor preparing for a lecture who asks, "Find recent case studies on renewable energy adoption in developing countries." RAG provides: "Here are three case studies from the past year focusing on solar energy initiatives in Kenya, wind farms in Chile, and hydroelectric projects in Laos, highlighting their economic and social impacts." This saves hours of research time, allowing the professor to craft a more engaging and informative lecture.
These examples highlight how RAG is transforming the way we interact with data. By providing immediate access to information, enhancing productivity, improving customer service, and offering cost savings, RAG becomes an invaluable asset for any organization. Its universal applicability means that no matter the size or sector of your business, RAG can be adapted to meet your needs and help you achieve your goals more effectively.

1.5 Building a RAG system
Now that we have a good understanding of what a RAG system does, we will implement our own from scratch. See figure 1.6 for an overview of all the components we will be building. We will start in Chapter 2 by building evals, which you can see at the bottom of the flowchart. Then in Chapter 3 we will move on to the first kind of ingestion, which optimizes documents for retrieval from a vector database. You can't retrieve anything if you don't have something to retrieve! After that we will work on the second kind of ingestion in Chapter 4, which transforms structured data into easily searchable records. The next step will be building the retrieval mechanism in Chapter 5, which will involve the two previously built search services and a new set of agents. Finally, we will tackle the generation side of the system, summarizing the retrieved data and returning an answer.
Figure 1.6 Enterprise RAG system architecture showing ingestion, retrieval, and generation steps. Raw data is preprocessed, embedded, and searched to deliver accurate, context-aware answers.

Let's go over each step using an example. Imagine you are an electronics shop owner and you sell a wide range of products, from hair dryers to TV sets and music systems. Each product has multiple variations, and all of them come with their own user manuals. Buyers and users often ask specific questions about the products that are very difficult for any human to remember. Questions like "What is the battery life of XM5 headphones?" or "What is the warranty period of the MegaCorp hair dryer?" are very specific, and they must be answered before buyers can decide what to purchase.
In such a scenario, a RAG-based question-answer system would be very helpful. Let’s go over each step to understand how to build a RAG system over hundreds of electronic product user manuals.
Ingestion: We preprocess the user manual PDFs so we can serve up relevant information to our language model. Using Python libraries like PyPDF2, we start by extracting text from each PDF. Then we split (or "chunk") the content into bite-sized sections, because giving an LLM huge, unstructured documents often leads to poor answers or wasted tokens. By making these chunks logically meaningful, splitting them by headings, paragraphs, or certain word counts, we help the LLM narrow down exactly which part of the manual to reference.
Why chunk? Smaller pieces give our system clearer context. Instead of passing in entire 30-page PDFs, we pass in just the chunk that talks about, say, “Lubrication of gears,” which drastically increases accuracy and cuts down on noise.
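Here is what that might look like with PyPDF2. The file name is hypothetical, and the fixed word-count splitter is the simplest possible strategy; splitting on headings or paragraphs usually gives more meaningful chunks.

```python
from PyPDF2 import PdfReader

def extract_pages(pdf_path: str) -> list[str]:
    """Pull the raw text out of every page of a manual."""
    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]

def chunk_text(text: str, max_words: int = 200) -> list[str]:
    """Split text into fixed-size chunks by word count."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

pages = extract_pages("xm5_manual.pdf")  # hypothetical file name
chunks = [chunk for page in pages for chunk in chunk_text(page)]
```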
Once we have these text chunks, we tag them with metadata—like the product name, topic, or page number—so later on, the system can quickly filter by relevant fields. This makes retrieval much faster, since it doesn’t have to wade through every single chunk. After tagging, we embed each chunk using a high-quality embedding model. Embeddings turn your text into numerical vectors that capture meaning in a format that can be stored in a vector database, right alongside that handy metadata. Then, when a user query arrives, the system looks for embeddings that match the query and returns precisely the chunk (and context) the user needs, complete with a quick link to the source.
An example of metadata for a sample chunk: {"product name": "Aetheraxis XM5 headphones", "topic": "battery maintenance", "page number": 8}. If the user asks a question about the battery for Aetheraxis XM5 headphones, this metadata allows us to zero in on the most relevant information without searching through hundreds of manuals.
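Continuing the sketch, each chunk can be embedded and bundled with its metadata into a record ready for a vector database. This assumes the OpenAI embeddings client; the model name and sample chunk are placeholders, and the metadata keys are normalized to snake_case for code.

```python
from openai import OpenAI

client = OpenAI()

def embed_chunks(texts: list[str]) -> list[list[float]]:
    """Turn text chunks into embedding vectors."""
    response = client.embeddings.create(
        model="text-embedding-3-small",  # placeholder model name
        input=texts,
    )
    return [item.embedding for item in response.data]

# Sample chunk standing in for the output of the chunking step above.
chunks = ["The XM5 battery delivers 8 hours of continuous playback."]

# Each record pairs a chunk with its vector and metadata, ready to be
# inserted into whichever vector database you choose.
records = [
    {"text": chunk,
     "vector": vector,
     "metadata": {"product_name": "Aetheraxis XM5 headphones",
                  "topic": "battery maintenance",
                  "page_number": 8}}
    for chunk, vector in zip(chunks, embed_chunks(chunks))
]
```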
You can also choose to avoid all these hassles and let a managed service like Azure AI Search take care of this. Managed search services provide you with a user interface to ingest your raw text into searchable records with different options and preferences.
Retrieval: Let's move to the other side of the story, where we focus on inference, or querying our RAG system. Not all users are prompt engineers, so user queries can sometimes be vague or confusing, and it is our job to transform them into search terms that find correct and relevant information. Let's say the user asks, "For how many hours can my aetheraxis xm5 headphones run after a full charge?" This is difficult to search directly because our user manuals are written in very technical language, so we use an LLM to rewrite the question into a search query: "Aetheraxis XM5 headphones battery life". Now we ask AI agents to use this search term to query our vector database and retrieve the top three to five chunks that we embedded in the previous steps. These chunks act as reference material, or context, for our LLM to generate an answer.
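A minimal sketch of the similarity search behind this step, assuming records shaped like the ones built during ingestion; in practice the vector database performs this ranking for you.

```python
import numpy as np

def top_k(query_vector: list[float], records: list[dict],
          k: int = 3) -> list[dict]:
    """Rank stored records by cosine similarity to the query vector
    and return the k closest, mimicking a vector database query."""
    query = np.asarray(query_vector)
    query = query / np.linalg.norm(query)

    def score(record: dict) -> float:
        vec = np.asarray(record["vector"])
        return float(vec @ query / np.linalg.norm(vec))

    return sorted(records, key=score, reverse=True)[:k]
```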
Generation: This is the final step of our RAG system. We now have two things: the user query and the relevant context (the retrieved data chunks). All we need is to convert that information into a format that is understandable to our readers and that answers their question. Since the user asked the question in a conversational way, "For how many hours can my aetheraxis xm5 headphones run after a full charge?", we need to make sure the system generates an answer that is equally conversational and easy to understand. Let's say our retrieved chunk contains the following text:
“Aetheraxis XM5 headphones come with an exceptional battery life of 8 hours on a single charge with continuous playback music and 14 hours of normal usage which is 40% better than its previous model.”
We cannot provide this text directly as the answer, since it is a little dense and technical. However, when we pass this context to our LLM along with the original user query and our system prompt, the LLM rephrases it into a final response like this:
“Aetheraxis XM5 headphones come with an amazing battery life which is 40% better than its previous model. These headphones can run for 14 hours of normal usage after full charge. If you play music continuously, the battery may run out in 8 hours and need to be charged.”
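A sketch of that generation call, again assuming the OpenAI client and a placeholder model name; the system prompt pins the model to the retrieved context.

```python
from openai import OpenAI

client = OpenAI()

def generate_answer(question: str, context: str) -> str:
    """Ask the LLM to rephrase the retrieved chunk into a friendly,
    conversational answer grounded only in that chunk."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Answer the customer's question using only the "
                        "provided context. Be friendly and conversational."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

chunk = ("Aetheraxis XM5 headphones come with an exceptional battery life "
         "of 8 hours on a single charge with continuous playback music and "
         "14 hours of normal usage.")
print(generate_answer("For how many hours can my aetheraxis xm5 headphones "
                      "run after a full charge?", chunk))
```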
Since we also attached metadata to our chunks, we can provide it alongside the response in case the user wants to verify the information. This lets the user find the exact document and page where the information was found. LLM services like Perplexity use this technique to build user trust in their answers.
By implementing a RAG system, the shop owner transforms a vast library of product manuals into an intelligent assistant capable of providing quick, accurate, and conversational answers to customer questions. This not only improves customer satisfaction by delivering information quickly but also frees up valuable time that can be redirected toward other critical business operations. The RAG system effectively turns complex technical documents into accessible knowledge, enabling better decision-making for buyers and elevating the overall shopping experience. In the following chapters, we'll go deeper into each of these steps, guiding you through the process of building a powerful RAG system tailored to your needs.

1.6 Summary
- Retrieval Augmented Generation (RAG) is an advanced AI technology that combines conversational skills with real-time data retrieval, like an efficient assistant.
- RAG allows users to ask questions in plain language and receive detailed, specific information tailored to their needs, accessing data from databases, documents, and applications like Slack.
- Naive RAG, while easy to set up, often falls short in business environments due to misunderstandings of context, retrieving incorrect data, or providing inaccurate ("hallucinated") answers.
- Enterprise RAG is designed to handle complex business scenarios, accurately processing diverse questions in different languages and grasping user intent.
- Implementing Enterprise RAG leads to streamlined operations, faster decision-making, improved collaboration, and enhanced customer service by resolving issues quickly.
- The book will guide readers step-by-step in building their own Enterprise RAG system, empowering them to harness the full potential of AI-driven data retrieval.