Chapter 3

3 Summarizing text using LangChain


This chapter covers

  • Summarization of large documents exceeding the LLM’s context window
  • Summarization across multiple documents

In chapter 1, we explored three major LLM application types: summarization engines, chatbots, and AI agents. In this chapter, you’ll begin building practical summarization chains using LangChain, with a particular focus on the LangChain Expression Language (LCEL) to handle various real-world scenarios. A chain is a sequence of connected operations where the output of one step becomes the input for the next—ideal for automating tasks like summarization. This work lays the foundation for constructing a more advanced summarization engine in the next chapter.

Summarization engines are essential for automating the summarization of large document volumes, a task that would be impractical and costly to handle manually, even with tools such as ChatGPT. Starting with a summarization engine is a practical entry point for developing LLM applications, providing a solid base for more complex projects and showcasing LangChain’s capabilities, which we’ll further explore in later chapters.

Before we start building, we’ll look at different summarization techniques, each suited to specific scenarios, including large documents, content consolidation, and handling structured data. You’ve already worked with summarizing small documents using a PromptTemplate in chapter 2, so we’ll skip that and focus on more complex examples.

3.1 Summarizing a document bigger than the context window

3.1.1 Chunking the text into Document objects

3.1.2 Split

3.1.3 Map

3.1.4 Reduce

3.1.5 MapReduce combined chain

3.1.6 MapReduce execution

3.2 Summarizing across documents

3.2.1 Creating a list of Document objects

3.2.2 Wikipedia content

3.2.3 File-based content

3.2.4 Creating the Document list

3.2.5 Progressively refining the final summary

3.3 Summarization flowchart

Summary