Chapter 3

3 Summarizing text using LangChain


This chapter covers

  • Summarization of large documents exceeding the LLM’s context window
  • Summarization across multiple documents

In chapter 1, we explored three major LLM application types: summarization engines, chatbots, and AI agents. In this chapter, you’ll begin building practical summarization chains using LangChain, with a particular focus on the LangChain Expression Language (LCEL) to handle various real-world scenarios. A chain is a sequence of connected operations where the output of one step becomes the input for the next—ideal for automating tasks like summarization. This work lays the foundation for constructing a more advanced summarization engine in the next chapter.

Summarization engines are essential for automating the summarization of large document volumes, a task that would be impractical and costly to handle manually, even with tools such as ChatGPT. Starting with a summarization engine is a practical entry point for developing LLM applications, providing a solid base for more complex projects and showcasing LangChain’s capabilities, which we’ll further explore in later chapters.

Before we start building, we’ll look at different summarization techniques, each suited to specific scenarios, including large documents, content consolidation, and handling structured data. You’ve already worked with summarizing small documents using a PromptTemplate in chapter 2, so we’ll skip that and focus on more complex examples.

3.1 Summarizing a document bigger than the context window

3.1.1 Chunking the text into Document objects

3.1.2 Split

3.1.3 Map

3.1.4 Reduce

3.1.5 MapReduce combined chain

3.1.6 MapReduce execution

3.2 Summarizing across documents

3.2.1 Creating a list of Document objects

3.2.2 Wikipedia content

3.2.3 File-based content

3.2.4 Creating the Document list

3.2.5 Progressively refining the final summary

3.3 Summarization flowchart

Summary