3 Summarizing text using LangChain


This chapter covers

  • Summarization of large documents exceeding the LLM’s context window
  • Summarization across multiple documents
  • Summarization of structured data

In Chapter 1, you explored three major LLM application types: summarization engines, chatbots, and autonomous agents. In this chapter, you'll begin building practical summarization chains using LangChain, with a particular focus on the LangChain Expression Language (LCEL) to handle various real-world scenarios. A chain is a sequence of connected operations where the output of one step becomes the input for the next—ideal for automating tasks like summarization. This work lays the foundation for constructing a more advanced summarization engine in the next chapter.
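The chain idea can be sketched in plain Python before introducing any LangChain code. The sketch below is a toy stand-in, not LangChain's actual API: `build_prompt`, `fake_llm`, and `parse` are hypothetical placeholders for the prompt, model, and output-parser steps that a real LCEL pipeline composes with the `|` operator.

```python
from functools import reduce

def chain(*steps):
    """Compose steps left to right: each step's output is the next step's input."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)

# Hypothetical stand-ins for the stages of a prompt | llm | parser pipeline.
build_prompt = lambda text: f"Summarize the following text:\n\n{text}"
fake_llm = lambda prompt: prompt.splitlines()[-1][:40]  # placeholder "model"
parse = str.strip  # stand-in for an output parser

summarize = chain(build_prompt, fake_llm, parse)
print(summarize("LangChain chains connect operations in sequence."))
```

In real LCEL the composition is written `prompt | llm | parser`, but the data flow is the same: each step consumes the previous step's output.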

Summarization engines are essential for automating the summarization of large document volumes, a task that would be impractical and costly to handle manually, even with tools like ChatGPT. Starting with a summarization engine is a practical entry point for developing LLM applications, providing a solid base for more complex projects and showcasing LangChain’s capabilities, which we’ll further explore in later chapters.

3.1 Summarizing a document bigger than the context window

3.1.1 Chunking the text into Document objects

3.1.2 Split

3.1.3 Map

3.1.4 Reduce

3.1.5 Map-reduce combined chain

3.1.6 Map-reduce execution

3.2 Summarizing across documents

3.2.1 Creating a list of Document objects

3.2.2 Wikipedia content

3.2.3 File-based content

3.2.4 Creating the Document list

3.2.5 Progressively refining the final summary

3.3 Summarization flowchart

3.4 Summary