3 Summarizing text using LangChain

This chapter covers

  • Summarization of large documents exceeding the LLM’s context window
  • Summarization across multiple documents
  • Summarization of structured data

In Chapter 1, we explored three core LLM applications: summarization engines, chatbots, and autonomous agents. In this chapter, we'll build concrete summarization chains with LangChain, primarily through the LangChain Expression Language (LCEL), each tailored to a specific scenario. This will lay the foundation for the more advanced summarization engine we'll develop in the next chapter.

Summarization engines automate the summarization of large volumes of documents, a task that isn't feasible to handle manually, even with tools like ChatGPT. A summarization engine is also a practical entry point for developing LLM applications: it provides a solid base for more complex projects and showcases LangChain's capabilities, which we'll explore further in later chapters.

Before we start building, we'll look at different summarization techniques, each suited to a specific scenario: large documents, content consolidation, and structured data. Since you already summarized a small document with a PromptTemplate in section 1.3.2, we'll skip that case and focus on more complex examples.

3.1 Summarizing a document bigger than the context window

As mentioned in Chapter 2, each LLM has a maximum prompt size, also referred to as its "context window." A document that exceeds this limit can't be summarized in a single call, so we'll split it into chunks, summarize each chunk independently (the map step), and then combine the partial summaries into one final summary (the reduce step).
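
Before splitting anything, it helps to check whether a document actually fits. Here's a quick sketch, assuming the langchain-openai package, an example model name, and a hypothetical report.txt input file:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # example model name

with open("report.txt") as f:  # hypothetical input file
    text = f.read()

# get_num_tokens counts tokens using the model's own encoding
print(f"document is {llm.get_num_tokens(text)} tokens")

If the count exceeds the model's context window, we need the map-reduce approach developed in the rest of this section.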

3.1.1 Chunking the text into Document objects
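
LangChain represents each chunk as a Document, a small object that carries the chunk's text in page_content along with arbitrary metadata. Here's a minimal sketch of building one by hand (the field values are illustrative):

from langchain_core.documents import Document

doc = Document(
    page_content="LLMs have a limited context window ...",  # the chunk's text
    metadata={"source": "report.txt"},  # metadata keys are up to you
)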

3.1.2 Split
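
In practice we don't build Document objects by hand; a text splitter produces them for us. Here's a sketch using RecursiveCharacterTextSplitter from the langchain-text-splitters package (the chunk sizes are illustrative; tune them so each chunk plus the prompt fits your model's context window):

from langchain_text_splitters import RecursiveCharacterTextSplitter

text = open("report.txt").read()  # hypothetical input file

splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,    # characters per chunk -- illustrative value
    chunk_overlap=200,  # overlap preserves context across chunk boundaries
)
docs = splitter.create_documents([text])  # returns a list of Document objects
print(f"split into {len(docs)} chunks")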

3.1.3 Map
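
The map step summarizes every chunk independently, so it parallelizes well. Here's a sketch of the per-chunk LCEL chain, reusing the docs list from the split step (the prompt wording and model name are assumptions):

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # example model name

map_prompt = ChatPromptTemplate.from_template(
    "Write a concise summary of the following text:\n\n{text}"
)
map_chain = map_prompt | llm | StrOutputParser()

# batch() runs the chain over all chunks, in parallel where possible
partial_summaries = map_chain.batch(
    [{"text": doc.page_content} for doc in docs]
)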

3.1.4 Reduce
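
The reduce step folds those partial summaries into a single final summary. Here's a sketch that reuses llm and partial_summaries from the map step (again, the prompt wording is an assumption):

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

reduce_prompt = ChatPromptTemplate.from_template(
    "Combine the following partial summaries into one coherent summary:"
    "\n\n{summaries}"
)
reduce_chain = reduce_prompt | llm | StrOutputParser()

# join the per-chunk summaries into one prompt input
final_summary = reduce_chain.invoke(
    {"summaries": "\n\n".join(partial_summaries)}
)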

3.1.5 Map-reduce combined chain
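
We can compose the two steps into a single runnable that accepts a list of Document objects and returns the final summary. One way to glue them together is with RunnableLambda; this sketch assumes map_chain and reduce_chain as defined above:

from langchain_core.runnables import RunnableLambda

map_reduce_chain = (
    # map: summarize each chunk
    RunnableLambda(lambda docs: map_chain.batch(
        [{"text": d.page_content} for d in docs]
    ))
    # shape the partial summaries into the reduce prompt's input
    | RunnableLambda(lambda summaries: {
        "summaries": "\n\n".join(summaries)
    })
    | reduce_chain
)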

3.1.6 Map-reduce execution
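
Running the combined chain is then a single invoke call over the chunked document:

final_summary = map_reduce_chain.invoke(docs)
print(final_summary)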

3.2 Summarizing across documents

3.2.1 Creating a list of Document objects

3.2.2 Wikipedia content
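
LangChain's community loaders can pull article text straight from Wikipedia. Here's a sketch using WikipediaLoader, which requires the wikipedia package (the query is an example):

from langchain_community.document_loaders import WikipediaLoader

# each loaded page becomes a Document with page_content and metadata
wiki_docs = WikipediaLoader(
    query="Large language model",  # example query
    load_max_docs=2,
).load()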

3.2.3 File-based content
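
Local files get the same treatment through TextLoader, which reads a text file into a single Document (the path here is hypothetical):

from langchain_community.document_loaders import TextLoader

file_docs = TextLoader("notes.txt").load()  # hypothetical file path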

3.2.4 Creating the Document list
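
Because every loader returns plain Document objects, building the combined list is just concatenation:

# wiki_docs and file_docs as loaded above
all_docs = wiki_docs + file_docs
print(f"{len(all_docs)} documents in total")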

3.2.5 Progressively refining the final summary
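
With the refine pattern, we summarize the first document and then walk through the rest, asking the model to update the running summary with each new document. Here's a minimal LCEL sketch, reusing the all_docs list from above (the prompt wording and model name are assumptions):

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # example model name

first_chain = (
    ChatPromptTemplate.from_template(
        "Write a concise summary of the following text:\n\n{text}"
    )
    | llm
    | StrOutputParser()
)
refine_chain = (
    ChatPromptTemplate.from_template(
        "Here is an existing summary:\n\n{summary}\n\n"
        "Refine it using this additional context:\n\n{text}"
    )
    | llm
    | StrOutputParser()
)

# seed with the first document, then refine with each remaining one
summary = first_chain.invoke({"text": all_docs[0].page_content})
for doc in all_docs[1:]:
    summary = refine_chain.invoke(
        {"summary": summary, "text": doc.page_content}
    )

The tradeoff versus map-reduce: refine runs sequentially, so it's slower, but each step sees the accumulated summary, which often preserves continuity across documents.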

3.3 Summarization flowchart

3.4 Summary