4 Summarizing text using LangChain

This chapter covers

Summarization of large documents exceeding the LLM’s context window
Summarization across documents
Summarization of structured data

In chapter 1, we discussed the three main solutions for most LLM needs: the summarization (or query) engine, the chatbot, and the autonomous agent. This chapter lays the groundwork for a summarization engine, which we'll develop further in the next chapter. Subsequent chapters will focus on chatbots and knowledge agents.

Summarization engines come in handy when you regularly need to summarize large numbers of documents. Doing this manually, even with tools like ChatGPT, isn't practical at that volume.

A summarization engine is a simple yet practical starting point for LLM application development. It provides building blocks for more complex applications and gives us a chance to showcase LangChain functionality we'll use in upcoming chapters.

Before diving into building a summarization engine, it's essential to understand various summarization techniques. We'll explore methods tailored to different scenarios, considering factors like document size, content consolidation, and handling structured data such as table rows.
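To make the large-document scenario concrete before we reach the LangChain version, here is a minimal plain-Python sketch of the map-reduce idea covered in section 4.1: split the text into chunks, summarize each chunk (map), then summarize the combined partial summaries (reduce). The names `chunk_text`, `map_reduce_summarize`, and `truncate_summarizer` are hypothetical and used only for illustration. In particular, `truncate_summarizer` is a stand-in for an LLM call that simply keeps the first few words, so the sketch runs without a model or an API key.

```python
from typing import Callable, List


def chunk_text(text: str, chunk_size: int) -> List[str]:
    """Split text into chunks of at most chunk_size words (hypothetical helper)."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def truncate_summarizer(text: str, max_words: int = 2) -> str:
    """Stand-in for an LLM call (assumption): keep only the first max_words words."""
    return " ".join(text.split()[:max_words])


def map_reduce_summarize(
    text: str,
    summarize: Callable[[str], str],
    chunk_size: int = 200,
) -> str:
    """Summarize each chunk (map), then summarize the joined partial summaries (reduce)."""
    chunks = chunk_text(text, chunk_size)
    partial_summaries = [summarize(chunk) for chunk in chunks]  # map step
    return summarize("\n".join(partial_summaries))              # reduce step
```

In a real summarization engine, the `summarize` argument would wrap a prompt sent to an LLM, and the chunk size would be chosen so that each chunk, plus the prompt, fits within the model's context window.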

You already summarized a small document using a PromptTemplate in the LangChain introduction in section 1.7.2, so we can skip that scenario here.

4.1 Summarizing a document bigger than the context window

4.1.1 Chunking the text into Document objects

4.1.2 Map

4.1.3 Reduce

4.1.4 Map-reduce orchestration

4.1.5 Map-reduce execution

4.2 Summarizing across documents

4.2.1 Creating a list of Document objects

4.2.2 Wikipedia content

4.2.3 File-based content

4.2.4 Creating the Document list

4.2.5 Progressively refining the final summary

4.3 Summarizing structured data

4.3.1 Retrieving the structured data

4.3.2 Creating the Document list

4.3.3 Summarizing the records

4.4 Summarization flowchart

4.5 Summary
