
Appendix B. LLM-Powered Summarization

In the previous appendix we added /search, giving the news reader the ability to find articles with a fuzzy vector search instead of relying solely on keywords. The key infrastructure for that was Qdrant, a vector database that stores high-dimensional representations of each article, and Ollama, a local model server running nomic-embed-text to convert article text into those vectors. But returning a ranked list of article titles still puts the burden on the reader to figure out what matters. This appendix takes the next step: using those search results to generate a human-readable summary with a Large Language Model (LLM).

Qdrant will continue to provide retrieval, finding articles related to the topic we want to summarize. Ollama gains a second responsibility: alongside running nomic-embed-text for embedding, it also serves llama3, a text generation model that synthesizes the retrieved articles into prose. The domain.Storage and domain.Searchable interfaces from the previous appendix require no changes; the new summary handler consumes them directly. Our focus is to add a new summarizer package and a new HTTP handler that leverage the LLM, while keeping the design flexible enough to swap in other models later.

B.1 LLM Technologies

B.1.1 Ollama: Running Models Locally

B.1.2 LangChain: A Consistent Interface for LLMs

B.2 Infrastructure Setup

B.3 Creating the Summary Module

B.3.1 Config and Initialization

B.3.2 The Summarize Method

B.3.3 Building the Prompt

B.4 Wiring It into the API

B.4.1 What the Handler Needs

B.4.2 The Summary Handler

B.4.3 Updating the API Main

B.5 End-to-End Test

B.6 What to Try Next

B.7 Summary