chapter eleven

11 Building the qualitative engine with news analysis and LLMs

This chapter covers

Building a Retrieval-Augmented Generation (RAG) pipeline for qualitative market signals
Cleaning and unifying multi-source news data for consistent retrieval
Embedding text into vector representations with auditable metadata
Designing structured prompts to extract quantifiable signals from context
Deploying the “LLM Analyst” to generate Policy Tone, Supply Risk, and Novelty scores

In Chapter 10, we meticulously engineered our quantitative engine. By transforming a universe of exchange-traded fund (ETF) price and volume data into predictive features, we trained a machine learning model to decipher the market's numerical language. That engine listens to the rhythms of price, momentum, and correlation. But the market doesn't speak only in numbers; it also speaks in narratives, fears, and expectations. A central bank's subtle shift in tone, a breakthrough technological announcement, or a sudden geopolitical flare-up: these are the qualitative events that numbers alone often fail to capture until it's too late.

11.1 The new playbook: from market numbers to market narratives

11.1.1 The hard truth: a reality check on LLMs in investment research

11.1.2 Our philosophy: the LLM as a probabilistic analyst, not a deterministic oracle

11.1.3 The strategist's hunt for alpha: where qualitative signals shine

11.1.4 Designing the qualitative engine architecture

11.2 Implementing the news ingestion and processing pipeline

11.2.1 The data sourcing reality: building our own information infrastructure

11.2.2 Our pragmatic toolkit: a four-source blend

11.2.3 Implementing the fetcher functions

11.3 Building the RAG feature pipeline: from raw news to retrieval-ready vectors

11.3.1 Data unification and cleaning

11.3.2 Chunking strategy: choosing the right granularity

11.3.3 Text embedding with OpenAI

11.3.4 Vector database storage with ChromaDB

11.3.5 Verification: trust, but verify the pipeline

11.4 Building the “LLM Analyst”: from retrieval to quantified signals

11.4.1 Retrieval strategy: a two-step process for context and precision

11.4.2 Structured prompt engineering: the art of instructing an analyst

11.4.3 The final output: a structured, auditable signal

11.5 Summary