chapter eleven

11 Building the qualitative engine with news analysis and LLMs

This chapter covers

  • Building a Retrieval-Augmented Generation (RAG) pipeline for qualitative market signals
  • Cleaning and unifying multi-source news data for consistent retrieval
  • Embedding text into vector representations with auditable metadata
  • Designing structured prompts to extract quantifiable signals from context
  • Deploying the “LLM Analyst” to generate Policy Tone, Supply Risk, and Novelty scores

In Chapter 10, we meticulously engineered our quantitative engine. By transforming a universe of exchange-traded fund (ETF) price and volume data into predictive features, we trained a machine learning model to decipher the market's numerical language. That engine listens to the rhythms of price, momentum, and correlation. But the market doesn't speak only in numbers; it also speaks in narratives, fears, and expectations. A central bank's subtle shift in tone, a breakthrough technological announcement, or a sudden geopolitical flare-up: these are the qualitative events that numbers alone often fail to capture until it's too late.

11.1 The new playbook: from market numbers to market narratives

11.1.1 The hard truth: a reality check on LLMs in investment research

11.1.2 Our philosophy: the LLM as a probabilistic analyst, not a deterministic oracle

11.1.3 The strategist's hunt for alpha: where qualitative signals shine

11.1.4 Designing the Qualitative Engine architecture

11.2 Implementing the news ingestion and processing pipeline

11.2.1 The data sourcing reality: building our own information infrastructure

11.2.2 Our pragmatic toolkit: a four-source blend

11.2.3 Implementing the fetcher functions

11.3 Building the RAG feature pipeline: from raw news to retrieval-ready vectors

11.3.1 Data unification and cleaning

11.3.2 Chunking strategy: choosing the right granularity

11.3.3 Text embedding with OpenAI

11.3.4 Vector database storage with ChromaDB

11.3.5 Verification: trust, but verify the pipeline

11.4 Building the “LLM Analyst”: from retrieval to quantified signals

11.4.1 Retrieval strategy: a two-step process for context and precision

11.4.2 Structured prompt engineering: the art of instructing an analyst

11.4.3 The final output: a structured, auditable signal

11.5 Summary