13 Causality and large language models

 

This chapter covers

  • Using causal information in LLMs to enhance a causal analysis
  • Connecting the components of an LLM to causal ideas
  • Building a causal LLM

Large language models (LLMs) represent a significant advancement in the field of artificial intelligence. These models are large neural networks designed to generate and understand human-readable text. They are “large” because their scale is truly impressive—cutting-edge LLMs have parameters numbering in the billions or even trillions. As generative models, their main function is to generate coherent and contextually relevant natural language. They can also generate structured text, such as programming code, markup languages, mathematical notation, database queries, and many other useful things in text form.

LLMs are just one example of a broad class of generative AI models. For example, we can use the neural network architecture underlying cutting-edge LLMs to model other sequences, such as time series or DNA. LLMs are a type of foundation model, meaning a large-scale model that serves as a base or foundation upon which more specialized models or applications can be built. Some LLMs are multimodal, meaning they work with text as well as other content modalities, such as images. In this chapter, we’ll focus specifically on LLMs, but much of what we discuss generalizes to these related models.

To start, let’s explore some ways of using LLMs to enhance a causal analysis.

13.1 LLMs as a causal knowledgebase

13.1.1 Building a causal DAG

13.1.2 Generating code for DAGs, models, and causal analyses

13.1.3 Explanations and mechanism

13.1.4 The causal frame problem and AI alignment

13.1.5 Understanding and contextualizing causal concepts

13.1.6 Formalization of causal queries

13.1.7 Beware: LLMs hallucinate

13.2 A causality-themed LLM primer

13.2.1 A probabilistic ML view of LLMs

13.2.2 The attention mechanism

13.2.3 From tokens to causal representation

13.2.4 Hallucination, attention, and causal identification

13.3 Forging your own causal LLM

13.3.1 An LLM for script writing

Summary