
13 Causality and Large Language Models

 

This chapter covers

  • Using causal information in LLMs to enhance a causal analysis
  • Connecting the components of an LLM to causal ideas
  • Building a causal large language model

Large Language Models (LLMs) represent a significant advancement in the field of artificial intelligence. These models are large neural networks designed to generate and understand human-like text. They are “large” because their scale is truly impressive: cutting-edge LLMs have parameters numbering in the billions or even trillions. As generative models, their main function is to generate coherent and contextually relevant natural language. They can also generate structured text, such as programming code, markup languages, mathematical symbols, database queries, and many other useful things in text form.
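
For instance, the short sketch below prompts an LLM to return a causal DAG as a machine-readable edge list. The query_llm helper is a hypothetical stand-in for whichever LLM client you use (a hosted API or a local model); the canned response it returns is only there so the sketch runs end to end.

import json

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for your LLM client (hosted API or local model).
    Returns a canned response here so the sketch runs end to end."""
    return '[["smoking", "tar deposits"], ["tar deposits", "lung cancer"]]'

prompt = (
    "List the direct causal relationships among smoking, tar deposits, "
    "and lung cancer as a JSON array of [cause, effect] pairs. "
    "Return only the JSON."
)

edges = json.loads(query_llm(prompt))   # parse the structured text into edges
for cause, effect in edges:
    print(f"{cause} -> {effect}")        # e.g., smoking -> tar deposits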

LLMs are just one example of a broad class of generative AI. For example, we can use the neural network architecture underlying cutting-edge LLMs to model other sequences, such as time series or DNA sequences. LLMs are a type of foundation model: a large-scale model that serves as a base upon which more specialized models or applications can be built. Some LLMs are multi-modal, meaning they work with text as well as other content modalities, such as images. In this chapter, we’ll focus specifically on LLMs, but much of what we discuss can be generalized to these related ideas.

13.1 LLMs as a causal knowledge base

13.1.1 Building a causal DAG

13.1.2 Generating code for DAGs, models, and causal analyses

13.1.3 Explanations and mechanism

13.1.4 Solving the causal frame problem

13.1.5 Understanding and contextualizing causal concepts

13.1.6 Formalization of causal queries

13.1.7 Beware: LLMs hallucinate

13.2 A causality-themed LLM primer

13.2.1 A probabilistic ML view of LLMs

13.2.2 The attention mechanism

13.2.3 From tokens to causal representation

13.2.4 Hallucination, attention, and causal identification

13.3 Forging your own causal LLM

13.3.1 An LLM for script writing

13.3.2 Using pre-trained models for causal Markov kernels

13.3.3 Sampling from the interventional and observational distributions

13.3.4 Closing thoughts

13.4 Summary