13 Causality and large language models
This chapter covers
- Using causal information in LLMs to enhance a causal analysis
- Connecting the components of an LLM to causal ideas
- Building a causal LLM
Large language models (LLMs) represent a significant advancement in artificial intelligence. These models are large neural networks designed to generate and understand human-readable text. They are "large" because of their scale: cutting-edge LLMs have parameter counts in the billions or even trillions. As generative models, their main function is to generate coherent and contextually relevant natural language. They can also generate structured text, such as programming code, markup languages, mathematical notation, database queries, and many other useful textual formats.
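To make this concrete, here is a minimal sketch of text generation with an LLM. It uses the Hugging Face transformers library and the small GPT-2 model as a stand-in for a cutting-edge LLM; both are illustrative choices, not anything this chapter depends on.

```python
from transformers import pipeline

# Load a small language model as a stand-in for a cutting-edge LLM.
generator = pipeline("text-generation", model="gpt2")

# Given a prompt, the model generates a contextually relevant continuation.
prompt = "A large language model is"
result = generator(prompt, max_new_tokens=25, do_sample=True)
print(result[0]["generated_text"])
```

The same interface works for much larger models (hardware permitting); only the model identifier changes.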
LLMs are just one example of a broad class of generative AI. For example, we can use the neural network architecture underlying cutting-edge LLMs to model other sequences, such as time series or DNA. LLMs are a type of foundation model, meaning a large-scale model that serves as a base upon which more specialized models or applications can be built. Some LLMs are multimodal, meaning they work with text as well as other content modalities, such as images. In this chapter, we'll focus specifically on LLMs, but much of what we discuss generalizes to these related ideas.
To start, let's explore some ways to use LLMs to enhance a causal analysis.