1 How AI works

This chapter covers

  • The way large language models (LLMs) process inputs and generate outputs
  • The transformer architecture that powers LLMs
  • Different types of machine learning
  • How LLMs and other AI models learn from data
  • How convolutional neural networks process media such as images, video, and audio
  • Combining different types of data (e.g., generating images from text)

This chapter will help you understand how AI works and get you up to speed with many foundational AI topics. Since the latest AI boom, many of these topics, such as “embeddings” and “temperature,” are now widely discussed not just by AI practitioners but also by businesspeople and the general public. This chapter demystifies them.

Rather than simply piling up definitions and textbook explanations, this chapter takes a more opinionated approach. It points out common AI problems, misconceptions, and limitations based on my experience working in the field, and it shares some interesting insights you might not be aware of. For example, we’ll discuss why language generation is more expensive in French than in English, and how OpenAI hires armies of human workers to help manually “tame” ChatGPT. So even if you already know all the topics covered in this chapter, reading it may still give you a different take on them.
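As a small preview of the token discussion in section 1.2, the sketch below counts tokens for an English sentence and a rough French equivalent using tiktoken, OpenAI’s open-source tokenizer library. The sentences and the cl100k_base encoding are just illustrative choices; exact counts vary by tokenizer, but French text typically splits into more tokens than comparable English text.

# A minimal sketch: count tokens for the same idea in English and in French.
# Requires: pip install tiktoken
import tiktoken

# cl100k_base is one example encoding (used by several OpenAI models);
# other tokenizers will produce different counts.
enc = tiktoken.get_encoding("cl100k_base")

english = "How are you doing today?"
french = "Comment allez-vous aujourd'hui ?"

for text in (english, french):
    tokens = enc.encode(text)
    print(f"{len(tokens)} tokens: {text!r}")

Because most LLM APIs bill per token, those extra tokens translate directly into extra cost, which is why generation tends to be more expensive in French; section 1.2 returns to this in detail.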

1.1 How large language models (LLMs) work

1.1.1 Text generation

1.1.2 End of text

1.1.3 Chat

1.1.4 The system prompt

1.1.5 Calling external software functions

1.1.6 Retrieval-augmented generation (RAG)

1.2 The concept of tokens

1.2.1 One token at a time

1.2.2 Billed by the token

1.2.3 What about languages other than English?

1.2.4 Why do LLMs need tokens anyway?

1.3 Embeddings: A way to represent meaning

1.3.1 Machine learning and embeddings

1.3.2 Visualizing embeddings

1.3.3 Why embeddings are useful

1.3.4 Why LLMs struggle to analyze individual letters

1.4 The transformer architecture

1.4.1 Step 1: Initial embeddings

1.4.2 Step 2: Contextualization

1.4.3 Step 3: Predictions

1.4.4 Temperature

1.4.5 Can you get an LLM to always output the same thing?

1.4.6 Where to learn more

1.5 Machine learning

1.5.1 Deep learning

1.5.2 Types of machine learning

1.5.3 How LLMs are trained (and tamed)

1.5.4 Loss

1.5.5 Stochastic gradient descent (SGD)

1.6 Convolutions (images, video, and audio)

1.7 Multi-modal AI

1.8 No Free Lunch

1.9 Summary