2 Mathematical foundations of reinforcement learning

In this chapter

You will learn about the core components of reinforcement learning.
You will learn to represent sequential decision-making problems as reinforcement learning environments using a mathematical framework known as Markov decision processes.
You will build, from scratch, environments that reinforcement learning agents will learn to solve in later chapters.

Mankind’s history has been a struggle against a hostile environment. We finally have reached a point where we can begin to dominate our environment. ... As soon as we understand this fact, our mathematical interests necessarily shift in many areas from descriptive analysis to control theory.

— Richard Bellman, American applied mathematician and IEEE Medal of Honor recipient

You pick up this book and decide to read one more chapter despite having limited free time. A coach benches their best player for tonight’s match, ignoring criticism from the press. A parent invests long hours of hard work and unlimited patience in teaching their child good manners. These are all examples of complex sequential decision-making under uncertainty.

Components of reinforcement learning

Examples of problems, agents, and environments

The agent: The decision maker

The environment: Everything else

Agent-environment interaction cycle

MDPs: The engine of the environment

States: Specific configurations of the environment

Actions: A mechanism to influence the environment

Transition function: Consequences of agent actions

Reward signal: Carrots and sticks

Horizon: Time changes what’s optimal

Discount: The future is uncertain, value it less

Extensions to MDPs

Putting it all together

Summary