2 mathematical foundations of reinforcement learning


In this chapter:

  • You learn about the core components of reinforcement learning.
  • You learn to represent sequential decision-making problems as reinforcement learning environments using a mathematical framework known as Markov Decision Processes.
  • You build from scratch environments that reinforcement learning agents learn to solve in later chapters.
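As a preview of the environments built later in the chapter, an MDP can be captured in a plain transition dictionary. The sketch below is illustrative only (the three-state "walk" and its rewards are made up for this example); it uses the `(probability, next state, reward, done)` tuple convention that OpenAI Gym exposes through an environment's `env.P` attribute.

```python
import random

# A minimal sketch of an MDP's dynamics: state -> action ->
# list of (probability, next_state, reward, done) tuples.
# Three-state walk: state 0 is a hole, state 1 the start, state 2 the goal.
# This toy layout is an assumption for illustration, not a specific
# environment from the chapter.
P = {
    0: {0: [(1.0, 0, 0.0, True)],    # hole: terminal, self-loops
        1: [(1.0, 0, 0.0, True)]},
    1: {0: [(1.0, 0, 0.0, True)],    # action 0: move left into the hole
        1: [(1.0, 2, 1.0, True)]},   # action 1: move right to the goal, +1
    2: {0: [(1.0, 2, 0.0, True)],    # goal: terminal, self-loops
        1: [(1.0, 2, 0.0, True)]},
}

def step(state, action):
    """Sample one transition from the MDP's dynamics."""
    transitions = P[state][action]
    probs = [t[0] for t in transitions]
    i = random.choices(range(len(transitions)), weights=probs)[0]
    _, next_state, reward, done = transitions[i]
    return next_state, reward, done
```

Because every transition here has probability 1.0, `step(1, 1)` always returns `(2, 1.0, True)`; stochastic environments simply list several tuples per action with probabilities summing to 1.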

Mankind’s history has been a struggle against a hostile environment. We finally have reached a point where we can begin to dominate our environment [...]. As soon as we understand this fact, our mathematical interests necessarily shift in many areas from descriptive analysis to control theory.

— Richard Bellman, American applied mathematician and IEEE Medal of Honor recipient

You pick up this book and decide to read one more chapter despite having limited free time; a coach benches their best player for tonight’s match, ignoring the press criticism; a parent invests long hours of hard work and unlimited patience in teaching their child good manners. These are all examples of complex sequential decision-making under uncertainty.

2.1 Components of reinforcement learning

2.1.1 Examples of problems, agents, and environments

2.1.2 The agent: The decision-maker

2.1.3 The environment: Everything else

2.1.4 Agent-environment interaction cycle

2.2 MDPs: The engine of the environment

2.2.1 States: Specific configurations of the environment

2.2.2 Actions: A mechanism to influence the environment

2.2.3 Transition function: Consequences of agent actions

2.2.4 Reward signal: Carrots and sticks

2.2.5 Horizon: Time changes what’s optimal

2.2.6 Discount: The future is uncertain, value it less

2.2.7 Extensions to MDPs

2.2.8 Putting it all together

2.3 Summary
