2 Mathematical foundations of reinforcement learning

 

In this chapter

  • You will learn about the core components of reinforcement learning.
  • You will learn to represent sequential decision-making problems as reinforcement learning environments using a mathematical framework known as Markov decision processes.
  • You will build from scratch environments that reinforcement learning agents learn to solve in later chapters.

Mankind’s history has been a struggle against a hostile environment. We finally have reached a point where we can begin to dominate our environment. ... As soon as we understand this fact, our mathematical interests necessarily shift in many areas from descriptive analysis to control theory.

— Richard Bellman, American applied mathematician and IEEE Medal of Honor recipient

You pick up this book and decide to read one more chapter despite having limited free time. A coach benches their best player for tonight’s match, ignoring press criticism. A parent invests long hours of hard work and unlimited patience in teaching their child good manners. These are all examples of complex sequential decision-making under uncertainty.

Components of reinforcement learning

 
 
 
 

Examples of problems, agents, and environments

 
 
 

The agent: The decision maker

 
 
 

The environment: Everything else

 
 

Agent-environment interaction cycle
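
The cycle this heading names — the agent observes a state, selects an action, and the environment responds with a next state and a reward — can be sketched in a few lines. The tiny three-state walk environment below is an illustrative assumption, not code from this book:

```python
import random

class RandomWalkEnv:
    """Illustrative environment: 3 states in a row; reaching the
    rightmost state yields reward 1 and ends the episode."""
    def __init__(self):
        self.state = 1  # start in the middle

    def reset(self):
        self.state = 1
        return self.state

    def step(self, action):
        # action 0 moves left, action 1 moves right
        self.state += 1 if action == 1 else -1
        done = self.state in (0, 2)               # both ends are terminal
        reward = 1.0 if self.state == 2 else 0.0  # only the right end pays
        return self.state, reward, done

# The interaction cycle: observe, act, receive next state and reward, repeat.
env = RandomWalkEnv()
state, done, total_reward = env.reset(), False, 0.0
while not done:
    action = random.choice([0, 1])          # a random-policy "agent"
    state, reward, done = env.step(action)
    total_reward += reward
```

The agent never looks inside the environment; it only sees what `step` returns — which is exactly the boundary the next sections formalize.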

 
 
 
 

MDPs: The engine of the environment
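
As a hedged sketch of what the following sections spell out, an MDP bundles a state space, an action space, a transition function, a reward signal, and a discount factor. The field names below are assumptions for illustration, not this book's code:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class MDP:
    states: List[int]        # S: all configurations of the environment
    actions: List[int]       # A: ways the agent can influence the environment
    # T: (state, action) -> list of (probability, next_state) outcomes
    transitions: Dict[Tuple[int, int], List[Tuple[float, int]]]
    # R: (state, action, next_state) -> reward
    rewards: Dict[Tuple[int, int, int], float]
    gamma: float             # discount factor, in [0, 1]

# A degenerate one-state, one-action MDP, just to show the shape.
trivial = MDP(
    states=[0],
    actions=[0],
    transitions={(0, 0): [(1.0, 0)]},
    rewards={(0, 0, 0): 0.0},
    gamma=0.99,
)
```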

 

States: Specific configurations of the environment

 
 

Actions: A mechanism to influence the environment

 
 

Transition function: Consequences of agent actions
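
A stochastic transition function is often encoded as a nested dictionary: state → action → list of possible outcomes with their probabilities. The layout below follows the `(probability, next state, reward, done)` convention of OpenAI Gym's `env.P` attribute; the "slippery walk" numbers are made up for illustration:

```python
# P[state][action] is a list of (probability, next_state, reward, done) tuples.
# Tiny "slippery walk": 3 states in a row; states 0 and 2 are terminal.
P = {
    0: {0: [(1.0, 0, 0.0, True)], 1: [(1.0, 0, 0.0, True)]},
    1: {
        0: [(0.8, 0, 0.0, True), (0.2, 2, 1.0, True)],  # intend left, may slip
        1: [(0.8, 2, 1.0, True), (0.2, 0, 0.0, True)],  # intend right, may slip
    },
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}

# Sanity check: outgoing probabilities from every (state, action) sum to 1.
for state, actions in P.items():
    for action, outcomes in actions.items():
        assert abs(sum(p for p, _, _, _ in outcomes) - 1.0) < 1e-9
```

Note that the same action can lead to different next states — that stochasticity is exactly what the transition function captures.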

 
 
 

Reward signal: Carrots and sticks

 
 

Horizon: Time changes what’s optimal

 
 
 

Discount: The future is uncertain, value it less
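
The effect of the discount factor can be shown with a short function: a reward received t steps in the future is weighted by gamma**t, so with gamma < 1 the uncertain future counts for less (and with a finite horizon, the sum simply stops at the last step). The reward sequence below is made up for illustration:

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite sequence of rewards."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

rewards = [0.0, 0.0, 1.0]                        # a single reward, two steps away
undiscounted = discounted_return(rewards, 1.0)   # counts fully: 1.0
discounted = discounted_return(rewards, 0.9)     # shrunk to 0.9**2 * 1.0
```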

 
 
 

Extensions to MDPs

 
 
 

Putting it all together

 
 
 

Summary

 