12 Reinforcement learning


This chapter covers

  • Grasping the fundamental principles underlying reinforcement learning
  • Understanding the Markov decision process
  • Comprehending the actor-critic architecture and proximal policy optimization
  • Getting familiar with noncontextual and contextual multi-armed bandits
  • Applying reinforcement learning to solve optimization problems

Reinforcement learning (RL) is a powerful machine learning approach that enables intelligent agents to learn optimal or near-optimal behavior by interacting with their environments. This chapter covers the key concepts and techniques of RL, laying out the underlying principles as essential background. Building on that foundation, it then works through practical examples of applying RL strategies to optimization problems.

12.1 Demystifying reinforcement learning

Reinforcement learning is a subfield of machine learning that deals with how an agent learns to make decisions and take actions in an environment to achieve specific goals through trial and error. The core idea is that the agent learns by interacting with the environment, receiving feedback in the form of rewards or penalties for its actions. The agent's objective is to maximize the cumulative reward over time.
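To make this interaction loop concrete, the following minimal sketch shows an agent stepping through an environment using the Gymnasium library (an assumption for illustration; the chapter's own examples may use a different toolkit). A random policy stands in for a learned one: at each step the agent picks an action, the environment returns a new observation and a reward, and the cumulative reward is tallied.

import gymnasium as gym

# Create an environment; CartPole is a standard toy control task
# (assumed here purely for illustration).
env = gym.make("CartPole-v1")
observation, info = env.reset(seed=42)

total_reward = 0.0
for _ in range(200):
    # A learned policy would map the observation to an action;
    # a random action stands in for it in this sketch.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:  # episode ended; start a new one
        observation, info = env.reset()
env.close()
print(f"Cumulative reward collected: {total_reward}")

An RL algorithm such as A2C or PPO, covered later in this chapter, replaces the random action with one chosen to maximize the expected cumulative reward.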

12.1.1 Markov decision process (MDP)

12.1.2 From MDP to reinforcement learning

12.1.3 Model-based vs. model-free RL

12.1.4 Actor-critic methods

12.1.5 Proximal policy optimization

12.1.6 Multi-armed bandit (MAB)

12.2 Optimization with reinforcement learning

12.3 Balancing CartPole using A2C and PPO

12.4 Autonomous coordination in mobile networks using PPO

12.5 Solving the truck selection problem using contextual bandits

12.6 Journey’s end: A final reflection

Summary