6 Improving agents’ behaviors
In this chapter
- You will learn about improving policies when learning from feedback that is simultaneously sequential and evaluative.
- You will develop algorithms for finding optimal policies in reinforcement learning environments when the transition and reward functions are unknown.
- You will write code for agents that can go from random to optimal behavior using only their experiences and decision making, and train the agents in a variety of environments.
When it is obvious that the goals cannot be reached, don’t adjust the goals, adjust the action steps.
— Confucius, Chinese teacher, editor, politician, and philosopher of the Spring and Autumn period of Chinese history