5 Evaluating agents’ behaviors
In this chapter
- You will learn about estimating the value of policies when learning from feedback that is simultaneously sequential and evaluative.
- You will develop algorithms for evaluating policies in reinforcement learning environments when the transition and reward functions are unknown.
- You will write code for estimating the value of policies in environments in which the full reinforcement learning problem is on display.
I conceive that the great part of the miseries of mankind are brought upon them by false estimates they have made of the value of things.
— Benjamin Franklin, Founding Father of the United States, author, politician, inventor, and civic activist
You know how challenging it is to balance immediate and long-term goals. You probably experience this multiple times a day: should you watch a movie tonight or keep reading this book? The movie offers immediate satisfaction; you watch it, and you go from poverty to riches, from loneliness to love, from overweight to fit, and so on, in about two hours, all while eating popcorn. Reading this book, on the other hand, won’t really give you much tonight, but maybe, and only maybe, it will provide much higher satisfaction in the long term.