Humans learn from experiences (or at least should ). You didn’t get so charming by accident. Years of positive compliments as well as negative criticism have all helped shape who you are today. This chapter is about designing a machine-learning system driven by criticisms and rewards.
You learn what makes people happy, for example, by interacting with friends, family members, or even strangers, and you figure out how to ride a bike by trying out various muscle movements until riding clicks. When you perform actions, you’re sometimes rewarded immediately. Finding a good restaurant nearby might yield instant gratification, for example. At other times, the reward doesn’t appear right away; you might have to travel a long distance to find an exceptional place to eat. Reinforcement learning is about choosing the right actions, given any state—such as in figure 13.1, which shows a person making decisions to arrive at their destination.