
This is an excerpt from Manning's book Grokking Deep Reinforcement Learning MEAP V14 epub.

In reinforcement learning, this measure of “surprise” is given by the TD error! Well, technically, by the absolute TD error. The TD error is the difference between the agent’s current estimate and its target value. The current estimate is the value the agent currently thinks it will get for acting in a specific way. The target value is a new estimate for the same state-action pair, computed from fresh experience, so it serves as a reality check. The absolute difference between these two values indicates how far off we are, how unexpected the experience is, and how much new information it carries, which makes it a good indicator of learning opportunity.
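The idea above can be sketched in a few lines. This is a minimal illustration, not the book's code: it assumes a tabular action-value array `q[state, action]` and a batch of transitions, and computes the absolute TD error for each one.

```python
import numpy as np

def absolute_td_errors(q, states, actions, rewards, next_states, dones, gamma=0.99):
    """Absolute TD errors |target - estimate| for a batch of transitions.

    q is assumed to be a tabular action-value array q[state, action];
    all names here are illustrative, not taken from the book.
    """
    # Current estimate: the value the agent currently assigns to (s, a).
    estimates = q[states, actions]
    # Target: reward plus the discounted value of the best next action,
    # zeroed out at terminal states. This is the "reality check."
    targets = rewards + gamma * q[next_states].max(axis=1) * (1 - dones)
    # The absolute difference is the "surprise": how far off we were.
    return np.abs(targets - estimates)
```

The same quantity drops out of a function-approximation agent too; only the source of `estimates` and `targets` changes.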

Show Me The Math

The absolute TD error is the priority

 


Figure 10.13

Now, the TD error isn’t a perfect indicator of the “highest learning opportunity,” but it’s perhaps the best reasonable proxy for it. In reality, the best criterion for “learning the most” lives inside the network, hidden behind the parameter updates. But it’s impractical to compute gradients for every experience in the replay buffer at every time step. The good thing about the TD error is that the machinery to calculate it is already in place. And, of course, the TD error is still a good signal for prioritizing the replay of experiences.
