7 Achieving goals more effectively and efficiently

 

In this chapter

  • You learn about making reinforcement learning agents more effective at reaching optimal performance when interacting with challenging environments.
  • You learn about making reinforcement learning agents more efficient at achieving goals by making the most of the experiences they collect.
  • You improve on the agents presented in the previous chapters so that they make the most of the data they collect and therefore optimize their performance more quickly.

Efficiency is doing things right; effectiveness is doing the right things.

— Peter Drucker, Founder of modern management and Presidential Medal of Freedom recipient

In this chapter, we improve on the agents you learned about in the previous chapter. More specifically, we take on two separate lines of improvement. First, we use the λ return that you learned about in chapter 5 to satisfy the policy-evaluation requirements of the generalized policy iteration pattern, and we explore it for both on-policy and off-policy methods. Using the λ return with eligibility traces propagates credit to the right state-action pairs more quickly than standard methods, so the value-function estimates approach the true values faster. Second, we look at agents that not only interact and learn but also learn a model of the environment from their experiences and use it to plan, squeezing more improvement out of every sample they collect.
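To make the first line of improvement concrete, the sketch below shows a tabular Sarsa(λ) agent with replacing eligibility traces, the kind of agent developed in section 7.1. It's a minimal illustration, not the book's implementation: it assumes a Gym-style environment with discrete state and action spaces, and the hyperparameter values are placeholders.

```python
import numpy as np

def sarsa_lambda(env, gamma=1.0, alpha=0.5, epsilon=0.1,
                 lambda_=0.9, n_episodes=500):
    """Tabular Sarsa(lambda) with replacing eligibility traces.

    Assumes a Gym-style API: env.reset() -> state,
    env.step(action) -> (next_state, reward, done, info),
    with discrete observation and action spaces.
    """
    nS, nA = env.observation_space.n, env.action_space.n
    Q = np.zeros((nS, nA))

    def select_action(state):
        # epsilon-greedy behavior policy
        if np.random.rand() < epsilon:
            return np.random.randint(nA)
        return np.argmax(Q[state])

    for _ in range(n_episodes):
        E = np.zeros((nS, nA))  # eligibility traces, reset every episode
        state, done = env.reset(), False
        action = select_action(state)
        while not done:
            next_state, reward, done, _ = env.step(action)
            next_action = select_action(next_state)

            # one-step TD error toward the Sarsa target
            td_target = reward + gamma * Q[next_state][next_action] * (not done)
            td_error = td_target - Q[state][action]

            # mark the current pair as eligible (replacing traces)
            E[state][action] = 1.0

            # credit every eligible state-action pair, then decay the traces
            Q += alpha * td_error * E
            E *= gamma * lambda_

            state, action = next_state, next_action
    return Q
```

Because every eligible state-action pair is updated at each step, credit from the TD error flows back along the recently visited trajectory rather than only to the most recent pair, which is what speeds up policy evaluation.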

7.1   Learning to improve policies using robust targets

7.1.1   Sarsa(λ): Improving policies after each step based on multi-step estimates

7.1.2   Watkins's Q(λ): Decoupling behavior from learning, again

7.2   Agents that interact, learn and plan

7.2.1   Dyna-Q: Learning sample models

7.2.2   Trajectory sampling: Making plans for the immediate future

7.3   Summary
