7 Achieving goals more effectively and efficiently

 

In this chapter

  • You will learn about making reinforcement learning agents more effective at reaching optimal performance when interacting with challenging environments.
  • You will learn about making reinforcement learning agents more efficient at achieving goals by making the most of the experiences they collect.
  • You will improve on the agents presented in the previous chapters to have them make the most out of the data they collect and therefore optimize their performance more quickly.

Efficiency is doing things right; effectiveness is doing the right things.

— Peter Drucker, founder of modern management and Presidential Medal of Freedom recipient

In this chapter, we improve on the agents you learned about in the previous chapter. More specifically, we take on two separate lines of improvement. First, we use the λ-return that you learned about in chapter 5 for the policy-evaluation requirements of the generalized policy iteration pattern, and we explore it in both on-policy and off-policy methods. Using the λ-return with eligibility traces propagates credit to the right state-action pairs more quickly than one-step methods, so the value-function estimates approach the true values sooner. Second, we look at agents that learn a model of the environment from experience and use that model to plan, squeezing more out of every sample they collect; Dyna-Q and trajectory sampling, covered later in the chapter, are two examples of this approach.
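To make the first line of improvement concrete, here is a minimal sketch of tabular SARSA(λ) with replacing eligibility traces. It assumes a Gym-style discrete environment (the env argument, the hyperparameter defaults, and the function name are illustrative placeholders, not the book's exact implementation); the point is only to show how the trace matrix spreads each TD error back over recently visited state-action pairs.

import numpy as np

def sarsa_lambda(env, gamma=0.99, alpha=0.1, lambda_=0.9,
                 epsilon=0.1, episodes=1000):
    """Tabular SARSA(lambda) with replacing eligibility traces.

    Assumes a discrete, Gym-style environment exposing
    observation_space.n, action_space.n, reset(), and step().
    """
    nS, nA = env.observation_space.n, env.action_space.n
    Q = np.zeros((nS, nA))

    def select_action(state):
        # epsilon-greedy behavior policy derived from the current Q estimates
        if np.random.rand() < epsilon:
            return np.random.randint(nA)
        return int(np.argmax(Q[state]))

    for _ in range(episodes):
        E = np.zeros((nS, nA))              # eligibility traces, reset each episode
        state, _ = env.reset()
        action = select_action(state)
        done = False
        while not done:
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            next_action = select_action(next_state)

            # one-step TD error; bootstrap only if the episode continues
            td_target = reward + gamma * Q[next_state, next_action] * (not done)
            td_error = td_target - Q[state, action]

            # replacing traces: mark the visited pair, decay all traces below
            E[state, action] = 1.0

            # credit every eligible state-action pair, not just the last one
            Q += alpha * td_error * E
            E *= gamma * lambda_

            state, action = next_state, next_action
    return Q

With λ = 0 the trace update reduces to one-step SARSA; with λ closer to 1, credit from each reward reaches state-action pairs visited many steps earlier, which is the faster credit propagation described above.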

Learning to improve policies using robust targets

SARSA(λ): Improving policies after each step based on multi-step estimates

Watkins’s Q(λ): Decoupling behavior from learning, again

Agents that interact, learn, and plan

Dyna-Q: Learning sample models

Trajectory sampling: Making plans for the immediate future

Summary
