4 Balancing the gathering and utilization of information
In this chapter:
- You learn about the challenges of learning from evaluative feedback and how to properly balance the gathering and utilization of information.
- You develop exploration strategies that accumulate low levels of regret in problems with unknown transition functions and reward signals.
- You write trial-and-error learning agents that learn to optimize their behavior through their own experience in many-options, one-choice environments known as multi-armed bandits.
Our ultimate objective is to make programs that learn from their experience as effectively as humans do.
— John McCarthy
Founder of the field of artificial intelligence, inventor of the Lisp programming language
No matter how small and unimportant a decision may seem, every decision you make is a tradeoff between information gathering and information exploitation. For example, when you go to your favorite restaurant, should you order your favorite dish yet again, or should you order the dish you have been meaning to try? If a Silicon Valley startup offers you a job, should you make a career move, or should you stay put in your current role?
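The restaurant dilemma can be made concrete with a small simulation. The following is only an illustrative sketch. It assumes an epsilon-greedy rule, which is one common exploration strategy and not necessarily the one this chapter develops, and it uses made-up dish-quality numbers. With probability `epsilon` the agent tries a random dish (information gathering). Otherwise it orders the dish with the highest estimated reward so far (information exploitation). The names `choose_dish` and `run` and the Gaussian "satisfaction" reward are hypothetical.

```python
import random

def choose_dish(estimates, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: gather information by trying any dish uniformly at random.
        return rng.randrange(len(estimates))
    # Exploit: order the dish with the best estimated reward so far.
    return max(range(len(estimates)), key=lambda a: estimates[a])

def run(true_means, episodes=1000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n  # running average reward observed for each dish
    counts = [0] * n       # number of times each dish was ordered
    for _ in range(episodes):
        a = choose_dish(estimates, epsilon, rng)
        # Hypothetical noisy satisfaction score centered on the dish's true quality.
        reward = rng.gauss(true_means[a], 1.0)
        counts[a] += 1
        # Incremental mean update: move the estimate toward the new sample.
        estimates[a] += (reward - estimates[a]) / counts[a]
    return estimates, counts

# Three dishes with hypothetical true qualities; dish 1 is the best one.
estimates, counts = run([1.0, 2.0, 1.5])
print(counts)
```

Setting `epsilon=0` makes the agent purely greedy. It then keeps ordering whichever dish happened to look best early on. A larger `epsilon` spends more visits on dishes that are already known to be worse. This is the tradeoff the chapter sets out to balance.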