Chapter 2: Evaluative Feedback
The n-Armed Bandit Problem
The Exploration/Exploitation Dilemma
Action-Value Methods
ε-Greedy Action Selection
10-Armed Testbed
ε-Greedy Methods on the 10-Armed Testbed
Softmax Action Selection
Binary Bandit Tasks
Contingency Space
Linear Learning Automata
Performance on Binary Bandit Tasks A and B
Incremental Implementation
Tracking a Nonstationary Problem
Optimistic Initial Values
Reinforcement Comparison
Performance of a Reinforcement Comparison Method
Pursuit Methods
Performance of a Pursuit Method
Associative Search
Conclusions