Chapter 2: Evaluative Feedback

The n-Armed Bandit Problem

The Exploration/Exploitation Dilemma

Action-Value Methods

ε-Greedy Action Selection

10-Armed Testbed

ε-Greedy Methods on the 10-Armed Testbed

Softmax Action Selection

Binary Bandit Tasks

Contingency Space

Linear Learning Automata

Performance on Binary Bandit Tasks A and B

Incremental Implementation

Tracking a Nonstationary Problem

Optimistic Initial Values

Reinforcement Comparison

Performance of a Reinforcement Comparison Method

Pursuit Methods

Performance of a Pursuit Method

Associative Search

Conclusions