Nearly optimal exploration-exploitation decision thresholds
Dimitrakakis, Christos
ICANN 2006
Abstract: While trading off exploration and exploitation in
reinforcement learning is hard in general, relatively simple
solutions exist under some formulations. Optimal decision thresholds
are derived for the multi-armed bandit problem, one for the
infinite-horizon discounted reward case and one for the
finite-horizon undiscounted reward case. These thresholds make
explicit the link between the reward horizon, uncertainty, and the
need for exploration. Two practical approximate algorithms follow
from this result and are illustrated experimentally.