Reinforcement learning with immediate rewards and linear hypotheses

Naoki Abe, Alan Biermann, and Philip M. Long
Algorithmica (Postscript, 350KB)

Abstract: We perform a theoretical analysis of algorithms for reinforcement learning with immediate rewards using linear function approximation. We provide nearly optimal worst-case bounds on cumulative regret, which, informally, is the difference between the total reward obtained by the algorithm and the best total reward that could be obtained without having to learn. The bounds are worst-case in the sense that they hold for any sequence of states; it is assumed that the algorithm's actions do not affect future states.
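
To make the setting and the regret notion concrete, here is a minimal Python sketch of immediate-reward reinforcement learning with a linear hypothesis: at each step the environment presents a state as a set of feature vectors (one per action), the learner picks an action, receives an immediate noisy reward, and cumulative regret accumulates against the best action under the true, unknown linear reward. All particulars here (Gaussian features, the noise level, the ridge estimator, and the epsilon-greedy exploration rule) are illustrative assumptions; this is not the algorithm analyzed in the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    d, K, T = 5, 10, 2000              # feature dimension, actions per step, horizon
    theta_star = rng.normal(size=d)    # unknown true linear reward parameter
    theta_star /= np.linalg.norm(theta_star)

    A = np.eye(d)                      # ridge-regularized Gram matrix
    b = np.zeros(d)                    # running sum of reward-weighted features
    regret = 0.0

    for t in range(T):
        X = rng.normal(size=(K, d))        # state: one feature vector per action
        theta_hat = np.linalg.solve(A, b)  # current least-squares estimate of theta_star
        # Epsilon-greedy exploration (illustrative choice, not the paper's scheme):
        # mostly exploit the current linear hypothesis, occasionally try a random action.
        eps = min(1.0, (t + 1) ** (-1 / 3))
        if rng.random() < eps:
            a = int(rng.integers(K))
        else:
            a = int(np.argmax(X @ theta_hat))
        reward = X[a] @ theta_star + 0.1 * rng.normal()  # immediate noisy reward
        A += np.outer(X[a], X[a])
        b += reward * X[a]
        # Regret compares against the best action under the true parameter,
        # i.e., the reward obtainable without having to learn.
        regret += np.max(X @ theta_star) - X[a] @ theta_star

    print(f"cumulative regret after {T} rounds: {regret:.1f}")

Note that the learner's choice has no influence on which feature vectors arrive next, so the sequence of states can be arbitrary; this matches the worst-case assumption stated in the abstract.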