Reinforcement learning with immediate rewards and linear hypotheses
Abe, Naoki , Alan Biermann and Philip M. LongReinforcement learning with immediate rewards and linear hypotheses
Algorithmica
(Postscript - 350KB )
Abstract: We perform theoretical analysis of algorithms for reinforcement
learning with immediate rewards using linear function approximation.
We provide nearly optimal worst-case bounds on cumulative regret, which, informally, is the difference between the total reward obtained by the algorithm, and the best total reward that can be obtained without having to learn. The bounds are worst-case in the
sense that they hold for any sequence of states, where it is assumed
that the actions of the algorithm do not affect future states.