The Stability of General Discounted Reinforcement Learning with Linear Function Approximation
Reynolds, Stuart
UKCI'02
Abstract: This paper shows that reinforcement learning algorithms which
estimate a general discounted return cannot diverge to infinity when a
form of linear function approximator is used to approximate the
value-function or Q-function. The results are significant insofar
as examples of divergence of the value-function exist where similar
linear function approximators are trained using a similar
incremental gradient descent rule. A different gradient descent
error criterion is used to produce a training rule which has a
non-expansion property and therefore cannot possibly diverge.
This training rule is found to be commonly used for reinforcement
learning.
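
The abstract does not state the training rule itself, so the following Python sketch is only an illustration of what a non-expansion update can look like, not the paper's method. It assumes features that are non-negative and sum to 1 (an averager), a step size in [0, 1], bounded rewards, and a per-weight update that moves each weight toward the target in proportion to its feature. All names (`normalize`, `predict`, `local_update`, `run`) and the random toy problem are hypothetical.

```python
import random


def normalize(raw):
    """Scale non-negative raw activations so they sum to 1 (assumed averager)."""
    total = sum(raw)
    return [x / total for x in raw]


def predict(theta, phi):
    """Linear estimate; with normalized phi it is a convex combination of theta."""
    return sum(t * p for t, p in zip(theta, phi))


def local_update(theta, phi, target, alpha):
    """Move each weight toward the target in proportion to its feature.

    Because 0 <= alpha * phi_i <= 1, each new weight is a convex
    combination of the old weight and the target, so it cannot leave
    the range spanned by those two values.
    """
    return [t + alpha * p * (target - t) for t, p in zip(theta, phi)]


def run(steps=20000, n_features=5, gamma=0.9, alpha=0.5, r_max=1.0, seed=0):
    """Apply the update to random transitions and track the largest weight."""
    rng = random.Random(seed)
    theta = [0.0] * n_features
    bound = r_max / (1.0 - gamma)  # largest magnitude any weight can reach
    largest = 0.0
    for _ in range(steps):
        phi = normalize([rng.random() + 1e-9 for _ in range(n_features)])
        phi_next = normalize([rng.random() + 1e-9 for _ in range(n_features)])
        reward = rng.uniform(-r_max, r_max)
        # One-step discounted return estimate, bootstrapped from the next state.
        target = reward + gamma * predict(theta, phi_next)
        theta = local_update(theta, phi, target, alpha)
        largest = max(largest, max(abs(t) for t in theta))
    return largest, bound


if __name__ == "__main__":
    largest, bound = run()
    print(f"largest |weight| seen: {largest:.4f}, bound: {bound:.4f}")
```

Under these assumptions the bound follows by induction: if every weight has magnitude at most r_max / (1 - gamma), then so does the bootstrapped target, and a convex combination of two such values stays within the same bound. The more familiar rule, which scales the update by the error of the combined prediction rather than of each weight, offers no such guarantee.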