The Stability of General Discounted Reinforcement Learning with Linear Function Approximation

Reynolds, Stuart
UKCI'02

Abstract: This paper shows that general reinforcement learning algorithms which estimate discounted returns cannot diverge to infinity when a particular form of linear function approximator is used to approximate the value-function or Q-function. The result is significant because examples exist in which the value-function diverges when similar linear function approximators are trained with a similar incremental gradient-descent rule. Here, a different gradient-descent error criterion yields a training rule with a non-expansion property, and therefore one that cannot diverge. This training rule also turns out to be in common use in reinforcement learning.
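
As a point of reference for the stability question, the Python sketch below shows an incremental gradient-descent update for a linear value-function approximator, together with a feature normalization under which the approximator interpolates between its weights rather than extrapolating; interpolation of this kind is the usual intuition behind non-expansion arguments. This is an illustrative assumption, not the paper's own training rule or error criterion; the function names, step size, and normalization scheme are made up for the example.

    import numpy as np

    def td_update(w, phi, target, alpha=0.5):
        """One incremental gradient-descent step on the squared error
        (target - w . phi)^2 for a linear approximator V(s) = w . phi(s).
        The target would be a discounted-return estimate, e.g. r + gamma * V(s')."""
        error = target - np.dot(w, phi)   # prediction error at this state
        return w + alpha * error * phi    # move weights along the gradient

    def normalized_features(raw_phi):
        """Normalize nonnegative raw features (e.g. RBF or tile-coding
        activations) to sum to one, so V(s) is a weighted average of the
        weights and stays between min(w) and max(w)."""
        raw_phi = np.asarray(raw_phi, dtype=float)
        return raw_phi / raw_phi.sum()

    # With these normalized features, ||phi||_2^2 <= 1, so for 0 < alpha <= 1
    # each update moves V(s) toward the target without overshooting it:
    #   new V(s) = target - (1 - alpha * ||phi||^2) * error.
    w = np.zeros(3)
    phi = normalized_features([1.0, 2.0, 1.0])
    w = td_update(w, phi, target=1.0)

Under these assumptions the value estimates are bounded by the training targets at every step, which is the flavour of boundedness the abstract claims for the paper's training rule.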