Propagation of Q-values in Tabular TD(lambda)
Preux, Philippe
Proceedings of the ECML, 2002
Abstract: In this paper, we propose a new idea for the tabular TD(lambda) algorithm.
In TD learning, rewards are propagated along the sequence of
state/action pairs that have been visited recently. To complement
this, we propose to also propagate rewards to state/action pairs that
neighbor this sequence, even though they have not been visited. This
greatly decreases the number of iterations required for TD(lambda)
to generalize, since a state/action pair no longer needs to be
visited for its Q-value to be updated. This propagation process
brings tabular TD(lambda) closer to neural-net-based TD(lambda) with
regard to its ability to generalize, while leaving the other
properties of tabular TD(lambda) unchanged.
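
To make the idea concrete, here is a minimal sketch of one tabular TD(lambda) step with an added neighbor-propagation pass. The abstract does not specify the propagation rule, so everything specific here is an assumption made for illustration only: the SARSA-style update with accumulating traces, the 1-D chain in which states s-1 and s+1 count as neighbors of s, the name td_lambda_step, and the spread parameter that scales how much of an update reaches unvisited neighbors. It should not be read as the paper's actual algorithm.

```python
import numpy as np


def td_lambda_step(Q, E, s, a, r, s_next, a_next,
                   alpha=0.1, gamma=0.9, lam=0.8, spread=0.0):
    """One SARSA(lambda)-style update on a 1-D chain of states.

    Q and E are (n_states, n_actions) arrays, modified in place.
    `spread` is a hypothetical parameter: the fraction of each update
    that is also applied to unvisited neighboring state/action pairs.
    """
    # Standard TD error for the transition (s, a) -> (s_next, a_next).
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]

    # Accumulating eligibility trace for the pair just visited.
    E[s, a] += 1.0

    # Usual TD(lambda) update: every recently visited pair moves
    # in proportion to its trace.
    step = alpha * delta * E
    Q += step

    if spread > 0.0:
        # Assumed propagation rule: each pair (s, a) on the trace passes
        # a fraction of its update to (s - 1, a) and (s + 1, a).
        neighbor = np.zeros_like(Q)
        neighbor[1:] += step[:-1]   # contribution from the state below
        neighbor[:-1] += step[1:]   # contribution from the state above
        # Only pairs with no trace (not visited recently) receive it.
        neighbor[E > 0.0] = 0.0
        Q += spread * neighbor

    # Decay all traces for the next step.
    E *= gamma * lam
    return delta
```

With spread=0.0 this reduces to an ordinary tabular update in which only visited pairs change. With spread > 0, pairs adjacent to the visited sequence receive a scaled share of the update without being visited, which is the effect the abstract describes.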