Publications on Temporal Difference Learning

Bandera, C., V. Francisco, B. Jose, M. Harmon, and L. Baird
Residual Q-learning applied to visual attention
Proceedings of the Thirteenth International Conference on Machine Learning, pages 20-27. Morgan Kaufmann, 1996. Abstract:
Foveal vision features imagers with graded acuity coupled with context-sensitive sensor gaze contro...

Borkar, Vivek, Vijaymohan R. Konda
The actor-critic algorithm as multi-time-scale stochastic approximation
Sadhana, Indian Academy of Sciences. Abstract:
The actor-critic algorithm of Barto et al. for simulation-based optimization of Markov decision proce...

Boyan, Justin, A. Moore
Learning evaluation functions for large acyclic domains
Proceedings of the Thirteenth International Conference on Machine Learning, pages 63-70. Morgan Kaufmann, 1996. Abstract:
Some of the most successful recent applications of reinforcement learning have used neural network...

Boyan, Justin, Michael L. Littman
Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach
Advances in Neural Information Processing Systems. Abstract:
This paper describes the Q-routing algorithm for packet routing, in which a reinforcement learning ...
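
The Q-routing rule described above can be sketched as a per-hop update. This is a hedged illustration, not the authors' implementation: as we read the paper, node x keeps an estimate Q_x(d, y) of the time needed to deliver a packet to destination d through neighbor y, and after forwarding a packet it moves that estimate toward q + s + t, where q is the time the packet waited in x's queue, s is the transmission time, and t = min_z Q_y(d, z) is y's own best remaining-time estimate. The function name `q_routing_update`, the nested-dictionary layout, and the convention that t = 0 when y is the destination are assumptions made for this sketch.

```python
def q_routing_update(Q, neighbors, x, y, d, queue_time, trans_time, eta):
    """Move node x's estimate Q[x][(d, y)] toward queue_time + trans_time + t.

    Q[node][(dest, next_hop)] holds node's estimated delivery time to dest via next_hop.
    neighbors[node] lists the nodes adjacent to node.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    # t: neighbor y's best estimate of the remaining time to d.
    # Assumption for this sketch: if y is the destination itself, t = 0.
    if y == d:
        t = 0.0
    else:
        t = min(Q[y][(d, z)] for z in neighbors[y])
    old = Q[x][(d, y)]
    # Step of size eta toward the new sample of the total delivery time.
    Q[x][(d, y)] = old + eta * (queue_time + trans_time + t - old)
    return Q[x][(d, y)]
```

Because each node consults only its immediate neighbor's estimate, the update needs no global view of the network, which is what allows the routing policy to keep adapting as loads and links change.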

Coulom, Rémi
Reinforcement Learning Using Neural Networks, with Applications to Motor Control
PhD thesis. Abstract:
This thesis is a study of practical methods to estimate value functions with feedforward neural netw...

Coulom, Rémi
Feedforward Neural Networks in Reinforcement Learning Applied to High-dimensional Motor Control
Proceedings of ALT 2002. Abstract:
Local linear function approximators are often preferred to feedforward neural networks to estimate v...

Dietterich, Thomas, W. Zhang
A Reinforcement Learning Approach to Job-shop Scheduling
Proceedings of IJCAI-95. Abstract:
We apply reinforcement learning methods to learn domain-specific heuristics for job shop scheduling...

Rivest, François, Doina Precup
Combining TD-learning with Cascade-correlation Networks
ICML 2003. Abstract:
Using neural networks to represent value functions in reinforcement learning algorithms often invo...

Gadaleta, Sabino, Gerhard Dangelmayr
Optimal Chaos Control through reinforcement learning
Chaos, 9, 775, 1999. Abstract:
A general-purpose chaos control algorithm based on reinforcement learning is introduced and applie...

Garcia, Frédérick, Florent Serre
Efficient Asymptotic Approximation in Temporal Difference Learning
European Conference on Artificial Intelligence (ECAI 2000). Abstract:
We propose in this paper an asymptotic approximation of online TD(lambda) with accumulating eligib...

Ghory, Imran
Reinforcement Learning in Board Games
Technical Report CSTR-04-004, Department of Computer Science, University of Bristol, May 2004. Abstract:
This project investigates the application of the TD(lambda) reinforcement learning algorithm and neu...

Konda, Vijaymohan, Vivek S. Borkar
Learning Algorithms for Markov Decision Processes
SIAM Journal on Control and Optimization. Abstract:
Algorithms learning the optimal policy of a Markov decision process based on simulated transitions a...

Leslie, David, E. J. Collins
Individual Q-learning in normal form games
Unpublished. Abstract:
The single-agent multi-armed bandit problem can be solved by an agent that learns the values of e...

Littman, Michael
Markov games as a framework for multi-agent reinforcement learning
Proceedings of the Eleventh International Conference on Machine Learning. Abstract:
In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive age...

Preux, Philippe
Propagation of Q-values in Tabular TD(lambda)
Proceedings of ECML, 2002. Abstract:
In this paper, we propose a new idea for the tabular TD(lambda) algorithm. In TD learning, rewards ar...

Reynolds, Stuart
Optimistic Initial Q-values and the max Operator
UKCI'01. Abstract:
This paper provides a surprising new insight into the role of the max operator used by reinforcement...

Reynolds, Stuart
Experience Stack Reinforcement Learning for Off-Policy Control
Cognitive Science Technical Report CSRP-02-1, School of Computer Science, The University of Birmingham, Birmingham, B15 2TT, UK, January 2002. Abstract:
This paper introduces a novel method for allowing backwards replay to be applied as an online learni...

Singh, Satinder, Richard Sutton
Reinforcement Learning with Replacing Eligibility Traces
Machine Learning.
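
The difference between accumulating and replacing eligibility traces, the subject of this paper, fits in a few lines. In this minimal sketch (the helper name `update_traces` and the list-of-floats layout are assumptions for illustration), every trace decays by gamma * lambda each step; the trace of the state just visited is then either incremented by 1 (accumulating) or reset to 1 (replacing).

```python
def update_traces(traces, visited, gamma, lam, replacing):
    """Advance per-state eligibility traces by one time step.

    traces: list of per-state trace values (modified in place and returned).
    visited: index of the state visited at this step.
    (Hypothetical helper for illustration.)
    """
    # Every trace decays geometrically by gamma * lambda.
    for s in range(len(traces)):
        traces[s] *= gamma * lam
    if replacing:
        # Replacing trace: the visited state's trace is reset to 1.
        traces[visited] = 1.0
    else:
        # Accumulating trace: 1 is added to whatever trace remains.
        traces[visited] += 1.0
    return traces
```

The two variants differ only when a state is revisited before its trace has decayed to zero: an accumulating trace can grow above 1, while a replacing trace never exceeds 1.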

Singh, Satinder, Peter Dayan
Analytical Mean Squared Error Curves for Temporal Difference Learning
Machine Learning.

Sutton, Richard
Learning to predict by the method of temporal differences
Machine Learning, 3:9-44, 1988. Abstract:
This article introduces a class of incremental learning procedures specialized for prediction - tha...
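
A minimal sketch of the simplest member of this family, tabular TD(0), may make the idea concrete: after each transition the estimate V(s) is moved toward the one-step target r + gamma * V(s') instead of waiting for the final outcome. The function name, the dictionary-based value table, and the convention that a terminal next state is passed as `None` with value 0 are assumptions for illustration; the paper itself develops the more general TD(lambda) procedures.

```python
def td0_episode(V, episode, alpha, gamma=1.0):
    """Apply tabular TD(0) updates along one episode, in order.

    V: dict mapping state -> current value estimate (modified in place).
    episode: list of (state, reward, next_state); next_state None means terminal.
    (Hypothetical helper for illustration.)
    """
    for s, r, s_next in episode:
        # A terminal next state is assumed to have value 0.
        v_next = 0.0 if s_next is None else V[s_next]
        td_error = r + gamma * v_next - V[s]
        V[s] += alpha * td_error
    return V
```

For the two-state chain A to B to terminal, with a reward of 1 on the last step and gamma = 1, repeated calls drive both V['A'] and V['B'] toward 1.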

Tesauro, Gerald
Temporal Difference Learning and TD-Gammon
Unpublished. Abstract:
Ever since the days of Shannon's proposal for a chess-playing algorithm [12] and Samuel's checkers-l...

Tesauro, Gerald
TD-Gammon, a self-teaching backgammon program, achieves master-level play
Unpublished. Abstract:
TD-Gammon is a neural network that is able to teach itself to play backgammon solely by playing agai...

Thrun, Sebastian
Learning to Play the Game of Chess
Advances in Neural Information Processing Systems (NIPS) 7, 1995. Abstract:
This paper presents NeuroChess, a program which learns to play chess from the final outcome of game...

Tsitsiklis, John, Ben Van Roy
An Analysis of Temporal-Difference Learning with Function Approximation
IEEE Transactions on Automatic Control, Vol. 42, No. 5, May 1997, pp. 674-690. Abstract:
We discuss the temporal-difference learning algorithm, as applied to approximating cost-to-go funct...
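
With linear function approximation, the value of a state is represented as theta . phi(s) for a fixed feature vector phi(s), and the TD update adjusts the weights theta along phi(s). The sketch below shows only the lambda = 0 case written with rewards; the paper analyzes TD(lambda) for cost-to-go functions, where costs play the role of rewards. The function name and the plain-list vectors are assumptions for illustration.

```python
def linear_td0_step(theta, phi_s, r, phi_next, alpha, gamma):
    """One linear TD(0) step, where V(s) is approximated by theta . phi(s).

    Returns a new weight list; inputs are not modified.
    (Hypothetical helper for illustration.)
    """
    v_s = sum(t * f for t, f in zip(theta, phi_s))
    v_next = sum(t * f for t, f in zip(theta, phi_next))
    # Temporal-difference error for this transition.
    delta = r + gamma * v_next - v_s
    # Move the weights along the features of the current state.
    return [t + alpha * delta * f for t, f in zip(theta, phi_s)]
```

With one-hot features this reduces to the tabular TD(0) update; passing an all-zero `phi_next` gives the next state a value of 0.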

Wilson, Stewart
Generalization in the XCS classifier system
Genetic Programming 1998: Proceedings of the Third Annual Conference. San Francisco, CA: Morgan Kaufmann. Abstract:
This paper studies two changes to XCS, a classifier system in which fitness is based on prediction...

Xu, Xin, Han-gen He, and Dewen Hu
Efficient Reinforcement Learning Using Recursive Least-Squares Methods
Journal of Artificial Intelligence Research, Vol. 16, 2002, pp. 259-292. Abstract:
The recursive least-squares (RLS) algorithm is one of the most well-known algorithms used in adaptiv...

Yin, ChangMing
Unpublished.

Zhao, Gang, Shoji Tatsumi, Ruoying Sun
RTP-Q: A Reinforcement Learning System with Time Constraints Exploration Planning for Accelerating the Learning Rate
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences. Abstract:
This paper proposes an RTP-Q reinforcement learning system which varies an efficient method for explo...

Zhao, Gang, Shoji Tatsumi, Ruoying Sun
Convergence of the Q-ae Learning on Deterministic MDPs and Its Efficiency on the Stochastic Environment
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences. Abstract:
In this paper, based on discussing different exploration methods, replacing the pre-action-selector ...

Zhuang, Xiaodong
Conference proceedings. Abstract:
In this paper, multi-scale reinforcement learning is presented based on fuzzy state. The concept of ...