Publications on Theoretical Analysis

Abe, Naoki , Alan Biermann and Philip M. Long
Reinforcement learning with immediate rewards and linear hypotheses
Algorithmica (Postscript - 350KB) Abstract:
We perform theoretical analysis of algorithms for reinforcement learning with immediate rewards usi...

Beleznay, Ferenc , Tamas Grobler, Csaba Szepesvari
Comparing Value-Function Estimation Algorithms in Undiscounted Problems
unpublished ( gzipped Postscript - 104) Abstract:
We compare scaling properties of several value-function estimation algorithms. In particular, we pr...

Bhulai, Sandjai
Markov Decision Processes: the control of high-dimensional systems
Ph.D. Thesis, Vrije Universiteit, 2002 (Postscript - ) Abstract:
We develop algorithms for the computation of (nearly) optimal decision rules in high-dimensional sys...

Borkar, Vivek , Vijaymohan R. Konda
Actor-Critic algorithm as multi-time scale stochastic approximation algorithm
'Sadhana', Indian Academy of Sciences (Postscript - 561 KB) Abstract:
The actor-critic algorithm of Barto et al for simulation-based optimization of Markov decision proce...

Dietterich, Thomas
The MAXQ Method for Hierarchical Reinforcement Learning
Proceedings of the International Conference on Machine Learning, 1998 ( gzipped Postscript - 53KB) Abstract:
This paper presents a new approach to hierarchical reinforcement learning based on the MAXQ decompo...

Dietterich, Thomas
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
journal version; under review ( gzipped Postscript - 359KB) Abstract:
This paper presents a new approach to hierarchical reinforcement learning based on the MAXQ decompo...

Dimitrakakis, Christos , Samy Bengio
Gradient Estimates of Return
IDIAP Research Report (abridged version presented at PASCAL workshop on principled methods of trading exploration and exploitation) ( gzipped Postscript - 185KB) Abstract:
The exploration-exploitation trade-off that arises when one considers simple point estimates of exp...

Dimitrakakis, Christos (
Nearly optimal exploration-exploitation decision thresholds
ICANN 2006 ( gzipped Postscript - 170Kb) Abstract:
While in general trading off exploration and exploitation in reinforcement learning is hard, under ...

Gabor, Zoltan , Zs. Kalmar and Cs. Szepesvari
Multi-criteria Reinforcement Learning
Technical Report TR-98-115, "Attila József" University, Research Group on Artificial Intelligence Szeged, HU-6700, 1998 ( gzipped Postscript - 155 KB) Abstract:
This is a longer version of the paper published in ICML'98. We consider multi-criteria sequential...

Garcia, Frédérick , Florent Serre
Efficient Asymptotic Approximation in Temporal Difference Learning
European Conference on Artificial Intelligence ECAI'2000 ( gzipped Postscript - 78383 KB) Abstract:
We propose in this paper an asymptotic approximation of online TD(lambda) with accumulating eligib...

Garcia, Frédérick , Seydina Ndiaye
A Learning Rate Analysis of Reinforcement Learning Algorithms in Fine-Horizon
ICML'98 ( gzipped Postscript - 96 KB) Abstract:
In this article we consider the particular framework of non-stationary finite-horizon Markov Decis...

Goldsmith, Judy , Michael L. Littman, Martin Mundhenk
E-mail: goldsmit at
The Complexity of Plan Existence and Evaluation in Probabilistic Domains
Proceedings of the Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI--97) (Postscript - 277KB) Abstract:
We examine the computational complexity of testing and finding small plans in probabilistic plannin...

Kalmár, Zsolt , Cs. Szepesvári and A. Lorincz
Module-Based Reinforcement Learning: Experiments with a Real Robot
Machine Learning ( gzipped Postscript - 755 KB) Abstract:
The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, ...

Konda, Vijaymohan , Vivek S. Borkar
Learning Algorithms for Markov Decision Processes
SIAM Journal on Control and Optimization (Postscript - 580 KB) Abstract:
Algorithms learning the optimal policy of a Markov decision process based on simulated transitions a...

Likas, Aristidis
A Reinforcement Learning Approach to On-line Clustering
Neural Computation, to appear ( gzipped Postscript - 80KB) Abstract:
A general technique is proposed for embedding on-line clustering algorithms based on competitive l...

Littman, Michael
Probabilistic Propositional Planning: Representations and Complexity
Proceedings of the Fourteenth National Conference on Artificial Intelligence (Postscript - 360KB) Abstract:
Many representations for probabilistic propositional planning problems have been studied. This pap...

Littman, Michael , Csaba Szepesvári(
A generalized reinforcement-learning model: Convergence and applications
Proceedings of the Thirteenth International Conference on Machine Learning (Postscript - 170KB) Abstract:
Reinforcement learning is the process by which an autonomous agent uses its experience interacting ...

Littman, Michael
Memoryless policies: Theoretical limitations and practical results
From Animals to Animats 3: Proceedings of the Third International Conference on Simulation of Adaptive Behavior (Postscript - 416KB) Abstract:
One form of adaptive behavior is "goal-seeking" in which an agent acts so as to minimize the time i...

Littman, Michael
An optimization-based categorization of reinforcement learning environments
This paper proposes a categorization of reinforcement learning environments based on the optimizati...

Matt, Andreas , Georg Regensburger
"Reinforcement Learning for Several Environments: Theory and Applications"
A joint PhD thesis by Andreas Matt and Georg Regensburger Abstract:
Until now reinforcement learning has been applied to learn the optimal behavior for a single environ...

Munos, Remi
Reinforcement Learning for Continuous Stochastic Control Problems
Neural Information Processing Systems, 1997 (Postscript - 809KB) Abstract:
This paper is concerned with the problem of Reinforcement Learning for continuous state space and ...

Munos, Remi
A convergent Reinforcement Learning algorithm in the continuous case based on a Finite Difference method
IJCAI'1997 (compressed Postscript - 225KB) Abstract:
In this paper, we propose a convergent Reinforcement Learning algorithm for solving optimal contr...

Munos, Remi
A Convergent Reinforcement Learning algorithm in the continuous case : the Finite-Element Reinforcement Learning
International Conference on Machine Learning, 1996 (Postscript - 197KB) Abstract:
This paper presents a direct reinforcement learning algorithm, called Finite-Element Reinforcem...

Munos, Remi
A general convergence method for Reinforcement Learning in the continuous case
European Conference on Machine Learning, 1998 (compressed Postscript - 230KB) Abstract:
In this paper, we propose a general method for designing convergent Reinforcement Learning algorit...

Munos, Remi , Leemon Baird, Andrew Moore
Gradient Descent Approaches to Neural-Net-Based Solutions of the Hamilton-Jacobi-Bellman Equation.
IJCNN'99 ( gzipped Postscript - 128KB) Abstract:
In this paper we investigate new approaches to dynamic-programming-based optimal control of contin...

Munos, Remi
Error Bounds for Approximate Policy Iteration
Icml 2003 ( gzipped Postscript - 80 KB) Abstract:
In Dynamic Programming, convergence of algorithms such as Value Iteration or Policy Iteration resul...

Randlov, Jette
Shaping in Reinforcement Learning by Changing the Physics of the Problem
ICML-2000 ( gzipped Postscript - 65 ) Abstract:
Children learn to ride a bicycle by using training wheels. They are actually trying to learn one ta...

Reynolds, Stuart
The Stability of General Discounted Reinforcement Learning with Linear Function Approximation
UKCI'02 ( gzipped Postscript - 80) Abstract:
This paper shows that general discounted return estimating reinforcement learning algorithms ca...

Reynolds, Stuart
Optimistic Initial Q-values and the max Operator
UKCI'01 ( gzipped Postscript - 80) Abstract:
This paper provides a surprising new insight into the role of the max operator used by reinforcement...

Reynolds, Stuart
Reinforcement Learning with Exploration
PhD Thesis, School of Computer Science, The University of Birmingham, B15 2TT, UK ( gzipped Postscript - 1.1MB) Abstract:
Reinforcement Learning (RL) techniques may be used to find optimal controllers for multistep decisio...

Schmidhuber, Juergen (
Theory of universal optimal reinforcement learning machines (with links to work by Marcus Hutter and Schmidhuber)
Several journal papers and conference papers (HTML - 200 KB) Abstract:
The ultimate predictive world model is Solomonoff's Bayesian induction scheme based on the univers...

Singh, Satinder , Richard Sutton
Reinforcement Learning with Replacing Eligibility Traces
Machine Learning ( gzipped Postscript - ) Abstract:

Singh, Satinder , Peter Dayan
Analytical Mean Squared Error Curves for Temporal Difference Learning
Machine Learning ( gzipped Postscript - ) Abstract:

Singh, Satinder , Tommi Jaakkola, Michael Jordan
Learning Without State-Estimation in Partially Observable Markovian Decision Processes
Proceedings of the Eleventh International Machine Learning Conference ( gzipped Postscript - 59 KB) Abstract:

Singh, Satinder , T. Jaakkola, M.L. Littman and Cs. Szepesvari(
Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
Machine Learning, to appear, 1998 ( Postscript - 210 KB) Abstract:
An important application of reinforcement learning (RL) is to finite-state control problems and one ...

Szepesvari, Csaba (
The Asymptotic Convergence-Rate of Q-learning
NIPS'97 ( gzipped Postscript - 64 KB) Abstract:
In this paper we show that for discounted MDPs with discount factor \gamma>1/2 the asymptotic rate o...

Szepesvari, Csaba (
Learning and Exploitation do not Conflict Under Minimax Optimality
Proceedings of 9th European Conference of Machine Learning, pp. 242-249, 1997 ( gzipped Postscript - 40 KB) Abstract:
We show that adaptive real time dynamic programming extended with the action selection strategy whic...

Szepesvari, Csaba (
Some basic facts concerning minimax sequential decision processes
Technical Report TR-96-100, "Attila József" University, Research Group on Artificial Intelligence Szeged, HU-6700, 1996. ( gzipped Postscript - 65 KB) Abstract:
It is shown that for discounted minimax sequential decision processes the evaluation function of a s...

Szepesvari, Csaba
General Framework for Reinforcement Learning
Proceedings of ICANN'95 Paris, France, Oct. 1995, Vol. II., pp. 165-170 ( gzipped Postscript - ??) Abstract:
In this article we propose a general framework for sequential decision making. The framework is base...

Szepesvari, Csaba
Dynamic Concept Model Learns Optimal Policies
Proceedings of IEEE WCCI ICNN'94 Vol. III. pp. 1738-1742. Orlando, Florida, June 1994 ( gzipped Postscript - ??) Abstract:
Reinforcement learning is a flourishing field of neural methods. It has a firm theoretical basis and...

Szepesvari, Csaba
Efficient Approximate Planning in Continuous Space Markovian Decision Problems
unpublished ( gzipped Postscript - 128) Abstract:
In this article we consider Monte-Carlo planning algorithms for planning in continuous state-space, ...

Tran, Ronan , Kana Miza
Vietnam Travel Guide - The updated insiders
unpublished (html - ) Abstract:
Vietnam Travel & Tourism Guide with in-depth guidebook information, photos, maps, hotels, tours, air...

Tsitsiklis, John , Ben Van Roy
An Analysis of Temporal-Difference Learning with Function Approximation
IEEE Transactions on Automatic Control, Vol. 42, No. 5, May 1997, pp. 674-690. (Postscript - 2 MB) Abstract:
We discuss the temporal-difference learning algorithm, as applied to approximating cost-to-go funct...

Tsitsiklis, John , Benjamin Van Roy
Feature-Based Methods for Large Scale Dynamic Programming
Machine Learning, Vol. 22, 1996, pp. 59-94. ( PDF - 2.9 MB) Abstract:
We develop a methodological framework and present a few different ways in which dynamic programmin...

Van Roy, Benjamin
Learning and Value Function Approximation in Complex Decision Processes
PhD Thesis (Postscript - 1691 KB) Abstract:
In principle, a wide variety of sequential decision problems -- ranging from dynamic resource alloc...

Woergotter, Florentin , Bernd Porr
Temporal sequence learning, prediction and control - A review of different models and their relation to biological mechanisms
Neural Computation, 17: 245-319 Abstract:
A review of RL in view of its relation to classical conditioning and the biophysics of the underlyin...

Zhuang, X. (
The Strategy Entropy of Reinforcement Learning in Discrete State Space
conference proceedings Abstract:
In this paper, the concept of entropy is introduced into reinforcement learning. The definitions of ...

Zhuang, Xiaodong
conference proceedings (Compressed PDF - 207KB) Abstract:
In this paper, multi-scale reinforcement learning is presented based on fuzzy state. The concept of ...

gabor, zoltan , Zs. Kalmár and Cs. Szepesvári
Multi-criteria Reinforcement Learning
Proceedings of International Conference of Machine Learning, 1998 ( gzipped Postscript - 103 KB) Abstract:
We consider multi-criteria sequential decision making problems where the vector-valued evaluations a...