Publications on Theoretical Analysis
, Alan Biermann and Philip M. Long
Reinforcement learning with immediate rewards and linear hypotheses
(Postscript - 350KB)
We perform theoretical analysis of algorithms for reinforcement
learning with immediate rewards usi...
, Tamas Grobler, Csaba Szepesvari
Comparing Value-Function Estimation Algorithms in Undiscounted Problems
( gzipped Postscript - 104)
We compare scaling properties of several value-function estimation algorithms.
In particular, we pr...
Markov Decision Processes: the control of high-dimensional systems
Ph.D. Thesis, Vrije Universiteit, 2002
(Postscript)
We develop algorithms for the computation of (nearly) optimal decision rules in high-dimensional sys...
, Vijaymohan R. Konda
Actor-Critic algorithm as multi-time scale stochastic approximation algorithm
'Sadhana', Indian Academy of Sciences
(Postscript - 561 KB)
The actor-critic algorithm of Barto et al. for simulation-based optimization of Markov decision proce...
The MAXQ Method for Hierarchical Reinforcement Learning
Proceedings of the International Conference on Machine Learning, 1998
( gzipped Postscript - 53KB)
This paper presents a new approach to hierarchical reinforcement
learning based on the MAXQ decompo...
Hierarchical Reinforcement Learning with the MAXQ Value
journal version; under review
( gzipped Postscript - 359KB)
This paper presents a new approach to hierarchical
reinforcement learning based on the MAXQ decompo...
, Samy Bengio
Gradient Estimates of Return
IDIAP Research Report (abridged version presented at PASCAL workshop on principled methods of trading exploration and exploitation)
( gzipped Postscript - 185KB)
The exploration-exploitation trade-off that arises when one considers
simple point estimates of exp...
Nearly optimal exploration-exploitation decision thresholds
( gzipped Postscript - 170KB)
While in general trading off exploration and exploitation in
reinforcement learning is hard, under ...
, Zs. Kalmar and Cs. Szepesvari
Multi-criteria Reinforcement Learning
Technical Report TR-98-115, "Attila József" University, Research Group on Artificial Intelligence Szeged, HU-6700, 1998
( gzipped Postscript - 155 KB)
This is a longer version of the paper published in ICML'98.
We consider multi-criteria sequential...
, Florent Serre
Efficient Asymptotic Approximation in Temporal Difference Learning
European Conference on Artificial Intelligence ECAI'2000
( gzipped Postscript - 78383 KB)
We propose in this paper an asymptotic approximation of
online TD(lambda) with accumulating eligib...
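For reference, the baseline that this paper approximates, standard online TD(lambda) with accumulating eligibility traces, can be written as a minimal tabular sketch. It is not the paper's asymptotic approximation. The function name, the episode format (lists of (state, reward, next_state, done) tuples) and the default values of alpha, gamma and lambda are illustrative assumptions.

```python
def td_lambda_accumulating(episodes, n_states, alpha=0.1, gamma=0.9, lam=0.8):
    """Online tabular TD(lambda) with accumulating eligibility traces.

    Assumed episode format: a list of (state, reward, next_state, done)
    transitions; next_state is ignored when done is True.
    """
    V = [0.0] * n_states
    for episode in episodes:
        e = [0.0] * n_states  # traces are reset at the start of each episode
        for s, r, s_next, done in episode:
            target = r if done else r + gamma * V[s_next]
            delta = target - V[s]  # one-step TD error
            e[s] += 1.0            # accumulating trace: add 1 on every visit
            for i in range(n_states):
                V[i] += alpha * delta * e[i]  # online update of every state
                e[i] *= gamma * lam           # decay traces for the next step
    return V
```

On a two-state chain (state 0 leads to state 1 with reward 0, state 1 terminates with reward 1), repeating the episode a few hundred times drives V[1] toward 1 and V[0] toward gamma = 0.9.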
, Seydina Ndiaye
A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon
( gzipped Postscript - 96 KB)
In this article we consider the particular framework of non-stationary
finite-horizon Markov Decis...
, Michael L. Littman, Martin Mundhenk
E-mail: goldsmit at cs.uky.edu
The Complexity of Plan Existence and Evaluation in Probabilistic Domains
Proceedings of the Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-97)
(Postscript - 277KB)
We examine the computational complexity of testing and finding small
plans in probabilistic plannin...
, Cs. Szepesvári and A. Lorincz
Module-Based Reinforcement Learning: Experiments with a Real Robot
( gzipped Postscript - 755 KB)
The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, ...
, Vivek S. Borkar
Learning Algorithms for Markov Decision Processes
SIAM Journal on Control and Optimization
(Postscript - 580 KB)
Algorithms learning the optimal policy of a Markov decision process based on simulated transitions a...
A Reinforcement Learning Approach to On-line Clustering
Neural Computation, to appear
( gzipped Postscript - 80KB)
A general technique is proposed for
embedding on-line clustering algorithms based on competitive...
Probabilistic Propositional Planning: Representations and Complexity
Proceedings of the Fourteenth
National Conference on Artificial Intelligence
(Postscript - 360KB)
Many representations for probabilistic propositional planning problems
have been studied. This pap...
, Csaba Szepesvári
reinforcement-learning model: Convergence and applications
Proceedings of the Thirteenth International Conference on Machine Learning
(Postscript - 170KB)
Reinforcement learning is the process by which an autonomous agent
uses its experience interacting ...
Memoryless policies: Theoretical limitations and practical results
From Animals to Animats 3: Proceedings
of the Third International Conference on Simulation of Adaptive Behavior
(Postscript - 416KB)
One form of adaptive behavior is "goal-seeking" in which an agent acts
so as to minimize the time i...
An optimization-based categorization of reinforcement learning environments
This paper proposes a categorization of reinforcement learning
environments based on the optimizati...
, Georg Regensburger
"Reinforcement Learning for Several Environments: Theory and Applications"
A joint PhD thesis by Andreas Matt and Georg Regensburger
Until now reinforcement learning has been applied to learn the optimal behavior for a single environ...
Reinforcement Learning for Continuous Stochastic Control Problems
Neural Information Processing Systems, 1997
(Postscript - 809KB)
This paper is concerned with the problem of Reinforcement Learning for continuous state
space and ...
A convergent Reinforcement Learning algorithm in the continuous case based on a Finite Difference method
(compressed Postscript - 225KB)
In this paper, we propose a convergent Reinforcement Learning algorithm for solving optimal...
A Convergent Reinforcement Learning algorithm in the continuous case: the Finite-Element Reinforcement Learning
Conference on Machine Learning, 1996
(Postscript - 197KB)
This paper presents a direct reinforcement learning algorithm, called Finite-Element...
A general convergence method for Reinforcement Learning in the continuous case
European Conference on Machine Learning, 1998
(compressed Postscript - 230KB)
In this paper, we propose a general method for designing convergent Reinforcement Learning...
, Leemon Baird, Andrew Moore
Gradient Descent Approaches to Neural-Net-Based Solutions of the Hamilton-Jacobi-Bellman Equation
( gzipped Postscript - 128KB)
In this paper we investigate new approaches to
dynamic-programming-based optimal control of contin...
Error Bounds for Approximate Policy Iteration
( gzipped Postscript - 80 KB)
In Dynamic Programming, convergence of algorithms such as Value Iteration
or Policy Iteration resul...
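Value Iteration, named in the abstract above, can be illustrated with a minimal tabular sketch. The MDP encoding (P[s][a] as a list of (probability, next_state) pairs, R[s][a] as the expected immediate reward), the function name and the stopping tolerance are assumptions for illustration; the error bounds derived in the paper itself are not reproduced here.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Tabular value iteration (illustrative sketch).

    Assumed encoding: P[s][a] is a list of (prob, next_state) pairs and
    R[s][a] is the expected immediate reward for action a in state s.
    Repeatedly applies the Bellman optimality operator until successive
    iterates differ by less than tol in the max norm.
    """
    n = len(P)
    V = [0.0] * n
    while True:
        V_new = [
            max(R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
                for a in range(len(P[s])))
            for s in range(n)
        ]
        diff = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if diff < tol:
            return V
```

Because the Bellman optimality operator is a gamma-contraction in the max norm, stopping once successive iterates differ by less than tol leaves the returned values within gamma * tol / (1 - gamma) of the optimal values.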
Shaping in Reinforcement Learning by Changing the Physics of the Problem
( gzipped Postscript - 65)
Children learn to ride a bicycle by using training wheels.
They are actually trying to learn one ta...
The Stability of General Discounted Reinforcement Learning with Linear Function Approximation
( gzipped Postscript - 80)
This paper shows that general discounted return estimating
reinforcement learning algorithms ca...
Optimistic Initial Q-values and the max Operator
( gzipped Postscript - 80)
This paper provides a surprising new insight into the role of the max operator used by reinforcement...
Reinforcement Learning with Exploration
PhD Thesis, School of Computer Science, The University of Birmingham, B15 2TT, UK
( gzipped Postscript - 1.1MB)
Reinforcement Learning (RL) techniques may be used to find optimal controllers for multistep decisio...
Theory of universal optimal reinforcement learning machines (with links to work by Marcus Hutter and Schmidhuber)
Several journal papers and conference papers
(HTML - 200 KB)
The ultimate predictive
world model is Solomonoff's Bayesian induction scheme based on...
, Richard Sutton
Reinforcement Learning with Replacing Eligibility Traces
( gzipped Postscript)
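The difference between the two trace types named in this title can be sketched in a few lines. The helper below is hypothetical (not taken from the paper) and follows the usual convention: all traces decay by gamma * lambda, then the visited state's trace is either incremented (accumulating) or reset to 1 (replacing).

```python
def update_traces(e, visited, gamma, lam, replacing=True):
    """Hypothetical helper: one eligibility-trace update after visiting a state.

    All traces decay by gamma * lam; the visited state's trace is then reset
    to 1 (replacing) or incremented by 1 (accumulating).
    """
    e = [gamma * lam * x for x in e]  # decay every trace
    if replacing:
        e[visited] = 1.0              # replacing: trace never exceeds 1
    else:
        e[visited] += 1.0             # accumulating: grows with repeated visits
    return e
```

With gamma = lambda = 1 and three consecutive visits to state 0, the accumulating trace reaches 3 while the replacing trace stays at 1.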
, Peter Dayan
Analytical Mean Squared Error Curves for Temporal Difference Learning
( gzipped Postscript)
, Tommi Jaakkola, Michael Jordan
Learning Without State-Estimation in Partially Observable Markovian Decision Processes
Proceedings of the Eleventh International Machine Learning Conference
( gzipped Postscript - 59 KB)
, T. Jaakkola, M.L. Littman and Cs. Szepesvari
Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
Machine Learning, to appear, 1998
(Postscript - 210 KB)
An important application of reinforcement learning (RL) is to finite-state control problems and one ...
The Asymptotic Convergence-Rate of Q-learning
( gzipped Postscript - 64 KB)
In this paper we show that for discounted MDPs with discount factor \gamma>1/2 the asymptotic rate o...
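To make the role of the discount factor concrete, here is a minimal sketch of tabular Q-learning with step size 1/n(s, a), where n(s, a) counts updates of the pair. The function name, the transition format and the one-state test problem are assumptions for illustration, not the paper's setup. In a one-state, one-action problem with reward 1, the error after n updates equals 1/(1 - gamma) times the product of (1 - (1 - gamma)/k) for k = 1..n, which shrinks roughly like n^(-(1 - gamma)).

```python
from collections import defaultdict


def q_learning(transitions, actions, gamma):
    """Tabular Q-learning with step size 1/n(s, a) (illustrative sketch).

    Assumed inputs: `transitions` is an iterable of (s, a, r, s_next) tuples
    and `actions(s)` returns the actions available in state s.
    """
    Q = defaultdict(float)  # unseen pairs start at 0
    n = defaultdict(int)    # per-pair update counts
    for s, a, r, s_next in transitions:
        n[(s, a)] += 1
        alpha = 1.0 / n[(s, a)]
        target = r + gamma * max(Q[(s_next, b)] for b in actions(s_next))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q
```

In this toy problem, with gamma = 0.5 the error after 10,000 updates is about 0.01, while with gamma = 0.9 it is still above 1.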
Learning and Exploitation do not Conflict Under Minimax Optimality
Proceedings of the 9th European Conference on Machine Learning, pp. 242-249, 1997
( gzipped Postscript - 40 KB)
We show that adaptive real time dynamic programming extended with the action selection strategy whic...
Some basic facts concerning minimax sequential decision processes
Technical Report TR-96-100, "Attila József" University, Research Group on Artificial Intelligence Szeged, HU-6700, 1996.
( gzipped Postscript - 65 KB)
It is shown that for discounted minimax sequential decision processes the evaluation function of a s...
General Framework for Reinforcement Learning
Proceedings of ICANN'95 Paris, France, Oct. 1995, Vol. II., pp. 165-170
( gzipped Postscript)
In this article we propose a general framework for sequential decision making. The framework is base...
Dynamic Concept Model Learns Optimal Policies
Proceedings of IEEE WCCI ICNN'94 Vol. III. pp. 1738-1742. Orlando, Florida, June 1994
( gzipped Postscript)
Reinforcement learning is a flourishing field of neural methods. It has a firm theoretical basis and...
Efficient Approximate Planning in Continuous Space Markovian Decision Problems
( gzipped Postscript - 128)
In this article we consider Monte-Carlo planning algorithms for planning in continuous state-space, ...
, Ben Van Roy
An Analysis of Temporal-Difference Learning with Function Approximation
IEEE Transactions on Automatic Control,
Vol. 42, No. 5, May 1997, pp. 674-690.
(Postscript - 2 MB)
We discuss the temporal-difference learning algorithm, as applied to
approximating cost-to-go funct...
, Benjamin Van Roy
Feature-Based Methods for Large Scale Dynamic Programming
Machine Learning, Vol. 22,
1996, pp. 59-94.
( PDF - 2.9 MB)
We develop a methodological framework and present a few
different ways in which dynamic programmin...
Learning and Value Function Approximation in Complex Decision Processes
(Postscript - 1691 KB)
In principle, a wide variety of sequential decision problems
-- ranging from dynamic resource alloc...
, Bernd Porr
Temporal sequence learning, prediction and control - A review of different models and their relation to biological mechanisms
Neural Computation, 17: 245-319
A review of RL in view of its relation to classical conditioning and the biophysics of the underlyin...
The Strategy Entropy of Reinforcement Learning in Discrete State Space
In this paper, the concept of entropy is introduced into reinforcement learning. The definitions of ...
Multi-Scale Reinforcement Learning with Fuzzy State
(Compressed PDF - 207KB)
In this paper, multi-scale reinforcement learning is presented based on fuzzy state. The concept of ...
, Zs. Kalmár and Cs. Szepesvári
Multi-criteria Reinforcement Learning
Proceedings of the International Conference on Machine Learning, 1998
( gzipped Postscript - 103 KB)
We consider multi-criteria sequential decision making problems where the vector-valued evaluations a...