Supervised learning combined with an actor-critic architecture
Rosenstein, Michael, and Andrew Barto
Technical Report 02-41, Department of Computer Science, University of Massachusetts, Amherst, 2002
Abstract: To address the shortcomings of reinforcement learning (RL), a number of
researchers have recently focused on ways to take advantage of structure in RL problems
and on ways to make domain knowledge part of RL algorithms. In this paper we examine a
supervised actor-critic architecture, whereby a supervisor adds structure to a learning
problem and supervised learning makes that structure part of an actor-critic framework for
reinforcement learning. We provide a steepest descent algorithm for real-valued actions
such that the actor adjusts its policy in accordance with gradient information from both
supervisor and critic. We also illustrate the approach with two kinds of supervisors: a
feedback controller that is easily designed yet sub-optimal, and a human operator providing
intermittent control of a simulated robotic arm.
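
The abstract does not state the update rule itself, so the following is only an illustrative sketch of how an actor might combine gradient information from a supervisor and a critic for a real-valued action. It assumes a linear actor, a scalar action, and a hypothetical mixing parameter k in [0, 1] that weights the critic's signal against the supervisor's; the function name, argument names, and the exact form of both terms are assumptions, not taken from the report.

```python
def actor_update(weights, features, supervisor_action, explored_action,
                 td_error, k, learning_rate):
    """One gradient step for a linear actor (hypothetical sketch).

    k = 0 gives a purely supervised update; k = 1 gives a purely
    critic-driven (reinforcement) update. All names are assumptions.
    """
    # Actor's current action: dot product of weights and state features.
    actor_action = sum(w * x for w, x in zip(weights, features))

    # Critic term: move toward the explored action when the TD error is
    # positive, and away from it when the TD error is negative.
    rl_term = td_error * (explored_action - actor_action)

    # Supervisor term: reduce the squared error to the supervisor's action.
    sl_term = supervisor_action - actor_action

    # Blend the two signals; for a linear actor the gradient of the action
    # with respect to each weight is the matching feature value.
    step = learning_rate * (k * rl_term + (1.0 - k) * sl_term)
    return [w + step * x for w, x in zip(weights, features)]


if __name__ == "__main__":
    new_weights = actor_update(
        weights=[0.0, 0.0], features=[1.0, 2.0],
        supervisor_action=1.0, explored_action=0.5,
        td_error=2.0, k=0.5, learning_rate=0.1,
    )
    print(new_weights)
```

In this sketch, raising k over time would shift the actor from imitating the supervisor toward relying on the critic, which is one plausible way to handle a supervisor that is easy to design but sub-optimal.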