Using a time-delay actor-critic neural architecture with dopamine-like reinforcement signal for learning in autonomous robots

Perez-Uribe, Andres
In: S. Wermter, J. Austin, D. Willshaw (Eds.), Emerging Neural Architectures Based on Neuroscience, Springer-Verlag, LNAI 2036, pp. 522-533.

Abstract: Neuroscientists have identified a neural substrate of prediction and reward in experiments with primates. The so-called dopamine neurons have been shown to code an error in the temporal prediction of rewards. Similarly, artificial systems can "learn to predict" using the so-called temporal-difference (TD) methods. Based on the general resemblance between the effective reinforcement term of TD models and the response of dopamine neurons, neuroscientists have developed a time-delay actor-critic neural model based on TD learning and compared its performance with the behavior of monkeys in the laboratory. We have used this neural network model to learn to predict variable-delay rewards in a robot spatial choice task similar to the one used by neuroscientists with primates. Such an architecture, implementing TD learning, appears to be a promising mechanism for robotic systems that learn from simple human teaching signals in the real world.
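To make the core idea concrete, the following is a minimal Python sketch of a tabular actor-critic driven by a TD error, in the spirit of the model described above; it is not the paper's time-delay network. The critic learns state values, and its TD error delta plays the role of the dopamine-like effective reinforcement signal that adjusts both the value estimates and the action preferences. All names and parameters here (N_STATES, N_ACTIONS, GAMMA, the toy step function) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

N_STATES = 5          # e.g., positions along a simple spatial choice task (assumed)
N_ACTIONS = 2         # e.g., two possible choices at each position (assumed)
GAMMA = 0.95          # discount factor
ALPHA_CRITIC = 0.1    # critic learning rate
ALPHA_ACTOR = 0.05    # actor learning rate

V = np.zeros(N_STATES)                   # critic: state-value estimates
prefs = np.zeros((N_STATES, N_ACTIONS))  # actor: action preferences

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def step(state, action):
    # Toy environment (assumed): choosing action 0 in the final state is rewarded,
    # standing in for a simple human teaching signal.
    next_state = min(state + 1, N_STATES - 1)
    reward = 1.0 if (state == N_STATES - 1 and action == 0) else 0.0
    done = state == N_STATES - 1
    return next_state, reward, done

for episode in range(500):
    s = 0
    done = False
    while not done:
        a = rng.choice(N_ACTIONS, p=softmax(prefs[s]))
        s_next, r, done = step(s, a)
        # TD error: the "effective reinforcement" term, analogous to the
        # dopamine neurons' coding of an error in reward prediction
        target = r if done else r + GAMMA * V[s_next]
        delta = target - V[s]
        V[s] += ALPHA_CRITIC * delta        # critic update
        prefs[s, a] += ALPHA_ACTOR * delta  # actor update: reinforce the taken action
        s = s_next

print("Learned state values:", np.round(V, 2))

As training progresses, delta migrates from the time of the reward to the time of its earliest reliable predictor, which is the property that motivates the comparison with dopamine neuron responses.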