REINFORCEMENT LEARNING AND POMDPs (dozens of papers on RL in partially observable environments since 1989)

Schmidhuber, Juergen
Journal papers and conference papers (HTML - 100KB)

Abstract: Realistic environments are not fully observable. General learning agents therefore need an internal state to memorize important events. The essential question is how they can learn to identify and store the events relevant to subsequent optimal action selection. To address this question, we have studied reinforcement learners with recurrent neural network value function approximators (since 1990), recurrent network world models (since 1990), actions that address and set internal storage cells, trained by the success-story algorithm (since 1994), direct search in a space of event-memorizing algorithms (since 1994), and Goedel machines (since 2003).
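
Illustration: the claim that an agent needs internal state in a partially observable environment can be made concrete with a toy example. The sketch below is not taken from the papers listed here; the T-maze task and every name in it (run_episode, memoryless_policy, memory_policy, average_return) are assumptions chosen for illustration, and both policies are hand-coded rather than learned. A cue is visible only at the first step, and reward depends on turning toward that cue at the final junction.

```python
# Toy T-maze (hypothetical example, not the authors' setup): the cue is
# observable only at step 0, and the rewarded action at the junction is
# the one that matches the cue.

def run_episode(policy, cue, corridor_length=3):
    """Run one episode and return 1.0 if the final turn matches the cue."""
    observations = [cue] + ["corridor"] * corridor_length + ["junction"]
    memory = None   # the agent's internal state, initially empty
    action = None
    for obs in observations:
        action, memory = policy(obs, memory)
    return 1.0 if action == cue else 0.0

def memoryless_policy(obs, memory):
    """Acts on the current observation only; never stores anything."""
    if obs == "junction":
        return "left", None   # any fixed choice is right for only one cue
    return "forward", None

def memory_policy(obs, memory):
    """Stores the cue in a one-slot memory and recalls it at the junction."""
    if obs in ("left", "right"):
        return "forward", obs      # memorize the relevant event
    if obs == "junction":
        return memory, memory      # act on the stored cue
    return "forward", memory       # carry the memory through the corridor

def average_return(policy):
    """Average return over both equally likely cues."""
    cues = ["left", "right"]
    return sum(run_episode(policy, cue) for cue in cues) / len(cues)

if __name__ == "__main__":
    print("memoryless:", average_return(memoryless_policy))
    print("with memory:", average_return(memory_policy))
```

In this sketch the decision of what to store is fixed by hand. The research question stated in the abstract is how an agent can learn that decision itself, which the toy example does not attempt.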