Theory of universal optimal reinforcement learning machines (with links to work by Marcus Hutter and Schmidhuber)
Schmidhuber, Juergen
Several journal papers and conference papers
(HTML - 200 KB)
Abstract: The ultimate predictive
world model is Solomonoff's Bayesian induction scheme based on
the universal prior M, the enumerable weighted sum of all
enumerable measures. In theory we may use M online to predict
consequences of future action sequences, always choosing
those with highest predicted success. This approach yields
Hutter's (IDSIA) optimal general reinforcement learner AIXI.
Both M and AIXI are incomputable, however, so we provide links to
computable RL methods that are still optimal in a certain sense.
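
As a rough illustration of the mixture idea behind M, the sketch below predicts the next bit of a binary sequence from a weighted sum over a small, finite set of hypotheses. It is only a toy under stated assumptions: the true universal prior sums over all enumerable measures and is incomputable, whereas here the hypothesis class is three hand-picked Bernoulli models with hand-picked prior weights. The names mixture_predict, hypotheses, and weights are hypothetical and do not come from the linked papers.

```python
from fractions import Fraction


def mixture_predict(hypotheses, weights, history):
    """Return P(next bit = 1 | history) under a finite Bayesian mixture.

    hypotheses: Bernoulli parameters, each strictly between 0 and 1
                (assumption: a tiny stand-in for "all enumerable measures").
    weights:    prior weight of each hypothesis (assumption: hand-picked).
    history:    list of observed bits (0 or 1).
    """
    posterior = []
    for theta, w in zip(hypotheses, weights):
        # Likelihood of the observed history under this hypothesis.
        likelihood = Fraction(1)
        for bit in history:
            likelihood *= theta if bit == 1 else 1 - theta
        # Unnormalized posterior weight: prior weight times likelihood.
        posterior.append(w * likelihood)
    total = sum(posterior)
    # Posterior-weighted average of each hypothesis's prediction.
    return sum(p * theta for p, theta in zip(posterior, hypotheses)) / total


# Toy hypothesis class and prior (both are illustrative assumptions).
hypotheses = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

print(mixture_predict(hypotheses, weights, []))
print(mixture_predict(hypotheses, weights, [1, 1, 1, 1]))
```

With every observed 1, posterior weight shifts toward the hypothesis that favors 1s, so the predicted probability of another 1 rises. An agent in the spirit of the abstract would use such predictions to score candidate action sequences, which this sketch does not attempt.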