Theory of universal optimal reinforcement learning machines (with links to work by Marcus Hutter and Schmidhuber)

Schmidhuber, Juergen
Several journal papers and conference papers (HTML - 200 KB)

Abstract: The ultimate predictive world model is Solomonoff's Bayesian induction scheme based on the universal prior M, the enumerable weighted sum of all enumerable measures. In theory we may use M online to predict the consequences of future action sequences, always choosing the sequence with the highest predicted success. This approach yields Hutter's (IDSIA) optimal general reinforcement learner AIXI. Both M and AIXI are incomputable, however, so we provide links to computable RL methods that are still optimal in a certain sense.
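
Illustrative sketch (not part of the linked papers): the prediction step described in the abstract can be imitated on a toy scale by replacing the incomputable prior M with a Bayesian mixture over a small, finite set of models, each weighted by 2^(-description length). The model names, the description lengths, and the helper functions below are all hypothetical choices made for illustration; the sketch covers prediction only, not the action selection that AIXI adds.

```python
from fractions import Fraction

# ASSUMPTION: a hand-picked finite model class stands in for "all enumerable
# measures". Each entry is (name, description_length_in_bits, predict_fn),
# where predict_fn(history) returns P(next bit = 1 | history).

def always(p):
    """Model that predicts a 1 with fixed probability p, ignoring history."""
    return lambda history: p

def repeat_last(history):
    """Model that expects the previous bit to repeat with probability 9/10."""
    if not history:
        return Fraction(1, 2)
    return Fraction(9, 10) if history[-1] == 1 else Fraction(1, 10)

MODELS = [
    ("fair_coin", 1, always(Fraction(1, 2))),
    ("mostly_ones", 2, always(Fraction(9, 10))),
    ("mostly_zeros", 3, always(Fraction(1, 10))),
    ("repeat_last", 3, repeat_last),
]

def prior_weights(models):
    """Weights proportional to 2^(-length), normalized to sum to 1."""
    raw = [Fraction(1, 2 ** length) for _, length, _ in models]
    total = sum(raw)
    return [w / total for w in raw]

def posterior_weights(models, history):
    """Bayesian update of the model weights, one observed bit at a time."""
    weights = prior_weights(models)
    for t, bit in enumerate(history):
        past = history[:t]
        scored = []
        for (_, _, predict), w in zip(models, weights):
            p1 = predict(past)
            scored.append(w * (p1 if bit == 1 else 1 - p1))
        total = sum(scored)
        weights = [s / total for s in scored]
    return weights

def predict_next(models, history):
    """Mixture probability that the next bit is 1, given the history."""
    weights = posterior_weights(models, history)
    return sum(w * predict(history)
               for (_, _, predict), w in zip(models, weights))

if __name__ == "__main__":
    print(float(predict_next(MODELS, [])))
    print(float(predict_next(MODELS, [1] * 10)))
```

Short models start with more weight, and the data then shifts weight toward whichever models predicted it well. Exact fractions are used so that the weights remain a proper probability distribution without rounding error.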