Gradient Estimates of Return

Dimitrakakis, Christos, Samy Bengio
IDIAP Research Report (abridged version presented at the PASCAL workshop on principled methods of trading exploration and exploitation)

Abstract: The exploration-exploitation trade-off that arises when one considers simple point estimates of expected returns no longer appears when full distributions are considered. This work develops a simple gradient-based approach for maintaining such distributions and investigates methods for using them to direct exploration.