Gradient Estimates of Return
Dimitrakakis, Christos, Samy Bengio
IDIAP Research Report (abridged version presented at PASCAL workshop on principled methods of trading exploration and exploitation)
(gzipped PostScript, 185KB)
Abstract: The exploration-exploitation trade-off that arises when one considers
simple point estimates of expected returns no longer appears when full
distributions are considered. This work develops a simple
gradient-based approach for maintaining such distributions and
investigates methods for using them to direct exploration.
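
As a rough illustration only, and not the method of the report itself, the idea of keeping a distribution over returns rather than a point estimate might be sketched as follows. Everything here is an assumption made for the example: the names ReturnEstimate, choose_action and run are hypothetical, each action's return is modelled as a Gaussian whose mean and variance are tracked by simple gradient steps, and exploration is directed by drawing one sample per action and picking the largest. Note that the variance tracked here is the spread of observed returns, which is a simplification.

```python
import random


class ReturnEstimate:
    """Running Gaussian estimate (mean, variance) of one action's return."""

    def __init__(self, mean=0.0, var=1.0, step=0.1):
        self.mean = mean
        self.var = var
        self.step = step  # assumed fixed step size in (0, 1]

    def update(self, ret):
        # Gradient step on 0.5 * (ret - mean)^2 with respect to mean.
        error = ret - self.mean
        self.mean += self.step * error
        # Gradient step on 0.5 * (error^2 - var)^2 with respect to var.
        # With step in (0, 1] this is a convex combination, so var stays >= 0.
        self.var += self.step * (error * error - self.var)

    def sample(self, rng):
        # Draw a plausible return from the current Gaussian estimate.
        return rng.gauss(self.mean, self.var ** 0.5)


def choose_action(estimates, rng):
    """Sample one return per action and pick the action with the largest."""
    samples = [e.sample(rng) for e in estimates]
    return max(range(len(estimates)), key=samples.__getitem__)


def run(true_means, steps=2000, seed=0):
    """Toy bandit loop: returns are Gaussian with unit variance (assumed)."""
    rng = random.Random(seed)
    estimates = [ReturnEstimate() for _ in true_means]
    counts = [0] * len(true_means)
    for _ in range(steps):
        a = choose_action(estimates, rng)
        ret = rng.gauss(true_means[a], 1.0)
        estimates[a].update(ret)
        counts[a] += 1
    return estimates, counts
```

In this sketch no separate exploration parameter is needed: actions whose estimated distribution is wide or whose mean is high are sampled more often simply because their draws are more likely to be the largest.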