RTP-Q: A Reinforcement Learning System with Time Constraints Exploration Planning for Accelerating the Learning Rate

Gang ZHAO, Shoji TATSUMI, Ruoying SUN
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences

Abstract: This paper proposes RTP-Q, a reinforcement learning system that recasts an efficient method for exploring an environment as exploration planning under time constraints and combines it into an integrated system of learning, planning, and reacting, aiming for the best of both methods. By improving exploration of the environment and refining the model of the environment, the RTP-Q learning system accelerates the learning rate for obtaining an optimal policy. Experimental results on navigation tasks demonstrate that the RTP-Q learning system is efficient.
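
The abstract gives no algorithmic detail, so the following is only a rough sketch of the general idea it names: an agent that reacts, learns from real experience, and plans from a learned model within a wall-clock time limit. It is a generic Dyna-style Q-learning loop with a per-step planning deadline, not the RTP-Q algorithm itself. The grid world, the function names (`step`, `greedy`, `q_update`, `train`), and every parameter value (`budget_s`, `alpha`, `gamma`, `epsilon`) are assumptions made for illustration.

```python
import random
import time
from collections import defaultdict

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def step(state, action, size=4):
    """Hypothetical navigation task: move on a grid, stop at walls, reward 1 at the goal."""
    row = min(max(state[0] + action[0], 0), size - 1)
    col = min(max(state[1] + action[1], 0), size - 1)
    next_state = (row, col)
    done = next_state == (size - 1, size - 1)
    return next_state, (1.0 if done else 0.0), done


def greedy(q, state):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_update(q, s, a, r, s2, done, alpha, gamma):
    """One-step Q-learning backup."""
    target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def train(episodes=50, budget_s=0.002, alpha=0.5, gamma=0.95, epsilon=0.1):
    q = defaultdict(float)  # action values Q(s, a)
    model = {}              # learned model: (s, a) -> (reward, next state, done)
    steps_per_episode = []
    for _ in range(episodes):
        state, done, steps = (0, 0), False, 0
        while not done and steps < 500:
            # Reacting: choose an action epsilon-greedily from current values.
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = greedy(q, state)
            next_state, reward, done = step(state, action)
            # Learning: update values and the model from the real transition.
            q_update(q, state, action, reward, next_state, done, alpha, gamma)
            model[(state, action)] = (reward, next_state, done)
            # Planning: replay modelled transitions until the time budget runs out.
            known = list(model)
            deadline = time.perf_counter() + budget_s
            while time.perf_counter() < deadline:
                s, a = random.choice(known)
                r, s2, d = model[(s, a)]
                q_update(q, s, a, r, s2, d, alpha, gamma)
            state = next_state
            steps += 1
        steps_per_episode.append(steps)
    return q, steps_per_episode
```

Calling `train()` returns the value table and the number of steps taken in each episode. Because planning is bounded by elapsed time rather than by a fixed number of backups, the amount of planning per step depends on the machine, so runs are not exactly reproducible.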