Convergence of the Q-ae Learning on Deterministic MDPs and Its Efficiency on the Stochastic Environment

Gang ZHAO, Shoji TATSUMI, Ruoying SUN
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences

Abstract: In this paper, based on a discussion of different exploration methods, we replace the pre-action-selector of the Q-ee learning with the Active Exploration Planning (AEP), a method that implements active exploration of an environment; we call the resulting learning system the Q-ae learning. With this replacement, the Q-ae learning not only retains the advantages of the Q-ee learning but is also adapted to stochastic environments. Moreover, for deterministic MDPs, this paper presents the condition under which an agent using the Q-ae learning converges to the optimal policy, together with its proof. Further, discussions and experiments show that, by adjusting the relation between the learning factor and the discount rate, the exploration of a stochastic environment can be controlled. Experimental results on the exploration rate of the environment and the correct rate of the learned policies also illustrate the efficiency of the Q-ae learning in stochastic environments.
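The abstract turns on how the learning factor and the discount rate enter the value update that the Q-ae learning builds on. The following Python sketch shows only a standard tabular Q-learning update with a generic optimism-based action choice; the bonus heuristic, the constants ALPHA, GAMMA, and BONUS, and all function names are illustrative assumptions and are not the paper's AEP pre-action-selector.

```python
from collections import defaultdict

# Minimal tabular Q-learning sketch (not the paper's Q-ae learning itself).
# It only illustrates where the learning factor (alpha) and the discount
# rate (gamma), whose relation the paper discusses, appear in the update.

ALPHA = 0.5   # learning factor (hypothetical value)
GAMMA = 0.9   # discount rate (hypothetical value)
BONUS = 1.0   # optimistic bonus for unvisited pairs (assumed heuristic)

Q = defaultdict(float)      # Q[(state, action)] -> estimated return
visits = defaultdict(int)   # visit counts used by the exploration heuristic


def select_action(state, actions):
    """Pick the action with the highest bonus-augmented Q-value."""
    def score(a):
        bonus = BONUS if visits[(state, a)] == 0 else 0.0
        return Q[(state, a)] + bonus
    return max(actions, key=score)


def update(state, action, reward, next_state, next_actions):
    """One-step update: Q <- Q + alpha * (r + gamma * max Q' - Q)."""
    visits[(state, action)] += 1
    target = reward + GAMMA * max(Q[(next_state, a)] for a in next_actions)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```

In such an update, a larger ALPHA relative to GAMMA makes recent experience dominate the estimate, which is one concrete way the relation between the two constants can shift how aggressively an agent keeps exploring; the paper's own mechanism for this control is the AEP, whose details are given in the full text.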