Reinforcement Learning Repository at MSU

Topics: Applications to Average-Reward/Undiscounted Methods

These methods are based on an optimality criterion in which the agent tries to maximize the expected payoff per step (rather than the expected discounted sum of rewards). Undiscounted methods are currently less well understood than discounted methods, and they are also significantly harder to analyze theoretically.
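
To make the two criteria concrete, here is a minimal Python sketch that evaluates one fixed policy under both. Everything in it is an illustrative assumption rather than material from the repository: the two-state transition matrix P, the reward vector r, and the helper names average_reward and discounted_values are hypothetical, and the chain is assumed to have a single recurrent class so that the payoff per step does not depend on the start state.

```python
import numpy as np

# Hypothetical 2-state Markov reward process induced by some fixed policy.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])   # P[s, s2] = probability of moving from s to s2
r = np.array([1.0, 0.0])     # expected one-step reward in each state


def average_reward(P, r):
    """Expected payoff per step: stationary distribution dotted with rewards.

    Assumes the chain has a single recurrent class, so the result is the
    same for every start state.
    """
    n = P.shape[0]
    # Solve mu @ P = mu together with sum(mu) = 1 as one least-squares system.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(mu @ r)


def discounted_values(P, r, gamma):
    """Expected discounted sum of rewards per state: v = (I - gamma P)^-1 r."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)


rho = average_reward(P, r)
v = discounted_values(P, r, gamma=0.999)

print("average reward per step:", rho)
print("(1 - gamma) * discounted values:", (1 - 0.999) * v)
```

For this particular chain the stationary distribution is (5/6, 1/6), so the payoff per step is 5/6. The second printed line illustrates the usual link between the two criteria: as the discount factor approaches 1, the discounted value scaled by (1 - gamma) approaches the average reward in every state.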