Reinforcement Learning Repository at MSU

Topics: Applications to DP/MDP

Dynamic programming (DP) refers to a family of methods for finding optimal solutions to problems with a compositional cost structure. Richard Bellman, Ronald Howard, and David Blackwell laid the foundations for most of the early research into this problem. In particular, they originated the most popular algorithms (value iteration and policy iteration) and made significant contributions to the mathematical study of Markov decision processes (MDPs). Much of the current research into RL is based on the framework of DP and MDPs.
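To make the connection concrete, here is a minimal sketch of value iteration on a toy two-state, two-action MDP. The states, actions, transition probabilities, rewards, and discount factor below are illustrative assumptions, not taken from any particular problem in the repository.

```python
# P[s][a] = list of (probability, next_state, reward) tuples.
# Toy MDP: hypothetical numbers chosen only for illustration.
P = {
    0: {0: [(1.0, 0, 0.0)],                  # stay in state 0, no reward
        1: [(0.8, 1, 5.0), (0.2, 0, 0.0)]},  # attempt to reach state 1
    1: {0: [(1.0, 1, 1.0)],                  # stay in state 1, small reward
        1: [(1.0, 0, 0.0)]},                 # return to state 0
}
gamma = 0.9    # discount factor (assumed)
theta = 1e-8   # convergence threshold

V = {s: 0.0 for s in P}
while True:
    delta = 0.0
    for s in P:
        # Bellman optimality update:
        # V(s) <- max_a sum_{s'} p(s'|s,a) [r + gamma * V(s')]
        v_new = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                    for a in P[s])
        delta = max(delta, abs(v_new - V[s]))
        V[s] = v_new
    if delta < theta:
        break

# Extract a greedy policy from the converged value function.
policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                         for p, s2, r in P[s][a]))
          for s in P}
print(V, policy)
```

Policy iteration follows the same Bellman machinery but alternates a full policy-evaluation step with a greedy policy-improvement step, typically converging in fewer (though more expensive) iterations.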

References: An extensive discussion of MDPs may be found in:
Puterman, M. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley, 1994.