
Department of Computer Science
University of Massachusetts Amherst

Research

Hierarchy

By applying temporal abstraction, it is possible to construct hierarchical control architectures in which temporally extended actions on one level of the hierarchy choose between actions on a lower level. In this context, a stochastic optimal control problem is generally modelled as a semi-Markov decision process (SMDP), a generalization of the MDP that incorporates temporally extended actions. Several hierarchical reinforcement learning (RL) algorithms have been proposed that compute approximate solutions to SMDPs. Hierarchical control architectures have several advantages over flat architectures. Complexity is reduced because decisions are not required at every step, but only at selected portions of the state space. Hierarchy also promotes generalization, since low-level actions can be reused once they are learned.
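To make the SMDP view concrete, the following is a minimal sketch of a tabular SMDP Q-learning backup for a temporally extended action. It is an illustration only, not a specific ALL algorithm: the corridor environment, the function names (`corridor_step`, `execute_option`, `smdp_q_update`) and the parameter values are all assumptions introduced for this example. The point it shows is that a temporally extended action is backed up once, when it terminates, using the discounted reward it accumulated and a discount of gamma raised to its duration.

```python
def corridor_step(state, action):
    """Hypothetical 5-state corridor (states 0..4); reaching state 4 pays 1.0."""
    next_state = min(max(state + action, 0), 4)
    done = next_state == 4
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def execute_option(state, primitive_actions, env_step, gamma):
    """Run a temporally extended action as a fixed sequence of primitive actions.

    Returns the state where the action ended, the discounted reward
    accumulated while it ran, and the number of primitive steps taken.
    """
    total, discount, duration = 0.0, 1.0, 0
    for action in primitive_actions:
        state, reward, done = env_step(state, action)
        total += discount * reward
        discount *= gamma
        duration += 1
        if done:  # stop early if the episode ends mid-action
            break
    return state, total, duration


def smdp_q_update(q, state, option, reward_sum, duration, next_state,
                  options, alpha=0.1, gamma=0.9):
    """One SMDP Q-learning backup, applied when the extended action terminates."""
    best_next = max(q.get((next_state, o), 0.0) for o in options)
    # The future is discounted by gamma**duration, not by a single gamma.
    target = reward_sum + (gamma ** duration) * best_next
    old = q.get((state, option), 0.0)
    q[(state, option)] = old + alpha * (target - old)
    return q[(state, option)]


if __name__ == "__main__":
    q = {}
    options = {"run_right": [1, 1, 1], "step_left": [-1]}
    start = 2
    end, reward_sum, duration = execute_option(
        start, options["run_right"], corridor_step, gamma=0.9)
    smdp_q_update(q, start, "run_right", reward_sum, duration, end,
                  list(options), alpha=0.5, gamma=0.9)
    print(q)
```

Note that the agent makes a single decision at state 2 and performs a single backup, even though two primitive steps were executed. This is the sense in which decisions are required only at selected points rather than at every step.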

Hierarchy is a key component of much of the research in ALL. In the past, we have been successful in developing techniques for learning and planning with temporally extended actions. Another prominent research area has been the development of methods for autonomously discovering hierarchical control architectures from experience. A current research topic is combining policy gradient methods with hierarchical RL algorithms to obtain efficient techniques for learning optimal behavior in problems with continuous state and action spaces. Another direction we are currently pursuing is the concurrent execution of temporally extended actions in a hierarchical framework. We are also concerned with developing more compact representations of temporally extended actions. This research involves applying the various approximate inference methods used in dynamic Bayes networks (DBNs). Many of these research topics have potential applications in robotics, industrial processes and other real-world systems.

Hierarchical control architectures facilitate the organization of memory, which is vital to solving POMDPs. Since decisions are made only at selected points, an agent needs to remember only these key situations and can ignore the information in between. This reduces the space required for memory. Several techniques for structuring and using hierarchical memory have been proposed by ALL researchers. We are also developing a theory of hierarchical POMDPs (HPOMDPs) and their application in indoor robot navigation tasks.
