







Department of Computer Science
University of Massachusetts Amherst
Research
Hierarchy
By applying temporal abstraction it is possible to construct
hierarchical control architectures, such that temporally extended
actions on one level of the hierarchy choose between actions on a
lower level. In this context, a stochastic optimal control problem is
generally modelled as a semi-Markov decision process (SMDP), a
generalization of the MDP that incorporates temporally extended
actions. Several hierarchical reinforcement learning (RL) algorithms
have been proposed which compute approximate solutions to SMDPs.
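
As a rough illustration of how an SMDP backup differs from an ordinary MDP backup, the sketch below shows SMDP Q-learning for a single temporally extended action. It is a minimal sketch and not the group's implementation: the names `smdp_q_update` and `run_option`, the constants `GAMMA` and `ALPHA`, and the toy corridor environment are all assumptions made for illustration. The key point is that the value of the next state is discounted by the number of primitive steps the extended action took.

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor (assumed value)
ALPHA = 0.1  # learning rate (assumed value)


def smdp_q_update(q, state, option, reward_sum, duration, next_state, options):
    """One SMDP Q-learning backup for a temporally extended action.

    reward_sum is the discounted reward accumulated while the option ran,
    and duration is the number of primitive steps it took.
    """
    best_next = max(q[(next_state, o)] for o in options)
    # The next-state value is discounted by GAMMA ** duration, not GAMMA.
    target = reward_sum + (GAMMA ** duration) * best_next
    q[(state, option)] += ALPHA * (target - q[(state, option)])
    return q[(state, option)]


def run_option(state, n_steps, goal=10):
    """Hypothetical corridor: the option moves right for up to n_steps steps.

    Returns the resulting state, the discounted reward collected along the
    way, and the number of primitive steps actually executed.
    """
    reward_sum, discount, steps = 0.0, 1.0, 0
    for _ in range(n_steps):
        state = min(state + 1, goal)
        reward = 1.0 if state == goal else 0.0
        reward_sum += discount * reward
        discount *= GAMMA
        steps += 1
        if state == goal:
            break  # the option terminates on reaching the goal
    return state, reward_sum, steps
```

With `duration` equal to 1 for every action, the update reduces to standard one-step Q-learning, which is one way to see the SMDP as a generalization of the MDP.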
Hierarchical control architectures have several advantages over flat
architectures. Complexity is reduced since decisions are required only
at selected points in the state space rather than at every step.
Hierarchy also promotes generalization since low-level actions
can be reused once they are learned.
Hierarchy is a key component of much of the research in ALL. In the
past, we have been successful in developing techniques for learning
and planning with temporally extended actions. Another prominent
research area has been developing methods for autonomously discovering
hierarchical control architectures based on experience. A current
research topic is combining policy gradient methods with hierarchical
RL algorithms to obtain efficient techniques for learning optimal
behavior in problems with continuous state and action spaces. Another
direction of research that we are currently pursuing is the concurrent
execution of temporally extended actions in a hierarchical framework.
We are also concerned with developing more compact representations of
temporally extended actions. This research applies the approximate
inference methods used in dynamic Bayesian networks (DBNs). Many of
these research topics have potential applications in robotics,
industrial processes and other real-world systems.
Hierarchical control architectures facilitate the organization of
memory, which is vital to solving partially observable MDPs (POMDPs).
Since decisions are only made at selected points, an agent only needs
to remember these key situations, and can ignore the information that
arrives in between. This reduces the space requirement for memory.
Several techniques for structuring and using hierarchical memory have
been proposed by ALL researchers. We are also developing a theory of
hierarchical POMDPs (HPOMDPs) and their application to indoor robot
navigation tasks.
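
The memory saving described above can be illustrated with a small sketch. It is only a toy under stated assumptions: the function `compress_history`, the predicate `is_decision_point`, and the corridor and junction observation labels are hypothetical and do not come from any ALL system. The sketch keeps the observations made at decision points and discards everything in between.

```python
def compress_history(observations, is_decision_point):
    """Keep only the (time step, observation) pairs seen at decision points.

    observations is the full sequence of observations, and
    is_decision_point is a predicate marking the key situations.
    """
    return [
        (t, obs)
        for t, obs in enumerate(observations)
        if is_decision_point(obs)
    ]


# Hypothetical indoor navigation trace: the agent only chooses at junctions.
trace = ["corridor", "corridor", "junction", "corridor", "junction"]
memory = compress_history(trace, lambda obs: obs == "junction")
```

Here the agent stores two entries instead of five, so the memory grows with the number of decisions rather than with the number of primitive steps.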