
Course description: This course will provide a comprehensive introduction to reinforcement learning, a powerful approach to learning from interaction to achieve goals in stochastic and incompletely known environments. Reinforcement learning has adapted key ideas from machine learning, operations research, control theory, psychology, and neuroscience to produce some strikingly successful applications. The focus is on algorithms for learning what actions to take, and when to take them, so as to optimize long-term performance. This may involve sacrificing immediate reward to obtain greater reward in the long term, or simply to obtain more information about the environment. The course will cover Markov decision processes, dynamic programming, temporal-difference learning, policy gradient methods, Monte Carlo methods, eligibility traces, the role of function approximation, hierarchical reinforcement learning approaches, partially observable Markov decision processes, and the integration of learning and planning.
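As a small taste of the trade-off described above, the sketch below (illustrative only, not course material) shows an epsilon-greedy agent on a simple multi-armed bandit: with some probability it forgoes the action with the best estimated immediate reward in order to gather information that improves long-term performance. All names and parameter values here are assumptions chosen for the example.

```python
import random

def epsilon_greedy_bandit(true_means, steps=10000, epsilon=0.1, seed=0):
    """Illustrative epsilon-greedy agent on a k-armed bandit.

    With probability epsilon it explores (picks a random arm),
    sacrificing immediate reward for information; otherwise it
    exploits the arm with the highest estimated value.
    """
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k   # sample-average estimates of each arm's value
    counts = [0] * k
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                            # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])   # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # noisy observed reward
        counts[arm] += 1
        # incremental sample-average update of the value estimate
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward

estimates, total = epsilon_greedy_bandit([0.2, 0.5, 0.9])
print(estimates)
```

After enough steps, the agent's estimate for the best arm approaches its true mean, even though exploration occasionally costs it immediate reward.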
Lecture: Tuesday & Thursday 2:30-3:45, CMPS 140
Prerequisites: Interest in learning approaches to artificial intelligence; basic probability theory; computer programming ability. If you have passed Math 515 or equivalent, you have enough basic probability theory. If you have passed a programming course at the level of CMPSCI 287, you have enough programming ability; knowledge of C++ or Java is recommended. Please talk with the instructor if you want to take the course but have doubts about your qualifications.
Credit: 3 units
Instructors: Andrew G. Barto, barto at cs.umass.edu, 545-2109, and Amy McGovern, amy at cs.umass.edu, 577-1338
Teaching assistant: B. Ravindran, ravi at cs.umass.edu, 545-1596
Required book: We will be using the textbook by R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998. A full description of the book gives a detailed look at what will be covered in this course. The book is available at the textbook annex.
We will also assign additional reading for some of the advanced topics not covered in the book.

Schedule: Roughly, the plan is to cover one chapter from the book each week, starting with Chapter 1 on September 4. A detailed schedule will be provided when the class begins.

Required work: There will be a set of exercises for each chapter, comprising most of the non-programming exercises in the chapter. These will usually be due on the last day the chapter is covered in class (generally the second day of the chapter). All exercises will be marked and returned to you. Answer sheets for each exercise set will be made available at the end of the class on which the exercises are due, so you must turn in your exercises on time. You are expected to spend time studying the answers provided.
Programming exercises: Each student will complete a number of projects requiring programming during the course and, for each, will hand in the results of their work (details and due dates to be designated).
Exams: There will be a closed-book, in-class midterm and a closed-book final exam during the exam period.
Grading: