Reinforcement Learning

Fall 2003

Course Information

Course description: This course provides a comprehensive introduction to reinforcement learning, an approach to learning from interaction to achieve goals in stochastic and incompletely known environments. Reinforcement learning has adapted key ideas from machine learning, operations research, control theory, psychology, and neuroscience to produce some strikingly successful applications. The focus is on algorithms for learning what actions to take, and when to take them, so as to optimize long-term performance. This may involve sacrificing immediate reward to obtain greater reward in the long term, or simply to obtain more information about the environment. The course will cover Markov decision processes, dynamic programming, temporal-difference learning, policy gradient methods, Monte Carlo methods, eligibility traces, the role of function approximation, hierarchical reinforcement learning approaches, partially observable Markov decision processes, and the integration of learning and planning.
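As a small illustration of what "sacrificing immediate reward to obtain greater reward in the long term" can mean, the sketch below compares two made-up reward sequences by their discounted return. The discount factor gamma, the function name, and the reward values are assumptions chosen for illustration; they are not taken from the course materials.

```python
def discounted_return(rewards, gamma):
    """Return sum over k of gamma**k * rewards[k] for a finite reward sequence."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequences (illustrative values only).
greedy = [1.0, 0.0, 0.0, 0.0]    # small reward right away
patient = [0.0, 0.0, 0.0, 10.0]  # larger reward, but delayed

for gamma in (0.9, 0.3):
    g = discounted_return(greedy, gamma)
    p = discounted_return(patient, gamma)
    better = "patient" if p > g else "greedy"
    print(f"gamma={gamma}: greedy={g:.2f}, patient={p:.2f}, better={better}")
```

Under these assumptions, a learner that weights future reward heavily (gamma = 0.9) prefers the delayed reward, while a short-sighted one (gamma = 0.3) prefers the immediate reward.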

Lecture: Tuesday & Thursday 2:30-3:45, CMPS 140

Prerequisites: Interest in learning approaches to artificial intelligence; basic probability theory; computer programming ability. If you have passed Math 515 or equivalent, you have enough basic probability theory. If you have passed a programming course at the level of CMPSCI 287, you have enough programming ability; knowledge of C++ or Java is recommended. Please talk with the instructor if you want to take the course but have doubts about your qualifications.

Credit: 3 units

Instructors: Andrew G. Barto, barto at, 545-2109 and Amy McGovern, amy at, 577-1338

Teaching assistant: B. Ravindran, ravi at, 545-1596

Required book: We will be using a textbook by R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998. The full description of the book gives a detailed look at what will be covered in this course. The book is available at the textbook annex.

We will also assign additional reading for some of the advanced topics not covered in the book.

Schedule: Roughly, the plan is to cover one chapter from the book each week, starting with chapter 1 on September 4. A detailed schedule will be provided when the class begins.

Required work: There will be a set of exercises for each chapter, comprising most of the non-programming exercises in the chapter. These will usually be due on the last day the chapter is covered in class (generally the second day of the chapter). All exercises will be marked and returned to you. Answer sheets for each exercise set will be made available at the end of the class in which the exercises are due, so you must turn in your exercises on time. You are expected to spend time studying the answers provided.

Programming exercises: Each student will complete a number of programming projects during the course and will hand in the results of each (details and due dates to be announced).

Exams: There will be a closed-book in-class midterm and a closed-book final exam during the exam period.