| Lecture | Date | Class topic | Reading | Homework assigned | Homework Due |
|---|---|---|---|---|---|
| Lecture 1 | Th Sept 4 | Introduction and course overview | Chapter 1 | Exercise Set 1: ex. 1.1-1.5 (PDF) | |
| Lecture 2 | Tu Sept 9 | Evaluative Feedback | Chapter 2 | Exercise Set 2: ex. 2.3, 2.4, 2.6, 2.8, 2.13, 2.16 | Exercise Set 1 |
| Lecture 3 | Th Sept 11 | Evaluative feedback continued, policy gradient methods (PS, PDF) | Chapter 2, REINFORCE | Programming Exercise 1 | |
| Lecture 4 | Tu Sept 16 | The RL Problem | Chapter 3 | Exercise Set 3: ex. 3.2-3.5, modified 3.6, 3.7-3.17 | Exercise Set 2 |
| Lecture 5 | Th Sept 18 | The RL problem continued | Chapter 3 | | |
| Lecture 6 | Tu Sept 23 | The RL problem continued | Chapter 3, modified slides: PDF, PS. | Exercise Set 4: ex. 4.1, 4.2, 4.3, 4.5, 4.6, 4.7, 4.9 | Exercise Set 3 |
| Lecture 7 | Th Sept 25 | Dynamic Programming | Chapter 4 | Programming Exercise 2 | Programming Exercise 1 |
| Lecture 8 | Tu Sept 30 | Dynamic Programming continued | Chapter 4 | | Exercise Set 4 |
| Lecture 9 | Th Oct 2 | Monte-Carlo Methods, importance sampling | Chapter 5, Importance Sampling | Exercise Set 5: ex. 5.1, 5.2, 5.3, 5.5, 5.6, 5.7 | |
| Lecture 10 | Tu Oct 7 | Monte-Carlo methods continued, rollouts, policy gradient algorithms | Chapter 5, Policy Gradient, GPOMDP | | |
| Lecture 11 | Th Oct 9 | Temporal-Difference learning | Chapter 6, Convergence of Sarsa(0) | Exercise Set 6: ex. 6.1, 6.2, 6.4, 6.5, 6.8, 6.9, 6.10, 6.12 | Exercise Set 5 |
| Lecture 12 | Tu Oct 14 | TD learning continued | Chapter 6 | | |
| Lecture 13 | Th Oct 16 | TD learning continued, actor-critic methods and policy gradient algorithms | Chapter 6 | | Programming Exercise 2, Exercise Set 6 |
| Lecture 14 | Tu Oct 21 | Eligibility traces | Chapter 7 | Exercise Set 7: ex. 7.2, 7.4, 7.5, 7.6, 7.8, 7.9, 7.10 | |
| Lecture 15 | Th Oct 23 | Midterm review | | | |
| Lecture 16 | Tu Oct 28 | In-class midterm exam (Chapters 1-6) | | | |
| Lecture 17 | Th Oct 30 | Eligibility traces continued, generalization and function approximation | Chapter 7, Chapter 8 | | |
| Lecture 18 | Tu Nov 4 | Generalization and function approximation continued | Chapter 8 | Programming Exercise 3; Exercise Set 8: ex. 8.1-8.7 | Exercise Set 7 |
| Lecture 19 | Th Nov 6 | Generalization and function approximation continued, structured representations, neural networks, convergence results | Chapter 8 | | |
| | Tu Nov 11 | Holiday: Veterans Day | | | |
| Lecture 20 | Th Nov 13 | Generalization and function approximation continued | Chapter 8 | | Exercise Set 8 |
| Lecture 21 | Tu Nov 18 | Function approximation, policy gradient, actor-critic | Chapter 8, Actor-Critic: Algorithm, Papers: 1, 2 | Exercise Set 9: ex. 9.1-9.3, 9.5; Programming Exercise 4 | Programming Exercise 3 |
| Lecture 22 | Th Nov 20 | Planning and learning, model-based methods, E^3 algorithm | Chapter 9 | | |
| Lecture 23 | Tu Nov 25 | Hierarchical reinforcement learning, Options framework | Recent Advances in Hierarchical Reinforcement Learning (read sections 1-4 only) | Exercise Set 10 | Checkpoint for Programming Exercise 4, Exercise Set 9 |
| | | Thanksgiving Recess | | | |
| Lecture 24 | Tu Dec 2 | Hierarchical reinforcement learning continued, MAXQ value function decomposition, HAM framework | | | |
| Lecture 25 | Th Dec 4 | Case studies | Chapter 11 | | Exercise Set 10 |
| Lecture 26 | Tu Dec 9 | Case studies continued, open problems in RL | Chapter 11 | | Programming Exercise 4 |
| Lecture 27 | Th Dec 11 | Review for final exam | | | |
| | TBA | Final exam | | | |
| | Sa Dec 20 | Last day of exams | | | |