Chapter 11: Case Studies
TD-Gammon
A Few Details
Multi-layer Neural Network
Summary of TD-Gammon Results
Samuel's Checkers Player
Samuel's Backups
The Basic Idea
More Samuel Details
The Acrobot
Acrobot Learning Curves for Sarsa(λ)
Typical Acrobot Learned Behavior
Elevator Dispatching
Semi-Markov Q-Learning
Passenger Arrival Patterns
Control Strategies
The Elevator Model (from Lewis, 1991)
State Space
Actions
Constraints
Performance Criteria
Average Squared Wait Time
Algorithm
Computing Rewards
Neural Networks
Elevator Results
Dynamic Channel Allocation
Job-Shop Scheduling