Chapter 11: Case Studies

TD-Gammon

A Few Details

Multi-layer Neural Network

Summary of TD-Gammon Results

Samuel's Checkers Player

Samuel's Backups

The Basic Idea

More Samuel Details

The Acrobot

Acrobot Learning Curves for Sarsa(λ)

Typical Acrobot Learned Behavior

Elevator Dispatching

Semi-Markov Q-Learning

Passenger Arrival Patterns

Control Strategies

The Elevator Model (from Lewis, 1991)

State Space

Performance Criteria

Average Squared Wait Time

Computing Rewards

Neural Networks

Elevator Results

Dynamic Channel Allocation

Job-Shop Scheduling