Chapter 11: Case Studies

TD-Gammon

A Few Details

Multi-layer Neural Network

Summary of TD-Gammon Results

Samuel's Checkers Player

Samuel's Backups

The Basic Idea

More Samuel Details

The Acrobot

Acrobot Learning Curves for Sarsa(λ)

Typical Acrobot Learned Behavior

Elevator Dispatching

Semi-Markov Q-Learning

Passenger Arrival Patterns

Control Strategies

The Elevator Model (from Lewis, 1991)

State Space

Actions

Constraints

Performance Criteria

Average Squared Wait Time

Algorithm

Computing Rewards

Neural Networks

Elevator Results

Dynamic Channel Allocation

Job-Shop Scheduling