Chapter 7: Eligibility Traces


N-step TD Prediction

Mathematics of N-step TD Prediction

Learning with N-step Backups

Random Walk Examples

A Larger Example

Averaging N-step Returns

Forward View of TD(λ)

λ-return Weighting Function

Relation to TD(0) and MC

Forward View of TD(λ) II

λ-return on the Random Walk

Backward View of TD(λ)

On-line Tabular TD(λ)

Backward View

Relation of Backwards View to MC & TD(0)

Forward View = Backward View

On-line versus Off-line on Random Walk

Control: Sarsa(λ)

Sarsa(λ) Algorithm

Sarsa(λ) Gridworld Example

Three Approaches to Q(λ)

Watkins's Q(λ)

Peng's Q(λ)

Naïve Q(λ)

Comparison Task

Comparison Results

Convergence of the Q(λ)'s

Eligibility Traces for Actor-Critic Methods

Replacing Traces

Replacing Traces Example

Why Replacing Traces?

More Replacing Traces

Implementation Issues

Variable λ


Something Here is Not Like the Other

Author: Andy Barto


