Error Bounds for Approximate Policy Iteration
Munos, RemiError Bounds for Approximate Policy Iteration
Icml 2003
( gzipped Postscript - 80 KB )
Abstract: In Dynamic Programming, convergence of algorithms such as Value Iteration
or Policy Iteration results --in discounted problems-- from a contraction
property of the back-up operator, guaranteeing convergence to its
fixed-point. When approximation is considered, known results in Approximate
Policy Iteration provide bounds on the closeness to optimality of
the approximate value function obtained by successive policy improvement
steps as a function of the maximum norm of value determination
errors during policy evaluation steps. Unfortunately, such results
have limited practical range since most function approximators (such
as linear regression) select the best fit in a given class of parameterized
functions by minimizing some (weighted) quadratic norm.
In this paper, we provide error bounds for Approximate Policy Iteration
using quadratic norms, and illustrate those results in the case of
feature-based linear function approximation.