Temporal difference (TD) methods are a general framework
for solving sequential prediction and control problems, whereby an
agent learns by comparing temporally successive predictions. A key
strength of TD methods is that the agent can learn before seeing the
final outcome. Q-learning is one of the most popular TD methods.
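To make the TD idea concrete, the following is a minimal sketch of tabular Q-learning on a hypothetical chain MDP (the environment, state/action sizes, and hyperparameters are illustrative assumptions, not from the text). The agent updates each estimate toward a bootstrapped target built from the very next prediction, so it learns before the episode's final outcome is known.

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning on a toy chain MDP (illustrative example).

    States are 0..n_states-1. Action 1 moves right, action 0 moves
    left. Reaching the last state yields reward 1 and ends the episode.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # TD update: move the current prediction Q[s][a] toward the
            # temporally successive target r + gamma * max_a' Q(s', a').
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q
```

Note that the update happens after every single transition, not at episode end, which is the defining feature of TD methods highlighted above.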