Temporal-Difference (TD) learning is the most central and best-known idea in reinforcement learning.
"If one had to identify one idea as central and novel to reinforcement learning, it would undoubtedly be temporal-difference (TD) learning." — Richard S. Sutton & Andrew G. Barto
TD = DP + MC
Like DP, TD updates the value estimate of the current state from the value estimate of the next state (bootstrapping).
Like MC, TD learns from experience: value functions are estimated from episodes generated by repeatedly interacting with the environment.
TD: V(S_t) ← V(S_t) + α[R_{t+1} + γ V(S_{t+1}) − V(S_t)]  (update as soon as the next step is observed)
MC: V(S_t) ← V(S_t) + α[G_t − V(S_t)]  (update only after the episode ends, using the full return G_t)
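A minimal Python sketch of the two update rules above, assuming a tabular value function stored in a dict (e.g. a collections.defaultdict(float)) and an episode recorded as a list of (state, reward) pairs; the names V, td0_update, and mc_update are illustrative, not from the slides.

```python
alpha, gamma = 0.1, 1.0

def td0_update(V, s, r, s_next, done):
    # TD(0): bootstrap from the current estimate of the next state.
    target = r + (0 if done else gamma * V[s_next])
    V[s] += alpha * (target - V[s])

def mc_update(V, episode):
    # Monte Carlo: wait until the episode ends, then update every visited
    # state toward its full return G_t (every-visit, constant-alpha MC).
    G = 0.0
    for s, r in reversed(episode):          # episode = [(S_t, R_{t+1}), ...]
        G = r + gamma * G
        V[s] += alpha * (G - V[s])
```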
Sarsa
On-policy TD control
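A sketch of one Sarsa update with an ε-greedy behaviour policy, assuming a tabular Q stored as a dict keyed by (state, action); the function names are illustrative.

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 1.0, 0.1
Q = defaultdict(float)                       # Q[(state, action)] -> value

def epsilon_greedy(state, actions):
    # With probability epsilon explore, otherwise act greedily w.r.t. Q.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa_step(s, a, r, s_next, a_next, done):
    # On-policy target: uses the next action a_next actually selected by the
    # same epsilon-greedy policy that generates behaviour.
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```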
Expected Sarsa
Off-policy TD control
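A sketch of the Expected Sarsa target under the same tabular-Q assumptions: instead of the single sampled next action, the target averages Q(s', ·) under the ε-greedy policy.

```python
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 1.0, 0.1
Q = defaultdict(float)

def expected_sarsa_step(s, a, r, s_next, actions, done):
    # Target uses the expectation of Q(s_next, .) under the epsilon-greedy
    # policy rather than the value of the single sampled next action.
    if done:
        target = r
    else:
        greedy = max(actions, key=lambda a2: Q[(s_next, a2)])
        expected = sum(
            (epsilon / len(actions) + (1 - epsilon) * (a2 == greedy)) * Q[(s_next, a2)]
            for a2 in actions
        )
        target = r + gamma * expected
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```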
Q-learning
Off-policy TD control
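A sketch of the Q-learning update under the same assumptions; the max over next actions is what makes it off-policy.

```python
from collections import defaultdict

alpha, gamma = 0.1, 1.0
Q = defaultdict(float)

def q_learning_step(s, a, r, s_next, actions, done):
    # Off-policy target: evaluates the greedy action max_a' Q(s_next, a'),
    # regardless of which action the behaviour policy will actually take.
    target = r if done else r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```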
Q-learning vs Sarsa
Example: Cliff Walking (fixed ε = 0.1)
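A minimal sketch of the Cliff Walking gridworld, assuming the standard 4 × 12 layout from Sutton & Barto: reward −1 per step, −100 and a reset to the start when stepping into the cliff; Sarsa tends to learn the safe path away from the cliff, while Q-learning learns the optimal path along its edge.

```python
# 4 x 12 gridworld; start bottom-left, goal bottom-right, cliff in between.
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

def step(state, action):
    # Move within the grid; every step costs -1.  Stepping into the cliff
    # gives -100 and sends the agent back to the start.
    row = min(max(state[0] + action[0], 0), 3)
    col = min(max(state[1] + action[1], 0), 11)
    if row == 3 and 1 <= col <= 10:
        return START, -100, False
    next_state = (row, col)
    return next_state, -1, next_state == GOAL
```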
Problems with Q-learning
Overestimation: the max operator in the target makes Q-learning overestimate action values (maximization bias), which motivated Double Q-learning.
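A small numerical illustration of this bias (not from the slides): the true value of every action is 0, yet the max over one set of noisy estimates is clearly positive, while evaluating the argmax of one set with an independent second set is not.

```python
import random

random.seed(0)
n_actions, n_trials = 10, 10000
single, double = 0.0, 0.0
for _ in range(n_trials):
    est1 = [random.gauss(0, 1) for _ in range(n_actions)]
    est2 = [random.gauss(0, 1) for _ in range(n_actions)]
    single += max(est1)                        # Q-learning-style estimate
    double += est2[est1.index(max(est1))]      # Double-Q-style estimate
print("single max:", single / n_trials)        # clearly positive (roughly 1.5)
print("double estimate:", double / n_trials)   # close to 0
```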
Double Q-learning
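A sketch of the Double Q-learning update, again assuming a tabular Q: two tables are kept, and on each step one is chosen at random to select the greedy next action while the other evaluates it.

```python
import random
from collections import defaultdict

alpha, gamma = 0.1, 1.0
Q1, Q2 = defaultdict(float), defaultdict(float)

def double_q_step(s, a, r, s_next, actions, done):
    # Randomly pick which table to update; the updated table selects the
    # greedy next action while the other table evaluates it, decoupling
    # selection from evaluation and removing the upward max bias.
    update, evaluate = (Q1, Q2) if random.random() < 0.5 else (Q2, Q1)
    if done:
        target = r
    else:
        best = max(actions, key=lambda a2: update[(s_next, a2)])
        target = r + gamma * evaluate[(s_next, best)]
    update[(s, a)] += alpha * (target - update[(s, a)])

def behaviour_value(state, action):
    # Actions are typically chosen epsilon-greedily w.r.t. Q1 + Q2.
    return Q1[(state, action)] + Q2[(state, action)]
```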
Q-learning vs. Double Q-learning
Training parameters: ε = 0.1, α = 0.1, γ = 1