Temporal-difference (TD) learning in MATLAB R2013a
- SARSA (epsilon-greedy, starting with epsilon = 0.1)
- TD(0) (random-walk policy)
- Q-Learning (epsilon-greedy, starting with epsilon = 0.1)
- Q-V Learning (epsilon-greedy, starting with epsilon = 0.1)
for the specified environment
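
For orientation, the tabular update rules behind the four methods listed above can be sketched as follows. This is a minimal illustration in Python, not the MATLAB code of this project. The function names, the step sizes alpha and beta, and the discount gamma are assumptions chosen for the example. The Q-V Learning rule follows the usual formulation, in which Q is moved toward a target built from the state-value table V. That method is not covered in the Sutton and Barto reference, so treat that part as an assumption about what is implemented here.

```python
import random


def epsilon_greedy(q_row, epsilon=0.1, rng=random):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties between equally good actions at random.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
    """TD(0): move V(s) toward r + gamma * V(s')."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0):
    """SARSA (on-policy): bootstrap from the action actually taken next."""
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=1.0):
    """Q-Learning (off-policy): bootstrap from the greedy action in s'."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])


def qv_update(Q, V, s, a, r, s_next, alpha=0.1, beta=0.1, gamma=1.0):
    """Q-V Learning: both Q(s, a) and V(s) move toward r + gamma * V(s')."""
    target = r + gamma * V[s_next]  # computed once, before either table changes
    Q[s][a] += alpha * (target - Q[s][a])
    V[s] += beta * (target - V[s])
```

Here V is a list indexed by state and Q is a list of per-state action-value lists. Entries for terminal states are expected to stay at zero, so the same update can be used on the last step of an episode.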
Reference: Sutton, R. S. and Barto, A. G., "Reinforcement Learning: An Introduction," MIT Press, 1998