temporal-difference's Introduction

Temporal-Difference

Temporal-difference learning methods implemented in Matlab R2013a for the specified environment:

SARSA (epsilon-greedy, starting with epsilon = 0.1)
TD(0) (random-walk policy)
Q-Learning (epsilon-greedy, starting with epsilon = 0.1)
QV-Learning (epsilon-greedy, starting with epsilon = 0.1)

Reference: Sutton, R. S. and Barto, A. G., "Reinforcement Learning: An Introduction," MIT Press, 1998.

temporal-difference's Issues

Sarsa: policy for selecting the first action of a new episode

Thanks again for your code!
In your Sarsa algorithm, the first action of each new episode is selected at random (line 14: action=randi([1 2],1,1);), but Sutton and Barto's book says to "Choose A from S using policy derived from Q (e.g., epsilon-greedy)". As I understand it, the Q referred to here is the Q updated during the previous episode, not the Q from the very beginning (with the initial Q, epsilon-greedy is equivalent to a random policy). Are you selecting the first action at random to get more exploration, or could we follow the book here as well?
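The alternative raised in this question can be sketched as follows. This is only a minimal illustration, not the repository's code: the variable names (Q, state, epsilon, nStates, nActions) and the table size are assumptions made for the example, and only the randi([1 2],1,1) call is taken from the quoted line.

```matlab
% Sketch: pick the first action of an episode epsilon-greedily from the
% current Q table instead of uniformly at random.
% Assumed for illustration (not from the repository): Q is an
% nStates-by-nActions table kept across episodes, and state is the index
% of the episode's start state.
nStates  = 5;                  % hypothetical number of states
nActions = 2;                  % two actions, as in randi([1 2],1,1)
Q        = zeros(nStates, nActions);
epsilon  = 0.1;
state    = 1;                  % hypothetical start state

if rand < epsilon
    % Explore: uniform random action.
    action = randi([1 nActions], 1, 1);
else
    % Exploit: greedy action with respect to Q, ties broken at random.
    best   = find(Q(state, :) == max(Q(state, :)));
    action = best(randi(numel(best)));
end
```

With an all-zero Q, every action ties for the maximum, so this selection is uniformly random in the first episode, which matches the observation above. In later episodes it follows the Q values learned so far.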

Repository: golnarkmahani / temporal-difference
