ch5_MCLearning.py
: Monte Carlo learning method
ch5_TDLearning.py
: temporal-difference (TD) learning method
ch6_MCControl.py
: Monte Carlo control
ch6_SARSA.py
: on-policy TD control (SARSA)
ch6_QLearning.py
: off-policy TD control (Q-learning)
ch7_CosineFitting.py
: neural network basics
ch8_replaybuffer.py
: replay buffer technique
ch8_DQN.py
: Deep Q-Learning (value network)
ch8_DQN_main.py
: main function (600 episodes)
ch9_REINFORCE.py
: REINFORCE algorithm (policy network)
ch9_REINFORCE_main.py
: main function (2000 episodes)
ch9_ActorCritic.py
: TD Actor-Critic (value + policy network)
ch9_ActorCritic_main.py
: main function (1000 episodes)
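The chapter 6 entries distinguish on-policy TD control (SARSA) from off-policy TD control (Q-learning). As a rough illustration of that distinction only, the sketch below contrasts the two TD targets on a toy Q-table. It is not taken from the repository files: the function names, the list-of-lists table layout, and the values of `gamma` and `alpha` are all assumptions made for this example.

```python
# Hypothetical sketch: none of these names come from the ch6_*.py files.

def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy TD target: bootstrap from the action actually taken next."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma):
    """Off-policy TD target: bootstrap from the greedy action in the next state."""
    return reward + gamma * max(q[next_state])


def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha toward the target, in place."""
    q[state][action] += alpha * (target - q[state][action])
    return q


# Toy table with 2 states and 2 actions; the numbers are illustrative only.
q = [[0.0, 0.0], [1.0, 3.0]]

# Same transition (reward 1.0, landing in state 1), two different targets.
sarsa_t = sarsa_target(q, reward=1.0, next_state=1, next_action=0, gamma=0.5)
qlearn_t = q_learning_target(q, reward=1.0, next_state=1, gamma=0.5)

# Apply the Q-learning target to Q(0, 0).
td_update(q, state=0, action=0, target=qlearn_t, alpha=0.1)
```

In this toy case the two targets differ only because the next action passed to SARSA (action 0) is not the greedy one. When the behavior policy happens to pick the greedy action, both targets coincide.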