Reinforcement Learning Tutorial

About

Weekend Deep Reinforcement Learning (DRL) is my self-study of DRL in my free time. DRL is very easy, especially when you already have a bit of background in Control and Deep Learning. Even without that background, the concepts are still very simple, so why not study them and have fun?

My implementation aims to provide minimal code and short notes that summarize the theory.

Code: The modules and config system are built on the mmcv config and registry system, so it is easy to adopt them and to adjust components by changing the config files.
Lecture Notes: No lengthy math, just the motivating concept, the key equations needed for implementation, and a summary of the tricks that make the methods work. More importantly, I try to connect each method with the previous ones wherever possible.
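
To make the config-plus-registry idea concrete, here is a minimal sketch of the pattern in plain Python. It is not mmcv's implementation and not this repository's code: the `Registry` class, the `AGENTS` registry, and the `DummyDQN` component are hypothetical names used only for illustration.

```python
# Illustrative registry + config pattern (NOT mmcv's actual implementation).
class Registry:
    """Maps string names to classes so configs can refer to them by name."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, cls):
        # Used as a decorator: stores the class under its own class name.
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg):
        # Copy first so the caller's config dict is not mutated.
        args = dict(cfg)
        cls = self._modules[args.pop("type")]
        return cls(**args)


AGENTS = Registry("agent")  # hypothetical registry


@AGENTS.register_module
class DummyDQN:  # hypothetical component, for illustration only
    def __init__(self, gamma=0.99, lr=1e-3):
        self.gamma = gamma
        self.lr = lr


# A config is just a dict: changing "type" or a value swaps the component
# or its hyperparameters without touching the training code.
agent_cfg = dict(type="DummyDQN", gamma=0.95)
agent = AGENTS.build(agent_cfg)
print(type(agent).__name__, agent.gamma, agent.lr)
```

The design choice this illustrates is that the training script only ever calls `build(cfg)`, so a new algorithm is added by registering a class and pointing a config at it.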

My learning strategy is to go straight to summarizing and implementing the papers, starting from the basic ones. I hate the fact that most RL books start with a very heavy theoretical background and ask us to remember many vague definitions, such as On-Line, Off-Line, Policy Gradient, etc. NO, NO, NO!!! Let's play with the basic building blocks first. Once we feel comfortable, we can recap and introduce these concepts later. It is absolutely fine if you don't remember these definitions at all.

The following are the great resources that I learned from:

1. Env Setup:

conda create -n RL python=3.8 -y
conda activate RL
conda install tqdm matplotlib scipy
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
pip install gym
pip install 'gym[all]'  # Install all environment dependencies
# or: pip install cmake 'gym[atari]'
pip install pybullet

2. Try Gym environment

import gym

# Classic Gym API (gym < 0.26): reset() returns the observation and
# step() returns (observation, reward, done, info).
env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset()  # Reset the environment before each episode
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()  # Random action; replace with your agent's choice
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t + 1))
            break
env.close()

Every environment comes with an env.action_space and an env.observation_space.
List all available environments: gym.envs.registry.all().
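
To show what this interface contract amounts to without requiring Gym itself, here is a small sketch of a toy environment that mimics the classic `reset()` / `step()` / `action_space.sample()` calls used in the loop above. `ToyDiscrete`, `ToyCorridorEnv`, and `run_random_episode` are hypothetical names invented for this example; they are not part of gym.

```python
import random


class ToyDiscrete:
    """Stand-in for a discrete space with n values: 0 .. n-1."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        return random.randrange(self.n)


class ToyCorridorEnv:
    """Agent starts in cell 0 and must reach the last cell of a corridor."""

    def __init__(self, length=5):
        self.length = length
        self.action_space = ToyDiscrete(2)            # 0 = left, 1 = right
        self.observation_space = ToyDiscrete(length)  # observation = cell index
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        move = 1 if action == 1 else -1
        # Clamp to the corridor so walking left from cell 0 stays in place.
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


def run_random_episode(env, max_steps=100):
    """Roll out one episode with random actions; return (steps, total reward)."""
    env.reset()
    total = 0.0
    for t in range(max_steps):
        _, reward, done, _ = env.step(env.action_space.sample())
        total += reward
        if done:
            return t + 1, total
    return max_steps, total


random.seed(0)
steps, total = run_random_episode(ToyCorridorEnv())
print(steps, total)
```

Any agent written against these four calls should carry over to a real classic-API Gym environment with little change, which is the point of the shared interface.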

3. Algorithms:

Paper ranking:

🏆 Must-know benchmark papers.
🚀 Improved version of benchmark papers. Come back after finishing the benchmark papers.

Q-Learning: Introduction to RL with Q-Learning
Deep Q-Learning:
- 🏆 Deep Q-Network (DQN - Nature 2015): code | config
- 🏆 Double-DQN (DDQN - AAAI 2016): code | config
- 🚀 Dueling DQN (DuelDQN - ICML 2016)
Actor-Critic methods:
- 🏆 Deep Deterministic Policy Gradient (DDPG - ICLR 2016): Note | code | config
- 🏆 Twin Delayed DDPG (TD3 - ICML 2018): Note | code | config
- 🏆 Soft Actor-Critic (SAC - ICML 2018): Note | code | config
- 🚀 Meta-SAC (ICML 7th Workshop - 2020)
- 🚀 Smooth Exploration for Robotic Reinforcement Learning (arXiv 2021)
Recap and overview of RL methods:
Policy Gradient:
- Vanilla Policy Gradient
- 🏆 Trust Region Policy Optimization (TRPO - ICML 2015)
- 🏆 Proximal Policy Optimization (PPO - 2017)
- 🚀 Truly Proximal Policy Optimization (TPPO - PMLR 2020)
How to deal with Sparse Reward for Off-Line learning:
On-Line Policy (TBD)
Model-Based Learning (TBD)
Multi-Agent Learning (TBD)
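
As a taste of the first item in the list, here is a minimal sketch of tabular Q-learning on a tiny chain environment, written in plain Python. It is an illustration under stated assumptions, not the repository's tutorial code: the chain environment, the hyperparameter values, and the helper names `step`, `greedy`, and `train` are all made up for this example.

```python
import random

N_STATES = 6          # chain of cells 0..5, reward only at the right end
ACTIONS = (0, 1)      # 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative hyperparameters


def step(state, action):
    """Deterministic chain dynamics; the episode ends at the last cell."""
    nxt = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q_row):
    """Greedy action with random tie-breaking."""
    best = max(q_row)
    return random.choice([a for a in ACTIONS if q_row[a] == best])


def train(episodes=300, seed=0):
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy behaviour policy.
            if random.random() < EPSILON:
                action = random.choice(ACTIONS)
            else:
                action = greedy(q[state])
            nxt, reward, done = step(state, action)
            # Q-learning target bootstraps from the best next action;
            # terminal transitions do not bootstrap.
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q


q_table = train()
# Greedy policy for the non-terminal cells.
policy = [greedy(row) for row in q_table[:-1]]
print(policy)  # → [1, 1, 1, 1, 1] (always move right)
```

Deep Q-Learning keeps the same target, `reward + GAMMA * max(q[nxt])`, but replaces the table with a neural network.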

4. Usage:

Except for the first Q-Learning tutorial, which serves as an introduction to RL, all other methods can be trained with:

python tools/train.py [path/to/config.py] [--extra_args]

For example, to train a Deep Q-Network (DQN) on the Mountain Car environment, use:

python tools/train.py configs/DQN/dqn_mountain_car.py
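
For readers curious how a command of this shape can be wired up, here is a rough sketch of an entry point that takes a Python config file plus optional flags. It is not the repository's actual `tools/train.py`: the `--seed` flag, the `parse_args` and `load_config` helpers, and the contents of the demo config are assumptions made for illustration.

```python
import argparse
import runpy
import tempfile
from pathlib import Path


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Train an agent from a config file")
    parser.add_argument("config", help="path to a Python config file")
    # Hypothetical extra flag, standing in for [--extra_args].
    parser.add_argument("--seed", type=int, default=0)
    return parser.parse_args(argv)


def load_config(path):
    """Execute a Python config file and keep its public top-level names."""
    namespace = runpy.run_path(str(path))
    return {k: v for k, v in namespace.items() if not k.startswith("_")}


# Demo: write a throwaway config, then parse a command line that points at it.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "demo_config.py"
    cfg_path.write_text("env = dict(type='MountainCar-v0')\nmax_episodes = 10\n")
    args = parse_args([str(cfg_path), "--seed", "42"])
    cfg = load_config(args.config)

print(cfg["env"]["type"], cfg["max_episodes"], args.seed)
```

Keeping every experiment setting in the config file means that a run is described by the config path and the flags passed on the command line.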

Repository: chuong98/weekend-deeprl
