Giter Club home page Giter Club logo

deeprl's Introduction

DeepRL

Highly modularized implementation of popular deep RL algorithms by PyTorch. My principal here is to reuse as much components as I can through different algorithms, use as less tricks as I can and switch easily between classical control tasks like CartPole and Atari games with raw pixel inputs.

Implemented algorithms:

  • Deep Q-Learning (DQN)
  • Double DQN
  • Dueling DQN
  • Async Advantage Actor Critic (A3C)
  • Async One-Step Q-Learning
  • Async One-Step Sarsa
  • Async N-Step Q-Learning
  • Continuous A3C
  • Deep Deterministic Policy Gradient (DDPG)
  • Hybrid Reward Architecture (HRA)

Curves

Curves for CartPole are trivial so I didn't place it here.

DQN, Double DQN, Dueling DQN

Loading... Loading...

The network and parameters here are exactly same as the DeepMind Nature paper. Training curve is smoothed by a window of size 100. All the models are trained in a server with Xeon E5-2620 v3 and Titan X. For Breakout, test is triggered every 1000 episodes with 50 repetitions. In total, 16M frames cost about 4 days and 10 hours. For Pong, test is triggered every 10 episodes with no repetition. In total, 4M frames cost about 18 hours.

Discrete A3C

Loading... Loading...

The network I used here is a smaller network with only 42 * 42 input, alougth the network for DQN can also work here, it's quite slow.

Training of A3C took about 2 hours (16 processes) in a server with two Xeon E5-2620 v3. While other async methods took about 1 day. Those value based async methods do work but I don't know how to make them stable. This is the test curve. Test is triggered in a separate deterministic test process every 50K frames.

Continuous A3C

Loading...

Sometimes Bipedal Walker may run into NAN, I'm still not able to totally solve it. And continuous A3C is very sensible to hyper parameters.

Dependency

  • Open AI gym
  • PyTorch
  • PIL (pip install Pillow)
  • Python 2.7 (I didn't test with Python 3)
  • Tensorflow (We need tensorboard)

Usage

Detailed usage and all training parameters can be found in main.py. And you need to create following directories before running the program:

cd DeepRL
mkdir data log evaluation_log

References

deeprl's People

Contributors

shangtongzhang avatar

Watchers

 avatar paper2code - bot avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.