implementations-dqn's Introduction

Human-level control through Deep Reinforcement Learning

This repository is an implementation of the paper Human-level control through Deep Reinforcement Learning.

Please ⭐ this repository if you found it useful!


Table of Contents 📜

For implementations of other deep learning papers, check the implementations repository!


Summary 📝

DQN Architecture

Deep Q-Network (DQN) is a reinforcement learning algorithm that extends the tabular Q-Learning algorithm to large complex environments using neural networks. To train the algorithm efficiently, the authors suggest using Experience Replay and Target Networks.
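
For reference, the tabular Q-Learning update that DQN builds on can be sketched as below. This is a minimal illustration and not code from this repository: the function name q_learning_update, the learning rate alpha, the discount gamma, and the toy states are all assumptions made for the example.

```python
from collections import defaultdict


def q_learning_update(q_table, state, action, reward, next_state, done,
                      n_actions, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the TD error.

    All names and default values here are hypothetical, chosen for illustration.
    """
    # Bootstrap from the greedy value of the next state unless the episode ended.
    if done:
        next_value = 0.0
    else:
        next_value = max(q_table[(next_state, a)] for a in range(n_actions))
    td_target = reward + gamma * next_value
    td_error = td_target - q_table[(state, action)]
    # Move the stored estimate a fraction alpha toward the TD target.
    q_table[(state, action)] += alpha * td_error
    return td_error


# Every unseen (state, action) pair starts at 0.0.
q_table = defaultdict(float)
td_error = q_learning_update(q_table, state=0, action=1, reward=1.0,
                             next_state=1, done=False, n_actions=2)
print(td_error, q_table[(0, 1)])
```

A table indexed by (state, action) stops being practical once states are raw Atari frames, which is why DQN replaces the table with a neural network.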

Whereas traditional Q-Learning discards each interaction experience after learning from it once, DQN saves these experiences in a "replay buffer." This allows minibatch learning, which lowers variance and accelerates learning. The target network is a copy of the Q-network that is updated less frequently and is used to compute the target of the MSE loss, which also lowers variance.
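
The two ideas can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the code in this repository: ReplayBuffer, td_targets, and sync_target are hypothetical names, the "network" is a stand-in callable returning a list of action values, and the periodic hard copy is one common way to keep the target network lagging behind.

```python
import copy
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self._storage = deque(maxlen=capacity)

    def __len__(self):
        return len(self._storage)

    def append(self, transition):
        self._storage.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self._storage), batch_size)


def td_targets(batch, target_q, gamma=0.99):
    """Compute r + gamma * max_a Q_target(s', a), with no bootstrap on terminal steps."""
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else max(target_q(next_state))
        targets.append(reward + gamma * bootstrap)
    return targets


def sync_target(step, online_params, target_params, every):
    """Hard update: copy the online parameters into the target every `every` steps."""
    if step % every == 0:
        return copy.deepcopy(online_params)
    return target_params


# Usage with toy data: five transitions pushed into a buffer of capacity three.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.append((t, 0, 1.0, t + 1, t == 4))  # the last transition is terminal

batch = buffer.sample(2, rng=random.Random(0))
targets = td_targets(batch, target_q=lambda state: [0.0, 1.0], gamma=0.9)

online = {"w": [0.5, -0.2]}
target = copy.deepcopy(online)
online["w"][0] = 0.7  # the online network keeps learning
target = sync_target(step=1, online_params=online, target_params=target, every=2)
stale = target["w"][0]  # target is unchanged between syncs
target = sync_target(step=2, online_params=online, target_params=target, every=2)
fresh = target["w"][0]  # target now matches the online parameters
```

Because the targets come from the frozen copy, they do not shift after every gradient step on the online network.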

Installation 🧱

First, clone this repository from GitHub. Since this repository contains submodules, you should use the --recursive flag.

git clone --recursive https://github.com/seungjaeryanlee/implementations-dqn.git

If you already cloned the repository without the flag, you can download the submodules separately with the git submodule command:

git clone https://github.com/seungjaeryanlee/implementations-dqn.git
git submodule update --init --recursive

After cloning the repository, install the required PyPI packages from requirements.txt.

pip install -r requirements.txt

Running 🏃

Results 📊

This repository uses TensorBoard for offline logging and Weights & Biases for online logging. You can see all the metrics in my summary report at Weights & Biases!

Plots: Train Episode Return, Evaluation Episode Return, TD Loss

Differences from the Paper 👥

Reproducibility 🎯

implementations-dqn's People

Contributors

seungjaeryanlee

implementations-dqn's Issues

Running on Atari is too slow

Profiling was done with cProfile:

python -m cProfile train_eval_atari.py -c pong.conf  --USE_WANDB --ENV_STEPS=15000

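The profile shows that get_torch_batch is the bottleneck: it accounts for 2193.386 of the 2336.502 seconds (94% of the time). The largest single entry is the built-in numpy.array, with 1716.914 seconds of internal time.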

>>> import pstats
>>> p = pstats.Stats('pong.cprofile.log')
>>> p.sort_stats('tottime').print_stats(10)
Fri Aug  2 06:31:44 2019    pong.cprofile.log

         20782479 function calls (19031333 primitive calls) in 2336.502 seconds

   Ordered by: internal time
   List reduced from 5974 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    52153 1716.914    0.033 1716.914    0.033 {built-in method numpy.array}
     2502  458.084    0.183 2193.386    0.877 /home/seungjaeryanlee/git/implementations-dqn/dqn/replays.py:130(get_torch_batch)
     2502   86.691    0.035   86.691    0.035 {method 'run_backward' of 'torch._C._EngineBase' objects}
     2502   14.445    0.006 1731.073    0.692 /home/seungjaeryanlee/miniconda3/envs/impl/lib/python3.7/site-packages/numpy/core/fromnumeric.py:42(_wrapit)
   500000   12.055    0.000   12.055    0.000 {method '__deepcopy__' of 'numpy.ndarray' objects}
    67292    9.204    0.000    9.204    0.000 /home/seungjaeryanlee/miniconda3/envs/impl/lib/python3.7/site-packages/atari_py/ale_python_interface.py:151(act)
    14407    8.868    0.001    8.869    0.001 {method 'to' of 'torch._C._TensorBase' objects}
    16514    4.499    0.000    4.499    0.000 {resize}
1751311/250003    2.291    0.000   17.615    0.000 /home/seungjaeryanlee/miniconda3/envs/impl/lib/python3.7/copy.py:132(deepcopy)
    20673    2.153    0.000    2.153    0.000 {built-in method conv2d}

cProfile output file: pong.cprofile.log
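
The same kind of analysis can be reproduced on a small scale with the standard library alone. pstats.Stats reads a binary stats file, which either `python -m cProfile -o <file>` or Profile.dump_stats produces. In the sketch below, workload, slow_copy, cumulative_share, and the file name example.prof are hypothetical stand-ins and not part of this repository.

```python
import cProfile
import io
import os
import pstats
import tempfile


def slow_copy(n):
    """Stand-in workload: build a list element by element."""
    return [i * 2 for i in range(n)]


def workload():
    for _ in range(50):
        slow_copy(10_000)


# Profile the workload and write binary stats to a file.
stats_path = os.path.join(tempfile.mkdtemp(), "example.prof")
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()
profiler.dump_stats(stats_path)

# Load the file and print the top entries by internal time (tottime).
buffer = io.StringIO()
stats = pstats.Stats(stats_path, stream=buffer)
stats.sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)


def cumulative_share(stats, function_name):
    """Fraction of total profiled time spent in a function and its callees."""
    for (_file, _line, name), (_cc, _nc, _tt, cumtime, _callers) in stats.stats.items():
        if name == function_name:
            return cumtime / stats.total_tt
    return 0.0


# The share quoted in the issue is cumtime over total time.
issue_share = 2193.386 / 2336.502
print(f"get_torch_batch share in the issue: {issue_share:.1%}")
print(f"workload share in this example: {cumulative_share(stats, 'workload'):.1%}")
```

Sorting by tottime highlights where time is spent inside a function itself, while cumtime includes its callees; that is why get_torch_batch shows a modest tottime but a very large cumtime in the listing above.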
