implementations-dqn's Introduction

Human-level control through Deep Reinforcement Learning

This repository is an implementation of the paper Human-level control through Deep Reinforcement Learning.

Please ⭐ this repository if you found it useful!

Table of Contents 📜

Summary
Installation
Running
Results
Differences from the Paper
Reproducibility

For implementations of other deep learning papers, check the implementations repository!

Summary 📝

Deep Q-Network (DQN) is a reinforcement learning algorithm that extends the tabular Q-Learning algorithm to large, complex environments using neural networks. To train the algorithm efficiently, the authors suggest using Experience Replay and Target Networks.

Traditional Q-Learning discards each interaction experience after learning from it once. DQN instead saves these experiences in a "replay buffer." This allows minibatch learning, which lowers variance and accelerates learning. The target network is a slowly updated copy of the Q-network that is used to compute the target of the MSE loss, which also lowers variance.
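
The two ideas can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in this repository: the names ReplayBuffer, td_target, and sync_target are hypothetical, and a dictionary of Q-values stands in for the neural network.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions (hypothetical helper, not the repo's class)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once capacity is reached.
        self.storage = deque(maxlen=capacity)

    def append(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling lets a single transition be reused across many updates.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def td_target(reward, next_state, done, target_q, gamma=0.99):
    """Regression target r + gamma * max_a Q_target(s', a); no bootstrap at episode end."""
    if done:
        return reward
    return reward + gamma * max(target_q[next_state])


def sync_target(online_q):
    """Copy the online Q-values into a separate target copy (assumed to run only occasionally)."""
    return {state: list(values) for state, values in online_q.items()}
```

Because sync_target returns a copy, later updates to the online Q-values do not move the regression targets until the next synchronization.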

Installation 🧱

First, clone this repository from GitHub. Since this repository contains submodules, you should use the --recursive flag.

git clone --recursive https://github.com/seungjaeryanlee/implementations-dqn.git

If you already cloned the repository without the flag, you can download the submodules separately with the git submodule command:

cd implementations-dqn
git submodule update --init --recursive

After cloning the repository, use requirements.txt to install the required PyPI packages.

pip install -r requirements.txt

Running 🏃

Results 📊

This repository uses TensorBoard for offline logging and Weights & Biases for online logging. You can see all the metrics in my summary report at Weights & Biases!

Differences from the Paper 👥

Reproducibility 🎯

implementations-dqn's Issues

Plots from paper

Running on Atari is too slow

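Profiling was done with cProfile:

python -m cProfile -o pong.cprofile.log train_eval_atari.py -c pong.conf --USE_WANDB --ENV_STEPS=15000

The profile shows that get_torch_batch is very slow: it accounts for 2193.386 of the 2336.502 seconds of total runtime (94%), with numpy.array alone taking 1716.914 seconds of internal time.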

>>> import pstats
>>> p = pstats.Stats('pong.cprofile.log')
>>> p.sort_stats('tottime').print_stats(10)
Fri Aug  2 06:31:44 2019    pong.cprofile.log

         20782479 function calls (19031333 primitive calls) in 2336.502 seconds

   Ordered by: internal time
   List reduced from 5974 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    52153 1716.914    0.033 1716.914    0.033 {built-in method numpy.array}
     2502  458.084    0.183 2193.386    0.877 /home/seungjaeryanlee/git/implementations-dqn/dqn/replays.py:130(get_torch_batch)
     2502   86.691    0.035   86.691    0.035 {method 'run_backward' of 'torch._C._EngineBase' objects}
     2502   14.445    0.006 1731.073    0.692 /home/seungjaeryanlee/miniconda3/envs/impl/lib/python3.7/site-packages/numpy/core/fromnumeric.py:42(_wrapit)
   500000   12.055    0.000   12.055    0.000 {method '__deepcopy__' of 'numpy.ndarray' objects}
    67292    9.204    0.000    9.204    0.000 /home/seungjaeryanlee/miniconda3/envs/impl/lib/python3.7/site-packages/atari_py/ale_python_interface.py:151(act)
    14407    8.868    0.001    8.869    0.001 {method 'to' of 'torch._C._TensorBase' objects}
    16514    4.499    0.000    4.499    0.000 {resize}
1751311/250003    2.291    0.000   17.615    0.000 /home/seungjaeryanlee/miniconda3/envs/impl/lib/python3.7/copy.py:132(deepcopy)
    20673    2.153    0.000    2.153    0.000 {built-in method conv2d}

cProfile output file: pong.cprofile.log
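
For reference, the same workflow can be reproduced on any function with the standard-library cProfile and pstats modules. The sketch below is illustrative only: slow_batch is a hypothetical stand-in for a hot function such as get_torch_batch, and the stats file name is arbitrary.

```python
import cProfile
import io
import os
import pstats
import tempfile


def slow_batch(n=200):
    # Hypothetical stand-in for a hot function: rebuilds a nested list on every call.
    return [[i * j for j in range(n)] for i in range(n)]


def profile_to_file(func, path):
    """Run func under cProfile and dump the raw stats to path."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    profiler.dump_stats(path)


def top_by_tottime(path, limit=10):
    """Return the report sorted by internal time, as text instead of printing it."""
    stream = io.StringIO()
    stats = pstats.Stats(path, stream=stream)
    stats.sort_stats("tottime").print_stats(limit)
    return stream.getvalue()


stats_path = os.path.join(tempfile.mkdtemp(), "example.cprofile.log")
profile_to_file(slow_batch, stats_path)
report = top_by_tottime(stats_path)
print(report)
```

Sorting by tottime highlights where time is spent inside a function itself, while the cumtime column includes time spent in everything it calls, which is why get_torch_batch and numpy.array both rank high in the report above.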

Plots from Weights & Biases
