Distributed Distributional Deep Deterministic Policy Gradients (D4PG)

A TensorFlow implementation of a Distributed Distributional Deep Deterministic Policy Gradients (D4PG) network for continuous control.

D4PG builds on the Deep Deterministic Policy Gradients (DDPG) approach (paper, code) with several improvements: a distributional critic, distributed agents running on multiple threads to collect experiences, prioritised experience replay (PER) and N-step returns.
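As an illustration of the N-step returns mentioned above, here is a minimal sketch. It is not the code from this repository: the function name `n_step_return`, the discount `gamma` and the sample rewards are assumptions chosen for the example. It collapses N consecutive rewards into one discounted sum and also returns the discount factor that would be applied to the critic's estimate at the state reached after N steps.

```python
def n_step_return(rewards, gamma):
    """Collapse N consecutive rewards into a single discounted return.

    Returns (discounted_reward_sum, bootstrap_discount), where
    bootstrap_discount = gamma ** N is the factor applied to the critic's
    value estimate at the state reached after the N steps.
    """
    total = 0.0
    discount = 1.0
    for r in rewards:
        total += discount * r   # add gamma^k * r_k
        discount *= gamma       # advance to gamma^(k+1)
    return total, discount


# Hypothetical 3-step window with illustrative rewards (not from the repo).
reward_sum, bootstrap_discount = n_step_return([1.0, 0.0, 2.0], gamma=0.5)
```

The critic target for the first state in the window would then be built from `reward_sum` plus `bootstrap_discount` times the target critic's estimate, which in D4PG is a distribution rather than a single value.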

Trained on OpenAI Gym environments.

This implementation has been successfully trained and tested on the Pendulum-v0, BipedalWalker-v2 and LunarLanderContinuous-v2 environments. However, the code can be run on any environment with a low-dimensional (non-image) state space and a continuous action space.

This implementation currently holds the high score for the Pendulum-v0 environment on the OpenAI leaderboard.

Requirements

Note: The versions stated are the ones I used; the code will likely still work with other versions.

Usage

The default environment is 'Pendulum-v0'. To use a different environment, simply change the ENV parameter in params.py before running the following scripts.

To train the D4PG network, run

  $ python train.py

This will train the network on the specified environment and periodically save checkpoints to the /ckpts folder.

To test the saved checkpoints during training, run

  $ python test_every_new_ckpt.py

This should be run alongside the training script so that the latest checkpoints are tested periodically as the network trains. The script invokes the run_every_new_ckpt.sh shell script, which monitors the given checkpoint directory and runs the test.py script on the latest checkpoint every time a new checkpoint is saved. Test results can optionally be saved to a text file in the /test_results folder.
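The monitoring step described above can be sketched in Python. This is an illustration only and not the contents of run_every_new_ckpt.sh: the helper name `new_checkpoint_steps` is hypothetical, and it assumes checkpoint file names contain `ckpt-<step>`, as in the `Pendulum-v0.ckpt-660000` name used elsewhere in this README.

```python
import os
import re

# Assumption: every checkpoint file name contains "ckpt-<training step>".
CKPT_PATTERN = re.compile(r"ckpt-(\d+)")


def new_checkpoint_steps(ckpt_dir, seen_steps):
    """Return the sorted training steps of checkpoints not yet in seen_steps."""
    steps = set()
    for name in os.listdir(ckpt_dir):
        match = CKPT_PATTERN.search(name)
        if match:
            # Several files can share one step; the set keeps each step once.
            steps.add(int(match.group(1)))
    return sorted(steps - set(seen_steps))
```

A polling loop could call this function every few seconds, run the test on each returned step, and then add that step to `seen_steps`.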

Once we have a trained network, we can visualise its performance in the environment by running

  $ python play.py

This will play the environment on screen using the trained network and save a GIF (optional).

Note: To reproduce the best 100-episode performance of -123.11 +/- 6.86 that achieved the top score on the 'Pendulum-v0' OpenAI leaderboard, run

  $ python test.py

specifying the train_params.ENV and test_params.CKPT_FILE parameters in params.py as Pendulum-v0 and Pendulum-v0.ckpt-660000 respectively.
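For orientation, the two settings named above might look like the following in params.py. This is a hedged sketch: only the names `train_params.ENV` and `test_params.CKPT_FILE` and their two values come from this README; representing them as class attributes is an assumption, and the real file contains many more parameters.

```python
# Hypothetical excerpt of params.py; the class-attribute layout is assumed.
class train_params:
    ENV = 'Pendulum-v0'                      # environment to train and test on


class test_params:
    CKPT_FILE = 'Pendulum-v0.ckpt-660000'    # checkpoint loaded by test.py
```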

Results

Result of training the D4PG on the 'Pendulum-v0' environment:

Result of training the D4PG on the 'LunarLanderContinuous-v2' environment:

Result of training the D4PG on the 'BipedalWalker-v2' environment:

Result of training the D4PG on the 'BipedalWalkerHardcore-v2' environment:

Environment                 Best 100-episode performance   Ckpt file
Pendulum-v0                 -123.11 +/- 6.86               ckpt-660000
LunarLanderContinuous-v2    290.87 +/- 2.00                ckpt-320000
BipedalWalker-v2            304.62 +/- 0.13                ckpt-940000
BipedalWalkerHardcore-v2    256.29 +/- 7.08                ckpt-8130000

All checkpoints for the above results are saved in the ckpts folder and the results can be reproduced by running python test.py and specifying the train_params.ENV and test_params.CKPT_FILE parameters in params.py for the desired environment and checkpoint file.
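A figure in the "mean +/- spread" form used in the table can be produced from a list of episode rewards as sketched below. This is an illustration, not the logic of test.py: the README does not say whether the value after +/- is a standard deviation or a standard error, so the population standard deviation is assumed here. The function name `summarise_rewards` and the sample rewards are likewise made up for the example.

```python
import statistics


def summarise_rewards(episode_rewards):
    """Format episode rewards as 'mean +/- spread'.

    Assumption: the spread is the population standard deviation.
    """
    mean = statistics.mean(episode_rewards)
    spread = statistics.pstdev(episode_rewards)
    return "{:.2f} +/- {:.2f}".format(mean, spread)


# Illustrative rewards only; a real evaluation would use 100 episodes.
summary = summarise_rewards([-120.0, -126.0, -123.0])
```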

To-do

  • Train/test on further environments, including MuJoCo

References

License

MIT License
