nec's Introduction

Neural Episodic Control

TensorFlow implementation of Neural Episodic Control (Pritzel et al.).

⚠️ This is not an official implementation and may contain glitches (or even a major defect).

Basic Setup

  1. Install the basic deep RL packages using your environment management system.

This code was tested in an environment with the following packages:

python 3.6+
tensorflow == 1.13.1
gym['atari'] == 0.12.0
numpy, opencv, etc.
  2. Clone this repo
git clone git@github.com:hiwonjoon/NEC.git --recursive
  3. Install pyflann, an approximate nearest neighbor library.
cd libs/flann
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=<path you want to install flann & pyflann library> ..
make -j8
make install

Training

python q_learning.py --log_dir <path to logdir> --env_id <env id such as 'BreakoutNoFrameskip-v4'>

For more hyperparameters, please check out q_learning.py.

Note that the checkpoint files generated during training can be huge, since the memory has to be preserved as part of the model. By default, a checkpoint is written every 1 million timesteps.

Evaluation

python q_learning.py --log_dir <path to logdir> --env_id <env id such as 'BreakoutNoFrameskip-v4'> --mode eval

You can select a specific model with the --model_file option; for example, the policy after 5M timesteps can be loaded by passing --model_file model.ckpt-5000000.

Implementation Difference

The paper does not reveal a few hyperparameters, such as the epsilon decay strategy or the learning rate alpha. I picked these on a hunch and with the help of the other NEC repo. In addition, I used some hyperparameters that differ from the paper: delta is 1e-5 instead of the reported 1e-3, the network and DND are updated every 32 frames instead of 16, and the replay buffer size is set to 30000 instead of 100000.
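To make the role of delta concrete, here is a minimal NumPy sketch of the inverse-distance kernel lookup described in the NEC paper, k(h, h_i) = 1 / (||h - h_i||^2 + delta). The function name dnd_lookup and the toy arrays are assumptions for illustration, not code from this repo; a real lookup would only see the approximate nearest neighbors returned by pyflann.

```python
import numpy as np

def dnd_lookup(query, keys, values, delta=1e-5):
    """Kernel-weighted value estimate over stored (key, value) pairs.

    Hypothetical helper: `keys` and `values` stand in for the nearest
    neighbours of `query` retrieved from the dictionary.
    """
    sq_dists = np.sum((keys - query) ** 2, axis=1)  # squared L2 distances
    kernel = 1.0 / (sq_dists + delta)               # inverse-distance kernel
    weights = kernel / kernel.sum()                 # normalise to sum to 1
    return float(weights @ values)

# Toy memory with three entries (made-up numbers).
keys = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
values = np.array([1.0, 5.0, -3.0])
query = np.array([0.9, 0.0])

q_small = dnd_lookup(query, keys, values, delta=1e-5)  # dominated by the closest key
q_large = dnd_lookup(query, keys, values, delta=1.0)   # smoother mix of all keys
```

A smaller delta lets the closest key dominate the estimate, while a larger delta blends the neighbors more evenly.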

One feature that is not implemented is replacing a memory when an exact match is found in the dictionary. However, I am unsure whether this would ever really happen: both the embedding network and the embeddings saved in the dictionary keep changing, so an exact match is very unlikely even if the exact same state is visited multiple times.
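For reference, the missing behavior can be sketched as follows. In the paper, writing a key that already exists in the dictionary updates the stored value with a tabular-style step instead of appending a new entry. The class TinyDND, its alpha default, and the linear scan are assumptions for illustration and do not reflect this repo's code.

```python
import numpy as np

class TinyDND:
    """Toy dictionary for a single action: parallel lists of keys and values."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # hypothetical tabular learning rate
        self.keys = []
        self.values = []

    def write(self, key, target):
        """Update the stored value on an exact key match, otherwise append."""
        key = np.asarray(key, dtype=np.float32)
        for i, stored in enumerate(self.keys):
            if np.array_equal(stored, key):
                # Tabular-style update: Q_i <- Q_i + alpha * (target - Q_i)
                self.values[i] += self.alpha * (target - self.values[i])
                return i
        self.keys.append(key.copy())
        self.values.append(float(target))
        return len(self.keys) - 1

dnd = TinyDND(alpha=0.5)
h = np.array([0.25, -1.0], dtype=np.float32)
dnd.write(h, 1.0)          # new entry
dnd.write(h, 3.0)          # exact match: value moves halfway toward 3.0
dnd.write(h + 1e-4, 3.0)   # slightly drifted embedding: stored as a new key
```

The last call shows the concern raised above: once the embedding drifts even slightly, the exact-match branch no longer triggers.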

Sample Result

  • Pong

    • Trained policy after 5 million frames of training

pong_after_5M_frames

  • Average Episode Reward: 21 (vs about 15 @ 5 million frames, 20.4 @ 10 million frames from the original NEC paper)

  • Training Statistics

pong_training

  • Breakout

    • Trained policy after 5 million frames of training

breakout_after_5M_frames

  • Average Episode Reward: 27.5 @ 5M, 133.5 @ 10M (vs 13.6 @ 10 million frames from the original NEC paper)

  • Training Statistics

pong_training

  • Hero

    • Trained policy after 5 million frames of training

hero_after_5M_frames

  • Average Episode Reward: 13395 @ 5M and saturated. (vs about 13000 @ 5M, 16265.3 @ 10M)

  • MsPacman

    • Trained policy after 5 million frames of training

MsPacman_after_5M_frames

  • Average Episode Reward: 1800 @ 5M, 2118 @ 10M (vs about 2800 @ 5M, 4142.8 @ 10M)

  • Alien

    • Trained policy after 5 million frames of training

Alien_after_5M_frames

  • Average Episode Reward: 490 @ 5M and 800 @ 10M (vs about 2200 @ 5M, 3460.6 @ 10M)

  • Frostbite

    • Trained policy after 5 million frames of training

Frostbite_after_5M_frames

  • Average Episode Reward: 260 @ 5M and saturated. (vs about 1500 @ 5M, 2747.4 @ 10M)

What is the big deal here?

IMHO, the coolest part of NEC is its straightforwardness; for example, it does not require a reward scaling scheme (most other RL algorithms clip rewards to [-1, 1] in order to stabilize value function learning). It is basically a cool continuous extension of classical tabular Q-learning with deep learning.
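As a rough illustration of the reward-scaling point, the sketch below contrasts the clipped rewards used by many DQN-style agents with an N-step return computed from raw rewards, which is the kind of target the NEC paper describes. The function names, the discount, and the toy rewards are assumptions for illustration only.

```python
import numpy as np

def clip_reward(r):
    """Clip a reward to [-1, 1], as commonly done by DQN-style agents."""
    return float(np.clip(r, -1.0, 1.0))

def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Discounted sum of the given rewards plus a discounted bootstrap value."""
    g = 0.0
    for j, r in enumerate(rewards):
        g += (gamma ** j) * r
    return g + (gamma ** len(rewards)) * bootstrap_value

raw = [0.0, 75.0, 0.0, 200.0]              # made-up raw game rewards
clipped = [clip_reward(r) for r in raw]    # magnitude information is lost

g_raw = n_step_return(raw, bootstrap_value=10.0, gamma=0.5)
g_clipped = n_step_return(clipped, bootstrap_value=10.0, gamma=0.5)
```

With clipping, a 75-point and a 200-point reward become indistinguishable; the raw return keeps their relative scale.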

Enjoy! 🍺

nec's People

Contributors

hiwonjoon

nec's Issues

Error when installing pyflann

Hello, I ran into this problem when trying to install pyflann. I ran 'sudo cmake install':

running install
running build
running build_py
package init file '/home/lin/桌面/code/NEC/libs/flann/build/lib/__init__.py' not found (or not a regular file)
package init file '/home/lin/桌面/code/NEC/libs/flann/build/lib/__init__.py' not found (or not a regular file)
running install_lib
running install_egg_info
Removing /home/lin/anaconda3/lib/python3.7/site-packages/flann-1.9.1-py3.7.egg-info
Writing /home/lin/anaconda3/lib/python3.7/site-packages/flann-1.9.1-py3.7.egg-info
CMake Error at examples/cmake_install.cmake:47 (file):
file INSTALL cannot find
"/home/lin/code/NEC/libs/flann/build/bin/flann_example_c".
Call Stack (most recent call first):
cmake_install.cmake:70 (include)

Do you know how to solve this problem? Thank you very much.
