vds's Introduction

Value Disagreement Sampling (VDS)

This codebase is adapted from OpenAI Baselines.

Run experiments

In the project directory, run:

python -m baselines.ve_run --alg=her --env=FetchPush-v1 --num_timesteps=500000 \
--size_ensemble=3 --log_path=./data/test_fetch_push
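
The --size_ensemble flag presumably sets how many value functions are trained side by side. As a rough illustration of the idea behind the method's name, the sketch below scores candidate goals by the standard deviation of an ensemble's value estimates and samples a goal in proportion to that score. It is not taken from this repository: disagreement, sample_goal, and the toy value functions are hypothetical names, and the proportional-sampling rule is an assumption.

```python
import random
import statistics
from typing import Callable, List, Sequence

Goal = Sequence[float]
ValueFn = Callable[[Goal], float]


def disagreement(goal: Goal, ensemble: List[ValueFn]) -> float:
    """Population standard deviation of the ensemble's value estimates for one goal."""
    return statistics.pstdev([value_fn(goal) for value_fn in ensemble])


def sample_goal(candidates: List[Goal], ensemble: List[ValueFn], rng: random.Random) -> Goal:
    """Pick one candidate goal with probability proportional to its disagreement score."""
    scores = [disagreement(goal, ensemble) for goal in candidates]
    if sum(scores) == 0.0:
        # All ensemble members agree everywhere: fall back to uniform sampling.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=scores, k=1)[0]


# Toy ensemble (hypothetical): three value functions that agree at 0 and diverge elsewhere.
toy_ensemble = [lambda g: g[0], lambda g: 2 * g[0], lambda g: 3 * g[0]]
toy_candidates = [(0.0,), (1.0,)]
picked = sample_goal(toy_candidates, toy_ensemble, random.Random(0))
```

In this toy setup the goal (0.0,) has zero disagreement, so only (1.0,) can be drawn.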

Baselines

OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms.

These algorithms will make it easier for the research community to replicate, refine, and identify new ideas, and will create good baselines to build research on top of. Our DQN implementation and its variants are roughly on par with the scores in published papers. We expect they will be used as a base around which new ideas can be added, and as a tool for comparing a new approach against existing ones.

Prerequisites

Baselines requires Python 3 (>=3.5) with the development headers. You'll also need the system packages CMake, OpenMPI, and zlib, which can be installed as follows:

Ubuntu

sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev

Mac OS X

Installation of system packages on Mac requires Homebrew. With Homebrew installed, run the following:

brew install cmake openmpi

Virtual environment

To keep packages from different projects from interfering with each other, it is a good idea to use virtual environments (virtualenvs). You can install virtualenv (which is itself a pip package) via

pip install virtualenv

Virtualenvs are essentially folders that contain a copy of the Python executable and all installed Python packages. To create a virtualenv at /path/to/venv with Python 3, run

virtualenv /path/to/venv --python=python3

To activate a virtualenv:

. /path/to/venv/bin/activate
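
To confirm that activation worked before installing anything, a small check like the following can be run with the interpreter you intend to use. This is a minimal sketch: in_virtualenv is a hypothetical helper name, and the check relies on sys.prefix differing from sys.base_prefix inside an environment (older virtualenv releases set sys.real_prefix instead).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases expose sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # The standard venv module and newer virtualenv releases change sys.prefix
    # and keep the original installation in sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
```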

A more thorough tutorial on virtualenvs and their options can be found in the virtualenv documentation.

Tensorflow versions

The master branch supports TensorFlow versions 1.4 through 1.14. For TensorFlow 2.0 support, please use the tf2 branch.
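
The supported range can be checked programmatically before training. The sketch below works on a version string so that it runs without TensorFlow installed; parse_major_minor and supported_by_master are hypothetical helper names, not part of Baselines.

```python
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.14.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def supported_by_master(version: str) -> bool:
    """True when the version falls in the 1.4 to 1.14 range stated for the master branch."""
    return (1, 4) <= parse_major_minor(version) <= (1, 14)
```

With TensorFlow installed, the string to pass is tensorflow.__version__.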

Installation

  • Clone the repo and cd into it:

    git clone https://github.com/openai/baselines.git
    cd baselines
  • If you don't have TensorFlow installed already, install your favourite flavor of TensorFlow. In most cases, you may use

    pip install tensorflow-gpu==1.14 # if you have a CUDA-compatible gpu and proper drivers

    or

    pip install tensorflow==1.14

    to install TensorFlow 1.14, which is the latest version of TensorFlow supported by the master branch. Refer to the TensorFlow installation guide for more details.

  • Install the baselines package:

    pip install -e .

MuJoCo

Some of the baselines examples use the MuJoCo (multi-joint dynamics in contact) physics simulator, which is proprietary and requires binaries and a license (a temporary 30-day license can be obtained from www.mujoco.org). Instructions on setting up MuJoCo can be found in the mujoco-py repository.

vds's People

Contributors

zzyunzhi

vds's Issues

Can't reproduce the result on the HandReach environment

I used the command 'python -m baselines.ve_run --alg=her --env=HandReach-v0 --num_timesteps=4000000 --size_ensemble=3 --log_path=./data/test_handreach' to train on the HandReach environment, but the algorithm seems to have no effect on this environment. In the paper, after training for 2 million steps, the test success rate is about 40%, but in my training the success rate is 25% at most and is very unstable. Part of the log file is below. Could you provide any advice?

Logging to ./data/test_HandReach
Training her on goal:HandReach-v0 with arguments
{'size_ensemble': 3}
before mpi_fork: rank 0 num_cpu
after mpi_fork: rank 0 num_cpu 1
Creating a DDPG agent with action space 20 x 1.0...
T: 50
_Q_lr: 0.001
_action_l2: 1.0
_batch_size: 256
_buffer_size: 1000000
_clip_obs: 200.0
_disagreement_fun_name: std
_hidden: 256
_layers: 3
_max_u: 1.0
_n_candidates: 1000
_network_class: baselines.her.actor_critic:ActorCritic
_noise_eps: 0.2
_norm_clip: 5
_norm_eps: 0.01
_pi_lr: 0.001
_polyak: 0.95
_random_eps: 0.3
_relative_goals: False
_replay_k: 4
_replay_strategy: future
_rollout_batch_size: 2
_size_ensemble: 3
_test_with_polyak: False
_ve_batch_size: 1000
_ve_buffer_size: 1000000
_ve_lr: 0.001
_ve_replay_k: 4
_ve_replay_strategy: none
_ve_use_Q: True
_ve_use_double_network: True
aux_loss_weight: 0.0078
bc_loss: 0
ddpg_params: {'buffer_size': 1000000, 'hidden': 256, 'layers': 3, 'network_class': 'baselines.her.actor_critic:ActorCritic', 'polyak': 0.95, 'batch_size': 256, 'Q_lr': 0.001, 'pi_lr': 0.001, 'norm_eps': 0.01, 'norm_clip': 5, 'max_u': 1.0, 'action_l2': 1.0, 'clip_obs': 200.0, 'relative_goals': False, 'input_dims': {'o': 63, 'u': 20, 'g': 15, 'info_is_success': 1}, 'T': 50, 'scope': 'ddpg', 'clip_pos_returns': True, 'clip_return': 49.99999999999996, 'rollout_batch_size': 2, 'subtract_goals': <function simple_goal_subtract at 0x7f848c260158>, 'sample_transitions': <function make_sample_her_transitions.<locals>._sample_her_transitions at 0x7f848c173a60>, 'gamma': 0.98, 'bc_loss': 0, 'q_filter': 0, 'num_demo': 100, 'demo_batch_size': 128, 'prm_loss_weight': 0.001, 'aux_loss_weight': 0.0078, 'info': {'env_name': 'HandReach-v0'}}
demo_batch_size: 128
env_name: HandReach-v0
env_type: goal
gamma: 0.98
gs_params: {'n_candidates': 1000, 'disagreement_fun_name': 'std'}
make_env: <function prepare_params.<locals>.make_env at 0x7f848c260f28>
n_batches: 40
n_cycles: 50
n_epochs: 800
n_test_rollouts: 10
num_cpu: 1
num_demo: 100
prm_loss_weight: 0.001
q_filter: 0
total_timesteps: 4000000
ve_n_batches: 100
ve_params: {'size_ensemble': 3, 'buffer_size': 1000000, 'lr': 0.001, 'batch_size': 1000, 'use_Q': True, 'use_double_network': True, 'hidden': 256, 'layers': 3, 'norm_eps': 0.01, 'norm_clip': 5, 'max_u': 1.0, 'clip_obs': 200.0, 'relative_goals': False, 'input_dims': {'o': 63, 'u': 20, 'g': 15, 'info_is_success': 1}, 'T': 50, 'scope': 've', 'rollout_batch_size': 2, 'subtract_goals': <function simple_goal_subtract at 0x7f848c260158>, 'clip_pos_returns': True, 'clip_return': 49.99999999999996, 'sample_transitions': <function make_sample_her_transitions.<locals>._sample_her_transitions at 0x7f848c173ae8>, 'gamma': 0.98, 'polyak': 0.95}
Training...

| ddpg/stats_g/mean | 0.673 |
| ddpg/stats_g/std | 0.0189 |
| ddpg/stats_o/mean | 0.31 |
| ddpg/stats_o/std | 0.7 |
| epoch | 0 |
| test/episode | 20 |
| test/mean_Q | -2.89 |
| test/success_rate | 0 |
| test/sum_rewards | -49 |
| test/timesteps | 1e+03 |
| time_eval | 1.38 |
| time_rollout | 18.1 |
| time_train | 25.7 |
| time_ve | 311 |
| timesteps | 5e+03 |
| train/actor_loss | -1.62 |
| train/critic_loss | 0.0384 |
| train/episode | 100 |
| train/success_rate | 0 |
| train/sum_rewards | -49 |
| train/timesteps | 5e+03 |
| ve/loss | 0.00142 |
| ve/stats_disag/mean | 0.1 |
| ve/stats_disag/std | 0.0299 |
| ve/stats_g/mean | 0.672 |
| ve/stats_g/std | 0.0195 |
| ve/stats_o/mean | 0.302 |
| ve/stats_o/std | 0.701 |
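
To compare a run like this against a figure quoted at a given number of environment steps, it helps to convert steps to epochs and to pull metrics out of the logged rows. The sketch below assumes one epoch collects n_cycles * T * rollout_batch_size timesteps on a single worker, which is consistent with the values logged above (50 * 50 * 2 = 5000 timesteps at epoch 0, and 800 epochs for 4,000,000 timesteps). The helper names are hypothetical and not part of the codebase.

```python
from typing import Dict


def parse_log_rows(text: str) -> Dict[str, float]:
    """Parse rows of the form '| key | value |' into a dict of floats."""
    rows: Dict[str, float] = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.strip().strip("|").split("|")]
        if len(parts) != 2 or not parts[0]:
            continue  # not a key/value row
        try:
            rows[parts[0]] = float(parts[1])
        except ValueError:
            continue  # value is not numeric
    return rows


def timesteps_per_epoch(n_cycles: int, horizon: int, rollout_batch_size: int) -> int:
    """Timesteps collected in one epoch under the single-worker assumption."""
    return n_cycles * horizon * rollout_batch_size


def epochs_for(timesteps: int, per_epoch: int) -> int:
    """Number of whole epochs needed to collect the given number of timesteps."""
    return timesteps // per_epoch


sample = """| epoch | 0 |
| test/success_rate | 0 |
| timesteps | 5e+03 |"""
metrics = parse_log_rows(sample)
per_epoch = timesteps_per_epoch(n_cycles=50, horizon=50, rollout_batch_size=2)
```

Under that assumption, the 2 million steps mentioned in the question correspond to 400 epochs of this run.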
