vds's Introduction

Value Disagreement Sampling (VDS)

This codebase is adapted from OpenAI Baselines.
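
As a rough illustration of the idea behind the name, the sketch below picks a goal with probability proportional to how much an ensemble of value estimators disagrees about it. It is not the repository's implementation: the function names, the toy ensemble, and the uniform fallback are assumptions made for illustration. The choice of standard deviation as the disagreement measure is inferred from the `disagreement_fun_name: std` entry in the training log quoted in the issues further down.

```python
import random
import statistics


def disagreement(values):
    """Population standard deviation of one goal's value estimates across the ensemble."""
    return statistics.pstdev(values)


def sample_goal(candidates, ensemble, rng=random):
    """Pick one candidate goal with probability proportional to ensemble disagreement.

    candidates: list of goals (any objects).
    ensemble:   list of callables, each mapping a goal to a scalar value estimate.
    rng:        the `random` module or a `random.Random` instance.
    """
    scores = [disagreement([value_fn(goal) for value_fn in ensemble])
              for goal in candidates]
    if sum(scores) == 0.0:
        # Assumption: if every member agrees on every goal, sample uniformly.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=scores, k=1)[0]


# Hypothetical toy ensemble of three value functions that agree at goal 0.0
# and disagree more as the goal grows.
toy_ensemble = [lambda g, k=k: k * g for k in (1.0, 2.0, 3.0)]
toy_candidates = [0.0, 0.5, 1.0]
print(sample_goal(toy_candidates, toy_ensemble, rng=random.Random(0)))
```

In this toy setup the goal `0.0` has zero disagreement and is therefore never selected, while `1.0` is selected twice as often as `0.5`.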

Run experiments

From the project directory, run:

python -m baselines.ve_run --alg=her --env=FetchPush-v1 --num_timesteps=500000 \
--size_ensemble=3 --log_path=./data/test_fetch_push
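
To launch the same command for several environments, the argument list can be assembled in Python. This is a minimal sketch that only reuses the flags shown above; the helper name `ve_run_command` and the log-path pattern are hypothetical, and the `subprocess` call is left commented out so the script only prints the commands.

```python
import sys


def ve_run_command(env, num_timesteps, size_ensemble, log_path, alg="her"):
    """Build the argument list for `python -m baselines.ve_run` with the flags shown above."""
    return [
        sys.executable, "-m", "baselines.ve_run",
        "--alg=" + alg,
        "--env=" + env,
        "--num_timesteps=" + str(num_timesteps),
        "--size_ensemble=" + str(size_ensemble),
        "--log_path=" + log_path,
    ]


if __name__ == "__main__":
    for env in ("FetchPush-v1", "HandReach-v0"):
        cmd = ve_run_command(env, 500000, 3, "./data/test_" + env)
        print(" ".join(cmd))
        # To actually start training, uncomment the next two lines:
        # import subprocess
        # subprocess.run(cmd, check=True)
```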

Baselines

OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms.

These algorithms will make it easier for the research community to replicate, refine, and identify new ideas, and will create good baselines to build research on top of. Our DQN implementation and its variants are roughly on par with the scores in published papers. We expect they will be used as a base around which new ideas can be added, and as a tool for comparing a new approach against existing ones.

Prerequisites

Baselines requires Python 3 (>=3.5) with the development headers. You'll also need the system packages CMake, OpenMPI, and zlib. These can be installed as follows:

Ubuntu

sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev

Mac OS X

Installation of system packages on Mac requires Homebrew. With Homebrew installed, run the following:

brew install cmake openmpi

Virtual environment

For general Python package hygiene, it is a good idea to use virtual environments (virtualenvs) so that packages from different projects do not interfere with each other. You can install virtualenv (which is itself a pip package) via

pip install virtualenv

Virtualenvs are essentially folders that contain copies of the Python executable and all Python packages. To create a virtualenv called venv with Python 3, run

virtualenv /path/to/venv --python=python3

To activate a virtualenv:

. /path/to/venv/bin/activate
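
A quick way to confirm that the activation worked is to ask the interpreter itself. The helper below is a small sketch (the name `in_virtualenv` is made up for illustration) that relies only on attributes of the standard `sys` module.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtualenv active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```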

A more thorough tutorial on virtualenvs and their options can be found here

Tensorflow versions

The master branch supports TensorFlow versions 1.4 through 1.14. For TensorFlow 2.0 support, please use the tf2 branch.
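
The version rule above can be written as a small check. This is only a sketch: the function names are hypothetical, and treating every 2.x release as a candidate for the tf2 branch is an assumption, since the text above mentions 2.0 only.

```python
def parse_major_minor(version):
    """Return (major, minor) as integers from a version string such as '1.14.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def suggested_branch(tf_version):
    """Map a TensorFlow version string to the branch named above, or None if unsupported."""
    major_minor = parse_major_minor(tf_version)
    if (1, 4) <= major_minor <= (1, 14):
        return "master"
    if major_minor[0] == 2:
        # Assumption: any 2.x release is tried against the tf2 branch.
        return "tf2"
    return None


for version in ("1.3.0", "1.14.0", "1.15.0", "2.0.0"):
    print(version, "->", suggested_branch(version))

# With TensorFlow installed, the running version could be checked with:
#   import tensorflow as tf
#   print(suggested_branch(tf.__version__))
```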

Installation

Clone the repo and cd into it:

git clone https://github.com/openai/baselines.git
cd baselines

If you don't have TensorFlow installed already, install your favourite flavor of TensorFlow. In most cases, you may use
```
pip install tensorflow-gpu==1.14 # if you have a CUDA-compatible gpu and proper drivers
```
or
```
pip install tensorflow==1.14
```
to install TensorFlow 1.14, which is the latest version of TensorFlow supported by the master branch. Refer to the TensorFlow installation guide for more details.

Install the baselines package:
```
pip install -e .
```

MuJoCo

Some of the baselines examples use the MuJoCo (multi-joint dynamics in contact) physics simulator, which is proprietary and requires binaries and a license (a temporary 30-day license can be obtained from www.mujoco.org). Instructions on setting up MuJoCo can be found here

vds's Issues

AttributeError: 'ParticleMazeEnv' object has no attribute '_reset_sim'

Hi, I want to test the maze tasks, but I get this error: AttributeError: 'ParticleMazeEnv' object has no attribute '_reset_sim'

I ran the command: python -m baselines.ve_run --alg=her --env=MazeA-v0 --num_timesteps=500000 --size_ensemble=3 --log_path=./data/test_MazeA-v0

Can't reproduce the result on the HandReach environment

I used the command 'python -m baselines.ve_run --alg=her --env=HandReach-v0 --num_timesteps=4000000 --size_ensemble=3 --log_path=./data/test_handreach' to train on the HandReach environment, but the algorithm seems to have no effect there. In the paper, the test success rate is about 40% after 2 million training steps, but in my training the success rate is 25% at most and is very unstable. Part of the log file is shown below. Could you provide any advice?

Logging to ./data/test_HandReach
Training her on goal:HandReach-v0 with arguments
{'size_ensemble': 3}
before mpi_fork: rank 0 num_cpu
after mpi_fork: rank 0 num_cpu 1
Creating a DDPG agent with action space 20 x 1.0...
T: 50
_Q_lr: 0.001
_action_l2: 1.0
_batch_size: 256
_buffer_size: 1000000
_clip_obs: 200.0
_disagreement_fun_name: std
_hidden: 256
_layers: 3
_max_u: 1.0
_n_candidates: 1000
_network_class: baselines.her.actor_critic:ActorCritic
_noise_eps: 0.2
_norm_clip: 5
_norm_eps: 0.01
_pi_lr: 0.001
_polyak: 0.95
_random_eps: 0.3
_relative_goals: False
_replay_k: 4
_replay_strategy: future
_rollout_batch_size: 2
_size_ensemble: 3
_test_with_polyak: False
_ve_batch_size: 1000
_ve_buffer_size: 1000000
_ve_lr: 0.001
_ve_replay_k: 4
_ve_replay_strategy: none
_ve_use_Q: True
_ve_use_double_network: True
aux_loss_weight: 0.0078
bc_loss: 0
ddpg_params: {'buffer_size': 1000000, 'hidden': 256, 'layers': 3, 'network_class': 'baselines.her.actor_critic:ActorCritic', 'polyak': 0.95, 'batch_size': 256, 'Q_lr': 0.001, 'pi_lr': 0.001, 'norm_eps': 0.01, 'norm_clip': 5, 'max_u': 1.0, 'action_l2': 1.0, 'clip_obs': 200.0, 'relative_goals': False, 'input_dims': {'o': 63, 'u': 20, 'g': 15, 'info_is_success': 1}, 'T': 50, 'scope': 'ddpg', 'clip_pos_returns': True, 'clip_return': 49.99999999999996, 'rollout_batch_size': 2, 'subtract_goals': <function simple_goal_subtract at 0x7f848c260158>, 'sample_transitions': <function make_sample_her_transitions.<locals>._sample_her_transitions at 0x7f848c173a60>, 'gamma': 0.98, 'bc_loss': 0, 'q_filter': 0, 'num_demo': 100, 'demo_batch_size': 128, 'prm_loss_weight': 0.001, 'aux_loss_weight': 0.0078, 'info': {'env_name': 'HandReach-v0'}}
demo_batch_size: 128
env_name: HandReach-v0
env_type: goal
gamma: 0.98
gs_params: {'n_candidates': 1000, 'disagreement_fun_name': 'std'}
make_env: <function prepare_params.<locals>.make_env at 0x7f848c260f28>
n_batches: 40
n_cycles: 50
n_epochs: 800
n_test_rollouts: 10
num_cpu: 1
num_demo: 100
prm_loss_weight: 0.001
q_filter: 0
total_timesteps: 4000000
ve_n_batches: 100
ve_params: {'size_ensemble': 3, 'buffer_size': 1000000, 'lr': 0.001, 'batch_size': 1000, 'use_Q': True, 'use_double_network': True, 'hidden': 256, 'layers': 3, 'norm_eps': 0.01, 'norm_clip': 5, 'max_u': 1.0, 'clip_obs': 200.0, 'relative_goals': False, 'input_dims': {'o': 63, 'u': 20, 'g': 15, 'info_is_success': 1}, 'T': 50, 'scope': 've', 'rollout_batch_size': 2, 'subtract_goals': <function simple_goal_subtract at 0x7f848c260158>, 'clip_pos_returns': True, 'clip_return': 49.99999999999996, 'sample_transitions': <function make_sample_her_transitions.<locals>._sample_her_transitions at 0x7f848c173ae8>, 'gamma': 0.98, 'polyak': 0.95}
Training...

| ddpg/stats_g/mean | 0.673 |
| ddpg/stats_g/std | 0.0189 |
| ddpg/stats_o/mean | 0.31 |
| ddpg/stats_o/std | 0.7 |
| epoch | 0 |
| test/episode | 20 |
| test/mean_Q | -2.89 |
| test/success_rate | 0 |
| test/sum_rewards | -49 |
| test/timesteps | 1e+03 |
| time_eval | 1.38 |
| time_rollout | 18.1 |
| time_train | 25.7 |
| time_ve | 311 |
| timesteps | 5e+03 |
| train/actor_loss | -1.62 |
| train/critic_loss | 0.0384 |
| train/episode | 100 |
| train/success_rate | 0 |
| train/sum_rewards | -49 |
| train/timesteps | 5e+03 |
| ve/loss | 0.00142 |
| ve/stats_disag/mean | 0.1 |
| ve/stats_disag/std | 0.0299 |
| ve/stats_g/mean | 0.672 |
| ve/stats_g/std | 0.0195 |
| ve/stats_o/mean | 0.302 |
| ve/stats_o/std | 0.701 |
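
When comparing a log like this against a result quoted in timesteps, it helps to convert between timesteps and epochs. The arithmetic below is a sketch that uses only values printed in the log above. The relationships between them are inferred from those printed numbers (they reproduce `n_epochs: 800`, `timesteps: 5e+03`, `train/episode: 100` and `test/episode: 20`), not read from the code, so treat them as an assumption.

```python
# Values copied from the log above.
T = 50                     # steps per episode
rollout_batch_size = 2     # episodes collected in parallel per rollout
n_cycles = 50              # rollout cycles per epoch
n_test_rollouts = 10       # test rollouts per epoch
total_timesteps = 4000000  # value passed as --num_timesteps

timesteps_per_epoch = n_cycles * T * rollout_batch_size
n_epochs = total_timesteps // timesteps_per_epoch
train_episodes_per_epoch = n_cycles * rollout_batch_size
test_episodes_per_epoch = n_test_rollouts * rollout_batch_size


def completed_epochs(timesteps):
    """Number of full epochs needed to collect the given number of training timesteps."""
    return timesteps // timesteps_per_epoch


print("timesteps per epoch:", timesteps_per_epoch)
print("total epochs:", n_epochs)
print("train episodes per epoch:", train_episodes_per_epoch)
print("test episodes per epoch:", test_episodes_per_epoch)
print("epochs for 2 million steps:", completed_epochs(2000000))
```

Under these assumptions, 2 million training steps correspond to 400 completed epochs, that is, the log row with `epoch` equal to 399 given that the counter starts at 0.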
