Variational Curriculum Reinforcement Learning

This repository contains the official training and evaluation code for Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills. It provides an implementation of VCRL variants on the Sawyer environment, including the presented method Value Uncertainty Variational Curriculum (VUVC).

Installation

All Python dependencies are listed in environment.yml. To set up the environment, follow these steps:

Create the Anaconda environment by running the following command:
```
conda env create -f environment.yml
```
Activate the vcrl environment:
```
conda activate vcrl
```
Install the codebase by running:
```
pip install -e .
```

Usage

General Usage

You can use the following command to run the training and evaluation code:

python -m scripts.METHOD \
    BASE_LOGDIR \
    --gpu_id GPU_ID \
    --snapshot_gap SNAPSHOT_GAP \
    --seed SEED \
    --spec EXP_SPEC

The placeholders should be replaced with the appropriate values:

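METHOD: Training method. Choose one of the VCRL variants: [her, rig, edl, skewfit, vuvc]. In the examples below, the module path also includes the environment name (for example, scripts.SawyerDoorHook.vuvc).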
BASE_LOGDIR: Sub-directory where the training and evaluation results will be saved, including the policy, replay buffer, and training log.
GPU_ID: GPU ID to use.
SNAPSHOT_GAP: Save the model every SNAPSHOT_GAP training epochs. The best performing model will be saved as params.pkl.
SEED: Random seed. The seeds that we used in the paper range from 0 to 4.
EXP_SPEC: Experiment specification. The results will be saved at BASE_LOGDIR/vcrl_logs/ENV_ID/METHOD/EXP_SPEC/SEED/.
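
As a minimal sketch of the results layout described above, the following composes BASE_LOGDIR/vcrl_logs/ENV_ID/METHOD/EXP_SPEC/SEED/ from its parts. The helper name results_dir is hypothetical and not part of this repository, and the exact folder names the code writes (for instance, whether the seed folder carries a prefix) may differ from this illustration.

```python
import posixpath


def results_dir(base_logdir, env_id, method, exp_spec, seed):
    """Compose the results path from the README placeholders.

    Hypothetical helper for illustration only; it mirrors the layout
    BASE_LOGDIR/vcrl_logs/ENV_ID/METHOD/EXP_SPEC/SEED stated in the README.
    """
    return posixpath.join(
        base_logdir, "vcrl_logs", env_id, method, exp_spec, str(seed)
    )


# Example values taken from the README examples; ENV_ID here is an assumption.
print(results_dir("/tmp/vcrl/", "SawyerDoorHook", "vuvc", "default", 0))
```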

By default, hyperparameters used in the paper are defined in the script files for each training method. To test different configurations, you can override them with your own choices.

Training with EDL proceeds in two stages: (1) training a VAE along with a density-based exploration policy, and (2) unsupervised training of skills. To select the stage, pass the --mode flag with either train_vae or train_policy when running edl.
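
The two EDL stages can be sketched as a pair of command lines built in order. This is an illustration under assumptions: edl_commands is a hypothetical helper, the module path scripts.ENV.edl follows the pattern used in the Examples section, and --spec is passed only to the second stage because that is how the README examples are written; whether the first stage accepts --spec is not stated.

```python
import shlex


def edl_commands(env, base_logdir, gpu_id=0, snapshot_gap=20, seed=0, spec="default"):
    """Build the two EDL command lines, VAE stage first (hypothetical helper)."""
    common = ["python", "-m", "scripts.{}.edl".format(env), base_logdir]
    tail = [
        "--gpu_id", str(gpu_id),
        "--snapshot_gap", str(snapshot_gap),
        "--seed", str(seed),
    ]
    # Stage 1: train the VAE together with the density-based exploration policy.
    stage1 = common + ["--mode", "train_vae"] + tail
    # Stage 2: unsupervised training of skills.
    stage2 = common + ["--mode", "train_policy"] + tail + ["--spec", spec]
    return [stage1, stage2]


# Print the commands instead of running them, so they can be inspected first.
for cmd in edl_commands("SawyerDoorHook", "/tmp/vcrl/"):
    print(" ".join(shlex.quote(part) for part in cmd))
```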

Examples

Here are some examples of running the code on the SawyerDoorHook environment:

# VUVC
python -m scripts.SawyerDoorHook.vuvc /tmp/vcrl/ --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default

# HER
python -m scripts.SawyerDoorHook.her /tmp/vcrl/ --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default

# RIG
python -m scripts.SawyerDoorHook.rig /tmp/vcrl/ --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default

# Skew-Fit
python -m scripts.SawyerDoorHook.skewfit /tmp/vcrl/ --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default

# EDL
python -m scripts.SawyerDoorHook.edl /tmp/vcrl/ --mode train_vae --gpu_id 0 --snapshot_gap 20 --seed 0
python -m scripts.SawyerDoorHook.edl /tmp/vcrl/ --mode train_policy --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default
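
To repeat the single-stage examples above over the five seeds mentioned in the README (0 to 4), a small generator of command lines may help. This is a sketch, not part of the repository: sweep_commands is a hypothetical helper, and the default values simply copy those used in the examples.

```python
import itertools
import shlex

METHODS = ["vuvc", "her", "rig", "skewfit"]  # single-stage methods from the examples
SEEDS = range(5)  # the README states that the paper used seeds 0 to 4


def sweep_commands(env="SawyerDoorHook", base_logdir="/tmp/vcrl/",
                   gpu_id=0, snapshot_gap=20, spec="default"):
    """Return one shell command per (method, seed) pair (hypothetical helper)."""
    commands = []
    for method, seed in itertools.product(METHODS, SEEDS):
        args = [
            "python", "-m", "scripts.{}.{}".format(env, method), base_logdir,
            "--gpu_id", str(gpu_id),
            "--snapshot_gap", str(snapshot_gap),
            "--seed", str(seed),
            "--spec", spec,
        ]
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


# Print the commands; run them by hand or pipe them to a scheduler of your choice.
for line in sweep_commands():
    print(line)
```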

Reference

@inproceedings{kim2023variational,
  title={Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills},
  author={Kim, Seongun and Lee, Kyowoon and Choi, Jaesik},
  booktitle={International Conference on Machine Learning},
  year={2023},
}

License

This repository is released under the MIT license. See LICENSE for additional details.

Credits

This repository is extended from rlkit. For more details about the coding infrastructure, please refer to rlkit.
The Sawyer environment is adapted from multiworld, where the Sawyer MuJoCo models were developed by Vikash Kumar under the Apache-2.0 License.

vcrl's Issues

TypeError: 'numpy.intc' object is not iterable

When I run the following command, the error below occurs:
python ./scripts/SawyerDoorHook/vuvc.py ./log --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default

This is the terminal output:
pygame 2.5.2 (SDL 2.28.3, Python 3.6.13)
Hello from the pygame community. https://www.pygame.org/contribute.html
No personal conf_private.py found.
doodad not detected
2024-05-09 19:34:31.871201 China Standard Time | Variant:
2024-05-09 19:34:31.871201 China Standard Time | {
"args": {
"env_id": "SawyerDoorHookResetFreeEnv-v1",
"base_logdir": "./log",
"log_dir": "/{}/vcrl_logs/{}/vuvc/{}/seed_{}",
"render": false,
"use_gpu": true,
"gpu_id": 0,
"snapshot_mode": "gap_and_last",
"snapshot_gap": 20,
"seed": 0,
"spec": "default"
},
"skewfit_variant": {
"exploration_goal_sampling_mode": "custom_goal_sampler",
"evaluation_goal_sampling_mode": "presampled",
"custom_goal_sampler": "replay_buffer",
"presampled_goals_path": "E:\College\RL\project2\multiworld\envs\mujoco\goals\door_goals.npy",
"presample_goals": true
},
"qf_kwargs": {
"hidden_sizes": [
400,
300
]
},
"policy_kwargs": {
"hidden_sizes": [
400,
300
]
},
"vae_kwargs": {
"representation_size": 16,
"decoder_output_activation": "<function identity at 0x000002F0585D1048>",
"decoder_distribution": "gaussian_identity_variance",
"input_channels": 3,
"architecture": {
"conv_args": {
"kernel_sizes": [
5,
3,
3
],
"n_channels": [
16,
32,
64
],
"strides": [
3,
2,
2
]
},
"conv_kwargs": {
"hidden_sizes": [],
"batch_norm_conv": false,
"batch_norm_fc": false
},
"deconv_args": {
"hidden_sizes": [],
"deconv_input_width": 3,
"deconv_input_height": 3,
"deconv_input_channels": 64,
"deconv_output_kernel_size": 6,
"deconv_output_strides": 3,
"deconv_output_channels": 3,
"kernel_sizes": [
3,
3
],
"n_channels": [
32,
16
],
"strides": [
2,
2
]
},
"deconv_kwargs": {
"batch_norm_deconv": false,
"batch_norm_fc": false
}
}
},
"replay_buffer_kwargs": {
"start_skew_epoch": 10,
"max_size": 100000,
"fraction_goals_rollout_goals": 0.2,
"fraction_goals_env_goals": 0.5,
"exploration_rewards_type": "None",
"vae_priority_type": "vae_prob",
"priority_function_kwargs": {
"sampling_method": "importance_sampling",
"decoder_distribution": "gaussian_identity_variance",
"num_latents_to_sample": 10
},
"power": -0.5,
"relabeling_goal_sampling_mode": "custom_goal_sampler",
"disagreement_method": "var"
},
"sac_trainer_kwargs": {
"reward_scale": 1,
"discount": 0.99,
"soft_target_tau": 0.001,
"target_update_period": 1,
"use_automatic_entropy_tuning": true
},
"vae_trainer_kwargs": {
"beta": 20,
"lr": 0.001
},
"image_env_kwargs": {
"imsize": 48,
"init_camera": "<function sawyer_door_env_camera_v0 at 0x000002F0549C4A60>",
"transpose": true,
"normalize": true,
"non_presampled_goal_img_is_garbage": true
},
"vae_wrapped_env_kwargs": {
"sample_from_true_prior": true,
"reward_params": {
"type": "latent_distance"
}
},
"algo_kwargs": {
"batch_size": 1024,
"num_epochs": 170,
"num_eval_steps_per_epoch": 500,
"num_expl_steps_per_train_loop": 500,
"num_trains_per_train_loop": 1000,
"min_num_steps_before_training": 10000,
"vae_training_schedule": "<function custom_schedule at 0x000002F058A0AC80>",
"oracle_data": false,
"vae_save_period": 50,
"parallel_vae_train": false,
"max_path_length": 100
},
"generate_vae_dataset_kwargs": {
"N": 2,
"test_p": 0.9,
"use_cached": true,
"show": false,
"oracle_dataset": false,
"n_random_steps": 1,
"env_id": "SawyerDoorHookResetFreeEnv-v1",
"imsize": 48,
"init_camera": "<function sawyer_door_env_camera_v0 at 0x000002F0549C4A60>",
"non_presampled_goal_img_is_garbage": true
}
}
D:\myDownload\Anaconda3\anaconda3\envs\vcrl\lib\site-packages\gym\envs\registration.py:14: PkgResourcesDeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately.
result = entry_point.load(False)

[SawyerPushAndReachXYEnv] init
{'render.modes': ['human', 'rgb_array'], 'video.frames_per_second': 10}
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
Traceback (most recent call last):
  File "./scripts/SawyerDoorHook/vuvc.py", line 182, in
    self.reset()
  File "E:\College\RL\project2\multiworld\envs\mujoco\sawyer_xyz\sawyer_door_hook.py", line 177, in reset
    ob = self.reset_model()
  File "E:\College\RL\project2\multiworld\envs\mujoco\sawyer_xyz\sawyer_door_hook.py", line 173, in reset_model
    return self._get_obs()
  File "E:\College\RL\project2\multiworld\envs\mujoco\sawyer_xyz\sawyer_door_hook.py", line 110, in _get_obs
    angle = self.get_door_angle()
  File "E:\College\RL\project2\multiworld\envs\mujoco\sawyer_xyz\sawyer_door_hook.py", line 136, in get_door_angle
    return np.array([self.data.get_joint_qpos('doorjoint')])
  File "mujoco_py\generated/wrappers.pxi", line 2539, in mujoco_py.cymj.PyMjData.get_joint_qpos
TypeError: 'numpy.intc' object is not iterable

I followed the changes in openai/mujoco-py@master...aaronsnoswell:fix-windows-support to fix this problem and also referred to openai/mujoco-py#324, but it did not work. Can anybody help me with this? I would appreciate it.
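
One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the error is raised inside get_joint_qpos, and the message suggests that a scalar joint address of type numpy.intc is being unpacked as if it were a (start, end) pair. The sketch below is not mujoco_py code. read_joint_qpos is a hypothetical stand-in that shows how a check against np.integer accepts every NumPy integer scalar, including numpy.intc, before falling back to the slice case.

```python
import numpy as np


def read_joint_qpos(qpos, addr):
    """Hypothetical stand-in for a joint position lookup.

    addr is either a scalar index or a (start, end) pair. np.integer is the
    base class of all NumPy integer scalars, so numpy.intc passes the check.
    """
    if isinstance(addr, (int, np.integer)):
        return qpos[addr]
    start, end = addr  # only reached for a (start, end) pair
    return qpos[start:end]


qpos = np.linspace(0.0, 1.0, 5)
print(read_joint_qpos(qpos, np.intc(2)))  # scalar address
print(read_joint_qpos(qpos, (1, 3)))      # (start, end) address
```

If the installed mujoco_py performs a narrower type check at the line named in the traceback, widening it along these lines and rebuilding the extension may be worth trying. That is a guess based on the error message alone.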
