
Variational Curriculum Reinforcement Learning

This repository contains the official training and evaluation code for Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills. It provides an implementation of VCRL variants on the Sawyer environment, including the presented method Value Uncertainty Variational Curriculum (VUVC).

Installation

All Python dependencies are listed in environment.yml. To set up the environment, follow these steps:

  1. Create the conda environment by running the following command:
    conda env create -f environment.yml
  2. Activate the vcrl environment:
    conda activate vcrl
  3. Install the codebase by running:
    pip install -e .

Usage

General Usage

You can use the following command to run the training and evaluation code:

python -m scripts.METHOD \
    BASE_LOGDIR \
    --gpu_id GPU_ID \
    --snapshot_gap SNAPSHOT_GAP \
    --seed SEED \
    --spec EXP_SPEC

The placeholders should be replaced with the appropriate values:

  • METHOD: Training method. Choose one of the VCRL variants: [her, rig, edl, skewfit, vuvc].
  • BASE_LOGDIR: Sub-directory where the training and evaluation results will be saved, including the policy, replay buffer, and training log.
  • GPU_ID: GPU ID to use.
  • SNAPSHOT_GAP: Save the model every SNAPSHOT_GAP training epochs. The best performing model will be saved as params.pkl.
  • SEED: Random seed. The seeds that we used in the paper range from 0 to 4.
  • EXP_SPEC: Experiment specification. The results will be saved at BASE_LOGDIR/vcrl_logs/ENV_ID/METHOD/EXP_SPEC/SEED/.
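
As a rough illustration of where results end up, the sketch below composes the output path from the placeholders above. The helper `result_dir` is hypothetical and not part of the repository, which builds this directory itself. The `seed_` prefix is an assumption taken from the `log_dir` template printed in the issue log further down, and the ENV_ID value used in the example is likewise copied from that log.

```python
from pathlib import PurePosixPath


def result_dir(base_logdir, env_id, method, spec, seed):
    """Compose BASE_LOGDIR/vcrl_logs/ENV_ID/METHOD/EXP_SPEC/SEED.

    Hypothetical helper for locating results after a run. The "seed_"
    prefix is an assumption based on the logged log_dir template.
    """
    return (
        PurePosixPath(base_logdir)
        / "vcrl_logs"
        / env_id
        / method
        / spec
        / "seed_{}".format(seed)
    )


if __name__ == "__main__":
    # Example values mirror the VUVC command used elsewhere in this README.
    print(result_dir("/tmp/vcrl/", "SawyerDoorHookResetFreeEnv-v1", "vuvc", "default", 0))
```

A helper like this can be handy when collecting params.pkl files across several seeds, but the directory layout should be checked against an actual run first.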

By default, each training method's script file defines the hyperparameters used in the paper. To test other configurations, override them with your own values.

EDL training has two stages: 1) training a VAE together with a density-based exploration policy, and 2) unsupervised training of skills. To select the stage, pass the --mode flag to the edl script with either train_vae or train_policy.

Examples

Here are some examples of running the code on the SawyerDoorHook environment:

# VUVC
python -m scripts.SawyerDoorHook.vuvc /tmp/vcrl/ --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default

# HER
python -m scripts.SawyerDoorHook.her /tmp/vcrl/ --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default

# RIG
python -m scripts.SawyerDoorHook.rig /tmp/vcrl/ --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default

# Skew-Fit
python -m scripts.SawyerDoorHook.skewfit /tmp/vcrl/ --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default

# EDL
python -m scripts.SawyerDoorHook.edl /tmp/vcrl/ --mode train_vae --gpu_id 0 --snapshot_gap 20 --seed 0
python -m scripts.SawyerDoorHook.edl /tmp/vcrl/ --mode train_policy --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default
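
To repeat these runs over the seeds 0 to 4 mentioned above, the commands can be generated programmatically. The following is a minimal sketch, not part of the repository: `build_command` and `edl_commands` are hypothetical helpers that only assemble the argument lists shown in the examples. Nothing is executed; each command is printed so it can be inspected or passed to `subprocess.run` by hand.

```python
import shlex


def build_command(env, method, base_logdir, seed,
                  gpu_id=0, snapshot_gap=20, spec="default", mode=None):
    """Assemble one training command as an argument list.

    Hypothetical helper: it mirrors the flag order of the README examples.
    """
    cmd = ["python", "-m", "scripts.{}.{}".format(env, method), base_logdir]
    if mode is not None:
        # Only the edl script takes --mode (train_vae or train_policy).
        cmd += ["--mode", mode]
    cmd += ["--gpu_id", str(gpu_id),
            "--snapshot_gap", str(snapshot_gap),
            "--seed", str(seed)]
    if spec is not None:
        cmd += ["--spec", spec]
    return cmd


def edl_commands(env, base_logdir, seed):
    """Return the two EDL stages in order: VAE first, then the policy."""
    return [
        # The train_vae example above is shown without --spec.
        build_command(env, "edl", base_logdir, seed, spec=None, mode="train_vae"),
        build_command(env, "edl", base_logdir, seed, mode="train_policy"),
    ]


if __name__ == "__main__":
    for seed in range(5):  # seeds 0 to 4, as used in the paper
        cmd = build_command("SawyerDoorHook", "vuvc", "/tmp/vcrl/", seed)
        print(" ".join(shlex.quote(part) for part in cmd))
        for stage in edl_commands("SawyerDoorHook", "/tmp/vcrl/", seed):
            print(" ".join(shlex.quote(part) for part in stage))
```

Keeping the commands as lists avoids shell quoting problems if BASE_LOGDIR contains spaces.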

Reference

@inproceedings{kim2023variational,
  title={Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills},
  author={Kim, Seongun and Lee, Kyowoon and Choi, Jaesik},
  booktitle={International Conference on Machine Learning},
  year={2023},
}

License

This repository is released under the MIT license. See LICENSE for additional details.

Credits

This repository is extended from rlkit. For more details about the coding infrastructure, please refer to rlkit.
The Sawyer environment is adapted from multiworld, where the Sawyer MuJoCo models were developed by Vikash Kumar under the Apache-2.0 License.

vcrl's Issues

TypeError: 'numpy.intc' object is not iterable

When I run the following command, this error occurs:
python ./scripts/SawyerDoorHook/vuvc.py ./log --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default

This is the terminal output:
pygame 2.5.2 (SDL 2.28.3, Python 3.6.13)
Hello from the pygame community. https://www.pygame.org/contribute.html
No personal conf_private.py found.
doodad not detected
2024-05-09 19:34:31.871201 China Standard Time | Variant:
2024-05-09 19:34:31.871201 China Standard Time | {
"args": {
"env_id": "SawyerDoorHookResetFreeEnv-v1",
"base_logdir": "./log",
"log_dir": "/{}/vcrl_logs/{}/vuvc/{}/seed_{}",
"render": false,
"use_gpu": true,
"gpu_id": 0,
"snapshot_mode": "gap_and_last",
"snapshot_gap": 20,
"seed": 0,
"spec": "default"
},
"skewfit_variant": {
"exploration_goal_sampling_mode": "custom_goal_sampler",
"evaluation_goal_sampling_mode": "presampled",
"custom_goal_sampler": "replay_buffer",
"presampled_goals_path": "E:\College\RL\project2\multiworld\envs\mujoco\goals\door_goals.npy",
"presample_goals": true
},
"qf_kwargs": {
"hidden_sizes": [
400,
300
]
},
"policy_kwargs": {
"hidden_sizes": [
400,
300
]
},
"vae_kwargs": {
"representation_size": 16,
"decoder_output_activation": "<function identity at 0x000002F0585D1048>",
"decoder_distribution": "gaussian_identity_variance",
"input_channels": 3,
"architecture": {
"conv_args": {
"kernel_sizes": [
5,
3,
3
],
"n_channels": [
16,
32,
64
],
"strides": [
3,
2,
2
]
},
"conv_kwargs": {
"hidden_sizes": [],
"batch_norm_conv": false,
"batch_norm_fc": false
},
"deconv_args": {
"hidden_sizes": [],
"deconv_input_width": 3,
"deconv_input_height": 3,
"deconv_input_channels": 64,
"deconv_output_kernel_size": 6,
"deconv_output_strides": 3,
"deconv_output_channels": 3,
"kernel_sizes": [
3,
3
],
"n_channels": [
32,
16
],
"strides": [
2,
2
]
},
"deconv_kwargs": {
"batch_norm_deconv": false,
"batch_norm_fc": false
}
}
},
"replay_buffer_kwargs": {
"start_skew_epoch": 10,
"max_size": 100000,
"fraction_goals_rollout_goals": 0.2,
"fraction_goals_env_goals": 0.5,
"exploration_rewards_type": "None",
"vae_priority_type": "vae_prob",
"priority_function_kwargs": {
"sampling_method": "importance_sampling",
"decoder_distribution": "gaussian_identity_variance",
"num_latents_to_sample": 10
},
"power": -0.5,
"relabeling_goal_sampling_mode": "custom_goal_sampler",
"disagreement_method": "var"
},
"sac_trainer_kwargs": {
"reward_scale": 1,
"discount": 0.99,
"soft_target_tau": 0.001,
"target_update_period": 1,
"use_automatic_entropy_tuning": true
},
"vae_trainer_kwargs": {
"beta": 20,
"lr": 0.001
},
"image_env_kwargs": {
"imsize": 48,
"init_camera": "<function sawyer_door_env_camera_v0 at 0x000002F0549C4A60>",
"transpose": true,
"normalize": true,
"non_presampled_goal_img_is_garbage": true
},
"vae_wrapped_env_kwargs": {
"sample_from_true_prior": true,
"reward_params": {
"type": "latent_distance"
}
},
"algo_kwargs": {
"batch_size": 1024,
"num_epochs": 170,
"num_eval_steps_per_epoch": 500,
"num_expl_steps_per_train_loop": 500,
"num_trains_per_train_loop": 1000,
"min_num_steps_before_training": 10000,
"vae_training_schedule": "<function custom_schedule at 0x000002F058A0AC80>",
"oracle_data": false,
"vae_save_period": 50,
"parallel_vae_train": false,
"max_path_length": 100
},
"generate_vae_dataset_kwargs": {
"N": 2,
"test_p": 0.9,
"use_cached": true,
"show": false,
"oracle_dataset": false,
"n_random_steps": 1,
"env_id": "SawyerDoorHookResetFreeEnv-v1",
"imsize": 48,
"init_camera": "<function sawyer_door_env_camera_v0 at 0x000002F0549C4A60>",
"non_presampled_goal_img_is_garbage": true
}
}
D:\myDownload\Anaconda3\anaconda3\envs\vcrl\lib\site-packages\gym\envs\registration.py:14: PkgResourcesDeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately.
result = entry_point.load(False)

[SawyerPushAndReachXYEnv] init
{'render.modes': ['human', 'rgb_array'], 'video.frames_per_second': 10}
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
Traceback (most recent call last):
File "./scripts/SawyerDoorHook/vuvc.py", line 182, in
self.reset()
File "E:\College\RL\project2\multiworld\envs\mujoco\sawyer_xyz\sawyer_door_hook.py", line 177, in reset
ob = self.reset_model()
File "E:\College\RL\project2\multiworld\envs\mujoco\sawyer_xyz\sawyer_door_hook.py", line 173, in reset_model
return self._get_obs()
File "E:\College\RL\project2\multiworld\envs\mujoco\sawyer_xyz\sawyer_door_hook.py", line 110, in _get_obs
angle = self.get_door_angle()
File "E:\College\RL\project2\multiworld\envs\mujoco\sawyer_xyz\sawyer_door_hook.py", line 136, in get_door_angle
return np.array([self.data.get_joint_qpos('doorjoint')])
File "mujoco_py\generated/wrappers.pxi", line 2539, in mujoco_py.cymj.PyMjData.get_joint_qpos
TypeError: 'numpy.intc' object is not iterable

I followed openai/mujoco-py@master...aaronsnoswell:fix-windows-support to fix this problem and also referred to openai/mujoco-py#324, but neither worked. Can anybody help me with this? I would appreciate it.
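
One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the joint address looked up for 'doorjoint' comes back as a `numpy.intc` on Windows, the integer check inside `get_joint_qpos` does not recognise that type, and the code then tries to unpack the address as a (start, end) pair. The sketch below shows a tolerant lookup that accepts any NumPy or Python integer. `read_joint_qpos` is a hypothetical helper, not part of mujoco-py or multiworld, and it has not been tested against this repository.

```python
import numbers

import numpy as np


def read_joint_qpos(qpos, addr):
    """Return the qpos entry (or slice) for a joint address.

    Hypothetical helper. `addr` is either a single integer index or a
    (start, end) pair. np.integer covers numpy.intc, numpy.int32 and
    numpy.int64, so the Windows-specific integer type is handled too.
    """
    if isinstance(addr, (numbers.Integral, np.integer)):
        return qpos[int(addr)]
    start, end = addr
    return qpos[int(start):int(end)]


if __name__ == "__main__":
    qpos = np.array([0.1, 0.2, 0.3, 0.4])
    print(read_joint_qpos(qpos, np.intc(2)))   # single-index joint
    print(read_joint_qpos(qpos, (1, 3)))       # multi-value joint
```

If this reading is right, `get_door_angle` in sawyer_door_hook.py could fetch the address with `self.model.get_joint_qpos_addr('doorjoint')` and pass it, together with `self.data.qpos`, to a helper like this instead of calling `self.data.get_joint_qpos` directly. Whether `self.model` is available at that point in the environment class is an assumption.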
