
pku-alignment / omnisafe


OmniSafe is an infrastructural framework for accelerating SafeRL research.

Home Page: https://www.omnisafe.ai

License: Apache License 2.0

Python 99.27% Makefile 0.50% Dockerfile 0.24%
benchmark pytorch safe-reinforcement-learning deep-reinforcement-learning reinforcement-learning machine-learning constraint-rl constraint-satisfaction-problem deep-learning safe-rl

omnisafe's Introduction


Documentation | Implemented Algorithms | Installation | Getting Started | License


OmniSafe is an infrastructural framework designed to accelerate safe reinforcement learning (RL) research. It provides a comprehensive and reliable benchmark for safe RL algorithms as well as an out-of-the-box modular toolkit for researchers. SafeRL aims to develop algorithms that minimize the risk of unintended harm or unsafe behavior.

OmniSafe is the first unified learning framework in the field of safe reinforcement learning, and it aims to foster the growth of the SafeRL learning community. The key features of OmniSafe are:

  • Highly Modular Framework. OmniSafe presents a highly modular framework that incorporates an extensive collection of algorithms tailored for safe reinforcement learning across diverse domains. The framework is versatile thanks to its abstraction over different algorithm types and its well-designed API, which uses Adapter and Wrapper components to bridge gaps and enable seamless interactions between modules. This design allows for easy extension and customization, making it a powerful tool for developers working with different types of algorithms.

  • High-Performance Parallel Computing Acceleration. By harnessing the capabilities of torch.distributed, OmniSafe accelerates the learning process of algorithms with process parallelism. This enables OmniSafe not only to support environment-level asynchronous parallelism but also to incorporate asynchronous agent learning. This methodology improves training stability and speeds up training through a parallel exploration mechanism. The integration of asynchronous agent learning in OmniSafe underscores its commitment to providing a versatile and robust platform for advancing SafeRL research.

  • Out-of-the-Box Toolkits. OmniSafe offers customizable toolkits for tasks such as training, benchmarking, analyzing, and rendering. Tutorials and user-friendly APIs make it easy for beginners and casual users, while advanced researchers can improve their efficiency without writing complex code.

Train video




Quick Start

Installation

Prerequisites

OmniSafe requires Python 3.8+ and PyTorch 1.10+.

We support and test Python 3.8, 3.9, and 3.10 on Linux. We also support Apple Silicon (M1 and M2) macOS. We will accept PRs related to Windows, but do not officially support it.

Install from source

# Clone the repo
git clone https://github.com/PKU-Alignment/omnisafe.git
cd omnisafe

# Create a conda environment
conda env create --file conda-recipe.yaml
conda activate omnisafe

# Install omnisafe
pip install -e .

Install from PyPI

OmniSafe is hosted on PyPI.

pip install omnisafe
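
To verify the installation, you can print the version information; this is the same snippet the bug-report template asks for:

import sys, omnisafe

# Print the interpreter and OmniSafe versions to confirm the installation.
print(sys.version, sys.platform)
print(omnisafe.__version__)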

Implemented Algorithms

Latest SafeRL Papers
List of Algorithms: On-Policy SafeRL | Off-Policy SafeRL | Model-Based SafeRL | Offline SafeRL | Others

Examples

cd examples
python train_policy.py --algo PPOLag --env-id SafetyPointGoal1-v0 --parallel 1 --total-steps 10000000 --device cpu --vector-env-nums 1 --torch-threads 1
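
The same experiment can also be launched from Python. Below is a minimal sketch using the omnisafe.Agent interface (the same interface that appears in the snippets quoted in the issues further down this page):

import omnisafe

# Train PPOLag on SafetyPointGoal1-v0 via the Python API instead of train_policy.py.
agent = omnisafe.Agent('PPOLag', 'SafetyPointGoal1-v0')
agent.learn()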

Algorithms Registry

On Policy
  Primal-Dual: TRPOLag; PPOLag; PDO; RCPO; TRPOPID; CPPOPID
  Convex Optimization: CPO; PCPO; FOCOPS; CUP
  Penalty Function: IPO; P3O
  Primal: OnCRPO

Off Policy
  Primal-Dual: DDPGLag; TD3Lag; SACLag; DDPGPID; TD3PID; SACPID

Model-Based
  Online Plan: SafeLOOP; CCEPETS; RCEPETS
  Pessimistic Estimate: CAPPETS

Offline
  Q-Learning Based: BCQLag; C-CRR
  DICE Based: COptDICE

Other Formulation MDP
  ET-MDP: PPOEarlyTerminated; TRPOEarlyTerminated
  SauteRL: PPOSaute; TRPOSaute
  SimmerRL: PPOSimmerPID; TRPOSimmerPID

Supported Environments

Here is a list of environments that Safety-Gymnasium supports:

Safe Navigation
  Tasks: Goal[012], Button[012], Push[012], Circle[012]
  Agents: Point, Car, Racecar, Ant
  Example: SafetyPointGoal1-v0

Safe Velocity
  Tasks: Velocity
  Agents: HalfCheetah, Hopper, Swimmer, Walker2d, Ant, Humanoid
  Example: SafetyHumanoidVelocity-v1

Safe Isaac Gym
  Tasks: OverSafeFinger, OverSafeJoint, CatchOver2UnderarmSafeFinger, CatchOver2UnderarmSafeJoint
  Agents: ShadowHand
  Example: ShadowHandOverSafeFinger

For more information about environments, please refer to Safety-Gymnasium.
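
As a quick orientation, a bare interaction loop with a Safety-Gymnasium task looks roughly like the following sketch (adapted from the example scripts quoted in the issues below); note that step() returns a cost in addition to the usual Gymnasium signature:

import safety_gymnasium

# Random-policy rollout; step() returns (obs, reward, cost, terminated, truncated, info).
env = safety_gymnasium.make('SafetyPointGoal1-v0')
obs, info = env.reset(seed=0)
ep_ret, ep_cost = 0.0, 0.0
for _ in range(1000):
    act = env.action_space.sample()
    obs, reward, cost, terminated, truncated, info = env.step(act)
    ep_ret += reward
    ep_cost += cost
    if terminated or truncated:
        print(f'Episode Return: {ep_ret} \t Episode Cost: {ep_cost}')
        ep_ret, ep_cost = 0.0, 0.0
        obs, info = env.reset()
env.close()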

Customizing your environment

We offer a flexible customized environment interface that allows users to achieve the following without modifying the OmniSafe source code:

  • Use OmniSafe to train algorithms on customized environments.
  • Create the environment with user-specified parameters.
  • Record environment-specific information in the Logger.

We provide step-by-step tutorials on Environment Customization From Scratch and Environment Customization From Community to give you a detailed introduction to using this feature of OmniSafe.

Note: If you run into trouble customizing your environment, please feel free to open an issue or discussion. Pull requests are also welcome if you're willing to contribute an implementation of your environment's interface.

Try with CLI

pip install omnisafe

omnisafe --help  # Ask for help

omnisafe benchmark --help  # 'benchmark' can also be replaced with 'eval', 'train', 'train-config'

# Quick benchmarking for your research; just specify:
# 1. exp_name
# 2. num_pool (how many processes run concurrently)
# 3. the path of the config file (refer to omnisafe/examples/benchmarks for the format)

# Here we provide an example in ./tests/saved_source,
# and you can set up your own benchmark_config.yaml by following it.
omnisafe benchmark test_benchmark 2 ./tests/saved_source/benchmark_config.yaml

# Quickly evaluate and render your trained policy; just specify:
# 1. the path of the trained policy
omnisafe eval ./tests/saved_source/PPO-{SafetyPointGoal1-v0} --num-episode 1

# Quickly train some algorithms to validate your ideas
# Note: with `key1:key2` you can select nested hyperparameter keys, and with `--custom-cfgs` you can pass custom values via the CLI
omnisafe train --algo PPO --total-steps 2048 --vector-env-nums 1 --custom-cfgs algo_cfgs:steps_per_epoch --custom-cfgs 1024

# Quickly train an algorithm from a saved config file; the format is the same as the default config format
omnisafe train-config ./tests/saved_source/train_config.yaml
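
For reference, the key1:key2 syntax used by --custom-cfgs maps onto the same nested configuration structure as the Python API, so the omnisafe train call above corresponds roughly to the following sketch (SafetyPointGoal1-v0 is filled in here only as an example environment):

import omnisafe

# Rough Python-API equivalent of:
#   omnisafe train --algo PPO --total-steps 2048 --vector-env-nums 1 \
#       --custom-cfgs algo_cfgs:steps_per_epoch --custom-cfgs 1024
custom_cfgs = {
    'train_cfgs': {'total_steps': 2048, 'vector_env_nums': 1},
    'algo_cfgs': {'steps_per_epoch': 1024},
}
agent = omnisafe.Agent('PPO', 'SafetyPointGoal1-v0', custom_cfgs=custom_cfgs)
agent.learn()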

Getting Started

Important Hints

We have provided benchmark results for various algorithms, including on-policy, off-policy, model-based, and offline approaches, along with parameter tuning analysis. Please refer to the following:

Quickstart: Colab on the Cloud

Explore OmniSafe easily and quickly through a series of Google Colab notebooks:

  • Getting Started: introduces the basic usage of OmniSafe so that users can get up to speed quickly.
  • CLI Command: introduces how to use the OmniSafe CLI tool.

We take great pleasure in collaborating with our users to create tutorials in various languages. Please refer to our list of currently supported languages. If you are interested in translating the tutorial into a new language or improving an existing version, kindly submit a PR to us.


Changelog

See CHANGELOG.md.

Citing OmniSafe

If you find OmniSafe useful or use OmniSafe in your research, please cite it in your publications.

@article{omnisafe,
  title   = {OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research},
  author  = {Jiaming Ji and Jiayi Zhou and Borong Zhang and Juntao Dai and Xuehai Pan and Ruiyang Sun and Weidong Huang and Yiran Geng and Mickel Liu and Yaodong Yang},
  journal = {arXiv preprint arXiv:2305.09304},
  year    = {2023}
}

Publications using OmniSafe

We have compiled a list of papers that use OmniSafe for algorithm implementation or experimentation. If you are willing to include your work in this list, or if you wish to have your implementation officially integrated into OmniSafe, please feel free to contact us.

Papers Publisher
Off-Policy Primal-Dual Safe Reinforcement Learning ICLR 2024
Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model ICLR 2024
Iterative Reachability Estimation for Safe Reinforcement Learning NeurIPS 2023
Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation AAAI 2024
Learning Safety Constraints From Demonstration Using One-Class Decision Trees AAAI 2024 Workshops

The OmniSafe Team

OmniSafe is mainly developed by the SafeRL research team directed by Prof. Yaodong Yang. Our SafeRL research team members include Borong Zhang, Jiayi Zhou, Juntao Dai, Weidong Huang, Ruiyang Sun, Xuehai Pan, and Jiaming Ji. If you have any questions while using OmniSafe, don't hesitate to ask on the GitHub issues page; we will reply within 2-3 working days.

License

OmniSafe is released under Apache License 2.0.

omnisafe's People

Contributors

1asan, dtch1997, dtrc2207, erjanmx, gaiejj, hdadong, mickelliu, muchvo, pre-commit-ci[bot], r-y1, rockmagma02, xuehaipan, xujinming01, zmsn-2077


omnisafe's Issues

[BUG] Can't we specify `standardized_rew_adv` and `standardized_cost_adv` at the same time?

Required prerequisites

What version of OmniSafe are you using?

0.1.0

System information

no need

Problem description

Can't we specify standardized_rew_adv and standardized_cost_adv at the same time?
I guess it is a bug.
https://github.com/PKU-MARL/omnisafe/blob/main/omnisafe/utils/config.py#L216

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[Question] Having trouble replicating the performance of PPO-Lag

Required prerequisites

Questions

I trained a policy with PPO-Lag yesterday and got a terrible result. The learned policy barely navigates the robot to the given goal. After 300 epochs, EpRet/Mean is nearly zero and EpCost/Mean is still over 25.0, the given limit on episode cost. I used the default parameters of PPO-Lag and started training in the terminal with the command "python train_policy.py --env-id SafetyPointGoal1-v0 --algo PPOLag --parallel 4". The whole training process can be seen in the figure below.
[figure: PPO-Lag training curves, omitted]

Besides, a similar situation has emerged in several other algorithms, including CPO, IPO, RCPO, and CUP.

Thanks for your reading and help.

[BUG] Problems encountered during installation

Required prerequisites

Motivation

Well, I tried to install the omnisafe library in a Docker container equipped with Python 3.8+ and PyTorch 1.10+. Installing safety-gymnasium went smoothly, but a problem occurred when installing omnisafe with "pip install -e ." (screenshot of the error omitted).
This is caused by enum34 conflicting with Python 3.8. But just uninstalling that library is not enough, as executing "pip install -e ." then results in another problem (screenshot omitted).

Solution

To solve these problems, I ran "pip install setuptools==59.5.0" and then "pip uninstall enum34". In this way, omnisafe can be installed successfully.
In short, just downgrading the version of setuptools solves this problem.

Alternatives

No response

Additional context

No response

[BUG] Having trouble running `pip install -e .`

Required prerequisites

What version of OmniSafe are you using?

0.0.2

System information

3.8.15 (default, Nov 24 2022, 15:19:38)
[GCC 11.2.0] linux
0.0.2

Problem description

When I was running the pip install -e . command, I encountered the problem shown below (screenshots omitted).
But unexpectedly, when I re-ran the pip install -e . command, omnisafe was installed successfully and the example train_policy.py ran.
I'm guessing it might be a configuration issue with my computer itself, but I also think it's a potential bug. What do you think?

Reproducible example code

	pip install -e .

Traceback

No response

Expected behavior

No response

Additional context

No response

[BUG] Something wrong with PPO-Lag in SafetySwimmerVelocity-v4

Required prerequisites

What version of OmniSafe are you using?

0.1.1

System information

3.8.13 (default, Mar 28 2022, 11:38:47)
[GCC 7.5.0] linux
0.1.1

Problem description

When I train PPO-Lag in SafetySwimmerVelocity-v4, like

cd examples
python train_from_custom_dict.py

where custom_dict is

import omnisafe


env_id = 'SafetySwimmerVelocity-v4'
custom_cfgs = {
    'train_cfgs': {
        'total_steps': 2048,
        'vector_env_nums': 2,
        'parallel': 1,
    },
    'algo_cfgs': {
        'update_cycle': 1024,
        'update_iters': 1,
    },
    'logger_cfgs': {
        'use_wandb': False,
    },
}

agent = omnisafe.Agent('PPOLag', env_id, custom_cfgs=custom_cfgs)
agent.learn()

I encountered a NaN problem:

ValueError: Expected parameter loc (Tensor of shape (64, 2)) of distribution Normal(loc: torch.Size([64, 2]), scale: torch.Size([64, 2])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan],
        [nan, nan],
        [nan, nan],
        ...
        [nan, nan]], grad_fn=<AddmmBackward0>)

How can I get rid of it?

Reproducible example code

import omnisafe


env_id = 'SafetySwimmerVelocity-v4'
custom_cfgs = {
    'train_cfgs': {
        'total_steps': 2048,
        'vector_env_nums': 2,
        'parallel': 1,
    },
    'algo_cfgs': {
        'update_cycle': 1024,
        'update_iters': 1,
    },
    'logger_cfgs': {
        'use_wandb': False,
    },
}

agent = omnisafe.Agent('PPOLag', env_id, custom_cfgs=custom_cfgs)
agent.learn()

Traceback

No response

Expected behavior

No response

Additional context

No response

[BUG] nan when running in command line

Required prerequisites

What version of OmniSafe are you using?

0.2.2

System information

ubuntu20.04
0.2.2

Problem description

When I try to use omnisafe via the command line, I found that something goes wrong and NaNs appear in a tensor.

Reproducible example code

The Python snippets:

Command lines:

omnisafe train --algo PPOLag --env-id SafetySwimmerVelocity-v4 --total-steps 1024 --custom-cfgs algo_cfgs:update_cycle --custom-cfgs 512

Extra dependencies:


Steps to reproduce:

  1. run the command provided above

Traceback

mean = tensor([[nan, nan],
        [nan, nan],
        [nan, nan],
        ...
        [nan, nan]], grad_fn=<AddmmBackward0>)

Expected behavior

No response

Additional context

No response

[Feature Request] Gratitude & When is focops updated?

Motivation

Thank you very much for your informative documentation. As a beginner, I have learned a lot from it. I have just started my research on SafeRL, and my mentor has asked me to study CPO, PCPO, FOCOPS, etc. I found that omnisafe's documentation of CPO is very detailed. When will the FOCOPS documentation be updated?


Hope for your reply.

Checklist

  • I have checked that there is no similar issue in the repo. (required)

[Feature Request] LAMBDA

Required prerequisites

Motivation

I'm the author of LAMBDA. I can contribute my implementation; how should I proceed?

Solution

An implementation of LAMBDA in PyTorch (if I'm not mistaken). I also have TF and JAX implementations.

Alternatives

No response

Additional context

No response

[BUG] the logger bug in Experiment Grid

Required prerequisites

What version of OmniSafe are you using?

0.1.0

System information

0.1.0, linux ubuntu 20.04

Problem description

I noticed that one of the very nice features of omnisafe is the experiment grid, which can run a very large number of experiments in large batches.
But I found that the logger distinguishes different experiment folders by the current timestamp. This is not a problem without parallelism, but with experiment-grid parallelism different algorithms may be created at the same time.
Then there is a potential bug of different algorithms logging to the same folder at the same time, as evidenced by the screenshots (omitted).

I think this potential bug may have something to do with issue #140 as well.

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[Feature Request] Cuda support

Required prerequisites

Motivation

What a nice repo! However, I notice that omnisafe currently only supports CPU; it seems that cuda support hasn't been implemented yet.
Although training a small model (hidden layer size 64) on CPU is already very fast, I want to use a larger network for more complex tasks, which requires cuda support. When will you be able to provide this support?

Solution

No response

Alternatives

No response

Additional context

No response

[BUG] When I use the experiment grid in GPU, exps with the same file are saved separately

Required prerequisites

What version of OmniSafe are you using?

0.2.2

System information

ubuntu20.04
0.2.2

Problem description

When I use the experiment grid on GPU, experiments from the same file are saved separately. It looks like this is because they are running on different GPUs.

Reproducible example code

The Python snippets:

    eg = ExperimentGrid(exp_name='sg')

    # Set the algorithms.
    base_policy = ['PPO', 'PolicyGradient', 'P3O', 'PPOLag', 'FOCOPS', 'CUP']

    sg_envs = [
        'SafetyPointGoal0-v0',
        'SafetyPointGoal1-v0',
        'SafetyPointGoal2-v0',
        'SafetyPointButton0-v0',
        'SafetyPointButton1-v0',
        'SafetyPointButton2-v0',
        'SafetyPointCircle0-v0',
        'SafetyPointCircle1-v0',
        'SafetyPointCircle2-v0',

        'SafetyCarGoal0-v0',
        'SafetyCarGoal1-v0',
        'SafetyCarGoal2-v0',
        'SafetyCarButton0-v0',
        'SafetyCarButton1-v0',
        'SafetyCarButton2-v0',
        'SafetyCarCircle0-v0',
        'SafetyCarCircle1-v0',
        'SafetyCarCircle2-v0',
    ]
    eg.add('env_id', sg_envs)

    # Set the device.
    avaliable_gpus = [num for num in range(torch.cuda.device_count())]
    gpu_id = [0, 1, 2, 3, 4, 5, 6, 7]
    # if you want to use CPU, please set gpu_id = None
    # gpu_id = None

    if set(gpu_id) > set(avaliable_gpus):
        warnings.warn('The GPU ID is not available, use CPU instead.')
        gpu_id = None

    eg.add('algo', base_policy)
    eg.add('logger_cfgs:use_wandb', [False])
    eg.add('train_cfgs:vector_env_nums', [32])
    eg.add('train_cfgs:torch_threads', [1])
    eg.add('algo_cfgs:cost_normalize', [False])
    eg.add('algo_cfgs:reward_normalize', [False])
    eg.add('algo_cfgs:obs_normalize', [True])
    eg.add('algo_cfgs:update_cycle', [32768])
    eg.add('train_cfgs:total_steps', [32768 * 500])
    eg.add('seed', [0, 5, 10])
    # total experiment num must can be divided by num_pool
    # meanwhile, users should decide this value according to their machine
    eg.run(train, num_pool=81, gpu_id=gpu_id)

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

if exps are only different in seeds, I think they should be saved in same folder.

Additional context

No response

[Question] Aren't your normalization functions doing standardization?

Required prerequisites

Questions

I mean, this is normalization: x' = (x - min(x)) / (max(x) - min(x)),
and this is standardization: z = (x - mean(x)) / std(x).

this is an example of your code:

    def normalize(self, data: torch.Tensor) -> torch.Tensor:
        """Normalize the _data."""
        data = data.to(self._mean.device)
        self._push(data)
        if self._count <= 1:
            return data
        output = (data - self._mean) / self._std
        return torch.clamp(output, -self._clip, self._clip)

[Question] implementation, training, and performance of p3o.

Required prerequisites

Questions

Hi, I still fail to achieve a successful training of P3O with the latest implementation in PR#112.

A possible reason is that the implementation of P3O is inconsistent with the paper.

Currently, line 65 of p3o.py is
"loss_pi_c = self.cfgs.kappa * F.relu(surr_cadv + Jc)",
which should have been
"loss_pi_c = self.cfgs.kappa * F.relu(surr_cadv + (1 - self.cfgs.cost_gamma) * Jc)".

Besides, I also tested P3O in another environment (bullet_safety_gym safety-point-reach-v0), but it seems a little conservative.
The given cost_limit is 10. Some other algorithms, including PPO-Lag and TRPO-Lag, achieve better performance than P3O (with a cost of about 10.0 and a return over 15).
[screenshot of the training curves omitted]

[BUG] Errors about CLI

Required prerequisites

What version of OmniSafe are you using?

0.2.2

System information

3.9.13 (main, Aug 25 2022, 18:29:29)
[Clang 12.0.0 ] darwin
0.2.2

Problem description

When I use omnisafe benchmark --help and run the example command provided by omnisafe, this bug happens (screenshot omitted).

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[Question] Dependency version conflicts during installation

Required prerequisites

Questions

While installing omnisafe from PyPI using pip install omnisafe, there are some dependency conflicts.
Does it matter?

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pandas-profiling 3.2.0 requires joblib~=1.1.0, but you have joblib 1.2.0 which is incompatible.

[Question] Parallel running

Required prerequisites

Questions

I have completed the installation of omnisafe and successfully run the example code. To make the experiments faster, I tried to adjust the 'parallel' parameter, and I found that the program can run when parallel=2 (screenshot omitted).
However, when this parameter is set ≥ 3, it seems that the code cannot be executed; the error is shown in the (omitted) screenshot.

Maybe this problem is caused by my device not having enough cores to use, but I'm not sure. So I raise this problem and hope the developers can check the reason for this issue.

[Feature Request] Will support Sauté RL(ICML 2022)?

Motivation

I was recently reading the ICML 2022 paper Sauté RL: Almost Surely Safe Reinforcement Learning Using State Augmentation, which is also a kind of safe reinforcement learning, but I found that no code was published with this article. Will omnisafe add the code for this article subsequently?

Sauté RL: https://proceedings.mlr.press/v162/sootla22a/sootla22a.pdf

Hope for your reply.


Checklist

  • I have checked that there is no similar issue in the repo. (required)

[Question] Training Data Visualization with tensorboard?

Questions

I found that omnisafe records a lot of data during training, some of which is particularly useful as a tuning reference, but some of the recorded data I did not understand. For example,

  1. FPS: I ran PPO in omnisafe on the environment SafetyPointGoal1-v0, and the graph below appeared (screenshot omitted). Can you please explain what FPS means?

  2. I ran TRPOLag at the same time, and I found that tensorboard recorded the data shown below (screenshot omitted). What is the meaning of these losses? Is smaller better?

Checklist

  • I have checked that there is no similar issue in the repo. (required)
  • I have read the documentation. (required)

[Question] Cost Loss and Update for SAC Lagrangian

Required prerequisites

Questions

  1. Why does the backup for the cost critic loss assign data['rew'] instead of data['cost'] to cost? Wouldn't this update result in a cost critic identical to the standard value critic?

  2. The initial update for the Lagrange multiplier uses Jc = data['cost'].sum().item(). However, the update_lagrange_multiplier method uses Jc to compute the lambda loss which has function signature: def compute_lambda_loss(self, mean_ep_cost): Shouldn't Jc be defined as Jc = data['cost'].mean().item() if it's the mean_ep_cost?

[BUG] Nice repo! But CUP's configs don't have lagrangian_upper_bound

Required prerequisites

What version of OmniSafe are you using?

0.0.2

System information

0.0.2

Problem description

In this file https://github.com/PKU-MARL/omnisafe/blob/dev/omnisafe/algorithms/on_policy/cup.py#L57,
the code is:

        Lagrange.__init__(
            self,
            cost_limit=self.cfgs.lagrange_cfgs.cost_limit,
            lagrangian_multiplier_init=self.cfgs.lagrange_cfgs.lagrangian_multiplier_init,
            lambda_lr=self.cfgs.lagrange_cfgs.lambda_lr,
            lambda_optimizer=self.cfgs.lagrange_cfgs.lambda_optimizer,
            lagrangian_upper_bound=self.cfgs.lagrange_cfgs.lagrangian_upper_bound,
        )

But CUP's config YAML file doesn't have lagrangian_upper_bound in lagrange_cfgs.

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[BUG] pylint: error: argument --spelling-dict

Required prerequisites

What version of OmniSafe are you using?

0.0.1

System information

3.8.15 (default, Nov 24 2022, 15:19:38)
[GCC 11.2.0] linux
0.0.1

Problem description

When I use pre-commit run --all-files, the following error appears:

usage: pylint [options]
pylint: error: argument --spelling-dict: invalid choice: 'en_US' (choose from '')

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[Question] How many types of cameras are available and what are their names?

Required prerequisites

Questions

What an amazing repo! It exports demos that I really like. But when I was using omnisafe's evaluate_saved_policy.py, I found that the default value of the parameter camera_name is track. Are there other types of cameras that can be used, and what are their names? (screenshot omitted)

[Question] how can I run agent in omnisafe but using safety_gymnasium env

Required prerequisites

Questions

I want to run omnisafe with a safety_gymnasium env or some other env; how can I train it and evaluate it?

import safety_gymnasium
import omnisafe

if __name__ == '__main__':
    env = safety_gymnasium.make("SafetyCarPush2-v0")
    agent = omnisafe.Agent('PPOLag', env)
    agent.learn()
    obs, info = env.reset()
    ep_reward, ep_cost = 0, 0
    for i in range(1000):
        action, _states = agent.predict(obs, deterministic=True)
        obs, reward, cost, done, _, info = env.step(action)
        ep_reward += reward
        ep_cost += cost
        env.render()
        if done:
            print(ep_reward, ep_cost)
            obs, info = env.reset()
            ep_reward, ep_cost = 0, 0
    env.close()

like this.

[BUG] Is there something wrong in AutoResetWrapper?

Required prerequisites

What version of OmniSafe are you using?

0.1.0

System information

no need

Problem description

I noticed that you designed wrappers to unify various environments; it is a pretty good design. But when I was trying to use the auto-reset wrapper, I found that it seems to be incomplete. If needed, I can try to fix it.

class AutoReset(Wrapper):
    """Auto reset the environment when the episode is terminated.

    Example:
        >>> env = AutoReset(env)

    """

    def __init__(self, env: CMDP) -> None:
        super().__init__(env)

        assert self.num_envs == 1, 'AutoReset only supports single environment'

    def step(
        self, action: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, Dict]:
        obs, reward, cost, terminated, truncated, info = super().step(action)

        if terminated or truncated:
            info['last_episode_obs'], _ = self.reset()
            obs, _ = self.reset()

        return obs, reward, cost, terminated, truncated, info

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[Feature Request] More details about how to DIY users' local environments and modify default configs of environments in documentation and README.

Required prerequisites

Motivation

Consider adding more details about how to DIY users' local environments and how to modify the default configs of environments in the documentation and README.

Solution

No response

Alternatives

No response

Additional context

Just like this in safety-gym (screenshot omitted).

[BUG] When I enable `smooth` in statistics tools, costs are skipped.

Required prerequisites

What version of OmniSafe are you using?

0.3.0

System information

ubuntu 20.04
0.3.0

Problem description

When I enable smooth in statistics tools, costs are skipped.
Before enabling smooth: [screenshot omitted]
After enabling smooth: [screenshot omitted]

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

1.plot without smooth
2.enable smooth and plot
3.

Traceback

No response

Expected behavior

Smooth in all graphs

Additional context

No response

[Question] Important!!!! Do you have the performance of the CUP (NeurIPS 2022) algorithm on safety gym?

Required prerequisites

Questions

I see that there is already CUP-related code in the dev branch, but I can't run the algorithm at the moment. Do you have the performance of the CUP algorithm on safety gym?

[Question] Why you use the raw action in off-policy algorithms to update actor-critic

Required prerequisites

Questions

Why you use the raw action in off-policy algorithms to update actor-critic?
Shouldn't it be updated using the action after scaling?
https://github.com/PKU-MARL/omnisafe/blob/main/omnisafe/algorithms/off_policy/sac.py#L91

[BUG] Something wrong with tutorial

Required prerequisites

What version of OmniSafe are you using?

0.3.0

System information

ubuntu 20.04, 0.3.0

Problem description

When I use the tutorial Jupyter notebook locally on my PC, I found some errors in the default usage settings (screenshot omitted).

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[Question] How can I run off-policy algorithms on main branch

Required prerequisites

Questions

When I try

cd examples
python train_policy.py --algo SAC

I encountered the problems shown below (screenshot omitted).

In #153 we set the default total-steps=3276800 and vector-env-nums=16, which do not match the off-policy algorithms.

[BUG] Standardize advantage twice in on-policy buffer and vector-on-policy buffer

Required prerequisites

What version of OmniSafe are you using?

0.1.1

System information

No need

Problem description

I found that the advantage will be standardized twice in the on-policy buffer and vector-on-policy buffer. I believe this is a bug.
See below:
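    # (on-policy buffer's get(): standardizes adv_r / adv_c once)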

    def get(self) -> Dict[str, torch.Tensor]:
        """Get the data in the buffer."""
        self.ptr, self.path_start_idx = 0, 0

        data = {
            'obs': self.data['obs'],
            'act': self.data['act'],
            'target_value_r': self.data['target_value_r'],
            'adv_r': self.data['adv_r'],
            'logp': self.data['logp'],
            'discounted_ret': self.data['discounted_ret'],
            'adv_c': self.data['adv_c'],
            'target_value_c': self.data['target_value_c'],
        }

        # self.data['adv_r'] = torch.zeros_like(self.data['adv_r'])
        # self.data['adv_c'] = torch.zeros_like(self.data['adv_c'])

        adv_mean, adv_std, *_ = distributed.dist_statistics_scalar(data['adv_r'])
        cadv_mean, *_ = distributed.dist_statistics_scalar(data['adv_c'])
        if self._standardized_adv_r:
            data['adv_r'] = (data['adv_r'] - adv_mean) / (adv_std + 1e-8)
        if self._standardized_adv_c:
            data['adv_c'] = data['adv_c'] - cadv_mean

        return data
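
    # (vector on-policy buffer's get(): concatenates per-env data, then standardizes again)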
    def get(self) -> Dict[str, torch.Tensor]:
        """Get the data from the buffer."""
        data_pre = {k: [v] for k, v in self.buffers[0].get().items()}
        for buffer in self.buffers[1:]:
            for k, v in buffer.get().items():
                data_pre[k].append(v)
        data = {k: torch.cat(v, dim=0) for k, v in data_pre.items()}

        adv_mean, adv_std, *_ = distributed.dist_statistics_scalar(data['adv_r'])
        cadv_mean, *_ = distributed.dist_statistics_scalar(data['adv_c'])
        if self._standardized_adv_r:
            data['adv_r'] = (data['adv_r'] - adv_mean) / (adv_std + 1e-8)
        if self._standardized_adv_c:
            data['adv_c'] = data['adv_c'] - cadv_mean

        return data

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[BUG] Some questions about the cpo documentation.

Required prerequisites

What version of OmniSafe are you using?

python3 -m pip show omnisafe

System information

Describe the characteristic of your environment:

  • Describe how the library was installed (pip, conda, source, ...)
  • Python version
  • Versions of any other relevant libraries
import sys, omnisafe
print(sys.version, sys.platform)
print(omnisafe.__version__)

Problem description

There are some parts that don't seem to be quite right, and there are problems with the rendering (screenshots omitted).

Reproducible example code

The Python snippets:

Run the snippets with the following commands:

Extra dependencies:


Traceback

No response

Expected behavior

No response

Additional context

No response

[Question] Why does pytest exit with errors?

Required prerequisites

Questions

Hi! I am trying to contribute code implemented by myself to omnisafe. But when I run make test to check my code, these errors appear (screenshot omitted).
It seems that pytest cannot recognize the Python syntax or import modules, and I noticed that it uses Python 2.7.
Any help will be appreciated!

[Question] How to use the vision-based Safety Gymnasium in a headless server

Hi, I've recently come across the following error on a machine with Nvidia driver version 515.76 and CUDA version 11.7, when trying to use the vision-based Safety Gymnasium on a headless Ubuntu 20.04 remote server. The exact same code was running properly on a machine with a display.

Any idea how to fix the issue?

The program I run is as follows:

import argparse
import os

# import gymnasium
import safety_gymnasium
from gymnasium.utils.save_video import save_video


WORKDIR = os.path.abspath('.')
DIR = os.path.join(WORKDIR, 'omnisafe/envs/safety-gymnasium/examples', 'cached_test_vision_video')


def run_random(env_name):
    env = safety_gymnasium.make(env_name)
    # env.seed(0)
    obs, _ = env.reset()
    terminled = False
    ep_ret = 0
    ep_cost = 0
    render_list = []
    for i in range(1001):
        if terminled:
            print('Episode Return: %.3f \t Episode Cost: %.3f' % (ep_ret, ep_cost))
            ep_ret, ep_cost = 0, 0
            obs, _ = env.reset()
            save_video(
                frames=render_list,
                video_folder=DIR,
                name_prefix=f'test_vision_output',
                fps=30,
            )
            render_list = []
        assert env.observation_space.contains(obs)
        act = env.action_space.sample()
        assert env.action_space.contains(act)
        # Use the environment's built_in max_episode_steps
        if hasattr(env, '_max_episode_steps'):
            max_ep_len = env._max_episode_steps
        render_list.append(obs['vision'])
        obs, reward, cost, terminled, truncated, info = env.step(act)

        ep_ret += reward
        ep_cost += cost


if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument('--env', default='SafetyCarGoal0Vision-v0')
    args = parser.parse_args()
    run_random(args.env)

which produced errors:

(mbppo) $ python safety_gym_v2_vision.py 
/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/utils/passive_env_checker.py:49: UserWarning: WARN: A Box observation space has an unconventional shape (neither an image, nor a 1D vector). We recommend flattening the observation to have only a 1D vector or use a custom policy to properly process the data. Actual observation shape: (3, 3)
  logger.warn(
/home/weidong/anaconda3/envs/mbppo/lib/python3.8/site-packages/glfw/__init__.py:912: GLFWError: (65544) b'X11: The DISPLAY environment variable is missing'
  warnings.warn(message, GLFWError)
/home/weidong/anaconda3/envs/mbppo/lib/python3.8/site-packages/glfw/__init__.py:912: GLFWError: (65537) b'The GLFW library is not initialized'
  warnings.warn(message, GLFWError)
Traceback (most recent call last):
  File "safety_gym_v2_vision.py", line 66, in <module>
    run_random(args.env)
  File "safety_gym_v2_vision.py", line 31, in run_random
    obs, _ = env.reset()
  File "/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/wrappers/order_enforcing.py", line 57, in reset
    return self.env.reset(**kwargs)
  File "/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/wrappers/env_checker.py", line 60, in reset
    return env_reset_passive_checker(self.env, **kwargs)
  File "/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/utils/passive_env_checker.py", line 214, in env_reset_passive_checker
    result = env.reset(**kwargs)
  File "/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/envs/safety_gym_v2/builder.py", line 183, in reset
    return (self.task.obs(), info)
  File "/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/envs/safety_gym_v2/tasks/goal/goal_level0.py", line 214, in obs
    obs['vision'] = self.obs_vision()
  File "/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/envs/safety_gym_v2/base_task.py", line 375, in obs_vision
    vision = self.engine.render(width, height, mode='rgb_array', camera_name='vision', cost={})
  File "/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/envs/safety_gym_v2/engine.py", line 339, in render
    self._get_viewer(mode).render(camera_id=camera_id)
  File "/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/envs/safety_gym_v2/engine.py", line 453, in _get_viewer
    self.viewer = RenderContextOffscreen(self.model, self.data)
  File "/home/weidong/anaconda3/envs/mbppo/lib/python3.8/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 232, in __init__
    super().__init__(model, data, offscreen=True)
  File "/home/weidong/anaconda3/envs/mbppo/lib/python3.8/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 57, in __init__
    self.con = mujoco.MjrContext(self.model, mujoco.mjtFontScale.mjFONTSCALE_150)
mujoco.FatalError: gladLoadGL error

[BUG] Omnisafe cannot run on a Mac M1 device

Required prerequisites

What version of OmniSafe are you using?

0.2.2

System information

3.9.13 (main, Aug 25 2022, 18:29:29)
[Clang 12.0.0 ] darwin
0.2.2

Problem description

Admittedly, this is a very nice library. However, omnisafe has the following bug when I run the command omnisafe train --algo PPO --total-steps 1024 --vector-env-nums 1 --custom-cfgs algo_cfgs:update_cycle --custom-cfgs 512 --device cpu (screenshot omitted).
Note: My device is a MacBook Pro M1 Pro.

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[BUG] Why can I make Gymnasium environments when using Safety-Gymnasium?

Required prerequisites

What version of OmniSafe are you using?

0.0.1

System information

3.8.15 (default, Nov 24 2022, 15:19:38)
[GCC 11.2.0] linux
0.0.1

Problem description

When I test safety-gymnasium, I wrongly input 'Humanoid-v4' as env_id, but it works and then something goes wrong.

Reproducible example code

The Python snippets:

import argparse

import safety_gymnasium


def run_random(env_name):
    """Random run."""
    env = safety_gymnasium.make(env_name, render_mode='human')
    obs, _ = env.reset()
    # Use below to specify seed.
    # obs, _ = env.reset(seed=0)
    terminated, truncated = False, False
    ep_ret, ep_cost = 0, 0
    while True:
        if terminated or truncated:
            print(f'Episode Return: {ep_ret} \t Episode Cost: {ep_cost}')
            ep_ret, ep_cost = 0, 0
            obs, _ = env.reset()
        assert env.observation_space.contains(obs)
        act = env.action_space.sample()
        assert env.action_space.contains(act)
        # Use the environment's built_in max_episode_steps
        if hasattr(env, '_max_episode_steps'):  # pylint: disable=unused-variable
            max_ep_len = env._max_episode_steps  # pylint: disable=unused-variable,protected-access
        # pylint: disable-next=unused-variable
        obs, reward, cost, terminated, truncated, info = env.step(act)

        ep_ret += reward
        ep_cost += cost


if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument('--env', default='Humanoid-v4')
    args = parser.parse_args()
    run_random(args.env)

Command lines:

Extra dependencies:


Steps to reproduce:

  1. run code above.

Traceback

Traceback (most recent call last):
  File "~/omnisafe/envs/safety_gymnasium/examples/env.py", line 53, in <module>
    run_random(args.env)
  File "~/omnisafe/envs/safety_gymnasium/examples/env.py", line 42, in run_random
    obs, reward, cost, terminated, truncated, info = env.step(act)
  File "~/omnisafe/envs/safety-gymnasium/safety_gymnasium/wrappers/time_limit.py", line 45, in step
    observation, reward, cost, terminated, truncated, info = self.env.step(action)
ValueError: not enough values to unpack (expected 6, got 5)

Expected behavior

No response

Additional context

No response

[Feature Request] OmniSafe will support PyTorch 2.0

Required prerequisites

Motivation

The PyTorch team has released PyTorch 2.0: our next-generation release that is faster, more Pythonic, and as dynamic as ever. More details can be found in the pytorch-2.0-release-blog.
We will support PyTorch 2.0 so that developers can use the latest PyTorch 2.0 features within the OmniSafe framework.

Solution

No response

Alternatives

No response

Additional context

No response

[Question] Something wrong when I check the progress.txt

Required prerequisites

Questions

When I trained TRPO on SafetyHumanoidVelocity-v4, I found that progress.txt recorded multiple lines of data on the same line. This makes it difficult for me to draw the training curve locally (screenshot omitted).

[BUG] module 'distutils' has no attribute 'version'

Required prerequisites

What version of OmniSafe are you using?

0.0.1

System information

3.8.15 (default, Nov 24 2022, 15:19:38)
[GCC 11.2.0] linux
0.0.1

Problem description

[screenshot of the error omitted]

Traceback

No response

Expected behavior

No response

Additional context

No response

[BUG] A bug in on-policy adapter with autoreset mechanism

Required prerequisites

What version of OmniSafe are you using?

0.1.1

System information

3.8.16 (default, Mar 2 2023, 03:21:46)
[GCC 11.2.0] linux
0.1.1

Problem description

In onpolicy_adapter.py, the end of the episodes are handled like this:

            obs = next_obs
            epoch_end = step >= steps_per_epoch - 1
            for idx, (done, time_out) in enumerate(zip(terminated, truncated)):
                if epoch_end or done or time_out:
                    if (epoch_end or time_out) and not done:
                        if epoch_end:
                            logger.log(
                                f'Warning: trajectory cut off when rollout by epoch at {self._ep_len[idx]} steps.'
                            )
                        _, last_value_r, last_value_c, _ = agent.step(obs[idx])
                        last_value_r = last_value_r.unsqueeze(0)
                        last_value_c = last_value_c.unsqueeze(0)
                    elif done:
                        last_value_r = torch.zeros(1)
                        last_value_c = torch.zeros(1)

                    if done or time_out:
                        self._log_metrics(logger, idx)
                        self._reset_log(idx)

                        self._ep_ret[idx] = 0.0
                        self._ep_cost[idx] = 0.0
                        self._ep_len[idx] = 0.0

                    buffer.finish_path(last_value_r, last_value_c, idx)

while in safety-gymnasium, when the episode ends, it auto-resets immediately and carries the last state in info. For example, I think that when time_out==True, epoch_end==False, and done==False, the value of the last state is calculated from the first observation of the next episode.

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[BUG] AutoReset error while run single environment in omnisafe

Required prerequisites

What version of OmniSafe are you using?

0.1.0

System information

0.1.0 linux ubuntu 20.04

Problem description

When I parallelize an environment, I found that the environment reports errors. I checked the specific reason: when the number of environments is 1, the environment's AutoResetWrapper is called, but in safety-gymnasium==0.1.0 the AutoResetWrapper does not return the cost-related information. I found that the maintainer of safety-gymnasium has already fixed this in https://github.com/PKU-MARL/safety-gymnasium/pull/23/files, but the fix has not been released to PyPI, and omnisafe depends on safety-gymnasium 0.1.0. Could you tell me how to solve this?

Issue screenshots: omitted.

The safety-gymnasium fix is in https://github.com/PKU-MARL/safety-gymnasium/pull/23/files (screenshot omitted).

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[Question] wandb.sdk.lib.config_util.ConfigError when using exp_grid

Required prerequisites

Questions

First, thanks for this well-designed library; it is elegant and reliable.
Yesterday, when I started a set of experiments, I found a strange ERROR, and I am not sure whether it is a bug.
This is my core configuration for the run; the other parts are the same as your example in omnisafe/examples/benchmarks:

    eg = ExperimentGrid(exp_name='Test')
    eg.add('algo', ['PPOLag'])
    eg.add('env_id', ['SafetyPointGoal0-v0', 'SafetyPointGoal1-v0', 'SafetyPointGoal2-v0', 'SafetyAntVelocity-v4'])
    eg.add('epochs', 20)
    # eg.add('actor_lr', [0.001, 0.003, 0.004], 'lr', True)
    # eg.add('actor_iters', [1, 2], 'ac_iters', True)
    eg.add('wandb_project', 'test')
    eg.add('num_envs', [1, 2, 4, 8, 16, 32])
    # eg.add('seed', [0, 5, 10])
    eg.run(train, num_pool=10)

This is the information that wandb throws (screenshot omitted).

[BUG] 'mujoco.structs.MjModelGeomViews' object has no attribute 'name'

Describe the bug

'mujoco.structs.MjModelGeomViews' object has no attribute 'name'

When I run examples/vis_safety_gymnasium.py, the bug is as follows:

"/home/saferl/Documents/github/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/envs/safety_gym_v2/world.py", line 435, in init
    self.geom_names = [
  File "/home/saferl/Documents/github/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/envs/safety_gym_v2/world.py", line 438, in <listcomp>
    if self.model.geom(i).name != 'floor'
AttributeError: 'mujoco.structs.MjModelGeomViews' object has no attribute 'name'

My virtual environment depends on

Gymnasium           0.26.3
mujoco                  2.2.0
numpy                   1.23.5
torch                      1.10.0+cu111
torchaudio             0.10.0+rocm4.1
torchvision             0.11.0+cu111

Checklist

  • I have checked that there is no similar issue in the repo. (required)
  • I have read the documentation. (required)
  • I have provided a minimal working example to reproduce the bug. (required)

[BUG] progress.csv saved incorrectly

Required prerequisites

What version of OmniSafe are you using?

0.3.0

System information

ubuntu 20.04
0.3.0

Problem description

After I ran some experiments, I tried to analyze the results via the StatisticsTools class. Something went wrong in the .csv file, and I am sure the processes ended with no error.

  File "~/anaconda3/envs/test_vel/lib/python3.8/site-packages/matplotlib/_api/__init__.py", line 93, in check_isinstance
    raise TypeError(
TypeError: 'value' must be an instance of str or bytes, not a float

Reproducible example code

The Python snippets:

    eg = ExperimentGrid(exp_name='benchmark')

    # Set the algorithms.
    base_policy = ['PPO', 'PolicyGradient', 'P3O', 'PPOLag', 'FOCOPS', 'CUP', 'NaturalPG', 'TRPO']
    first_order_policy = ['CUP', 'FOCOPS', 'TRPOLag']
    second_order_policy = ['CPO', 'PCPO', 'RCPO']

    # Set the environments.
    mujoco_envs = [
        # 'SafetyAntVelocity-v4',
        # 'SafetyHopperVelocity-v4',
        # 'SafetyHumanoidVelocity-v4',
        'SafetyWalker2dVelocity-v4',
        # 'SafetyHalfCheetahVelocity-v4',
        # 'SafetySwimmerVelocity-v4',
    ]
    eg.add('env_id', mujoco_envs)

    # # Set the device.
    # available_gpus = list(range(torch.cuda.device_count()))
    # gpu_id = [0, 1, 2, 3]
    # # if you want to use CPU, please set gpu_id = None
    # # gpu_id = None

    # if not set(gpu_id).issubset(available_gpus):
    #     warnings.warn('The GPU ID is not available, use CPU instead.', stacklevel=1)
    #     gpu_id = None

    eg.add('algo', base_policy + first_order_policy + second_order_policy)
    eg.add('logger_cfgs:use_wandb', [False])
    eg.add('train_cfgs:vector_env_nums', [1])
    eg.add('train_cfgs:torch_threads', [1])
    eg.add('algo_cfgs:update_cycle', [2048])
    eg.add('train_cfgs:total_steps', [10240000])
    eg.add('seed', [i for i in range(0, 51, 5)])
    # The total number of experiments must be divisible by num_pool;
    # choose num_pool according to your machine.
    eg.run(train, num_pool=140)

    # Fill in the name of the parameter whose values you want to compare.
    # You can either specify the exact values to compare, or specify at most
    # how many values to compare in a single graph, and the function will
    # automatically generate all possible combinations of graphs.
    # The two modes cannot be used at the same time.
    # eg.analyze(parameter='algo', values=None, compare_num=6, cost_limit=25)
    eg.render(num_episodes=10, render_mode='rgb_array', width=256, height=256)
    # eg.evaluate(num_episodes=1)

Command lines:

Extra dependencies:


Steps to reproduce:

  1. Run the Python script via the experiment grid.
  2. Use StatisticsTools to analyze the results.

Traceback

File "~/anaconda3/envs/test_vel/lib/python3.8/site-packages/matplotlib/_api/__init__.py", line 93, in check_isinstance
    raise TypeError(
TypeError: 'value' must be an instance of str or bytes, not a float
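
As a first diagnostic (an assumption about the cause, not a confirmed one), this matplotlib message usually means a float ended up where a column label or legend string was expected, so inspecting one run's progress.csv directly may show whether the header row or a column was written incorrectly; the path below is a placeholder:

    import pandas as pd

    # Placeholder path: point this at one run's progress.csv.
    progress_path = './runs/progress.csv'
    df = pd.read_csv(progress_path)

    # A float-typed column label, or a header row shifted into the data,
    # would explain the "must be an instance of str or bytes" error.
    print(df.columns.tolist())
    print(df.dtypes)
    print(df.head())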

Expected behavior

No response

Additional context

No response

[BUG] mujoco.FatalError: an OpenGL platform library has not been loaded into this process.

Required prerequisites

What version of OmniSafe are you using?

0.0.2

System information

3.8.15 (default, Nov 4 2022, 20:59:55)
[GCC 11.2.0] linux
0.0.2

Problem description

self = <gymnasium.envs.mujoco.mujoco_rendering.RenderContextOffscreen object at 0x7f5121c6dbe0>
model = <mujoco._structs.MjModel object at 0x7f5121c80e30>, data = <mujoco._structs.MjData object at 0x7f5146d457f0>, offscreen = True

    def __init__(self, model, data, offscreen=True):
    
        self.model = model
        self.data = data
        self.offscreen = offscreen
        self.offwidth = model.vis.global_.offwidth
        self.offheight = model.vis.global_.offheight
        max_geom = 1000
    
        mujoco.mj_forward(self.model, self.data)
    
        self.scn = mujoco.MjvScene(self.model, max_geom)
        self.cam = mujoco.MjvCamera()
        self.vopt = mujoco.MjvOption()
        self.pert = mujoco.MjvPerturb()
>       self.con = mujoco.MjrContext(self.model, mujoco.mjtFontScale.mjFONTSCALE_150)
E       mujoco.FatalError: an OpenGL platform library has not been loaded into this process, this most likely means that a valid OpenGL context has not been created before mjr_makeContext was called

../../../anaconda3/envs/omnisafe/lib/python3.8/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py:57: FatalError

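A common workaround for this class of error on headless machines (assuming the tests run without a display and that EGL or OSMesa is available) is to select an offscreen OpenGL backend through the MUJOCO_GL environment variable before mujoco is imported, for example:

    import os

    # Choose an offscreen backend before mujoco/gymnasium are imported:
    # 'egl' needs a GPU driver with EGL support; 'osmesa' is a CPU fallback.
    os.environ.setdefault('MUJOCO_GL', 'egl')

    import omnisafe  # noqa: E402  (imported only after MUJOCO_GL is set)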

Reproducible example code

The Python snippets:

import os

import helpers
import omnisafe

def test_evaluate_saved_policy():
    """Test render policy."""
    DIR = os.path.join(os.path.dirname(__file__), 'runs')
    evaluator = omnisafe.Evaluator()
    for env in os.scandir(DIR):
        env_path = os.path.join(DIR, env)
        for algo in os.scandir(env_path):
            print(algo)
            algo_path = os.path.join(env_path, algo)
            for exp in os.scandir(algo_path):
                exp_path = os.path.join(algo_path, exp)
                for item in os.scandir(os.path.join(exp_path, 'torch_save')):
                    if item.is_file() and item.name.split('.')[-1] == 'pt':
                        evaluator.load_saved_model(save_dir=exp_path, model_name=item.name)
                        evaluator.evaluate(num_episodes=1)
                        evaluator.render(num_episode=1, camera_name='track', width=256, height=256)

Command lines:

pytest test_evaluate_saved_policy.py

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

How to change the render mode?

Questions

It seems that the environment's render mode cannot be changed. We tested two ways and both failed.

Like the new way in the gymnasium library:
env = safety_gymnasium.make(env_name, render='rgb_array')
which raises
TypeError: __init__() got an unexpected keyword argument 'render'

Or the old way in the gym library:
env.render(render_mode='rgb_array')
which raises
TypeError: env_render_passive_checker() got an unexpected keyword argument 'render_mode'.
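
For reference, in the gymnasium-style API the render mode is normally chosen when the environment is created (via render_mode) and render() is then called with no arguments; whether safety_gymnasium already accepts this keyword is exactly what this question is asking, so the snippet below is only a sketch of the expected pattern:

    import safety_gymnasium

    # gymnasium-style pattern: pick the render mode at construction time.
    env = safety_gymnasium.make('SafetyPointGoal1-v0', render_mode='rgb_array')
    obs, info = env.reset()
    frame = env.render()  # an RGB array when render_mode='rgb_array' is supported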

Checklist

  • I have checked that there is no similar issue in the repo. (required)
  • I have read the documentation. (required)

[Feature Request] rgb_array render mode is not supported

Describe the bug

I really appreciate this work, because I no longer have to waste time on the tedious mujoco200_linux installation and on old dependencies just to adapt to safety_gym. However, I noticed that some of your environment code does not seem to be released yet; please tell me when you will release it!

I also have a problem when using it on my server: I want to save a visualization as an mp4 via rgb_array frames, but render_mode='rgb_array' does not seem to be supported in safety_gymnasium. Will this be supported later?
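
Assuming rgb_array rendering becomes available, one way to get an mp4 on a server (a sketch, not part of OmniSafe; it needs imageio with the imageio-ffmpeg backend) is to collect frames from render() and write them out with imageio:

    import imageio
    import safety_gymnasium

    # Sketch only: assumes render_mode='rgb_array' is supported by the env.
    env = safety_gymnasium.make('SafetyPointGoal1-v0', render_mode='rgb_array')
    env.reset()
    frames = []
    for _ in range(200):
        # safety_gymnasium's step returns an extra cost term.
        obs, reward, cost, terminated, truncated, info = env.step(env.action_space.sample())
        frames.append(env.render())
        if terminated or truncated:
            env.reset()
    imageio.mimsave('rollout.mp4', frames, fps=30)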


Checklist

  • I have checked that there is no similar issue in the repo. (required)
  • I have read the documentation. (required)
  • I have provided a minimal working example to reproduce the bug. (required)

[Question] SafetyGym and vision-based in Safety Gymnasium

Questions

Thank you very much for your contribution; this has dramatically reduced the tedious process of installing safety gym on different machines.

I see the following statement in the README:

Further, to facilitate the progress of community research, we redesigned [Safety_Gym](https://github.com/openai/safety-gym),
removed the dependency on mujoco_py, made it created on top of Mujoco and fixed some bugs.

I have two questions. The first is why there is no doggo agent; I noticed that the original Safety Gym has one.


Second, I noticed that these environments have not been uploaded to the repo yet. They look very useful. May I ask approximately when they will be released? I would like to cite your work when using these environments, but I could not find any article about omnisafe or safety gymnasium; can you provide a proper way to cite it?

Checklist

  • I have checked that there is no similar issue in the repo. (required)
  • I have read the documentation. (required)
