
pku-alignment / omnisafe


OmniSafe is an infrastructural framework for accelerating SafeRL research.

Home Page: https://www.omnisafe.ai

License: Apache License 2.0

Python 99.27% Makefile 0.50% Dockerfile 0.24%
benchmark pytorch safe-reinforcement-learning deep-reinforcement-learning reinforcement-learning machine-learning constraint-rl constraint-satisfaction-problem deep-learning safe-rl

omnisafe's Introduction


Documentation | Implemented Algorithms | Installation | Getting Started | License


OmniSafe is an infrastructural framework designed to accelerate safe reinforcement learning (RL) research. It provides a comprehensive and reliable benchmark for safe RL algorithms as well as an out-of-the-box modular toolkit for researchers. SafeRL aims to develop algorithms that minimize the risk of unintended harm or unsafe behavior.

OmniSafe is the first unified learning framework in the field of safe reinforcement learning, and it aims to foster the growth of the SafeRL learning community. The key features of OmniSafe are:

  • Highly Modular Framework. OmniSafe presents a highly modular framework that incorporates an extensive collection of algorithms tailored for safe reinforcement learning across diverse domains. The framework is versatile thanks to its abstraction over different algorithm types and its well-designed API, which uses Adapter and Wrapper components to bridge gaps and enable seamless interactions between modules. This design allows for easy extension and customization, making it a powerful tool for developers working with different types of algorithms.

  • High-Performance Parallel Computing Acceleration. By harnessing the capabilities of torch.distributed, OmniSafe accelerates the learning process of algorithms with process parallelism. This enables OmniSafe not only to support environment-level asynchronous parallelism but also to incorporate asynchronous agent learning. This methodology improves training stability and speeds up training through a parallel exploration mechanism. The integration of asynchronous agent learning in OmniSafe underscores its commitment to providing a versatile and robust platform for advancing SafeRL research.

  • Out-of-the-Box Toolkits. OmniSafe offers customizable toolkits for tasks such as training, benchmarking, analyzing, and rendering. Tutorials and user-friendly APIs make it easy for beginners and casual users, while advanced researchers can improve their efficiency without writing complex code.

Train video




Quick Start

Installation

Prerequisites

OmniSafe requires Python 3.8+ and PyTorch 1.10+.

We support and test Python 3.8, 3.9, and 3.10 on Linux. We also support Apple Silicon (M1 and M2) macOS. We will accept PRs related to Windows, but do not officially support it.

Install from source

# Clone the repo
git clone https://github.com/PKU-Alignment/omnisafe.git
cd omnisafe

# Create a conda environment
conda env create --file conda-recipe.yaml
conda activate omnisafe

# Install omnisafe
pip install -e .

Install from PyPI

OmniSafe is hosted on PyPI.

pip install omnisafe
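
To verify the installation, you can print the version information; this is the same snippet the bug-report template asks for:

import sys, omnisafe

# Print the interpreter and OmniSafe versions to confirm the installation.
print(sys.version, sys.platform)
print(omnisafe.__version__)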

Implemented Algorithms

Latest SafeRL Papers
List of Algorithms: On-Policy SafeRL | Off-Policy SafeRL | Model-Based SafeRL | Offline SafeRL | Others

Examples

cd examples
python train_policy.py --algo PPOLag --env-id SafetyPointGoal1-v0 --parallel 1 --total-steps 10000000 --device cpu --vector-env-nums 1 --torch-threads 1
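
The same experiment can also be launched from Python. Below is a minimal sketch using the omnisafe.Agent interface (the same interface that appears in the snippets quoted in the issues further down this page):

import omnisafe

# Train PPOLag on SafetyPointGoal1-v0 via the Python API instead of train_policy.py.
agent = omnisafe.Agent('PPOLag', 'SafetyPointGoal1-v0')
agent.learn()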

Algorithms Registry

On Policy
  Primal-Dual: TRPOLag; PPOLag; PDO; RCPO; TRPOPID; CPPOPID
  Convex Optimization: CPO; PCPO; FOCOPS; CUP
  Penalty Function: IPO; P3O
  Primal: OnCRPO

Off Policy
  Primal-Dual: DDPGLag; TD3Lag; SACLag; DDPGPID; TD3PID; SACPID

Model-Based
  Online Plan: SafeLOOP; CCEPETS; RCEPETS
  Pessimistic Estimate: CAPPETS

Offline
  Q-Learning Based: BCQLag; C-CRR
  DICE Based: COptDICE

Other Formulation MDP
  ET-MDP: PPOEarlyTerminated; TRPOEarlyTerminated
  SauteRL: PPOSaute; TRPOSaute
  SimmerRL: PPOSimmerPID; TRPOSimmerPID

Supported Environments

Here is a list of environments that Safety-Gymnasium supports:

Safe Navigation
  Tasks: Goal[012], Button[012], Push[012], Circle[012]
  Agents: Point, Car, Racecar, Ant
  Example: SafetyPointGoal1-v0

Safe Velocity
  Tasks: Velocity
  Agents: HalfCheetah, Hopper, Swimmer, Walker2d, Ant, Humanoid
  Example: SafetyHumanoidVelocity-v1

Safe Isaac Gym
  Tasks: OverSafeFinger, OverSafeJoint, CatchOver2UnderarmSafeFinger, CatchOver2UnderarmSafeJoint
  Agents: ShadowHand
  Example: ShadowHandOverSafeFinger

For more information about environments, please refer to Safety-Gymnasium.
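
As a quick orientation, a bare interaction loop with a Safety-Gymnasium task looks roughly like the following sketch (adapted from the example scripts quoted in the issues below); note that step() returns a cost in addition to the usual Gymnasium signature:

import safety_gymnasium

# Random-policy rollout; step() returns (obs, reward, cost, terminated, truncated, info).
env = safety_gymnasium.make('SafetyPointGoal1-v0')
obs, info = env.reset(seed=0)
ep_ret, ep_cost = 0.0, 0.0
for _ in range(1000):
    act = env.action_space.sample()
    obs, reward, cost, terminated, truncated, info = env.step(act)
    ep_ret += reward
    ep_cost += cost
    if terminated or truncated:
        print(f'Episode Return: {ep_ret} \t Episode Cost: {ep_cost}')
        ep_ret, ep_cost = 0.0, 0.0
        obs, info = env.reset()
env.close()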

Customizing your environment

We offer a flexible customized environment interface that allows users to achieve the following without modifying the OmniSafe source code:

  • Use OmniSafe to train algorithms on customized environments.
  • Create the environment with user-specified parameters.
  • Record environment-specific information in the Logger.

We provide step-by-step tutorials on Environment Customization From Scratch and Environment Customization From Community to give you a detailed introduction to using this feature of OmniSafe.

Note: If you run into trouble customizing your environment, please feel free to open an issue or discussion. Pull requests are also welcome if you're willing to contribute an implementation of your environment's interface.

Try with CLI

pip install omnisafe

omnisafe --help  # Ask for help

omnisafe benchmark --help  # 'benchmark' can also be replaced with 'eval', 'train', 'train-config'

# Quick benchmarking for your research; just specify:
# 1. exp_name
# 2. num_pool (how many processes run concurrently)
# 3. the path of the config file (refer to omnisafe/examples/benchmarks for the format)

# Here we provide an example in ./tests/saved_source,
# and you can set up your own benchmark_config.yaml by following it.
omnisafe benchmark test_benchmark 2 ./tests/saved_source/benchmark_config.yaml

# Quickly evaluate and render your trained policy; just specify:
# 1. the path of the trained policy
omnisafe eval ./tests/saved_source/PPO-{SafetyPointGoal1-v0} --num-episode 1

# Quickly train some algorithms to validate your ideas
# Note: with `key1:key2` you can select nested hyperparameter keys, and with `--custom-cfgs` you can pass custom values via the CLI
omnisafe train --algo PPO --total-steps 2048 --vector-env-nums 1 --custom-cfgs algo_cfgs:steps_per_epoch --custom-cfgs 1024

# Quickly train an algorithm from a saved config file; the format is the same as the default config format
omnisafe train-config ./tests/saved_source/train_config.yaml
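
For reference, the key1:key2 syntax used by --custom-cfgs maps onto the same nested configuration structure as the Python API, so the omnisafe train call above corresponds roughly to the following sketch (SafetyPointGoal1-v0 is filled in here only as an example environment):

import omnisafe

# Rough Python-API equivalent of:
#   omnisafe train --algo PPO --total-steps 2048 --vector-env-nums 1 \
#       --custom-cfgs algo_cfgs:steps_per_epoch --custom-cfgs 1024
custom_cfgs = {
    'train_cfgs': {'total_steps': 2048, 'vector_env_nums': 1},
    'algo_cfgs': {'steps_per_epoch': 1024},
}
agent = omnisafe.Agent('PPO', 'SafetyPointGoal1-v0', custom_cfgs=custom_cfgs)
agent.learn()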

Getting Started

Important Hints

We have provided benchmark results for various algorithms, including on-policy, off-policy, model-based, and offline approaches, along with parameter tuning analysis. Please refer to the following:

Quickstart: Colab on the Cloud

Explore OmniSafe easily and quickly through a series of Google Colab notebooks:

  • Getting Started: introduces the basic usage of OmniSafe so that users can get up to speed quickly.
  • CLI Command: introduces how to use the OmniSafe CLI tool.

We take great pleasure in collaborating with our users to create tutorials in various languages. Please refer to our list of currently supported languages. If you are interested in translating the tutorial into a new language or improving an existing version, kindly submit a PR to us.


Changelog

See CHANGELOG.md.

Citing OmniSafe

If you find OmniSafe useful or use OmniSafe in your research, please cite it in your publications.

@article{omnisafe,
  title   = {OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research},
  author  = {Jiaming Ji and Jiayi Zhou and Borong Zhang and Juntao Dai and Xuehai Pan and Ruiyang Sun and Weidong Huang and Yiran Geng and Mickel Liu and Yaodong Yang},
  journal = {arXiv preprint arXiv:2305.09304},
  year    = {2023}
}

Publications using OmniSafe

We have compiled a list of papers that use OmniSafe for algorithm implementation or experimentation. If you are willing to include your work in this list, or if you wish to have your implementation officially integrated into OmniSafe, please feel free to contact us.

Papers Publisher
Off-Policy Primal-Dual Safe Reinforcement Learning ICLR 2024
Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model ICLR 2024
Iterative Reachability Estimation for Safe Reinforcement Learning NeurIPS 2023
Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation AAAI 2024
Learning Safety Constraints From Demonstration Using One-Class Decision Trees AAAI 2024 Workshops

The OmniSafe Team

OmniSafe is mainly developed by the SafeRL research team directed by Prof. Yaodong Yang. Our SafeRL research team members include Borong Zhang, Jiayi Zhou, Juntao Dai, Weidong Huang, Ruiyang Sun, Xuehai Pan, and Jiaming Ji. If you have any questions while using OmniSafe, don't hesitate to ask on the GitHub issues page; we will reply within 2-3 working days.

License

OmniSafe is released under Apache License 2.0.

omnisafe's People

Contributors

1asan, dtch1997, dtrc2207, erjanmx, gaiejj, hdadong, mickelliu, muchvo, pre-commit-ci[bot], r-y1, rockmagma02, xuehaipan, xujinming01, zmsn-2077


omnisafe's Issues

[BUG] Can't we specify `standardized_rew_adv` and `standardized_cost_adv` at the same time?

Required prerequisites

What version of OmniSafe are you using?

0.1.0

System information

no need

Problem description

Can't we specify standardized_rew_adv and standardized_cost_adv at the same time?
I guess it is a bug.
https://github.com/PKU-MARL/omnisafe/blob/main/omnisafe/utils/config.py#L216

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[Question] Having trouble replicating the performance of PPO-Lag

Required prerequisites

Questions

I trained a policy with PPO-Lag yesterday and got a terrible result. The learned policy barely navigates the robot to the given goal. After 300 epochs, EpRet/Mean is nearly zero and EpCost/Mean is still over 25.0, the given limit on episode cost. I used the default parameters of PPO-Lag and started training in the terminal with the command "python train_policy.py --env-id SafetyPointGoal1-v0 --algo PPOLag --parallel 4". The whole training process can be seen in the figure below.
[figure: PPO-Lag training curves, omitted]

Besides, a similar situation has emerged in several other algorithms, including CPO, IPO, RCPO, and CUP.

Thanks for your reading and help.

[BUG] Problems encountered during installation

Required prerequisites

Motivation

Well, I tried to install the omnisafe library in a Docker container equipped with Python 3.8+ and PyTorch 1.10+. Installing safety-gymnasium went smoothly, but a problem occurred when installing omnisafe with "pip install -e ." (screenshot of the error omitted).
This is caused by enum34 conflicting with Python 3.8. But just uninstalling that library is not enough, as executing "pip install -e ." then results in another problem (screenshot omitted).

Solution

To solve these problems, I ran "pip install setuptools==59.5.0" and then "pip uninstall enum34". In this way, omnisafe can be installed successfully.
In short, just downgrading the version of setuptools solves this problem.

Alternatives

No response

Additional context

No response

[BUG] Having trouble running `pip install -e .`

Required prerequisites

What version of OmniSafe are you using?

0.0.2

System information

3.8.15 (default, Nov 24 2022, 15:19:38)
[GCC 11.2.0] linux
0.0.2

Problem description

When I was running the pip install -e . command, I encountered the problem shown below (screenshots omitted).
But unexpectedly, when I re-ran the pip install -e . command, omnisafe was installed successfully and the example train_policy.py ran.
I'm guessing it might be a configuration issue with my computer itself, but I also think it's a potential bug. What do you think?

Reproducible example code

	pip install -e .

Traceback

No response

Expected behavior

No response

Additional context

No response

[BUG] Something wrong with PPO-Lag in SafetySwimmerVelocity-v4

Required prerequisites

What version of OmniSafe are you using?

0.1.1

System information

3.8.13 (default, Mar 28 2022, 11:38:47)
[GCC 7.5.0] linux
0.1.1

Problem description

When I train PPO-Lag in SafetySwimmerVelocity-v4, like

cd examples
python train_from_custom_dict.py

where custom_dict is

import omnisafe


env_id = 'SafetySwimmerVelocity-v4'
custom_cfgs = {
    'train_cfgs': {
        'total_steps': 2048,
        'vector_env_nums': 2,
        'parallel': 1,
    },
    'algo_cfgs': {
        'update_cycle': 1024,
        'update_iters': 1,
    },
    'logger_cfgs': {
        'use_wandb': False,
    },
}

agent = omnisafe.Agent('PPOLag', env_id, custom_cfgs=custom_cfgs)
agent.learn()

I encountered a NaN problem:

ValueError: Expected parameter loc (Tensor of shape (64, 2)) of distribution Normal(loc: torch.Size([64, 2]), scale: torch.Size([64, 2])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan],
        [nan, nan],
        [nan, nan],
        ...
        [nan, nan]], grad_fn=<AddmmBackward0>)

How can I get rid of it?

Reproducible example code

import omnisafe


env_id = 'SafetySwimmerVelocity-v4'
custom_cfgs = {
    'train_cfgs': {
        'total_steps': 2048,
        'vector_env_nums': 2,
        'parallel': 1,
    },
    'algo_cfgs': {
        'update_cycle': 1024,
        'update_iters': 1,
    },
    'logger_cfgs': {
        'use_wandb': False,
    },
}

agent = omnisafe.Agent('PPOLag', env_id, custom_cfgs=custom_cfgs)
agent.learn()

Traceback

No response

Expected behavior

No response

Additional context

No response

[BUG] nan when running in command line

Required prerequisites

What version of OmniSafe are you using?

0.2.2

System information

ubuntu20.04
0.2.2

Problem description

When I try to use omnisafe via the command line, I found that something goes wrong and NaNs appear in a tensor.

Reproducible example code

The Python snippets:

Command lines:

omnisafe train --algo PPOLag --env-id SafetySwimmerVelocity-v4 --total-steps 1024 --custom-cfgs algo_cfgs:update_cycle --custom-cfgs 512

Extra dependencies:


Steps to reproduce:

  1. run the command provided above

Traceback

mean = tensor([[nan, nan],
        [nan, nan],
        [nan, nan],
        ...
        [nan, nan]], grad_fn=<AddmmBackward0>)

Expected behavior

No response

Additional context

No response

[Feature Request] Gratitude & When is focops updated?

Motivation

Thank you very much for your informative documentation. As a beginner, I have learned a lot from it. I have just started my research on SafeRL, and my mentor has asked me to study CPO, PCPO, FOCOPS, etc. I found that omnisafe's documentation of CPO is very detailed. When will the FOCOPS documentation be updated?


Hope for your reply.

Checklist

  • I have checked that there is no similar issue in the repo. (required)

[Feature Request] LAMBDA

Required prerequisites

Motivation

I'm the author of LAMBDA. I can contribute my implementation; how should I proceed?

Solution

An implementation of LAMBDA in PyTorch (if I'm not mistaken). I also have TF and JAX implementations.

Alternatives

No response

Additional context

No response

[BUG] the logger bug in Experiment Grid

Required prerequisites

What version of OmniSafe are you using?

0.1.0

System information

0.1.0, linux ubuntu 20.04

Problem description

I noticed that one of the very nice features of omnisafe is the experiment grid, which can run a very large number of experiments in large batches.
But I found that the logger distinguishes different experiment folders by the current timestamp. This is not a problem without parallelism, but with experiment-grid parallelism different algorithms may be created at the same time.
Then there is a potential bug of different algorithms logging to the same folder at the same time, as evidenced by the screenshots (omitted).

I think this potential bug may have something to do with issue #140 as well.

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[Feature Request] Cuda support

Required prerequisites

Motivation

What a nice repo! However, I notice that omnisafe currently only supports CPU; it seems that cuda support hasn't been implemented yet.
Although training a small model (hidden layer size 64) on CPU is already very fast, I want to use a larger network for more complex tasks, which requires cuda support. When will you be able to provide this support?

Solution

No response

Alternatives

No response

Additional context

No response

[BUG] When I use the experiment grid in GPU, exps with the same file are saved separately

Required prerequisites

What version of OmniSafe are you using?

0.2.2

System information

ubuntu20.04
0.2.2

Problem description

When I use the experiment grid on GPU, experiments from the same file are saved separately. It looks like this is because they are running on different GPUs.

Reproducible example code

The Python snippets:

    eg = ExperimentGrid(exp_name='sg')

    # Set the algorithms.
    base_policy = ['PPO', 'PolicyGradient', 'P3O', 'PPOLag', 'FOCOPS', 'CUP']

    sg_envs = [
        'SafetyPointGoal0-v0',
        'SafetyPointGoal1-v0',
        'SafetyPointGoal2-v0',
        'SafetyPointButton0-v0',
        'SafetyPointButton1-v0',
        'SafetyPointButton2-v0',
        'SafetyPointCircle0-v0',
        'SafetyPointCircle1-v0',
        'SafetyPointCircle2-v0',

        'SafetyCarGoal0-v0',
        'SafetyCarGoal1-v0',
        'SafetyCarGoal2-v0',
        'SafetyCarButton0-v0',
        'SafetyCarButton1-v0',
        'SafetyCarButton2-v0',
        'SafetyCarCircle0-v0',
        'SafetyCarCircle1-v0',
        'SafetyCarCircle2-v0',
    ]
    eg.add('env_id', sg_envs)

    # Set the device.
    avaliable_gpus = [num for num in range(torch.cuda.device_count())]
    gpu_id = [0, 1, 2, 3, 4, 5, 6, 7]
    # if you want to use CPU, please set gpu_id = None
    # gpu_id = None

    if set(gpu_id) > set(avaliable_gpus):
        warnings.warn('The GPU ID is not available, use CPU instead.')
        gpu_id = None

    eg.add('algo', base_policy)
    eg.add('logger_cfgs:use_wandb', [False])
    eg.add('train_cfgs:vector_env_nums', [32])
    eg.add('train_cfgs:torch_threads', [1])
    eg.add('algo_cfgs:cost_normalize', [False])
    eg.add('algo_cfgs:reward_normalize', [False])
    eg.add('algo_cfgs:obs_normalize', [True])
    eg.add('algo_cfgs:update_cycle', [32768])
    eg.add('train_cfgs:total_steps', [32768 * 500])
    eg.add('seed', [0, 5, 10])
    # total experiment num must can be divided by num_pool
    # meanwhile, users should decide this value according to their machine
    eg.run(train, num_pool=81, gpu_id=gpu_id)

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

if exps are only different in seeds, I think they should be saved in same folder.

Additional context

No response

[Question] Aren't your normalization functions doing standardization?

Required prerequisites

Questions

I mean, this is normalization: x' = (x - min(x)) / (max(x) - min(x)),
and this is standardization: z = (x - mean(x)) / std(x).

this is an example of your code:

    def normalize(self, data: torch.Tensor) -> torch.Tensor:
        """Normalize the _data."""
        data = data.to(self._mean.device)
        self._push(data)
        if self._count <= 1:
            return data
        output = (data - self._mean) / self._std
        return torch.clamp(output, -self._clip, self._clip)

[Question] implementation, training, and performance of p3o.

Required prerequisites

Questions

Hi, I still fail to achieve a successful training of P3O with the latest implementation in PR#112.

A possible reason is that the implementation of P3O is inconsistent with the paper.

Currently, line 65 of p3o.py is
"loss_pi_c = self.cfgs.kappa * F.relu(surr_cadv + Jc)",
which should have been
"loss_pi_c = self.cfgs.kappa * F.relu(surr_cadv + (1 - self.cfgs.cost_gamma) * Jc)".

Besides, I also tested P3O in another environment (bullet_safety_gym safety-point-reach-v0), but it seems a little conservative.
The given cost_limit is 10. Some other algorithms, including PPO-Lag and TRPO-Lag, achieve better performance than P3O (with a cost of about 10.0 and a return over 15).
[screenshot of the training curves omitted]

[BUG] Errors about CLI

Required prerequisites

What version of OmniSafe are you using?

0.2.2

System information

3.9.13 (main, Aug 25 2022, 18:29:29)
[Clang 12.0.0 ] darwin
0.2.2

Problem description

When I use omnisafe benchmark --help and run the example command provided by omnisafe, this bug happens (screenshot omitted).

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[Question] Dependency version conflicts during installation

Required prerequisites

Questions

While installing omnisafe from PyPI using pip install omnisafe, there are some dependency conflicts.
Does it matter?

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pandas-profiling 3.2.0 requires joblib~=1.1.0, but you have joblib 1.2.0 which is incompatible.

[Question] Parallel running

Required prerequisites

Questions

I have completed the installation of omnisafe and successfully run the example code. To make the experiments faster, I tried to adjust the 'parallel' parameter, and I found that the program can run when parallel=2 (screenshot omitted).
However, when this parameter is set ≥ 3, it seems that the code cannot be executed; the error is shown in the (omitted) screenshot.

Maybe this problem is caused by my device not having enough cores to use, but I'm not sure. So I raise this problem and hope the developers can check the reason for this issue.

[Feature Request] Will support Sauté RL(ICML 2022)?

Motivation

I was recently reading the ICML 2022 paper Sauté RL: Almost Surely Safe Reinforcement Learning Using State Augmentation, which is also a kind of safe reinforcement learning, but I found that no code was published with this article. Will omnisafe add the code for this article subsequently?

Sauté RL: https://proceedings.mlr.press/v162/sootla22a/sootla22a.pdf

Hope for your reply.


Checklist

  • I have checked that there is no similar issue in the repo. (required)

[Question] Training Data Visualization with tensorboard?

Questions

I found that omnisafe records a lot of data during training, some of which is particularly useful as a tuning reference, but some of the recorded data I did not understand. For example,

  1. FPS: I ran PPO in omnisafe on the environment SafetyPointGoal1-v0, and the graph below appeared (screenshot omitted). Can you please explain what FPS means?

  2. I ran TRPOLag at the same time, and I found that tensorboard recorded the data shown below (screenshot omitted). What is the meaning of these losses? Is smaller better?

Checklist

  • I have checked that there is no similar issue in the repo. (required)
  • I have read the documentation. (required)

[Question] Cost Loss and Update for SAC Lagrangian

Required prerequisites

Questions

  1. Why does the backup for the cost critic loss assign data['rew'] instead of data['cost'] to cost? Wouldn't this update result in a cost critic identical to the standard value critic?

  2. The initial update for the Lagrange multiplier uses Jc = data['cost'].sum().item(). However, the update_lagrange_multiplier method uses Jc to compute the lambda loss which has function signature: def compute_lambda_loss(self, mean_ep_cost): Shouldn't Jc be defined as Jc = data['cost'].mean().item() if it's the mean_ep_cost?

[BUG] Nice repo! But CUP's configs don't have lagrangian_upper_bound

Required prerequisites

What version of OmniSafe are you using?

0.0.2

System information

0.0.2

Problem description

In this file https://github.com/PKU-MARL/omnisafe/blob/dev/omnisafe/algorithms/on_policy/cup.py#L57,
the code is:

        Lagrange.__init__(
            self,
            cost_limit=self.cfgs.lagrange_cfgs.cost_limit,
            lagrangian_multiplier_init=self.cfgs.lagrange_cfgs.lagrangian_multiplier_init,
            lambda_lr=self.cfgs.lagrange_cfgs.lambda_lr,
            lambda_optimizer=self.cfgs.lagrange_cfgs.lambda_optimizer,
            lagrangian_upper_bound=self.cfgs.lagrange_cfgs.lagrangian_upper_bound,
        )

But CUP's config YAML file doesn't have lagrangian_upper_bound in lagrange_cfgs.

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[BUG] pylint: error: argument --spelling-dict

Required prerequisites

What version of OmniSafe are you using?

0.0.1

System information

3.8.15 (default, Nov 24 2022, 15:19:38)
[GCC 11.2.0] linux
0.0.1

Problem description

When I use pre-commit run --all-files, the following error appears:

usage: pylint [options]
pylint: error: argument --spelling-dict: invalid choice: 'en_US' (choose from '')

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[Question] How many types of cameras are available and what are their names?

Required prerequisites

Questions

What an amazing repo! It exports demos that I really like. But when I was using omnisafe's evaluate_saved_policy.py, I found that the default value of the parameter camera_name is track. Are there other types of cameras that can be used, and what are their names? (screenshot omitted)

[Question] how can I run agent in omnisafe but using safety_gymnasium env

Required prerequisites

Questions

I want to run omnisafe with a safety_gymnasium env or some other env; how can I train it and evaluate it?

import safety_gymnasium
import omnisafe

if __name__ == '__main__':
    env = safety_gymnasium.make("SafetyCarPush2-v0")
    agent = omnisafe.Agent('PPOLag', env)
    agent.learn()
    obs, info = env.reset()
    ep_reward, ep_cost = 0, 0
    for i in range(1000):
        action, _states = agent.predict(obs, deterministic=True)
        obs, reward, cost, done, _, info = env.step(action)
        ep_reward += reward
        ep_cost += cost
        env.render()
        if done:
            print(ep_reward, ep_cost)
            obs, info = env.reset()
            ep_reward, ep_cost = 0, 0
    env.close()

like this.

[BUG] Is there something wrong in AutoResetWrapper?

Required prerequisites

What version of OmniSafe are you using?

0.1.0

System information

no need

Problem description

I noticed that you designed wrappers to unify various environments; it is a pretty good design. But when I was trying to use the auto-reset wrapper, I found that it seems to be incomplete. If needed, I can try to fix it.

class AutoReset(Wrapper):
    """Auto reset the environment when the episode is terminated.

    Example:
        >>> env = AutoReset(env)

    """

    def __init__(self, env: CMDP) -> None:
        super().__init__(env)

        assert self.num_envs == 1, 'AutoReset only supports single environment'

    def step(
        self, action: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, Dict]:
        obs, reward, cost, terminated, truncated, info = super().step(action)

        if terminated or truncated:
            info['last_episode_obs'], _ = self.reset()
            obs, _ = self.reset()

        return obs, reward, cost, terminated, truncated, info

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[Feature Request] More details about how to DIY users' local environments and modify default configs of environments in documentation and README.

Required prerequisites

Motivation

Consider adding more details about how to DIY users' local environments and how to modify the default configs of environments in the documentation and README.

Solution

No response

Alternatives

No response

Additional context

Just like this in safety-gym (screenshot omitted).

[BUG] When I enable `smooth` in statistics tools, costs are skipped.

Required prerequisites

What version of OmniSafe are you using?

0.3.0

System information

ubuntu 20.04
0.3.0

Problem description

When I enable smooth in statistics tools, costs are skipped.
Before enabling smooth: [screenshot omitted]
After enabling smooth: [screenshot omitted]

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

1.plot without smooth
2.enable smooth and plot
3.

Traceback

No response

Expected behavior

Smooth in all graphs

Additional context

No response

[Question] Important!!!! Do you have the performance of the CUP (NeurIPS 2022) algorithm on safety gym?

Required prerequisites

Questions

I see that there is already CUP-related code in the dev branch, but I can't run the algorithm at the moment. Do you have the performance of the CUP algorithm on safety gym?

[Question] Why you use the raw action in off-policy algorithms to update actor-critic

Required prerequisites

Questions

Why you use the raw action in off-policy algorithms to update actor-critic?
Shouldn't it be updated using the action after scaling?
https://github.com/PKU-MARL/omnisafe/blob/main/omnisafe/algorithms/off_policy/sac.py#L91

[BUG] Something wrong with tutorial

Required prerequisites

What version of OmniSafe are you using?

0.3.0

System information

ubuntu 20.04, 0.3.0

Problem description

When I use the tutorial Jupyter notebook locally on my PC, I found some errors in the default usage settings (screenshot omitted).

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[Question] How can I run off-policy algorithms on main branch

Required prerequisites

Questions

When I try

cd examples
python train_policy.py --algo SAC

I encountered the problems shown below (screenshot omitted).

In #153 we set the default total-steps=3276800 and vector-env-nums=16, which do not match the off-policy algorithms.

[BUG] Standardize advantage twice in on-policy buffer and vector-on-policy buffer

Required prerequisites

What version of OmniSafe are you using?

0.1.1

System information

No need

Problem description

I found that the advantage will be standardized twice in the on-policy buffer and vector-on-policy buffer. I believe this is a bug.
See below:
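    # (on-policy buffer's get(): standardizes adv_r / adv_c once)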

    def get(self) -> Dict[str, torch.Tensor]:
        """Get the data in the buffer."""
        self.ptr, self.path_start_idx = 0, 0

        data = {
            'obs': self.data['obs'],
            'act': self.data['act'],
            'target_value_r': self.data['target_value_r'],
            'adv_r': self.data['adv_r'],
            'logp': self.data['logp'],
            'discounted_ret': self.data['discounted_ret'],
            'adv_c': self.data['adv_c'],
            'target_value_c': self.data['target_value_c'],
        }

        # self.data['adv_r'] = torch.zeros_like(self.data['adv_r'])
        # self.data['adv_c'] = torch.zeros_like(self.data['adv_c'])

        adv_mean, adv_std, *_ = distributed.dist_statistics_scalar(data['adv_r'])
        cadv_mean, *_ = distributed.dist_statistics_scalar(data['adv_c'])
        if self._standardized_adv_r:
            data['adv_r'] = (data['adv_r'] - adv_mean) / (adv_std + 1e-8)
        if self._standardized_adv_c:
            data['adv_c'] = data['adv_c'] - cadv_mean

        return data
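
    # (vector on-policy buffer's get(): concatenates per-env data, then standardizes again)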
    def get(self) -> Dict[str, torch.Tensor]:
        """Get the data from the buffer."""
        data_pre = {k: [v] for k, v in self.buffers[0].get().items()}
        for buffer in self.buffers[1:]:
            for k, v in buffer.get().items():
                data_pre[k].append(v)
        data = {k: torch.cat(v, dim=0) for k, v in data_pre.items()}

        adv_mean, adv_std, *_ = distributed.dist_statistics_scalar(data['adv_r'])
        cadv_mean, *_ = distributed.dist_statistics_scalar(data['adv_c'])
        if self._standardized_adv_r:
            data['adv_r'] = (data['adv_r'] - adv_mean) / (adv_std + 1e-8)
        if self._standardized_adv_c:
            data['adv_c'] = data['adv_c'] - cadv_mean

        return data

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[BUG] Some questions about the cpo documentation.

Required prerequisites

What version of OmniSafe are you using?

python3 -m pip show omnisafe

System information

Describe the characteristic of your environment:

  • Describe how the library was installed (pip, conda, source, ...)
  • Python version
  • Versions of any other relevant libraries
import sys, omnisafe
print(sys.version, sys.platform)
print(omnisafe.__version__)

Problem description

There are some parts that don't seem to be quite right, and there are problems with the rendering (screenshots omitted).

Reproducible example code

The Python snippets:

Run the snippets with the following commands:

Extra dependencies:


Traceback

No response

Expected behavior

No response

Additional context

No response

[Question] Why does pytest exit with errors?

Required prerequisites

Questions

Hi! I am trying to contribute code implemented by myself to omnisafe. But when I run make test to check my code, these errors appear (screenshot omitted).
It seems that pytest cannot recognize the Python syntax or import modules, and I noticed that it uses Python 2.7.
Any help will be appreciated!

[Question] How to use the vision-based Safety Gymnasium in a headless server

Hi, I've recently come across the following error on a machine with Nvidia driver version 515.76 and CUDA version 11.7, when trying to use the vision-based Safety Gymnasium on a headless Ubuntu 20.04 remote server. The exact same code was running properly on a machine with a display.

Any idea how to fix the issue?

The program I run is as follows:

import argparse
import os

# import gymnasium
import safety_gymnasium
from gymnasium.utils.save_video import save_video


WORKDIR = os.path.abspath('.')
DIR = os.path.join(WORKDIR, 'omnisafe/envs/safety-gymnasium/examples', 'cached_test_vision_video')


def run_random(env_name):
    env = safety_gymnasium.make(env_name)
    # env.seed(0)
    obs, _ = env.reset()
    terminled = False
    ep_ret = 0
    ep_cost = 0
    render_list = []
    for i in range(1001):
        if terminled:
            print('Episode Return: %.3f \t Episode Cost: %.3f' % (ep_ret, ep_cost))
            ep_ret, ep_cost = 0, 0
            obs, _ = env.reset()
            save_video(
                frames=render_list,
                video_folder=DIR,
                name_prefix=f'test_vision_output',
                fps=30,
            )
            render_list = []
        assert env.observation_space.contains(obs)
        act = env.action_space.sample()
        assert env.action_space.contains(act)
        # Use the environment's built_in max_episode_steps
        if hasattr(env, '_max_episode_steps'):
            max_ep_len = env._max_episode_steps
        render_list.append(obs['vision'])
        obs, reward, cost, terminled, truncated, info = env.step(act)

        ep_ret += reward
        ep_cost += cost


if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument('--env', default='SafetyCarGoal0Vision-v0')
    args = parser.parse_args()
    run_random(args.env)

which produced errors:

(mbppo) $ python safety_gym_v2_vision.py 
/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/utils/passive_env_checker.py:49: UserWarning: WARN: A Box observation space has an unconventional shape (neither an image, nor a 1D vector). We recommend flattening the observation to have only a 1D vector or use a custom policy to properly process the data. Actual observation shape: (3, 3)
  logger.warn(
/home/weidong/anaconda3/envs/mbppo/lib/python3.8/site-packages/glfw/__init__.py:912: GLFWError: (65544) b'X11: The DISPLAY environment variable is missing'
  warnings.warn(message, GLFWError)
/home/weidong/anaconda3/envs/mbppo/lib/python3.8/site-packages/glfw/__init__.py:912: GLFWError: (65537) b'The GLFW library is not initialized'
  warnings.warn(message, GLFWError)
Traceback (most recent call last):
  File "safety_gym_v2_vision.py", line 66, in <module>
    run_random(args.env)
  File "safety_gym_v2_vision.py", line 31, in run_random
    obs, _ = env.reset()
  File "/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/wrappers/order_enforcing.py", line 57, in reset
    return self.env.reset(**kwargs)
  File "/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/wrappers/env_checker.py", line 60, in reset
    return env_reset_passive_checker(self.env, **kwargs)
  File "/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/utils/passive_env_checker.py", line 214, in env_reset_passive_checker
    result = env.reset(**kwargs)
  File "/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/envs/safety_gym_v2/builder.py", line 183, in reset
    return (self.task.obs(), info)
  File "/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/envs/safety_gym_v2/tasks/goal/goal_level0.py", line 214, in obs
    obs['vision'] = self.obs_vision()
  File "/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/envs/safety_gym_v2/base_task.py", line 375, in obs_vision
    vision = self.engine.render(width, height, mode='rgb_array', camera_name='vision', cost={})
  File "/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/envs/safety_gym_v2/engine.py", line 339, in render
    self._get_viewer(mode).render(camera_id=camera_id)
  File "/home/weidong/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/envs/safety_gym_v2/engine.py", line 453, in _get_viewer
    self.viewer = RenderContextOffscreen(self.model, self.data)
  File "/home/weidong/anaconda3/envs/mbppo/lib/python3.8/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 232, in __init__
    super().__init__(model, data, offscreen=True)
  File "/home/weidong/anaconda3/envs/mbppo/lib/python3.8/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 57, in __init__
    self.con = mujoco.MjrContext(self.model, mujoco.mjtFontScale.mjFONTSCALE_150)
mujoco.FatalError: gladLoadGL error

[BUG] Omnisafe cannot run on a Mac M1 device

Required prerequisites

What version of OmniSafe are you using?

0.2.2

System information

3.9.13 (main, Aug 25 2022, 18:29:29)
[Clang 12.0.0 ] darwin
0.2.2

Problem description

Admittedly, this is a very nice library. However, omnisafe has the following bug when I run the command omnisafe train --algo PPO --total-steps 1024 --vector-env-nums 1 --custom-cfgs algo_cfgs:update_cycle --custom-cfgs 512 --device cpu (screenshot omitted).
Note: My device is a MacBook Pro M1 Pro.

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[BUG] Why can I make Gymnasium environments when using Safety-Gymnasium?

Required prerequisites

What version of OmniSafe are you using?

0.0.1

System information

3.8.15 (default, Nov 24 2022, 15:19:38)
[GCC 11.2.0] linux
0.0.1

Problem description

When I test safety-gymnasium, I wrongly input 'Humanoid-v4' as env_id, but it works and then something goes wrong.

Reproducible example code

The Python snippets:

import argparse

import safety_gymnasium


def run_random(env_name):
    """Random run."""
    env = safety_gymnasium.make(env_name, render_mode='human')
    obs, _ = env.reset()
    # Use below to specify seed.
    # obs, _ = env.reset(seed=0)
    terminated, truncated = False, False
    ep_ret, ep_cost = 0, 0
    while True:
        if terminated or truncated:
            print(f'Episode Return: {ep_ret} \t Episode Cost: {ep_cost}')
            ep_ret, ep_cost = 0, 0
            obs, _ = env.reset()
        assert env.observation_space.contains(obs)
        act = env.action_space.sample()
        assert env.action_space.contains(act)
        # Use the environment's built_in max_episode_steps
        if hasattr(env, '_max_episode_steps'):  # pylint: disable=unused-variable
            max_ep_len = env._max_episode_steps  # pylint: disable=unused-variable,protected-access
        # pylint: disable-next=unused-variable
        obs, reward, cost, terminated, truncated, info = env.step(act)

        ep_ret += reward
        ep_cost += cost


if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument('--env', default='Humanoid-v4')
    args = parser.parse_args()
    run_random(args.env)

Command lines:

Extra dependencies:


Steps to reproduce:

  1. run code above.

Traceback

Traceback (most recent call last):
  File "~/omnisafe/envs/safety_gymnasium/examples/env.py", line 53, in <module>
    run_random(args.env)
  File "~/omnisafe/envs/safety_gymnasium/examples/env.py", line 42, in run_random
    obs, reward, cost, terminated, truncated, info = env.step(act)
  File "~/omnisafe/envs/safety-gymnasium/safety_gymnasium/wrappers/time_limit.py", line 45, in step
    observation, reward, cost, terminated, truncated, info = self.env.step(action)
ValueError: not enough values to unpack (expected 6, got 5)

Expected behavior

No response

Additional context

No response

[Feature Request] OmniSafe will support PyTorch 2.0

Required prerequisites

Motivation

The PyTorch team has released PyTorch 2.0: our next-generation release that is faster, more Pythonic, and as dynamic as ever. More details can be found in the pytorch-2.0-release-blog.
We will support PyTorch 2.0 so that developers can use the latest PyTorch 2.0 features within the OmniSafe framework.

Solution

No response

Alternatives

No response

Additional context

No response

[Question] Something wrong when I check the progress.txt

Required prerequisites

Questions

When I trained TRPO on SafetyHumanoidVelocity-v4, I found that progress.txt recorded multiple lines of data on the same line. This makes it difficult for me to draw the training curve locally (screenshot omitted).

[BUG] module 'distutils' has no attribute 'version'

Required prerequisites

What version of OmniSafe are you using?

0.0.1

System information

3.8.15 (default, Nov 24 2022, 15:19:38)
[GCC 11.2.0] linux
0.0.1

Problem description

[screenshot of the error omitted]

Traceback

No response

Expected behavior

No response

Additional context

No response

[BUG] A bug in on-policy adapter with autoreset mechanism

Required prerequisites

What version of OmniSafe are you using?

0.1.1

System information

3.8.16 (default, Mar 2 2023, 03:21:46)
[GCC 11.2.0] linux
0.1.1

Problem description

In onpolicy_adapter.py, the end of the episodes are handled like this:

            obs = next_obs
            epoch_end = step >= steps_per_epoch - 1
            for idx, (done, time_out) in enumerate(zip(terminated, truncated)):
                if epoch_end or done or time_out:
                    if (epoch_end or time_out) and not done:
                        if epoch_end:
                            logger.log(
                                f'Warning: trajectory cut off when rollout by epoch at {self._ep_len[idx]} steps.'
                            )
                        _, last_value_r, last_value_c, _ = agent.step(obs[idx])
                        last_value_r = last_value_r.unsqueeze(0)
                        last_value_c = last_value_c.unsqueeze(0)
                    elif done:
                        last_value_r = torch.zeros(1)
                        last_value_c = torch.zeros(1)

                    if done or time_out:
                        self._log_metrics(logger, idx)
                        self._reset_log(idx)

                        self._ep_ret[idx] = 0.0
                        self._ep_cost[idx] = 0.0
                        self._ep_len[idx] = 0.0

                    buffer.finish_path(last_value_r, last_value_c, idx)

while in safety-gymnasium, when the episode ends, it auto-resets immediately and carries the last state in info. For example, I think that when time_out==True, epoch_end==False, and done==False, the value of the last state is calculated from the first observation of the next episode.

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[BUG] AutoReset error while run single environment in omnisafe

Required prerequisites

What version of OmniSafe are you using?

0.1.0

System information

0.1.0 linux ubuntu 20.04

Problem description

When I parallelize an environment, I found that the environment reports errors. I checked the specific reason: when the number of environments is 1, the environment's AutoResetWrapper is called, but in safety-gymnasium==0.1.0 the AutoResetWrapper does not return the cost-related information. I found that the maintainer of safety-gymnasium has already fixed this in https://github.com/PKU-MARL/safety-gymnasium/pull/23/files, but the fix has not been released to PyPI, and omnisafe depends on safety-gymnasium 0.1.0. Could you tell me how to solve this?

Issue screenshots: omitted.

The safety-gymnasium fix is in https://github.com/PKU-MARL/safety-gymnasium/pull/23/files (screenshot omitted).

Reproducible example code

The Python snippets:

Command lines:

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

[Question] wandb.sdk.lib.config_util.ConfigError when using exp_grid

Required prerequisites

Questions

First, thanks for this well-designed library; it is elegant and reliable.
Yesterday, when I started a set of experiments, I found a strange ERROR, and I am not sure whether it is a bug.
This is my core configuration for the run; the other parts are the same as your example in omnisafe/examples/benchmarks:

    eg = ExperimentGrid(exp_name='Test')
    eg.add('algo', ['PPOLag'])
    eg.add('env_id', ['SafetyPointGoal0-v0', 'SafetyPointGoal1-v0', 'SafetyPointGoal2-v0', 'SafetyAntVelocity-v4'])
    eg.add('epochs', 20)
    # eg.add('actor_lr', [0.001, 0.003, 0.004], 'lr', True)
    # eg.add('actor_iters', [1, 2], 'ac_iters', True)
    eg.add('wandb_project', 'test')
    eg.add('num_envs', [1, 2, 4, 8, 16, 32])
    # eg.add('seed', [0, 5, 10])
    eg.run(train, num_pool=10)

This is the information that wandb throws (screenshot omitted).

[BUG] 'mujoco.structs.MjModelGeomViews' object has no attribute 'name'

Describe the bug

'mujoco.structs.MjModelGeomViews' object has no attribute 'name'

When I run examples/vis_safety_gymnasium.py, the bug is as follows:

"/home/saferl/Documents/github/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/envs/safety_gym_v2/world.py", line 435, in init
    self.geom_names = [
  File "/home/saferl/Documents/github/omnisafe/omnisafe/envs/Safety_Gymnasium/safety_gymnasium/envs/safety_gym_v2/world.py", line 438, in <listcomp>
    if self.model.geom(i).name != 'floor'
AttributeError: 'mujoco.structs.MjModelGeomViews' object has no attribute 'name'

My virtual environment depends on

Gymnasium           0.26.3
mujoco                  2.2.0
numpy                   1.23.5
torch                      1.10.0+cu111
torchaudio             0.10.0+rocm4.1
torchvision             0.11.0+cu111

Checklist

  • I have checked that there is no similar issue in the repo. (required)
  • I have read the documentation. (required)
  • I have provided a minimal working example to reproduce the bug. (required)

[BUG] progress.csv saved incorrectly

Required prerequisites

What version of OmniSafe are you using?

0.3.0

System information

ubuntu 20.04
0.3.0

Problem description

After I ran some experiments, I tried to analyze the results via the StatisticsTools class. Something went wrong in the .csv file, and I am sure the processes ended with no error.

  File "~/anaconda3/envs/test_vel/lib/python3.8/site-packages/matplotlib/_api/__init__.py", line 93, in check_isinstance
    raise TypeError(
TypeError: 'value' must be an instance of str or bytes, not a float

Reproducible example code

The Python snippets:

    eg = ExperimentGrid(exp_name='benchmark')

    # Set the algorithms.
    base_policy = ['PPO', 'PolicyGradient', 'P3O', 'PPOLag', 'FOCOPS', 'CUP', 'NaturalPG', 'TRPO']
    first_order_policy = ['CUP', 'FOCOPS', 'TRPOLag']
    second_order_policy = ['CPO', 'PCPO', 'RCPO']

    # Set the environments.
    mujoco_envs = [
        # 'SafetyAntVelocity-v4',
        # 'SafetyHopperVelocity-v4',
        # 'SafetyHumanoidVelocity-v4',
        'SafetyWalker2dVelocity-v4',
        # 'SafetyHalfCheetahVelocity-v4',
        # 'SafetySwimmerVelocity-v4',
    ]
    eg.add('env_id', mujoco_envs)

    # # Set the device.
    # available_gpus = list(range(torch.cuda.device_count()))
    # gpu_id = [0, 1, 2, 3]
    # # if you want to use CPU, please set gpu_id = None
    # # gpu_id = None

    # if not set(gpu_id).issubset(available_gpus):
    #     warnings.warn('The GPU ID is not available, use CPU instead.', stacklevel=1)
    #     gpu_id = None

    eg.add('algo', base_policy + first_order_policy + second_order_policy)
    eg.add('logger_cfgs:use_wandb', [False])
    eg.add('train_cfgs:vector_env_nums', [1])
    eg.add('train_cfgs:torch_threads', [1])
    eg.add('algo_cfgs:update_cycle', [2048])
    eg.add('train_cfgs:total_steps', [10240000])
    eg.add('seed', [i for i in range(0, 51, 5)])
    # The total number of experiments must be divisible by num_pool;
    # choose num_pool according to your machine.
    eg.run(train, num_pool=140)

    # Fill in the name of the parameter whose values you want to compare.
    # You can either specify the exact values to compare, or specify at most
    # how many values to compare in a single graph, and the function will
    # automatically generate all possible combinations of graphs.
    # The two modes cannot be used at the same time.
    # eg.analyze(parameter='algo', values=None, compare_num=6, cost_limit=25)
    eg.render(num_episodes=10, render_mode='rgb_array', width=256, height=256)
    # eg.evaluate(num_episodes=1)

Command lines:

Extra dependencies:


Steps to reproduce:

  1. Run the Python script via the experiment grid.
  2. Use StatisticsTools to analyze the results.

Traceback

File "~/anaconda3/envs/test_vel/lib/python3.8/site-packages/matplotlib/_api/__init__.py", line 93, in check_isinstance
    raise TypeError(
TypeError: 'value' must be an instance of str or bytes, not a float
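
As a first diagnostic (an assumption about the cause, not a confirmed one), this matplotlib message usually means a float ended up where a column label or legend string was expected, so inspecting one run's progress.csv directly may show whether the header row or a column was written incorrectly; the path below is a placeholder:

    import pandas as pd

    # Placeholder path: point this at one run's progress.csv.
    progress_path = './runs/progress.csv'
    df = pd.read_csv(progress_path)

    # A float-typed column label, or a header row shifted into the data,
    # would explain the "must be an instance of str or bytes" error.
    print(df.columns.tolist())
    print(df.dtypes)
    print(df.head())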

Expected behavior

No response

Additional context

No response

[BUG] mujoco.FatalError: an OpenGL platform library has not been loaded into this process.

Required prerequisites

What version of OmniSafe are you using?

0.0.2

System information

3.8.15 (default, Nov 4 2022, 20:59:55)
[GCC 11.2.0] linux
0.0.2

Problem description

self = <gymnasium.envs.mujoco.mujoco_rendering.RenderContextOffscreen object at 0x7f5121c6dbe0>
model = <mujoco._structs.MjModel object at 0x7f5121c80e30>, data = <mujoco._structs.MjData object at 0x7f5146d457f0>, offscreen = True

    def __init__(self, model, data, offscreen=True):
    
        self.model = model
        self.data = data
        self.offscreen = offscreen
        self.offwidth = model.vis.global_.offwidth
        self.offheight = model.vis.global_.offheight
        max_geom = 1000
    
        mujoco.mj_forward(self.model, self.data)
    
        self.scn = mujoco.MjvScene(self.model, max_geom)
        self.cam = mujoco.MjvCamera()
        self.vopt = mujoco.MjvOption()
        self.pert = mujoco.MjvPerturb()
>       self.con = mujoco.MjrContext(self.model, mujoco.mjtFontScale.mjFONTSCALE_150)
E       mujoco.FatalError: an OpenGL platform library has not been loaded into this process, this most likely means that a valid OpenGL context has not been created before mjr_makeContext was called

../../../anaconda3/envs/omnisafe/lib/python3.8/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py:57: FatalError

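A common workaround for this class of error on headless machines (assuming the tests run without a display and that EGL or OSMesa is available) is to select an offscreen OpenGL backend through the MUJOCO_GL environment variable before mujoco is imported, for example:

    import os

    # Choose an offscreen backend before mujoco/gymnasium are imported:
    # 'egl' needs a GPU driver with EGL support; 'osmesa' is a CPU fallback.
    os.environ.setdefault('MUJOCO_GL', 'egl')

    import omnisafe  # noqa: E402  (imported only after MUJOCO_GL is set)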

Reproducible example code

The Python snippets:

import os

import helpers
import omnisafe

def test_evaluate_saved_policy():
    """Test render policy."""
    DIR = os.path.join(os.path.dirname(__file__), 'runs')
    evaluator = omnisafe.Evaluator()
    for env in os.scandir(DIR):
        env_path = os.path.join(DIR, env)
        for algo in os.scandir(env_path):
            print(algo)
            algo_path = os.path.join(env_path, algo)
            for exp in os.scandir(algo_path):
                exp_path = os.path.join(algo_path, exp)
                for item in os.scandir(os.path.join(exp_path, 'torch_save')):
                    if item.is_file() and item.name.split('.')[-1] == 'pt':
                        evaluator.load_saved_model(save_dir=exp_path, model_name=item.name)
                        evaluator.evaluate(num_episodes=1)
                        evaluator.render(num_episode=1, camera_name='track', width=256, height=256)

Command lines:

pytest test_evaluate_saved_policy.py

Extra dependencies:


Steps to reproduce:

Traceback

No response

Expected behavior

No response

Additional context

No response

How to change the render mode?

Questions

It seems that the environment's render mode cannot be changed. We tested two ways and both failed.

Like the new way in the gymnasium library:
env = safety_gymnasium.make(env_name, render='rgb_array')
which raises
TypeError: __init__() got an unexpected keyword argument 'render'

Or the old way in the gym library:
env.render(render_mode='rgb_array')
which raises
TypeError: env_render_passive_checker() got an unexpected keyword argument 'render_mode'.
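
For reference, in the gymnasium-style API the render mode is normally chosen when the environment is created (via render_mode) and render() is then called with no arguments; whether safety_gymnasium already accepts this keyword is exactly what this question is asking, so the snippet below is only a sketch of the expected pattern:

    import safety_gymnasium

    # gymnasium-style pattern: pick the render mode at construction time.
    env = safety_gymnasium.make('SafetyPointGoal1-v0', render_mode='rgb_array')
    obs, info = env.reset()
    frame = env.render()  # an RGB array when render_mode='rgb_array' is supported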

Checklist

  • I have checked that there is no similar issue in the repo. (required)
  • I have read the documentation. (required)

[Feature Request] rgb_array render mode is not supported

Describe the bug

I really appreciate this work, because I no longer have to waste time on the tedious mujoco200_linux installation and on old dependencies just to adapt to safety_gym. However, I noticed that some of your environment code does not seem to be released yet; please tell me when you will release it!

I also have a problem when using it on my server: I want to save a visualization as an mp4 via rgb_array frames, but render_mode='rgb_array' does not seem to be supported in safety_gymnasium. Will this be supported later?
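
Assuming rgb_array rendering becomes available, one way to get an mp4 on a server (a sketch, not part of OmniSafe; it needs imageio with the imageio-ffmpeg backend) is to collect frames from render() and write them out with imageio:

    import imageio
    import safety_gymnasium

    # Sketch only: assumes render_mode='rgb_array' is supported by the env.
    env = safety_gymnasium.make('SafetyPointGoal1-v0', render_mode='rgb_array')
    env.reset()
    frames = []
    for _ in range(200):
        # safety_gymnasium's step returns an extra cost term.
        obs, reward, cost, terminated, truncated, info = env.step(env.action_space.sample())
        frames.append(env.render())
        if terminated or truncated:
            env.reset()
    imageio.mimsave('rollout.mp4', frames, fps=30)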


Checklist

  • I have checked that there is no similar issue in the repo. (required)
  • I have read the documentation. (required)
  • I have provided a minimal working example to reproduce the bug. (required)

[Question] SafetyGym and vision-based in Safety Gymnasium

Questions

Thank you very much for your contribution; this has dramatically reduced the tedious process of installing safety gym on different machines.

I see the following statement in the README:

Further, to facilitate the progress of community research, we redesigned [Safety_Gym](https://github.com/openai/safety-gym),
removed the dependency on mujoco_py, made it created on top of Mujoco and fixed some bugs.

I have two questions. The first is why there is no doggo agent; I noticed that the original Safety Gym has one.


Second, I noticed that these environments have not been uploaded to the repo yet. They look very useful. May I ask approximately when they will be released? I would like to cite your work when using these environments, but I could not find any article about omnisafe or safety gymnasium; can you provide a proper way to cite it?

Checklist

  • I have checked that there is no similar issue in the repo. (required)
  • I have read the documentation. (required)
