
Safe Multi-Agent Isaac Gym Benchmark (Safe MAIG)

The Safe Multi-Agent Isaac Gym benchmark is designed for safe multi-agent reinforcement learning research.




About this repository

This repository is an extension of DexterousHands, developed by the PKU MARL research team. Safe MAIG contains complex dexterous-hand RL environments built on NVIDIA Isaac Gym, the high-performance simulator described in the NeurIPS 2021 Datasets and Benchmarks paper.

🌟 This repository is under active development. We appreciate any constructive comments and suggestions. If you have any questions, please feel free to email <gshangd[AT]foxmail.com>.

Figure 1. Safe multi-agent Isaac Gym environments. In each pair of hands, differently coloured body parts are controlled by different agents. Agents jointly learn to manipulate the robot while avoiding violations of the safety constraints.

Installation

Details regarding installation of IsaacGym can be found here. We currently support the Preview Release 3 version of IsaacGym.

Pre-requisites

The code has been tested on Ubuntu 18.04 with Python 3.7. The minimum recommended NVIDIA driver version for Linux is 470 (as required by Isaac Gym).

We use Anaconda to create virtual environments. To install Anaconda, follow the instructions here.

Ensure that Isaac Gym works on your system by running one of the examples from the python/examples directory, like joint_monkey.py. Follow troubleshooting steps described in the Isaac Gym Preview 2 install instructions if you have any trouble running the samples.
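
If the samples run, a quick way to confirm that the Isaac Gym Python bindings are reachable from your environment is a minimal import check. This is only a sketch; acquire_gym is the standard entry point of the Isaac Gym Python API and creates no simulation by itself.

# Smoke test: verify the isaacgym bindings import and a gym handle can be acquired.
from isaacgym import gymapi

gym = gymapi.acquire_gym()
print("Isaac Gym bindings loaded:", gym is not None)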

Install this repo

Once Isaac Gym is installed and samples work within your current python environment, install this repo:

pip install -e .

Running the benchmarks

To train your first policy, run this line:

python train.py --task=ShadowHandOver --algo=macpo

Select an algorithm

To select an algorithm, pass --algo=ppo/mappo/happo/hatrpo as an argument:

python train.py --task=ShadowHandOver --algo=macpo

At present, we only support these four algorithms.
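
If you want to run several algorithms back to back on the same task, you can drive train.py from a small Python script. This is only a sketch that reuses the --task and --algo flags documented above; the algorithm names simply mirror the --algo options listed here.

# Sketch: sweep the --algo flag over the algorithms listed above for one task.
import subprocess

for algo in ["ppo", "mappo", "happo", "hatrpo"]:
    subprocess.run(["python", "train.py", "--task=ShadowHandOver", f"--algo={algo}"], check=True)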

Select tasks

Source code for tasks can be found in dexteroushandenvs/tasks.

At present, we only support the following environments:

| Environment | Description | Action Type | Total Action Num | Action Values | Observation Shape | Observation Values | State Shape | State Values | Rewards |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ShadowHandOver | Two fixed-position hands; the hand that starts with the object must find a way to hand it over to the second hand. | Continuous | 40 | [-1, 1] | (num_envs, 2, 211) | [-5, 5] | (num_envs, 2, 398) | [-5, 5] | Pose distance between object and goal. |
| ShadowHandCatchUnderarm | Two hands with additional degrees of freedom that allow them to translate/rotate their centres of mass within some constrained region. | Continuous | 52 | [-1, 1] | (num_envs, 2, 217) | [-5, 5] | (num_envs, 2, 422) | [-5, 5] | Pose distance between object and goal. |
| ShadowHandTwoCatchUnderarm | The two hands must coordinate to throw two objects between them (i.e. swap them). | Continuous | 52 | [-1, 1] | (num_envs, 2, 217) | [-5, 5] | (num_envs, 2, 422) | [-5, 5] | Pose distance between the two objects and their two goals; both objects have to be thrown in order to be swapped over. |
| ShadowHandCatchAbreast | Similar to ShadowHandCatchUnderarm, except the two hands are placed side by side rather than facing each other. | Continuous | 52 | [-1, 1] | (num_envs, 2, 217) | [-5, 5] | (num_envs, 2, 422) | [-5, 5] | Pose distance between object and goal. |
| ShadowHandOver2Underarm | Made up of half ShadowHandCatchUnderarm and half ShadowHandCatchOverarm; the object needs to be thrown from the vertical hand to the palm-up hand. | Continuous | 52 | [-1, 1] | (num_envs, 2, 217) | [-5, 5] | (num_envs, 2, 422) | [-5, 5] | Pose distance between object and goal. |

Detailed action and observation indices and the reward formulas for each environment are given in the sections below.

HandOver Environments

These environments involve two fixed-position hands. The hand which starts with the object must find a way to hand it over to the second hand. To use the HandOver environment, pass --task=ShadowHandOver

Observation Space

| Index | Description |
| --- | --- |
| 0 - 23 | shadow hand dof position |
| 24 - 47 | shadow hand dof velocity |
| 48 - 71 | shadow hand dof force |
| 72 - 136 | shadow hand fingertip pose, linear velocity, angular velocity (5 x 13) |
| 137 - 166 | shadow hand fingertip force, torque (5 x 6) |
| 167 - 186 | actions |
| 187 - 193 | object pose |
| 194 - 196 | object linear velocity |
| 197 - 199 | object angular velocity |
| 200 - 206 | goal pose |
| 207 - 210 | goal rot - object rot |
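
As a sanity check, the per-hand segments listed above sum to the 211-dimensional observation reported in the overview table; a small sketch:

# Sketch: the per-hand observation segments above sum to 211 dimensions.
segment_sizes = [24, 24, 24, 65, 30, 20, 7, 3, 3, 7, 4]
assert sum(segment_sizes) == 211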

Action Space

The shadow hand has 24 joints: 20 actively driven joints and 4 under-actuated (coupled) joints. The action is therefore the target joint angle for each of the 20 actuated joints (see the shape sketch after the table below).

| Index | Description |
| --- | --- |
| 0 - 19 | shadow hand actuated joint |
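
To make the shapes concrete, a random action batch for this task could look as follows. This is only a sketch, and it assumes the batching convention mirrors the (num_envs, 2, ...) observation shape.

import torch

num_envs = 4  # illustrative value
# One 20-dim action per hand (agent), two hands per environment,
# with every entry in the documented range [-1, 1].
actions = torch.rand(num_envs, 2, 20) * 2.0 - 1.0
print(actions.shape)  # torch.Size([4, 2, 20]) -> 40 action dims per environment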

Rewards

The reward is based on the pose distance between the object and the goal; the specific formula is as follows:

# position distance between object and goal
goal_dist = torch.norm(target_pos - object_pos, p=2, dim=-1)

# orientation alignment between object and goal (rotation angle, in radians)
quat_diff = quat_mul(object_rot, quat_conjugate(target_rot))
rot_dist = 2.0 * torch.asin(torch.clamp(torch.norm(quat_diff[:, 0:3], p=2, dim=-1), max=1.0))

dist_rew = goal_dist

reward = torch.exp(-0.2*(dist_rew * dist_reward_scale + rot_dist))
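
Here quat_mul and quat_conjugate are quaternion helpers (Isaac Gym ships equivalents in its torch utilities), and rot_dist is the rotation angle between the object and goal orientations. Below is a self-contained sketch of the same rot_dist computation in plain PyTorch, with illustrative reimplementations of the two helpers; the quaternion layout is assumed to be (x, y, z, w).

import torch

def quat_conjugate(q):
    # negate the vector part, keep the scalar part; layout (x, y, z, w)
    return torch.cat([-q[..., :3], q[..., 3:]], dim=-1)

def quat_mul(a, b):
    # Hamilton product for (x, y, z, w) quaternions
    ax, ay, az, aw = a.unbind(-1)
    bx, by, bz, bw = b.unbind(-1)
    return torch.stack([
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    ], dim=-1)

object_rot = torch.tensor([[0.0, 0.0, 0.0, 1.0]])        # identity orientation
target_rot = torch.tensor([[0.0, 0.0, 0.7071, 0.7071]])  # 90 degrees about z
quat_diff = quat_mul(object_rot, quat_conjugate(target_rot))
rot_dist = 2.0 * torch.asin(torch.clamp(torch.norm(quat_diff[:, 0:3], p=2, dim=-1), max=1.0))
print(rot_dist)  # ~1.5708 rad (90 degrees)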

HandCatchUnderarm Environments

These environments again have two hands; however, now they have some additional degrees of freedom that allow them to translate/rotate their centres of mass within some constrained region. To use the HandCatchUnderarm environment, pass --task=ShadowHandCatchUnderarm

Observation Space

| Index | Description |
| --- | --- |
| 0 - 23 | shadow hand dof position |
| 24 - 47 | shadow hand dof velocity |
| 48 - 71 | shadow hand dof force |
| 72 - 136 | shadow hand fingertip pose, linear velocity, angular velocity (5 x 13) |
| 137 - 166 | shadow hand fingertip force, torque (5 x 6) |
| 167 - 192 | actions |
| 193 - 195 | shadow hand translation |
| 196 - 198 | shadow hand orientation |
| 199 - 205 | object pose |
| 206 - 208 | object linear velocity |
| 209 - 211 | object angular velocity |
| 212 - 218 | goal pose |
| 219 - 222 | goal rot - object rot |

Action Space

Similar to the HandOver environments, except now the bases are not fixed and have translational and rotational degrees of freedom that allow them to move within some range.

| Index | Description |
| --- | --- |
| 0 - 19 | shadow hand actuated joint |
| 20 - 22 | shadow hand actor translation |
| 23 - 25 | shadow hand actor rotation |
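
Each agent's 26-dimensional action therefore splits into three slices; a minimal sketch of that split, with index ranges taken from the table above:

import torch

action = torch.rand(26) * 2.0 - 1.0   # one hand's action, values in [-1, 1]
joint_targets    = action[0:20]       # 20 actuated hand joints
base_translation = action[20:23]      # hand actor translation
base_rotation    = action[23:26]      # hand actor rotation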

Rewards

The reward is based on the pose distance between the object and the goal; the specific formula is as follows:

goal_dist = torch.norm(target_pos - object_pos, p=2, dim=-1)

quat_diff = quat_mul(object_rot, quat_conjugate(target_rot))
rot_dist = 2.0 * torch.asin(torch.clamp(torch.norm(quat_diff[:, 0:3], p=2, dim=-1), max=1.0))

dist_rew = goal_dist

reward = torch.exp(-0.2*(dist_rew * dist_reward_scale + rot_dist))

HandCatchOver2Underarm Environments

This environment is made up of half ShadowHandCatchUnderarm and half ShadowHandCatchOverarm; the object needs to be thrown from the vertical hand to the palm-up hand. To use the HandCatchOver2Underarm environment, pass --task=ShadowHandCatchOver2Underarm

Observation Space

| Index | Description |
| --- | --- |
| 0 - 23 | shadow hand dof position |
| 24 - 47 | shadow hand dof velocity |
| 48 - 71 | shadow hand dof force |
| 72 - 136 | shadow hand fingertip pose, linear velocity, angular velocity (5 x 13) |
| 137 - 166 | shadow hand fingertip force, torque (5 x 6) |
| 167 - 192 | actions |
| 193 - 195 | shadow hand translation |
| 196 - 198 | shadow hand orientation |
| 199 - 205 | object pose |
| 206 - 208 | object linear velocity |
| 209 - 211 | object angular velocity |
| 212 - 218 | goal pose |
| 219 - 222 | goal rot - object rot |

Action Space

Similar to the HandOver environments, except now the bases are not fixed and have translational and rotational degrees of freedom that allow them to move within some range.

| Index | Description |
| --- | --- |
| 0 - 19 | shadow hand actuated joint |
| 20 - 22 | shadow hand actor translation |
| 23 - 25 | shadow hand actor rotation |

Rewards

The reward is based on the pose distance between the object and the goal; the specific formula is as follows:

goal_dist = torch.norm(target_pos - object_pos, p=2, dim=-1)
# Orientation alignment for the cube in hand and goal cube
quat_diff = quat_mul(object_rot, quat_conjugate(target_rot))
rot_dist = 2.0 * torch.asin(torch.clamp(torch.norm(quat_diff[:, 0:3], p=2, dim=-1), max=1.0))

reward = (0.3 - goal_dist - rot_dist)

TwoObjectCatch Environments

These environments involve coordination between the two hands so as to throw the two objects between hands (i.e. swapping them). This is necessary since each object's goal can only be reached by the other hand. To use the TwoObjectCatch environment, pass --task=ShadowHandTwoCatchUnderarm

Observation Space

| Index | Description |
| --- | --- |
| 0 - 23 | shadow hand dof position |
| 24 - 47 | shadow hand dof velocity |
| 48 - 71 | shadow hand dof force |
| 72 - 136 | shadow hand fingertip pose, linear velocity, angular velocity (5 x 13) |
| 137 - 166 | shadow hand fingertip force, torque (5 x 6) |
| 167 - 192 | actions |
| 193 - 195 | shadow hand translation |
| 196 - 198 | shadow hand orientation |
| 199 - 205 | object1 pose |
| 206 - 208 | object1 linear velocity |
| 210 - 212 | object1 angular velocity |
| 213 - 219 | goal1 pose |
| 220 - 223 | goal1 rot - object1 rot |
| 224 - 230 | object2 pose |
| 231 - 233 | object2 linear velocity |
| 234 - 236 | object2 angular velocity |
| 237 - 243 | goal2 pose |
| 244 - 247 | goal2 rot - object2 rot |

Action Space

Similar to the HandOver environments, except now the bases are not fixed and have translational and rotational degrees of freedom that allow them to move within some range.

| Index | Description |
| --- | --- |
| 0 - 19 | shadow hand actuated joint |
| 20 - 22 | shadow hand actor translation |
| 23 - 25 | shadow hand actor rotation |

Rewards

The reward is based on the pose distance between the two objects and their two goals, which means that both objects have to be thrown in order to be swapped over. The specific formula is as follows:

goal_dist = torch.norm(target_pos - object_pos, p=2, dim=-1)
goal_another_dist = torch.norm(target_another_pos - object_another_pos, p=2, dim=-1)

# Orientation alignment for the cube in hand and goal cube
quat_diff = quat_mul(object_rot, quat_conjugate(target_rot))
rot_dist = 2.0 * torch.asin(torch.clamp(torch.norm(quat_diff[:, 0:3], p=2, dim=-1), max=1.0))

quat_another_diff = quat_mul(object_another_rot, quat_conjugate(target_another_rot))
rot_another_dist = 2.0 * torch.asin(torch.clamp(torch.norm(quat_another_diff[:, 0:3], p=2, dim=-1), max=1.0))

dist_rew = goal_dist

reward = torch.exp(-0.2*(dist_rew * dist_reward_scale + rot_dist)) + torch.exp(-0.2*(goal_another_dist * dist_reward_scale + rot_another_dist))
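
With a positive dist_reward_scale, each exponential term lies in (0, 1], so the total reward approaches 2 only when both objects match their goal poses. A quick numeric check of that limit under the formula above; dist_reward_scale is a task configuration parameter, and the value below is only illustrative.

import torch

dist_reward_scale = 1.0  # illustrative value, not the repo's configured scale
# Perfect alignment of both objects: all distances are zero.
goal_dist = goal_another_dist = rot_dist = rot_another_dist = torch.zeros(1)
reward = (torch.exp(-0.2 * (goal_dist * dist_reward_scale + rot_dist))
          + torch.exp(-0.2 * (goal_another_dist * dist_reward_scale + rot_another_dist)))
print(reward)  # tensor([2.])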

HandCatchAbreast Environments

This environment is similar to ShadowHandCatchUnderarm; the difference is that the two hands are placed side by side rather than facing each other. To use the HandCatchAbreast environment, pass --task=ShadowHandCatchAbreast

Observation Space

| Index | Description |
| --- | --- |
| 0 - 23 | shadow hand dof position |
| 24 - 47 | shadow hand dof velocity |
| 48 - 71 | shadow hand dof force |
| 72 - 136 | shadow hand fingertip pose, linear velocity, angular velocity (5 x 13) |
| 137 - 166 | shadow hand fingertip force, torque (5 x 6) |
| 167 - 192 | actions |
| 193 - 195 | shadow hand translation |
| 196 - 198 | shadow hand orientation |
| 199 - 205 | object pose |
| 206 - 208 | object linear velocity |
| 209 - 211 | object angular velocity |
| 212 - 218 | goal pose |
| 219 - 222 | goal rot - object rot |

Action Space

Similar to the HandOver environments, except now the bases are not fixed and have translational and rotational degrees of freedom that allow them to move within some range.

| Index | Description |
| --- | --- |
| 0 - 19 | shadow hand actuated joint |
| 20 - 22 | shadow hand actor translation |
| 23 - 25 | shadow hand actor rotation |

Rewards

The reward is based on the pose distance between the object and the goal; the specific formula is as follows:

goal_dist = torch.norm(target_pos - object_pos, p=2, dim=-1)

quat_diff = quat_mul(object_rot, quat_conjugate(target_rot))
rot_dist = 2.0 * torch.asin(torch.clamp(torch.norm(quat_diff[:, 0:3], p=2, dim=-1), max=1.0))

dist_rew = goal_dist

reward = torch.exp(-0.2*(dist_rew * dist_reward_scale + rot_dist))
