The minimal-stable-ppo from toruowo

Supporting discrete action space?

Hi @ToruOwO, the code in this repo is very easy to read and still performant, which I really appreciate. However, it seems that your implementation only supports continuous action spaces. If that is the case, would you consider adding support for discrete action spaces in the future?

I'm interested in discrete action spaces because I'm working with a custom RL environment that uses Isaac Gym to accelerate the physics simulation but exposes a custom discrete action space. Unfortunately, the combination of Isaac Gym acceleration and a discrete action space is rarely supported by mainstream RL frameworks.

I would be very grateful if you could help implement a discrete-action version of PPO, or offer any suggestions that might help. Looking forward to your reply!
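
For what it's worth, the PPO update itself does not depend on the action type: it only needs a log-probability and an entropy from the policy distribution. So the change asked for above would mostly mean swapping a Gaussian head for a categorical one over per-action logits. Below is a minimal, framework-free sketch of those pieces in plain Python. All function names are hypothetical and not taken from this repo; in PyTorch, torch.distributions.Categorical(logits=...) provides the same log_prob and entropy.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sample_action(logits, rng=random):
    """Draw one action index from the categorical distribution."""
    probs = [math.exp(lp) for lp in log_softmax(logits)]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def log_prob(logits, action):
    """Log-probability of a discrete action under the given logits."""
    return log_softmax(logits)[action]


def entropy(logits):
    """Entropy of the categorical distribution (used for the entropy bonus)."""
    log_p = log_softmax(logits)
    return -sum(math.exp(lp) * lp for lp in log_p)


def clipped_surrogate(new_logp, old_logp, advantage, clip=0.2):
    """PPO clipped objective for a single sample (a quantity to maximize)."""
    ratio = math.exp(new_logp - old_logp)
    clipped = max(1.0 - clip, min(1.0 + clip, ratio))
    return min(ratio * advantage, clipped * advantage)
```

With a head like this, the rollout buffer would store integer action indices instead of float action vectors, and the rest of the training loop could in principle stay the same. Whether that holds for this particular codebase is an assumption that would need checking.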

A Small Typo in the Model Initialization Code?

In ppo/models.py L64, should it be torch.nn.init.orthogonal_(self.mu.weight, gain=0.01)? Currently, the code seems to initialize the final value layer twice and never initializes the mu layer.

Openrlbenchmark integration

Hi @ToruOwO, this is very cool stuff. I especially like your results on AllegroHand. It is a hard environment and difficult to tune. I'm wondering whether you found any particular tricks or hyperparameters to be important for AllegroHand.

Also, I see you are using wandb for experiment logging. Would you be interested in making your wandb project public and integrating the openrlbenchmark utilities? It could help make plots faster and easier.

E.g.,

pip install --upgrade openrlbenchmark
python -m openrlbenchmark.rlops \
    --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' 'ppo_continuous_action_isaacgym?tag=rlops-pilot' \
    --check-empty-runs False \
    --env-ids Cartpole Ant Humanoid BallBalance Anymal AllegroHand ShadowHand \
    --ncols 4 \
    --ncols-legend 1 \
    --output-filename static/isaacgym \
    --scan-history

──────────────────────────────────────────────────────────────── Runtime (m) (mean ± std) ────────────────────────────────────────────────────────────────
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Environment ┃ openrlbenchmark/cleanrl/ppo_continuous_action_isaacgym ({'tag': ['rlops-pilot']}) ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Cartpole    │ 1.8076923076923077                                                                │
│ Ant         │ 4.049301675977653                                                                 │
│ Humanoid    │ 5.590862173649058                                                                 │
│ BallBalance │ 2.803096539162113                                                                 │
│ Anymal      │ 5.112263257575758                                                                 │
│ AllegroHand │ 185.13721494420184                                                                │
│ ShadowHand  │ 124.74728214697494                                                                │
└─────────────┴───────────────────────────────────────────────────────────────────────────────────┘
────────────────────────────────────────────────────────────── Episodic Return (mean ± std) ──────────────────────────────────────────────────────────────
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Environment ┃ openrlbenchmark/cleanrl/ppo_continuous_action_isaacgym ({'tag': ['rlops-pilot']}) ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Cartpole    │ 369.28 ± 172.53                                                                   │
│ Ant         │ 3914.98 ± 483.76                                                                  │
│ Humanoid    │ 2498.95 ± 612.68                                                                  │
│ BallBalance │ 178.43 ± 25.68                                                                    │
│ Anymal      │ 24.04 ± 3.94                                                                      │
│ AllegroHand │ 791.76 ± 237.08                                                                   │
│ ShadowHand  │ 403.67 ± 88.94                                                                    │
└─────────────┴───────────────────────────────────────────────────────────────────────────────────┘
────────────────────────────────────────────────────────────────── Runtime (m) Average ───────────────────────────────────────────────────────────────────
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ Environment                                                                       ┃ Average Runtime    ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ openrlbenchmark/cleanrl/ppo_continuous_action_isaacgym ({'tag': ['rlops-pilot']}) │ 47.035387577890525 │
└───────────────────────────────────────────────────────────────────────────────────┴────────────────────┘
