lorenmt / minimal-isaac-gym


A Minimal Example of Isaac Gym with DQN and PPO.

Python 100.00%

dqn isaac-gym ppo pytorch

minimal-isaac-gym's Issues

Possible mistake in DQN implementation

Hi Shikun & Marwan, thank you for releasing this repo!

I'm using the DQN implementation in my custom environment. I have noticed that run() resets the environment at every step (located in this line of code), and this causes a problem: the agent can become stuck in the initial scene of the environment.

As a further consequence, the evaluation is not meaningful: because of the reset, each evaluation episode lasts only one step.

I'm running the code headlessly (both CartPole and my own environment), so I'm not certain that this applies to CartPole, but the bug definitely occurs in my custom environment.
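The difference between resetting every step and resetting only finished episodes can be shown with a toy example. This is a minimal sketch, not the repository's env.py or dqn.py: `ToyVecEnv`, `reset_done`, `reset_all` and `rollout` are hypothetical names, and the only thing borrowed from the repo is the idea of a per-environment `reset_buf` flag. Whether the repo's own reset() is conditional or unconditional is not checked here.

```python
class ToyVecEnv:
    """Hypothetical stand-in for a vectorised environment (not the repo's env.py)."""

    def __init__(self, num_envs, max_episode_length):
        self.max_episode_length = max_episode_length
        self.progress = [0] * num_envs   # steps taken in the current episode
        self.reset_buf = [0] * num_envs  # 1 where an episode has just ended

    def step(self):
        # Advance every env by one step and flag the ones that hit the limit.
        for i in range(len(self.progress)):
            self.progress[i] += 1
            self.reset_buf[i] = int(self.progress[i] >= self.max_episode_length)

    def reset_done(self):
        # Reset only the envs flagged in reset_buf.
        for i, done in enumerate(self.reset_buf):
            if done:
                self.progress[i] = 0
                self.reset_buf[i] = 0

    def reset_all(self):
        # Unconditional reset: every env goes back to its initial state.
        self.progress = [0] * len(self.progress)
        self.reset_buf = [0] * len(self.reset_buf)


def rollout(env, steps, unconditional):
    """Return the furthest point any env reached within its episode."""
    furthest = 0
    for _ in range(steps):
        env.step()
        furthest = max(furthest, max(env.progress))
        if unconditional:
            env.reset_all()
        else:
            env.reset_done()
    return furthest


stuck = rollout(ToyVecEnv(2, 5), steps=10, unconditional=True)
free = rollout(ToyVecEnv(2, 5), steps=10, unconditional=False)
print(stuck, free)
```

With an unconditional reset, no environment ever gets past its first step, which matches the "stuck in the initial scene" symptom; with a conditional reset, episodes run to their full length.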

Broadcast error (`cannot broadcast to a lower rank tensor`)

I have a working install of IsaacGym, IsaacGymEnvs and rl-games, and I followed the steps in this repository's README to run it. However, I get the following error:

Traceback (most recent call last):
  File "trainer.py", line 35, in <module>
    policy.run()
  File "/home/theo/Documents/minimal-isaac-gym/dqn.py", line 105, in run
    self.env.step(action)
  File "/home/theo/Documents/minimal-isaac-gym/env.py", line 199, in step
    self.get_reward()
  File "/home/theo/Documents/minimal-isaac-gym/env.py", line 139, in get_reward
    self.max_episode_length)
RuntimeError: MALFORMED INPUT: Cannot broadcast to a lower rank tensor

Does anyone know whether this is a common issue?

Possible mistake in ppo.py

Hi Shikun & Marwan, thank you for releasing this repo!

My conclusion: I think the code on line 103 of ppo.py produces the wrong result. My solution is to remove the call to reversed.

To explain, I write a trajectory as s1, s2, ..., st.

In ppo.py run():

self.env.step(action)
next_obs, reward, done = self.env.obs_buf.clone(), self.env.reward_buf.clone(), self.env.reset_buf.clone()
self.env.reset()

self.data.append((obs, action, reward, next_obs, log_prob, 1 - done))

Therefore, the entries in self.data are in chronological order: s1, s2, ..., st.

In ppo.py make_data():

Each call to self.data.pop() returns the obs, action and reward from the last position of the trajectory.

So obs_lst is ordered st, st-1, ..., sk (k = t - self.mini_chunk_size + 1).

After executing the code on line 92, obs is ordered the same way: st, st-1, ..., sk.

The variable delta therefore follows the same order: delta_t, delta_t-1, ..., delta_k.

One calculation method of GAE is from back to front (the same as in the code):
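The backward recursion referred to here is usually written as follows. This is the standard textbook form of GAE, not a transcription of ppo.py; the symbols $\gamma$ (discount), $\lambda$ (GAE parameter) and $V$ (value estimate) are assumed notation, and the done mask that the code stores as 1 - done is left out for brevity.

```latex
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t),
\qquad
\hat{A}_t = \delta_t + \gamma \lambda \, \hat{A}_{t+1},
\qquad
\hat{A}_{t_{\text{last}}+1} = 0 .
```

The recursion starts at the last time step of the chunk and works back towards the first, so the accumulator must meet the deltas in the order delta_t, delta_t-1, ..., delta_k.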

But the code on line 102 reverses delta, so reversed(delta) is ordered delta_k, delta_k+1, ..., delta_t.

So I think the code on line 103 produces the wrong result. My solution is to remove the call to reversed.

However, I found that removing reversed has little effect on the training results.

[Two training-curve images from the original issue, not recovered here: the original version with reversed, and the version without reversed.]
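Returning to the ordering argument in this issue, a toy calculation shows why the order in which the deltas are accumulated matters. This is a minimal sketch under stated assumptions, not the code from ppo.py: `gae_backward` is a hypothetical helper, the deltas are made-up numbers, and the done mask is ignored.

```python
def gae_backward(deltas, gamma=0.99, lam=0.95):
    """GAE advantages for deltas given in chronological order (oldest first).

    Hypothetical helper for illustration; applies
    A_t = delta_t + gamma * lam * A_{t+1}, starting from the last step.
    """
    advantages = [0.0] * len(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):  # last time step first
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages


# Made-up TD errors in chronological order: only the final step is non-zero.
chronological = [0.0, 0.0, 1.0]
correct = gae_backward(chronological, gamma=0.5, lam=1.0)

# Same deltas, but handed over newest-first as if they were chronological.
newest_first = chronological[::-1]
mixed_up = gae_backward(newest_first, gamma=0.5, lam=1.0)[::-1]

print(correct)   # advantage of the final step propagates back to earlier steps
print(mixed_up)  # nothing propagates: the accumulation ran the wrong way in time
```

In the first call the final delta is discounted back onto the earlier steps; in the second, the earlier steps receive nothing, because the accumulation walked forward in time instead of backward. On real data the two results can be close when gamma * lam is small or the deltas are similar in size, which may be one reason the training curves look alike, though that is speculation rather than something verified against the repo.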
