Comments (8)
me too
from rsl_rl.
Can confirm I've experienced it too. In my case, I had introduced some sparse rewards to my environment, though I'm not sure that's the cause.
Same problem here. When visualizing the training data in TensorBoard, I notice that Loss/value_function suddenly goes to infinity.
Same problem
When facing the error std >= 0, check the 'Value Function Loss' output to see whether it is inf. If it is, there is a fix you can try. Based on issue ray-project/ray#19291, its fix ray-project/ray#22171, and commit ray-project/ray@ddd1160, the code starting at L159 in the ppo.py file of rsl_rl (version 2.0.2) needs to be modified as follows:
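The modified code itself did not survive the scrape, but the referenced RLlib commit clamps the element-wise squared value error to a fixed cap before averaging. A minimal sketch of that idea, assuming rsl_rl's usual `value_batch` / `returns_batch` tensors; `vf_clip_param` and the function wrapper are my own naming, not part of stock rsl_rl:

```python
import torch

def clipped_value_loss(value_batch, returns_batch, vf_clip_param=10.0):
    """Sketch of an RLlib-style value-loss clamp (cf. ray-project/ray@ddd1160).

    vf_clip_param is a hypothetical hyperparameter here: each element of
    the squared error is capped, so a few exploding returns cannot push
    the mean loss to inf.
    """
    vf_loss = (value_batch - returns_batch).pow(2)
    # Cap every element before taking the mean
    vf_loss_clipped = torch.clamp(vf_loss, 0.0, vf_clip_param)
    return vf_loss_clipped.mean()
```

The clamp trades gradient signal on outliers for numerical stability, which is consistent with the slower learning reported below.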
Thanks for your answer, I'll try it out
Note that this kind of method may not work, and it may slow down learning. I tested it with 'iteration: 30000' and 'num_envs: 12000 to 30000' while training my own robot, and the training process randomly failed between 1,000 and 18,000 iterations. I checked the 'value batch' and 'return batch': once training failed, both contained very large positive or negative numbers. I ultimately completed the entire training process by modifying the rewards and penalties. Since I'm still new to RL, I don't know exactly what happened. I also tried modifying the PPO hyperparameters and the network architecture, but that didn't work. I would greatly appreciate it if someone could provide some information on this topic.
There is a crude workaround to keep training going: when std >= 0 occurs and the Value Function Loss shows inf, first adjust some parameters in the project, then use --resume to load the checkpoint and continue training.
Adding actions = torch.clip(actions, min=-6.28, max=6.28) before env.step(actions) seems to help. It is also better to add a penalty on actions to prevent the actor from outputting excessively large values.