
Comments (6)

Acmece commented on August 10, 2024

@BingHan0458 Hi,
I did not benchmark the algorithm. The success rate and the extra time should be easy to measure if you run many epochs and count the outcomes. I tested the algorithm by monitoring the average reward: once the average reward converges, the model has converged too. I may dig up the code for this part and post it.

from rl-collision-avoidance.
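The bookkeeping described above can be sketched as follows. This is a hypothetical helper, not code from the repo (which only prints per-episode lines like "Reward ... Reach Goal"): count terminal outcomes over a window of recent episodes for the success rate, and track a moving-average reward to judge convergence.

```python
from collections import deque

class EpisodeStats(object):
    """Track success rate and a moving-average reward over recent episodes.

    Hypothetical helper illustrating the counting described above.
    """

    def __init__(self, window=100):
        self.rewards = deque(maxlen=window)   # recent episode rewards
        self.outcomes = deque(maxlen=window)  # True if the goal was reached

    def record(self, episode_reward, reached_goal):
        self.rewards.append(episode_reward)
        self.outcomes.append(bool(reached_goal))

    def success_rate(self):
        return sum(self.outcomes) / float(len(self.outcomes))

    def avg_reward(self):
        return sum(self.rewards) / float(len(self.rewards))

# Feed it the per-episode results the training script already prints:
stats = EpisodeStats(window=3)
stats.record(35.1, True)    # "Reach Goal" episode
stats.record(-15.0, False)  # "Crashed" episode
stats.record(36.6, True)
print(stats.success_rate())  # fraction of recent episodes that reached the goal
print(stats.avg_reward())    # moving-average reward; flat curve = converged
```

When the moving-average reward stops improving across updates, that is the convergence signal the comment above refers to.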

BingHan0458 commented on August 10, 2024

OK, thank you very much!
And there is another question:
when I first ran your code with rosrun stage_ros_add_pose_and_crash stageros worlds/stage1.world and mpiexec -np 24 python ppo_stage1.py, I got output like the following:

####################################
############Loading Model###########
####################################
/home/.local/lib/python2.7/site-packages/torch/nn/functional.py:1351: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
/home/.local/lib/python2.7/site-packages/torch/nn/functional.py:1340: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
Env 10, Goal (002.8, -08.4), Episode 00001, setp 003, Reward -33.6, Distance 008.3, Crashed
Env 00, Goal (-02.2, -04.6), Episode 00001, setp 003, Reward -15.0, Distance 009.2, Crashed
Env 14, Goal (-02.8, -06.6), Episode 00001, setp 004, Reward -5.2 , Distance 008.8, Crashed
Env 10, Goal (002.9, -02.7), Episode 00002, setp 002, Reward -15.0, Distance 009.6, Crashed
Env 21, Goal (-05.6, 001.1), Episode 00001, setp 005, Reward -26.1, Distance 008.4, Crashed
Env 10, Goal (-04.6, -04.2), Episode 00003, setp 002, Reward -15.0, Distance 010.0, Crashed
Env 20, Goal (004.0, -04.7), Episode 00001, setp 019, Reward 37.4 , Distance 009.5, Reach Goal
Env 08, Goal (001.0, -07.9), Episode 00001, setp 020, Reward -28.6, Distance 008.3, Crashed
Env 20, Goal (-07.6, -01.4), Episode 00002, setp 002, Reward -15.0, Distance 008.5, Crashed
Env 01, Goal (007.2, 003.5), Episode 00001, setp 024, Reward 35.1 , Distance 008.6, Reach Goal
Env 13, Goal (-04.7, -02.9), Episode 00001, setp 026, Reward 35.0 , Distance 008.6, Reach Goal
Env 16, Goal (-02.9, 000.3), Episode 00001, setp 037, Reward -1.0 , Distance 009.3, Crashed
Env 19, Goal (-00.2, 004.2), Episode 00001, setp 048, Reward -12.9, Distance 008.5, Crashed
Env 17, Goal (005.4, 000.6), Episode 00001, setp 056, Reward -7.9 , Distance 009.4, Crashed
Env 07, Goal (-02.6, -03.0), Episode 00001, setp 056, Reward -11.6, Distance 008.5, Crashed
Env 09, Goal (-05.7, 000.3), Episode 00001, setp 065, Reward 38.5 , Distance 009.9, Reach Goal
Env 16, Goal (001.1, -07.3), Episode 00002, setp 031, Reward -13.2, Distance 009.4, Crashed
Env 06, Goal (-02.3, -00.2), Episode 00001, setp 104, Reward 35.1 , Distance 008.5, Reach Goal
Env 11, Goal (-06.4, 000.7), Episode 00001, setp 115, Reward 3.0  , Distance 008.0, Crashed
Env 10, Goal (001.5, -07.2), Episode 00004, setp 111, Reward 35.0 , Distance 008.5, Reach Goal
Env 14, Goal (002.0, -05.6), Episode 00002, setp 113, Reward 37.2 , Distance 009.5, Reach Goal
Env 18, Goal (004.1, -00.8), Episode 00001, setp 117, Reward 35.2 , Distance 008.6, Reach Goal
Env 02, Goal (-01.3, 001.5), Episode 00001, setp 119, Reward 34.5 , Distance 008.4, Reach Goal
Env 01, Goal (008.4, 000.9), Episode 00002, setp 097, Reward 35.6 , Distance 008.9, Reach Goal
Env 22, Goal (-00.2, 001.1), Episode 00001, setp 122, Reward 0.6  , Distance 008.4, Crashed
Env 03, Goal (-04.5, -05.8), Episode 00001, setp 123, Reward -14.7, Distance 008.2, Crashed
Env 23, Goal (002.0, 007.8), Episode 00001, setp 123, Reward 35.7 , Distance 008.9, Reach Goal
Env 01, Goal (003.4, 001.0), Episode 00003, setp 005, Reward -15.0, Distance 008.3, Crashed
Env 18, Goal (-03.2, 004.6), Episode 00002, setp 008, Reward -12.7, Distance 009.3, Crashed
Env 08, Goal (008.0, -02.0), Episode 00002, setp 105, Reward 34.9 , Distance 008.5, Reach Goal
Env 19, Goal (-07.4, 002.9), Episode 00002, setp 078, Reward 3.0  , Distance 008.2, Crashed
Env 01, Goal (003.1, 004.7), Episode 00004, setp 002, Reward -15.0, Distance 009.7, Crashed
Env 00, Goal (000.7, -07.8), Episode 00002, setp 123, Reward 36.5 , Distance 009.2, Reach Goal
Env 21, Goal (000.6, 008.9), Episode 00002, setp 121, Reward 35.3 , Distance 008.6, Reach Goal
Env 04, Goal (008.1, -03.2), Episode 00001, setp 126, Reward 36.6 , Distance 009.2, Reach Goal
Env 08, Goal (006.6, -01.9), Episode 00003, setp 004, Reward -14.5, Distance 009.3, Crashed
update
......

But after that, when I run the same code again, there is no output after ############Loading Model###########.
I am really confused. I don't know why this happens, or how to get it to display the earlier output again, such as Env 10, Goal (002.8, -08.4), Episode 00001, setp 003, Reward -33.6, Distance 008.3, Crashed. I would appreciate it if you could answer my question.
Thank you very much!


BingHan0458 commented on August 10, 2024

The question above was about the trained model. When I change the code so that the program reaches line 192 in ppo_stage1.py, there is no output after ############Start Training###########. Is this the right way to train the model? What should the console output be?


Acmece commented on August 10, 2024

I have tested it and cannot reproduce the problem. Could you please provide more info? Alternatively, you could revert the repo to the original code.


BingHan0458 commented on August 10, 2024

Hello! I tried changing the command from mpiexec -np 44 python ppo_stage1.py to mpiexec -np 22 python ppo_stage1.py, and the output is as follows:

####################################
############Loading Model###########
####################################
/home/.local/lib/python2.7/site-packages/torch/nn/functional.py:1351: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
/home/.local/lib/python2.7/site-packages/torch/nn/functional.py:1340: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
Env 15, Goal (-06.4, 002.8), Episode 00001, setp 017, Reward -9.2 , Distance 009.0, Crashed
Env 10, Goal (-06.1, 005.1), Episode 00001, setp 018, Reward 36.7 , Distance 009.4, Reach Goal
Env 17, Goal (-03.7, 004.6), Episode 00001, setp 020, Reward -17.5, Distance 008.5, Crashed
Env 08, Goal (-05.7, -04.4), Episode 00001, setp 021, Reward -13.5, Distance 009.7, Crashed
Env 09, Goal (003.3, -00.2), Episode 00001, setp 024, Reward -9.4 , Distance 009.3, Crashed
Env 12, Goal (-07.4, 002.7), Episode 00001, setp 026, Reward 37.2 , Distance 009.4, Reach Goal
Env 03, Goal (002.5, 007.1), Episode 00001, setp 029, Reward 36.6 , Distance 009.4, Reach Goal
Env 05, Goal (003.7, 003.9), Episode 00001, setp 039, Reward 35.2 , Distance 008.7, Reach Goal
Env 02, Goal (003.2, 007.6), Episode 00001, setp 042, Reward 33.7 , Distance 008.0, Reach Goal
Env 19, Goal (-01.7, -02.8), Episode 00001, setp 045, Reward 34.0 , Distance 008.3, Reach Goal
Env 14, Goal (000.0, -03.6), Episode 00001, setp 060, Reward 34.6 , Distance 008.5, Reach Goal
Env 00, Goal (008.9, -01.2), Episode 00001, setp 064, Reward 34.9 , Distance 008.5, Reach Goal
......

But if I run mpiexec -np 44 python ppo_stage1.py every time with the unchanged number 44, there is no output after ############Loading Model###########. So I guess the problem is the number in the command, but I don't know why, or how the number should be set. Can I change it to any number?

Thank you !

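The symptom above is consistent with an MPI world-size mismatch: each MPI rank drives one robot in the Stage world, and the update step gathers rollouts from every rank, so launching more ranks than the world has robots can stall silently. A minimal sanity check of the constraint, as a sketch (the robot count of 24 is inferred from the working -np 24 command earlier in the thread, not verified against the .world file):

```python
def check_world_size(mpi_world_size, num_robots_in_world):
    """Return True if this mpiexec -np value can work with the Stage world.

    Assumption: one MPI rank per robot, so -np must be at least 1 and at
    most the number of robot blocks defined in the .world file.
    """
    return 0 < mpi_world_size <= num_robots_in_world

# stage1.world appears to define 24 robots, given that -np 24 worked:
print(check_world_size(24, 24))  # True  -> every rank gets a robot
print(check_world_size(22, 24))  # True  -> fewer ranks than robots is fine here
print(check_world_size(44, 24))  # False -> ranks beyond 24 have no robot to drive
```

In other words, -np is not a free parameter: it must match (or at least not exceed) the robot count in the world file, and any NUM_ENV-style constant in the training script should agree with it as well.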

Balajinatesan commented on August 10, 2024

Env 03, Goal (-07.0, 009.5), Episode 00000, setp 097, Reward 12.6 , Reach Goal,
Env 04, Goal (-12.5, 004.0), Episode 00000, setp 052, Reward -33.4, Crashed,
Env 00, Goal (-18.0, 011.5), Episode 00000, setp 110, Reward 13.0 , Reach Goal,
Env 01, Goal (-18.0, 009.5), Episode 00000, setp 095, Reward 12.9 , Reach Goal,
Env 05, Goal (-12.5, 017.0), Episode 00000, setp 081, Reward -28.1, Crashed,
Env 02, Goal (-07.0, 011.5), Episode 00000, setp 044, Reward -30.0, Crashed,
Traceback (most recent call last):
  File "ppo_stage2.py", line 212, in <module>
    run(comm=comm, env=env, policy=policy, policy_path=policy_path, action_bound=action_bound, optimizer=opt)
  File "ppo_stage2.py", line 120, in run
    obs_size=OBS_SIZE, act_size=ACT_SIZE)
  File "/home/balaji/rover_ws/src/rl-collision-avoidance/model/ppo.py", line 204, in ppo_update_stage2
    obss = obss.reshape((num_step*num_env, frames, obs_size))
ValueError: cannot reshape array of size 1966080 into shape (5632,3,512)
Hi, I got this error when I trained the second stage with mpiexec -np 44 python ppo_stage2.py. Do I need to change anything to train the model?
Also, I have finished training the first model. Can you give me an idea of how to deploy the code on a real robot? If any of you have suggestions, please share them here or by email ([email protected]). That would be a great help.
@Acmece @BingHan0458

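The ValueError above can largely be decoded from its numbers alone. The target shape (5632, 3, 512) is num_step*num_env = 5632 rows of 3 stacked frames of 512 laser beams, and 5632 = 128 x 44, matching the -np 44 launch. The actual buffer, however, holds 1966080 = 128 x 10 x 3 x 512 elements, i.e. rollouts from only 10 environments. So the update expects 44 ranks' worth of data but received 10, which again points at a mismatch between -np and the number of robots the stage2 world actually provides. A sketch of that arithmetic (the horizon of 128 and the per-scan size of 512 are inferences from the traceback, not values read from the script):

```python
# Decode the reshape error by factoring the reported sizes.
# Assumptions: 128-step horizon, 44 requested envs (-np 44),
# 3 stacked laser frames, 512 beams per scan.
num_step, num_env, frames, obs_size = 128, 44, 3, 512

expected = num_step * num_env * frames * obs_size  # elements the reshape wants
actual = 1966080                                   # size from the ValueError

print(expected)  # -> 8650752
implied_env = actual // (num_step * frames * obs_size)
print(implied_env)  # -> 10, the number of envs that actually produced data
```

If the implied environment count divides out cleanly like this, the fix is usually to make -np match the robot count of the world file (or vice versa) rather than to touch the reshape itself.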
