modestyachts / ars


An implementation of the Augmented Random Search algorithm

License: Other

Python 100.00%


ars's Issues

assert len(plasma_managers) >= 1 AssertionError

Does anyone know how to solve this problem?

AR

Hello, thank you for this great code. I've been using it for my research. Is there code for Basic Random Search only?

Divide by zero

Hi,
First and foremost, thanks for sharing the code. This is greatly appreciated.

I am currently testing ARS in other learning environments and found that, in very difficult environments, users of the code might hit a divide-by-zero error, particularly in the early stages of learning (i.e., zero reward in all the initial rollouts). The problem is in this step:

# normalize rewards by their standard deviation
rollout_rewards /= np.std(rollout_rewards)

Thanks,
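
One possible guard, shown as a minimal sketch rather than the repository's own fix: when every rollout returns the same reward, `np.std` is zero and the division produces NaN values (NumPy emits a warning instead of raising). The helper name `normalize_rewards` and the threshold `eps` are assumptions made for illustration.

```python
import numpy as np


def normalize_rewards(rollout_rewards, eps=1e-8):
    """Divide rewards by their standard deviation, unless it is (near) zero.

    `normalize_rewards` and `eps` are hypothetical names, not part of ars.py.
    """
    rollout_rewards = np.asarray(rollout_rewards, dtype=np.float64)
    std = np.std(rollout_rewards)
    if std < eps:
        # All rollouts scored (almost) the same, so there is nothing to scale.
        return rollout_rewards.copy()
    return rollout_rewards / std
```

Returning the rewards unchanged in the degenerate case means the update step sees identical rewards for both directions of every perturbation, which may be acceptable early in training; skipping the update entirely is another option.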

[Question] What is the purpose of doing rollouts twice?

(If there is a better forum to ask clarification questions, please let me know!)

In aggregate_rollouts in code/ars.py, I see that the code does

        rollout_ids_one = [worker.do_rollouts.remote(policy_id,
                                                 num_rollouts = num_rollouts,
                                                 shift = self.shift,
                                                 evaluate=evaluate) for worker in self.workers]

        rollout_ids_two = [worker.do_rollouts.remote(policy_id,
                                                 num_rollouts = 1,
                                                 shift = self.shift,
                                                 evaluate=evaluate) for worker in self.workers[:(num_deltas % self.num_workers)]]

What is the purpose of doing the rollouts twice, with one doing num_rollouts rollouts per worker and the other doing 1 rollout per worker for num_deltas % self.num_workers workers?
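
A likely reading, assuming `num_rollouts` is the integer quotient of `num_deltas` and `self.num_workers` (that line is not part of the quoted snippet): the first call gives every worker an equal share, and the second call hands the remainder out one rollout at a time so that exactly `num_deltas` directions are evaluated. The helper `split_rollouts` below is hypothetical and only illustrates the arithmetic.

```python
def split_rollouts(num_deltas, num_workers):
    """Return how many rollouts each worker would run under the two-pass scheme.

    Hypothetical helper; it mirrors the arithmetic, not the Ray calls.
    """
    base = num_deltas // num_workers    # first pass: equal share per worker
    extra = num_deltas % num_workers    # second pass: one more for the first `extra` workers
    return [base + (1 if i < extra else 0) for i in range(num_workers)]


counts = split_rollouts(10, 4)
print(counts, sum(counts))  # [3, 3, 2, 2] 10
```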

[Errno 8] nodename nor servname provided, or not known

How can I solve this error?

  File "ars.py", line 409, in <module>
    local_ip = socket.gethostbyname(socket.gethostname())
socket.gaierror: [Errno 8] nodename nor servname provided, or not known
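
This error usually means the machine's own hostname does not resolve to an address. A minimal workaround sketch, assuming a single-machine run where the loopback address is acceptable (the helper name `resolve_local_ip` and the fallback value are illustrative, not part of ars.py):

```python
import socket


def resolve_local_ip(fallback="127.0.0.1"):
    """Resolve this machine's IP, falling back to loopback if the hostname is unknown."""
    try:
        return socket.gethostbyname(socket.gethostname())
    except socket.gaierror:
        # The hostname has no DNS or hosts-file entry; assume a local-only run.
        return fallback


local_ip = resolve_local_ip()
```

Adding the hostname to the system's hosts file is an alternative that leaves the script untouched.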

Discrete Action

Could you explain this, or give me some pointers? What should I do if I want the agent to output discrete actions? Thank you.
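
One common approach, sketched here as an assumption rather than something the repository provides: keep the linear policy, let it produce one score per discrete action, and act with the index of the highest score. The function name `discrete_action` and the matrix shape are illustrative.

```python
import numpy as np


def discrete_action(M, obs):
    """Pick a discrete action from a linear policy.

    M has shape (num_actions, obs_dim); this layout is an assumption for the sketch.
    """
    scores = M @ obs               # one score per discrete action
    return int(np.argmax(scores))  # act greedily on the scores


M = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
obs = np.array([0.2, 0.9])
print(discrete_action(M, obs))  # 1
```

Because random search only needs the total reward of a rollout, the non-differentiable `argmax` does not get in the way of the update.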

Variance is not computed like in blog post by John D. Cook

In the aforementioned blog post, one step in computing the variance looks like this:

However, in this project, the corresponding step looks like this:

Since this (equation image not preserved) is the mathematical equation corresponding to this computation, it looks like the blog post's implementation agrees with the mathematical formulation while this project's implementation does not. Or am I missing something?

Thanks!
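
The screenshots in this issue are not preserved, so the following is an assumption about which two forms were being compared. Welford's running-variance update is often written as S += (x - M_old) * (x - M_new), and an equivalent form is S += delta^2 * (n - 1) / n with delta = x - M_old. They agree because x - M_new = delta - delta / n = delta * (n - 1) / n. A minimal sketch with hypothetical function names:

```python
def push_two_means(n, mean, s, x):
    """Update using both the old and the new mean."""
    n += 1
    new_mean = mean + (x - mean) / n
    s += (x - mean) * (x - new_mean)
    return n, new_mean, s


def push_delta(n, mean, s, x):
    """Update using only the deviation from the old mean."""
    n += 1
    delta = x - mean
    new_mean = mean + delta / n
    s += delta * delta * (n - 1) / n
    return n, new_mean, s


def running_variance(push, data):
    """Sample variance of `data` computed with the given update rule."""
    n, mean, s = 0, 0.0, 0.0
    for x in data:
        n, mean, s = push(n, mean, s, x)
    return s / (n - 1)
```

If the two screenshots showed these two forms, the implementations differ only in notation and should agree up to floating-point rounding.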

can not run the expert policy

Hi,
I ran the expert policy with python run_policy.py ../trained_policies/Humanoid-v1/policy_reward_11600/lin_policy_plus.npz Humanoid-v1 --render --num_rollouts 20, but it got stuck at the first few lines.

I got the following error:

loading and building expert policy
Traceback (most recent call last):
  File "run_policy.py", line 62, in <module>
    main()
  File "run_policy.py", line 23, in main
    lin_policy = lin_policy.items()[0][1]
TypeError: 'ItemsView' object does not support indexing

Could anyone tell me how to fix this?
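
The traceback points to a Python 2 idiom: in Python 3, `.items()` returns a view that cannot be indexed. Wrapping it in `list(...)` restores the old behaviour. The sketch below reproduces the pattern on a small in-memory archive; the array contents are made up for illustration and are not the real policy file.

```python
import io

import numpy as np

# Build a tiny .npz archive in memory to stand in for lin_policy_plus.npz.
buffer = io.BytesIO()
np.savez(buffer, np.arange(6.0).reshape(2, 3))
buffer.seek(0)

lin_policy = np.load(buffer)

# Python 3 compatible replacement for `lin_policy.items()[0][1]`.
weights = list(lin_policy.items())[0][1]
print(weights.shape)  # (2, 3)
```

An equivalent lookup is `lin_policy[lin_policy.files[0]]`. Depending on the NumPy version and on how the policy file was saved, `np.load` may also need `allow_pickle=True`; that part is an assumption, since the file contents are not shown here.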

Training on BipedalWalkerHardcore seems to result in a negative reward

Hi and thanks for sharing the code.
I've tried to run the training process on a different environment, BipedalWalkerHardcore-v2, but it does not seem able to learn anything. I also tried different shift values, as noted in the code comments, but I still end up with a negative reward. Should we train for longer, or are there any hyperparameters that we are missing?

License - trained policies

Hi,

We want to use the trained policy data in Nevergrad for benchmarking MuJoCo environments.
We have therefore added the ARS license to the relevant folder and linked your repository in our code; see PR: facebookresearch/nevergrad#790
To avoid any license issue, could you let us know whether this is fine with you?

Thank you!

Switch to MuJoCo Pro 1.5

Hello,
I have MuJoCo 150, and when I run the ars.py file I get this error:

Please put your binaries into ~/.mujoco/mjpro131 or set MUJOCO_PY_MJPRO_PATH. Follow the instructions on https://github.com/openai/mujoco-py for setup.')
mujoco_py.error.MujocoDependencyError: Found your MuJoCo license key but not binaries. Please put your binaries into ~/.mujoco/mjpro131 or set MUJOCO_PY_MJPRO_PATH. Follow the instructions on https://github.com/openai/mujoco-py for setup

cannot reproduce result in ARS paper

Dear authors,
For HalfCheetah-v1 only, I used multiple seeds to try to reproduce the experimental results in Table 1 and Table 2 of the paper:
python code/ars.py
but the rewards appear to be negative. Could you give some guidance?
Thanks so much!

About SHIFT

I don't understand why we need to subtract a shift from the reward. How should this value be set?
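
A possible explanation, offered as an assumption about the intent: some locomotion environments pay a constant per-step bonus for staying alive, so a policy that merely stands still can outscore one that moves and then falls. Subtracting a shift equal to that bonus from every step's reward removes the incentive. The function `rollout_return` and the reward values below are made up for illustration.

```python
def rollout_return(step_rewards, shift=0.0):
    """Total reward of a rollout after subtracting `shift` from each step's reward."""
    return sum(r - shift for r in step_rewards)


# Hypothetical environment with a survival bonus of 1.0 per step.
standing_still = [1.0] * 100   # never moves, never falls
moving_forward = [1.5] * 20    # makes progress, then falls after 20 steps

print(rollout_return(standing_still), rollout_return(moving_forward))
# 100.0 30.0  -> standing still looks better
print(rollout_return(standing_still, shift=1.0), rollout_return(moving_forward, shift=1.0))
# 0.0 10.0    -> moving forward looks better
```

Under this reading, a reasonable starting value for the shift is the environment's per-step survival bonus, and 0 for environments that have none.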