
ars's People

Contributors

aurelia-guy, hmania


ars's Issues

AR

Hello, thank you for this great code; I've been using it in my research. Is there any code for Basic Random Search (BRS) on its own?
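For reference, BRS is the simplest method described in the ARS paper. It omits the reward standard-deviation scaling, state normalization, and top-direction selection that ARS adds. Each iteration samples N Gaussian directions δ_k, evaluates the reward at θ + νδ_k and θ − νδ_k, and updates θ ← θ + (α/N) Σ_k [r(θ + νδ_k) − r(θ − νδ_k)] δ_k. The sketch below implements that update under stated assumptions: a generic reward_fn(theta) stands in for environment rollouts, and basic_random_search and its parameter values are hypothetical. The values are tuned only for the toy objective and are not taken from the repository.

```python
import numpy as np

def basic_random_search(reward_fn, theta0, step_size=0.5, num_directions=8,
                        noise=0.05, num_iters=200, seed=0):
    """Hypothetical BRS sketch; reward_fn(theta) replaces environment rollouts."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(num_iters):
        # One standard-normal perturbation per direction, same shape as theta.
        deltas = rng.standard_normal((num_directions,) + theta.shape)
        r_plus = np.array([reward_fn(theta + noise * d) for d in deltas])
        r_minus = np.array([reward_fn(theta - noise * d) for d in deltas])
        # Average of (r+ - r-) * delta_k over the directions (no std scaling in BRS).
        step = np.tensordot(r_plus - r_minus, deltas, axes=1) / num_directions
        theta = theta + step_size * step
    return theta

# Toy objective: reward is highest at `target`.
target = np.array([1.0, -2.0])
theta = basic_random_search(lambda th: -np.sum((th - target) ** 2), np.zeros(2))
print(theta)  # should approach [1.0, -2.0]
```

Scaling the step by the standard deviation of the collected rewards, as in the divide-by-zero snippet reported further down, is the first of the augmentations that turns this into ARS.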

Divide by zero

Hi,
First and foremost, thanks for sharing the code. This is greatly appreciated.

I am currently testing ARS in other learning environments and found that, in very difficult environments, users of the code may run into a divide-by-zero problem. This happens particularly early in learning, e.g., when all the initial rollouts return zero reward and the rewards therefore have zero standard deviation:

# normalize rewards by their standard deviation
rollout_rewards /= np.std(rollout_rewards)

Thanks,
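One possible guard, sketched under the assumption that rollout_rewards is a floating-point NumPy array as in the snippet above. When every reward is the same, every r+ − r− difference is zero, so the update is zero regardless of scaling. Skipping the division in that case changes nothing else. normalize_rewards and the eps threshold are hypothetical, not part of the repository.

```python
import numpy as np

def normalize_rewards(rollout_rewards, eps=1e-8):
    """Divide by the reward std unless it is ~0 (hypothetical helper)."""
    rollout_rewards = np.asarray(rollout_rewards, dtype=float)
    std = np.std(rollout_rewards)
    if std < eps:
        # All rewards (nearly) identical: the r+ - r- differences are ~0 and the
        # update is ~0 anyway, so skip the scaling instead of dividing by ~0.
        return rollout_rewards.copy()
    return rollout_rewards / std

print(normalize_rewards(np.zeros((4, 2))))  # unchanged zeros, no warning
```

Dividing by np.std(rollout_rewards) + eps is another common choice. It avoids the branch but slightly shrinks every update.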

[Question] What is the purpose of doing rollouts twice?

(If there is a better forum to ask clarification questions, please let me know!)

In aggregate_rollouts in code/ars.py, I see that the code does the following:

        rollout_ids_one = [worker.do_rollouts.remote(policy_id,
                                                 num_rollouts = num_rollouts,
                                                 shift = self.shift,
                                                 evaluate=evaluate) for worker in self.workers]

        rollout_ids_two = [worker.do_rollouts.remote(policy_id,
                                                 num_rollouts = 1,
                                                 shift = self.shift,
                                                 evaluate=evaluate) for worker in self.workers[:(num_deltas % self.num_workers)]]

What is the purpose of doing the rollouts twice? One call does num_rollouts rollouts on every worker, and the other does a single rollout on each of the first num_deltas % self.num_workers workers.
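A likely reading, assuming num_rollouts in the first call is num_deltas // self.num_workers (that assignment is not shown in the excerpt): the two calls together spread num_deltas rollouts across the workers as evenly as possible. Every worker gets the integer share. The remainder, num_deltas % self.num_workers, is handed out one extra rollout at a time to the first workers, which is exactly the slice self.workers[:(num_deltas % self.num_workers)]. The hypothetical helper below reproduces that split so the arithmetic can be checked.

```python
def split_rollouts(num_deltas, num_workers):
    """Per-worker rollout counts implied by the two calls (hypothetical helper)."""
    num_rollouts = num_deltas // num_workers      # first call: every worker
    counts = [num_rollouts] * num_workers
    for i in range(num_deltas % num_workers):     # second call: self.workers[:remainder]
        counts[i] += 1
    return counts

print(split_rollouts(10, 4))  # [3, 3, 2, 2]
```

Without the second call, 10 directions on 4 workers would yield only 8 rollouts.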

Discrete Action

Can you explain, or just give me some pointers: what should I do if I want the agent to output discrete actions? Thank you.
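ARS trains linear policies whose outputs are continuous. One common way to get discrete actions is to give the weight matrix one row per discrete action and pick the action with the highest score. As far as this thread shows, the repository does not provide this. ARS only evaluates episode rewards and never differentiates the policy, so the non-differentiable argmax does not get in the way of the update. discrete_action is a hypothetical helper, and W stands in for the policy's weight matrix.

```python
import numpy as np

def discrete_action(W, obs):
    """Pick a discrete action from a linear policy (hypothetical helper).

    W has one row per discrete action (num_actions x obs_dim); the chosen action
    is the index of the largest score in W @ obs.
    """
    scores = W @ np.asarray(obs, dtype=float)
    return int(np.argmax(scores))

W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
print(discrete_action(W, [0.2, 0.5]))  # scores ~[0.2, 0.5, -0.7] -> action 1
```

The integer returned can then be passed to an environment with a discrete action space instead of the continuous action vector.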

Variance is not computed as in John D. Cook's blog post

In the aforementioned blog post, one step in computing the variance looks like this:

[image: variance update step from the blog post]

However, in this project, the corresponding step looks like this:

[image: corresponding update step in this project]

Since this

[image: mathematical formula corresponding to the computation]

is the mathematical equation this computation implements, the blog post's implementation appears to follow the mathematical formulation while this project's does not. Or am I missing something?

Thanks!
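For reference, the running update in John D. Cook's post (Welford's method) is M_k = M_{k-1} + (x_k − M_{k-1})/k and S_k = S_{k-1} + (x_k − M_{k-1})(x_k − M_k), with sample variance S_k/(k−1). Since x_k − M_k = (x_k − M_{k-1})(k−1)/k, the S update can also be written as S_k = S_{k-1} + (x_k − M_{k-1})²(k−1)/k. An implementation using this second form computes the same variance. The text here does not show whether that is the form in the project's screenshot. The sketch compares both forms against NumPy; the function names are hypothetical.

```python
import numpy as np

def welford_cook(xs):
    """Running mean and sample variance using the update from Cook's post."""
    n, mean, s = 0, 0.0, 0.0
    for x in xs:
        n += 1
        old_mean = mean
        mean = old_mean + (x - old_mean) / n
        # S_k = S_{k-1} + (x_k - M_{k-1}) * (x_k - M_k)
        s += (x - old_mean) * (x - mean)
    return mean, (s / (n - 1) if n > 1 else 0.0)

def welford_delta_squared(xs):
    """Same statistics via the algebraically equivalent delta^2 * (n-1)/n form."""
    n, mean, s = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean              # x_k - M_{k-1}
        mean += delta / n
        # (x_k - M_k) = delta * (n-1)/n, so the product equals delta^2 * (n-1)/n
        s += delta * delta * (n - 1) / n
    return mean, (s / (n - 1) if n > 1 else 0.0)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(welford_cook(data))
print(welford_delta_squared(data))
print(np.mean(data), np.var(data, ddof=1))
```

All three lines agree up to floating-point rounding, so the two update forms differ only in how the product is written.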

Cannot run the expert policy

Hi,
I ran the expert policy with python run_policy.py ../trained_policies/Humanoid-v1/policy_reward_11600/lin_policy_plus.npz Humanoid-v1 --render --num_rollouts 20, and it fails within the first few lines.

I got this error:

loading and building expert policy
Traceback (most recent call last):
  File "run_policy.py", line 62, in <module>
    main()
  File "run_policy.py", line 23, in main
    lin_policy = lin_policy.items()[0][1]
TypeError: 'ItemsView' object does not support indexing

Could anyone tell me how to fix this?
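In Python 3, .items() on a dict-like object returns a view that cannot be indexed, so the Python 2 idiom lin_policy.items()[0][1] fails. The smallest change is lin_policy = list(lin_policy.items())[0][1]. The sketch below instead reads the first stored array by name. It rests on two assumptions. First, allow_pickle=True is needed only if the saved array has dtype=object; recent NumPy refuses to load such arrays by default. Second, the unpacking assumes the file stores the weight matrix followed by the observation mean and standard deviation. load_lin_policy is a hypothetical helper.

```python
import numpy as np

def load_lin_policy(path):
    """Load weights and normalization stats from an ARS .npz file (hypothetical helper)."""
    # allow_pickle=True is needed only if the stored array has dtype=object.
    with np.load(path, allow_pickle=True) as data:
        # Python 3 replacement for data.items()[0][1]: take the first stored array.
        lin_policy = data[data.files[0]]
    # Assumed layout: [weight matrix M, observation mean, observation std].
    M, mean, std = lin_policy[0], lin_policy[1], lin_policy[2]
    return M, mean, std
```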

Training on BipedalWalkerHardcore seems to result in a negative reward

Hi, and thanks for sharing the code.
I've tried running training on a different environment, BipedalWalkerHardcore-v2, but it does not seem to learn anything. I also tried different shift values, as suggested in the code comments, but I still end up with a negative reward. Should we train for longer, or are there hyperparameters we are missing?

License - trained policies

Hi,

We would like to use the trained policy data in Nevergrad to benchmark MuJoCo environments.
We have therefore added the ARS license to the relevant folder and linked your repository in our code; see PR: facebookresearch/nevergrad#790
To avoid any licensing issues, could you let us know whether this is fine with you?

Thank you!

Switch to MuJoCo Pro 1.5

Hello,
I have MuJoCo 150, and when I run the ARS.py file I get this error:

Please put your binaries into ~/.mujoco/mjpro131 or set MUJOCO_PY_MJPRO_PATH. Follow the instructions on https://github.com/openai/mujoco-py for setup.')
mujoco_py.error.MujocoDependencyError: Found your MuJoCo license key but not binaries. Please put your binaries into ~/.mujoco/mjpro131 or set MUJOCO_PY_MJPRO_PATH. Follow the instructions on https://github.com/openai/mujoco-py for setup

Cannot reproduce results in the ARS paper

Dear authors,
For HalfCheetah-v1 alone, I used multiple seeds to try to reproduce the experimental results in Tables 1 and 2 of the paper, running:
python code/ars.py
but the rewards come out negative. Could you give some guidance?
Thanks so much!

About SHIFT

I don't understand why we need to subtract a shift from the reward, or how this value should be set.
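As the question describes it, the shift is subtracted from the reward. Assuming it is subtracted at every step of a rollout (check the rollout code to confirm), a trajectory of length T loses shift * T from its return. This penalizes long episodes that earn little beyond a constant per-step bonus. The numbers below are hypothetical. One agent stands still for 1000 steps and collects only a 1.0 per-step alive bonus. The other moves, earning 1.5 per step, but falls after 200 steps. shifted_return is a hypothetical helper.

```python
def shifted_return(step_rewards, shift):
    """Episode return when `shift` is subtracted from every per-step reward."""
    # Over T steps this equals sum(step_rewards) - shift * T.
    return sum(r - shift for r in step_rewards)

# Hypothetical trajectories (made-up numbers for illustration only).
standing = [1.0] * 1000   # only the alive bonus, never falls
moving = [1.5] * 200      # alive bonus plus progress, falls at step 200
print(shifted_return(standing, 0.0), shifted_return(moving, 0.0))  # 1000.0 300.0
print(shifted_return(standing, 1.0), shifted_return(moving, 1.0))  # 0.0 100.0
```

Without a shift, standing still scores higher. With a shift equal to the alive bonus, only actual progress is rewarded. This suggests choosing a shift close to any constant per-step bonus the environment pays, and 0 when there is none.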
