
estool's People

Contributors

bhartl, dependabot[bot], erwincoumans, hardmaru, keithmgould, maraoz, mmilk1231, slremy, zuoxingdong


estool's Issues

bug in train.py

  1. Run python train.py robo_ant -n 8 -t 4
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 9 slots
that were requested by the application:
  /Users/wsgdrfz/anaconda2/bin/python

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
Traceback (most recent call last):
  File "train.py", line 440, in <module>
    if "parent" == mpi_fork(args.num_worker+1): os.exit()
  File "train.py", line 414, in mpi_fork
    subprocess.check_call(["mpirun", "-np", str(n), sys.executable] +['-u']+ sys.argv, env=env)
  File "/Users/wsgdrfz/anaconda2/lib/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['mpirun', '-np', '9', '/Users/wsgdrfz/anaconda2/bin/python', '-u', 'train.py', 'bullet_racecar', '-n', '8', '-t', '4']' returned non-zero exit status 1

wsgdrfz@wsgdrfz-MBP /Users/wsgdrfz/Downloads/estool-master                                                                                           
⚡ python train.py robo_ant -n 8 -t 4
pybullet build time: Feb  9 2018 16:40:49
current_dir=/Users/wsgdrfz/anaconda2/lib/python2.7/site-packages/pybullet_envs/bullet
current_dir=/Users/wsgdrfz/anaconda2/lib/python2.7/site-packages/pybullet_envs/bullet
['mpirun', '-np', '9', '/Users/wsgdrfz/anaconda2/bin/python', 'train.py', 'robo_ant', '-n', '8', '-t', '4']
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 9 slots
that were requested by the application:
  /Users/wsgdrfz/anaconda2/bin/python

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
Traceback (most recent call last):
  File "train.py", line 440, in <module>
    if "parent" == mpi_fork(args.num_worker+1): os.exit()
  File "train.py", line 414, in mpi_fork
    subprocess.check_call(["mpirun", "-np", str(n), sys.executable] +['-u']+ sys.argv, env=env)
  File "/Users/wsgdrfz/anaconda2/lib/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['mpirun', '-np', '9', '/Users/wsgdrfz/anaconda2/bin/python', '-u', 'train.py', 'robo_ant', '-n', '8', '-t', '4']' returned non-zero exit status 1

  2. If I instead skip the line if "parent" == mpi_fork(args.num_worker+1): os.exit() and run without the MPI relaunch:

pybullet build time: Apr 16 2018 07:12:17
current_dir=/usr/local/lib/python3.6/site-packages/pybullet_envs/bullet
current_dir=/usr/local/lib/python3.6/site-packages/pybullet_envs/bullet
size of model 10176
(16_w,32)-aCMA-ES (mu_w=9.2,w_1=19%) in dimension 10176 (seed=911961, Mon Apr 16 08:34:09 2018)
('process', 0, 'out of total ', 1, 'started')
('training', 'robo_ant')
('population', 32)
('num_worker', 8)
('num_worker_trial', 4)
Traceback (most recent call last):
  File "train.py", line 441, in <module>
    main(args)
  File "train.py", line 395, in main
    master()
  File "train.py", line 315, in master
    reward_list_total = receive_packets_from_slaves()
  File "train.py", line 243, in receive_packets_from_slaves
    comm.Recv(result_packet, source=i)
  File "mpi4py/MPI/Comm.pyx", line 285, in mpi4py.MPI.Comm.Recv
mpi4py.MPI.Exception: MPI_ERR_RANK: invalid rank

My env:
python3: 3.6.5
mpi4py: 3.0.0
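
For what it's worth, the "not enough slots" message is Open MPI refusing to launch more ranks than it detects cores (newer Open MPI also accepts an --oversubscribe flag). A minimal sketch of the relaunch pattern visible in the traceback, with a hypothetical cap on the rank count, might look like this (mpi_fork here is reconstructed from the trace, not copied from the repo):

import os
import subprocess
import sys

def mpi_fork(n):
    # Relaunch this same script under mpirun with n ranks, the pattern
    # visible in the traceback above. Capping n at the CPU count is a
    # hypothetical guard against Open MPI's "not enough slots" refusal.
    if n <= 1:
        return "child"
    n = min(n, os.cpu_count())  # hypothetical cap, not in the original
    if os.getenv("IN_MPI") is None:
        env = os.environ.copy()
        env["IN_MPI"] = "1"  # marker so the relaunched copies skip this branch
        subprocess.check_call(
            ["mpirun", "-np", str(n), sys.executable, "-u"] + sys.argv,
            env=env)
        return "parent"
    return "child"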

Run on AWS

Hi Hardmaru,
we would be interested in running ES-Tool (thanks for creating it!) on a cluster of machines on AWS. Would you be interested in a pull request if we implement this, and would it be OK with you if we use StarCluster (http://star.mit.edu/cluster/index.html) as the cluster management tool?

Thanks & kind regards
Ernst

raise NotImplementedError

Hello!

I'm running Python 3.6 under Miniconda on macOS 10.14.

versions:

gym: 0.10.4
cma: 2.6.0
pybullet: 2.2.2

I'm able to run the BipedalWalker-v2 env on its own with a mini test script:

import gym
env = gym.make('BipedalWalker-v2')
env.reset()
for t in range(1000):
    env.render()
    observation, reward, done, info = env.step(env.action_space.sample()) # take a random action
    if done:
      break

So given the above, I know gym and the env are working fine.

Also, I'm able to train with estool just fine using the bullet_racecar env, and I'm able to run the resulting trained model.

What does not work is training with the BipedalWalker-v2 env.

Here is the command I run: python train.py biped -n 1 -t 2
(same results when n=8 and t=4)

And here are the results:

(py36) Keiths-MacBook-Pro:estool keithgould$ p train.py biped -n 1 -t 2
pybullet build time: Sep 26 2018 10:51:37
current_dir=/Users/keithgould/miniconda3/envs/py36/lib/python3.6/site-packages/pybullet_envs/bullet
['mpirun', '-np', '2', '/Users/keithgould/miniconda3/envs/py36/bin/python3', 'train.py', 'biped', '-n', '1', '-t', '2']
pybullet build time: Sep 26 2018 10:51:37
pybullet build time: Sep 26 2018 10:51:37
current_dir=/Users/keithgould/miniconda3/envs/py36/lib/python3.6/site-packages/pybullet_envs/bullet
current_dir=/Users/keithgould/miniconda3/envs/py36/lib/python3.6/site-packages/pybullet_envs/bullet
assigning the rank and nworkers 2 0
assigning the rank and nworkers 2 1
making model from game: Game(env_name='BipedalWalker-v2', time_factor=0, input_size=24, output_size=4, layers=[40, 40], activation='tanh', noise_bias=0.0, output_noise=[False, False, False])
making model from game: Game(env_name='BipedalWalker-v2', time_factor=0, input_size=24, output_size=4, layers=[40, 40], activation='tanh', noise_bias=0.0, output_noise=[False, False, False])
size of model 2804
size of model 2804
(1,2mirr1)-aCMA-ES (mu_w=1.0,w_1=100%) in dimension 2804 (seed=364359, Tue Nov 27 12:07:55 2018)
('process', 0, 'out of total ', 2, 'started')
('training', 'biped')
('population', 2)
('num_worker', 1)
('num_worker_trial', 2)


(1,2mirr1)-aCMA-ES (mu_w=1.0,w_1=100%) in dimension 2804 (seed=385130, Tue Nov 27 12:07:55 2018)
('process', 1, 'out of total ', 2, 'started')

WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
sweet, loaded the env.
<BipedalWalker instance>
cool right??
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
sweet, loaded the env.
<BipedalWalker instance>
cool right??
Traceback (most recent call last):
  File "train.py", line 448, in <module>
    main(args)
  File "train.py", line 404, in main
    slave()
  File "train.py", line 229, in slave
    fitness, timesteps = worker(weights, seed, train_mode, max_len)
  File "train.py", line 205, in worker
    train_mode=train_mode, render_mode=False, num_episode=num_episode, seed=seed, max_len=max_len)
  File "/Users/keithgould/Robotics/estool/model.py", line 258, in simulate
    obs, reward, done, info = model.env.step(action)
  File "/Users/keithgould/miniconda3/envs/py36/lib/python3.6/site-packages/gym/core.py", line 63, in step
    raise NotImplementedError
NotImplementedError

There are some print statements in there where I'm trying to figure out why the step function is not defined (causing the NotImplementedError).

As far as I can tell the environment is loaded, and I know it has a step function...
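
One hedged way to see which class step actually resolves to (assuming gym >= 0.10, where wrappers chain via the .env attribute) is to walk the wrapper stack:

import gym

env = gym.make('BipedalWalker-v2')
# Walk the wrapper chain; if step resolves to the abstract base method
# in gym/core.py, the NotImplementedError above is what you would see.
e = env
while hasattr(e, 'env'):
    print(type(e).__name__, e.step.__qualname__)
    e = e.env
print(type(e).__name__, e.step.__qualname__)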

Any thoughts/help appreciated.

Keith

PEPG questions

hey Hardmaru,
Thanks for creating your ES blogs, they've been really interesting.

I had a couple of quick and probably silly questions about your PEPG implementation.

I've read the original paper and attempted to implement the PEPG algorithm with symmetric sampling. I noticed a couple of differences to your implementation, and I was wondering if you could enlighten me.

In es.py where you're updating the mean, I notice that the calculated gradient does not get normalized by the batch size.

  rT = (reward[:self.batch_size] - reward[self.batch_size:])  # paired difference: reward(+epsilon) minus reward(-epsilon)
  change_mu = np.dot(rT, epsilon)                             # gradient estimate for the mean, not divided by batch size
  self.optimizer.stepsize = self.learning_rate
  update_ratio = self.optimizer.update(-change_mu)            # adam, rmsprop, momentum, etc.

I guess that if you tune the learning rate according to batch size this is not an issue, but I was just wondering why you took this approach?

Also, where you're making the symmetric samples:

self.epsilon = np.random.randn(self.batch_size, self.num_params) * self.sigma.reshape(1, self.num_params)

You're sampling from a uniform distribution. Is there a reason you take this approach rather than sampling from a normal distribution? Also, and I may be completely wrong here, as the uniform distribution is taken from [0,1), doesn't this mean that your parameters will always be larger than the mean (for the + symmetric case) and always smaller than the mean (for the - symmetric case)? You won't end up with a mix of some parameters above the mean and some below.
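
For concreteness, here is a quick sketch contrasting the two NumPy samplers in question (np.random.randn draws from a standard normal centred at zero, while np.random.rand draws uniformly from [0, 1)):

import numpy as np

np.random.seed(0)
normal = np.random.randn(100000)    # standard normal: symmetric about 0
uniform = np.random.rand(100000)    # uniform on [0, 1): never negative

print(normal.mean(), (normal < 0).mean())    # ~0.0, ~0.5: half the draws fall below zero
print(uniform.mean(), (uniform < 0).mean())  # ~0.5, 0.0: no draw falls below zero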

I'm a completely self-taught beginner to this stuff, so apologies for the naive questions.
cheers

Question about the CMA-ES test part

Thanks for your contributions; they help a lot. But I wonder why I get different results each time I run the 'cmaes' part of simple_es_example.py.
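
A hedged note: CMA-ES is a stochastic search, so repeated runs sample different populations and land on slightly different solutions unless the random seed is pinned. A minimal sketch using the cma package directly (the 'seed' option belongs to that package; the objective here is a toy):

import cma
import numpy as np

def sphere(x):
    # toy objective: minimise the squared norm
    return float(np.sum(np.asarray(x) ** 2))

# Pinning 'seed' makes runs repeatable; leaving it unset gives a fresh
# sampling trajectory, and thus a slightly different answer, every run.
es = cma.CMAEvolutionStrategy(10 * [0.5], 0.2, {'seed': 123})
while not es.stop():
    solutions = es.ask()
    es.tell(solutions, [sphere(x) for x in solutions])
print(es.result.xbest)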

How to train "KukaBulletEnv-v0" ?

I use the following command:
python train.py bullet_kuka_grasping -n 8 -t 4
I have trained the kuka for a whole night, but the reward still shows no change: "-1869.074175"
('improvement', 8150, -149.94043125000007, 'curr', -1869.074175, 'prev', -1719.13374375, 'best', -1719.13374375)

Then I use the following command to do the evaluation:
python model.py bullet_kuka_grasping log/bullet_kuka_grasping.cma.1.32.json
The kuka can still not grasp the object.

[screenshots of the kuka failing to grasp omitted]

Could you please give some advice?

Bug in train.py

Hi,
I ran into the following error, similar to #8 (https://github.com/hardmaru/estool/issues/8):

File "/usr/lib/python3.6/subprocess.py", line 291, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['mpirun', '-np', '2', '/home/baheri/.virtualenvs/worldmodels/bin/python', '-u', '05_train_controller.py', 'car_racing', '--num_worker', '1', '--num_worker_trial', '2', '--num_episode', '4', '--max_length', '1000', '--eval_steps', '25']' returned non-zero exit status 1.

The suggestion in that thread did not help me. Any other suggestions for solving this issue?

No colour

[screenshot omitted: the OpenGL window rendering without colour]

Hi, thanks for this. I installed everything correctly, but I can only get the OpenGL window to visualise the robot by pressing "w" (wireframe); there are no coloured simulations.

I'm on macOS High Sierra, pyenv with Python 3.5.2, gym 0.9.1, and the latest bullet3 and pybullet. Has anyone seen this before? (It could be something simple.)

Can't run pre-trained biped models

Hi, first of all, thanks for making this great tool!

I've tried a couple of the pre-trained models, e.g. bullet_ant and bullet_kuka_grasping, and they all work fine. But I couldn't run the zoo biped models (same for bipedhard and bipedhard_stoc) with this command:
python model.py biped zoo/biped.cma.1.96.best.json
It failed with a NotImplementedError; the full trace is below:

Traceback (most recent call last):
  File "model.py", line 340, in <module>
    main()
  File "model.py", line 335, in main
    train_mode=False, render_mode=render_mode, num_episode=1)
  File "model.py", line 240, in simulate
    model.env.render("human")
  File "/Users/duongnt/tensorflow/lib/python3.6/site-packages/gym/core.py", line 110, in render
    raise NotImplementedError
NotImplementedError

I already installed the required dependencies: gym, box2d, pybullet and mpi4py. Am I missing something?

Bad score when training "KukaBulletEnv-v0"

I have been training bullet_kuka_grasping for a long time.
Using 'python train.py bullet_kuka_grasping -e 16 -n 16 -t 16', it has been training for nearly a week.

[training-curve screenshot omitted]

The best score, however, seems to have converged to around -1776.
How can I solve this problem? Thanks in advance!

Class OpenES, function ask. self.mu never gets updated.

First of all, thanks for all your contributions! :)
I looked at the original algorithm in the paper "Evolution Strategies as a Scalable Alternative to Reinforcement Learning" for the OpenES implementation.
The paper updates the policy parameters theta after every iteration of rollouts.
But the implementation in es.py, under the OpenES class, has this line commented out in the ask function:
#self.mu += self.learning_rate * change_mu
https://github.com/hardmaru/estool/blob/master/es.py#L328C1-L328C47

Even the Adam optimizer that is initialized doesn't appear to change the self.mu array.
Just wanted to know if this is a mistake or whether I am missing something here.
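
For anyone reading along, a sketch of the pattern es.py appears to use, where the optimizer object holds a reference to the strategy and writes its step into mu in place (a reconstruction under that assumption, not a verbatim quote of the repo):

import numpy as np

class Optimizer(object):
    def __init__(self, pi):
        # pi is the ES object itself; the optimizer mutates pi.mu directly,
        # which would make the commented-out line in ask() redundant.
        self.pi = pi
        self.t = 0

    def _compute_step(self, globalg):
        # placeholder: plain SGD; the repo subclasses this with Adam etc.
        return -0.01 * globalg

    def update(self, globalg):
        self.t += 1
        step = self._compute_step(globalg)
        theta = self.pi.mu
        ratio = np.linalg.norm(step) / (np.linalg.norm(theta) + 1e-8)
        self.pi.mu = theta + step   # mu is updated here, in place
        return ratio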

The meaning of the "rms_stdev" function?

Hi !

The function rms_stdev appears many times in es.py.

I guess the meaning of this function is to calculate the root mean square (rms) of std (sigma).

Therefore, this function should square first, then take the mean, and finally the square root, i.e. np.sqrt(np.mean(sigma*sigma)).

But the implementation in es.py does these operations in the opposite order.
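
For concreteness, the two orderings only agree when every entry has the same magnitude; a small sketch:

import numpy as np

sigma = np.array([0.1, 0.2, 0.4])
rms = np.sqrt(np.mean(sigma * sigma))   # root-mean-square: square, mean, then root
alt = np.mean(np.sqrt(sigma * sigma))   # mean of absolute values: a different quantity
print(rms, alt)                         # 0.2645..., 0.2333...: the two differ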

Natural gradients for deep layers

In NES algorithms, do we use backpropagation from the last layer's gradients (computed from the objective function)? I am curious how the hidden layers get optimized, since they do not directly affect the objective function.
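
A hedged illustration of why this question dissolves in ES: the strategy treats all layers as one flat black-box parameter vector and never backpropagates, so hidden layers are perturbed and selected exactly like output layers (the shapes below are hypothetical):

import numpy as np

# Hypothetical two-layer policy; ES only ever sees the flat vector theta.
shapes = [(24, 40), (40, 4)]
num_params = sum(a * b for a, b in shapes)

def unflatten(theta):
    # slice the flat vector back into per-layer weight matrices
    mats, i = [], 0
    for a, b in shapes:
        mats.append(theta[i:i + a * b].reshape(a, b))
        i += a * b
    return mats

def act(theta, obs):
    # forward pass only, no gradients anywhere; perturbing the hidden
    # layer's slice of theta changes the episode return just like
    # perturbing the output layer's slice does
    w1, w2 = unflatten(theta)
    return np.tanh(np.tanh(obs @ w1) @ w2)

theta = np.random.randn(num_params) * 0.1
print(act(theta, np.zeros(24)))  # zero observation gives zero action under tanh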

PEPG and NES

Looking through your implementation of PEPG, is it accurate to call it a NES?

bipedhard doesn't converge

I tried both bipedhard and bipedhard_stoc. They just don't converge: after about 150 generations they are still not showing any progress.
I am training with the following command:
python train.py bipedhard -n 8 -t 4
The best individuals are around -100 cumulative reward. Something must be wrong.

Do the parameters need to be in [a,b]

Hello, I don't see anything in the codebase that enforces such a restriction, but should there be bounds on the values the parameters can take?

A related question: should each of the parameters have the same bounds? (E.g., is normalization a requirement for strategies like OpenES to work properly?)
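
For what it's worth, nothing in plain ES requires bounds, but if a problem needs them one simple (hypothetical) pattern is to clip each candidate before evaluation:

import numpy as np

def clip_solutions(solutions, low=-1.0, high=1.0):
    # clamp every sampled parameter vector to the box [low, high];
    # the strategy itself keeps searching in unbounded space
    return np.clip(solutions, low, high)

# usage sketch: evaluate the clipped candidates, then tell() the rewards
# solutions = solver.ask()
# rewards = [fitness(s) for s in clip_solutions(np.array(solutions))]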

What is the best way to step through the code?

Hi, I want to step through the code and look at it in detail.

However, MPI gives me a segmentation fault error:

[gantosaxe:07615] *** Process received signal ***
[gantosaxe:07615] Signal: Segmentation fault (11)
[gantosaxe:07615] Signal code:  (128)
[gantosaxe:07615] Failing at address: (nil)

Is there a way to run the code without MPI, or a reasonable way to debug it?

Thanks!
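
One hedged way to step through the optimizers without mpirun is to drive the solver classes in es.py directly through their ask/tell loop on a toy objective (the constructor arguments below are assumptions, so check the class signatures in es.py):

import numpy as np
from es import OpenES  # any of the solvers in es.py exposing ask/tell

def fitness(theta):
    # toy objective, convenient for setting breakpoints
    return -float(np.sum(theta ** 2))

solver = OpenES(10, popsize=16)  # assumed signature: (num_params, ..., popsize=...)
for generation in range(50):
    solutions = solver.ask()     # sample a population: single process, no MPI
    rewards = [fitness(np.array(s)) for s in solutions]
    solver.tell(rewards)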

class OpenES, function ask: mu comes back from the Adam optimizer's update/_compute_step with shape (popsize, num_params), and mu.reshape(1, self.num_params) then raises an error

self.solutions = self.mu.reshape(1, self.num_params) + self.epsilon * self.sigma

def _compute_step(self, globalg):
    # bias-corrected Adam step size at iteration t
    a = self.stepsize * np.sqrt(1 - self.beta2 ** self.t) / (1 - self.beta1 ** self.t)
    self.m = self.beta1 * self.m + (1 - self.beta1) * globalg               # first-moment estimate
    self.v = self.beta2 * self.v + (1 - self.beta2) * (globalg * globalg)   # second-moment estimate
    step = -a * self.m / (np.sqrt(self.v) + self.epsilon)
    return step
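
A hedged note on shapes: the gradient handed to the Adam update should already be flat, with shape (num_params,), i.e. a population-weighted combination of the perturbations, so the returned step and hence self.mu stay one-dimensional. For example:

import numpy as np

popsize, num_params = 16, 10
epsilon = np.random.randn(popsize, num_params)   # perturbations
reward = np.random.randn(popsize)                # fitness per candidate
normalized = (reward - reward.mean()) / (reward.std() + 1e-8)
change_mu = np.dot(normalized, epsilon) / popsize  # flat, shape (num_params,)
assert change_mu.shape == (num_params,)
# feeding change_mu (not the (popsize, num_params) epsilon) to the Adam
# update keeps mu one-dimensional, so mu.reshape(1, num_params) in ask()
# succeeds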

Training time for BipedalWalkerHardcore-v2

First off, terrific work on the repo and blog post; very detailed and clear.

I was able to solve BipedalWalkerHardcore-v2 (averaging 300+ over 100 episodes) with RL, using an A3C implementation I made, but it took quite a while to train: unlike BipedalWalker-v2, which took 5-10 minutes, it took nearly two full days. I think that was mainly due to how fast it wanted to run to the finish, but it eventually learned some impressive moves. Did you experience long training times using ES as well?

I look forward to experimenting with your estool, as final performance and robustness are most important in my use cases. Great work, and thanks for creating it!

CMA-ES with O(N)

Hello Hardmaru,
thank you very much for your great blog and for publishing the code!!! I love it!
Just a question: you mention that there is a CMA-ES variant that scales with O(N). We would love to use CMA-ES on larger problems. Do you happen to know a paper on that?

Thanks and kind regards,
Ernst

[Weight decay] Should weight decay be applied to the model parameters instead?

It seems the current implementation applies weight decay to the fitness value; I am wondering if it should be performed on the model parameters instead?
i.e.

if self.weight_decay > 0:
    l2_decay = compute_weight_decay(self.weight_decay, self.solutions)
    reward_table += l2_decay

with the last line changed to

self.solutions += l2_decay

And if weight decay is applied to the model parameters, should it happen after the results for the current iteration are computed? (E.g., in tell() of OpenES the order is currently weight decay -> get current best param; should it be the opposite: get current best param -> weight decay?)
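
For reference, a sketch of the fitness-shaping form next to the parameter-shrinking alternative raised here (this mirrors what the repo appears to do, but treat it as a reconstruction rather than a verbatim quote):

import numpy as np

def compute_weight_decay(weight_decay, model_param_list):
    # fitness shaping: subtract weight_decay * mean(param^2) from each
    # candidate's reward, steering selection toward small-norm solutions
    # without ever modifying the parameters themselves
    model_param_grid = np.array(model_param_list)
    return -weight_decay * np.mean(model_param_grid * model_param_grid, axis=1)

# the alternative raised in this issue would shrink parameters directly:
# self.solutions *= (1.0 - weight_decay)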

License

Thanks for the code. I couldn't find a license file; am I free to adapt and reuse the code?

Solution range?

Is there currently a way to set the solution range that the population can search over? For example, for some problems I may wish to bound solutions for each parameter to the (0, 1] range, or to (-1, 1).
