hardmaru / estool
Evolution Strategies Tool
License: Other
python train.py robo_ant -n 8 -t 4
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 9 slots
that were requested by the application:
/Users/wsgdrfz/anaconda2/bin/python
Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
Traceback (most recent call last):
File "train.py", line 440, in <module>
if "parent" == mpi_fork(args.num_worker+1): os.exit()
File "train.py", line 414, in mpi_fork
subprocess.check_call(["mpirun", "-np", str(n), sys.executable] +['-u']+ sys.argv, env=env)
File "/Users/wsgdrfz/anaconda2/lib/python2.7/subprocess.py", line 186, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['mpirun', '-np', '9', '/Users/wsgdrfz/anaconda2/bin/python', '-u', 'train.py', 'bullet_racecar', '-n', '8', '-t', '4']' returned non-zero exit status 1
wsgdrfz@wsgdrfz-MBP /Users/wsgdrfz/Downloads/estool-master
⚡ python train.py robo_ant -n 8 -t 4
pybullet build time: Feb 9 2018 16:40:49
current_dir=/Users/wsgdrfz/anaconda2/lib/python2.7/site-packages/pybullet_envs/bullet
current_dir=/Users/wsgdrfz/anaconda2/lib/python2.7/site-packages/pybullet_envs/bullet
['mpirun', '-np', '9', '/Users/wsgdrfz/anaconda2/bin/python', 'train.py', 'robo_ant', '-n', '8', '-t', '4']
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 9 slots
that were requested by the application:
/Users/wsgdrfz/anaconda2/bin/python
Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
Traceback (most recent call last):
File "train.py", line 440, in <module>
if "parent" == mpi_fork(args.num_worker+1): os.exit()
File "train.py", line 414, in mpi_fork
subprocess.check_call(["mpirun", "-np", str(n), sys.executable] +['-u']+ sys.argv, env=env)
File "/Users/wsgdrfz/anaconda2/lib/python2.7/subprocess.py", line 186, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['mpirun', '-np', '9', '/Users/wsgdrfz/anaconda2/bin/python', '-u', 'train.py', 'robo_ant', '-n', '8', '-t', '4']' returned non-zero exit status 1
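Open MPI prints this message when the `-np` value exceeds the number of slots it detects, which by default is typically the number of processor cores. With `-n 8`, train.py asks for 9 processes (8 workers plus the master). A minimal sketch of two ways around it, assuming Open MPI is the installed MPI: request fewer workers, or pass Open MPI's `--oversubscribe` flag. The helper `build_mpirun_cmd` is hypothetical and is not part of train.py.

```python
import os
import sys


def build_mpirun_cmd(num_worker, script_args, oversubscribe=False):
    """Build an mpirun argv for num_worker workers plus one master process."""
    n = num_worker + 1
    cmd = ["mpirun", "-np", str(n)]
    if oversubscribe:
        # Open MPI option: allow more processes than detected slots.
        cmd.append("--oversubscribe")
    return cmd + [sys.executable, "-u"] + list(script_args)


# Largest worker count that fits without oversubscribing (assumes one slot
# per logical CPU reported by the OS, which may differ from Open MPI's count).
max_workers = max(1, (os.cpu_count() or 2) - 1)

print(build_mpirun_cmd(8, ["train.py", "robo_ant", "-n", "8", "-t", "4"], oversubscribe=True))
```

Oversubscribing lets the run start, but the processes then share cores, so choosing `-n` at or below `max_workers` is usually the faster option.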
if "parent" == mpi_fork(args.num_worker+1): os.exit()
pybullet build time: Apr 16 2018 07:12:17
current_dir=/usr/local/lib/python3.6/site-packages/pybullet_envs/bullet
current_dir=/usr/local/lib/python3.6/site-packages/pybullet_envs/bullet
size of model 10176
(16_w,32)-aCMA-ES (mu_w=9.2,w_1=19%) in dimension 10176 (seed=911961, Mon Apr 16 08:34:09 2018)
('process', 0, 'out of total ', 1, 'started')
('training', 'robo_ant')
('population', 32)
('num_worker', 8)
('num_worker_trial', 4)
Traceback (most recent call last):
File "train.py", line 441, in <module>
main(args)
File "train.py", line 395, in main
master()
File "train.py", line 315, in master
reward_list_total = receive_packets_from_slaves()
File "train.py", line 243, in receive_packets_from_slaves
comm.Recv(result_packet, source=i)
File "mpi4py/MPI/Comm.pyx", line 285, in mpi4py.MPI.Comm.Recv
mpi4py.MPI.Exception: MPI_ERR_RANK: invalid rank
My env:
python3: 3.6.5
mpi4py: 3.0.0
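The log line ('process', 0, 'out of total ', 1, 'started') suggests the communicator contains only one process, while the master expects 8 workers, so receiving from rank 1 and above is invalid. One common cause (an assumption here, not confirmed in this thread) is that mpi4py was built against a different MPI installation than the mpirun found on the PATH. A minimal sketch of a guard that fails earlier with a clearer message; `check_world_size` is a hypothetical helper, and in a real run `world_size` would come from `MPI.COMM_WORLD.Get_size()`.

```python
def check_world_size(world_size, num_worker):
    """Raise early if the communicator is too small for master + workers."""
    expected = num_worker + 1  # rank 0 is the master, ranks 1..num_worker are workers
    if world_size < expected:
        raise RuntimeError(
            "MPI world size is %d but %d processes are needed; "
            "check that mpi4py and mpirun come from the same MPI install"
            % (world_size, expected)
        )
    return list(range(1, expected))  # worker ranks the master may receive from


print(check_world_size(9, 8))  # worker ranks 1 through 8
```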
Hi Hardmaru,
we would be interested in running ES-Tool (thanks for creating it!) on a cluster of machines on AWS. Would you be interested in a pull request if we implement this, and would it be OK with you if we use StarCluster (http://star.mit.edu/cluster/index.html) as the cluster management tool?
Thanks & kind regards
Ernst
Hello!
I'm running python 3.6 over miniconda on Mac 10.14
versions:
gym: 0.10.4
cma: 2.6.0
pybullet: 2.2.2
I'm able to run the BipedalWalker-v2 env on its own with a mini test script:
import gym
env = gym.make('BipedalWalker-v2')
env.reset()
for t in range(1000):
    env.render()
    observation, reward, done, info = env.step(env.action_space.sample())  # take a random action
    if done:
        break
So given the above, I know gym and the env are working fine.
Also I'm able to train with the estool just fine using the bullet racecar env. I'm also able to run the trained model from bullet racecar.
What does not work is training with the BipedalWalker-v2 env.
Here is the command I run: python train.py biped -n 1 -t 2
(same results for when n=8 and t=4)
And here are the results:
(py36) Keiths-MacBook-Pro:estool keithgould$ p train.py biped -n 1 -t 2
pybullet build time: Sep 26 2018 10:51:37
current_dir=/Users/keithgould/miniconda3/envs/py36/lib/python3.6/site-packages/pybullet_envs/bullet
['mpirun', '-np', '2', '/Users/keithgould/miniconda3/envs/py36/bin/python3', 'train.py', 'biped', '-n', '1', '-t', '2']
pybullet build time: Sep 26 2018 10:51:37
pybullet build time: Sep 26 2018 10:51:37
current_dir=/Users/keithgould/miniconda3/envs/py36/lib/python3.6/site-packages/pybullet_envs/bullet
current_dir=/Users/keithgould/miniconda3/envs/py36/lib/python3.6/site-packages/pybullet_envs/bullet
assigning the rank and nworkers 2 0
assigning the rank and nworkers 2 1
making model from game: Game(env_name='BipedalWalker-v2', time_factor=0, input_size=24, output_size=4, layers=[40, 40], activation='tanh', noise_bias=0.0, output_noise=[False, False, False])
making model from game: Game(env_name='BipedalWalker-v2', time_factor=0, input_size=24, output_size=4, layers=[40, 40], activation='tanh', noise_bias=0.0, output_noise=[False, False, False])
size of model 2804
size of model 2804
(1,2mirr1)-aCMA-ES (mu_w=1.0,w_1=100%) in dimension 2804 (seed=364359, Tue Nov 27 12:07:55 2018)
('process', 0, 'out of total ', 2, 'started')
('training', 'biped')
('population', 2)
('num_worker', 1)
('num_worker_trial', 2)
(1,2mirr1)-aCMA-ES (mu_w=1.0,w_1=100%) in dimension 2804 (seed=385130, Tue Nov 27 12:07:55 2018)
('process', 1, 'out of total ', 2, 'started')
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
sweet, loaded the env.
<BipedalWalker instance>
cool right??
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.
sweet, loaded the env.
<BipedalWalker instance>
cool right??
Traceback (most recent call last):
File "train.py", line 448, in <module>
main(args)
File "train.py", line 404, in main
slave()
File "train.py", line 229, in slave
fitness, timesteps = worker(weights, seed, train_mode, max_len)
File "train.py", line 205, in worker
train_mode=train_mode, render_mode=False, num_episode=num_episode, seed=seed, max_len=max_len)
File "/Users/keithgould/Robotics/estool/model.py", line 258, in simulate
obs, reward, done, info = model.env.step(action)
File "/Users/keithgould/miniconda3/envs/py36/lib/python3.6/site-packages/gym/core.py", line 63, in step
raise NotImplementedError
NotImplementedError
There are some print statements in there where I'm trying to figure out why the step
function is not defined (causing the NotImplementedError).
As far as I can tell the environment is loaded, and I know it has a step function...
Any thoughts/help appreciated.
Keith
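One possible cause, offered as an assumption rather than a confirmed diagnosis: gym releases from around 0.9.6 stopped forwarding `step()` to `_step()`, so an environment class that only defines the older underscore methods falls through to the base class and raises NotImplementedError. The sketch below reproduces the pattern with stand-in classes; `BaseEnv`, `OldStyleEnv` and `StepAdapter` are hypothetical and do not import gym.

```python
class BaseEnv:
    """Stand-in for a base class whose step() must be overridden."""

    def step(self, action):
        raise NotImplementedError


class OldStyleEnv(BaseEnv):
    """Defines only the underscore method, as older environments did."""

    def _step(self, action):
        return [0.0], 1.0, False, {}


class StepAdapter(OldStyleEnv):
    """Forwards the public method to the underscore implementation."""

    def step(self, action):
        return self._step(action)


try:
    OldStyleEnv().step(0)
except NotImplementedError:
    print("old-style env: step() is not implemented")

obs, reward, done, info = StepAdapter().step(0)
print(reward)
```

If this is what is happening, checking whether the environment class actually in use defines `step` or only `_step` would confirm it.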
hey Hardmaru,
Thanks for creating your ES blogs, they've been really interesting.
I had a couple of quick and probably silly questions about your PEPG implementation.
I've read the original paper and attempted to implement the PEPG algorithm with symmetric sampling. I noticed a couple of differences to your implementation, and I was wondering if you could enlighten me.
In es.py where you're updating the mean, I notice that the calculated gradient does not get normalized by the batch size.
rT = (reward[:self.batch_size] - reward[self.batch_size:])
change_mu = np.dot(rT, epsilon)
self.optimizer.stepsize = self.learning_rate
update_ratio = self.optimizer.update(-change_mu) # adam, rmsprop, momentum, etc.
I guess that if you tune the learning rate according to batch size this is not an issue, but I was just wondering why you took this approach?
Also, where you're making the symmetric samples:
self.epsilon = np.random.randn(self.batch_size, self.num_params) * self.sigma.reshape(1, self.num_params)
You're sampling from a uniform distribution. Is there a reason you take this approach rather than sampling from a normal distribution? Also, and I may be completely wrong here, since the uniform distribution is over [0,1), doesn't this mean that your parameters will always be larger than the mean (for the + symmetric case) and always smaller than the mean (for the - symmetric case)? You won't end up with a mix of some parameters above the mean and some below.
I'm a completely self-taught beginner to this stuff, so apologies for the naive questions.
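For reference, `np.random.randn` draws from a standard normal distribution (mean 0, variance 1), whereas `np.random.rand` draws uniformly from [0, 1). A quick check, with an arbitrary seed and illustrative sizes, that the perturbations take both signs:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the check repeatable
batch_size, num_params = 4, 1000  # illustrative sizes
sigma = np.full(num_params, 0.1)

# Same expression shape as the sampling line quoted above.
epsilon = np.random.randn(batch_size, num_params) * sigma.reshape(1, num_params)

print("fraction below zero:", np.mean(epsilon < 0))  # close to 0.5 for a normal
print("uniform minimum:", np.random.rand(batch_size, num_params).min())  # never negative
```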
cheers
Thanks for your contributions, they help a lot. But I wonder why I got different results each time I ran the 'cmaes' part of 'simple_es_example.py'?
I use the following command:
python train.py bullet_kuka_grasping -n 8 -t 4
I trained the Kuka for a whole night, but the reward has not improved: -1869.074175
('improvement', 8150, -149.94043125000007, 'curr', -1869.074175, 'prev', -1719.13374375, 'best', -1719.13374375)
Then I use the following command to run the evaluation:
python model.py bullet_kuka_grasping log/bullet_kuka_grasping.cma.1.32.json
The Kuka still cannot grasp the object.
Could you please give some advice?
I tried the pre-trained Kuka model in ./Zoo, but it cannot grasp an object. Does anyone know the reason?
Hi,
I ran into the following error, which looks similar to [https://github.com//issues/8]:
File "/usr/lib/python3.6/subprocess.py", line 291, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['mpirun', '-np', '2', '/home/baheri/.virtualenvs/worldmodels/bin/python', '-u', '05_train_controller.py', 'car_racing', '--num_worker', '1', '--num_worker_trial', '2', '--num_episode', '4', '--max_length', '1000', '--eval_steps', '25']' returned non-zero exit status 1.
The suggestion in that post did not help me. Is there any other suggestion for solving this issue?
Hi, thanks for this. I installed everything correctly, but I can only get the OpenGL window to visualise the robot by pressing "w". There are no coloured simulations.
I'm on macOS High Sierra, using pyenv with Python 3.5.2, gym 0.9.1, and the latest bullet3 and pybullet. Has anyone seen this before? (It could be something simple.)
Hi, first of all, thanks for making this great tool!
I've tried a couple of pre-trained models, e.g., bullet_ant, bullet_kuka_grasping, and they all work fine. But I couldn't run the zoo biped models (same for bipedhard, bipedhard_stoc) with this command:
python model.py biped zoo/biped.cma.1.96.best.json
It failed with the NotImplementedError, full trace below:
Traceback (most recent call last):
File "model.py", line 340, in <module>
main()
File "model.py", line 335, in main
train_mode=False, render_mode=render_mode, num_episode=1)
File "model.py", line 240, in simulate
model.env.render("human")
File "/Users/duongnt/tensorflow/lib/python3.6/site-packages/gym/core.py", line 110, in render
raise NotImplementedError
NotImplementedError
I already installed required dependencies gym, box2d, pybullet and mpi4py. Am I missing something?
Hi, I was confused by this line in es.py:
https://github.com/hardmaru/estool/blob/master/es.py#L115
reward_table = -np.array(reward_table_result)
The reward_table is passed to the tell() method as function_values. But why is a negative sign applied to the raw rewards collected from rollouts?
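A likely reason, assuming reward_table feeds pycma's `tell()`: CMA-ES as implemented in the `cma` package minimises its objective, so rewards that should be maximised are negated first. A minimal sketch of the equivalence, using made-up reward values:

```python
rewards = [12.5, -3.0, 40.0, 7.25]  # made-up rollout rewards (higher is better)

# A minimiser sees costs, so flip the sign before handing values over.
function_values = [-r for r in rewards]

best_by_reward = max(range(len(rewards)), key=lambda i: rewards[i])
best_by_cost = min(range(len(function_values)), key=lambda i: function_values[i])

print(best_by_reward, best_by_cost)  # both pick the same candidate
```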
First of all, thanks for all your contributions! :)
I looked at the original algorithm from the paper "Evolution Strategies as a Scalable Alternative to Reinforcement Learning" for the OpenES implementation.
They update the policy parameters theta after every iteration of rollouts.
But the OpenES class in es.py has this line commented out in the ask function:
#self.mu += self.learning_rate * change_mu
https://github.com/hardmaru/estool/blob/master/es.py#L328C1-L328C47
Even the Adam optimizer, which is initialized, doesn't change the self.mu array.
Just wanted to know whether this is a mistake or whether I am missing something here.
I needed to pip install matplotlib for the simple_es_example.ipynb demo to work; should it be in requirements.txt?
Hi!
The function rms_stdev appears many times in es.py.
I guess the purpose of this function is to calculate the root mean square (rms) of the std (sigma).
Therefore, this function should square first, then take the mean, and then take the square root, i.e. np.sqrt(np.mean(sigma*sigma)).
But the implementation in es.py does it in the opposite order.
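To make the difference concrete, here is a small comparison, assuming "the opposite" means taking the square root before the mean. The sigma values are illustrative only.

```python
import numpy as np

sigma = np.array([1.0, 2.0, 3.0])  # illustrative values

mean_of_root = np.mean(np.sqrt(sigma * sigma))  # equals mean(|sigma|)
root_of_mean = np.sqrt(np.mean(sigma * sigma))  # root mean square

print(mean_of_root)  # 2.0
print(root_of_mean)  # about 2.16
```

The two agree only when all entries of sigma have the same magnitude; otherwise the root mean square is the larger of the two.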
In NES algorithms, do we use backpropagation from the last layer's gradients (computed by the objective function)? I am curious how the hidden layers are optimized, since they do not directly affect the objective function.
Looking through your implementation of PEPG, is it accurate to call it a NES?
I tried both bipedhard and bipedhard_stoc. They just don't converge. After about 150 generations they are still not showing any progress.
I am training with the following command:
python train.py bipedhard -n 8 -t 4
The best individuals are around -100 cumulative reward. Something must be wrong.
rT = (reward[:self.batch_size] - reward[self.batch_size:])
change_mu = np.dot(rT, epsilon)
self.optimizer.stepsize = self.learning_rate
update_ratio = self.optimizer.update(-change_mu) # adam, rmsprop, momentum, etc.
#self.mu += (change_mu * self.learning_rate) # normal SGD method
So change_mu will be half as large as it needs to be for pop_size.
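One consideration when the update goes through Adam: its step divides by the running root mean square of the gradient, so a constant factor on change_mu largely cancels out. A minimal sketch, following the _compute_step formula quoted further down this page; the class name `AdamStep` and the sample gradient are hypothetical.

```python
import numpy as np


class AdamStep:
    """Hypothetical minimal Adam, following the _compute_step formula."""

    def __init__(self, num_params, stepsize=0.01, beta1=0.9, beta2=0.999, epsilon=1e-8):
        self.stepsize, self.beta1, self.beta2, self.epsilon = stepsize, beta1, beta2, epsilon
        self.m = np.zeros(num_params)
        self.v = np.zeros(num_params)
        self.t = 0

    def compute_step(self, g):
        self.t += 1
        a = self.stepsize * np.sqrt(1 - self.beta2 ** self.t) / (1 - self.beta1 ** self.t)
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * (g * g)
        return -a * self.m / (np.sqrt(self.v) + self.epsilon)


grad = np.array([0.5, -2.0, 3.0])  # made-up gradient
step_full = AdamStep(3).compute_step(grad)
step_half = AdamStep(3).compute_step(0.5 * grad)  # same direction, half the scale

print(step_full)
print(step_half)  # nearly identical to step_full
```

This cancellation does not apply to the plain SGD line that is commented out, where the scale of change_mu feeds directly into the step size.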
Hello, I don't see anything in the codebase that enforces such a restriction, but should there be bounds on the values parameters can have?
A related question: should each of the parameters have the same bounds (e.g. is normalization a requirement for strategies like OpenES to work properly)?
Hi, I want to step through the code and look at it in detail.
However, MPI gives me a segmentation fault:
[gantosaxe:07615] *** Process received signal ***
[gantosaxe:07615] Signal: Segmentation fault (11)
[gantosaxe:07615] Signal code: (128)
[gantosaxe:07615] Failing at address: (nil)
Is there a way to run the code without MPI, or a reasonable way to debug it?
Thanks !
self.solutions = self.mu.reshape(1, self.num_params) + self.epsilon * self.sigma
def _compute_step(self, globalg):
    a = self.stepsize * np.sqrt(1 - self.beta2 ** self.t) / (1 - self.beta1 ** self.t)
    self.m = self.beta1 * self.m + (1 - self.beta1) * globalg
    self.v = self.beta2 * self.v + (1 - self.beta2) * (globalg * globalg)
    step = -a * self.m / (np.sqrt(self.v) + self.epsilon)
    return step
First off, terrific work on repo and blog post, very detailed and clear.
I was able to solve BipedalWalkerHardcore-v2 (average 300+ over 100 episodes) with RL, using an A3C implementation I made, but it took quite a while to train: unlike BipedalWalker-v2, which took 5-10 minutes, it took nearly two full days. I think this was mainly due to how fast it wanted to run to the finish, but it eventually learned some impressive moves. Did you experience long training times using ES as well?
I look forward to experimenting with your estool, as final performance and robustness are most important in my use cases. Great work and thanks for creating it!
Hello Hardmaru,
thank you very much for your great blog and for publishing the code!!! I love it!
Just a question: you mention that there is a CMA-ES variant that scales with O(N). We would love to use CMA-ES on larger problems. Do you happen to know a paper on that?
Thanks and kind regards,
Ernst
It seems the current implementation applies weight decay to the fitness value. I am wondering whether it should be applied to the model parameters instead?
i.e.
if self.weight_decay > 0:
    l2_decay = compute_weight_decay(self.weight_decay, self.solutions)
    reward_table += l2_decay
with the last line changed to
    self.solutions += l2_decay
And if weight decay should indeed be applied to the model parameters, should it happen after the results are computed for the current iteration? (E.g. in tell() of OpenES, the order is currently weight decay -> get current best param; should it be the opposite order, get current best param -> weight decay?)
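One practical point on the proposed change: if compute_weight_decay returns one penalty per candidate (an assumption about its return shape, sketched here with a hypothetical `l2_penalty`), the result lines up with reward_table but not with the 2-D solutions array. Decaying the parameters themselves would instead look like a multiplicative shrink. A minimal sketch with made-up sizes:

```python
import numpy as np


def l2_penalty(weight_decay, solutions):
    """Hypothetical per-candidate penalty: one negative value per row."""
    return -weight_decay * np.mean(solutions * solutions, axis=1)


weight_decay = 0.01
solutions = np.ones((4, 3))  # 4 candidates, 3 parameters each (made up)
reward_table = np.zeros(4)

penalty = l2_penalty(weight_decay, solutions)
print(penalty.shape)  # one entry per candidate, like reward_table

shaped_rewards = reward_table + penalty            # decay applied to fitness
shrunk_solutions = solutions * (1 - weight_decay)  # decay applied to parameters
```

The first form changes how candidates are ranked; the second changes the candidates themselves, so the two are not interchangeable.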
Thanks for the code. I couldn't find a license file, am I free to adapt and reuse the code?
Is there currently a way to set the solution range that the population can search over? For example, for some problems I may wish to bound solutions for each parameter in the (0, 1] range, or (-1, 1).
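Nothing quoted on this page shows a built-in option for this. A common workaround, sketched here as an assumption rather than a feature of estool, is to let the search run unconstrained and map each candidate into the desired range just before evaluation. `squash_to_range` is a hypothetical helper.

```python
import math


def squash_to_range(params, low, high):
    """Map unconstrained values into [low, high] with a scaled tanh."""
    return [low + (high - low) * 0.5 * (math.tanh(p) + 1.0) for p in params]


raw = [-5.0, -0.3, 0.0, 2.0]  # made-up unconstrained parameters
bounded = squash_to_range(raw, -1.0, 1.0)
print(bounded)
```

Because the mapping is monotonic, the optimizer still sees a smooth search space, although very large raw values saturate near the bounds.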