millionintegrals / vel
Velocity in deep-learning research
License: MIT License
Before metrics are calculated, callbacks should be able to modify the data_dictionary for more efficient computations.
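A minimal sketch of what such a hook could look like. The names `before_metrics`, `PrecomputeCorrect` and `finish_batch` are hypothetical and are not part of vel; the point is only that callbacks get to edit the dictionary before any metric reads it.

```python
class PrecomputeCorrect:
    """Hypothetical callback: computes once a value that several metrics could share."""

    def before_metrics(self, data_dict):
        data_dict["correct"] = [
            p == t for p, t in zip(data_dict["predicted"], data_dict["target"])
        ]


def accuracy(data_dict):
    # Reads the entry prepared by the callback instead of recomputing it
    return sum(data_dict["correct"]) / len(data_dict["correct"])


def finish_batch(data_dict, callbacks, metrics):
    for callback in callbacks:
        callback.before_metrics(data_dict)  # callbacks may add or replace entries
    return {name: fn(data_dict) for name, fn in metrics.items()}


result = finish_batch(
    {"predicted": [1, 0, 1, 1], "target": [1, 1, 1, 0]},
    callbacks=[PrecomputeCorrect()],
    metrics={"accuracy": accuracy},
)
print(result)
```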
Since the config may be lost at some point, it would be good to store it as well for future reference.
Do you have a preferred way to cite your work?
Measure time it takes to train an epoch.
Measure total time of training.
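One possible way to collect both numbers is a small timer wrapped around each epoch. This is a sketch only: `TrainingTimer` is a hypothetical helper, not an existing vel class, and `time.sleep` stands in for the real training step.

```python
import time
from contextlib import contextmanager


class TrainingTimer:
    """Collects per-epoch durations; the total is their sum."""

    def __init__(self):
        self.epoch_seconds = []

    @contextmanager
    def epoch(self):
        # perf_counter is monotonic, so it is suitable for measuring intervals
        start = time.perf_counter()
        try:
            yield
        finally:
            self.epoch_seconds.append(time.perf_counter() - start)

    @property
    def total_seconds(self):
        return sum(self.epoch_seconds)


timer = TrainingTimer()
for _ in range(3):
    with timer.epoch():
        time.sleep(0.01)  # stand-in for training one epoch

print(timer.epoch_seconds, timer.total_seconds)
```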
Functionally they are almost the same and share basically all the code. They should be a single model class.
Another piece worth writing some tests for is the launcher/dependency injection part.
Have an example code that trains a classifier up to the same accuracy on "Planet Earth" dataset as Fast AI course lesson 2: http://course.fast.ai/lessons/lesson2.html
I installed vel through pip, and then downloaded one of the example YAML config files so I could modify it. However, I can't seem to get the launcher to run a config file from the current directory, and I'm not sure why:
maxime@Desktop:~/Desktop/gym-miniworld$ ls -al
...
-rw-rw-r-- 1 maxime maxime 1636 Nov 21 15:23 breakout_ppo.yaml
vel breakout_ppo.yaml train
Traceback (most recent call last):
  File "/home/maxime/.local/bin/vel", line 11, in <module>
    sys.exit(main())
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/launcher.py", line 30, in main
    params={k: v for (k, v) in (Parser.parse_equality(eq) for eq in args.param)}
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/api/model_config.py", line 39, in from_file
    project_config_path = ModelConfig.find_project_directory(os.path.dirname(os.path.abspath(filename)))
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/api/model_config.py", line 31, in find_project_directory
    return ModelConfig.find_project_directory(up_path)
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/api/model_config.py", line 31, in find_project_directory
    return ModelConfig.find_project_directory(up_path)
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/api/model_config.py", line 31, in find_project_directory
    return ModelConfig.find_project_directory(up_path)
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/api/model_config.py", line 29, in find_project_directory
    raise RuntimeError(f"Couldn't find project file starting from {start_path}")
RuntimeError: Couldn't find project file starting from /
I also tried the following commands, which fail in the same way:
vel ./breakout_ppo.yaml train
python3 -m vel.launcher breakout_ppo.yaml train
python3 -m vel.launcher ./breakout_ppo.yaml train
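The traceback suggests that the launcher walks upward from the config file's directory looking for a project file and gives up once it reaches `/`. A minimal sketch of that kind of search follows. The marker name `.velproject.yaml` is an assumption here (it is not stated in the report above), and this is an illustration of the technique rather than vel's actual implementation.

```python
import tempfile
from pathlib import Path

PROJECT_FILE_NAME = ".velproject.yaml"  # assumed marker file name


def find_project_directory(start_path):
    """Walk from start_path towards the filesystem root looking for the marker file."""
    start = Path(start_path).resolve()
    for candidate in (start, *start.parents):
        if (candidate / PROJECT_FILE_NAME).is_file():
            return candidate
    raise RuntimeError(f"Couldn't find project file starting from {start}")


# Demonstration in a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve()
    nested = project / "configs" / "rl"
    nested.mkdir(parents=True)
    (project / PROJECT_FILE_NAME).touch()
    found_ok = find_project_directory(nested) == project

print(found_ok)
```

If the assumption about the marker file holds, a config copied into a directory that has no such file in any parent directory would fail in exactly the way shown above, regardless of how the path is spelled on the command line.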
It should be possible to add a flag to the launcher to reset the checkpointed state.
I'm trying to load a trained model to investigate its behavior (I'm interested in for example training a walking agent on one set of obstacles and then investigating that policy's performance on a different set).
I've run one of the example configs with:
python3 -m vel.launcher examples-configs/rl/mujoco/ppo/walker_ppo.yaml train
That all seems to work fine. When I investigate the output, I see that the field labeled "PMM:episode_rewards" gets up to 1500-2000 or so.
So far so good, now I'm trying to load this trained model into pytorch and run it back in the same environment, just to make sure I can. I went through your example scripts, and also looked into your 'infra-baselines' repo for hints. I've also dug through the meat of the codebase while debugging etc.
I was able to hack together the script below. However, the agent performs poorly: I get an average reward of about 4, and the walker is clearly not walking.
import torch
import pprint
import vel
from vel.rl.models.policy_gradient_model_separate import PolicyGradientModelSeparateFactory
from vel.rl.models.backbone.mlp import MLPFactory
from vel.util.random import set_seed
from vel.rl.env.mujoco import MujocoEnv

state_dict = torch.load('/Users/sgillen/work_dir/output/checkpoints/walker_ppo/0/checkpoint_00000489.data', map_location = 'cpu')
hidden_dict = torch.load('/Users/sgillen/work_dir/output/checkpoints/walker_ppo/0/checkpoint_hidden_00000489.data', map_location = 'cpu')

seed = 1002
set_seed(seed) # Set random seed in python std lib, numpy and pytorch
env = MujocoEnv('Walker2d-v2').instantiate(seed=seed)

policy_in_size = state_dict['policy_backbone.model.0.weight'].shape[1]
value_in_size = state_dict['value_backbone.model.0.weight'].shape[1]

model_factory = PolicyGradientModelSeparateFactory(
    policy_backbone=MLPFactory(input_length=policy_in_size, hidden_layers=[64, 64], activation='tanh'),
    value_backbone=MLPFactory(input_length=value_in_size, hidden_layers=[64, 64], activation='tanh'),
)

#sgillen - pretty sure this infers the output size from the action space
model = model_factory.instantiate(action_space=env.action_space)
model.load_state_dict(state_dict)

env.allow_early_resets = True
ob = env.reset()
rewards = []

while True:
    #action = model.step(torch.Tensor(ob)).detach().numpy()
    action = model.step(torch.Tensor(ob).view(1,-1))['actions'].detach().numpy()
    ob, reward , done, _ = env.step(action)
    rewards.append(reward)
    env.render()
    if done:
        print(max(rewards))
        ob = env.reset()
It would be very helpful if you had any advice on why this might be happening. I have a feeling I'm misunderstanding something about your codebase, or possibly PyTorch itself (I'm relatively new to it). It would also be great if you could tell me whether there is a "right way" to do this with your codebase and, if there is not, whether there is any interest in me (trying to) build one.
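One detail of the script worth noting: `print(max(rewards))` reports the largest single-step reward seen so far, and `rewards` is never cleared between episodes, so the printed number is not a per-episode total. Assuming "PMM:episode_rewards" refers to summed rewards per episode (an assumption, not confirmed in this thread), a like-for-like comparison would accumulate the return of each episode separately. The sketch below shows that bookkeeping with a hypothetical `ToyEnv` standing in for the real environment and a constant policy standing in for the model.

```python
class ToyEnv:
    """Hypothetical stand-in for a gym-style environment with fixed-length episodes."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        return float(self.t), 1.0, done, {}


def evaluate(env, policy, episodes):
    """Return the summed reward of each episode."""
    returns = []
    for _ in range(episodes):
        ob, done, episode_return = env.reset(), False, 0.0
        while not done:
            ob, reward, done, _ = env.step(policy(ob))
            episode_return += reward
        returns.append(episode_return)
    return returns


returns = evaluate(ToyEnv(), policy=lambda ob: 0, episodes=3)
print(sum(returns) / len(returns))
```

Whether any observation preprocessing used during training also has to be reapplied at evaluation time is a separate question that this sketch does not address.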
Thanks very much!
My next step is to have clean working and benchmarked policy gradient reinforcement learning algorithms.
There is no information on how to actually run a model. Some basic information should be added.
It seems that each experiment creates two visdom environments, and the name of one of them contains a typo.
RNNs, and LSTMs in particular, are a useful class of policies, and the framework should support them.
A major part of the Vel 0.3 release will be simplifying and unifying the interface for environment rollers, so that it has fewer special cases and the pieces work well together.
Specifically, that will include:
Can you implement the Guided Policy Search algorithm as described here (https://papers.nips.cc/paper/5444-learning-neural-network-policies-with-guided-policy-search-under-unknown-dynamics.pdf)? There isn't any good implementation of this algorithm available online, and it is very useful for robotics and meta-learning problems.
Thanks
Write some example configuration files that show a "super-convergence" phenomenon.
Thanks for your great work. I found a small bug on line 104 of stochastic_policy_rnn_model.py: the key name of the returned logprobs value should be 'action:logprobs' instead of 'logprobs'.
Some architectures like GANs use two or more optimizers and it would be nice to add this behavior to vel.
Right now, vel relies on the function calculate_gradient to run the backward pass on the loss. What do you think about calculate_gradient instead returning a list of loss tensors (and metrics), with the Trainer holding a list of optimizers and calling backward() itself in the train_batch function?
I am thinking about something like
for loss, optimizer in zip(self.model.calculate_gradient(), self.model.optimizers):
    optimizer.zero_grad()
    loss.backward()
    #clip gradients
    optimizer.step()
calculate_gradient could also stay backward compatible if the Trainer checks the return type (dict or list).
What do you think? If you are OK with it, I can work on it.
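The return-type check mentioned above could be as small as the following sketch. `as_loss_list` is a hypothetical helper name, and plain dicts with float values stand in for whatever calculate_gradient actually returns.

```python
def as_loss_list(result):
    """Normalize a calculate_gradient-style return value to a list, one entry per optimizer."""
    if isinstance(result, dict):
        return [result]  # legacy single-optimizer form
    if isinstance(result, (list, tuple)):
        return list(result)
    raise TypeError(f"Unsupported return type: {type(result).__name__}")


single = as_loss_list({"loss": 0.5})
multiple = as_loss_list([{"loss": 0.5}, {"loss": 1.5}])
print(len(single), len(multiple))
```

With this in place, the training loop can always iterate over a list, and existing single-optimizer models would not need to change.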
Do you have any plans to port your library and algorithms to stable pytorch 1.0?
There is no really good reason for them to be separate.
Hello! I'm trying to install vel to use with a custom OpenAI Gym environment that I'm creating.
I ran the command found in the README: pip3 install vel[gym,mongo,visdom]. Unfortunately, the install is repeatedly failing because of missing dependencies, and I'm having to hunt and peck to find which ones. In particular, the pillow and mujoco libraries are pulling in a lot of extra dependencies: zlib, cffi, Cython, jpeg, etc.
Currently, I feel like there are not many good options for RL in PyTorch. I would like to see something better take hold. I think that one way you could help vel gain traction is by making it easier to install. Minimize unnecessary dependencies, and you will help ensure that it doesn't fail to install when new users try it. Most people who try open source software, if it doesn't work out of the box, will just try something else and never look back. In my case, I have absolutely no need for mujoco environments. I don't think these should be installed by default. All I really want is to have access to a quality ppo implementation.
I came across some interesting results when running ACER on the Enduro environment when digging into hyperparameter optimization for this model. Here are the results of 50 runs for one particular set of hyperparameters:
I was a little bit surprised to see negative rewards here. Do you know how the Enduro environment can return negative rewards?
There was another curious case for the best configuration found through random search. I found that many of the evaluations only managed to run for a handful of frames. When running 50 runs of the environment in this case, only 12 runs made it past 1000 frames. Unfortunately all the rewards remained at 0.0 here.
Would you have any recommendations for me to dig into why this particular configuration seems to fail often, and only returns 0.0 rewards for the remaining runs? If you are curious, here is the configuration I was using: random search yaml. And here are the openai logging files.
Preferably percentiles should be plotted on the same graph.
Some environments can get "stuck", not advancing beyond a certain stage due to bugs. We should clip the episode length to a certain number, e.g. 10,000 frames.
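A sketch of such a limit as a thin wrapper around a gym-style environment. `EpisodeLengthLimit` and `StuckEnv` are hypothetical names used for illustration; gym itself ships a `TimeLimit` wrapper that serves a similar purpose and may be the simpler choice.

```python
class EpisodeLengthLimit:
    """Wraps a gym-style env and forces done=True after max_frames steps."""

    def __init__(self, env, max_frames=10_000):
        self.env = env
        self.max_frames = max_frames
        self.frames = 0

    def reset(self):
        self.frames = 0
        return self.env.reset()

    def step(self, action):
        ob, reward, done, info = self.env.step(action)
        self.frames += 1
        if self.frames >= self.max_frames and not done:
            done = True
            # Mark the cut-off so callers can tell it apart from a real game over
            info = dict(info, truncated=True)
        return ob, reward, done, info


class StuckEnv:
    """Hypothetical env that never terminates on its own."""

    def reset(self):
        return 0

    def step(self, action):
        return 0, 0.0, False, {}


env = EpisodeLengthLimit(StuckEnv(), max_frames=3)
env.reset()
results = [env.step(0) for _ in range(3)]
print([done for _, _, done, _ in results])
```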
I have seen that you use init.orthogonal_ exclusively everywhere. I don't think it is mentioned in any paper that I'm aware of (definitely not in the Nature DQN paper). I want to understand more about the reasoning behind this decision.
Thank you.
Currently most checkpointing options are set on a project level. There needs to be an experiment-level setting to tune these options (lr_streaming, checkpoint frequency, whether to store the best checkpoint etc.)
Currently, experiments connect to a local Visdom server. The connection string needs to be parametrized to allow a remote Visdom server.
Current solution is quite inflexible.
How to implement/tie to current setup
That would decouple slightly models from the input format thus possibly increasing code reuse potential.
Hey,
I have been trying out Vel for a day or two now, using it with a few gym classic control environments.
One issue I have run into is that when I use the vel configs/cartpole_dqn.yaml record command, I get the following:
RuntimeError: expected type torch.cuda.DoubleTensor but got torch.cuda.FloatTensor
I have worked around the issue by editing the register_buffer calls in normalize_observations.py to use dtype=torch.double when I intend to use the record command, and changing them back to torch.float when I want to train. Hopefully you can figure out what is causing it, as I cannot (I am a bit of a newbie with PyTorch).
Here is the full stack trace:
Evaluating environment...
Traceback (most recent call last):
  File "/home/j/anaconda3/bin/vel", line 11, in <module>
    load_entry_point('vel', 'console_scripts', 'vel')()
  File "/home/j/dev/vel/vel/launcher.py", line 64, in main
    model_config.run_command(args.command, args.varargs)
  File "/home/j/dev/vel/vel/internals/model_config.py", line 119, in run_command
    return command_descriptor.run(*varargs)
  File "/home/j/dev/vel/vel/rl/commands/record_movie_command.py", line 43, in run
    self.record_take(model, env, device, take_number=i + 1)
  File "/home/j/anaconda3/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 43, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/j/dev/vel/vel/rl/commands/record_movie_command.py", line 68, in record_take
    actions = model.step(observation_tensor, **self.sample_args)['actions']
  File "/home/j/dev/vel/vel/rl/models/q_model.py", line 65, in step
    q_values = self(observations)
  File "/home/j/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/j/dev/vel/vel/rl/models/q_model.py", line 58, in forward
    observations = self.input_block(observations)
  File "/home/j/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/j/dev/vel/vel/modules/input/normalize_observations.py", line 48, in forward
    return (input_vector- self.running_mean.unsqueeze(0)) / torch.sqrt(self.running_var.unsqueeze(0))
RuntimeError: expected type torch.cuda.DoubleTensor but got torch.cuda.FloatTensor
In classic_atari.py, there is an option scale_float_frames which is False by default. I think this is to reduce memory usage (a byte is smaller than a float). If that is the case, there must be scaling somewhere else, but I have yet to find it. Would you also elaborate more on the structure of the relevant classes? Keywords are hard to get by intuition.
Thanks for putting together this library!
I have installed the library on a headless server, along with MongoDB and Visdom. Is there a way to view the results after running from the .yaml configs?
I am testing it out with the example
vel examples-configs/rl/atari/a2c/breakout_a2c.yaml train
Everything trains fine, but when I look at the logfile at vel/output/openai/breakout_a2c/0/log.txt, it only contains the following:
Logging to /home/ygx/src/vel/output/openai/breakout_a2c/0
And the progress.csv at that directory level is empty.
When looking at the yaml config at https://github.com/yngtodd/vel/blob/master/examples-configs/rl/atari/a2c/breakout_a2c.yaml#L57, I see that it is saving a video. Where is that stored?
Thanks!
For reference:
https://timothycrosley.github.io/portray/
LrFinder currently only shows a plot. It would be nice if the output could be saved to MongoDB or Visdom alternatively.
I have seen this in classic_atari.py:
if 'FIRE' in env.unwrapped.get_action_meanings():
    # Take action on reset for environments that are fixed until firing.
    if disable_episodic_life:
        env = FireEpisodicLifeEnv(env)
    else:
        env = FireResetEnv(env)
I don't know what it's for. Could you elaborate on this?
Update: I now know that some environments need some actions to really start after a reset.
Also, zooming out a bit I see:
if not disable_episodic_life:
    # Make end-of-life == end-of-episode, but only reset on true game over.
    # Done by DeepMind for the DQN and co. since it helps value estimation.
    env = EpisodicLifeEnv(env)
if 'FIRE' in env.unwrapped.get_action_meanings():
    # Take action on reset for environments that are fixed until firing.
    if disable_episodic_life:
        env = FireEpisodicLifeEnv(env)
    else:
        env = FireResetEnv(env)
The two if statements seem to conflict, don't they? I think that if EpisodicLifeEnv is used, FireEpisodicLifeEnv must also be used.
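To make the question concrete, the branches quoted above can be traced for every flag combination. The function below mirrors only the control flow of the quoted snippet and returns wrapper names instead of wrapping a real environment; `selected_wrappers` is a hypothetical helper written for this illustration.

```python
def selected_wrappers(has_fire, disable_episodic_life):
    """Names of the wrappers the quoted branches would apply, in order."""
    wrappers = []
    if not disable_episodic_life:
        wrappers.append("EpisodicLifeEnv")
    if has_fire:
        if disable_episodic_life:
            wrappers.append("FireEpisodicLifeEnv")
        else:
            wrappers.append("FireResetEnv")
    return wrappers


for has_fire in (False, True):
    for disable in (False, True):
        print(has_fire, disable, selected_wrappers(has_fire, disable))
```

As quoted, EpisodicLifeEnv is only ever paired with FireResetEnv, and FireEpisodicLifeEnv is only applied when EpisodicLifeEnv is not. Whether that pairing is intended is exactly what the question asks.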
Because of a rework of the metrics system, I had to remove the test-time augmentation code from the framework.
It should be put back somewhere, maybe in a different spot as the previous one proved to be very hard to implement in a generic way.
Implement an improved DQN algorithm with many improvements combined, described in https://arxiv.org/abs/1710.02298.
Hi,
I have been trying to train a DQN on Breakout using the provided example config, but with several environments instead of just one to speed up training (which I understood to be possible since version 0.3, according to comments in issue #28).
I modified the breakout_ddqn.yaml configuration file to use the vel.rl.vecenv.shared_mem vec_env and set the reinforcer's parallel_envs parameter to 4, 8 and 12 (file attached).
breakout_ddqn_parallel.yaml.txt
Once launched, the code indeed seems to create several environments (according to the processes created in htop).
Moreover, the number of frames at each epoch grows linearly with the number of parallel envs, as expected.
However, the DQN does not train faster: for instance, with the 12 parallel envs, the DQN has an average episode reward of ~1-2 after ~9M frames.
This makes me think that the number of frames displayed does not correspond to the number of frames actually seen.
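Another possible reading, offered only as an assumption about how the training loop is organized: if one optimizer step is taken per rollout step, and each rollout step collects one frame from every parallel environment, then the number of gradient updates per frame shrinks as environments are added. The arithmetic below illustrates that; `gradient_updates` and `updates_per_rollout_step` are hypothetical names, and the actual behavior of vel's reinforcer is not confirmed here.

```python
def gradient_updates(total_frames, parallel_envs, updates_per_rollout_step=1):
    """Number of optimizer steps taken while collecting total_frames frames."""
    rollout_steps = total_frames // parallel_envs  # each step yields one frame per env
    return rollout_steps * updates_per_rollout_step


for envs in (1, 4, 8, 12):
    print(envs, gradient_updates(9_000_000, envs))
```

Under that assumption, 9M frames with 12 environments would correspond to the same number of updates as 750k frames with a single environment, which could explain slow progress per frame without the frame counter being wrong.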
I might have been doing something wrong; in that case, I would be very grateful for any advice!
Apart from that, thanks a lot for making this very nice repository public. I look forward to seeing its further development.
Thanks in advance for your help!
Maxime