millionintegrals / vel

Velocity in deep-learning research

License: MIT License

Python 99.65% Dockerfile 0.24% Makefile 0.10%

convolutional-neural-networks deep-learning python pytorch reinforcement-learning

vel's Issues

Callbacks should access data-dict before metrics

Before metrics are calculated, callbacks should be able to modify the data_dictionary for more efficient computations.

Store config in the DB together with the metrics.

Since the config may be lost at some point, it would be good to store it alongside the metrics for future reference.

How can I cite you?

Do you have a preferred way to cite your work?

Training time measurements

Measure time it takes to train an epoch.
Measure total time of training.
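
Both measurements can be taken by wrapping the epoch loop with `time.perf_counter`. The following is a minimal sketch, assuming a hypothetical `train_epoch` callable that trains one epoch; it is not vel's actual training API.

```python
import time
from typing import Callable, List, Tuple


def timed_training(train_epoch: Callable[[int], None], epochs: int) -> Tuple[List[float], float]:
    """Run `train_epoch` once per epoch; return per-epoch seconds and total seconds."""
    epoch_seconds: List[float] = []
    start = time.perf_counter()
    for epoch_idx in range(1, epochs + 1):
        epoch_start = time.perf_counter()
        train_epoch(epoch_idx)  # hypothetical callable, stands in for the real training step
        epoch_seconds.append(time.perf_counter() - epoch_start)
    total_seconds = time.perf_counter() - start
    return epoch_seconds, total_seconds
```

The per-epoch values could then be reported as an ordinary metric, and the total logged once at the end of training.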

Unify LSTM, GRU and RNN models

Functionally they are almost the same and share basically all the code. They should be a single model class.

Test launcher/dependency injection code

Another piece worth writing some tests for is the launcher/dependency injection part.

Planet Earth transfer learning

Have example code that trains a classifier to the same accuracy on the "Planet Earth" dataset as Fast AI course lesson 2: http://course.fast.ai/lessons/lesson2.html

Can't run YAML config file in local directory

I installed vel through pip, and then downloaded one of the example YAML config files so I could modify it. However, I can't seem to get the launcher to run a config file from the current directory, and I'm not sure why:

maxime@Desktop:~/Desktop/gym-miniworld$ ls -al
...
-rw-rw-r-- 1 maxime maxime 1636 Nov 21 15:23 breakout_ppo.yaml

 vel breakout_ppo.yaml train
Traceback (most recent call last):
  File "/home/maxime/.local/bin/vel", line 11, in <module>
    sys.exit(main())
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/launcher.py", line 30, in main
    params={k: v for (k, v) in (Parser.parse_equality(eq) for eq in args.param)}
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/api/model_config.py", line 39, in from_file
    project_config_path = ModelConfig.find_project_directory(os.path.dirname(os.path.abspath(filename)))
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/api/model_config.py", line 31, in find_project_directory
    return ModelConfig.find_project_directory(up_path)
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/api/model_config.py", line 31, in find_project_directory
    return ModelConfig.find_project_directory(up_path)
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/api/model_config.py", line 31, in find_project_directory
    return ModelConfig.find_project_directory(up_path)
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/api/model_config.py", line 29, in find_project_directory
    raise RuntimeError(f"Couldn't find project file starting from {start_path}")
RuntimeError: Couldn't find project file starting from /

I also tried the following, which fail in the same way:

vel ./breakout_ppo.yaml train
python3 -m vel.launcher breakout_ppo.yaml train
python3 -m vel.launcher ./breakout_ppo.yaml train
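
The traceback suggests that the launcher walks up from the config file's directory looking for a project file, and raises once it reaches the filesystem root without finding one. The sketch below imitates that search to show why a config in an arbitrary directory fails. It is an illustration, not vel's implementation, and the marker file name `.velproject.yaml` is an assumption.

```python
import os


def find_project_directory(start_path: str, marker: str = ".velproject.yaml") -> str:
    """Walk up from `start_path` until a directory containing `marker` is found."""
    path = os.path.abspath(start_path)
    while True:
        if os.path.exists(os.path.join(path, marker)):
            return path
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root without finding the marker
            raise RuntimeError(f"Couldn't find project file starting from {start_path}")
        path = parent
```

Under that assumption, placing a project file in the working directory (or any parent directory) would let the search succeed.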

Add reset flag to the launcher

It should be possible to pass a flag to the launcher that resets the checkpointed state.
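
As a rough sketch of what such a flag could look like with `argparse`: the flag name `--reset` and the parser layout below are hypothetical and are not taken from vel's launcher.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical launcher parser with a checkpoint-reset flag."""
    parser = argparse.ArgumentParser(description="Hypothetical launcher sketch")
    parser.add_argument("config", help="Path to the YAML config file")
    parser.add_argument("command", help="Command to run, e.g. train")
    # Hypothetical flag: when set, the launcher would discard checkpointed state
    parser.add_argument(
        "--reset",
        action="store_true",
        help="Ignore existing checkpoints and start from scratch",
    )
    return parser
```

With this layout, `build_parser().parse_args(["breakout_ppo.yaml", "train", "--reset"])` would yield `reset=True`, which the launcher could check before loading a checkpoint.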

Implement Transformer network architecture

https://arxiv.org/abs/1706.03762

Loading saved models?

I'm trying to load a trained model to investigate its behavior (I'm interested in for example training a walking agent on one set of obstacles and then investigating that policy's performance on a different set).

I've run one of the example configs with:

python3 -m vel.launcher examples-configs/rl/mujoco/ppo/walker_ppo.yaml train

That all seems to work fine. When I investigate the output, I see that the field labeled "PMM:episode_rewards" gets up to 1500-2000 or so.

So far so good, now I'm trying to load this trained model into pytorch and run it back in the same environment, just to make sure I can. I went through your example scripts, and also looked into your 'infra-baselines' repo for hints. I've also dug through the meat of the codebase while debugging etc.

I was able to hack together this script. However, the agent seems to perform poorly: I get an average reward of about 4, and the walker is clearly not walking.

import torch
import pprint
import vel
from vel.rl.models.policy_gradient_model_separate import PolicyGradientModelSeparateFactory
from vel.rl.models.backbone.mlp import MLPFactory
from vel.util.random import set_seed
from vel.rl.env.mujoco import MujocoEnv

state_dict = torch.load('/Users/sgillen/work_dir/output/checkpoints/walker_ppo/0/checkpoint_00000489.data', map_location = 'cpu')
hidden_dict =  torch.load('/Users/sgillen/work_dir/output/checkpoints/walker_ppo/0/checkpoint_hidden_00000489.data', map_location = 'cpu')


seed = 1002
set_seed(seed) # Set random seed in python std lib, numpy and pytorch
env = MujocoEnv('Walker2d-v2').instantiate(seed=seed)


policy_in_size = state_dict['policy_backbone.model.0.weight'].shape[1]
value_in_size = state_dict['value_backbone.model.0.weight'].shape[1]


model_factory = PolicyGradientModelSeparateFactory(
    policy_backbone=MLPFactory(input_length=policy_in_size, hidden_layers=[64, 64], activation='tanh'),
    value_backbone=MLPFactory(input_length=value_in_size, hidden_layers=[64, 64], activation='tanh'),
)

#sgillen - pretty sure this infers the output size from the action space
model = model_factory.instantiate(action_space=env.action_space)
model.load_state_dict(state_dict)

env.allow_early_resets = True 


ob = env.reset()    
rewards = []
while True:
    #action = model.step(torch.Tensor(ob)).detach().numpy()
    action = model.step(torch.Tensor(ob).view(1,-1))['actions'].detach().numpy()
    ob, reward , done, _ =  env.step(action)
    
    rewards.append(reward)
    
    env.render()
    if done:
        print(max(rewards))
        ob = env.reset()

It would be very helpful if you had any advice on why this might be happening. I have a feeling I'm misunderstanding something about your codebase, or possibly PyTorch itself (I'm relatively new to it). It would also be great if you could tell me whether there is a "right way" to do this with your codebase and, if there is not, whether there is any interest in me (trying to) build one.

Thanks very much!

Variational Auto-Encoder

https://arxiv.org/abs/1312.6114

Implement policy gradient reinforcement learning algorithms

My next step is to have clean, working, and benchmarked policy gradient reinforcement learning algorithms.

Add README information on how to run a config

There is no information on how to actually run a model. Some basic information should be added.

Bug in visdom environment naming

It seems that each experiment creates two visdom environments, and the name of one of them contains a typo.

Include linter in the build process

Add support for LSTM policies

RNNs, and LSTMs in particular, are a useful class of policies, and the framework should support them.

Large refactoring of envs, vec envs, environment rollers and frame stack

A major part of the Vel 0.3 release will be simplifying and unifying the interface for environment rollers, so that it has fewer special cases and the pieces work well together.

Specifically, that will include:

A single set of reinforcers, instead of separate sets for single envs and vec envs
A single set of environment rollers, rather than separate ones for envs and vec envs
Unified handling of frame stacks across all modules

Implementation of Guided Policy Search

Can you implement the Guided Policy Search algorithm as described here (https://papers.nips.cc/paper/5444-learning-neural-network-policies-with-guided-policy-search-under-unknown-dynamics.pdf)? There isn't any good implementation of this algorithm available online, and it is very useful for robotics and meta-learning problems.

Thanks

Super-convergence experiments

Write some example configuration files that show a "super-convergence" phenomenon.

Name error in stochastic_policy_rnn_model.py

Thanks for your great work. I found a small bug on line 104 of stochastic_policy_rnn_model.py: the key name of the returned logprobs value should be 'action:logprobs' instead of 'logprobs'.

Multiple optimizers support

Some architectures, such as GANs, use two or more optimizers, and it would be nice to add this behavior to vel.
vel currently relies on the function calculate_gradient to run the backward pass on the loss. What do you think about having calculate_gradient return a list of loss tensors (and metrics) instead, with the Trainer holding a list of optimizers and calling backward() itself in the train_batch function?

I am thinking about something like

for loss, optimizer in zip(self.model.calculate_gradient(), self.model.optimizers):
   optimizer.zero_grad()
   loss.backward()
   #clip gradients
   optimizer.step()

calculate_gradient could also stay backward compatible by checking the return type (dict or list).

What do you think? If you are OK with it, I can work on it 😃

PyTorch 1.0 support

Do you have any plans to port your library and algorithms to stable pytorch 1.0?

Merge .api and .api.base packages

There is no really good reason for them to be separate.

Minimize unnecessary dependencies when installing

Hello! I'm trying to install vel to use with a custom OpenAI Gym environment that I'm creating.

I ran the command found in the README: pip3 install vel[gym,mongo,visdom]. Unfortunately, the install is repeatedly failing because of missing dependencies, and I'm having to hunt and peck to find which ones. In particular, the pillow and mujoco libraries are pulling in a lot of extra dependencies: zlib, cffi, Cython, jpeg, etc.

Currently, I feel like there are not many good options for RL in PyTorch. I would like to see something better take hold. I think that one way you could help vel gain traction is by making it easier to install. Minimize unnecessary dependencies, and you will help ensure that it doesn't fail to install when new users try it. Most people who try open source software, if it doesn't work out of the box, will just try something else and never look back. In my case, I have absolutely no need for mujoco environments. I don't think these should be installed by default. All I really want is to have access to a quality ppo implementation.

Understanding results from the ACER algorithm on Enduro

I came across some interesting results when running ACER on the Enduro environment when digging into hyperparameter optimization for this model. Here are the results of 50 runs for one particular set of hyperparameters:

I was a little bit surprised to see negative rewards here. Do you know how the Enduro environment can return negative rewards?

There was another curious case for the best configuration found through random search. I found that many of the evaluations only managed to run for a handful of frames. When running 50 runs of the environment in this case, only 12 runs made it past 1000 frames. Unfortunately all the rewards remained at 0.0 here.

Would you have any recommendations for me to dig into why this particular configuration seems to fail often, and only returns 0.0 rewards for the remaining runs? If you are curious, here is the configuration I was using: random search yaml. And here are the openai logging files.

Implement tracking percentiles (e.g. 90%, 10%) of episode rewards.

Preferably percentiles should be plotted on the same graph.
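
A minimal sketch of the percentile computation itself, using only the standard library and linear interpolation between sorted values. The function name and the default percentile set are illustrative assumptions, not part of vel.

```python
from typing import Dict, Sequence


def reward_percentiles(
    rewards: Sequence[float], percentiles: Sequence[float] = (10, 50, 90)
) -> Dict[float, float]:
    """Percentiles of episode rewards, linearly interpolated between sorted values."""
    if not rewards:
        raise ValueError("rewards must not be empty")
    ordered = sorted(rewards)
    result = {}
    for p in percentiles:
        position = (len(ordered) - 1) * p / 100.0  # fractional index into the sorted list
        lower = int(position)
        upper = min(lower + 1, len(ordered) - 1)
        fraction = position - lower
        result[p] = ordered[lower] + (ordered[upper] - ordered[lower]) * fraction
    return result
```

Each returned value could be emitted as its own series so that all percentiles end up on the same graph.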

Implement ENV wrapper for limiting episode length

Some environments can get "stuck", not advancing beyond a certain stage due to bugs. We should cap the episode length at a certain number, e.g. 10,000 frames.
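
A minimal sketch of such a wrapper, assuming the classic Gym interface in which `step` returns `(observation, reward, done, info)`. The class name and the `truncated` info key are hypothetical. Gym also ships a `TimeLimit` wrapper that serves a similar purpose.

```python
class EpisodeLengthLimit:
    """Wrap an env exposing reset()/step() and force `done` after `max_steps` steps."""

    def __init__(self, env, max_steps: int = 10_000):
        self.env = env
        self.max_steps = max_steps
        self._elapsed = 0

    def reset(self, **kwargs):
        self._elapsed = 0  # a new episode starts counting from zero
        return self.env.reset(**kwargs)

    def step(self, action):
        observation, reward, done, info = self.env.step(action)
        self._elapsed += 1
        if self._elapsed >= self.max_steps and not done:
            done = True
            info = dict(info, truncated=True)  # hypothetical key marking the forced cut-off
        return observation, reward, done, info
```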

What's the reason behind using orthogonal weight initialization?

I have seen that you use init.orthogonal_ exclusively everywhere. I don't think it is mentioned in any paper that I'm aware of (definitely not in the Nature DQN paper). I want to understand more about the reasoning behind this decision.

Thank you.

Expose more checkpointing options to the model

Currently most checkpointing options are set on a project level. There needs to be an experiment-level setting to tune these options (lr_streaming, checkpoint frequency, whether to store the best checkpoint etc.)

Parametrize visdom connection better

Currently experiments connect to a local Visdom server. The connection string needs to be parametrized to allow a remote Visdom server.

Rename models to policies

Nicer API for progress idx and 'progress_meter'

Current solution is quite inflexible.

Think about restarting reinforcement learning algorithms

How to implement/tie to current setup

Factor out input_block from each model

That would slightly decouple models from the input format, possibly increasing code reuse.

[v0.3] Record command getting Type exception using NormalizeObservations

Hey,

I have been trying out Vel for a day or two now, using it with a few gym classic control environments.

One issue I have run into is when I use the vel configs/cartpole_dqn.yaml record command I get the following:

RuntimeError: expected type torch.cuda.DoubleTensor but got torch.cuda.FloatTensor

I have worked around the issue by editing the register_buffer calls in normalize_observations.py to use dtype=torch.double when I intend to use the record command, and changing them back to torch.float when I want to train. Hopefully you can figure out what is causing it, as I cannot (I am a bit of a newbie with PyTorch).

Here is the full stack trace:

Evaluating environment...
Traceback (most recent call last):
  File "/home/j/anaconda3/bin/vel", line 11, in <module>
    load_entry_point('vel', 'console_scripts', 'vel')()
  File "/home/j/dev/vel/vel/launcher.py", line 64, in main
    model_config.run_command(args.command, args.varargs)
  File "/home/j/dev/vel/vel/internals/model_config.py", line 119, in run_command
    return command_descriptor.run(*varargs)
  File "/home/j/dev/vel/vel/rl/commands/record_movie_command.py", line 43, in run
    self.record_take(model, env, device, take_number=i + 1)
  File "/home/j/anaconda3/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 43, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/j/dev/vel/vel/rl/commands/record_movie_command.py", line 68, in record_take
    actions = model.step(observation_tensor, **self.sample_args)['actions']
  File "/home/j/dev/vel/vel/rl/models/q_model.py", line 65, in step
    q_values = self(observations)
  File "/home/j/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/j/dev/vel/vel/rl/models/q_model.py", line 58, in forward
    observations = self.input_block(observations)
  File "/home/j/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/j/dev/vel/vel/modules/input/normalize_observations.py", line 48, in forward
    return (input_vector- self.running_mean.unsqueeze(0)) / torch.sqrt(self.running_var.unsqueeze(0))
RuntimeError: expected type torch.cuda.DoubleTensor but got torch.cuda.FloatTensor

Soft Actor-Critic and Twin Delayed DDPG

https://arxiv.org/abs/1801.01290
https://arxiv.org/abs/1802.09477

How do models deal with environments with no scale_float_frames?

In classic_atari.py, there is an option scale_float_frames which is False by default. I think this is to reduce memory usage (a byte is smaller than a float). If that is the case, there must be scaling somewhere else, but I have yet to find it. Would you also elaborate on the structure of the relevant classes? The right keywords are hard to guess by intuition.

Viewing results.

Thanks for putting together this library!

I have installed the library on a headless server, along with MongoDB and Visdom. Is there a way to view the results after running from the .yaml configs?

I am testing it out with the example

vel examples-configs/rl/atari/a2c/breakout_a2c.yaml train

Everything trains fine, but then when I look at the logfile at vel/output/openai/breakout_a2c/0/log.txt, it only saves the following:

Logging to /home/ygx/src/vel/output/openai/breakout_a2c/0

And the progress.csv at that directory level is empty.

When looking at the yaml config at https://github.com/yngtodd/vel/blob/master/examples-configs/rl/atari/a2c/breakout_a2c.yaml#L57, I see that it is saving a video. Where is that stored?

Thanks!

Create documentation with something like portray.

For reference:
https://timothycrosley.github.io/portray/

Write output of LrFinder to storage rather than to matplotlib

LrFinder currently only shows a plot. It would be nice if the output could be saved to MongoDB or Visdom alternatively.

What is the consideration regarding the "FIRE" action in the env?

I have seen this in classic_atari.py:

    if 'FIRE' in env.unwrapped.get_action_meanings():
        # Take action on reset for environments that are fixed until firing.
        if disable_episodic_life:
            env = FireEpisodicLifeEnv(env)
        else:
            env = FireResetEnv(env)

I don't know what it's for. Could you elaborate on this?
Update: I now know that some environments need some actions to really start after a reset.

Also, zooming out a bit I see:

    if not disable_episodic_life:
        # Make end-of-life == end-of-episode, but only reset on true game over.
        # Done by DeepMind for the DQN and co. since it helps value estimation.
        env = EpisodicLifeEnv(env)

    if 'FIRE' in env.unwrapped.get_action_meanings():
        # Take action on reset for environments that are fixed until firing.
        if disable_episodic_life:
            env = FireEpisodicLifeEnv(env)
        else:
            env = FireResetEnv(env)

The two if statements seem to conflict, don't they? I think that if EpisodicLifeEnv is used, FireEpisodicLifeEnv must also be used.
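
For context, the idea behind a fire-on-reset wrapper (an environment that stays frozen until FIRE is pressed) can be sketched as follows. This is a simplified illustration assuming the classic Gym 4-tuple `step` API and that FIRE is action index 1. It is not the code of FireResetEnv or FireEpisodicLifeEnv.

```python
class FireOnReset:
    """Press FIRE right after reset so the agent never has to learn to start the game."""

    def __init__(self, env, fire_action: int = 1):
        self.env = env
        self.fire_action = fire_action  # assumption: FIRE is action index 1

    def reset(self, **kwargs):
        self.env.reset(**kwargs)
        observation, _, done, _ = self.env.step(self.fire_action)
        if done:  # firing ended the episode; fall back to a plain reset
            observation = self.env.reset(**kwargs)
        return observation

    def step(self, action):
        return self.env.step(action)
```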

Revive test-time augmentation

Because of a rework of the metrics system, I had to disable the test-time augmentation code in the framework.

It should be put back somewhere, maybe in a different spot as the previous one proved to be very hard to implement in a generic way.

Use torch.distributions instead of handrolled logic

In files like:
https://github.com/MillionIntegrals/vel/blob/master/vel/rl/modules/action_head.py
https://github.com/MillionIntegrals/vel/blob/master/vel/rl/modules/q_head.py
https://github.com/MillionIntegrals/vel/blob/master/vel/rl/modules/double_q_head.py

Implement 'Rainbow' DQN version.

Implement an improved DQN algorithm with many improvements combined, described in https://arxiv.org/abs/1710.02298.

Compatibility of DQN and parallel environments

Hi,

I have been trying to train a DQN on Breakout using the provided example config, but with several environments instead of just one to speed up training (which I understood to be possible since version 0.3, according to comments in issue #28).
I modified the breakout_ddqn.yaml configuration file by using the vel.rl.vecenv.shared_mem vec_env and set the parameter parallel_envs of reinforcer to 4, 8 and 12 (file attached).

breakout_ddqn_parallel.yaml.txt

Once launched, the code indeed seems to create several environments (according to the processes created in htop).
Moreover, the number of frames at each epoch grows linearly with the number of parallel envs, as expected.
However, the DQN does not train faster: for instance, with the 12 parallel envs, the DQN has an average episode reward of ~1-2 after ~9M frames.

This makes me think that the number of frames displayed does not correspond to the number of frames actually seen.

I might be doing something wrong; in that case I would be very grateful for any advice!

Apart from that, thanks a lot for making this very nice repository public. I look forward to seeing its further development.

Thanks in advance for your help!

Maxime
