vel's People

Contributors

dependabot[bot], millionintegrals, yngtodd, youngbink

vel's Issues

Can't run YAML config file in local directory

I installed vel through pip, and then downloaded one of the example YAML config files so I could modify it. However, I can't seem to get the launcher to run a config file from the current directory, and I'm not sure why:

maxime@Desktop:~/Desktop/gym-miniworld$ ls -al
...
-rw-rw-r-- 1 maxime maxime 1636 Nov 21 15:23 breakout_ppo.yaml

 vel breakout_ppo.yaml train
Traceback (most recent call last):
  File "/home/maxime/.local/bin/vel", line 11, in <module>
    sys.exit(main())
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/launcher.py", line 30, in main
    params={k: v for (k, v) in (Parser.parse_equality(eq) for eq in args.param)}
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/api/model_config.py", line 39, in from_file
    project_config_path = ModelConfig.find_project_directory(os.path.dirname(os.path.abspath(filename)))
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/api/model_config.py", line 31, in find_project_directory
    return ModelConfig.find_project_directory(up_path)
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/api/model_config.py", line 31, in find_project_directory
    return ModelConfig.find_project_directory(up_path)
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/api/model_config.py", line 31, in find_project_directory
    return ModelConfig.find_project_directory(up_path)
  File "/home/maxime/.local/lib/python3.6/site-packages/vel/api/model_config.py", line 29, in find_project_directory
    raise RuntimeError(f"Couldn't find project file starting from {start_path}")
RuntimeError: Couldn't find project file starting from /

I also tried the following commands, which fail in the same way:

vel ./breakout_ppo.yaml train
python3 -m vel.launcher breakout_ppo.yaml train
python3 -m vel.launcher ./breakout_ppo.yaml train
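
For context, the traceback suggests the launcher resolves a project directory by walking up from the config file's location until it finds a project-level config file, and gives up once it hits the filesystem root. A rough sketch of that kind of lookup, with the project file name as a placeholder rather than whatever vel actually looks for:

import os

def find_project_directory(start_path):
    """Walk up the directory tree until a project-level config is found.

    Illustrative only -- the file name below is a placeholder, not
    necessarily the file vel's ModelConfig searches for.
    """
    candidate = os.path.join(start_path, 'project-config.yaml')  # placeholder name
    if os.path.exists(candidate):
        return start_path

    up_path = os.path.dirname(start_path)
    if up_path == start_path:  # reached the filesystem root
        raise RuntimeError(f"Couldn't find project file starting from {start_path}")

    return find_project_directory(up_path)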

Loading saved models?

I'm trying to load a trained model to investigate its behavior (I'm interested in for example training a walking agent on one set of obstacles and then investigating that policy's performance on a different set).

I've run one of the example configs with:

python3 -m vel.launcher examples-configs/rl/mujoco/ppo/walker_ppo.yaml train

That all seems to work fine; when I investigate the output, I see that the field labeled "PMM:episode_rewards" gets up to 1500-2000 or so:

[screenshot: training output with PMM:episode_rewards around 1500-2000]

So far so good. Now I'm trying to load this trained model into PyTorch and run it back in the same environment, just to make sure I can. I went through your example scripts, looked into your 'infra-baselines' repo for hints, and dug through the meat of the codebase while debugging.

I was able to hack together the script below, but the agent seems to perform poorly: I get an average reward of about 4, and the walker is clearly not walking.

import torch
from vel.rl.models.policy_gradient_model_separate import PolicyGradientModelSeparateFactory
from vel.rl.models.backbone.mlp import MLPFactory
from vel.util.random import set_seed
from vel.rl.env.mujoco import MujocoEnv

state_dict = torch.load('/Users/sgillen/work_dir/output/checkpoints/walker_ppo/0/checkpoint_00000489.data', map_location='cpu')
hidden_dict = torch.load('/Users/sgillen/work_dir/output/checkpoints/walker_ppo/0/checkpoint_hidden_00000489.data', map_location='cpu')

seed = 1002
set_seed(seed)  # Set random seed in python std lib, numpy and pytorch
env = MujocoEnv('Walker2d-v2').instantiate(seed=seed)

policy_in_size = state_dict['policy_backbone.model.0.weight'].shape[1]
value_in_size = state_dict['value_backbone.model.0.weight'].shape[1]

model_factory = PolicyGradientModelSeparateFactory(
    policy_backbone=MLPFactory(input_length=policy_in_size, hidden_layers=[64, 64], activation='tanh'),
    value_backbone=MLPFactory(input_length=value_in_size, hidden_layers=[64, 64], activation='tanh'),
)

# sgillen - pretty sure this infers the output size from the action space
model = model_factory.instantiate(action_space=env.action_space)
model.load_state_dict(state_dict)

env.allow_early_resets = True

ob = env.reset()
rewards = []
while True:
    # action = model.step(torch.Tensor(ob)).detach().numpy()
    action = model.step(torch.Tensor(ob).view(1, -1))['actions'].detach().numpy()
    ob, reward, done, _ = env.step(action)

    rewards.append(reward)

    env.render()
    if done:
        print(max(rewards))
        ob = env.reset()

It would be very helpful if you had any advice on why this might be happening; I have a feeling I'm misunderstanding something about your codebase, or possibly PyTorch itself (I'm relatively new to it). It would also be great if you could tell me whether there is a "right way" to do this with your codebase, and if there is not, whether there is any interest in me (trying to) build one up.

Thanks very much!

Large refactoring of envs, vec envs, environment rollers and frame stack

A major part of the Vel 0.3 release will be simplifying and unifying the interface for environment rollers so that it has fewer special cases and the pieces work well together.

Specifically, that will include:

  • A single set of reinforcers, instead of separate sets for single envs and vec envs
  • A single set of environment rollers, rather than separate ones for envs and vec envs (see the illustrative sketch after this list)
  • Unified handling of frame stacking across all the modules
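
Purely as an illustration of what a single roller contract could look like (an assumption about the direction, not vel's actual API):

from abc import ABC, abstractmethod


class UnifiedEnvRoller(ABC):
    """Hypothetical unified roller: gathers experience from an environment,
    whether it wraps a single env or a vectorized batch of envs."""

    @abstractmethod
    def rollout(self, model, number_of_steps):
        """Run the model for number_of_steps and return a batch of
        transitions (observations, actions, rewards, dones)."""

    @abstractmethod
    def frames(self):
        """Number of environment frames consumed so far."""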

Name error in stochastic_policy_rnn_model.py

Thanks for your great work. I found a small bug on line 104 of stochastic_policy_rnn_model.py: the key of the returned logprobs value should be 'action:logprobs' instead of 'logprobs'.

Multiple optimizers support

Some architectures, like GANs, use two or more optimizers, and it would be nice to add this capability to vel.
vel currently relies on the calculate_gradient function to run the loss backward pass. What do you think about having calculate_gradient return a list of loss tensors (and metrics) instead, with the Trainer holding a list of optimizers and calling backward() itself in the train_batch function?

I am thinking about something like

for loss, optimizer in zip(self.model.calculate_gradient(), self.model.optimizers):
    optimizer.zero_grad()
    loss.backward()
    # clip gradients
    optimizer.step()

calculate_gradient could also stay backwards compatible by checking the return type (dict or list).
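
A minimal sketch of what that backwards-compatible dispatch might look like; the attribute names (self.optimizer, self.model.optimizers) are assumptions for illustration, not vel's actual trainer code:

def train_batch(self, batch):
    result = self.model.calculate_gradient(batch)

    if isinstance(result, dict):
        # Current behavior: calculate_gradient has already run backward(),
        # so the trainer only steps its single optimizer.
        self.optimizer.step()
        return result

    # Proposed behavior: a list of losses, one per optimizer.
    for loss, optimizer in zip(result, self.model.optimizers):
        optimizer.zero_grad()
        loss.backward()
        # clip gradients here if needed
        optimizer.step()
    return result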

What do you think? If you are OK with it I can work on it 😃

PyTorch 1.0 support

Do you have any plans to port your library and algorithms to stable pytorch 1.0?

Minimize unnecessary dependencies when installing

Hello! I'm trying to install vel to use with a custom OpenAI Gym environment that I'm creating.

I ran the command found in the README: pip3 install vel[gym,mongo,visdom]. Unfortunately, the install is repeatedly failing because of missing dependencies, and I'm having to hunt and peck to find which ones. In particular, the pillow and mujoco libraries are pulling in a lot of extra dependencies: zlib, cffi, Cython, jpeg, etc.

Currently, I feel like there are not many good options for RL in PyTorch. I would like to see something better take hold. I think that one way you could help vel gain traction is by making it easier to install. Minimize unnecessary dependencies, and you will help ensure that it doesn't fail to install when new users try it. Most people who try open source software, if it doesn't work out of the box, will just try something else and never look back. In my case, I have absolutely no need for mujoco environments. I don't think these should be installed by default. All I really want is to have access to a quality ppo implementation.
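
One way to achieve that, sketched here as an assumption rather than a description of vel's actual setup.py, is to keep install_requires minimal and push heavyweight packages such as mujoco-py behind dedicated extras:

from setuptools import setup, find_packages

setup(
    name='vel',
    packages=find_packages(),
    # Core dependencies only -- kept deliberately small.
    install_requires=[
        'numpy',
        'pyyaml',
        'torch',
    ],
    # Optional, heavyweight dependencies live behind extras, so a plain
    # `pip install vel` never pulls them in. Groupings here are illustrative.
    extras_require={
        'gym': ['gym', 'opencv-python'],
        'mujoco': ['mujoco-py'],
        'mongo': ['pymongo'],
        'visdom': ['visdom'],
    },
)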

Understanding results from the ACER algorithm on Enduro

I came across some interesting results when running ACER on the Enduro environment while digging into hyperparameter optimization for this model. Here are the results of 50 runs for one particular set of hyperparameters:

[plot: episode rewards across the 50 runs for this hyperparameter set]

I was a little bit surprised to see negative rewards here. Do you know how the Enduro environment can return negative rewards?

There was another curious case for the best configuration found through random search. I found that many of the evaluations only managed to run for a handful of frames. When running 50 runs of the environment in this case, only 12 runs made it past 1000 frames. Unfortunately all the rewards remained at 0.0 here.

[plot: run lengths and rewards for the best configuration found through random search]

Would you have any recommendations for how to dig into why this particular configuration fails so often, and why it only returns 0.0 rewards for the remaining runs? If you are curious, here is the configuration I was using: random search yaml. And here are the openai logging files.

Expose more checkpointing options to the model

Currently most checkpointing options are set at the project level. There needs to be an experiment-level setting to tune these options (lr_streaming, checkpoint frequency, whether to store the best checkpoint, etc.).

[v0.3] Record command getting Type exception using NormalizeObservations

Hey,

I have been trying out Vel for a day or two now, using it with a few gym classic control environments.

One issue I have run into is that when I use the vel configs/cartpole_dqn.yaml record command, I get the following:

RuntimeError: expected type torch.cuda.DoubleTensor but got torch.cuda.FloatTensor

I have worked around the issue by editing the register_buffer calls in normalize_observations.py to use dtype=torch.double when I intend to use the record command, and changing them back to torch.float when I want to train. Hopefully you can figure out what is causing it, as I cannot (I am a bit of a newbie with PyTorch).

Here is the full stack trace:

Evaluating environment...
Traceback (most recent call last):
  File "/home/j/anaconda3/bin/vel", line 11, in <module>
    load_entry_point('vel', 'console_scripts', 'vel')()
  File "/home/j/dev/vel/vel/launcher.py", line 64, in main
    model_config.run_command(args.command, args.varargs)
  File "/home/j/dev/vel/vel/internals/model_config.py", line 119, in run_command
    return command_descriptor.run(*varargs)
  File "/home/j/dev/vel/vel/rl/commands/record_movie_command.py", line 43, in run
    self.record_take(model, env, device, take_number=i + 1)
  File "/home/j/anaconda3/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 43, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/j/dev/vel/vel/rl/commands/record_movie_command.py", line 68, in record_take
    actions = model.step(observation_tensor, **self.sample_args)['actions']
  File "/home/j/dev/vel/vel/rl/models/q_model.py", line 65, in step
    q_values = self(observations)
  File "/home/j/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/j/dev/vel/vel/rl/models/q_model.py", line 58, in forward
    observations = self.input_block(observations)
  File "/home/j/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/j/dev/vel/vel/modules/input/normalize_observations.py", line 48, in forward
    return (input_vector- self.running_mean.unsqueeze(0)) / torch.sqrt(self.running_var.unsqueeze(0))
RuntimeError: expected type torch.cuda.DoubleTensor but got torch.cuda.FloatTensor
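
For reference, the error itself is the generic PyTorch mismatch between a float64 buffer and a float32 input; here is a minimal reproduction and one possible fix (casting at forward time), written independently of vel's actual normalize_observations.py:

import torch
import torch.nn as nn


class NormalizeObs(nn.Module):
    """Minimal stand-in for an observation-normalization input block."""

    def __init__(self, size):
        super().__init__()
        # Running statistics registered as float64 ...
        self.register_buffer('running_mean', torch.zeros(size, dtype=torch.double))
        self.register_buffer('running_var', torch.ones(size, dtype=torch.double))

    def forward(self, input_vector):
        # ... which clashes with float32 observations unless one side is cast.
        input_vector = input_vector.to(self.running_mean.dtype)
        return (input_vector - self.running_mean.unsqueeze(0)) / torch.sqrt(self.running_var.unsqueeze(0))


block = NormalizeObs(4)
print(block(torch.randn(1, 4)))  # float32 input, no dtype error after the cast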

How do models deal with environments with no scale_float_frames?

In classic_atari.py, there is an option scale_float_frames which is False by default. I think this is to reduce memory usage (a byte is smaller than a float). If that is the case, the scaling must happen somewhere else, but I have yet to find it. Would you also elaborate on the structure of the relevant classes? The keyword names are hard to work out by intuition.
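
For context, the usual pattern in Atari DQN implementations (an assumption about the intent here, not a statement about where vel does it) is to keep frames as uint8 in the replay buffer and convert to float, dividing by 255, only when they enter the network:

import numpy as np
import torch
import torch.nn as nn


class ScaleFrames(nn.Module):
    """Convert uint8 frames in [0, 255] to float32 in [0.0, 1.0]."""

    def forward(self, frames):
        return frames.float() / 255.0


# The replay buffer keeps raw uint8 frames: 4x less memory than float32.
buffer_frames = np.zeros((100_000, 4, 84, 84), dtype=np.uint8)

scale = ScaleFrames()
batch = torch.from_numpy(buffer_frames[:32])  # still uint8 here
scaled = scale(batch)                         # float32, scaled, ready for the conv net
print(scaled.dtype, scaled.max().item())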

Viewing results.

Thanks for putting together this library!

I have installed the library on a headless server, along with MongoDB and Visdom. Is there a way to view the results after running from the .yaml configs?

I am testing it out with the example

vel examples-configs/rl/atari/a2c/breakout_a2c.yaml train

Everything trains fine, but when I look at the logfile at vel/output/openai/breakout_a2c/0/log.txt, it only contains the following:

Logging to /home/ygx/src/vel/output/openai/breakout_a2c/0

And the progress.csv at that directory level is empty.

When looking at the yaml config at https://github.com/yngtodd/vel/blob/master/examples-configs/rl/atari/a2c/breakout_a2c.yaml#L57, I see that it is saving a video. Where is that stored?

Thanks!
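
Since MongoDB is mentioned above, one generic way to check what (if anything) the storage backend recorded is to inspect the database directly with pymongo. The database and collection names below are placeholders to adapt, not documented vel names:

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')

# List databases and collections first; the actual names depend on the
# project configuration, so don't assume them.
print(client.list_database_names())

db = client['vel']                 # placeholder database name
print(db.list_collection_names())

# Peek at one stored document to see which metric fields are available.
print(db['metrics'].find_one())    # placeholder collection name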

What is the consideration regarding the "FIRE" action in the env?

I have seen this in classic_atari.py:

    if 'FIRE' in env.unwrapped.get_action_meanings():
        # Take action on reset for environments that are fixed until firing.
        if disable_episodic_life:
            env = FireEpisodicLifeEnv(env)
        else:
            env = FireResetEnv(env)

I don't know what it's for. Could you elaborate on this?
Update: I now know that some environments need certain actions to actually start after a reset.
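
For reference, this is roughly what a fire-on-reset wrapper does (a simplified sketch in the spirit of OpenAI Baselines' FireResetEnv, not vel's exact implementation):

import gym


class FireOnReset(gym.Wrapper):
    """After every reset, press FIRE so games like Breakout actually start."""

    def __init__(self, env):
        super().__init__(env)
        assert env.unwrapped.get_action_meanings()[1] == 'FIRE'

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        # Action 1 is FIRE in the standard Atari action set.
        obs, _, done, _ = self.env.step(1)
        if done:
            obs = self.env.reset(**kwargs)
        return obs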

Also, zooming out a bit I see:

    if not disable_episodic_life:
        # Make end-of-life == end-of-episode, but only reset on true game over.
        # Done by DeepMind for the DQN and co. since it helps value estimation.
        env = EpisodicLifeEnv(env)

    if 'FIRE' in env.unwrapped.get_action_meanings():
        # Take action on reset for environments that are fixed until firing.
        if disable_episodic_life:
            env = FireEpisodicLifeEnv(env)
        else:
            env = FireResetEnv(env)

The two if statements seem conflicting, don't they? I think that if EpisodicLifeEnv is used, FireEpisodicLifeEnv must also be used.

Revive test-time augmentation

Because of a rework of the metrics system I had to disable the test-time augmentation code in the framework.

It should be put back somewhere, possibly in a different spot, as the previous approach proved very hard to implement in a generic way.

Compatibility of DQN and parallel environments

Hi,

I have been trying to train a DQN on Breakout using the provided example config, but with several environments instead of just one to speed up training (which I understood is possible since version 0.3, according to comments in issue #28).
I modified the breakout_ddqn.yaml configuration file to use the vel.rl.vecenv.shared_mem vec_env and set the reinforcer's parallel_envs parameter to 4, 8 and 12 (file attached).

breakout_ddqn_parallel.yaml.txt

Once launched, the code indeed seems to create several environments (according to the processes created in htop).
Moreover, the number of frames at each epoch grows linearly with the number of parallel envs, as expected.
However, the DQN does not train faster: for instance, with the 12 parallel envs, the DQN has an average episode reward of ~1-2 after ~9M frames.

This makes me think that the number of frames displayed does not correspond to the number of frames actually seen.

I might have been doing something wrong, in that case I would be very grateful for any advice!

Apart from that, thanks a lot for making this very nice repository public; I look forward to seeing its further development.

Thanks in advance for your help!

Maxime
