
nip-deeprl-project's People

Contributors

bordeauxred2, laermannjan, monoel, monoelh, skynbe


nip-deeprl-project's Issues

Pickling broken

Environment
LunarLander

Behaviour
Agents cannot be pickled and written to file.

Reproduction Procedure
Run any LunarLander experiment, e.g. python testbench.py LunarLander-v2 dummy

Stack Trace

Traceback (most recent call last):
  File "testbench.py", line 61, in <module>
    train(args.env, config_name, args.pickle_root, args.exp_name, args.num_cpu)
  File "/Users/jan/code/nip-deeprl-project/custom_train.py", line 128, in train
    ActWrapper(act, act_params).save(os.path.join(pickle_dir, pickle_fname))
  File "/usr/local/anaconda3/lib/python3.6/site-packages/baselines/deepq/simple.py", line 55, in save
    dill.dump((model_data, self._act_params), f)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 252, in dump
    pik.dump(obj)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 736, in save_tuple
    save(element)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 841, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 1306, in save_function
    obj.__dict__), obj=obj)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 610, in save_reduce
    save(args)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 736, in save_tuple
    save(element)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 1057, in save_cell
    pickler.save_reduce(_create_cell, (obj.cell_contents,), obj=obj)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 610, in save_reduce
    save(args)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 736, in save_tuple
    save(element)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 634, in save_reduce
    save(state)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 841, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 634, in save_reduce
    save(state)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 841, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 496, in save
    rv = reduce(self.proto)
TypeError: can't pickle SwigPyObject objects

Reevaluate custom_train.py

There seem to be some inconsistencies: some means and other statistics are calculated more often than necessary, or at the wrong point in the procedure.

Maybe also revamp the script into something like a Trainer class (a rough sketch follows).
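
A rough sketch of what such a Trainer class could look like; the names and the split of responsibilities are suggestions, not the current structure of custom_train.py:

# Rough sketch only; class and method names are suggestions, not existing code.
class Trainer:
    def __init__(self, env, act, config):
        self.env = env
        self.act = act                      # policy callable in the baselines deepq style
        self.config = config
        self.episode_rewards = [0.0]

    def run(self, max_steps):
        obs = self.env.reset()
        for t in range(max_steps):
            action = self.act(obs[None])[0]
            obs, rew, done, _ = self.env.step(action)
            self.episode_rewards[-1] += rew
            if done:
                obs = self.env.reset()
                self.episode_rewards.append(0.0)
            if t % self.config.get('log_interval', 100) == 0:
                self.log(t)

    def log(self, t):
        # Statistics live in exactly one place and are computed once per logging interval.
        recent = self.episode_rewards[-100:]
        print(t, sum(recent) / max(len(recent), 1))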

misc_util.RunningAvg

Use the RunningAvg utility class instead of, or in addition to, our sliding-window mean calculations (convolution style).
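
A minimal sketch, assuming the RunningAvg(gamma, init_value) interface from baselines.common.misc_util:

from baselines.common.misc_util import RunningAvg

# Exponentially decaying running average; a gamma close to 1 behaves roughly like
# a long sliding window, so it can stand in for the convolution-style mean.
mean_reward = RunningAvg(gamma=0.99)

for rew in [100.0, 120.0, 80.0]:   # stand-in for the per-episode rewards we already collect
    mean_reward.update(rew)

print(float(mean_reward))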

SimpleMonitor

misc_util.SimpleMonitor might be a better, simpler way of tracking the parameters we are interested in.
Maybe we can even redirect the logging output to a file (possibly converting it to an npy binary file beforehand?).
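
A possible sketch; the attribute names on the monitor are assumptions and should be checked against baselines/common/misc_util.py:

import gym
import numpy as np
from baselines.common.misc_util import SimpleMonitor

env = SimpleMonitor(gym.make('LunarLander-v2'))
# ... run the usual training loop on `env` ...

# Dump the tracked statistics as npy binaries instead of (or in addition to) text logs.
# `episode_rewards` / `episode_lengths` are assumed attribute names.
np.save('episode_rewards.npy', np.asarray(env.episode_rewards))
np.save('episode_lengths.npy', np.asarray(env.episode_lengths))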

Write Dockerfile

We need a Dockerfile to build an image we can deploy on servers.

  • Docker image for normal CPU learning (8138e2e)
  • Docker image utilizing GPU parallelization.

Refactor configs

Each config should now be a dictionary within the overarching Configs dictionary, like so:

Configs = {
    'config_name1': {
        'env': 'Acrobot-v1',
        'gamma': 0.99,
        ...
    },
    'config_name2': {
        'env': 'LunarLander-v2',
        'gamma': 0.01,
        ...
    },
    ...
}

Important: a 'basic' config should be defined for each environment, which sets the baseline parameters for all future adaptations in experiments.
By convention the names of those configs for LunarLander-v2, Acrobot-v1 and CartPole-v0 are LL_basic, AB_basic and CP_basic, respectively.
A specific config for an experiment, e.g. LL_exp1, must always contain the key env defining the environment of the experiment. Everything else is optional and can therefore be limited to those keys that differ from the environment's baseline config (XX_basic); a merge sketch follows below.
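
A sketch of how an experiment config could be merged onto its environment baseline; the helper and the extra keys are illustrative, not part of testbench.py:

Configs = {
    'LL_basic': {
        'env': 'LunarLander-v2',
        'gamma': 0.99,
        # ... all remaining baseline keys ...
    },
    # Experiment configs only list the keys that differ from the baseline.
    'LL_exp1': {
        'env': 'LunarLander-v2',
        'gamma': 0.9,
    },
}

def resolve_config(exp_name, basic_name):
    # Hypothetical helper: overlay an experiment config on its environment's baseline.
    merged = dict(Configs[basic_name])
    merged.update(Configs[exp_name])
    return merged

print(resolve_config('LL_exp1', 'LL_basic'))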

Attention

Key names used by the new training facility (ccbb561) have changed compared to the old configs. Some keys have also been dropped and others added.
Check testbench.py for a full list of all keys and their possible values. Note that an option name like
--foo-bar translates into the key name foo_bar, as the argparse example below illustrates.
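
For illustration (the flag itself is the made-up example from above, not necessarily a real testbench.py option):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--foo-bar', type=int, default=0)

# argparse replaces dashes with underscores, so --foo-bar becomes the key foo_bar.
args = parser.parse_args(['--foo-bar', '42'])
print(vars(args))  # {'foo_bar': 42}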

Provide resume functionality

We should be able to stop and resume an experiment, or simply add onto an existing one.
That way we could run longer experiments without tying up our laptops 24/7.

Save (pickle) agent with highest reward rating

Save (pickle) the agent with the highest reward rating in addition to the regular one at the end of training. That way we could investigate questions such as whether the reward (even as a mean over past episodes) gives a qualitative indication of the agent's performance. It might also be interesting to see how the 'best agent' compares in a test environment (one without learning or exploration) against the agent from the (arbitrary) end of training. This would probably be done in a qualitative, relatively subjective way, where we try to examine the complexity of the strategies or their similarity to human strategies (e.g. by playing the game ourselves with play.py). A sketch of the bookkeeping follows.
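
A rough sketch of the bookkeeping, reusing the ActWrapper save call from custom_train.py; the episode_rewards list and the call site inside the training loop are assumptions:

import os
import numpy as np
from baselines.deepq.simple import ActWrapper

def maybe_save_best(act, act_params, episode_rewards, best_mean_reward, pickle_dir, pickle_fname):
    # Pickle the agent whenever its 100-episode mean reward improves;
    # call once per episode from the training loop and keep the returned best value.
    mean_100ep_reward = np.mean(episode_rewards[-100:])
    if mean_100ep_reward > best_mean_reward:
        best_mean_reward = mean_100ep_reward
        ActWrapper(act, act_params).save(os.path.join(pickle_dir, 'best_' + pickle_fname))
    return best_mean_reward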

ParamTuning Envs

Check out train_deep_cnn.py and convergence.py!
This could solve all our problems at once?!

Package the project

Make the project a Python package so we can ensure dependencies on all platforms and make the structure more modular.
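
A minimal setup.py sketch; the package name and dependency list are assumptions:

from setuptools import setup, find_packages

setup(
    name='nip-deeprl-project',
    version='0.1.0',
    packages=find_packages(),
    # Dependency list is a guess; pin exact versions once the Docker image is settled.
    install_requires=[
        'gym[box2d]',
        'tensorflow',
        'baselines',
        'dill',
    ],
)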

Capture videos without rendering them.

According to this issue it should be possible to capture videos of our agent without rendering them to screen.
They basically use a headless X server (Xdummy), which we can simply take from the gym repo: make docker-build.
We just need to install baselines, tensorflow, etc. into the Docker image; the recording itself could then use gym's Monitor wrapper as sketched below.
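
A sketch of the recording side using gym's Monitor wrapper; the output directory and the random policy are just for illustration:

import gym
from gym import wrappers

env = gym.make('LunarLander-v2')
# Monitor writes video files into the given directory whenever the env renders,
# which works under a headless X server just as well as on a desktop.
env = wrappers.Monitor(env, 'videos/LL_dummy', force=True)

obs, done = env.reset(), False
while not done:
    obs, rew, done, info = env.step(env.action_space.sample())
env.close()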

Implement resume training

When we stop training after a number of steps or episodes and save the logs and models, we should be able to resume training the same agent from that information (DQNs are certainly capable of it, why aren't we?). A rough sketch follows.
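
A minimal sketch using plain TensorFlow checkpoints plus a pickled replay buffer; the helpers and paths are made up, and the exploration schedule and step counter would need the same treatment:

import dill
import tensorflow as tf

def save_training_state(session, replay_buffer, path='checkpoints'):
    # Graph variables (the Q-networks) go into a TF checkpoint, the buffer into a pickle.
    tf.train.Saver().save(session, path + '/model.ckpt')
    with open(path + '/replay_buffer.pkl', 'wb') as f:
        dill.dump(replay_buffer, f)

def load_training_state(session, path='checkpoints'):
    tf.train.Saver().restore(session, path + '/model.ckpt')
    with open(path + '/replay_buffer.pkl', 'rb') as f:
        return dill.load(f)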

Stop training on max_episodes rather than max_steps

This way we would have equally long logs for all agents (i.e. the same number of episode rewards, lengths, etc.) and could overlay such a plot with one showing the total number of steps taken up to each episode, to give an idea of how fast episode length is declining. A loop sketch follows.
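
A sketch of the loop-termination change; env, act and max_episodes are assumed to come from the existing training setup:

num_episodes = 0
total_steps = 0
steps_at_episode = []              # total steps taken up to the end of each episode

while num_episodes < max_episodes:
    obs, done = env.reset(), False
    while not done:
        obs, rew, done, _ = env.step(act(obs[None])[0])
        total_steps += 1
    steps_at_episode.append(total_steps)
    num_episodes += 1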
