laermannjan / nip-deeprl-project
Student project in deep reinforcement learning with the OpenAI Gym. We evaluated and analyzed how different model architectures performed as agents in various games.
Environment
LunarLander
Behaviour
Agents cannot be pickled and written to file.
Reproduction Procedure
Run any LunarLander experiment, e.g. python testbench.py LunarLander-v2 dummy
Stack Trace
Traceback (most recent call last):
File "testbench.py", line 61, in <module>
train(args.env, config_name, args.pickle_root, args.exp_name, args.num_cpu)
File "/Users/jan/code/nip-deeprl-project/custom_train.py", line 128, in train
ActWrapper(act, act_params).save(os.path.join(pickle_dir, pickle_fname))
File "/usr/local/anaconda3/lib/python3.6/site-packages/baselines/deepq/simple.py", line 55, in save
dill.dump((model_data, self._act_params), f)
File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 252, in dump
pik.dump(obj)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 736, in save_tuple
save(element)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 841, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 1306, in save_function
obj.__dict__), obj=obj)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 610, in save_reduce
save(args)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 751, in save_tuple
save(element)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 736, in save_tuple
save(element)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 1057, in save_cell
pickler.save_reduce(_create_cell, (obj.cell_contents,), obj=obj)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 610, in save_reduce
save(args)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 736, in save_tuple
save(element)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 634, in save_reduce
save(state)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 841, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 634, in save_reduce
save(state)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 841, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 496, in save
rv = reduce(self.proto)
TypeError: can't pickle SwigPyObject objects
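The bottom of the trace shows dill walking into a closure and hitting a SWIG-backed object, most likely the TensorFlow session captured by act. One common workaround pattern is to keep the unpicklable runtime handle out of the pickled state and recreate it on load. The sketch below illustrates that pattern with stand-in classes (AgentState and Unpicklable are hypothetical, not baselines code; the real fix would probably be to save the network weights via tf.train.Saver instead of dill-pickling the session):

```python
import pickle

class Unpicklable:
    # Stand-in for a SWIG-backed handle such as a tf.Session.
    def __reduce__(self):
        raise TypeError("can't pickle SwigPyObject objects")

class AgentState:
    """Sketch: exclude the unpicklable handle from the pickled state."""
    def __init__(self, weights):
        self.weights = weights          # plain data: picklable
        self.session = Unpicklable()    # runtime handle: not picklable

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("session")            # drop the offending member
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.session = Unpicklable()    # recreate the handle on load

agent = AgentState(weights=[0.1, 0.2])
restored = pickle.loads(pickle.dumps(agent))
```

With this pattern only the plain data round-trips through pickle, which is exactly what ActWrapper.save would need to guarantee.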
There seem to be some inconsistencies: some statistics (e.g. means) are calculated more often than necessary, or at the wrong point of the procedure.
Maybe also revamp the script into something like a Trainer class.
Use the RunningAvg util function instead of, or in addition to, our convolution-style sliding-window mean calculations.
misc_util.SimpleMonitor might be a simpler method of tracking the interesting params.
Maybe we can even redirect the logging output to a file (maybe even convert it to an npy binary file beforehand?)
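For reference, here is a minimal sketch of the two averaging approaches side by side: an exponentially-weighted running average in the spirit of the baselines RunningAvg util (its exact API is an assumption here), and our current convolution-style sliding window:

```python
class RunningAvg:
    """Sketch of an exponentially-weighted running average."""
    def __init__(self, gamma, init_value=None):
        self._gamma = gamma
        self._value = init_value

    def update(self, new_val):
        # First value initializes the average; afterwards blend with gamma.
        if self._value is None:
            self._value = new_val
        else:
            self._value = self._gamma * self._value + (1.0 - self._gamma) * new_val

    @property
    def value(self):
        return self._value

def sliding_window_mean(xs, window):
    """Our current convolution-style approach, for comparison."""
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]

avg = RunningAvg(gamma=0.9)
for r in [1.0, 2.0, 3.0]:
    avg.update(r)
```

The running average is O(1) per step and needs no history, so it avoids recomputing means at odd points in the procedure.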
We need a dockerfile to build an image to deploy on servers.
configs should now be dictionaries within the overarching Configs
dictionary, like so:
Configs = {
'config_name1': {
'env': 'Acrobot-v1',
'gamma': 0.99
...
},
'config_name2': {
'env': 'LunarLander-v2',
'gamma': 0.01
...
},
....
}
Important: a 'basic' config should be defined for each environment, which sets the baseline parameters for all future adaptations in experiments.
By convention, the names of those configs for LunarLander-v2, Acrobot-v1 and CartPole-v0 are LL_basic, AB_basic and CP_basic, respectively.
A specific config for an experiment, e.g. LL_exp1, must always contain the key env defining the environment of the experiment. Everything else is optional and can therefore be limited to only those keys which differ from the environment's baseline config (XX_basic).
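The baseline-plus-overrides convention described above can be resolved with a simple dict merge. This is only a sketch: the config values and the resolve_config helper are illustrative, not existing project code.

```python
Configs = {
    'LL_basic': {'env': 'LunarLander-v2', 'gamma': 0.99, 'max_timesteps': 100000},
    'LL_exp1':  {'env': 'LunarLander-v2', 'gamma': 0.95},   # overrides gamma only
}

def resolve_config(name):
    """Overlay an experiment config on its environment's baseline.
    The 'XX_basic' naming convention maps e.g. 'LL_exp1' -> 'LL_basic'."""
    base = Configs[name.split('_')[0] + '_basic']
    return {**base, **Configs[name]}   # experiment keys win over baseline keys

cfg = resolve_config('LL_exp1')
```

Keys absent from the experiment config (here max_timesteps) fall through from the baseline, so experiment configs stay minimal.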
Key names used by the new training facility (ccbb561) have changed compared to the old configs. Some keys have also been dropped and others added.
Check testbench.py for a full list of all keys and possible values. Note that an option name like --foo-bar translates into a key name foo_bar.
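The option-to-key translation matches how argparse derives its default dest names; a one-liner suffices if we ever need to do it programmatically (the helper name is ours, not an existing project function):

```python
def option_to_key(option):
    # '--foo-bar' -> 'foo_bar': strip leading dashes, then dash to underscore
    return option.lstrip('-').replace('-', '_')
```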
We should be able to stop and resume an experiment, or to simply extend an existing one.
That way we could run longer experiments without tying up our laptops 24/7.
Save (pickle) the agent with the highest reward rating in addition to the regular one at the end of training. This would let us investigate whether the reward (even as a mean over past episodes) gives a qualitative indication of the agent's performance. It might also be interesting to see how the 'best agent' compares against the one from the (arbitrary) end of training in a test environment (one without learning or exploration). This would probably be a qualitative, relatively subjective evaluation, where we could try to examine the complexity of strategies or their similarity to human strategies (e.g. by playing the game ourselves with play.py).
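Tracking the best agent could look roughly like the sketch below. Everything here is an assumption about our own loop shape: save_fn stands in for ActWrapper.save, and the windowed-mean criterion mirrors our sliding-window reward statistic.

```python
def train_loop(episode_rewards, save_fn, window=100):
    """Sketch: alongside the regular end-of-training save, keep a separate
    checkpoint of the agent with the best mean reward seen so far."""
    best_mean = float('-inf')
    for episode, reward in enumerate(episode_rewards):
        recent = episode_rewards[max(0, episode - window + 1):episode + 1]
        mean_reward = sum(recent) / len(recent)
        if mean_reward > best_mean:
            best_mean = mean_reward
            save_fn('best')          # snapshot the current best agent
    save_fn('final')                 # the regular end-of-training save
    return best_mean
```

The 'best' and 'final' pickles could then be compared head-to-head in a no-learning, no-exploration test environment.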
Check out train_deep_cnn.py and convergence.py!
This could solve all our problems at once?!
Make the project a Python package to ensure dependencies can be met on all platforms and to make the structure more modular.
According to this issue it should be possible to capture videos of our agent without rendering them.
They basically use a headless X server (Xdummy), which we can lift straight from the gym repo: make docker-build.
We just need to install baselines, tensorflow, etc. into the docker image.
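A starting point for that image might look like the following. This is only a sketch: the base image, package versions and install steps are assumptions and would need to be reconciled with the gym repo's own docker setup (including its Xdummy configuration).

```dockerfile
# Sketch of a deployment image -- versions and steps are assumptions.
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y python3-pip xvfb git
RUN pip3 install tensorflow dill gym[box2d] baselines
COPY . /nip-deeprl-project
WORKDIR /nip-deeprl-project
CMD ["python3", "testbench.py", "LunarLander-v2", "dummy"]
```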
When we stop training after a number of steps or episodes and save the logs and models, we should be able to resume training the same agent given all this information (DQNs are certainly capable of this, so why aren't we?)
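The resumable state beyond the network weights is mostly plain data, so a simple checkpoint file would do. A minimal sketch (function names and the checkpoint contents are assumptions; the network weights themselves would be saved separately, e.g. via tf.train.Saver):

```python
import json
import os

def save_checkpoint(path, step, replay_buffer, log):
    """Sketch: persist the plain-data state needed to resume a run."""
    with open(path, 'w') as f:
        json.dump({'step': step, 'buffer': replay_buffer, 'log': log}, f)

def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {'step': 0, 'buffer': [], 'log': []}
    with open(path) as f:
        return json.load(f)
```

On startup the trainer would call load_checkpoint and continue from the stored step instead of zero.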
So far we naively end iteration after config.max_timesteps, which breaks wrapping with gym.wrappers.Monitor.
According to the OpenAI devs we should rather change it like this
This way we could have equally long logs for all agents (i.e. the number of episode rewards, lengths, etc.) and could overlay such a plot with another showing the total number of steps taken at each episode, to give an idea of how fast episode length is declining.
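The gist of that change is to check the timestep budget only between episodes, so Monitor never sees an episode cut off mid-way. A sketch under the standard gym reset/step API (the loop body and the placeholder action are illustrative, not our actual training code):

```python
def run_training(env, max_timesteps):
    """Sketch: always run episodes to completion; stop only at an episode
    boundary once the timestep budget is exhausted, so that wrappers like
    gym.wrappers.Monitor see whole episodes."""
    t = 0
    episode_lengths = []
    while True:
        env.reset()
        length = 0
        done = False
        while not done:
            _, _, done, _ = env.step(0)   # placeholder action
            t += 1
            length += 1
        episode_lengths.append(length)
        if t >= max_timesteps:            # check only between episodes
            break
    return episode_lengths
```

Since every run ends on an episode boundary, all agents produce complete per-episode logs of comparable shape, which is exactly what the overlay plots described above need.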