laermannjan / nip-deeprl-project
Student project in deep reinforcement learning with the OpenAI Gym. We evaluated and analyzed how different model architectures performed as agents in various games.
Environment
LunarLander
Behaviour
Agents cannot be pickled and written to file.
Reproduction Procedure
Run any LunarLander experiment, e.g. python testbench.py LunarLander-v2 dummy
Stack Trace
Traceback (most recent call last):
File "testbench.py", line 61, in <module>
train(args.env, config_name, args.pickle_root, args.exp_name, args.num_cpu)
File "/Users/jan/code/nip-deeprl-project/custom_train.py", line 128, in train
ActWrapper(act, act_params).save(os.path.join(pickle_dir, pickle_fname))
File "/usr/local/anaconda3/lib/python3.6/site-packages/baselines/deepq/simple.py", line 55, in save
dill.dump((model_data, self._act_params), f)
File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 252, in dump
pik.dump(obj)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 736, in save_tuple
save(element)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 841, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 1306, in save_function
obj.__dict__), obj=obj)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 610, in save_reduce
save(args)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 751, in save_tuple
save(element)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 736, in save_tuple
save(element)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 1057, in save_cell
pickler.save_reduce(_create_cell, (obj.cell_contents,), obj=obj)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 610, in save_reduce
save(args)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 736, in save_tuple
save(element)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 634, in save_reduce
save(state)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 841, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 634, in save_reduce
save(state)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/anaconda3/lib/python3.6/site-packages/dill/dill.py", line 841, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/local/anaconda3/lib/python3.6/pickle.py", line 496, in save
rv = reduce(self.proto)
TypeError: can't pickle SwigPyObject objects
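The bottom of the trace shows dill walking into a closure and hitting a SWIG-backed object, most likely the TensorFlow session captured by act. One common workaround pattern is to keep the unpicklable runtime handle out of the pickled state and recreate it on load. The sketch below illustrates that pattern with stand-in classes (AgentState and Unpicklable are hypothetical, not baselines code; the real fix would probably be to save the network weights via tf.train.Saver instead of dill-pickling the session):

```python
import pickle

class Unpicklable:
    # Stand-in for a SWIG-backed handle such as a tf.Session.
    def __reduce__(self):
        raise TypeError("can't pickle SwigPyObject objects")

class AgentState:
    """Sketch: exclude the unpicklable handle from the pickled state."""
    def __init__(self, weights):
        self.weights = weights          # plain data: picklable
        self.session = Unpicklable()    # runtime handle: not picklable

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("session")            # drop the offending member
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.session = Unpicklable()    # recreate the handle on load

agent = AgentState(weights=[0.1, 0.2])
restored = pickle.loads(pickle.dumps(agent))
```

With this pattern only the plain data round-trips through pickle, which is exactly what ActWrapper.save would need to guarantee.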
There seem to be some inconsistencies: some statistics (e.g. means) are calculated more often than necessary, or at the wrong point of the procedure.
Maybe also revamp the script into something like a Trainer class.
Use the RunningAvg util function instead of, or in addition to, our convolution-style sliding-window mean calculations.
misc_util.SimpleMonitor might be a simpler method of tracking the interesting params.
Maybe we can even redirect the logging output to a file (maybe even convert it to an npy binary file beforehand?)
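For reference, here is a minimal sketch of the two averaging approaches side by side: an exponentially-weighted running average in the spirit of the baselines RunningAvg util (its exact API is an assumption here), and our current convolution-style sliding window:

```python
class RunningAvg:
    """Sketch of an exponentially-weighted running average."""
    def __init__(self, gamma, init_value=None):
        self._gamma = gamma
        self._value = init_value

    def update(self, new_val):
        # First value initializes the average; afterwards blend with gamma.
        if self._value is None:
            self._value = new_val
        else:
            self._value = self._gamma * self._value + (1.0 - self._gamma) * new_val

    @property
    def value(self):
        return self._value

def sliding_window_mean(xs, window):
    """Our current convolution-style approach, for comparison."""
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]

avg = RunningAvg(gamma=0.9)
for r in [1.0, 2.0, 3.0]:
    avg.update(r)
```

The running average is O(1) per step and needs no history, so it avoids recomputing means at odd points in the procedure.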
We need a dockerfile to build an image to deploy on servers.
configs should now be dictionaries within the overarching Configs
dictionary, like so:
Configs = {
'config_name1': {
'env': 'Acrobot-v1',
'gamma': 0.99
...
},
'config_name2': {
'env': 'LunarLander-v2',
'gamma': 0.01
...
},
....
}
Important: a 'basic' config should be defined for each environment, which sets the baseline parameters for all future adaptations in experiments.
By convention, the names of those configs for LunarLander-v2, Acrobot-v1 and CartPole-v0 are LL_basic, AB_basic and CP_basic, respectively.
A specific config for an experiment, e.g. LL_exp1, must always contain the key env defining the environment of the experiment. Everything else is optional and can therefore be limited to only those keys which differ from the environment's baseline config (XX_basic).
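The baseline-plus-overrides convention described above can be resolved with a simple dict merge. This is only a sketch: the config values and the resolve_config helper are illustrative, not existing project code.

```python
Configs = {
    'LL_basic': {'env': 'LunarLander-v2', 'gamma': 0.99, 'max_timesteps': 100000},
    'LL_exp1':  {'env': 'LunarLander-v2', 'gamma': 0.95},   # overrides gamma only
}

def resolve_config(name):
    """Overlay an experiment config on its environment's baseline.
    The 'XX_basic' naming convention maps e.g. 'LL_exp1' -> 'LL_basic'."""
    base = Configs[name.split('_')[0] + '_basic']
    return {**base, **Configs[name]}   # experiment keys win over baseline keys

cfg = resolve_config('LL_exp1')
```

Keys absent from the experiment config (here max_timesteps) fall through from the baseline, so experiment configs stay minimal.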
Key names used by the new training facility (ccbb561) have changed compared to the old configs. Some keys have also been dropped and others added.
Check testbench.py for a full list of all keys and possible values. Note that an option name like --foo-bar translates into a key name foo_bar.
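The option-to-key translation matches how argparse derives its default dest names; a one-liner suffices if we ever need to do it programmatically (the helper name is ours, not an existing project function):

```python
def option_to_key(option):
    # '--foo-bar' -> 'foo_bar': strip leading dashes, then dash to underscore
    return option.lstrip('-').replace('-', '_')
```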
We should be able to stop and resume an experiment, or to simply extend an existing one.
That way we could run longer experiments without tying up our laptops 24/7.
Save (pickle) the agent with the highest reward rating in addition to the regular one at the end of training. This would let us investigate whether the reward (even as a mean over past episodes) gives a qualitative indication of the agent's performance. It might also be interesting to see how the 'best agent' compares against the one from the (arbitrary) end of training in a test environment (one without learning or exploration). This would probably be a qualitative, relatively subjective evaluation, where we could try to examine the complexity of strategies or their similarity to human strategies (e.g. by playing the game ourselves with play.py).
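Tracking the best agent could look roughly like the sketch below. Everything here is an assumption about our own loop shape: save_fn stands in for ActWrapper.save, and the windowed-mean criterion mirrors our sliding-window reward statistic.

```python
def train_loop(episode_rewards, save_fn, window=100):
    """Sketch: alongside the regular end-of-training save, keep a separate
    checkpoint of the agent with the best mean reward seen so far."""
    best_mean = float('-inf')
    for episode, reward in enumerate(episode_rewards):
        recent = episode_rewards[max(0, episode - window + 1):episode + 1]
        mean_reward = sum(recent) / len(recent)
        if mean_reward > best_mean:
            best_mean = mean_reward
            save_fn('best')          # snapshot the current best agent
    save_fn('final')                 # the regular end-of-training save
    return best_mean
```

The 'best' and 'final' pickles could then be compared head-to-head in a no-learning, no-exploration test environment.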
Check out train_deep_cnn.py and convergence.py!
This could solve all our problems at once?!
Make the project a Python package to ensure dependencies can be met on all platforms and to make the structure more modular.
According to this issue it should be possible to capture videos of our agent without rendering them.
They basically use a headless X server (Xdummy), which we can lift straight from the gym repo: make docker-build.
We just need to install baselines, tensorflow, etc. into the docker image.
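A starting point for that image might look like the following. This is only a sketch: the base image, package versions and install steps are assumptions and would need to be reconciled with the gym repo's own docker setup (including its Xdummy configuration).

```dockerfile
# Sketch of a deployment image -- versions and steps are assumptions.
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y python3-pip xvfb git
RUN pip3 install tensorflow dill gym[box2d] baselines
COPY . /nip-deeprl-project
WORKDIR /nip-deeprl-project
CMD ["python3", "testbench.py", "LunarLander-v2", "dummy"]
```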
When we stop training after a number of steps or episodes and save the logs and models, we should be able to resume training the same agent given all this information (DQNs are certainly capable of this, so why aren't we?)
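The resumable state beyond the network weights is mostly plain data, so a simple checkpoint file would do. A minimal sketch (function names and the checkpoint contents are assumptions; the network weights themselves would be saved separately, e.g. via tf.train.Saver):

```python
import json
import os

def save_checkpoint(path, step, replay_buffer, log):
    """Sketch: persist the plain-data state needed to resume a run."""
    with open(path, 'w') as f:
        json.dump({'step': step, 'buffer': replay_buffer, 'log': log}, f)

def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {'step': 0, 'buffer': [], 'log': []}
    with open(path) as f:
        return json.load(f)
```

On startup the trainer would call load_checkpoint and continue from the stored step instead of zero.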
So far we naively end iteration after config.max_timesteps, which breaks wrapping with gym.wrappers.Monitor.
According to the OpenAI devs we should rather change it like this
This way we could have equally long logs for all agents (i.e. the number of episode rewards, lengths, etc.) and could overlay such a plot with another showing the total number of steps taken at each episode, to give an idea of how fast episode length is declining.
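The gist of that change is to check the timestep budget only between episodes, so Monitor never sees an episode cut off mid-way. A sketch under the standard gym reset/step API (the loop body and the placeholder action are illustrative, not our actual training code):

```python
def run_training(env, max_timesteps):
    """Sketch: always run episodes to completion; stop only at an episode
    boundary once the timestep budget is exhausted, so that wrappers like
    gym.wrappers.Monitor see whole episodes."""
    t = 0
    episode_lengths = []
    while True:
        env.reset()
        length = 0
        done = False
        while not done:
            _, _, done, _ = env.step(0)   # placeholder action
            t += 1
            length += 1
        episode_lengths.append(length)
        if t >= max_timesteps:            # check only between episodes
            break
    return episode_lengths
```

Since every run ends on an episode boundary, all agents produce complete per-episode logs of comparable shape, which is exactly what the overlay plots described above need.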