Deep Recurrent Q-Learning vs Deep Q Learning on a simple Partially Observable Markov Decision Process with Minecraft

minecraft-reinforcement-learning's Introduction

Minecraft-Reinforcement-Learning

We compare Deep Recurrent Q-Learning and Deep Q-Learning on two simple missions in a Partially Observable Markov Decision Process (POMDP) built on the Minecraft environment. We use gym-minecraft, which exposes Project Malmo through an OpenAI Gym-like API.

Our work is in the notebook DRQN_vs_DQN_minecraft.ipynb.

Our paper can be found here.

Work realised in collaboration with :

Prerequisites

Python 3.6
Jupyter
Tensorflow

Installation

You need to install Malmö.
You can then install gym-minecraft.
The folder "envs" contains:
- The slightly modified version of the gym-minecraft main code that we used, named minecraft.py. Put it in your_pip_folder/site-packages/gym_minecraft-0.0.2-py3.6.egg/gym_minecraft/envs/
- The missions we used. Put them in your_pip_folder/site-packages/gym_minecraft-0.0.2-py3.6.egg/gym_minecraft/assets/

Models

You can choose among three models:

Simple DQN: a Convolutional Neural Network fed with the current frame only
DQN: a Convolutional Neural Network fed with the last 4 frames
DRQN: a Convolutional Neural Network followed by an LSTM layer
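
The difference between the first two inputs can be sketched in a few lines. This is an illustration only, not the notebook's code: the `FrameStack` class, its method names, and the choice to pad the stack with the first frame at episode start are all assumptions, and strings stand in for image arrays.

```python
from collections import deque


class FrameStack:
    """Keep the last `size` frames so a feed-forward DQN sees short-term history."""

    def __init__(self, size=4):
        self.size = size
        self.frames = deque(maxlen=size)

    def reset(self, first_frame):
        # Assumption: pad the stack with copies of the first frame at episode start.
        self.frames.clear()
        for _ in range(self.size):
            self.frames.append(first_frame)
        return list(self.frames)

    def push(self, frame):
        # The deque drops the oldest frame automatically once it is full.
        self.frames.append(frame)
        return list(self.frames)


stack = FrameStack(size=4)
state = stack.reset("f0")   # a Simple DQN would only ever see the newest frame
state = stack.push("f1")
state = stack.push("f2")
```

A DRQN typically receives one frame per step, like the Simple DQN, and relies on the LSTM state carried across steps to summarise the past instead of an explicit stack.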

DQN settings

Double Q-learning
ε-greedy exploration
Experience replay
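
As a rough illustration of the first two settings, the sketch below shows ε-greedy action selection and a Double Q-learning target on plain Python lists. The function names, the `gamma` value, and the example Q-values are made up for this example; the notebook computes the same quantities on TensorFlow tensors and may organise them differently.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def double_q_target(reward, done, q_next_online, q_next_target, gamma=0.99):
    """Double Q-learning target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_next_online)), key=lambda a: q_next_online[a])
    return reward + gamma * q_next_target[best_action]


target = double_q_target(
    reward=1.0,
    done=False,
    q_next_online=[0.2, 0.9, 0.1],  # online network prefers action 1
    q_next_target=[0.5, 0.3, 0.8],  # target network values action 1 at 0.3
    gamma=0.5,
)
```

Here the target uses the target network's value for action 1 (0.3), not its own maximum (0.8). Decoupling selection from evaluation in this way is what reduces the overestimation seen with a plain max.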

Note

Unlike DeepMind's DQN implementations for Atari games, Minecraft does not pause between two actions issued by the agent. The agent and the network must therefore be fast enough to choose each action within the time step fixed by the environment.
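
One simple way to check this constraint is to time each action selection against a budget. The helper below is a hypothetical sketch: `timed_action` is not part of the notebook, and the 0.05 s default budget is an assumption based on Minecraft's usual 20 ticks per second, so adjust it to the tick length configured in your mission.

```python
import time


def timed_action(select_action, observation, budget_s=0.05, clock=time.perf_counter):
    """Run the policy once and report whether it finished within the time budget."""
    start = clock()
    action = select_action(observation)
    elapsed = clock() - start
    return action, elapsed, elapsed <= budget_s


# Dummy policy standing in for a network forward pass.
action, elapsed, on_time = timed_action(lambda obs: 0, observation=None)
```

If `on_time` is often false, the agent is likely to miss observations, and a smaller network or input resolution may be needed.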

Credits

We would like to thank Arthur Juliani for all his work and Medium articles, and Tambet Matiisen for his nice implementation of Gym-Minecraft.

minecraft-reinforcement-learning's Issues

Could you help us add mobs killed, damage taken, and damage dealt to the XML file of MinecraftBasic-v0?

Hello ClementRomac, I hope you are well. MinecraftBasic-v0 contains no mobs, only the target block. Could you help us add mobs killed, damage dealt by the agent, and damage taken by the agent to the XML file for MinecraftBasic-v0? We have almost succeeded, but we still need to plot these three features as a graph. The graph we got after training the model is unremarkable and does not show the output we wanted. If you are willing to help us obtain better graphs, we would be glad to include you as a co-author in our publications. Thanks, and we look forward to your reply.
Alternatively, if you have already plotted some graphs, please share them with us if they could help in any way. Thanks again; we would be glad to work with you as a team.
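
A possible approach, sketched under assumptions rather than taken from this repository: Malmo's `ObservationFromFullStats` agent handler, when added to the `AgentHandlers` section of the mission XML, is expected to include `MobsKilled`, `DamageTaken`, and `DamageDealt` fields in each JSON observation. The field names below should be checked against the Malmo schema for your version, the helper names are hypothetical, and the sample payloads are hand-written. The sketch also assumes the counters are cumulative, so a per-episode value is the difference between the last and first observation.

```python
import json

STAT_KEYS = ("MobsKilled", "DamageTaken", "DamageDealt")  # assumed field names


def extract_combat_stats(observation_text):
    """Parse one JSON observation string and return the three combat stats.

    Missing fields default to 0 so missions without the handler still work.
    """
    obs = json.loads(observation_text)
    return {key: obs.get(key, 0) for key in STAT_KEYS}


def episode_delta(first, last):
    """Assuming cumulative counters, an episode's value is last minus first."""
    return {key: last[key] - first[key] for key in STAT_KEYS}


# Hand-written sample payloads standing in for real Malmo observations.
start = extract_combat_stats('{"MobsKilled": 2, "DamageTaken": 10, "DamageDealt": 40}')
end = extract_combat_stats(
    '{"MobsKilled": 5, "DamageTaken": 25, "DamageDealt": 90, "XPos": 3.5}'
)
per_episode = episode_delta(start, end)
```

Collecting one `per_episode` dictionary per training episode gives three series that can be plotted against the episode index.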

experience_buffer() - sample: "ValueError: Sample larger than population or is negative"

Following on from my previous issue, when I attempt to run the training episodes, once the agent goes past the "pre_train_steps", I am getting the following error:

This has been from simply replicating all steps given with the exact code copied from the notebook. The only small change I have made is to change the following parameters to test in reasonable time:

num_episodes = 100,000 --> 100
pre_train_steps = 10,000 --> 10

Is this a known issue, or should I investigate further?

Thanks
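
A possible explanation, offered as an assumption rather than a confirmed diagnosis: this message is the one Python's `random.sample` raises when asked for more items than the population holds. If the buffer's sample method calls `random.sample` with the batch size, then starting training after only 10 steps means the buffer holds fewer items than one batch. The minimal buffer below is not the notebook's class, and the batch size of 32 is hypothetical; it only shows the failure and a guard against it.

```python
import random


class ExperienceBuffer:
    """Minimal replay buffer used to illustrate the error; not the notebook's class."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.buffer = []

    def add(self, item):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)  # drop the oldest item
        self.buffer.append(item)

    def can_sample(self, batch_size):
        return len(self.buffer) >= batch_size

    def sample(self, batch_size):
        # random.sample raises ValueError when batch_size > len(self.buffer).
        return random.sample(self.buffer, batch_size)


buf = ExperienceBuffer()
for step in range(10):  # mirrors pre_train_steps = 10
    buf.add(("transition", step))

batch_size = 32  # hypothetical batch size
if buf.can_sample(batch_size):
    batch = buf.sample(batch_size)
else:
    batch = None  # skip the update until enough data is stored
```

Under this assumption, keeping `pre_train_steps` large enough to fill at least one batch, or guarding the update as above, would avoid the error.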

Hello vincentberaud, we are facing an error after running your code. Could you please help us? Thanks

As researchers, we wanted to extend this work, but we hit an error after running your code on our GPU. Could you please help us resolve it? We would be grateful. Below is the output of your program, including the error:

    [2018-10-07 10:56:26,665] Making new env: MinecraftBasic-v0

2018-10-07 10:57:26.056640: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
#######################################
% Win : 0.2%
% Nothing : 0.0%
% Loss : 0.0%
Nb J before win: 79.0
/usr/local/lib/python3.5/dist-packages/numpy/core/fromnumeric.py:2909: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/usr/local/lib/python3.5/dist-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Nb J before die: nan
Total Steps: 79
I: 0
Epsilon: 1

LAST EPISODE MOVES

Buffer Move 0 : move 1
Move Array 0 : move 1
ZPos : 0.5
XPos : 3.5
Yaw : 0.0
1 0 0 0

Buffer Move 1 : turn -1
Move Array 1 : turn -1
ZPos : 0.5
XPos : 3.5
Yaw : 0.0
0 0 0 1

Buffer Move 2 : move -1
Move Array 2 : move -1
ZPos : 1.5
XPos : 3.5
Yaw : 270.0
0 1 0 0

Buffer Move 3 : move 1
Move Array 3 : move 1
ZPos : 1.5
XPos : 3.5
Yaw : 270.0
1 0 0 0

Buffer Move 4 : turn -1
Move Array 4 : turn -1
ZPos : 1.5
XPos : 3.5
Yaw : 270.0
0 0 0 1
[2018-10-07 11:00:57,885] Agent missed 3 observation(s).
[2018-10-07 11:00:57,886] Agent missed 3 observation(s).

Process finished with exit code 135 (interrupted by signal 7: SIGEMT)

After following your recent, updated work, we failed to load the client. Could you please tell us what we are doing wrong?

/home/adil/anaconda3/bin/python /home/adil/pycharm/pycharm-2017.2.4/helpers/pydev/pydevd.py --multiproc --qt-support=auto --client 127.0.0.1 --port 45121 --file /home/adil/Downloads/Minecraft-Reinforcement-Learning-master/DRQN_vs_DQN_minecraft.py
pydev debugger: process 1374 is connecting

Connected to pydev debugger (build 172.4343.24)
/home/adil/anaconda3/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
[2019-03-30 21:14:24,557] Making new env: MinecraftBasic-v0
/home/adil/anaconda3/lib/python3.6/site-packages/minecraft_py-0.0.2-py3.6.egg/minecraft_py/Malmo/Minecraft
Traceback (most recent call last):
File "/home/adil/pycharm/pycharm-2017.2.4/helpers/pydev/pydevd.py", line 1599, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/adil/pycharm/pycharm-2017.2.4/helpers/pydev/pydevd.py", line 1026, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/adil/pycharm/pycharm-2017.2.4/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/adil/Downloads/Minecraft-Reinforcement-Learning-master/DRQN_vs_DQN_minecraft.py", line 50, in <module>
skip_steps = 0) #Movements modified to a faster convergence
File "/home/adil/anaconda3/lib/python3.6/site-packages/gym_minecraft-0.0.2-py3.6.egg/gym_minecraft/envs/minecraft_env.py", line 114, in __init__
self.mc_process, port = minecraft_py.start()
File "/home/adil/anaconda3/lib/python3.6/site-packages/minecraft_py-0.0.2-py3.6.egg/minecraft_py/__init__.py", line 77, in start
raise EOFError("Minecraft process finished unexpectedly")
EOFError: Minecraft process finished unexpectedly
Backend Qt5Agg is interactive backend. Turning interactive mode on.

Tensorflow compatibility issues - "AttributeError: module 'tensorflow' has no attribute 'reset_default_graph'"

Hi,

I am attempting to replicate your code but am having issues with the TensorFlow dependency. With TensorFlow 2.0, many attributes such as ".reset_default_graph()" and ".Session()" were removed from the top-level API, so the following error is returned:

Traceback (most recent call last):
File "agent_visual.py", line 461, in <module>
tf.reset_default_graph()
AttributeError: module 'tensorflow' has no attribute 'reset_default_graph'

The simple solution (found here: https://stackoverflow.com/questions/40782271/attributeerror-module-tensorflow-has-no-attribute-reset-default-graph) is to return Tensorflow to version 1.X.

However, this is causing more issues, so I wanted to confirm which version of TensorFlow was used for your work.

Alternatively, these attributes seem to have been replaced by tf.function, but I am not familiar enough with TensorFlow to confirm this. Would this be a suitable way to update your work to TensorFlow 2.0, or have I misunderstood something?

https://www.tensorflow.org/tutorials/customization/performance

Thanks in advance
Phil
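
One option that may avoid downgrading, sketched here and not tested against this repository: TensorFlow 2.x still ships the 1.x graph and session API under `tensorflow.compat.v1`. The helper name `load_tf1_api` is hypothetical, and the snippet returns `None` when TensorFlow is not installed so it can be run anywhere.

```python
def load_tf1_api():
    """Return a TensorFlow module exposing the 1.x graph/session API, or None."""
    try:
        import tensorflow.compat.v1 as tf  # available in TF 1.14+ and TF 2.x
    except ImportError:
        return None
    # Under TF 2.x this turns off eager execution and other 2.x defaults.
    tf.disable_v2_behavior()
    return tf


tf = load_tf1_api()
if tf is not None:
    tf.reset_default_graph()
    with tf.Session() as sess:
        # Build and run a trivial graph to confirm the 1.x workflow is usable.
        print(sess.run(tf.constant(1) + tf.constant(2)))
```

This only restores the core 1.x calls. If the code also depends on `tf.contrib`, that part would still need porting, because `tf.contrib` was removed in TensorFlow 2.x. `tf.function` is a different mechanism: it traces Python functions into graphs and does not replace `Session` call for call.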

vincentberaud / minecraft-reinforcement-learning