siemanko / tensorflow-deepq
A deep Q learning demonstration using Google Tensorflow
License: MIT License
How to save a trained model and later resume training after restoring it
Thank You !
I looked for tf.train.Saver or something similar in the code, but couldn't locate it. So, where does the model (*.ckpt) get saved?
I can write my own saver, but since there is already a saved model for the karpathy game, I was wondering if I am missing something?
I'm so sorry to be asking this.
I tried for an hour and I can't figure out how to use or run it. Could someone please give me a pointer towards the right direction? I got as far as moving it into the site-packages directory of python, I am running a redis server even though I have no idea how to use it, and I think I have all the modules installed. I really would like to play around with this and experiment with different games but I would rather not start from scratch until I'm sure this is what I want to learn.
Hi,
I have a problem with executing the Karpathy game notebook. Cell 9 gives rise to the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-6-16dd03e0e8b6> in <module>()
6 else:
7 # Tensorflow business - it is always good to reset a graph before creating a new controller.
----> 8 tf.ops.reset_default_graph()
9 session = tf.InteractiveSession()
10
AttributeError: 'module' object has no attribute 'ops'
This usually happens when there are circular import dependencies; however, I could not find any. I am using Tensorflow 0.71 and Python 2.7.6. The error also occurs with Python 3.4. Which Python and Tensorflow versions are you using?
Thanks a lot for the nice code!
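For readers hitting the same AttributeError: to my knowledge the graph-reset function is exposed as tf.reset_default_graph() in TensorFlow 0.x/1.x and as tf.compat.v1.reset_default_graph() in 2.x, so one possible workaround is to look the function up rather than hard-coding tf.ops. The sketch below is only an illustration; the helper name reset_graph is hypothetical, and the fake module stands in for TensorFlow so the example runs without it installed.

```python
from types import SimpleNamespace


def reset_graph(tf_module):
    """Call the first graph-reset function found on tf_module.

    Returns the dotted name that was used, which helps when debugging
    version differences. (Hypothetical helper, not part of the repo.)
    """
    candidates = (
        "reset_default_graph",            # TensorFlow 0.x / 1.x
        "compat.v1.reset_default_graph",  # TensorFlow 2.x
        "ops.reset_default_graph",        # spelling used in the notebook
    )
    for dotted_name in candidates:
        target = tf_module
        try:
            for part in dotted_name.split("."):
                target = getattr(target, part)
        except AttributeError:
            continue  # this spelling does not exist in this version
        target()
        return dotted_name
    raise AttributeError("no reset_default_graph function found")


# Stand-in for `import tensorflow as tf`, so the sketch is self-contained.
calls = []
fake_tf = SimpleNamespace(reset_default_graph=lambda: calls.append("reset"))
used = reset_graph(fake_tf)
```

With a real installation the call would be reset_graph(tf) after import tensorflow as tf, in place of the failing line 8 of the cell.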
Hi,
A non-technical question, I hope it's OK to ask here on GitHub...
I am working on continuous robot control problems and was wondering which approach you are following for the continuous branch. I guess it is the Asynchronous Advantage Actor-Critic (A3C) approach in the 2016 Mnih paper here. That method is actually not Q-learning but a variation of a policy gradient method. However, many variables in your controller code suggest that deep Q-learning is applied, so I am a bit confused. Could you confirm that the code tries to reproduce the A3C method in that paper?
Firstly, let me just say that I love this project, the code is so easy to read and understand, and it blends two things I really wanted to experiment with!
I'm getting to the stage where I want to save/restore model variables after training. I notice that you have a saved_model folder with a .ckpt file for the karpathy game, but I do not see any way to load it using the current notebook.
I've read up on tf.train.Saver, and have tried to use its save and restore functions with partial success (my q_network's weights seem to get restored fine, but the target_q_network's weights do not). Do you have a version of the karpathy_game notebook that was used to create/restore your saved model? If not, could you please advise on what I should keep in mind when setting up tf.train.Saver() and using the save/restore methods?
(EDIT: As is pretty typical for me, I asked this prematurely - I figured out my mistake after a little more fiddling with the code. I had initialised the Saver before the controller was set up, so of course it couldn't save all the variables used by the controller... Sorry!)
Hi! What's the license this is released under?
I can't seem to replicate your results in the game notebook. Is your IPython notebook output for that outdated, or is there something subtle going on? I simply ran your game notebook, but it doesn't seem to learn at all. I haven't seriously studied your code, so some help with debugging would be appreciated.
Since I am using this repo in my work as well, here are some features that I think might be useful:
The continuous branch was initiated but has been inactive for a while (is it totally abandoned?). The latter three are supported in Asyncronous RL in Tensorflow + Keras + OpenAI's Gym. However, I do prefer the architecture here to keep things well structured.
Should add a dependency for 'matplotlib' to your README
envy@ub1404:~/os_pri/github/tensorflow-deepq$ python3 tf_rl/controller/human_controller.py
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
I am trying to run this game in the browser from the Windows cmd prompt, but I don't understand why I have this problem.
python version 3.5
jupyter
How do I solve this?
from __future__ import print_function
import numpy as np
import tempfile
import tensorflow as tf
from tf_rl.controller import DiscreteDeepQ, HumanController
from tf_rl.simulation import KarpathyGame
from tf_rl import simulate
from tf_rl.models import MLP
ImportError Traceback (most recent call last)
in ()
5 import tensorflow as tf
6
----> 7 from tf_rl.controller import DiscreteDeepQ, HumanController
8 from tf_rl.simulation import KarpathyGame
9 from tf_rl import simulate
ImportError: No module named 'tf_rl'
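An ImportError like this usually means Python cannot see the directory that contains the tf_rl package. A minimal sketch of one workaround, assuming the notebook is started from the repository's notebooks folder so that the repository root is one directory up (that location and the helper name add_repo_to_path are assumptions, not something stated in the issue):

```python
import os
import sys


def add_repo_to_path(repo_root):
    """Put repo_root at the front of sys.path so `import tf_rl` can resolve."""
    repo_root = os.path.abspath(repo_root)
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root


# Assumed layout: <repo>/notebooks is the working directory, and the
# tf_rl package lives directly under <repo>.
repo_root = add_repo_to_path(os.path.join(os.getcwd(), os.pardir))
```

Running this in the first notebook cell, before the tf_rl imports, has the same effect as setting PYTHONPATH to the repository root before launching Jupyter.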
In simulate(...), the environment is assumed to be drawn with SVG strings. However, this might not generalise to complex game environments. For example, I'm drawing in 3D at the moment, and it is easy to use matplotlib (maybe pyopengl in the future?).
Is it possible to make the environment drawing interface more general? My thought is something like:
...
simulation.setup_draw() # Initialise figure handles, axes for reuse
...
for frame_no ...
...
simulation.draw() # Draw things by plot, scatter, etc or Ipython display()
...
In this way, all the figure handles and drawing logic are handled by the environment class, e.g. KarpathyGame.
Please let me know if you like this idea. If so, I can make another pull request and see how it goes.
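To make the proposal concrete, here is a small sketch of what such an interface could look like. All class and function names (Simulation, TextSimulation, run) are hypothetical and only illustrate the idea; the toy environment records strings instead of plotting so that the example runs without matplotlib.

```python
class Simulation(object):
    """Hypothetical base class: the environment owns all drawing state."""

    def setup_draw(self):
        """Create figure handles, axes, etc. once, for reuse across frames."""
        raise NotImplementedError

    def step(self, dt):
        """Advance the environment by dt seconds."""
        raise NotImplementedError

    def draw(self):
        """Render the current state (plot, scatter, IPython display, ...)."""
        raise NotImplementedError


class TextSimulation(Simulation):
    """Toy environment that 'draws' by appending a string per frame."""

    def __init__(self):
        self.position = 0.0
        self.canvas_ready = False
        self.frames = []

    def setup_draw(self):
        self.canvas_ready = True

    def step(self, dt):
        self.position += dt

    def draw(self):
        self.frames.append("position=%.2f" % self.position)


def run(simulation, num_frames, dt=0.5):
    """Simulation loop that never needs to know how drawing is done."""
    simulation.setup_draw()
    for frame_no in range(num_frames):
        simulation.step(dt)
        simulation.draw()


sim = TextSimulation()
run(sim, num_frames=3)
```

The loop only calls setup_draw() and draw(), so an SVG-based, matplotlib-based, or OpenGL-based environment could be swapped in without changing it.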
In ./notebooks/DoublePendulumn.ipynb, when I run
try:
    simulate(d, fps=30, actions_per_simulation_second=1, speed=1.0, simulation_resultion=0.01)
except KeyboardInterrupt:
    print("Interrupted")
it complains:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-7b5361d33da5> in <module>()
1 try:
----> 2 simulate(d, fps=30, actions_per_simulation_second=1, speed=1.0, simulation_resultion=0.01)
3 except KeyboardInterrupt:
4 print("Interrupted")
TypeError: simulate() got an unexpected keyword argument 'actions_per_simulation_second'
The current simulate(...) has an interface that is very different from what is being called here:
def simulate(simulation,
             controller=None,
             fps=60,
             visualize_every=1,
             action_every=1,
             simulation_resolution=None,
             wait=False,
             disable_training=False,
             save_path=None):
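One way to see exactly which of the notebook's keyword arguments the current function rejects is to compare them against its signature. The sketch below uses a stand-in simulate with the parameter list quoted above (its body is a placeholder, not the repository's code), and split_kwargs is a hypothetical helper. It relies on inspect.signature, which is available in Python 3.

```python
import inspect


def simulate(simulation, controller=None, fps=60, visualize_every=1,
             action_every=1, simulation_resolution=None, wait=False,
             disable_training=False, save_path=None):
    """Stand-in with the signature quoted in the issue; it only echoes inputs."""
    return {"fps": fps, "simulation_resolution": simulation_resolution}


def split_kwargs(func, kwargs):
    """Separate kwargs into those func accepts and those it would reject."""
    accepted_names = set(inspect.signature(func).parameters)
    accepted = {k: v for k, v in kwargs.items() if k in accepted_names}
    rejected = sorted(k for k in kwargs if k not in accepted_names)
    return accepted, rejected


# Keyword arguments exactly as they appear in the notebook cell.
notebook_kwargs = dict(fps=30, actions_per_simulation_second=1,
                       speed=1.0, simulation_resultion=0.01)
accepted, rejected = split_kwargs(simulate, notebook_kwargs)
```

Here only fps is accepted. Note that simulation_resultion in the notebook call is also misspelled relative to simulation_resolution, so it would be rejected even after the other two arguments are removed.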
I'm really thankful for your work. The code is neat and simple, which helps a Python beginner like me a lot.
After running the examples in the notebook folder, I'm now trying to extend it to a multi-agent version by simply creating multiple copies of DiscreteDeepQ.
However, I encounter name scope errors in DiscreteDeepQ.
I tried to solve the issue by adding the agent id to the name scope, but I'm not quite sure if that's the 'correct' way to solve it.
Would it mess up the stored data of the network? Or can the problem be solved by this simple solution?
PS: The program runs successfully, but the agent seems to just move randomly...
Great project! I'm looking to use this with a Kinect v2 camera for a robotics application. I have 26 different joints, each with x, y, z coordinates, that will be my state space. Looking through the code, it looks like the state is just a single int. Can you give me some guidance on how to feed in all these states?
I know that with standard Q-learning you have a state-action pair where both are just a single value. Is it possible to have 3 actions? In my example they would be motor 1, motor 2, and motor 3.
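A common general technique for this kind of setup, not something confirmed from the repository's code, is to flatten the joint coordinates into one observation vector and to enumerate every combination of motor commands as a single discrete action. In the sketch below the three command levels per motor are an assumption chosen for illustration, and the function names are hypothetical.

```python
import itertools

NUM_JOINTS = 26
NUM_MOTORS = 3
# Assumed command set per motor: reverse, hold, forward.
MOTOR_COMMANDS = (-1, 0, 1)

# Every combination of per-motor commands becomes one discrete action.
JOINT_ACTIONS = list(itertools.product(MOTOR_COMMANDS, repeat=NUM_MOTORS))


def flatten_joints(joints):
    """Turn [(x, y, z), ...] into one flat list of floats."""
    return [float(coord) for joint in joints for coord in joint]


def decode_action(action_index):
    """Map a discrete action index back to a (motor1, motor2, motor3) tuple."""
    return JOINT_ACTIONS[action_index]


# Dummy skeleton data standing in for a Kinect frame.
joints = [(i, i + 0.5, -i) for i in range(NUM_JOINTS)]
observation = flatten_joints(joints)
```

This gives an observation of length 26 * 3 = 78 and 3 ** 3 = 27 discrete actions. The action count grows exponentially with the number of motors and command levels, which is one reason continuous-control methods are often considered for robotics.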
Hi,
I wonder how long training takes before the AI agent can act like the one in the GIF on this GitHub page?
Thanks,
Hello, I am a newbie in Reinforcement Learning, and I thought this project is a wonderful resource to learn RL.
I have been going through your source code, and the line
https://github.com/nivwusquorum/tensorflow-deepq/blob/master/tf_rl/simulation/karpathy_game.py#L202
is kind of confusing for me.
From my understanding, each observation of hero's line consists of
[ type_of_object (wall/friend/enemy), object_speed (x,y), dist_from_hero ]
so the type_of_object should be something like [0,1,0], [1,0,0] or [0,0,1]
in the source code I attached, however, it assigns [1,1,1], and that sounds like it treats all types (wall/friend/enemy) equally.
Is it my misunderstanding? Please help me understand.
Thank you.
-Taeksoo
Did you use a specific paper to write the continuous controller code? I would like to take a look at your continuous branch, and would like to read that paper first :) Did you use something like this?
envy@ub1404:/os_pri/github/tensorflow-deepq$ PYTHONPATH=/os_pri/github/tensorflow-deepq:/home/envy/os_pri/github/tensorflow/_python_build:$PYTHONPATH python3 tf_rl/controller/human_controller.py
Traceback (most recent call last):
  File "tf_rl/controller/human_controller.py", line 1, in <module>
    from tf_rl.utils.getch import getch
  File "/home/envy/os_pri/github/tensorflow-deepq/tf_rl/utils/__init__.py", line 1, in <module>
    import tensorflow as tf
  File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/__init__.py", line 23, in <module>
    from tensorflow.python import *
  File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow import contrib
  File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/contrib/__init__.py", line 23, in <module>
    from tensorflow.contrib import layers
  File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/contrib/layers/__init__.py", line 67, in <module>
    from tensorflow.contrib.layers.python.framework.tensor_util import *
  File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/contrib/layers/python/framework/tensor_util.py", line 21, in <module>
    from tensorflow.python.framework.ops import Tensor
  File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/python/framework/ops.py", line 39, in <module>
    from tensorflow.python.framework import versions
  File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/python/framework/versions.py", line 22, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
  File "/usr/lib/python3.4/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
ImportError: /home/envy/os_pri/github/tensorflow/_python_build/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: PyClass_Type
envy@ub1404:/os_pri/github/tensorflow-deepq$
Hi -
I've installed TensorFlow and I can run their examples. I suspect I'm missing a path or an initialization step.
This is my version of Python:
Python 2.7.10 :: Anaconda 2.4.0 (x86_64)
When I invoke your sample command it throws an error on the first line. I get similar import problems when I use Jupyter to open the Karpathy notebook:
python tf_rl/controller/human_controller.py
Traceback (most recent call last):
  File "tf_rl/controller/human_controller.py", line 1, in <module>
    from tf_rl.utils.getch import getch
  File "/Users/mesozoic/Documents/MachineLearning/google/tensorflow-deepq/tf_rl/__init__.py", line 1, in <module>
    from .simulate import simulate
  File "/Users/mesozoic/Documents/MachineLearning/google/tensorflow-deepq/tf_rl/simulate.py", line 7, in <module>
    from tf_rl.utils.event_queue import EventQueue
  File "/Users/mesozoic/Documents/MachineLearning/google/tensorflow-deepq/tf_rl/utils/event_queue.py", line 3, in <module>
    from queue import PriorityQueue
ImportError: No module named queue
Thanks!
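The queue module is the Python 3 name; under Python 2.7, as used here, the same class lives in the Queue module. A minimal sketch of an import that works on both versions, which could replace the failing line in event_queue.py (the small usage below is only there to show that the class behaves the same way):

```python
try:
    from queue import PriorityQueue  # Python 3 module name
except ImportError:
    from Queue import PriorityQueue  # Python 2 module name

# Entries come out in ascending order of their first tuple element.
events = PriorityQueue()
events.put((2.0, "later"))
events.put((1.0, "sooner"))
first = events.get()
```

Other modules in the package may have similar Python 3 only imports, so running under Python 3 is the alternative if more errors of this kind appear.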
Thanks for the head start on RL with your DeepQ work. I am relatively new with RL and I was trying to get a system to converge for the longest time using your DeepQ controller, but it kept tending to 0 in total reward. My environment gives positive and negative reward, but almost always "converged" to 0 total reward (lowest energy?).
After re-reviewing many RL examples and TensorFlow, I think I found the issue which was surprising to say the least. I think it is related to TensorFlow's automatic calculation of the gradients. I feel the error is in this line:
temp_diff = self.value_given_action - self.future_reward
In my mind this is the difference (Y(x) - Yexpected). The derivative of this is ultimately +Y(x)/dCost, and I think this forces the solution "away" from the minimum. This in turn increases the cost and forced my system to eventually decide to take no action at all (the best convergence case it could find). So I reversed the terms, following most of the literature, to (Yexpected - Y(x)), and sure enough, the reward would grow positive and converge.
This may not affect examples where the data is all positive and you possibly stop early enough with a slow learning rate. So changing this line to read as follows may improve the algorithm. This might also fix your Continuous solution if that had +/- rewards:
temp_diff = self.future_reward - self.value_given_action
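Whether the operand order changes the gradient depends on how temp_diff enters the cost, which the issue does not quote. The sketch below is a numerical check under the assumption that the cost is the square of temp_diff. If the actual cost uses temp_diff in a non-symmetric way, the result will differ. All names here are illustrative.

```python
def squared_error(prediction, target, reversed_order=False):
    """Cost assumed here: the square of the temporal difference."""
    if reversed_order:
        temp_diff = target - prediction
    else:
        temp_diff = prediction - target
    return temp_diff ** 2


def numerical_gradient(loss_fn, prediction, eps=1e-6):
    """Central finite-difference estimate of d(loss)/d(prediction)."""
    return (loss_fn(prediction + eps) - loss_fn(prediction - eps)) / (2 * eps)


prediction, target = 0.3, -1.2  # includes a negative target on purpose

grad_original = numerical_gradient(
    lambda p: squared_error(p, target), prediction)
grad_reversed = numerical_gradient(
    lambda p: squared_error(p, target, reversed_order=True), prediction)
```

Under this assumption both orderings give the same gradient, 2 * (prediction - target), so a behaviour change after swapping the terms would point to a cost that is not a plain square, or to some other difference between the two runs.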
Hi! Thanks for useful examples of tensorflow usage.
I'm playing with the Deep Q-Learning example and noticed that after some recent commits the performance of DiscreteDeepQ dropped by about 5x. I'm wondering what caused it? Is it because of maintaining 2 copies of the Q-network? Is there an option to update weights less frequently? Sorry, I don't understand what is going on there very well yet.
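For context on the question about updating less frequently: a target network is usually a second copy of the weights that is nudged toward the online network, and the nudge can be applied every N training steps instead of every step. The sketch below is a generic illustration with plain Python lists, not the repository's implementation; soft_update, train, update_every and rate are hypothetical names.

```python
def soft_update(target_weights, online_weights, rate):
    """Move each target weight a fraction `rate` toward the online weight."""
    return [(1.0 - rate) * t + rate * o
            for t, o in zip(target_weights, online_weights)]


def train(num_steps, update_every, rate=0.5):
    """Toy loop: the online weights change every step, the target only
    every `update_every` steps."""
    online = [0.0, 0.0]
    target = [0.0, 0.0]
    updates = 0
    for step in range(1, num_steps + 1):
        online = [w + 1.0 for w in online]  # stand-in for a gradient step
        if step % update_every == 0:
            target = soft_update(target, online, rate)
            updates += 1
    return target, updates


target, updates = train(num_steps=4, update_every=2)
```

With update_every=2 the target copy is refreshed on steps 2 and 4 only, halving the number of copy operations compared with updating on every step.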