vinf / deer
DEEp Reinforcement learning framework
License: Other
I ran the pip installer on Ubuntu 15.10 with sudo and got the following:
$ sudo pip install deer
Downloading/unpacking deer
Downloading deer-0.2.4-py2-none-any.whl (122kB): 122kB downloaded
Installing collected packages: deer
Compiling /tmp/pip-build-duZeFH/deer/deeprl/core_optim.py ...
Sorry: IndentationError: unindent does not match any outer indentation level (core_optim.py, line 144)
Successfully installed deer
I don't know if this is going to be a problem, but subsequent pip installs yield:
$ sudo pip install deer
Requirement already satisfied (use --upgrade to upgrade): deer in /usr/local/lib/python2.7/dist-packages
Cleaning up...
So it might have installed properly.
Great project, I can't wait to work with it.
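A quick, hedged way to verify the install despite the byte-compile warning: re-run byte-compilation on the installed package and see whether the IndentationError reappears. The package path below is taken from the pip output above and may differ on other systems.
# Hedged sanity check: pip reports success because a failed byte-compile is
# only a warning. Recompiling the installed tree shows whether the
# IndentationError from the log is still present.
import compileall
compileall.compile_dir("/usr/local/lib/python2.7/dist-packages/deer", quiet=True)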
Hey VinF,
thanks for your work!
I have questions about the DDPG implementation in deer.
In http://pemami4911.github.io/blog/2016/08/21/ddpg-rl.html, Patrick Emami recommends implementing the actor and the critic as two networks in separate classes.
Additionally, he adds the action tensor in the 2nd hidden layer of the Critic Network.
Is my assumption correct that the ddpg implementation in deer is different?
Kind regards,
Roman
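For reference, a minimal Keras sketch of the critic layout Emami describes, with the action tensor entering at the second hidden layer. This illustrates the blog post's pattern, not deer's implementation; layer sizes are arbitrary.
# Hedged sketch of the critic pattern from Emami's post (not deer's code):
# the state passes through one hidden layer, and the action tensor is merged
# in at the second hidden layer. Sizes are arbitrary.
from keras.layers import Concatenate, Dense, Input
from keras.models import Model

state_in = Input(shape=(8,))    # example state dimension
action_in = Input(shape=(2,))   # example action dimension

h1 = Dense(400, activation='relu')(state_in)
h2 = Dense(300, activation='relu')(Concatenate()([h1, action_in]))
q_out = Dense(1, activation='linear')(h2)   # scalar Q(s, a)

critic = Model(inputs=[state_in, action_in], outputs=q_out)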
python run_toy_env.py returns:
Traceback (most recent call last):
File "run_toy_env.py", line 17, in
from deer.policies import EpsilonGreedyPolicy
ImportError: No module named 'deer.policies'
Using Python 3.5.2
envy@ub1404:/os_pri/github/General_Deep_Q_RL/examples/toy_env$ python run_toy_env_simple.py
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
/home/envy/.local/lib/python2.7/site-packages/theano/tensor/signal/downsample.py:5: UserWarning: downsample module has been moved to the pool module.
warnings.warn("downsample module has been moved to the pool module.")
Traceback (most recent call last):
File "run_toy_env_simple.py", line 23, in
random_state=rng)
TypeError: __init__() got an unexpected keyword argument 'random_state'
envy@ub1404:
Found a bug in the overridden append method of CircularBuffer in agent.py and fixed it. Basically it is an indexing error.
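For context, a generic illustration of the kind of off-by-one that bites circular buffers; this is a sketch, not deer's actual CircularBuffer.
# Generic ring-buffer sketch (not deer's code): the write index must wrap
# modulo the buffer size, otherwise append() eventually indexes past the
# end of the underlying array.
import numpy as np

class RingBuffer(object):
    def __init__(self, size):
        self._data = np.zeros(size)
        self._size = size
        self._next = 0                                  # next slot to write

    def append(self, value):
        self._data[self._next] = value
        self._next = (self._next + 1) % self._size      # wrap here, not on read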
Hi, first of all, congratulations on an interesting repo.
I would like to use the MG example by modifying the environment a bit. In particular, in my case there is no long-term storage, only the battery (plus, obviously, consumption and production). I also have historical energy data (kWh) for a building's storage system and would like to integrate it into the environment.
Can you help me?
Dear VinF,
thank you for your great work.
I'm currently using the DDPG algorithm and would like to try the LSTM network. For this purpose, I changed line 10 of AC_net_keras to
from .NN_keras_LSTM import NN
When I try to start the optimisation, the following error occurs:
"Q_net = neural_network_critic(self._batch_size, self._input_dimensions, self._n_actions, self._random_state, True)
TypeError: __init__() takes 5 positional arguments but 6 were given"
How can I get this to work?
Thanks
dynamik
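For what it's worth, the error arithmetic: the critic is constructed with five explicit arguments, so together with self its __init__ receives six, while NN_keras_LSTM's __init__ only accepts five. A hedged sketch of a matching signature follows; the parameter names, and the meaning of the trailing True, are assumptions based on the quoted call site.
# Hedged sketch: a drop-in NN class for AC_net_keras needs a five-parameter
# __init__ (plus self). Names below are assumptions inferred from the call
# in the error message.
class NN(object):
    def __init__(self, batch_size, input_dimensions, n_actions,
                 random_state, action_as_input=False):
        self._batch_size = batch_size
        self._input_dimensions = input_dimensions
        self._n_actions = n_actions
        self._random_state = random_state
        self._action_as_input = action_as_input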
Hi Vince, thank you so much for offering this useful toolbox. I just found that the Q-network cannot be dumped and resumed appropriately with setNetwork / dumpNetwork: the learning rate / epsilon / discount factor are not transferred from the trained model to the new one. We can add some debug output in agent.py, in _runEpisode, after self._total_mode_reward += reward
as follows:
print 'Action is {}, V is {}'.format(action, V)
print '#{} --- Reward is {}:'.format(maxSteps, reward)
Hi Vince, I've noticed that the default mode of the Conv2D function in Keras is channels_last. On lines 57 and 73 of NN_keras.py, the Reshape operation sets the channels as the first dimension. In my test, the function cannot work properly. mountain_car_continuous works well only because dim == 1 and dim[0] == 1.
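A hedged minimal reproduction of the mismatch (not deer's actual code; shapes are arbitrary):
# With Keras' default data_format="channels_last", Conv2D expects
# (height, width, channels), so a channels-first Reshape feeds it
# transposed data.
from keras.layers import Conv2D, Input, Reshape

inp = Input(shape=(3 * 32 * 32,))      # flattened 3-channel 32x32 frame
bad = Reshape((3, 32, 32))(inp)        # channels first: wrong for the default
good = Reshape((32, 32, 3))(inp)       # channels last: matches Conv2D
conv = Conv2D(16, (3, 3), activation='relu')(good)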
I would avoid keeping binary objects such as those in the repository.
I created a deer model with a custom Gym environment and I want to test it. Please help me.
I would like to request adding TensorFlow compatibility. I would like to help with this if I can, but I don't fully understand what needs to be built in TensorFlow or how to integrate it into your library.
Hello dear Vincent,
Thank you so much for your precious work; it is very helpful and practical for me. I have a question about the microgrid example with two storages. With my data, each epoch takes about 53 minutes, and every time I want to do a little tuning I must run it again (with a new, untrained network). Is it possible to save the trained network and reuse it in a new training epoch?
Thank you
Erfan
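A hedged sketch of the save/restore idea, using the dumpNetwork / setNetwork methods mentioned in an earlier issue above; the exact signatures are assumptions.
# Persist the trained network, then reload it into an agent with the same
# architecture before further training. Signatures below are assumptions.
def save_and_resume(agent, fname="mg_two_storages_net", n_epoch=50):
    agent.dumpNetwork(fname, nEpoch=n_epoch)   # after training
    agent.setNetwork(fname, nEpoch=n_epoch)    # before resuming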
I am running run_ALE.py with the Keras model:
from deer.q_networks.q_net_keras import MyQNetwork
However, it fails:
Traceback (most recent call last):
File "run_ALE.py", line 85, in
rng)
File "/share/syou/deer/local/lib/python2.7/site-packages/deer/q_networks/q_net_keras.py", line 60, in init
self.q_vals, self.params = Q_net._buildDQN()
TypeError: _buildDQN() takes exactly 2 arguments (1 given)
Btw, when I use Theano, it works fine. Any ideas?
I am trying to reproduce the results of the CRAR agent in the maze environment and am observing that the agent's test reward is not improving at all. It stays at about -5 for all 250 epochs. Could you please point me to the experiment settings that reproduce the results?
Environment and Policy both contain an act method, but they do quite different things. In my opinion, act is a verb, i.e. something you perform. Therefore, in the Policy abstract class the member should be a noun such as action, just like bestAction. However, chooseAction and chooseBestAction would be good, too.
class Policy(object):
    """Abstract class for all policies, i.e. objects that can take any space as input, and output an action.
    """
    def __init__(self, q_network, n_actions, random_state):
        self.q_network = q_network
        self.n_actions = n_actions
        self.random_state = random_state

    def bestAction(self, state):
        """ Returns the best Action
        """
        action = self.q_network.chooseBestAction(state)
        V = max(self.q_network.qValues(state))
        return action, V

    def act(self, state):
        """Main method of the Policy class. It can be called by agent.py, given a state,
        and should return a valid action w.r.t. the environment given to the constructor.
        """
        raise NotImplementedError()
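As a usage illustration, a hedged sketch of a concrete subclass in the spirit of the EpsilonGreedyPolicy imported in another issue above; the epsilon handling is an assumption, not deer's actual implementation.
class MyEpsilonGreedyPolicy(Policy):
    # With probability epsilon take a uniformly random action, otherwise
    # the greedy one. random_state is assumed to be a numpy RandomState.
    def __init__(self, q_network, n_actions, random_state, epsilon):
        Policy.__init__(self, q_network, n_actions, random_state)
        self.epsilon = epsilon

    def act(self, state):
        if self.random_state.rand() < self.epsilon:
            action = self.random_state.randint(0, self.n_actions)
            V = 0
        else:
            action, V = self.bestAction(state)
        return action, V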
It would help organize the project if the Python code were structured as a standalone library instead of a bunch of scripts. That way, you could also write nice standalone examples and unit tests.
Hey VinF, thank you very much for your great work! I'm enjoying using deer a lot.
I would like to save trained AC networks.
But when I train an AC network, for example with run_mountain_car_continuous.py, I am not able to get (and save) the network parameters:
qnetwork.getAllParams()
-> AttributeError: 'Variable' object has no attribute 'get_value'
How can I save a network?
Thanks for your help!
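A hedged workaround sketch: if the network object exposes its underlying Keras models, Keras itself can persist the parameters instead of going through getAllParams(). The attribute names below are hypothetical.
# Save whatever Keras models the wrapper exposes; 'actor' and 'critic'
# are hypothetical attribute names, adjust them to the real ones.
def save_ac_network(qnetwork, prefix="ac_net"):
    for name in ("actor", "critic"):
        model = getattr(qnetwork, name, None)
        if model is not None:
            model.save_weights("%s_%s.h5" % (prefix, name))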
Hi Vince, many thanks for your fantastic work! I would like to know if there is a plan to support the TRPO algorithm? Thanks a lot!
Do you happen to know why it would use so little GPU memory? I'm trying to use the library but running into issues with this: it says the GPU has no more space, even though the GPU has far more than 95 MB available.
Hi Vince, many many thanks for this wonderful little 'toy_env'. It is a joy to watch it learn the 'buy and sell' technique, almost as taught in trading textbooks!
My background is trading, i.e. not coding, hence I find it very difficult to refine this framework further: for example, changing the price feed from random data to real numbers from a CSV file, or providing more terrain information so it can make better (or, of course, possibly worse) decisions. Most likely this kind of work is for me to sort out, but I wondered whether you have any plans to mature this toy environment further.
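A hedged sketch of the CSV idea (not deer's code): replace the synthetic price series with real numbers loaded from a file. The file name and column layout are assumptions.
import numpy as np

def load_prices(path="prices.csv", column=1):
    # Load one price column from a CSV with a header row.
    return np.genfromtxt(path, delimiter=",", skip_header=1)[:, column]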
Dear VinF,
thank you very much for this great library!
I have noticed the following behavior:
Example: mountain_car_continuous_env.py
The action space is limited to [-1.0, 1.0], but during training values bigger than 1.0 sometimes occur.
Is it possible to prevent this?
Thanks!
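A hedged workaround sketch: clip each action to the environment's bounds before applying it, for example inside the environment's act() method.
import numpy as np

def clip_action(action, low=-1.0, high=1.0):
    # Keep exploratory actions inside the declared action space.
    return np.clip(action, low, high)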
Hi Vincent,
Thank you for this nice and useful work. I am wondering whether deep learning models from H2O can be integrated and used as part of the learning model in deer?
Dear Mr. VinF,
I followed your instructions to install the bleeding-edge version, but the command you mentioned fails: pip install git+git://github.com/VINF/deer.git@master
I got the error "The system cannot find the file specified" while executing the command git clone -q git://github.com/VINF/deer.git C:\Users\admin\AppData\Local\Temp\pip-r24t5v_x-build, followed by:
Cannot find command 'git'
Could you help me figure out this problem?
Hi dear Vincent,
Thank you very much for your work; it is very helpful. I tried to run MG_two_storages and the error below occurred in FindBestController. Does it need any configuration before running?
Best regards,
EJ
"Average (on the epoch) training loss: 0.9318450183311346
Episode average V value: 0
epoch 1:
Learning rate: 0.0002
Discount factor: 0.9
Epsilon: 0.9987931999999653
Best neural net obtained after 1 epochs, with validation score -78.40202677778416
Traceback (most recent call last):
File "run_MG_two_storages.py", line 194, in
agent.run(parameters.epochs, parameters.steps_per_epoch)
File "/usr/local/lib/python3.7/dist-packages/deer/agent.py", line 269, in run
self._run_train(n_epochs, epoch_length)
File "/usr/local/lib/python3.7/dist-packages/deer/agent.py", line 296, in _run_train
for c in self._controllers: c.onEpochEnd(self)
File "/usr/local/lib/python3.7/dist-packages/deer/experiment/base_controllers.py", line 338, in onEpochEnd
agent._run_non_train(n_epochs=1, epoch_length=self._epoch_length)
File "/usr/local/lib/python3.7/dist-packages/deer/agent.py", line 324, in _run_non_train
for c in self._controllers: c.onEnd(self)
File "/usr/local/lib/python3.7/dist-packages/deer/experiment/base_controllers.py", line 558, in onEnd
print("Test score of this neural net: {}".format(self._testScores[bestIndex]))
IndexError: list index out of range
Hi VinF,
your library is very helpful. Thank you!
Weight normalization might be a way to make SGD-based algorithms suitable for a wider range of environments without the need to manually scale observation vectors and fine-tune hyper-parameters. Moreover, the training process might be accelerated considerably for certain environments.
Maybe there is a straightforward way to apply weight normalization to your implementation of actor-critic learning, as the example code by OpenAI suggests. It appears that only the initialization of the critic would need to be adapted. The example code provides the adaptations for the SGD and Adam optimizers of Keras, and the initialization of the critic's parameters based on a single minibatch of data.
The two challenges I see are the following:
I would really appreciate your thoughts on this.
EDIT: URLs of the links fixed
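For context, a hedged illustration of the reparameterisation that weight normalization performs (following Salimans & Kingma, 2016, the OpenAI work referenced above): each weight vector's norm is decoupled from its direction.
import numpy as np

def weight_norm(v, g):
    # w = g * v / ||v|| for a weight vector v and a scalar gain g.
    return g * v / np.linalg.norm(v)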
Hi, I tried running the PLE example on a simple pygame I came up with, but encountered the following error. I would appreciate some guidance on how to overcome it. Thanks.
Traceback (most recent call last):
File "run_PLE.py", line 190, in
agent.run(parameters.epochs, parameters.steps_per_epoch)
File "C:\Users\speedy\Anaconda3\lib\site-packages\deer-0.3-py3.5.egg\deer\agent.py", line 282, in run
File "C:\Users\speedy\Anaconda3\lib\site-packages\deer-0.3-py3.5.egg\deer\experiment\base_controllers.py", line 346, in onEpochEnd
File "C:\Users\speedy\Anaconda3\lib\site-packages\deer-0.3-py3.5.egg\deer\agent.py", line 173, in startMode
File "C:\Users\speedy\Anaconda3\lib\site-packages\deer-0.3-py3.5.egg\deer\agent.py", line 434, in init
File "C:\Users\speedy\Anaconda3\lib\site-packages\deer-0.3-py3.5.egg\deer\agent.py", line 676, in init
MemoryError
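A hedged back-of-the-envelope check: the MemoryError is raised while the replay memory is allocated (see the agent.py frames above), and for image observations that buffer can be huge. All the numbers below are hypothetical.
frame_bytes = 64 * 64 * 4        # one 64x64 float32 observation
replay_memory_size = 1000000     # transitions kept for experience replay
history = 4                      # stacked frames per state
total_gb = frame_bytes * replay_memory_size * history / 1e9
print("approx. replay memory: %.1f GB" % total_gb)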
Hi,
thank you very much for your work, it has helped me a lot so far!
Is there any possibility that the CRAR implementation will be adapted for continuous action spaces in the near future? I managed to adapt the NN_CRAR_keras adapter (and fix a few bugs), but the CRAR learning algorithm itself is a little over my head for now.
Best regards,
Nik
Hey VinF,
do you have more information about the LongerExplorationPolicy? I'm wondering whether this policy is suitable for my environment, and how the length parameter should be chosen.
Thanks!
Best wishes
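For context, a hedged sketch of the idea behind a "longer exploration" policy (a generic illustration, not deer's actual implementation): when exploring, commit to a random action for length consecutive steps instead of re-randomising every step.
class LongerExploration(object):
    def __init__(self, n_actions, random_state, epsilon, length=10):
        self.n_actions = n_actions
        self.random_state = random_state   # assumed numpy RandomState
        self.epsilon = epsilon
        self.length = length               # steps to hold an exploratory action
        self._held = None
        self._left = 0

    def act(self, best_action):
        if self._left > 0:                              # keep exploring
            self._left -= 1
            return self._held
        if self.random_state.rand() < self.epsilon:     # start exploring
            self._held = self.random_state.randint(0, self.n_actions)
            self._left = self.length - 1
            return self._held
        return best_action                              # exploit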
The ReadTheDocs link provided in the readme is currently broken.
The naming convention for this project has been fixed: http://deer.readthedocs.io/en/latest/user/development.html#naming-conv
The downside is that if you upgrade from 0.2.x (stable branch) to 0.3.devx (master branch) and still want to use the "old" examples from the 0.2.x version, you will need to slightly modify the run_XXX scripts of your examples. You can take a look at how run_toy_env.py was modified from 0.2.x to 0.3.devx and do the same:
0b97398#diff-df55532cc225e6233c89a825fb61048c
Sorry for the hassle (should be the only one of that kind for a while!)