carpedm20 / deep-rl-tensorflow
TensorFlow implementation of Deep Reinforcement Learning papers
License: MIT License
When I train the DQN, it just stops suddenly and says:
/t_train_max=50000000/unrolled_lstm=False/use_cumulated_reward=False/
0%| | 0/50000000 [00:00<?, ?it/s]/home/hanzy/.local/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2909: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/home/hanzy/.local/lib/python2.7/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
0%| | 50002/50000000 [28:59<482:41:00, 28.75it/s]E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 7005 (compatibility version 7000) but source was compiled with 5105 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted (core dumped)
Why?
Hi, this error occurs when I try to run the demo of Breakout-v0.
I have installed gym[atari] and can make the environment 'Breakout-v0'; however, this env has no 'ale' attribute for getting the current lives.
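In recent gym versions, gym.make wraps the Atari env in a TimeLimit wrapper, which hides the .ale attribute; the underlying env can still be reached through .unwrapped. A minimal sketch of the workaround (assuming the standard gym Atari API):

    import gym

    env = gym.make('Breakout-v0')
    env.reset()
    # The TimeLimit wrapper has no .ale; the raw Atari env underneath does.
    lives = env.unwrapped.ale.lives()

In this repo, that would presumably mean replacing self.env.ale.lives() with self.env.unwrapped.ale.lives() wherever lives are read.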
I think there should be a small modification in agent.py: it's strange to reuse the old history when we start playing a new game.
for self.t in tqdm(range(start_t, t_max), ncols=70, initial=start_t):
    # linearly anneal epsilon from ep_start down to ep_end over t_ep_end steps
    ep = (self.ep_end +
        max(0., (self.ep_start - self.ep_end)
          * (self.t_ep_end - max(0., self.t - self.t_learn_start)) / self.t_ep_end))

    # 1. predict
    action = self.predict(self.history.get(), ep)
    # 2. act
    observation, reward, terminal, info = self.env.step(action, is_training=True)
    # 3. observe
    q, loss, is_update = self.observe(observation, reward, action, terminal)

    logger.debug("a: %d, r: %d, t: %d, q: %.4f, l: %.2f" % \
        (action, reward, terminal, np.mean(q), loss))

    if self.stat:
        self.stat.on_step(self.t, action, reward, terminal,
                          ep, q, loss, is_update, self.learning_rate_op)
    if terminal:
        observation, reward, terminal = self.new_game()
        ## proposed fix: reset the history when the state is terminal,
        ## so the next prediction does not see frames from the old game:
        ## for _ in range(self.history_length):
        ##     self.history.add(observation)
Whether I am training or testing, there is a warning like:
[!] Load FAILED: checkpoints/Breakout-v0/env_name=Breakout-v0/agent_type=DQN/batch_size=32/beta=0.01/data_format=NHWC/decay=0.99/discount_r=0.99/double_q=False/ep_end=0.01/ep_start=1.0/gamma=0.99/history_length=4/learning_rate=0.00025/learning_rate_decay=0.96/learning_rate_decay_step=50000/learning_rate_minimum=0.00025/max_delta=None/max_grad_norm=None/max_r=1/min_delta=None/min_r=-1/momentum=0.0/n_action_repeat=1/network_header_type=nips/network_output_type=normal/observation_dims=80,80/random_start=True/t_ep_end=1000000/t_learn_start=50000/t_target_q_update_freq=10000/t_test=10000/t_train_freq=4/t_train_max=500000/use_cumulated_reward=False/
Actually, there is indeed an output file generated under the path from the warning above, inside a new folder named 'logs'. But it is a strangely named file, 'events.out.tfevents.1508026080.vpn-campus-152-3-71-166.ssl.vpn.school_name.edu'. It does not look like a checkpoint, but rather a file describing the network.
So I am wondering how the checkpoint files are stored after training, and how to load them when testing. Any suggestion will be greatly appreciated, thanks!
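For reference, TensorFlow checkpoints are written and restored with tf.train.Saver; the events.out.tfevents.* file under logs/ is a TensorBoard summary log, not a checkpoint. A minimal sketch of the save/load cycle (the variable and directory names here are illustrative, not the repo's):

    import tensorflow as tf

    w = tf.Variable(tf.zeros([4, 4]), name='w')
    saver = tf.train.Saver()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # ... training steps would go here ...
        saver.save(sess, 'checkpoints/model.ckpt', global_step=100)

    with tf.Session() as sess:
        # For testing, restore the most recent checkpoint.
        saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))

The '[!] Load FAILED' warning presumably just means no checkpoint existed yet at that path, which is expected on a fresh training run.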
https://github.com/carpedm20/deep-rl-tensorflow/blob/master/environments/environment.py#L114
Why not return the cumulated_reward here?
Hi all, does anybody have a pre-trained model?
It takes a long time to train on my machine (about 2 weeks).
It would be great if anybody could offer a pre-trained model :).
When running a test with Python 3 (from the README), numpy's ravel complains about the data type:
python3 main.py --network_header_type=mlp --network_output_type=normal --observation_dims='[16]' --env_name=CorridorSmall-v5 --t_learn_start=0.1 --learning_rate_decay_step=0.1 --history_length=1 --n_action_repeat=1 --t_ep_end=10 --display=True --learning_rate=0.025 --learning_rate_minimum=0.0025
[2017-03-20 13:28:03,027] Making new env: CorridorSmall-v5
Traceback (most recent call last):
  File "main.py", line 168, in <module>
    tf.app.run()
  [...]
  File "[...]/deep-rl-tensorflow/environments/corridor.py", line 66, in __init__
    isd = (desc == 'S').ravel().astype('float64')
AttributeError: 'bool' object has no attribute 'ravel'
Is it because of Python 3? I saw some fixes for Python 3 in the git history already.
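The error suggests desc is a plain Python object here, so desc == 'S' evaluates to a single bool instead of an elementwise boolean array. A hedged fix, following what gym's own FrozenLake environment does, is to convert desc to a NumPy character array before comparing:

    import numpy as np

    desc = np.asarray(desc, dtype='c')  # bytes array; also works on Python 3
    isd = (desc == b'S').ravel().astype('float64')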
Hi,
I get the above error whenever I use the GPU for training.
The command I used:
python3 main.py --network_header_type=nature --env_name=Breakout-v0 --is_train=True --display=True --t_train_max=50
The crash comes when training reaches a certain point (about 10%). Full error report:
10%|██▍ | 49973/500000 [02:52<25:54, 289.55it/s]2018-05-28 23:08:09.915921: E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2018-05-28 23:08:09.915959: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2018-05-28 23:08:09.915981: F tensorflow/core/kernels/conv_ops.cc:667] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
Aborted (core dumped)
Is there any way to fix this? Thanks!
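CUDNN_STATUS_INTERNAL_ERROR is usually an environment problem (GPU memory exhaustion or a cuDNN/TensorFlow version mismatch) rather than a bug in this repo. One commonly reported workaround is to stop TensorFlow from grabbing all GPU memory up front; a sketch, assuming you can edit the place where the session is created:

    import tensorflow as tf

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # allocate GPU memory on demand
    sess = tf.Session(config=config)

If that doesn't help, check that the cuDNN version installed at runtime matches the one your TensorFlow build was compiled against.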
I think DDPG could be added; that algorithm performs better for continuous action spaces.
Looking forward to it. :)
How do I run CartPole-v0?
Hello,
I tried to reproduce the result (with n_action_repeat=1) on a computer with a GTX 1080, but the performance is not as good as shown in the figure. After 2.88M steps the average reward is 0.0174,
the average ep_reward is 3.1071, and the max ep_reward is 7.
Maybe I did something wrong in the settings or misread some information. Could you give me some suggestions? Thanks a lot!
Chih-Chieh
After reading your code carefully, I still cannot figure out how you define your action space. For example, how many actions do you define, and how is each action represented? Waiting for your answers.
Sincerely
Can't train the DQN! I have installed gym[all] and tensorflow 1.9.0 with python 3.6.8. Any ideas?
$ python main.py --network_header_type=nature --env_name=Breakout-v0
Traceback (most recent call last):
  File "main.py", line 10, in <module>
    from environments.environment import ToyEnvironment, AtariEnvironment
  File "/media/bigdata/Solid2/DQN/deep-rl-tensorflow-master/environments/environment.py", line 6, in <module>
    from .corridor import CorridorEnv
  File "/media/bigdata/Solid2/DQN/deep-rl-tensorflow-master/environments/corridor.py", line 131, in <module>
    timestep_limit=100,
  File "/home/bigdata/.conda/envs/tensorflow/lib/python3.6/site-packages/gym/envs/registration.py", line 153, in register
    return registry.register(id, **kwargs)
  File "/home/bigdata/.conda/envs/tensorflow/lib/python3.6/site-packages/gym/envs/registration.py", line 147, in register
    self.env_specs[id] = EnvSpec(id, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'timestep_limit'
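Newer gym releases removed the timestep_limit keyword from register() and expect max_episode_steps instead. A hedged fix for the register() call in environments/corridor.py (the id and entry_point shown are illustrative):

    from gym.envs.registration import register

    register(
        id='CorridorSmall-v5',
        entry_point='environments.corridor:CorridorEnv',
        max_episode_steps=100,  # was: timestep_limit=100
    )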
Hi
I downloaded the code and then tested it as described here.
However, I got the error below.
I think all requirements are installed (except opencv2), and the OpenAI gym installation was tested.
I would appreciate it if someone could find the cause and a solution.
Traceback (most recent call last):
  File "/DQN-tensorflow-master/main.py", line 69, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/DQN-tensorflow-master/main.py", line 64, in main
    agent.train()
  File "/DQN-tensorflow-master/dqn/agent.py", line 40, in train
    screen, reward, action, terminal = self.env.new_random_game()
  File "/DQN-tensorflow-master/dqn/environment.py", line 28, in new_random_game
    self.new_game(True)
  File "/DQN-tensorflow-master/dqn/environment.py", line 21, in new_game
    if self.lives == 0:
  File "/DQN-tensorflow-master/dqn/environment.py", line 52, in lives
    return self.env.ale.lives()
AttributeError: 'TimeLimit' object has no attribute 'ale'
Could you please tell me how you set the reward at each state? It seems that every F state receives a reward, so an agent might just keep staying on F states until the episode ends and automatically collect the maximum reward. I cannot reproduce the result of the dueling network's corridor game. Could you give me any hints?
@carpedm20 On my Mac, when I ran the example, this error occurred:
SystemError: new style getargs format but argument is not a tuple
I finally found the solution; see http://stackoverflow.com/questions/26964379/systemerror-new-style-getargs-format-but-argument-is-not-a-tuple-in-ros-camerac for a more detailed explanation.
In environments/environment.py, line 111, the last argument of imresize should be cast to a tuple, like this:
y_screen = imresize(y, tuple(self.observation_dims))
This problem is in utils.py, line 12. I am using python 2.7, tensorflow 1.4.0, and gym 0.7.
Training the model takes a lot of time, so it would be nice to have a trained model available to evaluate directly.
Getting this warning, which requires updating the agent.py code:
WARNING:tensorflow:From /../deep-rl-tensorflow/agents/agent.py:61 in train.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
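The warning names its own fix. Assuming the call site in agents/agent.py looks like the deprecated form, the change is one line:

    # deprecated, removed after 2017-03-02:
    # self.sess.run(tf.initialize_all_variables())
    self.sess.run(tf.global_variables_initializer())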
Hi,
It seems that you didn't include the plotting code. I don't know how you plot your training results; do you have any suggestions?
Traceback (most recent call last):
  File "main.py", line 168, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 163, in main
    agent.train(conf.t_train_max)
  File "/home/ddy/workspace/deep-rl-tensorflow/agents/agent.py", line 67, in train
    observation, reward, terminal = self.new_game()
  File "/home/ddy/workspace/deep-rl-tensorflow/environments/environment.py", line 87, in new_random_game
    self.lives = self.env.ale.lives()
AttributeError: 'TimeLimit' object has no attribute 'ale'
Hi all,
I wonder what frame-skip in the README means.
Seeing agents/statistic.py line 20:
self.writer = tf.summary.FileWriter('./logs/%s' % self.model_dir, self.sess.graph)
It tries to use self.model_dir to create the file, but it raises this error: "tensorflow.python.framework.errors_impl.NotFoundError: Failed to create a directory:"
How can I fix it? My Python version is 3.6 and my TensorFlow version is 1.12.
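One likely cause is that the deeply nested log directory does not exist yet, or that the model_dir string contains characters your filesystem rejects (Windows, for instance, disallows ':' in paths). A hedged workaround is to sanitize and create the directory before handing it to FileWriter:

    import os

    log_dir = './logs/%s' % self.model_dir
    log_dir = log_dir.replace(':', '_')   # strip characters Windows rejects
    os.makedirs(log_dir, exist_ok=True)   # create the full path up front
    self.writer = tf.summary.FileWriter(log_dir, self.sess.graph)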
It seems to me that in all of the agents you are clipping the gradient. This means the gradients are zero for large errors.
It might be because the paper "Human-level control through deep reinforcement learning" is misleading when it talks about clipping the loss.
What the authors actually do in the implementation is use abs(delta) where abs(delta) > 1 and delta^2 where abs(delta) < 1 (i.e., a Huber-style loss on the TD error delta).
This can be implemented like this:
import tensorflow as tf

delta_grad_clip = 1.0
# Y: target values; DQN_acted: Q-values of the actions actually taken
batch_delta = Y - DQN_acted
batch_delta_abs = tf.abs(batch_delta)
# quadratic part below the clip threshold, linear part above it
batch_delta_quadratic = tf.minimum(batch_delta_abs, delta_grad_clip)
batch_delta_linear = batch_delta_abs - batch_delta_quadratic
batch_loss = batch_delta_linear + batch_delta_quadratic**2
loss = tf.reduce_mean(batch_loss)
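For what it's worth, later TensorFlow 1.x releases also ship tf.losses.huber_loss, which implements the same quadratic-below-delta, linear-above-delta penalty directly.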
I got the following bug. Could you please help? Thank you!
$python main.py --network_header_type=nips --env_name=Breakout-v0 --use_gpu=False
Traceback (most recent call last):
  File "main.py", line 173, in <module>
    tf.app.run()
  File "/Documents/openai_gym/openai_gym/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "main.py", line 107, in main
    conf.data_format = 'NHWC'
  File "/Documents/openai_gym/openai_gym/lib/python3.6/site-packages/tensorflow/python/platform/flags.py", line 88, in __setattr__
    return self.__dict__['__wrapped'].__setattr__(name, value)
  File "/Documents/openai_gym/openai_gym/lib/python3.6/site-packages/absl/flags/_flagvalues.py", line 496, in __setattr__
    return self._set_unknown_flag(name, value)
  File "/Documents/openai_gym/openai_gym/lib/python3.6/site-packages/absl/flags/_flagvalues.py", line 374, in _set_unknown_flag
    raise _exceptions.UnrecognizedFlagError(name, value)
absl.flags._exceptions.UnrecognizedFlagError: Unknown command line flag 'data_format'
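Newer absl-based flag handling raises on assignment to a flag that was never defined, which is why the bare conf.data_format = 'NHWC' in main.py now fails. A hedged fix is to define the flag before main() runs:

    import tensorflow as tf

    flags = tf.app.flags
    # Define the flag so absl no longer treats the later assignment as unknown.
    flags.DEFINE_string('data_format', 'NHWC', 'data format: NHWC or NCHW')
    conf = flags.FLAGS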
SARSA does not calculate the loss the same way Q-Learning does; see line 78 of sarsa.py.
Hello.
I am running your code on the Atari game Breakout-v0.
The settings are plain DQN (nips), DQN (nature), DDQN, dueling DQN, and dueling DDQN.
Each process has now run for almost 6M (6,000,000) frames, but the output of every process is very low:
-> avg_p ~ [0.3 ~ 0.5]
-> avg_ep_r ~ [0.25 ~ 0.45]
In the paper (Mnih et al., 2013), [avg_p ~ at least 2.5] and [avg_ep_r ~ at least 50] on Breakout after about 6M frames.
I didn't adjust any of the DQN code.
Has anyone seen a Breakout example run well? Did I do something wrong?
Hi
It's really good code for learning reinforcement learning.
I have 2 questions about network.py.
Looking forward to your further response.
Got an InvalidArgumentError after 26 minutes of training. I upgraded to the most recent TensorFlow as suggested and did $ pip install -U 'gym[all]' tqdm scipy. I ran this on a Titan X and Ubuntu 16.10.
$ time python main.py --network_header_type=nips --env_name=Breakout-v0 --use_gpu=True --display=True
[2017-01-26 23:58:40,289] DEPRECATION WARNING: env.spec.timestep_limit has been deprecated. Replace any calls to `register(timestep_limit=200)` with `register(tags={'wrapper_config.TimeLimit.max_episode_steps': 200)}`, . This change was made 12/28/2016 and is included in gym version 0.7.0. If you are getting many of these warnings, you may need to update universe past version 0.21.1
{'agent_type': 'DQN',
'batch_size': 32,
'beta': 0.01,
'data_format': 'NCHW',
'decay': 0.99,
'discount_r': 0.99,
'display': True,
'double_q': False,
'env_name': 'Breakout-v0',
'ep_end': 0.01,
'ep_start': 1.0,
'gamma': 0.99,
'gpu_fraction': '1/1',
'history_length': 4,
'is_train': True,
'learning_rate': 0.00025,
'learning_rate_decay': 0.96,
'learning_rate_decay_step': 50000,
'learning_rate_minimum': 0.00025,
'log_level': 'INFO',
'max_delta': None,
'max_grad_norm': None,
'max_r': 1,
'max_random_start': 30,
'memory_size': 1000000,
'min_delta': None,
'min_r': -1,
'momentum': 0.0,
'n_action_repeat': 4,
'network_header_type': 'nips',
'network_output_type': 'normal',
'observation_dims': [80, 80],
'random_seed': 123,
'random_start': True,
'scale': 10000,
't_ep_end': 1000000,
't_learn_start': 50000,
't_target_q_update_freq': 10000,
't_test': 10000,
't_train_freq': 4,
't_train_max': 50000000,
'tag': '',
'unrolled_lstm': False,
'use_cumulated_reward': False,
'use_gpu': True}
[*] GPU : 1.0000
[2017-01-26 23:58:40,330] Making new env: Breakout-v0
[2017-01-26 23:58:40,352] Using 6 actions : NOOP, FIRE, RIGHT, LEFT, RIGHTFIRE, LEFTFIRE
INFO:tensorflow:Summary name episode/max reward is illegal; using episode/max_reward instead.
[2017-01-26 23:58:40,938] Summary name episode/max reward is illegal; using episode/max_reward instead.
INFO:tensorflow:Summary name episode/min reward is illegal; using episode/min_reward instead.
[2017-01-26 23:58:40,940] Summary name episode/min reward is illegal; using episode/min_reward instead.
INFO:tensorflow:Summary name episode/avg reward is illegal; using episode/avg_reward instead.
[2017-01-26 23:58:40,941] Summary name episode/avg reward is illegal; using episode/avg_reward instead.
INFO:tensorflow:Summary name episode/num of game is illegal; using episode/num_of_game instead.
[2017-01-26 23:58:40,943] Summary name episode/num of game is illegal; using episode/num_of_game instead.
WARNING:tensorflow:From /media/ch3njus/Seagate4TB/research/ab/deep-rl-tensorflow/agents/agent.py:61 in train.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
[2017-01-26 23:58:41,273] From /media/ch3njus/Seagate4TB/research/ab/deep-rl-tensorflow/agents/agent.py:61 in train.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
[!] Load FAILED: checkpoints/Breakout-v0/env_name=Breakout-v0/agent_type=DQN/batch_size=32/beta=0.01/data_format=NCHW/decay=0.99/discount_r=0.99/double_q=False/ep_end=0.01/ep_start=1.0/gamma=0.99/history_length=4/learning_rate=0.00025/learning_rate_decay=0.96/learning_rate_decay_step=50000/learning_rate_minimum=0.00025/max_delta=None/max_grad_norm=None/max_r=1/min_delta=None/min_r=-1/momentum=0.0/n_action_repeat=4/network_header_type=nips/network_output_type=normal/observation_dims=80,80/random_start=True/t_ep_end=1000000/t_learn_start=50000/t_target_q_update_freq=10000/t_test=10000/t_train_freq=4/t_train_max=50000000/unrolled_lstm=False/use_cumulated_reward=False/
0%| | 0/50000000 [00:00<?, ?it/s]/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:2889: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
0%| | 50002/50000000 [25:59<436:00:27, 31.82it/s]E tensorflow/core/common_runtime/executor.cc:390] Executor failed to create kernel. Invalid argument: CPU BiasOp only supports NHWC.
[[Node: target_network/l1_conv/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](target_network/l1_conv/Conv2D, target_network/l1_conv/b/read)]]
Traceback (most recent call last):
File "main.py", line 168, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "main.py", line 163, in main
agent.train(conf.t_train_max)
File "/media/ch3njus/Seagate4TB/research/ab/deep-rl-tensorflow/agents/agent.py", line 82, in train
q, loss, is_update = self.observe(observation, reward, action, terminal)
File "/media/ch3njus/Seagate4TB/research/ab/deep-rl-tensorflow/agents/deep_q.py", line 61, in observe
result = self.q_learning_minibatch()
File "/media/ch3njus/Seagate4TB/research/ab/deep-rl-tensorflow/agents/deep_q.py", line 84, in q_learning_minibatch
max_q_t_plus_1 = self.target_network.calc_max_outputs(s_t_plus_1)
File "/media/ch3njus/Seagate4TB/research/ab/deep-rl-tensorflow/networks/network.py", line 87, in calc_max_outputs
return self.max_outputs.eval({self.inputs: observation}, session=self.sess)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 575, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3633, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: CPU BiasOp only supports NHWC.
[[Node: target_network/l1_conv/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](target_network/l1_conv/Conv2D, target_network/l1_conv/b/read)]]
Caused by op u'target_network/l1_conv/BiasAdd', defined at:
File "main.py", line 168, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "main.py", line 140, in main
name='target_network', trainable=False)
File "/media/ch3njus/Seagate4TB/research/ab/deep-rl-tensorflow/networks/cnn.py", line 55, in __init__
hidden_activation_fn, data_format, name='l1_conv')
File "/media/ch3njus/Seagate4TB/research/ab/deep-rl-tensorflow/networks/layers.py", line 30, in conv2d
out = tf.nn.bias_add(conv, b, data_format)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 1172, in bias_add
return gen_nn_ops._bias_add(value, bias, data_format=data_format, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 281, in _bias_add
data_format=data_format, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): CPU BiasOp only supports NHWC.
[[Node: target_network/l1_conv/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](target_network/l1_conv/Conv2D, target_network/l1_conv/b/read)]]
real 26m4.536s
user 10m27.848s
sys 7m41.136s
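The failure says a bias op was placed on the CPU with NCHW, which TensorFlow's CPU kernels do not support. One workaround (an assumption, not a verified fix) is to force NHWC in main.py even when use_gpu=True, since NHWC is accepted by both CPU and GPU kernels, at some GPU speed cost:

    # in main.py, where data_format is derived from use_gpu:
    conf.data_format = 'NHWC'  # was 'NCHW' when use_gpu=True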
I'm training the DQN for Enduro-v0. It performs well, but how can I see the average reward and loss on TensorBoard?
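The statistics are written as TensorFlow summaries under ./logs/<model_dir>, so pointing TensorBoard at that directory should show the reward and loss curves:

$ tensorboard --logdir=./logs

Then open http://localhost:6006 in a browser.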
Traceback (most recent call last):
  File "main.py", line 172, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "main.py", line 169, in main
    agent.play(conf.ep_end)
  File "../deep-rl-tensorflow/agents/agent.py", line 101, in play
    self.env.env.monitor.start(gym_dir)
  File "/usr/local/lib/python2.7/dist-packages/gym/core.py", line 92, in monitor
    raise error.Error("env.monitor has been deprecated as of 12/23/2016. Remove your call to `env.monitor.start(directory)` and instead wrap your env with `env = gym.wrappers.Monitor(env, directory)` to record data.")
gym.error.Error: env.monitor has been deprecated as of 12/23/2016. Remove your call to `env.monitor.start(directory)` and instead wrap your env with `env = gym.wrappers.Monitor(env, directory)` to record data.
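The error message itself gives the migration path: wrap the env instead of calling env.monitor.start. A sketch of the change in agents/agent.py around line 101 (attribute names taken from the traceback):

    import gym

    # old (removed from gym as of 12/23/2016):
    # self.env.env.monitor.start(gym_dir)
    self.env.env = gym.wrappers.Monitor(self.env.env, gym_dir)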
Hello, I'm very confused about the iteration speed.
For the first few minutes after the program starts, the it/s value is quite large and it runs very fast, which is what I'd love to see. But as training proceeds, the it/s value keeps decreasing: after about 30 minutes it drops from roughly 10000 to 900, and it keeps going down.
Is this a problem with the GPU setup or with tqdm?
The graphics cards I use are two Nvidia K40s.
I'm getting a MemoryError on Ubuntu with 2 GB RAM + 4 GB swap:
Traceback (most recent call last):
  File "main.py", line 168, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "main.py", line 160, in main
    agent = TrainAgent(sess, pred_network, env, stat, conf, target_network=target_network)
  File "/home/strky/deep-rl-tensorflow/agents/deep_q.py", line 13, in __init__
    super(DeepQ, self).__init__(sess, pred_network, env, stat, conf, target_network=target_network)
  File "/home/strky/deep-rl-tensorflow/agents/agent.py", line 53, in __init__
    conf.batch_size, conf.history_length, conf.memory_size, conf.observation_dims)
  File "/home/strky/deep-rl-tensorflow/agents/experience.py", line 15, in __init__
    self.observations = np.empty([self.memory_size] + observation_dims, dtype=np.uint8)
MemoryError
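The replay buffer alone needs memory_size x 80 x 80 bytes: np.empty([1000000, 80, 80], dtype=np.uint8) is about 6.4 GB, which cannot fit in 2 GB of RAM plus 4 GB of swap. Shrinking the buffer should get past the MemoryError (the flag name is taken from the config dump above):

$ python main.py --network_header_type=nips --env_name=Breakout-v0 --memory_size=100000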
diff --git a/networks/layers.py b/networks/layers.py
index 86d6052..5c8e329 100644
--- a/networks/layers.py
+++ b/networks/layers.py
@@ -7,7 +7,7 @@ def conv2d(x,
            kernel_size,
            stride,
            weights_initializer=tf.contrib.layers.xavier_initializer(),
-           biases_initializer=tf.zeros_initializer,
+           biases_initializer=tf.zeros_initializer(),
            activation_fn=tf.nn.relu,
            data_format='NHWC',
            padding='VALID',
diff --git a/networks/mlp.py b/networks/mlp.py
index 9c9e58f..8f595f8 100644
--- a/networks/mlp.py
+++ b/networks/mlp.py
@@ -12,7 +12,7 @@ class MLPSmall(Network):
                trainable=True,
                batch_size=None,
                weights_initializer=initializers.xavier_initializer(),
-               biases_initializer=tf.zeros_initializer,
+               biases_initializer=tf.zeros_initializer(),
                hidden_activation_fn=tf.nn.relu,
                output_activation_fn=None,
                hidden_sizes=[50, 50, 50],
Thanks for sharing this repo! I want to try changing the display rendering, and I noticed that the render() function is not explicitly implemented in either AtariEnvironment or ToyEnvironment. Do you mind letting me know where to find the rendering options? (e.g., did you reference gym's atari_env.py in your code, or not?)
Thanks in advance!
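As far as I can tell, rendering is delegated to gym itself: the wrapped env's render() (implemented in gym's atari_env.py) is what draws the screen, so the display options live in gym rather than in this repo. A minimal gym-level sketch:

    import gym

    env = gym.make('Breakout-v0')
    env.reset()
    env.render()                           # opens gym's human-viewable window
    frame = env.render(mode='rgb_array')   # or grab the raw RGB frame instead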