carpedm20 / deep-rl-tensorflow
TensorFlow implementation of Deep Reinforcement Learning papers
License: MIT License
When I train the DQN, it just stops suddenly and says:
/t_train_max=50000000/unrolled_lstm=False/use_cumulated_reward=False/
0%| | 0/50000000 [00:00<?, ?it/s]/home/hanzy/.local/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2909: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/home/hanzy/.local/lib/python2.7/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
0%| | 50002/50000000 [28:59<482:41:00, 28.75it/s]E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 7005 (compatibility version 7000) but source was compiled with 5105 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted (core dumped)
Why?
Hi, this error occurs when I try to run the demo of Breakout-v0.
I have installed gym[atari] and can make the environment 'Breakout-v0'; however, this env has no 'ale' attribute for getting the current lives.
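In recent gym versions, gym.make wraps the Atari env in a TimeLimit wrapper, which hides the .ale attribute; the underlying env can still be reached through .unwrapped. A minimal sketch of the workaround (assuming the standard gym Atari API):

    import gym

    env = gym.make('Breakout-v0')
    env.reset()
    # The TimeLimit wrapper has no .ale; the raw Atari env underneath does.
    lives = env.unwrapped.ale.lives()

In this repo, that would presumably mean replacing self.env.ale.lives() with self.env.unwrapped.ale.lives() wherever lives are read.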
I think there should be a small modification in agent.py: it's strange to reuse the old history when we start playing a new game.
for self.t in tqdm(range(start_t, t_max), ncols=70, initial=start_t):
    # linearly anneal epsilon from ep_start down to ep_end over t_ep_end steps
    ep = (self.ep_end +
        max(0., (self.ep_start - self.ep_end)
          * (self.t_ep_end - max(0., self.t - self.t_learn_start)) / self.t_ep_end))

    # 1. predict
    action = self.predict(self.history.get(), ep)
    # 2. act
    observation, reward, terminal, info = self.env.step(action, is_training=True)
    # 3. observe
    q, loss, is_update = self.observe(observation, reward, action, terminal)

    logger.debug("a: %d, r: %d, t: %d, q: %.4f, l: %.2f" % \
        (action, reward, terminal, np.mean(q), loss))

    if self.stat:
        self.stat.on_step(self.t, action, reward, terminal,
                          ep, q, loss, is_update, self.learning_rate_op)
    if terminal:
        observation, reward, terminal = self.new_game()
        ## proposed fix: reset the history when the state is terminal,
        ## so the next prediction does not see frames from the old game:
        ## for _ in range(self.history_length):
        ##     self.history.add(observation)
Whether I am training or testing, there is a warning like:
[!] Load FAILED: checkpoints/Breakout-v0/env_name=Breakout-v0/agent_type=DQN/batch_size=32/beta=0.01/data_format=NHWC/decay=0.99/discount_r=0.99/double_q=False/ep_end=0.01/ep_start=1.0/gamma=0.99/history_length=4/learning_rate=0.00025/learning_rate_decay=0.96/learning_rate_decay_step=50000/learning_rate_minimum=0.00025/max_delta=None/max_grad_norm=None/max_r=1/min_delta=None/min_r=-1/momentum=0.0/n_action_repeat=1/network_header_type=nips/network_output_type=normal/observation_dims=80,80/random_start=True/t_ep_end=1000000/t_learn_start=50000/t_target_q_update_freq=10000/t_test=10000/t_train_freq=4/t_train_max=500000/use_cumulated_reward=False/
Actually, there is indeed an output file generated under the path from the warning above, inside a new folder named 'logs'. But it is a strangely named file, 'events.out.tfevents.1508026080.vpn-campus-152-3-71-166.ssl.vpn.school_name.edu'. It does not look like a checkpoint, but rather a file describing the network.
So I am wondering how the checkpoint files are stored after training, and how to load them when testing. Any suggestion will be greatly appreciated, thanks!
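For reference, TensorFlow checkpoints are written and restored with tf.train.Saver; the events.out.tfevents.* file under logs/ is a TensorBoard summary log, not a checkpoint. A minimal sketch of the save/load cycle (the variable and directory names here are illustrative, not the repo's):

    import tensorflow as tf

    w = tf.Variable(tf.zeros([4, 4]), name='w')
    saver = tf.train.Saver()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # ... training steps would go here ...
        saver.save(sess, 'checkpoints/model.ckpt', global_step=100)

    with tf.Session() as sess:
        # For testing, restore the most recent checkpoint.
        saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))

The '[!] Load FAILED' warning presumably just means no checkpoint existed yet at that path, which is expected on a fresh training run.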
https://github.com/carpedm20/deep-rl-tensorflow/blob/master/environments/environment.py#L114
Why not return the cumulated_reward here?
Hi all, does anybody have a pre-trained model?
It takes a long time to train on my machine (about 2 weeks).
It would be great if anybody could offer a pre-trained model :).
When running a test with Python 3 (from the README), numpy's ravel complains about the data type:
python3 main.py --network_header_type=mlp --network_output_type=normal --observation_dims='[16]' --env_name=CorridorSmall-v5 --t_learn_start=0.1 --learning_rate_decay_step=0.1 --history_length=1 --n_action_repeat=1 --t_ep_end=10 --display=True --learning_rate=0.025 --learning_rate_minimum=0.0025
[2017-03-20 13:28:03,027] Making new env: CorridorSmall-v5
Traceback (most recent call last):
  File "main.py", line 168, in <module>
    tf.app.run()
  [...]
  File "[...]/deep-rl-tensorflow/environments/corridor.py", line 66, in __init__
    isd = (desc == 'S').ravel().astype('float64')
AttributeError: 'bool' object has no attribute 'ravel'
Is it because of Python 3? I saw some fixes for Python 3 in the git history already.
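The error suggests desc is a plain Python object here, so desc == 'S' evaluates to a single bool instead of an elementwise boolean array. A hedged fix, following what gym's own FrozenLake environment does, is to convert desc to a NumPy character array before comparing:

    import numpy as np

    desc = np.asarray(desc, dtype='c')  # bytes array; also works on Python 3
    isd = (desc == b'S').ravel().astype('float64')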
Hi,
I get the above error whenever I use the GPU for training.
The command I used:
python3 main.py --network_header_type=nature --env_name=Breakout-v0 --is_train=True --display=True --t_train_max=50
The crash comes when training reaches a certain point (about 10%). Full error report:
10%|██▍ | 49973/500000 [02:52<25:54, 289.55it/s]2018-05-28 23:08:09.915921: E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2018-05-28 23:08:09.915959: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2018-05-28 23:08:09.915981: F tensorflow/core/kernels/conv_ops.cc:667] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
Aborted (core dumped)
Is there any way to fix this? Thanks!
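CUDNN_STATUS_INTERNAL_ERROR is usually an environment problem (GPU memory exhaustion or a cuDNN/TensorFlow version mismatch) rather than a bug in this repo. One commonly reported workaround is to stop TensorFlow from grabbing all GPU memory up front; a sketch, assuming you can edit the place where the session is created:

    import tensorflow as tf

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # allocate GPU memory on demand
    sess = tf.Session(config=config)

If that doesn't help, check that the cuDNN version installed at runtime matches the one your TensorFlow build was compiled against.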
I think DDPG could be added; that algorithm performs better for continuous action spaces.
Looking forward to it. :)
How do I run CartPole-v0?
Hello,
I tried to reproduce the result (with n_action_repeat=1) on a computer with a GTX 1080, but the performance is not as good as shown in the figure. After 2.88M steps the average reward is 0.0174,
the average ep_reward is 3.1071, and the max ep_reward is 7.
Maybe I did something wrong in the settings or misread some information. Could you give me some suggestions? Thanks a lot!
Chih-Chieh
After reading your code carefully, I still cannot figure out how you define your action space. For example, how many actions do you define, and how is each action represented? Waiting for your answers.
Sincerely
Can't train the DQN! I have installed gym[all] and tensorflow 1.9.0 with python 3.6.8. Any ideas?
$ python main.py --network_header_type=nature --env_name=Breakout-v0
Traceback (most recent call last):
  File "main.py", line 10, in <module>
    from environments.environment import ToyEnvironment, AtariEnvironment
  File "/media/bigdata/Solid2/DQN/deep-rl-tensorflow-master/environments/environment.py", line 6, in <module>
    from .corridor import CorridorEnv
  File "/media/bigdata/Solid2/DQN/deep-rl-tensorflow-master/environments/corridor.py", line 131, in <module>
    timestep_limit=100,
  File "/home/bigdata/.conda/envs/tensorflow/lib/python3.6/site-packages/gym/envs/registration.py", line 153, in register
    return registry.register(id, **kwargs)
  File "/home/bigdata/.conda/envs/tensorflow/lib/python3.6/site-packages/gym/envs/registration.py", line 147, in register
    self.env_specs[id] = EnvSpec(id, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'timestep_limit'
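Newer gym releases removed the timestep_limit keyword from register() and expect max_episode_steps instead. A hedged fix for the register() call in environments/corridor.py (the id and entry_point shown are illustrative):

    from gym.envs.registration import register

    register(
        id='CorridorSmall-v5',
        entry_point='environments.corridor:CorridorEnv',
        max_episode_steps=100,  # was: timestep_limit=100
    )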
Hi
I downloaded the code and then tested it as described here.
However, I got the error below.
I think all requirements are installed (except opencv2), and the OpenAI gym installation was tested.
I would appreciate it if someone could find the cause and a solution.
Traceback (most recent call last):
  File "/DQN-tensorflow-master/main.py", line 69, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/DQN-tensorflow-master/main.py", line 64, in main
    agent.train()
  File "/DQN-tensorflow-master/dqn/agent.py", line 40, in train
    screen, reward, action, terminal = self.env.new_random_game()
  File "/DQN-tensorflow-master/dqn/environment.py", line 28, in new_random_game
    self.new_game(True)
  File "/DQN-tensorflow-master/dqn/environment.py", line 21, in new_game
    if self.lives == 0:
  File "/DQN-tensorflow-master/dqn/environment.py", line 52, in lives
    return self.env.ale.lives()
AttributeError: 'TimeLimit' object has no attribute 'ale'
Could you please tell me how you set the reward at each state? It seems that every F state receives a reward, so an agent might just keep staying on F states until the episode ends and automatically collect the maximum reward. I cannot reproduce the result of the dueling network's corridor game. Could you give me any hints?
@carpedm20 On my Mac, when I ran the example, this error occurred:
SystemError: new style getargs format but argument is not a tuple
I finally found the solution; see http://stackoverflow.com/questions/26964379/systemerror-new-style-getargs-format-but-argument-is-not-a-tuple-in-ros-camerac for a more detailed explanation.
In environments/environment.py, line 111, the last argument of imresize should be cast to a tuple, like this:
y_screen = imresize(y, tuple(self.observation_dims))
This problem is in utils.py, line 12. I am using python 2.7, tensorflow 1.4.0, and gym 0.7.
Training the model takes a lot of time, so it would be nice to have a trained model available to evaluate directly.
Getting this warning, which requires updating the agent.py code:
WARNING:tensorflow:From /../deep-rl-tensorflow/agents/agent.py:61 in train.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
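The warning names its own fix. Assuming the call site in agents/agent.py looks like the deprecated form, the change is one line:

    # deprecated, removed after 2017-03-02:
    # self.sess.run(tf.initialize_all_variables())
    self.sess.run(tf.global_variables_initializer())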
Hi,
It seems that you didn't include the plotting code. I don't know how you plot your training results; do you have any suggestions?
Traceback (most recent call last):
  File "main.py", line 168, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 163, in main
    agent.train(conf.t_train_max)
  File "/home/ddy/workspace/deep-rl-tensorflow/agents/agent.py", line 67, in train
    observation, reward, terminal = self.new_game()
  File "/home/ddy/workspace/deep-rl-tensorflow/environments/environment.py", line 87, in new_random_game
    self.lives = self.env.ale.lives()
AttributeError: 'TimeLimit' object has no attribute 'ale'
Hi all,
I wonder what frame-skip in the README means.
Seeing agents/statistic.py line 20:
self.writer = tf.summary.FileWriter('./logs/%s' % self.model_dir, self.sess.graph)
It tries to use self.model_dir to create the file, but it raises this error: "tensorflow.python.framework.errors_impl.NotFoundError: Failed to create a directory:"
How can I fix it? My Python version is 3.6 and my TensorFlow version is 1.12.
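One likely cause is that the deeply nested log directory does not exist yet, or that the model_dir string contains characters your filesystem rejects (Windows, for instance, disallows ':' in paths). A hedged workaround is to sanitize and create the directory before handing it to FileWriter:

    import os

    log_dir = './logs/%s' % self.model_dir
    log_dir = log_dir.replace(':', '_')   # strip characters Windows rejects
    os.makedirs(log_dir, exist_ok=True)   # create the full path up front
    self.writer = tf.summary.FileWriter(log_dir, self.sess.graph)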
It seems to me that in all of the agents you are clipping the gradient. This means the gradients are zero for large errors.
It might be because the paper "Human-level control through deep reinforcement learning" is misleading when it talks about clipping the loss.
What the authors actually do in the implementation is use abs(delta) where abs(delta) > 1 and delta^2 where abs(delta) < 1 (i.e., a Huber-style loss on the TD error delta).
This can be implemented like this:
import tensorflow as tf

delta_grad_clip = 1.0
# Y: target values; DQN_acted: Q-values of the actions actually taken
batch_delta = Y - DQN_acted
batch_delta_abs = tf.abs(batch_delta)
# quadratic part below the clip threshold, linear part above it
batch_delta_quadratic = tf.minimum(batch_delta_abs, delta_grad_clip)
batch_delta_linear = batch_delta_abs - batch_delta_quadratic
batch_loss = batch_delta_linear + batch_delta_quadratic**2
loss = tf.reduce_mean(batch_loss)
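For what it's worth, later TensorFlow 1.x releases also ship tf.losses.huber_loss, which implements the same quadratic-below-delta, linear-above-delta penalty directly.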
I got the following bug. Could you please help? Thank you!
$python main.py --network_header_type=nips --env_name=Breakout-v0 --use_gpu=False
Traceback (most recent call last):
  File "main.py", line 173, in <module>
    tf.app.run()
  File "/Documents/openai_gym/openai_gym/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "main.py", line 107, in main
    conf.data_format = 'NHWC'
  File "/Documents/openai_gym/openai_gym/lib/python3.6/site-packages/tensorflow/python/platform/flags.py", line 88, in __setattr__
    return self.__dict__['__wrapped'].__setattr__(name, value)
  File "/Documents/openai_gym/openai_gym/lib/python3.6/site-packages/absl/flags/_flagvalues.py", line 496, in __setattr__
    return self._set_unknown_flag(name, value)
  File "/Documents/openai_gym/openai_gym/lib/python3.6/site-packages/absl/flags/_flagvalues.py", line 374, in _set_unknown_flag
    raise _exceptions.UnrecognizedFlagError(name, value)
absl.flags._exceptions.UnrecognizedFlagError: Unknown command line flag 'data_format'
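Newer absl-based flag handling raises on assignment to a flag that was never defined, which is why the bare conf.data_format = 'NHWC' in main.py now fails. A hedged fix is to define the flag before main() runs:

    import tensorflow as tf

    flags = tf.app.flags
    # Define the flag so absl no longer treats the later assignment as unknown.
    flags.DEFINE_string('data_format', 'NHWC', 'data format: NHWC or NCHW')
    conf = flags.FLAGS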
SARSA does not calculate the loss the same way Q-Learning does; see line 78 of sarsa.py.
Hello.
I am running your code on the Atari game Breakout-v0.
The settings are plain DQN (nips), DQN (nature), DDQN, dueling DQN, and dueling DDQN.
Each process has now run for almost 6M (6,000,000) frames, but the output of every process is very low:
-> avg_p ~ [0.3 ~ 0.5]
-> avg_ep_r ~ [0.25 ~ 0.45]
In the paper (Mnih et al., 2013), [avg_p ~ at least 2.5] and [avg_ep_r ~ at least 50] on Breakout after about 6M frames.
I didn't adjust any of the DQN code.
Has anyone seen a Breakout example run well? Did I do something wrong?
Hi
It's really good code for learning reinforcement learning.
I have 2 questions about network.py.
Looking forward to your further response.
Got an InvalidArgumentError after 26 minutes of training. I upgraded to the most recent TensorFlow as suggested and did $ pip install -U 'gym[all]' tqdm scipy. I ran this on a Titan X and Ubuntu 16.10.
$ time python main.py --network_header_type=nips --env_name=Breakout-v0 --use_gpu=True --display=True
[2017-01-26 23:58:40,289] DEPRECATION WARNING: env.spec.timestep_limit has been deprecated. Replace any calls to `register(timestep_limit=200)` with `register(tags={'wrapper_config.TimeLimit.max_episode_steps': 200)}`, . This change was made 12/28/2016 and is included in gym version 0.7.0. If you are getting many of these warnings, you may need to update universe past version 0.21.1
{'agent_type': 'DQN',
'batch_size': 32,
'beta': 0.01,
'data_format': 'NCHW',
'decay': 0.99,
'discount_r': 0.99,
'display': True,
'double_q': False,
'env_name': 'Breakout-v0',
'ep_end': 0.01,
'ep_start': 1.0,
'gamma': 0.99,
'gpu_fraction': '1/1',
'history_length': 4,
'is_train': True,
'learning_rate': 0.00025,
'learning_rate_decay': 0.96,
'learning_rate_decay_step': 50000,
'learning_rate_minimum': 0.00025,
'log_level': 'INFO',
'max_delta': None,
'max_grad_norm': None,
'max_r': 1,
'max_random_start': 30,
'memory_size': 1000000,
'min_delta': None,
'min_r': -1,
'momentum': 0.0,
'n_action_repeat': 4,
'network_header_type': 'nips',
'network_output_type': 'normal',
'observation_dims': [80, 80],
'random_seed': 123,
'random_start': True,
'scale': 10000,
't_ep_end': 1000000,
't_learn_start': 50000,
't_target_q_update_freq': 10000,
't_test': 10000,
't_train_freq': 4,
't_train_max': 50000000,
'tag': '',
'unrolled_lstm': False,
'use_cumulated_reward': False,
'use_gpu': True}
[*] GPU : 1.0000
[2017-01-26 23:58:40,330] Making new env: Breakout-v0
[2017-01-26 23:58:40,352] Using 6 actions : NOOP, FIRE, RIGHT, LEFT, RIGHTFIRE, LEFTFIRE
INFO:tensorflow:Summary name episode/max reward is illegal; using episode/max_reward instead.
[2017-01-26 23:58:40,938] Summary name episode/max reward is illegal; using episode/max_reward instead.
INFO:tensorflow:Summary name episode/min reward is illegal; using episode/min_reward instead.
[2017-01-26 23:58:40,940] Summary name episode/min reward is illegal; using episode/min_reward instead.
INFO:tensorflow:Summary name episode/avg reward is illegal; using episode/avg_reward instead.
[2017-01-26 23:58:40,941] Summary name episode/avg reward is illegal; using episode/avg_reward instead.
INFO:tensorflow:Summary name episode/num of game is illegal; using episode/num_of_game instead.
[2017-01-26 23:58:40,943] Summary name episode/num of game is illegal; using episode/num_of_game instead.
WARNING:tensorflow:From /media/ch3njus/Seagate4TB/research/ab/deep-rl-tensorflow/agents/agent.py:61 in train.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
[2017-01-26 23:58:41,273] From /media/ch3njus/Seagate4TB/research/ab/deep-rl-tensorflow/agents/agent.py:61 in train.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
[!] Load FAILED: checkpoints/Breakout-v0/env_name=Breakout-v0/agent_type=DQN/batch_size=32/beta=0.01/data_format=NCHW/decay=0.99/discount_r=0.99/double_q=False/ep_end=0.01/ep_start=1.0/gamma=0.99/history_length=4/learning_rate=0.00025/learning_rate_decay=0.96/learning_rate_decay_step=50000/learning_rate_minimum=0.00025/max_delta=None/max_grad_norm=None/max_r=1/min_delta=None/min_r=-1/momentum=0.0/n_action_repeat=4/network_header_type=nips/network_output_type=normal/observation_dims=80,80/random_start=True/t_ep_end=1000000/t_learn_start=50000/t_target_q_update_freq=10000/t_test=10000/t_train_freq=4/t_train_max=50000000/unrolled_lstm=False/use_cumulated_reward=False/
0%| | 0/50000000 [00:00<?, ?it/s]/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:2889: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
0%| | 50002/50000000 [25:59<436:00:27, 31.82it/s]E tensorflow/core/common_runtime/executor.cc:390] Executor failed to create kernel. Invalid argument: CPU BiasOp only supports NHWC.
[[Node: target_network/l1_conv/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](target_network/l1_conv/Conv2D, target_network/l1_conv/b/read)]]
Traceback (most recent call last):
File "main.py", line 168, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "main.py", line 163, in main
agent.train(conf.t_train_max)
File "/media/ch3njus/Seagate4TB/research/ab/deep-rl-tensorflow/agents/agent.py", line 82, in train
q, loss, is_update = self.observe(observation, reward, action, terminal)
File "/media/ch3njus/Seagate4TB/research/ab/deep-rl-tensorflow/agents/deep_q.py", line 61, in observe
result = self.q_learning_minibatch()
File "/media/ch3njus/Seagate4TB/research/ab/deep-rl-tensorflow/agents/deep_q.py", line 84, in q_learning_minibatch
max_q_t_plus_1 = self.target_network.calc_max_outputs(s_t_plus_1)
File "/media/ch3njus/Seagate4TB/research/ab/deep-rl-tensorflow/networks/network.py", line 87, in calc_max_outputs
return self.max_outputs.eval({self.inputs: observation}, session=self.sess)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 575, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3633, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: CPU BiasOp only supports NHWC.
[[Node: target_network/l1_conv/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](target_network/l1_conv/Conv2D, target_network/l1_conv/b/read)]]
Caused by op u'target_network/l1_conv/BiasAdd', defined at:
File "main.py", line 168, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "main.py", line 140, in main
name='target_network', trainable=False)
File "/media/ch3njus/Seagate4TB/research/ab/deep-rl-tensorflow/networks/cnn.py", line 55, in __init__
hidden_activation_fn, data_format, name='l1_conv')
File "/media/ch3njus/Seagate4TB/research/ab/deep-rl-tensorflow/networks/layers.py", line 30, in conv2d
out = tf.nn.bias_add(conv, b, data_format)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 1172, in bias_add
return gen_nn_ops._bias_add(value, bias, data_format=data_format, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 281, in _bias_add
data_format=data_format, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): CPU BiasOp only supports NHWC.
[[Node: target_network/l1_conv/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](target_network/l1_conv/Conv2D, target_network/l1_conv/b/read)]]
real 26m4.536s
user 10m27.848s
sys 7m41.136s
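The failure says a bias op was placed on the CPU with NCHW, which TensorFlow's CPU kernels do not support. One workaround (an assumption, not a verified fix) is to force NHWC in main.py even when use_gpu=True, since NHWC is accepted by both CPU and GPU kernels, at some GPU speed cost:

    # in main.py, where data_format is derived from use_gpu:
    conf.data_format = 'NHWC'  # was 'NCHW' when use_gpu=True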
I'm training the DQN for Enduro-v0. It performs well, but how can I see the average reward and loss on TensorBoard?
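The statistics are written as TensorFlow summaries under ./logs/<model_dir>, so pointing TensorBoard at that directory should show the reward and loss curves:

$ tensorboard --logdir=./logs

Then open http://localhost:6006 in a browser.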
Traceback (most recent call last):
  File "main.py", line 172, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "main.py", line 169, in main
    agent.play(conf.ep_end)
  File "../deep-rl-tensorflow/agents/agent.py", line 101, in play
    self.env.env.monitor.start(gym_dir)
  File "/usr/local/lib/python2.7/dist-packages/gym/core.py", line 92, in monitor
    raise error.Error("env.monitor has been deprecated as of 12/23/2016. Remove your call to `env.monitor.start(directory)` and instead wrap your env with `env = gym.wrappers.Monitor(env, directory)` to record data.")
gym.error.Error: env.monitor has been deprecated as of 12/23/2016. Remove your call to `env.monitor.start(directory)` and instead wrap your env with `env = gym.wrappers.Monitor(env, directory)` to record data.
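The error message itself gives the migration path: wrap the env instead of calling env.monitor.start. A sketch of the change in agents/agent.py around line 101 (attribute names taken from the traceback):

    import gym

    # old (removed from gym as of 12/23/2016):
    # self.env.env.monitor.start(gym_dir)
    self.env.env = gym.wrappers.Monitor(self.env.env, gym_dir)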
Hello, I'm very confused about the iteration speed.
For the first few minutes after the program starts, the it/s value is quite large and it runs very fast, which is what I'd love to see. But as training proceeds, the it/s value keeps decreasing: after about 30 minutes it drops from roughly 10000 to 900, and it keeps going down.
Is this a problem with the GPU setup or with tqdm?
The graphics cards I use are two Nvidia K40s.
I'm getting a MemoryError on Ubuntu with 2 GB RAM + 4 GB swap:
Traceback (most recent call last):
  File "main.py", line 168, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "main.py", line 160, in main
    agent = TrainAgent(sess, pred_network, env, stat, conf, target_network=target_network)
  File "/home/strky/deep-rl-tensorflow/agents/deep_q.py", line 13, in __init__
    super(DeepQ, self).__init__(sess, pred_network, env, stat, conf, target_network=target_network)
  File "/home/strky/deep-rl-tensorflow/agents/agent.py", line 53, in __init__
    conf.batch_size, conf.history_length, conf.memory_size, conf.observation_dims)
  File "/home/strky/deep-rl-tensorflow/agents/experience.py", line 15, in __init__
    self.observations = np.empty([self.memory_size] + observation_dims, dtype=np.uint8)
MemoryError
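The replay buffer alone needs memory_size x 80 x 80 bytes: np.empty([1000000, 80, 80], dtype=np.uint8) is about 6.4 GB, which cannot fit in 2 GB of RAM plus 4 GB of swap. Shrinking the buffer should get past the MemoryError (the flag name is taken from the config dump above):

$ python main.py --network_header_type=nips --env_name=Breakout-v0 --memory_size=100000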
diff --git a/networks/layers.py b/networks/layers.py
index 86d6052..5c8e329 100644
--- a/networks/layers.py
+++ b/networks/layers.py
@@ -7,7 +7,7 @@ def conv2d(x,
            kernel_size,
            stride,
            weights_initializer=tf.contrib.layers.xavier_initializer(),
-           biases_initializer=tf.zeros_initializer,
+           biases_initializer=tf.zeros_initializer(),
            activation_fn=tf.nn.relu,
            data_format='NHWC',
            padding='VALID',
diff --git a/networks/mlp.py b/networks/mlp.py
index 9c9e58f..8f595f8 100644
--- a/networks/mlp.py
+++ b/networks/mlp.py
@@ -12,7 +12,7 @@ class MLPSmall(Network):
                trainable=True,
                batch_size=None,
                weights_initializer=initializers.xavier_initializer(),
-               biases_initializer=tf.zeros_initializer,
+               biases_initializer=tf.zeros_initializer(),
                hidden_activation_fn=tf.nn.relu,
                output_activation_fn=None,
                hidden_sizes=[50, 50, 50],
Thanks for sharing this repo! I want to try changing the display rendering, and I noticed that the render() function is not explicitly implemented in either AtariEnvironment or ToyEnvironment. Do you mind letting me know where to find the rendering options? (e.g., did you reference gym's atari_env.py in your code, or not?)
Thanks in advance!
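As far as I can tell, rendering is delegated to gym itself: the wrapped env's render() (implemented in gym's atari_env.py) is what draws the screen, so the display options live in gym rather than in this repo. A minimal gym-level sketch:

    import gym

    env = gym.make('Breakout-v0')
    env.reset()
    env.render()                           # opens gym's human-viewable window
    frame = env.render(mode='rgb_array')   # or grab the raw RGB frame instead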