deeprl's People

Contributors

nadavbh12, shangtongzhang, wassname

deeprl's Issues

some questions on the actor_critic.py file

Hi Shangtong,

Sorry to trouble you again. Recently I have been reading your source code, especially the A3C part, and I am a bit confused by the following section.

GAE = torch.FloatTensor([[0]])
for i in reversed(range(len(pending))):
    prob, log_prob, value, action, reward = pending[i]
    if i == len(pending) - 1:
        delta = reward + config.discount * R - value.data
    else:
        delta = reward + pending[i + 1][2].data - value.data
    
    GAE = config.discount * config.gae_tau * GAE + delta
    loss += -log_prob.gather(1, Variable(torch.LongTensor([[action]]))) * Variable(GAE)
    loss += config.entropy_weight * torch.sum(torch.mul(prob, log_prob))
    R = reward + config.discount * R
    loss += 0.5 * (Variable(R) - value).pow(2)
  1. Which paper did you refer to when implementing A3C? I want to know the detailed equation for calculating GAE. I don't know what gae_tau means; there is no such symbol in the paper I have read, so I am a bit confused (see the GAE sketch below).

  2. I can't understand the way the loss is calculated. Could you give some comments on this part?

Since I am a beginner in reinforcement learning, I will probably encounter more problems as I finish reading your code. It would be highly appreciated if you could help me or give me some suggestions. Thanks.
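For reference, gae_tau corresponds to the lambda of generalized advantage estimation (Schulman et al., 2016, "High-Dimensional Continuous Control Using Generalized Advantage Estimation"): delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) and A_t = delta_t + gamma * lambda * A_{t+1}. A minimal standalone sketch with illustrative names (rewards, values, gamma, gae_tau), not the repo's exact code:

def compute_gae(rewards, values, gamma, gae_tau):
    # values has length len(rewards) + 1; the last entry is the bootstrap value V(s_T)
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * gae_tau * gae
        advantages[t] = gae
    return advantages

print(compute_gae([1.0, 0.0, 1.0], [0.5, 0.4, 0.3, 0.0], 0.99, 0.95))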

Q learning: epsilon-greedy during test phase

Hey,

Notice that, as opposed to the original implementation, you report the average reward based on the epsilon-greedy policy; however, in Q-learning epsilon is used for exploration during training and should be set to a small constant during the test phase (5% if I recall correctly).
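For reference, a minimal, generic sketch of epsilon-greedy action selection with a separate small evaluation epsilon; the names below (epsilon_greedy, train_epsilon, eval_epsilon) are illustrative and not taken from the repo:

import random

def epsilon_greedy(q_values, epsilon):
    # with probability epsilon pick a random action, otherwise the greedy one
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

train_epsilon = 0.1   # (annealed) exploration value used while learning
eval_epsilon = 0.05   # small fixed constant used when reporting test scores

action = epsilon_greedy([0.2, 0.7, 0.1], eval_epsilon)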

What's the difference between the feature and the pixel versions?

Excuse me, I'm a beginner in reinforcement learning and was recently interested in your QUOTA algorithm. While running your code, I saw "feature" and "pixel" and I'd like to know the difference. Also, I didn't find the code for QUOTA; please give me some advice when you are free.

issue in async_agent.py

Hi Shangtong:

I was wondering whether this should be if i == 0 rather than if i == config.num_workers, since in the procs definition the first process is evaluate.

DeepRL/agent/async_agent.py

Lines 103 to 106 in 8224a41

if i == config.num_workers:
    target = evaluate
else:
    target = train

Did you speed up the env?

In the Atari game env, I found that every episode is over very quickly, and I can't add env.render() to show the game.
(Sorry to leave another message. I found that in Atari games every episode runs very fast. For example, in Pong, even though an episode goes up to 21 points, it still seems to finish in about a second. Did you speed up the game?)
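For reference, episodes usually look fast simply because training runs the emulator as quickly as possible without rendering. A minimal sketch of rendering a raw Gym Atari env at roughly real-time speed outside the training loop, assuming the older Gym API and the Atari extras installed (illustrative, not part of the repo):

import time
import gym

env = gym.make('PongNoFrameskip-v4')
state = env.reset()
done = False
while not done:
    env.render()
    state, reward, done, info = env.step(env.action_space.sample())
    time.sleep(1 / 60)  # slow playback down to roughly 60 frames per second
env.close()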

continuous a3c

Hey,
Nice code! You mentioned that you sometimes get NaNs when training the continuous A3C. Maybe you can try adding some weight decay, e.g. 1e-4; this at least helps with my implementation, so maybe it's worth a try :)
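For reference, a minimal sketch of adding weight decay through the optimizer; the network below is a placeholder for illustration, not the repo's model:

import torch
import torch.nn as nn

network = nn.Linear(4, 2)  # placeholder network
optimizer = torch.optim.Adam(network.parameters(), lr=1e-4, weight_decay=1e-4)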

DQN performance on Breakout

Hi. Can I ask what the final score of your DQN implementation is and how long it takes to converge?

Thanks!

P3O_continuous stuck in a loop.

The P3O code seems to be stuck in a loop, without making any progress.

I think it is the while True loop on line 97 of the async_agent.py file. I can't seem to debug it.

Could you please help me with it? Thanks!

simple question about the episode return curve in README

Your code is very professional and very helpful for me. Thank you very much! I have a simple question: is the episode return curve of BreakoutNoFrameskip-v4 in the README the return from the evaluation process rather than the return during training?

Does DQN on Pong work?

Hi,

I ran your code with DQN on Pong, but it doesn't give returns that look anywhere near learned; the returns are almost always around -20. This even happens with code directly cloned from your repo and with the same version of PyTorch, 0.4.0, though I turned on async actor and async memory. Do you know what might be going wrong?

Also, can you attach your pong training curve as a reference?

Thanks!

For example, this is printed on the terminal, and even after 1486000 steps the returns are still very bad: 2019-01-11 18:39:54,986 - root - INFO: total steps 1486000, returns -20.00/-20.00/-20.00/-20.00 (mean/median/min/max), 259.28 steps/s

Two questions

Hi Shangtong,

Thanks for making this repository public. I was reading it and have two questions; I would really appreciate it if you could give more information.

In the DDPG implementation, you mentioned that the repo you refer to is problematic. Can you please be more specific about why that implementation is wrong? As far as I can see, you have one extra line in the actor update, which is

var_actions = Variable(actions.data, requires_grad=True)

Is this the reason?

Also, in the Normalizer class

class Normalizer:

there are variables "m", "v", and "n"; can you please comment on what they stand for? Thanks.
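For reference, a hedged sketch of a typical running-statistics normalizer; this is an assumption about what n/m/v usually denote (sample count, running mean, running variance accumulator, updated with Welford's algorithm), not a copy of the repo's Normalizer:

class RunningNormalizer:
    def __init__(self):
        self.n = 0      # number of samples seen
        self.m = 0.0    # running mean
        self.v = 0.0    # running (unnormalized) variance accumulator

    def update(self, x):
        self.n += 1
        old_m = self.m
        self.m += (x - self.m) / self.n
        self.v += (x - old_m) * (x - self.m)

    def normalize(self, x):
        std = (self.v / max(self.n, 1)) ** 0.5
        return (x - self.m) / (std + 1e-8)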

Just a quick question

q_next = (quantiles_next * self.quantile_weight).sum(-1)
_, a_next = torch.max(q_next, dim=1)

Just a few quick questions:

  1. The '* self.quantile_weight' component seems not really relevant here, because the multiplication does not change the relative order of the action values; whether it is multiplied or not, the action values stay in the same order (see the sketch below).

  2. Since quantiles_next is softmaxed along dim=-1 (right?), all the actions in this line would end up with the same value. Just wondering if this is correct.

Thanks for your codes.
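For reference, in QR-DQN the action value is the mean of the quantile estimates, so self.quantile_weight is presumably 1 / num_quantiles; multiplying by a positive constant indeed does not change the argmax. A small illustrative sketch (shapes and names are assumptions, not the repo's code):

import torch

num_quantiles = 5
quantiles_next = torch.randn(1, 3, num_quantiles)  # (batch, actions, quantiles)
q_next = quantiles_next.mean(-1)  # same ordering as summing with weight 1/num_quantiles
_, a_next = torch.max(q_next, dim=1)
print(a_next)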

No Monitor Files

When I run examples.py, an error is raised:

Traceback (most recent call last):
  File "examples.py", line 445, in <module>
    plot()
  File "examples.py", line 391, in plot
    data = plotter.load_results(names)
  File "plot.py", line 47, in load_results
    ts = load_monitor_log(dir)
  File "bench.py", line 107, in load_monitor_log
    raise LoadMonitorResultsError("no monitor files of the form *%s found in %s" % (Monitor.EXT, dir))
deep_rl.component.bench.LoadMonitorResultsError: no monitor files of the form *monitor.csv found in ./log

I wonder whether there is something wrong with the generation of the monitor files.

Testing for DQN

I am not able to find the code for testing the DQN agent. Is it implemented, or do we have to implement it ourselves?

set_one_thread() in example.py

Hello Shangtong,

Sorry to interrupt you, but I am new to PyTorch. A quick question: what is the purpose of set_one_thread() in example.py? Is it simply equivalent to setting num_workers = 1? Am I understanding your code correctly that every time I want to run an algorithm with num_workers > 1, I should comment out set_one_thread()?
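For reference, a hedged guess at what such a helper typically does (not a copy of the repo's function): it limits intra-op CPU threading so each process uses a single thread, which is independent of num_workers:

import os
import torch

def set_one_thread():
    # cap the number of CPU threads used by OpenMP/MKL and PyTorch
    os.environ['OMP_NUM_THREADS'] = '1'
    os.environ['MKL_NUM_THREADS'] = '1'
    torch.set_num_threads(1)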

Error in A2C_agent.py?

Thanks a lot for writing this!

In line 51 of A2C_agent.py we have next_value = rollout[i + 1][2]; however, in line 42 only the second value is populated for the last row, and in general values are in the second, not third, position in each rollout row. Should it be next_value = rollout[i + 1][1]?
That value is never actually used unless self.config.use_gae=True, and use_gae is False in examples.py; could this be why it has gone unnoticed, or am I missing something?

Docker script running into an error

After running docker_build.sh, it fails at the step below, and it fails at the same point on rerun.

Step 14/33 : COPY ./mjkey.txt /root/.mujoco/mjkey.txt
COPY failed: stat /var/lib/docker/tmp/docker-builder373981879/mjkey.txt: no such file or directory

Option-Critic doesn't seem to converge

Hi,
I used your implementation of the option-critic (OC) framework with the CartPole environment and it doesn't seem to converge.

One question regarding your implementation in the iteration() method of the OptionCriticAgent class:
Why do you calculate the returns only for the current option (prev_option)? What if the option changed during the rollout? Shouldn't the option parameters be updated right after the option ended?

Best regards

Simple Question

It seems like the code gets slower epoch by epoch (after the exploration steps, which are 50000). Is there any way I can fix this?

Does it allow saving and loading optimizer information in A2CAgent (pixel)?

I tried to save agent.network as a model with torch.save() and succeeded, but without the optimizer information, which I need when I resume training. I then tried to save with torch.save({'state_dict': agent.network.state_dict, 'optimizer': agent.optimizer}, path), but when I load this kind of model the code runs into an error, meaning model.load_state_dict() cannot be used. The optimization method is RMSprop. Can you help solve this problem? Thank you.
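For reference, a minimal sketch of checkpointing both network and optimizer state with state_dict() (note the parentheses); the network, optimizer, and path below are placeholders, not the repo's objects:

import torch
import torch.nn as nn

network = nn.Linear(4, 2)  # placeholder network
optimizer = torch.optim.RMSprop(network.parameters(), lr=1e-3)
path = 'checkpoint.pt'

# save: serialize the state dicts, not the live objects
torch.save({'network': network.state_dict(),
            'optimizer': optimizer.state_dict()}, path)

# load: restore both state dicts before resuming training
checkpoint = torch.load(path)
network.load_state_dict(checkpoint['network'])
optimizer.load_state_dict(checkpoint['optimizer'])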

Log dir is empty.

Hi, I just ran your code and I can see the output in the terminal, but the log dir is empty. I don't know how to output the log info into a log file. Also, how can I use TensorBoard? Looking forward to your reply. Thanks.

name 'random_seed' is not defined

Exception has occurred: NameError
name 'random_seed' is not defined
  File "/home/nudt302/myPythonDemo/DeepRL/deep_rl/component/envs.py", line 29, in _thunk
    random_seed(seed)
  File "/home/nudt302/myPythonDemo/DeepRL/deep_rl/component/envs.py", line 128, in <listcomp>
    self.envs = [fn() for fn in env_fns]
  File "/home/nudt302/myPythonDemo/DeepRL/deep_rl/component/envs.py", line 128, in __init__
    self.envs = [fn() for fn in env_fns]
  File "/home/nudt302/myPythonDemo/DeepRL/deep_rl/component/envs.py", line 168, in __init__
    self.env = Wrapper(envs)
  File "/home/nudt302/myPythonDemo/DeepRL/examples.py", line 240, in a2c_continuous
    config.eval_env = Task(config.game)
  File "/home/nudt302/myPythonDemo/DeepRL/examples.py", line 470, in <module>
    a2c_continuous(game=game)

Does anyone know about this issue? Please help me. Thanks.

a3c worker exited unexpectedly

I can run your DQN code under pytorch v0.2, but A3C doesn't seem to run correctly. It always logs
worker xxx exited unexpectedly
worker xxx restarted
where xxx is a different process number each time.

But when I switch to pytorch v0.1.12, A3C seems partially correct. There are still "worker xxx exited unexpectedly / worker xxx restarted" logs from some processes, but far fewer, and I can see the log print things like "total step" and "averaged return". Is it a known bug? Do you know why this happens?

Issue of running async_cart_pole()

When I try to run async_cart_pole() in main.py, I received the following error:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/xuxie/pytorch/Py3Virtual/DeepRL/agent/async_agent.py", line 23, in train
    worker = config.worker(config, learning_network, extra)
  File "/home/xuxie/pytorch/Py3Virtual/DeepRL/async_worker/one_step_q.py", line 16, in __init__
    self.worker_network = config.network_fn()
  File "main.py", line 30, in <lambda>
    config.network_fn = lambda: FCNet([4, 50, 200, 2])
  File "/home/xuxie/pytorch/Py3Virtual/DeepRL/network/shallow_network.py", line 17, in __init__
    BasicNet.__init__(self, optimizer_fn, gpu)
  File "/home/xuxie/pytorch/Py3Virtual/DeepRL/network/base_network.py", line 21, in __init__
    self.cuda()
  File "/home/xuxie/pytorch/Py3Virtual/lib/python3.6/site-packages/torch/nn/modules/module.py", line 147, in cuda
    return self._apply(lambda t: t.cuda(device_id))
  File "/home/xuxie/pytorch/Py3Virtual/lib/python3.6/site-packages/torch/nn/modules/module.py", line 118, in _apply
    module._apply(fn)
  File "/home/xuxie/pytorch/Py3Virtual/lib/python3.6/site-packages/torch/nn/modules/module.py", line 124, in _apply
    param.data = fn(param.data)
  File "/home/xuxie/pytorch/Py3Virtual/lib/python3.6/site-packages/torch/nn/modules/module.py", line 147, in <lambda>
    return self._apply(lambda t: t.cuda(device_id))
  File "/home/xuxie/pytorch/Py3Virtual/lib/python3.6/site-packages/torch/_utils.py", line 66, in cuda
    return new_type(self.size()).copy_(self, async)
  File "/home/xuxie/pytorch/Py3Virtual/lib/python3.6/site-packages/torch/cuda/__init__.py", line 266, in _lazy_new
    _lazy_init()
  File "/home/xuxie/pytorch/Py3Virtual/lib/python3.6/site-packages/torch/cuda/__init__.py", line 83, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

I also tried adding the following lines in agent/async_agent.py:

try:
    mp.set_start_method('spawn')
except RuntimeError:
    pass

However, I received the following error:
AttributeError: Can't pickle local object 'async_cart_pole..'
But both the Train() and Evaluate() methods are at the top level of that module, so they shouldn't be unpicklable.

BTW, I use the following command to run the script (inside a Python virtual environment):
python3.6 main.py

Do you know how to solve the above issue? Or is there any proper way to run the async methods?

Thx!
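For reference, a hedged illustration of why 'spawn' then fails: with the spawn start method every object handed to a child process must be picklable, and lambdas such as config.network_fn = lambda: FCNet([4, 50, 200, 2]) are not. One common workaround (a sketch under these assumptions, not the repo's design) is to use a module-level factory function instead of a lambda and build the network inside the child process:

import torch.nn as nn
import torch.multiprocessing as mp

def make_network():
    # module-level factory: picklable by reference, unlike a lambda
    return nn.Sequential(nn.Linear(4, 50), nn.ReLU(), nn.Linear(50, 2))

def worker(network_fn):
    net = network_fn()  # the network is constructed inside the child process
    print(net)

if __name__ == '__main__':
    mp.set_start_method('spawn', force=True)
    p = mp.Process(target=worker, args=(make_network,))
    p.start()
    p.join()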

Error when running the code

When I run the code for dqn_atari with python examples.py, I get the following error:

  File "examples.py", line 423, in <module>
    dqn_cart_pole()
  File "examples.py", line 15, in dqn_cart_pole
    config.evaluation_env = config.task_fn()
  File "examples.py", line 14, in <lambda>
    config.task_fn = lambda: ClassicalControl(game, max_steps=200)
  File "/home/daksh/DeepRL/deep_rl/component/task.py", line 35, in __init__
    BaseTask.__init__(self)
AttributeError: class BaseTask has no attribute '__init__'

I tried adding an __init__ function to that class, and then it gives a segfault.
I have all requirements satisfied as given in requirements.txt.

Dueling DQN: The expanded size of the tensor (2) must match the existing size (10) at non-singleton dimension 1

Ubuntu 16.04
pytorch 0.3.1
Running your Dueling DQN, it reports:

Traceback (most recent call last):
  File "/home/opencv/PycharmProjects/RL-Pytorch/main.py", line 422, in <module>
    dqn_cart_pole()
  File "/home/opencv/PycharmProjects/RL-Pytorch/main.py", line 29, in dqn_cart_pole
    run_episodes(DQNAgent(config))
  File "/home/opencv/PycharmProjects/RL-Pytorch/utils/misc.py", line 21, in run_episodes
    reward, step = agent.episode()
  File "/home/opencv/PycharmProjects/RL-Pytorch/agent/DQN_agent.py", line 59, in episode
    q_next = self.target_network.predict(next_states, False).detach()  # target network computes Q'
  File "/home/opencv/PycharmProjects/RL-Pytorch/network/base_network.py", line 91, in predict
    q = value.expand_as(advantange) + (advantange - advantange.mean(1).expand_as(advantange))
  File "/anaconda3/envs/gymlab/lib/python3.5/site-packages/torch/autograd/variable.py", line 433, in expand_as
    return self.expand(tensor.size())
RuntimeError: The expanded size of the tensor (2) must match the existing size (10) at non-singleton dimension 1. 

But I solved it. In base_network.py, line 91, change

q = value.expand_as(advantange) + (advantange - advantange.mean(1).expand_as(advantange))

to

q = value.expand_as(advantange) + (advantange - advantange.mean(1, keepdim=True).expand_as(advantange))

and then it works well.
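For reference, in newer PyTorch versions mean(dim) drops the reduced dimension by default, which is why keepdim=True is needed before expand_as. A minimal sketch of the dueling aggregation with illustrative shapes (not the repo's exact code):

import torch

batch, num_actions = 10, 2
value = torch.randn(batch, 1)
advantage = torch.randn(batch, num_actions)

# advantage.mean(1) would have shape (batch,); keepdim=True keeps (batch, 1)
# so it can be expanded back to (batch, num_actions)
q = value.expand_as(advantage) + (advantage - advantage.mean(1, keepdim=True).expand_as(advantage))
print(q.shape)  # torch.Size([10, 2])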

Segmentation fault when importing on a headless server

I just noticed that it segfaults when used on a headless server. I tried it on an EC2 Deep Learning AMI image with Ubuntu 16.04, Python 3.6, and all requirements the same as in requirements.txt. Meanwhile, it worked fine on my desktop. I'm guessing it's due to the lack of a display; has anyone else experienced this?

I'm logging this as an issue so others know the cause of their segfault may be use on a headless server.

P.S. I like how you've packaged DeepRL up.

Issues with upgrade to PyTorch v0.2

If you find that some implementation consistently diverges, try running it with v0.1.12; v0.2 may have potential issues.

  • A3C with Atari games will throw a segmentation fault in v0.2.0_4. You can run A3C with Atari games in v0.1.12, as the A3C implementation is backward compatible.

Why "End of Asynchronous Methods"?

Hi,

You said "asynchronous methods are getting deprecated nowadays". Why are they deprecated now? Could you recommend some papers supporting this conclusion? Thanks.

Memory leak?

Hello,

I ran python examples.py using game = 'Breakout' and dqn_pixel_atari_game(game), and after about 60K steps I ran out of memory. I have 16 GB of RAM (and a 6 GB NVIDIA 1060), so I'm surprised this happened. Is there an explanation for this? I'm using the PyTorch 1.0 preview.

Running examples.py

Hey ShangtongZhang,

Thanks for putting this library up! I am having a bit of trouble running examples.py with CUDA (the CPU version is working fine):

(python35) [ygx@benson DeepRL]$ python examples.py 
Traceback (most recent call last):
  File "examples.py", line 560, in <module>
    dqn_pixel_atari(game)
  File "examples.py", line 66, in dqn_pixel_atari
    run_steps(DQNAgent(config))
  File "/home/ygx/libraries/DeepRL/deep_rl/agent/DQN_agent.py", line 54, in __init__
    self.actor.set_network(self.network)
  File "/home/ygx/libraries/DeepRL/deep_rl/agent/BaseAgent.py", line 129, in set_network
    self.__pipe.send([self.NETWORK, net])
  File "/home/ygx/libraries/anaconda3/envs/python35/lib/python3.5/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/home/ygx/libraries/anaconda3/envs/python35/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/ygx/libraries/anaconda3/envs/python35/lib/python3.5/site-packages/torch/multiprocessing/reductions.py", line 179, in reduce_storage
    raise RuntimeError("Cannot pickle CUDA storage; try pickling a CUDA tensor instead")
RuntimeError: Cannot pickle CUDA storage; try pickling a CUDA tensor instead

I was running the DQN on Breakout. Here are the lines I uncommented in examples.py:

if __name__ == '__main__':                                                      
    mkdir('data/video')                                                         
    mkdir('dataset')                                                            
    mkdir('log')                                                                
    set_one_thread()                                                            
    select_device(1)                                                            
    # select_device(0)                                                                                                    
                                    
    game = 'Breakout'                                                           
    dqn_pixel_atari(game)                                                       
    # quantile_regression_dqn_pixel_atari(game)                                 
    #categorical_dqn_pixel_atari(game)                                          
    # a2c_pixel_atari(game)                                                     
    # n_step_dqn_pixel_atari(game)                                              
    #option_ciritc_pixel_atari(game)                                            
    # ppo_pixel_atari(game)                                                     
    # dqn_ram_atari(game)                                                       
    #ddpg_pixel()                                                               
                                                                                 
    # action_conditional_video_prediction()                                                                                                                      
    # plot()

I am using a Linux machine with a couple of V100s, Python 3.5, and PyTorch 0.4.1.post2. I also tried Python 3.6 and PyTorch 1.0rc1.

Adding TRPO

I know TRPO might be going out of fashion because of PPO, but it might be worth implementing.

can't run last experiments in examples.py

Hi
Thanks for sharing this. I installed the requirements and am running the models on CPU. I could run the first two experiments in examples.py, but I can't run the last one:

    game = 'BreakoutNoFrameskip-v4'
    dqn_pixel(game=game)
    quantile_regression_dqn_pixel(game=game)
    categorical_dqn_pixel(game=game)
    a2c_pixel(game=game)
    n_step_dqn_pixel(game=game)
    option_critic_pixel(game=game)
    ppo_pixel(game=game)

It just freezes after this message: 2019-05-20 13:19:20,818 - root - INFO: steps 0, 38479853.21 steps/s

More specifically, it freezes in the step function in BaseAgent.py when running return self.__pipe.recv(). Can you please have a look at this?

quantile_regression_dqn_cart_pole()

Hi ShangtongZhang,

I am running your examples.py with the following lines in the main:

if __name__ == '__main__':
    mkdir('log')
    mkdir('tf_log')
    set_one_thread()
    random_seed()
    select_device(-1)
    # select_device(0)
    quantile_regression_dqn_cart_pole()

However, I got this error:
Traceback (most recent call last):
  File "examples.py", line 445, in <module>
    quantile_regression_dqn_cart_pole()
  File "examples.py", line 89, in quantile_regression_dqn_cart_pole
    run_steps(QuantileRegressionDQNAgent(config))
  File "C:\ShangtongZhang\code\deep_rl\agent\QuantileRegressionDQN_agent.py", line 38, in __init__
    self.actor = QuantileRegressionDQNActor(config)
  File "C:\ShangtongZhang\code\deep_rl\agent\QuantileRegressionDQN_agent.py", line 16, in __init__
    self.start()
  File "C:\Users\w\AppData\Local\Programs\Python\Python36\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\w\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\w\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\w\AppData\Local\Programs\Python\Python36\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\w\AppData\Local\Programs\Python\Python36\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'ArgumentParser.__init__.<locals>.identity'

Thanks for your help!
I can run your a2c_cart_pole().

W

Rendering with DummyVecEnv

Hi, I would like to render with DummyVecEnv, but I get an error back saying it has not been implemented in baselines/common/vec_env/vec_env.py. How would I go about implementing rendering? Here is the traceback; any help would be appreciated:

Traceback (most recent call last):
  File "examples.py", line 473, in <module>
    dqn_pixel_atari(game)
  File "examples.py", line 80, in dqn_pixel_atari
    evaluate_game.evaluate_game(DRQNAgent(config),log, name)
  File "/home/mariano/Documents/DeepRL-0.3/evaluate_game.py", line 18, in evaluate_game
    task.render()
  File "/home/mariano/Documents/DeepRL-0.3/deep_rl/component/envs.py", line 203, in render
    return self.env.render()
  File "/home/mariano/Documents/DeepRL-0.3/baselines/baselines/common/vec_env/vec_env.py", line 111, in render
    imgs = self.get_images()
  File "/home/mariano/Documents/DeepRL-0.3/baselines/baselines/common/vec_env/vec_env.py", line 125, in get_images
    raise NotImplementedError
NotImplementedError
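For reference, a hedged sketch of one way to expose rendering from a vec-env-style wrapper by forwarding to the wrapped envs; the class and method names below are illustrative assumptions and may not match the actual baselines API:

class RenderableVecEnvWrapper:
    def __init__(self, envs):
        self.envs = envs  # list of plain gym environments

    def get_images(self):
        # return one RGB frame per wrapped env
        return [env.render(mode='rgb_array') for env in self.envs]

    def render(self, mode='human'):
        if mode == 'human':
            return self.envs[0].render(mode='human')
        return self.get_images()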

Performance trend of Breakout is not the same as the plot on the homepage

Hello, thank you for your contribution again. Some problems still worry me. The figure shown below is my training plot obtained with your code. However, the performance trend on Breakout is not the same as the plot you put on the homepage. In addition, my training is very slow, and I have to use 3e7 rather than 2e7 steps to train it; each training run with 3e7 steps takes about 2 days. I don't know why.

[screenshot: Breakout training curve, 2019-01-31]

entropy term in continuous spaces?

Hello Shangtong,

Sorry for bothering you; it's more a question of code clarification than a bug report. I am a bit confused about how you define the entropy in class GaussianActorCriticNet(nn.Module, BaseNet). The final return for the entropy is tensor(np.zeros((log_prob.size(0), 1))). Does that mean we just define the entropy term as zeros in this case?

Thanks again for any help.
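For reference, the entropy of a diagonal Gaussian policy has a closed form, 0.5 * log(2 * pi * e * sigma^2) summed over action dimensions. A hedged sketch of computing it (an alternative to returning zeros; not the repo's code):

import math
import torch

def gaussian_entropy(std):
    # entropy of N(mu, std^2), summed over the action dimensions
    return (0.5 * (1 + math.log(2 * math.pi)) + torch.log(std)).sum(-1, keepdim=True)

std = 0.5 * torch.ones(8, 3)  # (batch, action_dim) placeholder
print(gaussian_entropy(std).shape)  # torch.Size([8, 1])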

DQN taking too long to converge

I am running DQN on 1 GPU, but it seems to take more than a few days to converge on Pong. A2C needs about 15 hours to converge on Pong using my current setup. I am using Python 3.6.5, and I also set config.async_actor = True as recommended ;D

I'm just trying to understand the run times for different methods, as I don't see any benchmark in terms of time for this repo. Two notes from my side:

  1. It would be great to have the run time benchmarks on different games using different models.
  2. Is my DQN taking way longer than it should? If so, where do you suggest I go hunt down the issue? The steps per second seem to be on the order of hundreds in the beginning, but they gradually decrease to 10-20 steps per second.

can not start the dqn_pixel_atari agent

If I start the dqn_pixel_atari('PongNoFrameskip-v4') or the dqn_pixel_atari('BreakoutNoFrameskip-v4') agent, I receive an error message:

Traceback (most recent call last):
  File "main.py", line 283, in <module>
    dqn_pixel_atari('PongNoFrameskip-v4')
  File "main.py", line 84, in dqn_pixel_atari
    run_episodes(DQNAgent(config))
  File ".../DeepRL/DeepRL/utils/misc.py", line 21, in run_episodes
    reward, step = agent.episode()
  File ".../DeepRL/DeepRL/agent/DQN_agent.py", line 54, in episode
    self.replay.feed([state, action, reward, next_state, int(done)])
  File ".../DeepRL/DeepRL/component/replay.py", line 32, in feed
    self.states = np.empty((self.memory_size, ) + state.shape, dtype=self.dtype)
MemoryError

Any idea what the error is? Thanks!
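For reference, a rough, hypothetical estimate of what that single np.empty allocation asks for, assuming memory_size = 1e6, state.shape = (4, 84, 84), and dtype = float32 (the actual config values may differ):

import numpy as np

memory_size = int(1e6)
state_shape = (4, 84, 84)
itemsize = np.dtype(np.float32).itemsize
total_gb = memory_size * np.prod(state_shape) * itemsize / 1024**3
print(round(float(total_gb), 1))  # ~105.1 GB for the states array alone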

Atari_wrapper different state shape

There are some environments, such as DoubleDunk and BankHeist, whose state shape is not 210 x 160 x 3. The wrapper you wrote fails in these environments.

where to download dataset

$ python dataset.py
Traceback (most recent call last):
  File "dataset.py", line 103, in <module>
    generate_dateset(game)
  File "dataset.py", line 67, in generate_dateset
    agent.load(model_file)
  File "/home/xxx/DeepRL/agent/BaseAgent.py", line 21, in load
    state_dict = torch.load(filename, map_location=lambda storage, loc: storage)
  File "/home/xxx/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 265, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'data/DQNAgent-vanilla-model-PongNoFrameskip-v4.bin'

Could you tell me where the data directory is or where to download this file? Thanks.

can't find the data directory

Hi Shangtong,

Could you tell me how I can create the data directory or where I can download the dataset? Your code can't run successfully without the data directory. Thanks!
