dxyang / dqn_pytorch Goto Github PK

View Code? Open in Web Editor NEW

425.0 425.0 93.0 2.83 MB

Vanilla DQN, Double DQN, and Dueling DQN implemented in PyTorch

Python 100.00%

dqn_pytorch's People

Contributors

Stargazers

Watchers

Forkers

bgreenawald a-blackfish xiaoyigwr mohamad-hasan-sohan-ajini root-master negation russellksing lbingd17 kaikangsdu gcbbobo samtonetto kite-hz etarakci-hvl ageliss tomgtqq yuanqinglee yifan-you-37 jiangnanhugo akaanug myxyzy barneyqiao loicvz190 miracle1207 dukecheung frankgt darrenzhang01 mulintia liutong966 tzq2doc mgilgamesh vitorascorrea longjiao993 msfallah58 cav-research-lab amine179 alirezashamsoshoara zouying-sjtu geyour tech-jin-coder lulululuxingyu xinqiangyu tomsong00 alex-shilei hsinglukliu jianghuchuanshuo shi-tianyi glen9010 andrepattu usuallyoralways drib861204 nobuhiro14 xkaik ankurhcu yangmli af-74413592 susan1314 whoismanoj zjw49246 bigbigyellow zhujinyang610 ysj1646 rnaimehaom ovechou dbdsir oneenooo kcdgz ankitshaw codwest w844680976 zhang-xinyuan selap91 dancingonmoon jocyei chajiuqqq xiuyuan0216 ddouble chengji-bgnr dynamicdr comp3030j-group-4 minizhao learlzy listen-wind6 bmblackmap xhfxhfff littlejudy shenjiede ruyi-feng lala6686 ikunerer i9nis wyq199321 wjm-24 yangsharing

dqn_pytorch's Issues

What's the meaning of flip the bellman error?

Thanks for your wonderful code, I only use the logic of your trainig part in my code. But I fount that the effect of my model gets worse with training. When I edit

DQN_pytorch/learn.py

Line 225 in 43fe371

clipped_error = -1.0 * error.clamp(-1, 1)

to
clipped_error = 1.0 * bellman_error the model works well.
I don't understand why the bellman error needs to be flipped here?

Runtime Error : Index tensor must have same dimensions as input tensor

Hi If i run the code for breakout, i am getting the following error.

IndexError

Traceback (most recent call last):
File "main.py", line 120, in
main()
File "main.py", line 117, in main
atari_learn(env, task.env_id, num_timesteps=task.max_timesteps, double_dqn=double_dqn, dueling_dqn=dueling_dqn)
File "main.py", line 72, in atari_learn
dueling_dqn=dueling_dqn
File "/home/sotirisnik/other/DQN_pytorch/learn.py", line 149, in dqn_learning
obs, reward, done, info = env.step(action)
File "/home/sotirisnik/anaconda3/envs/my_env/lib/python3.6/site-packages/gym/core.py", line 99, in step
return self._step(action)
File "/home/sotirisnik/other/DQN_pytorch/utils/atari_wrappers.py", line 132, in _step
obs, reward, done, info = self.env.step(action)
File "/home/sotirisnik/anaconda3/envs/my_env/lib/python3.6/site-packages/gym/core.py", line 99, in step
return self._step(action)
File "/home/sotirisnik/other/DQN_pytorch/utils/atari_wrappers.py", line 124, in _step
obs, reward, done, info = self.env.step(action)
File "/home/sotirisnik/anaconda3/envs/my_env/lib/python3.6/site-packages/gym/core.py", line 99, in step
return self._step(action)
File "/home/sotirisnik/anaconda3/envs/my_env/lib/python3.6/site-packages/gym/core.py", line 279, in _step
return self.env.step(action)
File "/home/sotirisnik/anaconda3/envs/my_env/lib/python3.6/site-packages/gym/core.py", line 99, in step
return self._step(action)
File "/home/sotirisnik/other/DQN_pytorch/utils/atari_wrappers.py", line 93, in _step
obs, reward, done, info = self.env.step(action)
File "/home/sotirisnik/anaconda3/envs/my_env/lib/python3.6/site-packages/gym/core.py", line 99, in step
return self._step(action)
File "/home/sotirisnik/anaconda3/envs/my_env/lib/python3.6/site-packages/gym/core.py", line 279, in _step
return self.env.step(action)
File "/home/sotirisnik/anaconda3/envs/my_env/lib/python3.6/site-packages/gym/core.py", line 99, in step
return self._step(action)
File "/home/sotirisnik/other/DQN_pytorch/utils/atari_wrappers.py", line 53, in _step
obs, reward, done, info = self.env.step(action)
File "/home/sotirisnik/anaconda3/envs/my_env/lib/python3.6/site-packages/gym/core.py", line 99, in step
return self._step(action)
File "/home/sotirisnik/anaconda3/envs/my_env/lib/python3.6/site-packages/gym/wrappers/monitoring.py", line 33, in _step
observation, reward, done, info = self.env.step(action)
File "/home/sotirisnik/anaconda3/envs/my_env/lib/python3.6/site-packages/gym/core.py", line 99, in step
return self._step(action)
File "/home/sotirisnik/anaconda3/envs/my_env/lib/python3.6/site-packages/gym/wrappers/time_limit.py", line 36, in _step
observation, reward, done, info = self.env.step(action)
File "/home/sotirisnik/anaconda3/envs/my_env/lib/python3.6/site-packages/gym/core.py", line 99, in step
return self._step(action)
File "/home/sotirisnik/anaconda3/envs/my_env/lib/python3.6/site-packages/gym/envs/atari/atari_env.py", line 73, in _step
action = self._action_set[a]
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
[2018-06-27 02:58:15,904] Finished writing results. You can upload them to the scoreboard via gym.upload('/home/sotirisnik/other/DQN_pytorch/tmp/BreakoutNoFrameskip-v4')

It stills works? Cause i am not being able to test it.

I know Gym has moved to Gymnasium, but some functions change... I tried to make the transition but i am not getting it. Is it possible you to update so we can test your amazing work? I would be very pleased if so. Thanks

Dueling dqn equation

Thanks for offering this wonderful code. But I have a question.

Why in the combination part of the equation, the advantage A need to subtract it's average? I've already refer to the paper but still don't understand.

dxyang / dqn_pytorch Goto Github PK

dqn_pytorch's People

Contributors

Stargazers

Watchers

Forkers

dqn_pytorch's Issues

What's the meaning of flip the bellman error?

Runtime Error : Index tensor must have same dimensions as input tensor

IndexError

It stills works? Cause i am not being able to test it.

Dueling dqn equation

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent