philtabor / deep-q-learning-paper-to-code
License: MIT License
Hi @philtabor,
There is a possible bug in the dqn_agent.py file at line 93:
    q_target = rewards + self.gamma*q_next
needs to be replaced with:
    with torch.no_grad():
        q_target = rewards + self.gamma*q_next
This issue is also raised in #9 (comment)
Could you please take a look?
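For reference, a minimal sketch of how the target computation might look with gradients disabled (compute_dqn_targets is a hypothetical helper; variable names follow the snippet above):

    import torch

    def compute_dqn_targets(target_net, states_, rewards, dones, gamma):
        # Targets are constants with respect to the network being optimized,
        # so build them inside torch.no_grad() to keep them out of the graph.
        with torch.no_grad():
            q_next = target_net(states_).max(dim=1)[0]
            q_next[dones] = 0.0  # terminal transitions have no future value
            return rewards + gamma * q_next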
When running main.py for any of the DQN variants, a deprecation warning is produced on recent versions of PyTorch (1.2 or higher). I tested this with 1.5, where uint8 indexing is completely deprecated.
ref: https://github.com/open-mmlab/mmdetection/pull/2105
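If this is the uint8 mask warning, a minimal illustration of the fix (the dones name is an assumption based on the repo's replay buffer):

    import torch

    q_next = torch.rand(32)  # stand-in for the max target Q-values
    dones_uint8 = torch.zeros(32, dtype=torch.uint8)

    # PyTorch >= 1.2 deprecates mask indexing with uint8 tensors; casting
    # the mask to bool gives the same behaviour without the warning.
    dones = dones_uint8.bool()
    q_next[dones] = 0.0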
In dqn_agent.py, shouldn't the loss function be:
    loss = self.q_eval.loss(q_pred, q_target).to(self.q_eval.device)
instead of:
    loss = self.q_eval.loss(q_target, q_pred).to(self.q_eval.device)
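For what it's worth, a quick sanity check that the order doesn't change the value for a symmetric loss like MSE (nn.MSELoss documents its signature as loss(input, target), and the order would matter for asymmetric losses):

    import torch
    import torch.nn as nn

    loss_fn = nn.MSELoss()
    q_pred = torch.rand(32, requires_grad=True)
    q_target = torch.rand(32)

    # MSE is symmetric in its two arguments, so both orderings give the same
    # value; gradients still flow only through q_pred, the tensor on the graph.
    assert loss_fn(q_pred, q_target) == loss_fn(q_target, q_pred)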
The indentation of DQN/preprocess_pseudocode is broken and difficult to read.
While running cartpole_naive_dqn.py, I am getting an error:
    actions = T.tensor(action).to(self.Q.device)
    RuntimeError: Could not infer dtype of numpy.int64
I already have a conda environment set up with the deep learning and gym modules installed. Do you think this is a version problem?
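If it helps, a workaround I've seen for this error is converting the numpy scalar to a plain Python int before building the tensor (a sketch, assuming a version mismatch is the cause):

    import numpy as np
    import torch as T

    action = np.int64(1)  # stand-in for the action returned by choose_action

    # Some torch/numpy version pairings cannot infer a dtype from a bare
    # numpy scalar; a Python int (or an explicit dtype) avoids the error.
    actions = T.tensor(int(action))
    # or: actions = T.tensor(action, dtype=T.long)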
Hey Phil! Thanks for the course. I'm really enjoying it so far.
I've implemented the first real Deep Q Network, and it is not learning. When I remove the convolutional layers and use only the fully connected layers on CartPole-v1, it is able to learn; with the convolutional layers on Pong or Breakout it does not work. I've gone through all of my code many times and can't find what I messed up. I've checked the wrapper, the network, the agent, even the main loop. Could it possibly be my imports?
I'm not sure what's the best way to upload code. Let me know if there is a better way.
ExperienceReplay.txt
GymWrapper.txt
TrainAgent.txt
DeepQNetwork.txt
In DQNAgent, I think you may need to call detach() at line 90 to detach the target network's output from the gradient computation.
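A minimal illustration of the suggestion (the real network is assumed; .detach() has the same effect as wrapping the computation in torch.no_grad()):

    import torch
    import torch.nn as nn

    target_net = nn.Linear(4, 2)  # stand-in for the target network
    states_ = torch.rand(32, 4)

    # .detach() removes the result from the autograd graph, so the loss can
    # no longer backpropagate into the target network's parameters.
    q_next = target_net(states_).detach()
    assert not q_next.requires_grad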
Hi Phil,
I implemented your DuelDDQN architecture for myself, and I am curious about the following snippet of the learning function, since my question wasn't addressed in the course:
    q_pred = T.add(V_s, (A_s - A_s.mean(dim=1, keepdim=True)))[indices, actions]
    q_next = T.add(V_s_, (A_s_ - A_s_.mean(dim=1, keepdim=True)))
    q_eval = T.add(V_s_eval, (A_s_eval - A_s_eval.mean(dim=1, keepdim=True)))
Why is it that only q_pred is indexed by the action array? Is it because it represents the actions we actually took in the sampled states? And are all of these Q-value matrices of the same dimensions?
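A minimal shape sketch of the dueling aggregation, assuming the two heads output V with shape (batch, 1) and A with shape (batch, n_actions):

    import torch as T

    batch_size, n_actions = 32, 6
    V_s = T.rand(batch_size, 1)          # state-value head
    A_s = T.rand(batch_size, n_actions)  # advantage head
    indices = T.arange(batch_size)
    actions = T.randint(n_actions, (batch_size,))

    q_full = T.add(V_s, (A_s - A_s.mean(dim=1, keepdim=True)))  # (batch, n_actions)
    q_pred = q_full[indices, actions]                           # (batch,)

    # Before indexing, all three Q matrices share the (batch, n_actions)
    # shape; q_pred then picks one entry per row, the action actually taken.
    assert q_full.shape == (batch_size, n_actions)
    assert q_pred.shape == (batch_size,)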
When I run main_dqn.py to train the DQN on Atari Pong I get the following error:
    (learn36) D:\deepleren\Learn36>python main_dqn.py
    Traceback (most recent call last):
      File "main_dqn.py", line 40, in <module>
        agent.learn()
      File "D:\deepleren\Learn36\dqn_agent.py", line 89, in learn
        q_pred = self.q_eval.forward(states)[indices, actions]
    IndexError: tensors used as indices must be long, byte or bool tensors
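If it is what the message suggests, a minimal sketch of the usual fix is to cast the sampled actions to int64 before indexing (names follow the traceback; the buffer dtype is an assumption):

    import numpy as np
    import torch as T

    q_values = T.rand(32, 6)  # stand-in for the q_eval network output
    indices = T.arange(32)
    actions = np.random.randint(6, size=32).astype(np.float32)  # wrong dtype

    # Index tensors must be long, byte, or bool; an explicit cast fixes the
    # IndexError when the replay buffer stores actions in another dtype.
    q_pred = q_values[indices, T.tensor(actions, dtype=T.long)]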
I get this warning without having changed any line of code, with Python 3.8 and the latest PyTorch:
    <__array_function__ internals>:5: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
and after the first episode it fails to save the model, probably because the array is never created.
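A minimal reproduction of that warning and the dtype=object workaround numpy suggests (whether ragged data is actually intended here is an assumption; if not, the real fix is to make the nested shapes consistent):

    import numpy as np

    ragged = [np.zeros(3), np.zeros(5)]  # elements with different shapes

    # np.array(ragged) triggers VisibleDeprecationWarning (an error on newer
    # numpy); dtype=object keeps the elements as-is instead of stacking them.
    arr = np.array(ragged, dtype=object)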
I have this issue when I try to implement DQN from Paper to Code:
    cannot reshape array of size 8 into shape (4,84,84)
It is raised from return np.array(self.stack).reshape(self.observation_space.low.shape) in the StackFrame class.
Thank you.
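For what it's worth, an error like this usually means the stack is being filled with raw, unpreprocessed observations instead of 84x84 frames. A minimal sketch of a reset that fills the stack correctly (assuming a repeat count of 4 and that resizing happens before stacking, as in the course's wrapper ordering):

    import collections
    import numpy as np

    repeat = 4
    stack = collections.deque(maxlen=repeat)

    def reset_stack(first_frame):
        # first_frame must already be the preprocessed (84, 84) frame; stacking
        # the raw env observation leads to a size mismatch exactly like
        # "cannot reshape array of size 8 into shape (4,84,84)".
        stack.clear()
        for _ in range(repeat):
            stack.append(first_frame)
        return np.array(stack).reshape(repeat, 84, 84)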
Hi Phil,
Thank you for your course; I found it to be the most informative and clear approach to PyTorch RL with OpenAI Gym.
I would like to view the results of the trained agent in action,
I've previously been using an array to store individual observations and then using sk-video (see below) to make an .mp4 file out of them, or using the Monitor class.
    for i in range(2):
        done = False
        observation = env.reset()
        score = 0
        while not done:
            action = agent.choose_action(observation)
            observation_, reward, done, info = env.step(action)
            img_array.append(observation)
            observation = observation_  # advance to the next state
However, as you've essentially wrapped up the environment itself, these approaches are no longer possible, since the agent expects the fully stacked, resized inputs of shape (1, 84, 84, 4). I think a simple way of viewing the performance of the trained agent in action would be a very helpful supplement to the course. Is this possible?
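One approach that should still work with the wrappers in place: the underlying env's render() returns the raw RGB frame regardless of what the observation wrappers emit, so those frames can be collected for the video instead of the observations. A sketch against the old gym API, continuing the loop above (imageio plus imageio-ffmpeg for .mp4 output is an assumption; any video writer would do):

    import imageio

    frames = []
    done = False
    observation = env.reset()
    while not done:
        # render(mode='rgb_array') yields the raw frame even though the agent
        # sees the stacked, resized observation produced by the wrappers.
        frames.append(env.render(mode='rgb_array'))
        action = agent.choose_action(observation)
        observation, reward, done, info = env.step(action)

    imageio.mimsave('agent.mp4', frames, fps=30)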
If I set the value of load_checkpoint to True from the argparser, how would the if statement on line 95 of main.py work?
    if not args.load_checkpoint:
        agent.store_transition(observation, action, reward, observation_, int(done))
        agent.learn()
I think the agent won't be able to store the (state, action, reward, state_, done) tuple in the replay buffer, and it won't learn either. Is that right?
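If I read the flag correctly, that separation is the intent; my reading of the snippet, annotated (not confirmed against main.py):

    # load_checkpoint == True means "evaluate a saved agent": skip storing
    # transitions and skip gradient updates, and just act with the loaded weights.
    if not args.load_checkpoint:
        agent.store_transition(observation, action, reward, observation_, int(done))
        agent.learn()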