philtabor / deep-q-learning-paper-to-code
License: MIT License
Hi @philtabor,
There is a possible bug in the dqn_agent.py file at line 93:
    q_target = rewards + self.gamma*q_next
needs to be replaced with:
    with torch.no_grad():
        q_target = rewards + self.gamma*q_next
This issue is also raised in #9 (comment)
Could you please take a look?
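For reference, a minimal sketch of how the target computation might look with gradients disabled (compute_dqn_targets is a hypothetical helper; variable names follow the snippet above):

    import torch

    def compute_dqn_targets(target_net, states_, rewards, dones, gamma):
        # Targets are constants with respect to the network being optimized,
        # so build them inside torch.no_grad() to keep them out of the graph.
        with torch.no_grad():
            q_next = target_net(states_).max(dim=1)[0]
            q_next[dones] = 0.0  # terminal transitions have no future value
            return rewards + gamma * q_next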
When running main.py for any of the DQN variants, a deprecation warning is produced on recent versions of PyTorch (1.2 or higher). I tested this with 1.5, where uint8 indexing is completely deprecated.
ref: https://github.com/open-mmlab/mmdetection/pull/2105
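If this is the uint8 mask warning, a minimal illustration of the fix (the dones name is an assumption based on the repo's replay buffer):

    import torch

    q_next = torch.rand(32)  # stand-in for the max target Q-values
    dones_uint8 = torch.zeros(32, dtype=torch.uint8)

    # PyTorch >= 1.2 deprecates mask indexing with uint8 tensors; casting
    # the mask to bool gives the same behaviour without the warning.
    dones = dones_uint8.bool()
    q_next[dones] = 0.0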
In dqn_agent.py, shouldn't the loss function be:
    loss = self.q_eval.loss(q_pred, q_target).to(self.q_eval.device)
instead of:
    loss = self.q_eval.loss(q_target, q_pred).to(self.q_eval.device)
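For what it's worth, a quick sanity check that the order doesn't change the value for a symmetric loss like MSE (nn.MSELoss documents its signature as loss(input, target), and the order would matter for asymmetric losses):

    import torch
    import torch.nn as nn

    loss_fn = nn.MSELoss()
    q_pred = torch.rand(32, requires_grad=True)
    q_target = torch.rand(32)

    # MSE is symmetric in its two arguments, so both orderings give the same
    # value; gradients still flow only through q_pred, the tensor on the graph.
    assert loss_fn(q_pred, q_target) == loss_fn(q_target, q_pred)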
The indentation of DQN/preprocess_pseudocode is broken and difficult to read.
While running cartpole_naive_dqn.py, I am getting an error:
    actions = T.tensor(action).to(self.Q.device)
    RuntimeError: Could not infer dtype of numpy.int64
I already have a conda environment set up with the deep learning and gym modules installed. Do you think this is a version problem?
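If it helps, a workaround I've seen for this error is converting the numpy scalar to a plain Python int before building the tensor (a sketch, assuming a version mismatch is the cause):

    import numpy as np
    import torch as T

    action = np.int64(1)  # stand-in for the action returned by choose_action

    # Some torch/numpy version pairings cannot infer a dtype from a bare
    # numpy scalar; a Python int (or an explicit dtype) avoids the error.
    actions = T.tensor(int(action))
    # or: actions = T.tensor(action, dtype=T.long)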
Hey Phil! Thanks for the course. I'm really enjoying it so far.
I've implemented the first real Deep Q Network, and it is not learning. When I remove the convolutional layers and use only the fully connected layers on CartPole-v1, it is able to learn; with the convolutional layers on Pong or Breakout it does not work. I've gone through all of my code many times and can't find what I messed up. I've checked the wrapper, the network, the agent, even the main loop. Could it possibly be my imports?
I'm not sure what's the best way to upload code. Let me know if there is a better way.
ExperienceReplay.txt
GymWrapper.txt
TrainAgent.txt
DeepQNetwork.txt
In DQNAgent, I think you may need to call detach() at line 90 to detach the target network's output from the gradient computation.
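A minimal illustration of the suggestion (the real network is assumed; .detach() has the same effect as wrapping the computation in torch.no_grad()):

    import torch
    import torch.nn as nn

    target_net = nn.Linear(4, 2)  # stand-in for the target network
    states_ = torch.rand(32, 4)

    # .detach() removes the result from the autograd graph, so the loss can
    # no longer backpropagate into the target network's parameters.
    q_next = target_net(states_).detach()
    assert not q_next.requires_grad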
Hi Phil,
I implemented your DuelDDQN architecture for myself, and I am curious about the following snippet of the learning function, since my question wasn't addressed in the course:
    q_pred = T.add(V_s, (A_s - A_s.mean(dim=1, keepdim=True)))[indices, actions]
    q_next = T.add(V_s_, (A_s_ - A_s_.mean(dim=1, keepdim=True)))
    q_eval = T.add(V_s_eval, (A_s_eval - A_s_eval.mean(dim=1, keepdim=True)))
Why is it that only q_pred is indexed by the action array? Is it because it represents the actions we actually took in the sampled states? And are all of these Q-value matrices of the same dimensions?
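A minimal shape sketch of the dueling aggregation, assuming the two heads output V with shape (batch, 1) and A with shape (batch, n_actions):

    import torch as T

    batch_size, n_actions = 32, 6
    V_s = T.rand(batch_size, 1)          # state-value head
    A_s = T.rand(batch_size, n_actions)  # advantage head
    indices = T.arange(batch_size)
    actions = T.randint(n_actions, (batch_size,))

    q_full = T.add(V_s, (A_s - A_s.mean(dim=1, keepdim=True)))  # (batch, n_actions)
    q_pred = q_full[indices, actions]                           # (batch,)

    # Before indexing, all three Q matrices share the (batch, n_actions)
    # shape; q_pred then picks one entry per row, the action actually taken.
    assert q_full.shape == (batch_size, n_actions)
    assert q_pred.shape == (batch_size,)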
When I run main_dqn.py to train the DQN on Atari Pong I get the following error:
    (learn36) D:\deepleren\Learn36>python main_dqn.py
    Traceback (most recent call last):
      File "main_dqn.py", line 40, in <module>
        agent.learn()
      File "D:\deepleren\Learn36\dqn_agent.py", line 89, in learn
        q_pred = self.q_eval.forward(states)[indices, actions]
    IndexError: tensors used as indices must be long, byte or bool tensors
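If it is what the message suggests, a minimal sketch of the usual fix is to cast the sampled actions to int64 before indexing (names follow the traceback; the buffer dtype is an assumption):

    import numpy as np
    import torch as T

    q_values = T.rand(32, 6)  # stand-in for the q_eval network output
    indices = T.arange(32)
    actions = np.random.randint(6, size=32).astype(np.float32)  # wrong dtype

    # Index tensors must be long, byte, or bool; an explicit cast fixes the
    # IndexError when the replay buffer stores actions in another dtype.
    q_pred = q_values[indices, T.tensor(actions, dtype=T.long)]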
I get this warning without having changed any line of code, with Python 3.8 and the latest PyTorch:
    <__array_function__ internals>:5: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
and after the first episode it fails to save the model, probably because the array is never created.
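A minimal reproduction of that warning and the dtype=object workaround numpy suggests (whether ragged data is actually intended here is an assumption; if not, the real fix is to make the nested shapes consistent):

    import numpy as np

    ragged = [np.zeros(3), np.zeros(5)]  # elements with different shapes

    # np.array(ragged) triggers VisibleDeprecationWarning (an error on newer
    # numpy); dtype=object keeps the elements as-is instead of stacking them.
    arr = np.array(ragged, dtype=object)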
I have this issue when I try to implement DQN from Paper to Code:
    cannot reshape array of size 8 into shape (4,84,84)
It is raised from return np.array(self.stack).reshape(self.observation_space.low.shape) in the StackFrame class.
Thank you.
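For what it's worth, an error like this usually means the stack is being filled with raw, unpreprocessed observations instead of 84x84 frames. A minimal sketch of a reset that fills the stack correctly (assuming a repeat count of 4 and that resizing happens before stacking, as in the course's wrapper ordering):

    import collections
    import numpy as np

    repeat = 4
    stack = collections.deque(maxlen=repeat)

    def reset_stack(first_frame):
        # first_frame must already be the preprocessed (84, 84) frame; stacking
        # the raw env observation leads to a size mismatch exactly like
        # "cannot reshape array of size 8 into shape (4,84,84)".
        stack.clear()
        for _ in range(repeat):
            stack.append(first_frame)
        return np.array(stack).reshape(repeat, 84, 84)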
Hi Phil,
Thank you for your course; I found it to be the most informative and clear approach to PyTorch RL with OpenAI Gym.
I would like to view the results of the trained agent in action,
I've previously been using an array to store individual observations and then using sk-video (see below) to make an .mp4 file out of them, or using the Monitor class.
    for i in range(2):
        done = False
        observation = env.reset()
        score = 0
        while not done:
            action = agent.choose_action(observation)
            observation_, reward, done, info = env.step(action)
            img_array.append(observation)
            observation = observation_  # advance to the next state
However, as you've essentially wrapped up the environment itself, these approaches are no longer possible, since the agent expects the fully stacked, resized inputs of shape (1, 84, 84, 4). I think a simple way of viewing the performance of the trained agent in action would be a very helpful supplement to the course. Is this possible?
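One approach that should still work with the wrappers in place: the underlying env's render() returns the raw RGB frame regardless of what the observation wrappers emit, so those frames can be collected for the video instead of the observations. A sketch against the old gym API, continuing the loop above (imageio plus imageio-ffmpeg for .mp4 output is an assumption; any video writer would do):

    import imageio

    frames = []
    done = False
    observation = env.reset()
    while not done:
        # render(mode='rgb_array') yields the raw frame even though the agent
        # sees the stacked, resized observation produced by the wrappers.
        frames.append(env.render(mode='rgb_array'))
        action = agent.choose_action(observation)
        observation, reward, done, info = env.step(action)

    imageio.mimsave('agent.mp4', frames, fps=30)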
If I set the value of load_checkpoint to True from the argparser, how would the if statement on line 95 of main.py work?
    if not args.load_checkpoint:
        agent.store_transition(observation, action, reward, observation_, int(done))
        agent.learn()
I think the agent won't be able to store the (state, action, reward, state_, done) tuple in the replay buffer, and it won't learn either. Is that right?
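If I read the flag correctly, that separation is the intent; my reading of the snippet, annotated (not confirmed against main.py):

    # load_checkpoint == True means "evaluate a saved agent": skip storing
    # transitions and skip gradient updates, and just act with the loaded weights.
    if not args.load_checkpoint:
        agent.store_transition(observation, action, reward, observation_, int(done))
        agent.learn()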