Comments (1)
Closed. I was confused by the different versions of a DDQN; it is explained here:
What makes this network a Double DQN?
The Bellman update used to compute the target Q-values for training the online network is:
value = reward + discount_factor * target_network.predict(next_state)[argmax(online_network.predict(next_state))]
The Bellman update used in the original (vanilla) DQN [1] is:
value = reward + discount_factor * max(target_network.predict(next_state))
The difference is that, in the terminology of the field, the second equation uses the target network for both SELECTING and EVALUATING the action, whereas the first uses the online network for SELECTING the action and the target network for EVALUATING it. Selection means choosing which action to take; evaluation means obtaining the projected Q-value for that action. This form of the Bellman update is what makes the agent a Double DQN rather than a plain DQN, and it was introduced in [2].
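The distinction can be sketched numerically. This is a minimal illustration, not the repository's code: the two arrays below are hypothetical stand-ins for `online_network.predict(next_state)` and `target_network.predict(next_state)`.

```python
import numpy as np

# Hypothetical Q-values each network assigns to the actions in next_state.
online_q_next = np.array([1.0, 3.0, 2.0])   # from the online network
target_q_next = np.array([0.5, 1.5, 4.0])   # from the target network

reward = 1.0
discount_factor = 0.99

# Double DQN: SELECT the action with the online network,
# then EVALUATE that action with the target network.
action = np.argmax(online_q_next)                              # action 1
double_dqn_value = reward + discount_factor * target_q_next[action]

# Vanilla DQN: the target network both selects and evaluates.
vanilla_value = reward + discount_factor * np.max(target_q_next)

print(double_dqn_value)  # 1 + 0.99 * 1.5 = 2.485
print(vanilla_value)     # 1 + 0.99 * 4.0 = 4.96
```

Note how the vanilla update latches onto the target network's largest (possibly overestimated) Q-value, while the double update only evaluates the action the online network would actually pick; this decoupling is what reduces overestimation bias.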
The naming also confused me: everything is called a "target", and the renaming throughout the code makes it harder to follow.
But it seems to be correct.
from reinforcement-learning.