Comments (6)
I think there is a mistake in the input dimension of the Flux model.
If your observation vector is of size 2, then the input dimension of the first layer should be 2, not 3.
More generally, in DQN the input size of your model should equal the dimension of your observation vector, and the output size should equal the number of actions.
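To illustrate that shape convention, here is a minimal dependency-free sketch; the toy qnetwork function is hypothetical and stands in for a real Flux Chain:

```julia
# Toy stand-in for a Q-network: 2-d state in, one Q-value per action out.
# (Hypothetical example, not DeepQLearning.jl internals.)
qnetwork(s) = [sum(s), -sum(s), 0.0]   # Q(s, ·) for 3 actions

s = [0.5, -0.1]              # observation vector of size 2
qvals = qnetwork(s)          # vector of 3 Q-values, one per action
best_action = argmax(qvals)  # greedy action = index of the largest Q-value
```

The key point is that the network is evaluated once per state and returns Q(s, a) for every action at once, rather than taking (s, a) pairs as input.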
from deepqlearning.jl.
Thanks for the tip.
I think my confusion is related to thinking about learning the Q-function. If the NN is approximating Q(s,a) then I figured the input needed to be s and a. Then the output is the estimate for Q.
From what you are saying I think the input is the state (in this case size 2) and the output is an array of Q(s,a) for all possible actions, a. I think that makes sense.
Yea, ok, I just dug into the policy.jl file and can see that the qnetwork is expected to return all of the Q values, so that action(policy, state) can return the action associated with max(Q). I'll see if I can get this going, thanks!
vals = policy.qnetwork(obatch)
return policy.action_map[argmax(vals)]
MWE:
using POMDPs, POMDPModels, Flux, DeepQLearning
mdp = MountainCar()                              # 2-d observation (position, velocity), 3 actions
model = Chain(Dense(2, 32, relu), Dense(32, 3))  # state in, one Q-value per action out
solver = DeepQLearningSolver(qnetwork=model)
solve(mdp, solver)
Yup, I got it working...ish. It runs, now I have to actually tune it and see if I can get up the hill.
Thanks for the help!
I have another, sort of adjacent question. I am trying to add an exploration bonus using the exploration_policy argument but I am having trouble. The idea is to penalize Q(s,a) if that particular action at that state has been tried before. I think this is already being tracked somewhere in the replay and experience objects but I haven't deciphered it yet. Is there a way to access those objects from an exploration_policy? From the looks of it, the replay object is lost in dqntrain!(...), which only returns a policy.
There is no API to access the replay buffer during training; it would probably require changes in the algorithm.
Adding bonuses should be done through the reward function rather than the exploration policy.
With the current interface your best chance would be to augment the state and design a new reward function.
If you are looking for more sophisticated approaches like curiosity-driven RL, this would require significant changes to the current implementation.
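A minimal sketch of the reward-shaping route suggested above, assuming state-action pairs can be hashed (e.g. a discretized state); the names visits and shaped_reward are hypothetical and not part of DeepQLearning.jl:

```julia
# Count-based exploration bonus implemented through the reward function,
# rather than the exploration policy. (Hypothetical sketch.)
visits = Dict{Tuple{Int,Int},Int}()   # (state, action) -> visit count

# Shaped reward: base reward plus a bonus that shrinks as (s, a) is revisited.
function shaped_reward(base_r, s, a; scale = 0.1)
    n = get!(visits, (s, a), 0)       # current visit count (0 if unseen)
    visits[(s, a)] = n + 1            # record this visit
    return base_r + scale / sqrt(n + 1)
end
```

In practice you would call something like this inside a wrapper MDP whose reward function augments the original one, which also requires carrying the visit counts (or an augmented state) alongside the problem definition.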
Related Issues (20)
- Support last Flux version with Float32
- Logging Training Information
- Fix avgR in terminal while training
- TensorBoardLogger.jl New Version Compatability
- Compilation Error
- Type of the discount factor.
- Use only RLInterface.jl interface
- DQExperience should support AbstractArrays
- Support of AbtractEnvironment
- Problem with reading log files
- Exploration Policy requires a (PO)MDP
- Automatically convert to Float32
- TagBot trigger issue
- Error: Can't differentiate loopinfo expression
- Question: How would you make a decay schedule for prioritized replay alpha/beta?
- Tests still contain "using RLInterface"
- Deprecation of `loadparams!`
- Action masking feature (legal actions)
- GPU support