
Comments (6)

MaximeBouton commented on July 24, 2024

I think there is a mistake in the input dimension of the Flux model.
If your observation vector has size 2, then the input dimension of the first layer should be 2, not 3.

More generally, in DQN the input size of your model should equal the dimension of your observation vector, and the output size should equal the number of actions.
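For example, just a sketch (the 32-unit hidden layer is arbitrary, and mdp stands for whatever discrete-action problem you are solving):

using Flux, POMDPs

obs_dim   = 2                       # length of your observation vector
n_actions = length(actions(mdp))    # one Q-value output per action
model = Chain(Dense(obs_dim, 32, relu), Dense(32, n_actions))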


lsmith7661 commented on July 24, 2024

Thanks for the tip.

I think my confusion is related to thinking about learning the Q-function. If the NN is approximating Q(s,a) then I figured the input needed to be s and a. Then the output is the estimate for Q.

From what you are saying I think the input is the state (in this case size 2) and the output is an array of Q(s,a) for all possible actions, a. I think that makes sense.
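To keep myself honest, here is the contrast as a quick sketch (layer sizes and the example observation are made up):

using Flux

# What I originally pictured: state AND action in, one scalar Q(s, a) out.
q_scalar = Chain(Dense(3, 32, relu), Dense(32, 1))   # input = [s; a] (2 + 1 dims)

# What DQN actually uses: state in, one Q-value per action out.
q_vector = Chain(Dense(2, 32, relu), Dense(32, 3))

s = Float32[0.2, -0.01]         # a 2-D observation
q_values = q_vector(s)          # [Q(s,a1), Q(s,a2), Q(s,a3)]
best = argmax(q_values)         # greedy action index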


Yea, ok, I just dug into the policy.jl file and can see that the qnetwork is expected to return all of the Q-values, so that action(policy, state) can return the action associated with max(Q). I'll see if I can get this going, thanks!

vals = policy.qnetwork(obatch)          # Q-values for every action at this state
return policy.action_map[argmax(vals)]  # pick the action with the largest Q-value


MaximeBouton commented on July 24, 2024

MWE:

using POMDPs, POMDPModels, Flux, DeepQLearning
mdp = MountainCar()                              # 2-D observation (position, velocity), 3 actions
model = Chain(Dense(2, 32, relu), Dense(32, 3))  # input size = obs dim, output size = number of actions
solver = DeepQLearningSolver(qnetwork=model)
policy = solve(solver, mdp)


lsmith7661 commented on July 24, 2024

Yup, I got it working...ish. It runs; now I have to actually tune it and see if I can get up the hill.

Thanks for the help!


lsmith7661 commented on July 24, 2024

I have another, sort of adjacent question. I am trying to add an exploration bonus using the exploration_policy argument, but I am having trouble. The idea is to penalize Q(s,a) if that particular action has already been tried at that state. I think this is already being tracked somewhere in the replay and experience objects, but I haven't deciphered it yet. Is there a way to access those objects from an exploration_policy? From the looks of it, the replay object is lost inside dqntrain!(...), which only returns a policy.


MaximeBouton commented on July 24, 2024

There is no API to access the replay buffer during training; it would probably require changes to the algorithm.

Adding bonuses should be done through the reward function rather than the exploration policy.
With the current interface your best chance would be to augment the state and design a new reward function.
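As a rough, untested sketch of what I mean (the type name CountPenaltyMDP, the penalty weight, and keying counts on the raw state are all just illustrative; with a continuous state space you would want to discretize or hash the state, and depending on your POMDPs.jl version you may need to forward a few more methods, e.g. convert_s, for DeepQLearning to work):

using POMDPs

struct CountPenaltyMDP{S,A,M<:MDP{S,A}} <: MDP{S,A}
    mdp::M                          # the original problem
    counts::Dict{Tuple{S,A},Int}    # how often each (state, action) pair was tried
    penalty::Float64                # penalty per previous visit
end
CountPenaltyMDP(m::MDP{S,A}; penalty=0.1) where {S,A} =
    CountPenaltyMDP(m, Dict{Tuple{S,A},Int}(), Float64(penalty))

# Forward the parts of the interface that don't change.
POMDPs.actions(w::CountPenaltyMDP) = actions(w.mdp)
POMDPs.actionindex(w::CountPenaltyMDP, a) = actionindex(w.mdp, a)
POMDPs.discount(w::CountPenaltyMDP) = discount(w.mdp)
POMDPs.initialstate(w::CountPenaltyMDP) = initialstate(w.mdp)
POMDPs.isterminal(w::CountPenaltyMDP, s) = isterminal(w.mdp, s)

# Generative step: sample from the wrapped MDP, then subtract the visit penalty.
function POMDPs.gen(w::CountPenaltyMDP, s, a, rng)
    n = get(w.counts, (s, a), 0)
    w.counts[(s, a)] = n + 1
    sp, r = @gen(:sp, :r)(w.mdp, s, a, rng)
    return (sp = sp, r = r - w.penalty * n)
end

You would then call solve on the wrapped problem instead of the original one.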

If you are looking for more sophisticated approaches like curiosity-driven RL, that would require significant changes to the current implementation.

