Comments (6)
I think there is a mistake in the input dimension of the Flux model.
If your observation vector is of size 2, then the input dimension of the first layer should be 2, not 3.
More generally, in DQN the input size of your model should equal the dimension of your observation vector, and the output size should equal the number of actions.
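To illustrate that shape convention, here is a minimal dependency-free sketch; the toy qnetwork function is hypothetical and stands in for a real Flux Chain:

```julia
# Toy stand-in for a Q-network: 2-d state in, one Q-value per action out.
# (Hypothetical example, not DeepQLearning.jl internals.)
qnetwork(s) = [sum(s), -sum(s), 0.0]   # Q(s, ·) for 3 actions

s = [0.5, -0.1]              # observation vector of size 2
qvals = qnetwork(s)          # vector of 3 Q-values, one per action
best_action = argmax(qvals)  # greedy action = index of the largest Q-value
```

The key point is that the network is evaluated once per state and returns Q(s, a) for every action at once, rather than taking (s, a) pairs as input.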
from deepqlearning.jl.
Thanks for the tip.
I think my confusion is related to thinking about learning the Q-function. If the NN is approximating Q(s,a) then I figured the input needed to be s and a. Then the output is the estimate for Q.
From what you are saying I think the input is the state (in this case size 2) and the output is an array of Q(s,a) for all possible actions, a. I think that makes sense.
Yea, ok, I just dug into the policy.jl file and can see that the qnetwork is expected to return all of the Q values, so that action(policy, state) can return the action associated with max(Q). I'll see if I can get this going, thanks!
vals = policy.qnetwork(obatch)
return policy.action_map[argmax(vals)]
MWE:
using POMDPs, POMDPModels, Flux, DeepQLearning
mdp = MountainCar()                              # 2-d observation (position, velocity), 3 actions
model = Chain(Dense(2, 32, relu), Dense(32, 3))  # state in, one Q-value per action out
solver = DeepQLearningSolver(qnetwork=model)
solve(mdp, solver)
Yup, I got it working...ish. It runs, now I have to actually tune it and see if I can get up the hill.
Thanks for the help!
I have another, sort of adjacent question. I am trying to add an exploration bonus using the exploration_policy argument but I am having trouble. The idea is to penalize Q(s,a) if that particular action at that state has been tried before. I think this is already being tracked somewhere in the replay and experience objects but I haven't deciphered it yet. Is there a way to access those objects from an exploration_policy? From the looks of it, the replay object is lost in dqntrain!(...), which only returns a policy.
There is no API to access the replay buffer during training; it would probably require changes in the algorithm.
Adding bonuses should be done through the reward function rather than the exploration policy.
With the current interface your best chance would be to augment the state and design a new reward function.
If you are looking for more sophisticated approaches like curiosity-driven RL, this would require significant changes to the current implementation.
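A minimal sketch of the reward-shaping route suggested above, assuming state-action pairs can be hashed (e.g. a discretized state); the names visits and shaped_reward are hypothetical and not part of DeepQLearning.jl:

```julia
# Count-based exploration bonus implemented through the reward function,
# rather than the exploration policy. (Hypothetical sketch.)
visits = Dict{Tuple{Int,Int},Int}()   # (state, action) -> visit count

# Shaped reward: base reward plus a bonus that shrinks as (s, a) is revisited.
function shaped_reward(base_r, s, a; scale = 0.1)
    n = get!(visits, (s, a), 0)       # current visit count (0 if unseen)
    visits[(s, a)] = n + 1            # record this visit
    return base_r + scale / sqrt(n + 1)
end
```

In practice you would call something like this inside a wrapper MDP whose reward function augments the original one, which also requires carrying the visit counts (or an augmented state) alongside the problem definition.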
Related Issues (20)
- Support last Flux version with Float32
- Logging Training Information
- Fix avgR in terminal while training
- TensorBoardLogger.jl New Version Compatability
- Compilation Error
- Type of the discount factor.
- Use only RLInterface.jl interface
- DQExperience should support AbstractArrays
- Support of AbtractEnvironment
- Problem with reading log files
- Exploration Policy requires a (PO)MDP
- Automatically convert to Float32
- TagBot trigger issue
- Error: Can't differentiate loopinfo expression
- Question: How would you make a decay schedule for prioritized replay alpha/beta?
- Tests still contain "using RLInterface"
- Deprecation of `loadparams!`
- Action masking feature (legal actions)
- GPU support