deepqlearning.jl's Issues

Switch to Flux.jl

The current package relies on TensorFlow.jl; it might be interesting to test an implementation using Flux.jl, since it seems to be the future of deep learning in Julia.
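
For comparison, defining and evaluating a Q-network in Flux.jl takes only a few lines. This is a minimal sketch; the state dimension and action count below are illustrative and not taken from the current TensorFlow.jl implementation:

using Flux

qnetwork = Chain(Dense(4, 32, relu), Dense(32, 2))   # state dimension 4, 2 actions (illustrative)
s = rand(Float32, 4)                                  # dummy state vector
qvalues = qnetwork(s)                                 # forward pass: one Q-value per action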

Compilation Error

The following error is thrown when attempting to use DeepQLearning.jl:
ERROR: LoadError: LoadError: UndefVarError: Tracker not defined

This appears to be a Flux issue.
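
One possible workaround, assuming the installed DeepQLearning release still targets the Tracker-based automatic differentiation (Tracker was split out of Flux starting with Flux v0.10), is to pin Flux to an older release. Treat this as a guess rather than a confirmed fix:

using Pkg
Pkg.add(PackageSpec(name="Flux", version="0.9"))   # pin to a Tracker-era Flux release
# Alternatively, updating DeepQLearning itself to a Zygote-based release may resolve it.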

Dimension Mismatch

I am having trouble debugging my use of the DeepQLearning package and looking for some help.

My problem is the mountain car problem: the state represents the position and velocity of the car, and the action is the force you apply in order to climb the mountain. The car starts in a valley and needs to climb out to get the reward. The force alone is not enough to reach the top, so you need to build up momentum to get up the hill.

The state is a 2-element StaticArrays.SArray{Tuple{2},Float64,1,2} with indices SOneTo(2).
The action space is RealInterval{Float64}(-1.0, 1.0), but I discretized this.

My network is as follows:
#Define the Q Network (input {state,action}, return Q(s,a))
activation = leakyrelu;
inputlayer = Dense(3,50,activation); # Input is the size of the state-action pair
hiddenlayer1 = Dense(50,50,activation);
outputlayer = Dense(50,1,activation);
model = Chain(inputlayer,hiddenlayer1,outputlayer)

The environment is an MDPEnvironment and my solver is a DeepQLearningSolver, but running the following results in a dimension mismatch:
policy = solve(solver,env)
DimensionMismatch("A has dimensions (50,3) but B has dimensions (2,32)")

I followed the stacktrace, and it's happening in the Flux library here:
function (a::Dense)(x::AbstractArray)
W, b, σ = a.W, a.b, a.σ
σ.(W*x .+ b)
end

But I have no idea why any of these matrices would be size (2,32). I assume it's this AbstractArray x, since the weights would be size (50,3)... but shouldn't x just be the input, size (3,1)? Is it trying to do some batch processing or something? But that doesn't explain how we get to (2,32). The only place I can imagine those numbers coming from is the weight matrix itself being an Array{Float32,2}, which makes no sense but would match up if it's somehow getting transposed... I'm not sure if this is a bug or if I am implementing this incorrectly. Any thoughts would be greatly appreciated. Thanks!
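
For reference, the gridworld example further down this page sizes the network with the state as the input and one Q-value per action as the output, rather than taking a state-action pair; the (2,32) array in the error is consistent with a batch of 32 two-dimensional state vectors being fed to the first layer. A sketch of that convention for this problem (the action count here is illustrative) would be:

using Flux

n_actions = 7                              # illustrative size of the discretized action set
model = Chain(Dense(2, 50, leakyrelu),     # input size = state dimension (2), not state + action
              Dense(50, 50, leakyrelu),
              Dense(50, n_actions))        # output = one Q-value per action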

The entire Stacktrace is below for reference:
DimensionMismatch("A has dimensions (50,3) but B has dimensions (2,32)")
Stacktrace:
[1] gemm_wrapper!(::Array{Float32,2}, ::Char, ::Char, ::Array{Float32,2}, ::Array{Float32,2}, ::LinearAlgebra.MulAddMul{true,true,Float32,Float32}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/matmul.jl:545
[2] mul! at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/matmul.jl:160 [inlined]
[3] mul! at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/matmul.jl:203 [inlined]
[4] * at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.3/LinearAlgebra/src/matmul.jl:153 [inlined]
[5] (::Dense{typeof(leakyrelu),Array{Float32,2},Array{Float32,1}})(::Array{Float32,2}) at /Users/liamsmith/.julia/packages/Flux/NpkMm/src/layers/basic.jl:115
[6] applychain at /Users/liamsmith/.julia/packages/Flux/NpkMm/src/layers/basic.jl:126 [inlined]
[7] Chain at /Users/liamsmith/.julia/packages/Flux/NpkMm/src/layers/basic.jl:32 [inlined]
[8] batch_train!(::DeepQLearningSolver, ::MDPEnvironment{Array{Float32,1},QuickPOMDPs.QuickMDP{UUID("c4d31997-7cb6-478c-8b46-c104fdaf65ad"),StaticArrays.SArray{Tuple{2},Float64,1,2},Float64,NamedTuple{(:isterminal, :render, :initialstate, :gen, :actions, :discount),Tuple{DMUStudent.HW4.var"#3#10",DMUStudent.HW4.var"#4#11",DMUStudent.HW4.var"#2#9",DMUStudent.HW4.var"#1#8",DMUStudent.HW4.RealInterval{Float64},Float64}}},StaticArrays.SArray{Tuple{2},Float64,1,2},Random.MersenneTwister,false}, ::NNPolicy{QuickPOMDPs.QuickMDP{UUID("c4d31997-7cb6-478c-8b46-c104fdaf65ad"),StaticArrays.SArray{Tuple{2},Float64,1,2},Float64,NamedTuple{(:isterminal, :render, :initialstate, :gen, :actions, :discount),Tuple{DMUStudent.HW4.var"#3#10",DMUStudent.HW4.var"#4#11",DMUStudent.HW4.var"#2#9",DMUStudent.HW4.var"#1#8",DMUStudent.HW4.RealInterval{Float64},Float64}}},Chain{Tuple{Dense{typeof(leakyrelu),Array{Float32,2},Array{Float32,1}},Dense{typeof(leakyrelu),Array{Float32,2},Array{Float32,1}},Dense{typeof(leakyrelu),Array{Float32,2},Array{Float32,1}}}},Float64}, ::ADAM, ::Chain{Tuple{Dense{typeof(leakyrelu),Array{Float32,2},Array{Float32,1}},Dense{typeof(leakyrelu),Array{Float32,2},Array{Float32,1}},Dense{typeof(leakyrelu),Array{Float32,2},Array{Float32,1}}}}, ::PrioritizedReplayBuffer{Int32,Float32,CartesianIndex{2},1}) at /Users/liamsmith/.julia/packages/DeepQLearning/wF0rJ/src/solver.jl:208
[9] dqn_train!(::DeepQLearningSolver, ::MDPEnvironment{Array{Float32,1},QuickPOMDPs.QuickMDP{UUID("c4d31997-7cb6-478c-8b46-c104fdaf65ad"),StaticArrays.SArray{Tuple{2},Float64,1,2},Float64,NamedTuple{(:isterminal, :render, :initialstate, :gen, :actions, :discount),Tuple{DMUStudent.HW4.var"#3#10",DMUStudent.HW4.var"#4#11",DMUStudent.HW4.var"#2#9",DMUStudent.HW4.var"#1#8",DMUStudent.HW4.RealInterval{Float64},Float64}}},StaticArrays.SArray{Tuple{2},Float64,1,2},Random.MersenneTwister,false}, ::NNPolicy{QuickPOMDPs.QuickMDP{UUID("c4d31997-7cb6-478c-8b46-c104fdaf65ad"),StaticArrays.SArray{Tuple{2},Float64,1,2},Float64,NamedTuple{(:isterminal, :render, :initialstate, :gen, :actions, :discount),Tuple{DMUStudent.HW4.var"#3#10",DMUStudent.HW4.var"#4#11",DMUStudent.HW4.var"#2#9",DMUStudent.HW4.var"#1#8",DMUStudent.HW4.RealInterval{Float64},Float64}}},Chain{Tuple{Dense{typeof(leakyrelu),Array{Float32,2},Array{Float32,1}},Dense{typeof(leakyrelu),Array{Float32,2},Array{Float32,1}},Dense{typeof(leakyrelu),Array{Float32,2},Array{Float32,1}}}},Float64}, ::PrioritizedReplayBuffer{Int32,Float32,CartesianIndex{2},1}) at /Users/liamsmith/.julia/packages/DeepQLearning/wF0rJ/src/solver.jl:136
[10] solve(::DeepQLearningSolver, ::MDPEnvironment{Array{Float32,1},QuickPOMDPs.QuickMDP{UUID("c4d31997-7cb6-478c-8b46-c104fdaf65ad"),StaticArrays.SArray{Tuple{2},Float64,1,2},Float64,NamedTuple{(:isterminal, :render, :initialstate, :gen, :actions, :discount),Tuple{DMUStudent.HW4.var"#3#10",DMUStudent.HW4.var"#4#11",DMUStudent.HW4.var"#2#9",DMUStudent.HW4.var"#1#8",DMUStudent.HW4.RealInterval{Float64},Float64}}},StaticArrays.SArray{Tuple{2},Float64,1,2},Random.MersenneTwister,false}) at /Users/liamsmith/.julia/packages/DeepQLearning/wF0rJ/src/solver.jl:58
[11] top-level scope at In[26]:1

GPU support

Can a maintainer comment on the current state of GPU support?
The README says that the gpu-support branch should be used, but that branch was last updated five years ago.
Given all the changes that have happened in the meantime, I guess it might be easier to create a new branch and add GPU support from scratch. Also, why a separate branch? Couldn't it be an option? It would be great if someone could provide some insight.
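
Purely as a hypothetical sketch of what an opt-in option (rather than a separate branch) could look like, using Flux's standard gpu helper; none of this reflects the actual gpu-support branch:

using Flux, CUDA

# Hypothetical helper: move the network to the GPU only when requested and when a
# functional CUDA device is present, otherwise stay on the CPU.
maybe_gpu(model, use_gpu::Bool) = (use_gpu && CUDA.functional()) ? gpu(model) : model

model = Chain(Dense(2, 32, relu), Dense(32, 4))
model = maybe_gpu(model, true)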

Avoid using env.state

The solver makes use of env.state to resolve conflicts caused by changes to the mutable env object during both exploration and evaluation.

See the comments in cf60925.

Problem with reading log files

Hi,

I'm attempting to read in the log files generated by TensorBoardLogger, but am having some issues. When I try the method for de-serialization recommended in the TensorBoardLogger docs I get an error regarding crc headers, so I'm wondering if there's a specific method that works for reading the logs generated from this package. I've included the error message below.
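
The de-serialization call I'm making follows the TensorBoardLogger docs roughly like this (the log-directory path is illustrative, and the exact callback signature may differ between versions):

using TensorBoardLogger

map_summaries("log/my_run") do tag, step, value
    println(tag, " @ ", step, " = ", value)   # or collect (step, value) pairs per tag
end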

Alternatively, if there's a way to plot learning curves without reading in the log files that would also be helpful.

Thanks

ERROR: AssertionError: crc_header == crc_header_ck
Stacktrace:
[1] read_event(::IOStream) at /home/ben/.julia/packages/TensorBoardLogger/gv4oF/src/Deserialization/deserialization.jl:16
[2] iterate(::TensorBoardLogger.TBEventFileIterator, ::Int64) at /home/ben/.julia/packages/TensorBoardLogger/gv4oF/src/Deserialization/deserialization.jl:84
[3] iterate at /home/ben/.julia/packages/TensorBoardLogger/gv4oF/src/Deserialization/deserialization.jl:83 [inlined]
[4] iterate(::TensorBoardLogger.TBEventFileCollectionIterator, ::Int64) at /home/ben/.julia/packages/TensorBoardLogger/gv4oF/src/Deserialization/deserialization.jl:59
[5] iterate at /home/ben/.julia/packages/TensorBoardLogger/gv4oF/src/Deserialization/deserialization.jl:52 [inlined]
[6] #map_summaries#158(::Bool, ::Nothing, ::Nothing, ::Bool, ::typeof(map_summaries), ::var"#6#7", ::String) at /home/ben/.julia/packages/TensorBoardLogger/gv4oF/src/Deserialization/deserialization.jl:211
[7] map_summaries(::Function, ::String) at /home/ben/.julia/packages/TensorBoardLogger/gv4oF/src/Deserialization/deserialization.jl:205
[8] top-level scope at REPL[36]:1

Error: Can't differentiate loopinfo expression

Running the example given in the docs:

using DeepQLearning
using POMDPs
using Flux
using POMDPModels
using POMDPSimulators
using POMDPPolicies

# load MDP model from POMDPModels or define your own!
mdp = SimpleGridWorld();

# Define the Q network (see Flux.jl documentation)
# the gridworld state is represented by a 2 dimensional vector.
model = Chain(Dense(2, 32), Dense(32, length(actions(mdp))))

exploration = EpsGreedyPolicy(mdp, LinearDecaySchedule(start=1.0, stop=0.01, steps=10000/2))

solver = DeepQLearningSolver(qnetwork = model, max_steps=10000, 
                             exploration_policy = exploration,
                             learning_rate=0.005,log_freq=500,
                             recurrence=false,double_q=true, dueling=true, prioritized_replay=true)
policy = solve(solver, mdp)

sim = RolloutSimulator(max_steps=30)
r_tot = simulate(sim, mdp, policy)
println("Total discounted reward for 1 simulation: $r_tot")

produces the following error. Where should I look to fix this? I am using Julia 1.6.

ERROR: Can't differentiate loopinfo expression
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] macro expansion
    @ ./simdloop.jl:79 [inlined]
  [3] Pullback
    @ ./reduce.jl:243 [inlined]
  [4] (::typeof(∂(mapreduce_impl)))(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/6HN9x/src/compiler/interface2.jl:0
  [5] Pullback
    @ ./reduce.jl:257 [inlined]
  [6] (::typeof(∂(mapreduce_impl)))(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/6HN9x/src/compiler/interface2.jl:0
  [7] Pullback
    @ ./reduce.jl:415 [inlined]
  [8] (::typeof(∂(_mapreduce)))(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/6HN9x/src/compiler/interface2.jl:0
  [9] Pullback
    @ ./reducedim.jl:318 [inlined]
 [10] Pullback (repeats 2 times)
    @ ./reducedim.jl:310 [inlined]
 [11] (::typeof(∂(mapreduce)))(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/6HN9x/src/compiler/interface2.jl:0
 [12] Pullback
    @ ./reducedim.jl:878 [inlined]
 [13] (::typeof(∂(#_sum#682)))(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/6HN9x/src/compiler/interface2.jl:0
 [14] Pullback
    @ ./reducedim.jl:878 [inlined]
 [15] (::typeof(∂(_sum)))(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/6HN9x/src/compiler/interface2.jl:0
 [16] Pullback (repeats 2 times)
    @ ./reducedim.jl:874 [inlined]
 [17] (::typeof(∂(sum)))(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/6HN9x/src/compiler/interface2.jl:0
 [18] Pullback
    @ ~/.julia/packages/DeepQLearning/jJkAu/src/solver.jl:223 [inlined]
 [19] (::typeof(∂(λ)))(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/6HN9x/src/compiler/interface2.jl:0
 [20] (::Zygote.var"#69#70"{Zygote.Params, typeof(∂(λ)), Zygote.Context})(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/6HN9x/src/compiler/interface.jl:252
 [21] gradient(f::Function, args::Zygote.Params)
    @ Zygote ~/.julia/packages/Zygote/6HN9x/src/compiler/interface.jl:59
 [22] batch_train!(solver::DeepQLearningSolver{EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, NTuple{4, Symbol}}}, env::POMDPModelTools.MDPCommonRLEnv{AbstractArray{Float32, N} where N, SimpleGridWorld, StaticArrays.SVector{2, Int64}}, policy::NNPolicy{SimpleGridWorld, DeepQLearning.DuelingNetwork, Symbol}, optimizer::ADAM, target_q::DeepQLearning.DuelingNetwork, replay::PrioritizedReplayBuffer{Int32, Float32, CartesianIndex{2}, StaticArrays.SVector{2, Float32}, Matrix{Float32}}; discount::Float64)
    @ DeepQLearning ~/.julia/packages/DeepQLearning/jJkAu/src/solver.jl:219
 [23] batch_train!
    @ ~/.julia/packages/DeepQLearning/jJkAu/src/solver.jl:200 [inlined]
 [24] dqn_train!(solver::DeepQLearningSolver{EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, NTuple{4, Symbol}}}, env::POMDPModelTools.MDPCommonRLEnv{AbstractArray{Float32, N} where N, SimpleGridWorld, StaticArrays.SVector{2, Int64}}, policy::NNPolicy{SimpleGridWorld, DeepQLearning.DuelingNetwork, Symbol}, replay::PrioritizedReplayBuffer{Int32, Float32, CartesianIndex{2}, StaticArrays.SVector{2, Float32}, Matrix{Float32}})
    @ DeepQLearning ~/.julia/packages/DeepQLearning/jJkAu/src/solver.jl:138
 [25] solve(solver::DeepQLearningSolver{EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, NTuple{4, Symbol}}}, env::POMDPModelTools.MDPCommonRLEnv{AbstractArray{Float32, N} where N, SimpleGridWorld, StaticArrays.SVector{2, Int64}})
    @ DeepQLearning ~/.julia/packages/DeepQLearning/jJkAu/src/solver.jl:56
 [26] solve(solver::DeepQLearningSolver{EpsGreedyPolicy{LinearDecaySchedule{Float64}, Random._GLOBAL_RNG, NTuple{4, Symbol}}}, problem::SimpleGridWorld)
    @ DeepQLearning ~/.julia/packages/DeepQLearning/jJkAu/src/solver.jl:32
 [27] top-level scope
    @ REPL[11]:1

RNN Performance

The solver seems too slow when using an RNN. This statement should of course be backed by benchmarks (a rough sketch of such a benchmark follows the list below).

Potential performance issues:

  • the replay buffer
  • computing the loss function
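
A rough sketch of how the comparison could be set up, following the SimpleGridWorld example from the README; the layer sizes, step counts, and use of @time instead of a proper benchmarking harness are all illustrative:

using DeepQLearning, POMDPs, POMDPModels, POMDPPolicies, Flux

mdp = SimpleGridWorld()
exploration = EpsGreedyPolicy(mdp, LinearDecaySchedule(start=1.0, stop=0.01, steps=5000))

flat_net = Chain(Dense(2, 32, relu), Dense(32, length(actions(mdp))))
rnn_net  = Chain(LSTM(2, 32), Dense(32, length(actions(mdp))))

flat_solver = DeepQLearningSolver(qnetwork=flat_net, exploration_policy=exploration,
                                  max_steps=10000, recurrence=false)
rnn_solver  = DeepQLearningSolver(qnetwork=rnn_net, exploration_policy=exploration,
                                  max_steps=10000, recurrence=true)

@time solve(flat_solver, mdp)   # feed-forward baseline
@time solve(rnn_solver, mdp)    # recurrent version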

Use only RLInterface.jl interface

One of my students discovered that this package uses POMDPs.actionindex. Would it be possible to make the package only use functions from the RLInterface.jl interface? (i.e. we would need to construct our own action map)
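
A hypothetical sketch of such an action map, built only from the environment's action set instead of POMDPs.actionindex; the env object and actions(env) are assumed to follow the RLInterface.jl interface:

# Hypothetical helper: build an action-index map from whatever actions(env) returns.
function build_action_map(env)
    acts = collect(actions(env))
    return Dict(a => i for (i, a) in enumerate(acts))
end

# action_map[a] could then replace POMDPs.actionindex(problem, a)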

Fix avgR in terminal while training

The logging of avgR in the terminal may be off; see below.

51000 / 1000000 eps 0.899 |  avgR -0.997 | Loss 9.240e-03 | Grad 5.585e-03 
51500 / 1000000 eps 0.898 |  avgR -0.997 | Loss 2.298e-02 | Grad 1.119e-02 
52000 / 1000000 eps 0.897 |  avgR -0.997 | Loss 8.478e-02 | Grad 7.080e-02 
52500 / 1000000 eps 0.896 |  avgR -0.997 | Loss 1.660e-02 | Grad 6.243e-03 
53000 / 1000000 eps 0.895 |  avgR -0.997 | Loss 1.105e-02 | Grad 4.012e-03 
53500 / 1000000 eps 0.894 |  avgR -0.997 | Loss 1.182e-02 | Grad 7.963e-03 

Support of AbstractEnvironment

This solver uses some functions that are broader than the minimal interface defined in RLInterface and relies on internal fields such as env.problem in many places.
Ideally, the solver should support an RL environment defined purely through RLInterface.jl, without necessarily having an MDP or POMDP object associated with it.

Exploration Policy requires a (PO)MDP

It would be nice to be able to use this package with only CommonRLInterface and not need to know anything about POMDPs.jl. Currently, the main thing preventing this is the exploration policy.
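
A hypothetical sketch of an epsilon-greedy exploration policy written against CommonRLInterface only, with no (PO)MDP object; all names here are illustrative:

using CommonRLInterface: actions
using Random

struct SimpleEpsGreedy{R<:AbstractRNG}
    eps::Float64
    rng::R
end

# Pick a random action with probability eps, otherwise keep the greedy action.
# Only CommonRLInterface.actions(env) is required from the environment.
function explore(p::SimpleEpsGreedy, env, greedy_action)
    return rand(p.rng) < p.eps ? rand(p.rng, collect(actions(env))) : greedy_action
end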

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

DQExperience should support AbstractArrays

From Piazza:

I'm troubleshooting the deep Q-learning package and I'm having a problem with the DQExperience object. The object is defined in 'prioritized_experience_replay.jl' as follows:

struct DQExperience{N <: Real, T <: Real, Q}
    s::Array{T, Q}
    a::N
    r::T
    sp::Array{T, Q}
    done::Bool
end

The problem is that our states are defined using the StaticArrays type.

Is there a reason that the state is constrained to an Array of real numbers? Is there a way to create a subtype of DQExperience, or something similar, that allows for the StaticArrays type?
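
One possible relaxation, not an official fix, would be to parameterize the struct on any AbstractArray so that SVector states are accepted directly:

# Sketch of a relaxed experience type; field names mirror the existing struct, but the
# state fields accept any AbstractArray of reals (e.g. an SVector) rather than a plain Array.
struct DQExperience{N<:Real, T<:Real, A<:AbstractArray{T}}
    s::A
    a::N
    r::T
    sp::A
    done::Bool
end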
