covid19's Issues

Example does not seem to work

Hello,

I don't know whether anyone is still working on this, so let me know if not. I was trying to follow the basic instructions and cannot seem to make them work. I downloaded the whole project repository and attempted to execute the following:

using DrWatson
quickactivate(@__DIR__)

using Pkg
Pkg.instantiate()

# using Covid19

I receive the following:

  ✗ Covid19
  0 dependencies successfully precompiled in 9 seconds (172 already precompiled, 7 skipped during auto due to previous errors)
  1 dependency errored. To see a full report either run `import Pkg; Pkg.precompile()` or load the package

Precompiling the package yields a dependency error.

Best,
Eric

To the GPU we go

One major difficulty of using CUDA.jl + anything Bayesian is that you immediately need to define all the (at the very least, univariate) distributions all over again. But for starters we don't need all of the functionality of a Distribution to do something like Bayesian inference; e.g. for AdvancedHMC.jl we really only need logpdf and its adjoint, since the only call to rand is going to be for the momentum (which can be sampled directly on the GPU using CUDA.randn).
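
For concreteness, here's a minimal sketch of the kind of computation we ultimately want to just work on the GPU (the logpdf broadcast is exactly the part that currently fails, for the reasons described below):

using CUDA, Zygote, StatsFuns

x = CUDA.rand(Float32, 1_000)                   # "observations", living on the device
ℓ(α, θ) = sum(StatsFuns.gammalogpdf.(α, θ, x))  # log-density as a single broadcast
Zygote.gradient(ℓ, 2f0, 3f0)                    # the adjoint AdvancedHMC.jl needs
p = CUDA.randn(Float32, 1_000)                  # momentum, sampled directly on the GPU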

But even trying to redefine the logpdf of a Distribution to work on the GPU is often non-trivial.

Issue #1

Before going into the issue, it's important to know the following:

  • All functions that you want to vectorize/broadcast need to be replaced with the corresponding CUDA function, e.g. Base.log and CUDA.log are actually different methods.
  • AFAIK, CUDA.jl achieves this by effectively converting all expressions of the form f.(args...) to CUDA.cufunc(f).(args...) whenever any of the args is a CUDA.CuArray, by overloading Broadcast.broadcasted (see the sketch below). E.g. CUDA.cufunc(::typeof(log)) = CUDA.log.
  • Therefore, if a function f does not already have a cufunc defined and you do f.(args...), you'll get an error if you're lucky; sometimes the entire Julia session will crash.
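
A rough sketch of this mechanism, simplified from CUDA.jl's src/broadcast.jl at the version discussed here (this is what the package does internally, not something you define yourself):

using CUDA

# Default: leave the function untouched; specific overloads swap in the
# GPU-compatible versions.
cufunc(f) = f
cufunc(::typeof(log)) = CUDA.log

# Broadcasting over a CuArray goes through CUDA's broadcast style, at which
# point the function being broadcast is rewritten via `cufunc`.
Broadcast.broadcasted(::CUDA.CuArrayStyle{N}, f, args...) where {N} =
    Broadcast.Broadcasted{CUDA.CuArrayStyle{N}}(cufunc(f), args, nothing)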

So what do we do?

  • Well, we define a cufunc(::typeof(f)) = ..., which allows you to broadcast f over CuArrays.
  • Great stuff! Then you go and try to take the gradient of this thing and everything breaks again. So you also need to define differentiation rules for this f.
    • AFAIK, for f on the GPU Zygote.jl uses ForwardDiff.jl to obtain the adjoints for broadcasting, and so we have to define these rules using DiffRules and evaluate them using ForwardDiff.
    • You then try to figure out where the current DiffRules definition for f is, and you copy-paste it, replacing the functions inside with their cufunc versions (see the sketch after this list).
  • Okay, so at this point we have all these lower-level functions with their corresponding cufunc definitions and their corresponding @define_diffrule, so we're good to go, right? Now we can just call StatsFuns.gammalogpdf.(α, θ, x), right?
    • Answer: No. AFAIK, there are several things that can fail:
      • A lot of the functions used within the Distributions.jl ecosystem are not pure Julia under the hood, but there are often pure-Julia versions for more generic number types, so that one can, for example, AD through them without issues, e.g. https://github.com/JuliaStats/StatsFuns.jl/blob/8dfda2c0ee33d5f85eca5c039d31d85c90f363f2/src/distrs/gamma.jl#L19. BUT this doesn't help compatibility with CUDA.jl, because the element types of a CUDA.CuArray aren't special, i.e. an element is just a Float32. And so the function we dispatch on when broadcasting over a CUDA.CuArray will be some function outside of the Julia ecosystem, and so things start blowing up. EDIT: this is only an issue for eltype Float64, not Float32, as pointed out by @devmotion!
      • Observing this, we specialize by overloading the method for Float32 and so on to use the pure-Julia implementation, e.g.
        StatsFuns.gammalogpdf(k::Float32, θ::Float32, x::Float32) = -SpecialFunctions.loggamma(k) - k * log(θ) + (k - 1) * log(x) - x / θ
      • BUT it still fails because the overloading of broadcasted won't be properly nested, so cufunc will only be called on StatsFuns.gammalogpdf, not on the functions used within it! So, we instead do
        cugammalogpdf(k, θ, x) = -CUDA.cufunc(SpecialFunctions.loggamma)(k) - k * CUDA.cufunc(log)(θ) + (k - 1) * CUDA.cufunc(log)(x) - x / θ
        CUDA.cufunc(::typeof(StatsFuns.gammalogpdf)) = cugammalogpdf
        which is really not fun.
      • I.e. it's not enough to define cufunc for the leaves of the method hierarchy! This sucks.
      • Upshot: for most AD frameworks (Zygote.jl and ForwardDiff.jl, because Zygote.jl uses ForwardDiff.jl for broadcasting) we get the AD rules for all non-leaves in the method hierarchy this way.
  • Of course there is work on doing automatic method substitution (JuliaGPU/GPUCompiler.jl#122), but in the meantime we need to go through the above process.
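
To make the by-hand workflow above concrete, here's a minimal sketch for a toy scalar function (mylogistic and cumylogistic are hypothetical names; the same pattern applies to e.g. SpecialFunctions.loggamma, and it assumes a CUDA.jl version that exposes CUDA.cufunc):

using CUDA, DiffRules

# A toy scalar function we want to broadcast over CuArrays.
mylogistic(x) = inv(one(x) + exp(-x))

# GPU version: every function called in the body is routed through
# CUDA.cufunc, so that e.g. `exp` becomes `CUDA.exp` on the device.
cumylogistic(x) = inv(one(x) + CUDA.cufunc(exp)(-x))
CUDA.cufunc(::typeof(mylogistic)) = cumylogistic

# Scalar derivative rule, d/dx σ(x) = σ(x)(1 - σ(x)), expressed in terms of
# GPU-safe functions so it can be evaluated during broadcasting.
DiffRules.@define_diffrule Main.cumylogistic(x) =
    :(cumylogistic($x) * (one($x) - cumylogistic($x)))

(Depending on versions, the new rule may additionally have to be wired into ForwardDiff.jl by hand, since ForwardDiff consumes DiffRules when it is loaded.)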

Potential solutions

  1. Do it by hand, but with the help of an "improved" version of the CUDA.@cufunc macro that I've implemented (https://github.com/JuliaGPU/CUDA.jl/blob/c011ffc0971ab1089f9d56dd338ef4b31e24ecc7/src/broadcast.jl#L101-L112) which has the following additional features:
    • Replaces all functions f in the body with cufunc(f), where the default impl is cufunc(f) = f. I.e. we do nothing to almost all functions, but replace those which have a cufunc impl.
    • Correctly handles namespaces, e.g. @cufunc SpecialFunctions.gamma(x) = ... is converted into cugamma(x) = ...; cufunc(::typeof(SpecialFunctions.gamma)) = cugamma.
    • [OPTIONAL] If f is present in DiffRules.diffrules(), then we extract the corresponding diffrule and replace all functions g within the diffrule with cufunc(g). I.e. if there is a scalar rule for f, then we make it CUDA-compatible (assuming the functions in the rule have a cufunc implementation); otherwise we leave it to ForwardDiff.jl to figure it out.
  2. Wait until the work on method-substitution is done.

Personally, I'm in favour of solution (1).
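
As a sketch of what solution (1) looks like at the call site, using the gammalogpdf example from above (the expansion shown is what the feature list implies, not verbatim macro output):

# Hypothetical usage of the improved @cufunc macro:
@cufunc StatsFuns.gammalogpdf(k, θ, x) =
    -SpecialFunctions.loggamma(k) - k * log(θ) + (k - 1) * log(x) - x / θ

# ...which would expand to roughly the definitions we wrote by hand above:
cugammalogpdf(k, θ, x) =
    -cufunc(SpecialFunctions.loggamma)(k) - k * cufunc(log)(θ) +
    (k - 1) * cufunc(log)(x) - x / θ
cufunc(::typeof(StatsFuns.gammalogpdf)) = cugammalogpdf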

Issue #2

Now, there's also an additional "annoyance" even after solving the above issue. We cannot do something like logpdf.(Gamma.(α, θ), x), because this will first do map(Gamma, ...) before calling logpdf. It's possible that the call to Gamma could have been inlined away completely once sufficiently lowered, but GPUCompiler.jl will complain before it reaches that stage (as this is not always guaranteed, plus I believe it will try to fuse all the broadcasts together into a single operation for efficiency). Therefore we either need to:

  1. Use the underlying methods directly, e.g. gammalogpdf.(α, θ, x).
  2. Define a Vectorize(D, args), e.g. Vectorize(Gamma, (α, θ)), which has a logpdf that lazily calls the underlying method, e.g. logpdf(v::Vectorize{Gamma}, x) = gammalogpdf.(v.args..., x) (see the sketch after this list). Equipped with this, we can speed up the implementation quite a bit by potentially doing something like:
    1. Overload broadcasted so that if we're using the CUDA.CuArrayStyle and f <: UnivariateDistribution, we materialize args earlier and then wrap them in Vectorize, i.e. Vectorize(f, args).
    2. Define a macro similar to Distributions.@__delegate_statsfuns or whatever to more easily define logpdf(v::Vectorize{D}, x) for different D.
      Worth mentioning that this requires a small redefinition of this method in Zygote (https://github.com/FluxML/Zygote.jl/blob/2b17256e79b2eca9a6512207284219d279398fc9/src/lib/broadcast.jl#L225-L228), though it should definitely be possible to make it work even though we're overloading broadcasted.
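
A minimal sketch of the Vectorize idea from (2) (all names are tentative, and it assumes the cufunc machinery for gammalogpdf from Issue #1 is in place):

using CUDA, Distributions, StatsFuns

# Lazily represents `D.(args...)` without ever constructing an array of
# distribution structs on the device.
struct Vectorize{D,T}
    args::T
end
Vectorize(::Type{D}, args) where {D} = Vectorize{D,typeof(args)}(args)

# `logpdf` forwards straight to the underlying scalar method, so the only
# thing broadcast on the GPU is a plain function of plain numbers.
Distributions.logpdf(v::Vectorize{Gamma}, x) = StatsFuns.gammalogpdf.(v.args..., x)

# Usage:
α, θ, x = CUDA.rand(10), CUDA.rand(10), CUDA.rand(10)
logpdf(Vectorize(Gamma, (α, θ)), x)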
