
AbstractDifferentiation.jl's Introduction

AbstractDifferentiation


Motivation

This package implements an abstract interface for differentiation in Julia. It is particularly useful for implementing abstract algorithms that require derivatives, gradients, Jacobians, Hessians, or several of these without depending on the user interfaces of specific automatic differentiation packages.

Julia has more (automatic) differentiation packages than you can count on two hands. Different packages have different user interfaces, so a backend-agnostic way to request, for example, a function's value and gradient is necessary to avoid a combinatorial explosion of code when every algorithm package that needs gradients tries to support every differentiation package in Julia. For higher-order derivatives the situation is even more dire, since any two differentiation backends can be combined into a new higher-order backend, as sketched below.
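For example, a minimal sketch (assuming both backend packages are loaded, and that AD.HigherOrderBackend combines backends outermost-first):

import AbstractDifferentiation as AD
using ForwardDiff, Zygote

# Forward-over-reverse: differentiate a reverse-mode gradient with forward mode.
second_order = AD.HigherOrderBackend((AD.ForwardDiffBackend(), AD.ZygoteBackend()))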

Getting started

  • If you are an autodiff user and want to write code in a backend-agnostic way, read the user guide in the docs.
  • If you are an autodiff developer and want your backend to implement the interface, read the implementer guide in the docs (still under construction).

Citing this package

If you use this package in your work, please cite the package:

@article{schafer2021abstractdifferentiation,
  title={AbstractDifferentiation.jl: Backend-Agnostic Differentiable Programming in Julia},
  author={Sch{\"a}fer, Frank and Tarek, Mohamed and White, Lyndon and Rackauckas, Chris},
  journal={NeurIPS 2021 Differentiable Programming Workshop},
  year={2021}
}


AbstractDifferentiation.jl's Issues

No method matching ForwardDiffBackend/ZygoteBackend. v1.9

On Julia 1.9, ZygoteBackend and ForwardDiffBackend "aren't defined":

ERROR: MethodError: no method matching AbstractDifferentiation.ForwardDiffBackend()

This is one example. I have installed both ForwardDiff and Zygote; the error occurs whether or not the corresponding package is loaded.

Allow arbitrary objects to be backends?

Currently the fallbacks in the package constrain the backend type to be an AbstractBackend. Is this necessary? E.g., as I think @oxinabox proposed on Slack in relation to #11, it would be nice if a user-created ChainRules.RuleConfig could be used directly as a backend. Otherwise, one would probably end up implementing a duplicate object or a loose wrapper around it.

How does this package intend to use these types in a way that can't be satisfied by overloading some interface functions?
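For illustration, a rough sketch of what dropping the constraint could enable (hypothetical; the current fallback signatures would have to be loosened first):

using Zygote
import AbstractDifferentiation as AD

ruleconfig = Zygote.ZygoteRuleConfig()  # a plain RuleConfig, no wrapper type
# Hypothetical: would require untyped (or duck-typed) fallbacks.
# AD.gradient(ruleconfig, x -> sum(abs2, x), rand(3))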

Zygote context cache incorrectly(?) persists between AD calls

You can see below that more objects accumulate in it with every call. I'm not an expert, but @oxinabox on Slack suggested this might be incorrect:

julia> let
           ad = AD.ZygoteBackend()
           AD.derivative(ad, exp, 1.)
           display(ad.ruleconfig.context.cache)
           AD.derivative(ad, exp, 1.)
           display(ad.ruleconfig.context.cache)
       end
IdDict{Any, Any} with 3 entries:
  Box(true)    => RefValue{Any}((contents = nothing,))
  Box(2.71828) => RefValue{Any}((contents = nothing,))
  Box(2.71828) => RefValue{Any}((contents = nothing,))
IdDict{Any, Any} with 6 entries:
  Box(2.71828) => RefValue{Any}((contents = nothing,))
  Box(2.71828) => RefValue{Any}((contents = nothing,))
  Box(true)    => RefValue{Any}((contents = nothing,))
  Box(2.71828) => RefValue{Any}((contents = nothing,))
  Box(true)    => RefValue{Any}((contents = nothing,))
  Box(2.71828) => RefValue{Any}((contents = nothing,))

I have a non-MWE where this seems to cause an error, but the error goes away if I manually wipe the cache. Not sure if it could also cause subtly wrong results. Perhaps it should be wiped before or after each pullback call?
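For reference, a minimal sketch of the manual wipe mentioned above (it reaches into Zygote internals and assumes the field layout shown in the snippet):

ad = AD.ZygoteBackend()
AD.derivative(ad, exp, 1.0)
empty!(ad.ruleconfig.context.cache)  # clear the IdDict between AD calls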

JET.jl reports possible errors on `AD.gradient` that do not appear with `Zygote.gradient`

I recently started to use JET.jl for CI, and it turns out that some errors are reported that do not exist when using Zygote.jl directly.

MWE (short version, the full JET output is in the txt file I attached):

julia> using AbstractDifferentiation, JET, Zygote

julia> f(x) = sum(abs2, x)
f (generic function with 1 method)

julia> x = rand(100);

julia> ab = AD.ZygoteBackend()
AbstractDifferentiation.ReverseRuleConfigBackend{Zygote.ZygoteRuleConfig{Zygote.Context}}(Zygote.ZygoteRuleConfig{Zygote.Context}(Zygote.Context(nothing)))

julia> @report_call vscode_console_output=stdout Zygote.gradient(f, x)
No errors detected

julia> @report_call vscode_console_output=stdout AD.gradient(ab, f, x)
═════ 31 possible errors found ═════

JET_report.txt

Better description of how to use and/or simple examples needed

As an unfamiliar user, I am left confused about how to use this package without examples or more documentation.

To use AbstractDifferentiation, first construct a backend instance ab::AD.AbstractBackend using your favorite differentiation package in Julia that supports AbstractDifferentiation.

How does one do this? Is my favorite differentiation package supposed to document this?

Also, some simple examples (i.e., basic polynomial functions, not complex ODEs) would help new users by showing how to use this package; see the sketch below.
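For what it's worth, here is the kind of minimal example I have in mind (a sketch, assuming ForwardDiff is installed and loaded):

import AbstractDifferentiation as AD
using ForwardDiff

f(x) = x[1]^2 + 2x[2]                # a basic polynomial
backend = AD.ForwardDiffBackend()
AD.gradient(backend, f, [1.0, 2.0])  # -> ([2.0, 2.0],)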

Use multiple arguments instead of a tuple for pushforward and pullback function?

It seems annoying that the pushforward and pullback functions accept tuples of (co)tangents instead of multiple arguments. Is there a compelling reason for this, or is it a design decision that could be changed? The main annoyance, in my opinion, is that tuples of length 1 must be handled specially (as e.g. in #51); it also makes it impossible to work with actual single-argument functions that take a tuple as their only argument, though maybe that is not needed anyway. Arguably it is also cleaner to provide multiple arguments as, well, multiple arguments instead of a tuple.
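To make the two conventions concrete, a sketch (the second call is the hypothetical proposal):

import AbstractDifferentiation as AD
using Zygote

backend = AD.ZygoteBackend()
f(x) = x .^ 2
pb = AD.pullback_function(backend, f, ones(3))

pb((ones(3),))   # current: cotangents wrapped in a tuple, even for one output
# pb(ones(3))    # proposed: cotangents passed as plain (multiple) arguments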

ForwardDiff is broken for differently sized inputs

julia> using AbstractDifferentiation, ForwardDiff

julia> AD = AbstractDifferentiation
AbstractDifferentiation

julia> ad = AD.ForwardDiffBackend()
AbstractDifferentiation.ForwardDiffBackend{Nothing}()

julia> AD.jacobian(ad, (x, y) -> sum(x .+ y), randn(2), randn(1)) # (!!!) Result should not be tuple of inputs!
(1.0, 2.0)
Environment
julia> versioninfo()
Julia Version 1.8.0
Commit 5544a0fab76 (2022-08-17 13:38 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 12 virtual cores

(jl_z9VwJo) pkg> st
Status `/tmp/jl_z9VwJo/Project.toml`
  [c29ec348] AbstractDifferentiation v0.4.3
  [f6369f11] ForwardDiff v0.10.32

(jl_z9VwJo) pkg> st --manifest
Status `/tmp/jl_z9VwJo/Manifest.toml`
  [c29ec348] AbstractDifferentiation v0.4.3
  [d360d2e6] ChainRulesCore v1.15.6
  [9e997f8a] ChangesOfVariables v0.1.4
  [bbf7d656] CommonSubexpressions v0.3.0
  [34da2185] Compat v4.3.0
  [163ba53b] DiffResults v1.1.0
  [b552c78f] DiffRules v1.11.1
  [ffbed154] DocStringExtensions v0.9.1
  [e2ba6199] ExprTools v0.1.8
  [f6369f11] ForwardDiff v0.10.32
  [3587e190] InverseFunctions v0.1.8
  [92d709cd] IrrationalConstants v0.1.1
  [692b3bcd] JLLWrappers v1.4.1
  [2ab3a3ac] LogExpFunctions v0.3.18
  [1914dd2f] MacroTools v0.5.10
  [77ba4419] NaNMath v1.0.1
  [21216c6a] Preferences v1.3.0
  [ae029012] Requires v1.3.0
  [276daf66] SpecialFunctions v2.1.7
  [90137ffa] StaticArrays v1.5.9
  [1e83bf80] StaticArraysCore v1.4.0
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [0dad84c5] ArgTools v1.1.1
  [56f22d72] Artifacts
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [f43a241f] Downloads v1.6.0
  [7b1f6079] FileWatching
  [b77e0a4c] InteractiveUtils
  [b27032c2] LibCURL v0.6.3
  [76f85450] LibGit2
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [d6f4376e] Markdown
  [ca575930] NetworkOptions v1.2.0
  [44cfe95a] Pkg v1.8.0
  [de0858da] Printf
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization
  [6462fe0b] Sockets
  [2f01184e] SparseArrays
  [10745b16] Statistics
  [fa267f1f] TOML v1.0.0
  [a4e569a6] Tar v1.10.0
  [8dfed614] Test
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
  [e66e0078] CompilerSupportLibraries_jll v0.5.2+0
  [deac9b47] LibCURL_jll v7.84.0+0
  [29816b5a] LibSSH2_jll v1.10.2+0
  [c8ffd9c3] MbedTLS_jll v2.28.0+0
  [14a3606d] MozillaCACerts_jll v2022.2.1
  [4536629a] OpenBLAS_jll v0.3.20+0
  [05823500] OpenLibm_jll v0.8.1+0
  [83775a58] Zlib_jll v1.2.12+3
  [8e850b90] libblastrampoline_jll v5.1.1+0
  [8e850ede] nghttp2_jll v1.48.0+0
  [3f19e933] p7zip_jll v17.4.0+0

Function as first argument?

Is there a specific reason for making the function f the second argument? It seems a bit unfortunate that this makes it impossible to use do syntax, which would work if f were the first argument, as in ForwardDiff or Zygote (and as suggested by the Julia style guide).
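For concreteness, a sketch of what an f-first signature would enable (hypothetical; this does not match the current argument order):

import AbstractDifferentiation as AD
using ForwardDiff

backend = AD.ForwardDiffBackend()
x = rand(3)

# Hypothetical f-first signature: `g(args...) do ... end` passes the closure
# as g's first argument, so do-block syntax would just work:
grad = AD.gradient(backend, x) do x
    sum(abs2, x)
end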

Jacobians for functions beyond vector-to-vector?

julia> using AbstractDifferentiation, Zygote

julia> function ft1(x)
       y1 = (x[1]*x[2])^x[3]
       y2 = (x[2]*x[3])^x[1]
       y3 = (x[3]*x[1])^x[2]
       [y1, y2, y3]
       end
ft1 (generic function with 1 method)

julia> function ft2(xs)
       ys = ft1.(eachcol(xs))
       hcat(ys...)
       end
ft2 (generic function with 1 method)

julia> r3 = rand(3, 8)
3×8 Matrix{Float64}:
 0.0354617  0.444021  0.161892  0.56656   0.92774   0.260982  0.839223  0.175217
 0.020074   0.185554  0.747159  0.850257  0.930541  0.451429  0.978923  0.937234
 0.213358   0.838412  0.562181  0.256845  0.743921  0.777094  0.207115  0.791544

julia> only(Zygote.jacobian(ft2, r3))
24×24 Matrix{Float64}:
  1.28169   2.26417  -1.54394     0.0        0.0        0.0        0.0        0.0          0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
 -4.4943    1.45594   0.136984    0.0        0.0        0.0        0.0        0.0           0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
  0.51321  -4.42796   0.0852995   0.0        0.0        0.0        0.0        0.0           0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
  0.0       0.0       0.0         0.232868   0.557241  -0.307858   0.0        0.0           0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
  0.0       0.0       0.0        -0.814452   1.04745    0.231817   0.0        0.0           0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
  0.0       0.0       0.0         0.347887  -0.822595   0.18424    0.0        0.0          0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
  0.0       0.0       0.0         0.0        0.0        0.0        1.05908    0.229477      0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
  0.0       0.0       0.0         0.0        0.0        0.0       -0.753767   0.188289      0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
  0.0       0.0       0.0         0.0        0.0        0.0        0.769966  -0.39986       0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
  0.0       0.0       0.0         0.0        0.0        0.0        0.0        0.0           0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
  0.0       0.0       0.0         0.0        0.0        0.0        0.0        0.0          0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
  0.0       0.0       0.0         0.0        0.0        0.0        0.0        0.0           0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
  0.0       0.0       0.0         0.0        0.0        0.0        0.0        0.0           0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
  0.0       0.0       0.0         0.0        0.0        0.0        0.0        0.0           0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
  0.0       0.0       0.0         0.0        0.0        0.0        0.0        0.0           0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
  0.0       0.0       0.0         0.0        0.0        0.0        0.0        0.0          0.326677  -0.405857   0.0        0.0        0.0        0.0        0.0        0.0
  0.0       0.0       0.0         0.0        0.0        0.0        0.0        0.0           0.439836   0.255509   0.0        0.0        0.0        0.0        0.0        0.0
  0.0       0.0       0.0         0.0        0.0        0.0        0.0        0.0          -0.776414   0.282692   0.0        0.0        0.0        0.0        0.0        0.0
  0.0       0.0       0.0         0.0        0.0        0.0        0.0        0.0           0.0        0.0        0.236948   0.203133  -0.188738   0.0        0.0        0.0
  0.0       0.0       0.0         0.0        0.0        0.0        0.0        0.0           0.0        0.0       -0.418176   0.224654   1.06182    0.0        0.0        0.0
  0.0       0.0       0.0         0.0        0.0        0.0        0.0        0.0          0.0        0.0        0.210367  -0.315561   0.852398   0.0        0.0        0.0
  0.0       0.0       0.0         0.0        0.0        0.0        0.0        0.0           0.0        0.0        0.0        0.0        0.0        1.08112    0.202117  -0.432339
  0.0       0.0       0.0         0.0        0.0        0.0        0.0        0.0           0.0        0.0        0.0        0.0        0.0       -0.283372   0.177422   0.210078
  0.0       0.0       0.0         0.0        0.0        0.0        0.0        0.0           0.0        0.0        0.0        0.0        0.0        0.839794  -0.310155   0.185898

julia> only(AD.jacobian(AD.ZygoteBackend(), ft2, r3))
ERROR: "The function `identity_matrix_like` is not defined for the type Matrix{Float64}."
Stacktrace:
 [1] identity_matrix_like(x::Matrix{Float64})
   @ AbstractDifferentiation C:\Users\Hossein Pourbozorg\.julia\packages\AbstractDifferentiation\o62DE\src\AbstractDifferentiation.jl:612
 [2] jacobian(ab::AbstractDifferentiation.ReverseRuleConfigBackend{Zygote.ZygoteRuleConfig{Zygote.Context}}, f::Function, xs::Matrix{Float64})
   @ AbstractDifferentiation C:\Users\Hossein Pourbozorg\.julia\packages\AbstractDifferentiation\o62DE\src\AbstractDifferentiation.jl:570
 [3] top-level scope
   @ REPL[22]:1
 [4] top-level scope
   @ C:\Users\Hossein Pourbozorg\.julia\packages\CUDA\tTK8Y\src\initialization.jl:52

How to use AbstractDifferentiation as a user?

Thanks for working on AbstractDifferentiation! It tackles a very relevant practical problem.

Unfortunately, I cannot figure out from the Readme how AbstractDifferentiation should be used.

The Readme says

"To use AbstractDifferentiation, first construct a backend instance ab::AD.AbstractBackend using your favorite differentiation package in Julia that supports AbstractDifferentiation."

but does not explain how to do this; see my failed attempt below (inspired by the test code).
Also, it would be nice if I could somehow list all available backends.

using AbstractDifferentiation

import Zygote
import ForwardDiff

# test function
foo(x) = sin(x[1]) + prod(x[2:end].^2)
x = rand(4)

# direct usage works
Zygote.gradient(foo, x)[1]
ForwardDiff.gradient(foo, x)


# Is this the correct way to create a Backend?
struct ForwardDiffBackend1 <: AD.AbstractForwardMode end
const forwarddiff_backend1 = ForwardDiffBackend1()

struct ZygoteBackend1 <: AD.AbstractReverseMode end
const zygote_backend1 = ZygoteBackend1()

# both fail with:
# MethodError: no method matching adjoint(::Nothing)
AD.gradient(zygote_backend1, foo, x) 
AD.gradient(forwarddiff_backend1, foo, x)

Code inside function with rrule should not run

MWE:

using Zygote
import ChainRulesCore.rrule
using ChainRulesCore: NoTangent
using AbstractDifferentiation

function myfunc(x)
    println("This should not print if I have an rrule.")
    x
end

rrule(::typeof(myfunc), x) = (x, (y -> (NoTangent(), y)))

println("Zygote run:")
Zygote.gradient(myfunc, 1) # nothing prints
println("AD run:")
AD.derivative(AD.ZygoteBackend(), myfunc, 1) # something prints

The code inside myfunc should never run. Besides possible inefficiency, this may lead to incorrect or confusing results for stateful or stochastic functions.

Expose tests through public API in src

If a package loads AbstractDifferentiation to implement the interface, it should also be able to test it. We have code for testing the interface, but that code lives in test/; see the hypothetical sketch below.
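A hypothetical sketch of what a public test entry point could look like (helper names are illustrative, modeled on the existing test code):

using AbstractDifferentiation

backend = MyBackend()  # an implementer's backend type
AbstractDifferentiation.test_derivatives(backend)  # hypothetical public helper
AbstractDifferentiation.test_jacobians(backend)    # hypothetical public helper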

Add sparsity functionality to the Jacobian and Hessian functions?

This would be the arguments:

  1. colorvec for the color vector of the independent directions
  2. sparsity for the sparsity pattern used in the decompression
  3. output, jac_prototype, hes_prototype, etc.: the matrix type to be used for the output matrix. If mutating, it is just the one supplied by the user.

The reason output can differ from sparsity is that a matrix may not be sparse enough for sparse LU factorization to be efficient, yet sparse differentiation may still substantially reduce the compute time. A hypothetical signature is sketched below.
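A hypothetical call signature combining these (all keyword names are illustrative, not an existing API):

J = AD.jacobian(backend, f, x;
    colorvec = colors,      # color vector of the independent directions
    sparsity = pattern,     # sparsity pattern used in the decompression
    jac_prototype = proto)  # matrix type used for the output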

Lazy Jacobian multiplication

I had to write this code recently for ImplicitDifferentiation.jl, and @mohamed82008 suggested it might feel at home here. Any opinions?

"""
    LazyJacobianMul!{M,N}

Callable structure wrapping a lazy Jacobian operator with `N`-dimensional inputs into an in-place multiplication for vectors.
# Fields
- `J::M`: the lazy Jacobian of the function
- `input_size::NTuple{N,Int}`: the array size of the function input
"""
struct LazyJacobianMul!{M<:LazyJacobian,N}
    J::M
    input_size::NTuple{N,Int}
end

"""
    LazyJacobianTransposeMul!{M,N}

Callable structure wrapping a lazy Jacobian operator with `N`-dimensional outputs into an in-place multiplication for vectors.
# Fields
- `J::M`: the lazy Jacobian of the function
- `output_size::NTuple{N,Int}`: the array size of the function output
"""
struct LazyJacobianTransposeMul!{M<:LazyJacobian,N}
    J::M
    output_size::NTuple{N,Int}
end

function (ljm::LazyJacobianMul!)(res::Vector, δinput_vec::Vector)
    (; J, input_size) = ljm
    δinput = reshape(δinput_vec, input_size)
    δoutput = only(J * δinput)
    return res .= vec(δoutput)
end

function (ljtm::LazyJacobianTransposeMul!)(res::Vector, δoutput_vec::Vector)
    (; J, output_size) = ljtm
    δoutput = reshape(δoutput_vec, output_size)
    δinput = only(δoutput' * J)
    return res .= vec(δinput)
end

Planned backends to implement

We should add backends for the following AD/FD packages:

  • ForwardDiff
  • ReverseDiff
  • FiniteDifferences
  • all ChainRules-supporting ADs (see #11, #39)
  • FiniteDiff
  • Tracker
  • Enzyme (#84)
  • Batched Zygote (#40 (comment))
  • SparseDiffTools
  • Symbolics

Complex pullback support

I realized that if the _dot function introduced in #21 is wrapped with real, then pullback_function works for complex arrays as well, whenever gradient supports complex inputs for a given backend. But at least in the scalar case, the real dot product can be computed twice as efficiently as real(_dot(x, y)).
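The scalar case makes the saving easy to see (a sketch; real_dot is an illustrative name):

# Real part of the complex dot product, computed directly:
real_dot(x::Complex, y::Complex) = real(x) * real(y) + imag(x) * imag(y)

# Equivalent but wasteful: conj(x) * y also computes the imaginary part,
# which real(...) then throws away.
real_dot_slow(x, y) = real(conj(x) * y)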

Include dedicated derivative functions for FiniteDifferences/ForwardDiff instead of relying on jacobians?

Hey all.

As it stands, calling AD.derivative for the FiniteDifferences and ForwardDiff backends first computes the Jacobian and then flattens it into the derivative. For a few edge cases, say a single-input function, this is significantly slower:

using FiniteDifferences, BenchmarkTools
import AbstractDifferentiation as AD

fdm = central_fdm(2,1,adapt=0)

fd = AD.FiniteDifferencesBackend(fdm)

with_AD(x) = AD.derivative(fd,sin,x)
without_AD(x) = fdm(sin,x)
blame_the_jacobian(x) = jacobian(fdm,sin,x)

@benchmark with_AD(1.)
@benchmark without_AD(1.)
@benchmark blame_the_jacobian(1.)

BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.070 μs … 552.240 μs  ┊ GC (min … max): 0.00% … 99.30%
 Time  (median):     1.160 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.327 μs ±   5.524 μs  ┊ GC (mean ± σ):  4.13% ±  0.99%

    ▅█
  ▂███▆▄▃▃▃▂▂▂▂▃▂▂▂▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  1.07 μs         Histogram: frequency by time        2.42 μs <

 Memory estimate: 944 bytes, allocs estimate: 17.

BenchmarkTools.Trial: 10000 samples with 961 evaluations.
 Range (min … max):  85.640 ns …  2.145 μs  ┊ GC (min … max): 0.00% … 93.60%
 Time  (median):     88.658 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   97.445 ns ± 46.875 ns  ┊ GC (mean ± σ):  0.98% ±  2.08%

  ▂█▇    ▁▄▅▄▃    ▄     ▁▁▁▁                                  ▁
  ███▇██▆██████▆▆███▄▅▆▇█████▇▆▅▃▄▅▅▄▅▆▆▅▆▇▇█▇▃▄▄▂▅▄▅▄▅▃▄▄▅▃▄ █
  85.6 ns      Histogram: log(frequency) by time       173 ns <

 Memory estimate: 32 bytes, allocs estimate: 2.

BenchmarkTools.Trial: 10000 samples with 111 evaluations.
 Range (min … max):  774.775 ns … 47.623 μs  ┊ GC (min … max): 0.00% … 97.59%
 Time  (median):     819.820 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   950.669 ns ±  1.825 μs  ┊ GC (mean ± σ):  7.82% ±  4.01%

  ▄▇█▆▄▂▂▁▁▁▁  ▃▄    ▁▁                                        ▁
  ██████████████████████▇▆▆▆▆▆▆▆▆▅▆▅▅▆▆▅▆▄▅▅▅▅▄▄▄▄▆▅▄▆▄▄▄▅▅▄▄▅ █
  775 ns        Histogram: log(frequency) by time      1.83 μs <

 Memory estimate: 864 bytes, allocs estimate: 14.

This is also the case for other, less silly examples like small neural networks with a single input. What are the reasons for not implementing the derivative directly? Something along the lines of:

function AD.derivative(ba::AD.FiniteDifferencesBackend, f, xs...)
    return (ba.method(f, xs...),)
end

Caching interfaces

All packages made for performance (ForwardDiff.jl, FiniteDiff.jl, SparseDiffTools.jl) include some kind of caching interface. For example, instead of ForwardDiff.jacobian(f, x), you should call ForwardDiff.jacobian(f, x, config); the config/cache object stores the work vectors. So it would be good to extend the interface to allow each backend an (optional) config struct, created on demand if not supplied by the user (which is how those packages do it anyway). A sketch follows.
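A hypothetical sketch of such an optional config argument (names are illustrative):

cfg = AD.JacobianConfig(backend, f, x)          # hypothetical: build the cache once
J1 = AD.jacobian(backend, f, x; config = cfg)   # hypothetical keyword: reuse buffers
J2 = AD.jacobian(backend, f, x2; config = cfg)
# Without an explicit config, the backend would create one on demand.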

Make value_and_pullback_function a primitive instead of pullback_function

Currently the internals use closures to avoid computing the primal more than once, but it would be easier to read, simpler, and more consistent with ChainRules, Zygote, Diffractor, etc. to instead make value_and_pullback_function the primitive. Should come after #4.

Not certain if the same should be done for pushforward_function.
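To make the contrast concrete, a sketch (the proposed primitive's call shape is hypothetical and shown commented out):

import AbstractDifferentiation as AD
using Zygote

backend = AD.ZygoteBackend()
f(x) = sum(abs2, x)
x = rand(3)

# Current primitive: the pullback alone; the primal is recovered via closures.
pb = AD.pullback_function(backend, f, x)

# Proposed primitive, mirroring rrule / Zygote.pullback:
# v, pb = AD.value_and_pullback_function(backend, f, x)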

Easy path to using anything with a ChainRulesCore.RuleConfig

We have talked about this before, just writing it down so it doesn't get lost.
A ChainRulesCore.RuleConfig with the right traits provides all the parts needed for getting the pushforward (frule_via_ad), the pullback (rrule_via_ad), or both. These primitives are enough to let AbstractDifferentiation.jl define everything else.

Any ChainRules-supporting AD that supports rules which call back into AD must define one of these; Zygote.ZygoteRuleConfig and Diffractor.DiffractorRuleConfig, for example.

Given such a RuleConfig type, it should be a one-liner to get the whole AbstractDifferentiation API.
We should provide a macro to make that so.
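For reference, the wrapper types already exist (as can be seen in outputs elsewhere in this tracker); the macro would remove the remaining boilerplate:

using Zygote
import AbstractDifferentiation as AD

# Existing wrapper style for reverse mode:
backend = AD.ReverseRuleConfigBackend(Zygote.ZygoteRuleConfig())

# Hypothetical one-liner (macro name is illustrative):
# AD.@rule_config_backend Zygote.ZygoteRuleConfig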

Establish a benchmark for performance regressions/improvements

The PRs contributing backends have used a simple benchmark to decide which overloads are needed. I suggest we formalize this into a benchmark suite we can run to assess the performance impact, for any backend, of any PR changing the fallback implementations.

API for Hessian approximation

Being able to get the diagonal of the Hessian, or a subset of the Hessian's elements, is sometimes useful in practice. We can think of supporting this as an API function; a hypothetical sketch follows.
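A hypothetical API sketch (function names are illustrative):

d  = AD.hessian_diagonal(backend, f, x)           # only the diagonal
hs = AD.hessian_elements(backend, f, x, indices)  # a chosen subset of entries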

Using AbstractDifferentiation and Zygote as package dependencies errors: `UndefVarError: ZygoteBackend not defined`

I wanted to abstract the AD calls in one of my packages that currently uses Zygote.
However, using the README example in a package results in an UndefVarError: ZygoteBackend not defined.

module TestADDependency

using AbstractDifferentiation, Zygote
ab = AD.ZygoteBackend()

end
(TestADDependency) pkg> status
Project TestADDependency v0.1.0
Status `~/Developer/TestADDependency/Project.toml`
  [c29ec348] AbstractDifferentiation v0.4.3
  [e88e6eb3] Zygote v0.6.40

(TestADDependency) pkg> precompile
Precompiling project...
  ✗ TestADDependency
  0 dependencies successfully precompiled in 3 seconds. 35 already precompiled.

ERROR: The following 1 direct dependency failed to precompile:

TestADDependency [34f2ebba-d68a-4a13-a6cd-2ada1134ccab]

Failed to precompile TestADDependency [34f2ebba-d68a-4a13-a6cd-2ada1134ccab] to /Users/funks/.julia/compiled/v1.8/TestADDependency/jl_EHrP9j.
ERROR: LoadError: UndefVarError: ZygoteBackend not defined
Stacktrace:
 [1] getproperty(x::Module, f::Symbol)
   @ Base ./Base.jl:31
 [2] top-level scope
   @ ~/Developer/TestADDependency/src/TestADDependency.jl:4
 [3] include
   @ ./Base.jl:422 [inlined]
 [4] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
   @ Base ./loading.jl:1400
 [5] top-level scope
   @ stdin:1
in expression starting at /Users/funks/Developer/TestADDependency/src/TestADDependency.jl:1
in expression starting at stdin:1

Here is a Minimum Working Repository that replicates the issue: https://github.com/adrhill/TestADDependency.jl

AD failure where Zygote succeeds

MWE:

julia> Zygote.gradient(nt -> nt.x^2, (x=1.,))
((x = 2.0,),)

julia> AD.gradient(AD.ZygoteBackend(), nt -> nt.x^2, (x=1.,))
ERROR: MethodError: no method matching adjoint(::ChainRulesCore.Tangent{NamedTuple{(:x,), Tuple{Float64}}, NamedTuple{(:x,), Tuple{Float64}}})
Closest candidates are:
  adjoint(::Union{QR, LinearAlgebra.QRCompactWY, QRPivoted}) at ~/.julia/juliaup/julia-1.7.3+0.x64/share/julia/stdlib/v1.7/LinearAlgebra/src/qr.jl:509
  adjoint(::Union{Cholesky, CholeskyPivoted}) at ~/.julia/juliaup/julia-1.7.3+0.x64/share/julia/stdlib/v1.7/LinearAlgebra/src/cholesky.jl:538
  adjoint(::Hessenberg) at ~/.julia/juliaup/julia-1.7.3+0.x64/share/julia/stdlib/v1.7/LinearAlgebra/src/hessenberg.jl:423
  ...
Stacktrace:
 [1] _broadcast_getindex_evalf
   @ ./broadcast.jl:670 [inlined]
 [2] _broadcast_getindex
   @ ./broadcast.jl:643 [inlined]
 [3] (::Base.Broadcast.var"#29#30"{Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(adjoint), Tuple{Tuple{ChainRulesCore.Tangent{NamedTuple{(:x,), Tuple{Float64}}, NamedTuple{(:x,), Tuple{Float64}}}}}}})(k::Int64)
   @ Base.Broadcast ./broadcast.jl:1075
 [4] ntuple
   @ ./ntuple.jl:48 [inlined]
 [5] copy
   @ ./broadcast.jl:1075 [inlined]
 [6] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(adjoint), Tuple{Tuple{ChainRulesCore.Tangent{NamedTuple{(:x,), Tuple{Float64}}, NamedTuple{(:x,), Tuple{Float64}}}}}})
   @ Base.Broadcast ./broadcast.jl:860
 [7] jacobian(ab::AbstractDifferentiation.ReverseRuleConfigBackend{Zygote.ZygoteRuleConfig{Zygote.Context{false}}}, f::Function, xs::NamedTuple{(:x,), Tuple{Float64}})
   @ AbstractDifferentiation ~/.julia/packages/AbstractDifferentiation/o62DE/src/AbstractDifferentiation.jl:591
 [8] gradient(ab::AbstractDifferentiation.ReverseRuleConfigBackend{Zygote.ZygoteRuleConfig{Zygote.Context{false}}}, f::Function, xs::NamedTuple{(:x,), Tuple{Float64}})
   @ AbstractDifferentiation ~/.julia/packages/AbstractDifferentiation/o62DE/src/AbstractDifferentiation.jl:48
 [9] top-level scope
   @ REPL[6]:1

Does this package expect all arguments to be scalars/vectors that have an adjoint defined? E.g. this works:

AD.gradient(AD.ZygoteBackend(), nt -> nt.x^2, ComponentVector(x=1.,))

(I'm hoping not, because it would definitely hurt usability not to be able to use Zygote's full capability, where this is not a requirement.)

AbstractDifferentiation v0.4.3
Zygote v0.6.44

What's the reason for the derivative / gradient difference?

Is there a reason to have a separate function derivative for the gradient with respect to a scalar value? I find that it tends to force special cases where code would just work for both vectors and scalars if there were a definition like:

AD.gradient(ad::AD.AbstractBackend, f, x::Number) = AD.derivative(ad, f, x)

value_gradient_and_hessian for ForwardDiff returns gradient of type Dual

julia> using AbstractDifferentiation

julia> ba = AD.ForwardDiffBackend();

julia> x = randn(3);

julia> AD.gradient(ba, sum, x)
([1.0, 1.0, 1.0],)

julia> AD.value_and_gradient(ba, sum, x)
(-0.056101093099405724, ([1.0, 1.0, 1.0],))

julia> AD.value_gradient_and_hessian(ba, sum, x)
(-0.056101093099405724, (ForwardDiff.Dual{ForwardDiff.Tag{ComposedFunction{typeof(AbstractDifferentiation.asarray), AbstractDifferentiation.var"#3#4"{AbstractDifferentiation.ForwardDiffBackend{Nothing}, AbstractDifferentiation.var"#9#10"{AbstractDifferentiation.ForwardDiffBackend{Nothing}, typeof(sum)}, Tuple{Vector{Float64}}}}, Float64}, Float64, 3}[Dual{ForwardDiff.Tag{ComposedFunction{typeof(AbstractDifferentiation.asarray), AbstractDifferentiation.var"#3#4"{AbstractDifferentiation.ForwardDiffBackend{Nothing}, AbstractDifferentiation.var"#9#10"{AbstractDifferentiation.ForwardDiffBackend{Nothing}, typeof(sum)}, Tuple{Vector{Float64}}}}, Float64}}(1.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{ComposedFunction{typeof(AbstractDifferentiation.asarray), AbstractDifferentiation.var"#3#4"{AbstractDifferentiation.ForwardDiffBackend{Nothing}, AbstractDifferentiation.var"#9#10"{AbstractDifferentiation.ForwardDiffBackend{Nothing}, typeof(sum)}, Tuple{Vector{Float64}}}}, Float64}}(1.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{ComposedFunction{typeof(AbstractDifferentiation.asarray), AbstractDifferentiation.var"#3#4"{AbstractDifferentiation.ForwardDiffBackend{Nothing}, AbstractDifferentiation.var"#9#10"{AbstractDifferentiation.ForwardDiffBackend{Nothing}, typeof(sum)}, Tuple{Vector{Float64}}}}, Float64}}(1.0,0.0,0.0,0.0)],), ([0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0],))

Enzyme support

Now that Enzyme is becoming more usable by more people, it's probably time to try adding support here.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

`AD.jacobian` much slower than `Zygote.jacobian`

Hi, and thanks for this amazing interface!

When computing jacobians, I recently noted a significant speed difference between standalone Zygote and Zygote used as an AD backend. The allocations also differ wildly. Do you happen to know where that comes from?

Here's a minimal working example:

julia> using AbstractDifferentiation, BenchmarkTools, Zygote

julia> f(x) = x .^ 2
f (generic function with 1 method)

julia> x = rand(100);

julia> ab = AD.ZygoteBackend()
AbstractDifferentiation.ReverseRuleConfigBackend{Zygote.ZygoteRuleConfig{Zygote.Context}}(Zygote.ZygoteRuleConfig{Zygote.Context}(Zygote.Context(nothing)))

julia> Zygote.jacobian(f, x)
([1.3044988022198039 0.0 … 0.0 0.0; 0.0 1.9710054432976252 … 0.0 0.0; … ; 0.0 0.0 … 1.9276513864536091 0.0; 0.0 0.0 … 0.0 1.6930015760093526],)

julia> AD.jacobian(ab, f, x)
([1.3044988022198039 0.0 … 0.0 0.0; 0.0 1.9710054432976252 … 0.0 0.0; … ; 0.0 0.0 … 1.9276513864536091 0.0; 0.0 0.0 … 0.0 1.6930015760093526],)

julia> @benchmark Zygote.jacobian(f, x)
BenchmarkTools.Trial: 7001 samples with 1 evaluation.
 Range (min … max):  629.524 μs …   5.403 ms  ┊ GC (min … max): 0.00% … 87.03%
 Time  (median):     681.270 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   712.290 μs ± 295.183 μs  ┊ GC (mean ± σ):  3.42% ±  6.90%

                    ▃█▄                                          
  ▁▁▁▁▁▁▂▁▁▁▁▂▂▂▂▁▁▃███▇▄▄▃▅▅▅▄▄▃▃▃▃▂▂▂▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  630 μs           Histogram: frequency by time          778 μs <

 Memory estimate: 427.06 KiB, allocs estimate: 4141.

julia> @benchmark AD.jacobian(ab, f, x)
BenchmarkTools.Trial: 1393 samples with 1 evaluation.
 Range (min … max):  2.673 ms … 64.564 ms  ┊ GC (min … max):  0.00% … 94.96%
 Time  (median):     2.999 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   3.583 ms ±  4.786 ms  ┊ GC (mean ± σ):  11.42% ±  8.30%

  ▂▇██▆▆▅▄▃▂                                                  
  █████████████▆▇█▆▅▅▅▅▅▅▅▅▄▅▁▆▁▁▁▁▄▄▁▄▁▄▁▅▄▄▁▁▁▄▁▅▄▄▁▁▁▁▄▄▄ █
  2.67 ms      Histogram: log(frequency) by time      8.1 ms <

 Memory estimate: 1.26 MiB, allocs estimate: 28023.

LoadError: UndefVarError: StaticArrays not defined

When both AbstractDifferentiation and ForwardDiff are imported, there is a warning:

┌ Warning: Error requiring `ForwardDiff` from `AbstractDifferentiation`
│   exception =
│    LoadError: UndefVarError: StaticArrays not defined
│    Stacktrace:
│      [1] include(mod::Module, _path::String)
│        @ Base .\Base.jl:419
│      [2] include(x::String)
│        @ AbstractDifferentiation C:\Users\Hossein Pourbozorg\.julia\packages\AbstractDifferentiation\o62DE\src\AbstractDifferentiation.jl:1
│      [3] top-level scope
│        @ C:\Users\Hossein Pourbozorg\.julia\packages\Requires\Z8rfN\src\Requires.jl:40
│      [4] eval
│        @ .\boot.jl:368 [inlined]
│      [5] eval
│        @ C:\Users\Hossein Pourbozorg\.julia\packages\AbstractDifferentiation\o62DE\src\AbstractDifferentiation.jl:1 [inlined]
│      [6] (::AbstractDifferentiation.var"#55#70")()
│        @ AbstractDifferentiation C:\Users\Hossein Pourbozorg\.julia\packages\Requires\Z8rfN\src\require.jl:101
│      [7] macro expansion
│        @ timing.jl:382 [inlined]
│      [8] err(f::Any, listener::Module, modname::String, file::String, line::Any)
│        @ Requires C:\Users\Hossein Pourbozorg\.julia\packages\Requires\Z8rfN\src\require.jl:47
│      [9] (::AbstractDifferentiation.var"#54#69")()
│        @ AbstractDifferentiation C:\Users\Hossein Pourbozorg\.julia\packages\Requires\Z8rfN\src\require.jl:100
│     [10] withpath(f::Any, path::String)
│        @ Requires C:\Users\Hossein Pourbozorg\.julia\packages\Requires\Z8rfN\src\require.jl:37
│     [11] (::AbstractDifferentiation.var"#53#68")()
│        @ AbstractDifferentiation C:\Users\Hossein Pourbozorg\.julia\packages\Requires\Z8rfN\src\require.jl:99
│     [12] #invokelatest#2
│        @ .\essentials.jl:729 [inlined]
│     [13] invokelatest
│        @ .\essentials.jl:726 [inlined]
│     [14] foreach(f::typeof(Base.invokelatest), itr::Vector{Function})
│        @ Base .\abstractarray.jl:2774
│     [15] loadpkg(pkg::Base.PkgId)
│        @ Requires C:\Users\Hossein Pourbozorg\.julia\packages\Requires\Z8rfN\src\require.jl:27
│     [16] #invokelatest#2
│        @ .\essentials.jl:729 [inlined]
│     [17] invokelatest
│        @ .\essentials.jl:726 [inlined]
│     [18] run_package_callbacks(modkey::Base.PkgId)
│        @ Base .\loading.jl:869
│     [19] _tryrequire_from_serialized(modkey::Base.PkgId, path::String, sourcepath::String, depmods::Vector{Any})
│        @ Base .\loading.jl:944
│     [20] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt64)
│        @ Base .\loading.jl:1028
│     [21] _require(pkg::Base.PkgId)
│        @ Base .\loading.jl:1315
│     [22] _require_prelocked(uuidkey::Base.PkgId)
│        @ Base .\loading.jl:1200
│     [23] macro expansion
│        @ .\loading.jl:1180 [inlined]
│     [24] macro expansion
│        @ .\lock.jl:223 [inlined]
│     [25] require(into::Module, mod::Symbol)
│        @ Base .\loading.jl:1144
│     [26] include(mod::Module, _path::String)
│        @ Base .\Base.jl:419
│     [27] exec_options(opts::Base.JLOptions)
│        @ Base .\client.jl:303
│     [28] _start()
│        @ Base .\client.jl:522
│    in expression starting at C:\Users\Hossein Pourbozorg\.julia\packages\AbstractDifferentiation\o62DE\src\forwarddiff.jl:1
└ @ Requires C:\Users\Hossein Pourbozorg\.julia\packages\Requires\Z8rfN\src\require.jl:51

Change number of Julia versions being tested

Currently CI tests against all Julia versions from 1.0 to 1.6 (not 1.7). This seems excessive. Wouldn't it be sufficient to test against the oldest supported version (1.0), the latest LTS (1.6), and the latest release (1, currently 1.7)?

ERROR: MethodError: no method matching ZygoteBackend()

julia> using AbstractDifferentiation

julia> AbstractDifferentiation.ZygoteBackend()
ERROR: MethodError: no method matching ZygoteBackend()
Stacktrace:
 [1] top-level scope
   @ REPL[3]:1

I know Zygote has to be imported for it to work.
I have it as a default value for a function, and I imported Zygote, but when I added some code using PrecompileTools to precompile the main functions of my package on Julia 1.9, it failed, probably because ZygoteBackend is an empty function.

https://github.com/impICNF/ICNF.jl/actions/runs/4822195143/jobs/8589112403#step:6:756

Expectations for implementations of the interface

Is it expected that all packages that implement the interface use AD.@primitive? e.g., ForwardDiff can use DiffResults to enable returning a value and a gradient and/or hessian. In cases like this, what is advised?

I can see a few possibilities:

  1. define @primitive pushforward_function(...) only to keep things simple
  2. define @primitive pushforward_function(...) but then overload methods like value_and_gradient for which the AD package has a more efficient approach.
  3. implement all necessary interface methods by hand

Edit: I'm guessing (2) is the preferred option? For many (most? all?) AD engines, the fallback to working in terms of jacobians will be much less efficient than overloading the corresponding methods.
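A sketch of option (2), hedged since the exact macro usage may differ; my_jvp and my_fused_value_and_gradient stand in for backend-specific code:

import AbstractDifferentiation as AD

struct MyBackend <: AD.AbstractBackend end

# Define one primitive via the macro...
AD.@primitive function pushforward_function(ba::MyBackend, f, xs...)
    return vs -> my_jvp(f, xs, vs)  # hypothetical backend-specific JVP
end

# ...then overload higher-level methods where the backend has a faster path,
# e.g. a fused value-and-gradient as DiffResults enables for ForwardDiff:
function AD.value_and_gradient(ba::MyBackend, f, x)
    return my_fused_value_and_gradient(f, x)  # hypothetical fused computation
end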

Adding support for ADTypes.jl

The SciML ecosystem has had its own "AD-backend" types since before AbstractDifferentiation.jl was created. Those have more recently been exposed in the standalone package ADTypes.jl.

From a user's point of view, that means we currently have two incompatible ways of specifying which AD to use, and in packages/algorithms/applications that rely on both SciML and non-SciML packages we can't easily offer users a way to select AD.

SciML is unlikely to switch to AbstractDifferentiation (SciML/ADTypes.jl#8): it would require breaking changes, and some corresponding types in ADTypes.jl and AbstractDifferentiation.jl have different content (so typedefs also won't work).

Speaking from a user's point of view again, it would be really nice to have a consistent way of selecting AD across the ecosystem. Could we add bi-directional conversion between ADTypes.AbstractADType and AbstractDifferentiation.AbstractBackend in AbstractDifferentiation.jl?

ADTypes.jl is extremely lightweight (no dependencies and a load time of 0.7 ms), so we could make it a (non-weak) dependency of AbstractDifferentiation.jl and add a

const ADBackendLike = Union{AbstractBackend, ADTypes.AbstractADType}

in AbstractDifferentiation itself (so packages can use it as a field type for algorithm configuration, etc.), and then provide bi-directional Base.convert methods for the AD types that do correspond (not all are supported by both "systems" at the moment) in the AbstractDifferentiation package extensions.
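For instance, a sketch with one illustrative pair of conversions:

Base.convert(::Type{ADTypes.AbstractADType}, ::AD.ForwardDiffBackend) =
    ADTypes.AutoForwardDiff()
Base.convert(::Type{AD.AbstractBackend}, ::ADTypes.AutoForwardDiff) =
    AD.ForwardDiffBackend()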

Would a PR in this direction be welcome?

CC @devmotion, @ChrisRackauckas, @Vaibhavdixit02

Support mutating functions

It was brought up on Slack that we may need to think about supporting a mutating API where buffers are stored in a more systematic abstract way. I am opening this issue to track the problem and discuss solutions.
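A hypothetical sketch of what such a mutating API could look like (the in-place variants below do not exist; all names are illustrative):

f(x) = x .^ 2
x = rand(10)

g = similar(x)
AD.gradient!(g, backend, f, x)   # hypothetical: write the gradient into g

J = zeros(length(f(x)), length(x))
AD.jacobian!(J, backend, f, x)   # hypothetical: write the Jacobian into J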
