Comments (4)
Just to be clear, let's address your cholfact example explicitly:
there are certainly operations (such as the Cholesky factorisation) for which symbolic expressions might be a bit unwieldy (see here).
I would never argue that cholfact/∇cholfact would be composed of other primitives; cholfact is a primitive itself. By declaring it as such in DiffRules (and pointing to the DiffLinearAlgebra kernel as the actual implementation), downstream tools could then support cholfact as a primitive automatically, without requiring any alteration in the downstream tools' code.
In order to make such a declaration in DiffRules, however, we still need to add API support for linear algebraic primitives (as I mentioned earlier).
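To make the shape of such a declaration concrete, here is a minimal sketch, assuming a registry keyed on the primitive's signature; the registry, the `declare_linalg_primitive!` helper, and the kernel name are all hypothetical, not the actual DiffRules/DiffLinearAlgebra API:

```julia
# Hypothetical sketch: a registry mapping a primitive's (module, name, arity)
# to the code implementing its reverse pass. Downstream tools would look the
# primitive up here instead of differentiating through its internals.
const LINALG_PRIMITIVES = Dict{Tuple{Symbol,Symbol,Int},Expr}()

function declare_linalg_primitive!(mod::Symbol, name::Symbol, arity::Int, reverse_code::Expr)
    LINALG_PRIMITIVES[(mod, name, arity)] = reverse_code
end

# The entry just points at a named kernel (here a made-up name standing in
# for a DiffLinearAlgebra implementation).
declare_linalg_primitive!(:LinearAlgebra, :cholfact, 1, :(chol_reverse_kernel(z̄, z, x)))

haskey(LINALG_PRIMITIVES, (:LinearAlgebra, :cholfact, 1))  # true
```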
from diffrules.jl.
Great, it looks like we're on pretty much the same page then.
Possibly our only difference of opinion is that I now can't see why you would ever want to have a library of "eager" kernels, as opposed to one that provides the code you need to automatically compile your own in a downstream package. I can't think of a situation in which exposing objects that contain
- function name
- argument types
- argument names
- code to implement the reverse pass for each argument
isn't strictly better than providing the first three things + the corresponding implemented method. A downstream package can clearly reconstruct the method given the code (it doesn't really matter how long the code for any particular sensitivity is), and as you pointed out it may be possible to perform optimisations given a symbolic representation of the sensitivity that you can't when you only have access to a method (i.e. it might be useful to perform a CSE optimisation when the sensitivities w.r.t. multiple arguments are required - if you compile a custom sensitivity on the fly using symbolic representations of the sensitivity w.r.t each argument, then you can do such optimisations).
What are your thoughts on this? I may be missing something obvious.
(On a related note, it might be an idea to replace the things in the last two bullet points with a function which accepts the argument names the downstream package wants to use, and returns code using those argument names)
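As a rough sketch of that last idea (the `LazyRule` type and its fields are hypothetical, not an existing API): the rule object stores the argument metadata plus a function that, given whatever argument names the downstream package wants to use, returns derivative code written against those names.

```julia
# Hypothetical sketch: a "lazy" rule that generates code for caller-chosen
# argument names, rather than shipping a pre-compiled method.
struct LazyRule
    name::Symbol
    argtypes::Vector{Symbol}
    make_derivative::Function   # argument names -> Expr for the derivative
end

# Example: d/dx sin(x) = cos(x), written against whatever name the caller picks.
sin_rule = LazyRule(:sin, [:Real], names -> :(cos($(names[1]))))

code = sin_rule.make_derivative([:u])   # returns the expression :(cos(u))
```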
from diffrules.jl.
A downstream package can clearly reconstruct the method given the code (it doesn't really matter how long the code for any particular sensitivity is), and as you pointed out it may be possible to perform optimisations given a symbolic representation of the sensitivity that you can't when you only have access to a method
We're in full agreement here.
it might be useful to perform a CSE optimisation when the sensitivities w.r.t. multiple arguments are required
Definitely. Actually, this made me think of a nice API change for helping with manually-optimized "multiple sensitivities" cases (e.g. where CSE etc. can't/doesn't suffice). Currently, DiffRules requires that the rule author provide a standalone derivative expression for each argument. Instead, we could require that rule authors mark differentiated variables explicitly, for example:
@define_diffrule M.f(wrt(x), wrt(y)) = expr_for_dfdx_and_dfdy($x, $y)
@define_diffrule M.f(wrt(x), y) = expr_for_dfdx($x, $y)
@define_diffrule M.f(x, wrt(y)) = expr_for_dfdy($x, $y)
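To illustrate the CSE gain in eager form (`hypot_sensitivities` is a hypothetical helper, not part of DiffRules): when the sensitivities of `hypot(x, y)` w.r.t. both arguments are needed, the shared subexpression `hypot(x, y)` can be computed once rather than once per standalone derivative expression.

```julia
# Sketch of the shared-work case: ∂f/∂x = x/hypot(x, y) and
# ∂f/∂y = y/hypot(x, y) share the hypot(x, y) subexpression.
function hypot_sensitivities(x, y)
    h = hypot(x, y)        # shared work, computed once
    return (x / h, y / h)  # (∂f/∂x, ∂f/∂y)
end

dx, dy = hypot_sensitivities(3.0, 4.0)  # (0.6, 0.8)
```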
Possibly our only difference of opinion is that I now can't see why you would ever want to have a library of "eager" kernels
Well, it's up to the rule author to decide the level of granularity of the function calls present in the derivative expression. On one extreme end of the spectrum, the rule author can inline as much as possible (i.e. compose the derivative expression using only Core.Builtins), while on the other end, the derivative expression can just contain a single call to an eager kernel. I imagine most cases naturally fall somewhere in between, and IMO, the line should just be drawn on a case-by-case basis by the rule author. There are advantages to both ends of the spectrum; as you mentioned, aggressive inlining provides downstream compilation tools with more raw material, but eager kernels can take advantage of multiple dispatch/method overloading earlier in the computation, and are (obviously) necessary for calling non-Julia code.
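A small illustration of the two ends of that spectrum, using d/dx tanh(x) = 1 - tanh(x)^2 (the kernel name is made up for the example):

```julia
# Fine-grained end: the derivative expression inlines the arithmetic,
# giving downstream compilation tools raw material to optimise.
inlined_rule = :(1 - tanh(x)^2)

# Coarse-grained end: the derivative expression is a single call to an
# eager kernel, which can dispatch on argument types or call non-Julia code.
kernel_rule = :(dtanh_kernel(x))

# The eager kernel implements the same mathematics behind one call.
dtanh_kernel(x) = 1 - tanh(x)^2

dtanh_kernel(0.0)  # 1.0
```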
(On a related note, it might be an idea to replace the things in the last two bullet points with a function which accepts the argument names the downstream package wants to use, and returns code using those argument names)
Yup, that's the way DiffRules currently works (you can interpolate Exprs as well).
from diffrules.jl.
I agree with all of the above.
I like your wrt proposal a lot - it solves the problem in a much cleaner way than we're currently allowing (although not properly exploiting) in Nabla / DiffLinearAlgebra.
Running with this, a reverse-mode rule could be something like:
@define_reverse z::Tz z̄::Tz̄ M.f(wrt(x::Tx), y::Ty) = expr_dOdx($z, $z̄, $x, $y)
@define_reverse z::Tz z̄::Tz̄ M.f(wrt(x::Tx), y::Ty) = expr_dOdx!($x̄, $z, $z̄, $x, $y) x̄::Tx̄
where I've just given the macro a different name and added a couple of extra terms at the front end to pass in the output z and its sensitivity z̄; the second rule is for in-place updates, for when x̄ already has a value. Similarly, a forward rule could be something like:
@define_forward ẋ::Tẋ ẏ::Tẏ M.f(x::Tx, y::Ty) = expr_dfdI($x, $y, $ẋ, $ẏ)
Does this sound reasonable?
The above doesn't directly address more complicated method definitions (e.g. involving diagonal dispatch), but I can't see any reason in principle that it couldn't be extended to handle that kind of thing. Also, I'm not sure the ordering of the arguments for the in-place @define_reverse is optimal.
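For concreteness, the in-place reverse-pass mathematics such a rule would encode for matrix multiplication Z = X * Y is X̄ += Z̄ * Y'; a minimal eager sketch (the function name is hypothetical):

```julia
using LinearAlgebra

# Hypothetical helper: the in-place reverse pass for Z = X * Y w.r.t. X
# accumulates into an existing x̄, matching the second rule form above.
function matmul_reverse_x!(X̄, Z̄, Y)
    mul!(X̄, Z̄, Y', 1.0, 1.0)   # 5-arg mul!: X̄ = Z̄ * Y' * 1.0 + X̄ * 1.0
    return X̄
end

X̄ = zeros(2, 2)
Z̄ = ones(2, 2)
Y = [1.0 0.0; 0.0 1.0]
matmul_reverse_x!(X̄, Z̄, Y)   # X̄ is now Z̄ * Y' == ones(2, 2)
```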
from diffrules.jl.