Comments (21)
In which cases is it not possible to achieve this behavior by using a closure?
from forwarddiff.jl.
I am using it inside an optimization problem (through MathProgBase). The user passes a function which is used to set up the constraint, and I use ForwardDiff.jl to automatically get the Jacobian and Hessian of the constraints, where the Jacobian and Hessian depend on other variables. I cannot use a closure because I construct the Jacobian and Hessian functions outside the optimization.
I could construct fad_jacobian and fad_hessian by creating a closure inside eval_jac_g, but this would kill performance.
Giuseppe Ragusa
On Wed, Aug 5, 2015 at 4:05 PM, Miles Lubin wrote:
In which cases is it not possible to achieve this behavior by using a closure?
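For the eval_jac_g case, one pattern that avoids both globals and per-call closures (a sketch, not something proposed in this thread; all names here are hypothetical) is a callable type that stores the extra arguments as fields:

```julia
# Sketch: a callable type ("functor") carries the extra arguments as fields,
# so nothing has to be rebuilt inside eval_jac_g.
struct ConstraintEval{F}
    f::F                 # the user-supplied constraint function
    λ::Vector{Float64}   # extra data, updated in place between calls
end

# Calling the object evaluates the constraint with the stored extras.
(c::ConstraintEval)(x) = c.λ[1] * c.f(x)

c = ConstraintEval(x -> sum(abs2, x), [2.0])
c([1.0, 2.0])        # 2 * (1 + 4) = 10.0
c.λ[1] = 3.0         # update the extra argument without rebuilding anything
c([1.0, 2.0])        # 3 * 5 = 15.0
```

Because the stored vector is mutable, the multipliers can be updated in place between solver callbacks, while the call itself stays type-stable.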
I might be misunderstanding, but does it help that the new API allows one to easily take derivatives/gradients/etc. at the provided point without having to build a closure first?
For example, using the old API, you have to construct ∇f before you can evaluate ∇f(x):
julia> using ForwardDiff
julia> f(x) = sin(x[1]) + cos(x[2])^2 + tan(x[3])^3
f (generic function with 1 method)
julia> g = forwarddiff_gradient(f, Float64)
gradf (generic function with 2 methods)
julia> g([1.0,2.0,3.0])
3-element Array{Float64,1}:
0.540302
0.756802
0.0621972
Using the new API, you can simply perform the evaluation ∇f(x) without actually constructing ∇f:
julia> using ForwardDiff
julia> f(x) = sin(x[1]) + cos(x[2])^2 + tan(x[3])^3
f (generic function with 1 method)
julia> gradient(f, [1.0, 2.0, 3.0])
3-element Array{Float64,1}:
0.540302
0.756802
0.0621972
Similar methods exist for Jacobians/Hessians/etc. The tentative documentation for the new API can be found in the README of #27's branch.
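For concreteness, the Jacobian/Hessian analogues look something like this (a sketch using module-qualified calls; the exact exported names were still in flux on the #27 branch):

```julia
using ForwardDiff

# Vector-valued function: the Jacobian is evaluated directly at the point.
f(x) = [x[1]^2 * x[2], sin(x[2])]
J = ForwardDiff.jacobian(f, [1.0, 2.0])   # [4.0 1.0; 0.0 cos(2.0)]

# Scalar-valued function: the Hessian follows the same call pattern.
g(x) = x[1]^2 + 3x[1] * x[2]
H = ForwardDiff.hessian(g, [1.0, 2.0])    # [2.0 3.0; 3.0 0.0]
```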
The new API is great. But it is still not up to the task for the sort of problems I am dealing with. To be more concrete, I tried to make a simple example where passing down arguments is necessary.
I have a function f(θ): R^k -> R^(n x m). This function is used to define another function h(u): R^(n+k) -> R^m:
function h(u)
    p = u[1:n]
    θ = u[n+1:end]
    vec(mean(p .* f(θ), 1))
end
The Jacobian with respect to u is given by [f(θ)'/n  ∂h(u)/∂θ].
With the new API I can use jacobian(h, [ones(n); 0]), which gives
3x7 Array{Float64,2}:
-0.000217836 0.0556389 0.0816585 0.000563994 0.112706 0.0099686 -0.332284
-0.00168171 -0.329863 -0.0724125 -0.0232947 -0.0784092 0.030865 -1.35021
-0.000336177 -0.0760003 -0.0558016 0.00272413 0.0931781 -0.056837 -0.796085
where the 3x6 block is f([0.0])'/6:
3x6 Array{Float64,2}:
-0.000217836 0.0556389 0.0816585 0.000563994 0.112706 0.0099686
-0.00168171 -0.329863 -0.0724125 -0.0232947 -0.0784092 0.030865
-0.000336177 -0.0760003 -0.0558016 0.00272413 0.0931781 -0.056837
In this case, jacobian is already calculating 18 more derivatives than it needs to.
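One structure-exploiting workaround under the current API (a sketch written for current Julia syntax; f below is a toy stand-in for the comment's f, with n = 6, m = 3, k = 1 as in the example) is to run AD only over θ and assemble the known p-block in closed form:

```julia
using ForwardDiff

# Toy stand-in for the thread's f: R^k -> R^(n×m)
n, m = 6, 3
f(θ) = [sin(i * θ[1] + j) for i in 1:n, j in 1:m]

p, θ0 = ones(n), [0.0]

# Differentiate h only with respect to θ, holding p fixed:
hθ(θ) = vec(sum(p .* f(θ), dims=1) ./ n)
Jθ = ForwardDiff.jacobian(hθ, θ0)      # m×k block ∂h/∂θ

# The p-block needs no AD at all; it is f(θ)'/n:
J = [f(θ0)' ./ n  Jθ]                  # full m×(n+k) Jacobian
```

Only the k θ-columns go through dual numbers; the 18 entries of the p-block come for free.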
Things get worse, since I am also interested in the Hessian of the following function:
function v(u)
    p = u[1:n]
    θ = u[n+1:n+1]
    λ = u[n+2:end]
    sum(p .* f(θ) * λ)
end
This is sparse, as the second derivatives with respect to p are all 0. I am also not interested in the Hessian with respect to λ.
If I do hessian(v, [ones(n); 0., ones(3)]), I get
10x10 Array{Float64,2}:
0.0 0.0 0.0 0.0 0.0 0.0 -9.61983 -0.00130702 -0.0100903 -0.00201706
0.0 0.0 0.0 0.0 0.0 0.0 -2.88912 0.333833 -1.97918 -0.456002
0.0 0.0 0.0 0.0 0.0 0.0 -0.097451 0.489951 -0.434475 -0.33481
0.0 0.0 0.0 0.0 0.0 0.0 0.131498 0.00338397 -0.139768 0.0163448
0.0 0.0 0.0 0.0 0.0 0.0 -2.10706 0.676236 -0.470455 0.559068
0.0 0.0 0.0 0.0 0.0 0.0 -0.289503 0.0598116 0.18519 -0.341022
-9.61983 -2.88912 -0.097451 0.131498 -2.10706 -0.289503 0.0 -1.9937 -8.10126 -4.77651
-0.00130702 0.333833 0.489951 0.00338397 0.676236 0.0598116 -1.9937 0.0 0.0 0.0
-0.0100903 -1.97918 -0.434475 -0.139768 -0.470455 0.18519 -8.10126 0.0 0.0 0.0
-0.00201706 -0.456002 -0.33481 0.0163448 0.559068 -0.341022 -4.77651 0.0 0.0 0.0
Since in my use case n is very large (in the thousands) and θ is very small (in the tens), having the possibility to pass down additional arguments would make this computation feasible.
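With the current API one can at least avoid the dense 10x10 call by computing only the blocks that involve θ (a sketch; f is the same toy stand-in as before, with n = 6, m = 3, k = 1):

```julia
using ForwardDiff

n, m = 6, 3
f(θ) = [sin(i * θ[1] + j) for i in 1:n, j in 1:m]
p, θ0, λ = ones(n), [0.0], ones(m)

# v(u) = sum(p .* (f(θ) * λ)); only the θ-blocks of the Hessian require AD:
B_pθ = ForwardDiff.jacobian(t -> f(t) * λ, θ0)              # n×k: ∂²v/∂p∂θ
B_λθ = ForwardDiff.jacobian(t -> f(t)' * p, θ0)             # m×k: ∂²v/∂λ∂θ
B_θθ = ForwardDiff.hessian(t -> sum(p .* (f(t) * λ)), θ0)   # k×k: ∂²v/∂θ²
B_pλ = f(θ0)                                                # n×m: ∂²v/∂p∂λ, no AD
# The p–p and λ–λ blocks are identically zero, matching the sparsity above.
```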
Both h and v are used inside a loop --- and each iteration of the loop gives the value of p and λ. I could do it using a closure, but the closure would be created at each iteration. Isn't this too costly?
But the closure would be created at each iteration. Isn't this too costly?
Not necessarily, and why not create the closure once instead of at each iteration?
Because, for instance, λ is a Lagrange multiplier. I am using this type of function in the MathProgBase interface. If I create the closure 'outside' the interface, I have to make λ a global variable. That is, I have scoping issues.
Perhaps I may resolve the issue by looking at the advanced closure alternatives in #102.
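One way to dodge the scoping problem without a global (a sketch, not from the thread; all names are hypothetical) is to build the closure once over a mutable container and update its contents each iteration:

```julia
# Capture a Ref once; mutate its contents inside the solver loop instead of
# rebuilding the closure (or resorting to a global λ).
λ = Ref([1.0, 1.0, 1.0])
objective(u, λ) = sum(u .* λ)

v = u -> objective(u, λ[])   # constructed once, outside the solver callbacks

v([1.0, 1.0, 1.0])           # 3.0
λ[] = [2.0, 3.0, 4.0]        # new multipliers at the next iteration
v([1.0, 1.0, 1.0])           # 9.0 -- same closure, updated data
```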
Do you have any benchmarks showing the performance difference here?
Not off-hand --- I completely changed the interface to avoid globals --- but I remember getting a ~5x speedup plus a reduction in memory allocation using my hacked ForwardDiff API (which allows extra arguments).
Were you using globals or closures?
Globals. I thought that the closure performance hit is (or used to be?) large.
Not nearly as bad as using globals. If you can put together a realistic but small benchmark, I think that would help sort out the issues here.
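A minimal version of such a benchmark (a sketch; absolute timings will vary by machine and Julia version) just compares a non-const global against a captured constant:

```julia
# Non-const global: the binding's type can change, so calls are type-unstable.
a_global = 2.0
f_global(x) = a_global * x

# Closure over a local: the capture is concrete, so calls are type-stable.
f_closure = let a = 2.0
    x -> a * x
end

function sumloop(g, n)
    s = 0.0
    for _ in 1:n
        s += g(1.0)
    end
    s
end

sumloop(f_global, 1)             # warm up compilation
sumloop(f_closure, 1)
@time sumloop(f_global, 10^6)    # allocates on every call
@time sumloop(f_closure, 10^6)   # typically allocation-free
```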
I just pushed a branch that extends the API to allow for targeting specific arguments. It was quite a fun little problem to try to solve generically. Scroll down to the bottom of the README for the new stuff.
I didn't want to add this to #27 yet, because I haven't developed good tests for it. I believe that the strategy I'm using can be implemented such that there is no (or very little) loss of performance due to the generalization. It's very possible that what I just pushed will be that fast, but I need to do some more rigorous performance comparisons before I make any claims.
My guess is that the main performance hit for the wrt-feature branch will come from conversions between the evaluated target argument and the non-partial arguments (with no extra conversions being done in the 1-arg case, so no performance hit there). For similar reasons, I also imagine that rigorous testing of the wrt-feature branch will reveal that we need more comprehensive conversion code, but we can tackle that once we get there.
Finally, the wrt-feature stuff is backwards compatible with #27, so it can be merged after #27 without breaking the existing API.
@jrevels this looks outstanding. I will try it soon, and I will contribute extensive performance testing.
@gragusa Awesome, thanks! Also, remember - since I don't have automated tests for it yet, there are probably bugs that may or may not actually throw errors. Silent bugs could corrupt your results without any warning, so be wary of results you get from that branch for the time being.
I can help with testing. Do you have anything in mind? Adjusting the old ones to the new API, extended to the new features?
Pretty much. Like I mentioned, wrt-feature's API should be backwards-compatible with #27's API, so all the tests from #27 should remain in place.
The things that immediately come to mind that need to be tested for wrt-feature:
1. Check that using wrt{1}, wrt{2} ... wrt{n} consistently delivers correct results across a variety of n-ary functions. A while ago, @mlubin floated the idea of making a fuzzer to generate test expressions from lists of allowable univariate and bivariate functions; I think that would be a necessity in this case. I implemented a really naive one here, but removed it in a subsequent commit because it needed better logic to avoid building functions that would inherently throw DomainErrors.
2. Pass in arguments of heterogeneous type to the functions from 1), checking that results are still consistently correct. Additionally, compare performance with the same functions when the arguments are homogeneously typed. Finally, compare performance with the same functions, but change the internal methods to replace all arguments with the relevant ForwardDiffNum type (real-valued for all non-wrt args). For example, hessian(wrt{2}, f, x, y, z) would be redefined to evaluate something like f(HessianNum(x, zero_partials...), HessianNum(y, one_partials...), HessianNum(z, zero_partials...)) (its current behavior is to evaluate f(x, HessianNum(y, one_partials...), z)). That's basically pseudo-code - I'll probably need to implement better constructors/convert methods to actually get this done.
3. Check univariate performance (e.g. no wrt{i}) against #27's univariate performance.
TBH, properly building the fuzzer mentioned in 1) would be a task that might even be worthy of its own package. 3) is something that could be done basically immediately, however, and I'd love to see the results.
I made a package for the fuzzer here. It may take a while before I can start work on it, but at least now there's a centralized place to make contributions to.
@jrevels I was starting to help with tests and benchmarks of the wrt-feature branch, but the branch does not merge cleanly anymore --- you have done way too much work in the last week. I looked at the code and it seems that your changes are still compatible with your work on the wrt feature. Are they?
This feature is still possible given the new underlying structure, but a lot has changed, so it wouldn't just be a straight port (the strategy would essentially be the same, though). The wrt-feature branch remains as a proof-of-concept, but the feature would have to be re-implemented on top of the current master to actually work.
Currently, fleshing out our benchmarks and resolving #37 are higher priority than this, since those may change how we implement the wrt feature.
Now that #37 is resolved, I've revisited this idea, and I think that telling users to make closures is the way to go here. It's a much simpler solution than trying to incorporate this feature into ForwardDiff's API. Feel free to continue to discuss this, but for now, I'm going to close the issue.
To give an explicit example, here's one way a user could accomplish this:
# I have some function f(x, y, z)
# I want to take the gradient of f with respect to y
const my_cache = ForwardDiffCache()
g_y(x, y, z) = ForwardDiff.gradient(n -> f(x, n, z), y, cache=my_cache)
Obviously, there may be more performant ways of doing the above if you're looping over different ys while your xs or zs are invariant (e.g. construct the closure outside the loop).
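For instance (a sketch; f, x0, z0 and the loop are hypothetical, and the call is module-qualified for clarity):

```julia
using ForwardDiff

f(x, y, z) = sum(x .* y .^ 2 .+ z)
x0, z0 = [1.0, 2.0], [0.5, 0.5]

g = n -> f(x0, n, z0)                 # one closure, reused across iterations
for y in ([1.0, 1.0], [2.0, 3.0])
    ForwardDiff.gradient(g, y)        # equals 2 .* x0 .* y
end
```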
If there are still performance concerns with anonymous closures, then this issue could be reopened depending on what we see from benchmarks/profiling examples. In general, however, I feel like effort to fix closure concerns should be focused on improving Base rather than hacking around the problem in ForwardDiff.
I am re-opening this issue, with a justification and explanation found at the bottom of #77.