Comments (21)

mlubin commented:

In which cases is it not possible to achieve this behavior by using a closure?

gragusa commented:

I am using it inside an optimization problem (through MathProgBase). The user passes a function which is used to set up the constraints, and I use ForwardDiff.jl to automatically get the Jacobian and Hessian of the constraints, where the Jacobian and Hessian depend on other variables. I cannot use a closure because I construct the Jacobian and Hessian functions outside the optimization.

I could construct fad_jacobian and fad_hessian inside by creating a closure inside eval_jac_g, but this would kill performance.
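
A rough sketch of the pattern being described (the callback name, its signature, and the constraint function below are hypothetical, only loosely mimicking a MathProgBase-style Jacobian callback; this is not the actual interface):

using ForwardDiff

# The closure over the user-supplied constraint function is rebuilt on every call,
# which is the per-evaluation cost being worried about here.
function eval_jac_g_sketch!(J, x, con, p)
    g = θ -> con(θ, p)                      # a new closure on each evaluation
    J[:] = vec(ForwardDiff.jacobian(g, x))  # assumes a dense Jacobian layout
    return J
end

# Hypothetical usage: con(θ, p) plays the role of the user's constraint function.
con(θ, p) = p * sum(θ.^2)
J = zeros(6)
eval_jac_g_sketch!(J, [1.0, 2.0], con, [1.0, 2.0, 3.0])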

jrevels commented:

I might be misunderstanding, but does it help that the new API allows one to easily take derivatives/gradients/etc. at the provided point without having to build a closure first?

For example, using the old API, you have to construct ∇f before you can evaluate ∇f(x):

julia> using ForwardDiff

julia> f(x) = sin(x[1]) + cos(x[2])^2 + tan(x[3])^3
f (generic function with 1 method)

julia> g = forwarddiff_gradient(f, Float64)
gradf (generic function with 2 methods)

julia> g([1.0,2.0,3.0])
3-element Array{Float64,1}:
 0.540302
 0.756802
 0.0621972

Using the new API, you can simply perform the evaluation ∇f(x) without actually constructing ∇f:

julia> using ForwardDiff

julia> f(x) = sin(x[1]) + cos(x[2])^2 + tan(x[3])^3
f (generic function with 1 method)

julia> gradient(f, [1.0, 2.0, 3.0])
3-element Array{Float64,1}:
 0.540302
 0.756802
 0.0621972

Similar methods exist for Jacobians/Hessians/etc. The tentative documentation for the new API can be found in the README of #27's branch.

gragusa commented:

The new API is great. But it is still not up to the task for the sort of problems I am dealing with. To be more concrete, I tried to make a simple example where passing down arguments is necessary.

I have a function f(θ): R^k -> R^(n x m). This function is used to define another function h(u): R^(n+k) -> R^m:

function h(u)
    p = u[1:n]
    θ = u[n+1:end]
    vec(mean(p.*f(θ),1))
end

The Jacobian with respect to u is the block matrix [f(θ)'/n   ∂h(u)/∂θ].
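
Spelling out where the first block comes from, directly from the definition of h above: the j-th component is h_j(u) = (1/n) Σ_i p_i f(θ)_ij, so ∂h_j/∂p_i = f(θ)_ij/n, which is exactly the m x n block f(θ)'/n; only the remaining k columns, ∂h(u)/∂θ, actually require automatic differentiation.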

With the new API I can use

jacobian(h, [ones(n); 0])

which gives

3x7 Array{Float64,2}:
 -0.000217836   0.0556389   0.0816585   0.000563994   0.112706    0.0099686  -0.332284
 -0.00168171   -0.329863   -0.0724125  -0.0232947    -0.0784092   0.030865   -1.35021
 -0.000336177  -0.0760003  -0.0558016   0.00272413    0.0931781  -0.056837   -0.796085

where the 3x6 block is

f([.0])'/6
3x6 Array{Float64,2}:
 -0.000217836   0.0556389   0.0816585   0.000563994   0.112706    0.0099686
 -0.00168171   -0.329863   -0.0724125  -0.0232947    -0.0784092   0.030865
 -0.000336177  -0.0760003  -0.0558016   0.00272413    0.0931781  -0.056837

In this case, jacobian is already calculating 18 more derivatives than it needs to.
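
For reference, the closure workaround discussed below would look roughly like this (f and the sizes here are hypothetical stand-ins, chosen to match the 6 x 3 example above): capture the current p and differentiate only with respect to θ.

using ForwardDiff

n, m = 6, 3
A = reshape(collect(1.0:n*m), n, m)       # hypothetical data standing in for the real model
f(θ) = A .* exp(θ[1])                     # hypothetical stand-in for f: R^k -> R^(n x m)
p = ones(n)

h_θ(θ) = vec(mean(p .* f(θ), 1))          # p is captured; only θ is differentiated
Jθ = ForwardDiff.jacobian(h_θ, [0.0])     # 3x1: just the ∂h/∂θ columns, none of the extra ones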

Things get worse since I am also interested in the Hessian of the following function:

function v(u)
    p = u[1:n]
    θ = u[n+1:n+1]
    λ = u[n+2:end]
    sum(p.*f(θ)*λ)
end

This is sparse, as the second derivatives with respect to p are all 0. I am also not interested in the Hessian with respect to λ.

If I do

hessian(v, [ones(n); 0., ones(3)])
10x10 Array{Float64,2}:
  0.0          0.0        0.0        0.0          0.0        0.0        -9.61983   -0.00130702  -0.0100903  -0.00201706
  0.0          0.0        0.0        0.0          0.0        0.0        -2.88912    0.333833    -1.97918    -0.456002
  0.0          0.0        0.0        0.0          0.0        0.0        -0.097451   0.489951    -0.434475   -0.33481
  0.0          0.0        0.0        0.0          0.0        0.0         0.131498   0.00338397  -0.139768    0.0163448
  0.0          0.0        0.0        0.0          0.0        0.0        -2.10706    0.676236    -0.470455    0.559068
  0.0          0.0        0.0        0.0          0.0        0.0        -0.289503   0.0598116    0.18519    -0.341022
 -9.61983     -2.88912   -0.097451   0.131498    -2.10706   -0.289503    0.0       -1.9937      -8.10126    -4.77651
 -0.00130702   0.333833   0.489951   0.00338397   0.676236   0.0598116  -1.9937     0.0          0.0         0.0
 -0.0100903   -1.97918   -0.434475  -0.139768    -0.470455   0.18519    -8.10126    0.0          0.0         0.0
 -0.00201706  -0.456002  -0.33481    0.0163448    0.559068  -0.341022   -4.77651    0.0          0.0         0.0

Since in my use case n is very large (in the thousands) and θ is very small (in the tens), having the possibility to pass down additional arguments would make this computation feasible.

Both h and v are used inside a loop, and each iteration of the loop gives the values of p and λ. I could do it using a closure, but the closure would be created at each iteration. Isn't this too costly?

mlubin commented:

But the closure would be created at each iteration. Isn't this too costly?

Not necessarily, and why not create the closure once instead of at each iteration?
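
A minimal sketch of what this could look like (f and the sizes are hypothetical stand-ins): the closure is built once, and the arrays it captures are overwritten in place inside the loop, so no new closure is created per iteration.

using ForwardDiff

function run_sketch(f, n, m, iters)
    p = ones(n)
    λ = ones(m)
    v_θ(θ) = sum(p .* f(θ) * λ)              # built once; captures p and λ
    for iter in 1:iters
        p[:] = rand(n)                        # new values produced by the outer algorithm
        λ[:] = rand(m)
        H = ForwardDiff.hessian(v_θ, [0.0])   # Hessian with respect to θ only
    end
end

A = reshape(collect(1.0:18.0), 6, 3)
run_sketch(θ -> A .* exp(θ[1]), 6, 3, 100)    # hypothetical f: R -> R^(6 x 3)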

gragusa commented:

Because, for instance, λ is a Lagrange multiplier. I am using this type of function within the MathProgBase interface. If I create the closure 'outside' the interface, I have to make λ a global variable. That is, I have scoping issues.

Probably I may resolve the issue by looking at advanced closure alternatives (#102).

mlubin commented:

Do you have any benchmarks showing the performance difference here?

gragusa commented:

Not off-hand. I completely changed the interface to avoid globals. But I remember getting a ~5x speedup plus a reduction in memory allocation using my hacked ForwardDiff API (which allows extra arguments).

mlubin commented:

Were you using globals or closures?

gragusa commented:

Globals. I thought that the closure performance hit is (or used to be?) large.

mlubin commented:

Not nearly as bad as using globals. If you can put together a realistic but small benchmark, I think that would help sort out the issues here.

jrevels commented:

I just pushed a branch that extends the API to allow for targeting specific arguments. It was quite a fun little problem to try to solve generically. Scroll down to the bottom of the README for the new stuff.

I didn't want to add this to #27 yet, because I haven't developed good tests for it. I believe that the strategy I'm using can be implemented such that there is no (or very little) loss of performance due to the generalization. It's very possible that what I just pushed will be that fast, but I need to do some more rigorous performance comparisons before I make any claims.

My guess is that the main performance hit for the wrt-feature branch will come from conversions between the evaluated target argument and the non-partial arguments (with no extra conversions being done in the 1-arg case, so no performance hit there). For similar reasons, I also imagine that rigorous testing of the wrt-feature branch will reveal that we need more comprehensive conversion code, but we can tackle that once we get there.

Finally, wrt-feature stuff is backwards compatible with #27, so it can be merged after #27 and not break the existing API.

gragusa commented:

@jrevels this looks outstanding. I will try it soon, and I will contribute extensive performance testing.

jrevels commented:

@gragusa Awesome, thanks! Also, remember - since I don't have automated tests for it yet, there are probably bugs that may or may not actually throw errors. Silent bugs could corrupt your results without any warning, so be wary of results you get from that branch for the time being.

gragusa commented:

I can help with testing. Do you have anything in mind? Adjusting the old tests to the new API, extended to cover the new features?

jrevels commented:

Pretty much. Like I mentioned, wrt-feature's API should be backwards-compatible with #27's API, so all the tests from #27 should remain in place.

The things that immediately come to mind that need to be tested for wrt-feature:

    1. Check that using wrt{1}, wrt{2}...wrt{n} consistently delivers correct results across a variety of n-ary functions. A while ago, @mlubin floated the idea of making a fuzzer to generate test expressions from lists of allowable univariate and bivariate functions; I think that would be a necessity in this case. I implemented a really naive one here, but removed it in a subsequent commit because it needed better logic to avoid building functions that would inherently throw DomainErrors.
    2. Pass in arguments of heterogeneous type to the functions from 1), checking that results are still consistently correct. Additionally, compare performance with the same functions when the arguments are homogeneously typed. Finally, compare performance with the same functions, but change the internal methods to replace all arguments with the relevant ForwardDiffNum type (real-valued for all non-wrt args). For example, hessian(wrt{2}, f, x, y, z) would be redefined to evaluate something like f(HessianNum(x, zero_partials...), HessianNum(y, one_partials...), HessianNum(z, zero_partials...)) (its current behavior is to evaluate f(x, HessianNum(y, one_partials...), z)). That's basically pseudo-code - I'll probably need to implement better constructors/convert methods to actually get this done.
    3. Check univariate performance (e.g. no wrt{i}) against #27's univariate performance.

TBH, properly building the fuzzer mentioned in 1) would be a task that might even be worthy of its own package. 3) is something that could be done basically immediately, however, and I'd love to see the results.
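
A toy sketch of that fuzzer idea (everything here is hypothetical, not the planned package): compose random test functions from whitelists of univariate and bivariate operations, restricting the lists to domain-safe choices so the generated functions don't throw DomainErrors.

const UNARY  = [sin, cos, exp, x -> x^2]
const BINARY = [+, *, (a, b) -> a + b^2]

function random_expr(depth::Int)
    if depth == 0
        return x -> x[rand(1:length(x))]        # leaf: a random component of the input vector
    elseif rand() < 0.5
        op, sub = rand(UNARY), random_expr(depth - 1)
        return x -> op(sub(x))
    else
        op, l, r = rand(BINARY), random_expr(depth - 1), random_expr(depth - 1)
        return x -> op(l(x), r(x))
    end
end

f = random_expr(3)                              # a random scalar-valued test function
f(rand(5))                                      # evaluate it on a random 5-vector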

jrevels commented:

I made a package for the fuzzer here. It may take a while before I can start work on it, but at least now there's a centralized place to make contributions to.

gragusa commented:

@jrevels I was starting to help with testing and benchmarking of the wrt-feature branch, but the branch does not merge cleanly anymore; you have done way too much work in the last week. I looked at the code and it seems that your recent changes are still compatible with your work on the wrt feature. Are they?

jrevels commented:

This feature is still possible given the new underlying structure, but a lot has changed so it wouldn't just be a straight port (the strategy would essentially be the same, though). The wrt-feature branch remains as a proof-of-concept, but the feature would have to be re-implemented on top of the current master to actually work.

Currently, fleshing out our benchmarks and resolving #37 are higher priority than this, since those may change how we implement the wrt feature.

jrevels commented:

Now that #37 is resolved, I've revisited this idea, and I think that telling users to make closures is the way to go here. It's a much simpler solution than trying to incorporate this feature into ForwardDiff's API. Feel free to continue to discuss this, but for now, I'm going to close the issue.

To give an explicit example, here's a potential way in which a user could accomplish this:

# I have some function f(x, y, z)
# I want to take the gradient of f with respect to y

const my_cache = ForwardDiffCache()

g_y(x, y, z) = ForwardDiff.gradient(n -> f(x, n, z), y, cache=my_cache)

Obviously, there may be more performant ways of doing the above if you're looping over different ys while your xs or zs are invariant (e.g. construct the closure outside the loop).
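
For instance, a sketch of the loop-invariant variant (f, the values, and the loop here are hypothetical; the cache plays the same role as my_cache above):

using ForwardDiff

const loop_cache = ForwardDiffCache()

f(x, y, z) = sum(x .* y) + sum(z .* y.^2)      # hypothetical f
x, z = rand(3), rand(3)                        # invariant across the loop
f_y = n -> f(x, n, z)                          # closure over x and z, built once

for i in 1:100
    y = rand(3)                                # the ys produced by the loop
    ∇ = ForwardDiff.gradient(f_y, y, cache=loop_cache)
end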

If there are still performance concerns with anonymous closures, then this issue could be reopened, depending on what we see from benchmarks/profiling examples. In general, however, I feel like effort to fix closure concerns should be focused on improving Base rather than hacking around the problem in ForwardDiff.

papamarkou commented:

I am re-opening this issue, with a justification and explanation found at the bottom of #77.
