
Comments (10)

jrevels avatar jrevels commented on May 16, 2024

If memory serves, a previous version of ForwardDiff supported functions that looked like this:

Unless I slipped up somewhere, ForwardDiff v0.1.0 should strictly support all the features of the previous version, plus a few more. There's even a deprecation wrapper that allows all of the old methods to still be used.

How were you doing this with the old version?

Is there any way to take the Jacobian of a vector-valued function written using the result-placement style?

FAD techniques mainly rely on sneakily passing overloaded types into your function, so if you use a naively typed Vector for xout (e.g. Vector{Float64}), InexactErrors are going to get thrown when your function tries to store instances of ForwardDiff's types in xout.
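Here's a minimal sketch of that failure mode, using a hypothetical stand-in dual type (`MyDual` is not ForwardDiff's actual overloaded number type, and the constructor syntax is modern Julia rather than the v0.4-era syntax used elsewhere in this thread):

```julia
# MyDual stands in for an overloaded AD number type carrying a value
# and a derivative component.
struct MyDual <: Real
    val::Float64
    der::Float64
end

xout = zeros(Float64, 3)          # concretely typed output buffer
# xout[1] = MyDual(1.0, 1.0)      # errors: a MyDual can't be stored as a Float64

xout_ok = Vector{MyDual}(undef, 3)  # a buffer typed for the dual numbers works
xout_ok[1] = MyDual(1.0, 1.0)
```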

So, you can accomplish this by passing in an xout that is typed appropriately:

using ForwardDiff

function sphere2cart!(xin::Vector, xout::Vector)
    rho, theta, phi = xin
    rho_sin_phi = rho * sin(phi)
    xout[1] = rho_sin_phi * cos(theta)
    xout[2] = rho_sin_phi * sin(theta)
    xout[3] = rho * cos(phi)
    return xout # added this line so that it's truly a Vector --> Vector function
end

# The output vector needs to be able to store ForwardDiff's special
# overloaded number type
const my_xout = Vector{ForwardDiff.GradNumTup{3,Float64}}(3)

sphere2cart(xin::Vector) = sphere2cart!(xin, my_xout)

j = ForwardDiff.jacobian(sphere2cart)

This is definitely a hack, though, as it relies on knowledge of ForwardDiff's implementation that I would never expect end-users to be aware of.

More importantly, I benchmarked my above "hack" against the non-mutating method (the sphere2cart you defined above) and I didn't see a significant runtime performance increase, only a slight drop in memory usage from skipping the allocation of a result vector. In more complex functions (esp. ones with higher input dimensions), I would guess that any benefit from the slight memory decrease will get dwarfed by the general memory usage of differentiating the function.

from forwarddiff.jl.

jrevels avatar jrevels commented on May 16, 2024

More importantly, I benchmarked my above "hack" against the non-mutating method (the sphere2cart you defined above) and I didn't see a significant runtime performance increase, only a slight drop in memory usage from skipping the allocation of a result vector.

For clarity, I was comparing taking the Jacobian of each, not the performance of normal execution of each (obviously the mutating method will be more performant if we're just talking about the functions themselves).


jrevels avatar jrevels commented on May 16, 2024

If memory serves, a previous version of ForwardDiff supported functions that looked like this:

Unless I slipped up somewhere, ForwardDiff v0.1.0 should strictly support all the features of the previous version, plus a few more. There's even a deprecation wrapper that allows all of the old methods to still be used.

How were you doing this with the old version?

Well damn, you're right - you can find a reference to this feature here.


jrevels avatar jrevels commented on May 16, 2024

Adding back in support for this wouldn't be very hard, I think; here's the code in the old version that implements this.

@mlubin Do you have any thoughts on whether we should work to support this in the new version? I could change the current jacobian method to perform the "hack" I worked out above:

function jacobian{A}(f, ::Type{A}=Void;
                     mutates::Bool=false,
                     chunk_size::Int=default_chunk_size,
                     cache::ForwardDiffCache=ForwardDiffCache(),
                     output_length::Int=0)
    if output_length > 0 # if output_length > 0, assume f is of the form f!(y, x) where y is the output
        newf(x) = f(get_output!(cache, output_length, Val{chunk_size}, eltype(x)), x)
    else
        newf = f
    end

    if mutates
        function j!(output::Matrix, x::Vector)
            return ForwardDiff.jacobian!(output, newf, x, A;
                                         chunk_size=chunk_size,
                                         cache=cache)
        end
        return j!
    else
        function j(x::Vector)
            return ForwardDiff.jacobian(newf, x, A;
                                        chunk_size=chunk_size,
                                        cache=cache)
        end
        return j
    end
end

Then all that would need to be implemented is get_output! for the cache, which should be really easy to write. I'm just hesitant to add it as a feature.
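If that route were taken, usage might look like the following sketch (hypothetical: the output_length keyword and the f!(y, x) argument order are assumptions taken from the snippet above, not a shipped API):

```julia
using ForwardDiff

# Mutating form with the f!(y, x) argument order assumed by the sketch above.
function sphere2cart!(xout::Vector, xin::Vector)
    rho, theta, phi = xin
    rho_sin_phi = rho * sin(phi)
    xout[1] = rho_sin_phi * cos(theta)
    xout[2] = rho_sin_phi * sin(theta)
    xout[3] = rho * cos(phi)
    return xout
end

# The cache would hand the function an appropriately typed output buffer,
# so the caller never touches ForwardDiff's internal number types.
j = ForwardDiff.jacobian(sphere2cart!; output_length=3)
J = j([1.0, pi/4, pi/4])   # 3x3 Jacobian
```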


dbeach24 avatar dbeach24 commented on May 16, 2024

Thanks for the very thorough follow-up! I'm glad I'm not misremembering that an early version supported mutator functions. (And thanks for the better issue name, too!)

Yes, my goal is to write a single version of the function which is both fast (i.e. uses result placement for evaluation of the function itself), and which also is supported by ForwardDiff. As a bonus, it would be nice if the function definition was generic and did not rely on any specific types defined in ForwardDiff. (I had believed that using the unspecialized Vector type was sufficient for this.)

The problem with definitions like this is the use of global values for the result:

# The output vector needs to be able to store ForwardDiff's special
# overloaded number type
const my_xout = Vector{ForwardDiff.GradNumTup{3,Float64}}(3)

sphere2cart(xin::Vector) = sphere2cart!(xin, my_xout)

j = ForwardDiff.jacobian(sphere2cart)

If the caller fails to immediately copy the result, new evaluations will overwrite the value of earlier ones. Moreover, if & when Julia gains multi-threading support, global result storage would violate thread-safety.
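A small sketch of that aliasing hazard with a shared global buffer (generic Julia, not ForwardDiff-specific):

```julia
buf = zeros(3)
f!(x) = (buf .= x; buf)   # returns the shared global buffer

a = f!([1.0, 2.0, 3.0])
b = f!([4.0, 5.0, 6.0])
# a === b: both names alias `buf`, so the first result now reads [4.0, 5.0, 6.0]
```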

For clarity, I was comparing taking the Jacobian of each, not the performance of normal execution of each (obviously the mutating method will be more performant if we're just talking about the functions themselves).

It appears that you're making the argument that the complexity of computing the Jacobian via dual numbers significantly outweighs the overhead of dynamic allocation, such that returning a newly allocated vector is not a concern. I have no doubt that this could be true for high-dimensional vector-valued functions. However, my interest is in simultaneously computing the value and Jacobian of a small dynamic state function where N=3..6. At these sizes, is it still the case that dynamic allocation costs are dwarfed by the computational complexity of the Jacobian itself? You might be right; I'm just not sure.

Sorry if the tone sounds at all argumentative here... Thanks again for the help!


jrevels avatar jrevels commented on May 16, 2024

Yes, my goal is to write a single version of the function which is both fast (i.e. uses result placement for evaluation of the function itself), and which also is supported by ForwardDiff.

A reasonable goal, I think!

As a bonus, it would be nice if the function definition was generic and did not rely on any specific types defined in ForwardDiff. (I had believed that using the unspecialized Vector type was sufficient for this.)

It is, generally, and the type annotations in the example you gave are totally fine (though if you had typed them too specifically, e.g. Vector{Float64}, it wouldn't have worked - probably something we should note in the documentation). Mutation of an input vector within the function is also totally allowed; the only tricky thing here was that a vector that ForwardDiff didn't "know" to overload was getting mutated.

If the caller fails to immediately copy the result, new evaluations will overwrite the value of earlier ones.

The usefulness of having a mutating function in the first place, though, is that you can reuse the same output vector for multiple evaluations. Being able to overwrite values once you're done with the result, instead of having to allocate new space for them, is precisely the reason why you'd use a mutating function instead of a non-mutating one.
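For example, reusing one preallocated buffer across many calls (using the sphere2cart! from earlier in the thread, which takes (xin, xout)):

```julia
out = zeros(3)
for i in 1:1000
    # Overwrites `out` in place: no per-call allocation of a result vector.
    sphere2cart!(rand(3), out)
    # ...consume `out` before the next iteration overwrites it...
end
```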

Moreover, if & when Julia gains multi-threading support, global result storage would violate thread-safety.

ForwardDiff.jl isn't yet designed with thread-safety in mind - there are a host of other issues that would have to be resolved in Base before we would even know just where to begin. ForwardDiff's internal caching layer is also not "global" state (though the user can optionally pass in their own cache, so they could make it so). Restrictions could be added in the future to make it thread-safe.

However, my interests are in being able to simultaneously compute the value and jacobian of a small dynamic state function where N=3..6. At these sizes, is it still the case that dynamic allocation costs are dwarfed by the computational complexity of the jacobian, itself?

I suspect that the lower the input dimension, the more the answer to that question will rely on the specific function being evaluated. Anyway, you should check out using AllResults if you want to grab the value and the Jacobian simultaneously.

Sorry if the tone sounds at all argumentative here...

It doesn't - on the contrary, I'm really glad you brought this up! I tried not to lose any features from previous versions during the v0.1.0 refactoring, but it appears this particular feature slipped through the cracks.

I believe I can re-implement this functionality pretty easily in the current version of ForwardDiff (my comment above basically does it, I just need to add some support in the caching layer), so I'm going to submit a PR soon for it. Stay tuned.


dbeach24 avatar dbeach24 commented on May 16, 2024

Thanks. Regarding this remark:

The usefulness of having a mutating function in the first place, though, is that you can reuse the same output vector for multiple evaluations. Being able to overwrite values once you're done with the result, instead of having to allocate new space for them, is precisely the reason why you'd use a mutating function instead of a non-mutating one.

Yes, I agree. With a mutator/placement approach the caller gets total control regarding if & when former results are overwritten with new data. With a global return vector, this is not the case.

Anyway, you should check out using AllResults if you want to grab the value and the Jacobian simultaneously.

Yeah, I saw this in the docs and it looks like just the thing!

I believe I can re-implement this functionality pretty easily in the current version of ForwardDiff (my comment above basically does it, I just need to add some support in the caching layer), so I'm going to submit a PR soon for it. Stay tuned.

I am eager to try whatever solution you come up with. Thanks again.


dbeach24 avatar dbeach24 commented on May 16, 2024

Oh wow -- Just realized who I've been chatting with. It was great having beers with you at JuliaCon. Hope to see you again next year!


mlubin avatar mlubin commented on May 16, 2024

This is an important feature to support, didn't realize it disappeared.


jrevels avatar jrevels commented on May 16, 2024

@dbeach24 I thought it was you, but I wasn't absolutely sure - it's obvious now that you have a profile pic! Ditto on the greatness of JuliaCon beers.

