Comments (7)

aplavin commented on June 12, 2024

Bump...

from statsbase.jl.

aplavin commented on June 12, 2024

Gentle bump...
I understand that developer time is always scarce, so: would a PR fixing this be accepted?

devmotion commented on June 12, 2024

I wasn't involved in the previous discussions in #671 and #814, but my impression was that the errors are intentional and that the previous behaviour (no errors) was considered a bug. I tend to agree with this view, since weights are supposed to sum to 1 (or at least it should be possible to normalize them so that they do), which is not possible if a weight is NaN or Inf.

aplavin commented on June 12, 2024

The whole point of NaN is that it propagates, and plain aggregations play nicely with that: the mean of an array containing NaN is NaN. This is natural and convenient when doing many aggregations of which a few involve NaNs: the NaNs just appear in the correct locations in the results, instead of requiring try/catch or manual isnan checks everywhere.
It's surprising and inconvenient that as soon as one needs weighted aggregations, there's suddenly no way around it: manual try/catch or isnan checks are required.

devmotion commented on June 12, 2024

I don't see a clear inconsistency. There's no restriction on the values, and Inf/NaN propagate the same way in weighted operations as in unweighted ones:

julia> using StatsBase

julia> mean([1,2,3,Inf])
Inf

julia> mean([1,2,3,Inf], weights([1,1,1,1]))
Inf

julia> mean([1,2,3,NaN])
NaN

julia> mean([1,2,3,NaN], weights([1,1,1,1]))
NaN

The restriction only applies to the weights themselves because they have to sum up to 1.

aplavin commented on June 12, 2024

> The restriction only applies to the weights themselves because they have to sum up to 1.

And if they don't because there's a NaN, then the result is... Not a Number :)
That is exactly the same as with mean([1, NaN]), which already returns NaN because no actual number represents the result. I don't really see any difference between these situations, and the same motivation applies in both cases.

Consider two scenarios:

1. You have an array of arrays and want to compute the mean of each inner array:

   mean.(A)

   Here NaNs are propagated correctly.

2. You have an array of arrays of values and an array of arrays of weights, and want to compute the weighted means:

   mean.(A, weights.(W))

   This fails as soon as there is a NaN anywhere inside W, and requires manual handling.
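
For illustration, the manual handling in the second scenario might look like the sketch below. The helper name weighted_mean_or_nan and the sample data are hypothetical and not part of StatsBase; the sketch only assumes that weights rejects non-finite entries, as discussed above, and so checks the weights before constructing them.

```julia
using StatsBase

# Hypothetical helper (not part of StatsBase): return NaN when any weight
# is NaN or Inf, otherwise fall back to the ordinary weighted mean.
weighted_mean_or_nan(x, w) = all(isfinite, w) ? mean(x, weights(w)) : NaN

# Made-up sample data: the second weight vector contains a NaN.
A = [[1.0, 2.0], [3.0, 4.0]]
W = [[1.0, 1.0], [1.0, NaN]]

# Broadcast over the outer arrays; the NaN shows up only in the second slot.
result = weighted_mean_or_nan.(A, W)
```

This is the kind of wrapper every call site would need for as long as the constructor throws on non-finite weights.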

aplavin commented on June 12, 2024

For those stumbling across the same issue: AbstractWeights and many functions that take them actually support NaNs. Defining another type MyWeights <: AbstractWeights without NaN checks in the constructor will work just fine!

Or, even easier – just overload the Weights type constructor by putting this code somewhere:

using StatsBase
@generated Weights{S,T,V}(values, sum) where {S<:Real, T<:Real, V<:AbstractVector{T}} = Expr(:new, Weights{S,T,V}, :values, :sum)

It doesn't change any existing behavior, just adds support for non-finite values in weight vectors.
Then:

julia> mean([1,2], weights([1,NaN]))
NaN
