Comments (7)
Bump...
from statsbase.jl.
Gentle bump...
I understand that developer time is always scarce, so: would a PR fixing this be accepted?
from statsbase.jl.
I wasn't involved in the previous discussions in #671 and #814, but my impression was that the errors are intentional and the previous behaviour (no errors) was considered a bug. I tend to agree with this view, since weights are supposed to sum to 1 (or at least it should be possible to normalize them to such weights), which is not possible if a weight is `NaN` or `Inf`.
from statsbase.jl.
The whole point of `NaN` is that it propagates, and plain aggregations play nicely with that: the `mean` of an array with `NaN` is `NaN`. This is natural and convenient when doing many aggregations, with a few of them having NaNs: they just appear in the correct locations in the results, instead of requiring `try`/`catch` or manual `isnan` checks everywhere.
It's surprising and inconvenient that as soon as one needs weighted aggregations, there's suddenly no way around it: manual `try`/`catch` or `isnan` checks are required.
from statsbase.jl.
I don't see a clear inconsistency: there's no restriction on the values, and `Inf`/`NaN` propagate in the same way for weighted operations:
julia> using StatsBase
julia> mean([1,2,3,Inf])
Inf
julia> mean([1,2,3,Inf], weights([1,1,1,1]))
Inf
julia> mean([1,2,3,NaN])
NaN
julia> mean([1,2,3,NaN], weights([1,1,1,1]))
NaN
The restriction only applies to the weights themselves because they have to sum up to 1.
from statsbase.jl.
> The restriction only applies to the weights themselves because they have to sum up to 1.
And if they don't because there's a `NaN`, then the result is... Not a Number :)
Exactly the same as with `mean([1, NaN])`, which already returns `NaN` because there's no actual number that represents the result. I don't really see any difference between these situations, and the same motivation works in both cases.
Consider two scenarios:
- Have an array of arrays, and want to compute the `mean` of each inner array: `mean.(A)` works, and `NaN`s are correctly propagated.
- Have an array of arrays of values and an array of arrays of weights, and want to compute weighted `mean`s? `mean.(A, weights.(W))` doesn't work as soon as there are any `NaN`s inside `W`, and requires manual handling.
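To make the second scenario concrete, here is a minimal sketch of the propagating behaviour being asked for, written in plain Julia without StatsBase. The helper name `propagating_wmean` is hypothetical and is not part of any package; it simply computes the weighted mean as `sum(x .* w) / sum(w)`, so a non-finite weight yields `NaN` through ordinary floating-point arithmetic instead of an error.

```julia
# Hypothetical helper (not part of StatsBase): a plain weighted mean.
# A NaN or Inf weight propagates through the arithmetic and yields NaN
# instead of throwing.
propagating_wmean(x, w) = sum(x .* w) / sum(w)

# Arrays of values and matching arrays of weights; one weight is NaN.
A = [[1.0, 2.0], [3.0, 4.0]]
W = [[1.0, 1.0], [1.0, NaN]]

# Broadcasting over the outer arrays, as in the scenario above.
result = propagating_wmean.(A, W)
```

With this sketch the first entry is an ordinary weighted mean and the second is `NaN`, so the bad input shows up at the matching position in the results rather than aborting the whole broadcast.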
from statsbase.jl.
For those stumbling across the same issue: `AbstractWeights` and many functions that take them actually support NaNs. Defining another type `MyWeights <: AbstractWeights` without NaN checks in the constructor will work just fine!
Or, even easier: just overload the `Weights` type constructor by putting this code somewhere:
using StatsBase
@generated Weights{S,T,V}(values, sum) where {S<:Real, T<:Real, V<:AbstractVector{T}} = Expr(:new, Weights{S,T,V}, :values, :sum)
It doesn't change any existing behavior, just adds support for non-finite values in weight vectors.
Then:
julia> mean([1,2], weights([1,NaN]))
NaN
from statsbase.jl.