Comments (9)
I think this was solved with PR #58, right?
from survey.jl.
Yes, some `CategoricalArray` support has been added for `SimpleRandomSample` and `StratifiedSample`, which gives slightly faster `groupby` times. We still need thorough testing and benchmarking to show that, as the number of stratification levels increases, storing the strata vector as a `CategoricalArray` performs better than storing it as a `StringX` type.
I think it would be great to turn the `CategoricalArray` enhancements that were added inside the `if-else` ladders of `svymean` and `svytotal` into multiple dispatch.
So instead of checking `elseif isa(x, Symbol) && isa(design.data[!, x], CategoricalArray)` inside `svymean(x::Symbol, design::StratifiedSample)`, we can express those conditions as separate methods selected by multiple dispatch, which is more readable.
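A minimal sketch of that idea, using toy stand-in types rather than the actual Survey.jl or CategoricalArrays.jl types. `ToyCategorical`, `ToyDesign`, `toymean`, and `_toymean` are hypothetical names introduced only for illustration; the real code would presumably dispatch on `CategoricalArray` and the package's design types.

```julia
# Toy stand-in for a categorical column (hypothetical; the real code would
# dispatch on CategoricalArrays.CategoricalArray instead).
struct ToyCategorical{T} <: AbstractVector{T}
    values::Vector{T}
end
Base.size(c::ToyCategorical) = size(c.values)
Base.getindex(c::ToyCategorical, i::Int) = c.values[i]
Base.IndexStyle(::Type{<:ToyCategorical}) = IndexLinear()

# Toy stand-in for a survey design that holds named columns.
struct ToyDesign
    data::Dict{Symbol,AbstractVector}
end

# Entry point: look the column up once, then let dispatch choose the method.
toymean(x::Symbol, design::ToyDesign) = _toymean(design.data[x])

# Numeric column: plain (unweighted) mean.
_toymean(col::Vector{<:Real}) = sum(col) / length(col)

# Categorical column: proportion of each level.
function _toymean(col::ToyCategorical)
    n = length(col)
    return Dict(level => count(==(level), col) / n for level in unique(col))
end

design = ToyDesign(Dict{Symbol,AbstractVector}(
    :enroll => [100.0, 200.0, 300.0, 400.0],
    :stype  => ToyCategorical(["E", "E", "H", "M"]),
))

println(toymean(:enroll, design))  # numeric method
println(toymean(:stype, design))   # categorical method
```

The type check that used to live in an `elseif` branch is now expressed by the method signatures, so adding support for another column type means adding another `_toymean` method rather than another branch.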
Further, we need to quantify and benchmark the improvement from grouping over `CategoricalArray`s instead of `String`s (which would be the naive default for a categorical variable).
`svymean` and `svytotal` give a different output for `CategoricalArray` input than for `Symbol` input:
```julia
julia> apisrs = load_data("apisrs");

julia> srs = SimpleRandomSample(apisrs; weights = :pw);

julia> srs.data.stype = categorical(srs.data.stype);

julia> svymean(:enroll, srs)
1×2 DataFrame
 Row │ mean     sem
     │ Float64  Float64
─────┼──────────────────
   1 │  584.61  27.3684

julia> svymean(:stype, srs)
3×5 DataFrame
 Row │ stype  counts  proportion  var          se
     │ Cat…   Int64   Float64     Float64      Float64
─────┼───────────────────────────────────────────────────
   1 │ E         142       0.71   0.00100126   0.0316428
   2 │ H          25       0.125  0.000531876  0.0230624
   3 │ M          33       0.165  0.000669982  0.025884
```
Also, the standard error for the `CategoricalArray` method doesn't exactly match R:
```r
> library(survey)
> data(api)
> srs <- svydesign(id = ~1, weights = ~pw, data = apistrat)
> svymean(~stype, srs)
          mean     SE
stypeE 0.71376 0.0291
stypeH 0.12189 0.0177
stypeM 0.16435 0.0229
```
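To help pin down where the Julia numbers come from, here is a rough check of the `var` and `se` columns in the Julia output above. It assumes the variance of a proportion under simple random sampling without replacement, with a finite population correction, and it assumes a population size of `N = 6194`, which is not stated in this thread. I have not checked that this is what Survey.jl actually computes; under these assumptions the formula appears to reproduce the printed values.

```julia
# Sample size and level counts are taken from the Julia output above.
n = 200
counts = Dict("E" => 142, "H" => 25, "M" => 33)

# ASSUMPTION: population size of the API data; not stated in this thread.
N = 6194

# Variance of a sample proportion under simple random sampling without
# replacement, including the finite population correction (1 - n/N).
function proportion_variance(count, n, N)
    p = count / n
    return (1 - n / N) * p * (1 - p) / (n - 1)
end

for level in ("E", "H", "M")
    v = proportion_variance(counts[level], n, N)
    println(level, ": var = ", v, ", se = ", sqrt(v))
end
```

If this is indeed the formula in use, the Julia standard errors come from unweighted sample proportions of the 200 `apisrs` rows.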
Okay, I'll have a look.
bump
```r
> srs <- svydesign(id = ~1, data = apistrat, fpc = ~fpc)
> svymean(~stype, srs)
           mean     SE
stypeE 0.832972 0.0194
stypeH 0.071126 0.0109
stypeM 0.095902 0.0144
> srs <- svydesign(id = ~1, weights = ~pw, data = apistrat, fpc = ~fpc)
> svymean(~stype, srs)
          mean     SE
stypeE 0.71376 0.0285
stypeH 0.12189 0.0173
stypeM 0.16435 0.0224
```
These two are also different, both from our Julia result and from each other. They are also related to #93. It seems like R doesn't derive weights from `fpc`.
Closing, as the codebase has changed quite a lot.