Comments (5)
I am making a bare minimum multistage survey design.
from survey.jl.
I'm not sure the approximations for mean and variance would work out this way. You may be mathematically correct, but it should be checked beforehand. The current attempt at SurveyDesign tries to infer whether the given inputs/data are SRS or Stratified.
Will look more into this.
"In the single-stage approximation the PSUs are treated as strata and the second-stage sampling units are treated as PSUs."
Rereading Chapter 3 of Lumley.
I am attempting to make a structure for cluster sampling that accommodates multiple stages. My work in progress is in #134.
@ayushpatnaikgit @iuliadmtru
I have been going through R's `svydesign.default` function for the 2-stage cluster sample command below:

    dclus2 <- svydesign(id=~dnum+snum, fpc=~fpc1+fpc2, data=apiclus2)

I ran `trace(svydesign, browser)` in R and went through each step.
We could take some cues from R for the generalised survey design.
Observations
- They have a lot of checks for `NULL` and on the `typeof` of each argument. There is a `failsafe` function that holds the logic for when an argument is not given. In Julia this could (and probably should) be expressed through multiple dispatch, to improve legibility and code quality.
- Multiple clusters/strata are stored together as concatenated strings separated by '.'. E.g. in the above command the two cluster columns are stored in the design object as `fpc1` and `fpc1+'.'+fpc2`.
- `strata` are created and filled even if no `strata` are provided. See the appendix below.
- There is an `allprobs` matrix and a `probs` vector. `allprobs` has the probabilities for each stage, while `probs` seems to be the net sampling probability (the row-wise product across the stage columns of `allprobs`): `rval$prob <- apply(probs, 1, prod); rval$allprob <- probs`
- `weights` are not stored in the design object, only `probs`. If they ever need `weights` they just do `as.matrix(1/probs)`.
- There is a neat function called `as.fpc` which calculates and returns the `popsize` and `sampsize` for the design, given all arguments: `fpc <- as.fpc(fpc, strata, ids, pps = pps)`
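
The relationship between the per-stage probabilities, the net probability and the weights can be sketched as follows. This is only an illustration in Python, not the R `survey` code or any Survey.jl API: the function names `net_probs` and `weights_from_probs` and the example numbers are made up for the sketch.

```python
from math import prod


def net_probs(allprobs):
    """Row-wise product of per-stage probabilities (one row per sampled unit)."""
    return [prod(row) for row in allprobs]


def weights_from_probs(probs):
    """Sampling weights as reciprocals of the net sampling probabilities."""
    return [1.0 / p for p in probs]


# Hypothetical two-stage example: column 0 is stage 1, column 1 is stage 2.
allprobs = [
    [0.5, 0.25],
    [0.5, 0.5],
    [0.25, 0.5],
]

probs = net_probs(allprobs)          # one net probability per row
weights = weights_from_probs(probs)  # computed on demand, never stored
```

Storing only the probabilities and deriving the weights when needed, as R appears to do, keeps a single source of truth in the design object.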
fpc correction
R logic when no `probs` or `weights` are given and `popsize` could not be inferred. Related to #110 and #93. So is it not that different from what is currently implemented for `SRS` and `Stratified`?

    if (is.null(probs) && is.null(weights)) {
        if (is.null(fpc$popsize)) {
            # Population size unknown: fall back to equal probabilities
            if (missing(probs) && missing(weights))
                warning("No weights or probabilities supplied, assuming equal probability")
            probs <- rep(1, nrow(ids))
        }
        else {
            # Population size known: derive the probabilities from the fpc
            probs <- 1/weights(fpc, final = FALSE)
        }
    }
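
A minimal sketch of the same fallback, written in Python purely for illustration. The function name `fallback_probs` and its arguments are hypothetical, and it assumes that the per-stage weight derived from the fpc is `popsize / sampsize`, so the per-stage probability is `sampsize / popsize`. That assumption is not confirmed by the snippet above.

```python
import warnings


def fallback_probs(n_rows, popsize=None, sampsize=None):
    """Per-stage probabilities when neither probs nor weights were supplied.

    popsize and sampsize are per-row lists with one entry per stage
    (hypothetical layout, chosen for this sketch).
    """
    if popsize is None:
        # Population size unknown: warn and assume equal probability.
        warnings.warn(
            "No weights or probabilities supplied, assuming equal probability"
        )
        return [[1.0] for _ in range(n_rows)]
    # Assumption: probability at each stage is sample size / population size.
    return [
        [n / N for n, N in zip(n_row, N_row)]
        for n_row, N_row in zip(sampsize, popsize)
    ]
```

For example, `fallback_probs(2, popsize=[[100, 10], [100, 20]], sampsize=[[10, 5], [10, 5]])` gives one row of stage-wise probabilities per unit, which could then be multiplied across stages to get the net probability.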
Appendix
Observe how they store strata even when the design is not stratified. In `dclus2[["strata"]][["V2"]]` the stratum component is just the constant `1`; in the output below it is joined by '.' to a second id (which looks like the first-stage cluster), giving 40 levels:

    > dclus2[["strata"]][["V2"]]
    [1] 1.15 1.63 1.83 1.83 1.83 1.117 1.132 1.132 1.132 1.152 1.152
    [12] 1.152 1.173 1.173 1.173 1.173 1.176 1.198 1.198 1.198 1.198 1.200
    [23] 1.200 1.200 1.200 1.200 1.228 1.228 1.264 1.295 1.295 1.295 1.295
    [34] 1.295 1.302 1.302 1.302 1.302 1.403 1.403 1.403 1.403 1.403 1.452
    [45] 1.452 1.452 1.452 1.456 1.480 1.480 1.480 1.480 1.480 1.523 1.523
    [56] 1.534 1.534 1.534 1.534 1.534 1.549 1.549 1.549 1.549 1.549 1.552
    [67] 1.552 1.570 1.570 1.570 1.570 1.570 1.574 1.575 1.575 1.575 1.575
    [78] 1.575 1.596 1.596 1.596 1.596 1.596 1.620 1.620 1.620 1.620 1.620
    [89] 1.638 1.638 1.638 1.638 1.638 1.639 1.639 1.639 1.639 1.639 1.674
    [100] 1.674 1.679 1.679 1.679 1.679 1.687 1.687 1.687 1.701 1.701 1.711
    [111] 1.711 1.719 1.731 1.731 1.731 1.731 1.731 1.742 1.768 1.768 1.781
    [122] 1.781 1.781 1.781 1.781 1.795
    40 Levels: 1.117 1.132 1.15 1.152 1.173 1.176 1.198 1.200 ... 1.83
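
The '.'-joined labels in the output above could be produced by something like the sketch below. It is written in Python for illustration only; `nested_labels` is a hypothetical helper, and reading the second component as the first-stage cluster id (`dnum`) is an assumption based on the printed values, not something confirmed from the R source.

```python
def nested_labels(*columns, sep="."):
    """Join the values of several id columns, row by row, into one label per row."""
    return [sep.join(str(value) for value in row) for row in zip(*columns)]


# Constant stratum label, because the design has no strata.
stratum = [1, 1, 1, 1]
# Assumed first-stage ids, read off the first four labels in the output above.
dnum = [15, 63, 83, 83]

labels = nested_labels(stratum, dnum)
levels = sorted(set(labels))  # distinct labels, sorted as strings
```

Keeping the labels as strings means `1.200` stays distinct from `1.2`, which matters if the ids are later treated as factor levels.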