maximerischard / deconvolutiontests.jl
Test for equality of two distributions given observations with known measurement errors.
License: Other
Let's look at a situation where we have a standard test (in this case the K.S. test) against which to compare our deconvolution test, decon.test().
This simulation is built to give assurance that, in a situation where we know what to do, the deconvolution test does as well as (if not better than) the standard test.
Assuming homoskedastic errors, the KS test is valid: under the null, the distributions of Y are the same in both groups. For the DGM

Y_i1 ~ N(mu1, sigma1) + N(0, sigma^2)
Y_i2 ~ N(mu2, sigma2) + N(0, sigma^2)

the test

ks({Y_i1}, {Y_i2})

should have a rejection rate of 5% when mu1 = mu2 and sigma1 = sigma2, and

decon.test({Y_i1}, {Y_i2})

should as well. But as we move into the alternative, i.e. mu1 < mu2 while keeping sigma1 = sigma2, both tests will begin to reject H0 more often, eventually reaching 100% power when the separation between mu1 and mu2 is large enough (relative to both sigma1 = sigma2 and sigma).
It would be good to see power profiles for increasing separation at varying levels of the underlying noise sigma1 = sigma2 as well as of the measurement-error noise sigma.
This can easily get out of hand: even running MLEs (as in my simple example) takes ~20 minutes to produce smooth power profiles (2000 replicates). So this may have to wait until Issue 8 is resolved.
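For the KS-test side of the comparison, the DGM above is straightforward to simulate. A minimal Python sketch (assuming NumPy and SciPy are available; the function name `rejection_rate` and all parameter defaults are hypothetical choices for illustration, not part of the package):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

def rejection_rate(mu1, mu2, sigma1, sigma2, sigma, n=100, reps=500, alpha=0.05):
    """Monte Carlo rejection rate of the two-sample KS test under the DGM
    Y_ij ~ N(mu_j, sigma_j) + N(0, sigma^2), with homoskedastic error sigma."""
    rejections = 0
    for _ in range(reps):
        y1 = rng.normal(mu1, sigma1, n) + rng.normal(0.0, sigma, n)
        y2 = rng.normal(mu2, sigma2, n) + rng.normal(0.0, sigma, n)
        if ks_2samp(y1, y2).pvalue < alpha:
            rejections += 1
    return rejections / reps

# Under H0 (mu1 == mu2, sigma1 == sigma2) the rate should sit near alpha = 0.05.
print(rejection_rate(0.0, 0.0, 1.0, 1.0, 0.5))
# With a large separation in means, power should approach 1.
print(rejection_rate(0.0, 2.0, 1.0, 1.0, 0.5))
```

Sweeping the separation mu2 - mu1 over a grid, for several values of sigma1 = sigma2 and sigma, would give the power profiles described above; the decon.test() column of the comparison would be run the same way.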
Another addition to the "what's the harm?" section. One item is running the tests directly on noisy data; the other is the following.
Vinay and Paul Green seem convinced that one can simply deconvolve the data and conduct tests on the resulting distributions (ignoring the resampling of measurement errors). This is wrong because we won't have the correct null distribution to test against. But what's the harm?
This needs to be investigated to see what the harm is. We can run a simulation where we simply deconvolve (using a few options, even) and apply the KS test directly. We can push this to extremes where the resulting statistics won't have KS distributions, though I can't immediately think of situations where this is the case.
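A sketch of such a simulation in Python (assuming NumPy and SciPy; the crude linear-shrinkage "deconvolution" here is only a hypothetical stand-in for a proper deconvolution estimator, and the function names are invented for illustration). It estimates the size of the naive deconvolve-then-KS procedure under H0, without prejudging what the harm turns out to be:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)

def naive_deconvolve(y, sigma):
    """Crude point-deconvolution: shrink observations toward the sample mean
    so the sample variance matches the estimated signal variance s^2 - sigma^2.
    A stand-in for a real deconvolution estimator."""
    s2 = y.var(ddof=1)
    shrink = np.sqrt(max(s2 - sigma**2, 0.0) / s2)
    return y.mean() + shrink * (y - y.mean())

def naive_rejection_rate(n=100, sigma=0.5, reps=500, alpha=0.05):
    """Size of the naive procedure under H0: deconvolve each sample, then
    KS-test the point-deconvolved values directly, ignoring the resampling
    of measurement errors."""
    rej = 0
    for _ in range(reps):
        y1 = rng.normal(0.0, 1.0, n) + rng.normal(0.0, sigma, n)
        y2 = rng.normal(0.0, 1.0, n) + rng.normal(0.0, sigma, n)
        d1 = naive_deconvolve(y1, sigma)
        d2 = naive_deconvolve(y2, sigma)
        if ks_2samp(d1, d2).pvalue < alpha:
            rej += 1
    return rej / reps

print(naive_rejection_rate())
```

Comparing this rate to the nominal alpha (and repeating with other deconvolution options, larger sigma, smaller n) would quantify the harm empirically.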
The package provides a common framework for hypothesis tests in Julia. It would be neat if our test was compatible with it.
When the errors are homoscedastic, the KS test is valid. Under that circumstance, how does the deconvolution+bootstrap+KS test compare to the traditional KS test? Is there a loss of power?
It would be straightforward to perform some simulations of this, but perhaps more interesting to dig into theory a little bit and try to understand the difference.
We need to think about what simulations to perform.
Some parameters include:
What theoretical questions do we wish to ask and (hopefully) answer? What questions will people ask?
Some I can think of:
Luis, do you already have a notebook/TeX typing up our notes? Can you add it to the repository?
The first step of our algorithm is the deconvolution of all the data (obtaining an estimate of F_0). My intuition was that the uncertainty in this deconvolution is not crucial. But is this correct? Can we make this intuition more precise?
It would be possible to incorporate the uncertainty in the deconvolution by first bootstrapping the original data (nonparametric bootstrap). What do we gain by doing so? Is it more valid? more robust? more powerful?
Are there other ways to handle this uncertainty?
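The nonparametric bootstrap step suggested above is itself simple. A minimal Python sketch (function name and sizes are hypothetical); each resample would then be deconvolved separately, propagating the uncertainty in the estimate of F_0 through the rest of the test:

```python
import numpy as np

rng = np.random.default_rng(1)

def nonparametric_bootstrap(y, n_boot=200):
    """Resample the observed (noisy) data with replacement.
    Returns n_boot resamples, each the same size as the original sample."""
    n = len(y)
    return [y[rng.integers(0, n, n)] for _ in range(n_boot)]

y = rng.normal(0.0, 1.0, 50)
resamples = nonparametric_bootstrap(y)
print(len(resamples), len(resamples[0]))  # 200 resamples, each of size 50
```

Whether the extra layer of resampling buys validity, robustness, or power (versus its added computational cost, given the timing concerns in the power-profile simulations) is exactly the question to settle.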