
ashr's People

Contributors

aksarkar, daichaoxing, dcgerard, esterpantaleo, eweine, heejungshim, lsun, mengyin, nanxstats, pcarbo, stephens999, willwerscheid, zouyuxin

ashr's Issues

VB initialization

The VB (non-default) initialization appears not to work - it throws an error when I try to use pi.init.
It is OK when I use pi.init=NULL (which sets pi.init to be all 1s), but it is not clear why other values don't work.

posterior median

It would be nice to implement something to compute posterior medians.
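
A minimal sketch of one way to do this, assuming a hypothetical `post_cdf()` that returns the posterior CDF for a single observation (a plain normal stands in for the real mixture posterior here): invert the CDF at 0.5 with `uniroot`.

```r
# Hypothetical posterior CDF for one observation; a normal is a stand-in
# for the real mixture posterior.
post_cdf <- function(x) pnorm(x, mean = 0.3, sd = 0.5)

# The posterior median is where the posterior CDF crosses 0.5.
post_median <- uniroot(function(x) post_cdf(x) - 0.5,
                       interval = c(-10, 10))$root
```

For a unimodal mixture posterior the CDF is monotone, so the root is unique and `uniroot` is safe given a bracketing interval.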

CRAN to-do items

  1. Create tag for v2.0.5 on cran branch.

  2. Merge cran branch into master.

Should we export `autoselect.mixsd`?

Currently, for mr-ash via varbvs, I have to use ashr:::autoselect.mixsd (three colons) because the function autoselect.mixsd is not exported. It is not a big deal to me, but if there is no obvious harm to ashr, perhaps we should export it to avoid the ugly ::: syntax in other applications.

call returned incorrect

If you call ash.workhorse via ash (as usual), then the call returned is the call to ash.workhorse, not the call to ash.

Poisson likelihood with different priors

@mengyin I got errors with these two cases:

Case 1:
set.seed(100)
l = rexp(1)
x = rpois(100, lambda = l)
ashr::ash(rep(0,100), 1, lik = ashr::lik_pois(x), mode = 0, prior = "+uniform")

Case 2:
set.seed(100)
l = rexp(1)
x = rpois(100, lambda = l)
ashr::ash(rep(0,100), 1, lik = ashr::lik_pois(x), mode = "estimate", prior = "halfuniform")

Both returned the error message: "Error in match.arg(prior) : 'arg' should be one of “nullbiased”, “uniform”, “unit”". And when prior = "uniform", the code runs okay with no errors. Thanks so much!

diagnostic plot

It would be good to produce a diagnostic plot.
For each element betahat_j, compute where it falls (i.e., its quantile) in its predictive distribution
p(betahat_j | ghat, s_j).

If the model fits well, these quantiles should be uniform, so do a QQ plot against uniform quantiles.

This may help diagnose dependence among tests.
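
A sketch of the proposed diagnostic under a simplifying assumption (not ashr's actual fitted ghat): take ghat = N(0, tau^2), so the predictive for betahat_j is N(0, tau^2 + s_j^2), and QQ-plot the predictive quantiles against uniform quantiles.

```r
set.seed(1)
tau <- 1
s <- runif(100, 0.5, 2)                      # standard errors s_j
betahat <- rnorm(100, 0, sqrt(tau^2 + s^2))  # well-specified data
u <- pnorm(betahat, 0, sqrt(tau^2 + s^2))    # quantile of betahat_j in p(betahat_j | ghat, s_j)
qqplot(qunif(ppoints(length(u))), u,
       xlab = "uniform quantiles", ylab = "predictive quantiles")
abline(0, 1)                                 # good fit: points lie near this line
```

Systematic departures from the diagonal would flag model misfit; clumping of the quantiles could indicate dependence among tests.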

speed up the mode estimation procedure?

The mode estimation procedure might be too slow, especially when we need to search for the optimal mode over a big range. E.g., in Joyce's Poisson count (single-cell data) application, the counts range over (0,900), so it takes a pretty long time to estimate the mode.

We guess one possible way to fix the issue is to start the search from a better starting point. However, the current ashn() uses the stats::optimize function, which doesn't allow the user to provide a starting point...
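
One hedged workaround, given that stats::optimize only takes an interval rather than a starting point: shrink the interval around a cheap initial estimate. A toy sketch with a stand-in objective (the real code would use the ash log-likelihood as a function of the mode):

```r
set.seed(1)
x <- rpois(500, lambda = 40)
obj <- function(m) sum((x - m)^2)  # stand-in objective, minimised at mean(x)
start <- median(x)                 # cheap initial estimate of the mode
width <- 2 * mad(x)                # search only a window around it
fit <- optimize(obj, interval = c(start - width, start + width))
```

This avoids scanning the full (0, 900) range; the window width would need to be chosen generously enough that the true mode is not excluded.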

rename likelihood functions

We should rename all the likelihoods to lik_xxx instead of xxx_lik. This would allow a user to easily see all the available likelihoods by typing lik_ and tab-completing.

document new output options in general branch

Also, the format is rather unfriendly right now - we need to work on this.
For example:
ash(rnorm(100),1,outputlevel=list(resfns = c(ashr:::output_pm)))$res

maybe we need an exported function that creates the output resfns?

ash does not return the right set of returns?

Dear Dr. Stephens,

I tried the ashr package with great interest. I did not install the REBayes package, yet only got a warning.

But when I follow the example for the ash function, I only get the following; the values of interest are not returned. Is there anything that I missed here? Is REBayes a requirement for the correct returns? Thank you very much!

 beta = c(rep(0,100),rnorm(100))
 sebetahat = abs(rnorm(200,0,1))
 betahat = rnorm(200,beta,sebetahat)
 beta.ash = ash(betahat, sebetahat)

Due to absence of package REBayes, switching to EM algorithm
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called ‘REBayes’

$pi
[1] 5.120168e-01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[6] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[11] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 2.627738e-07
[16] 3.354781e-02 4.983565e-06 1.229349e-12 1.423635e-16 0.000000e+00
[21] 1.336484e-12 1.235669e-09 3.348131e-01 1.196170e-01 3.706764e-15
[26] 3.399726e-17 0.000000e+00 0.000000e+00 3.554808e-08

$a
[1] 0.000000000 -0.001030045 -0.001456703 -0.002060090 -0.002913407
[6] -0.004120179 -0.005826813 -0.008240358 -0.011653626 -0.016480717
[11] -0.023307253 -0.032961433 -0.046614506 -0.065922866 -0.093229012
[16] -0.131845733 -0.186458023 -0.263691465 -0.372916046 -0.527382930
[21] -0.745832093 -1.054765861 -1.491664186 -2.109531722 -2.983328371
[26] -4.219063444 -5.966656743 -8.438126887 -11.933313485

$b
[1] 0.000000000 0.001030045 0.001456703 0.002060090 0.002913407
[6] 0.004120179 0.005826813 0.008240358 0.011653626 0.016480717
[11] 0.023307253 0.032961433 0.046614506 0.065922866 0.093229012
[16] 0.131845733 0.186458023 0.263691465 0.372916046 0.527382930
[21] 0.745832093 1.054765861 1.491664186 2.109531722 2.983328371
[26] 4.219063444 5.966656743 8.438126887 11.933313485

attr(,"row.names")
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29
attr(,"class")
[1] "unimix"

check that prior is not < 1

Prior values < 1 can give an infinite penalty, so they should be avoided. We should probably enforce this for the user.
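
A sketch of such a guard, assuming (as in ashr's penalized likelihood) a Dirichlet-style penalty term sum((prior - 1) * log(pi)), which is unbounded as any pi_k -> 0 when the corresponding prior_k < 1:

```r
check_prior <- function(prior) {
  # Each entry below 1 multiplies log(pi_k) by a negative weight, so the
  # penalized objective diverges as pi_k -> 0; reject such input early.
  if (any(prior < 1))
    stop("all prior values must be >= 1 (values < 1 give an infinite penalty)")
  invisible(prior)
}
```

`check_prior` is a hypothetical helper name; the real check would live wherever ash validates its prior argument.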

Version issue with ashr_2.1-25

Hi there,

I was looking to use vashr, but am running into a version problem with ashr. I think it arose when comppostprob was changed to comp_postprob, which takes different parameters. I tried to troubleshoot a couple of things in vash.R, but could you let me know, for:

 postpi.se[completeobs,] = t(comppostprob(mix.fit$g,rep(numeric(0),n),sehat[completeobs],df))

what the equivalent call to comp_postprob would be? Further details are provided in the original issue:

mengyin/vashr#2

Sorry for the cross-post - it looks like this repo is maintained more actively.

+uniform option: posteriormean issue

betahat = rnorm(100,0,0.5)
sebeta = 0.6
test.ash = ash(betahat = betahat,sebetahat = sebeta, mixcompdist = "+uniform")
test.ash$fitted.g
plot(test.ash$PosteriorMean)

fitted.g provides the right result we want, but the posterior mean has tiny nonzero negative values.

PosteriorMean can be infinity?

When I tried ash with "df=..." option on this example (data here: https://www.dropbox.com/s/d2n9lxzxtir95nh/eg.Rdata?dl=0), the "PosteriorMean" in ash object included a few infinities. The corresponding betahat's were indeed quite large (compared to the other betahat's, yet the absolute values were no more than 2), but I guess it's still weird that their posterior means are infinities.

load("eg.Rdata")
test=ash(betahat,sebetahat,df=df)
sum(test$PosteriorMean==Inf)
[1] 3

If we don't use the "df=..." option, then the posterior means will be fine. I guess there might be some numerical issues when computing the mean of truncated t distribution?

bug with modify.sd0

@MuzheZeng try this
set.seed(100)
z = rnorm(1000,0,2)
ashr::ash(z, 1, "uniform", gridmult = 1.1)

I get

Error in temp[sdzero] = ifelse(A[sdzero] <= M[sdzero] & B[sdzero] >= M[sdzero], :
NAs are not allowed in subscripted assignments

installing package ashr

Hello,
I tried installing the package as shown in the README:
library(devtools)
install_github("stephens999/ashr")
But it fails to install.
Can you please help me?
Thank you
Ronen

optmethod returned incorrectly if switch is done due to negative values

Very occasionally mixIP returns negative estimates for pi.
Then we switch to optimization by EM.
But in that case the returned ash object's $optmethod is still mixIP.

I've made it so that in future the fit object will also return optmethod, and that will be correct.
Do we also need optmethod in the ash object? (The fit object is only returned if
outputlevel = 4, so not by default.)

Skipifnotinstalled

Should use testthat::skip_if_not_installed() in tests instead of checking with require().

warnings when compiling

MixSquarem.cpp:55:12: warning: unused variable 'kr' [-Wunused-variable]
double kr=control["kr"];
^
MixSquarem.cpp:66:20: warning: unused variable 'leval' [-Wunused-variable]
int iter,feval,leval;
^
MixSquarem.cpp:212:9: warning: unused variable 'n' [-Wunused-variable]
int n=matrix_lik.nrow(), k=matrix_lik.ncol(),niter;

general ash

@mengyin can you take a look at the code I've got in my general branch?
I think it incorporates much, but not yet all, of what you had in your branch, but without all the repetition.
It seems to run OK.
There are still some things to do, but I thought it would be useful for you to look and
see if, for example, the results for logF errors look similar to yours.
(Note that for now I think df1 and df2 have to be the same for all observations; changing this
should not be too hard, but it's not done yet.)

To see how to run it, see my test in tests/testthat/test_logF.R
set.seed(10)
e = rnorm(100)+log(rf(100,df1=10,df2=10))
e.ash = ash(e,1,lik=logF_lik(df1=10,df2=10))
s = sd(log(rf(10000,10,10))) # get approximate standard deviation by simulation
e.ash.norm = ash(e,s)
expect_equal(e.ash.norm$PosteriorMean,e.ash$PosteriorMean,tol=0.05)

and also the examples under ?ash.workhorse

speeding output computation for big problems

The current computation of output is nice and modular, but also much slower
than it needs to be. Possible speed-ups would involve rearranging so that:

  i) we only compute the posterior component probabilities W once (an n by K matrix);

  ii) then we compute the posterior quantities we want, P (e.g. lfsr etc.), for each component (each an n by K matrix);

  iii) then we compile them via rowSums(W*P).

Also, in the case of excluded observations, it would be best to exclude all of them upfront:

  • first remove the data that are to be excluded (if any);

  • after iii), add back in the values for the excluded observations.
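
The three steps above can be sketched with toy matrices (W and P here are random stand-ins for the real posterior quantities):

```r
set.seed(1)
n <- 5; K <- 3
W <- matrix(runif(n * K), n, K)
W <- W / rowSums(W)               # i) posterior component probabilities, computed once
P <- matrix(rnorm(n * K), n, K)   # ii) a per-component posterior quantity (e.g. means)
post <- rowSums(W * P)            # iii) compile: one value per observation
```

Each additional output then costs only one n-by-K elementwise product and a row sum, instead of recomputing W.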

Warnings when using ash(..., df=..., ...)?

There are such warnings when using the "df=..." option of function ash().

test=ash(betahat=rnorm(100,0,1),sebetahat=rep(1,100),df=50)
Warning messages:
1: In if (a == b) { :
the condition has length > 1 and only the first element will be used
2: In if (a == b) { :
the condition has length > 1 and only the first element will be used
3: In if (a == b) { :
the condition has length > 1 and only the first element will be used

Problems with small sebetahat

Description

I seem to be getting new errors when the standard deviations are small. However, they aren't really all that small.

Example Data and Error

library(ashr)
betahat   <- c(-2.2, -1.2, 1.3, -1.8, 1.5, 0.9, 1.6, -2, 0.6, 1.1)
sebetahat <- c(0.65, 0.95, 0.32, 0.41, 0.1, 0.17, 0.59, 0.49, 0.51, 0.47)
aout      <- ashr::ash.workhorse(betahat = betahat, sebetahat = sebetahat)
## Warning in min(f): no non-missing arguments to min; returning Inf

## Error in t(A) %*% (f * d): non-conformable arguments

Error goes away for much larger sd

aout <- ashr::ash.workhorse(betahat = betahat, sebetahat = sebetahat + 1)

Generally, problem seems to be small SD

set.seed(4235)
aout <- ashr::ash.workhorse(betahat = rnorm(5), sebetahat = rep(0.5, 5))
## Warning in min(f): no non-missing arguments to min; returning Inf

## Error in t(A) %*% (f * d): non-conformable arguments
sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ashr_2.0.4
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.8       knitr_1.15        magrittr_1.5     
##  [4] REBayes_0.73      MASS_7.3-45       doParallel_1.0.10
##  [7] pscl_1.4.9        SQUAREM_2016.8-2  lattice_0.20-34  
## [10] foreach_1.4.3     stringr_1.1.0     tools_3.3.2      
## [13] parallel_3.3.2    grid_3.3.2        htmltools_0.3.5  
## [16] iterators_1.0.8   yaml_2.1.14       assertthat_0.1   
## [19] digest_0.6.10     tibble_1.2        Matrix_1.2-7.1   
## [22] codetools_0.2-15  evaluate_0.10     rmarkdown_1.1    
## [25] stringi_1.1.2     Rmosek_7.1.3      truncnorm_1.0-7

CIs in general branch

This option is broken by the changes in the general branch. Again, it needs fixing before merging into master. We need to add some basic tests for this option too.

Mosek documentation updates

  • Fix links to the Mac-specific and Linux-specific instructions for installing mosek and rmosek.
  • Post warning about expiry of mosek license (it is only valid for 1 year).

Interior point method sometimes fails when given small variances and uniform mixcompdist.

Example dataset:

betahat <- c(0.7775, 0.6091, 0.6880, 0.6897, 0.6565, 0.7505,
             0.7125, 0.7201, 0.7498, 0.7553)
sebetahat <- rep(0.01, length = length(betahat))
ip_out <- ashr::ash(betahat = betahat, sebetahat = sebetahat,
                    mixcompdist = "uniform", optmethod = "mixIP")
ip_out$PosteriorMean
min(ip_out$fitted.g$pi)

EM works fine:

em_out <- ashr::ash(betahat = betahat, sebetahat = sebetahat,
                    mixcompdist = "uniform", optmethod = "mixEM")
em_out$PosteriorMean
summary(em_out$fitted.g$pi)

KWDual outputs a negative mixture weight

Here is the example that produces a negative mixture weight with optmethod = mixIP only:

library(ashr)
betahat <- c(-5.8336751,1.0585307,0.8701328,0.9644079,-1.0796877,-7.7252394,-2.0433250,42.5216280,3.4243654)
sebeta  <- c(0.5003549,0.4223690,0.3462631,0.3316413,0.4076427,0.5276066,0.4302293,0.2051576,0.4270015)
out1    <- ash(betahat,sebeta,method = "fdr",mixcompdist = "normal",optmethod = "mixEM")
out2    <- ash(betahat,sebeta,method = "fdr",mixcompdist = "normal",optmethod = "mixIP")

The ad hoc solution right now is to set all negative mixture weights to 0 (or a small value near 0), and re-normalize the mixture weights. It would be good to come up with a better solution at some point.
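
The ad hoc fix described above can be sketched as follows (fix_pi is a hypothetical helper name):

```r
fix_pi <- function(pi) {
  pi[pi < 0] <- 0   # clamp negative mixture weights to zero
  pi / sum(pi)      # re-normalize so the weights again sum to 1
}
```

For example, fix_pi(c(0.7, -1e-8, 0.3)) returns a proper probability vector with the tiny negative weight zeroed out.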

Thanks to Wei for identifying this issue.

Dealing with infinity values in standard errors

It seems that in certain (maybe very rare) applications some SEs are infinite, which would result in a breakdown of the SQUAREM algorithm. A simple solution is to just ignore these cases in ash, as with NA values.
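
The proposed handling can be sketched as follows: is.finite() is FALSE for both NA and Inf, so the same mask drops both kinds of observation before fitting.

```r
betahat   <- c(1.2, -0.5, 3.0, 0.8)
sebetahat <- c(0.4, Inf, 0.6, NA)
keep <- is.finite(sebetahat)     # FALSE for Inf and NA alike
betahat_used   <- betahat[keep]
sebetahat_used <- sebetahat[keep]
```

The excluded observations would then be handled the same way ash already handles NAs (e.g. added back with default posterior summaries afterwards).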

Request to Export Functions

I use the following functions in my vicar package:

calc_lfdr, calc_lfsr, calc_np, calc_pm, calc_pp, calc_psd, calc_qvalue, calc_svalue, and estimate_mixprop.

Since these functions aren't exported in ashr (the general branch), my CRAN checks throw me a note. Could you export these functions? It's not a big deal if you prefer not to, though, as I can just copy and paste the functions for my package.

Lots of warnings from `my_e2trunct`.

I am getting a lot of warnings when I use the t-likelihood in ash with small df. I think the warnings come from my_e2trunct. It's already noted that it is unstable for df of 1 or 2, but in the dataset below it seems to be unstable for df of 6.

Example dataset:

options(warn = 2)
betahat <- c(-0.695, -0.497, -0.176, 0.115, -3.757, -0.752, 0.163, -0.308, 0.361)
sebetahat <- c(0.186, 0.221, 0.233, 0.436, 0.548, 0.307, 0.532, 0.997, 0.526)
ashr::ash.workhorse(betahat = betahat, sebetahat = sebetahat, df = 6) ## warnings
ashr::ash.workhorse(betahat = betahat, sebetahat = sebetahat, df = 7) ## works fine

issue with new wrapper ashn

In ashn, why is the default
alpha = seq(from = -abs(mean(betahat)), to = 2 * abs(mean(betahat)))
when it seems like it should maybe be
alpha = seq(from = -2 * abs(mean(betahat)), to = 2 * abs(mean(betahat)))?

Also, in the help,
alpha and then consider the model betahat_j ~ g()
should probably be betahat_j - mu ~ g().

And finally, it would be more readable if we replaced alpha with mu in the help and the function.

IP optimization error from ashr "general" branch

Without the REBayes package, ash runs successfully (see ash/analysis/adaptive_shrinkage.Rmd):

R> beta.spiky.ash = ash(sim.spiky$input$betahat,s)                             
Due to absence of package REBayes, switching to EM algorithm

However, when I install the REBayes package, I get the following error:

R> install.packages("REBayes")
R> beta.spiky.ash = ash(sim.spiky$input$betahat,s)
Error in t(A) %*% (f * d) : non-conformable arguments

The error appears to be cropping up in a call to the KWDual function in the REBayes package. Has anyone gotten this error before? Any ideas why this may be occurring?

Note I get the same error in the "master" branch version. Also note that this works fine:
beta.spiky.ash = ash(sim.spiky$input$betahat,s,optmethod = "cxxMixSquarem")

numerical issue in comp_cdf_post.unimix in general branch

eg try:

cdf_post(m=unimix(1,-100,100),0,data=set_data(-2000,1,normal_lik(),alpha=1))
[1] 0.5
cdf_post(m=unimix(1,-100,100),0,data=set_data(-20,1,normal_lik(),alpha=1))
[1] 1

they should both be approximately 1.

The problem is exponentiating too early...
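
The usual remedy (log-sum-exp) keeps everything on the log scale until the last step; a minimal sketch:

```r
# Exponentiating early underflows: exp(-2000) is 0 in double precision.
lx <- c(-2000, -2001)
naive <- log(sum(exp(lx)))   # -Inf

# Subtract the max before exponentiating, then add it back.
logsumexp <- function(lx) {
  m <- max(lx)
  m + log(sum(exp(lx - m)))
}
stable <- logsumexp(lx)      # about -1999.69
```

Applying the same idea inside comp_cdf_post.unimix (normalizing by the largest log-density before any exp()) should make both calls above return approximately 1.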
