pitakakariki / simr

Power Analysis of Generalised Linear Mixed Models by Simulation
Alternative to `extend`, but with "experimental" features.
e.g. `year` and `yr <- year - 2000` type cases. This will need messages so that users know what guesses `autoextend` has made.
```r
library(simr)
library(nlme)  # for the Orthodont data

# Random-effect covariance matrix: intercept SD ~2.33, age-slope SD 0.2264,
# correlation -0.61
VC <- matrix(c(2.327^2, -2.328*0.2264*0.61, -2.328*0.2264*0.61, 0.2264^2), nrow=2)

simm <- makeLmer(y ~ 1 + age + (age|Subject), fixef=c(16.7611, 0.6602),
                 VarCorr=VC, sigma=1.31, data=Orthodont)
powerSim(simm, nsim=10)
```
Code is in `defaults.R`; it needs to be better documented.
Add unit tests, e.g. for a binomial sim with a Poisson fit.
There aren't enough guardrails in `modify.R`. For example:

```r
fm <- lmer(y ~ x + (1|g), data=simdata)
fixef(fm) <- 3
fixef(fm)
```

gives

```
Error in structure(object@beta, names = dimnames(X)[[2]]) :
  'names' attribute [2] must be the same length as the vector [1]
```
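A minimal sketch of the kind of guardrail that could go into `modify.R`, assuming the check sits at the top of the `fixef<-` method (`check_fixef_length` is a made-up name; the `beta` slot is where `merMod` objects keep their fixed-effect coefficients):

```r
# Hypothetical guardrail (sketch only): fail early, with a message phrased
# in terms of what the user actually supplied.
check_fixef_length <- function(object, value) {
  p <- length(object@beta)   # number of fixed-effect coefficients in the fit
  if (length(value) != p) {
    stop("replacement fixed effects have length ", length(value),
         " but the model has ", p, " coefficient(s)")
  }
  invisible(TRUE)
}
```

With a check like this, `fixef(fm) <- 3` above would fail immediately with a message about lengths instead of the cryptic `names` error.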
Need to add a `ranef<-` function to directly specify random effects, but this is pointless unless the user has a `use.u` option. The user might want to keep the random effects estimated from the pilot data, but generate new values for extended group levels (see `add=TRUE`).
`makeMer` needs its own error messages instead of relying on the messages from the `lme4` functions it wraps. For example, if `beta` is the wrong length, the user should be told that the argument `fixef` (which they supplied) is the wrong length.
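A sketch of the kind of up-front check `makeLmer` could run, phrased in terms of the user-facing `fixef` argument; `checkFixef` is a made-up name, and `nobars` is `lme4`'s helper for stripping random-effect terms:

```r
library(lme4)

# Hypothetical pre-check (sketch only): validate `fixef` against the
# fixed-effect design implied by the formula before handing off to lme4.
checkFixef <- function(formula, fixef, data) {
  fe_terms <- delete.response(terms(nobars(formula)))  # drop response & bars
  p <- ncol(model.matrix(fe_terms, data))              # implied coefficients
  if (length(fixef) != p) {
    stop("argument 'fixef' has length ", length(fixef),
         " but the formula implies ", p, " fixed effect(s)")
  }
  invisible(TRUE)
}
```

For example, `checkFixef(y ~ 1 + age + (age|Subject), c(16.7611, 0.6602), Orthodont)` would pass, while a length-one `fixef` would produce the message above.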
Extra care is needed with the `simulate` and `lmer`/`glmer` calls, since any problems are suppressed. If `simulate` produces lots of `NA`s, we get cryptic messages about an "Invalid grouping factor specification".
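A sketch of one possible pre-check, assuming the simulated response can be inspected before it is passed to the refit (`simdata` is the example dataset shipped with `simr`):

```r
library(simr)  # also loads lme4 and the simdata example data

fm <- lmer(y ~ x + (1|g), data=simdata)

# Fail fast with a clear message if simulate() returns NAs, instead of
# letting the refit produce "Invalid grouping factor specification".
y_sim <- simulate(fm)[[1]]
if (anyNA(y_sim)) {
  stop(sum(is.na(y_sim)), " NA value(s) in the simulated response; ",
       "check the model's parameters before refitting")
}
```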
Hi,

This is a great package for estimating power and sample size for language studies that analyse their data with mixed-effects models. I am estimating power from the statistics reported in a paper: only the means and standard deviations of the two critical conditions are available, together with comparisons between two mixed-effects models.

My full script is in this public gist. The idea is to create simulated data from the condition means and standard deviations. The mixed-effects models are then extended along the number of items and then the number of subjects. Although I got reasonable power estimates matching the description in the original paper, the power does not change when I increase the number of subjects. Is there something wrong with how I set up the models?

By the way, `powerCurve` crashed when `nsim` was set to 1,000. Is there a limit when running a power analysis on models with two random effects?

Thanks!
These need some work:

- `sim` and `test` arguments
- `powerCurve` if `fit` has a `data` or `...` argument instead of `subset`
- `nsim` simulations if the functions don't work?

This one is closed now: lme4/lme4#408. Need to remove the workaround from `simr` and update the required `lme4` version.
```r
fm <- lmer(y ~ x + (1|g), data=simdata)
pctest <- powerCurve(fm, along='g', breaks=1:5, nsim=25)
```

```
Power for predictor 'x', (95% confidence interval),
by number of levels in g:
      1:  0.00% ( 0.00, 13.72) - 10 rows
      2: 88.00% (68.78, 97.45) - 20 rows
      3: 96.00% (79.65, 99.90) - 30 rows
     NA: 96.00% (79.65, 99.90) - 30 rows
     NA: 96.00% (79.65, 99.90) - 30 rows

Time elapsed: 0 h 0 m 16 s
```
```r
fm1 <- lm(y ~ x, data=simdata)
fm2 <- extend(fm1, along='g', n=10)
fm3 <- fm1
getData(fm3) <- getData(fm2)

length(doSim(fm2))
[1] 100
length(doSim(fm3))
[1] 30
```
Make it clear that once the test specification is working, you can replace `doTest` with `powerSim` to do the power analysis.
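For example (a sketch using `simr`'s bundled `simdata`):

```r
library(simr)

fm <- lmer(y ~ x + (1|g), data=simdata)

# First make sure the test specification runs on a single fit...
doTest(fm, fixed("x"))

# ...then swap doTest for powerSim to run the simulation-based analysis.
powerSim(fm, test=fixed("x"), nsim=100)
```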
Currently `extend` drops contrasts, which tends to cause errors because the factors have different names.
Tricky bits, probably:

- `simulate` with modified parameter values
- `use.u` in `simulate`

`getData` fails when there is no `data` argument. This flows on to `powerSim` etc.
There's a bunch of things that will have to be sorted out to get polished parallel support (a rough sketch follows the list):

- The user sets `parallel=TRUE` and it should Just Work.
- `parallel=TRUE` will have to choose a default number of cores somehow.
- `parallel=n`?
- `parallel=cl`?
- Loading `simr` in this case: `simr` in each worker will use the installed version, not the version we're working on (will this mess with `testthat` too?)
- `lme4` versus other packages and user-defined functions.
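A rough sketch of what `parallel=TRUE` might do internally, using the base `parallel` package; the core-count heuristic and the worker setup are assumptions, not the package's actual design:

```r
library(simr)
library(parallel)

fm <- lmer(y ~ x + (1|g), data=simdata)

ncores <- max(1L, detectCores() - 1L)   # one possible default core count
cl <- makeCluster(ncores)
clusterSetRNGStream(cl, 123)            # reproducible parallel RNG streams
clusterEvalQ(cl, library(simr))         # workers get the *installed* simr
clusterExport(cl, "fm")

pvals <- parSapply(cl, seq_len(100), function(i) {
  y <- doSim(fm)               # simulate a new response vector
  refit <- doFit(y, fm)        # refit the model to the simulated data
  doTest(refit, fixed("x"))    # p-value for this iteration
})
stopCluster(cl)

mean(pvals < 0.05)             # crude power estimate
```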
Currently these are a bit flaky: they only work because the `summary` functions put their names in the fixed effects block.

Need to write something like `drop1` that changes the contrasts to drop individual factor levels. Alternative: let users deal with contrasts, but provide a `contrastTest` option?
Either in Test Examples or in its own vignette?
You can currently simulate from a glm with specified fixed effects, which has the same effect. Nicer this way though.
Should probably treat character variables as factors.
e.g. "a" "b" "c" should extend to "a" "b" "c" "d" "e" "f", not "1" "2" "3" "4" "5" "6".
```
Warning message:
Setting row names on a tibble is deprecated.
```
This is going to break everything:

```r
lm(y ~ x, data=X, subset=???)
```
e.g.

```r
y ~ x1*x2 + x1*x3
X$y + X$x1 + X$x2
```

I think this will work in `lme4`, but some of the statistical testing code might get confused.
- `nsim`: add more simulations.
- `test`: use stored output to redo the analysis faster.

`V1` belongs with `model1`, and `V2` belongs with `model2`. At the moment that's not clear during the introduction.
Looks like they do exactly the same thing. e.g. `doTest(fm, rcompare(~(1|g), "pb", pbnsim=10))`.
```r
fm1 <- lmer(angle ~ recipe * temp + (1|recipe:replicate), data=cake, REML=FALSE)
```

At the moment `fixed("recipe")(fm1)` will work (with a warning) because it sees that `recipe` is categorical and chooses a likelihood ratio test. Need to add some more rules to break down an interaction term so that the following works:

```r
fixed("recipe:temp")(fm1)
```

Also: `fixed("temp")(fm1)` doesn't work because it confuses the default `KRmodcomp`. Not a big deal, because we shouldn't be testing main effects when there's an interaction anyway, but it could be handled better.
Awesome package, thanks!

This is not important, but when the data are specified via `subset()` in the model, `extend()` does not work, because the `getData()` function can't get the data.

If fixing this is not a priority, an error message would help in spotting the issue, as the actual message is not informative.

Nacho
Capture `message` output? Can I capture `cat` and `print` etc. too?

Strip any `verbose` argument in `doFit`? Give `powerSim` a `verbose` argument to let that stuff through?
Every error message a user gets should be helpful. For example, no nonsense about closures because a function was supplied when a `merMod` was expected. This means more pre-emptive checks in the top-level functions, e.g. `fcompare`/`rcompare` used on a model without random effects.

New vignette?

- Mention it in the new `extend` vignette.
- "See also" in `extend` help.
- Better documentation in `?getData`.
If `add` is `TRUE`, then add extra levels rather than replacing them.
- `?extend` docs, e.g. mention data frames, rewrite details.
- (`using` argument) #46
- `use.u` #54
- `along` compulsory for `powerCurve` #121
- `powerCurve` and `extend` use the same logic, e.g. #263

Extend only a subset of e.g. transect, i.e. replace

```r
X <- expand.grid(year=1:2, transect=1:4)
subset(extend(X, along='year', values=1:4), year <= 2 | transect %in% 1:2)
```

with

```r
extend(X, along='year', values=1:4, where=transect %in% 1:2)
```
Check what happens if `extend` or `getData<-` mucks up the factor levels. Both random and fixed effects.
Nice package, thanks!
I've just noticed this:

```r
library("simr")
#> Loading required package: lme4
#> Loading required package: Matrix
#>
#> Attaching package: 'lme4'
#> The following object is masked from 'package:stats':
#>
#> sigma
#> Warning: replacing previous import 'lme4::sigma' by 'stats::sigma' when
#> loading 'pbkrtest'
#> Warning: replacing previous import 'lme4::sigma' by 'stats::sigma' when
#> loading 'simr'
fm1 <- lmer(y ~ x + (1|g), data=simdata)
powerSim(fm1, nsim=10, test = fixed("notavariable"))
#> Simulating: |=============================================================|
#> Warning in observedPowerWarning(sim): This appears to be an "observed
#> power" calculation
#> Power for predictor 'notavariable', (95% confidence interval):
#> 0.00% ( 0.00, 30.85)
#>
#> Test: Kenward Roger (package pbkrtest)
#>
#> Based on 10 simulations, (10 warnings, 10 errors)
#> alpha = 0.05, nrow = 30
#>
#> Time elapsed: 0 h 0 m 0 s
#>
#> nb: result might be an observed power calculation
lastResult()$pval
#> [1] NA NA NA NA NA NA NA NA NA NA
devtools::session_info()
#> Session info --------------------------------------------------------------
#> setting value
#> version R version 3.3.1 (2016-06-21)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate Spanish_Spain.1252
#> tz Europe/Paris
#> date 2017-04-13
#> Packages ------------------------------------------------------------------
#> package * version date source
#> backports 1.0.5 2017-01-18 CRAN (R 3.3.2)
#> binom 1.1-1 2014-01-02 CRAN (R 3.3.3)
#> car 2.1-4 2016-12-02 CRAN (R 3.3.2)
#> devtools 1.12.0 2016-06-24 CRAN (R 3.3.2)
#> digest 0.6.12 2017-01-27 CRAN (R 3.3.2)
#> evaluate 0.10 2016-10-11 CRAN (R 3.3.1)
#> htmltools 0.3.5 2016-03-21 CRAN (R 3.2.4)
#> iterators 1.0.8 2015-10-13 CRAN (R 3.2.3)
#> knitr 1.15.1 2016-11-22 CRAN (R 3.3.2)
#> lattice 0.20-34 2016-09-06 CRAN (R 3.3.2)
#> lme4 * 1.1-12 2016-04-16 CRAN (R 3.2.5)
#> magrittr 1.5 2014-11-22 CRAN (R 3.2.2)
#> MASS 7.3-45 2015-11-10 CRAN (R 3.2.2)
#> Matrix * 1.2-7 2016-08-28 CRAN (R 3.3.1)
#> MatrixModels 0.4-1 2015-08-22 CRAN (R 3.2.5)
#> memoise 1.0.0 2016-01-29 CRAN (R 3.2.3)
#> mgcv 1.8-17 2017-02-08 CRAN (R 3.3.2)
#> minqa 1.2.4 2014-10-09 CRAN (R 3.2.2)
#> nlme 3.1-131 2017-02-06 CRAN (R 3.3.2)
#> nloptr 1.0.4 2014-08-04 CRAN (R 3.2.2)
#> nnet 7.3-12 2016-02-02 CRAN (R 3.3.1)
#> pbkrtest 0.4-7 2017-03-15 CRAN (R 3.3.3)
#> plotrix 3.6-4 2016-12-30 CRAN (R 3.3.2)
#> plyr 1.8.4 2016-06-08 CRAN (R 3.2.5)
#> quantreg 5.29 2016-09-04 CRAN (R 3.3.2)
#> Rcpp 0.12.10 2017-03-19 CRAN (R 3.3.3)
#> RLRsim 3.1-3 2016-11-04 CRAN (R 3.3.3)
#> rmarkdown 1.4 2017-03-24 CRAN (R 3.3.3)
#> rprojroot 1.2 2017-01-16 CRAN (R 3.3.2)
#> simr * 1.0.2-1 2017-04-13 Github (pitakakariki/simr@a58d9b8)
#> SparseM 1.76 2017-03-09 CRAN (R 3.3.3)
#> stringi 1.1.3 2017-03-21 CRAN (R 3.3.3)
#> stringr 1.2.0 2017-02-18 CRAN (R 3.3.3)
#> withr 1.0.2 2016-06-20 CRAN (R 3.2.5)
#> yaml 2.1.14 2016-11-12 CRAN (R 3.3.2)
```
Should the function output a warning in that case?
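A sketch of what such a guard might look like, assuming access to the vector of simulated p-values inside `powerSim` (the name `pvals` is illustrative):

```r
# Warn instead of silently reporting 0% power when every simulation errored.
if (all(is.na(pvals))) {
  warning("all ", length(pvals), " simulations produced errors; ",
          "check the test specification (e.g. the predictor's name)")
}
```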
This is theta the negative binomial scale parameter, not theta the random-effect variances parameter! Might be unnecessary due to `sigma.default` in `stats`.
Save time by running multiple tests per iteration, instead of redoing all that simulating and refitting for each test.
Could be:
This will help with precision analysis down the line.
Change the test for `random()` applied to a `glmerMod`, or at least give a helpful error message.
Either: e.g. variable transects per year, extend based on just the final year rather than cycling. Make this the default?
It might be nice to give effect sizes as Cohen's d and Pearson's r when running `powerSim`, since these have an intuitive interpretation. One can use the following paper as a source for how to calculate effect sizes for mixed models:

Nakagawa S, Cuthill IC. 2009. Effect size, confidence interval and statistical significance: A practical guide for biologists. Biol Rev 84: 515.

For example, below are R functions for the calculation of Cohen's d for a linear mixed model with a categorical (2-level) effect of interest (following the paper above). I have R functions for GLMMs and continuous variables as well, and could possibly spare some time to help out with implementation.
```r
# Cohen's d for a two-level fixed effect in a linear mixed model.
# t.val: t-value for the effect; no1, no2: observations in each group;
# ni, no: numbers of individuals and observations; k: number of parameters;
# R: repeatability (ICC) -- see the paper for the exact definitions.
d.mix <- function(t.val, ni, no, no1, no2, k, R){
  d <- (t.val * (1 + (ni / no) * R) * sqrt(1 - R) * (no1 + no2)) /
    (sqrt(no1 * no2) * sqrt(no - k))
  names(d) <- "effect size d"
  return(d)
}
```
with:
```r
# Repeatability (intraclass correlation); Equation 23 in the paper.
# s2b: between-group variance; s2e: residual variance.
R <- function(s2b, s2e) {
  R <- s2b / (s2b + s2e)
  names(R) <- "Repeatability (ICC)"
  return(R)
}
```
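A sketch of how these could be applied to an `lmer` fit. The toy data are mine, and the mapping of the sample-size arguments (`ni` = number of individuals, `no` = total observations, `k` = number of fixed-effect parameters) is my reading of the paper's notation and should be double-checked:

```r
library(lme4)

# Toy data: 10 subjects, 6 observations each, between-subject treatment.
set.seed(1)
dat <- expand.grid(subject=factor(1:10), obs=1:6)
dat$trt <- ifelse(as.integer(dat$subject) <= 5, "A", "B")
dat$y <- 2 + (dat$trt == "B") + rnorm(10)[dat$subject] + rnorm(nrow(dat))

fm <- lmer(y ~ trt + (1|subject), data=dat)

vc <- as.data.frame(VarCorr(fm))
icc <- R(s2b=vc$vcov[vc$grp == "subject"],  # between-subject variance
         s2e=sigma(fm)^2)                   # residual variance

t.val <- coef(summary(fm))["trtB", "t value"]
d.mix(t.val, ni=10, no=60, no1=30, no2=30, k=2, R=icc)
```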
Write `makeLm` and `makeGlm` to go along with the current `merMod` functions.
```r
X <- expand.grid(gender=c("M", "F"), obs=1:5, subject=1:10)
X <- subset(X, gender=="M" | subject>=6)
X <- subset(X, gender=="F" | subject<=5)

n <- c(2,4,3,2,4,3,2,4,3,3)
X <- subset(X, obs <= n[subject])
```

What do we expect after:

```r
extend(X, within="subject", n=4)
```
Sorry... I made a terrible mistake in specifying the arguments... no problem there, just a dumb user!

:-)
At least pass `...` all the way through.
`fixef`, `VarCorr`, and `sigma` in one go. Use this for copying all of a model's parameters to another model.
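A sketch of a possible helper, assuming `simr`'s existing `fixef<-`, `VarCorr<-` and `sigma<-` replacement methods do the heavy lifting (`copyParams` is a made-up name, and whether `VarCorr<-` accepts `VarCorr()` output directly would need checking):

```r
# Hypothetical helper: copy all of one model's parameters onto another.
copyParams <- function(target, source) {
  fixef(target)   <- fixef(source)
  VarCorr(target) <- VarCorr(source)   # may need coercion of VarCorr() output
  sigma(target)   <- sigma(source)
  target
}
```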