Giter Club home page Giter Club logo

simr's People

Contributors

pablobernabeu avatar palday avatar petergreensimrtest avatar pitakakariki avatar tobiasrebholz avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

simr's Issues

autoextend

Alternative to extend, but with "experimental" features.

e.g.

  • some sort of regression to handle year and yr <- year-2000 type cases
  • simulate new values instead of repeating them
  • extend log and level variables simultaneously

This will need messages so that users know what guesses autoextend has made.

bug in default xname when formula is 1 + x + ...

library(nlme)
VC <- matrix(c(2.327^2, -2.328*0.2264*0.61, -2.328*0.2264*0.61,  0.2264^2), nrow=2)
simm <- makeLmer(y ~ 1+age+(age|Subject), fixef = c(16.7611, 0.6602), VarCorr=VC, sigma=1.31, data = Orthodont)
powerSim(simm, nsim=10)

Code is in defaults.R, needs to be better documented.

Sanity checks for modify

There aren't enough guardrails in modify.R. e.g.

fm <- lmer(y ~ x + (1|g), data=simdata)
fixef(fm) <- 3
fixef(fm)

gives

Error in structure(object@beta, names = dimnames(X)[[2]]) : 
  'names' attribute [2] must be the same length as the vector [1]

use.u and ranef<-

Need to add a ranef<- function to directly specify random effects, but this is pointless unless the user has a use.u option.

The user might want to keep the random effects estimated from the pilot data, but generate new values for extended group levels (see add=TRUE).

Error messages in makeMer

makeMer needs its own error messages instead of relying on the messages from the lme4 functions it wraps.

For example, if beta is the wrong length, the user should be told that the argument fixef (which they supplied) is the wrong length.

Extra care needed with the simulate and lmer/glmer calls, since any problems are suppressed. If simulate produces lots of NAs, we get cryptic messages about an "Invalid grouping factor specification".

Confounded variables (was: For the models with two random effects)

Hi,
This is the great package to estimate the power and sample size for the language studies that analyze the data in use of the mixed-effect model. I am analyzing the power based on the statistical values of the paper. There are only the means and standard deviations of the two critical conditions, and the comparisons between two mixed-effect models.

My full script is in this public gist. My idea begins at the creation of the simulated data in terms of the condition means and standard deviation. The mixed-effect models are extended along with the number of items and then the number of subjects. Although I got the reasonable estimations of statistical powers according to the description in the original paper, the powers are not going to change after I increase the number of subjects. Is there something wrong I set up the models?

By the way, the powerCurve could crash when the 'nsim' was set up at 1,000. Is there the limitation when we run the power analysis on the models with two random effects?

thanks!

User supplied functions.

These need some work.

  • unit tests for user supplied sim and test arguments
  • what to do in powerCurve if fit has a data or ... argument instead of subset
  • maybe don't run all nsim simulations if the functions don't work?

powerCurve with too many breaks

fm <- lmer(y ~ x + (1|g), data=simdata)
pctest <- powerCurve(fm, along='g', breaks=1:5, nsim=25)
Power for predictor 'x', (95% confidence interval),
by number of levels in g:
      1:  0.00% ( 0.00, 13.72) - 10 rows
      2: 88.00% (68.78, 97.45) - 20 rows
      3: 96.00% (79.65, 99.90) - 30 rows
     NA: 96.00% (79.65, 99.90) - 30 rows
     NA: 96.00% (79.65, 99.90) - 30 rows

Time elapsed: 0 h 0 m 16 s

Clarify test examples vignette

Make it clear that after the test specification is working, you can replace doTest with powerSim to do the power analysis.

Add support for constrasts

Currently extend drops contrasts, which tends to cause errors because the factors have different names.

glmmTMB support?

Tricky bits probably:

  • simulate with modified parameter values
  • use.u in simulate

parallel support

There's a bunch of things that will have to be sorted out to get polished parallel support.

  • cross-platform. That means snow-type parallel need to be implemented (i.e. we have to export stuff to the workers)
    • but maybe a fork option for people on Unix? Fork will be better for user-defined stuff.
  • automagic. Users should be able to do parallel=TRUE and it should Just Work.
  • parallel=TRUE will have to choose a default number of cores somehow.
  • let user specify cores with parallel=n?
    • everything else will still have to be automagic.
  • arbitrary backend with parallel=cl?
    • we should still export simr in this case
  • random number seeds.
    • seeds should work with the parallel option
    • ideally results will be the same for the same seed, but we might have to accept e.g. same seed + same # of cores
  • maybe worry that exporting loading simr in each worker will use the installed version, not the version we're working on (will this mess with testthat too?)
  • lme4 versus other packages and user defined functions.
    • leave it up to users to set up a cluster with the right things exported
    • export everything - if the job's big enough to make parallel overhead won't matter so much

Tests for individual factor levels

Currently these are a bit flaky - they only work because the summary functions put their names in the fixed effects block.

Need to write something like drop1 that changes the contrasts to drop individual factor levels.

Alternative: let users deal with contrasts, but provide a contrastTest option?

Add a ranef<- feature

You can currently sim from a glm with specified fixed effects, which has the same effect. Nicer this way though.

extend factor versus character

Should probably treat character variables as factors.

e.g. "a" "b" "c" should extend to "a" "b" "c" "d" "e" "f" not "1" "2" "3" "4" "5" "6".

Rerun a power analysis.

  • Change nsim, add more simulations.
  • Change test, use stored output to redo the analysis faster.

From scratch vignette edits

V1 belongs with model1, and V2 belongs with model2. At the moment that's not clear during the introduction.

fixed, categorical variables, and interactions

fm1 <- lmer(angle ~ recipe * temp + (1|recipe:replicate), data=cake, REML=FALSE)

At the moment fixed("recipe")(fm1) will work (with a warning) because it sees that recipe is categorical and chooses a likelihood ratio test. Need to add some more rules to break down an interaction term so that the following works:

fixed("recipe:temp")(fm1)

Also: fixed("temp")(fm1) doesn't work because if confuses the default KRmodcomp. Not a big deal because we shouldn't be testing main effects when there's an interaction anyway, but could be handled better.

getData() fails when using subsets in formula

Awesome package, thanks!

This is not important, but when specifying the data via 'subset()' in the model, 'extend()' does not work, because 'getData()' function can't get the data.

If fixing it is not a priority, an error message can help spotting the issue, as the actual message is not informative.

Nacho

What to do with verbose output?

Capture message output?
Can I capture cat and print etc too?
Strip any verbose argument in doFit?
Give powerSim a verbose argument to let that stuff through?

Helpful error messages

Every error message a user gets should be helpful. For example, no nonsense about closures because a function was supplied when a merMod was expected.

This means more pre-emptive checks in the top-level functions.

e.g.:

  • "subscript out of bounds" for fixed effects that don't exist
  • proper message when fcompare/rcompare used on a model without random effects.

Extend checklist

  • Update ?extend docs, e.g. mention data frames, rewrite details.
  • Write an "add" option #43
  • Add the "where" option #44
  • Make sure factors are good #45
  • Sensible and/or flexible recycling (i.e. the using argument) #46
  • Get extend magic working #47
  • New vignette with transect and treatment examples
  • Option to fix current values, and only simulate for the extended portion.
  • Make sure not-quite data frames work too #65
  • Integrate "freeze" and use.u #54
  • Extend weights #109
  • Make along compulsory for powerCurve #121
  • Extend and enforce balance #89
  • Extend along "obs" #125
  • Extend unbalanced data #70
  • Sensible names/values for extended variables #134
  • Sensible behaviour with character #187
  • Check that powerCurve and extend use the same logic e.g. #263
  • newdata argument #284

"where" option for extend

Extend only a subset of e.g. transect.

i.e. replace

X <- expand.grid(year=1:2, transect=1:4)
subset(extend(X, along='year', values=1:4), year <= 2 | transect %in% 1:2)

with

extend(X, along='year', values=1:4, where=transect %in% 1:2)

New factor levels.

Check what happens if extend or getData<- mucks up the factor levels. Random and fixed effects.

powerSim for a non existing fixed effect

Nice package, thanks!

I've just noticed this

library("simr")
#> Loading required package: lme4
#> Loading required package: Matrix
#> 
#> Attaching package: 'lme4'
#> The following object is masked from 'package:stats':
#> 
#>     sigma
#> Warning: replacing previous import 'lme4::sigma' by 'stats::sigma' when
#> loading 'pbkrtest'
#> Warning: replacing previous import 'lme4::sigma' by 'stats::sigma' when
#> loading 'simr'
fm1 <- lmer(y ~ x + (1|g), data=simdata)
powerSim(fm1, nsim=10, test = fixed("notavariable"))
#> Simulating: |                                                             |���������������������������������������������������������������������������Simulating: |======                                                       |���������������������������������������������������������������������������Simulating: |============                                                 |���������������������������������������������������������������������������Simulating: |==================                                           |���������������������������������������������������������������������������Simulating: |========================                                     |���������������������������������������������������������������������������Simulating: |==============================                               |���������������������������������������������������������������������������Simulating: |====================================                         |���������������������������������������������������������������������������Simulating: |==========================================                   |���������������������������������������������������������������������������Simulating: |================================================             |���������������������������������������������������������������������������Simulating: |======================================================       |���������������������������������������������������������������������������Simulating: |=============================================================|���������������������������������������������������������������������������
#> Warning in observedPowerWarning(sim): This appears to be an "observed
#> power" calculation
#> Power for predictor 'notavariable', (95% confidence interval):
#>        0.00% ( 0.00, 30.85)
#> 
#> Test: Kenward Roger (package pbkrtest)
#> 
#> Based on 10 simulations, (10 warnings, 10 errors)
#> alpha = 0.05, nrow = 30
#> 
#> Time elapsed: 0 h 0 m 0 s
#> 
#> nb: result might be an observed power calculation
lastResult()$pval
#>  [1] NA NA NA NA NA NA NA NA NA NA
Session info
devtools::session_info()
#> Session info --------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.3.1 (2016-06-21)
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  Spanish_Spain.1252          
#>  tz       Europe/Paris                
#>  date     2017-04-13
#> Packages ------------------------------------------------------------------
#>  package      * version date       source                            
#>  backports      1.0.5   2017-01-18 CRAN (R 3.3.2)                    
#>  binom          1.1-1   2014-01-02 CRAN (R 3.3.3)                    
#>  car            2.1-4   2016-12-02 CRAN (R 3.3.2)                    
#>  devtools       1.12.0  2016-06-24 CRAN (R 3.3.2)                    
#>  digest         0.6.12  2017-01-27 CRAN (R 3.3.2)                    
#>  evaluate       0.10    2016-10-11 CRAN (R 3.3.1)                    
#>  htmltools      0.3.5   2016-03-21 CRAN (R 3.2.4)                    
#>  iterators      1.0.8   2015-10-13 CRAN (R 3.2.3)                    
#>  knitr          1.15.1  2016-11-22 CRAN (R 3.3.2)                    
#>  lattice        0.20-34 2016-09-06 CRAN (R 3.3.2)                    
#>  lme4         * 1.1-12  2016-04-16 CRAN (R 3.2.5)                    
#>  magrittr       1.5     2014-11-22 CRAN (R 3.2.2)                    
#>  MASS           7.3-45  2015-11-10 CRAN (R 3.2.2)                    
#>  Matrix       * 1.2-7   2016-08-28 CRAN (R 3.3.1)                    
#>  MatrixModels   0.4-1   2015-08-22 CRAN (R 3.2.5)                    
#>  memoise        1.0.0   2016-01-29 CRAN (R 3.2.3)                    
#>  mgcv           1.8-17  2017-02-08 CRAN (R 3.3.2)                    
#>  minqa          1.2.4   2014-10-09 CRAN (R 3.2.2)                    
#>  nlme           3.1-131 2017-02-06 CRAN (R 3.3.2)                    
#>  nloptr         1.0.4   2014-08-04 CRAN (R 3.2.2)                    
#>  nnet           7.3-12  2016-02-02 CRAN (R 3.3.1)                    
#>  pbkrtest       0.4-7   2017-03-15 CRAN (R 3.3.3)                    
#>  plotrix        3.6-4   2016-12-30 CRAN (R 3.3.2)                    
#>  plyr           1.8.4   2016-06-08 CRAN (R 3.2.5)                    
#>  quantreg       5.29    2016-09-04 CRAN (R 3.3.2)                    
#>  Rcpp           0.12.10 2017-03-19 CRAN (R 3.3.3)                    
#>  RLRsim         3.1-3   2016-11-04 CRAN (R 3.3.3)                    
#>  rmarkdown      1.4     2017-03-24 CRAN (R 3.3.3)                    
#>  rprojroot      1.2     2017-01-16 CRAN (R 3.3.2)                    
#>  simr         * 1.0.2-1 2017-04-13 Github (pitakakariki/simr@a58d9b8)
#>  SparseM        1.76    2017-03-09 CRAN (R 3.3.3)                    
#>  stringi        1.1.3   2017-03-21 CRAN (R 3.3.3)                    
#>  stringr        1.2.0   2017-02-18 CRAN (R 3.3.3)                    
#>  withr          1.0.2   2016-06-20 CRAN (R 3.2.5)                    
#>  yaml           2.1.14  2016-11-12 CRAN (R 3.3.2)

Should the function output a warning in that case?

Return multiple p-values from tests.

Save time by running multiple tests per iteration, instead of redoing all that simulating and refitting for each test.

Could be:

  • multiple tests
  • test multiple variables

This will help with precision analysis down the line.

Missing Data

Either

  1. Handle missing data properly.
  2. Fail nicely when there is missing data.

"basis" option for extend

e.g. variable transects per year, extend based on just the final year rather than cycling.

Make this the default?

Return effect size measures for observed power calculations

I might be nice to give effect sizes as Cohen's d and Pearson's r when running powerSim, since these have a intuitive interpretation. One can use the following paper as a source of how to calculate effect sizes for mixed models:

Nakagawa S, Cuthill IC. 2009. Effect size, confidence interval and statistical significance: A practical guide for biologists. Biol Rev 84: 515.

For example, below are R functions for the calculation of Cohen's d for a linear mixed model with a categorical (2 levels) effect of interest (following the paper above). I have R functions for glmms and continuous variables as well, and could possibly spare some time to help out with implementation.

d.mix <- function(t.val, ni, no, no1, no2, k, R){
  d <- (t.val * (1 + (ni / no) * R) * sqrt(1 - R) * (no1 + no2)) /
    (sqrt(no1 * no2) * sqrt(no - k))
  names(d) <- "effect size d"
  return(d)
}

with:

R <- function(s2b, s2e) { #Equation 23
  R <- s2b / (s2b + s2e)
  names(R) <- "Repeatablity (ICC)"
  return(R)
}

Balance before using extend within

X <- expand.grid(gender=c("M", "F"), obs=1:5, subject=1:10)
X <- subset(X, gender=="M" | subject>=6)
X <- subset(X, gender=="F" | subject<=5)

n <- c(2,4,3,2,4,3,2,4,3,3)
X <- subset(X, obs <= n[subject])

What do we expect after:

extend(X, within="subject", n=4)

Modify all

Fixef, VarCorr, and sigma in one go. Use this for copying all of a models parameters to another model.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.