pitakakariki / simr

Power Analysis of Generalised Linear Mixed Models by Simulation
Alternative to `extend`, but with "experimental" features.
e.g. `year` and `yr <- year - 2000` type cases. This will need messages so that users know what guesses `autoextend` has made.
```r
library(simr)
library(nlme)  # for the Orthodont data

# Random-effect covariance matrix: intercept SD ~2.33, age-slope SD 0.2264,
# correlation -0.61
VC <- matrix(c(2.327^2, -2.328*0.2264*0.61, -2.328*0.2264*0.61, 0.2264^2), nrow=2)

simm <- makeLmer(y ~ 1 + age + (age|Subject), fixef=c(16.7611, 0.6602),
                 VarCorr=VC, sigma=1.31, data=Orthodont)
powerSim(simm, nsim=10)
```
Code is in `defaults.R`; it needs to be better documented.
Add unit tests, e.g. for a binomial sim with a Poisson fit.
There aren't enough guardrails in `modify.R`. For example:

```r
fm <- lmer(y ~ x + (1|g), data=simdata)
fixef(fm) <- 3
fixef(fm)
```

gives

```
Error in structure(object@beta, names = dimnames(X)[[2]]) :
  'names' attribute [2] must be the same length as the vector [1]
```
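A minimal sketch of the kind of guardrail that could go into `modify.R`, assuming the check sits at the top of the `fixef<-` method (`check_fixef_length` is a made-up name; the `beta` slot is where `merMod` objects keep their fixed-effect coefficients):

```r
# Hypothetical guardrail (sketch only): fail early, with a message phrased
# in terms of what the user actually supplied.
check_fixef_length <- function(object, value) {
  p <- length(object@beta)   # number of fixed-effect coefficients in the fit
  if (length(value) != p) {
    stop("replacement fixed effects have length ", length(value),
         " but the model has ", p, " coefficient(s)")
  }
  invisible(TRUE)
}
```

With a check like this, `fixef(fm) <- 3` above would fail immediately with a message about lengths instead of the cryptic `names` error.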
Need to add a `ranef<-` function to directly specify random effects, but this is pointless unless the user has a `use.u` option. The user might want to keep the random effects estimated from the pilot data, but generate new values for extended group levels (see `add=TRUE`).
`makeMer` needs its own error messages instead of relying on the messages from the `lme4` functions it wraps. For example, if `beta` is the wrong length, the user should be told that the argument `fixef` (which they supplied) is the wrong length.
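A sketch of the kind of up-front check `makeLmer` could run, phrased in terms of the user-facing `fixef` argument; `checkFixef` is a made-up name, and `nobars` is `lme4`'s helper for stripping random-effect terms:

```r
library(lme4)

# Hypothetical pre-check (sketch only): validate `fixef` against the
# fixed-effect design implied by the formula before handing off to lme4.
checkFixef <- function(formula, fixef, data) {
  fe_terms <- delete.response(terms(nobars(formula)))  # drop response & bars
  p <- ncol(model.matrix(fe_terms, data))              # implied coefficients
  if (length(fixef) != p) {
    stop("argument 'fixef' has length ", length(fixef),
         " but the formula implies ", p, " fixed effect(s)")
  }
  invisible(TRUE)
}
```

For example, `checkFixef(y ~ 1 + age + (age|Subject), c(16.7611, 0.6602), Orthodont)` would pass, while a length-one `fixef` would produce the message above.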
Extra care is needed with the `simulate` and `lmer`/`glmer` calls, since any problems are suppressed. If `simulate` produces lots of `NA`s, we get cryptic messages about an "Invalid grouping factor specification".
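A sketch of one possible pre-check, assuming the simulated response can be inspected before it is passed to the refit (`simdata` is the example dataset shipped with `simr`):

```r
library(simr)  # also loads lme4 and the simdata example data

fm <- lmer(y ~ x + (1|g), data=simdata)

# Fail fast with a clear message if simulate() returns NAs, instead of
# letting the refit produce "Invalid grouping factor specification".
y_sim <- simulate(fm)[[1]]
if (anyNA(y_sim)) {
  stop(sum(is.na(y_sim)), " NA value(s) in the simulated response; ",
       "check the model's parameters before refitting")
}
```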
Hi,

This is a great package for estimating power and sample size for language studies that analyse their data with mixed-effects models. I am estimating power from the statistics reported in a paper: only the means and standard deviations of the two critical conditions are available, together with comparisons between two mixed-effects models.

My full script is in this public gist. The idea is to create simulated data from the condition means and standard deviations. The mixed-effects models are then extended along the number of items and then the number of subjects. Although I got reasonable power estimates matching the description in the original paper, the power does not change when I increase the number of subjects. Is there something wrong with how I set up the models?

By the way, `powerCurve` crashed when `nsim` was set to 1,000. Is there a limit when running a power analysis on models with two random effects?

Thanks!
These need some work:

- `sim` and `test` arguments
- `powerCurve` if `fit` has a `data` or `...` argument instead of `subset`
- `nsim` simulations if the functions don't work?

This one is closed now: lme4/lme4#408. Need to remove the workaround from `simr` and update the required `lme4` version.
```r
fm <- lmer(y ~ x + (1|g), data=simdata)
pctest <- powerCurve(fm, along='g', breaks=1:5, nsim=25)
```

```
Power for predictor 'x', (95% confidence interval),
by number of levels in g:
      1:  0.00% ( 0.00, 13.72) - 10 rows
      2: 88.00% (68.78, 97.45) - 20 rows
      3: 96.00% (79.65, 99.90) - 30 rows
     NA: 96.00% (79.65, 99.90) - 30 rows
     NA: 96.00% (79.65, 99.90) - 30 rows

Time elapsed: 0 h 0 m 16 s
```
```r
fm1 <- lm(y ~ x, data=simdata)
fm2 <- extend(fm1, along='g', n=10)
fm3 <- fm1
getData(fm3) <- getData(fm2)

length(doSim(fm2))
[1] 100
length(doSim(fm3))
[1] 30
```
Make it clear that once the test specification is working, you can replace `doTest` with `powerSim` to do the power analysis.
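For example (a sketch using `simr`'s bundled `simdata`):

```r
library(simr)

fm <- lmer(y ~ x + (1|g), data=simdata)

# First make sure the test specification runs on a single fit...
doTest(fm, fixed("x"))

# ...then swap doTest for powerSim to run the simulation-based analysis.
powerSim(fm, test=fixed("x"), nsim=100)
```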
Currently `extend` drops contrasts, which tends to cause errors because the factors have different names.
Tricky bits, probably:

- `simulate` with modified parameter values
- `use.u` in `simulate`

`getData` fails when there is no `data` argument. This flows on to `powerSim` etc.
There's a bunch of things that will have to be sorted out to get polished parallel support (a rough sketch follows the list):

- The user sets `parallel=TRUE` and it should Just Work.
- `parallel=TRUE` will have to choose a default number of cores somehow.
- `parallel=n`?
- `parallel=cl`?
- Loading `simr` in this case: `simr` in each worker will use the installed version, not the version we're working on (will this mess with `testthat` too?)
- `lme4` versus other packages and user-defined functions.
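A rough sketch of what `parallel=TRUE` might do internally, using the base `parallel` package; the core-count heuristic and the worker setup are assumptions, not the package's actual design:

```r
library(simr)
library(parallel)

fm <- lmer(y ~ x + (1|g), data=simdata)

ncores <- max(1L, detectCores() - 1L)   # one possible default core count
cl <- makeCluster(ncores)
clusterSetRNGStream(cl, 123)            # reproducible parallel RNG streams
clusterEvalQ(cl, library(simr))         # workers get the *installed* simr
clusterExport(cl, "fm")

pvals <- parSapply(cl, seq_len(100), function(i) {
  y <- doSim(fm)               # simulate a new response vector
  refit <- doFit(y, fm)        # refit the model to the simulated data
  doTest(refit, fixed("x"))    # p-value for this iteration
})
stopCluster(cl)

mean(pvals < 0.05)             # crude power estimate
```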
Currently these are a bit flaky: they only work because the `summary` functions put their names in the fixed effects block.

Need to write something like `drop1` that changes the contrasts to drop individual factor levels. Alternative: let users deal with contrasts, but provide a `contrastTest` option?
Either in Test Examples or in its own vignette?
You can currently simulate from a glm with specified fixed effects, which has the same effect. Nicer this way though.
Should probably treat character variables as factors.
e.g. "a" "b" "c" should extend to "a" "b" "c" "d" "e" "f", not "1" "2" "3" "4" "5" "6".
```
Warning message:
Setting row names on a tibble is deprecated.
```
This is going to break everything:

```r
lm(y ~ x, data=X, subset=???)
```
e.g.

```r
y ~ x1*x2 + x1*x3
X$y + X$x1 + X$x2
```

I think this will work in `lme4`, but some of the statistical testing code might get confused.
- `nsim`: add more simulations.
- `test`: use stored output to redo the analysis faster.

`V1` belongs with `model1`, and `V2` belongs with `model2`. At the moment that's not clear during the introduction.
Looks like they do exactly the same thing. e.g. `doTest(fm, rcompare(~(1|g), "pb", pbnsim=10))`.
```r
fm1 <- lmer(angle ~ recipe * temp + (1|recipe:replicate), data=cake, REML=FALSE)
```

At the moment `fixed("recipe")(fm1)` will work (with a warning) because it sees that `recipe` is categorical and chooses a likelihood ratio test. Need to add some more rules to break down an interaction term so that the following works:

```r
fixed("recipe:temp")(fm1)
```

Also: `fixed("temp")(fm1)` doesn't work because it confuses the default `KRmodcomp`. Not a big deal, because we shouldn't be testing main effects when there's an interaction anyway, but it could be handled better.
Awesome package, thanks!

This is not important, but when the data are specified via `subset()` in the model, `extend()` does not work, because the `getData()` function can't get the data.

If fixing this is not a priority, an error message would help in spotting the issue, as the actual message is not informative.

Nacho
Capture `message` output? Can I capture `cat` and `print` etc. too?

Strip any `verbose` argument in `doFit`? Give `powerSim` a `verbose` argument to let that stuff through?
Every error message a user gets should be helpful. For example, no nonsense about closures because a function was supplied when a `merMod` was expected. This means more pre-emptive checks in the top-level functions, e.g. `fcompare`/`rcompare` used on a model without random effects.

New vignette?

- Mention it in the new `extend` vignette.
- "See also" in `extend` help.
- Better documentation in `?getData`.
If `add` is `TRUE`, then add extra levels rather than replacing them.
- `?extend` docs, e.g. mention data frames, rewrite details.
- (`using` argument) #46
- `use.u` #54
- `along` compulsory for `powerCurve` #121
- `powerCurve` and `extend` use the same logic, e.g. #263

Extend only a subset of e.g. transect, i.e. replace

```r
X <- expand.grid(year=1:2, transect=1:4)
subset(extend(X, along='year', values=1:4), year <= 2 | transect %in% 1:2)
```

with

```r
extend(X, along='year', values=1:4, where=transect %in% 1:2)
```
Check what happens if `extend` or `getData<-` mucks up the factor levels. Both random and fixed effects.
Nice package, thanks!
I've just noticed this:

```r
library("simr")
#> Loading required package: lme4
#> Loading required package: Matrix
#>
#> Attaching package: 'lme4'
#> The following object is masked from 'package:stats':
#>
#> sigma
#> Warning: replacing previous import 'lme4::sigma' by 'stats::sigma' when
#> loading 'pbkrtest'
#> Warning: replacing previous import 'lme4::sigma' by 'stats::sigma' when
#> loading 'simr'
fm1 <- lmer(y ~ x + (1|g), data=simdata)
powerSim(fm1, nsim=10, test = fixed("notavariable"))
#> Simulating: |=============================================================|
#> Warning in observedPowerWarning(sim): This appears to be an "observed
#> power" calculation
#> Power for predictor 'notavariable', (95% confidence interval):
#> 0.00% ( 0.00, 30.85)
#>
#> Test: Kenward Roger (package pbkrtest)
#>
#> Based on 10 simulations, (10 warnings, 10 errors)
#> alpha = 0.05, nrow = 30
#>
#> Time elapsed: 0 h 0 m 0 s
#>
#> nb: result might be an observed power calculation
lastResult()$pval
#> [1] NA NA NA NA NA NA NA NA NA NA
devtools::session_info()
#> Session info --------------------------------------------------------------
#> setting value
#> version R version 3.3.1 (2016-06-21)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate Spanish_Spain.1252
#> tz Europe/Paris
#> date 2017-04-13
#> Packages ------------------------------------------------------------------
#> package * version date source
#> backports 1.0.5 2017-01-18 CRAN (R 3.3.2)
#> binom 1.1-1 2014-01-02 CRAN (R 3.3.3)
#> car 2.1-4 2016-12-02 CRAN (R 3.3.2)
#> devtools 1.12.0 2016-06-24 CRAN (R 3.3.2)
#> digest 0.6.12 2017-01-27 CRAN (R 3.3.2)
#> evaluate 0.10 2016-10-11 CRAN (R 3.3.1)
#> htmltools 0.3.5 2016-03-21 CRAN (R 3.2.4)
#> iterators 1.0.8 2015-10-13 CRAN (R 3.2.3)
#> knitr 1.15.1 2016-11-22 CRAN (R 3.3.2)
#> lattice 0.20-34 2016-09-06 CRAN (R 3.3.2)
#> lme4 * 1.1-12 2016-04-16 CRAN (R 3.2.5)
#> magrittr 1.5 2014-11-22 CRAN (R 3.2.2)
#> MASS 7.3-45 2015-11-10 CRAN (R 3.2.2)
#> Matrix * 1.2-7 2016-08-28 CRAN (R 3.3.1)
#> MatrixModels 0.4-1 2015-08-22 CRAN (R 3.2.5)
#> memoise 1.0.0 2016-01-29 CRAN (R 3.2.3)
#> mgcv 1.8-17 2017-02-08 CRAN (R 3.3.2)
#> minqa 1.2.4 2014-10-09 CRAN (R 3.2.2)
#> nlme 3.1-131 2017-02-06 CRAN (R 3.3.2)
#> nloptr 1.0.4 2014-08-04 CRAN (R 3.2.2)
#> nnet 7.3-12 2016-02-02 CRAN (R 3.3.1)
#> pbkrtest 0.4-7 2017-03-15 CRAN (R 3.3.3)
#> plotrix 3.6-4 2016-12-30 CRAN (R 3.3.2)
#> plyr 1.8.4 2016-06-08 CRAN (R 3.2.5)
#> quantreg 5.29 2016-09-04 CRAN (R 3.3.2)
#> Rcpp 0.12.10 2017-03-19 CRAN (R 3.3.3)
#> RLRsim 3.1-3 2016-11-04 CRAN (R 3.3.3)
#> rmarkdown 1.4 2017-03-24 CRAN (R 3.3.3)
#> rprojroot 1.2 2017-01-16 CRAN (R 3.3.2)
#> simr * 1.0.2-1 2017-04-13 Github (pitakakariki/simr@a58d9b8)
#> SparseM 1.76 2017-03-09 CRAN (R 3.3.3)
#> stringi 1.1.3 2017-03-21 CRAN (R 3.3.3)
#> stringr 1.2.0 2017-02-18 CRAN (R 3.3.3)
#> withr 1.0.2 2016-06-20 CRAN (R 3.2.5)
#> yaml 2.1.14 2016-11-12 CRAN (R 3.3.2)
```
Should the function output a warning in that case?
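A sketch of what such a guard might look like, assuming access to the vector of simulated p-values inside `powerSim` (the name `pvals` is illustrative):

```r
# Warn instead of silently reporting 0% power when every simulation errored.
if (all(is.na(pvals))) {
  warning("all ", length(pvals), " simulations produced errors; ",
          "check the test specification (e.g. the predictor's name)")
}
```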
This is theta the negative binomial scale parameter, not theta the random-effect variances parameter! Might be unnecessary due to `sigma.default` in `stats`.
Save time by running multiple tests per iteration, instead of redoing all that simulating and refitting for each test.
Could be:
This will help with precision analysis down the line.
Change the test for `random()` applied to a `glmerMod`, or at least give a helpful error message.
Either: e.g. variable transects per year, extend based on just the final year rather than cycling. Make this the default?
It might be nice to give effect sizes as Cohen's d and Pearson's r when running `powerSim`, since these have an intuitive interpretation. One can use the following paper as a source for how to calculate effect sizes for mixed models:

Nakagawa S, Cuthill IC. 2009. Effect size, confidence interval and statistical significance: A practical guide for biologists. Biol Rev 84: 515.

For example, below are R functions for the calculation of Cohen's d for a linear mixed model with a categorical (2-level) effect of interest (following the paper above). I have R functions for GLMMs and continuous variables as well, and could possibly spare some time to help out with implementation.
```r
# Cohen's d for a two-level fixed effect in a linear mixed model.
# t.val: t-value for the effect; no1, no2: observations in each group;
# ni, no: numbers of individuals and observations; k: number of parameters;
# R: repeatability (ICC) -- see the paper for the exact definitions.
d.mix <- function(t.val, ni, no, no1, no2, k, R){
  d <- (t.val * (1 + (ni / no) * R) * sqrt(1 - R) * (no1 + no2)) /
    (sqrt(no1 * no2) * sqrt(no - k))
  names(d) <- "effect size d"
  return(d)
}
```
with:
```r
# Repeatability (intraclass correlation); Equation 23 in the paper.
# s2b: between-group variance; s2e: residual variance.
R <- function(s2b, s2e) {
  R <- s2b / (s2b + s2e)
  names(R) <- "Repeatability (ICC)"
  return(R)
}
```
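A sketch of how these could be applied to an `lmer` fit. The toy data are mine, and the mapping of the sample-size arguments (`ni` = number of individuals, `no` = total observations, `k` = number of fixed-effect parameters) is my reading of the paper's notation and should be double-checked:

```r
library(lme4)

# Toy data: 10 subjects, 6 observations each, between-subject treatment.
set.seed(1)
dat <- expand.grid(subject=factor(1:10), obs=1:6)
dat$trt <- ifelse(as.integer(dat$subject) <= 5, "A", "B")
dat$y <- 2 + (dat$trt == "B") + rnorm(10)[dat$subject] + rnorm(nrow(dat))

fm <- lmer(y ~ trt + (1|subject), data=dat)

vc <- as.data.frame(VarCorr(fm))
icc <- R(s2b=vc$vcov[vc$grp == "subject"],  # between-subject variance
         s2e=sigma(fm)^2)                   # residual variance

t.val <- coef(summary(fm))["trtB", "t value"]
d.mix(t.val, ni=10, no=60, no1=30, no2=30, k=2, R=icc)
```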
Write `makeLm` and `makeGlm` to go along with the current `merMod` functions.
```r
X <- expand.grid(gender=c("M", "F"), obs=1:5, subject=1:10)
X <- subset(X, gender=="M" | subject>=6)
X <- subset(X, gender=="F" | subject<=5)

n <- c(2,4,3,2,4,3,2,4,3,3)
X <- subset(X, obs <= n[subject])
```

What do we expect after:

```r
extend(X, within="subject", n=4)
```
Sorry... I made a terrible mistake in specifying the arguments... no problem there, just a dumb user!

:-)
At least pass `...` all the way through.
`fixef`, `VarCorr`, and `sigma` in one go. Use this for copying all of a model's parameters to another model.
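A sketch of a possible helper, assuming `simr`'s existing `fixef<-`, `VarCorr<-` and `sigma<-` replacement methods do the heavy lifting (`copyParams` is a made-up name, and whether `VarCorr<-` accepts `VarCorr()` output directly would need checking):

```r
# Hypothetical helper: copy all of one model's parameters onto another.
copyParams <- function(target, source) {
  fixef(target)   <- fixef(source)
  VarCorr(target) <- VarCorr(source)   # may need coercion of VarCorr() output
  sigma(target)   <- sigma(source)
  target
}
```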