Giter Club home page Giter Club logo

clubsandwich's People

Contributors

jepusto avatar jzangela avatar katrinleinweber avatar marianolake avatar tappek avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

clubsandwich's Issues

vcovCR() on a plm object doesn't work within a function

Calling vcovCR() on a plm object does not work properly when it's called within a function.

Here's a reproducible example:

example.of.problem <- function(){
  require(plm)
  require(clubSandwich)
  our.data <- iris
  model.output <- plm(formula = "Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width", data = our.data, model = "within", index = "Species")
  vcovCR(model.output, cluster = our.data$Species, type="CR1S")
}
example.of.problem()

This results in an error message, whereas running the code that's inside the function works just fine if it's not inside a function.

Incomplete vignette?

Seems like the vignette "Robust-Variance-Estimation-with-clubSandwich" is incomplete and should be either revised or deleted? It ends mid-sentence--as does the source RMD file.

"CR1S" Formula

According to the Stata documentation (Stata User's Guide, Release 13, section 20.21.2, 'Correlated errors: Cluster–robust standard errors'), the equation for "CR1S" should be (m (N-1)) / [(m - 1)(N - p)] (instead of (m N) / [(m - 1)(N - p)]).

Revise documentation with mathjax

Added mathjaxr to package dependencies, so documentation can now include mathjax equations. Revise docs to make the equations pretty.

tidy tidy

When I use the tidy = TRUE argument with conf_int() it adds the name of the variable (like genderFemale) is the rowname. Could you make it so that the variables are a separate column in the data - so it's easier to filter without having to pull rownames and add it to the data?

coxph

Would it be possible to extend clubSandwich to handle coxph objects as well? Currently (v0.3.2), this gives an error due to lacking nobs function. Adding such a function does not help, as the below toy example illustrates:

set.seed(123)
library(clubSandwich)

# Avoids errors from clubSandwich, but doesn't solve the problem.
nobs.coxph <- function(f) f$n

# Generate some data.
n <- 200
dd <- data.frame(x1=rnorm(n), x2=rnorm(n))
dd$lp <- dd$x1 + dd$x2 # Linear predictor.
dd$time <- rexp(n=n, rate=exp(dd$lp)) # Survival time.
dd$event <- T # No censoring.
dd$id <- 1:n # No clustering.

# Classical robust standard errors (works).
fit1 <- coxph(Surv(time, event) ~ x1 + x2 + cluster(id), data=dd)
vcov1 <- vcov(fit1)

# Robust standard errors from clubSandwich (doesn't work).
fit2 <- coxph(Surv(time, event) ~ x1 + x2, data=dd)
vcov2 <- vcovCR(fit2, cluster=dd$id, type="CR0")

# In theory, vcov2 should equal vcov2. However, it is much larger:
print(vcov2 / vcov1)

nobs.plm can be removed as there is now plm:::nobs.panelmodel on CRAN

Referring to (the beginning of) #13:
you can remove the nobs.plm method from clubSandwich for your next CRAN release (if you wish). There is now plm:::nobs.panelmodel in the latest CRAN release of plm (version 1.6-4) and you can rely on this if you want to (nobs.panelmodel also captures plm objects).

BTW: there is also support for clubSandwich's vcovs for plm::pwaldtest and hence one can pass a clubSandwich vcov in summary.plm and get identical results for the F statistic and partial wald tests on coefficients:

library(plm)
library(clubSandwich)
data(MortalityRates)

# partly from clubSandwich's vignette
# subset for deaths in motor vehicle accidents, 1970-1983
MV_deaths <- subset(MortalityRates, cause=="Motor Vehicle" & 
                      year <= 1983 & !is.na(beertaxa))

plm_unweighted <- plm(mrate ~ legal + beertaxa, data = MV_deaths, 
                      effect = "individual", index = c("state","year"))

vcov_club_id   <- vcovCR(plm_unweighted, type = "CR0", cluster = "individual")
vcov_plm_id    <- vcovHC(plm_unweighted, method="arellano", type="HC0", cluster = "group")

summary(plm_unweighted, vcov = vcov_plm_id) 
summary(plm_unweighted, vcov = vcov_club_id)
pwaldtest(plm_unweighted, test = "F", vcov = vcov_plm_id)
pwaldtest(plm_unweighted, test = "F", vcov = vcov_club_id)

Bug in findcluster.rma.mv()

clustering variable is not detected models that include both inner | outer and ~ 1 | id random effects structures.

add tests for rma.mv() with random slopes

Development vresion of metafor now allows for random slopes in rma.mv(). Add tests for whether current clubSandwich methods are adequate for such models (and if not, fix!).

A more informative error message for missing data?

Hi!

Noticed this thing, which caused a bit of trouble. Could we maybe have a more informative message saying that you can't have missings, or maybe build in a way to deal with them?

This gives "Error in t(x) %*% w : non-conformable arguments":

data <- data.frame(value = rnorm(500, mean = 100, sd = 10),
cluster = sample(c(letters[1:10], NA), 500, replace = TRUE))

lm_object <- lm(value ~ 1, data = data)
clubSandwich::vcovCR(lm_object, cluster = data$cluster, type = "CR1S")

Removing NA helps:

data <- data.frame(value = rnorm(500, mean = 100, sd = 10),
cluster = sample(c(letters[1:10], NA), 500, replace = TRUE)) %>%
na.omit()

lm_object <- lm(value ~ 1, data = data)

clubSandwich::vcovCR(lm_object, cluster = data$cluster, type = "CR1S")

superfluous dot in printed output: "Std..Error"

Just a minor cosmetic "Std..Error" in printed output of coef_test, see:

library(clubSandwich)
data(MortalityRates)

MV_deaths <- subset(MortalityRates, cause=="Motor Vehicle" & 
                      year <= 1983 & !is.na(beertaxa))
plm_unweighted <- plm(mrate ~ legal + beertaxa, data = MV_deaths, 
                      effect = "individual", index = c("state","year"))
coef_test(plm_unweighted, vcov = "CR1", cluster = "individual", test = "naive-t")

# Coef Estimate Std..Error p-val (naive-t) Sig.
#1    legal     4.42       2.13           0.043    *
#2 beertaxa    30.25       5.99          <0.001  ***

support for MASS::polr

I have build an ordered logistic regression model with MASS::polr and would like to get clustered standard errors for the marginal predictions. However this throws the following error, suggesting that MASS::polr is currently not supported:
#> Error in split.default(resid, cluster): first argument must be a vector

methods for glmmTMB objects?

Possible to add methods for models fitted with glmmTMB? The package supports linear and generalized linear mixed models where random effects are gaussian and on the scale of the linear predictor.

Possible bug in multi-level multi-variate model with lme()

library(RCurl)
library(nlme)
library(clubSandwich)

### get data
tmp <- getURL("https://raw.githubusercontent.com/wviechtb/multivariate_multilevel_models/master/data.dat")
dat <- read.table(text=tmp, header=TRUE, sep="\t")

### calculate PA and NA
dat$pa <- rowMeans(dat[, grepl("pa", names(dat))])
dat$na <- rowMeans(dat[, grepl("na", names(dat))])

### keep only variables that are needed
dat <- dat[, c("id", "sex", "beep", "pa", "na")]

### change into very long format
dat <- reshape(dat, direction="long", varying=c("pa","na"), v.names="y", idvar="obs", timevar="outcome")
dat$obs <- NULL
dat <- dat[order(dat$id, dat$beep, dat$outcome),]
rownames(dat) <- 1:nrow(dat)
dat$outnum <- dat$outcome
dat$outcome <- factor(dat$outcome)

### fit multivariate multilevel model
res1 <- lme(y ~ outcome - 1,
            random = ~ outcome - 1 | id,
            weights = varIdent(form = ~ 1 | outcome),
            correlation = corSymm(form = ~ outnum | id/beep),
            data = dat, na.action = na.omit)
summary(res1)

# Error in tcrossprod(s) * R : non-conformable arrays
coef_test(res1, vcov = "CR2")
coef_test(res1, vcov = "CR2", cluster = dat$id)

# the error stems from vcovCR()
vcovCR(res1, type = "CR2", cluster = dat$id)
vcovCR(res1, type = "CR0", cluster = dat$id)

The error goes away when I remove the correlation structure on the errors:

res2 <- lme(y ~ outcome - 1,
            random = ~ outcome - 1 | id,
            weights = varIdent(form = ~ 1 | outcome),
            data = dat, na.action = na.omit)
summary(res2)
coef_test(res2, vcov = "CR2")

But gls() with such a correlation structure works:

res3 <- gls(y ~ outcome - 1,
           weights = varIdent(form = ~ 1 | outcome),
           correlation = corSymm(form = ~ outnum | id/beep),
           data = dat, na.action = na.omit)
summary(res3)
coef_test(res3, vcov = "CR2")
coef_test(res3, vcov = "CR2", cluster = dat$id)

Maybe it's because 'res1' uses 'id' as the grouping variable for 'random' while the correlation structure is defined with respect to 'id/beep'?

error handling in Wald_test().

Wald_test() should return an informative error message if test does not match any of the available options.

Also, Wald_test() currently fails (in an uninformative way) for an intercept-only model with constrain_zero(1). Should return tests equivalent to the coef_test() result.

multiple correction

Can you adjust the p-value for multiple correction? That way you can have the symbols for the standard alpha values without having to mess with correcting alpha. Like what p.adjust() does.

Example:

if (mc_method == "Bonferroni"){

res$p_val <- res$p_val * q # q being number of contrasts
res$sig <- symnum(x$p_val, corr = FALSE, na = FALSE,
cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
symbols = c("", "", "", ".", " "))

}

Possibly incorrect finite sample adjustment for CR1S w/ multivariate model

Hi!

Was benchmarking an estimator I'm working on and noticed that I believe your finite sample adjustment of the "stata" flavor may be wrong when working with multiple outcomes. Here's a MWE:

lmo <- lm(cbind(mpg, hp) ~ wt, data = mtcars)
clubSandwich::vcovCR(lmo, cluster = mtcars$cyl, type = "CR0")

# Manually build correction
J <- length(unique(mtcars$cyl))
n <- nrow(mtcars)
r <- 2

# Not equivalent
clubSandwich::vcovCR(lmo, cluster = mtcars$cyl, type = "CR1S")
clubSandwich::vcovCR(lmo, cluster = mtcars$cyl, type = "CR0") * ((J * (n - 1)) / ((J - 1) * (n - r)))

# can see here that your N is actually n * ny, where ny is the number of models
# and r is actually r*ny, where r is the rank of X
debugonce(clubSandwich:::CR1S)
clubSandwich::vcovCR(lmo, cluster = mtcars$cyl, type = "CR1S")

Now, I'm not actually sure that the correction you are using is wrong, but it does mean that the block diagonals of your vcov matrix are not the same as you would get from running the regression separately for each outcome, which I believe it should be.

Specifying cluster variable sometimes fail

Hi all!
I'm trying to specify a cluster variable after plm for my simulated data (which I use for power simulation), but I get the following error message:
"Error in [.data.frame(eval(mf$data, envir), , index_names) : undefined columns selected"

I'm not sure if this is specific to vcovCR or something general about R, but could anyone tell me what's wrong with my code?
My example (data file is in the attached zip file):
library(clubSandwich)
library(plm)
load("Data.RData")
fe.fit <- plm(formula = Y ~ T, data = Data, model = "within", index = "id",effect = "individual", singular.ok = FALSE)
vcovCR(fe.fit,cluster=Data$id,type = "CR2") # doesn't work, but I can run this by not specifying cluster as in the next line
vcovCR(fe.fit,type = "CR2")
vcovCR(fe.fit,cluster=Data$gid,type = "CR2") # I ultimately want to run this

For some other datasets, for example Produc in plm package, this works:
library(clubSandwich)
library(plm)
data("Produc", package = "plm")
plm_FE <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
data = Produc, index = c("state","year"),
effect = "individual", model = "within")
vcovCR(plm_FE, type="CR2",cluster = Produc$state)

Thanks a lot!
Data.zip

nobs.plm

I noticed there is a nobs.plm function in this package. This makes sense because the current CRAN release of plm does not have this method. Since some time, it is implementend in the development version of plm (https://r-forge.r-project.org/projects/plm/).

Just a tiny note about clubSandwich's implementation:

 nobs.plm <- function(object, ...) {
   NROW(object$model)
 }

This should work for FE and RE models. For FD models, the original/undifferenced data is kept in object$model, thus this will likely yield a somewhat wrong number. However, I don't know if FD models are supported by clubSandwich. I recommend to use length(object$residuals) which will give the number of (differenced) observations used in the (FD) model estimation and does the trick for FE and RE models as well. This is likely the number we want here?

coef_test with fixed corelation matrices

Hi @jepusto

In ecology and evolution, meta-analysis often has correlation matrices for a random factor. But I tried coef_test with such a model. Then I get this:

> coef_test(mod4, vcov="CR2", cluster = dat$paper)
Error in vcovCR.rma.mv(obj, type = vcov, ...) : 
  vcovCR.rma.mv() does not work with fixed correlation matrices in the R argument.

Do you think you could make this work with such a model?

Thanks!

Control parallel processing

Great package. But I was struggling to find a way, without hacking all of the code, to control how many cores are utilized in simulations. Could you add that, please?

"CR1" Formula

Shouldn't the adjustment factor for "CR1" be m / (m - p), where m is the number of clusters and p the rank of the model matrix (instead of m / (m - 1))? This would be analogous to 'HC1'. See, for example, MacKinnon and White (1985).

break up bread and meat?

Hi @jepusto,

@josherrickson and I were wondering, does clubSandwich have a route to/user-facing mechanism for calculating meat only, ie assuming the user will construct his or her own bread matrix or get it from elsewhere? For us something like that would be pretty nifty.

Cool project! --ben

Consider using sandwich::bread.mlm()

On suggestion from Dr. Zeileis, consider using the following:

bread.mlm <- function(x, ...)
{
   if(!is.null(x$na.action)) class(x$na.action) <- "omit"
   cf <- coef(x)
   rval <- summary.lm(x)
   rval <- kronecker(
     structure(diag(ncol(cf)), .Dimnames = rep.int(list(colnames(cf)), 2L)),
     structure(rval$cov.unscaled,  .Dimnames = rep.int(list(rownames(cf)), 2L)) * as.vector(sum(rval$df[1:2])),
     make.dimnames = TRUE
   )
   return(rval)
}

Two-way clustering for plm object?

Hi! I'm trying to do two-way clustering with vcovCR() command after running fixed effects model with plm() by vcovCR(plm_FE,type="CR2",cluster=c("data$id","data$year")), but it returns the same covariance matrix estimate as one-way clustering. Is this covariance matrix estimate correct, or something is wrong with the syntax?

Sample code (adapted from the vcovCR.plm() help menu code):
library(plm)
data("Produc", package = "plm")
plm_FE <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
data = Produc, index = c("state","year"),
effect = "individual", model = "within")
vcovCR(plm_FE, type="CR2") % one-way clustering
vcovCR(plm_FE, type="CR2",cluster = c(Produc$state)) % same as above - one-way clustering
vcovCR(plm_FE, type="CR2",cluster = c(Produc$state,Produc$year)) % supposed to be two-way clustering, but returns the same covariance matrix as one-way clustering

model_matrix.plm for RE models

Could you check please: in model_matrix.plm for RE models, the model matrix is extracted by:
model.matrix(Formula::as.Formula(formula(obj)), model.frame(obj))

This will give the untransformed data (as it is in the model frame) - intended? What is the reason behind the special treatment of RE models here?

Let impute_covariance_matrix() take a patterned correlation matrix as input.

Modify impute_covariance_matrix() to take a vector describing categories of effect sizes and a matrix describing the correlations between effect sizes of each pair of categories (and within category). Return a variance-covariance matrix (or list of vcov matrices) with the implied correlations.

Here's an example dataset and pattern matrix:

set.seed(20200910)
cluster <- factor(rep(1:3, each = 4))
cati <- sample(letters[1:4], 12, replace = TRUE)
vi <- 1 /rchisq(12, df = 6)
dat <- data.frame(cluster, cati, vi)

r_pattern <- matrix(c(0.9, 0.8, 0.7, 0.6, 
                      0.8, 0.85, 0.7, 0.5,
                      0.7, 0.7, 0.88, 0.5,
                      0.6, 0.5, 0.5, 0.92), 4, 4,
                    dimnames = list(letters[1:4],letters[1:4]))

Here's how the syntax might look:

impute_covariance_matrix(vi, cluster, cati, pattern)

error with conf_int() when clustering on a single study

Hi James,

Many thanks for creating this fantastic package!
We aim at applying RVE to summarize R-squared values that we calculated based on data from several large-scale assessment studies. The sample size at the study level is quite small, ranging between 1 study and 6 studies, depending on the covariate set under investigation. To account both for within- and between-study variation and to model the dependencies within studies, we thought about fitting CHE models (as outlined in Pustejovsky & Tipton, 2020 [https://osf.io/preprints/metaarxiv/vyfcj/]) using metafor and clubSandwich.
As described above, there are situations where we would like to summarize across only a single study. In these cases, we still would like to account for the fact that the effect sizes stem from the same study. So we tried to fit the following model (using sample data here):

# install.packages("metafor")
# remotes::install_github("jepusto/clubSandwich")

library(metafor) # version 2.4-0
library(clubSandwich) # version 0.5.3

sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)

# sample data with 1 study and 5 effect sizes
set.seed(1234)
dat <- data.frame(
  study = "study1", # study number
  est = runif(5, 0.1, 0.6), # R-squared values
  se = runif(5, 0.005, 0.025), # standard errors of R-squared values
  es_id = 1:5 # effect size ID
)

print(dat)
##    study       est          se es_id
## 1 study1 0.1568517 0.017806212     1
## 2 study1 0.4111497 0.005189915     2
## 3 study1 0.4046374 0.009651010     3
## 4 study1 0.4116897 0.018321675     4
## 5 study1 0.5304577 0.015285023     5

# sampling variance-covariance matrix
v_mat <- impute_covariance_matrix(dat$se^2, cluster = dat$study, r = 0.8)

# working model in metafor
res <- rma.mv(yi = est, V = v_mat, random = ~ 1 | study/es_id, data = dat)

metafor throws out the following warning:

In rma.mv(yi = est, V = v_mat, random = ~1 | study/es_id, data = dat) :
  Single-level factor(s) found in 'random' argument. Corresponding 'sigma2' value(s) fixed to 0.

Since fixing the between-cluster heterogeneity to zero is exactly what we intended to do here, we ignore this warning. However, when calling clubSandwich::conf_int(), we get the following error message:

# clustered SEs and CIs
ci <- conf_int(res, "CR2")

Error in `diag<-`(`*tmp*`, value = diag(P_array[, , i]) + P_diag[i, ]) : 
  only matrix diagonals can be replaced

Regardless of the fact that RVE with Satterthwaite correction might produce imprecise estimates with less than 5 studies anyway (as mentioned in Tipton, 2015), I assume that it simply won't work to cluster on only a single study, right? (Fixing the between-cluster variance with 2 or more studies by means of sigma2 = c(0, NA) in metafor::rma.mv() does not produce this error message.) If this is the case, is it possible to adjust the error message accordingly? Or are we overlooking something?

Many thanks in advance for your effort.
All the best,
Sophie

method for lfe::felm() objects

Really like the look of this package. Thanks for putting it together.

Would it be possible to add a method for lfe::felm()? For many of us, this is the go-to approach for estimating high-dimensional FE models in R. The function already includes native cluster-robust SE estimation, but I don't think it implements a small cluster correction of the type presented here.

One potential complication is that the fixed effects are "projected out" by the underlying algorithm... which is why felm() is able to able to estimate HDFE models so quickly in the first place. But this may limit how well it plays with clubSandwich.

PS - A link to the lfe package.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.