
inferference's People

Contributors

barkleybg, bsaul


Forkers

barkleybg, guhjy

inferference's Issues

Incorrect formula

The propensity formula on p. 3 of the reference manual (in the interference section, under propensity_integrand) is incorrect. This is a typo.

logit_integrand cannot handle singleton groups

When a group has only one individual, the logit_integrand() function provided in this package cannot provide variance estimates.

MWE:

library(inferference)
vaccinesim_sub <- vaccinesim[1:300,] ##speedup
vaccinesim_sub$group[1] <- -3 ##Creating a singleton group

example1 <- interference(
    formula = Y | A | B ~ X1 + X2 + (1|group) | group, 
    allocations = c(.3, .45,  .6), 
    data = vaccinesim_sub, 
    randomization = 2/3,
    method = 'simple')

throws

Error in apply(hh, 2, function(x) exp(sum(log(x)))) : 
  dim(X) must have a positive length

(the same error is printed five times)

and print(example1) returns

Direct Effects
 alpha1 trt1 alpha2 trt2 estimate std.error conf.low conf.high
   0.30    0   0.30    1  0.09501        NA       NA        NA
   0.60    0   0.60    1  0.12915        NA       NA        NA
   0.45    0   0.45    1  0.14543        NA       NA        NA

I think I've fixed it though and will put in a PR shortly.
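The error message suggests that apply() received a vector rather than a matrix: in R, subsetting a single row drops dimensions by default. A minimal sketch of that failure mode and the drop = FALSE guard (my assumption about where the bug lives, not the actual PR):

```r
# Single-row subsetting drops dimensions by default, so apply(hh, 2, ...)
# errors for a singleton group; drop = FALSE preserves the matrix shape.
hh <- matrix(c(0.4, 0.6, 0.5, 0.7), nrow = 2)

bad  <- hh[1, ]                 # numeric vector: apply() would error on this
good <- hh[1, , drop = FALSE]   # 1 x 2 matrix: apply() still works

apply(good, 2, function(x) exp(sum(log(x))))  # 0.4 0.5
```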

Use of parallel computing

Many of the internal computations are embarrassingly parallel. How can the internal functions make use of multiple cores when they are available?
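For instance, group-level computations could be dispatched with the base parallel package (a sketch only; compute_group_weight is a hypothetical stand-in for the package internals, not an inferference function):

```r
library(parallel)

# Hypothetical per-group computation standing in for the real internals.
compute_group_weight <- function(group_data) sum(group_data)^2

groups <- split(1:20, rep(1:4, each = 5))

# mclapply forks on Unix-alikes; forking is unavailable on Windows,
# where parLapply over a PSOCK cluster is the portable alternative.
n_cores <- if (.Platform$OS.type == "windows") 1L else max(1L, detectCores() - 1L)
weights <- mclapply(groups, compute_group_weight, mc.cores = n_cores)
```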

Use of data.frame internally

Per bribark:

Formula::model.part(formula, data=my_data, lhs=num, drop=TRUE) does not actually drop a one-column local data_frame() to a vector. That is, if someone passes in a dplyr/tibble/tbl_df'd data_frame(), then I don't know whether model.part() will work as intended.
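One defensive option (illustrative only; safe_data is a hypothetical helper, not part of the package) is to coerce any tibble back to a base data.frame before handing it to Formula::model.part(), so drop = TRUE reliably yields a vector:

```r
# Coerce tibbles back to base data.frame so `[` honours drop = TRUE.
safe_data <- function(data) {
  if (inherits(data, "tbl_df")) as.data.frame(data) else data
}

df <- safe_data(data.frame(y = 1:3))
y  <- df[, "y", drop = TRUE]   # plain integer vector, as downstream code expects
```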

Break between v0.4.62 and v0.5.1?

With inferference_0.5.1,

full_fit <- try(inferference::interference(
    formula = my_formula,
    data = my_dataset,
    allocations = c(.3, .6),
    randomization = 2/3,
    runSilent = FALSE
))
[1] "Calculating matrix of IP weights..."
[1] "Calculating array of IP weight derivatives..."
[1] "Calculating matrix of scores..."
Error in grad.default(X = structure(list(`(Intercept)` = c(1, 1, 1, 1,  : 
  function returns NA at 7.75248344440477e-059.44252957302433e-063.03965082818232e-060.0001242369543700136.31106479280438e-050.00012825501761554 distance from x.

However, for inferference_0.4.62,

full_fit <- try(inferference::interference(
    formula = my_formula,
    data = my_dataset,
    allocations = c(.3, .6),
    randomization = 2/3,
    runSilent = FALSE
))
[1] "Calculating matrix of IP weights..."
[1] "Calculating array of IP weight derivatives..."
[1] "Calculating matrix of scores..."
[1] "Computing effect estimates..."
[1] "Interference complete"

full_fit
 --------------------------------------------------------------------------
                               Model Summary                    
 --------------------------------------------------------------------------

I can provide you with the dataset.

Estimated Random Effect Variance of 0

When the glmer-estimated variance of the random effect is 0, the robust variance calculation can fail because the V matrix is non-invertible. The package should caution the user that the data do not allow for this option, instead of quitting and returning an error. Perhaps wrap the calculation in try() and return a description of the issue.

I may have described two separate issues with the code; I hope not.

bugfix: warn/stop when allocation exceeds the randomization rule

The user can currently specify an allocation greater than the randomization rule.

This should probably produce a warning (or maybe even an error).

I suggest defaulting to stopping with an error, while letting the user pass an argument that allows the program to run and complete without issues if they so choose.
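A sketch of such a check (check_allocations and allow_overallocation are illustrative names, not part of the package):

```r
# Stop by default when an allocation exceeds the randomization rule,
# but let the user downgrade the error to a warning.
check_allocations <- function(allocations, randomization,
                              allow_overallocation = FALSE) {
  too_big <- allocations > randomization
  if (any(too_big)) {
    msg <- sprintf("%d allocation(s) exceed randomization = %s",
                   sum(too_big), format(randomization))
    if (allow_overallocation) warning(msg) else stop(msg)
  }
  invisible(allocations)
}

check_allocations(c(.3, .6), 2/3)                                # fine
# check_allocations(c(.3, .9), 2/3)                              # error
# check_allocations(c(.3, .9), 2/3, allow_overallocation = TRUE) # warning
```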

tag versions/releases

It would be easier to keep up with CRAN by adding version tags, release notes, and changelogs on GitHub.

Use of sandwich and sandwichShop package for estimating equations

Using the sandwich and sandwichShop packages could greatly simplify the internals of this package by keeping the user from needing to specify the loglihood_integrand and possibly the propensity_integrand functions. These functions could instead be inferred from the modeling procedure used for the group-level propensity (e.g. glmer, glm, lm, geeglm, etc.).

Remove stale develop branch?

In the interest of starting new development on this package, is it time to kill the stale develop branch with my terrible commits on it?

I don't think anyone is downstream of it, so can we delete it quietly? Or keep those commits in a new branch, and start the develop branch fresh from master.

Relevant information that I'm choosing to ignore right now: how to git flowchart

Run interference() silently

Is there an option to run inferference::interference() without generating the following output?
[1] "Calculating matrix of IP weights..."
[1] "Calculating array of IP weight derivatives..."
[1] "Calculating matrix of scores..."
[1] "Computing effect estimates..."
[1] "Interference complete"

If not, can you add an option to pass a logical runSilently = TRUE through the "..." in interference()?
Thanks!
BGB
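Until such an argument exists, a general R workaround (quietly is a hypothetical wrapper, not part of the package) is to discard the printed lines with capture.output():

```r
# Swallow anything the wrapped call print()s or cat()s to stdout.
quietly <- function(expr) {
  invisible(capture.output(result <- expr))
  result
}

x <- quietly({
  print("Calculating matrix of IP weights...")  # would normally print
  42
})
x  # 42, with no progress lines shown
```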

User-provided propensity scores?

A user may want to provide propensity scores (estimated by a different method, say). Two options I've thought of for this:

  1. model_method='user' and user specifies weights as an additional argument

    • In this case, the user can be asked to specify a placeholder in the formula, like outcome | exposure ~ anything | group. The propensity_formula=exposure~anything will never be called.
    • The weights argument would get passed into the appropriate functions and naive variance estimation would take place.
    • A downside would be that the weights may need to be in the correct form. Of course, a helper function could easily be written based on wght_matrix()
  2. model_method='user' and user includes the weights as a term in the data argument, and then specifies this with the formula method.

    • This may be a little cleaner, because the user would need only to provide formula= outcome | exposure ~ prop_score | group where prop_score is the column name in data that contains the relevant PS or IPW information.
    • A downside to this approach is that the cluster propensity scores are invariant across all individuals in a group, so this slightly duplicates information
    • An upside of this approach is it may be easier to implement with Hajek-style IPW (future work), as you could specify exposure ~ group_prop_score + individual_conditional_prob

I guess I'm leaning towards approach 2 here. Does that sound good to you?
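Under approach 2, a hypothetical call could look like the following (model_method = "user" does not exist yet; prop_score and my_data are placeholders):

```r
# prop_score is a column of my_data holding externally estimated
# group-level propensity scores; model_method = "user" would tell
# interference() to use it directly instead of fitting a model.
fit <- interference(
  formula       = outcome | exposure ~ prop_score | group,
  data          = my_data,
  model_method  = "user",
  allocations   = c(.3, .6),
  randomization = 2/3
)
```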
