zeileis / ivreg Goto Github PK

View Code? Open in Web Editor NEW

16.0 16.0 6.0 21.41 MB

2SLS Regression with Diagnostics in R

Home Page: https://zeileis.github.io/ivreg/

R 87.52% CSS 3.82% TeX 8.65%

instrumental-variables regression-diagnostics two-stage-least-squares-regression

ivreg's People

Contributors

Stargazers

Watchers

Forkers

jianf7 nk027 tappek maelastruc vincentarelbundock raffaem

ivreg's Issues

bug in hatvalues(model, type="maximum")

It seems to me that this line:

ivreg/R/ivregDiagnostics.R

Line 291 in 234b00e

hats[, "stage_1"]

should actually be:

hats[, "stage_1"] / mean1

Currently because this is not divided, the stage 2 hatvalues is always the maximum and hatvalues(model, type="maximum") essentially always returns the stage 2 hatvalues.

Possible extra #' in helpfile

Thanks for the great package.

On line 91 of the ivreg.R file here I think you have an extra #' at the beginning of the line.

vcovHC.ivreg non-exported?

Seems like vcovHC.ivreg is not registered/exported in ivreg_0.6-1:

> ivreg::vcovHC(m)
Error: 'vcovHC' is not an exported object from 'namespace:ivreg'
> ivreg:::vcovHC.ivreg() # does exist

Bibliographic references

The vignette briefly describe the 3 tests provided by the summary.

Can we put bibliographic references for such tests? Where they are explained in more mathematical details? Or where have you taken the implementation from?

Include stage-1 errors (or whole model)

Currently ivreg() retains some features from the stage-1 regression. Notably, the "residuals1", "qr1", "rank1", and "coefficients1" components of the return object. It would be great if we could retain the associated stage-1 standard errors too, or perhaps even the whole model object.

Motivation

It's very common to see coefficient estimates from the first stage of an IV presented in regression tables. Currently, this is not possible to automate and requires quite a bit tinkering or re-running the first-stage manually. However, since we already have the stage-1 coefficients, all we would need is the associated standard errors to automate the table construction.

Alternatively, one could include the whole stage-1 model object, which would allow for even more flexibility. This is the approach that lfe::felm adopts for its IV method. (Example below.) You could even go a step further and return it as a "coeftest" object to reduce size.

library(ivreg)
library(lfe)
#> Loading required package: Matrix

data("CigaretteDemand", package = "ivreg")

## ivreg 
m1 = ivreg(log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome),
                  data = CigaretteDemand)
## "Stage-1" elements of the return object (missing coefficient standard errors)
grep('1', attr(m1, 'names'), value = TRUE)
#> [1] "residuals1"    "qr1"           "rank1"         "coefficients1"

## lfe::felm equivalent
m2 = felm(log(packs) ~ log(rincome)  | 0 | (log(rprice) ~ salestax), 
                 data = CigaretteDemand)
## Retains whole stage-1 model object
summary(m2$stage1)
#> 
#> Call:
#>    NULL 
#> 
#> Residuals:
#>       Min        1Q    Median        3Q       Max 
#> -0.163799 -0.033049  0.001907  0.049322  0.185542 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  3.590811   0.225558  15.920  < 2e-16 ***
#> log(rincome) 0.389283   0.085104   4.574 3.74e-05 ***
#> salestax     0.027395   0.004077   6.720 2.65e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.07848 on 45 degrees of freedom
#> Multiple R-squared(full model): 0.6389   Adjusted R-squared: 0.6228 
#> Multiple R-squared(proj model): 0.6389   Adjusted R-squared: 0.6228 
#> F-statistic(full model):39.81 on 2 and 45 DF, p-value: 1.114e-10 
#> F-statistic(proj model): 39.81 on 2 and 45 DF, p-value: 1.114e-10 
#> F-statistic(excl instr.):45.16 on 1 and 45 DF, p-value: 2.655e-08

^{Created on 2020-09-04 by the reprex package (v0.3.0)}

Related discussions:

P.S. Thanks so much for the work on this package. Having a self-contained R library for IV makes a ton of sense to me. The new features and pkgdown site are excellent too.

No `instruments` when omitting intercepts

This may be revealing my fundamental misunderstanding of IVs, but if I were to specify a 2SLS using ivreg without intercepts in either equation, ivreg seems to believe there are no instruments when computing diagnostics (and therefore fails). Is this the expected behavior? If so, can someone explain to me why? If not, can I help fix it?

Thank you!

Here's a reproducible example using the built in mtcars dataset.

library(ivreg)

ivo <- ivreg(mpg ~ 0 + hp | 0 + cyl, data = mtcars)
ivo$instruments
#> named integer(0)
summary(ivo)
#> Error in ivdiag(object, vcov. = vcov.): no endogenous/instrument variables
sessionInfo()
#> R version 4.1.1 Patched (2021-09-01 r80849)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.6
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ivreg_0.6-0
#> 
#> loaded via a namespace (and not attached):
#>  [1] zip_2.2.0         Rcpp_1.0.7        cellranger_1.1.0  compiler_4.1.1   
#>  [5] pillar_1.6.2      highr_0.9         forcats_0.5.1     tools_4.1.1      
#>  [9] digest_0.6.27     lattice_0.20-44   evaluate_0.14     lifecycle_1.0.0  
#> [13] tibble_3.1.4      pkgconfig_2.0.3   rlang_0.4.11      openxlsx_4.2.4   
#> [17] reprex_2.0.1      cli_3.0.1         rstudioapi_0.13   curl_4.3.2       
#> [21] yaml_2.2.1        haven_2.4.3       xfun_0.25         rio_0.5.27       
#> [25] fastmap_1.1.0     withr_2.4.2       stringr_1.4.0     knitr_1.33       
#> [29] fs_1.5.0          vctrs_0.3.8       hms_1.1.0         grid_4.1.1       
#> [33] lmtest_0.9-38     glue_1.4.2        data.table_1.14.0 fansi_0.5.0      
#> [37] readxl_1.3.1      foreign_0.8-81    rmarkdown_2.10    carData_3.0-4    
#> [41] Formula_1.2-4     car_3.0-11        magrittr_2.0.1    htmltools_0.5.2  
#> [45] ellipsis_0.3.2    MASS_7.3-54       abind_1.4-5       utf8_1.2.2       
#> [49] stringi_1.7.4     crayon_1.4.1      zoo_1.8-9

^{Created on 2021-09-04 by the reprex package (v2.0.1)}

Website in README

I think we should put the website in the README

https://zeileis.github.io/ivreg/

Confidence interval and prediction intervals

Doesn't ivreg implement confidence and prediction intervals?

The following called on an ivreg object return just the prediction, not the interval around it:

predict(mod2, newdata=newdf, interval="confidence", level=0.99)
predict(mod2, newdata=newdf, interval="prediction", level=0.99)

Complete vcov for aliased models

As pointed out in the discussion of #1 the vcov() method for ivreg objects always drops the rows/columns pertaining to aliased coeffcients while the lm method retains them by default. In contrast, the coef() method always retains the aliased coefficients. It would be good to be more consistent with lm here. A simple non-sensical example is provided below.

With lm:

library("ivreg")
data("cars", package = "datasets")
cars$const <- 1
m <- lm(dist ~ speed + const, data = cars)
coef(m)
## (Intercept)       speed       const 
##  -17.579095    3.932409          NA 
 coef(m, complete = FALSE)
## (Intercept)       speed 
##  -17.579095    3.932409 
vcov(m)
##             (Intercept)      speed const
## (Intercept)   45.676514 -2.6588234    NA
## speed         -2.658823  0.1726509    NA
## const                NA         NA    NA
vcov(m, complete = FALSE)
##             (Intercept)      speed
## (Intercept)   45.676514 -2.6588234
## speed         -2.658823  0.1726509

Versus ivreg:

iv <- ivreg(dist ~ speed + const | log(speed) + const, data = cars)
coef(iv)
## (Intercept)       speed       const 
##   -15.36537     3.78866          NA 
vcov(iv)
##             (Intercept)      speed
## (Intercept)   48.985379 -2.8729188
## speed         -2.872919  0.1865532

Partial argument matches

Having the options for partial argument matching activated, I receive a couple of these warnings in ivreg_0.6-1, see below example from ?ivreg::ivreg. These are the options:

$warnPartialMatchArgs
[1] TRUE
$warnPartialMatchAttr
[1] TRUE
$warnPartialMatchDollar
[1] TRUE

Example:

library(ivreg)
data("CigaretteDemand", package = "ivreg")

## model 
m <- ivreg(log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome),
           data = CigaretteDemand)
summary(m)
#> 
#> Call:
#> ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | salestax + 
#>     log(rincome), data = CigaretteDemand)
#> 
#> Residuals:
#>       Min        1Q    Median        3Q       Max 
#> -0.611000 -0.086072  0.009423  0.106912  0.393159 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)    9.4307     1.3584   6.943 1.24e-08 ***
#> log(rprice)   -1.1434     0.3595  -3.181  0.00266 ** 
#> log(rincome)   0.2145     0.2686   0.799  0.42867    
#> 
#> Diagnostic tests:
#>                  df1 df2 statistic  p-value    
#> Weak instruments   1  45    45.158 2.65e-08 ***
#> Wu-Hausman         1  44     1.102      0.3    
#> Sargan             0  NA        NA       NA    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.1896 on 45 degrees of freedom
#> Multiple R-Squared: 0.4189,  Adjusted R-squared: 0.3931 
#> Wald test: 6.534 on 2 and 45 DF,  p-value: 0.003227

Warning messages:
1: In model.matrix.default(object$terms$regressors, object$model, contrasts = object$contrasts$regressors) :
  partial argument match of 'contrasts' to 'contrasts.arg'
Calls: summary ... ivdiag -> model.matrix -> model.matrix.ivreg -> model.matrix
2: In model.matrix.default(object$terms$instruments, object$model,  :
  partial argument match of 'contrasts' to 'contrasts.arg'
Calls: summary ... ivdiag -> model.matrix -> model.matrix.ivreg -> model.matrix
3: In model.matrix.default(object$terms$regressors, object$model, contrasts = object$contrasts$regressors) :
  partial argument match of 'contrasts' to 'contrasts.arg'
Calls: summary ... ivdiag -> model.matrix -> model.matrix.ivreg -> model.matrix
4: In model.matrix.default(object$terms$instruments, object$model,  :
  partial argument match of 'contrasts' to 'contrasts.arg'
Calls: summary ... ivdiag -> model.matrix -> model.matrix.ivreg -> model.matrix

pkgdown github action produces error messages

I added a pkgdown github action and it produces a couple of error messages -- see the Actions tab

Weak instruments

Hi,

I have been using the ivreg function and the diagnostics. But I have found a few different sources to what the Weak instrument diagnostic actually does.

Therefore, I want ask what the weak instrument actually tests.

Error in `summary` when no endogenous regressors

Hi --- thanks for the package, this is very useful! I have a quick question about some functionality when there are no endogenous regressors included in the model. In particular, in this case (at least for me), the summary function errors. Here is an example:

library(ivreg)

data("SchoolingReturns", package="ivreg")

iv_res <- ivreg(wage ~ education + experience, 
                ~education + experience, 
                data=SchoolingReturns)
coef(iv_res)
#> (Intercept)   education  experience 
#>  -375.58805    55.05451    25.14156
summary(iv_res)
#> Error in ivdiag(object, vcov. = vcov.): no endogenous/instrument variables

I know this is not a main use case; for me, I am using the ivreg function internally in another package, and I'd like to be able to cover the case with and without endogenous regressors in the same code.

For what it's worth, the AER package does work in this case:

data("SchoolingReturns", package="ivreg")

iv_res <- AER::ivreg(wage ~ education + experience, 
                ~education + experience, 
                data=SchoolingReturns)
summary(iv_res)
#> 
#> Call:
#> AER::ivreg(formula = wage ~ education + experience | education + 
#>     experience, data = SchoolingReturns)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -647.16 -161.20  -29.57  124.79 1612.62 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -375.588     37.802  -9.936   <2e-16 ***
#> education     55.055      2.140  25.722   <2e-16 ***
#> experience    25.142      1.383  18.174   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 238 on 3007 degrees of freedom
#> Multiple R-Squared: 0.1811,  Adjusted R-squared: 0.1805 
#> Wald test: 332.5 on 2 and 3007 DF,  p-value: < 2.2e-16

three-part right side

https://cran.r-project.org/web/packages/ivreg/vignettes/ivreg.html says As listing exogenous variables in both parts on the right-hand side of the formula may become tedious if there are many of them, an additional convenience option is to use a three-part right side like y ~ x1 | x2 | z1 + z2, listing the exogenous, endogenous, and instrumental variables (for the endogenous variables only), respectively.

This does not work for me.

# data
data("SchoolingReturns")

# ivreg basic
library(ivreg)

ivreg(log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome), data = CigaretteDemand)
ivreg(log(packs) ~ log(rprice) | log(rincome) | salestax, data = CigaretteDemand)

Three-part right-hand side

Currently, ivreg() requires a two-part right-hand side in the formula corresponding to the X and Z matrices in 2SLS, respectively:

y ~ ex + en | ex + in

where ex corresponds to all exogenous and en to all endogenous and in to all instrument variables.

However, this notation can be a bit cumbersome if we have many exogenous variables and just one endogenous variable and one corresponding instrument. This is why the manual page points out that you can do:

y ~ ex + en | . - en + in

This is a bit technical, though, and in a discussion with @grantmcdermott we thought that it might help to support a three-part right-had side:

y ~ ex | en | in

It would be really hard to fully support that in the sense that terms etc. would also need to change. But supporting it in the top-level ivreg() function just by converting it to the two-part case requires just the following additional code:

if(length(formula)[2] == 3L) formula <- as.Formula(
  formula(formula, rhs = c(1, 2), collapse = TRUE),
  formula(formula, lhs = 0, rhs = c(1, 3), collapse = TRUE)
)

This would have to be inserted between line 141 and 142 of ivreg.R. For example, it would turn

formula <- Formula(y ~ x1 | x2 + x3 | z1 + z2)

into

y ~ x1 + (x2 + x3) | x1 + (z1 + z2)

My feeling is that the three-part right-hand side in the intput would be easier for some users. I'm just not sure whether it would create more confusion for some users as well.

Thoughts?

adding Cragg-Donald (1993) weak instrument test

Hi everyone,

First of all thank you for creating this great package, which I have used heavily for a while now.

I thought it would be a useful addition to the package to provide a function that calculates the Cragg-Donald (1993) test statistic. This would be more informative than the usual first stage F-tests when the model has several endogenous variables. I have already coded my own version of the CD statistic for specific applications and could either write a method or a function (potentially to be incorporated into ivdiag()) that calculates the test statistic.

Following the open source netiquette, I wanted to first ask whether such a contribution would be welcomed and if yes, whether you have any suggestions about how best to add this functionality (e.g. separate method/function etc.).