zeileis / ivreg Goto Github PK
View Code? Open in Web Editor NEW2SLS Regression with Diagnostics in R
Home Page: https://zeileis.github.io/ivreg/
2SLS Regression with Diagnostics in R
Home Page: https://zeileis.github.io/ivreg/
It seems to me that this line:
Line 291 in 234b00e
should actually be:
hats[, "stage_1"] / mean1
Currently because this is not divided, the stage 2 hatvalues is always the maximum and hatvalues(model, type="maximum")
essentially always returns the stage 2 hatvalues.
Thanks for the great package.
On line 91 of the ivreg.R file here I think you have an extra #'
at the beginning of the line.
Seems like vcovHC.ivreg is not registered/exported in ivreg_0.6-1:
> ivreg::vcovHC(m)
Error: 'vcovHC' is not an exported object from 'namespace:ivreg'
> ivreg:::vcovHC.ivreg() # does exist
The vignette briefly describe the 3 tests provided by the summary.
Can we put bibliographic references for such tests? Where they are explained in more mathematical details? Or where have you taken the implementation from?
Currently ivreg()
retains some features from the stage-1 regression. Notably, the "residuals1", "qr1", "rank1", and "coefficients1" components of the return object. It would be great if we could retain the associated stage-1 standard errors too, or perhaps even the whole model object.
It's very common to see coefficient estimates from the first stage of an IV presented in regression tables. Currently, this is not possible to automate and requires quite a bit tinkering or re-running the first-stage manually. However, since we already have the stage-1 coefficients, all we would need is the associated standard errors to automate the table construction.
Alternatively, one could include the whole stage-1 model object, which would allow for even more flexibility. This is the approach that lfe::felm
adopts for its IV method. (Example below.) You could even go a step further and return it as a "coeftest" object to reduce size.
library(ivreg)
library(lfe)
#> Loading required package: Matrix
data("CigaretteDemand", package = "ivreg")
## ivreg
m1 = ivreg(log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome),
data = CigaretteDemand)
## "Stage-1" elements of the return object (missing coefficient standard errors)
grep('1', attr(m1, 'names'), value = TRUE)
#> [1] "residuals1" "qr1" "rank1" "coefficients1"
## lfe::felm equivalent
m2 = felm(log(packs) ~ log(rincome) | 0 | (log(rprice) ~ salestax),
data = CigaretteDemand)
## Retains whole stage-1 model object
summary(m2$stage1)
#>
#> Call:
#> NULL
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.163799 -0.033049 0.001907 0.049322 0.185542
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3.590811 0.225558 15.920 < 2e-16 ***
#> log(rincome) 0.389283 0.085104 4.574 3.74e-05 ***
#> salestax 0.027395 0.004077 6.720 2.65e-08 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.07848 on 45 degrees of freedom
#> Multiple R-squared(full model): 0.6389 Adjusted R-squared: 0.6228
#> Multiple R-squared(proj model): 0.6389 Adjusted R-squared: 0.6228
#> F-statistic(full model):39.81 on 2 and 45 DF, p-value: 1.114e-10
#> F-statistic(proj model): 39.81 on 2 and 45 DF, p-value: 1.114e-10
#> F-statistic(excl instr.):45.16 on 1 and 45 DF, p-value: 2.655e-08
Created on 2020-09-04 by the reprex package (v0.3.0)
Related discussions:
P.S. Thanks so much for the work on this package. Having a self-contained R library for IV makes a ton of sense to me. The new features and pkgdown site are excellent too.
This may be revealing my fundamental misunderstanding of IVs, but if I were to specify a 2SLS using ivreg
without intercepts in either equation, ivreg
seems to believe there are no instruments when computing diagnostics (and therefore fails). Is this the expected behavior? If so, can someone explain to me why? If not, can I help fix it?
Thank you!
Here's a reproducible example using the built in mtcars
dataset.
library(ivreg)
ivo <- ivreg(mpg ~ 0 + hp | 0 + cyl, data = mtcars)
ivo$instruments
#> named integer(0)
summary(ivo)
#> Error in ivdiag(object, vcov. = vcov.): no endogenous/instrument variables
sessionInfo()
#> R version 4.1.1 Patched (2021-09-01 r80849)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ivreg_0.6-0
#>
#> loaded via a namespace (and not attached):
#> [1] zip_2.2.0 Rcpp_1.0.7 cellranger_1.1.0 compiler_4.1.1
#> [5] pillar_1.6.2 highr_0.9 forcats_0.5.1 tools_4.1.1
#> [9] digest_0.6.27 lattice_0.20-44 evaluate_0.14 lifecycle_1.0.0
#> [13] tibble_3.1.4 pkgconfig_2.0.3 rlang_0.4.11 openxlsx_4.2.4
#> [17] reprex_2.0.1 cli_3.0.1 rstudioapi_0.13 curl_4.3.2
#> [21] yaml_2.2.1 haven_2.4.3 xfun_0.25 rio_0.5.27
#> [25] fastmap_1.1.0 withr_2.4.2 stringr_1.4.0 knitr_1.33
#> [29] fs_1.5.0 vctrs_0.3.8 hms_1.1.0 grid_4.1.1
#> [33] lmtest_0.9-38 glue_1.4.2 data.table_1.14.0 fansi_0.5.0
#> [37] readxl_1.3.1 foreign_0.8-81 rmarkdown_2.10 carData_3.0-4
#> [41] Formula_1.2-4 car_3.0-11 magrittr_2.0.1 htmltools_0.5.2
#> [45] ellipsis_0.3.2 MASS_7.3-54 abind_1.4-5 utf8_1.2.2
#> [49] stringi_1.7.4 crayon_1.4.1 zoo_1.8-9
Created on 2021-09-04 by the reprex package (v2.0.1)
I think we should put the website in the README
Doesn't ivreg
implement confidence and prediction intervals?
The following called on an ivreg object return just the prediction, not the interval around it:
predict(mod2, newdata=newdf, interval="confidence", level=0.99)
predict(mod2, newdata=newdf, interval="prediction", level=0.99)
As pointed out in the discussion of #1 the vcov()
method for ivreg
objects always drops the rows/columns pertaining to aliased coeffcients while the lm
method retains them by default. In contrast, the coef()
method always retains the aliased coefficients. It would be good to be more consistent with lm
here. A simple non-sensical example is provided below.
With lm
:
library("ivreg")
data("cars", package = "datasets")
cars$const <- 1
m <- lm(dist ~ speed + const, data = cars)
coef(m)
## (Intercept) speed const
## -17.579095 3.932409 NA
coef(m, complete = FALSE)
## (Intercept) speed
## -17.579095 3.932409
vcov(m)
## (Intercept) speed const
## (Intercept) 45.676514 -2.6588234 NA
## speed -2.658823 0.1726509 NA
## const NA NA NA
vcov(m, complete = FALSE)
## (Intercept) speed
## (Intercept) 45.676514 -2.6588234
## speed -2.658823 0.1726509
Versus ivreg
:
iv <- ivreg(dist ~ speed + const | log(speed) + const, data = cars)
coef(iv)
## (Intercept) speed const
## -15.36537 3.78866 NA
vcov(iv)
## (Intercept) speed
## (Intercept) 48.985379 -2.8729188
## speed -2.872919 0.1865532
Having the options for partial argument matching activated, I receive a couple of these warnings in ivreg_0.6-1, see below example from ?ivreg::ivreg
. These are the options:
$warnPartialMatchArgs
[1] TRUE
$warnPartialMatchAttr
[1] TRUE
$warnPartialMatchDollar
[1] TRUE
Example:
library(ivreg)
data("CigaretteDemand", package = "ivreg")
## model
m <- ivreg(log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome),
data = CigaretteDemand)
summary(m)
#>
#> Call:
#> ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | salestax +
#> log(rincome), data = CigaretteDemand)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.611000 -0.086072 0.009423 0.106912 0.393159
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 9.4307 1.3584 6.943 1.24e-08 ***
#> log(rprice) -1.1434 0.3595 -3.181 0.00266 **
#> log(rincome) 0.2145 0.2686 0.799 0.42867
#>
#> Diagnostic tests:
#> df1 df2 statistic p-value
#> Weak instruments 1 45 45.158 2.65e-08 ***
#> Wu-Hausman 1 44 1.102 0.3
#> Sargan 0 NA NA NA
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.1896 on 45 degrees of freedom
#> Multiple R-Squared: 0.4189, Adjusted R-squared: 0.3931
#> Wald test: 6.534 on 2 and 45 DF, p-value: 0.003227
Warning messages:
1: In model.matrix.default(object$terms$regressors, object$model, contrasts = object$contrasts$regressors) :
partial argument match of 'contrasts' to 'contrasts.arg'
Calls: summary ... ivdiag -> model.matrix -> model.matrix.ivreg -> model.matrix
2: In model.matrix.default(object$terms$instruments, object$model, :
partial argument match of 'contrasts' to 'contrasts.arg'
Calls: summary ... ivdiag -> model.matrix -> model.matrix.ivreg -> model.matrix
3: In model.matrix.default(object$terms$regressors, object$model, contrasts = object$contrasts$regressors) :
partial argument match of 'contrasts' to 'contrasts.arg'
Calls: summary ... ivdiag -> model.matrix -> model.matrix.ivreg -> model.matrix
4: In model.matrix.default(object$terms$instruments, object$model, :
partial argument match of 'contrasts' to 'contrasts.arg'
Calls: summary ... ivdiag -> model.matrix -> model.matrix.ivreg -> model.matrix
I added a pkgdown github action and it produces a couple of error messages -- see the Actions tab
Hi,
I have been using the ivreg function and the diagnostics. But I have found a few different sources to what the Weak instrument diagnostic actually does.
Therefore, I want ask what the weak instrument actually tests.
Hi --- thanks for the package, this is very useful! I have a quick question about some functionality when there are no endogenous regressors included in the model. In particular, in this case (at least for me), the summary
function errors. Here is an example:
library(ivreg)
data("SchoolingReturns", package="ivreg")
iv_res <- ivreg(wage ~ education + experience,
~education + experience,
data=SchoolingReturns)
coef(iv_res)
#> (Intercept) education experience
#> -375.58805 55.05451 25.14156
summary(iv_res)
#> Error in ivdiag(object, vcov. = vcov.): no endogenous/instrument variables
I know this is not a main use case; for me, I am using the ivreg
function internally in another package, and I'd like to be able to cover the case with and without endogenous regressors in the same code.
For what it's worth, the AER
package does work in this case:
data("SchoolingReturns", package="ivreg")
iv_res <- AER::ivreg(wage ~ education + experience,
~education + experience,
data=SchoolingReturns)
summary(iv_res)
#>
#> Call:
#> AER::ivreg(formula = wage ~ education + experience | education +
#> experience, data = SchoolingReturns)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -647.16 -161.20 -29.57 124.79 1612.62
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -375.588 37.802 -9.936 <2e-16 ***
#> education 55.055 2.140 25.722 <2e-16 ***
#> experience 25.142 1.383 18.174 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 238 on 3007 degrees of freedom
#> Multiple R-Squared: 0.1811, Adjusted R-squared: 0.1805
#> Wald test: 332.5 on 2 and 3007 DF, p-value: < 2.2e-16
https://cran.r-project.org/web/packages/ivreg/vignettes/ivreg.html says As listing exogenous variables in both parts on the right-hand side of the formula may become tedious if there are many of them, an additional convenience option is to use a three-part right side like y ~ x1 | x2 | z1 + z2, listing the exogenous, endogenous, and instrumental variables (for the endogenous variables only), respectively.
This does not work for me.
# data
data("SchoolingReturns")
# ivreg basic
library(ivreg)
ivreg(log(packs) ~ log(rprice) + log(rincome) | salestax + log(rincome), data = CigaretteDemand)
ivreg(log(packs) ~ log(rprice) | log(rincome) | salestax, data = CigaretteDemand)
Currently, ivreg()
requires a two-part right-hand side in the formula corresponding to the X and Z matrices in 2SLS, respectively:
y ~ ex + en | ex + in
where ex
corresponds to all exogenous and en
to all endogenous and in
to all instrument variables.
However, this notation can be a bit cumbersome if we have many exogenous variables and just one endogenous variable and one corresponding instrument. This is why the manual page points out that you can do:
y ~ ex + en | . - en + in
This is a bit technical, though, and in a discussion with @grantmcdermott we thought that it might help to support a three-part right-had side:
y ~ ex | en | in
It would be really hard to fully support that in the sense that terms etc. would also need to change. But supporting it in the top-level ivreg()
function just by converting it to the two-part case requires just the following additional code:
if(length(formula)[2] == 3L) formula <- as.Formula(
formula(formula, rhs = c(1, 2), collapse = TRUE),
formula(formula, lhs = 0, rhs = c(1, 3), collapse = TRUE)
)
This would have to be inserted between line 141 and 142 of ivreg.R
. For example, it would turn
formula <- Formula(y ~ x1 | x2 + x3 | z1 + z2)
into
y ~ x1 + (x2 + x3) | x1 + (z1 + z2)
My feeling is that the three-part right-hand side in the intput would be easier for some users. I'm just not sure whether it would create more confusion for some users as well.
Thoughts?
Hi everyone,
First of all thank you for creating this great package, which I have used heavily for a while now.
I thought it would be a useful addition to the package to provide a function that calculates the Cragg-Donald (1993) test statistic. This would be more informative than the usual first stage F-tests when the model has several endogenous variables. I have already coded my own version of the CD statistic for specific applications and could either write a method or a function (potentially to be incorporated into ivdiag()
) that calculates the test statistic.
Following the open source netiquette, I wanted to first ask whether such a contribution would be welcomed and if yes, whether you have any suggestions about how best to add this functionality (e.g. separate method/function etc.).
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.