evanjflack / bacondecomp
Bacon-Goodman decomposition for differences-in-differences with variation in treatment timing.
License: Other
First, thank you for the great work with the package and adopting feols. Really useful!
I am trying to use the bacon function but I keep getting the error that the dependent variable is a constant, when it is not:
bacon(n_tot ~ tt, data = panel[panel$random==1,],
id_var = "id_grid",
time_var = "period",
quietly = TRUE)
Error in fixest::feols(outcome ~ treated | time + id, data = data1) :
The dependent variable is a constant. The estimation with fixed-effects cannot be done.
I assume it has to do with the fact that I have lots of zeros, but around 5% are non-zeros:
quantile(panel[panel$random==1,]$n_tot,seq(0.95,1,0.01))
95% 96% 97% 98% 99% 100%
0.0000000 0.9677419 0.9677419 1.0000000 1.0714286 8.7096774
It seems to be a bacon function error since it runs smoothly when I use feols directly:
feols(as.formula("n_tot ~ tt | id_grid + period"), data=panel[panel$random==1,])
OLS estimation, Dep. Var.: n_tot
Observations: 457,125
Fixed-effects: id_grid: 6,625, period: 69
Standard-errors: Clustered (id_grid)
Estimate Std. Error t value Pr(>|t|)
tt -0.003755 0.002513 -1.49411 0.13519
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.250655 Adj. R2: 0.20454
Within R2: 1.141e-5
Thank you!
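One possible reading of the error above: the message shows feols being called on an internal subset (data1), not on the full panel, so the outcome could be constant within a subset even though it varies overall. The sketch below checks this on a toy panel by pooling each pair of treatment-timing groups. The toy data and the names treat_time, pairs and constant are made up for illustration, and the package's actual subsetting (which may also restrict the time window) is not reproduced here.

```r
# Toy panel (hypothetical data): 4 units, 3 periods, mostly-zero outcome.
panel <- data.frame(
  id     = rep(1:4, each = 3),
  period = rep(1:3, times = 4),
  tt     = c(0, 1, 1,  0, 0, 1,  0, 0, 0,  0, 0, 0),
  n_tot  = c(0, 0, 0,  0, 0, 0,  0, 1, 0,  0, 0, 2)
)

# First treated period per unit; Inf marks never-treated units.
panel$treat_time <- ave(ifelse(panel$tt == 1, panel$period, Inf),
                        panel$id, FUN = min)

# Every pair of timing groups (an assumption about how subsets are formed).
groups <- sort(unique(panel$treat_time))
pairs  <- t(combn(groups, 2))

# Flag pairs whose pooled outcome takes a single value.
constant <- apply(pairs, 1, function(g) {
  y <- panel$n_tot[panel$treat_time %in% g]
  length(unique(y)) == 1
})

print(data.frame(group_a = pairs[, 1], group_b = pairs[, 2],
                 constant_outcome = constant))
```

In this toy panel the two treated groups have an all-zero outcome, so a fixed-effects regression on that pair alone would have nothing to estimate, even though n_tot varies in the full data.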
Sigma = share attributed to within variation.
Need to write tests.
Pretty sure we don't want this tracked on git but untracking stuff always gives me a headache...
Would you be willing to accept a pull request that replaces lm() with the faster fixest::feols()? With many units, it will be orders of magnitude faster, as factor() is very memory-intensive and slow.
Lines 96 to 97 in 403cbaf
I just downloaded the package from GitHub today and I got the "unbalanced panel" error. I am using panel data that involves bilateral trade values of multiple countries. As such, I have duplicate "country-year" combinations because I observe a country's trade with all its partners in a given year, over several years. Could that explain the error? Do you have a suggestion on how I should proceed?
Here is the code I run:
dots_bacon <- bacon(export_fob ~ policy_switch | interaction(exporter, importer) + interaction(exporter, year) +
interaction(importer, year),
data = my_df, id_var = "exporter", time_var = "year")
Just for context, my variable of interest (policy_switch) is a dummy identifying the periods subsequent to a policy change. The rest are pairs of fixed effects.
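A quick way to see whether repeated exporter-year rows are present is to look for duplicated id-time combinations. The sketch below uses a made-up stand-in for my_df. Building a pair_id from exporter and importer is only an illustration of a unit definition that is unique per year; it is not a statement about what bacon() supports or about the right unit for this analysis.

```r
# Toy bilateral trade data (hypothetical stand-in for my_df).
my_df <- data.frame(
  exporter = c("A", "A", "A", "A", "B", "B"),
  importer = c("B", "C", "B", "C", "A", "A"),
  year     = c(2000, 2000, 2001, 2001, 2000, 2001)
)

# With exporter as the unit, each exporter-year appears once per partner.
dup_exporter_year <- any(duplicated(my_df[, c("exporter", "year")]))

# With the exporter-importer pair as the unit, each pair-year is unique.
my_df$pair_id <- paste(my_df$exporter, my_df$importer, sep = "_")
dup_pair_year <- any(duplicated(my_df[, c("pair_id", "year")]))

print(c(exporter_year = dup_exporter_year, pair_year = dup_pair_year))
```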
Hello,
Do you know if it is possible to use a weighted regression with the Bacon function in R?
Thank you,
Alba
I have a branch that replicates table 1 column 1 of divorce coefs in this paper.
The only problem is that I'm struggling to replicate the average effect (-9.7%), which we want to use because it is mentioned at the top of page 21 in the DD paper. The DD paper then replicates this in levels, not logs, and finally does the decomp and adds controls, so figuring out this estimate is pretty important.
The do files run lincom (2*_Ichyrspos_3+2*_Ichyrspos_4+2*_Ichyrspos_5+2*_Ichyrspos_6+2*_Ichyrspos_7+2*_Ichyrspos_8+2*_Ichyrspos_9+2*_Ichyrspos_10+2*_Ichyrspos_11+2*_Ichyrspos_12)/20, which I think is meant to generate the average effect, but I don't see any chyrspos_3 in the divorce data. This year isn't present, it seems?
Maybe I'm missing something.
I've attached the .do file since I suck at stata.
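For reference, the lincom expression above puts a weight of 2/20 on each of ten coefficients, which is the same as their simple average. A minimal R sketch of that arithmetic follows. The coefficient values and the diagonal covariance matrix are made-up placeholders, not estimates from the divorce data, and the names b, V and w are illustrative.

```r
# Hypothetical coefficients standing in for _Ichyrspos_3 ... _Ichyrspos_12.
b <- c(-0.05, -0.08, -0.10, -0.12, -0.11, -0.09, -0.10, -0.13, -0.10, -0.09)
names(b) <- paste0("chyrspos_", 3:12)

# Hypothetical covariance matrix of the estimates (independent, se = 0.01).
V <- diag(0.01^2, length(b))

# lincom weights: 2 on each term, divided by 20.
w <- rep(2 / 20, length(b))

est <- sum(w * b)                     # linear combination of coefficients
se  <- sqrt(drop(t(w) %*% V %*% w))   # standard error of the combination

print(c(estimate = est, mean_of_coefs = mean(b), std_error = se))
```

With a fitted model, b and V would come from coef() and vcov() restricted to the relevant terms.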
I don't know how other packages do this (maybe terms(formula)), but we need to be more careful than we currently are.
Right now we can't deal with:
y ~ treatment + . - siteid
y ~ treatment + x:z
I'll look into what others do.
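As a starting point, base R's terms() can expand both formulas when it is given the data. This is a sketch of that one approach, not a description of how the package parses formulas. The data frame df and its columns are hypothetical.

```r
# Hypothetical data with the columns used in the two formulas above.
df <- data.frame(y = rnorm(4), treatment = c(0, 1, 0, 1),
                 siteid = 1:4, x = rnorm(4), z = rnorm(4))

# "." expands to every column except the response; "- siteid" drops one.
tt_dot <- terms(y ~ treatment + . - siteid, data = df)
labels_dot <- attr(tt_dot, "term.labels")

# Interaction terms keep their "x:z" label.
tt_int <- terms(y ~ treatment + x:z, data = df)
labels_int <- attr(tt_int, "term.labels")

# all.vars() lists the underlying variables, useful for subsetting columns.
vars_int <- all.vars(y ~ treatment + x:z)

print(labels_dot)
print(labels_int)
print(vars_int)
```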
@evanjflack
It'd be nice if we could rewrite weight calculation code so that we can directly verify against the results on page 10 of the paper.
Hi, I faced an error message when using the bacon function ("Error in bacon(...: Unbalanced Panel"). Researching this, I found this thread #71 and thought the issue was resolved. Is there anything I've missed?
At some point we need to figure out how to include time varying covariates, which the 2019 version of the paper does.
First of all, thank you for this package!
I was running this on my own data after TWFE with an attempt to see the breakdown of the treatment effects in different comparison groups but received this error:
Error: in fixest::feols(outcome ~ treated | time + id, data...:
The only variable 'treated' is collinear with the fixed effects. In such
circumstances, the estimation is void.
Traceback:
1. bacon(formula = as.formula(bacon.fml), data = bacon.dat, id_var = "thread_id",
. time_var = time.index)
2. fixest::feols(outcome ~ treated | time + id, data = data1)
3. stop_up(msg, up = fromGLM)
4. stop("in ", my_call, ":\n ", fit_screen(message), call. = FALSE)
I searched in the package fixest but do not see such an issue. I also computed the correlations among the variables as well as the fixed covariates but did not find strong correlations:
(The value is -0.26 for the correlation between treated and the time fixed effect and 0.052 for that between treated and the subject fixed effect). May I ask for your help in understanding this better? Thank you.
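Pairwise correlations cannot reveal this kind of collinearity, because the fixed effects are whole sets of dummies, and the traceback shows feols running on an internal subset (data1) instead of the full data. The sketch below checks whether a treatment dummy is fully explained by unit and time dummies in a given subset. The helper is_absorbed and the toy subsets are hypothetical, and which subsets bacon() actually builds is not reproduced here.

```r
# TRUE when `treat` is perfectly predicted by unit and time dummies.
is_absorbed <- function(data, treat, id, time, tol = 1e-8) {
  rhs <- c(sprintf("factor(%s)", id), sprintf("factor(%s)", time))
  fit <- lm(reformulate(rhs, response = treat), data = data)
  max(abs(residuals(fit))) < tol
}

# Subset A: unit 1 is treated in every period, unit 2 never.
sub_a <- data.frame(id = rep(1:2, each = 3), time = rep(1:3, times = 2),
                    treated = c(1, 1, 1, 0, 0, 0))

# Subset B: unit 1 switches on in period 2, unit 2 never.
sub_b <- data.frame(id = rep(1:2, each = 3), time = rep(1:3, times = 2),
                    treated = c(0, 1, 1, 0, 0, 0))

absorbed_a <- is_absorbed(sub_a, "treated", "id", "time")
absorbed_b <- is_absorbed(sub_b, "treated", "id", "time")
print(c(subset_a = absorbed_a, subset_b = absorbed_b))
```

In subset A the treatment dummy is identical to the unit 1 indicator, so the unit fixed effects absorb it entirely, even though its correlation with time is zero.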
We should include the same replication of Stevenson and Wolfers (2006).
Add github action for https://github.com/r-lib/covr
Hi
Thanks for the package!! I notice that if the year variable is a date, and I have a never treated group, this will return an error? It seems to be due to this line:
data[is.na(data$treat_time), "treat_time"] <- 99999
As R will complain about a non-valid date?! Maybe it could return a 9999-like date if the year is a date, or at least warn about using dates?
Thanks!
library(bacondecomp)
dat <- bacondecomp::castle[, c("l_homicide", "state", "year", "post")]
## works
df_bacon <- bacon(l_homicide ~ post,
data = dat,
id_var = "state",
time_var = "year")
#> type weight avg_est
#> 1 Earlier vs Later Treated 0.05976 -0.00554
#> 2 Later vs Earlier Treated 0.03190 0.07032
#> 3 Treated vs Untreated 0.90834 0.08796
## convert to date: does not work
dat2 <- dat
dat2$year <- as.Date(paste0("2000-01-", dat$year-1999))
unique(dat2$year)
#> [1] "2000-01-01" "2000-01-02" "2000-01-03" "2000-01-04" "2000-01-05"
#> [6] "2000-01-06" "2000-01-07" "2000-01-08" "2000-01-09" "2000-01-10"
#> [11] "2000-01-11"
bacon(l_homicide ~ post,
data = dat2,
id_var = "state",
time_var = "year")
#> Error in as.Date.numeric(value): 'origin' must be supplied
Created on 2020-06-11 by the reprex package (v0.3.0)
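Until dates are handled inside the package, one user-side workaround is to replace the Date column with an integer period index before calling bacon(), since the reprex shows the integer version working. The sketch below uses a made-up panel. The column year_idx is a hypothetical name, and the index treats consecutive dates as equally spaced, which may not suit irregular calendars.

```r
# Toy panel with a Date-class time variable (hypothetical data).
dat2 <- data.frame(
  state = rep(c("A", "B"), each = 3),
  year  = rep(as.Date(c("2000-01-01", "2000-01-02", "2000-01-03")), times = 2)
)

# factor() orders the distinct dates; as.integer() gives their rank 1, 2, 3.
dat2$year_idx <- as.integer(factor(dat2$year))

print(dat2)
```

The integer column can then be passed as time_var = "year_idx".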
Right now the formula can handle pretty much anything, but we need to let it take ., as in y ~ treated + .
Have balanced panel as variable dat.
df_bacon <- bacon(outcome ~ order,
data = dat,
id_var = "state_abbrev",
time_var = "date_weekly")
Get:
Error in value[[jvseq[[jjj]]]] : subscript out of bounds
Hi @evanjflack @EdJeeOnGitHub -- how does your testing workflow work?
My understanding was that .travis.yml would automatically run your testthat tests upon pushes to master.
But this understanding seems incomplete. The former file seems to quote this script, and other files seem to perform other checks. Could you help me understand what's going on here?
"But I think the pro method would be to have it return an s3 object" - @EdJeeOnGitHub
Also write tests.
Need to allow for units to already be treated before the panel starts.
Hi, great package! I'm just trying to learn Goodman-Bacon decomposition, so sorry if this is not a real "issue". Is it possible to do the decomposition with an interactive FE (like state by year, i.e a model where we have municipality FE + state-by-year FE)? I guess my question both relates to the package but also, would this be possible theoretically? (control groups have the additional requirement of coming from inside the same state, I don't know if this would affect how the weights are calculated)
(I've looked just a tiny bit into the functions and it does not seem to be the case, but maybe there's a valid reason for it so wanted to check. Otherwise I'd be interested to work on it)
Hi Guys,
First - thanks a bunch for translating this package to R, I really appreciate it. I just wanted to flag a small issue I've found when using the bacon() function.
It seems that bacon() does not currently allow our groups to be different sizes. I've appended the code to generate a minimal example. In the dataset I create, we have 3 groups (id == 1, 2, 3), where id == 1 | 3 contain one individual, and id == 2 contains two individuals (ind_id is the individual id).
If I run bacon(id_var = "group_id", ...) the function will throw an error for an "Unbalanced Panel", because group 2 has twice as many time periods within it as group 1 (because there are two individuals in group 2).
But, I don't think you want to call that an error; otherwise, you cannot demonstrate 2x2 weighting heterogeneity arising from the size of the groups. And, from what I understand, this is one of the key takeaways of the Bacon decomposition: the larger groups retain higher weights in the 2x2.
Alternatively, if you do want to call that an unbalanced panel, I don't think you need the code calculating "n_k, n_u, n_ku", because n_k = n_u by definition and n_ku = 0.5.
Thanks again,
Alec
library(dplyr)
df <-
expand.grid(
group_id = c(1, 2, 3), # Group ID (treatment level ID)
t = c(0, 1, 2) # Time
) %>%
mutate(
# Treatment status
a = case_when(
group_id == 2 & t > 0 ~ 1, # 1 time period untreated 2 periods treated
group_id == 3 & t > 1 ~ 1, # 2 untreated 1 treated
TRUE ~ 0 # id == 1 never treated
)
)
# Expand dataset with "individual" level observations
df <- df %>% left_join(
expand.grid(
group_id = c(1, 2, 3),
ind_id = seq(1, 2)
) %>%
filter(group_id == 2 | ind_id < 2), ## Leave only group id == 2 with two individuals
by = "group_id" # join key stated explicitly
) %>%
select(group_id, ind_id, everything()) %>%
arrange(group_id, ind_id, t)
Add tests for summary since we got that wrong originally... oops
Lots of things up top need to be put into separate functions.
When I switch to another dataset (fake_data), replace the variables, and run the code, I always get "Error: 'fake_data' is not an exported object from 'namespace:bacondecomp'". I don't know how to deal with it.
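This message is what R produces when an object is requested with the pkg::name syntax and the package does not export it. The post does not show the code, so the sketch below assumes the call used something like bacondecomp::fake_data. It uses the stats namespace as a stand-in so that it runs without the package installed, and the toy fake_data is hypothetical.

```r
# A user-created data frame lives in the workspace, not in any package.
fake_data <- data.frame(id = rep(1:2, each = 2),
                        time = rep(1:2, times = 2),
                        y = c(0.1, 0.4, 0.2, 0.3))

# Asking a package namespace for it fails with the same kind of message.
msg <- tryCatch(stats::fake_data,
                error = function(e) conditionMessage(e))
print(msg)

# Referring to the object by its plain name works.
found <- exists("fake_data")
print(found)
```

Under that assumption, the data would be passed to bacon() as data = fake_data, without the bacondecomp:: prefix.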
Hello,
Thanks for writing such a great package. I am using this package to do some diagnostic analysis. However, I find that when the panel data is unbalanced, the weighted bacon 2*2 estimation is not numerically equal to the TWFE estimation. For example, I use
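library(bacondecomp)
library(fixest)
set.seed(1) # arbitrary seed so the subsample is reproducible
df1 <- bacondecomp::math_reform
sample.use <- sample(seq_len(nrow(df1)), 500, replace = FALSE)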
df2 <- df1[sample.use,]
df2_bacon <- bacon(incearn_ln ~ reform_math,
data = df2,
id_var = "state",
time_var = "class")
sum(df2_bacon$estimate*df2_bacon$weight)
feols(incearn_ln ~ reform_math|state+class,data=df2)
The results given by bacon and twfe are different.
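Drawing 500 rows at random will usually leave some state-class cells empty, so it can help to confirm whether a subsample is still balanced before comparing the two numbers. The helper below is a sketch using one common definition of balance (every unit observed exactly once in every period). is_balanced is a hypothetical name and not necessarily the check the package performs. The toy data are made up.

```r
# TRUE when every id appears exactly once in every time period.
is_balanced <- function(data, id_var, time_var) {
  counts <- table(data[[id_var]], data[[time_var]])
  all(counts == 1)
}

# Toy balanced panel: 3 units x 4 periods (hypothetical data).
full <- expand.grid(state = c("a", "b", "c"), class = 1:4,
                    stringsAsFactors = FALSE)

# Dropping a single row makes it unbalanced.
partial <- full[-5, ]

balanced_full    <- is_balanced(full, "state", "class")
balanced_partial <- is_balanced(partial, "state", "class")
print(c(full = balanced_full, partial = balanced_partial))
```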