bacondecomp's People

Contributors

edjeeongithub, evanjflack, kylebutts

bacondecomp's Issues

Dependent variable is a constant error, when it is not.

First, thank you for the great work with the package and adopting feols. Really useful!

I am trying to use the bacon function but I keep getting the error that the dependent variable is a constant, when it is not:

bacon(n_tot ~ tt, data = panel[panel$random == 1, ],
      id_var = "id_grid",
      time_var = "period",
      quietly = TRUE)

Error in fixest::feols(outcome ~ treated | time + id, data = data1) : 
The dependent variable is a constant. The estimation with fixed-effects cannot be done.

I assume it has to do with the fact that I have lots of zeros, but around 5% are non-zeros:

quantile(panel[panel$random==1,]$n_tot,seq(0.95,1,0.01))
      95%       96%       97%       98%       99%      100% 
0.0000000 0.9677419 0.9677419 1.0000000 1.0714286 8.7096774 

It seems to be a bacon function error since it runs smoothly when I use feols directly:

feols(as.formula("n_tot ~ tt | id_grid + period"), data=panel[panel$random==1,])
OLS estimation, Dep. Var.: n_tot
Observations: 457,125 
Fixed-effects: id_grid: 6,625,  period: 69
Standard-errors: Clustered (id_grid) 
    Estimate Std. Error  t value Pr(>|t|) 
tt -0.003755   0.002513 -1.49411  0.13519 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.250655     Adj. R2: 0.20454 
                 Within R2: 1.141e-5

  1. Is there a way to feed the feols result directly into the bacon function?
  2. Why does that error keep coming up? How to handle it?

Thank you!

Calculate Sigma

Sigma = share attributed to within variation.

Need to write tests.

Remove .Rproj

Pretty sure we don't want this tracked on git but untracking stuff always gives me a headache...

Speeding up linear regression

Would you be willing to accept a pull request that replaces lm() with the faster fixest::feols()? With many units, it will be orders of magnitude faster, as factor() is very memory-intensive and slow.

bacondecomp/R/bacon.R

Lines 96 to 97 in 403cbaf

estimate <- lm(outcome ~ treated + factor(time) + factor(id),
data = data1)$coefficients[2]
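
For a balanced panel, the reason the dummies are avoidable can be sketched in base R: the two-way within transformation gives the same coefficient as the lm() call above without building any factor() columns. This is only an illustration on simulated data (the panel, its sizes, and the helper demean2() are made up here). It is not how fixest::feols() is implemented, and this simple demeaning formula is not exact for unbalanced panels.

```r
set.seed(1)

# Simulated balanced panel: 50 units observed over 10 periods (hypothetical data)
panel <- expand.grid(id = 1:50, time = 1:10)
panel$treated <- as.numeric(panel$id > 25 & panel$time > 5)
panel$outcome <- 0.5 * panel$treated + panel$id / 10 + panel$time / 5 +
  rnorm(nrow(panel))

# Dummy-variable approach, as in the lines quoted above
b_lm <- unname(coef(lm(outcome ~ treated + factor(time) + factor(id),
                       data = panel))["treated"])

# Two-way within transformation (exact only when the panel is balanced)
demean2 <- function(x, id, time) x - ave(x, id) - ave(x, time) + mean(x)
y_w <- demean2(panel$outcome, panel$id, panel$time)
d_w <- demean2(panel$treated, panel$id, panel$time)

# Slope of the demeaned outcome on the demeaned treatment
b_within <- sum(d_w * y_w) / sum(d_w^2)

all.equal(b_lm, b_within)
```

The lm() version has to estimate one coefficient per unit and per period, which is where the memory cost comes from; the demeaned version only ever works with two vectors.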

"Unbalanced panel" in panel data

I just downloaded the package from GitHub today and I got the "unbalanced panel" error. I am using panel data on bilateral trade values for multiple countries. As a result, I have duplicate "country-year" combinations, because I observe a country's trade with all of its partners in a given year, over several years. Could that explain the error? Do you have a suggestion on how I should proceed?

Here is the code I run:

dots_bacon <- bacon(export_fob ~ policy_switch | interaction(exporter, importer) +
                      interaction(exporter, year) + interaction(importer, year),
                    data = my_df, id_var = "exporter", time_var = "year")

Just for context, my variable of interest (policy_switch) is a dummy identifying the periods subsequent to a policy change. The rest are pairs of fixed effects.
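
A quick way to see whether duplicate country-year rows are present is to count rows per id-time cell. The sketch below uses a made-up bilateral data set (the trade data frame and the pair column are hypothetical, not from the original post). It is only a diagnostic: it does not show what bacon() checks internally, and it does not guarantee that bacon() accepts the data once the cells are unique.

```r
# Hypothetical bilateral trade data: one row per exporter-importer-year
trade <- expand.grid(exporter = c("A", "B", "C"),
                     importer = c("A", "B", "C"),
                     year = 2000:2002,
                     stringsAsFactors = FALSE)
trade <- trade[trade$exporter != trade$importer, ]

# Rows per exporter-year cell: more than one row means that
# exporter + year does not identify an observation
cells_exporter <- table(trade$exporter, trade$year)
has_duplicates <- any(cells_exporter > 1)

# With the country pair as the unit, every pair-year cell holds exactly one row
trade$pair <- paste(trade$exporter, trade$importer, sep = "_")
cells_pair <- table(trade$pair, trade$year)
pair_is_balanced <- all(cells_pair == 1)

has_duplicates
pair_is_balanced
```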

Weighted regression

Hello,

Do you know if it is possible to use a weighted regression with the Bacon function in R?

Thank you,

Alba

Divorce Data Replication

I have a branch that replicates table 1 column 1 of divorce coefs in this paper.

The only problem is that I'm struggling to replicate the average effect (-9.7%), which we want to use because it is mentioned at the top of page 21 in the DD paper. The DD paper then replicates this in levels, not logs, and finally does the decomp and adds controls, so figuring out this estimate is pretty important.

The do files run lincom (2*_Ichyrspos_3+2*_Ichyrspos_4+2*_Ichyrspos_5+2*_Ichyrspos_6+2*_Ichyrspos_7+2*_Ichyrspos_8+2*_Ichyrspos_9+2*_Ichyrspos_10+2*_Ichyrspos_11+2*_Ichyrspos_12)/20 which I think is meant to generate the average effect but I don't see any chyrpos_3 in the divorce data - this year isn't present it seems?

Maybe I'm missing something.

I've attached the .do file since I suck at stata.

Unpack formula terms properly

I don't know how other packages do this (maybe terms(formula)) but we need to be more careful than we currently are.

Right now we can't deal with:

  • y ~ treatment + . - siteid
  • y ~ treatment + x:z

I'll look into what others do.
@evanjflack
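
For reference, base R's terms() already handles both of the cases listed above when it is given the data. A minimal sketch on a made-up data frame (df and its columns are hypothetical), not a proposal for how the package should do it:

```r
# Hypothetical data with an outcome, a treatment, two covariates, and a site id
df <- data.frame(y = c(1.2, 0.7, 3.1, 2.4, 2.9),
                 treatment = c(0, 0, 1, 1, 1),
                 x = 1:5,
                 z = 5:1,
                 siteid = 1:5)

# The dot is expanded against the columns of `data`, then siteid is removed
terms_dot <- terms(y ~ treatment + . - siteid, data = df)
labels_dot <- attr(terms_dot, "term.labels")

# Interaction terms are kept as a single label
terms_int <- terms(y ~ treatment + x:z)
labels_int <- attr(terms_int, "term.labels")

# all.vars() lists the underlying variable names, including the response
vars_int <- all.vars(y ~ treatment + x:z)

labels_dot
labels_int
vars_int
```

The term labels are what a model formula would be rebuilt from, while all.vars() is the safer choice for deciding which columns to keep from the data.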

bacondecomp / unbalanced panel

Hi, I got an error message when using the bacon function ("Error in bacon(...: Unbalanced Panel"). Researching this, I found thread #71 and thought the issue was resolved. Is there anything I've missed?

Time-varying controls

At some point we need to figure out how to include time-varying covariates, which the 2019 version of the paper does.

The only variable 'treated' is collinear with the fixed effects.

First of all, thank you for this package!

I was running this on my own data after TWFE with an attempt to see the breakdown of the treatment effects in different comparison groups but received this error:

Error: in fixest::feols(outcome ~ treated | time + id, data...:
 The only variable 'treated' is collinear with the fixed effects. In such
circumstances, the estimation is void.
Traceback:

1. bacon(formula = as.formula(bacon.fml), data = bacon.dat, id_var = "thread_id", 
 .     time_var = time.index)
2. fixest::feols(outcome ~ treated | time + id, data = data1)
3. stop_up(msg, up = fromGLM)
4. stop("in ", my_call, ":\n ", fit_screen(message), call. = FALSE)

I searched the fixest package but did not find such an issue. I also computed the correlations among the variables as well as the fixed covariates but did not find strong correlations.

(The value is -0.26 for the correlation between treated and the time fixed effect and 0.052 for that between treated and the subject fixed effect). May I ask for your help in understanding this better? Thank you.
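
One thing that may help here: pairwise correlations do not detect this kind of collinearity. What matters is whether 'treated' has any variation left once the unit and time fixed effects are removed. The sketch below uses a made-up sub-sample in which every unit switches on in the same period (an assumption for illustration only; it is not known whether this is what happens in the data above), so the time effects absorb 'treated' completely even though its correlation with time is well below 1.

```r
# Hypothetical sub-sample: 4 units, 6 periods, all treated from period 4 onward
sub <- expand.grid(id = 1:4, time = 1:6)
sub$treated <- as.numeric(sub$time >= 4)

# Residual variation in 'treated' after removing unit and time fixed effects
resid_treated <- resid(lm(treated ~ factor(id) + factor(time), data = sub))
absorbed <- max(abs(resid_treated)) < 1e-8

# The simple correlation with the time index does not reveal the problem
cor_time <- cor(sub$treated, sub$time)

absorbed
cor_time
```

Running the same residual check on each sub-sample of timing groups should show which comparison, if any, leaves no variation in 'treated'.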

Divorce data

We should include the same replication of Stevenson and Wolfers (2006).

Does not work when time variable is date? Error in as.Date.numeric(value): 'origin' must be supplied

Hi

Thanks for the package!! I noticed that if the year variable is a date and I have a never-treated group, bacon() returns an error. It seems to be due to this line:

data[is.na(data$treat_time), "treat_time"] <- 99999

R will complain that this is not a valid date. Maybe bacon() could use a 9999-like date if year is a date, or at least warn about using dates?

Thanks!

library(bacondecomp)
dat <- bacondecomp::castle[, c("l_homicide", "state", "year", "post")]

## works
df_bacon <- bacon(l_homicide ~ post,
                  data = dat,
                  id_var = "state",
                  time_var = "year")
#>                       type  weight  avg_est
#> 1 Earlier vs Later Treated 0.05976 -0.00554
#> 2 Later vs Earlier Treated 0.03190  0.07032
#> 3     Treated vs Untreated 0.90834  0.08796

## convert to date: does not work
dat2 <- dat
dat2$year <- as.Date(paste0("2000-01-", dat$year-1999))
unique(dat2$year)
#>  [1] "2000-01-01" "2000-01-02" "2000-01-03" "2000-01-04" "2000-01-05"
#>  [6] "2000-01-06" "2000-01-07" "2000-01-08" "2000-01-09" "2000-01-10"
#> [11] "2000-01-11"
bacon(l_homicide ~ post,
      data = dat2,
      id_var = "state",
      time_var = "year")
#> Error in as.Date.numeric(value): 'origin' must be supplied

Created on 2020-06-11 by the reprex package (v0.3.0)
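
Until dates are handled, one possible workaround is to map the Date column to an integer period index before calling bacon(). The sketch below shows only the conversion step on a made-up data frame (toy and the period column are hypothetical); it does not call bacon(), so whether the decomposition then runs is untested here.

```r
# Hypothetical panel with a Date time variable
toy <- data.frame(
  state = rep(c("a", "b"), each = 3),
  year = rep(as.Date(c("2000-01-01", "2000-01-02", "2000-01-03")), times = 2)
)

# Replace each date by its rank among the distinct dates (1, 2, 3, ...)
toy$period <- match(toy$year, sort(unique(toy$year)))

class(toy$period)
toy$period
```

The new column can then be passed as time_var instead of the Date column. For true yearly data, as.integer(format(toy$year, "%Y")) would keep the calendar year rather than a rank.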

Flexibility in formula

Right now the formula can handle pretty much anything, but we need to let it take ., as in y ~ treated + .

Is it possible to use interacted FE?

Hi, great package! I'm just trying to learn the Goodman-Bacon decomposition, so sorry if this is not a real "issue". Is it possible to do the decomposition with interacted FE (like state-by-year, i.e., a model where we have municipality FE + state-by-year FE)? My question relates to the package, but also: would this be possible theoretically? (Control groups would have the additional requirement of coming from inside the same state; I don't know if this would affect how the weights are calculated.)

(I've looked just a tiny bit into the functions and it does not seem to be the case, but maybe there's a valid reason for it so wanted to check. Otherwise I'd be interested to work on it)

"Unbalanced Panel" when groups are different sizes

Hi Guys,

First - thanks a bunch for translating this package to R, I really appreciate it. I just wanted to flag a small issue I've found when using the bacon() function.

It seems that bacon() does not currently allow our groups to be different sizes. I've appended the code to generate a minimal example. In the dataset I create, we have 3 groups (id == 1, 2, 3), where id == 1 | 3 contain one individual, and id == 2 contains two individuals (ind_id is the individual id).

If I run bacon(id_var = "group_id", ...), the function will throw an error for an "Unbalanced Panel", because group 2 has twice as many observations per time period as group 1 (because there are two individuals in group 2).

But, I don't think you want to call that an error; otherwise, you cannot demonstrate 2x2 weighting heterogeneity arising from the size of the groups. And, from what I understand, this is one of the key takeaways of the Bacon decomposition: the larger groups retain higher weights in the 2x2.

Alternatively, if you do want to call that an unbalanced panel, I don't think you need the code calculating "n_k, n_u, n_ku", because n_k = n_u by definition and n_ku = 0.5.

Thanks again,
Alec

library(dplyr)

df <- 
  expand.grid(
    group_id = c(1, 2, 3), # Group ID (treatment level ID)
    t  = c(0, 1, 2)  # Time
  ) %>%
  mutate(
    # Treatment status
    a = case_when(
      group_id == 2 & t > 0 ~ 1, # 1 time period untreated 2 periods treated
      group_id == 3 & t > 1 ~ 1, # 2 untreated 1 treated
      TRUE ~ 0 # id == 1 never treated
    )
  )

# Expand dataset with "individual" level observations 
df <- df %>% left_join(
  expand.grid(
    group_id = c(1, 2, 3), 
    ind_id = seq(1, 2)
    ) %>%
    filter(group_id == 2 | ind_id < 2) ## Leave only group id == 2 with two individuals
  ) %>%
  select(group_id, ind_id, everything()) %>%
  arrange(group_id, ind_id, t)
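
To make the group-size point concrete, here is a small sketch of the sample shares for the example data above. It assumes that n_k is group k's share of units and that n_ku = n_k / (n_k + n_u), which is my reading of the paper. The names n_individuals, n_share, and n_ku() are illustrative and are not the package's internals.

```r
# Individuals per group in the example: groups 1 and 3 have one, group 2 has two
n_individuals <- c(`1` = 1, `2` = 2, `3` = 1)

# Share of all units that belongs to each group
n_share <- n_individuals / sum(n_individuals)

# Relative size of group k within the pair (k, u)
n_ku <- function(k, u) unname(n_share[k] / (n_share[k] + n_share[u]))

n_21 <- n_ku("2", "1") # group 2 vs the never-treated group 1
n_31 <- n_ku("3", "1") # group 3 vs the never-treated group 1

n_share
n_21
n_31
```

With equal group sizes every n_ku would be 0.5. Here the pair involving the larger group 2 departs from that, which is the heterogeneity the example is meant to show.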

Change data doesn't work

When I switch to another dataset (fake_data), replace the variables, and run the code, I always get "Error: 'fake_data' is not an exported object from 'namespace:bacondecomp'". I don't know how to deal with it.

The weighted bacon estimation is not equal to the TWFE when the panel data is unbalanced.

Hello,

Thanks for writing such a great package. I am using it to do some diagnostic analysis. However, I find that when the panel data is unbalanced, the weighted sum of the bacon 2x2 estimates is not numerically equal to the TWFE estimate. For example, I use

library(bacondecomp)
library(fixest)

df1 <- bacondecomp::math_reform
# Keep a random subset of 500 rows, which makes the panel unbalanced
sample.use <- sample(seq_len(nrow(df1)), 500, replace = FALSE)
df2 <- df1[sample.use, ]
df2_bacon <- bacon(incearn_ln ~ reform_math,
                   data = df2,
                   id_var = "state",
                   time_var = "class")
sum(df2_bacon$estimate * df2_bacon$weight)
feols(incearn_ln ~ reform_math | state + class, data = df2)

The results given by bacon and twfe are different.
