
Comments (5)

akelleh commented on August 29, 2024

Really good question! I'm not sure I should maintain the distinction between the two modules, but maybe you have some opinions and we can sort that out.

Originally, the estimation module was meant as a place for different causal effect estimators to live. The analysis module would hold tools that expose those estimators in a way that fits a typical data science workflow (e.g. manipulating pandas dataframes).

In practice, I ended up implementing the Robins G-formula estimators in the analysis package, but that was really a weekend's worth of tech debt. I think to regain the original spirit of the division, I should remove them from the analysis module, put them back in the estimation module, and add interfaces to the other estimators through the analysis module.

One alternative might be to have more "data science" methods, like g-formula estimation with machine learning estimators, in the analysis package, and more "research" methods, like propensity score matching (PSM), weighted OLS, etc., in the estimation package.

I think I like the former a little better for a couple of reasons. First, the software abstraction is much nicer. Second, it doesn't artificially draw a line between approaches to causal effect estimation.
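
For concreteness, here's a rough sketch of how that division could look. All of the names below are made up for illustration, not taken from the package's actual layout (the real CausalDataFrame, for instance, builds on pandas):

```python
# A hypothetical sketch of the division under discussion -- module and
# class names here are illustrative, not the package's real layout.

# --- causality/estimation: core effect estimators live here ----------
class GFormulaEstimator:
    """Robins G-formula estimator (one of several core estimators)."""
    def estimate(self, df, treatment, outcome, confounders):
        raise NotImplementedError

class PropensityScoreMatcher:
    """Propensity score matching: same interface, different method."""
    def estimate(self, df, treatment, outcome, confounders):
        raise NotImplementedError

# --- causality/analysis: dataframe-friendly interfaces ---------------
ESTIMATORS = {'g-formula': GFormulaEstimator, 'psm': PropensityScoreMatcher}

class CausalDataFrame:
    """Wraps a pandas DataFrame and delegates to the core estimators."""
    def __init__(self, df):
        self.df = df

    def estimate_effect(self, treatment, outcome, confounders,
                        method='g-formula'):
        estimator = ESTIMATORS[method]()
        return estimator.estimate(self.df, treatment, outcome, confounders)
```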

Any thoughts?

fedorzh commented on August 29, 2024

Thanks for answering. I'm only a beginner in the field, just starting to learn about the various approaches to causality, so my opinion may not be grounded in experience.

But since you ask, I think the first approach makes much more sense: you have a set of core models/methods and a set of interfaces (through functions, or through CausalDataFrame).
If you want to distinguish the methods, I'd rather distinguish them by 1) purpose (e.g. treatment effect estimation, DAG determination, etc.), and 2) possibly putting the "research" (or "experimental") ones in a separate folder (subpackage), but not behind separate interfaces, if you want to have a set of "reliable" methods and a set that is "for advanced users, and might not work in many cases".

akelleh commented on August 29, 2024

Thanks for the feedback! I'll pay down the tech debt and implement the first approach the next time I get a little time to work on the package. And don't worry about being newer to causal inference -- that's exactly the audience the package is for!

For "reliable" vs. "advanced", i figured I could implement the reliable methods as defaults, and advanced as optional through the same methods, but with extra args... the user-defined models in the causaldataframe.zplot method is a good example of the approach I'm proposing. Any opinion there?

fedorzh commented on August 29, 2024

"I could implement the reliable methods as defaults, and advanced as optional through the same methods, but with extra arg"
This is exactly what I do in my packages, however, this only allows for one "reliable" - the default one. Not sure if that's what you want or not.

akelleh commented on August 29, 2024

Good point!

I'm not sure I like the idea of exposing lots of methods, since that can be kind of daunting to the user. I like what pandas.DataFrame.plot does with the different plot types: one method, with a ton of flexibility through a large number of kwargs.

I think there's a good compromise. For example, the zplot method has three levels of difficulty:
(1) By default, it uses a random forest regression.
(2) With a single string kwarg (model_type='kernel'), you can switch to a (slower, but often better) kernel density regression, without needing to know what that means. I'd regard these as "alternative reliable defaults".
(3) Then, with the more advanced kwarg (model=<trained model object with a predict method>), advanced users can drop in a trained model for maximum flexibility.

We could do something similar where we switch between different effect inference methods.
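
To make those three levels concrete, here's roughly how they might look from the caller's side. The model_type and model kwargs come straight from the description above, but the rest of the call signature and the import path are my assumptions, so treat this as a sketch rather than the package's exact API:

```python
import numpy as np
from causality.analysis.dataframe import CausalDataFrame  # import path assumed
from sklearn.ensemble import GradientBoostingRegressor

# Toy data: outcome y depends on the cause x and a confounder z.
n = 1000
z = np.random.normal(size=n)
x = z + np.random.normal(size=n)
y = x + z + np.random.normal(size=n)
df = CausalDataFrame({'x': x, 'y': y, 'z': z})

# (1) Reliable default: a random forest regression under the hood.
df.zplot(x='x', y='y', z=['z'])

# (2) Alternative reliable default, via a single string kwarg.
df.zplot(x='x', y='y', z=['z'], model_type='kernel')

# (3) Maximum flexibility: drop in any pre-trained model with a
#     .predict method (features assumed to match what zplot passes in).
model = GradientBoostingRegressor().fit(df[['x', 'z']], df['y'])
df.zplot(x='x', y='y', z=['z'], model=model)
```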
