
Comments (5)

akelleh commented on August 29, 2024

Really good question! I'm not sure I should maintain the distinction between the two modules, but maybe you have some opinions and we can sort that out.

Originally, the estimation module was meant as a place for different causal effect estimators to live. The analysis module would hold tools that expose those estimators in a way that fits a typical data science workflow (e.g. manipulating pandas dataframes).

In practice, I ended up implementing the Robins G-formula estimators in the analysis package, but that was really a weekend's worth of tech debt. I think to regain the original spirit of the division, I should remove them from the analysis module, put them back in the estimation module, and add interfaces to the other estimators through the analysis module.

One alternative might be to have more "data science" methods, like g-formula estimation with machine learning estimators, in the analysis package, and more "research" methods, like propensity score matching (PSM), weighted OLS, etc., in the estimation package.

I think I like the former a little better for a couple of reasons. First, the software abstraction is much nicer. Second, it doesn't artificially draw a line between approaches to causal effect estimation.
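
For concreteness, here's a rough sketch of how that division could look. All of the names below are made up for illustration, not taken from the package's actual layout (the real CausalDataFrame, for instance, builds on pandas):

```python
# A hypothetical sketch of the division under discussion -- module and
# class names here are illustrative, not the package's real layout.

# --- causality/estimation: core effect estimators live here ----------
class GFormulaEstimator:
    """Robins G-formula estimator (one of several core estimators)."""
    def estimate(self, df, treatment, outcome, confounders):
        raise NotImplementedError

class PropensityScoreMatcher:
    """Propensity score matching: same interface, different method."""
    def estimate(self, df, treatment, outcome, confounders):
        raise NotImplementedError

# --- causality/analysis: dataframe-friendly interfaces ---------------
ESTIMATORS = {'g-formula': GFormulaEstimator, 'psm': PropensityScoreMatcher}

class CausalDataFrame:
    """Wraps a pandas DataFrame and delegates to the core estimators."""
    def __init__(self, df):
        self.df = df

    def estimate_effect(self, treatment, outcome, confounders,
                        method='g-formula'):
        estimator = ESTIMATORS[method]()
        return estimator.estimate(self.df, treatment, outcome, confounders)
```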

Any thoughts?

fedorzh commented on August 29, 2024

Thanks for answering. I'm only a beginner in the field, just starting to learn about the various approaches to causality, so my opinion may not be grounded in experience.

But since you ask, I think the first approach makes much more sense: you have a set of core models/methods and a set of interfaces (through functions, or through CausalDataFrame).
If you want to distinguish the methods, I'd rather distinguish them by 1) purpose (e.g. treatment effect estimation, DAG determination, etc.), and 2) possibly putting the "research" (or "experimental") ones in a separate folder (subpackage), but not behind separate interfaces, if you want to have a set of "reliable" methods and a set that is "for advanced users, and might not work in many cases".

akelleh commented on August 29, 2024

Thanks for the feedback! I'll pay down the tech debt and implement the first approach the next time I get a little time to work on the package. And don't worry about being newer to causal inference -- that's exactly the audience the package is for!

For "reliable" vs. "advanced", i figured I could implement the reliable methods as defaults, and advanced as optional through the same methods, but with extra args... the user-defined models in the causaldataframe.zplot method is a good example of the approach I'm proposing. Any opinion there?

fedorzh commented on August 29, 2024

"I could implement the reliable methods as defaults, and advanced as optional through the same methods, but with extra arg"
This is exactly what I do in my packages, however, this only allows for one "reliable" - the default one. Not sure if that's what you want or not.

akelleh commented on August 29, 2024

Good point!

I'm not sure I like the idea of exposing lots of methods, since that can be kind of daunting to the user. I like what pandas.DataFrame.plot does with the different plot types: one method, with a ton of flexibility through a large number of kwargs.

I think there's a good compromise. For example, the zplot method has three levels of difficulty:
(1) By default, it uses a random forest regression.
(2) With a single string kwarg (model_type='kernel'), you can switch to a (slower, but often better) kernel density regression, without needing to know what that means. I'd regard these as "alternative reliable defaults".
(3) Then, with the more advanced kwarg (model=<trained model object with a predict method>), advanced users can drop in a trained model for maximum flexibility.

We could do something similar where we switch between different effect inference methods.
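
To make those three levels concrete, here's roughly how they might look from the caller's side. The model_type and model kwargs come straight from the description above, but the rest of the call signature and the import path are my assumptions, so treat this as a sketch rather than the package's exact API:

```python
import numpy as np
from causality.analysis.dataframe import CausalDataFrame  # import path assumed
from sklearn.ensemble import GradientBoostingRegressor

# Toy data: outcome y depends on the cause x and a confounder z.
n = 1000
z = np.random.normal(size=n)
x = z + np.random.normal(size=n)
y = x + z + np.random.normal(size=n)
df = CausalDataFrame({'x': x, 'y': y, 'z': z})

# (1) Reliable default: a random forest regression under the hood.
df.zplot(x='x', y='y', z=['z'])

# (2) Alternative reliable default, via a single string kwarg.
df.zplot(x='x', y='y', z=['z'], model_type='kernel')

# (3) Maximum flexibility: drop in any pre-trained model with a
#     .predict method (features assumed to match what zplot passes in).
model = GradientBoostingRegressor().fit(df[['x', 'z']], df['y'])
df.zplot(x='x', y='y', z=['z'], model=model)
```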
