uber / causalml Goto Github PK

View Code? Open in Web Editor NEW

4.8K 82.0 745.0 96.03 MB

Uplift modeling and causal inference with machine learning algorithms

License: Other

Python 66.13% Cython 33.82% Makefile 0.05%

incubation machine-learning causal-inference uplift-modeling

causalml's Introduction

Disclaimer

This project is stable and being incubated for long-term support. It may contain new experimental code, for which APIs are subject to change.

Causal ML: A Python Package for Uplift Modeling and Causal Inference with ML

Causal ML is a Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research [1]. It provides a standard interface that allows user to estimate the Conditional Average Treatment Effect (CATE) or Individual Treatment Effect (ITE) from experimental or observational data. Essentially, it estimates the causal impact of intervention T on outcome Y for users with observed features X, without strong assumptions on the model form. Typical use cases include

Campaign targeting optimization: An important lever to increase ROI in an advertising campaign is to target the ad to the set of customers who will have a favorable response in a given KPI such as engagement or sales. CATE identifies these customers by estimating the effect of the KPI from ad exposure at the individual level from A/B experiment or historical observational data.
Personalized engagement: A company has multiple options to interact with its customers such as different product choices in up-sell or messaging channels for communications. One can use CATE to estimate the heterogeneous treatment effect for each customer and treatment option combination for an optimal personalized recommendation system.

Documentation

Documentation is available at:

https://causalml.readthedocs.io/en/latest/about.html

Installation

Installation instructions are available at:

https://causalml.readthedocs.io/en/latest/installation.html

Quickstart

Quickstarts with code-snippets are available at:

https://causalml.readthedocs.io/en/latest/quickstart.html

Example Notebooks

Example notebooks are available at:

https://causalml.readthedocs.io/en/latest/examples.html

Contributing

We welcome community contributors to the project. Before you start, please read our code of conduct and check out contributing guidelines first.

Versioning

We document versions and changes in our changelog.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

References

Documentation

Causal ML API documentation

Conference Talks and Publications by CausalML Team

(Talk) Introduction to CausalML at Causal Data Science Meeting 2021
(Talk) Introduction to CausalML at 2021 Conference on Digital Experimentation @ MIT (CODE@MIT)
(Talk) Causal Inference and Machine Learning in Practice with EconML and CausalML: Industrial Use Cases at Microsoft, TripAdvisor, Uber at KDD 2021 Tutorials (website and slide links)
(Publication) CausalML White Paper Causalml: Python package for causal machine learning
(Publication) Uplift Modeling for Multiple Treatments with Cost Optimization at 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
(Publication) Feature Selection Methods for Uplift Modeling

Citation

To cite CausalML in publications, you can refer to the following sources:

Whitepaper: CausalML: Python Package for Causal Machine Learning

Bibtex:

@misc{chen2020causalml, title={CausalML: Python Package for Causal Machine Learning}, author={Huigang Chen and Totte Harinen and Jeong-Yoon Lee and Mike Yung and Zhenyu Zhao}, year={2020}, eprint={2002.11631}, archivePrefix={arXiv}, primaryClass={cs.CY} }

Literature

Chen, Huigang, Totte Harinen, Jeong-Yoon Lee, Mike Yung, and Zhenyu Zhao. "Causalml: Python package for causal machine learning." arXiv preprint arXiv:2002.11631 (2020).
Radcliffe, Nicholas J., and Patrick D. Surry. "Real-world uplift modelling with significance-based uplift trees." White Paper TR-2011-1, Stochastic Solutions (2011): 1-33.
Zhao, Yan, Xiao Fang, and David Simchi-Levi. "Uplift modeling with multiple treatments and general response types." Proceedings of the 2017 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2017.
Hansotia, Behram, and Brad Rukstales. "Incremental value modeling." Journal of Interactive Marketing 16.3 (2002): 35-46.
Jannik Rößler, Richard Guse, and Detlef Schoder. "The Best of Two Worlds: Using Recent Advances from Uplift Modeling and Heterogeneous Treatment Effects to Optimize Targeting Policies". International Conference on Information Systems (2022)
Su, Xiaogang, et al. "Subgroup analysis via recursive partitioning." Journal of Machine Learning Research 10.2 (2009).
Su, Xiaogang, et al. "Facilitating score and causal inference trees for large observational studies." Journal of Machine Learning Research 13 (2012): 2955.
Athey, Susan, and Guido Imbens. "Recursive partitioning for heterogeneous causal effects." Proceedings of the National Academy of Sciences 113.27 (2016): 7353-7360.
Künzel, Sören R., et al. "Metalearners for estimating heterogeneous treatment effects using machine learning." Proceedings of the national academy of sciences 116.10 (2019): 4156-4165.
Nie, Xinkun, and Stefan Wager. "Quasi-oracle estimation of heterogeneous treatment effects." arXiv preprint arXiv:1712.04912 (2017).
Bang, Heejung, and James M. Robins. "Doubly robust estimation in missing data and causal inference models." Biometrics 61.4 (2005): 962-973.
Van Der Laan, Mark J., and Daniel Rubin. "Targeted maximum likelihood learning." The international journal of biostatistics 2.1 (2006).
Kennedy, Edward H. "Optimal doubly robust estimation of heterogeneous causal effects." arXiv preprint arXiv:2004.14497 (2020).
Louizos, Christos, et al. "Causal effect inference with deep latent-variable models." arXiv preprint arXiv:1705.08821 (2017).
Shi, Claudia, David M. Blei, and Victor Veitch. "Adapting neural networks for the estimation of treatment effects." 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), 2019.
Zhao, Zhenyu, Yumin Zhang, Totte Harinen, and Mike Yung. "Feature Selection Methods for Uplift Modeling." arXiv preprint arXiv:2005.03447 (2020).
Zhao, Zhenyu, and Totte Harinen. "Uplift modeling for multiple treatments with cost optimization." In 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 422-431. IEEE, 2019.

Related projects

uplift: uplift models in R
grf: generalized random forests that include heterogeneous treatment effect estimation in R
rlearner: A R package that implements R-Learner
DoWhy: Causal inference in Python based on Judea Pearl's do-calculus
EconML: A Python package that implements heterogeneous treatment effect estimators from econometrics and machine learning methods

causalml's People

Contributors

Stargazers

Watchers

Forkers

minyus jabogithub daehwanahn kjm0623v codeaudit foeinlove sprinterzzj awesome-archive pvaish1 yuqisun19 mvisionai booblu yluogit stjordanis mohi7solanki forkdump moomoofarm1 paullo0106 roman-baldaev hmisra luongdolong lawinslow jones-nicholas-s causal-inference-zerotoall rmeyers osamahali mithunkumarsr amirstudy achchg deisler134 dandan0503 anmarchese noise-trader yuhuaxiong mbenoit29 michaelgallacher aaryanmontana huangzhongyu maximilianfranz ppstacy abezzam10 ergonyc zangruirui quanshuang gingerlolipop radium05 peterfoley fritzo arisfernando zin-gduf michael-mdt franckgesbert tomaszzamacinski noyousjtu leihuaye jbdatascience czzlegend 4iji rohancalum jyzhang122 anu-bioinfo ssxjss harrisonzhu508 tuannguyen27 79212 rsyi liuliuball45 scoutfranciscano steveyang90 databill86 joulroad lewisbakkero yaxche-io ahoyosid yushoujin ghostintheshellarise yuta0013 joshpinshyamala sharatsc smashpumpkin demonbibi jaeyk hometh vvkishere oncukayalar hdocmsu ml-lab bigrlab acmilannesta jaykimbravekjh sts-sadr srhruc918 wangsd01 hpetander hongtengxu yiwen-jiao prashant0598 zhpengg khof312 lakersli

causalml's Issues

Any plan for ATT and ATC estimation?

Nice package. I am just wondering if there is plan to add support for ATT and ATC estimation?

Fix the TLearner linear regression issue

Any plan for model explanation functionality?

Tried this package, had to say it is marvelous! I will not hesitate to use it in my real-life work, but wonder if there is plan to add support for model explanation tool such as feature importance plot, SHAP plot or tree visualizer. It would be crucial to present the model explanation to business stakeholders.
Thanks for open sourcing this package!

Bug in BaseRClassifier with multiple treatment

BaseRClassifier's fit() raises an error with incorrect filtering with multiple treatments.

Add an assert to check that the control group exists

The S-Learner will run even if the default control group name ('0') is not in the treatment array, leading to seemingly paradoxical results in which eg units in the control group are predicted to have a negative treatment effect under the control condition. The other learners may have this same problem too.

To prevent the above behaviour, we should add an assert to catch situations in which the default or user-specified control group name is not in the treatment array.

Add CEVAE synthetic data generation process

As a start of the migration/add a wrapper of CEVAE under CausalML, we can add the synthetic data generation process in CEVAE paper. (Causal Effect Inference with Deep Latent-Variable Models, http://papers.nips.cc/paper/7223-causal-effect-inference-with-deep-latent-variable-models.pdf)

I noticed that we already have the implementation code under pyro repo, maybe we can migrate it here to CausalML under dataset

Fix CausalTreeRegressor

CausalTreeRegressor's ATE/ITE estimates are off with synthetic data 2, 3, and 4.

Add xgboost with pairwise ranking objective as a derived class of base rlearner regressor

Add check to ensure that propensity scores are not ones and zeros only

If you pass a propensity score that contains only ones and zeros, the X-learner fails with the following ambiguous error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-186-7587b62cdf36> in <module>
     56     ps = ps_model.predict(X[test])
     57 
---> 58     cate_pred = uplift_model.predict(X[test], ps)
     59 
     60     # Get recommended treatment group and actual value diff

~/phoenix-worker/environments/python3/lib/python3.6/site-packages/causalml/inference/meta/xlearner.py in predict(self, X, p, treatment, y, return_components, verbose)
    594             dhat_ts[group] = model_tau_t.predict(X)
    595 
--> 596             _te = (p[group] * dhat_cs[group] + (1 - p[group]) * dhat_ts[group]).reshape(-1, 1)
    597             te[:, i] = np.ravel(_te)
    598 

TypeError: can't multiply sequence by non-int of type 'float'

It may be useful to add a check that the propensity score contains more than two unique values. In particular, this prevents the user from mistakenly passing binary labels as the propensity score.

Remove the need for big_number in causaltree.pyx

Creating an issue to organize discussion that started in #105 (@jeongyoonlee, @TomaszZamacinski).

Summary of the issue is that sklearn's DecisionTreeRegressor.fit() raises ValueError if min_impurity_split<0 and the TreeBuilder's stop if impurity<min_impurity_split. Causal tree impurities can be negative, so sklearn's restriction doesn't help and the current workaround adds a large constant big_number to try to keep impurity>0.

big_number has to be large enough to ensure impurity>0 in a large number of cases, but large values of big_number reduce numeric precision. It would be preferable to dodge sklearn's checks so that we can set min_impurity_split to negative infinity and ensure it has no impact on the tree construction.

While the min_impurity_split argument to DecisionTreeRegressor is deprecated and will be removed, I'm not sure whether it will be removed from the TreeBuilder classes, or whether it will just be fixed at 0 by DecisionTreeRegressor and not accessible to users.

Increasing the test coverage to 70%

Per ML WG's request, we need to increase the test coverage to 70% before we release Causal ML as OSS.

Use n_jobs = -1 for cross_val_predict() function for R Learner

Allow estimate_ate() for user prediction input

It's really helpful that we have the function estimate_ate() to return the mean and confidence interval (LB, UB) of the ATE estimate. But currently it will generate the predictions under the hood with fit_prediction() function. It might be confusing if someone didn't look at the code base and use it for a validation data set . Could we consider support for user prediction input as an optional input? So we could use this function for more purposes.

Propensity score requirement in X-learner and R-learner

First, thanks for open-sourcing this package. I've learned a lot!

I'm wondering if there's a particular reason why the user must pass self generated propensity scores to be used in the X-learner and R-learner? While it most likely forces the user to understand how well calibrated the scores are, I would think there's validity in estimating them 'under the hood' using the learner or another supplied model parameterization (similar to how BaseRLearner.outcome_learner can be optionally specified).

In terms of the X-learner, while Kunzel et al. state they have good experiences specifying g(x) as the propensity scores, its worth noting g(x) is simply a weighting function that is chosen to minimize the variance of the CATEs. Stating this in the documentation / naming conventions might be helpful also?

Thanks

add development instructions to build cython before testing

The Contributing instructions don't mention that cython needs to be run in order to run the tests.

It worked (for me) to run python setup.py build_ext --inplace before the tests.

Propensity Score Estimation with Classifiers

Hey there!

I was wondering about the best way to do propensity score estimation in the models that require it. I saw you are using an ElasticNetCV with clipping as a default. What were the considerations behind that decision? From what I know, using regressors for estimating class probabilities is tricky, unless it's a LogisticRegression (which seems to be part of the ElasticNet, right?).

When comparing the ElasticNetPropensityModel with LogisticRegression and calibrated RandomForestClassifier, I observed that the predictions are not as meaningful as they could/should be, because the probabilities are skewed. The plot shows how much the probability estimate of the method concurs with our intuition of the likelihood of that event occuring. Or as the folks at sklearn put it:

For instance, a well calibrated (binary) classifier should classify the samples such that among the samples to which it gave a predict_proba value close to 0.8, approximately 80% actually belong to the positive class.

I believe this is also what the propensity score should reflect. Thus, using their recommended ways for calibration seems reasonable.

For our own framework for model comparison, we're aiming to provide default propensity estimation as well, thus I looking forward to hear your thoughts on this!

Best,
Max

add proportion of sample in addition to total sample size

In tree visualization, I found it helpful to add proportion of sample in addition to total sample size. Because sample size varies from train to validation, while proportion may be similar

Treatment effect bound calculation doesn't match the provided reference

For example, the t-learner calculates the bounds for the estimated treatment effect as follows:

self.c_var = (y[~is_treatment] - self.model_mu_c.predict(X[~is_treatment])).var()
self.t_var = (y[is_treatment] - self.model_mu_t.predict(X[is_treatment])).var()

se = np.sqrt((
             self.t_var/prob_treatment + self.c_var/(1-prob_treatment) +
             (p * dhat_c + (1-p) * dhat_t).var()
       ) / X.shape[0])

te_lb = te - se * norm.ppf(1 - self.ate_alpha / 2)
te_ub = te + se * norm.ppf(1 - self.ate_alpha / 2)

However, this doesn't seem to match the referenced equation by Woolridge and Imbens (2009), which is:

The fix may be as simple as changing the reference.

Uplift Curves when random is negative with normalization

When generating the uplift curve plots (gain, lift and gini) if normalized by the random which is negative, the shape of other plots will be flipped.

For example here with normalization:

without normalization:

Updating the disclaimer to reflect the incubation stage.

Per the request from the OSS Committee, we need to update the disclaimer to reflect the stage of the project, which is incubation.

Host API docs

Need to host API docs somewhere.

readthedocs.org
uber.github.io
a custom website

Pseudo-residual defined wrong in X-Learner documentation

In methodology.rst we have:

Impute the user level treatment effects, D^1_i and D^0_j for user i in the treatment group based on \mu_0(x), and user j in the control groups based on \mu_1(x):

D^1_i = Y^1_i - \hat\mu_0(X^1_i), and
D^0_i = Y^0_i - \hat\mu_1(X^0_i)

Should read:

D^1_i = Y^1_i - \hat\mu_0(X^1_i), and
D^0_i = \hat\mu_1(X^0_i) - Y^0_i

the uplift random forest is based on which paper?

Write an arXiv paper about the package for citation

Add integration tests

We need integration tests for key functions:

Question: Propensity Score?

Great library, just starting to take a look but I didnt immediately see documentation to answer the question I had in regards to if in the available methods, there is any distinction between observational or experimental data and specifically if propensity scores (propensity to be treated) were leveraged for observational data - perhaps as weights or through matching or as a covariate?

PIP installation error on LINUX

pip install causalml fails because requirements.txt is not in MANIFEST.in, thus included in the package.

Installation fails with pip and from source in Python 3.7

Hey there,

I was trying to add causalml as a dependency to our project. When trying pip install causalml this error occurred:

Collecting causalml
  Using cached https://files.pythonhosted.org/packages/4e/92/fb9af85303fc6b54bf824c36572c30d9a503e9a70a043d1f135f9c03c1fc/causalml-0.4.0.tar.gz
    ERROR: Complete output from command python setup.py egg_info:
    ERROR: Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/54/4kkg30g93bn1nqsz25r0qkfw0000gq/T/pip-install-hjz2ribq/causalml/setup.py", line 3, in <module>
        from Cython.Build import cythonize
    ModuleNotFoundError: No module named 'Cython'
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/54/4kkg30g93bn1nqsz25r0qkfw0000gq/T/pip-install-hjz2ribq/causalml/

After install Cython and running pip install causalml this occurred:

 Error compiling Cython file:
------------------------------------------------------------
...
            if sample_weight != NULL:
                # the weights of 1 and 1 + eps are used for control and treatment respectively
                is_treated = (sample_weight[i] - 1.0) * one_over_eps

            # assume that there is only one output (k = 0)
            y_ik = y[i * self.y_stride]
                            ^
------------------------------------------------------------

causalml/inference/tree/causaltree.pyx:163:29: Accessing Python attribute not allowed without gil
Traceback (most recent call last):
  File "setup.py", line 43, in <module>
    ext_modules=cythonize(extensions),
  File "/Users/MaximilianFranz/anaconda3/envs/justcause/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 1096, in cythonize
    cythonize_one(*args)
  File "/Users/MaximilianFranz/anaconda3/envs/justcause/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 1219, in cythonize_one
    raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: causalml/inference/tree/causaltree.pyx

I'm not familiar with the intricacies of Cython and compiling .pyx files, thus I can't point at any potential underlying issue. Maybe it's just me?

Any ideas or tips how to solve this?

Add docs from new_project_templates

As part of Step 6, docs from new_project_templates need to be added.

Drop Python 2 support

This task is to drop Python 2 support and allow users to use latest versions of dependencies such as scipy, scikit-learn, matplotlib, pandas, etc. which only support Python 3 in their latest releases.

It requires updating setup.py and check compatibilities between causalml and other latest libraries. Currently latest scikit-learn is not compatible with causalml because of its change in Cython code for tree algorithms.

PIP installation error on Windows

Below is the pip install error:

C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.22.27905\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -IC:\ProgramData\Anaconda3\lib\site-packages\numpy\core\include -IC:\ProgramData\Anaconda3\lib\site-packages\numpy\core\include -IC:\ProgramData\Anaconda3\include -IC:\ProgramData\Anaconda3\include -IC:\ProgramData\Anaconda3\include -IC:\ProgramData\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.22.27905\include" /Tccausalml/inference/tree/causaltree.c /Fobuild\temp.win-amd64-3.7\Release\causalml/inference/tree/causaltree.obj -O3
cl : Command line warning D9002 : ignoring unknown option '-O3'
causaltree.c
C:\ProgramData\Anaconda3\include\pyconfig.h(59): fatal error C1083: Cannot open include file: 'io.h': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.22.27905\bin\HostX86\x64\cl.exe' failed with exit status 2

Add a link to docs in README

I guess it would be better to have a link to documents in the README page.

Is this the official documents?

ImportError for check_control_in_treatment

After #97 was merged to master, tests are failing with ImportError for check_control_in_treatment due to merge conflict as follows:

ImportError while loading conftest '/Users/jeong/development/causalml/tests/conftest.py'.
tests/conftest.py:4: in <module>
    from causalml.dataset import synthetic_data
causalml/dataset/__init__.py:8: in <module>
    from .synthetic import get_synthetic_preds, get_synthetic_preds_holdout
causalml/dataset/synthetic.py:15: in <module>
    from causalml.inference.meta import BaseXRegressor, BaseRRegressor, BaseSRegressor, BaseTRegressor
causalml/inference/meta/__init__.py:1: in <module>
    from .slearner import LRSRegressor, BaseSLearner, BaseSRegressor, BaseSClassifier
causalml/inference/meta/slearner.py:16: in <module>
    from causalml.inference.meta.utils import check_control_in_treatment, convert_pd_to_np
E   ImportError: cannot import name 'check_control_in_treatment'

Enhance tree plot with node colors indicating positive and negative uplifts

minor enhancement on tree/plot.py so it's intuitive to tell who we should target on

Speed up UpliftRandomForestClassifier?

I am trying to fit this with only 100 trees on a dataset of roughly 200,000 records and 80 features. It takes hours to fit (and the results so far has been poor - overfit it appears, I am tuning some regularization as we speak). I am wondering if there this speed is normal and if there are any plans to improve (I did not see any option to grow the trees in parallel)? Also - on a real world data set (say from a marketing response problem) has this implementation proved to predict well?

Make meta-learners compatible with pandas.DataFrame inputs

Currently meta-learners functions expect numpy.matrix as an input. e.g. inference.meta.LRSLearner.estimate_ate() fails when the input is pandas.DataFrame instead of numpy.matrix.

It'd be better to make these compatible with pandas.DataFrame inputs.

Usage of setup_requires to define cython dependency

Right now cython and numpy is needed to run setup.py which leads to a hen and egg problem. If I want to install CausalML in a fresh virtual environment using a requirements.txt file, which includes cython and numpy, the installation will fail because both aren't installed before the setup.py file of CausalML is read.

The is problem is also subject of following StackOverflow issue. The actual solution is described here for your case.

AttributeError: 'int' object has no attribute 'clear'

hi,
when i run example,i get the follow error

please give advice or comments，thank you~

xgboost's `reg:linear` has been deprecated in favor of `reg:squarederror`

xgboost 0.90 deprecated reg:linear in favor of reg:squarederror. When you run the package tests with current xgboost, it dumps a huge number of deprecation warnings.

It looks like reg:linear is only mentioned by the XGBRRegressor constructor, but it's a part of the existing public interface ("Effect learner objective has to be rank:pairwise or reg:linear"), and reg:linear is required by versions before 0.90.

I'd suggest checking the xgboost version and replacing the objective with reg:squarederror if the version is 0.90 or greater, and tweaking the public interface to allow effect_learner_objective='reg:squarederror' itself and apply the same logic to pass the appropriate objective needed by the particular xgboost version.

Add DGP with imbalanced propensity distribution

This issue is to add synthetic data generation processes for the data with imbalanced propensity distribution. It will be useful to validate treatment effect estimations in such case.

Add bootstrap confidence intervals for ATE

It seems we are returning the percentile results instead of the CI.
https://github.com/uber/causalml/blob/master/causalml/inference/meta/rlearner.py#L174-L175

ValueError running example notebook

Thanks for the package, great contribution!

I'm trying to run the notebook here: https://github.com/uber/causalml/blob/master/examples/meta_learners_with_synthetic_data.ipynb

Getting the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-3e036ea08429> in <module>
     13 from causalml.match import NearestNeighborMatch, MatchOptimizer, create_table_one
     14 from causalml.propensity import ElasticNetPropensityModel
---> 15 from causalml.dataset import *
     16 from causalml.metrics import *
     17 

/opt/conda/lib/python3.6/site-packages/causalml/dataset/__init__.py in <module>
      6 from .classification import make_uplift_classification
      7 
----> 8 from .synthetic import get_synthetic_preds, get_synthetic_preds_holdout
      9 from .synthetic import get_synthetic_summary, get_synthetic_summary_holdout
     10 from .synthetic import scatter_plot_summary, scatter_plot_summary_holdout

/opt/conda/lib/python3.6/site-packages/causalml/dataset/synthetic.py in <module>
     14 
     15 from causalml.inference.meta import BaseXRegressor, BaseRRegressor, BaseSRegressor, BaseTRegressor
---> 16 from causalml.inference.tree import CausalTreeRegressor
     17 from causalml.propensity import ElasticNetPropensityModel
     18 from causalml.metrics import plot_gain, get_cumgain

/opt/conda/lib/python3.6/site-packages/causalml/inference/tree/__init__.py in <module>
      1 from .models import UpliftTreeClassifier, DecisionTree
      2 from .models import UpliftRandomForestClassifier
----> 3 from  .causaltree import CausalMSE, CausalTreeRegressor
      4 from .plot import uplift_tree_string, uplift_tree_plot
      5 from .utils import cat_group, cat_transform, cv_fold_index, cat_continuous, kpi_transform

_splitter.pxd in init causalml.inference.tree.causaltree()

ValueError: sklearn.tree._splitter.Splitter size changed, may indicate binary incompatibility. Expected 368 from C header, got 360 from PyObject

Installed the package on 4.12.2019 using pip, running causalml version 0.5.0.

The problem seems to come from scikit.-learn version 0.22.0 which was released on 3.12.2019. If I downgrade to scikit-learn 0.21.0, everything works fine.

Tree-based algorithms supports continuous target?

WARNING: C:/Jenkins/workspace/xgboost-win64_release_0.90/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.

There are several warnings in the example of "meta_learners_with_synthetic_data_multiple_treatment".
The warnings occur when I run the code to "learner_s.estimate_ate(X=X, treatment=treatment, y=y, return_ci=True, bootstrap_ci=True, n_bootstraps=100, bootstrap_size=5000)"

The warning is

WARNING: C:/Jenkins/workspace/xgboost-win64_release_0.90/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.

Fix uplift curves

The y-axis of current uplift curves (metrics.plot_gain()) is not correct. It is supposed to be the cumulative lift times the population for the cumulative quantiles, but currently it is the cumulative lift times 1 ~ 100.

Also it shows incorrect uplift curves for the imbalance data set because it sorts the data set separately for the treatment and control groups.

Add Qini curves and coefficients to metrics

Add Qini curves and coefficients to metrics as defined in Radcliffe 2007, "Using Control Groups to Target on Predicted LIft: Building and Assessing Uplift Models".

Causalml fails to install xgboost even with xgboost present in my conda env

Installing causalml through pip fails in my environment with the following error:

Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/4f/ck1srdbj21s8j7nbkbdn6zdm0000gn/T/pip-install-n76pirmd/xgboost/setup.py", line 42, in <module>
        LIB_PATH = libpath['find_lib_path']()
      File "/private/var/folders/4f/ck1srdbj21s8j7nbkbdn6zdm0000gn/T/pip-install-n76pirmd/xgboost/xgboost/libpath.py", line 48, in find_lib_path
        'List of candidates:\n' + ('\n'.join(dll_path)))
    XGBoostLibraryNotFound: Cannot find XGBoost Library in the candidate path, did you install compilers and run build.sh in root path?

I thought I'd try to install xgboost myself, but this did not solve the problem. Here is my system spec:

Darwin PCNAME 18.5.0 Darwin Kernel Version 18.5.0: Mon Mar 11 20:40:32 PDT 2019; root:xnu-4903.251.3~3/RELEASE_X86_64 x86_64

And here is the conda environment inside which I am trying to install causalml:

# platform: osx-64
_py-xgboost-mutex=2.0=cpu_0
absl-py=0.8.1=pypi_0
appnope=0.1.0=py37_0
astor=0.8.0=pypi_0
attrs=19.3.0=py_0
backcall=0.1.0=py37_0
blas=1.0=mkl
bleach=3.1.0=py37_0
ca-certificates=2019.11.27=0
certifi=2019.11.28=py37_0
cycler=0.10.0=py37_0
cython=0.29.14=py37h0a44026_0
dbus=1.13.12=h90a0687_0
decorator=4.4.1=py_0
defusedxml=0.6.0=py_0
econml=0.6=pypi_0
entrypoints=0.3=py37_0
expat=2.2.6=h0a44026_0
freetype=2.9.1=hb4e5f40_0
gast=0.2.2=pypi_0
gettext=0.19.8.1=h15daf44_3
glib=2.63.1=hd977a24_0
google-pasta=0.1.8=pypi_0
grpcio=1.25.0=pypi_0
h5py=2.10.0=pypi_0
icu=58.2=h4b95b61_1
importlib_metadata=1.1.0=py37_0
intel-openmp=2019.4=233
ipykernel=5.1.3=py37h39e3cac_0
ipython=7.10.1=py37h39e3cac_0
ipython_genutils=0.2.0=py37_0
ipywidgets=7.5.1=py_0
jedi=0.15.1=py37_0
jinja2=2.10.3=py_0
joblib=0.14.0=py_0
jpeg=9b=he5867d9_2
jsonschema=3.2.0=py37_0
jupyter=1.0.0=py37_7
jupyter_client=5.3.4=py37_0
jupyter_console=5.2.0=py37_1
jupyter_core=4.6.1=py37_0
keras=2.3.1=pypi_0
keras-applications=1.0.8=pypi_0
keras-preprocessing=1.1.0=pypi_0
kiwisolver=1.1.0=py37h0a44026_0
libcxx=9.0.0=h89e68fa_1
libedit=3.1.20181209=hb402a30_0
libffi=3.2.1=h475c297_4
libgfortran=3.0.1=h93005f0_2
libiconv=1.15=hdd342a3_7
libpng=1.6.37=ha441bb4_0
libsodium=1.0.16=h3efe00b_0
libxgboost=0.90=h4a8c4bd_4
llvm-openmp=9.0.0=h40edb58_0
llvmlite=0.30.0=pypi_0
markdown=3.1.1=pypi_0
markupsafe=1.1.1=py37h1de35cc_0
matplotlib=3.0.3=pypi_0
mistune=0.8.4=py37h1de35cc_0
mkl=2019.4=233
mkl-service=2.3.0=py37hfbe908c_0
mkl_fft=1.0.15=py37h5e564d8_0
mkl_random=1.1.0=py37ha771720_0
more-itertools=7.2.0=py37_0
nbconvert=5.6.1=py37_0
nbformat=4.4.0=py37_0
ncurses=6.1=h0a44026_1
notebook=6.0.2=py37_0
numba=0.46.0=pypi_0
numpy=1.17.4=py37h890c691_0
numpy-base=1.17.4=py37h6575580_0
openssl=1.1.1d=h1de35cc_3
opt-einsum=3.1.0=pypi_0
pandas=0.25.3=py37h0a44026_0
pandoc=2.2.3.2=0
pandocfilters=1.4.2=py37_1
parso=0.5.1=py_0
patsy=0.5.1=py37_0
pcre=8.43=h0a44026_0
pexpect=4.7.0=py37_0
pickleshare=0.7.5=py37_0
pip=19.3.1=py37_0
prometheus_client=0.7.1=py_0
prompt_toolkit=3.0.2=py_0
protobuf=3.11.1=pypi_0
ptyprocess=0.6.0=py37_0
py-xgboost=0.90=py37_4
pygments=2.5.2=py_0
pyparsing=2.4.5=py_0
pyqt=5.9.2=py37h655552a_2
pyrsistent=0.15.6=py37h1de35cc_0
python=3.7.5=h359304d_0
python-dateutil=2.8.1=py_0
python-graphviz=0.13.2=pypi_0
pytz=2019.3=py_0
pyyaml=5.2=pypi_0
pyzmq=18.1.0=py37h0a44026_0
qt=5.9.7=h468cd18_1
qtconsole=4.6.0=py_0
readline=7.0=h1de35cc_5
scikit-learn=0.21.3=py37h27c97d8_0
scipy=1.3.1=py37h1410ff5_0
send2trash=1.5.0=py37_0
setuptools=42.0.2=py37_0
sip=4.19.8=py37h0a44026_0
six=1.13.0=py37_0
sparse=0.8.0=pypi_0
sqlite=3.30.1=ha441bb4_0
statsmodels=0.10.1=py37h1d22016_0
tensorboard=1.15.0=pypi_0
tensorflow=1.15.0=pypi_0
tensorflow-estimator=1.15.1=pypi_0
termcolor=1.1.0=pypi_0
terminado=0.8.3=py37_0
testpath=0.4.4=py_0
tk=8.6.8=ha441bb4_0
tornado=6.0.3=py37h1de35cc_0
traitlets=4.3.3=py37_0
wcwidth=0.1.7=py37_0
webencodings=0.5.1=py37_1
werkzeug=0.16.0=pypi_0
wheel=0.33.6=py37_0
widgetsnbextension=3.5.1=py37_0
wrapt=1.11.2=pypi_0
xgboost=0.90=py37h4a8c4bd_4
xlrd=1.2.0=py37_0
xz=5.2.4=h1de35cc_4
zeromq=4.3.1=h0a44026_3
zipp=0.6.0=py_0
zlib=1.2.11=h1de35cc_3

Add net value optimization for classifier meta-learners

As the result of PR #39, we lost the net value optimisation approach previously implemented for R- and X-learners. The reference for the approach is: https://arxiv.org/abs/1908.05372

It should also be very straightforward to extend the approach to S- and T-learners.

Move meta-learner classifiers from tree to meta

Moving the meta-learner classifiers from tree to meta seems to be as easy as calling predict_proba() rather than predict() for some of the base-learners.

There seem to be two easy ways to incorporate the change:

Let the user specify the nature of the task as an argument
Check the type of the outcome variable and/or whether the passed learner has the predict_proba() attribute

Verify model performance based on benchmark data

As discussed on issue #70, it would be valuable to replicate the benchmark tests from the foundational research papers. For the meta-learners, we have the papers by Nie and Wager (2019) and Kunzel et al (2019) that contain such benchmark studies.

The simulation studies are:

•	Simulation setups A, B, C and D (Nie and Wager, p. 16)
•	Simulations 1—6 (Kunzel et al, p. 15–19)

Additionally, both of the papers have experiments with real data:

•	Voting study 1 (Nie and Wager, p. 7)
•	Voting study 2 (Kunzel et al, p. 9)
•	Canvassing study (Kunzel et al, p. 10)

It seems to me like a good starting point would be to conduct the experiments with synthetic data, but I’m also interested in hearing what others think.

As @swager mentioned, the replication data for Nie and Wager (2019) are available here.

uber / causalml Goto Github PK

causalml's Introduction

Disclaimer

Causal ML: A Python Package for Uplift Modeling and Causal Inference with ML

Documentation

Installation

Quickstart

Example Notebooks

Contributing

Versioning

License

References

Documentation

Conference Talks and Publications by CausalML Team

Citation

Literature

Related projects

causalml's People

Contributors

Stargazers

Watchers

Forkers

causalml's Issues

Recommend Projects

Recommend Topics

Recommend Org