minyus / causallift

CausalLift: Python package for causality-based Uplift Modeling in real-world business

Home Page: https://causallift.readthedocs.io/

License: Other

Python 54.15% Jupyter Notebook 45.85%

causal-impact causal-inference causality counterfactual econometrics propensity-score propensity-scores uplift uplift-modeling

causallift's Introduction

Yusuke Minami

Interests

Data Science
Machine Learning
Artificial Intelligence
Computer Vision
Causal Inference

Tech stack

IDE
- VS Code
versioning
- Git
shell
- Bash
- Xonsh (Python-powered and pip-installable shell with suggestions imported from bash-completion)
terminal
- Wezterm (terminal with quick copying feature)
Editor
- Vim
containerization
- Docker
- Kubernetes
CI/CD
- GitLab CI/CD
Python formatting
- Black (Python auto-formatting for early issue detection)
- Ruff (Fast Python linter which can replace Flake8, its plugins, isort, etc. for early issue detection)
Python packages
- PyTorch
  - MMPreTrain (formerly MMClassification and MMSelfSup)
- TensorFlow
- Scikit-learn
- Pandas
- OpenCV
- Apache Airflow
- MLflow (ML Experiment tracking)
- Kedro
ML model serving
- ONNX
- Triton
SQL
- Spark
- Trino (formerly PrestoSQL)
storage access
- rclone (Universal CLI to access remote storages such as S3, HDFS, etc.)
API framework
- OpenAPI Specification

Natural languages

English
Mandarin Chinese
Japanese
Korean

causallift's Issues

How can we set a separate scale_pos_weight for each of the two models?

In CausalLift, scale_pos_weight can be set at the beginning of the code, but not for each model separately. My data is highly imbalanced, and the model for each treatment group may need a different scale_pos_weight, so I need to set it per model. At the beginning of the code, the data is split into train_df and test_df regardless of treatment; this split is used only for the later simulation. But where are the two models separated? At that point I could fit an XGBoost model for each treatment group and specify a separate scale_pos_weight for each.

I am asking because when I train two separate models for the two treatment groups myself, each with its own scale_pos_weight, the results differ from those of CausalLift.

Thanks in advance
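
A minimal sketch of how a per-group value could be derived, assuming binary outcomes and a binary treatment column. It is not CausalLift code, and whether CausalLift accepts a separate value per model is not established in this thread. The ratio of negatives to positives is the heuristic suggested in the XGBoost documentation for scale_pos_weight; the function name is hypothetical.

```python
from collections import Counter


def scale_pos_weight_by_treatment(treatments, outcomes):
    """Return {treatment_value: negatives / positives} for binary outcomes."""
    counts = Counter(zip(treatments, outcomes))
    weights = {}
    for t in sorted(set(treatments)):
        positives = counts[(t, 1)]
        negatives = counts[(t, 0)]
        # Leave the weight undefined (None) when a group has no positives.
        weights[t] = negatives / positives if positives else None
    return weights


# Toy data: 5 untreated rows (1 conversion), 6 treated rows (2 conversions).
treatments = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
outcomes = [0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0]
weights = scale_pos_weight_by_treatment(treatments, outcomes)
print(weights)
```

Each value could then be passed to the classifier trained on the matching treatment group when the two models are fitted by hand.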

Error regarding base_score

Thanks for your very nicely written code. I tried to use it with my dataset. In this step:

train_df, test_df = cl.estimate_cate_by_2_models()
print('\n[Show CATE for train dataset]')
train_df.to_csv('CATE_for_Train.csv')
test_df.to_csv('CATE_for_Test.csv')

print('\n[Estimate the effect of recommendation based on the uplift model]')
estimated_effect_df = cl.estimate_recommendation_impact()
#df.to_csv('df.csv')
print('\n[Show the estimated effect of recommendation based on the uplift model]')
display(estimated_effect_df)

I get the following error:

It is related to base_score, but I have already defined base_score in uplift_model_params.
May I ask what the problem could be?

Thanks in advance

Some clarification about the code

Hi,

I would like to ask some questions about your code (CausalLift).

My first question: what are the advantages of using CausalLift? With a binary treatment, I can fit one XGBoost estimator on the treated samples and another on the untreated samples, then calculate the conversion probability of each individual under each model. When a new student comes in, I feed their features to the untreated model and to the treated model to get the two conversion probabilities.
If the difference between the two probabilities is positive, I recommend treating this person; if it is negative, I do not.

CausalLift also fits separate models for treated and untreated samples. The goal is to calculate the uplift, P(buy|treated) - P(buy|untreated). But how do we calculate the uplift for one person, who cannot be treated and untreated at the same time?
And why is this model called causal? Please explain how it differs from the first approach.

Thanks in advance.
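
The two-model procedure described in this question can be sketched as follows. This is an illustration only, not CausalLift's implementation: the "model" is a toy conversion-rate lookup per segment standing in for a classifier such as XGBoost, and all names and data are hypothetical. It shows how one individual gets both predictions: neither outcome pair is observed for the same person, so each model estimates the missing outcome from similar individuals in the other group.

```python
from collections import defaultdict


def fit_rate_model(rows):
    """Toy model: conversion rate per segment, overall rate as fallback."""
    totals, hits = defaultdict(int), defaultdict(int)
    for segment, outcome in rows:
        totals[segment] += 1
        hits[segment] += outcome
    overall = sum(hits.values()) / sum(totals.values())
    rates = {s: hits[s] / totals[s] for s in totals}
    return lambda segment: rates.get(segment, overall)


# Each row is (segment, treated, outcome); the values are made up.
data = [
    ("new", 1, 1), ("new", 1, 1), ("new", 1, 0), ("new", 1, 1),
    ("new", 0, 0), ("new", 0, 1), ("new", 0, 0), ("new", 0, 0),
    ("loyal", 1, 1), ("loyal", 1, 0), ("loyal", 1, 1), ("loyal", 1, 0),
    ("loyal", 0, 1), ("loyal", 0, 1), ("loyal", 0, 1), ("loyal", 0, 0),
]

# One model per treatment group, each fitted only on its own rows.
model_treated = fit_rate_model([(s, y) for s, t, y in data if t == 1])
model_untreated = fit_rate_model([(s, y) for s, t, y in data if t == 0])


def uplift(segment):
    """Score any individual with both models and subtract."""
    return model_treated(segment) - model_untreated(segment)


print(uplift("new"), uplift("loyal"))
```

The difference has a causal reading only when the treated and untreated groups are comparable, for example under random assignment or after adjusting for how treatment was assigned.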

Using CausalLift for later prediction

I would like to ask a question about using CausalLift for later prediction.
I am using A/B test data for training. For example, my data is "df"; I split it into train and test sets and get uplift scores for each individual.
Suppose that I need to calculate uplift scores for new data using the trained model.
In this case, should all the XGBoost parameters that I used for training stay the same when I use the new data?
Second, when I use new data, does the new data become the test set, with the original training data as the train set?
Does it work like this? If so, the new data will be scored by exactly the same trained model as before, right?
Thanks in advance,

Question Code & Result

Hi, I have a question regarding the code of the package.
I tried to use the Hillstrom (RCT) dataset with CausalLift, but the results look a bit weird. I split the dataset into a test set and a training set and applied it as in the notebook example, but with
cl = CausalLift(df_train,df_test, verbose=3,enable_ipw=False)
After that I went to step 2:
Somehow I got these results

Why is the accuracy 1 here? What could be the problem?
The pred/obs CVR at the end (the result of step 3) is also 1. I compared the notebook example with my dataset and can't see the mistake. The dataset is balanced: about 50% were treated and 50% were not.

Thanks in advance!

The effect of Stratify = ['Treatment']

I have a question regarding Stratify = ['Treatment'].
What is it for? Is it necessary to add it to the line of code?

Thanks in advance,
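
A minimal sketch of what stratifying a split on the treatment column means, assuming the option in question behaves like the stratify parameter of scikit-learn's train_test_split. The helper below is a hypothetical standard-library version written for illustration: it splits each treatment group separately so that the treated share is the same in the train and test sets.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_size=0.2, seed=0):
    """Split rows so each value of rows[i][key] keeps its share in both parts."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    train, test = [], []
    for group_rows in groups.values():
        shuffled = group_rows[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_size)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


def treated_share(rows):
    return sum(r["Treatment"] for r in rows) / len(rows)


# 100 toy rows, 30% treated.
rows = [{"id": i, "Treatment": 1 if i < 30 else 0} for i in range(100)]
train, test = stratified_split(rows, key="Treatment", test_size=0.2)
print(len(train), len(test), treated_share(train), treated_share(test))
```

Without stratification, a random split can leave the two parts with different treated shares, which matters most when one group is small.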

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Hi, thank you for the great package; it is very useful. I'm having some trouble running the code. Could you please have a look?
My original data did not have any NAs or missing values, but when I run the code I get the following error.

cl = CausalLift(X0, X11, enable_ipw=True, verbose=3)


[[2021-01-14 15:07:57,554|causallift.nodes.estimate_propensity|INFO] 
### Confusion Matrix for Test:
[2021-01-14 15:07:57,559|kedro.pipeline.node|ERROR] Node `estimate_propensity([args,df_00,propensity_model]) -> [df_01]` failed with error: 
Input contains NaN, infinity or a value too large for dtype('float64').
[2021-01-14 15:07:57,564|kedro.runner.sequential_runner|WARNING] There are 1 nodes that have not run.
You can resume the pipeline run by adding the following argument to your previous command:]

.......

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
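
The cause of this error is not established in the thread. One sanity check, sketched below under the assumption that the inputs can be iterated as rows of name/value pairs, is to count non-finite values per column, since the message covers infinity as well as NaN and a dataset without missing values can still contain infinite values. The helper name is hypothetical.

```python
import math


def non_finite_columns(rows):
    """Return {column: count} of NaN or infinite floats in a list of dicts."""
    bad = {}
    for row in rows:
        for column, value in row.items():
            if isinstance(value, float) and not math.isfinite(value):
                bad[column] = bad.get(column, 0) + 1
    return bad


rows = [
    {"age": 31.0, "spend": 120.5},
    {"age": float("nan"), "spend": 80.0},
    {"age": 45.0, "spend": float("inf")},
]
report = non_finite_columns(rows)
print(report)
```

Running the same check on both the train and test inputs would show whether the non-finite values are present before CausalLift is called or appear only inside the pipeline.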

estimate_cate_by_2_models crashes when col_propensity is provided in the data and enable_ipw=False

I am using the example provided in https://colab.research.google.com/github/Minyus/causallift/blob/master/examples/CausalLift_example.ipynb. However, I am trying to test a scenario where the propensity is computed beforehand. I initialize the CausalLift object using

model = CausalLift(train_df, test_df,
                   enable_ipw=False,
                   random_state=0,
                   verbose=3,
                   col_treatment='treated',
                   col_propensity='likelihood',
                   col_outcome='outcome')

train_df, test_df = model.estimate_cate_by_2_models()

This code crashed with the following error message

Traceback (most recent call last):
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/process_executor.py", line 418, in _process_worker
    r = call_item()
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/externals/loky/process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 567, in __call__
    return self.func(*args, **kwargs)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 528, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/xgboost/sklearn.py", line 703, in fit
    missing=self.missing, nthread=self.n_jobs)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/xgboost/core.py", line 427, in __init__
    self.set_weight(weight)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/xgboost/core.py", line 698, in set_weight
    self.set_float_info('weight', weight)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/xgboost/core.py", line 592, in set_float_info
    c_data = c_array(ctypes.c_float, data)
  File "/home/code/projects/uplift/env/lib/python3.6/site-packages/xgboost/core.py", line 219, in c_array
    return (ctype * len(values))(*values)
TypeError: object of type 'float' has no len()
"""

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
~/projects/uplift/model.py in <module>
     31                    col_propensity='propensity',
     32                    col_outcome='Outcome')
---> 33 train_df, test_df = model.estimate_cate_by_2_models()
     34 # estimated_effect_df = model.estimate_recommendation_impact()

~/projects/uplift/env/lib/python3.6/site-packages/causallift/causal_lift.py in estimate_cate_by_2_models(self, verbose)
    194                                             enable_ipw=self.enable_ipw,
    195                                             uplift_model_params=self.uplift_model_params,
--> 196                                             cv=self.cv)
    197         model_for_untreated = ModelForUntreated(train_df_, test_df_,
    198                                                 random_state=self.random_state,

~/projects/uplift/env/lib/python3.6/site-packages/causallift/model_for_each.py in __init__(self, *args, **kwargs)
    224     def __init__(self, *args, **kwargs):
    225         kwargs.update(treatment_val=1.0)
--> 226         super().__init__(*args, **kwargs)
    227 
    228 

~/projects/uplift/env/lib/python3.6/site-packages/causallift/model_for_each.py in __init__(self, train_df_, test_df_, treatment_val, random_state, verbose, cols_features, col_treatment, col_outcome, col_propensity, col_recommendation, min_propensity, max_propensity, enable_ipw, uplift_model_params, cv)
    101                              params, cv=cv, return_train_score=False, n_jobs=-1)
    102 
--> 103         model.fit(X_train, y_train, sample_weight=sample_weight)
    104         if verbose >= 3:
    105             print('### Best parameters of the model trained using samples with observational Treatment: {} \n {}'.

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
    720                 return results_container[0]
    721 
--> 722             self._run_search(evaluate_candidates)
    723 
    724         results = results_container[0]

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/model_selection/_search.py in _run_search(self, evaluate_candidates)
   1189     def _run_search(self, evaluate_candidates):
   1190         """Search all candidates in param_grid"""
-> 1191         evaluate_candidates(ParameterGrid(self.param_grid))
   1192 
   1193 

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/model_selection/_search.py in evaluate_candidates(candidate_params)
    709                                for parameters, (train, test)
    710                                in product(candidate_params,
--> 711                                           cv.split(X, y, groups)))
    712 
    713                 all_candidate_params.extend(candidate_params)

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    928 
    929             with self._backend.retrieval_context():
--> 930                 self.retrieve()
    931             # Make sure that we get a last message telling us we are done
    932             elapsed_time = time.time() - self._start_time

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in retrieve(self)
    831             try:
    832                 if getattr(self._backend, 'supports_timeout', False):
--> 833                     self._output.extend(job.get(timeout=self.timeout))
    834                 else:
    835                     self._output.extend(job.get())

~/projects/uplift/env/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
    519         AsyncResults.get from multiprocessing."""
    520         try:
--> 521             return future.result(timeout=timeout)
    522         except LokyTimeoutError:
    523             raise TimeoutError()

/usr/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
    430                 raise CancelledError()
    431             elif self._state == FINISHED:
--> 432                 return self.__get_result()
    433             else:
    434                 raise TimeoutError()

/usr/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

TypeError: object of type 'float' has no len()

If I understand the documentation correctly, enable_ipw should be set to False when the propensity is calculated beforehand. Assuming my object is initialized with the right parameters, I suspect that on line 76 of the model_for_each module sample_weight is defined as a float (1.0), while it should be a NumPy array.
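
The reporter's hypothesis can be illustrated in isolation. The sketch below is not the CausalLift source and has not been checked against it: it reproduces the final TypeError from the traceback with a scalar weight, then shows a hypothetical helper that expands a scalar into one weight per sample.

```python
def as_weight_list(sample_weight, n_samples):
    """Expand a scalar weight to one weight per sample; validate sequences."""
    if isinstance(sample_weight, (int, float)):
        return [float(sample_weight)] * n_samples
    weights = list(sample_weight)
    if len(weights) != n_samples:
        raise ValueError("sample_weight must have one entry per sample")
    return weights


# A bare float has no length, which matches the error in the traceback.
try:
    len(1.0)
except TypeError as exc:
    message = str(exc)

print(message)
print(as_weight_list(1.0, 3))
```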

cl.estimate_cate_by_2_models() does not work with XGBoost version 1.0.2

I installed the newest release of causallift and xgboost (version 1.0.2), and the function estimate_cate_by_2_models() now raises a RuntimeError:

RuntimeError: Cannot clone object XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=None, colsample_bytree=1, gamma=0, gpu_id=None, importance_type='gain', interaction_constraints=None, learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, monotone_constraints=None, n_estimators=100, n_jobs=-1, nthread=None, num_parallel_tree=None, objective='binary:logistic', random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method=None, validate_parameters=False, verbose=0, verbosity=None), as the constructor either does not set or modifies parameter missing

When I use an older version of xgboost (version 0.90 like in the example notebook), it works again. An API change in version 1.0.0 of xgboost maybe?

Examples with Observational data

Hello Yusuke, first of all, I wanted to say that you have done a great job with the package. Two points stand out to me, and I see them come up frequently as well:

- Observational data as opposed to randomized data.
- Trying to explain concepts in business terms.

Do you have an example that shows CausalLift with observational data? I would love to see one.
Thank you again,
--Muneer

Is there a causal modeling for multiple treatments?

I am asking here because CausalLift handles binary treatments. Do you know of Python code that compares multiple treatments and gives the uplift for each individual, so that the treatments can be compared?
Do you know of such code, or do you plan to develop it?

Thanks

TypeError: run() missing 1 required positional argument: 'hook_manager'

Hi Team,

I am getting TypeError: run() missing 1 required positional argument: 'hook_manager' while executing CausalLift(train_df, test_df, enable_ipw=True, verbose=3). For your reference, please find attached a snippet of the dataset.

Is the generated data random?

I have a question about the data generated in one part of your code. Is it random data? I want to apply random data to the code and see how the CATE results vary compared to my data.

Thanks in advance

A clear explanation

I would like to ask you to please explain in detail what CausalLift does.
I think it first models treated and untreated samples separately, so that each model gives a conversion rate for each individual. After that, I do not know what happens: why do we run a simulation, and how? How exactly are uplifts calculated?
I want to understand it clearly since I chose CausalLift for my project.

Thanks in advance
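
As a sketch of the two-model idea described in these issues (an illustration, not a statement of CausalLift's exact internals), the uplift score for an individual with features x can be written as the difference between the two models' predicted conversion probabilities. The symbols are assumptions introduced here: Y is the binary outcome, T the binary treatment, X the features, and the hats mark estimates from the treated and untreated models.

```latex
\hat{\tau}(x) = \hat{P}(Y = 1 \mid X = x,\, T = 1) - \hat{P}(Y = 1 \mid X = x,\, T = 0)
```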

Order of features in train and new data

I have a question. Is it necessary for the features to be in the same order in the training data and the new data?
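
Whether CausalLift matches features by name or by position is not confirmed in this thread. Models that receive plain arrays match columns by position, so a safe habit is to reorder the new data to the training feature order before scoring. The sketch below uses only the standard library; the helper and the data are hypothetical.

```python
def rows_to_matrix(rows, feature_order):
    """Build a positional feature matrix whose columns follow feature_order."""
    matrix = []
    for row in rows:
        missing = [name for name in feature_order if name not in row]
        if missing:
            raise KeyError(f"missing features: {missing}")
        matrix.append([row[name] for name in feature_order])
    return matrix


train_feature_order = ["age", "spend"]
# The new data lists the same features in a different order.
new_rows = [{"spend": 120.5, "age": 31}, {"spend": 80.0, "age": 45}]
new_matrix = rows_to_matrix(new_rows, train_feature_order)
print(new_matrix)
```

With a pandas DataFrame, the same alignment can be done by selecting the columns with the list of training feature names.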

pipeline issue (Kedro?)

Running:

print('\n[Estimate the effect of recommendation based on the uplift model]')
estimated_effect_df = cl.estimate_recommendation_impact()

gives the error below. I ran the notebook example from the GitHub project.

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /tmp/ipykernel_121/2062088210.py:2 in │
│ │
│ [Errno 2] No such file or directory: '/tmp/ipykernel_121/2062088210.py' │
│ │
│ /root/venv/lib/python3.9/site-packages/causallift/causal_lift.py:721 in │
│ estimate_recommendation_impact │
│ │
│ 718 │ │ │
│ 719 │ │ if self.runner: │
│ 720 │ │ │ # self.kedro_context.catalog.save('args', self.args) │
│ ❱ 721 │ │ │ self.kedro_context.run(tags=["511_recommend_by_cate"]) │
│ 722 │ │ │ self.df = self.kedro_context.catalog.load("df_03") │
│ 723 │ │ │ │
│ 724 │ │ │ self.kedro_context.run(tags=["521_simulate_recommendation"]) │
│ │
│ /root/venv/lib/python3.9/site-packages/causallift/context/flexible_context.py:178 in run │
│ │
│ 175 │ │ │ + "only_missing: {}".format(only_missing) │
│ 176 │ │ │ + ")" │
│ 177 │ │ ) │
│ ❱ 178 │ │ return super().run( │
│ 179 │ │ │ tags=tags, runner=runner, node_names=node_names, only_missing=only_missing │
│ 180 │ │ ) │
│ 181 │
│ │
│ /root/venv/lib/python3.9/site-packages/causallift/context/flexible_context.py:141 in run │
│ │
│ 138 │ │ self, **kwargs # type: Any │
│ 139 │ ): │
│ 140 │ │ # type: (...) -> Dict[str, Any] │
│ ❱ 141 │ │ d = super().run(**kwargs) │
│ 142 │ │ self.catalog.add_feed_dict(d, replace=True) │
│ 143 │ │ return d │
│ 144 │
│ │
│ /root/venv/lib/python3.9/site-packages/causallift/context/flexible_context.py:131 in run │
│ │
│ 128 │ │ │ runner = ( │
│ 129 │ │ │ │ ParallelRunner() if runner == "ParallelRunner" else SequentialRunner() │
│ 130 │ │ │ ) │
│ ❱ 131 │ │ return super().run(runner=runner, **kwargs) │
│ 132 │
│ 133 │
│ 134 class ProjectContext2(ProjectContext1): │
│ │
│ /root/venv/lib/python3.9/site-packages/causallift/context/flexible_context.py:106 in run │
│ │
│ 103 │ │ runner = runner or SequentialRunner() │
│ 104 │ │ if only_missing and _skippable(self.catalog): │
│ 105 │ │ │ return runner.run_only_missing(pipeline, self.catalog) │
│ ❱ 106 │ │ return runner.run(pipeline, self.catalog) │
│ 107 │
│ 108 │
│ 109 def _skippable( │
│ │
│ /root/venv/lib/python3.9/site-packages/kedro/runner/runner.py:75 in run │
│ │
│ 72 │ │ │
│ 73 │ │ unsatisfied = pipeline.inputs() - set(catalog.list()) │
│ 74 │ │ if unsatisfied: │
│ ❱ 75 │ │ │ raise ValueError( │
│ 76 │ │ │ │ f"Pipeline input(s) {unsatisfied} not found in the DataCatalog" │
│ 77 │ │ │ ) │
│ 78 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Pipeline input(s) {'df_02'} not found in the DataCatalog

Getting Json Formatter error

Using Google Colab to run basic uplift modeling.

try:
    import causallift
except ImportError:
    # Install CausalLift
    !pip3 install causallift

'1.0.6'

from causallift import CausalLift

print('\n[Estimate propensity scores for Inverse Probability Weighting.]')
cl = CausalLift(train_df, test_df, enable_ipw=True, verbose=3)

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ :4 in │
│ │
│ /usr/local/lib/python3.8/dist-packages/causallift/causal_lift.py:510 in __init__ │
│ │
│ 507 │ │ │
│ 508 │ │ # Instance attributes were defined above. │
│ 509 │ │ if logging_config: │
│ ❱ 510 │ │ │ logging.config.dictConfig(logging_config) │
│ 511 │ │ │
│ 512 │ │ args_raw = dict( │
│ 513 │ │ │ cols_features=cols_features, │
│ │
│ /usr/lib/python3.8/logging/config.py:808 in dictConfig │
│ │
│ 805 │
│ 806 def dictConfig(config): │
│ 807 │ """Configure logging using a dictionary.""" │
│ ❱ 808 │ dictConfigClass(config).configure() │
│ 809 │
│ 810 │
│ 811 def listen(port=DEFAULT_LOGGING_CONFIG_PORT, verify=None): │
│ │
│ /usr/lib/python3.8/logging/config.py:545 in configure │
│ │
│ 542 │ │ │ │ │ │ formatters[name] = self.configure_formatter( │
│ 543 │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ formatters[name]) │
│ 544 │ │ │ │ │ except Exception as e: │
│ ❱ 545 │ │ │ │ │ │ raise ValueError('Unable to configure ' │
│ 546 │ │ │ │ │ │ │ │ │ │ 'formatter %r' % name) from e │
│ 547 │ │ │ │ # Next, do filters - they don't refer to anything else, either │
│ 548 │ │ │ │ filters = config.get('filters', EMPTY_DICT) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Unable to configure formatter 'json_formatter'

Handle imbalance

Hi,

Is it possible to use instance weights during training to handle class imbalance?

XGBClassifier accepts a sample_weight argument when fitting the model, with one weight per instance. How can I use it? I tried to include it in the uplift_model_params dictionary when creating a CausalLift instance, but it did not work.

Thanks in advance
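
One possible reason the dictionary had no effect is that sample_weight is an argument of fit rather than a constructor parameter of XGBClassifier; how CausalLift forwards it is not established here. The sketch below only shows how per-instance weights could be computed, using the same formula as scikit-learn's "balanced" class weights, n_samples / (n_classes * class_count). The function name and data are hypothetical.

```python
from collections import Counter


def balanced_sample_weights(outcomes):
    """One weight per instance: n_samples / (n_classes * count of its class)."""
    counts = Counter(outcomes)
    n_samples, n_classes = len(outcomes), len(counts)
    return [n_samples / (n_classes * counts[y]) for y in outcomes]


# 8 negatives and 2 positives.
outcomes = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
sample_weights = balanced_sample_weights(outcomes)
print(sample_weights[0], sample_weights[-1])
```

With these weights each class contributes the same total weight, so the minority class is not dominated during training.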

A question about train_df and test_df

Hello.

I am using your code for my dataset, but I have an urgent question that confuses me.

As I understand it, you train two separate models, one for untreated people and one for treated people. To calculate CATE, you then combine all samples, treated and untreated; set the treatment feature to 0 for all samples and pass them to the untreated model to obtain predicted probabilities; set the treatment feature to 1 for all samples and pass them to the treated model; and finally calculate the CATE by subtracting the predicted probabilities.
But this means that some samples have already been seen by the treated model or the untreated model during training. Isn't this biased?

I see that at the beginning of the code you split the data into train and test sets using a 20% split. But I don't understand where the train and test sets used by the two separate models come from. Are they related to the split made at the beginning of the code?

In any case, I am confused: if we train one model on untreated samples and one on treated samples, then set the treatment to 0 and then to 1 for all samples and pass them to the two models, some of the data has already been seen by one of the models. I am not sure my interpretation of your code is correct, but if it is, the uplift estimate will be biased.

I would be grateful if you could reply, since I am currently using your code.

Many thanks

XGBooster Invalid missing value: null

Running:

print('\n[Create 2 models for treatment and untreatment and estimate CATE (Conditional Average Treatment Effects)]')
train_df, test_df = cl.estimate_cate_by_2_models()

gives the error below. I ran the example notebook from the GitHub project.

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /tmp/ipykernel_121/3555275851.py:5 in │
│ │
│ [Errno 2] No such file or directory: '/tmp/ipykernel_121/3555275851.py' │
│ │
│ /root/venv/lib/python3.9/site-packages/causallift/causal_lift.py:654 in │
│ estimate_cate_by_2_models │
│ │
│ 651 │ │ │ ) │
│ 652 │ │ │
│ 653 │ │ if self.runner: │
│ ❱ 654 │ │ │ self.kedro_context.run(tags=["311_fit", "312_bundle_2_models"]) │
│ 655 │ │ │ self.uplift_models_dict = self.kedro_context.catalog.load( │
│ 656 │ │ │ │ "uplift_models_dict" │
│ 657 │ │ │ ) │
│ │
│ /root/venv/lib/python3.9/site-packages/causallift/context/flexible_context.py:178 in run │
│ │
│ 175 │ │ │ + "only_missing: {}".format(only_missing) │
│ 176 │ │ │ + ")" │
│ 177 │ │ ) │
│ ❱ 178 │ │ return super().run( │
│ 179 │ │ │ tags=tags, runner=runner, node_names=node_names, only_missing=only_missing │
│ 180 │ │ ) │
│ 181 │
│ │
│ /root/venv/lib/python3.9/site-packages/causallift/context/flexible_context.py:141 in run │
│ │
│ 138 │ │ self, **kwargs # type: Any │
│ 139 │ ): │
│ 140 │ │ # type: (...) -> Dict[str, Any] │
│ ❱ 141 │ │ d = super().run(**kwargs) │
│ 142 │ │ self.catalog.add_feed_dict(d, replace=True) │
│ 143 │ │ return d │
│ 144 │
│ │
│ /root/venv/lib/python3.9/site-packages/causallift/context/flexible_context.py:131 in run │
│ │
│ 128 │ │ │ runner = ( │
│ 129 │ │ │ │ ParallelRunner() if runner == "ParallelRunner" else SequentialRunner() │
│ 130 │ │ │ ) │
│ ❱ 131 │ │ return super().run(runner=runner, **kwargs) │
│ 132 │
│ 133 │
│ 134 class ProjectContext2(ProjectContext1): │
│ │
│ /root/venv/lib/python3.9/site-packages/causallift/context/flexible_context.py:106 in run │
│ │
│ 103 │ │ runner = runner or SequentialRunner() │
│ 104 │ │ if only_missing and _skippable(self.catalog): │
│ 105 │ │ │ return runner.run_only_missing(pipeline, self.catalog) │
│ ❱ 106 │ │ return runner.run(pipeline, self.catalog) │
│ 107 │
│ 108 │
│ 109 def _skippable( │
│ │
│ /root/venv/lib/python3.9/site-packages/kedro/runner/runner.py:88 in run │
│ │
│ 85 │ │ │ self._logger.info( │
│ 86 │ │ │ │ "Asynchronous mode is enabled for loading and saving data" │
│ 87 │ │ │ ) │
│ ❱ 88 │ │ self._run(pipeline, catalog, hook_manager, session_id) │
│ 89 │ │ │
│ 90 │ │ self._logger.info("Pipeline execution completed successfully.") │
│ 91 │
│ │
│ /root/venv/lib/python3.9/site-packages/kedro/runner/sequential_runner.py:70 in _run │
│ │
│ 67 │ │ │
│ 68 │ │ for exec_index, node in enumerate(nodes): │
│ 69 │ │ │ try: │
│ ❱ 70 │ │ │ │ run_node(node, catalog, hook_manager, self._is_async, session_id) │
│ 71 │ │ │ │ done_nodes.add(node) │
│ 72 │ │ │ except Exception: │
│ 73 │ │ │ │ self._suggest_resume_scenario(pipeline, done_nodes, catalog) │
│ │
│ /root/venv/lib/python3.9/site-packages/kedro/runner/runner.py:304 in run_node │
│ │
│ 301 │ if is_async: │
│ 302 │ │ node = _run_node_async(node, catalog, hook_manager, session_id) │
│ 303 │ else: │
│ ❱ 304 │ │ node = _run_node_sequential(node, catalog, hook_manager, session_id) │
│ 305 │ │
│ 306 │ for name in node.confirms: │
│ 307 │ │ catalog.confirm(name) │
│ │
│ /root/venv/lib/python3.9/site-packages/kedro/runner/runner.py:398 in _run_node_sequential │
│ │
│ 395 │ ) │
│ 396 │ inputs.update(additional_inputs) │
│ 397 │ │
│ ❱ 398 │ outputs = _call_node_run( │
│ 399 │ │ node, catalog, inputs, is_async, hook_manager, session_id=session_id │
│ 400 │ ) │
│ 401 │
│ │
│ /root/venv/lib/python3.9/site-packages/kedro/runner/runner.py:366 in _call_node_run │
│ │
│ 363 │ │ │ is_async=is_async, │
│ 364 │ │ │ session_id=session_id, │
│ 365 │ │ ) │
│ ❱ 366 │ │ raise exc │
│ 367 │ hook_manager.hook.after_node_run( │
│ 368 │ │ node=node, │
│ 369 │ │ catalog=catalog, │
│ │
│ /root/venv/lib/python3.9/site-packages/kedro/runner/runner.py:356 in _call_node_run │
│ │
│ 353 ) -> Dict[str, Any]: │
│ 354 │ # pylint: disable=too-many-arguments │
│ 355 │ try: │
│ ❱ 356 │ │ outputs = node.run(inputs) │
│ 357 │ except Exception as exc: │
│ 358 │ │ hook_manager.hook.on_node_error( │
│ 359 │ │ │ error=exc, │
│ │
│ /root/venv/lib/python3.9/site-packages/kedro/pipeline/node.py:353 in run │
│ │
│ 350 │ │ # purposely catch all exceptions │
│ 351 │ │ except Exception as exc: │
│ 352 │ │ │ self._logger.error("Node '%s' failed with error: \n%s", str(self), str(exc)) │
│ ❱ 353 │ │ │ raise exc │
│ 354 │ │
│ 355 │ def _run_with_no_inputs(self, inputs: Dict[str, Any]): │
│ 356 │ │ if inputs: │
│ │
│ /root/venv/lib/python3.9/site-packages/kedro/pipeline/node.py:344 in run │
│ │
│ 341 │ │ │ elif isinstance(self._inputs, str): │
│ 342 │ │ │ │ outputs = self._run_with_one_input(inputs, self._inputs) │
│ 343 │ │ │ elif isinstance(self._inputs, list): │
│ ❱ 344 │ │ │ │ outputs = self._run_with_list(inputs, self._inputs) │
│ 345 │ │ │ elif isinstance(self._inputs, dict): │
│ 346 │ │ │ │ outputs = self._run_with_dict(inputs, self._inputs) │
│ 347 │
│ │
│ /root/venv/lib/python3.9/site-packages/kedro/pipeline/node.py:384 in _run_with_list │
│ │
│ 381 │ │ │ │ f"{sorted(inputs.keys())}." │
│ 382 │ │ │ ) │
│ 383 │ │ # Ensure the function gets the inputs in the correct order │
│ ❱ 384 │ │ return self._func(*(inputs[item] for item in node_inputs)) │
│ 385 │ │
│ 386 │ def _run_with_dict(self, inputs: Dict[str, Any], node_inputs: Dict[str, str]): │
│ 387 │ │ # Node inputs and provided run inputs should completely overlap │
│ │
│ /root/venv/lib/python3.9/site-packages/causallift/nodes/model_for_each.py:234 in │
│ model_for_treated_fit │
│ │
│ 231 │
│ 232 │
│ 233 def model_for_treated_fit(*posargs, **kwargs): │
│ ❱ 234 │ return ModelForTreated().fit(*posargs, **kwargs) │
│ 235 │
│ 236 │
│ 237 def model_for_treated_predict_proba(*posargs, **kwargs): │
│ │
│ /root/venv/lib/python3.9/site-packages/causallift/nodes/model_for_each.py:94 in fit │
│ │
│ 91 │ │ │ else: │
│ 92 │ │ │ │ log.info("## Feature importances not available.") │
│ 93 │ │ │
│ ❱ 94 │ │ y_pred_train = model.predict(X_train) │
│ 95 │ │ │
│ 96 │ │ y_test = None │
│ 97 │ │ y_pred_test = None │
│ │
│ /shared-libs/python3.9/py/lib/python3.9/site-packages/sklearn/model_selection/_search.py:500 in │
│ predict │
│ │
│ 497 │ │ │ the best found parameters. │
│ 498 │ │ """ │
│ 499 │ │ check_is_fitted(self) │
│ ❱ 500 │ │ return self.best_estimator_.predict(X) │
│ 501 │ │
│ 502 │ @available_if(_estimator_has("predict_proba")) │
│ 503 │ def predict_proba(self, X): │
│ │
│ /root/venv/lib/python3.9/site-packages/xgboost/sklearn.py:1434 in predict │
│ │
│ 1431 │ │ base_margin: Optional[ArrayLike] = None, │
│ 1432 │ │ iteration_range: Optional[Tuple[int, int]] = None, │
│ 1433 │ ) -> np.ndarray: │
│ ❱ 1434 │ │ class_probs = super().predict( │
│ 1435 │ │ │ X=X, │
│ 1436 │ │ │ output_margin=output_margin, │
│ 1437 │ │ │ ntree_limit=ntree_limit, │
│ │
│ /root/venv/lib/python3.9/site-packages/xgboost/sklearn.py:1049 in predict │
│ │
│ 1046 │ │ iteration_range = self._get_iteration_range(iteration_range) │
│ 1047 │ │ if self._can_use_inplace_predict(): │
│ 1048 │ │ │ try: │
│ ❱ 1049 │ │ │ │ predts = self.get_booster().inplace_predict( │
│ 1050 │ │ │ │ │ data=X, │
│ 1051 │ │ │ │ │ iteration_range=iteration_range, │
│ 1052 │ │ │ │ │ predict_type="margin" if output_margin else "value", │
│ │
│ /root/venv/lib/python3.9/site-packages/xgboost/core.py:2147 in inplace_predict │
│ │
│ 2144 │ │ if isinstance(data, np.ndarray): │
│ 2145 │ │ │ from .data import _ensure_np_dtype │
│ 2146 │ │ │ data, _ = _ensure_np_dtype(data, data.dtype) │
│ ❱ 2147 │ │ │ _check_call( │
│ 2148 │ │ │ │ _LIB.XGBoosterPredictFromDense( │
│ 2149 │ │ │ │ │ self.handle, │
│ 2150 │ │ │ │ │ _array_interface(data), │
│ │
│ /root/venv/lib/python3.9/site-packages/xgboost/core.py:246 in _check_call │
│ │
│ 243 │ │ return value from API calls │
│ 244 │ """ │
│ 245 │ if ret != 0: │
│ ❱ 246 │ │ raise XGBoostError(py_str(_LIB.XGBGetLastError())) │
│ 247 │
│ 248 │
│ 249 def _has_categorical(booster: "Booster", data: DataType) -> bool: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
XGBoostError: [12:04:08] ../src/c_api/c_api_utils.h:159: Invalid missing value: null
Stack trace:
[bt] (0) /root/venv/lib/python3.9/site-packages/xgboost/lib/libxgboost.so(+0xbbec9) [0x7f5d31953ec9]
[bt] (1) /root/venv/lib/python3.9/site-packages/xgboost/lib/libxgboost.so(+0xdeb90) [0x7f5d31976b90]
[bt] (2) /root/venv/lib/python3.9/site-packages/xgboost/lib/libxgboost.so(+0xe45d8) [0x7f5d3197c5d8]
[bt] (3) /root/venv/lib/python3.9/site-packages/xgboost/lib/libxgboost.so(XGBoosterPredictFromDense+0x330)
[0x7f5d3195c4d0]
[bt] (4) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f5dccad38ee]
[bt] (5) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x22f) [0x7f5dccad32bf]
[bt] (6) /usr/local/lib/python3.9/lib-dynload/_ctypes.cpython-39-x86_64-linux-gnu.so(+0x13111) [0x7f5dccaf1111]
[bt] (7) /usr/local/lib/python3.9/lib-dynload/_ctypes.cpython-39-x86_64-linux-gnu.so(+0x81ed) [0x7f5dccae61ed]
[bt] (8) /usr/local/lib/libpython3.9.so.1.0(_PyObject_MakeTpCall+0x79) [0x7f5dcdd1ced9]

CATE vs Propensity

I am following the tutorial and have generated two columns: CATE and Propensity. The tutorial recommends selecting users with a high uplift score, which is the CATE.

Is the Propensity column of any use to us? Or can I just disregard it? The propensity may be a positive number while the CATE is sometimes negative. I'm not sure how to interpret the scores when this happens.
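
The two columns measure different things. A propensity score is the estimated probability of having been treated, so it lies between 0 and 1, while CATE is a difference of two conversion probabilities and can be negative. The sketch below shows the generic way a propensity enters Inverse Probability Weighting; it is an assumption-laden illustration, not CausalLift's code. The names min_propensity and max_propensity appear in the package signature quoted in a traceback on this page, but the default values used here are made up.

```python
def ipw_weight(treated, propensity, min_propensity=0.01, max_propensity=0.99):
    """Inverse probability weight for one sample, with the propensity clipped."""
    p = min(max(propensity, min_propensity), max_propensity)
    # Treated samples are weighted by 1/p, untreated samples by 1/(1-p).
    return 1.0 / p if treated else 1.0 / (1.0 - p)


print(ipw_weight(1, 0.25))   # treated although treatment was unlikely
print(ipw_weight(0, 0.75))   # untreated although treatment was likely
print(ipw_weight(1, 0.001))  # clipped to min_propensity before inverting
```

Under this reading, the propensity is used to reweight the training samples, and the CATE remains the score for deciding whom to target.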

The robustness of uplift

Hi,
I think this package is really useful for campaign evaluation.

But I'm curious: how can we assess the robustness of the expected uplift?
I reviewed the references but didn't find any related discussion.

Thanks
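
One generic way to gauge robustness, which is not a CausalLift feature and is not discussed in its references as far as this thread shows, is a bootstrap interval for the difference in conversion rates between the group that received the treatment and the group that did not. The sketch below uses made-up data and hypothetical names.

```python
import random


def bootstrap_uplift_interval(treated, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(treated) - mean(control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its size.
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        diffs.append(sum(t) / len(t) - sum(c) / len(c))
    diffs.sort()
    lower = diffs[int(n_boot * alpha / 2)]
    upper = diffs[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Made-up outcomes: 30% conversion when treated, 20% when not.
treated = [1] * 60 + [0] * 140
control = [1] * 40 + [0] * 160
lower, upper = bootstrap_uplift_interval(treated, control)
print(lower, upper)
```

A wide interval, or one that includes zero, suggests that the estimated uplift is sensitive to which samples happened to be observed.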
