
pylift's People

Contributors

minyus, rsyi, shaddyab, wtfrost


pylift's Issues

Typo in the docs [quick fix 😄]

In pylift/docs/eda.rst there is a typo in the LaTeX: \elem should be replaced with \in to get the symbol ∈ in the following line:

\text{WOE}_{ij} = \log \frac{P(X_j \elem B_i | Y = 1)}{P(X_j \elem B_i | Y = 0)}
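
For reference, the corrected line would read:

\text{WOE}_{ij} = \log \frac{P(X_j \in B_i | Y = 1)}{P(X_j \in B_i | Y = 0)}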

Links in the docs

In the docs online, on readthedocs, I find that a few links are incorrectly rendered.

On the Usage page, the links for Introduction and Quickstart are missing the .html extension.

"Belongs to" mathematical symbol in EDA docs

There's an error in eda.rst in the Weight of Evidence section: the mathematical symbol for "belongs to" (∈) is not rendered due to the incorrect command \elem. The correct command is \in.

Convert scoring methods to eval metrics

Is there an easy way to convert the sklearn make_scorer metrics (e.g., _aqini_score, _qini_score, etc.) to eval functions that can be used with the 'eval_metric' argument in xgboost?
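
Something like the following adapter is what I have in mind (hypothetical; it assumes the pylift scoring functions take (y_true, y_pred) and return a float, and uses the xgboost.train feval hook, so the actual signatures of _aqini_score and _qini_score may differ):

    import xgboost as xgb

    def make_xgb_eval(score_fn, name):
        # Wrap a (y_true, y_pred) -> float scorer as an xgboost feval,
        # which must return a (name, value) pair.
        def feval(preds, dtrain):
            y_true = dtrain.get_label()
            return name, score_fn(y_true, preds)
        return feval

    # e.g. xgb.train(params, dtrain, evals=[(dval, 'val')],
    #                feval=make_xgb_eval(_aqini_score, 'aqini'))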

Which model is used? XGBoost? How to change its parameters?

Hi @rsyi

First of all, thanks for a very good package.
I want to use your code on my data, and I have a few questions. First, if I want to use XGBoost, where can I change its parameters? I am asking because my data is highly imbalanced, so I need to set the scale_pos_weight value of XGBoost to balance it. (A sketch of what I am trying to do is below.)
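
For context, this is roughly what I am trying to do (a sketch assuming fit() forwards keyword arguments to the underlying model, as the randomized-search example in the README suggests; column names are placeholders):

    from pylift import TransformedOutcome

    up = TransformedOutcome(df, col_treatment='Treatment', col_outcome='Outcome')
    # Hypothetical: pass the XGBoost hyperparameter through fit().
    up.fit(scale_pos_weight=50)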
The second question is that I don't understand the difference between your plots of uplift gain and lift.
Can you please briefly explain here?

Thanks in advance

Future issues and PRs

Should all future issues and PRs be posted to this fork instead of wayfair/pylift?

NIV Empty

Hello,

I've set up my model via TransformedOutcome, then wanted to check all the NIV features. However, when I use NIV (dict or plot), all the included features are empty. The rest of the steps work and I get plots further down the line, but the NIV being empty leads me to believe they are incorrect.

Setting up the model
up = TransformedOutcome(df, col_treatment='Response', col_outcome='TotalRevenueFoodItems',random_state=4, stratify=y)

Call NIV
up.NIV_dict

Output of NIV: see attached screenshot (Screen Shot 2019-10-28 at 3.06.23 PM).

Thank you.

Possible issue with _get_counts and _get_tc_counts

Given 100K (N=1e5) samples with the following distribution:
Treatment = 98% (W=1)
Control = 2% (W=0)
Hence p = 0.98

The samples were balanced for response such that, for response (Y=1), the samples are split 1% (W=0, Y=1) Control vs 49% (W=1, Y=1) Treatment. Similarly, for no response (Y=0), the samples are split 1% (W=0, Y=0) Control vs 49% (W=1, Y=0) Treatment.

Based on this I would expect the two functions _get_counts and _get_tc_counts in pylift.eval to return the following values.
Nt1o1 = 49K, Nt0o1 = 1K, Nt1o0 = 49K, Nt0o0 = 1K
Nt1 = 98K, Nt0 = 2K, N = 1e5

However, the functions are returning the following values instead

Nt1o1 = 25K, Nt0o1 = 25K, Nt1o0 = 25K, Nt0o0 = 25K
Nt1 = 50K, Nt0 = 50K, N = 1e5

Could it be that the implemented logic, which is based on summing 1/p and 1/(1-p) values, needs to be modified?
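
To illustrate (a hypothetical reconstruction of the setup above, comparing plain counts with what inverse-propensity reweighting produces; the equal reweighted contributions match the symmetry in the returned values):

    import numpy as np

    # 98K treated (49K responders), 2K control (1K responders).
    treatment = np.r_[np.ones(98_000), np.zeros(2_000)]
    outcome = np.r_[np.ones(49_000), np.zeros(49_000),
                    np.ones(1_000), np.zeros(1_000)]
    p = 0.98  # treatment fraction

    # Plain counts (what I expected):
    Nt1o1 = int(((treatment == 1) & (outcome == 1)).sum())  # 49,000
    Nt0o1 = int(((treatment == 0) & (outcome == 1)).sum())  # 1,000

    # Summing 1/p over treated and 1/(1-p) over control instead
    # scales each group up to the full population:
    Nt1o1_w = Nt1o1 / p        # 50,000
    Nt0o1_w = Nt0o1 / (1 - p)  # 50,000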

Uplift and Cross Validation

Should stratified cross-validation based on the Treatment vs Outcome 2x2 matrix split be used when performing a grid search, to ensure that each fold follows the same distribution as the overall data (a sketch is below)? If not, and cross-validation is used for hyperparameter search, should we expect that the scores used for evaluating each fold (the qini coefficient, for example) actually represent the qini coefficient of the overall training dataset? Put differently, an increase in the number of folds may affect the stability of the uplift score in each fold.
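
For example, stratifying on the joint treatment/outcome label would look something like this (a minimal sketch with made-up data, independent of pylift):

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    rng = np.random.default_rng(0)
    treatment = rng.binomial(1, 0.5, 1000)
    outcome = rng.binomial(1, 0.1, 1000)
    # One label per 2x2 cell: (W, Y) -> {0, 1, 2, 3}
    joint = 2 * treatment + outcome

    skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(np.zeros(len(joint)), joint):
        pass  # each fold preserves the treatment/outcome cell proportions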

Another closely related question: given the following two cross-validation outputs, which one should we prefer? A higher mean score across folds, or a uniform score across folds?

Number of cross folds: 4

Split_1_Score | Split_2_Score | Split_3_Score | Split_4_Score | Mean_Score
0.4           | 0.9           | -0.2          | -0.3          | 0.2

Number of cross folds: 2

Split_1_Score | Split_2_Score | Mean_Score
0.12          | 0.16          | 0.14

Scoring and evaluation for continuous outcome

Q1)
Given that, for a continuous outcome, the theoretical max (i.e., q1_) and practical max (i.e., q2_) curves are not well defined and will not be correct, only the following six metrics can be used to evaluate the model (see the sketch after the list). Is this correct?

  1. Q_cgains
  2. Q_aqini
  3. Q_qini
  4. max_cgains
  5. max_aqini
  6. max_qini
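
These are read off the evaluation results after fitting, e.g. (assuming the usual pylift workflow; the test_results_ attribute names follow the docs):

    up.fit()
    print(up.test_results_.Q_aqini)   # normalized aqini score on the test set
    print(up.test_results_.Q_cgains)  # normalized cumulative gains score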

Q2)
Based on line 205,

    score_name = 'q1_' + method

and the _score function in base.py:

    def _score(self, y_true, y_pred, method, plot_type, score_name):
        """Scoring function to be passed to make_scorer."""
        treatment_true, outcome_true, p = self.untransform(y_true)
        scores = get_scores(treatment_true, outcome_true, y_pred, p,
                            scoring_range=(0, self.scoring_cutoff[method]),
                            plot_type=plot_type)
        return scores[score_name]

three of the scoring methods that can be used for grid search ('q1_qini', 'q1_cgains', 'q1_aqini') should not be used with continuous outcomes. If this is indeed the case, then I would suggest fixing this using the continuous_outcome argument that is already available, perhaps by substituting the corresponding 'Q_' scores for continuous outcomes.
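
Something like the following is what I have in mind (a hypothetical patch sketch; the exact name of the flag stored from the continuous_outcome argument is assumed):

    # Hypothetical: fall back to the normalized 'Q_' scores when the
    # outcome is continuous, since the 'q1_' scores depend on the
    # theoretical max curve, which is not defined in that case.
    prefix = 'Q_' if self.continuous_outcome else 'q1_'
    score_name = prefix + method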
