
Comments (11)

detrin commented on June 10, 2024

@ReinierKoops Meanwhile, I decided to pursue a career opportunity at Similarweb instead of Home Credit, so I no longer need this particular PR at work. I might still use it outside of work, and it will be good practice anyway. I will try to allocate some time for this PR over the next two weeks.

from probatus.

detrin commented on June 10, 2024

@ReinierKoops I didn't know there was a subclass for that. I will try it.


ReinierKoops commented on June 10, 2024

I think what you are proposing is an improvement for sure. However, this would be a breaking change, so I’d have to discuss it. Thank you for showing so much interest :)


ReinierKoops commented on June 10, 2024

Hi @detrin, I believe that is what the subclass EarlyStoppingShapRFECV tries to do. Does that answer your question?


detrin commented on June 10, 2024

Maybe a cleaner design would be to have one class, ShapRFECV, that does not depend on the type of clf, or else to have a child class for LGBMClassifier. Having a separate ShapRFECV class just for early stopping seems a bit like overkill to me.

As for your suggestion: yes, it works! So I guess this issue could be closed. However, it doesn't work the way I would expect it to. Here is my example:

import lightgbm
from sklearn.model_selection import RandomizedSearchCV
from probatus.feature_elimination import ShapRFECV, EarlyStoppingShapRFECV

params = {
    # ...
    "n_estimators": 1000,
    "seed": 1234,
}

param_grid = {
    "num_leaves": [25, 50, 100, 150, 200],
}

clf = lightgbm.LGBMClassifier(**params, class_weight="balanced")
search = RandomizedSearchCV(clf, param_grid)
shap_elimination = EarlyStoppingShapRFECV(
    clf=clf, step=0.2, cv=4, scoring="roc_auc", early_stopping_rounds=10, n_jobs=6
)
report = shap_elimination.fit_compute(
    data[train_mask][cols],
    data[train_mask][col_target],
)

So, with clf set to an LGBMClassifier, it works:

shap_elimination = EarlyStoppingShapRFECV(clf=clf, step=0.2, cv=4, scoring='roc_auc', early_stopping_rounds=10, n_jobs=6)

However, with clf set to a RandomizedSearchCV, it doesn't work:

shap_elimination = EarlyStoppingShapRFECV(clf=search, step=0.2, cv=4, scoring='roc_auc', early_stopping_rounds=10, n_jobs=6)

and I think there is no reason for it not to be possible. You may say that such a workflow is overkill, but I would say it is not when you start with a lot of features. Hyperparameter optimization is then needed so that poorly chosen parameters do not distort the Shapley values.

I have been thinking about what the cleanest solution would be, and I have the following idea.
See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html
RandomizedSearchCV.fit() accepts fit parameters (fit_params), and I think ShapRFECV.fit() could work the same way.

With fit_params in ShapRFECV.fit(), we could just pass fit_params to RandomizedSearchCV, which would in turn pass them to LGBMClassifier!

So something like the following:

clf = lightgbm.LGBMClassifier(**params, class_weight="balanced")
search = RandomizedSearchCV(clf, param_grid)
shap_elimination = ShapRFECV(
    clf=search, step=0.2, cv=4, scoring="roc_auc", early_stopping_rounds=10, n_jobs=6
)

fit_params = {
    "fit_params": { 
        "fit_params": {
            "eval_set": [(data[valid_mask][cols], data[valid_mask][col_target])],
        } # for LGBMClassifier
    } # for RandomizedSearchCV
}
report = shap_elimination.fit(
    data[train_mask][cols],
    data[train_mask][col_target],
    fit_params=fit_params
)
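The forwarding idea above can be illustrated with a minimal, library-free sketch. All three classes below are hypothetical stand-ins (for LGBMClassifier, RandomizedSearchCV and ShapRFECV respectively), not the real APIs, and the sketch uses flat keyword arguments instead of the nested dictionaries shown above, which is only one possible design.

```python
class RecordingModel:
    """Hypothetical stand-in for an estimator such as LGBMClassifier."""

    def __init__(self):
        self.seen_fit_params = None

    def fit(self, X, y, **fit_params):
        # A real estimator would use e.g. eval_set for early stopping;
        # here we only record what arrived.
        self.seen_fit_params = fit_params
        return self


class ForwardingSearch:
    """Hypothetical stand-in for a search wrapper such as RandomizedSearchCV."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, **fit_params):
        # Pass everything through untouched to the wrapped estimator.
        self.estimator.fit(X, y, **fit_params)
        return self


class ForwardingEliminator:
    """Hypothetical stand-in for a feature-elimination class such as ShapRFECV."""

    def __init__(self, clf):
        self.clf = clf

    def fit(self, X, y, **fit_params):
        # No knowledge of the concrete model type is needed here.
        self.clf.fit(X, y, **fit_params)
        return self


# Toy data, purely illustrative.
X_train, y_train = [[0.0], [1.0]], [0, 1]
X_valid, y_valid = [[0.5]], [1]

model = RecordingModel()
eliminator = ForwardingEliminator(ForwardingSearch(model))
eliminator.fit(X_train, y_train, eval_set=[(X_valid, y_valid)])
```

Because each layer only forwards what it receives, the outer class would not need a dedicated early-stopping subclass under this design.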

I see something similar done in


So what is the reason for having a special class EarlyStoppingShapRFECV, and what do you think of this proposal?


detrin commented on June 10, 2024

@ReinierKoops Could you have a look at this please?


ReinierKoops commented on June 10, 2024

I’ll get back to you on it on Monday.


detrin commented on June 10, 2024

Thanks. Regarding the breaking change, there could be an optional argument in ShapRFECV that switches between the new and the old behaviour. Alternatively, the new ShapRFECV could be imported from a different path or under a different name. The old version would carry a deprecation flag, and it would be up to you and your team to decide when to remove it. Big repositories make such breaking changes in major releases, such as pandas==2.0.0 and lightgbm==4.0.0.
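A rough sketch of the optional-argument approach, using only the standard library. The class name, the `use_fit_params` flag and the warning text are all hypothetical and are not part of the probatus API; `EchoModel` exists only to make the sketch runnable.

```python
import warnings


class EchoModel:
    """Hypothetical model that records the fit parameters it receives."""

    def __init__(self):
        self.seen_fit_params = None

    def fit(self, X, y, **fit_params):
        self.seen_fit_params = fit_params
        return self


class SwitchableEliminator:
    """Hypothetical class showing an opt-in switch plus a deprecation warning."""

    def __init__(self, clf, use_fit_params=False):
        if not use_fit_params:
            # The old behaviour stays the default but announces its removal.
            warnings.warn(
                "The old behaviour is deprecated; pass use_fit_params=True.",
                DeprecationWarning,
                stacklevel=2,
            )
        self.clf = clf
        self.use_fit_params = use_fit_params

    def fit(self, X, y, **fit_params):
        if self.use_fit_params:
            # New behaviour: forward fit parameters to the model.
            self.clf.fit(X, y, **fit_params)
        else:
            # Old behaviour: fit parameters are ignored.
            self.clf.fit(X, y)
        return self


# Opting in to the new behaviour emits no warning.
new_style = SwitchableEliminator(clf=EchoModel(), use_fit_params=True)
new_style.fit([[0.0], [1.0]], [0, 1], eval_set=[([[0.5]], [1])])
```

Existing callers keep working unchanged and only see a DeprecationWarning, while the default could be flipped in a later major release.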


ReinierKoops commented on June 10, 2024

If you want to take a stab at it, I’ll review it and merge it when done (provided it is backwards compatible or comes with a deprecation warning).


detrin commented on June 10, 2024

> If you want to take a stab at it, I’ll review it and merge it when done (if made backwards compatible/deprecation warning)

Sure, this could be fun, plus it will have real-world use :)


ReinierKoops commented on June 10, 2024

@detrin is this still something you'd consider picking up?

