Comments (11)
@ReinierKoops In the meantime I decided to pursue a career opportunity at Similarweb instead of Home Credit, so I no longer need this particular PR for my work. I might still use it outside of work, and besides, it will be good practice. I will try to allocate some time for this PR over the next two weeks.
from probatus.
@ReinierKoops I didn't know there was a subclass for that; I will try it.
from probatus.
I think what you are proposing is an improvement for sure. However, this would be a breaking change, so I’d have to discuss it. Thank you for showing so much interest :)
from probatus.
Hi @detrin, I believe that is what the subclass EarlyStoppingShapRFECV tries to do. Does that solve your question?
from probatus.
Maybe a cleaner design would be to have one class, ShapRFECV, that would not depend on the type of clf. Or else have child classes for LGBMClassifier. Having a ShapRFECV especially for early stopping seems to me a bit like overkill.
To your suggestion: yes, it is working! So I guess this issue could be closed; however, it doesn't work the way I would expect it to. Here is my example:
import lightgbm
from sklearn.model_selection import RandomizedSearchCV
from probatus.feature_elimination import ShapRFECV, EarlyStoppingShapRFECV

params = {
    # ...
    "n_estimators": 1000,
    "seed": 1234,
}
param_grid = {
    "num_leaves": [25, 50, 100, 150, 200],
}
clf = lightgbm.LGBMClassifier(**params, class_weight="balanced")
search = RandomizedSearchCV(clf, param_grid)
shap_elimination = EarlyStoppingShapRFECV(
    clf=clf, step=0.2, cv=4, scoring="roc_auc", early_stopping_rounds=10, n_jobs=6
)
report = shap_elimination.fit_compute(
    data[train_mask][cols],
    data[train_mask][col_target],
)
So, when using clf as LGBMClassifier, it works:
shap_elimination = EarlyStoppingShapRFECV(clf=clf, step=0.2, cv=4, scoring='roc_auc', early_stopping_rounds=10, n_jobs=6)
However, with clf as RandomizedSearchCV, it doesn't work:
shap_elimination = EarlyStoppingShapRFECV(clf=search, step=0.2, cv=4, scoring='roc_auc', early_stopping_rounds=10, n_jobs=6)
and I think there is no reason for it not to be possible. You may say that such a workflow is overkill, but I would say not really when you start with a lot of features. Hyperparameter optimization is then needed so that a poor choice of parameters does not distort the Shapley values.
I have been thinking about what the cleanest solution would be, and I have the following proposal.
See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html
RandomizedSearchCV.fit() accepts fit_params, and I think ShapRFECV.fit() could work the same way. With fit_params in ShapRFECV.fit(), we can just pass fit_params to RandomizedSearchCV, which would in turn pass them on to LGBMClassifier!
So something like the following:
clf = lightgbm.LGBMClassifier(**params, class_weight="balanced")
search = RandomizedSearchCV(clf, param_grid)
shap_elimination = ShapRFECV(
    clf=clf, step=0.2, cv=4, scoring="roc_auc", early_stopping_rounds=10, n_jobs=6
)
fit_params = {
    "fit_params": {
        "fit_params": {
            "eval_set": [(data[valid_mask][cols], data[valid_mask][col_target])],
        }  # for LGBMClassifier
    }  # for RandomizedSearchCV
}
report = shap_elimination.fit(
    data[train_mask][cols],
    data[train_mask][col_target],
    fit_params=fit_params
)
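To make the pass-through idea concrete without depending on any library, here is a minimal sketch. All three classes (ToyEstimator, ToySearch, ToyEliminator) are hypothetical stand-ins invented for illustration; they are not probatus, scikit-learn, or LightGBM classes, and the sketch uses flat keyword arguments rather than the nested dictionary above.

```python
class ToyEstimator:
    """Hypothetical stand-in for a model whose fit() accepts an eval_set."""

    def fit(self, X, y, eval_set=None):
        # Record what arrived so the forwarding can be inspected afterwards.
        self.received_eval_set_ = eval_set
        return self


class ToySearch:
    """Hypothetical stand-in for a search wrapper that forwards fit arguments."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, **fit_params):
        # A real search would loop over candidates; this sketch fits only once.
        self.best_estimator_ = self.estimator.fit(X, y, **fit_params)
        return self


class ToyEliminator:
    """Hypothetical stand-in for an elimination class that ignores clf's type."""

    def __init__(self, clf):
        self.clf = clf

    def fit(self, X, y, **fit_params):
        # No isinstance checks: whatever clf is, it gets the same keyword arguments.
        self.clf.fit(X, y, **fit_params)
        return self


X_train, y_train = [[0.0], [1.0]], [0, 1]
X_valid, y_valid = [[0.5]], [1]
eval_set = [(X_valid, y_valid)]

# Wrapped in a search object: the eval_set travels through two layers.
model = ToyEstimator()
ToyEliminator(ToySearch(model)).fit(X_train, y_train, eval_set=eval_set)

# Passed directly: the same call works without any special subclass.
direct = ToyEstimator()
ToyEliminator(direct).fit(X_train, y_train, eval_set=eval_set)
```

In both cases the innermost estimator ends up with the same eval_set, which is the property the proposal relies on. Whether the real classes can forward arguments this simply is an open question for the actual PR.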
I see something similar done in
So what is the reason to have the special class EarlyStoppingShapRFECV, and what do you think of this proposal?
from probatus.
@ReinierKoops Could you have a look at this please?
from probatus.
I’ll get back to you on it on Monday.
from probatus.
Thanks. Regarding the breaking change, there could be an optional argument in ShapRFECV that would switch between the new and old behaviour. Alternatively, there could be a different path or a different name for importing ShapRFECV. The rest would have a deprecation flag, and it would be up to you and your team when it would be deprecated. Big repositories make those breaking changes in major releases such as pandas==2.0.0 and lightgbm==4.0.0.
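As a rough sketch of the backwards-compatible route, assuming the old constructor argument is kept but flagged: the class EliminatorSketch and its resolve_fit_params method are hypothetical names made up for this illustration and are not part of probatus.

```python
import warnings


class EliminatorSketch:
    """Hypothetical class showing one way to deprecate a constructor argument."""

    def __init__(self, clf, early_stopping_rounds=None):
        self.clf = clf
        self._legacy_fit_params = {}
        if early_stopping_rounds is not None:
            # Old behaviour still works, but callers are told to migrate.
            warnings.warn(
                "Passing early_stopping_rounds to the constructor is deprecated; "
                "pass it to fit() instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            self._legacy_fit_params["early_stopping_rounds"] = early_stopping_rounds

    def resolve_fit_params(self, **fit_params):
        # Explicit fit-time arguments take precedence over legacy constructor ones.
        return {**self._legacy_fit_params, **fit_params}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = EliminatorSketch(clf=None, early_stopping_rounds=10)  # warns
    new_style = EliminatorSketch(clf=None)  # silent

old_params = old_style.resolve_fit_params()
new_params = new_style.resolve_fit_params(early_stopping_rounds=10)
```

Both styles resolve to the same fit parameters, so existing code keeps working while the warning points users at the new call style until a major release removes the old one.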
from probatus.
If you want to take a stab at it, I’ll review it and merge it when done (if made backwards compatible/deprecation warning)
from probatus.
If you want to take a stab at it, I’ll review it and merge it when done (if made backwards compatible/deprecation warning)
Sure, this could be fun, plus it will have real-world use :)
from probatus.
@detrin is this still something you'd consider picking up?
from probatus.