Comments (13)
In my case, I had to manually create an Explanation object with the correct shape:
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()
rfc.fit(train_x, train_y)
# test_x is a np.ndarray with shape (2936911, 41)
df_tmp = pd.DataFrame(test_x, columns=feature_names)
masker = shap.maskers.Independent(df_tmp, max_samples=1000)
explainer = shap.TreeExplainer(rfc, masker)
# explain a 10,000-row sample and keep it, so the data rows match the SHAP values
sample = df_tmp.sample(10000, random_state=69)
shap_values = explainer(sample)
# shap_values.values.shape == (10000, 41, 2)
# shap_values[1].values.shape == (41, 2)
# take all observations and all features, but only the positive class (index 1)
# tmp.values.shape == tmp.data.shape == (10000, 41)
tmp = shap.Explanation(shap_values[:, :, 1], data=sample, feature_names=feature_names)
# now both plots work
shap.plots.beeswarm(tmp)
# shap.summary_plot(tmp)
Plot without feature names (withheld because of an NDA):
The solutions above did not work:
shap.plots.beeswarm(shap_values)
>>> ValueError: "The beeswarm plot does not support plotting explanations with instances that have more than one dimension!"
shap.plots.beeswarm(shap_values[1])
>>> IndexError: "tuple index out of range"
Strangely, some time ago shap.summary_plot(shap_values[1], tmp) worked for another RandomForest model and different data.
PS: shap 0.39.0, numpy 1.20.3
from shap.
Hi @slundberg,
Thank you for the thread!
I was reading about plotting shap.summary_plot(shap_values, X) for random forest and XGB binary classifiers, where shap_values = shap.TreeExplainer(clf).shap_values(X).
The interesting thing is that for the XGB classifier, the shap_values returned by the calculation can be passed to the summary plot as is, whereas for the random forest you have to pass shap_values[1], i.e. only the array for the positive label. I would like to know why there is this discrepancy. Thank you so much!
Below are the example implementations for the random forest and XGB classifiers.
RF: https://medium.com/python-in-plain-english/random-forest-classifier-and-shap-how-to-understand-your-customers-and-interpret-a-black-box-model-6166d86820d9
XGB: https://github.com/slundberg/shap/blob/master/notebooks/tree_explainer/Census%20income%20classification%20with%20XGBoost.ipynb
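A likely explanation for the discrepancy (general background, not a reply from the maintainers): a binary XGBoost classifier has a single raw output, the log-odds margin, so shap_values(X) is one array of shape (n_samples, n_features). A scikit-learn random forest outputs one probability per class, so shap returns one set of values per class: a list of two arrays in older releases, or a single array of shape (n_samples, n_features, 2) in more recent ones. The helper below is a hypothetical sketch (positive_class_values is not part of shap) that normalizes all three layouts to one 2-D array with NumPy only. The arrays are synthetic stand-ins for real SHAP output.

```python
import numpy as np


def positive_class_values(shap_values, class_index=1):
    """Return a (n_samples, n_features) array for a single class.

    Hypothetical helper, not part of shap. It accepts:
      * a list with one array per class (older random forest output),
      * a 3-D array (n_samples, n_features, n_classes),
      * a 2-D array (single-output model such as binary XGBoost).
    """
    if isinstance(shap_values, list):
        return np.asarray(shap_values[class_index])
    values = np.asarray(shap_values)
    if values.ndim == 3:
        return values[:, :, class_index]
    return values  # already one value per sample and feature


# Synthetic stand-ins for the three layouts.
n_samples, n_features = 5, 3
xgb_like = np.zeros((n_samples, n_features))
rf_list_like = [np.zeros((n_samples, n_features)), np.ones((n_samples, n_features))]
rf_array_like = np.stack(rf_list_like, axis=-1)

for candidate in (xgb_like, rf_list_like, rf_array_like):
    print(positive_class_values(candidate).shape)
```

Each call yields a (5, 3) array, which is the shape shap.summary_plot expects alongside X.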
from shap.
When I use shap.TreeExplainer to explain a RandomForestRegressor, it is very slow, but it is very fast when I use shap.TreeExplainer to explain an XGBRegressor. Does anyone have the same issue or know the reason? Thanks!
from shap.
XGBoost does do bagging, and has parameters that can make it very similar to a random forest (using the DART parameters). So there is no reason TreeSHAP couldn't apply to any tree model. However, it scales quadratically with tree depth, so it will run slower on random forest models, since their trees tend to be very deep.
Making a general-purpose tree library in Python or R is challenging: the algorithm involves loops and recursion, which are slow in a typical interpreted language, so a high-performance language such as C++ is needed to get good results.
Hope that helps!
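To make the quadratic scaling mentioned above concrete: the TreeSHAP paper gives a per-sample cost of O(T · L · D²) for T trees with at most L leaves and maximum depth D. The sketch below only evaluates that expression for two hypothetical model shapes. The numbers are illustrative assumptions, not measurements, so it shows relative growth rather than real runtimes.

```python
def treeshap_relative_cost(n_trees, max_leaves, max_depth):
    """Evaluate T * L * D**2, the asymptotic per-sample cost of TreeSHAP.

    The result is a unitless count used only to compare model shapes.
    """
    return n_trees * max_leaves * max_depth ** 2


# Hypothetical shapes: shallow boosted trees vs. deep, fully grown forest trees.
boosted = treeshap_relative_cost(n_trees=100, max_leaves=2 ** 6, max_depth=6)
forest = treeshap_relative_cost(n_trees=100, max_leaves=5000, max_depth=20)

print(f"boosted: {boosted}, forest: {forest}, ratio: {forest / boosted:.0f}x")
```

Under these assumptions, capping max_depth (or explaining fewer rows) is the most direct way to cut the cost for a random forest.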
from shap.
Very cool, I'd love to see TreeSHAP in action on a random forest. And I'm sure you're right that it would be super slow to do general purpose tree building in Python or R directly. I was thinking more about using those languages as user-friendly frontends to fast lower level implementations, as they do with gbm, ranger, etc. I suspect the algorithm will eventually be incorporated into other packages for tree-based modeling the way it was with XGBoost. Looking forward to that!
from shap.
Hi,
Nice work! Is it possible to provide an example on how to use the SHAP package with a random forest model? That would be much appreciated!
thanks
Pieter
from shap.
Hello everybody,
I have an issue with a random forest regressor here. The same code that works for an XGB model raises the following error (see below). Can anybody explain why this isn't working with sklearn's RandomForestRegressor?
Error: 'i' format requires -2147483648 <= number <= 2147483647
although I'm not using 'i' anywhere in the code
thanks and br
christoph
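For context on where the 'i' comes from: the message is produced by Python's struct module when an integer does not fit into a signed 32-bit field, not by anything in user code. Libraries that serialize data between processes can trigger it when a payload length reaches 2 GiB (older versions of Python's multiprocessing packed message lengths this way). Whether that is what happens in this case is an assumption, since the thread does not confirm it. The snippet below only reproduces the message.

```python
import struct

try:
    # 2**31 is one past the largest signed 32-bit integer (2147483647).
    struct.pack("!i", 2 ** 31)
    message = None
except struct.error as exc:
    message = str(exc)

print(message)
```

If this is the cause, explaining a smaller batch of rows at a time would be one thing to try.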
from shap.
Dear Scott,
thanks for your fast answer.
Please see the two code snippets below. The first shows the XGB case, which works fine. The second shows the RandomForestRegressor (sklearn) case, which gives the error above.
XGB:
import shap
explainer = shap.TreeExplainer(model)
shap.initjs()
shap_values = explainer.shap_values(PredData, approximate=True)
model: <xgboost.core.Booster at 0x7f641316cdd8>
RF:
import shap
explainer = shap.TreeExplainer(modelrf)
shap.initjs()
shap_values = explainer.shap_values(PredData, approximate=True)
modelrf: RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=35,
max_features=20, max_leaf_nodes=None, min_impurity_decrease=0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=5, min_weight_fraction_leaf=0.0,
n_estimators=750, n_jobs=4, oob_score=False,
random_state=201810, verbose=1, warm_start=False)
In case you need more information, just tell me.
thanks and br
christoph
from shap.
@christophgm this runs fine when I use another dataset, can you provide a full example?
from shap.
Hi
When I apply shap to a random forest regressor, I get an error like:
'RandomForestRegressor' object has no attribute 'estimators_'
Can anybody help me with this one?
from shap.
Hi
When I apply shap to a random forest regressor, I get an error like:
'RandomForestRegressor' object has no attribute 'estimators_'
Can anybody help me with this one?
I think this means that you need to fit your RandomForestRegressor first. This is a common error if you create your model and run it through a grid search or a cross-validation: the model is passed as an argument and fitted "internally" on clones in those functions, so the object you created stays unfitted.
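The point above can be checked with scikit-learn alone. This is a minimal sketch on synthetic data (the data, parameter grid, and variable names are assumptions for illustration): after a grid search, the estimator that was passed in still has no estimators_ attribute, while the refitted best_estimator_ does.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Small synthetic regression problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=60)

model = RandomForestRegressor(n_estimators=5, random_state=0)
search = GridSearchCV(model, {"max_depth": [2, 3]}, cv=3)
search.fit(X, y)  # fits clones of `model`, never `model` itself

original_is_fitted = hasattr(model, "estimators_")
best_is_fitted = hasattr(search.best_estimator_, "estimators_")
print(original_is_fitted, best_is_fitted)

# The fitted object to hand to shap would therefore be
# search.best_estimator_ (or `model` after calling model.fit(X, y)).
```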
from shap.
Hi, I would like to know: in a random forest, if a feature value is outside the historical (training) range, what will the SHAP value for that feature be? For example, will it have zero impact on the force plot, or a larger positive value if it exceeds the upper limit of the historical data? SOS!
Note: in my output the SHAP value is +0.25, close to zero; however, I was expecting it to be on the higher side. Am I missing something?
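A general property of tree ensembles may explain the small value: splits only compare a feature with thresholds learned from the training data, so a value far beyond the training range follows the same branches as the largest training value. The sketch below checks this for predictions on synthetic data, using scikit-learn only (the data and model settings are assumptions for illustration). Since TreeSHAP attributions are computed from those same split decisions, they should not grow beyond the edge of the training range either.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: the target grows linearly with feature 0 on [0, 10].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(200, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# Same second feature; first feature at the training maximum vs. far beyond it.
at_edge = np.array([[X[:, 0].max(), 5.0]])
beyond = np.array([[1000.0, 5.0]])

pred_edge = forest.predict(at_edge)[0]
pred_beyond = forest.predict(beyond)[0]
print(pred_edge, pred_beyond)
```

The two predictions coincide because no split threshold lies above the largest training value, so the forest cannot extrapolate the linear trend.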
from shap.