Hi, Tree SHAP seems to work great on boosted tree models like XGBoost…


Tree SHAP for random forests? about shap HOT 13 OPEN

slundberg commented on May 5, 2024

Tree SHAP for random forests?


Comments (13)

banderlog commented on May 5, 2024

In my case, I had to manually create an Explanation object with the correct shape:

```python
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()
rfc.fit(train_x, train_y)

# test_x is a np.ndarray with (2936911, 41) shape
df_tmp = pd.DataFrame(test_x, columns=feature_names)

masker = shap.maskers.Independent(df_tmp, max_samples=1000)
explainer = shap.TreeExplainer(rfc, masker)

# shap_values.values.shape == (10000, 41, 2)
# shap_values[1].values.shape == (41, 2)
df_sample = df_tmp.sample(10000, random_state=69)
shap_values = explainer(df_sample)

# tmp.values.shape == tmp.data.shape == (10000, 41)
# I took all observations for all features, just for the positive prediction result.
# data must be the same 10000 sampled rows that were explained, not all of df_tmp.
tmp = shap.Explanation(shap_values.values[:, :, 1], data=df_sample.to_numpy(), feature_names=feature_names)

# now both plots work
shap.plots.beeswarm(tmp)
# shap.summary_plot(tmp)
```

[Plot: beeswarm output without feature names (NDA and similar stuff)]

The solutions above did not work:

```python
shap.plots.beeswarm(shap_values)
# ValueError: The beeswarm plot does not support plotting explanations with instances that have more than one dimension!

shap.plots.beeswarm(shap_values[1])
# IndexError: tuple index out of range
```

The strange thing is that some time ago shap.summary_plot(shap_values[1], tmp) worked for another RandomForest model with different data.

PS: shap 0.39.0, numpy 1.20.3
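The shape comments in the code above hint at what is going on. In the Explanation-based API (explainer(X)), the first index selects samples, not classes, so shap_values[1] is a single sample of shape (41, 2) rather than "the positive class", which is likely why it cannot be plotted as a set of samples. In the older shap_values(X) API, a random forest classifier returned a list with one array per class, so index 1 did pick the positive class; that would explain why shap.summary_plot(shap_values[1], ...) used to work. A small numpy sketch with a hypothetical zero-filled array of the same shape:

```python
import numpy as np

# Stand-in for the SHAP values of a two-class model in the Explanation API:
# axis 0 = samples, axis 1 = features, axis 2 = classes (zeros, for shape only).
values = np.zeros((10000, 41, 2))

print(values[1].shape)        # (41, 2): the second *sample*, both classes
print(values[:, :, 1].shape)  # (10000, 41): every sample, positive class only

# The older list-style output: one (n_samples, n_features) array per class,
# so index 1 selected the positive class instead of a sample.
old_style = [values[:, :, 0], values[:, :, 1]]
print(old_style[1].shape)     # (10000, 41)
```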


LiWangSH commented on May 5, 2024

Hi @slundberg,

Thank you for the thread!

I was reading about calling shap.summary_plot(shap_values, X) for random forest and XGB binary classifiers, where shap_values = shap.TreeExplainer(clf).shap_values(X).

The interesting thing is that for the XGB classifier, shap_values is passed to the summary plot exactly as computed, whereas for the random forest it needs to be shap_values[1], i.e. only the array for the positive class. I am interested in knowing why there is a discrepancy. Thank you so much!
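A hedged reading of the discrepancy: for a scikit-learn random forest classifier, TreeExplainer explains the predicted class probabilities, one output per class, so shap_values(X) holds one set of values per class (a list of arrays in older shap releases, a single array with a trailing class axis in newer ones). For an XGBoost binary classifier it explains a single log-odds (margin) output, so there is just one (n_samples, n_features) array. The helper below, positive_class_values, is hypothetical and not part of shap; it is a sketch that normalizes all three shapes before plotting:

```python
import numpy as np


def positive_class_values(shap_values, positive_index=1):
    """Hypothetical helper: return an (n_samples, n_features) array for one class."""
    if isinstance(shap_values, list):  # list style: one array per class
        return np.asarray(shap_values[positive_index])
    arr = np.asarray(shap_values)
    if arr.ndim == 3:  # (n_samples, n_features, n_classes)
        return arr[:, :, positive_index]
    if arr.ndim == 2:  # single-output model, e.g. XGBoost binary log-odds
        return arr
    raise ValueError(f"unexpected SHAP values shape: {arr.shape}")


# Usage with made-up arrays of the three shapes:
rng = np.random.default_rng(0)
per_class = rng.normal(size=(5, 3, 2))
list_style = [per_class[:, :, 0], per_class[:, :, 1]]
single_output = rng.normal(size=(5, 3))

print(positive_class_values(list_style).shape)     # (5, 3)
print(positive_class_values(per_class).shape)      # (5, 3)
print(positive_class_values(single_output).shape)  # (5, 3)
```

The normalized array could then be plotted the same way for both models, e.g. shap.summary_plot(positive_class_values(shap_values), X).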

Below, I included the example implementations for the random forest and XGB classifiers.
RF: https://medium.com/python-in-plain-english/random-forest-classifier-and-shap-how-to-understand-your-customers-and-interpret-a-black-box-model-6166d86820d9

XGB: https://github.com/slundberg/shap/blob/master/notebooks/tree_explainer/Census%20income%20classification%20with%20XGBoost.ipynb


mxshen commented on May 5, 2024

When I use shap.TreeExplainer to explain a RandomForestRegressor, it is very slow, but it is very fast when explaining an XGBRegressor. Does anyone have the same issue or know the reason? Thanks!


slundberg commented on May 5, 2024

XGBoost does do bagging, and has parameters that can make it very similar to a random forest (using the DART parameters). So there is no reason TreeSHAP couldn't apply to any tree model, however it does scale quadratically with tree depth, so it would run a bit slower with random forest models since they tend to be really deep.
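The Tree SHAP paper (Lundberg, Erion and Lee) bounds the cost of explaining one sample by O(T * L * D^2) for T trees, at most L leaves per tree, and maximum depth D. A back-of-the-envelope sketch with hypothetical model sizes shows why deep forests are so much slower than shallow boosted ensembles. The 5,000 leaves are made up; 750 trees and depth 35 echo the RandomForestRegressor posted later in this thread, and the 100 depth-6 trees are an assumed boosted model. Constants are ignored, so only the ratio is meaningful:

```python
def tree_shap_cost(n_trees: int, max_leaves: int, max_depth: int) -> int:
    """Rough per-sample operation count from the O(T * L * D^2) bound."""
    return n_trees * max_leaves * max_depth ** 2


# Hypothetical shallow boosted ensemble: 100 trees of depth 6 (at most 64 leaves).
xgb_like = tree_shap_cost(n_trees=100, max_leaves=2 ** 6, max_depth=6)
# Hypothetical deep forest: 750 trees of depth 35 with ~5,000 leaves each.
rf_like = tree_shap_cost(n_trees=750, max_leaves=5_000, max_depth=35)

print(xgb_like)             # 230400
print(rf_like)              # 4593750000
print(rf_like / xgb_like)   # roughly 20,000x more work per explained sample
```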

Making a general-purpose tree library in pure Python or R is challenging: the algorithm involves loops and recursion, which are slow in a typical interpreted language, so it is important to use a high-performance language such as C++ to get good results.
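To make the loops-and-recursion point concrete, here is a minimal pure-Python sketch. It is not shap's code and not the polynomial-time Tree SHAP algorithm. exp_value recursively estimates E[f(x) | x_S] on a hypothetical hand-built tree (follow x on features in S, cover-weight both branches otherwise), and brute_force_shap loops over every feature coalition to turn those expectations into exact Shapley values. These are the same path-dependent values Tree SHAP targets, but this version needs O(2^M) tree traversals for M features, which is the blow-up a fast compiled implementation avoids:

```python
from itertools import combinations
from math import factorial

# Hypothetical tree. Internal nodes: split feature, threshold, children, and
# cover (training samples reaching the node). x goes left if x[feature] <= threshold.
TREE = {
    0: {"feature": 0, "threshold": 0.5, "left": 1, "right": 2, "cover": 10},
    1: {"feature": 1, "threshold": 0.5, "left": 3, "right": 4, "cover": 6},
    2: {"value": 3.0, "cover": 4},
    3: {"value": 0.0, "cover": 2},
    4: {"value": 1.0, "cover": 4},
}


def exp_value(tree, x, S, node=0):
    """Estimate E[f(x) | x_S] by recursing down the tree."""
    n = tree[node]
    if "value" in n:  # leaf
        return n["value"]
    if n["feature"] in S:  # feature is known: follow x
        child = n["left"] if x[n["feature"]] <= n["threshold"] else n["right"]
        return exp_value(tree, x, S, child)
    # feature is unknown: average both branches, weighted by their cover
    left, right = n["left"], n["right"]
    return (tree[left]["cover"] * exp_value(tree, x, S, left)
            + tree[right]["cover"] * exp_value(tree, x, S, right)) / n["cover"]


def brute_force_shap(tree, x, n_features):
    """Exact Shapley values by enumerating every coalition of the other features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                S = set(subset)
                phi[i] += weight * (exp_value(tree, x, S | {i})
                                    - exp_value(tree, x, S))
    return phi


x = [0.2, 0.8]
base = exp_value(TREE, x, set())          # about 1.6: cover-weighted mean prediction
phi = brute_force_shap(TREE, x, n_features=2)
prediction = exp_value(TREE, x, {0, 1})   # 1.0: the leaf x actually reaches
print(base, phi, prediction)
```

The values satisfy base + sum(phi) == prediction up to rounding, the additivity property SHAP values are built around.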

Hope that helps!


dswatson commented on May 5, 2024

Very cool, I'd love to see TreeSHAP in action on a random forest. And I'm sure you're right that it would be super slow to do general purpose tree building in Python or R directly. I was thinking more about using those languages as user-friendly frontends to fast lower level implementations, as they do with gbm, ranger, etc. I suspect the algorithm will eventually be incorporated into other packages for tree-based modeling the way it was with XGBoost. Looking forward to that!


pietervosnl commented on May 5, 2024

Hi,

Nice work! Is it possible to provide an example of how to use the SHAP package with a random forest model? That would be much appreciated!

thanks

Pieter


PurenBITeam commented on May 5, 2024

Hello everybody,

I have an issue with a random forest regressor. The same code that works for an XGB model raises the following error (see below). Can anybody explain why this isn't working with sklearn's RandomForestRegressor?

Error: 'i' format requires -2147483648 <= number <= 2147483647

although I'm not using 'i' anywhere in the code

thanks and br

christoph


slundberg commented on May 5, 2024

Looks like something is out of the range of an int. Could you share a minimal example that reproduces the problem?
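The message is not about a variable called i: 'i' is the format code that Python's standard struct module uses for a 32-bit signed C int, and struct raises this kind of error when asked to pack a number outside that range. A minimal standard-library reproduction; it only reproduces the message and does not show where in christoph's setup the oversized number came from:

```python
import struct

# 'i' is struct's format code for a C int; in standard mode ("=i") it is 4 bytes.
print(struct.calcsize("=i"))  # 4

# The largest value that fits packs fine.
packed = struct.pack("=i", 2**31 - 1)

try:
    struct.pack("=i", 2**31)  # one past the 32-bit limit
except struct.error as exc:
    print(exc)  # a range error naming the 'i' format
```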

(In reply to christophgm's comment of Mon, Dec 3, 2018, 7:20 AM.)

PurenBITeam commented on May 5, 2024

Dear Scott,
Thanks for your fast answer.

Please see the two code snippets below: the first shows the XGB case, which works fine; the second shows the RandomForestRegressor (sklearn) case, which gives the error above.

XGB:
```python
import shap
explainer = shap.TreeExplainer(model)
shap.initjs()
shap_values = explainer.shap_values(PredData, approximate=True)
```

model: <xgboost.core.Booster at 0x7f641316cdd8>

RF:
```python
import shap
explainer = shap.TreeExplainer(modelrf)
shap.initjs()
shap_values = explainer.shap_values(PredData, approximate=True)
```

modelrf:
```
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=35,
                      max_features=20, max_leaf_nodes=None, min_impurity_decrease=0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=5, min_weight_fraction_leaf=0.0,
                      n_estimators=750, n_jobs=4, oob_score=False,
                      random_state=201810, verbose=1, warm_start=False)
```

PredData: [data sample not preserved]

In case you need more information, just tell me.

thanks and br

christoph


slundberg commented on May 5, 2024

@christophgm this runs fine when I use another dataset. Can you provide a full example?


zhihaoyan commented on May 5, 2024

Hi,
when I apply shap to a random forest regressor, I get this error:
'RandomForestRegressor' object has no attribute 'estimators_'

Can anybody help me with this one?


arturomoncadatorres commented on May 5, 2024

I think this means that you need to fit your RandomForestRegressor first. This is a common error if you create your model and run it through a grid search or cross-validation: those functions fit copies (clones) of the model internally, so the object you passed in is never fitted itself. With GridSearchCV and the default refit=True, the fitted best model is available as best_estimator_.
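A minimal sketch of that situation, assuming scikit-learn is installed; the data comes from make_regression and the parameter grid is arbitrary, both purely for illustration. The shap call is left as a comment so the snippet runs without shap:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

rf = RandomForestRegressor(random_state=0)
search = GridSearchCV(rf, param_grid={"n_estimators": [10, 20]}, cv=3)
search.fit(X, y)

# GridSearchCV fitted clones of rf, so rf itself is still unfitted and has no
# estimators_ attribute, which is what produces the error quoted above.
print(hasattr(rf, "estimators_"))  # False

# With refit=True (the default), the best model refitted on all of X is here.
best_rf = search.best_estimator_
print(hasattr(best_rf, "estimators_"))  # True

# This fitted model is the one to explain, e.g.:
# explainer = shap.TreeExplainer(best_rf)
```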


condran999 commented on May 5, 2024

Hi, I would like to know: in a random forest, if a variable/feature is outside its historical range, what will the SHAP value for that feature be? E.g., will it have 0 impact on the force plot, or a higher positive value (if it exceeds the upper limit of the historical data)? SOS!

Note: in my output the SHAP value is +0.25, close to zero; however, I was expecting the SHAP value to be on the higher side. Am I missing something?
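One hedged way to reason about this: a tree only compares a feature against split thresholds learned from the training data, so any value above the largest threshold follows the same path as the largest historical values. The prediction, and with it the SHAP value, therefore plateaus instead of growing: with all other features held equal, an out-of-range value gets the same attribution as the top of the historical range. A minimal pure-Python sketch with a hypothetical one-feature tree and hypothetical training values, where the SHAP value reduces to f(x) minus the average prediction over the background data:

```python
# Hypothetical one-feature tree learned from training data in [0, 10]:
# its thresholds can only fall inside the training range.
def predict(x):
    if x <= 3.0:
        return 1.0
    if x <= 7.5:
        return 2.0
    return 4.0  # everything above 7.5, including values never seen in training


# Hypothetical training (background) values and the mean prediction on them.
train = [0.5, 2.0, 4.0, 6.0, 8.0, 9.5]
base_value = sum(predict(v) for v in train) / len(train)


def shap_single_feature(x):
    # With a single feature, the Shapley value is f(x) minus the baseline.
    return predict(x) - base_value


print(shap_single_feature(9.5))    # top of the historical range
print(shap_single_feature(500.0))  # far outside it: exactly the same value
```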
