Comments (24)
Let me clarify a few aspects (the article uses N instead of n).
The article states that when we iterate down the tree we can use the depth instead of N (the number of features). It can be proven by induction. If depth == N it's trivial. For N+1 features, two groups of sets contribute to phi for every unique set {S}:
- The sets which stay at size |S| and contribute with w equal to |S|!(N+1-|S|-1)!/(N+1)!. If we denote the original w coefficient (|S|!(N-|S|-1)!/N!) as $, this becomes $ * (N-|S|)/(N+1).
- The new sets of size |S|+1, where a new feature gets appended to the set {S}. Such a set has an identical v(S U {i}) - v(S) second term as our tree with depth N, because it lands in the same leaf nodes (although it has an additional feature), and the difference equals the difference from the previous point. w for such sets is (|S|+1)!(N-|S|-1)!/(N+1)!, and we have $ * (|S|+1)/(N+1).
- Summation gives $ * (N-|S|)/(N+1) + $ * (|S|+1)/(N+1) = $, which is identical to the N-features case.
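The identity behind the last bullet can be checked numerically. A minimal sketch in plain Python (no shap dependency): the two group weights under N+1 features must sum back to the original coefficient under N features.

```python
from math import factorial

def w(s, n):
    # Shapley coalition weight |S|!(n-|S|-1)!/n! for a set of size s among n features
    return factorial(s) * factorial(n - s - 1) / factorial(n)

# Going from N features to N+1: the group that stays at size s carries
# w(s, N+1) = $ * (N-s)/(N+1), and the group that grows to s+1 carries
# w(s+1, N+1) = $ * (s+1)/(N+1); together they restore $ = w(s, N).
for N in range(1, 10):
    for s in range(N):
        assert abs(w(s, N + 1) + w(s + 1, N + 1) - w(s, N)) < 1e-12
print("identity holds")
```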
from shap.
Didn't get to this before heading out...but I'll reply when I am back next week :)
from shap.
hey, sorry for bugging you on this, but I tried running the algorithm on the simplest case I can think of, and couldn't get the contributions sum to equal the prediction :(
can you please take a quick look?
https://github.com/ihadanny/shap_simulator/blob/master/shap_simulator.ipynb
(Predict and CalculateContributionsApprox return 1.0, while CalculateContributions returns -3.0...)
thanks again! Ido
from shap.
I should also point out that https://github.com/dmlc/xgboost/blob/master/include/xgboost/tree_model.h has a C++ version of the algorithm in the TreeSHAP
method that would be a useful reference.
from shap.
I know, the Python simulator I wrote in https://github.com/ihadanny/shap_simulator/blob/master/shap_simulator.ipynb is just an exact copy of that C++ version... :)
from shap.
Just wanted to say sorry for the delay, I'll still get back to this when I can, but some graduation-important tasks have prevented me from taking the time it will take to sit down and re-work through the algorithm to check this. When I do I hope to release a python version as well.
from shap.
waiting anxiously! thanks for not forgetting!
from shap.
I finally wrote up a python version of Tree SHAP that is now in the notebooks directory :) ...I need to do more careful testing, but it seems to be producing the correct values.
Could you look over/play with that code and let me know what questions you still have?
from shap.
hey Scott, I'm sorry to say that even your simplified python version eludes me :(
Can you please clarify what the "condition" and "condition_feature" parameters mean?
from shap.
Also, why does compute_expectations recompute the expectations to make sure they follow the SHAP logic, but not also recompute node_sample_weight to make sure that node_sample_weight(parent) = node_sample_weight(left_child) + node_sample_weight(right_child)?
from shap.
I now use your exact code to explain the toy example I gave above: https://github.com/ihadanny/shap_simulator/blob/master/shap_simulator2.ipynb . I still don't understand the significance of the values I'm getting. I'm trying to explain the single instance [0, 111, 222, 333, 444]
for the tree I gave above
# 0
# / \
# 1 2
# / \
# 3 4
# / \
# 5 6
where the hot path is always left for this instance. I'm getting the shap values [-2.08333333, -0.58333333, -0.08333333, 0., 0., 3.75], where 3.75 is indeed the correct bias term, but how should I interpret the other values? If I calculate 3.75 - 2.083*0 - 0.5833 * 111 - 0.0833 * 222 I get -79.4889, which is clearly wrong, as the prediction is of course 1...
can you please make it extra simpler for me? Thanks, Ido
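The quoted numbers do reconcile if they are read as additive contributions rather than as coefficients to multiply by feature values. A quick check, assuming (as stated above) that 3.75 is the bias term and the other entries are per-feature attributions:

```python
# SHAP values are additive contributions: the model output is reconstructed
# by summing them onto the bias, not by multiplying them with feature values.
phi = [-2.08333333, -0.58333333, -0.08333333, 0.0, 0.0]  # per-feature attributions
bias = 3.75                                              # expected value of the tree
prediction = bias + sum(phi)
print(round(prediction, 6))  # recovers the tree's prediction of 1.0
```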
from shap.
You can ignore the condition and condition_feature parameters for now and assume they are always off. They are used for computing SHAP interaction values (which are not exposed yet in the python code).
I recompute the expectations because I didn't want to assume the way they were computed. And if we use non-sklearn models they might not be precomputed. I assumed the weights would always be correct.
For your example perhaps it would be better to start with an even simpler scenario. For example:
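The example itself seems to have been lost from this copy of the thread. A minimal sketch of the kind of simpler scenario meant here, with made-up numbers (not the original code): a one-split stump whose exact Shapley values are computed by brute force over all feature subsets.

```python
from itertools import combinations
from math import factorial

# Illustrative stump: split on feature 0 at threshold 0.5; left leaf value
# 10.0 covering 3 samples, right leaf value 20.0 covering 1 sample.
x = [0.0, 0.0]          # the instance to explain goes left
features = [0, 1]       # feature 1 never appears in the tree

def v(S):
    # Conditional expectation of the stump output given the features in S.
    if 0 in S:
        return 10.0 if x[0] < 0.5 else 20.0
    return (3 * 10.0 + 1 * 20.0) / 4.0   # marginalize over the split: 12.5

def shapley(i):
    # Exact Shapley value of feature i by brute force over all subsets.
    others = [f for f in features if f != i]
    n = len(features)
    total = 0.0
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (v(set(S) | {i}) - v(set(S)))
    return total

# Local accuracy: base value plus contributions equals the prediction.
assert abs(v(set()) + shapley(0) + shapley(1) - v({0, 1})) < 1e-12
print(shapley(0), shapley(1))  # feature 1 contributes nothing
```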
from shap.
hey, I know this is old and I should probably let go if I haven't got it by now, but any chance you can try to explain how EXTEND and UNWIND do what they do?
from shap.
I have been thinking about working on a sparse version, and some other tweaks. If I start that I'll also try and write up a longer explanation of EXTEND and UNWIND. I can't promise when, but it's on my list.
from shap.
@slundberg Hi Scott!
I'm also trying to grasp the ideas behind EXTEND and UNWIND. Could you explain them in more detail, or maybe with some visualization?
Thanks
from shap.
did anyone get any luck on understanding them?
from shap.
Another 2 cents about the pweights coefficients
https://github.com/slundberg/shap/blob/master/shap/explainers/pytree.py#L255-L262
if unique_depth == 0:
pweights[unique_depth] = 1.
else:
pweights[unique_depth] = 0.
for i in range(unique_depth - 1, -1, -1):
pweights[i + 1] += one_fraction * pweights[i] * (i + 1.) / (unique_depth + 1.)
pweights[i] = zero_fraction * pweights[i] * (unique_depth - i) / (unique_depth + 1.)
Correct me if I'm wrong @slundberg: pweights contains the product of the w from the previous comment and a probability.
Let's start with a simple example
|
/ \
like a first split. S in this case (before the split) equals the empty set (a tree of depth 1, so we can only consider sets smaller than that). w becomes 0!(1-0-1)!/1! = 1.
At the split point we separate all possible sets into two groups:
- those which expand (by the splitting feature) from size 0 to 1
- those which stay at size 0
It doesn't matter how many features are in the set S (see my previous comment); what matters is where we are in the tree's path (how many features are utilized). For a simple single split:
pweights[1] = one_fraction * w[1] = one_fraction * 1!*0!/2! <- only one_fraction can expand to |S|=1
pweights[0] = zero_fraction * w[0] = zero_fraction * 0!*1!/2!
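A quick numeric sketch of this single-split case, applying the extend update quoted above once (the fractions here are made-up):

```python
# One application of the extend loop from pytree.py quoted above, starting
# from an empty path (pweights[0] = 1) and adding the first split.
zero_fraction, one_fraction = 0.75, 1.0   # illustrative cover/hot fractions
pweights = [1.0, 0.0]
unique_depth = 1
pweights[unique_depth] = 0.0
for i in range(unique_depth - 1, -1, -1):
    pweights[i + 1] += one_fraction * pweights[i] * (i + 1.0) / (unique_depth + 1.0)
    pweights[i] = zero_fraction * pweights[i] * (unique_depth - i) / (unique_depth + 1.0)

# Matches the closed forms above: one_fraction * 1!*0!/2! and zero_fraction * 0!*1!/2!
assert pweights[1] == one_fraction * 0.5
assert pweights[0] == zero_fraction * 0.5
```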
In a more complicated case, when |S| expands to |S|+1, it's crucial to notice that not only the "bottom" sets split, but sets of all sizes:
1 ------------- 1
\_________ 2
2 ------------ 2
\_________ 3
...
S-1 ----------- S-1
\_________ S
S ------------- S
\_________ S+1
To correctly compute the new state (|S|+1) from the current state (|S|) we need to move bottom->top.
pweights_S+1 = w(|S|+1, N+1) * one_fraction * fraction_for_S =
(|S|+1)! (N-|S|-1)!/(N+1)! * one_fraction * fraction_for_S =
one_fraction * |S|! (N-|S|-1)!/N! * fraction_for_S * (|S|+1)/(N+1) =
one_fraction * pweights_S * (|S|+1)/(N+1)
The S level is composed of "zero_fraction" terms and the expansion from level S-1.
First term:
pweights_S = w(|S|, N+1) * zero_fraction * fraction_for_S =
|S|! (N-|S|)!/(N+1)! * zero_fraction * fraction_for_S =
zero_fraction * |S|! (N-|S|-1)!/N! * fraction_for_S * (N-|S|)/(N+1) =
zero_fraction * pweights_S * (N-|S|)/(N+1)
Second term:
pweights_S += w(|S|, N+1) * one_fraction * fraction_for_S-1 =
|S|! (N-|S|)!/(N+1)! * one_fraction * fraction_for_S-1 =
one_fraction * (|S|-1)! (N-|S|)!/N! * fraction_for_S-1 * |S|/(N+1) =
one_fraction * pweights_S-1 * |S|/(N+1)
and so on...
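This bottom->top update can be sanity-checked against a direct enumeration. A sketch in plain Python: the invariant being tested is that after extending the path with d (zero_fraction, one_fraction) pairs, pweights[i] equals the sum over all size-i subsets of the path features of the product of their one_fractions and the others' zero_fractions, scaled by i!(d-i)!/(d+1)!.

```python
from itertools import combinations
from math import factorial, prod

def extend(fractions):
    # Repeatedly apply the pweights update quoted above, once per split on the path.
    pweights = [1.0]
    for d, (z, o) in enumerate(fractions, start=1):
        pweights.append(0.0)
        for i in range(d - 1, -1, -1):
            pweights[i + 1] += o * pweights[i] * (i + 1.0) / (d + 1.0)
            pweights[i] = z * pweights[i] * (d - i) / (d + 1.0)
    return pweights

def brute(fractions, i):
    # Direct enumeration over all size-i subsets of the d path features.
    d = len(fractions)
    total = 0.0
    for T in combinations(range(d), i):
        total += prod(fractions[j][1] if j in T else fractions[j][0] for j in range(d))
    return total * factorial(i) * factorial(d - i) / factorial(d + 1)

fractions = [(0.5, 1.0), (0.25, 1.0), (0.8, 0.0)]  # made-up (zero, one) pairs
pw = extend(fractions)
for i in range(len(fractions) + 1):
    assert abs(pw[i] - brute(fractions, i)) < 1e-12
print("extend matches brute force")
```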
More comments are coming.
from shap.
@sh1ng - struggling to read your enlightening comments.
for the first comment:
- where did you see in the paper https://www.nature.com/articles/s42256-019-0138-9.epdf?shared_access_token=RCYPTVkiECUmc0CccSMgXtRgN0jAjWel9jnR3ZoTv0O81kV8DqPb2VXSseRmof0Pl8YSOZy4FHz5vMc3xsxcX6uT10EzEoWo7B-nZQAHJJvBYhQJTT1LnJmpsa48nlgUWrMkThFrEIvZstjQ7Xdc5g%3D%3D that you can use depth instead of N?
- why do you start the induction from depth=N? what about depths 1,2...N-1?
- what are the 2 groups of sets you're describing? the first group are sets with the same size as S - is this the group of sets created when the splitting feature at the node at level N was a feature already encountered somewhere in the path? and the second group is the sets that were created when it's a new feature that wasn't encountered in the path?
- for the second group, why do you say it has identical v(S U {i}) - v(S)? It's split on a new feature, so what do you mean by that? S would have landed on both sides of the split as it doesn't contain this new feature - what's the meaning of showing that the sum of the contributions of these 2 groups looks like the contribution of the original set?
from shap.
- https://www.nature.com/articles/s41551-018-0304-0.epdf?author_access_token=vSPt7ryUfdSCv4qcyeEuCdRgN0jAjWel9jnR3ZoTv0PdqacSN9qNY_fC0jWkIQUd0L2zaj3bbIQEdrTqCczGWv2brU5rTJPxyss1N4yTIHpnSv5_nBVJoUbvejyvvjrGTb2odwWKT2Bfvl0ExQKhZw%3D%3D or https://arxiv.org/pdf/1802.03888.pdf or https://arxiv.org/pdf/1706.06060.pdf
- It's the opposite. I'm starting with the number of features N equal to the depth and then showing that for N+1 features the result is the same. Looks like there's one edge case when N < depth. I'll work on it.
- Not sure that I got your question. Let's say we have a path with features [1, 2, 3] and we add a new feature 4. We need to compute w * p for the set of size 4 ([1, 2, 3, 4]) and update w * p for the sets of size 3 ([1, 2, 3, 4 is missing], [1 is missing, 2, 3, 4], [1, 2 is missing, 3, 4], [1, 2, 3 is missing, 4]). It's done with dynamic programming.
from shap.
@sh1ng - thanks for your patience.
can you please help me first understand the very basics?
please take a look at this notebook
I'm trying to understand first how we can do the polynomial calculation if we ignore the part where w depends on the size of the set |S|. In the example tree from my notebook - we start with only the empty set {} at the root. When we move to the right "cold" child, we're supposed to add {2} as well, agree? What will cause us to do so?
from shap.
@sh1ng - please review https://cs.stackexchange.com/questions/133247/how-does-the-shap-algorithm-work-in-polynomial-time
and https://colab.research.google.com/drive/1itp2Tql38DhuNwirmz1LgGQZFX7bzmdG#scrollTo=f-Vl9TOfRDbg
from shap.
I'm trying to understand first how we can do the polynomial calculation if we ignore the part where w depends on the size of the set |S|. In the example tree from my notebook - we start with only the empty set {} at the root. When we move to the right "cold" child, we're supposed to add {2} as well, agree? What will cause us to do so?
It doesn't work like that. We propagate through the tree all subsets (more precisely, all subsets matching the current feature path) as a whole, not considering them separately. Let's say we have a degenerate case of a tree with a single leaf node. No matter how many features we have, all subsets end up in that leaf. Now let's consider a tree with a single split node and 2 leaves. It's easy to see that all subsets split into 2 groups (hot/cold).
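A tiny sketch of that last observation, with made-up leaf values: for a single-split tree, v(S) depends only on whether the split feature is in S, so all 2^n subsets collapse into exactly two groups.

```python
from itertools import combinations

# A tree with one split, on feature 2, among five features (as in the toy
# example above). Whether a subset is "hot" or "cold" depends only on
# whether it contains the split feature.
n_features = 5
split_feature = 2

def v(S):
    if split_feature in S:
        return 1.0        # hot: the instance's own value decides the branch
    return 0.25           # cold: cover-weighted average over both leaves (illustrative)

values = set()
for k in range(n_features + 1):
    for S in combinations(range(n_features), k):
        values.add(v(set(S)))
print(sorted(values))  # all 32 subsets yield only two distinct values
assert len(values) == 2
```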
from shap.