Comments (24)
Let me clarify a few aspects (the article uses N instead of n).
The article states that when we iterate down the tree we can use the depth instead of N (the number of features). It can be proven by induction. If depth == N it's trivial. For N+1 features, two groups of sets contribute to phi for every unique set {S}:
- The sets which stay at size |S| and contribute with w equal to |S|!(N+1-|S|-1)!/(N+1)!. If we denote the original w coefficient (|S|!(N-|S|-1)!/N!) as $, this becomes $ * (N-|S|)/(N+1).
- The new sets of size |S|+1, where a new feature gets appended to the set {S}. Such a set has an identical v(S U {i}) - v(S) second term as our tree with depth N, because it lands in the same leaf nodes (although it has an additional feature), and the difference equals the difference from the previous point. w for such sets is (|S|+1)!(N-|S|-1)!/(N+1)!, and we have $ * (|S|+1)/(N+1).
- Summation gives $ * (N-|S|)/(N+1) + $ * (|S|+1)/(N+1) = $, which is identical to the N-features case.
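The identity behind the last bullet can be checked numerically. A minimal sketch in plain Python (no shap dependency): the two group weights under N+1 features must sum back to the original coefficient under N features.

```python
from math import factorial

def w(s, n):
    # Shapley coalition weight |S|!(n-|S|-1)!/n! for a set of size s among n features
    return factorial(s) * factorial(n - s - 1) / factorial(n)

# Going from N features to N+1: the group that stays at size s carries
# w(s, N+1) = $ * (N-s)/(N+1), and the group that grows to s+1 carries
# w(s+1, N+1) = $ * (s+1)/(N+1); together they restore $ = w(s, N).
for N in range(1, 10):
    for s in range(N):
        assert abs(w(s, N + 1) + w(s + 1, N + 1) - w(s, N)) < 1e-12
print("identity holds")
```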
from shap.
Didn't get to this before heading out...but I'll reply when I am back next week :)
from shap.
hey, sorry for bugging you on this, but I tried running the algorithm on the simplest case I can think of, and couldn't get the contributions sum to equal the prediction :(
can you please take a quick look?
https://github.com/ihadanny/shap_simulator/blob/master/shap_simulator.ipynb
(Predict and CalculateContributionsApprox return 1.0, while CalculateContributions returns -3.0...)
thanks again! Ido
from shap.
I should also point out that https://github.com/dmlc/xgboost/blob/master/include/xgboost/tree_model.h has a C++ version of the algorithm in the TreeSHAP
method that would be a useful reference.
from shap.
I know, the Python simulator I wrote in https://github.com/ihadanny/shap_simulator/blob/master/shap_simulator.ipynb is just an exact copy of that C++ version... :)
from shap.
Just wanted to say sorry for the delay, I'll still get back to this when I can, but some graduation-important tasks have prevented me from taking the time it will take to sit down and re-work through the algorithm to check this. When I do I hope to release a python version as well.
from shap.
waiting anxiously! thanks for not forgetting!
from shap.
I finally wrote up a python version of Tree SHAP that is now in the notebooks directory :) ...I need to do more careful testing, but it seems to be producing the correct values.
Could you look over/play with that code and let me know what questions you still have?
from shap.
hey Scott, I'm sorry to say that even your simplified python version eludes me :(
Can you please clarify what the "condition" and "condition_feature" parameters mean?
from shap.
Also, why does compute_expectations recompute the expectations to make sure they follow the SHAP logic, but not also recompute node_sample_weight to make sure that node_sample_weight(parent) = node_sample_weight(left_child) + node_sample_weight(right_child)?
from shap.
I now use your exact code to explain the toy example I gave above: https://github.com/ihadanny/shap_simulator/blob/master/shap_simulator2.ipynb . I still don't understand the significance of the values I'm getting. I'm trying to explain the single instance [0, 111, 222, 333, 444]
for the tree I gave above
# 0
# / \
# 1 2
# / \
# 3 4
# / \
# 5 6
where the hot path is always left for this instance. I'm getting the shap values [-2.08333333, -0.58333333, -0.08333333, 0., 0., 3.75], where 3.75 is indeed the correct bias term, but how should I interpret the other values? If I calculate 3.75 - 2.083*0 - 0.5833 * 111 - 0.0833 * 222 I get -79.4889, which is clearly wrong, as the prediction is of course 1...
can you please make it extra simpler for me? Thanks, Ido
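The quoted numbers do reconcile if they are read as additive contributions rather than as coefficients to multiply by feature values. A quick check, assuming (as stated above) that 3.75 is the bias term and the other entries are per-feature attributions:

```python
# SHAP values are additive contributions: the model output is reconstructed
# by summing them onto the bias, not by multiplying them with feature values.
phi = [-2.08333333, -0.58333333, -0.08333333, 0.0, 0.0]  # per-feature attributions
bias = 3.75                                              # expected value of the tree
prediction = bias + sum(phi)
print(round(prediction, 6))  # recovers the tree's prediction of 1.0
```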
from shap.
You can ignore the condition and condition_feature parameters for now and assume they are always off. They are used for computing SHAP interaction values (which are not exposed yet in the python code).
I recompute the expectations because I didn't want to assume the way they were computed. And if we use non-sklearn models they might not be precomputed. I assumed the weights would always be correct.
For your example perhaps it would be better to start with an even simpler scenario. For example:
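The example itself seems to have been lost from this copy of the thread. A minimal sketch of the kind of simpler scenario meant here, with made-up numbers (not the original code): a one-split stump whose exact Shapley values are computed by brute force over all feature subsets.

```python
from itertools import combinations
from math import factorial

# Illustrative stump: split on feature 0 at threshold 0.5; left leaf value
# 10.0 covering 3 samples, right leaf value 20.0 covering 1 sample.
x = [0.0, 0.0]          # the instance to explain goes left
features = [0, 1]       # feature 1 never appears in the tree

def v(S):
    # Conditional expectation of the stump output given the features in S.
    if 0 in S:
        return 10.0 if x[0] < 0.5 else 20.0
    return (3 * 10.0 + 1 * 20.0) / 4.0   # marginalize over the split: 12.5

def shapley(i):
    # Exact Shapley value of feature i by brute force over all subsets.
    others = [f for f in features if f != i]
    n = len(features)
    total = 0.0
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (v(set(S) | {i}) - v(set(S)))
    return total

# Local accuracy: base value plus contributions equals the prediction.
assert abs(v(set()) + shapley(0) + shapley(1) - v({0, 1})) < 1e-12
print(shapley(0), shapley(1))  # feature 1 contributes nothing
```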
from shap.
hey, I know this is old and I should probably let go if I haven't got it by now, but any chance you can try to explain how EXTEND and UNWIND do what they do?
from shap.
I have been thinking about working on a sparse version, and some other tweaks. If I start that I'll also try and write up a longer explanation of EXTEND and UNWIND. I can't promise when, but it's on my list.
from shap.
@slundberg Hi Scott!
I'm also trying to grasp the ideas behind EXTEND and UNWIND. Could you explain them in more detail, or maybe with some visualization?
Thanks
from shap.
did anyone get any luck on understanding them?
from shap.
Another 2 cents about the pweights coefficients
https://github.com/slundberg/shap/blob/master/shap/explainers/pytree.py#L255-L262
if unique_depth == 0:
pweights[unique_depth] = 1.
else:
pweights[unique_depth] = 0.
for i in range(unique_depth - 1, -1, -1):
pweights[i + 1] += one_fraction * pweights[i] * (i + 1.) / (unique_depth + 1.)
pweights[i] = zero_fraction * pweights[i] * (unique_depth - i) / (unique_depth + 1.)
Correct me if I'm wrong @slundberg: pweights contains the product of the w from the previous comment and a probability.
Let's start with a simple example
|
/ \
like a first split. S in this case (before the split) equals the empty set (a tree of depth 1, so we can only consider sets smaller than that). w becomes 0!(1-0-1)!/1! = 1.
At the split point we separate all possible sets into two groups:
- those which expand (by the splitting feature) from size 0 to 1
- those which stay at size 0
It doesn't matter how many features are in the set S (see my previous comment); what matters is where we are in the tree's path (how many features are utilized). For a simple single split:
pweights[1] = one_fraction * w[1] = one_fraction * 1!*0!/2! <- only one_fraction can expand to |S|=1
pweights[0] = zero_fraction * w[0] = zero_fraction * 0!*1!/2!
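A quick numeric sketch of this single-split case, applying the extend update quoted above once (the fractions here are made-up):

```python
# One application of the extend loop from pytree.py quoted above, starting
# from an empty path (pweights[0] = 1) and adding the first split.
zero_fraction, one_fraction = 0.75, 1.0   # illustrative cover/hot fractions
pweights = [1.0, 0.0]
unique_depth = 1
pweights[unique_depth] = 0.0
for i in range(unique_depth - 1, -1, -1):
    pweights[i + 1] += one_fraction * pweights[i] * (i + 1.0) / (unique_depth + 1.0)
    pweights[i] = zero_fraction * pweights[i] * (unique_depth - i) / (unique_depth + 1.0)

# Matches the closed forms above: one_fraction * 1!*0!/2! and zero_fraction * 0!*1!/2!
assert pweights[1] == one_fraction * 0.5
assert pweights[0] == zero_fraction * 0.5
```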
In a more complicated case, when |S| expands to |S|+1, it's crucial to notice that not only the "bottom" sets split, but sets of all sizes:
1 ------------- 1
\_________ 2
2 ------------ 2
\_________ 3
...
S-1 ----------- S-1
\_________ S
S ------------- S
\_________ S+1
To correctly compute the new state (|S|+1) from the current state (|S|) we need to move bottom->top.
pweights_S+1 = w(|S|+1, N+1) * one_fraction * fraction_for_S =
(|S|+1)! (N-|S|-1)!/(N+1)! * one_fraction * fraction_for_S =
one_fraction * |S|! (N-|S|-1)!/N! * fraction_for_S * (|S|+1)/(N+1) =
one_fraction * pweights_S * (|S|+1)/(N+1)
The S level is composed of "zero_fraction" terms and the expansion from level S-1.
First term:
pweights_S = w(|S|, N+1) * zero_fraction * fraction_for_S =
|S|! (N-|S|)!/(N+1)! * zero_fraction * fraction_for_S =
zero_fraction * |S|! (N-|S|-1)!/N! * fraction_for_S * (N-|S|)/(N+1) =
zero_fraction * pweights_S * (N-|S|)/(N+1)
Second term:
pweights_S += w(|S|, N+1) * one_fraction * fraction_for_S-1 =
|S|! (N-|S|)!/(N+1)! * one_fraction * fraction_for_S-1 =
one_fraction * (|S|-1)! (N-|S|)!/N! * fraction_for_S-1 * |S|/(N+1) =
one_fraction * pweights_S-1 * |S|/(N+1)
and so on...
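This bottom->top update can be sanity-checked against a direct enumeration. A sketch in plain Python: the invariant being tested is that after extending the path with d (zero_fraction, one_fraction) pairs, pweights[i] equals the sum over all size-i subsets of the path features of the product of their one_fractions and the others' zero_fractions, scaled by i!(d-i)!/(d+1)!.

```python
from itertools import combinations
from math import factorial, prod

def extend(fractions):
    # Repeatedly apply the pweights update quoted above, once per split on the path.
    pweights = [1.0]
    for d, (z, o) in enumerate(fractions, start=1):
        pweights.append(0.0)
        for i in range(d - 1, -1, -1):
            pweights[i + 1] += o * pweights[i] * (i + 1.0) / (d + 1.0)
            pweights[i] = z * pweights[i] * (d - i) / (d + 1.0)
    return pweights

def brute(fractions, i):
    # Direct enumeration over all size-i subsets of the d path features.
    d = len(fractions)
    total = 0.0
    for T in combinations(range(d), i):
        total += prod(fractions[j][1] if j in T else fractions[j][0] for j in range(d))
    return total * factorial(i) * factorial(d - i) / factorial(d + 1)

fractions = [(0.5, 1.0), (0.25, 1.0), (0.8, 0.0)]  # made-up (zero, one) pairs
pw = extend(fractions)
for i in range(len(fractions) + 1):
    assert abs(pw[i] - brute(fractions, i)) < 1e-12
print("extend matches brute force")
```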
More comments are coming.
from shap.
@sh1ng - struggling to read your enlightening comments.
for the first comment:
- where did you see in the paper https://www.nature.com/articles/s42256-019-0138-9.epdf?shared_access_token=RCYPTVkiECUmc0CccSMgXtRgN0jAjWel9jnR3ZoTv0O81kV8DqPb2VXSseRmof0Pl8YSOZy4FHz5vMc3xsxcX6uT10EzEoWo7B-nZQAHJJvBYhQJTT1LnJmpsa48nlgUWrMkThFrEIvZstjQ7Xdc5g%3D%3D that you can use depth instead of N?
- why do you start the induction from depth=N? what about depths 1,2...N-1?
- what are the 2 groups of sets you're describing? the first group are sets with the same size as S - is this the group of sets created when the splitting feature at the node at level N was a feature already encountered somewhere in the path? and the second group is the sets that were created when it's a new feature that wasn't encountered in the path?
- for the second group, why do you say it has identical v(S U {i}) - v(S)? It's split on a new feature, so what do you mean by that? S would have landed on both sides of the split as it doesn't contain this new feature - what's the meaning of showing that the sum of the contributions of these 2 groups looks like the contribution of the original set?
from shap.
- https://www.nature.com/articles/s41551-018-0304-0.epdf?author_access_token=vSPt7ryUfdSCv4qcyeEuCdRgN0jAjWel9jnR3ZoTv0PdqacSN9qNY_fC0jWkIQUd0L2zaj3bbIQEdrTqCczGWv2brU5rTJPxyss1N4yTIHpnSv5_nBVJoUbvejyvvjrGTb2odwWKT2Bfvl0ExQKhZw%3D%3D or https://arxiv.org/pdf/1802.03888.pdf or https://arxiv.org/pdf/1706.06060.pdf
- It's the opposite. I'm starting with the number of features N equal to the depth and then showing that for N+1 features the result is the same. Looks like there's one edge case when N < depth. I'll work on it.
- Not sure that I got your question. Let's say we have a path with features [1, 2, 3] and we add a new feature 4. We need to compute w * p for the set of size 4 ([1, 2, 3, 4]) and update w * p for the sets of size 3 ([1, 2, 3, 4 is missing], [1 is missing, 2, 3, 4], [1, 2 is missing, 3, 4], [1, 2, 3 is missing, 4]). It's done with dynamic programming.
from shap.
@sh1ng - thanks for your patience.
can you please help me first understand the very basics?
please take a look at this notebook
I'm trying to understand first how we can do the polynomial calculation if we ignore the part where w depends on the size of the set |S|. In the example tree from my notebook - we start with only the empty set {} at the root. When we move to the right "cold" child, we're supposed to add {2} as well, agree? What will cause us to do so?
from shap.
@sh1ng - please review https://cs.stackexchange.com/questions/133247/how-does-the-shap-algorithm-work-in-polynomial-time
and https://colab.research.google.com/drive/1itp2Tql38DhuNwirmz1LgGQZFX7bzmdG#scrollTo=f-Vl9TOfRDbg
from shap.
I'm trying to understand first how we can do the polynomial calculation if we ignore the part where w depends on the size of the set |S|. In the example tree from my notebook - we start with only the empty set {} at the root. When we move to the right "cold" child, we're supposed to add {2} as well, agree? What will cause us to do so?
It doesn't work like that. We propagate through the tree all subsets (more precisely, all subsets matching the current feature path) as a whole, not considering them separately. Let's say we have a degenerate case of a tree with a single leaf node. No matter how many features we have, all subsets end up in that leaf. Now let's consider a tree with a single split node and 2 leaves. It's easy to see that all subsets split into 2 groups (hot/cold).
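A tiny sketch of that last observation, with made-up leaf values: for a single-split tree, v(S) depends only on whether the split feature is in S, so all 2^n subsets collapse into exactly two groups.

```python
from itertools import combinations

# A tree with one split, on feature 2, among five features (as in the toy
# example above). Whether a subset is "hot" or "cold" depends only on
# whether it contains the split feature.
n_features = 5
split_feature = 2

def v(S):
    if split_feature in S:
        return 1.0        # hot: the instance's own value decides the branch
    return 0.25           # cold: cover-weighted average over both leaves (illustrative)

values = set()
for k in range(n_features + 1):
    for S in combinations(range(n_features), k):
        values.add(v(set(S)))
print(sorted(values))  # all 32 subsets yield only two distinct values
assert len(values) == 2
```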
from shap.