Comments (24)

sh1ng commented on May 5, 2024

Let me clarify a few aspects.

bc8ed8adc24cb67b8a19d5f9488287f4449cc075
(the article uses N instead of n)

The article states that when we iterate down the tree we can use the depth instead of N (the number of features). This can be proven by induction; if depth == N it's trivial.
For N+1 features, two groups of sets contribute to phi for every unique set {S}:

  • A set of size |S|, which contributes with weight |S|!(N+1-|S|-1)!/(N+1)!. If we denote the original w coefficient (|S|!(N-|S|-1)!/N!) by $, this becomes $ * (N-|S|)/(N+1).
  • A new set of size |S|+1, with a new feature appended to the set {S}. Its second term v(S U {i}) - v(S) is identical to that of our tree with depth N, since the instance lands in the same leaf nodes (although it has an additional feature), so the difference equals the difference from the previous point. The weight for such sets is (|S|+1)!(N-|S|-1)!/(N+1)!, which is $ * (|S|+1)/(N+1).
  • Summation gives $ * (N-|S|)/(N+1) + $ * (|S|+1)/(N+1) = $, which is identical to the N-feature case.
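
As a sanity check, the identity can be verified numerically (a minimal sketch using math.factorial; not code from the thread):

    from math import factorial

    def w(s, n):
        # Shapley weight |S|! (n - |S| - 1)! / n! for a set of size s out of n features
        return factorial(s) * factorial(n - s - 1) / factorial(n)

    for n in range(1, 10):
        for s in range(n):
            same_size = w(s, n) * (n - s) / (n + 1)  # sets that stay at size |S|
            grown = w(s, n) * (s + 1) / (n + 1)      # sets that grow to |S|+1
            assert abs(same_size - w(s, n + 1)) < 1e-12
            assert abs(grown - w(s + 1, n + 1)) < 1e-12
            assert abs(same_size + grown - w(s, n)) < 1e-12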

slundberg commented on May 5, 2024

Didn't get to this before heading out...but I'll reply when I am back next week :)

ihadanny commented on May 5, 2024

hey, sorry for bugging you on this, but I tried running the algorithm on the simplest case I can think of, and couldn't get the sum of the contributions to equal the prediction :(
can you please take a quick look?
https://github.com/ihadanny/shap_simulator/blob/master/shap_simulator.ipynb

(Predict and CalculateContributionsApprox return 1.0, while CalculateContributions returns -3.0...)
thanks again! Ido

slundberg commented on May 5, 2024

I should also point out that https://github.com/dmlc/xgboost/blob/master/include/xgboost/tree_model.h has a C++ version of the algorithm in the TreeSHAP method that would be a useful reference.

ihadanny commented on May 5, 2024

I know, the python simulator I wrote in https://github.com/ihadanny/shap_simulator/blob/master/shap_simulator.ipynb is just an exact copy of that C++ version... :)

slundberg commented on May 5, 2024

Just wanted to say sorry for the delay. I'll still get back to this when I can, but some graduation-critical tasks have prevented me from taking the time it will take to sit down and re-work through the algorithm to check this. When I do, I hope to release a python version as well.

ihadanny commented on May 5, 2024

waiting anxiously! thanks for not forgetting!

slundberg commented on May 5, 2024

I finally wrote up a python version of Tree SHAP that is now in the notebooks directory :) ...I need to do more careful testing, but it seems to be producing the correct values.

Could you look over/play with that code and let me know what questions you still have?

ihadanny commented on May 5, 2024

hey Scott, I'm sorry to say that even your simplified python version eludes me :(
Can you please clarify what the "condition" and "condition_feature" parameters mean?

ihadanny commented on May 5, 2024

Also, why does compute_expectations recompute the expectations to make sure they follow the SHAP logic, but not also recompute node_sample_weight to make sure that node_sample_weight(parent) = node_sample_weight(left_child) + node_sample_weight(right_child)?

ihadanny commented on May 5, 2024

I now use your exact code to explain the toy example I gave above: https://github.com/ihadanny/shap_simulator/blob/master/shap_simulator2.ipynb. I still don't understand the significance of the values I'm getting. I'm trying to explain the single instance [0, 111, 222, 333, 444] for the tree I gave above:

    #     0
    #   /    \
    #  1       2
    #        /   \
    #      3      4
    #           /  \
    #          5    6

where the hot path is always left for this instance. I'm getting the shap values [-2.08333333, -0.58333333, -0.08333333, 0., 0., 3.75], where 3.75 is indeed the correct bias term, but how should I interpret the other values?

if I calculate 3.75 - 2.083*0 - 0.5833*111 - 0.0833*222 I get -79.4889, which is clearly wrong, as the prediction is of course 1...

can you please make it extra simple for me? Thanks, Ido

slundberg commented on May 5, 2024

You can ignore the condition and condition_feature parameters for now and assume they are always off. They are used for computing SHAP interaction values (which are not exposed yet in the python code).

I recompute the expectations because I didn't want to assume the way they were computed, and if we use non-sklearn models they might not be precomputed. I assumed the weights would always be correct.
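
For reference, the recomputation amounts to a bottom-up weighted average over children. A minimal sketch of that logic, assuming sklearn-style arrays where a child index of -1 marks a leaf (an illustration, not the exact pytree.py code):

    def compute_expectations(children_left, children_right, node_sample_weight, values, i=0):
        # leaves keep their values; internal nodes become the
        # sample-weighted average of their children's expectations
        if children_left[i] == -1:
            return values[i]
        l, r = children_left[i], children_right[i]
        v_l = compute_expectations(children_left, children_right, node_sample_weight, values, l)
        v_r = compute_expectations(children_left, children_right, node_sample_weight, values, r)
        total = node_sample_weight[l] + node_sample_weight[r]
        values[i] = (node_sample_weight[l] * v_l + node_sample_weight[r] * v_r) / total
        return values[i]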

For your example perhaps it would be better to start with an even simpler scenario. For example:

https://github.com/slundberg/shap/blob/master/notebooks/Understanding%20Tree%20SHAP%20for%20Simple%20Models.ipynb

ihadanny commented on May 5, 2024

hey, I know this is old and I should probably let it go if I haven't got it by now, but any chance you can try to explain how EXTEND and UNWIND do what they do?

slundberg commented on May 5, 2024

I have been thinking about working on a sparse version, and some other tweaks. If I start that I'll also try and write up a longer explanation of EXTEND and UNWIND. I can't promise when, but it's on my list.

sh1ng commented on May 5, 2024

@slundberg Hi Scott!

I'm also trying to grasp the ideas behind EXTEND and UNWIND. Could you explain them in more detail, or maybe with some visualization?

Thanks

yupbank commented on May 5, 2024

did anyone have any luck understanding them?

sh1ng commented on May 5, 2024

Another 2 cents about the pweights coefficients:

https://github.com/slundberg/shap/blob/master/shap/explainers/pytree.py#L255-L262

    if unique_depth == 0:
        pweights[unique_depth] = 1.
    else:
        pweights[unique_depth] = 0.

    for i in range(unique_depth - 1, -1, -1):
        pweights[i + 1] += one_fraction * pweights[i] * (i + 1.) / (unique_depth + 1.)
        pweights[i] = zero_fraction * pweights[i] * (unique_depth - i) / (unique_depth + 1.)

Correct me if I'm wrong @slundberg: pweights contains the product of the w from the previous comment and a probability.

Let's start with a simple example:

  |
 / \

like a first split. S in this case (before the split) equals the empty set (a tree of depth 1, so we can only consider sets smaller than that). w becomes 0!(1-0-1)!/1! = 1

At the split point we separate all possible sets into two groups:

  • those which expand (by the splitting feature) from size 0 to size 1
  • those which stay at size 0

It doesn't matter how many features are in the set S (see my previous comment); what matters is where we are in the tree's path (how many features are utilized). For the simple single split:

pweights[1] = one_fraction * w[1] = one_fraction * 1!*0!/2! <- only one_fraction can expand to |S|=1
pweights[0] = zero_fraction * w[0] = zero_fraction * 0!*1!/2!
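
A quick trace of the quoted update for this single split (a minimal sketch; z and o stand for zero_fraction and one_fraction):

    z, o = 0.25, 1.0
    pweights = [1.0, 0.0]  # state after the root call (unique_depth == 0)

    # one split: unique_depth becomes 1
    unique_depth = 1
    pweights[unique_depth] = 0.
    for i in range(unique_depth - 1, -1, -1):
        pweights[i + 1] += o * pweights[i] * (i + 1.) / (unique_depth + 1.)
        pweights[i] = z * pweights[i] * (unique_depth - i) / (unique_depth + 1.)

    print(pweights)  # [z * 0!*1!/2!, o * 1!*0!/2!] = [0.125, 0.5]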

In the more complicated case, when |S| expands to |S|+1, it's crucial to notice that not only the "bottom" sets split, but sets of all sizes:

1 ------------- 1
     \_________ 2
2  ------------ 2
     \_________ 3

...
S-1 ----------- S-1
     \_________ S
S  ------------- S
     \_________ S+1

To correctly compute the new state (|S|+1) from the current state (|S|) we need to move bottom->top.

pweights_(S+1) = w(S+1, N+1) * one_fraction * fraction_for_S =
(|S|+1)! (N-|S|-1)! / (N+1)! * one_fraction * fraction_for_S =
one_fraction * |S|! (N-|S|-1)! / N! * fraction_for_S * (|S|+1)/(N+1) =
one_fraction * pweights_S * (|S|+1)/(N+1)

Level S is composed of "zero_fraction" contributions and the expansion from level S-1.

First term:

pweights_S = w(S, N+1) * zero_fraction * fraction_for_S =
|S|! (N-|S|)! / (N+1)! * zero_fraction * fraction_for_S =
zero_fraction * |S|! (N-|S|-1)! / N! * fraction_for_S * (N-|S|)/(N+1) =
zero_fraction * pweights_S * (N-|S|)/(N+1)

Second term:

pweights_S += w(S, N+1) * one_fraction * fraction_for_(S-1) =
|S|! (N-|S|)! / (N+1)! * one_fraction * fraction_for_(S-1) =
one_fraction * (|S|-1)! (N-|S|)! / N! * fraction_for_(S-1) * |S|/(N+1) =
one_fraction * pweights_(S-1) * |S|/(N+1)

and so on...
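
Putting the recursion together, it can be checked against a brute-force enumeration of subsets (a minimal sketch; the closed form i!(d-i)!/(d+1)! for a path of d features is my reading of the recursion above):

    from itertools import combinations
    from math import factorial

    def extend(pweights, unique_depth, zero_fraction, one_fraction):
        # the pweights update quoted above, stripped of the path bookkeeping
        pweights[unique_depth] = 1. if unique_depth == 0 else 0.
        for i in range(unique_depth - 1, -1, -1):
            pweights[i + 1] += one_fraction * pweights[i] * (i + 1.) / (unique_depth + 1.)
            pweights[i] = zero_fraction * pweights[i] * (unique_depth - i) / (unique_depth + 1.)

    # a path of d = 3 features, each with its own (zero_fraction, one_fraction)
    fractions = [(0.4, 1.0), (0.7, 1.0), (0.2, 0.5)]
    d = len(fractions)

    pweights = [0.0] * (d + 1)
    extend(pweights, 0, 1.0, 1.0)  # root call: no features on the path yet
    for depth, (z, o) in enumerate(fractions, start=1):
        extend(pweights, depth, z, o)

    # pweights[i] should equal i!(d-i)!/(d+1)! times the sum, over all size-i
    # subsets T of the path, of the product of one_fractions for features in T
    # and zero_fractions for the features outside T
    for i in range(d + 1):
        w = factorial(i) * factorial(d - i) / factorial(d + 1)
        s = 0.0
        for T in combinations(range(d), i):
            p = 1.0
            for j in range(d):
                p *= fractions[j][1] if j in T else fractions[j][0]
            s += p
        assert abs(pweights[i] - w * s) < 1e-12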

More comments are coming.

ihadanny commented on May 5, 2024

@sh1ng - I'm struggling to follow your enlightening comments.
For the first comment:

  1. where did you see in the paper https://www.nature.com/articles/s42256-019-0138-9.epdf?shared_access_token=RCYPTVkiECUmc0CccSMgXtRgN0jAjWel9jnR3ZoTv0O81kV8DqPb2VXSseRmof0Pl8YSOZy4FHz5vMc3xsxcX6uT10EzEoWo7B-nZQAHJJvBYhQJTT1LnJmpsa48nlgUWrMkThFrEIvZstjQ7Xdc5g%3D%3D that you can use depth instead of N?
  2. why do you start the induction from depth=N? what about depths 1,2...N-1?
  3. what are the 2 groups of sets you're describing? Is the first group (sets with the same size as S) the group of sets created when the splitting feature at the node at level N is a feature already encountered somewhere in the path? And is the second group the sets created when it's a new feature that wasn't encountered in the path?
  4. for the second group, why do you say it has an identical v(S U {i}) - v(S)? It's split on a new feature, so what do you mean by that? S would have landed on both sides of the split, as it doesn't contain this new feature.
  5. what's the meaning of showing that the sum of the contributions of these 2 groups looks like the contribution of the original set?

sh1ng commented on May 5, 2024
  1. https://www.nature.com/articles/s41551-018-0304-0.epdf?author_access_token=vSPt7ryUfdSCv4qcyeEuCdRgN0jAjWel9jnR3ZoTv0PdqacSN9qNY_fC0jWkIQUd0L2zaj3bbIQEdrTqCczGWv2brU5rTJPxyss1N4yTIHpnSv5_nBVJoUbvejyvvjrGTb2odwWKT2Bfvl0ExQKhZw%3D%3D or https://arxiv.org/pdf/1802.03888.pdf or https://arxiv.org/pdf/1706.06060.pdf
  2. It's the opposite. I'm starting with the number of features N equal to the depth and then showing that for N+1 features the result is the same. Looks like there's one edge case when N < depth; I'll work on it.
  3. Not sure I got your question. Let's say we have a path with features [1, 2, 3] and we add a new feature 4. We need to compute w * p for the set of size 4 ([1, 2, 3, 4]) and update w * p for the sets of size 3 ([1, 2, 3, 4 is missing], [1 is missing, 2, 3, 4], [1, 2 is missing, 3, 4], [1, 2, 3 is missing, 4]). It's done with dynamic programming, as enumerated below.
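
The sets touched by that step can be listed directly (a minimal sketch; the variable names are mine):

    # extend the path [1, 2, 3] with feature 4
    path = [1, 2, 3]
    extended = path + [4]

    # the one new set of size 4
    print([extended])  # [[1, 2, 3, 4]]

    # the size-3 sets whose w * p must be updated: drop each feature in turn
    print([[f for f in extended if f != drop] for drop in extended])
    # [[2, 3, 4], [1, 3, 4], [1, 2, 4], [1, 2, 3]]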

ihadanny commented on May 5, 2024

@sh1ng - thanks for your patience.
can you please help me first understand the very basics?
please take a look at this notebook

I'm first trying to understand how we can do the polynomial calculation if we ignore the part where w depends on the size of the set |S|. In the example tree from my notebook, we start with only the empty set {} at the root. When we move to the right "cold" child, we're supposed to add {2} as well, agree? What will cause us to do so?

ihadanny commented on May 5, 2024

@sh1ng - please review https://cs.stackexchange.com/questions/133247/how-does-the-shap-algorithm-work-in-polynomial-time
and https://colab.research.google.com/drive/1itp2Tql38DhuNwirmz1LgGQZFX7bzmdG#scrollTo=f-Vl9TOfRDbg

sh1ng commented on May 5, 2024

@ihadanny

I'm first trying to understand how we can do the polynomial calculation if we ignore the part where w depends on the size of the set |S|. In the example tree from my notebook, we start with only the empty set {} at the root. When we move to the right "cold" child, we're supposed to add {2} as well, agree? What will cause us to do so?

It doesn't work like that. We propagate all subsets (more precisely, all subsets matching the current feature path) through the tree as a whole, not considering them separately. Let's say we have the degenerate case of a tree with a single leaf node: no matter how many features we have, all subsets will end up in that leaf. Now let's consider a tree with a single split node and 2 leaves. It's easy to see that all subsets split into 2 groups (hot/cold).
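
A small enumeration of that single-split case (a minimal sketch; reading "contains the split feature" as the hot criterion is my interpretation of the comment above):

    from itertools import chain, combinations

    features = [0, 1, 2]
    split_feature = 1

    def all_subsets(xs):
        return chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))

    # subsets containing the split feature follow the instance's branch (hot);
    # the rest are weighted across both children by cover fractions (cold)
    hot = [s for s in all_subsets(features) if split_feature in s]
    cold = [s for s in all_subsets(features) if split_feature not in s]
    print(hot)   # [(1,), (0, 1), (1, 2), (0, 1, 2)]
    print(cold)  # [(), (0,), (2,), (0, 2)]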
