Comments (3)

kbattocchi commented on May 25, 2024

Thanks for the feedback - I agree that option 4 seems like the best approach, and we'd be happy to take a PR fixing it. I am surprised that our existing tests in test_shap.py don't catch this (because they do cover CausalForestDML and they run against shap==0.43); please add a test that fails without your fix (or modify at least one of the existing tests to do so).

Also, we'd be happy to allow a higher upper bound (say shap<0.46) if there are no other breaking changes that are hard to work around; the only reason we have a specific upper bound is that past shap releases have included breaking changes that broke our functionality.
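As a rough illustration of what an upper bound such as shap<0.46 admits, here is a minimal sketch of a release-number comparison. The helper names `release_tuple` and `is_below` are hypothetical and not part of EconML or shap. The sketch looks only at the leading numeric release segment and ignores pre-release and local suffixes, so it is not a full PEP 440 implementation.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '0.43.0' -> (0, 43, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_below(version: str, upper_bound: str) -> bool:
    """True if the release segment of `version` sorts strictly below `upper_bound`."""
    v, b = release_tuple(version), release_tuple(upper_bound)
    # Pad with zeros so that '0.46' and '0.46.0' compare as equal.
    width = max(len(v), len(b))
    v += (0,) * (width - len(v))
    b += (0,) * (width - len(b))
    return v < b
```

In practice the installed version string could be read with `importlib.metadata.version("shap")` and passed to `is_below`. A real project would more likely rely on the installer's own requirement handling than on a hand-rolled check like this one.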

jcreinhold commented on May 25, 2024

@kbattocchi I'm not able to consistently reproduce this error. See #445 for another valiant but ultimately futile effort to reproduce it consistently.

For posterity, here is (essentially) the code I was running when I (occasionally) ran into the error:

Versions:

  • Python version: 3.8.17
  • econml version: 0.15.0
  • numpy version: 1.23.5
  • scikit-learn version: 1.3.2
  • scipy version: 1.10.1
  • shap version: 0.43.0

import numpy as np
from econml import dml
import sklearn.ensemble as ens

seed = 1337
rng = np.random.default_rng(seed)

n = 1_000
n_x = 2

X = rng.uniform(-1.0, 1.0, size=(n, n_x))
A = rng.binomial(1, 0.5 + 0.5 * X[:, 0])
Y = rng.binomial(1, 0.3 + 0.5 * A + 0.2 * X[:, 0])

X_ = np.hstack([X, A[:, None]])
A_ = rng.binomial(1, 0.5, size=(n,))

model = dml.CausalForestDML(
    model_y=ens.RandomForestClassifier(random_state=seed),
    model_t=ens.RandomForestClassifier(random_state=seed),
    discrete_treatment=True,
    discrete_outcome=True,
    random_state=seed,
)

model.fit(Y, A_, X=X_)
shap_values = model.shap_values(X_)

Part of the problem with reproducing this error is that EconML doesn't currently allow you to pass a seed to the Explainer class.

I'd be happy to add in that keyword argument into the shap_values method; however, that'd involve changing a good number of files (e.g., the LinearCateEstimator and anything that overwrites that method).

This change isn't strictly necessary for this issue (although I believe it's generally desirable), and, even if I do find a configuration that triggers the error on my machine, I doubt it'll be reproducible across machines.

Any alternative ideas? I can quickly submit a PR with just the change proposed in option 4 and we could revisit adding the seed keyword argument to shap_values (if that's desirable).

kbattocchi commented on May 25, 2024

> Part of the problem with reproducing this error is that EconML doesn't currently allow you to pass a seed to the Explainer class.
>
> I'd be happy to add in that keyword argument into the shap_values method; however, that'd involve changing a good number of files (e.g., the LinearCateEstimator and anything that overwrites that method).
>
> This change isn't strictly necessary for this issue (although I believe it's generally desirable), and, even if I do find a configuration that triggers the error on my machine, I doubt it'll be reproducible across machines.
>
> Any alternative ideas? I can quickly submit a PR with just the change proposed in option 4 and we could revisit adding the seed keyword argument to shap_values (if that's desirable).

I agree that allowing the seed to be passed as an optional argument seems beneficial, not just for testing but for reproducibility more broadly. If you don't mind adding these changes to your PR, that would be great, but if that's too much work I'm also happy to merge it as-is.
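For readers following along, the pattern under discussion (an optional seed accepted by shap_values and forwarded to the explainer) might look roughly like the sketch below. `ToyExplainer` and `ToyEstimator` are hypothetical stand-ins written only for illustration. They are not the EconML or shap classes, and the "attributions" they produce are fake random numbers.

```python
import random
from typing import List, Optional, Sequence


class ToyExplainer:
    """Hypothetical stand-in for a sampling-based explainer (not the shap API)."""

    def __init__(self, seed: Optional[int] = None):
        # A private generator keeps results independent of global random state.
        self._rng = random.Random(seed)

    def explain(self, row: Sequence[float]) -> List[float]:
        # Fake "attributions": each feature value scaled by a random weight.
        return [x * self._rng.random() for x in row]


class ToyEstimator:
    """Hypothetical estimator exposing a shap_values-like method."""

    def shap_values(
        self, X: Sequence[Sequence[float]], *, seed: Optional[int] = None
    ) -> List[List[float]]:
        # seed defaults to None, so existing callers keep their current behavior.
        explainer = ToyExplainer(seed=seed)
        return [explainer.explain(row) for row in X]
```

Making `seed` keyword-only with a `None` default means subclasses that override the method can adopt it gradually, and two calls with the same seed return identical values in this toy setup.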
