AnchorTabularExplainer without categorical features about anchor HOT 10 CLOSED

marcotcr commented on June 29, 2024

AnchorTabularExplainer without categorical features

from anchor.

Comments (10)

eindzl commented on June 29, 2024

Hi there.
I found the same problem and used the following workaround, which works fine for me.
In the file anchor_tabular.py, add an else clause to the __init__ method of class AnchorTabularExplainer:

 class AnchorTabularExplainer(object):

    ... original code ...

    def __init__(self, class_names, feature_names, data=None,

        ... original code ...

        if categorical_names:
            # TODO: Check if this n_values is correct!!
            cat_names = sorted(categorical_names.keys())
            n_values = [len(categorical_names[i]) for i in cat_names]
            self.encoder = sklearn.preprocessing.OneHotEncoder(
                categorical_features=cat_names,
                n_values=n_values)
            self.encoder.fit(data)
            self.categorical_features = self.encoder.categorical_features
        else:  ## Allow for datasets without categorical names
            categorical_names = {}

        ... original code ...

This prevents the update from failing and allows the explainer to discretize your numerical variables.

marcotcr commented on June 29, 2024

Hello,
I'm glad you found the paper interesting.
You are not missing anything; this is a bug in the code.
The anchor method needs categorical data, so I used to have a discretizer in the __init__ method for when the model uses numerical features. To be clear: the black box model can use continuous data, but the resulting anchor will be in discretized bins, such as "If Salary > 5000, predict X".

I must have removed that at some point and forgotten to put it back in.
I'll try to add it back soon, thanks for letting me know.

marcotcr commented on June 29, 2024

In the meantime, you can discretize your data first, similar to what I do here
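
As a rough illustration of what "discretize your data first" can look like, here is a minimal sketch that bins one numeric column into quartiles and builds readable bin labels such as "salary > 6000.00". It is written in plain Python so it runs without dependencies. The helper names (`quartile_edges`, `discretize`, `bin_names`), the `salary` column, and the choice of quartiles are all assumptions made for the example, not the code linked above.

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_edges(values):
    """Return the three quartile cut points of a numeric column."""
    return quantiles(values, n=4)


def discretize(values, edges):
    """Map each value to a bin index; bin i holds edges[i-1] < v <= edges[i]."""
    return [bisect_left(edges, v) for v in values]


def bin_names(feature, edges):
    """Build one human-readable label per bin, in bin-index order."""
    names = ["%s <= %.2f" % (feature, edges[0])]
    for low, high in zip(edges, edges[1:]):
        names.append("%.2f < %s <= %.2f" % (low, feature, high))
    names.append("%s > %.2f" % (feature, edges[-1]))
    return names


# Hypothetical numeric column used only for illustration.
salary = [1000, 2000, 3000, 4000, 5000, 6000, 7000]

edges = quartile_edges(salary)      # cut points learned from the column
codes = discretize(salary, edges)   # integer bin index per row
names = bin_names("salary", edges)  # labels indexed by bin

# Assumption: a dict from column index to label list mirrors the shape of
# the categorical_names argument indexed in the workaround above.
categorical_names = {0: names}
```

The cut points should be computed on the training data and reused for any later data, and a classifier trained on the resulting bin indices then sees the same representation as the explainer.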

asstergi commented on June 29, 2024

Hi @marcotcr,

I discretized the data and got anchor working, thank you!

However, I'm seeing some inconsistencies in the reported coverage and precision when I try to use the anchor explanation on the original dataset (i.e. before the discretization).

Not sure if you can help just by looking at this code, but here's what I'm doing:
print('Anchor: %s' % (' AND '.join(exp.names())))

fit_anchor = np.where(np.all(X_trans_test_disc[:, exp.features()] == X_trans_test_disc[idx][exp.features()], axis=1))[0]
print('Anchor test coverage: %.4f' % (fit_anchor.shape[0] / float(X_trans_test_disc.shape[0])))
print('Anchor test precision: %.4f' % (np.mean(predict_fn(X_trans_test_disc[fit_anchor]) == predict_fn(X_trans_test_disc[idx].reshape(1, -1)))))

anch = y_trans[(X_trans['this_race_last_year_result'] > 1.50) &
             (X_trans['grid'] > -9.50) &
             (X_trans['grid'] <= -5.50)]
print('Anchor test coverage (orig): %.4f' % (1.0*anch.shape[0]/y_trans.shape[0]))
print('Anchor test precision (orig): %.4f' % (1.0*anch.sum()/anch.shape[0]))

And here's the output:

Anchor: -9.50 < grid <= -5.50 AND this_race_last_year_result > 1.50

Anchor test coverage: 0.0316
Anchor test precision: 1.0000

Anchor test coverage (orig): 0.0486
Anchor test precision (orig): 0.8527

I would expect the figures to match. Any idea on this?

marcotcr commented on June 29, 2024

If the validation and test distributions are similar, the numbers should match. I would have to see it in more detail to understand if your discretization is doing something or if there's a bug in the code. I can take a look if you can share a notebook.

The newest version I uploaded has discretizing built in; you may want to give it a try.
It may be buggy since I didn't test it thoroughly, so it may be safer to train a classifier on discretized data like you're doing.
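
One thing that may be worth checking in a comparison like the one above: the first pair of numbers scores the covered rows against the model's prediction for the explained instance and uses X_trans_test_disc, while the second pair sums y_trans and filters X_trans. If y_trans holds ground-truth labels and X_trans is a different split (both are assumptions, since the thread does not say), the two pairs measure different things and need not match. The sketch below computes coverage and precision directly on raw values, with precision defined as agreement with the model's prediction. The function names, the toy rows, and the toy `predict_fn` are hypothetical.

```python
def anchor_mask(rows, conditions):
    """Flag rows that satisfy every (feature_index, low, high) condition.

    Each condition reads as low < value <= high; None leaves that end open.
    """
    def holds(row):
        for idx, low, high in conditions:
            if low is not None and not row[idx] > low:
                return False
            if high is not None and not row[idx] <= high:
                return False
        return True

    return [holds(row) for row in rows]


def coverage_and_precision(rows, conditions, predict_fn, target):
    """Coverage: share of rows the anchor applies to.
    Precision: share of covered rows where the model predicts `target`."""
    covered = [r for r, m in zip(rows, anchor_mask(rows, conditions)) if m]
    coverage = len(covered) / len(rows)
    if not covered:
        return coverage, float("nan")
    preds = predict_fn(covered)
    precision = sum(p == target for p in preds) / len(covered)
    return coverage, precision


def predict_fn(rows):
    """Toy stand-in for a black-box model: predicts 1 when grid > -8."""
    return [1 if row[1] > -8 else 0 for row in rows]


# Toy rows: [this_race_last_year_result, grid]
rows = [
    [2.0, -6.0],
    [3.0, -9.0],
    [1.0, -6.0],
    [2.0, -3.0],
    [4.0, -7.0],
]

# Anchor: this_race_last_year_result > 1.50 AND -9.50 < grid <= -5.50
conditions = [(0, 1.5, None), (1, -9.5, -5.5)]
target = predict_fn([rows[0]])[0]  # model's prediction for the explained row

coverage, precision = coverage_and_precision(rows, conditions, predict_fn, target)
print("coverage: %.4f, precision: %.4f" % (coverage, precision))
```

Running both the discretized and the raw-value check on the same rows, with the same `predict_fn`, makes it easier to tell a discretization mismatch from a difference in what is being measured.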

ajayaadhikari commented on June 29, 2024

Hello @marcotcr,
I am also trying to use numerical features.
You suggested discretizing the data before giving it to AnchorTabularExplainer, right?
How will the AnchorTabularExplainer know how to invert the discretization to get predictions on the perturbed samples?

marcotcr commented on June 29, 2024

If you discretize the data before you give it to AnchorTabularExplainer, you have to train the model on the discretized features. If you want the black box model to use numerical features, you have to use the newest version with built-in discretizing.

amrebaid commented on June 29, 2024

> The anchor method needs categorical data, so I used to have a discretizer in the __init__ method for when the model uses numerical features. To be clear: the black box model can use continuous data, but the resulting anchor will be in discretized bins, such as "If Salary > 5000, predict X".
>
> I must have removed that at some point and forgotten to put it back in.
> I'll try to add it back soon, thanks for letting me know.

~~Has this been fixed in the code? Or we still have to do the workaround?~~
Never mind, I figured it out. I had to fit the classifier too, not only the explainer.

Thanks,
Amr

ykshitij commented on June 29, 2024

@eindzl Thanks, I had the same problem, and it now works correctly after your update.

seansaito commented on June 29, 2024

 class AnchorTabularExplainer(object):

    ... original code ...

    def __init__(self, class_names, feature_names, data=None,

        ... original code ...

        if categorical_names:
            # TODO: Check if this n_values is correct!!
            cat_names = sorted(categorical_names.keys())
            n_values = [len(categorical_names[i]) for i in cat_names]
            self.encoder = sklearn.preprocessing.OneHotEncoder(
                categorical_features=cat_names,
                n_values=n_values)
            self.encoder.fit(data)
            self.categorical_features = self.encoder.categorical_features
        else:  ## Allow for datasets without categorical names
            categorical_names = {}

        ... original code ...

This prevents the update from failing and allows the explainer to discretize your numerical variables.

Will this workaround be implemented at some point?
