
path_explain's Introduction

Path Explain

A repository for explaining feature importances and feature interactions in deep neural networks using path attribution methods.

This repository contains tools to interpret and explain machine learning models using Integrated Gradients and Expected Gradients. In addition, it contains code to explain interactions in deep networks using Integrated Hessians and Expected Hessians - methods that we introduced in our most recent paper: "Explaining Explanations: Axiomatic Feature Interactions for Deep Networks". If you use our work to explain your networks, please cite this paper.

@article{janizek2020explaining,
  author  = {Joseph D. Janizek and Pascal Sturmfels and Su-In Lee},
  title   = {Explaining Explanations: Axiomatic Feature Interactions for Deep Networks},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {104},
  pages   = {1-54},
  url     = {http://jmlr.org/papers/v22/20-1223.html}
}
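For reference, here is a brief, informal sketch of the quantities these methods compute, using the standard definitions from the attribution literature rather than the paper's exact notation. Given a model f, an input x, and a baseline x':

\mathrm{IG}_i(x) = (x_i - x_i') \int_0^1 \frac{\partial f\big(x' + \alpha (x - x')\big)}{\partial x_i} \, d\alpha

\mathrm{EG}_i(x) = \mathbb{E}_{x' \sim D,\ \alpha \sim U(0,1)}\!\left[ (x_i - x_i') \, \frac{\partial f\big(x' + \alpha (x - x')\big)}{\partial x_i} \right]

Roughly speaking, Integrated Hessians obtains pairwise interaction values by applying Integrated Gradients to the Integrated Gradients attribution function itself, and Expected Hessians is the analogous expectation-based variant.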

This repository contains two important directories: the path_explain directory, which contains the packages used to interpret and explain machine learning models, and the examples directory, which contains many examples using the path_explain module to explain different models on different data types.

Installation

The easiest way to install this package is by using pip:

pip install path-explain

Alternatively, you can clone this repository to re-run and explore the examples provided.

Compatibility

This package was written to support TensorFlow 2.0 (in eager execution mode) with Python 3. We have no current plans to support earlier versions of TensorFlow or Python.

API

Although we don't yet have formal API documentation, the underlying code does a good job of explaining the API. See the code for generating attributions and interactions to better understand what the arguments to these functions mean.
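As a rough orientation, the core call pattern looks like the following sketch; the argument meanings are inferred from the examples below rather than from formal documentation:

from path_explain import PathExplainerTF

explainer = PathExplainerTF(model)        # an eager-mode TensorFlow 2 model
attributions = explainer.attributions(
    inputs=x_test,            # samples to explain
    baseline=x_train,         # reference sample(s) to explain against
    batch_size=100,           # number of samples evaluated per batch
    num_samples=200,          # interpolation / expectation samples per input
    use_expectation=True,     # True: Expected Gradients, False: Integrated Gradients
    output_indices=0)         # which model output (class) to explain

explainer.interactions takes the same arguments and returns pairwise interaction values (Integrated or Expected Hessians).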

Examples

For a simple, quick example to get started using this repository, see the example_usage.ipynb notebook in the top-level directory of this repository. It gives an overview of the functionality provided by this repository. For more advanced examples, keep reading on.

Tabular Data using Expected Gradients and Expected Hessians

Our repository can easily be adapted to explain attributions and interactions learned on tabular data.

# other import statements...
from path_explain import PathExplainerTF, scatter_plot, summary_plot

### Code to train a model would go here
x_train, y_train, x_test, y_test = dataset()
model = ...
model.fit(x_train, y_train, ...)
###

### Generating attributions using expected gradients
explainer = PathExplainerTF(model)
attributions = explainer.attributions(inputs=x_test,
                                      baseline=x_train,
                                      batch_size=100,
                                      num_samples=200,
                                      use_expectation=True,
                                      output_indices=0)
###

### Generating interactions using expected hessians
interactions = explainer.interactions(inputs=x_test,
                                      baseline=x_train,
                                      batch_size=100,
                                      num_samples=200,
                                      use_expectation=True,
                                      output_indices=0)
###

Once we've generated attributions and interactions, we can use the provided plotting modules to help visualize them. First we plot a summary of the top features and their attribution values:

### First we need a list of strings denoting the name of each feature
feature_names = ...
###

summary_plot(attributions=attributions,
             feature_values=x_test,
             feature_names=feature_names,
             plot_top_k=10)

Heart Disease Summary Plot

Second, we plot an interaction our model has learned between maximum achieved heart rate and gender:

scatter_plot(attributions=attributions,
             feature_values=x_test,
             feature_index='max. achieved heart rate',
             interactions=interactions,
             color_by='is male',
             feature_names=feature_names,
             scale_y_ind=True)

Interaction: Heart Rate and Gender

The model used to generate the above interactions is a two-layer neural network trained on the UCI Heart Disease dataset. Interactions learned by this model were featured in our paper. To learn more about this particular model and the experimental setup, see the notebook used to train and explain the model.
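For context, a minimal sketch of what such a model might look like in Keras is shown below; the hidden-layer width, the 13-feature input, and the sigmoid output are illustrative assumptions rather than the exact architecture from the notebook:

import tensorflow as tf

# A hypothetical two-layer classifier for the UCI Heart Disease data.
# The layer width and 13-dimensional input are assumptions for illustration.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(13,)),  # hidden layer
    tf.keras.layers.Dense(1, activation='sigmoid')                    # probability of disease
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])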

Explaining an NLP model using Integrated Gradients and Integrated Hessians

As discussed in our paper, we can use Integrated Hessians to get interactions in language models. We explain a transformer from the HuggingFace Transformers Repository.

import numpy as np
import tensorflow as tf
import tensorflow_datasets

from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification, \
                         DistilBertConfig, glue_convert_examples_to_features, \
                         glue_processors

# This is a custom explainer to explain huggingface models
from path_explain import EmbeddingExplainerTF, text_plot, matrix_interaction_plot, bar_interaction_plot

num_labels = 2  # SST-2 is a binary sentiment classification task
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
config = DistilBertConfig.from_pretrained('distilbert-base-uncased', num_labels=num_labels)
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', config=config)

### Some custom code to fine-tune the model on a sentiment analysis task...
max_length = 128
data, info = tensorflow_datasets.load('glue/sst-2', with_info=True)
train_dataset = glue_convert_examples_to_features(data['train'],
                                                  tokenizer,
                                                  max_length,
                                                  'sst-2')
valid_dataset = glue_convert_examples_to_features(data['validation'],
                                                  tokenizer,
                                                  max_length,
                                                  'sst-2')
...
### we won't include the whole fine-tuning code. See the HuggingFace repository for more.

### Here we define functions that represent two pieces of the model:
### embedding and prediction
def embedding_model(batch_ids):
    batch_embedding = model.distilbert.embeddings(batch_ids)
    return batch_embedding

def prediction_model(batch_embedding):
    # Note: this isn't exactly the right way to use the attention mask.
    # It should actually indicate which words are real words. This
    # makes the coding easier however, and the output is fairly similar,
    # so it suffices for this tutorial.
    attention_mask = tf.ones(batch_embedding.shape[:2])
    attention_mask = tf.cast(attention_mask, dtype=tf.float32)
    head_mask = [None] * model.distilbert.num_hidden_layers

    transformer_output = model.distilbert.transformer([batch_embedding, attention_mask, head_mask], training=False)[0]
    pooled_output = transformer_output[:, 0]
    pooled_output = model.pre_classifier(pooled_output)
    logits = model.classifier(pooled_output)
    return logits
###

### We need some data to explain
for batch in valid_dataset.take(1):
    batch_input = batch[0]

batch_ids = batch_input['input_ids']
batch_embedding = embedding_model(batch_ids)

baseline_ids = np.zeros((1, 128), dtype=np.int64)
baseline_embedding = embedding_model(baseline_ids)
###

### We are finally ready to explain our model
explainer = EmbeddingExplainerTF(prediction_model)
attributions = explainer.attributions(inputs=batch_embedding,
                                      baseline=baseline_embedding,
                                      batch_size=32,
                                      num_samples=256,
                                      use_expectation=False,
                                      output_indices=1)
###

### For interactions, the Hessian is rather large, so we use a very small batch size
interactions = explainer.interactions(inputs=batch_embedding,
                                      baseline=baseline_embedding,
                                      batch_size=1,
                                      num_samples=256,
                                      use_expectation=False,
                                      output_indices=1)
###

We can plot the learned attributions and interactions as follows. First we plot the attributions:

### First we need to decode the tokens from the batch ids.
batch_sentences = ...
### Doing so will depend on which tokenizer you used!
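### One possible way to decode (a sketch, not code from this repository): using the
### DistilBERT tokenizer loaded above, convert each row of ids back into tokens.
### convert_ids_to_tokens is part of the HuggingFace tokenizer API.
batch_sentences = [tokenizer.convert_ids_to_tokens(ids) for ids in batch_ids.numpy()]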

text_plot(batch_sentences[0],
          attributions[0],
          include_legend=True)

Showing feature attributions in text

Then we plot the interactions:

bar_interaction_plot(interactions[0],
                     batch_sentences[0],
                     top_k=5)

Showing feature interactions in text

If you would rather plot the full matrix of interactions than just the top interactions in a bar plot, our package also supports this. First we show the attributions:

text_plot(batch_sentences[1],
          attributions[1],
          include_legend=True)

Showing additional attributions

And then we show the full interaction matrix. Here we've zeroed out the diagonal so you can better see the off-diagonal terms.

matrix_interaction_plot(interactions[1],
                        batch_sentences[1])

Showing the full matrix of feature interactions

This example - interpreting DistilBERT - was also featured in our paper. You can examine the setup more here. For more examples, see the examples directory in this repository.

path_explain's People

Contributors

jjanizek, jumelet, locust2520, psturmfels, shubhe25p


path_explain's Issues

Can the feature attribution method be applied to training data?

Thanks for your great work! Can I ask a general and quick question? Is it reasonable to compute explanations on training instances?

To give an example, model f is trained on dataset X and tested on dataset Y. When I debug model f, can I use the explanation method on the training data to see which features model f focuses on?
(Usually, the explanation method is applied to test/validation data. According to your paper, the explanation method is also applied to the validation set.)

Thank you!

No convergence for IH for larger input strings

Hi!

I've been working with your IH/IG implementation lately, doing some experiments with it in an NLP context. What I have noticed is that when I increase the length of my input, there is an adverse effect on the convergence of my IH interactions with respect to the attributions I'm getting with IG.

IG itself converges nicely with respect to the completeness axiom and the model output, but the interaction completeness axiom of Section 2.2.1 of your paper does not seem to hold at all in these cases.

In this plot you can see that as the input length is increased, the Mean Squared Error between the interactions (summed over the last dimension) and the attributions no longer converges to a reasonable margin of error, with the number of interpolation points for IH on the x-axis (note the log scale on the y):
[Screenshot: convergence plot described above]

I tested this on a 1-layer LSTM (very tiny, only 16 hidden units), using the Tensorflow implementation of IH+IG, with a fixed zero-valued baseline (so not using the expectation).

What I was wondering is whether you encountered similar issues when testing your approach on larger models. I see that in Theorem 1 of the paper you touch upon related issues, but that only seems to concern the simple feedforward case, and not more complex models like LSTMs.
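For concreteness, the check described above amounts to something like the following sketch; the variable names follow the README examples, and it assumes attributions and interactions are the arrays returned by the explainer (this is not code from the issue):

import numpy as np

# Interaction completeness check: summing the interactions over the last axis
# should approximately recover the corresponding attributions.
summed_interactions = np.asarray(interactions).sum(axis=-1)
mse = np.mean((summed_interactions - np.asarray(attributions)) ** 2)
print(mse)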

Using Longformer

I followed the steps shown in the path_explain README file but used a fine-tuned Longformer sequence classification model. It did not work since there is no model.pre_classifier() method for Longformer. Do you have any suggestions on how I could make this work?

output_indices not passed through for torch interactions

Hi! I'm trying to run your interaction setup on a torch model (from Huggingface's library), but I run into trouble because the output_indices doesn't seem to be passed through to the attributions method.

ig_tensor[:, i, :] = self.attributions(particular_slice, baseline,
                                       num_samples=inner_loop_nsamples,
                                       use_expectation=use_expectation)

This causes an error in _get_grad when output_indices is unsqueezed, because it was never passed and is therefore still set to None:

indices_tensor = torch.cat([
    sample_indices.unsqueeze(1),
    output_indices.unsqueeze(1)], dim=1)
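The fix described here would presumably amount to forwarding the argument, along the lines of this sketch, which simply mirrors the snippet above with output_indices added (hypothetical, not a tested patch):

ig_tensor[:, i, :] = self.attributions(particular_slice, baseline,
                                       num_samples=inner_loop_nsamples,
                                       use_expectation=use_expectation,
                                       output_indices=output_indices)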

Extend torch interactions to higher dimensions

Seems like PathExplainerTorch.interactions only supports 2D tensors, unlike the TensorFlow version.

What are the bottlenecks to supporting arbitrarily-sized tensors, and how difficult would the change be? If it's not too bad, I would be interested in making a PR myself.

Would love any input @jjanizek, thanks!

.npy files

attributions = np.load('attributions.npy')
interactions = np.load('interactions.npy')

Hi, thanks for your great work!
I want to reproduce the examples, but these files seem to be missing; can you tell me where to get them?

Mismatch num_samples for Torch and Tensorflow implementation IG

I'm currently testing the Integrated Gradients implementation on a torch and tensorflow version of a Huggingface model, and noticed the attributions I got with the same configuration are slightly different.

I tested this with use_expectation=False, so for Integrated Gradients rather than Expected Gradients.

It turns out that the num_samples argument behaves slightly differently: for the torch model, _get_samples_input returns an interpolated baseline of size num_samples + 1, which happens at this point due to the +1:

scaled_inputs = [reference_tensor + (float(i) / num_samples) * (input_expand - reference_tensor)
                 for i in range(0, num_samples + 1)]

The _sample_alphas method of PathExplainerTF, on the other hand, returns an interpolation tensor of length num_samples, which causes the mismatch.

When I instead pass a num_samples that is 1 lower to the torch implementation, the attributions of both methods are exactly the same.

I'm not sure which of the two is more "correct", but I think it would be good if num_samples behaved the same way in both implementations. One could even argue that in both cases we should return num_samples + 2 points, given that the interpolation also includes the input and the baseline themselves; in that case, the number of intermediate points used to compute the attributions would actually equal num_samples.
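To make the off-by-one concrete, here is a small standalone numpy sketch of the two interpolation schemes described above; it assumes, for illustration, that the TF version spaces num_samples alphas evenly between 0 and 1:

import numpy as np

num_samples = 4
baseline, x = 0.0, 1.0

# Torch-style: range(0, num_samples + 1) yields num_samples + 1 interpolation points.
torch_points = np.array([baseline + (i / num_samples) * (x - baseline)
                         for i in range(0, num_samples + 1)])
print(len(torch_points))  # 5

# TF-style: num_samples evenly spaced alphas yield num_samples points.
tf_points = np.linspace(baseline, x, num_samples)
print(len(tf_points))     # 4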

can't find bert_explainer

Hello!

In interpret_stsb.ipynb, there is a module called bert_explainer (from bert_explainer import BertExplainerTF), but there seems to be no bert_explainer in the repo; it would be helpful if you could share it. Thank you!

Unable to use multiprocessing pool with PathExplainerTF

When I do the following, I get an error that tensorflow objects are not pickleable:

import multiprocessing as mp

explainer = PathExplainerTF(model)

with mp.Pool() as p:
    attributions = p.starmap(explainer.attributions, [(i, ...) for i in x_test])

Can you help update a version with pytorch example?

Hi there,

I was trying to reproduce the examples and found that the current examples seem to be out of date, with some incorrect paths, and need further modification.

On the other hand, although I've revised the current version and reproduced the TF version, I'm having a hard time reproducing the pytorch version.

Can you help generate a pytorch version of the example?

Sincerely

cannot import name 'EmbeddingExplainerTF'

Hi, I installed path-explain via pip and somehow cannot import the EmbeddingExplainerTF class, although I have no trouble importing PathExplainerTF:

from path_explain import EmbeddingExplainerTF
