
captum's Introduction


Captum is a model interpretability and understanding library for PyTorch. Captum means comprehension in Latin and contains general purpose implementations of integrated gradients, saliency maps, smoothgrad, vargrad and others for PyTorch models. It has quick integration for models built with domain-specific libraries such as torchvision, torchtext, and others.

Captum is currently in beta and under active development!

About Captum

With the increase in model complexity and the resulting lack of transparency, model interpretability methods have become increasingly important. Model understanding is both an active area of research and an area of focus for practical applications across industries using machine learning. Captum provides state-of-the-art algorithms such as Integrated Gradients, Testing with Concept Activation Vectors (TCAV), and TracIn influence functions, just to name a few, that give researchers and developers an easy way to understand which features, training examples or concepts contribute to a model's predictions and, in general, what and how the model learns. In addition, Captum provides adversarial attack and minimal input perturbation capabilities that can be used both for generating counterfactual explanations and adversarial perturbations.

Captum helps ML researchers more easily implement interpretability algorithms that can interact with PyTorch models. Captum also allows researchers to quickly benchmark their work against other existing algorithms available in the library.

Overview of Attribution Algorithms

Target Audience

The primary audiences for Captum are model developers who are looking to improve their models and understand which concepts, features or training examples are important, and interpretability researchers focused on identifying algorithms that can better interpret many types of models.

Captum can also be used by application engineers who are using trained models in production. Captum provides easier troubleshooting through improved model interpretability, and the potential for delivering better explanations to end users on why they’re seeing a specific piece of content, such as a movie recommendation.

Installation

Installation Requirements

  • Python >= 3.6
  • PyTorch >= 1.6
Installing the latest release

The latest release of Captum is easily installed either via Anaconda (recommended) or via pip.

With conda

You can install captum from any of the following supported conda channels:

  • channel: pytorch

    conda install captum -c pytorch
  • channel: conda-forge

    conda install captum -c conda-forge

With pip

pip install captum

Manual / Dev install

If you'd like to try our bleeding edge features (and don't mind potentially running into the occasional bug here or there), you can install the latest master directly from GitHub. For a basic install, run:

git clone https://github.com/pytorch/captum.git
cd captum
pip install -e .

To customize the installation, you can also run the following variants of the above:

  • pip install -e .[insights]: Also installs all packages necessary for running Captum Insights.
  • pip install -e .[dev]: Also installs all tools necessary for development (testing, linting, docs building; see Contributing below).
  • pip install -e .[tutorials]: Also installs all packages necessary for running the tutorial notebooks.

To execute unit tests from a manual install, run:

# running a single unit test
python -m unittest -v tests.attr.test_saliency
# running all unit tests
pytest -ra

Getting Started

Captum helps you interpret and understand predictions of PyTorch models by exploring features that contribute to a prediction the model makes. It also helps understand which neurons and layers are important for model predictions.

Let's apply some of those algorithms to a toy model we have created for demonstration purposes. For simplicity, we will use the following architecture, but users are welcome to use any PyTorch model of their choice.

import numpy as np

import torch
import torch.nn as nn

from captum.attr import (
    GradientShap,
    DeepLift,
    DeepLiftShap,
    IntegratedGradients,
    LayerConductance,
    NeuronConductance,
    NoiseTunnel,
)

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(3, 3)
        self.relu = nn.ReLU()
        self.lin2 = nn.Linear(3, 2)

        # initialize weights and biases
        self.lin1.weight = nn.Parameter(torch.arange(-4.0, 5.0).view(3, 3))
        self.lin1.bias = nn.Parameter(torch.zeros(1,3))
        self.lin2.weight = nn.Parameter(torch.arange(-3.0, 3.0).view(2, 3))
        self.lin2.bias = nn.Parameter(torch.ones(1,2))

    def forward(self, input):
        return self.lin2(self.relu(self.lin1(input)))

Let's create an instance of our model and set it to eval mode.

model = ToyModel()
model.eval()

Next, we need to define simple input and baseline tensors. Baselines belong to the input space and often carry no predictive signal. A zero tensor can serve as a baseline for many tasks. Some interpretability algorithms such as IntegratedGradients, DeepLift and GradientShap are designed to attribute the change between the input and baseline to a predictive class or a value that the neural network outputs.

We will apply model interpretability algorithms on the network mentioned above in order to understand the importance of individual neurons/layers and the parts of the input that play an important role in the final prediction.

To make computations deterministic, let's fix random seeds.

torch.manual_seed(123)
np.random.seed(123)

Let's define our input and baseline tensors. Baselines are used in some interpretability algorithms such as IntegratedGradients, DeepLift, GradientShap, NeuronConductance, LayerConductance, InternalInfluence and NeuronIntegratedGradients.

input = torch.rand(2, 3)
baseline = torch.zeros(2, 3)

Next we will use the IntegratedGradients algorithm to assign attribution scores to each input feature with respect to the first target output.

ig = IntegratedGradients(model)
attributions, delta = ig.attribute(input, baseline, target=0, return_convergence_delta=True)
print('IG Attributions:', attributions)
print('Convergence Delta:', delta)

Output:

IG Attributions: tensor([[-0.5922, -1.5497, -1.0067],
                         [ 0.0000, -0.2219, -5.1991]])
Convergence Delta: tensor([2.3842e-07, -4.7684e-07])

The algorithm outputs an attribution score for each input element and a convergence delta. The lower the absolute value of the convergence delta, the better the approximation. If we choose not to return the delta, we can simply omit the return_convergence_delta input argument. The absolute value of the returned deltas can be interpreted as an approximation error for each input sample. It can also serve as a proxy for how accurate the integral approximation is for the given inputs and baselines. If the approximation error is large, we can try a larger number of integral approximation steps by setting n_steps to a larger value. Not all algorithms return an approximation error; those that do compute it based on the completeness property of the algorithm.
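
For example, if the reported delta were too large for a given use case, a re-run with more approximation steps might look like the following sketch (n_steps defaults to 50; 200 here is an arbitrary illustrative value):

# Re-run Integrated Gradients with more approximation steps to reduce the
# convergence delta, at the cost of additional forward/backward passes.
attributions, delta = ig.attribute(input, baseline, target=0, n_steps=200,
                                   return_convergence_delta=True)
print('Convergence Delta with n_steps=200:', delta)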

A positive attribution score means that the input in that particular position contributed positively to the final prediction, and a negative score means the opposite. The magnitude of the attribution score signifies the strength of the contribution. A zero attribution score means no contribution from that particular feature.

Similarly, we can apply GradientShap, DeepLift and other attribution algorithms to the model.

GradientShap first chooses a random baseline from the distribution of baselines, then adds Gaussian noise with std=0.09 to each input example n_samples times. Afterwards, it chooses a random point between each example-baseline pair and computes the gradients with respect to the target class (in this case target=0). The resulting attribution is the mean of gradients * (inputs - baselines).

gs = GradientShap(model)

# We define a distribution of baselines and draw `n_samples` from that
# distribution in order to estimate the expectations of gradients across all baselines
baseline_dist = torch.randn(10, 3) * 0.001
attributions, delta = gs.attribute(input, stdevs=0.09, n_samples=4, baselines=baseline_dist,
                                   target=0, return_convergence_delta=True)
print('GradientShap Attributions:', attributions)
print('Convergence Delta:', delta)

Output

GradientShap Attributions: tensor([[-0.1542, -1.6229, -1.5835],
                                   [-0.3916, -0.2836, -4.6851]])
Convergence Delta: tensor([ 0.0000, -0.0005, -0.0029, -0.0084, -0.0087, -0.0405,  0.0000, -0.0084])

Deltas are computed for each of the n_samples * input.shape[0] examples. The user can, for instance, average them:

deltas_per_example = torch.mean(delta.reshape(input.shape[0], -1), dim=1)

in order to get the average delta per example.

Below is an example of how we can apply DeepLift and DeepLiftShap on the ToyModel described above. The current implementation of DeepLift supports only the Rescale rule. For more details on alternative implementations, please see the DeepLift paper.

dl = DeepLift(model)
attributions, delta = dl.attribute(input, baseline, target=0, return_convergence_delta=True)
print('DeepLift Attributions:', attributions)
print('Convergence Delta:', delta)

Output

DeepLift Attributions: tensor([[-0.5922, -1.5497, -1.0067],
                               [ 0.0000, -0.2219, -5.1991]])
Convergence Delta: tensor([0., 0.])

DeepLift assigns attribution scores similar to those of IntegratedGradients, but it has lower execution time. Another important thing to remember about DeepLift is that it currently doesn't support all non-linear activation types. For more details on the limitations of the current implementation, please see the DeepLift paper.

Similar to integrated gradients, DeepLift returns a convergence delta score per input example. The approximation error is then the absolute value of the convergence deltas and can serve as a proxy of how accurate the algorithm's approximation is.

Now let's look into DeepLiftShap. Similar to GradientShap, DeepLiftShap uses a baseline distribution. In the example below, we use the same baseline distribution as for GradientShap.

dl = DeepLiftShap(model)
attributions, delta = dl.attribute(input, baseline_dist, target=0, return_convergence_delta=True)
print('DeepLiftSHAP Attributions:', attributions)
print('Convergence Delta:', delta)

Output

DeepLiftShap Attributions: tensor([[-5.9169e-01, -1.5491e+00, -1.0076e+00],
                                   [-4.7101e-03, -2.2300e-01, -5.1926e+00]], grad_fn=<MeanBackward1>)
Convergence Delta: tensor([-4.6120e-03, -1.6267e-03, -5.1045e-04, -1.4184e-03, -6.8886e-03,
                           -2.2224e-02,  0.0000e+00, -2.8790e-02, -4.1285e-03, -2.7295e-02,
                           -3.2349e-03, -1.6265e-03, -4.7684e-07, -1.4191e-03, -6.8889e-03,
                           -2.2224e-02,  0.0000e+00, -2.4792e-02, -4.1289e-03, -2.7296e-02])

DeepLiftShap uses DeepLift to compute the attribution score for each input-baseline pair and averages them for each input across all baselines.

It computes deltas for each input example-baseline pair, resulting in input.shape[0] * baseline.shape[0] delta values.

Similar to GradientShap, in order to compute per-example deltas we can average them:

deltas_per_example = torch.mean(delta.reshape(input.shape[0], -1), dim=1)

In order to smooth and improve the quality of the attributions, we can run IntegratedGradients and other attribution methods through a NoiseTunnel. NoiseTunnel allows us to use SmoothGrad, SmoothGrad_Sq and VarGrad techniques to smooth the attributions by aggregating them over multiple noisy samples generated by adding Gaussian noise.

Here is an example of how we can use NoiseTunnel with IntegratedGradients.

ig = IntegratedGradients(model)
nt = NoiseTunnel(ig)
attributions, delta = nt.attribute(input, nt_type='smoothgrad', stdevs=0.02, nt_samples=4,
      baselines=baseline, target=0, return_convergence_delta=True)
print('IG + SmoothGrad Attributions:', attributions)
print('Convergence Delta:', delta)

Output

IG + SmoothGrad Attributions: tensor([[-0.4574, -1.5493, -1.0893],
                                      [ 0.0000, -0.2647, -5.1619]])
Convergence Delta: tensor([ 0.0000e+00,  2.3842e-07,  0.0000e+00, -2.3842e-07,  0.0000e+00,
        -4.7684e-07,  0.0000e+00, -4.7684e-07])

The number of elements in the delta tensor is equal to nt_samples * input.shape[0]. In order to get an example-wise delta, we can, for example, average them:

deltas_per_example = torch.mean(delta.reshape(input.shape[0], -1), dim=1)

Let's look into the internals of our network and understand which layers and neurons are important for the predictions.

We will start with NeuronConductance. NeuronConductance helps us identify input features that are important for a particular neuron in a given layer. It decomposes the computation of integrated gradients via the chain rule by defining the importance of a neuron as the path integral of the derivative of the output with respect to the neuron times the derivative of the neuron with respect to the inputs of the model.

In this case, we choose to analyze the neuron at index 1 in the first linear layer.

nc = NeuronConductance(model, model.lin1)
attributions = nc.attribute(input, neuron_selector=1, target=0)
print('Neuron Attributions:', attributions)

Output

Neuron Attributions: tensor([[ 0.0000,  0.0000,  0.0000],
                             [ 1.3358,  0.0000, -1.6811]])

Layer conductance shows the importance of a layer's neurons for a given input. It is an extension of path integrated gradients for hidden layers and satisfies the completeness property as well.

It doesn't attribute the contribution scores to the input features, but rather shows the importance of each neuron in the selected layer.

lc = LayerConductance(model, model.lin1)
attributions, delta = lc.attribute(input, baselines=baseline, target=0, return_convergence_delta=True)
print('Layer Attributions:', attributions)
print('Convergence Delta:', delta)

Output

Layer Attributions: tensor([[ 0.0000,  0.0000, -3.0856],
                            [ 0.0000, -0.3488, -4.9638]], grad_fn=<SumBackward1>)
Convergence Delta: tensor([0.0630, 0.1084])

Similar to other attribution algorithms that return convergence delta, LayerConductance returns the deltas for each example. The approximation error is then the absolute value of the convergence deltas and can serve as a proxy of how accurate integral approximation for given inputs and baselines is.

More details on the list of supported algorithms and how to apply Captum on different types of models can be found in our tutorials.

Captum Insights

Captum provides a web interface called Insights for easy visualization and access to a number of our interpretability algorithms.

To analyze a sample model on CIFAR10 via Captum Insights run

python -m captum.insights.example

and navigate to the URL specified in the output.

Captum Insights Screenshot

To build Insights you will need Node >= 8.x and Yarn >= 1.5.

To build and launch from a checkout in a conda environment run

conda install -c conda-forge yarn
BUILD_INSIGHTS=1 python setup.py develop
python captum/insights/example.py

Captum Insights Jupyter Widget

Captum Insights also has a Jupyter widget providing the same user interface as the web app. To install and enable the widget, run

jupyter nbextension install --py --symlink --sys-prefix captum.insights.attr_vis.widget
jupyter nbextension enable captum.insights.attr_vis.widget --py --sys-prefix

To build the widget from a checkout in a conda environment run

conda install -c conda-forge yarn
BUILD_INSIGHTS=1 python setup.py develop

FAQ

If you have questions about using Captum methods, please check this FAQ, which addresses many common issues.

Contributing

See the CONTRIBUTING file for how to help out.

Talks and Papers

NeurIPS 2019: The slides of our presentation can be found here

KDD 2020: The slides of our presentation from KDD 2020 tutorial can be found here. You can watch the recorded talk here

GTC 2020: Opening Up the Black Box: Model Understanding with Captum and PyTorch. You can watch the recorded talk here

XAI Summit 2020: Using Captum and Fiddler to Improve Model Understanding with Explainable AI. You can watch the recorded talk here

PyTorch Developer Day 2020: Model Interpretability. You can watch the recorded talk here

NAACL 2021 Tutorial on Fine-grained Interpretation and Causation Analysis in Deep NLP Models. You can watch the recorded talk here

ICLR 2021 workshop on Responsible AI:

  • Paper on the Captum Library
  • Paper on Investigating Sanity Checks for Saliency Maps

Summer school on medical imaging at the University of Lyon: a class on model explainability (link to the video: https://www.youtube.com/watch?v=vn-jLzY67V0).

References of Algorithms

More details about the above-mentioned attribution algorithms and their pros and cons can be found on our website.

License

Captum is BSD licensed, as found in the LICENSE file.

captum's People

Contributors

99warriors, agaction, amyreese, aobo-y, bilalsal, caraya10, cicichen01, crawlingcub, cyrjano, diegoolano, dkrako, edward-io, gabrieltseng, j0nreynolds, jessijzhao, miguelmartin75, mruberry, nanohanno, narinek, orionr, pingjunchen, progamergov, reubend, shubhammuttepawar, shuwenw, stanislavglebik, thatch, vivekmig, yucu, zpao


captum's Issues

How would it be possible to calculate Integrated Gradients for a model which has embeddings?

Hi everyone,

I am applying the integrated gradients method to my dataset, which has categorical and numerical data; I convert the categorical data into embeddings and concatenate them with the numerical features. But the output of integrated gradients for all the categorical values is zero, while the numerical ones are calculated correctly.
I have tried to do it with LayerIntegratedGradients, but since I do not have the developer version of captum installed, it failed.
Any suggestions?
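
For reference, a rough sketch of the LayerIntegratedGradients approach being attempted (the model.embedding attribute and the cat_inputs/num_inputs names are illustrative assumptions, not code from the issue):

from captum.attr import LayerIntegratedGradients

# Attribute with respect to the embedding layer's output instead of the raw
# integer category indices, so the categorical features can receive non-zero
# scores; `model.embedding` is a hypothetical attribute name.
lig = LayerIntegratedGradients(model, model.embedding)
attributions = lig.attribute(cat_inputs, target=0,
                             additional_forward_args=(num_inputs,))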

ImportError: cannot import name 'LayerIntegratedGradients'

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
     13
     14 from captum.attr import visualization as viz
---> 15 from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
     16 from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

ImportError: cannot import name 'LayerIntegratedGradients'

Models failing with error - Module has no input attribute

I am working with a number of models from the torchreid library. When I use DeepLift on these models, some work and some do not. For example, the DenseNet, MLFN, and MuDeep models work fine, but the OSNet, ResNetMid, and ResNet-50 (and some others) models do not. (N.B. I modified the models to not use inplace=True for nn.ReLU().)

The models that fail usually do so with an error along the lines of 'Sigmoid' object has no attribute 'input' (it also fails for the same reason if ReLU is used); however, I can't see what I need to change in these models in order for them to work with DeepLift.

What is different about these models that causes this error? I understand the error message, but I don't understand why the module doesn't have an input attribute.

Computing LayerConductance in IMDB sentiment analysis Tutorial

I am trying to compute layer conductance in the IMDB tutorial, and I keep getting a scalar issue. Any guidance on how I should pass the input (test_input_tensor) to get the attributions.

cond = LayerConductance(model, model.convs)
cond_vals = cond.attribute(test_input_tensor,target=1)

Thank you!

Captum for regression problem

Hi all,

I am wondering if there are examples I could learn from for using Captum on a regression problem, as well as with volume data. My problem setting is feeding volume data of size WxHxD (64x64x64) to a 3D convnet which has only one neuron in the top layer that outputs a real number. Thanks.
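
As a rough sketch (not from the issue), attribution for a single-output regression network can typically be requested without a target index, since there is only one output column; the model and tensor names below are illustrative:

import torch
from captum.attr import IntegratedGradients

volume = torch.rand(1, 1, 64, 64, 64)   # illustrative (batch, channel, D, H, W) input
baseline = torch.zeros_like(volume)

ig = IntegratedGradients(model)          # `model` is the 3D convnet with one output neuron
# With a single scalar output per example, no `target` argument is needed.
attributions = ig.attribute(volume, baselines=baseline)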

how to get captum insights working

It gives an error after visualizer.render() (screenshot attached).

And how do I get this saved image?

# show a screenshot if using notebook non-interactively
from IPython.display import Image
Image(filename='img/captum_insights.png')

BibTeX for citation

Hi folks,
Is there a proper .bib format available for Captum for the purposes of citation in research papers?

Thanks!

How to interpret BERT for SequenceClassification?

Hi @NarineK and captum team, thanks for all the great work on interpretability with PyTorch.

As others here (see #150, #249), I am trying to interpret a BERT classifier finetuned on a binary classification task, using the transformers library from HuggingFace.
Indeed, I have

model = BertForSequenceClassification.from_pretrained('finetuned-bert-base-cased')

I have not had much success so far, starting from the SQUAD example https://github.com/pytorch/captum/blob/master/tutorials/Bert_SQUAD_Interpret.ipynb

So far, I left almost everything else untouched and redefined

def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):

    text_ids = tokenizer.encode(text, add_special_tokens=False)
    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids 
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

which I call with input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id) and a custom forward method that reads

def custom_forward(inputs, token_type_ids=None, position_ids=None, attention_mask=None, position=0):
    outputs = predict(inputs, token_type_ids=token_type_ids, position_ids=position_ids, attention_mask=attention_mask)
    preds = outputs[0]
   #preds is like
   #tensor([[-1.9723,  2.2183]], grad_fn=<AddmmBackward>)
    return torch.tensor([torch.softmax(preds, dim = 1)[0][1]], requires_grad = True)

which I use in lig = LayerIntegratedGradients(custom_forward, model.bert.embeddings).

When calling lig.attribute (as in the tutorial), I get

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Can you help me debug the above? I guess I am messing something up with the custom_forward method, and maybe also construct_input_ref_pair... or more.

I am happy to post a working solution once done with this!

Issue with resnet18 model

I tried to use any of the saliency methods and I get this error:
AttributeError: 'AvgPool2d' object has no attribute 'divisor_override'

I do not understand why that happens.

Captum Insights not working in SageMaker

When I try to run Captum Insights from a SageMaker notebook terminal on port 6006 by browsing to <sagemaker_notebook_address>/proxy/6006/, the tab name shows "Captum Insights", but the web page is blank. The same method works fine on my local system, or fine with tensorboard/flask apps through SageMaker. It seems to be a problem with Captum+SageMaker specifically.

(screenshot of the blank Captum Insights page)

Alternatively, when attempting to run tutorials/CIFAR_TorchVision_Captum_Insights.ipynb I get this error from within a notebook:

(screenshot of the error shown in the notebook)

(I get the same error with visualizer.render(), just with less details)


Details:

I upgraded my SageMaker pytorch_p36 conda environment to torch==1.3.0. I installed captum from source with git clone https://github.com/pytorch/captum.git and then installed Insights with:

conda install -c conda-forge yarn
BUILD_INSIGHTS=1 python setup.py develop

Then ran the example with python captum/insights/example.py

And tried to access via <sagemaker_notebook_address>/proxy/6006/ (the same way I access a running tensorboard server)

I also tried it with/without modifying line 66 in insights/server.py from tcp.bind(("", 0)) to tcp.bind(("", 6006)) in order to use port 6006 (since this port seemed to work fine for running a tensorboard server).

Import error for Occlusion

Getting an import error for Occlusion when running the tutorial on interpreting vision for ResNet.

Error details:

ImportError                               Traceback (most recent call last)
     15 from captum.attr import IntegratedGradients
     16 from captum.attr import GradientShap
---> 17 from captum.attr import Occlusion
     18 from captum.attr import NoiseTunnel
     19 from captum.attr import visualization as viz
ImportError: cannot import name 'Occlusion' from 'captum.attr' (/home/ubuntu/opt/anaconda3/envs/pytorch/lib/python3.7/site-packages/captum/attr/__init__.py)

Dealing with .view in DeepLift

Hi, I am an undergrad student looking to apply Captum's implementation of DeepLift to a Graph Convolution Network.

Below is a snippet of the code in the forward function that is causing problems:

to_conv1d = batch_sortpooling_graphs.view((-1, 1, self.k * self.total_latent_dim))
conv1d_res = self.conv1d_params1(to_conv1d)
conv1d_res = self.conv1d_activation(conv1d_res)
conv1d_res = self.maxpool1d(conv1d_res)
conv1d_res = self.conv1d_params2(conv1d_res)
conv1d_res = self.conv1d_activation(conv1d_res)

to_dense = conv1d_res.view(len(graph_sizes), -1)

if self.output_dim > 0:
    out_linear = self.out_params(to_dense)
    reluact_fp = self.conv1d_activation(out_linear)
else:
    reluact_fp = to_dense
return self.conv1d_activation(reluact_fp)

As you can see, my code requires several reshapes of the tensors as it moves from the input to the 1d convolution layer and finally to the dense layer. Running as is gives me the following error:

Traceback (most recent call last):
  File "main.py", line 625, in <module>
    attribution = dl.attribute(input, additional_forward_args=[15], target=1)
  File "/home/user/.local/lib/python3.6/site-packages/captum/attr/_core/deep_lift.py", line 202, in attribute
    additional_forward_args=additional_forward_args,
  File "/home/user/.local/lib/python3.6/site-packages/captum/attr/_utils/gradient.py", line 92, in compute_gradients
    grads = torch.autograd.grad(torch.unbind(output), inputs)
  File "/home/user/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 157, in grad
    inputs, allow_unused)
  File "/home/user/.local/lib/python3.6/site-packages/captum/attr/_core/deep_lift.py", line 284, in _backward_hook
    inp - inp_ref for inp, inp_ref in zip(module.input, module.input_ref)
  File "/home/user/.local/lib/python3.6/site-packages/captum/attr/_core/deep_lift.py", line 284, in <genexpr>
    inp - inp_ref for inp, inp_ref in zip(module.input, module.input_ref)
RuntimeError: The size of tensor a (160) must match the size of tensor b (19) at non-singleton dimension 2

The shapes of the tensors are as follows:

batch_sortpooling_graphs: torch.Size([1, 19, 97])
conv1d_res (immediately after line 1): torch.Size([1, 1, 1843])
to_dense: torch.Size([1, 160])

May I ask if anyone has any idea how to circumvent this so that DeepLift can work with tensor reshapes? Thank you!

Request: example with multilabel attribution

The provided vision examples and documentation are excellent for single-class classification, but I am struggling to implement a multi-label use case.

For my use case, I use a single-channel image of a cell nucleus as input. The target is a tensor that describes whether or not the cell was positive for each of 22 different protein markers, e.g. tensor([0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1.,
0., 0., 0., 0.], dtype=torch.float64)
...that is, each cell can be positive for multiple markers, not only one. This is a simple multi-label classification task, where my model is the boilerplate torchvision.models.resnet18 with a custom final layer that accommodates the desired output.

I use the CIFAR vision example as a starting point as follows:
(screenshot of the adapted code)

But I get AssertionError: Tensor target dimension torch.Size([22]) is not valid. I see from the docstring for saliency.attribute that targets/outputs with greater than two dimensions should be passed as tuples, but when I pass tuple(labels[ind]) instead, I get AssertionError: Cannot choose target column with output shape torch.Size([1, 22]).

Ideally, I'd like to set up an AttributionVisualizer that looks like the following mock-up:

(mock-up of the desired AttributionVisualizer layout)

...where I can click each element of the prediction (e.g. CK19) and see the corresponding attribution image for that marker.

Any chance that a multi-label classification example like this could be supplied?

Much thanks!
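
A minimal sketch of one way to read the current API for this case (not an official recommendation): attribute once per marker, passing each of the 22 output indices as target in turn. The `img` input batch of shape (1, 1, H, W) is an illustrative placeholder:

from captum.attr import Saliency

saliency = Saliency(model)
# One attribution map per protein marker; `target` selects a single output column.
attributions_per_marker = [saliency.attribute(img, target=marker_idx)
                           for marker_idx in range(22)]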

"RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor "

Hi, I am trying to interpret my intent classification model using your IMDB tutorial and I'm facing the following error: "RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor". This error is raised during the forward pass of an RNN (LSTM) which takes a packed sequence as input (pack_padded_sequence).

Computing contributions w.r.t. logits rather than final activations

Often, in practice, we wish to compute the contributions w.r.t. the logits of the final sigmoid/softmax, rather than w.r.t. the final network output itself. This is to avoid artifacts that can be caused by the saturating nature of the sigmoid/softmax, and comes into play when comparing attributions between examples. It is particularly relevant if gradient*input is used as an attribution method, because for examples with very confident predictions, the sigmoid/softmax outputs tend to saturate and the gradients will approach zero. I'm wondering if it may be worth mentioning this in the documentation - in the current "getting started", the toy model has a sigmoid output:

(screenshot of the toy model with a sigmoid output from the getting started guide)

I'm concerned that a naive user may try to compare the magnitudes of attributions across different examples without realizing that, for sigmoid/softmax outputs, it may be worth removing the final nonlinearity before doing such a comparison. We discuss this in Section 3.6 of the deeplift paper. Ideally there would be an option in Captum to ignore the final nonlinearity, but I realize it may not be trivial to add that option. Sorry if this is already addressed and I missed it.
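
A small sketch of the kind of workaround described above, assuming the final sigmoid lives in a separate submodule that can be skipped; the LogitsWrapper class, trained_model object and the wrapped.body/wrapped.sigmoid layout are illustrative assumptions, not Captum API:

import torch.nn as nn
from captum.attr import IntegratedGradients

class LogitsWrapper(nn.Module):
    """Runs every layer of the wrapped model except the final sigmoid/softmax."""
    def __init__(self, wrapped):
        super().__init__()
        self.wrapped = wrapped

    def forward(self, x):
        # Hypothetical layout: `wrapped.body` computes the logits and
        # `wrapped.sigmoid` is the final nonlinearity we want to skip.
        return self.wrapped.body(x)

logits_model = LogitsWrapper(trained_model)
ig = IntegratedGradients(logits_model)  # attributions are now w.r.t. the logits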

Captum Insights build fails on Linux Ubuntu 18.04

Cannot build and launch Captum Insights on Linux Ubuntu 18.04 (inside a VM VirtualBox):

(captum) elena@elena-VirtualBox:~/eStep/XAI/Software/captum$ conda install -c conda-forge yarn
Collecting package metadata (repodata.json): done
Solving environment: done

All requested packages already installed.

(captum) elena@elena-VirtualBox:~/eStep/XAI/Software/captum$ BUILD_INSIGHTS=1 python setup.py develop
-- Building version 0.2.0
-- Building Captum Insights
Running: ./scripts/build_insights.sh
~/eStep/XAI/Software/captum/captum/insights/frontend ~/eStep/XAI/Software/captum

Install Dependencies

yarn install v1.22.0
[1/4] Resolving packages...
[2/4] Fetching packages...
info [email protected]: The platform "linux" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
info [email protected]: The platform "linux" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
[3/4] Linking dependencies...
warning " > @babel/[email protected]" has unmet peer dependency "@babel/core@^7.0.0-0".
warning "@babel/plugin-proposal-class-properties > @babel/[email protected]" has unmet peer dependency "@babel/core@^7.0.0".
warning " > [email protected]" has unmet peer dependency "@babel/core@^7.0.0".
warning " > [email protected]" has unmet peer dependency "webpack@>=2".
warning "react-scripts > @typescript-eslint/eslint-plugin > [email protected]" has unmet peer dependency "typescript@>=2.8.0 || >= 3.2.0-dev || >= 3.3.0-dev || >= 3.4.0-dev || >= 3.5.0-dev || >= 3.6.0-dev || >= 3.6.0-beta || >= 3.7.0-dev || >= 3.7.0-beta".
warning " > [email protected]" has unmet peer dependency "prop-types@^15.0.0".
warning " > [email protected]" has unmet peer dependency "[email protected]".
error An unexpected error occurred: "EPERM: operation not permitted, symlink '../../../parser/bin/babel-parser.js' -> '/home/elena/eStep/XAI/Software/captum/captum/insights/frontend/node_modules/@babel/core/node_modules/.bin/parser'".
info If you think this is a bug, please open a bug report with the information provided in "/home/elena/eStep/XAI/Software/captum/captum/insights/frontend/yarn-error.log".
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.
Traceback (most recent call last):
File "setup.py", line 105, in
build_insights()
File "setup.py", line 88, in build_insights
subprocess.check_call(command)
File "/home/elena/anaconda3/envs/captum/lib/python3.7/subprocess.py", line 347, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command './scripts/build_insights.sh' returned non-zero exit status 1.
(captum) elena@elena-VirtualBox:~/eStep/XAI/Software/captum$

DeepLIFT fails when reusing MaxPool2d layer

Using Captum v0.1, so I'm not sure whether this happens with current master.

Something I have noticed when trying out DeepLIFT with CNNs is that reusing MaxPool2d layers instead of explicitly defining one per usage results in RuntimeErrors. Maybe this is related to #199

For example, consider the CIFAR10 tutorial.
If we were to change the network structure to just reuse the self.pool1 as follows:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        # self.pool2 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.relu1 = nn.ReLU()
        self.relu2 = nn.ReLU()
        self.relu3 = nn.ReLU()
        self.relu4 = nn.ReLU()

    def forward(self, x):
        x = self.pool1(self.relu1(self.conv1(x)))
        x = self.pool1(self.relu2(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = self.relu3(self.fc1(x))
        x = self.relu4(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

Training works just fine, but attributing with DeepLIFT should fail due to size mismatch, such as (unfortunately I can't download the dataset right now, using a local version):

~\envs\lib\site-packages\captum\attr\_core\deep_lift.py in <genexpr>(.0)
    282          """
    283         delta_in = tuple(
--> 284             inp - inp_ref for inp, inp_ref in zip(module.input, module.input_ref)
    285         )
    286         delta_out = tuple(

RuntimeError: The size of tensor a (10) must match the size of tensor b (28) at non-singleton dimension 3

Is this a bug or necessary convention? Note that reusing pooling layers actually occurs in official PyTorch tutorials.

Captum Insights not working in Google Colab

When trying to run the Getting started with Captum Insights tutorial in a Google Colab notebook, I stumbled upon the following issue: When calling visualizer.render(debug=False), the result looks like in the screenshot below.

(screenshot of the blank output after visualizer.render(debug=False))

The reason for this behavior is that Captum's render() method does not redirect requests as e.g. shown in TensorBoard's _display_colab() method. While the current implementation works fine with regular IPython notebooks, Colab requires some additional tweaks as described in the TensorBoard code.

Do you have any plans to support Colab or is this even a priority? If no one is already working on this, I could make a PR adding some code similar to TensorBoard as a proof of concept.

CUDA OOM Error

Hi,

I am currently integrating Captum into my deep learning toolkit; thanks for providing this library.

When I try to run IntegratedGradients on a standard densenet201 model that is on a CUDA device (11 GB VRAM), I am getting an out-of-memory error even for one input image.

Just a quick check: Is this normal behaviour?
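
For what it's worth, one knob to check (a sketch, not a confirmed fix): Integrated Gradients expands each input into n_steps interpolated copies, and the attribute call accepts an internal_batch_size argument that processes this expansion in smaller chunks. The image and pred_class names below are placeholders:

ig = IntegratedGradients(model)
# Split the n_steps expanded inputs into chunks of 8 to bound peak GPU memory;
# the chunk size trades speed for memory.
attributions = ig.attribute(image, target=pred_class, n_steps=50,
                            internal_batch_size=8)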

Could I use captum for object localisation?

Hello,
Can I use this library for object localisation tasks? Do you think you could prepare a very easy tutorial for this? I bet this would be very helpful for many people, since labelling images with bounding boxes or polygons is really time-consuming, as you know.

Scripting/tracing Captum classes

Hello, I was experimenting with Captum and I was wondering if there was any way to trace/script an attribution model in order to just obtain the final heatmap as output of the serialized file.

I did not find any reference in the documentation nor in the code, and did not manage to integrate it myself by creating intermediate classes to, for example, wrap the Saliency class in a torch.nn.Module one.

Is there something I am missing / is it in the future plans?

captum insights port?

I am running the example application and wanted to ask if it's possible to set a particular port for the app?

Thanks

Plan for perturbation-based methods

Hello,
Kudos for the great work. I believe this has great potential.
I wonder what is in your roadmap, especially regarding perturbation-based attribution methods (Occlusion, LIME/KernelSHAP, Shapley Value sampling, etc.).

Are these planned at all? While being orders of magnitude slower, these methods have the advantage that they can be applied to any black-box model (i.e. any network architecture is supported out-of-the-box, with no need to instrument layers or implement custom modules). The implementation in Captum should be easier too. Moreover, Shapley Value attributions have unique theoretical properties that might be important when speed is not critical.

While it makes sense to focus on gradient-based methods first, maybe the structure of the library should be such that these methods can be easily added in the future.

Building failure for captum wheel package

When I built a python wheel package for captum with the following command:

BUILD_INSIGHTS=1 python setup.py bdist_wheel --python-tag py3

I got an error message:

error: can't copy 'captum/insights/frontend/widget/static/extension.js': doesn't exist or not a regular file

I found some errors in the setup.py file, where the paths for extension.js, index.js and index.js.map were not correct.

One solution is the following:

diff --git a/setup.py b/setup.py
index 87f5068..ee0a379 100755
--- a/setup.py
+++ b/setup.py
@@ -150,9 +150,9 @@ if __name__ == "__main__":
             (
                 "share/jupyter/nbextensions/jupyter-captum-insights",
                 [
-                    "captum/insights/frontend/widget/static/extension.js",
-                    "captum/insights/frontend/widget/static/index.js",
-                    "captum/insights/frontend/widget/static/index.js.map",
+                    "captum/insights/widget/static/extension.js",
+                    "captum/insights/widget/static/index.js",
+                    "captum/insights/widget/static/index.js.map",
                 ],
             ),
             (

Not able to Load Vectors

Hey Guys,

While trying to run this tutorial, I am facing issues loading the GloVe vectors. After loading the vectors, it shows a vocabulary size equal to 2, but ideally it should be more than 10000. Can anyone help me out with this?

(screenshot of the loading output)

My pytorch version is 1.3.1
Torchtext version is 0.5.1

Please help me with this. Thanks!

#fix_error

Cannot install the latest version

When I tried to install the latest version, I got the errors below.


    error: can't copy 'captum/insights/frontend/widget/static/extension.js': doesn't exist or not a regular file
    ----------------------------------------

ERROR: Command errored out with exit status 1: /root//.pyenv/versions/3.7.4/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-xijz5fxd/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-xijz5fxd/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-lcp60r6o/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.

It seems to be caused by the wrong js path, captum/insights/frontend/widget/static/extension.js.

Integrated gradients using with pack_padded_sequence returns error

Hi all,

I am using the integrated gradients (IG) method from the Captum package. I apply an LSTM to varying-length sequences and then try to get IG attributions from the trained model using the following line of code:

attr, delta = ig.attribute((data, seq_lengths), target=1, return_convergence_delta=True)

but I am getting the following error:

RuntimeError: lengths array must be sorted in decreasing order when enforce_sorted is True. You can pass enforce_sorted=False to pack_padded_sequence and/or pack_sequence to sidestep this requirement if you do not need ONNX exportability.

However, I have sorted the lengths of the array in each batch in decreasing order.
Please note that if I use IG without pack_padded_sequence it works perfectly.

Regarding the previous error, I set enforce_sorted=False in pack_padded_sequence, but then I am getting another error:

RuntimeError: Length of all samples has to be greater than 0, but found an element in 'lengths' that is <= 0

Here are the lengths of all the samples, none of which are less than or equal to zero:

tensor([23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 20,
14, 10])

any help would be much appreciated.

Documentation on `baseline` argument in DeepLiftShap

Hi all,

Thank you so much for the invitation to captum. Very grateful to all of you for putting this together! I had a quick question regarding the documentation. Currently, in the arguments description for DeepLiftShap, it says "The first dimension in baseline tensors defines the distribution from which we randomly draw samples". However, when I look at the code, it seems as though all the baselines are used for all the inputs (i.e. I'm not seeing any code that I would associate with sampling). Is my understanding correct? I actually prefer the deterministic behavior because in my lab we typically supply multiple baselines per input and we want all the baselines to be used.

Thanks,
Avanti

is this a typo?

In the readme, it says:

Next we will use IntegratedGradients algorithms to assign attribution scores to each input feature with respect to the second target output.

and then target=0 is set; should it be the first target output?

Captum for BERT

Hi,
Thanks for the great work. The LSTM tutorial looks very nice.
Are there any suggestions on how to use Captum for Transformer-based / BERT-like pre-trained contextualized word embeddings? If I want to see the attribution of each token in the word embedding layer, is it that I'd also need the FFN layer for fine-tuning downstream tasks in order to get the gradients? The current code is implemented with torch/text; I would really appreciate it if you could give some hints on how to integrate it with BERT models (e.g. huggingface/transformers).

Thank you.
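
One rough sketch of a common pattern (stated as an assumption, not an official recipe): wrap the classifier so the forward function returns the class scores, then attribute over the embedding module with LayerIntegratedGradients. The model, input_ids, ref_input_ids and attention_mask names are placeholders:

from captum.attr import LayerIntegratedGradients

def forward_func(input_ids, attention_mask=None):
    # BertForSequenceClassification returns the classification logits first.
    return model(input_ids, attention_mask=attention_mask)[0]

lig = LayerIntegratedGradients(forward_func, model.bert.embeddings)
attributions = lig.attribute(input_ids, baselines=ref_input_ids, target=1,
                             additional_forward_args=(attention_mask,))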

What is the desired output for _select_targets in common.py?

Hi again,

I have a question about the _select_targets function, specifically when used for the DeepLift implementation. I figured out that the output passed into this function is based on the output from the last layer of the architecture. For my architecture, the last layer is a log_softmax. Sorry if it is a silly question, but should I return the predicted class (only 2 classes), the loss value, or the class probability of the target class as output?

Attached below is the code snippet for _select_targets for your reference.

def _select_targets(output, target):
    output = output[0]
    num_examples = output.shape[0]
    dims = len(output.shape)

    if target is None:
        return output
    elif isinstance(target, int) or isinstance(target, tuple):
        return _verify_select_column(output, target)
    elif isinstance(target, torch.Tensor):
        if torch.numel(target) == 1 and isinstance(target.item(), int):
            return _verify_select_column(output, target.item())
        elif len(target.shape) == 1 and torch.numel(target) == num_examples:
            assert dims == 2, "Output must be 2D to select tensor of targets."
            return torch.gather(output, 1, target.reshape(len(output), 1))
        else:
            raise AssertionError(
                "Tensor target dimension %r is not valid." % (target.shape,)
            )
    elif isinstance(target, list):
        assert len(target) == num_examples, "Target list length does not match output!"
        if type(target[0]) is int:
            assert dims == 2, "Output must be 2D to select tensor of targets."
            return torch.gather(output, 1, torch.tensor(target).reshape(len(output), 1))
        elif type(target[0]) is tuple:
            return torch.stack(
                [output[(i,) + targ_elem] for i, targ_elem in enumerate(target)]
            )
        else:
            raise AssertionError("Target element type in list is not valid.")
    else:
        raise AssertionError("Target type %r is not valid." % target)
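
For context, the branches in the snippet above correspond to the different forms a target can take when calling attribute; a short illustration, assuming a 2D output of class scores, a DeepLift instance dl, and an input tensor input (all placeholder names):

# Single int: the same output column is selected for every example.
dl.attribute(input, target=1)
# 1D tensor or list with one entry per example.
dl.attribute(input, target=torch.tensor([0, 1]))
# List of tuples, indexing into the output per example (useful for outputs
# with more than two dimensions).
dl.attribute(input, target=[(0,), (1,)])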

Toy Example breaks with CUDA on compute_convergence_delta for Integrated Gradients

For the toy example with CUDA:

model = ToyModel()
model = model.cuda()
model.eval()

input = torch.rand(2, 3).cuda()
baseline = torch.zeros(2, 3).cuda()

ig = IntegratedGradients(model)
attributions, delta = ig.attribute(input, baseline, target=0, return_convergence_delta=True)

fails with the error

~/anaconda3/envs/heterokaryon/lib/python3.7/site-packages/captum/attr/_utils/attribution.py in compute_convergence_delta(self, attributions, start_point, end_point, target, additional_forward_args)
    232         row_sums = [_sum_rows(attribution) for attribution in attributions]
    233         attr_sum = torch.tensor([sum(row_sum) for row_sum in zip(*row_sums)])
--> 234         return attr_sum - (end_point - start_point)
    235 
    236 

RuntimeError: expected device cpu and dtype Float but got device cuda:0 and dtype Float

presumably since attr_sum is not on the GPU. Setting return_convergence_delta to False results in no error.

Similar issues may arise in other places, though I haven't checked.

Returning only the gradients/"multipliers"

Hi all,

Just wanted to put this particular use-case on your radar. Sometimes we find that it is useful to get access to just the gradients ("multipliers"), before they are multiplied by the difference-from-reference to get the final attribution. Specifically, we use the multipliers to estimate how the network might have responded had it seen slightly different inputs. We refer to these estimates as "hypothetical contribution scores". If you are curious how these hypothetical contributions look, here's a notebook (on a fork of the DeepSHAP repository) where I compute hypothetical contributions in the context of genomic data: https://github.com/AvantiShri/shap/blob/0b0350ba3a42af275f6e99ca2e3c5877d7d94f8a/notebooks/deep_explainer/PyTorch%20Deep%20Explainer%20DeepSEA%20example.ipynb

You've all done an awesome job with this repository, and I will definitely point it to the pytorch users in my lab once the release is formally announced. I totally understand if the ability to return just the multipliers is not something that you are likely to incorporate in the main release; I'm sure we can easily fork the repository and add that feature in for our lab's purposes.

Thanks again!
Av

Internal Server Error

I am running 'captum' on OS X 10.11.6 (also Ubuntu 16.04LTS).
The example 'python -m captum.insights.example' gets an Internal Server Error when I try
to connect to http://localhost:51283/ with Safari.

Any ideas?

============================= test session starts ==============================
platform darwin -- Python 3.6.7, pytest-5.0.1, py-1.8.0, pluggy-0.13.0
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/davidlaxer/captum/.hypothesis/examples')
rootdir: /Users/davidlaxer/captum
plugins: hypothesis-3.88.3
collected 212 items                                                            

tests/attr/test_approximation_methods.py ....                            [  1%]
tests/attr/test_common.py ........                                       [  5%]
tests/attr/test_data_parallel.py ssssssssssssssss                        [ 13%]
tests/attr/test_deeplift_basic.py ......                                 [ 16%]
tests/attr/test_deeplift_classification.py .....F..                      [ 19%]
tests/attr/test_gradient.py ........                                     [ 23%]
tests/attr/test_gradient_shap.py ...                                     [ 25%]
tests/attr/test_input_x_gradient.py .........                            [ 29%]
tests/attr/test_integrated_gradients_basic.py ........................   [ 40%]
tests/attr/test_integrated_gradients_classification.py ........          [ 44%]
tests/attr/test_internal_influence.py ..........                         [ 49%]
tests/attr/test_layer_activation.py ......                               [ 51%]
tests/attr/test_layer_conductance.py .............                       [ 58%]
tests/attr/test_layer_gradient_x_activation.py ......                    [ 60%]
tests/attr/test_neuron_conductance.py .........                          [ 65%]
tests/attr/test_neuron_gradient.py ........                              [ 68%]
tests/attr/test_neuron_integrated_gradients.py ........                  [ 72%]
tests/attr/test_saliency.py .........                                    [ 76%]
tests/attr/test_targets.py ...................................           [ 93%]
tests/attr/test_utils_batching.py .........                              [ 97%]
tests/attr/models/test_base.py .                                         [ 98%]
tests/attr/models/test_pytext.py ss                                      [ 99%]
tests/insights/test_contribution.py ..                                   [100%]

=================================== FAILURES ===================================
_____________ Test.test_softmax_classification_batch_zero_baseline _____________

self = <tests.attr.test_deeplift_classification.Test testMethod=test_softmax_classification_batch_zero_baseline>

    def test_softmax_classification_batch_zero_baseline(self):
        num_in = 40
        input = torch.arange(0.0, num_in * 3.0, requires_grad=True).reshape(3, num_in)
        baselines = 0 * input
    
        model = SoftmaxDeepLiftModel(num_in, 20, 10)
        dl = DeepLift(model)
    
>       self.softmax_classification(model, dl, input, baselines)

tests/attr/test_deeplift_classification.py:54: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/attr/test_deeplift_classification.py:117: in softmax_classification
    self._assert_attributions(model, attributions, input, baselines, delta, target2)
tests/attr/test_deeplift_classification.py:129: in _assert_attributions
    "some samples".format(delta),
E   AssertionError: False is not true : The sum of attribution values tensor([0.0008, 0.0023, 0.0039]) is not nearly equal to the difference between the endpoint for some samples
=============================== warnings summary ===============================
/Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/IPython/lib/pretty.py:91
  /Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/IPython/lib/pretty.py:91: DeprecationWarning: IPython.utils.signatures backport for Python 2 is deprecated in IPython 6, which only supports Python 3
    from IPython.utils.signatures import signature

/Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/IPython/utils/module_paths.py:28
  /Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/IPython/utils/module_paths.py:28: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

tests/attr/test_deeplift_basic.py::Test::test_relu_deeplift
tests/attr/test_deeplift_basic.py::Test::test_relu_deeplift_batch
tests/attr/test_deeplift_basic.py::Test::test_relu_deeplift_batch_4D_input
tests/attr/test_deeplift_basic.py::Test::test_relu_deeplift_multi_ref
tests/attr/test_deeplift_basic.py::Test::test_relu_linear_deeplift
tests/attr/test_deeplift_basic.py::Test::test_tanh_deeplift
tests/attr/test_deeplift_classification.py::Test::test_convnet_with_maxpool1d
tests/attr/test_deeplift_classification.py::Test::test_convnet_with_maxpool2d
tests/attr/test_deeplift_classification.py::Test::test_convnet_with_maxpool3d
tests/attr/test_deeplift_classification.py::Test::test_sigmoid_classification
tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_batch_multi_baseline
tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_batch_zero_baseline
tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_multi_baseline
tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_zero_baseline
tests/attr/test_targets.py::Test::test_multi_target_deep_lift
tests/attr/test_targets.py::Test::test_multi_target_deep_lift_shap
tests/attr/test_targets.py::Test::test_simple_target_deep_lift
tests/attr/test_targets.py::Test::test_simple_target_deep_lift_shap
tests/attr/test_targets.py::Test::test_simple_target_deep_lift_shap_single_tensor
tests/attr/test_targets.py::Test::test_simple_target_deep_lift_shap_tensor
  /Users/davidlaxer/captum/captum/attr/_core/deep_lift.py:327: UserWarning: Setting forward, backward hooks and attributes on non-linear
                 activations. The hooks and attributes will be removed
              after the attribution is finished
    after the attribution is finished"""

tests/attr/test_gradient.py::Test::test_apply_gradient_reqs
tests/attr/test_layer_conductance.py::Test::test_matching_conv_with_baseline_conductance
tests/attr/test_layer_conductance.py::Test::test_matching_pool1_conductance
tests/attr/test_layer_conductance.py::Test::test_matching_pool2_conductance
tests/attr/test_neuron_gradient.py::Test::test_matching_intermediate_gradient
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_input_linear1
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_input_relu2
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear1
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear2
tests/attr/test_targets.py::Test::test_multi_target_deep_lift
tests/attr/test_targets.py::Test::test_multi_target_input_x_gradient
tests/attr/test_targets.py::Test::test_multi_target_saliency
tests/attr/test_targets.py::Test::test_simple_target_deep_lift
tests/attr/test_targets.py::Test::test_simple_target_input_x_gradient
tests/attr/test_targets.py::Test::test_simple_target_saliency
tests/attr/test_targets.py::Test::test_simple_target_saliency_tensor
  /Users/davidlaxer/captum/captum/attr/_utils/gradient.py:27: UserWarning: Input Tensor 0 did not already require gradients, required_grads has been set automatically.
    "required_grads has been set automatically." % index

tests/attr/test_gradient.py::Test::test_apply_gradient_reqs
  /Users/davidlaxer/captum/captum/attr/_utils/gradient.py:34: UserWarning: Input Tensor 1 had a non-zero gradient tensor, which is being reset to 0.
    "which is being reset to 0." % index

tests/attr/test_gradient.py::Test::test_apply_gradient_reqs
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear2
  /Users/davidlaxer/captum/captum/attr/_utils/gradient.py:27: UserWarning: Input Tensor 2 did not already require gradients, required_grads has been set automatically.
    "required_grads has been set automatically." % index

tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear1
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear2
  /Users/davidlaxer/captum/captum/attr/_utils/gradient.py:27: UserWarning: Input Tensor 1 did not already require gradients, required_grads has been set automatically.
    "required_grads has been set automatically." % index

tests/attr/models/test_base.py::Test::test_interpretable_embedding_base
  /Users/davidlaxer/captum/captum/attr/_models/base.py:168: UserWarning: In order to make embedding layers more interpretable they will
          be replaced with an interpretable embedding layer which wraps the
          original embedding layer and takes word embedding vectors as inputs of
          the forward function. This allows to generate baselines for word
          embeddings and compute attributions for each embedding dimension.
          The original embedding layer must be set
          back by calling `remove_interpretable_embedding_layer` function
          after model interpretation is finished.
    after model interpretation is finished."""

tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
  /Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/matplotlib/colors.py:101: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
    ret = np.asscalar(ex)

tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
  /Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/matplotlib/image.py:424: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
    a_min = np.asscalar(a_min.astype(scaled_dtype))

tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
  /Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/matplotlib/image.py:425: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
    a_max = np.asscalar(a_max.astype(scaled_dtype))

-- Docs: https://docs.pytest.org/en/latest/warnings.html
=========================== short test summary info ============================
SKIPPED [1] tests/attr/test_data_parallel.py:116: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:187: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:254: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:38: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:68: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:98: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:137: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:168: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:219: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:24: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:56: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:84: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:123: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:154: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:200: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:235: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/models/test_pytext.py:81: Skip the test since PyText is not installed
SKIPPED [1] tests/attr/models/test_pytext.py:68: Skip the test since PyText is not installed
FAILED tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_batch_zero_baseline
======= 1 failed, 193 passed, 18 skipped, 60 warnings in 1188.87 seconds =======

$ python -m captum.insights.example

Fetch data and view Captum Insights at http://localhost:51283/

<IPython.lib.display.IFrame object at 0x1211f1c18>

(Screenshot: Captum Insights viewer, 2019-10-18)

On custom architecture

@orionr @zpao @asmeurer @asuhan @kostmo great work by the team, this is what I was looking for. I have a few queries:

  1. Can Captum be used for architectures like object detection and semantic segmentation? (see the sketch after this list)
  2. Would I be able to see the intermediate learnings during training?
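
For question 1, here is a minimal sketch of one possible approach, assuming a toy stand-in network rather than a real detection or segmentation architecture: wrap the model's forward pass so it returns one scalar per class, then attribute to a class index with any gradient-based method.

# Hedged sketch: reduce per-pixel class scores to one scalar per class so
# standard Captum attribution methods can target a class index.
# Not an official recipe; the model below is illustrative only.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

seg_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 5, kernel_size=1),          # 5 classes, scored per pixel
)

def forward_per_class(x):
    out = seg_model(x)                        # shape (N, 5, H, W)
    return out.sum(dim=(2, 3))                # shape (N, 5): one score per class

ig = IntegratedGradients(forward_per_class)
x = torch.randn(1, 3, 32, 32)
attributions = ig.attribute(x, target=2)      # attribute the total score of class 2
print(attributions.shape)                     # torch.Size([1, 3, 32, 32])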

rand_img_dist defined but not used in the official tutorial

Hi,

In the tutorial Model Interpretation for Pretrained ResNet Model, for the occlusion experiment, rand_img_dist = torch.cat([input * 0, input * 1]) is defined but never used; you may want to remove it.

import numpy as np
import torch
from captum.attr import Occlusion
from captum.attr import visualization as viz

occlusion = Occlusion(model)

# rand_img_dist is defined here but never passed to attribute() below
rand_img_dist = torch.cat([input * 0, input * 1])

attributions_occ = occlusion.attribute(input,
                                       strides=(3, 50, 50),
                                       target=pred_label_idx,
                                       sliding_window_shapes=(3, 60, 60),
                                       baselines=0)

_ = viz.visualize_image_attr_multiple(np.transpose(attributions_occ.squeeze().cpu().detach().numpy(), (1, 2, 0)),
                                      np.transpose(transformed_img.squeeze().cpu().detach().numpy(), (1, 2, 0)),
                                      ["original_image", "heat_map"],
                                      ["all", "positive"],
                                      show_colorbar=True,
                                      outlier_perc=2,
                                     )

Undesirable behavior of LayerActivation in networks with inplace ReLUs

Hi,
I was trying to use captum.attr._core.layer_activation.LayerActivation to get the activation of the first convolutional layer in a simple model. Here is my code:

import numpy as np
import torch
import torch.nn as nn
from captum.attr import LayerActivation

torch.manual_seed(23)
np.random.seed(23)
model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
                      nn.ReLU(inplace=True),
                      nn.Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
                      nn.ReLU(inplace=True))

layer_act = LayerActivation(model, model[0])
input = torch.randn(1, 3, 5, 5)
mylayer = model[0]
# Compare the layer's direct output to the activation reported by Captum
print(torch.norm(mylayer(input) - layer_act.attribute(input), p=2))

In other words, I computed the activation in two different ways and compared them. I expected the printed value to be close to zero; instead, this is what I got:

tensor(3.4646, grad_fn=<NormBackward0>)

I hypothesize that the inplace ReLU layer after the convolutional layer overwrites that layer's output, since there were many zeros in the activation computed by Captum (i.e., layer_act.attribute(input)). In fact, when I changed the architecture of the network to the following:

model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
                      nn.ReLU(),
                      nn.Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
                      nn.ReLU(inplace=True))

then the outputs matched.
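
A possible workaround when the model definition cannot be edited (my assumption, not an official Captum recommendation) is to switch the inplace flag off on every ReLU before computing layer activations:

for module in model.modules():
    if isinstance(module, nn.ReLU):
        # Prevents the following ReLU from overwriting the conv output in place
        module.inplace = False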

System information

  • Python 3.7.0
  • torch 1.3.0
  • Captum 0.1.0

Captum for Bert Sentence Classification

Hi there,

I tried to adapt the Captum tutorial for Q&A to a BERT sentence-classification task, but I am having difficulty adapting the baselines/references part of the code for classification and the new Hugging Face tokenizer.

I just want to check whether someone is working on the same topic, so we can share experiences.
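
In case it helps, here is a minimal sketch of how one might build reference token ids for a single-sentence classifier, adapting the idea from the Q&A tutorial. This is my own assumption; the tokenizer name and helper are illustrative, not part of the tutorial. The reference keeps [CLS] and [SEP] in place and replaces every other token with [PAD].

import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def build_inputs_and_references(text, device="cpu"):
    # Encode the sentence with special tokens: [CLS] ... [SEP]
    input_ids = tokenizer.encode(text, add_special_tokens=True)
    # Replace everything except [CLS]/[SEP] with the pad token so
    # attributions are measured against an "empty" sentence of the same length
    ref_ids = [
        tok if tok in (tokenizer.cls_token_id, tokenizer.sep_token_id)
        else tokenizer.pad_token_id
        for tok in input_ids
    ]
    return (torch.tensor([input_ids], device=device),
            torch.tensor([ref_ids], device=device))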

GradientShap's `attribute` method `baselines` argument should be None

class GradientShap(GradientAttribution):
    def __init__(self, forward_func):
        r"""
        Args:
            forward_func (function): The forward function of the model or
                any modification of it
        """
        GradientAttribution.__init__(self, forward_func)

    def attribute(
        self,
        inputs,
        baselines,
        n_samples=5,
        stdevs=0.0,
        target=None,
        additional_forward_args=None,
        return_convergence_delta=False,
    ):

According to the docs, the baselines parameter of GradientShap's attribute method is optional and, if not provided, is replaced with a zero-filled tensor of the same size as the input. However, at the moment it is a required argument.
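
A minimal sketch of the behavior the docs describe (my assumption about a possible fix, not the actual Captum implementation): default baselines to None and fall back to zero tensors shaped like the inputs.

import torch

def _default_baselines(inputs, baselines=None):
    # Fall back to zero-filled tensors with the same shape as each input
    if baselines is not None:
        return baselines
    if isinstance(inputs, tuple):
        return tuple(torch.zeros_like(inp) for inp in inputs)
    return torch.zeros_like(inputs)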

RuntimeError: expected device cpu but got device cuda:0 when training and visualizing model on IMDB

I was trying to reproduce the Interpreting text models: IMDB Sentiment Analysis tutorial, but training my own model instead of just loading a pretrained one.

I adapted the code of the original CNN tutorial, but when I get to the point of calling interpret_sentence the following error occurs:

RuntimeError                              Traceback (most recent call last)
<ipython-input-23-68d49a3d040b> in <module>()
----> 1 interpret_sentence(model, 'It was a fantastic performance !', label=1)
      2 interpret_sentence(model, 'Best film ever', label=1)
      3 interpret_sentence(model, 'Such a great show!', label=1)
      4 interpret_sentence(model, 'It was a horrible movie', label=0)
      5 interpret_sentence(model, 'I\'ve never watched something as bad', label=0)

2 frames
<ipython-input-22-cbf5d478566f> in interpret_sentence(model, sentence, min_len, label)
     29     # compute attributions and approximation delta using integrated gradients
     30     attributions_ig, delta = ig.attribute(
---> 31         input_embedding, reference_embedding, n_steps=500, return_convergence_delta=True
     32     )
     33 

/usr/local/lib/python3.6/dist-packages/captum/attr/_core/integrated_gradients.py in attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, internal_batch_size, return_convergence_delta)
    232                 end_point,
    233                 additional_forward_args=additional_forward_args,
--> 234                 target=target,
    235             )
    236             return _format_attributions(is_inputs_tuple, attributions), delta

/usr/local/lib/python3.6/dist-packages/captum/attr/_utils/attribution.py in compute_convergence_delta(self, attributions, start_point, end_point, target, additional_forward_args)
    232         row_sums = [_sum_rows(attribution) for attribution in attributions]
    233         attr_sum = torch.tensor([sum(row_sum) for row_sum in zip(*row_sums)])
--> 234         return attr_sum - (end_point - start_point)
    235 
    236 

RuntimeError: expected device cpu but got device cuda:0

I am not sure, but I suspect the problem is that torch.tensor is being created without a device argument. Can I work around this issue?

In this Colab Notebook you can reproduce the error.
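
Until the underlying device handling is fixed, one workaround I would try (an assumption on my part, using the tutorial's names for model and interpret_sentence) is to run the interpretation pass entirely on CPU so every tensor lives on the same device:

# Hedged workaround (assumption): run interpretation on CPU so the
# internally created delta tensor and the embeddings share a device
model = model.cpu()
model.eval()
interpret_sentence(model, 'It was a fantastic performance !', label=1)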
