mm-shap's Issues

RuntimeError: cannot register a hook on a tensor that doesn't require gradient

Hello @LetiP,

It's me again :P Thank you for your patience and time.

The specs of the GPUs I use: 4x NVIDIA GTX 1080 Ti (Pascal, 11 GB memory each), in a server with 24 cores / 48 threads / 256 GB memory.

Here are my settings at the beginning of mm-shap_albef_dataset.py:

num_samples = "all"  # "all" or number
if num_samples != "all":
    num_samples = int(num_samples)
checkp = "mscoco"  # refcoco, mscoco, vqa, flickr30k
write_res = "yes"  # "yes" or "no"
task = "image_sentence_alignment"  # image_sentence_alignment, vqa, gqa
other_tasks_than_valse = ['mscoco', 'vqa', 'gqa', 'gqa_balanced', 'nlvr2']
use_cuda = True

DATA = {
    "existence": ["/home/students/cheng/MM-SHAP/visual7w/images",
                  '/home/students/cheng/MM-SHAP/data/existence.json'],
}

I googled for solutions to this issue, and it usually seems to be related to:

However, neither of those two issues sounds like the case I have here.
Have you encountered a similar problem?

Here is the full traceback:

Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.

  0%|          | 0/534 [00:00<?, ?it/s]
  0%|          | 0/534 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "mm-shap_albef_dataset.py", line 306, in <module>
    shap_values = explainer(X)
  File "/home/students/cheng/MM-SHAP/shap/explainers/_permutation.py", line 62, in __call__
    batch_size=batch_size, outputs=outputs, silent=silent
  File "/home/students/cheng/MM-SHAP/shap/explainers/_permutation.py", line 76, in __call__
    outputs=outputs, silent=silent
  File "/home/students/cheng/MM-SHAP/shap/explainers/_explainer.py", line 260, in __call__
    batch_size=batch_size, outputs=outputs, silent=silent, **kwargs
  File "/home/students/cheng/MM-SHAP/shap/explainers/_permutation.py", line 134, in explain_row
    outputs = fm(masks, zero_index=0, batch_size=batch_size)
  File "/home/students/cheng/MM-SHAP/shap/utils/_masked_model.py", line 65, in __call__
    return self._full_masking_call(full_masks, zero_index=zero_index, batch_size=batch_size)
  File "/home/students/cheng/MM-SHAP/shap/utils/_masked_model.py", line 141, in _full_masking_call
    outputs = self.model(*joined_masked_inputs)
  File "/home/students/cheng/MM-SHAP/shap/models/_model.py", line 21, in __call__
    return np.array(self.inner_model(*args))
  File "mm-shap_albef_dataset.py", line 184, in get_model_prediction
    masked_text_inputs.to("cuda"))
  File "/home/students/cheng/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "mm-shap_albef_dataset.py", line 92, in forward
    return_dict=True,
  File "/home/students/cheng/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/students/cheng/MM-SHAP/ALBEF/models/xbert.py", line 1067, in forward
    mode=mode,
  File "/home/students/cheng/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/students/cheng/MM-SHAP/ALBEF/models/xbert.py", line 601, in forward
    output_attentions,
  File "/home/students/cheng/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/students/cheng/MM-SHAP/ALBEF/models/xbert.py", line 504, in forward
    output_attentions=output_attentions,
  File "/home/students/cheng/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/students/cheng/MM-SHAP/ALBEF/models/xbert.py", line 407, in forward
    output_attentions,
  File "/home/students/cheng/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/students/cheng/MM-SHAP/ALBEF/models/xbert.py", line 329, in forward
    attention_probs.register_hook(self.save_attn_gradients)         
  File "/home/students/cheng/anaconda3/envs/shap/lib/python3.6/site-packages/torch/_tensor.py", line 289, in register_hook
    raise RuntimeError("cannot register a hook on a tensor that "
RuntimeError: cannot register a hook on a tensor that doesn't require gradient
srun: error: gpu08: task 0: Exited with exit code 1

Questions about applying MM-SHAP to a new model (LLaVA-NeXT)

The first question:

masked_X[0, 0] = 49406

I checked this line and am trying to apply the same approach to the model I am interested in, LLaVA-NeXT (https://huggingface.co/docs/transformers/model_doc/llava_next). I understand that the number 49406 corresponds to the vocabulary size minus the CLS and SEP tokens (49408 - 2). Since the corresponding parameter is None by default in LLaVA-NeXT, I am wondering how to pick an appropriate value for it, and likewise for the other parameters. If you have any ideas, please let me know.
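To make the question concrete, here is a small sketch (my own code, not from the repository) of how I would look up the vocabulary size and special-token ids of both checkpoints instead of hardcoding 49406; the checkpoint names are the CLIP and LLaVA-NeXT checkpoints referenced in this issue.

# Sketch (not from the repository): query each tokenizer for its vocabulary size
# and special-token ids, to pick the analogous constant for LLaVA-NeXT.
from transformers import AutoTokenizer

for name in ["openai/clip-vit-base-patch32", "llava-hf/llava-v1.6-mistral-7b-hf"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name,
          "vocab_size:", tok.vocab_size,
          "bos:", tok.bos_token_id,
          "eos:", tok.eos_token_id,
          "pad:", tok.pad_token_id)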

The second question:

I found the following example:

from PIL import Image
import requests

from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)  # we can take the softmax to get the label probabilities

# Reference: https://huggingface.co/docs/transformers/model_doc/clip

Usually, a model would need a text input prompting it for the caption. However, I did not see such a prompt anywhere in 'mm-shap_clip_dataset.py'.

There are some parameters I would need to revise when I implement LLaVA-NeXT:

LLaVA-NeXT: https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf/blob/main/config.json

  • image_size: 336
  • vocab_size: 32000

It seems to me that LLaVA-NeXT is more complex than the CLIP model, since it splits a picture into four sub-images.

CLIP: https://huggingface.co/openai/clip-vit-base-patch32/blob/main/config.json

  • image_size: 224
  • vocab_size: 49408

Here is the setup of the experiment I would like to run on LLaVA-NeXT with the MM-SHAP metric:

  • num_samples: all
  • task: image_sentence_alignment
  • Dataset: existence

RuntimeError when Registering Hooks on Tensor in mm_albef_dataset

First of all, thank you for sharing the code base of this interesting work!

I encountered an issue while trying to run the command python mm-shap_albef_dataset.py 3 "refcoco" "yes".
Below is the error message I received:
RuntimeError: cannot register a hook on a tensor that doesn't require gradient
Here is the full traceback:

  0%|                                                                                             | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "mm-shap_albef_dataset.py", line 307, in <module>
    shap_values = explainer(X)
  File "/mnt/c/Users/Documents/work/MM-SHAP/shap/explainers/_permutation.py", line 62, in __call__
    batch_size=batch_size, outputs=outputs, silent=silent
  File "/mnt/c/Users/Documents/work/MM-SHAP/shap/explainers/_permutation.py", line 76, in __call__
    outputs=outputs, silent=silent
  File "/mnt/c/Users/Documents/work/MM-SHAP/shap/explainers/_explainer.py", line 260, in __call__
    batch_size=batch_size, outputs=outputs, silent=silent, **kwargs
  File "/mnt/c/Users/Documents/work/MM-SHAP/shap/explainers/_permutation.py", line 134, in explain_row
    outputs = fm(masks, zero_index=0, batch_size=batch_size)
  File "/mnt/c/Users/Documents/work/MM-SHAP/shap/utils/_masked_model.py", line 65, in __call__
    return self._full_masking_call(full_masks, zero_index=zero_index, batch_size=batch_size)
  File "/mnt/c/Users/Documents/work/MM-SHAP/shap/utils/_masked_model.py", line 141, in _full_masking_call
    outputs = self.model(*joined_masked_inputs)
  File "/mnt/c/Users/Documents/work/MM-SHAP/shap/models/_model.py", line 21, in __call__
    return np.array(self.inner_model(*args))
  File "mm-shap_albef_dataset.py", line 192, in get_model_prediction
    masked_text_inputs.to("cuda"))
  File "/home/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "mm-shap_albef_dataset.py", line 100, in forward
    return_dict=True,
  File "/home/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/c/Users/Documents/work/phd_work/MM-SHAP/ALBEF/models/xbert.py", line 1067, in forward
    mode=mode,
  File "/home/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/c/Users/Documents/work/phd_work/MM-SHAP/ALBEF/models/xbert.py", line 601, in forward
    output_attentions,
  File "/home/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/c/Users/Documents/work/phd_work/MM-SHAP/ALBEF/models/xbert.py", line 504, in forward
    output_attentions=output_attentions,
  File "/home/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/c/Users/Documents/work/phd_work/MM-SHAP/ALBEF/models/xbert.py", line 407, in forward
    output_attentions,
  File "/home/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/c/Users/Documents/work/phd_work/MM-SHAP/ALBEF/models/xbert.py", line 329, in forward
    attention_probs.register_hook(self.save_attn_gradients)
  File "/home/anaconda3/envs/shap/lib/python3.6/site-packages/torch/_tensor.py", line 289, in register_hook
    raise RuntimeError("cannot register a hook on a tensor that "
RuntimeError: cannot register a hook on a tensor that doesn't require gradient

I resolved this issue by setting save_attention=False at line 215 of mm_albef_dataset.py:

 model.text_encoder.base_model.base_model.encoder.layer[
        block_num].crossattention.self.save_attention = False  

My question is: is it mandatory to keep registering the attention gradients in order to accurately calculate the textual and visual contributions?
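An alternative I considered (just a sketch on my side, not tested) is to guard the hook registration in ALBEF/models/xbert.py so that it only runs when the tensor actually tracks gradients; that way forward-only SHAP calls would go through while gradient-based uses keep the hook:

# Hypothetical sketch, not the repository's code: only register the gradient hook
# when gradients are tracked, so forward-only calls do not raise.
if attention_probs.requires_grad:
    attention_probs.register_hook(self.save_attn_gradients)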

How is Explainer getting image data in CLIP?

As part of my thesis, I am trying to understand the code in mm-shap_clip_dataset.py, and I'm a bit stumped at the following section, in which we generate the tensor X which is passed to the Explainer instance to generate masks and then SHAP values. I am concerned that in the code as it is written here, X ends up containing no image data -- or at least, I do not understand how it does.

# shap values need one sentence for transformer
            for k, sentence in enumerate(test_sentences):

                try:  # image feature extraction can go wrong
                    inputs = processor(
                        text=sentence, images=image, return_tensors="pt", padding=True
                    )
                except:
                    continue
                model_prediction = model(**inputs).logits_per_image[0,0].item()

                text_length_tok = inputs.input_ids.shape[1]
                p = int(math.ceil(np.sqrt(text_length_tok)))
                patch_size = 224 // p
                image_token_ids = torch.tensor(
                    range(1, p**2+1)).unsqueeze(0) # (inputs.pixel_values.shape[-1] // patch_size)**2 +1
                # make a cobination between tokens and pixel_values (transform to patches first)
                X = torch.cat(
                    (inputs.input_ids, image_token_ids), 1).unsqueeze(1)

                # create an explainer with model and image masker
                explainer = shap.Explainer(
                    get_model_prediction, custom_masker, silent=True)
                shap_values = explainer(X)
                mm_score = compute_mm_score(text_length_tok, shap_values)

Specifically, X consists of a concatenation of two things: image_token_ids (image) and inputs.input_ids (text)

                # make a cobination between tokens and pixel_values (transform to patches first)
                X = torch.cat(
                    (inputs.input_ids, image_token_ids), 1).unsqueeze(1)

But while the inputs object contains both text and image data, image_token_ids seems to take no image data from the inputs object's pixel_values (other than in its shape).

image_token_ids = torch.tensor(
                    range(1, p**2+1)).unsqueeze(0) # (inputs.pixel_values.shape[-1] // patch_size)**2 +1

Then, by the time we generate the concatenation X, we are combining inputs.input_ids and image_token_ids without having added anything to image_token_ids.

Right after X is assigned, we create an Explainer and pass X to it.



                # create an explainer with model and image masker
                explainer = shap.Explainer(
                    get_model_prediction, custom_masker, silent=True)
                shap_values = explainer(X)

So what I am trying to understand is: how does the explainer get any access to the image data when X consists only of the text data plus the blank image_token_ids? I would appreciate any input, thanks!
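My current guess, written out as a hypothetical sketch (it reuses inputs, model, p, patch_size and text_length_tok from the snippet above, and assumes the masker zeroes out masked patch ids), is that the pixel data never travels through X at all: the prediction function captures inputs from the enclosing scope, and the image token ids in X only tell it which patches to blank out.

import numpy as np
import torch

# Hypothetical sketch, not the repository's exact code: X carries only token ids
# and patch indices; the pixel data is captured from the surrounding scope.
def get_model_prediction(x):
    preds = []
    for row in x:
        row = torch.as_tensor(row).flatten()
        text_ids = row[:text_length_tok].unsqueeze(0)
        keep_patch = row[text_length_tok:] != 0      # assumes masked ids were zeroed
        pixel_values = inputs.pixel_values.clone()   # image data enters here, not via X
        for i, keep in enumerate(keep_patch):
            if not keep:                             # blank out the i-th patch
                r, c = divmod(i, p)
                pixel_values[..., r * patch_size:(r + 1) * patch_size,
                                  c * patch_size:(c + 1) * patch_size] = 0
        out = model(input_ids=text_ids,
                    attention_mask=torch.ones_like(text_ids),
                    pixel_values=pixel_values)
        preds.append(out.logits_per_image[0, 0].item())
    return np.array(preds)

Is that roughly the intended design, or am I missing where the pixel values actually enter X?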

Models on the VQA task

Hello,

Thank you for your work!

I am trying to understand how the reported Shapley values were estimated for the VQA/GQA tasks. Here are some specific questions:

  1. Are the question and answer of each instance concatenated together for textual input to the model (LXMERT/ALBEF-VQA)?
  2. What model output is being distributed among the tokens? final argmax probability?

Parameters that affect GPU RAM usage

Thanks for the work. I tried to reproduce the results of this paper; however, I ran into insufficient GPU RAM.

The Python file I am running: mm-shap_albef_dataset.py
DATA I used: existence
Number of samples: all

The basic settings I used:

# num_samples = sys.argv[1] # "all" or number
num_samples = "all" # "all" or number
if num_samples != "all":
    num_samples = int(num_samples)
# checkp = sys.argv[2] #  refcoco, mscoco, vqa, flickr30k
checkp = "mscoco" #  refcoco, mscoco, vqa, flickr30k
# write_res = sys.argv[3] # "yes" or "no"
write_res = "yes" # "yes" or "no"
task = "image_sentence_alignment"  # image_sentence_alignment, vqa, gqa
other_tasks_than_valse = ['mscoco', 'vqa', 'gqa', 'gqa_balanced', 'nlvr2']
use_cuda = True

The problem I encountered:

RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 2.00 GiB total capacity; 1008.19 MiB already allocated; 11.44 MiB free; 1.04 GiB reserved in total by PyTorch)

Since I cannot scale up my GPU RAM right now, I am wondering which parameters I should pay attention to.

A few options on my mind right now:

  • num_samples

  • patch_size (?)
    I tried this today, changing it from 16 to 32, but it changes the expected tensor shapes and produces the following error:

RuntimeError: Error(s) in loading state_dict for VL_Transformer_ITM:
        size mismatch for visual_encoder.pos_embed: copying a param with shape torch.Size([1, 577, 768]) from checkpoint, the shape in current model is torch.Size([1, 145, 768]).
        size mismatch for visual_encoder.patch_embed.proj.weight: copying a param with shape torch.Size([768, 3, 16, 16]) from checkpoint, the shape in current model is torch.Size([768, 3, 32, 32]).
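Another knob I noticed in the traceback is the batch_size used by the explainer call. A sketch of what I plan to try next (assuming the vendored shap copy keeps the upstream batch_size argument of Explainer.__call__):

import torch

# Sketch (untested): evaluate fewer masked inputs per forward pass, and release
# cached GPU memory after each sample.
shap_values = explainer(X, batch_size=8)  # default is "auto"
torch.cuda.empty_cache()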

Thanks in advance for your time and for reading.

Unable to locate "existence_benchmark.test_mturk.json"

Description:

I'm currently working with MM-SHAP and I need the file existence_benchmark.test_mturk.json. Can someone provide guidance on how to obtain this file, or is there a specific procedure to generate it?

If I use the existence.json from VALSE instead, it raises an error at test_sentences = [foil["caption"], foil["foils"][0]]:
KeyError: 'foils'
