
score-cam's Issues

I have some questions about your Score-CAM paper; looking forward to your answer

In Section 4.2 of the experimental part of the paper, there is a sentence saying "In this experiment, rather than do point-wise multiplication with the original generated saliency map, we slightly modify by limiting the number of positive pixels in the saliency map." Could you explain how you did this experiment? Compared with Grad-CAM++, which parts have you modified?

Must the model have two fully-connected layers at the end?

Dear author:

First of all, thanks for your great work!
In the paper I find that the FC layers are outside the model, but in the code I find that the two FC layers are inside the model. When I want to visualize a pretrained YOLOv4, I find it does not have two FC layers at the end. If I add two FC layers that are not pretrained, they are randomly initialized, which causes the visualization map of each layer to change from run to run. I don't know if I understand this correctly; do you have any good suggestions? Looking forward to your reply!

Implementation for Energy-based Point Game

I have received many emails about releasing the code for the Energy-based Point Game (a modified evaluation metric proposed in our paper). To promote the reproducibility of the research, I will release the code soon (I'm working on the NeurIPS reviewing process now and will clean up the code later). Thanks for your patience!

Is this acceleration reasonable?

I found that some of the activations are nearly all zeros, so I want to skip those computations.

# shortcut ratio: keep only the top 10% of channels
self.top_percent = 0.1
# keeping 10% gives the same quality in my tests

# use activations as masks
activations = self.activations

# drop near-zero channels: rank channels by mean activation and keep only the top ones as sub-masks
top_count = int(self.top_percent * activations.shape[1])
channel_scores = activations.mean(dim=(2, 3)).flatten()
top_indices = channel_scores.argsort(descending=True)[:top_count]
sub_masks = activations[:, top_indices]  # only these channels will be computed
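
For reference, torch.topk expresses the same selection in one call. A minimal self-contained sketch of the idea (the activation tensor here is a stand-in, with shapes assumed to match the repo's forward hook):

import torch

# stand-in for the hooked activations: (batch, channels, u, v)
activations = torch.randn(1, 512, 14, 14)
top_percent = 0.1

# keep the 10% of channels with the largest mean activation
top_count = int(top_percent * activations.shape[1])
channel_scores = activations.mean(dim=(2, 3)).flatten()
top_values, top_indices = channel_scores.topk(top_count)  # largest first
sub_masks = activations[:, top_indices]  # (1, 51, 14, 14)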

Grid Effects

I am only getting strong activations along a grid pattern. The attached image shows 1,000 averaged activation maps, which highlights the grid. Is there a known reason for this effect?

I am using a standard ResNet implementation with dilation set to 1 on all conv layers.
[Attached image: grid-activations]

Modifying images before evaluation using Average Drop/Increase in Confidence

Hi, I have some questions regarding the Average Drop/Increase in Confidence metric in section 4.2.

"In this experiment, rather than do point-wise multiplication with the original generated saliency map, we slightly modify by limiting the number of positive pixels in the saliency map (50% of pixels of the image are muted in our experiment)."

  • Do you select 50% of the pixels randomly for every image?
  • Is it 50% of the input image, or 50% of the overlaid image produced by the multiplication?

RuntimeError: Output 0 of UnbindBackward is a view and is being modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.

Traceback (most recent call last):
  File "D:/1.Study/PycharmProjects/Score-CAM-master/test.py", line 50, in <module>
    basic_visualize(input_.cpu(), scorecam_map.type(torch.FloatTensor).cpu(), save_path='resnet.png')
  File "D:\1.Study\PycharmProjects\Score-CAM-master\utils\__init__.py", line 299, in basic_visualize
    input_ = format_for_plotting(denormalize(input_))
  File "D:\1.Study\PycharmProjects\Score-CAM-master\utils\__init__.py", line 173, in denormalize
    channel.mul_(std).add_(mean)
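
For anyone hitting this: the in-place mul_/add_ is applied to a view produced by iterating over the tensor, which recent PyTorch versions forbid. A minimal out-of-place rewrite of the idea (a sketch only, assuming the ImageNet statistics the repo's denormalize appears to use, not the exact repo function):

import torch

def denormalize(tensor,
                means=(0.485, 0.456, 0.406),
                stds=(0.229, 0.224, 0.225)):
    # reshape the statistics for broadcasting over (N, C, H, W);
    # no in-place ops on views, so autograd stays happy
    mean = torch.tensor(means).view(1, -1, 1, 1)
    std = torch.tensor(stds).view(1, -1, 1, 1)
    return tensor * std + mean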

Evaluation code for Average Increase/Drop and other metrics

Hello, could you please provide the evaluation code for Average Increase/Drop and the other metrics? T_T
I would be extremely grateful!

Coefficients of activation maps

Hi, I was looking at the computation of coefficients for the activation maps:

              # how much the score increases if only the highlighted region is kept
              # prediction on masked input
              output = self.model_arch(input * norm_saliency_map)
              output = F.softmax(output)
              score = output[0][predicted_class]

              score_saliency_map +=  score * saliency_map

In the paper (and in the comment), you refer to the Increase of confidence, so the score should be computed as the difference between the score of the raw input and the score of the masked input. However, looking at this implementation we understand that the score is just the one predicted on the masked input. Am I missing something?
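
For concreteness, a minimal sketch of the weight as Algorithm 1 defines it, with hypothetical names (model, input, and norm_saliency_map are stand-ins, and a zero baseline image is assumed):

import torch
from torchvision.models import vgg16

# stand-ins for the model, input image X, and one normalized activation map H
model = vgg16().eval()
input = torch.randn(1, 3, 224, 224)
norm_saliency_map = torch.rand(1, 1, 224, 224)
predicted_class = model(input).argmax(dim=1)

# CIC per the paper: S_c = f_c(X * H) - f_c(X_b), with X_b a baseline image
# (e.g. all zeros); the paper then softmaxes these scores across channels.
with torch.no_grad():
    masked_logit = model(input * norm_saliency_map)[0, predicted_class]
    baseline_logit = model(torch.zeros_like(input))[0, predicted_class]
    cic = masked_logit - baseline_logit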

Thank you
Nicole

If score_saliency_map is all negative, how can I solve this?

Hi, I'm interested in Score-CAM and I'm using it.

I use Score-CAM with the PyTorch model vgg16_bn on ImageNet validation images. Most of the values in the activation maps are negative.

So score_saliency_map has all negative values, and F.relu returns all zeros.

My questions are:

(1) If score_saliency_map has all negative values, can I not use Score-CAM?

(2) Is there any solution?

Softmax of scores across channels / Baseline image

I have two questions and I couldn't find lines implementing the following functionality in scorecam.py:

  • In Algorithm 1 in the paper you compute the score using a baseline image X_b. This is not done here; instead, we only have the first part of the equation.

  • In Algorithm 1 in the paper you apply softmax channel-wise to the importance scores. This is not done here; instead, we directly multiply with the score.

Am I missing something?
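
To make both missing steps concrete, here is a sketch of what Algorithm 1 describes, with hypothetical names (cic_scores would hold one baseline-subtracted score per channel):

import torch
import torch.nn.functional as F

k, h, w = 512, 224, 224
cic_scores = torch.randn(k)          # S_c = f_c(X * H_c) - f_c(X_b), one per channel
saliency_maps = torch.rand(k, h, w)  # upsampled, normalized activation maps

# the channel-wise softmax from Algorithm 1, so the weights sum to one
weights = F.softmax(cic_scores, dim=0)

# weighted combination followed by ReLU, as in the paper's final definition
score_saliency_map = F.relu((weights.view(k, 1, 1) * saliency_maps).sum(dim=0))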

Efficiency problem regarding the implementation of Score-CAM

Dear author:

First of all, thanks for your great work!

When I check the code of scorecam.py, I notice that it computes the score_saliency_map for every single instance. This is inefficient when you want to compute Score-CAM for a number of instances.

I also read your paper and find that your algorithm in fact makes it possible to compute Score-CAM for a mini-batch (correct me if I am wrong). This can be much more efficient than computing one instance at a time.

However, to do the mini-batch computation, some of the code needs to be modified:

  1. I saw that in ScoreCAM you call score.backward(retain_graph=retain_graph). According to my understanding, Score-CAM is gradient-free, so the backward computation is in fact useless. We need to remove the backward pass before we can compute on mini-batch input.

  2. In ScoreCAM you skip the computation whenever saliency_map.max() == saliency_map.min(). This logic needs to be implemented for mini-batch computation as well.

I will leave my code here. I have tested it on a few instances and did not find any problems. This implementation takes around 41 seconds for 32 instances on my server, while computing a single instance takes around 16 seconds. So there is a real efficiency improvement.

As I am not sure whether the code is correct, I will leave it below. You can check it when you are free.

def forward(self, input, class_idx=None, retain_graph=False):
    b, c, h, w = input.size()
    # prediction on raw input
    logit = self.model_arch(input)  # moved to the GPU below if available
    
    if class_idx is None:
        predicted_class = logit.max(1)[-1]
        #score = logit[:, logit.max(1)[-1]].squeeze()
    else:
        predicted_class = class_idx.long() # assume the class_idx in tensor form
        #predicted_class = torch.LongTensor([class_idx])
        #score = logit[:, class_idx].squeeze()
    
    logit = F.softmax(logit, dim=1)

    if torch.cuda.is_available():
      predicted_class= predicted_class.cuda()
      #score = score.cuda()
      logit = logit.cuda()

    #self.model_arch.zero_grad()
    #score.backward(retain_graph=retain_graph)

    predicted_class = predicted_class.reshape(-1, 1)

    activations = self.activations['value']
    b, k, u, v = activations.size()
    
    score_saliency_map = torch.zeros((b, 1, h, w))

    if torch.cuda.is_available():
      activations = activations.cuda()
      score_saliency_map = score_saliency_map.cuda()

    with torch.no_grad():
      for i in range(k):

          # upsampling
          saliency_map = torch.unsqueeze(activations[:, i, :, :], 1)
          saliency_map = F.interpolate(saliency_map, size=(h, w), mode='bilinear', align_corners=False)
          
          #if saliency_map.max() == saliency_map.min():
          #  continue
          
          # normalize to 0-1
          saliency_max = saliency_map.view(b, -1).max(dim=1)[0]
          saliency_max = saliency_max.reshape(b, 1, 1, 1).repeat(1, 1, h, w)
          saliency_min = saliency_map.view(b, -1).min(dim=1)[0]
          saliency_min = saliency_min.reshape(b, 1, 1, 1).repeat(1, 1, h, w)
          norm_saliency_map = (saliency_map - saliency_min) / (saliency_max - saliency_min + 1e-7)
          

          # how much the score increases if only the highlighted region is kept
          # prediction on masked input
          output = self.model_arch(input * norm_saliency_map)
          output = F.softmax(output, dim=-1)
          #score = output[0][predicted_class]
          score = output[torch.arange(predicted_class.size(0)).unsqueeze(1), predicted_class]
          # Apply the torch.where function, so the score of saliency_map.max() == saliency_map.min() instance is 0.
          score = torch.where(saliency_map.view(b, -1).max(dim=1)[0].reshape(b, 1) > saliency_map.view(b, -1).min(dim=1)[0].reshape(b, 1), 
                                score, torch.zeros_like(score))
        
          score = score.reshape(b, 1, 1, 1).repeat(1, 1, h, w)
          score_saliency_map +=  score * saliency_map
    
    score_saliency_map = F.relu(score_saliency_map)
    score_saliency_map_min = score_saliency_map.view(b, -1).min(dim=1)[0]
    score_saliency_map_min = score_saliency_map_min.reshape(b, 1, 1, 1).repeat(1, 1, h, w)
    score_saliency_map_max = score_saliency_map.view(b, -1).max(dim=1)[0]
    score_saliency_map_max = score_saliency_map_max.reshape(b, 1, 1, 1).repeat(1, 1, h, w)
    #score_saliency_map_min, score_saliency_map_max = score_saliency_map.min(), score_saliency_map.max()

    # torch.count_nonzero is only available from PyTorch 1.7.0, so use nonzero() instead
    if len(((score_saliency_map_max - score_saliency_map_min) == 0).nonzero(as_tuple=False)) != 0:
        raise Exception('score_saliency_map is constant for at least one instance in the batch')
    
    #if score_saliency_map_min == score_saliency_map_max:
    #    return None

    score_saliency_map = (score_saliency_map - score_saliency_map_min).div(score_saliency_map_max - score_saliency_map_min).data
    return score_saliency_map
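
A hypothetical usage sketch for the batched forward above. It assumes the repo's model_dict constructor and activation hook stay unchanged, and that ScoreCAM lives in cam/scorecam.py as in the repo; names here are illustrative:

import torch
from torchvision.models import vgg16
from cam.scorecam import ScoreCAM  # assumed repo layout

model = vgg16().eval()
model_dict = dict(type='vgg16', arch=model, layer_name='features_29', input_size=(224, 224))
scorecam = ScoreCAM(model_dict)

batch = torch.randn(32, 3, 224, 224)        # a mini-batch of 32 images
class_idx = torch.randint(0, 1000, (32,))   # one target class per instance
saliency_maps = scorecam(batch, class_idx)  # expected shape: (32, 1, 224, 224)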

Score-CAM with batch size greater than one

How are you generating CAMs when batch_size is greater than one? In your implementation there is only one loop, which iterates over the number of activations, but there is no loop for when batch_size is greater than one. Your implementation only returns one activation map; I assume it does not return multiple CAMs when multiple images are given as input.

I have some questions about paper and your implementation.

Hello. I'm impressed with the idea of this paper and want to apply it to my project.
But there is a part I cannot reconcile between your paper and the code implementation.

In my understanding, CIC means the difference between the target score of the original input image and the target score of the input image multiplied by the mask. Did I get it wrong?

In CIC, doesn't $X_b$ mean the input image? In your implementation, you use the target score of the mask multiplied with the input image as the CIC; the target score of the original image is not subtracted. I can't understand this part very well. I want to know what I misunderstood.

Thank you!

Normalization Operation

Hi, haofanwang, the author of score_cam.

I'm a student studying the interpretation of CNNs. Something about Score-CAM confuses me.
I copy a paragraph from the paper "Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks" below.

3.2. Normalization on Score
Each forward passing in neural network is independent, the score amplitude of each forward propagation is unpredictable and not fixed. The relative output value (post-softmax) after normalization is more reasonable to measure the relevance than absolute output value (pre-softmax).
Thus, in Score-CAM, we represent weight as post-softmax value, so that the score can be rescaled into a fixed range.
...
Normalization operation equips Score-CAM with good class discrimination ability.

What exactly is the normalization operation?
After reading, I have two ideas.

  1. Normalization on logits. VGG16 in PyTorch outputs logits that can include negative elements. The probs (probabilities) are the logits passed through the softmax function, and the prob of class c would be the normalized score.
    This idea comes from swapping the score function: without normalization the score function outputs a logit, with it a probability.
  2. Normalization on scores. The scores (CIC) of every channel are stored in a tensor and act as the weights. As written in Algorithm 1, the scores are passed through a softmax so that their sum equals one.

Are both operations applied?
Which of them improves the discrimination power?

BTW, when both were applied, the results were worse.
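
To make the two readings concrete, a small sketch with hypothetical names (logits stands for the k masked forward passes, stacked; none of these names are the repo's):

import torch
import torch.nn.functional as F

k, num_classes, target_class = 512, 1000, 243
logits = torch.randn(k, num_classes)  # one forward pass per masked input (stand-in)

# Reading 1: normalize each forward pass over the classes (post-softmax score)
probs = F.softmax(logits, dim=1)
scores_1 = probs[:, target_class]        # one weight per channel

# Reading 2: normalize the per-channel scores across channels (Algorithm 1)
cic_scores = logits[:, target_class]     # pre-softmax target scores
scores_2 = F.softmax(cic_scores, dim=0)  # weights sum to one across the k channels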

Visualization Issue

Hello Haofan,
I have attached a code example, based on the Grad-CAM example given by Keras, to show what I need, and added comments at the crucial points.

Thanks in advance,

Halil
GradCAM1.zip

ScoreCAM paper Algorithm1 implementation question

Hello,

I have a question about the implementation of Algorithm 1 of the Score-CAM paper.
The code

              # how much the score increases if only the highlighted region is kept
              # prediction on masked input
              output = self.model_arch(input * norm_saliency_map)
              output = F.softmax(output)
              score = output[0][predicted_class]

suggests that the score is simply the masked image run through the original neural net. However, in the paper there is an additional step:
$S^{c} = f^c(M) - f^c(X_b)$.

I am not sure exactly why this step is needed in the first place, but since it is in the paper, I am curious why it does not seem to be in the code.

Thank you.

2 versions of ScoreCAM

There are 2 versions of the Score-CAM paper on arXiv, and the implementations in other libraries also differ.
The difference is in the use of the softmax operation and the subtraction of the baseline.
Which version is better?

Score-CAM when the output function is sigmoid

Hello, I am trying to visualize the last convolutional layer of my model, which has a sigmoid function as its last layer. In this case, when the masked array is fed in, should we apply a softmax to the output of the sigmoid and take the dot product with the activation maps (this calculation results in an all-ones vector)? Or should we just use the output of the sigmoid directly to multiply with the activation maps?
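
A small sketch of the second option, which seems more natural for a sigmoid head, with a hypothetical single-output model (a softmax over a single sigmoid output would indeed always give 1.0, matching the all-ones vector above):

import torch
import torch.nn as nn

# hypothetical binary classifier ending in a sigmoid
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(8, 1), nn.Sigmoid())
model.eval()

input = torch.randn(1, 3, 224, 224)
norm_saliency_map = torch.rand(1, 1, 224, 224)  # stand-in normalized activation map

with torch.no_grad():
    # use the sigmoid output directly as the channel weight
    weight = model(input * norm_saliency_map)[0, 0]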

Sincerely,
