haofanwang / score-cam
Official implementation of Score-CAM in PyTorch
License: MIT License
Hi, thank you for your great work. Is it possible to combine the CAMs of two images for the same class (multiple instance)?
In section 4.2 of the experiments part of the paper, there is a sentence saying "In this experiment, rather than do point-wise multiplication with the original generated saliency map, we slightly modify by limiting the number of positive pixels in the saliency map." Could you explain how you ran this experiment? Compared with Grad-CAM++, which parts did you modify?
Dear author:
First of all, thanks for your great work!
In the paper, the FC layers appear to be outside the model, but in the code the two FC layers are inside the model. When I try to visualize a pretrained YOLOv4, it has no FC layers at the end. If I add two FC layers that are not pretrained, they are randomly initialized, so the visualization map for each layer changes from run to run. I am not sure whether I have understood this correctly. Do you have any suggestions? Looking forward to your reply!
I have received many emails asking for the code for the Energy-based Point Game (a modified evaluation metric proposed in our paper). To promote reproducibility, I will release the code soon (I am busy with NeurIPS reviewing right now and will clean up the code afterwards). Thanks for your patience!
I think Score-CAM is slower than Grad-CAM. Is that right?
Hi, I want to visualize my dataset on ResNet50. I wonder which layer's feature maps should be used. My data's resolution is 28x28.
I found that some of the activations are nearly zero, and I want to skip the computation for them.
# short cut ratio
self.top_percent = 0.1
# 10% same quality
# use activation as masks
activations = self.activations
# remove useless activations by sorted mean activation , leave sub-masks
top_count = int(self.top_percent * activations.shape[1])
channel_scores = activations.mean(axis=[2, 3], keepdim=False).flatten()
top_indice = channel_scores.argsort(0,descending=True)[:top_count]
sub_masks = activations[:, top_indice] # only these will be computed
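The snippet above depends on `self`, so it cannot be run on its own. A minimal standalone sketch of the same idea follows, assuming the hooked activations have shape (1, K, U, V); the function name `select_top_channels` is hypothetical and not part of the repository.

```python
import torch


def select_top_channels(activations: torch.Tensor, top_percent: float = 0.1) -> torch.Tensor:
    """Keep the channels with the highest mean activation.

    activations: tensor of shape (1, K, U, V), e.g. the hooked feature maps.
    Returns a tensor of shape (1, top_count, U, V).
    """
    k = activations.shape[1]
    top_count = max(1, int(top_percent * k))  # always keep at least one channel
    # Mean over the spatial dims gives one score per channel (assumes batch size 1).
    channel_scores = activations.mean(dim=(2, 3)).flatten()
    top_idx = channel_scores.argsort(descending=True)[:top_count]
    return activations[:, top_idx]


acts = torch.rand(1, 512, 14, 14)
sub_masks = select_top_channels(acts, top_percent=0.1)
print(sub_masks.shape)  # torch.Size([1, 51, 14, 14])
```

Whether 10% of the channels really gives the same quality depends on the model and layer, so the ratio is worth checking against the full computation on a few images.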
Hi, I have some questions regarding the Average Drop/Increase in Confidence metric in section 4.2.
In this experiment, rather than do point-wise multiplication with the original generated saliency map, we slightly modify by limiting the number of positive pixels in the saliency map (50% of pixels of the image are muted in our experiment).
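One possible reading of this setup can be sketched as follows. It assumes the usual Grad-CAM++ definitions of Average Drop and Increase in Confidence, and it assumes that "limiting the number of positive pixels" means keeping only the top 50% of saliency pixels as a binary mask. The helper names `keep_top_pixels` and `average_drop_and_increase` are hypothetical; this is not the authors' evaluation code.

```python
import torch


def keep_top_pixels(saliency: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Binary mask that keeps the highest-valued pixels of each saliency map.

    saliency: (B, 1, H, W). Ties at the threshold may keep a few extra pixels.
    """
    b = saliency.shape[0]
    flat = saliency.reshape(b, -1)
    n_keep = max(1, int(keep_ratio * flat.shape[1]))
    # Value of the n_keep-th largest pixel in each map.
    thresh = flat.topk(n_keep, dim=1).values[:, -1:]
    return (flat >= thresh).float().reshape_as(saliency)


def average_drop_and_increase(y_full: torch.Tensor, y_masked: torch.Tensor):
    """y_full, y_masked: (N,) class confidences on the full and masked images.

    Returns (average drop %, increase in confidence %).
    """
    drop = torch.clamp(y_full - y_masked, min=0) / y_full
    increase = (y_masked > y_full).float()
    return 100 * drop.mean().item(), 100 * increase.mean().item()


# Toy usage: the masked image would be `image * keep_top_pixels(saliency)`.
y_full = torch.tensor([0.8, 0.5])
y_masked = torch.tensor([0.4, 0.6])
avg_drop, inc_conf = average_drop_and_increase(y_full, y_masked)
```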
Is there a way to make this work with batches of images?
Traceback (most recent call last):
  File "D:/1.Study/PycharmProjects/Score-CAM-master/test.py", line 50, in <module>
    basic_visualize(input_.cpu(), scorecam_map.type(torch.FloatTensor).cpu(), save_path='resnet.png')
  File "D:\1.Study\PycharmProjects\Score-CAM-master\utils\__init__.py", line 299, in basic_visualize
    input_ = format_for_plotting(denormalize(input_))
  File "D:\1.Study\PycharmProjects\Score-CAM-master\utils\__init__.py", line 173, in denormalize
    channel.mul_(std).add_(mean)
Hello, could you provide the evaluation code for Average Increase/Drop and the other metrics? T_T
I would be extremely grateful!
Hi, I was looking at the computation of coefficients for the activation maps:
# how much increase if keeping the highlighted region
# predication on masked input
output = self.model_arch(input * norm_saliency_map)
output = F.softmax(output)
score = output[0][predicted_class]
score_saliency_map += score * saliency_map
In the paper (and in the comment), you refer to the Increase of Confidence, so the score should be computed as the difference between the score of the raw input and the score of the masked input. However, looking at this implementation, the score is just the one predicted on the masked input. Am I missing something?
Thank you
Nicole
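The two variants being compared here can be written side by side. This is only a sketch: `channel_score` and its `baseline` argument are hypothetical names, and which reference input the subtraction should use (the raw input, or a separate baseline image) is exactly what the question leaves open.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def channel_score(model, x, mask, class_idx, baseline=None):
    """Softmax confidence of `class_idx` on the masked input.

    If `baseline` is given, the confidence on the baseline input is subtracted,
    which turns the raw score into a difference (an "increase").
    x: (B, C, H, W); mask: broadcastable to x; returns a tensor of shape (B,).
    """
    masked = F.softmax(model(x * mask), dim=1)[:, class_idx]
    if baseline is None:
        return masked  # what the quoted snippet computes
    base = F.softmax(model(baseline), dim=1)[:, class_idx]
    return masked - base


# Toy usage with a stand-in model (assumption: any classifier returning logits).
torch.manual_seed(0)
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(12, 3)).eval()
x = torch.rand(2, 3, 2, 2)
mask = torch.rand(2, 1, 2, 2)
raw = channel_score(toy_model, x, mask, class_idx=1)
diff = channel_score(toy_model, x, mask, class_idx=1, baseline=torch.zeros_like(x))
```

Because the baseline term is the same for every channel, subtracting it shifts all channel scores by the same constant.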
Hi, I'm interested in Score-CAM and have been using it.
I use Score-CAM with the PyTorch vgg16_bn model and ImageNet validation images. Most of the values in the activation maps are negative.
As a result, score_saliency_map is entirely negative and F.relu returns all zeros.
My questions are:
(1) If score_saliency_map is entirely negative, does that mean I cannot use Score-CAM?
(2) Is there any solution?
I have two questions. I couldn't find the lines in scorecam.py that implement the following:
In Algorithm 1 in the paper you compute the score using a baseline image X_b. This is not done here and instead, we only have the first part of the equation.
In Algorithm 1 in the paper you apply softmax channel-wise for the importance scores. This is not done here and instead, we directly multiply with the score.
Am I missing something?
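The second point can be illustrated with a small sketch that combines per-channel scores with and without a softmax across channels. `combine_maps` is a hypothetical helper, not a function from scorecam.py, and the shapes are assumptions.

```python
import torch
import torch.nn.functional as F


def combine_maps(maps: torch.Tensor, scores: torch.Tensor, use_softmax: bool = True) -> torch.Tensor:
    """Weighted sum of activation maps followed by ReLU.

    maps:   (K, H, W) upsampled activation maps.
    scores: (K,) one score per channel.
    With use_softmax=True the scores are normalized across channels first;
    with use_softmax=False they are used directly as weights.
    """
    weights = F.softmax(scores, dim=0) if use_softmax else scores
    cam = (weights.view(-1, 1, 1) * maps).sum(dim=0)
    return F.relu(cam)


maps = torch.stack([torch.ones(2, 2), 2 * torch.ones(2, 2)])
scores = torch.tensor([1.0, 2.0])
cam_softmax = combine_maps(maps, scores, use_softmax=True)
cam_direct = combine_maps(maps, scores, use_softmax=False)
```

One property worth noting: softmax is invariant to adding the same constant to every score, so when the channel-wise softmax is applied, subtracting a baseline score that is identical for all channels does not change the weights.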
Dear author:
First of all, thanks for your great work!
When I checked scorecam.py, I noticed that it computes score_saliency_map for a single instance at a time. This is inefficient when you want to compute Score-CAM for many instances.
I also read your paper, and it seems possible to compute Score-CAM for a mini-batch (correct me if I am wrong), which would be more efficient than computing one instance at a time.
However, to do the mini-batch computation, some of the code needs to be modified:
In ScoreCAM you call score.backward(retain_graph=retain_graph). As I understand it, Score-CAM is gradient-free, so the backward pass is unnecessary. It needs to be removed before we can process mini-batch input.
In ScoreCAM you skip the computation whenever saliency_map.max() == saliency_map.min(). This logic needs to be implemented for mini-batch computation as well.
I will leave my code here. I have tested it on a few instances and did not find any problems. This implementation takes around 41 seconds for 32 instances on my server, while computing a single instance takes around 16 seconds, so efficiency does improve.
As I am not sure whether the code is correct, I am posting it below. You can check it when you are free.
def forward(self, input, class_idx=None, retain_graph=False):
    b, c, h, w = input.size()
    device = input.device  # follow the input's device instead of calling .cuda() unconditionally

    # prediction on raw input (this also fills self.activations through the hook)
    logit = self.model_arch(input)

    if class_idx is None:
        predicted_class = logit.max(1)[-1]
    else:
        predicted_class = class_idx.long()  # assumes class_idx is a tensor of shape (b,)
    predicted_class = predicted_class.to(device).reshape(-1, 1)

    # Score-CAM is gradient-free, so zero_grad() and score.backward() are not needed.
    activations = self.activations['value'].to(device)
    b, k, u, v = activations.size()
    score_saliency_map = torch.zeros((b, 1, h, w), device=device)

    with torch.no_grad():
        for i in range(k):
            # upsampling
            saliency_map = torch.unsqueeze(activations[:, i, :, :], 1)
            saliency_map = F.interpolate(saliency_map, size=(h, w), mode='bilinear', align_corners=False)

            # normalize each instance to 0-1; the (b, 1, 1, 1) shapes broadcast over (h, w)
            saliency_max = saliency_map.view(b, -1).max(dim=1)[0].reshape(b, 1, 1, 1)
            saliency_min = saliency_map.view(b, -1).min(dim=1)[0].reshape(b, 1, 1, 1)
            norm_saliency_map = (saliency_map - saliency_min) / (saliency_max - saliency_min + 1e-7)

            # how much increase if keeping the highlighted region
            # prediction on masked input
            output = self.model_arch(input * norm_saliency_map)
            output = F.softmax(output, dim=-1)
            score = output[torch.arange(b, device=device).unsqueeze(1), predicted_class]

            # Zero the score of every instance where saliency_map.max() == saliency_map.min().
            score = torch.where(saliency_max.reshape(b, 1) > saliency_min.reshape(b, 1),
                                score, torch.zeros_like(score))
            score = score.reshape(b, 1, 1, 1)

            score_saliency_map += score * saliency_map

    score_saliency_map = F.relu(score_saliency_map)
    score_saliency_map_min = score_saliency_map.view(b, -1).min(dim=1)[0].reshape(b, 1, 1, 1)
    score_saliency_map_max = score_saliency_map.view(b, -1).max(dim=1)[0].reshape(b, 1, 1, 1)

    # A constant map cannot be normalized (the single-instance code returns None here).
    if ((score_saliency_map_max - score_saliency_map_min) == 0).any():
        raise Exception("score_saliency_map is constant for at least one instance")

    score_saliency_map = (score_saliency_map - score_saliency_map_min).div(score_saliency_map_max - score_saliency_map_min).data
    return score_saliency_map
I found that generating explanations with ScoreCAM is slow, taking about 2 seconds per image. Generating explanations for every image in the ImageNet dataset would take a very long time. Is there a suitable way to speed it up?
Does it work for an RNN+CNN model whose task is text classification?
How do you generate CAMs when batch_size is greater than one? In your implementation there is a single loop that iterates over the activations, but no loop over the batch. The implementation returns only one activation map, so I assume it does not return multiple CAMs when multiple images are given as input.
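One way to cut the per-image cost, sketched here under assumptions, is to forward several masked copies of the image at once instead of one forward pass per channel. `score_masks_in_chunks` and `chunk_size` are hypothetical names; the model is assumed to return logits and to be in eval mode so that batching does not change BatchNorm or Dropout behavior.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def score_masks_in_chunks(model, x, masks, class_idx, chunk_size=64):
    """Score K masks for a single image using batched forward passes.

    x:     (1, C, H, W) input image.
    masks: (K, 1, H, W) activation maps upsampled and normalized to [0, 1].
    Returns a (K,) tensor with the softmax confidence of `class_idx` per mask.
    """
    scores = []
    for start in range(0, masks.shape[0], chunk_size):
        chunk = masks[start:start + chunk_size]   # (k, 1, H, W)
        out = model(x * chunk)                    # broadcasts to (k, C, H, W)
        scores.append(F.softmax(out, dim=1)[:, class_idx])
    return torch.cat(scores)


# Toy usage with a stand-in classifier.
torch.manual_seed(0)
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(12, 3)).eval()
x = torch.rand(1, 3, 2, 2)
masks = torch.rand(5, 1, 2, 2)
scores = score_masks_in_chunks(toy_model, x, masks, class_idx=2, chunk_size=2)
```

The number of forward passes drops from K to roughly K / chunk_size, at the cost of more memory per pass, so the chunk size has to be tuned to the GPU.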
Hello. I'm impressed with the idea of this paper and want to apply it to my project.
However, there is a discrepancy between your paper and the code implementation that I do not understand.
In my understanding, CIC means the difference between the target score of the original input image and the target score of the input image multiplied by the mask. Did I get it wrong?
In CIC, doesn't
Thank you!
Hi, haofanwang, the author of score_cam.
I'm a student learning interpretation of CNN. Something confused me about score_cam.
I copy a paragraph as follows in the paper "Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks".
3.2. Normalization on Score
Each forward passing in neural network is independent, the score amplitude of each forward propagation is unpredictable and not fixed. The relative output value (post-softmax) after normalization is more reasonable to measure the relevance than absolute output value (pre-softmax).
Thus, in Score-CAM, we represent weight as post-softmax value, so that the score can be rescaled into a fixed range.
...
Normalization operation equips Score-CAM with good class discrimination ability.
What is exactly the Normalization Operation?
After reading, I have two ideas.
Are both operations applied?
Which of them improves the discrimination power?
BTW, when both were applied, the result was worse.
Would you please share your Point Game code for evaluation? Thanks.
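Until the official code is available, here is a rough sketch of how such a metric could be computed. It assumes the energy-based variant measures the fraction of total saliency energy that falls inside the ground-truth bounding box; that is one reading of the paper, not the authors' released code, and `energy_pointing_game` and the bbox convention are hypothetical.

```python
import torch


def energy_pointing_game(saliency: torch.Tensor, bbox: tuple) -> float:
    """Fraction of saliency energy inside a bounding box.

    saliency: (H, W) non-negative saliency map.
    bbox:     (x_min, y_min, x_max, y_max) in pixels, with the max values exclusive.
    Returns 0.0 when the map has no energy at all.
    """
    x0, y0, x1, y1 = bbox
    inside = saliency[y0:y1, x0:x1].sum()
    total = saliency.sum()
    return (inside / total).item() if total > 0 else 0.0


saliency = torch.ones(4, 4)
proportion = energy_pointing_game(saliency, (0, 0, 2, 2))
print(proportion)  # 0.25
```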
Hello Haofan,
I have attached a code example, based on the Grad-CAM example provided by Keras, that shows what I need, with comments on the crucial points.
Thanks in advance,
Halil
GradCAM1.zip
Hello,
I have a question about the implementation of Algorithm 1 of the Score-CAM paper.
The code
# how much increase if keeping the highlighted region
# predication on masked input
output = self.model_arch(input * norm_saliency_map)
output = F.softmax(output)
score = output[0][predicted_class]
suggests that the output is simply the masked images run through the original neural net. However, in the paper there is an additional step:
I am not sure exactly why this step is needed in the first place, but since it is in the paper, I am curious why it does not seem to be in the code?
Thank you.
There are two versions of the Score-CAM paper on arXiv, and the implementations in other libraries also differ.
The differences are the use of the softmax operation and the subtraction of the baseline.
Which version is better?
Hi, did you include the code for Average Drop and Increase in Confidence?
Hello, I am trying to visualize the last convolutional layer of my model, which has a sigmoid function as its final layer. In this case, when the masked input is fed in, should we apply softmax to the sigmoid output before weighting the activation maps (this calculation results in an all-ones vector)? Or should we use the sigmoid output directly to weight the activation maps?
Sincerely,
@haofanwang @haofanw Hi, thanks for open-sourcing the code. Can we use ScoreCAM for models like Mask R-CNN, DeepLab, Faster R-CNN, YOLOv2, and RetinaNet?