Question concerning OOD detection

romain-martin commented on September 28, 2024

Question concerning OOD detection

Comments (2)

SiLangWHL commented on September 28, 2024

Hi,

Sorry for the late reply.

"I'm using the .pt weights you kindly provided," The provided weights are for CLIPN with learnable prompts, so these models do not know the meaning of negative keywords. That means you would need to train CLIPN with hand-crafted prompts yourself.
"In this case, wouldn't the ATD method be unusable when you only have one class and just want to discard the FP as pij is equal to 1?" That seems right. CLIPN works by finding the best ID score or by adjusting the ID scores. When it is deployed on a binary classification task, the "saying no" probability is functionally equal to (1 - ID probability).
"Do you have a reference or a paper explaining where Eq.4 comes from? " Referring to Figure 4, Equation 4 is used to teach the CLIP model to mismatch the image with its positive text. The motivation is the opposite of the original contrastive loss in the CLIP paper.

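The single-class point above can be checked numerically. The sketch below is not taken from the CLIPN code base: the function names are hypothetical, and the ATD rule is written from memory of the paper (an extra "unknown" probability of 1 - sum_j (1 - p_no_j) * p_j, with OOD declared when it exceeds the largest ID probability), so treat that form as an assumption. It shows that a softmax over one class always returns 1, that the assumed ATD test then can never fire, and that thresholding the "no" probability directly is what remains.

```python
import math

def softmax(logits):
    """Softmax over class logits (max-subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def atd_is_ood(p_id, p_no):
    """Assumed ATD rule: flag OOD when the 'unknown' mass beats every ID class.

    p_id: softmax probabilities over the ID classes.
    p_no: per-class 'no' probabilities, each in [0, 1].
    """
    p_unknown = 1.0 - sum((1.0 - n) * p for p, n in zip(p_id, p_no))
    return p_unknown > max(p_id)

def single_class_is_ood(p_no, threshold=0.5):
    """With one class, the ID probability is 1 - p_no, so threshold p_no directly.

    The default threshold of 0.5 is an illustrative choice, not a recommended value.
    """
    return p_no > threshold

# One class: the softmax collapses to 1.0 whatever the logit is.
p_single = softmax([3.2])

# Even with a very confident 'no', the assumed ATD rule compares 0.99 against 1.0.
flag_atd = atd_is_ood(p_single, [0.99])

# Thresholding the 'no' probability still separates the two cases.
flag_direct = single_class_is_ood(0.99)
```

With more than one class the same `atd_is_ood` function can fire, for example with `p_id = [0.5, 0.5]` and `p_no = [0.9, 0.9]`, which is why the limitation is specific to the one-class setting.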

romain-martin commented on September 28, 2024

Hello,
Thank you for your answer.

OK, I thought the weights already knew the meaning of the negative keywords. Just to be sure, as I may have misunderstood the difference between hand-crafted prompts (1) and learnable ones (2):
(1) means CLIPN has two text encoders, one of which is used for the negative prompts, so that the ATD or CTW strategy can enhance the original CLIP's zero-shot classification ability, meaning that it can assess whether an image doesn't match any of the given categories.
(2) has the same purpose, except that the negative weights have been learned, so the text encoders are no longer needed because the weights are already embedded in the model?
Correct me if I'm wrong, but in this case, isn't (1) more general and therefore better suited for zero-shot classification?

Is there a plan to release a model with hand-crafted prompts? If not, can I train one myself using the run.sh in the handcrafted directory, with CC3M for example, to learn how to say no? Or would it be better to train using images from my use case as positive prompts and CC3M or ImageNet as negative prompts? The goal is mainly to determine whether a given image is a false positive (i.e., does not belong to a list of predetermined categories). How long did training take you with 4 GPUs?
As for Eq. 4, I understand the purpose, but where does this equation come from? (3) looks like a kind of cross-entropy loss, whereas (4) looks like a softmax function that gives the probability of being a "no", right?

Thank you again
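
To make the softmax reading of Eq. 4 concrete, here is a minimal sketch. It assumes Eq. 4 has the two-way softmax form the comment describes, normalising over one "yes" similarity and one "no" similarity for the same image and class; the function names, the temperature `tau`, and the example similarity values are all hypothetical. Under that assumption, the "no" probability is just a sigmoid of the scaled difference between the two similarities.

```python
import math

def p_no(sim_yes, sim_no, tau=1.0):
    """Two-way softmax over (yes, no) similarities; returns the 'no' share."""
    m = max(sim_yes, sim_no)  # subtract the max for numerical stability
    e_yes = math.exp((sim_yes - m) / tau)
    e_no = math.exp((sim_no - m) / tau)
    return e_no / (e_yes + e_no)

def sigmoid(x):
    """Logistic function, used to cross-check the two-way softmax."""
    return 1.0 / (1.0 + math.exp(-x))

# Equal similarities give an undecided 0.5.
undecided = p_no(0.3, 0.3)

# A larger 'no' similarity pushes the probability towards 1;
# it matches sigmoid((sim_no - sim_yes) / tau).
leaning_no = p_no(0.2, 0.5, tau=0.1)
same_as_sigmoid = sigmoid((0.5 - 0.2) / 0.1)
```

Seen this way, a binary cross-entropy over such probabilities is a natural choice of loss, which fits the observation that (3) looks like a cross-entropy term.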
