I am not the author, but I hope my answer can help you.
A1: Because the neighbors of a patch are defined within the same view. A "neighbor" is not easy to define in a cross-view situation (though you could try to define one with some spatial prior).
A2: The local views are fed into the teacher network to contribute to the SelfPatch loss, i.e., the same-view loss mentioned above. This may not be strictly necessary, but it may accelerate convergence.
A3: loc=True means aggregating the neighbors' features, which is enabled in the teacher network. For example, the i-th patch of the teacher network aggregates its neighbors' features; in the student model, we do not aggregate them. Then we maximize the similarity between the student's i-th patch and the teacher's i-th patch (which includes the neighbors' features) to model the patch-level representations.
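To make A3 concrete, here is a rough NumPy sketch of the idea (my own toy illustration, not the authors' code; the mean aggregation and the 3x3 grid/8-neighborhood are simplifying assumptions):

```python
import numpy as np

def cosine_sim(a, b):
    # cosine similarity between matching rows of a and b
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return (a * b).sum(axis=-1)

def neighbors(i, side=3):
    # indices of patch i and its adjacent patches on a side x side grid
    r, c = divmod(i, side)
    out = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < side and 0 <= cc < side:
                out.append(rr * side + cc)
    return out

# toy setup: 9 patches (3x3 grid), 4-dim features
num_patches, dim = 9, 4
rng = np.random.default_rng(0)
student_feats = rng.normal(size=(num_patches, dim))
teacher_feats = rng.normal(size=(num_patches, dim))

# teacher aggregates each patch with its same-view neighbors
# (a plain mean here for simplicity); the student patch stays un-aggregated
teacher_agg = np.stack([teacher_feats[neighbors(i)].mean(axis=0)
                        for i in range(num_patches)])

# maximize similarity between student patch i and teacher's aggregated patch i
loss = -cosine_sim(student_feats, teacher_agg).mean()
print(loss)
```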
I hope the above helps.
Hi @yanjk3, thank you very much for the answers, I really appreciate it. It makes more sense now that I know the authors made a slight modification to the original DINO.
Hi @yanjk3, when I use eval_knn.py from the original DINO to evaluate SelfPatch, it says:
size mismatch for pos_embed: copying a param with shape torch.Size([1, 196, 384]) from checkpoint, the shape in current model is torch.Size([1, 197, 384]).
Do you have any ideas on how I can fix it? Thank you
This is because the SelfPatch checkpoint does not contain the CLS token, so the position embedding's size does not match. In SelfPatch, the CLS token lives in the SelfPatchHead https://github.com/alinlab/SelfPatch/blob/main/selfpatch_vision_transformer.py#L362, so the ViT backbone does not need one.
I think you can fix it by modifying DINO's ViT code https://github.com/facebookresearch/dino/blob/main/vision_transformer.py#L147, changing self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim)) to self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim)).
Then delete the '-1' in lines 175 and 176, and swap lines 202 and 205.
However, as the SelfPatch checkpoint does not contain the CLS token, the ViT model will randomly initialize one, which can lead to a performance drop. Instead of using the CLS token, you can apply global average pooling over the last transformer block's output to get the global feature representation of each image.
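To make the shape mismatch concrete, here is a toy illustration (shapes only; ViT-S/16 on 224x224 images gives 14x14 = 196 patches with 384-dim embeddings):

```python
import numpy as np

num_patches, embed_dim = 196, 384  # 14x14 patches, ViT-S/16

# DINO's ViT reserves one extra position for the CLS token -> 197 positions
pos_embed_dino = np.zeros((1, num_patches + 1, embed_dim))

# the SelfPatch checkpoint stores patch positions only (CLS lives in the head)
pos_embed_ckpt = np.zeros((1, num_patches, embed_dim))

print(pos_embed_dino.shape)  # (1, 197, 384) -- current model
print(pos_embed_ckpt.shape)  # (1, 196, 384) -- checkpoint
```

Dropping the `+ 1` in the model definition makes the model's shape match the checkpoint's.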
Hi @yanjk3, thank you for your answers. Could you demonstrate how I can use global average pooling on the last transformer block?
You should make sure you delete the CLS token in the ViT first. Then insert
x = x.mean(dim=1)
after
x = self.norm(x)
and return that x.
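Putting that together, the tail of the backbone's forward could look roughly like this sketch (a plain NumPy stand-in for the PyTorch code, not the repo's exact implementation; layer_norm here is a bare LayerNorm without learned affine parameters):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # per-token LayerNorm without learned scale/shift, for illustration
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def forward_tail(tokens):
    """tokens: (batch, num_patches, embed_dim), output of the last block.
    With the CLS token removed, pool the patch tokens instead."""
    x = layer_norm(tokens)  # corresponds to x = self.norm(x)
    x = x.mean(axis=1)      # inserted GAP: x = x.mean(dim=1) in PyTorch
    return x                # (batch, embed_dim) global image feature

feats = forward_tail(np.random.default_rng(0).normal(size=(2, 196, 384)))
print(feats.shape)  # (2, 384)
```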
Hi @yanjk3, I already took your advice, but the accuracy is about 3% lower than the original DINO's under the same settings for eval_knn.py. What would you suggest? Also, which gives a more reliable accuracy check: eval_linear.py or eval_knn.py? Thanks
To overcome the performance drop, I recommend replacing DINO's ViT code with SelfPatch's ViT code.
The main difference between them is that SelfPatch uses a cross-attention (CA) block after the ViT blocks to aggregate the patch features and output the CLS token.
If you use this CLS token, the performance may improve.
But unfortunately, the released checkpoint only contains the ViT backbone, so if you want a precise answer, you would need to pre-train the entire model on your own.
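For intuition, that kind of CA pooling can be sketched as single-head cross-attention in which a learnable CLS query attends over the patch tokens (my own simplified sketch with identity q/k/v projections, not the repo's CA block, which uses learned projections and more structure):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(patch_tokens, cls_query, dim):
    """Single-head cross-attention: the CLS query attends over the patch
    tokens, and their attention-weighted sum becomes the CLS output."""
    q = cls_query                             # (1, dim) learnable query
    k = v = patch_tokens                      # (num_patches, dim)
    attn = softmax(q @ k.T / np.sqrt(dim))    # (1, num_patches)
    return attn @ v                           # (1, dim) -> the CLS token

rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 384))          # patch features from the ViT
cls_q = rng.normal(size=(1, 384))             # stand-in learnable CLS query
cls_token = attention_pool(tokens, cls_q, 384)
print(cls_token.shape)  # (1, 384)
```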
Hi @yanjk3, sorry, I don't really get it. What do you mean by copying the SelfPatch ViT to the DINO ViT?
I mean you should replace the DINO ViT model's code with the SelfPatch ViT model's code.
Hi @yanjk3, do you mean adding everything you previously suggested to the DINO ViT model's code (vision_transformer.py)?
Hi @bryanwong17, @yanjk3. I'm having the same problem as you: I cannot run the evaluation using eval_knn.py.
I was wondering whether you found a solution to this problem?
Thanks in advance.