cls_kd's Issues

CIFAR-100 dataset question

First of all, this is excellent work. I want to run this code with CIFAR-100 as the dataset. I found that a CIFAR-100 base config is provided in configs/distillers, but I can't find the corresponding file in the resnet folder. I hope you can clear up my doubts, thanks.

Loss is always NaN

When I use my own dataset for distillation training, the loss is always NaN. What could be the reason?
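A generic first step (my own sketch, not specific to this repo's code) is to let autograd flag the first operation that produces a non-finite value, which usually points at the offending loss term or layer:

import torch

# Raise an error at the first backward op that yields NaN/Inf, with a
# traceback pointing at the forward op that created it.
torch.autograd.set_detect_anomaly(True)

Checking each individual loss term with torch.isfinite before calling backward() also helps narrow down whether the NaN comes from the distillation loss or the original classification loss.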

Question about ViTKD loss code

Hi, thanks for your great work.

May I ask whether this part of the code is consistent with Section 3.1? I am quite confused after comparing it with the paper.

'''ViTKD: Mimicking'''
if self.align2 is not None:
    for i in range(2):
        if i == 0:
            xc = self.align2[i](low_s[:, i]).unsqueeze(1)
        else:
            xc = torch.cat((xc, self.align2[i](low_s[:, i]).unsqueeze(1)), dim=1)
else:
    xc = low_s
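For reference, my reading of what this loop computes, as a standalone sketch (the shapes and dimensions below are my assumptions, not values from the repo): each of the two shallow student layers gets its own linear projection, and the results are stacked back along dim 1.

import torch
import torch.nn as nn

# Assumed shapes: low_s holds two shallow student layers, [B, 2, N, C_s];
# align2 projects each layer to the teacher dimension C_t.
B, N, C_s, C_t = 2, 196, 384, 768
low_s = torch.randn(B, 2, N, C_s)
align2 = nn.ModuleList([nn.Linear(C_s, C_t) for _ in range(2)])

# Per-layer projection, then stack: equivalent to the unsqueeze/cat loop above.
xc = torch.stack([align2[i](low_s[:, i]) for i in range(2)], dim=1)
print(xc.shape)  # torch.Size([2, 2, 196, 768])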

Looking forward to your clarification. Thanks.

Positional Embedding

Thanks for your great work!

I have a few questions about the modification in DeiT_3.

  1. Why do you remove the positional embedding for the cls token?
  2. Do you simply omit the dist token and the positional embeddings for both tokens when transferring weights from DeiT?

Code release

Hi authors,
This looks like a very interesting paper. I was wondering whether there are any specific plans for a code release?

Welcome update to OpenMMLab 2.0


I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 2.0 repo branches:

                    OpenMMLab 1.0 branch    OpenMMLab 2.0 branch
MMEngine            -                       0.x
MMCV                1.x                     2.x
MMDetection         0.x, 1.x, 2.x           3.x
MMAction2           0.x                     1.x
MMClassification    0.x                     1.x
MMSegmentation      0.x                     1.x
MMDetection3D       0.x                     1.x
MMEditing           0.x                     1.x
MMPose              0.x                     1.x
MMDeploy            0.x                     1.x
MMTracking          0.x                     1.x
MMOCR               0.x                     1.x
MMRazor             0.x                     1.x
MMSelfSup           0.x                     1.x
MMRotate            1.x                     1.x
MMYOLO              -                       0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

Some questions about ViTKD

Hi, thanks for sharing your great work!
I have some questions:

  1. Where did you get your DeiT3-Base model? The official model reaches 85.7 top-1 accuracy on ImageNet-1K, while the DeiT3-Base in the paper is 85.48. In addition, the official model's state_dict does not match your defined DeiT3 model's state_dict, so did you modify it?
  2. I used the ViT-Base model from mmcls (85.43 top-1 accuracy) to distill DeiT-Small from scratch, but only got 80.04 top-1 accuracy, which is below the 80.69 baseline. The DeiT3-Base model structure is the same as ViT-Base, so I'm confused about why I got this result.

Hoping for your reply, thank you.

Experimental results reproducibility issues

Thank you for your contribution, I am very interested in your work. I attempted to reproduce your experimental results using the command provided:

python tools/train.py configs/distillers/cifar100/res18_sd_cifar.py

However, I obtained an experimental result of 77.9, which falls short of the baseline in Table 2 of your paper. Have I possibly overlooked any crucial details? I'd appreciate your guidance. Thank you.

Training DeiT + NKD with ViTKD

I appreciate your excellent work and available code!

I saw the configuration file of DeiT + NKD and noticed that it sets "ViTKD = True".

However, I couldn't find any mention of the usage of ViTKD in your ICCV paper.
So I wonder whether the numbers you report in the supplementary material denote the "ViTKD + NKD" setting.

Issues reproducing ResNet-18 + USKD on CIFAR-100

I tried to reproduce the USKD method on CIFAR-100 with the code you provided, but the result I got is only 77.91, far below the 79.90 reported in the paper.
The command I ran was: python tools/train.py configs/distillers/cifar100/res18_sd_cifar.py
I am certain the code was never modified during the run.
If possible, could you provide the model weights and the training log from your original USKD run?
Many thanks!

I don't understand where "Normalized" is reflected or special in the NKD code?

Commonly, the original KD loss normalizes the student and teacher logits to class probabilities before computing the KL divergence, e.g.

ori_kd = F.kl_div(F.log_softmax(logit_s), F.softmax(logit_t)) * (self.t ** 2)

In addition, in the DKD code, the non-target knowledge distillation term also uses softmax to normalize the non-target logit_s and logit_t:

pred_teacher_part2 = F.softmax(logits_t / temperature - 1000.0 * gt_mask, dim=1)
log_pred_student_part2 = F.log_softmax(logits_s / temperature - 1000.0 * gt_mask, dim=1)
nckd_loss = F.kl_div(log_pred_student_part2, pred_teacher_part2) * (temperature ** 2)

So I don't understand where the NKD code differs. Or is what I found in mmcls/models/dis_losses/nkd.py not the corresponding code? Thanks.
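For what it's worth, a quick numeric check (my own snippet, not code from the repo) of the equivalence suspected here: masking the target logit with a large negative value before the softmax, as DKD does, yields the same non-target distribution as taking the full softmax and re-normalizing the non-target probabilities to sum to 1.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C = 4, 10
logits = torch.randn(B, C)
label = torch.randint(0, C, (B,))
gt_mask = F.one_hot(label, C).bool()

# DKD-style: suppress the target logit, then softmax over all classes.
masked = F.softmax(logits - 1000.0 * gt_mask, dim=1)

# Re-normalization style: full softmax, then rescale the non-target
# probabilities so they sum to 1.
p = F.softmax(logits, dim=1)
p_target = p.gather(1, label.unsqueeze(1))
renorm = p * (~gt_mask) / (1.0 - p_target)

print(torch.allclose(masked[~gt_mask], renorm[~gt_mask], atol=1e-6))  # True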

Dataset question

First of all, this is good work. As a beginner, I want to run this code on the CIFAR-100 dataset. Where should I modify things? The documentation only covers usage with the ImageNet dataset (maybe I missed something). I hope you can help me clear up my doubts, thanks.
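For anyone else looking, a hypothetical sketch of pointing an MMClassification-style config at CIFAR-100 (the field names follow standard mmcls 0.x conventions, the paths and batch sizes are placeholders, and the pipelines are omitted; the actual configs in this repo may differ):

dataset_type = 'CIFAR100'
data = dict(
    samples_per_gpu=64,
    workers_per_gpu=2,
    # each split also needs a pipeline= matching 32x32 CIFAR images,
    # not the 224x224 ImageNet transforms
    train=dict(type=dataset_type, data_prefix='data/cifar100'),
    val=dict(type=dataset_type, data_prefix='data/cifar100', test_mode=True),
    test=dict(type=dataset_type, data_prefix='data/cifar100', test_mode=True))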

About the ratio between Lori and Lsoft

Hi, thanks for sharing the great work.
I am trying to train SwinTransformer-Tiny based on tf_NKD. Could you share the ratio between Lori and Lsoft during your SwinTransformer-Tiny training procedure on ImageNet? Is it fixed at 1:1, or does it vary during training?

Questions on masked area

Hi Zhendong,
In ViTKD, knowledge is distilled only from the unmasked area, while MGD uses the full area.
My questions are:

  1. Why does ViTKD distill knowledge only from the unmasked area?
  2. What are the difference and the relationship between the unmasked and masked areas in distillation?

nkd.py lines 42-46

Hello,
In your paper, T_t is the target class probability, but in your code, T_t is always 0.

Implementation of non-target mask

Hi, thanks for your work.

In the implementation of the non-target mask for the NKD loss:

mask = torch.ones_like(logit_s).scatter_(1, label, 1).bool()

Shouldn't it be mask = torch.ones_like(logit_s).scatter_(1, label, 0).bool() instead?
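A quick check of the point (my own snippet, not code from the repo): scattering 1 into a tensor of ones is a no-op, so the mask as written keeps every class, while scattering 0 produces the intended non-target mask.

import torch

logit_s = torch.zeros(2, 5)
label = torch.tensor([[1], [3]])  # scatter_ expects an index of shape [B, 1]

mask_as_written = torch.ones_like(logit_s).scatter_(1, label, 1).bool()
mask_suggested = torch.ones_like(logit_s).scatter_(1, label, 0).bool()

print(mask_as_written)  # all True: the target class is not excluded
print(mask_suggested)   # False at the target class, True elsewhere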
