yzd-v / cls_KD
'NKD and USKD' (ICCV 2023) and 'ViTKD' (CVPRW 2024)
License: Apache License 2.0
When I use my own dataset for distillation training, the loss is always NaN. What could be the reason?
Hi, Thanks for your great work.
May I ask whether this part of the code is consistent with Section 3.1? I am quite confused after comparing it with the paper.
cls_KD/mmcls/distillation/losses/vitkd.py
Lines 58 to 66 in 7d838f6
Looking forward to your clarification. Thanks.
Thanks for your great work!
I have a few questions about the modification in DeiT_3.
Hi authors,
Looks like a very interesting paper, I was wondering if there are any specific plans for code release?
I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.
Here are the OpenMMLab 2.0 repos branches:
| | OpenMMLab 1.0 branch | OpenMMLab 2.0 branch |
|---|---|---|
| MMEngine | | 0.x |
| MMCV | 1.x | 2.x |
| MMDetection | 0.x, 1.x, 2.x | 3.x |
| MMAction2 | 0.x | 1.x |
| MMClassification | 0.x | 1.x |
| MMSegmentation | 0.x | 1.x |
| MMDetection3D | 0.x | 1.x |
| MMEditing | 0.x | 1.x |
| MMPose | 0.x | 1.x |
| MMDeploy | 0.x | 1.x |
| MMTracking | 0.x | 1.x |
| MMOCR | 0.x | 1.x |
| MMRazor | 0.x | 1.x |
| MMSelfSup | 0.x | 1.x |
| MMRotate | 0.x | 1.x |
| MMYOLO | | 0.x |
Attention: please create a new virtual environment for OpenMMLab 2.0.
Hi, thanks for sharing your great work!
I have some questions about your work:
Hoping for your reply. Thank you.
Thank you for your contribution, I am very interested in your work. I attempted to reproduce your experimental results using the command provided:
python tools/train.py configs/distillers/cifar100/res18_sd_cifar.py
However, I obtained an experimental result of 77.9, which falls short of the baseline in Table 2 of your paper. Have I possibly overlooked any crucial details? I'd appreciate your guidance. Thank you.
I appreciate your excellent work and available code!
I saw the configuration file of DeiT + NKD and noticed that "ViTKD = True".
However, I couldn't find any mention of using ViTKD in your ICCV paper.
So I wonder whether the numbers you report in the supplement denote the "ViTKD + NKD" setting.
I tried to reproduce the USKD method on CIFAR-100 with the code you provided, but I only got 77.91, far below the 79.90 reported in the paper.
The command I ran was: python tools/train.py configs/distillers/cifar100/res18_sd_cifar.py.
I am certain that I never modified the code during the run.
If possible, could you share the model weights and the log file from your original USKD training?
Many thanks!
Commonly, the original KD loss normalizes the student and teacher logits to class probabilities before calculating the KL divergence, such as
ori_kd = F.kl_div(F.log_softmax(logit_s / self.t, dim=1), F.softmax(logit_t / self.t, dim=1), reduction='batchmean') * (self.t ** 2)
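For reference, the classical KD loss described above can be written as a self-contained function. This is a minimal sketch of the standard formulation (temperature-softened KL divergence scaled by T²), not code taken from this repository:

```python
import torch
import torch.nn.functional as F

def kd_loss(logit_s: torch.Tensor, logit_t: torch.Tensor, T: float = 4.0) -> torch.Tensor:
    """Classical KD loss: KL divergence between the temperature-softened
    student and teacher distributions, scaled by T**2 to keep gradient
    magnitudes comparable across temperatures."""
    log_p_s = F.log_softmax(logit_s / T, dim=1)
    p_t = F.softmax(logit_t / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction='batchmean') * (T ** 2)
```

With identical student and teacher logits the loss is zero, which is a quick sanity check for any KD implementation.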
In addition, in the DKD code, the non-target knowledge distillation term also uses softmax to normalize the non-target logit_s and logit_t:
pred_teacher_part2 = F.softmax(logits_t / temperature - 1000.0 * gt_mask, dim=1)
log_pred_student_part2 = F.log_softmax(logits_s / temperature - 1000.0 * gt_mask, dim=1)
nckd_loss = F.kl_div(log_pred_student_part2, pred_teacher_part2) * (temperature ** 2)
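The DKD snippet above relies on a gt_mask defined elsewhere in the DKD codebase. Below is a self-contained sketch of that non-target term with the mask construction filled in as commonly done (an illustration of the technique, not the repository's actual code); subtracting a large constant at the target position drives its softmax probability to essentially zero, so only non-target classes are normalized against each other:

```python
import torch
import torch.nn.functional as F

def nckd(logits_s: torch.Tensor, logits_t: torch.Tensor,
         target: torch.Tensor, temperature: float = 4.0) -> torch.Tensor:
    # One-hot mask marking the ground-truth class of each sample.
    gt_mask = torch.zeros_like(logits_s).scatter_(1, target.unsqueeze(1), 1.0)
    # The -1000 offset effectively removes the target class from the softmax,
    # leaving a distribution over non-target classes only.
    pred_teacher_part2 = F.softmax(logits_t / temperature - 1000.0 * gt_mask, dim=1)
    log_pred_student_part2 = F.log_softmax(logits_s / temperature - 1000.0 * gt_mask, dim=1)
    return F.kl_div(log_pred_student_part2, pred_teacher_part2,
                    reduction='batchmean') * (temperature ** 2)
```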
So I don't understand how the NKD code differs. Or is what I found in mmcls/models/dis_losses/nkd.py
not the corresponding code? Thanks.
First of all, this is good work. As a beginner, I want to run this code on the CIFAR-100 dataset. What should I modify? The documentation only covers the ImageNet dataset (maybe I missed something). I hope you can help clear up my doubts. Thanks.
Can you provide the detection code? Thanks!
Hi, thanks for sharing the great work.
I am trying to train SwinTransformerTiny with tf_NKD. I wonder if you could share the ratio between L_ori and L_soft during your SwinTransformerTiny training on ImageNet: is it fixed at 1:1, or does it vary during training?
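For context, a fixed-ratio combination of the two terms could be sketched as follows. The function name and the alpha/beta weights here are hypothetical illustrations, not the repository's actual configuration:

```python
import torch
import torch.nn.functional as F

def total_loss(logits_s: torch.Tensor, logits_t: torch.Tensor, target: torch.Tensor,
               alpha: float = 1.0, beta: float = 1.0, T: float = 1.0) -> torch.Tensor:
    # L_ori: standard cross-entropy with the ground-truth label.
    l_ori = F.cross_entropy(logits_s, target)
    # L_soft: KL divergence to the teacher's softened distribution.
    l_soft = F.kl_div(F.log_softmax(logits_s / T, dim=1),
                      F.softmax(logits_t / T, dim=1),
                      reduction='batchmean') * (T ** 2)
    # alpha:beta fixes the L_ori : L_soft ratio (1:1 by default here).
    return alpha * l_ori + beta * l_soft
```

Setting beta to zero recovers plain cross-entropy training, which is an easy way to check the weighting behaves as expected.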
Hi Zhendong,
In ViTKD, we only distill knowledge from the unmasked area, whereas MGD distills the full area.
My questions are:
Hello, in your paper T_t is the target class probability, but in your code T_t is always 0.
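For reference, a target class probability T_t as described in the paper could be read from teacher outputs like this (a sketch with hypothetical variable names; in a teacher-free setting the code may intentionally substitute a constant instead):

```python
import torch
import torch.nn.functional as F

# Hypothetical inputs for illustration only.
teacher_logits = torch.randn(4, 10)
label = torch.randint(0, 10, (4,))

p_t = F.softmax(teacher_logits, dim=1)   # teacher class probabilities
T_t = p_t.gather(1, label.unsqueeze(1))  # probability assigned to the target class
```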
Hi, thanks for your work.
In the implementation of the non-target mask for the NKD loss:
mask = torch.ones_like(logit_s).scatter_(1, label, 1).bool()
Shouldn't it be mask = torch.ones_like(logit_s).scatter_(1, label, 0).bool() instead?
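The difference between the two calls can be checked directly. This is a minimal sketch: torch.Tensor.scatter_ with value 1 on an all-ones tensor is a no-op, while value 0 clears the target positions, which is what a non-target mask needs:

```python
import torch

logit_s = torch.zeros(2, 5)
label = torch.tensor([[1], [3]])  # target class index per sample

# scatter_ with value 1 on an all-ones tensor changes nothing: mask stays all True.
mask_all = torch.ones_like(logit_s).scatter_(1, label, 1).bool()

# scatter_ with value 0 zeroes the target position, so the mask is True
# only at non-target classes.
mask_nt = torch.ones_like(logit_s).scatter_(1, label, 0).bool()
```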