cls_kd's Issues

CIFAR-100 dataset question

First of all, this is excellent work. I want to run this code with CIFAR-100 as the dataset. I found that a CIFAR-100 base config is provided in configs/distillers, but I can't find the corresponding file in the resnet folder. I hope you can clear up my doubts, thanks.

Loss is always NaN

When I use my own dataset for distillation training, the loss is always NaN. What could be the reason?
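A generic first step (my own sketch, not specific to this repo's code) is to let autograd flag the first operation that produces a non-finite value, which usually points at the offending loss term or layer:

import torch

# Raise an error at the first backward op that yields NaN/Inf, with a
# traceback pointing at the forward op that created it.
torch.autograd.set_detect_anomaly(True)

Checking each individual loss term with torch.isfinite before calling backward() also helps narrow down whether the NaN comes from the distillation loss or the original classification loss.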

Question about ViTKD loss code

Hi, thanks for your great work.

May I ask whether this part of the code is consistent with Section 3.1? I am quite confused after comparing it with the paper.

'''ViTKD: Mimicking'''
if self.align2 is not None:
    for i in range(2):
        if i == 0:
            xc = self.align2[i](low_s[:, i]).unsqueeze(1)
        else:
            xc = torch.cat((xc, self.align2[i](low_s[:, i]).unsqueeze(1)), dim=1)
else:
    xc = low_s
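For reference, my reading of what this loop computes, as a standalone sketch (the shapes and dimensions below are my assumptions, not values from the repo): each of the two shallow student layers gets its own linear projection, and the results are stacked back along dim 1.

import torch
import torch.nn as nn

# Assumed shapes: low_s holds two shallow student layers, [B, 2, N, C_s];
# align2 projects each layer to the teacher dimension C_t.
B, N, C_s, C_t = 2, 196, 384, 768
low_s = torch.randn(B, 2, N, C_s)
align2 = nn.ModuleList([nn.Linear(C_s, C_t) for _ in range(2)])

# Per-layer projection, then stack: equivalent to the unsqueeze/cat loop above.
xc = torch.stack([align2[i](low_s[:, i]) for i in range(2)], dim=1)
print(xc.shape)  # torch.Size([2, 2, 196, 768])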

Looking forward to your clarification. Thanks.

Positional Embedding

Thanks for your great work!

I have a few questions about the modification in DeiT_3.

  1. Why do you remove the positional embedding for the cls token?
  2. Do you simply omit the dist token and the positional embeddings for both tokens when transferring weights from DeiT?

Code release

Hi authors,
This looks like a very interesting paper. I was wondering whether there are any specific plans for a code release?

Welcome update to OpenMMLab 2.0


I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 2.0 repo branches:

                    OpenMMLab 1.0 branch    OpenMMLab 2.0 branch
MMEngine            -                       0.x
MMCV                1.x                     2.x
MMDetection         0.x, 1.x, 2.x           3.x
MMAction2           0.x                     1.x
MMClassification    0.x                     1.x
MMSegmentation      0.x                     1.x
MMDetection3D       0.x                     1.x
MMEditing           0.x                     1.x
MMPose              0.x                     1.x
MMDeploy            0.x                     1.x
MMTracking          0.x                     1.x
MMOCR               0.x                     1.x
MMRazor             0.x                     1.x
MMSelfSup           0.x                     1.x
MMRotate            1.x                     1.x
MMYOLO              -                       0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

Some questions about ViTKD

Hi, thanks for sharing your great work!
I have some questions:

  1. Where did you get your DeiT3-Base model? The official model reaches 85.7 top-1 accuracy on ImageNet-1K, while the DeiT3-Base in the paper is 85.48. In addition, the official model's state_dict does not match your defined DeiT3 model's state_dict, so did you modify it?
  2. I used the ViT-Base model from mmcls (85.43 top-1 accuracy) to distill DeiT-Small from scratch, but only got 80.04 top-1 accuracy, which is below the 80.69 baseline. The DeiT3-Base model structure is the same as ViT-Base, so I'm confused about why I got this result.

Hoping for your reply, thank you.

Experimental results reproducibility issues

Thank you for your contribution, I am very interested in your work. I attempted to reproduce your experimental results using the command provided:

python tools/train.py configs/distillers/cifar100/res18_sd_cifar.py

However, I obtained an experimental result of 77.9, which falls short of the baseline in Table 2 of your paper. Have I possibly overlooked any crucial details? I'd appreciate your guidance. Thank you.

Training DeiT + NKD with ViTKD

I appreciate your excellent work and available code!

I saw the configuration file of DeiT + NKD and noticed that it sets "ViTKD = True".

However, I couldn't find any mention of the usage of ViTKD in your ICCV paper.
So I wonder whether the numbers you report in the supplementary material denote the "ViTKD + NKD" setting.

Issues reproducing ResNet-18 + USKD on CIFAR-100

I tried to reproduce the USKD method on CIFAR-100 with the code you provided, but the result I got is only 77.91, far below the 79.90 reported in the paper.
The command I ran was: python tools/train.py configs/distillers/cifar100/res18_sd_cifar.py
I am certain the code was never modified during the run.
If possible, could you provide the model weights and the training log from your original USKD run?
Many thanks!

I don't understand where "Normalized" is reflected or special in the NKD code?

Commonly, the original KD loss normalizes the student and teacher logits to class probabilities before computing the KL divergence, e.g.

ori_kd = F.kl_div(F.log_softmax(logit_s), F.softmax(logit_t)) * (self.t ** 2)

In addition, in the DKD code, the non-target knowledge distillation term also uses softmax to normalize the non-target logit_s and logit_t:

pred_teacher_part2 = F.softmax(logits_t / temperature - 1000.0 * gt_mask, dim=1)
log_pred_student_part2 = F.log_softmax(logits_s / temperature - 1000.0 * gt_mask, dim=1)
nckd_loss = F.kl_div(log_pred_student_part2, pred_teacher_part2) * (temperature ** 2)

So I don't understand where the NKD code differs. Or is what I found in mmcls/models/dis_losses/nkd.py not the corresponding code? Thanks.
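For what it's worth, a quick numeric check (my own snippet, not code from the repo) of the equivalence suspected here: masking the target logit with a large negative value before the softmax, as DKD does, yields the same non-target distribution as taking the full softmax and re-normalizing the non-target probabilities to sum to 1.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C = 4, 10
logits = torch.randn(B, C)
label = torch.randint(0, C, (B,))
gt_mask = F.one_hot(label, C).bool()

# DKD-style: suppress the target logit, then softmax over all classes.
masked = F.softmax(logits - 1000.0 * gt_mask, dim=1)

# Re-normalization style: full softmax, then rescale the non-target
# probabilities so they sum to 1.
p = F.softmax(logits, dim=1)
p_target = p.gather(1, label.unsqueeze(1))
renorm = p * (~gt_mask) / (1.0 - p_target)

print(torch.allclose(masked[~gt_mask], renorm[~gt_mask], atol=1e-6))  # True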

Dataset question

First of all, this is good work. As a beginner, I want to run this code on the CIFAR-100 dataset. Where should I modify things? The documentation only covers usage with the ImageNet dataset (maybe I missed something). I hope you can help me clear up my doubts, thanks.
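For anyone else looking, a hypothetical sketch of pointing an MMClassification-style config at CIFAR-100 (the field names follow standard mmcls 0.x conventions, the paths and batch sizes are placeholders, and the pipelines are omitted; the actual configs in this repo may differ):

dataset_type = 'CIFAR100'
data = dict(
    samples_per_gpu=64,
    workers_per_gpu=2,
    # each split also needs a pipeline= matching 32x32 CIFAR images,
    # not the 224x224 ImageNet transforms
    train=dict(type=dataset_type, data_prefix='data/cifar100'),
    val=dict(type=dataset_type, data_prefix='data/cifar100', test_mode=True),
    test=dict(type=dataset_type, data_prefix='data/cifar100', test_mode=True))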

About the ratio between Lori and Lsoft

Hi, thanks for sharing the great work.
I am trying to train SwinTransformer-Tiny based on tf_NKD. Could you share the ratio between Lori and Lsoft during your SwinTransformer-Tiny training procedure on ImageNet? Is it fixed at 1:1, or does it vary during training?

Questions on masked area

Hi Zhendong,
In ViTKD, knowledge is distilled only from the unmasked area, while MGD uses the full area.
My questions are:

  1. Why does ViTKD distill knowledge only from the unmasked area?
  2. What are the difference and the relationship between the unmasked and masked areas in distillation?

nkd.py lines 42-46

Hello,
In your paper, T_t is the target class probability, but in your code, T_t is always 0.

Implementation of non-target mask

Hi, thanks for your work.

In the implementation of the non-target mask for the NKD loss:

mask = torch.ones_like(logit_s).scatter_(1, label, 1).bool()

Shouldn't it be mask = torch.ones_like(logit_s).scatter_(1, label, 0).bool() instead?
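A quick check of the point (my own snippet, not code from the repo): scattering 1 into a tensor of ones is a no-op, so the mask as written keeps every class, while scattering 0 produces the intended non-target mask.

import torch

logit_s = torch.zeros(2, 5)
label = torch.tensor([[1], [3]])  # scatter_ expects an index of shape [B, 1]

mask_as_written = torch.ones_like(logit_s).scatter_(1, label, 1).bool()
mask_suggested = torch.ones_like(logit_s).scatter_(1, label, 0).bool()

print(mask_as_written)  # all True: the target class is not excluded
print(mask_suggested)   # False at the target class, True elsewhere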
