
Comments (17)

xht033 avatar xht033 commented on July 21, 2024

Me too. There is a huge gap between my experiment and the author's reported results.

from knowledge-distillation-pytorch.

ChuangbinC avatar ChuangbinC commented on July 21, 2024

@MrLinNing Did you get experimental results close to the author's?


michaelklachko avatar michaelklachko commented on July 21, 2024

I just ran an experiment on CIFAR-10, with the student being a simple LeNet-5-like network (64C - MP - 128C - MP - 400FC - 10) and the teacher being a deeper version (128C - 128C - MP - 128C - 128C - MP - 128C - 128C - 512FC - 10).

The teacher gets to ~93% accuracy, the student without KL is ~86.5%. With KL, the student gets to 87.5% consistently.

I didn't use this repo's code; I only copied the KL loss function into my own code.

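For reference, the loss being discussed throughout this thread is presumably the standard Hinton-style knowledge-distillation loss: a weighted sum of a temperature-softened KL term against the teacher and a hard cross-entropy term against the labels. A minimal sketch (the function name and the default values of `T` and `alpha` here are illustrative, not the repo's exact code):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style KD loss: alpha * soft KL term + (1 - alpha) * hard CE term."""
    # Soft targets: KL divergence between temperature-softened distributions.
    # Note F.kl_div expects log-probabilities as its first argument.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale by T^2 so soft-target gradients keep their magnitude
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

With `alpha=0.0` this reduces to plain cross-entropy training, which makes it easy to A/B against a no-teacher baseline.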

xiaowenmasfather avatar xiaowenmasfather commented on July 21, 2024

I found a ~10% gap too: 84%, nowhere near the expected 94.788%. Student net: ResNet-18, teacher net: ResNeXt-29. Parameters are the same as @peterliht's original settings.


wnma3mz avatar wnma3mz commented on July 21, 2024

I also got similar results. train_set: 84.914%, test_set: 83.89%.
The teacher model is taken from the author's pretrained_teacher_models.zip\pretrained_teacher_models\base_resnext29\.
Testing that teacher model gives: train_set: 100%, test_set: 96.23%.
All other parameters are consistent with the author's.


haitongli avatar haitongli commented on July 21, 2024

Looking through another issue thread about the data loader, the accuracy inconsistency might be due to the way the student and teacher models received their data when shuffling was used.

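To make the shuffling pitfall concrete: if the teacher's outputs are cached from one pass over a shuffled DataLoader and the student then trains on a separately shuffled pass, the cached logits no longer line up with the batches. A sketch of the safe pattern, assuming a standard training loop (the function and argument names are illustrative): run the teacher on the very same batch the student sees, inside one loop.

```python
import torch

def train_one_epoch(student, teacher, loader, optimizer, loss_fn):
    """Distillation epoch where teacher and student consume identical batches.

    loss_fn(student_logits, teacher_logits, labels) is any KD-style loss.
    """
    teacher.eval()
    student.train()
    for inputs, labels in loader:  # one shuffled order, shared by both models
        with torch.no_grad():      # the teacher is frozen; no gradients needed
            teacher_logits = teacher(inputs)
        student_logits = student(inputs)
        loss = loss_fn(student_logits, teacher_logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The extra teacher forward pass per batch costs some time, but it removes any dependence on DataLoader shuffle order being reproducible across passes.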

haitongli avatar haitongli commented on July 21, 2024

Has anyone run and tested this with PyTorch 0.3?


michaelklachko avatar michaelklachko commented on July 21, 2024

@peterliht Why would you want to use PyTorch 0.3? The current stable version is 1.2.

@wnma3mz @xiaowenmasfather Resnet-18 should get to 94.0% without any teachers. If that's not the case, then you're doing something wrong.


haitongli avatar haitongli commented on July 21, 2024

> @peterliht why would you want to use Pytorch 0.3? The current stable version is 1.2.
>
> @wnma3mz @xiaowenmasfather Resnet-18 should get to 94.0% without any teachers. If that's not the case, then you're doing something wrong.

I understand there is a newer (and more stable) version of PyTorch available. I just wanted to find out whether people have seen different results across PyTorch versions. When I first created this repo two years ago, v0.3 was used, as specified in requirements.txt. I want to get a better understanding of the issues that have prevented people from reproducing the results, and see whether fixes can be made against the most recent stable PyTorch version.


wnma3mz avatar wnma3mz commented on July 21, 2024

Hi @michaelklachko,
You're right: ResNet-18 with the author's hyperparameters can indeed reach 94%. So my question is, where is the problem? Has anyone else encountered the same problem and could help me?

@peterliht
Thanks for your suggestion; I will try it on version 0.3 later.


haitongli avatar haitongli commented on July 21, 2024

@wnma3mz Other threads might also be worth looking into: #9 and #4.


wnma3mz avatar wnma3mz commented on July 21, 2024

@peterliht
Thank you for your prompt reply. I have already seen that issue, and I have changed the code according to that comment to ensure the correctness of the distillation.


forjiuzhou avatar forjiuzhou commented on July 21, 2024

> @wnma3mz another thread might also be worth looking into @ #9 and also @ #4

I compared the argmax of the teacher's output with the labels, and the two disagree with each other. I have submitted a pull request to fix this issue.

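That comparison is a useful sanity check for anyone hitting the gap. A hypothetical helper (not the repo's code) that measures how often the teacher's argmax agrees with the labels: if a ~96%-accurate teacher scores near chance here, the teacher outputs are misaligned with the batches, e.g. stale cached logits from a differently shuffled loader.

```python
import torch

@torch.no_grad()
def teacher_agreement(teacher, loader):
    """Fraction of samples where the teacher's argmax matches the label."""
    teacher.eval()
    correct = total = 0
    for inputs, labels in loader:
        preds = teacher(inputs).argmax(dim=1)  # teacher's predicted class
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total
```

Running this before any distillation training takes a few seconds and immediately distinguishes a data-alignment bug from a hyperparameter problem.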

conditionWang avatar conditionWang commented on July 21, 2024

I ran into the same accuracy gap. I tried reducing the learning rate and observed an improvement, making my results close to @peterliht's. You can try changing the learning rate and running the code again.


tianli avatar tianli commented on July 21, 2024

> @wnma3mz another thread might also be worth looking into @ #9 and also @ #4
>
> I compare the max index of teacher's output with label, these two disagree with each other. I have commit a request to fix this issue.

Your pull request (#17) fixes the problem and I am getting much improved results. I wonder why it has not been merged into master yet!


tianli avatar tianli commented on July 21, 2024

FYI, with pull request #17, I was able to get 95.19% accuracy on ResNet-18 with the ResNeXt-29 teacher.


haitongli avatar haitongli commented on July 21, 2024

Thanks for all the discussions, and thanks to @tianli for the reminder about the pull request. I haven't been able to keep track of this repo for a while. #17 has now been merged.

