I suggest that both training loss functions, without KD and with KD, should add a softmax func


An issue on loss function about knowledge-distillation-pytorch HOT 4 OPEN

lhyfst commented on June 23, 2024

An issue on loss function

from knowledge-distillation-pytorch.

Comments (4)

erichhhhho commented on June 23, 2024

I was wondering whether multiplying by T squared is really helpful. If T=20, the soft loss will dominate the total loss. Also, there is no need to add an extra softmax for the hard target, as it is already built into nn.functional.cross_entropy. @lhyfst
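To make the point about the hard-target term concrete, here is a minimal, library-free sketch in plain Python (not the repository's code). It assumes the usual definition of cross-entropy on raw logits, i.e. log-softmax followed by the negative log-likelihood of the target class, which is what nn.functional.cross_entropy computes. The helper names and the logits are made up for illustration.

```python
import math

def log_softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def cross_entropy(logits, target):
    # Cross-entropy on raw logits: the softmax is already part of the loss.
    return -log_softmax(logits)[target]

logits = [2.0, 0.5, -1.0]  # hypothetical student outputs (raw logits)
target = 0                 # hypothetical ground-truth class index

# Intended usage: pass raw logits straight to the loss.
loss_once = cross_entropy(logits, target)

# Applying softmax by hand first and then calling the same loss normalizes
# twice, which flattens the distribution and changes the loss value.
probs = [math.exp(v) for v in log_softmax(logits)]
loss_twice = cross_entropy(probs, target)

print(f"loss on logits:        {loss_once:.4f}")
print(f"loss on softmax(logits): {loss_twice:.4f}")
```

With these made-up numbers the doubly normalized loss comes out larger than the loss on raw logits, even though the prediction has not changed, which is why the extra softmax is unnecessary for the hard-target term.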


haitongli commented on June 23, 2024

As @erichhhhho pointed out, there is indeed no need to manually add an extra softmax. From the reference paper, it looks like T^2 is only required when using BOTH hard and soft targets.
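A rough numeric check of the T^2 argument, as a plain-Python sketch with made-up logits (not the repository's code). For the soft term KL(p_teacher || q_student) computed at temperature T, the gradient with respect to each student logit is (q_i - p_i) / T. Since the gap q_i - p_i itself shrinks roughly like 1/T at high temperatures, the gradient shrinks roughly like 1/T^2, and multiplying by T^2 brings it back to a scale comparable to the T=1 case.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax, stabilized by subtracting the max.
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_loss_grad(student, teacher, T):
    # Gradient of KL(p_teacher || q_student) at temperature T with respect
    # to the student logits: (q_i - p_i) / T.
    q = softmax(student, T)
    p = softmax(teacher, T)
    return [(qi - pi) / T for qi, pi in zip(q, p)]

def l1_norm(v):
    return sum(abs(x) for x in v)

student = [1.0, 0.0, -1.0]  # hypothetical student logits
teacher = [3.0, 1.0, -2.0]  # hypothetical teacher logits

for T in (1.0, 5.0, 20.0):
    g = l1_norm(soft_loss_grad(student, teacher, T))
    print(f"T={T:>4}: |grad|={g:.5f}  T^2*|grad|={T * T * g:.5f}")
```

The sketch only looks at the gradient scale of the soft term. Whether the rescaled soft term ends up dominating the hard loss still depends on the weights given to the two terms.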


lhyfst commented on June 23, 2024

Thank you, everybody! So, why is the first part of the KD loss function in distill_mnist.py multiplied by 2?
https://github.com/peterliht/knowledge-distillation-pytorch/blob/e4c40132fed5a45e39a6ef7a77b15e5d389186f8/mnist/distill_mnist.py#L96-L97


mashrurmorshed commented on June 23, 2024

> Thank you, everybody! So, why does the first part of the KD loss function in distill_mnist.py multiply 2?

As per distiller, KD_Loss is effectively the following equation:

α * kl_divergence + β * cross_entropy

Hinton et al. (2015) originally used a weighted average, i.e. α = 1 - β, but this is not strictly necessary: α and β can be arbitrary and don't need to sum to 1. In this particular MNIST example, the relationship is α = 2 * (1 - β); maybe they were experimenting with a stronger reliance on kl_div.
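For illustration, the two weightings can be compared with a small plain-Python sketch of α * kl_divergence + β * cross_entropy. This is not the code from distill_mnist.py: the function names, logits, β, and T are made up, and no T^2 factor is applied so that only the effect of the weights is visible.

```python
import math

def log_softmax(logits, T=1.0):
    # Temperature-scaled log-softmax, stabilized by subtracting the max.
    scaled = [z / T for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]

def kl_divergence(student, teacher, T):
    # KL(p_teacher || q_student), both softened with temperature T.
    log_q = log_softmax(student, T)
    log_p = log_softmax(teacher, T)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def cross_entropy(student, target):
    # Hard-target term on raw student logits at T = 1.
    return -log_softmax(student)[target]

def kd_loss(student, teacher, target, alpha, beta, T):
    return alpha * kl_divergence(student, teacher, T) + beta * cross_entropy(student, target)

student = [1.0, 0.0, -1.0]  # hypothetical student logits
teacher = [3.0, 1.0, -2.0]  # hypothetical teacher logits
target, beta, T = 0, 0.5, 4.0  # hypothetical label, weight, temperature

# Weighted average: alpha = 1 - beta.
loss_averaged = kd_loss(student, teacher, target, 1 - beta, beta, T)
# Doubled soft weight: alpha = 2 * (1 - beta), so the weights no longer sum to 1.
loss_doubled = kd_loss(student, teacher, target, 2 * (1 - beta), beta, T)

print(f"alpha = 1 - beta:       {loss_averaged:.4f}")
print(f"alpha = 2 * (1 - beta): {loss_doubled:.4f}")
```

The two losses differ by exactly (1 - β) * kl_divergence, so doubling α only shifts the balance toward the soft term and leaves the hard-target term untouched.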
