Comments (11)

HisiFish avatar HisiFish commented on July 21, 2024 2

I think a random seed can only make the behavior the same across different runs, but it cannot make the behavior the same across different epochs within a single run.

erichhhhho avatar erichhhhho commented on July 21, 2024 1

@HisiFish Have you solved this problem? Is it possible to compute the teacher output from the same input?
--Updated--
Actually, it helps increase the accuracy by 0.10-0.20%.

haitongli avatar haitongli commented on July 21, 2024

Can you clarify the question a bit more? What is the specific concern?
The student model is trained in the same way as the teacher model. For one epoch, the training batches are used to compute the KD loss to train the student. Then in another epoch, although the dataloader is shuffled, the KD loss should still be correct given the new batches.

HisiFish avatar HisiFish commented on July 21, 2024

For example, suppose we have a dataset of 20 [image, label] pairs and we set the batch size to 4, so there are 5 iterations per epoch. Label the original samples with indices 0~19.

In the code, we first fetch the teacher outputs over one epoch; say the shuffled batch indices are [[0,5,6,8],[7,9,2,4],[...],[...],[...]].

Then in KD training, in another epoch, we need to calculate the KD loss from the student outputs, the teacher outputs, and the labels. In this epoch the indices may be shuffled to [[1,3,6,9],[10,2,8,7],[...],[...],[...]]. In the code (train.py:215), we get output_teacher_batch by i, the new iteration index. So when i is 0, the teacher outputs come from samples [0,5,6,8] while the student outputs come from samples [1,3,6,9].

I don't know whether my understanding is incorrect. Thanks!
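
To make the concern concrete, here is a minimal sketch (toy data and hypothetical names, not the repository's code) of how precomputing per-batch teacher outputs and then indexing them by iteration count in a reshuffled pass misaligns teacher and student batches:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 20 samples, batch size 4, shuffling enabled.
data = torch.arange(20).float().unsqueeze(1)
labels = torch.arange(20)
loader = DataLoader(TensorDataset(data, labels), batch_size=4, shuffle=True)

# Pass 1: precompute "teacher outputs" per batch index (here we just record sample indices).
teacher_batches = [lbl.tolist() for _, lbl in loader]

# Pass 2: iterating again reshuffles the data, so batch i no longer holds the same samples.
for i, (_, lbl) in enumerate(loader):
    print(i, teacher_batches[i], lbl.tolist())   # the two index lists generally differ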

haitongli avatar haitongli commented on July 21, 2024

Sorry, I did not fully understand. If you have time and are interested, could you run a test based on your understanding? Right now the KD-trained accuracies are consistently higher than those of the natively trained models, though only slightly. If your modification works better or makes more sense, feel free to open a pull request. Thanks in advance!

HisiFish avatar HisiFish commented on July 21, 2024

OK, I'll do that once I reach a conclusion. Thanks.

haitongli avatar haitongli commented on July 21, 2024

Wait, I think I get what you were saying. Basically, we need to verify that during training of the student model, the batch sequence in the train dataloader at each epoch stays the same as the one used when the teacher outputs were computed. To that end, I would think PyTorch can take care of that when a random seed is specified for reproducibility?

HisiFish avatar HisiFish commented on July 21, 2024

Maybe not.
It's easy to verify. The following is a simple example:

dataloader = ...   # a DataLoader with shuffle=True
for epoch in range(10):
    for i, (img_batch, label_batch) in enumerate(dataloader):
        if i == 0:
            print(label_batch)   # labels of the first batch in this epoch

By comparing the first batch across the 10 epochs, we can see the result: the batches are not the same from epoch to epoch.

akaniklaus avatar akaniklaus commented on July 21, 2024

Do you know what happens when you don't use enumerate but get batches via next(iter(data_loader))?
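
For what it's worth, a small sketch (not part of this repo's code) of what that pattern does, assuming a standard DataLoader with shuffle=True: each call to iter(data_loader) builds a brand-new iterator, so next(iter(data_loader)) returns the first batch of a freshly shuffled pass rather than advancing through an epoch.

import torch
from torch.utils.data import DataLoader, TensorDataset

loader = DataLoader(TensorDataset(torch.arange(20)), batch_size=4, shuffle=True)

# Two calls create two independent iterators: each prints the first batch of a new shuffle.
print(next(iter(loader)))
print(next(iter(loader)))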

luhaifeng19947 avatar luhaifeng19947 commented on July 21, 2024

@HisiFish Yes, you are right.
I put the teacher model and the student model together in the same loop, and it works.
E.g.:

for img_batch, label_batch in dataloader:
    y_student = f_student(img_batch)
    with torch.no_grad():                # no gradients needed for the teacher
        y_teacher = f_teacher(img_batch)

Refer to: https://github.com/szagoruyko/attention-transfer/blob/master/cifar.py
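
Expanding on the snippet above, here is a hedged sketch of a full KD training step on the shared batch. The models (f_student, f_teacher), the dataloader, and the optimizer are assumed to be defined elsewhere; T and alpha are placeholder hyperparameters, and the loss shown is the standard Hinton-style KD loss rather than necessarily this repository's exact implementation.

import torch
import torch.nn.functional as F

T, alpha = 4.0, 0.9   # hypothetical temperature and distillation weight

for img_batch, label_batch in dataloader:
    y_student = f_student(img_batch)
    with torch.no_grad():
        y_teacher = f_teacher(img_batch)     # teacher sees exactly the same batch

    # Soft-target KL term (scaled by T^2) plus hard-label cross-entropy.
    soft_loss = F.kl_div(F.log_softmax(y_student / T, dim=1),
                         F.softmax(y_teacher / T, dim=1),
                         reduction='batchmean') * (T * T)
    hard_loss = F.cross_entropy(y_student, label_batch)
    loss = alpha * soft_loss + (1.0 - alpha) * hard_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()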

haitongli avatar haitongli commented on July 21, 2024

Hi @luhaifeng19947, I haven't followed the discussions here for a while. Are you interested in initiating a pull request?
