Comments (8)
Hi, I went through my code and logs for the STL-10 experiments and found two things:
- In the paper I stated the patch size used for STL-10 to be 24 for the no-data-augmentation case and 32 with data augmentation. Looking at my logs, it appears the values actually used were 48 and 60, respectively.
- It appears that I accidentally used the normalization parameters from CIFAR-10 for normalizing STL-10 instead of calculating the new mean and std. While the CIFAR-10 values are pretty close to what you used, the difference could still have caused a non-negligible change in model performance, especially considering the small training set size of STL-10. That being said, these test results should not be compared to other STL-10 results that normalize the dataset properly. You could try substituting the CIFAR-10 normalization values into your pipeline to see if that increases the score at all, since this may be what is causing the difference.
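For reference, the "proper" fix described above is just to compute STL-10's own per-channel statistics and normalize with those. A minimal numpy sketch of that computation (the image data here is random stand-in data, not STL-10 itself):

```python
import numpy as np

def channel_stats(images):
    """Per-channel mean/std of a uint8 image stack (N, H, W, 3), scaled to [0, 1]."""
    x = images.astype(np.float64) / 255.0
    return x.mean(axis=(0, 1, 2)), x.std(axis=(0, 1, 2))

def normalize(images, mean, std):
    """Standard per-channel normalization: (x - mean) / std."""
    return (images.astype(np.float64) / 255.0 - mean) / std

# Hypothetical stand-in for the STL-10 training images:
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(8, 96, 96, 3), dtype=np.uint8)

mean, std = channel_stats(train)
normed = normalize(train, mean, std)
```

Computing the stats on the training split and reusing them for the test split is the usual convention; mixing in another dataset's values (as happened here with CIFAR-10) shifts every input slightly.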
Let me know if those changes allow you to reproduce the results, otherwise we can look into it further.
from cutout.
I found the reason: I was using FP16 (even though the BN layers are float32) rather than FP32. When I use float32, the final error rate is about 12%, but I have no idea why the gap is so big. I also use FP16 in my CIFAR-10 experiment, and there the result matches what you posted.
Okay, cool.
I've tried messing around with FP16 in PyTorch before, but it seems very finicky when combined with batch norm. Strange that it works for CIFAR-10 but not STL-10.
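For what it's worth, here's a toy numpy sketch (not the cutout code) of the kind of FP16 failure mode that can open up a gap like this: once an accumulated value is large relative to the increments, each increment falls below half an FP16 ulp and is rounded away entirely. This is why mixed-precision setups typically keep BN statistics and gradient accumulation in FP32.

```python
import numpy as np

update = np.float16(1e-4)

# Naive FP16 accumulation: stalls once acc16 is large enough that
# `update` is smaller than half the spacing between adjacent FP16 values.
acc16 = np.float16(0.0)
for _ in range(10_000):
    acc16 = np.float16(acc16 + update)

# The same sum carried out in FP32 stays close to the true value (~1.0).
acc32 = np.float32(update) * 10_000

print(float(acc16), float(acc32))
```

Whether this particular mechanism explains the STL-10 result is speculation on my part, but it's the standard reason naive FP16 training diverges from FP32.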
Hi @TDeVries and @xyzacademic, I'm trying to reproduce the STL-10 result with no data augmentation and no cutout. I adopted the settings described in the paper, with the changes mentioned above, but I cannot get the 23.48% ± 0.68% error reported in the paper. Instead, I get errors of about 30%: the test error gets stuck around 30% (within 1%) after epoch 400, while the training error stays below 0.1% and the cross-entropy below 0.01. Could you help me reproduce this, or possibly upload your code? More specifically, I set the parameters as follows:
image size = 48, normalization mean = [0.44671097, 0.4398105, 0.4066468], std = [0.2603405, 0.25657743, 0.27126738], Wide ResNet depth = 16, widening factor = 8, dropRate = 0.3, initial learning rate = 0.1, momentum = 0.9, weight_decay = 5e-4, number of epochs = 1000, data type = FP32, learning rate scheduler = MultiStepLR(cnn_optimizer, milestones=[300, 400, 600, 800], gamma=0.2)
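Just to spell out the schedule those settings imply, here is a plain-Python sketch of MultiStepLR's rule (assuming scheduler.step() is called once per epoch):

```python
def multistep_lr(epoch, base_lr=0.1, milestones=(300, 400, 600, 800), gamma=0.2):
    """LR under MultiStepLR: base_lr * gamma ** (number of milestones passed)."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

# LR at a few representative epochs of the 1000-epoch run.
schedule = {e: multistep_lr(e) for e in (0, 299, 300, 400, 600, 800, 999)}
```

So the run spends epochs 0-299 at 0.1, then drops by 5x at each milestone, finishing at 0.1 * 0.2^4 = 1.6e-4.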
Could it be the image size? You said you are using 48x48 resolution. For the results in the paper I used the original image size of 96x96.
@TDeVries Thanks! I'll give it a try. Just to confirm: with the image size changed (from 32 to 96 compared to the published code), do you keep nChannels and the avg_pool kernel size unchanged but increase the input dimension of the fully connected layer by a factor of 3*3 = 9? Or do you increase the avg_pool kernel size from 8 to 24?
nChannels is unchanged. I think the main differences are that I changed the stride in block1 from 1 to 2, and the avg_pool kernel size from 8 to 12.
self.block1 = NetworkBlock(n, nChannels[0], nChannels[1], block, 2, dropRate)
...
out = F.avg_pool2d(out, 12)
Hopefully that should give the correct output size.
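To sanity-check that arithmetic, a quick sketch of the feature-map sizes, assuming the usual WRN layout (a stem conv plus three blocks of 3x3 convs with padding 1, so only the per-block stride changes the spatial size):

```python
def wrn_spatial_sizes(input_size, block_strides):
    """Side length of the feature map after each WRN block; 3x3 convs
    with padding 1 preserve size, so only the strides shrink the map."""
    sizes = []
    s = input_size
    for stride in block_strides:
        s //= stride
        sizes.append(s)
    return sizes

# Published code: 32x32 input, block strides (1, 2, 2) -> avg_pool kernel 8
print(wrn_spatial_sizes(32, (1, 2, 2)))  # [32, 16, 8]
# STL-10: 96x96 input, block1 stride changed to 2 -> avg_pool kernel 12
print(wrn_spatial_sizes(96, (2, 2, 2)))  # [48, 24, 12]
```

In both cases the average pool reduces the final map to 1x1, so the fully connected layer's input dimension (nChannels[3]) is unchanged.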
Another thing you could try to improve results is to increase the dropout probability from 0.3 to 0.5. I'm not sure how much of an effect it has though.
Thanks for your advice. FYI: I am actually trying to test my hyperparameter optimization algorithm on this problem :)