Comments (8)
Hi, I went through my code and logs for the STL-10 experiments and found two things:
- In the paper I stated the patch size used for STL-10 to be 24 for the no-data-augmentation case and 32 with data augmentation. Looking at my logs, it appears the values actually used were 48 and 60, respectively.
- It appears that I accidentally used the normalization parameters from CIFAR-10 for normalizing STL-10 instead of calculating the new mean and std. While the CIFAR-10 values are pretty close to what you used, the difference could still have caused a non-negligible change in model performance, especially considering the small training set size of STL-10. That being said, these test results should not be compared to other STL-10 results that normalize the dataset properly. You could try substituting the CIFAR-10 normalization values into your pipeline to see if that increases the score at all, since this may be what is causing the difference.
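For reference, the "proper" fix described above is just to compute STL-10's own per-channel statistics and normalize with those. A minimal numpy sketch of that computation (the image data here is random stand-in data, not STL-10 itself):

```python
import numpy as np

def channel_stats(images):
    """Per-channel mean/std of a uint8 image stack (N, H, W, 3), scaled to [0, 1]."""
    x = images.astype(np.float64) / 255.0
    return x.mean(axis=(0, 1, 2)), x.std(axis=(0, 1, 2))

def normalize(images, mean, std):
    """Standard per-channel normalization: (x - mean) / std."""
    return (images.astype(np.float64) / 255.0 - mean) / std

# Hypothetical stand-in for the STL-10 training images:
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(8, 96, 96, 3), dtype=np.uint8)

mean, std = channel_stats(train)
normed = normalize(train, mean, std)
```

Computing the stats on the training split and reusing them for the test split is the usual convention; mixing in another dataset's values (as happened here with CIFAR-10) shifts every input slightly.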
Let me know if those changes allow you to reproduce the results, otherwise we can look into it further.
from cutout.
I found the reason: I was using FP16 (even though the BN layers are float32) rather than FP32. When I use float32, the final error rate is about 12%, but I have no idea why the gap is so big. I also use FP16 in my CIFAR-10 experiment, and there the result matches what you posted.
Okay, cool.
I've tried messing around with FP16 in PyTorch before, but it seems very finicky when combined with batch norm. Strange that it works for CIFAR-10 but not STL-10.
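For what it's worth, here's a toy numpy sketch (not the cutout code) of the kind of FP16 failure mode that can open up a gap like this: once an accumulated value is large relative to the increments, each increment falls below half an FP16 ulp and is rounded away entirely. This is why mixed-precision setups typically keep BN statistics and gradient accumulation in FP32.

```python
import numpy as np

update = np.float16(1e-4)

# Naive FP16 accumulation: stalls once acc16 is large enough that
# `update` is smaller than half the spacing between adjacent FP16 values.
acc16 = np.float16(0.0)
for _ in range(10_000):
    acc16 = np.float16(acc16 + update)

# The same sum carried out in FP32 stays close to the true value (~1.0).
acc32 = np.float32(update) * 10_000

print(float(acc16), float(acc32))
```

Whether this particular mechanism explains the STL-10 result is speculation on my part, but it's the standard reason naive FP16 training diverges from FP32.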
Hi @TDeVries and @xyzacademic, I'm trying to reproduce the STL-10 result with no data augmentation and no cutout. I adopted the settings described in the paper, with the changes mentioned above, but I cannot get the 23.48% ± 0.68% error reported in the paper. Instead, I get errors of about 30%: the test error gets stuck around 30% (within 1%) after epoch 400, while the training error stays below 0.1% and the cross-entropy below 0.01. Could you help me reproduce this, or possibly upload your code? More specifically, I set the parameters as follows:
image size = 48, normalization mean = [0.44671097, 0.4398105, 0.4066468], std = [0.2603405, 0.25657743, 0.27126738], Wide ResNet depth = 16, widening factor = 8, dropRate = 0.3, initial learning rate = 0.1, momentum = 0.9, weight_decay = 5e-4, number of epochs = 1000, data type = FP32, learning rate scheduler = MultiStepLR(cnn_optimizer, milestones=[300, 400, 600, 800], gamma=0.2)
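Just to spell out the schedule those settings imply, here is a plain-Python sketch of MultiStepLR's rule (assuming scheduler.step() is called once per epoch):

```python
def multistep_lr(epoch, base_lr=0.1, milestones=(300, 400, 600, 800), gamma=0.2):
    """LR under MultiStepLR: base_lr * gamma ** (number of milestones passed)."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

# LR at a few representative epochs of the 1000-epoch run.
schedule = {e: multistep_lr(e) for e in (0, 299, 300, 400, 600, 800, 999)}
```

So the run spends epochs 0-299 at 0.1, then drops by 5x at each milestone, finishing at 0.1 * 0.2^4 = 1.6e-4.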
Could it be the image size? You said you are using 48x48 resolution. For the results in the paper I used the original image size of 96x96.
@TDeVries Thanks! I'll give it a try. Just to confirm: with the image size changed (from 32 to 96 compared to the published code), do you keep nChannels and the avg_pool kernel size unchanged but increase the input dimension of the fully connected layer by a factor of 3*3 = 9? Or do you increase the avg_pool kernel size from 8 to 24?
nChannels is unchanged. I think the main differences are that I changed the stride in block1 from 1 to 2, and the avg_pool kernel size from 8 to 12.
self.block1 = NetworkBlock(n, nChannels[0], nChannels[1], block, 2, dropRate)
...
out = F.avg_pool2d(out, 12)
Hopefully that should give the correct output size.
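To sanity-check that arithmetic, a quick sketch of the feature-map sizes, assuming the usual WRN layout (a stem conv plus three blocks of 3x3 convs with padding 1, so only the per-block stride changes the spatial size):

```python
def wrn_spatial_sizes(input_size, block_strides):
    """Side length of the feature map after each WRN block; 3x3 convs
    with padding 1 preserve size, so only the strides shrink the map."""
    sizes = []
    s = input_size
    for stride in block_strides:
        s //= stride
        sizes.append(s)
    return sizes

# Published code: 32x32 input, block strides (1, 2, 2) -> avg_pool kernel 8
print(wrn_spatial_sizes(32, (1, 2, 2)))  # [32, 16, 8]
# STL-10: 96x96 input, block1 stride changed to 2 -> avg_pool kernel 12
print(wrn_spatial_sizes(96, (2, 2, 2)))  # [48, 24, 12]
```

In both cases the average pool reduces the final map to 1x1, so the fully connected layer's input dimension (nChannels[3]) is unchanged.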
Another thing you could try to improve results is to increase the dropout probability from 0.3 to 0.5. I'm not sure how much of an effect it has though.
Thanks for your advice. FYI: I am actually trying to test my hyperparameter optimization algorithm on this problem :)