This repository contains implementations of several CNN architectures for CIFAR-10.
All of the models are implemented with Keras and TensorFlow.
- Python (3.5.2)
- Keras (2.0.8)
- tensorflow-gpu (1.3.0)
- The first CNN model: LeNet
- Network in Network
- Vgg19 Network
- Residual Network
- Wide Residual Network
- ResNeXt
- DenseNet
- SENet
network | dropout | preprocess | GPU | params | training time | accuracy(%) |
---|---|---|---|---|---|---|
Lecun-Network | - | meanstd | GTX980TI | 62k | 30 min | 76.27 |
Network-in-Network | 0.5 | meanstd | GTX1060 | 0.96M | 1 h 30 min | 91.25 |
Network-in-Network_bn | 0.5 | meanstd | GTX980TI | 0.97M | 2 h 20 min | 91.75 |
Vgg19-Network | 0.5 | meanstd | GTX980TI | 45M | 4 hours | 93.53 |
Residual-Network50 | - | meanstd | GTX980TI | 1.7M | 8 h 58 min | 94.10 |
Wide-resnet 16x8 | - | meanstd | GTX1060 | 11.3M | 11 h 32 min | 95.14 |
DenseNet-100x12 | - | meanstd | GTX980TI | 0.85M | 30 h 40 min | 95.15 |
ResNeXt-4x64d | - | meanstd | GTX1080TI | 20M | 22 h 50 min | 95.51 |
SENet(ResNeXt-4x64d) | - | meanstd | GTX1080 | 20M | - | - |
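The `meanstd` preprocessing used for every model above normalizes each channel by the training set's mean and standard deviation. A minimal NumPy sketch (the function name `normalize_meanstd` is illustrative, not taken from this repo):

```python
import numpy as np

def normalize_meanstd(x_train, x_test):
    """Per-channel mean/std normalization, with statistics computed
    on the training set only and applied to both splits."""
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    # CIFAR-10 images are (N, 32, 32, 3); reduce over all axes but channels.
    mean = x_train.mean(axis=(0, 1, 2))
    std = x_train.std(axis=(0, 1, 2))
    return (x_train - mean) / std, (x_test - mean) / std
```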
Now I have fixed some bugs and used a 1080TI to retrain all of the following models.
In particular:
- Change the batch size according to your GPU's memory.
- Modifying the learning rate schedule may improve the accuracy!
network | GPU | params | batch size | epoch | training time | accuracy(%) |
---|---|---|---|---|---|---|
Lecun-Network | GTX1080TI | 62k | 128 | 200 | 30 min | 76.25 |
Network-in-Network | GTX1080TI | 0.97M | 128 | 200 | 1 h 40 min | 91.63 |
Vgg19-Network | GTX1080TI | 45M | 128 | 200 | 2 h 17 min | 93.40 |
Residual-Network50 | GTX1080TI | 1.7M | 128 | 200 | 4 h 29 min | 94.44 |
Wide-resnet 16x8 | GTX1080TI | 11.3M | 128 | 200 | 5 h 1 min | 95.13 |
DenseNet-100x12 | GTX1080TI | 0.85M | 64 | 250 | 19 h 2 min | 94.91 |
ResNeXt-4x64d | GTX1080TI | 20M | 120 | 250 | 21 h 3 min | 95.19 |
SENet(ResNeXt-4x64d) | GTX1080TI | 20M | 120 | 250 | 21 h 57 min | 95.60 |
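The learning-rate tweak mentioned above is typically a step schedule. A hypothetical sketch follows (the boundaries at epochs 100/150 and the rates 0.1/0.01/0.001 are illustrative assumptions, not necessarily what this repo uses); it can be plugged into Keras via the `LearningRateScheduler` callback:

```python
def step_decay(epoch):
    """Step learning-rate schedule: drop the rate by 10x at epochs 100 and 150.
    The specific boundaries and rates here are illustrative."""
    if epoch < 100:
        return 0.1
    if epoch < 150:
        return 0.01
    return 0.001

# Usage with Keras (sketch):
# from keras.callbacks import LearningRateScheduler
# model.fit(x, y, epochs=200, callbacks=[LearningRateScheduler(step_decay)])
```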
Because I don't have enough machines to train the larger networks, I only trained the smallest network described in each paper.
You can see results for the larger models in liuzhuang13/DenseNet and prlz77/ResNeXt.pytorch.