
cifar-10-cnn's Introduction

Convolutional Neural Networks for CIFAR-10

This repository contains implementations of several CNN architectures for CIFAR-10.


All of these CNN models are implemented with Keras and TensorFlow.
A PyTorch version is available at CIFAR-ZOO.

Requirements

  • Python (3.5)
  • keras (>= 2.1.5)
  • tensorflow-gpu (>= 1.4.1)

Architectures and papers

Documents & tutorials

Some documents and tutorials are available in doc and in issues/3; grab them if you need them.
There are also related articles if you read Chinese.

Accuracy of all my implementations

In particular: change the batch size according to your GPU's memory, and modifying the learning rate schedule may improve accuracy.

| network | GPU | params | batch size | epoch | training time | accuracy (%) |
|---|---|---|---|---|---|---|
| Lecun-Network | GTX1080TI | 62k | 128 | 200 | 30 min | 76.23 |
| Network-in-Network | GTX1080TI | 0.97M | 128 | 200 | 1 h 40 min | 91.63 |
| Vgg19-Network | GTX1080TI | 39M | 128 | 200 | 1 h 53 min | 93.53 |
| Residual-Network20 | GTX1080TI | 0.27M | 128 | 200 | 44 min | 91.82 |
| Residual-Network32 | GTX1080TI | 0.47M | 128 | 200 | 1 h 7 min | 92.68 |
| Residual-Network110 | GTX1080TI | 1.7M | 128 | 200 | 3 h 38 min | 93.93 |
| Wide-resnet 16x8 | GTX1080TI | 11.3M | 128 | 200 | 4 h 55 min | 95.13 |
| Wide-resnet 28x10 | GTX1080TI | 36.5M | 128 | 200 | 10 h 22 min | 95.78 |
| DenseNet-100x12 | GTX1080TI | 0.85M | 64 | 250 | 17 h 20 min | 94.91 |
| DenseNet-100x24 | GTX1080TI | 3.3M | 64 | 250 | 22 h 27 min | 95.30 |
| DenseNet-160x24 | 1080 x 2 | 7.5M | 64 | 250 | 50 h 20 min | 95.90 |
| ResNeXt-4x64d | GTX1080TI | 20M | 120 | 250 | 21 h 3 min | 95.19 |
| SENet(ResNeXt-4x64d) | GTX1080TI | 20M | 120 | 250 | 21 h 57 min | 95.60 |

About LeNet and CNN training tips/tricks

LeNet was the first CNN, proposed by LeCun.
I use different training tricks here to show how to train your model efficiently.

LeNet_keras.py is the LeNet baseline;
LeNet_dp_keras.py adds Data Preprocessing [DP];
LeNet_dp_da_keras.py adds both DP and Data Augmentation [DA];
LeNet_dp_da_wd_keras.py adds DP, DA, and Weight Decay [WD]. (A sketch of these tricks follows the results table below.)

| network | GPU | DP | DA | WD | training time | accuracy (%) |
|---|---|---|---|---|---|---|
| LeNet_keras | GTX1080TI | - | - | - | 5 min | 58.48 |
| LeNet_dp_keras | GTX1080TI | ✓ | - | - | 5 min | 60.41 |
| LeNet_dp_da_keras | GTX1080TI | ✓ | ✓ | - | 26 min | 75.06 |
| LeNet_dp_da_wd_keras | GTX1080TI | ✓ | ✓ | ✓ | 26 min | 76.23 |
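As a rough illustration of what these tricks look like in Keras (a minimal sketch, not the scripts' exact code: the mean/std values match the per-channel statistics used in this repository, while the shift range and weight-decay strength are illustrative assumptions):

from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.regularizers import l2

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# [DP] data preprocessing: per-channel mean/std normalization
mean = [125.307, 122.95, 113.865]
std = [62.9932, 62.0087, 66.7048]
for i in range(3):
    x_train[:, :, :, i] = (x_train[:, :, :, i] - mean[i]) / std[i]
    x_test[:, :, :, i] = (x_test[:, :, :, i] - mean[i]) / std[i]

# [DA] data augmentation: random shifts plus horizontal flips
datagen = ImageDataGenerator(width_shift_range=0.125,
                             height_shift_range=0.125,
                             horizontal_flip=True)

# [WD] weight decay: attach an L2 penalty to layer kernels,
# e.g. Dense(120, kernel_regularizer=wd)
wd = l2(1e-4)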

For more CNN training tricks, see Must Know Tips/Tricks in Deep Neural Networks (by Xiu-Shen Wei)

About Learning Rate schedule

Different learning rate schedules may produce different training/testing accuracy.
See ./htd and HTD for more details.
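For illustration, here is a minimal sketch of a tanh-style decay schedule in the spirit of HTD (the form lr(t) = lr0/2 * (1 - tanh(L + (U - L) * t / T)) and the bounds L = -6, U = 3 are assumptions based on my reading of the paper; see ./htd for the exact implementation):

import math
from keras.callbacks import LearningRateScheduler

def htd_schedule(epoch):
    # illustrative values: initial lr, tanh bounds, total epochs
    lr0, L, U, T = 0.1, -6.0, 3.0, 200
    return lr0 / 2.0 * (1.0 - math.tanh(L + (U - L) * epoch / T))

change_lr = LearningRateScheduler(htd_schedule)
# then pass change_lr in the callbacks list of model.fit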

Since recent versions of Keras support keras.utils.multi_gpu_model, you can simply use the following code to train your model with multiple GPUs:

from keras.utils import multi_gpu_model
from keras.applications.resnet50 import ResNet50

model = ResNet50()

# Replicates `model` on 8 GPUs.
parallel_model = multi_gpu_model(model, gpus=8)
parallel_model.compile(loss='categorical_crossentropy', optimizer='adam')

# This `fit` call will be distributed on 8 GPUs.
# Since the batch size is 256, each GPU will process 32 samples.
parallel_model.fit(x, y, epochs=20, batch_size=256)
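One caveat worth noting from the Keras documentation: save and load weights through the template model (the model you passed to multi_gpu_model) rather than through parallel_model, so the checkpoint stays usable regardless of the number of GPUs.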

About ResNeXt & DenseNet

Since I don't have enough machines to train the larger networks, I only trained the smallest network described in each paper. You can see results for the larger models in liuzhuang13/DenseNet and prlz77/ResNeXt.pytorch.

   

Please feel free to contact me if you have any questions!

Citation

@misc{bigballon2017cifar10cnn,
  author = {Wei Li},
  title = {cifar-10-cnn: Play deep learning with CIFAR datasets},
  howpublished = {\url{https://github.com/BIGBALLON/cifar-10-cnn}},
  year = {2017}
}

cifar-10-cnn's People

Contributors

bigballon, chenhaozou


cifar-10-cnn's Issues

Something wrong in SENet

Something went wrong with SENet when I copied the code and trained it on my CPU. In the first epoch (1/250), the loss decreased and the accuracy increased at first, but when the iterations reached 464/781 the loss became NaN. I don't know what happened.
The config:
cardinality = 4
batch_size = 64
iterations = 781

Please help me.
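As an aside not from the original thread: Keras provides a TerminateOnNaN callback that stops training as soon as the loss becomes NaN, which makes this kind of failure faster to catch:

from keras.callbacks import TerminateOnNaN

# stop training immediately when the loss becomes NaN instead of
# running through the remaining iterations
callbacks = [TerminateOnNaN()]
# model.fit(x_train, y_train, callbacks=callbacks, ...)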

in densenet, after "transition layer", the variable "nchannels" is not updated.

suppose x has 24 channels, nblocks=5, nchannels=12, then

    x, nchannels = dense_block(x, nblocks, nchannels) # nchannels = 24+12*5 = 84, ok
    x = transition(x, nchannels) # x's channels is 84/2=42, ok
    x, nchannels = dense_block(x, nblocks, nchannels) #nchannels = 84+12*5 = 144, not ok. should be 42+12*5= 102
    x = transition(x, nchannels) #x's channels is 144/2=72, not ok? should be 102/2=51
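A minimal sketch of the fix this implies, assuming the repository's dense_block and transition helpers and the compression factor 0.5 used in the arithmetic above: halve the nchannels bookkeeping right after each transition layer.

    x, nchannels = dense_block(x, nblocks, nchannels)  # nchannels = 24 + 12*5 = 84
    x = transition(x, nchannels)                       # output channels: 84 // 2 = 42
    nchannels = nchannels // 2                         # keep the counter in sync
    x, nchannels = dense_block(x, nblocks, nchannels)  # nchannels = 42 + 12*5 = 102
    x = transition(x, nchannels)                       # output channels: 102 // 2 = 51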

About DenseNet_keras.py

I have read the DenseNet paper, and I think the last transition layer is not needed in your implementation.

In the paper, the number of transition layers is one less than the number of dense blocks. The original paper uses 4 blocks, so the number of additional layers is
FirstLayer(1) + TransitionLayer(3) + LastLayer(1) = 5

This implementation has 3 dense blocks, so the number of additional layers is
FirstLayer(1) + TransitionLayer(2) + LastLayer(1) = 4

What do you think? Thank you!

I cannot get your result

I get 10% accuracy if I modify x_train this way:

mean = [125.307, 122.95, 113.865]
std = [62.9932, 62.0087, 66.7048]
for i in range(3):
    x_train[:,:,i] = (x_train[:,:,i] - mean[i]) / std[i]
    x_test[:,:,i] = (x_test[:,:,i] - mean[i]) / std[i]

but I get 52% accuracy if I modify x_train this way:

x_train /= 255
x_test /= 255

I don't know why I can't reproduce your result. Please help, thanks.

My code is:

import keras
from keras import optimizers
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from keras.callbacks import LearningRateScheduler, TensorBoard

batch_size = 128
epochs = 10
iteration = 391
num_classes = 10
log_filepath = './lenet'

##kernel_initializer:?????
def build_model():
    model = Sequential()
    model.add(Conv2D(6, (5, 5), padding='valid', activation='relu', kernel_initializer='he_normal', input_shape=(32, 32, 3)))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    model.add(Conv2D(16, (5, 5), padding='valid', activation='relu', kernel_initializer='he_normal'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    model.add(Flatten())
    model.add(Dense(120, activation='relu', kernel_initializer='he_normal'))
    model.add(Dense(84, activation='relu', kernel_initializer='he_normal'))
    model.add(Dense(num_classes, activation='softmax', kernel_initializer='he_normal'))

    sgd = optimizers.SGD(lr=0.1, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

    return model

def scheduler(epoch):
    learning_rate_init = 0.02
    if epoch >= 80:
        learning_rate_init = 0.01
    if epoch >= 150:
        learning_rate_init = 0.004
    return learning_rate_init

if __name__ == '__main__':
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()  ## values ???
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)

    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    # x_train /= 255
    # x_test /= 255
    mean = [125.307, 122.95, 113.865]
    std = [62.9932, 62.0087, 66.7048]
    for i in range(3):
        x_train[:,:,i] = (x_train[:,:,i] - mean[i]) / std[i]
        x_test[:,:,i] = (x_test[:,:,i] - mean[i]) / std[i]
    model = build_model()
    print(model.summary())

    tb_cb = TensorBoard(log_dir=log_filepath, histogram_freq=0)
    change_lr = LearningRateScheduler(scheduler)
    cbks = [tb_cb, change_lr]

    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, callbacks=cbks, validation_data=(x_test, y_test), shuffle=True)

    model.save('lenet.h5')
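An observation on the snippet above (mine, not from the original thread): for channels-last CIFAR-10 arrays of shape (num_samples, 32, 32, 3), the channel axis is the fourth one, so x_train[:,:,i] normalizes three of the 32 image columns rather than the three color channels, leaving most pixels at the raw 0-255 scale; with lr = 0.1 that can easily diverge to 10% accuracy. The loop should index all four axes:

for i in range(3):
    x_train[:, :, :, i] = (x_train[:, :, :, i] - mean[i]) / std[i]
    x_test[:, :, :, i] = (x_test[:, :, :, i] - mean[i]) / std[i]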

Unknown layer: layers

I call

keras.models.load_model('vgg19_cifar10.h5')

but I get an error:

D:\Users\Python\python3.exe D:/pythonwork/Adversarial/Cifar10_keras_Adversarial/MisClassification.py
Using TensorFlow backend.
Traceback (most recent call last):
  File "D:/pythonwork/Adversarial/Cifar10_keras_Adversarial/MisClassification.py", line 100, in <module>
    main()
  File "D:/pythonwork/Adversarial/Cifar10_keras_Adversarial/MisClassification.py", line 21, in main
    kmodel = load_model(weight_path)
  File "D:\Users\Python\lib\site-packages\keras\engine\saving.py", line 261, in load_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "D:\Users\Python\lib\site-packages\keras\engine\saving.py", line 335, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "D:\Users\Python\lib\site-packages\keras\layers\__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "D:\Users\Python\lib\site-packages\keras\utils\generic_utils.py", line 145, in deserialize_keras_object
    list(custom_objects.items())))
  File "D:\Users\Python\lib\site-packages\keras\engine\sequential.py", line 292, in from_config
    custom_objects=custom_objects)
  File "D:\Users\Python\lib\site-packages\keras\layers\__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "D:\Users\Python\lib\site-packages\keras\utils\generic_utils.py", line 165, in deserialize_keras_object
    ':' + function_name)
ValueError: Unknown layer: layers

Not converging

Network_in_Network_keras.py
Training accuracy is always around 1.0%.

Errors in SENet_Keras.py

Hi Wei Li, thanks for sharing this great code! I learned a lot from it.
I found two errors in SENet_Keras.py:
1. In line 77 you use y = add_common_layer(y), and add_common_layer includes BN and ReLU. From the official code, https://github.com/hujie-frank/SENet, we can see that at the end of a ResNet block there is only BN, yet you apply a ReLU before the SE block, which is not intuitive. So line 77 should be y = BatchNormalization(momentum=0.9, epsilon=1e-5)(y).
2. In line 48 you define the global variable inplanes, but you reassign inplanes in residual_layer, so you should declare inplanes as global inside residual_layer (line 89). You can visualize the current model structure: in each shortcut branch there is a conv + BN.
Thanks again for sharing!
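A minimal, self-contained sketch of the pitfall in point 2 (the names are illustrative, not the repository's exact code): assigning to a module-level variable inside a function creates a new local variable unless the function declares it global.

inplanes = 64  # module-level channel counter

def residual_layer_wrong(out_planes):
    # BUG: this assignment creates a local 'inplanes';
    # the module-level counter is never updated
    inplanes = out_planes

def residual_layer_fixed(out_planes):
    global inplanes  # write to the module-level counter
    inplanes = out_planes

residual_layer_wrong(128)
print(inplanes)  # still 64
residual_layer_fixed(128)
print(inplanes)  # now 128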

NIN training problem

Hi, when training the NIN model I first used your initialization method and then tried he_normal, but the results are still unsatisfactory and don't reach LeNet's level. What could be the reason? Also, I noticed that you did not use mlpconv but rather a conv + relu + bn stack.

Wrong number of parameters in README

I found that both ResNet-20 and ResNet-110 are listed with the same number of parameters, 0.27M. Is there something wrong with that?

| Residual-Network20 | GTX1080TI | 0.27M | 128 | 200 | 44 min | 91.82 |
| Residual-Network32 | GTX1080TI | 0.47M | 128 | 200 | 1 h 7 min | 92.68 |
| Residual-Network50 | GTX1080TI | 1.7M | 128 | 200 | 1 h 42 min | 93.18 |
| Residual-Network110 | GTX1080TI | 0.27M | 128 | 200 | 3 h 38 min | 93.93 |

Thanks

Thanks, Wei Li. It helps me a lot!

My result is wrong, can you help me?

I just followed your code, but my model does not predict correctly. Why? Is there something else I need to do?

I used your retrain.h5 for prediction like this:

model = VGG19(weights=None)
filepath1 = os.path.abspath('retrain.h5')
model.load_weights(filepath=filepath1, by_name=True)

This is my result:

Please input picture file to predict ( input Q to exit ): test_pic/tiger.jpeg
Predicted: [('n04200800', 'shoe_shop', 0.0059397803), ('n04462240', 'toyshop', 0.0048586507), ('n02640242', 'sturgeon', 0.0048460886), ('n12985857', 'coral_fungus', 0.0044603818), ('n03063689', 'coffeepot', 0.0042976518)]

I don't know why...

About learning_rate

Can the callback function pass the learning rate to the optimizer continuously during training? The learning rate is already set when the optimizer is created.
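As an aside not from the original thread: yes. Keras's LearningRateScheduler rewrites the optimizer's learning-rate variable at the start of every epoch, so the value given when the optimizer is constructed is only the initial one. A sketch of the mechanism, reusing the step values from the scheduler code above:

import keras.backend as K
from keras.callbacks import Callback

class StepDecay(Callback):
    # roughly what LearningRateScheduler does internally: compute a
    # new value and write it into the optimizer's lr variable at the
    # beginning of every epoch
    def on_epoch_begin(self, epoch, logs=None):
        new_lr = 0.02 if epoch < 80 else (0.01 if epoch < 150 else 0.004)
        K.set_value(self.model.optimizer.lr, new_lr)

# model.fit(..., callbacks=[StepDecay()])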
