Our project dataset mainly comes from ECUSTFD [1], a free public food image dataset. It contains 2978 food images covering 19 types of food, as shown in the figure. The number of images in each class is:
'apple': 322, 'banana': 212, 'bread': 66, 'bun': 90, 'doughnut': 210, 'egg': 104, 'fired_dough_twist': 124, 'grape': 58, 'lemon': 185, 'litchi': 78, 'mango': 250, 'mooncake': 134, 'orange': 281, 'peach': 126, 'pear': 182, 'plum': 176, 'qiwi': 137, 'sachima': 150, 'tomato': 201.
The first step is data preprocessing, which includes converting the "xml" annotation files into "csv" format. We then split the data into training and validation sets at a ratio of 7:1.
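The preprocessing step can be sketched roughly as follows. This is a minimal sketch, not the project's actual script: it assumes the annotations follow the Pascal VOC XML layout (`filename`, `object/name`, `object/bndbox`) and that the CSV rows take the form `filename,x1,y1,x2,y2,class_name`. The function names and the fixed random seed are hypothetical.

```python
import csv
import random
import xml.etree.ElementTree as ET


def rows_from_annotation(root):
    """Return one (filename, x1, y1, x2, y2, class_name) row per annotated object.

    Assumes a Pascal VOC style annotation tree (an assumption, see above).
    """
    filename = root.findtext("filename")
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append((
            filename,
            int(float(box.findtext("xmin"))),
            int(float(box.findtext("ymin"))),
            int(float(box.findtext("xmax"))),
            int(float(box.findtext("ymax"))),
            obj.findtext("name"),
        ))
    return rows


def voc_xml_to_rows(xml_path):
    """Parse one annotation file from disk and return its CSV rows."""
    return rows_from_annotation(ET.parse(str(xml_path)).getroot())


def split_train_val(filenames, ratio=(7, 1), seed=0):
    """Shuffle image names and split them 7:1 by image, so that all boxes
    belonging to one image end up in the same set."""
    names = sorted(set(filenames))
    random.Random(seed).shuffle(names)
    n_train = len(names) * ratio[0] // sum(ratio)
    return names[:n_train], names[n_train:]


def write_csv(rows, csv_path):
    """Write the rows to a CSV file without a header line."""
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

A typical run would call `voc_xml_to_rows` on every annotation file, pass the collected filenames to `split_train_val`, and write the rows of each subset with `write_csv`. Splitting by image rather than by row avoids putting boxes from the same photo into both sets.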
We used Keras to implement Faster R-CNN. Faster R-CNN uses a Region Proposal Network (RPN) to generate candidate boxes. Specifically, the RPN uses a CNN to extract a feature map (51×39×256). Each point on the feature map is then responsible for screening 9 boxes of different sizes in the original image. The goal of the screening is to check whether a box contains an object. The reference boxes centered on these points are called 'anchors', and their sizes can be adjusted. After the RPN generates the boxes, we use ResNet to classify them.
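To make the anchor mechanism concrete, the sketch below enumerates 9 boxes for every point of a feature map. It is an illustration only, not the project's code: it assumes the feature map has 51 × 39 spatial positions, a stride of 16 pixels between neighbouring points, and three scales (128, 256, 512) combined with three aspect ratios (1:1, 1:2, 2:1). The function name `generate_anchors` is hypothetical.

```python
import numpy as np


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512),
                     ratios=((1, 1), (1, 2), (2, 1))):
    """Return an array of shape (feat_h * feat_w * 9, 4) with (x1, y1, x2, y2)
    boxes in image pixels. Stride, scales and ratios are assumed defaults."""
    # Width and height of the 9 box shapes: every scale with every ratio.
    shapes = np.array([(s * rw, s * rh) for s in scales for rw, rh in ratios],
                      dtype=np.float32)                        # (9, 2)

    # Centre of each feature-map point, mapped back to image coordinates.
    cx = (np.arange(feat_w, dtype=np.float32) + 0.5) * stride
    cy = (np.arange(feat_h, dtype=np.float32) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                               # each (H, W)
    centres = np.stack([cx, cy], axis=-1).reshape(-1, 1, 2)    # (H*W, 1, 2)

    # Broadcast the 9 shapes over all centres and build corner coordinates.
    half = shapes.reshape(1, -1, 2) / 2.0                      # (1, 9, 2)
    boxes = np.concatenate([centres - half, centres + half], axis=-1)
    return boxes.reshape(-1, 4)


# 39 rows by 51 columns is an assumed orientation; the count is the same either way.
anchors = generate_anchors(39, 51)
print(anchors.shape)
```

Under these assumptions the RPN scores 51 × 39 × 9 = 17901 candidate boxes per image. Changing `scales` or `ratios` is how the box sizes would be adjusted.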
In this project, we achieved a mean accuracy of 82% across all 19 kinds of food. The confusion matrix shows that the results are balanced across classes.
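One way such a figure can be derived from a confusion matrix is sketched below. It assumes that "mean accuracy" means the average of the per-class accuracies (correct predictions for a class divided by the number of true samples of that class), and that rows are true classes and columns are predicted classes. The function names and the toy matrix are hypothetical and are not the project's results.

```python
import numpy as np


def per_class_accuracy(confusion):
    """Fraction of correctly predicted samples for each true class.
    Rows are true classes, columns are predicted classes."""
    confusion = np.asarray(confusion, dtype=np.float64)
    support = confusion.sum(axis=1)          # true samples per class
    correct = np.diag(confusion)             # correctly predicted per class
    # Classes without samples become NaN so they are ignored in the mean.
    return np.divide(correct, support,
                     out=np.full_like(correct, np.nan),
                     where=support > 0)


def mean_accuracy(confusion):
    """Average of the per-class accuracies, ignoring empty classes."""
    return float(np.nanmean(per_class_accuracy(confusion)))


# Toy 2-class example, not real data from this project.
toy = [[8, 2],
       [1, 9]]
print(per_class_accuracy(toy), mean_accuracy(toy))
```

Comparing the per-class values with each other is a simple check of how balanced the result is: a small spread means no class is much worse than the rest.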
The test results are shown below.
h5py
Keras==2.0.3
numpy
opencv-python
sklearn