
Simple-CV-Pytorch-master

This project covers classification and object detection tasks in computer vision; semantic segmentation will be added later.

  • For classification, I reproduced LeNet5, AlexNet, VGG, ResNet (ResNeXt), GoogLeNet, MobileNet and ShuffleNet. I will reproduce EfficientNet and others later.

  • For object detection, I reproduced RetinaNet and SSD. The code is split into modules (backbone, neck, head, loss, etc.), which makes it easier to modify and extend. Other detection algorithms (CenterNet, FCOS, the YOLO series, Faster RCNN) will be added later.

  • For semantic segmentation, I plan to reproduce FCN, Mask RCNN, DeepLab and UNet later.

  • Detailed explanations have been published on CSDN and Zhihu (a Chinese Q&A site similar to Quora).

Before running the project, create the checkpoints (saved models), log, results and tensorboard (loss visualization) directories, as in the sketch below.
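These directories are not created automatically. A minimal sketch of setting them up (the project root path is only an example; adjust it to your machine):

```python
# Minimal sketch: create the working directories the project expects.
# The project root below is only an example; adjust it to your setup.
import os

project_root = "/data/PycharmProject/Simple-CV-Pytorch-master"

for name in ("checkpoints", "log", "results", "tensorboard"):
    # exist_ok=True makes the script safe to re-run.
    os.makedirs(os.path.join(project_root, name), exist_ok=True)
```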

Still to be added:

1. Object detection: CenterNet, FCOS, YOLO series, Faster RCNN

2. Semantic segmentation: FCN, Mask RCNN, DeepLab, UNet

Environment

Install the requirements with pip (you can put requirements.txt into venv/Scripts/ if needed):

pip install -r requirements.txt

python == 3.8.5
torch == 1.11.0+cu113
torchaudio == 0.11.0+cu113
torchvision == 0.12.0+cu113
pycocotools == 2.0.4
numpy
Cython
matplotlib
opencv-python (you may prefer skimage or PIL, etc.)
skimage (the pip package is scikit-image)
tensorboard
tqdm

Dataset Path

Please see FolderOrganization.txt for more details.

Folder Organization

I use Ubuntu 20.04 as the OS.

project path: /data/PycharmProject

Simple-CV-master path: /data/PycharmProject/Simple-CV-Pytorch-master
|
|----checkpoints(resnet50-19c8e357.pth or retinanet_resnet50_coco.pth)
|
|----configs----|----detection----|----retinanet_coco.yaml
|                                 |----retinanet_voc.yaml
|                                 |----ssd300_coco.yaml
|                                 |----ssd300_voc.yaml
| 
|----data----|----classification----|----CIAR_labels.txt(cifar.py is null, this is because I just use torchvision.datasets.CIFAR10)
|            |                      |----ImageNet_labels.txt(imagenet.py is null, this is because I just use torchvision.datasets.ImageFolder)
|            |----detection----|----RetinaNet----|----coco.py
|                              |                 |----voc.py
|                              |----SSD----|----coco.py(/data/coco/coco2017/coco_labels.txt)
|                                          |----voc0712.py
|
|                                     |----automobile.png
|              |----classification----|----crash_helmet.png
|              |                      |----photocopier.png
|              |                      |----sunflower.png
|              |----detection----|----000001.jpg
|              |                 |----000001.xml
|              |                 |----000002.jpg
|              |                 |----000002.xml
|              |                 |----000003.jpg
|              |                 |----000003.xml
|----images----|----icon----|----alexnet.png
|                           |----darknet19.png
|                           |----darknet53.png
|                           |----darknettiny.png                   
|                           |----googlenet.png
|                           |----lenet5.png
|                           |----mobilenet_v2.png
|                           |----mobilenet_v3_large.png
|                           |----mobilenet_v3_small.png
|                           |----resnet.png
|                           |----resnext.png
|                           |----retinanet.png
|                           |----shufflenet_v2.png
|                           |----ssd.png
|                           |----vgg.png
|
|----log(XXX[ detection or classification ]_XXX[  train or test or eval ].info.log)
|
|              |----classification----|----utils----|----accuracy.py
|              |                      |             |----AverageMeter.py
|              |                      |----lenet5.py
|              |                      |----alexnet.py
|              |                      |----vgg.py
|              |                      |----resnet.py(include: resnext)
|              |                      |----googlenet.py
|              |                      |----mobilenet_v2.py
|              |                      |----mobilenet_v3.py
|              |                      |----shufflenet.py
|              |----detection----|----RetinaNet----|----anchor----|----__init__.py
|              |                 |                 |              |----Anchor.py
|              |                 |                 |----backbones----|----__init__.py
|              |                 |                 |                 |----DarkNet.py
|              |                 |                 |                 |----ResNet.py
|              |                 |                 |----head----|----__init__.py
|              |                 |                 |            |----Head.py
|              |                 |                 |
|              |                 |                 |----loss----|----__init__.py
|              |                 |                 |            |----Loss.py
|              |                 |                 |
|              |                 |                 |----neck----|----__init__.py
|----models----|                 |                 |            |----FPN.py
|              |                 |                 |            |----FPN.txt
|              |                 |                 |----utils----|----augmentations.py
|              |                 |                 |             |----BBoxTransform.py
|              |                 |                 |             |----ClipBoxes.py
|              |                 |                 |             |----collate.py
|              |                 |                 |             |----iou.py
|              |                 |                 |----RetinaNet.py
|              |                 |
|              |                 |----SSD----|----anchor----|----prior_box.py
|              |                             |----backbone----|----vgg.py
|              |                             |----box_head----|----box_predictor.py
|              |                             |                |----inference.py
|              |                             |                |----loss.py
|              |                             |----utils----|----augmentations.py
|              |                             |             |----box_utils.py
|              |                             |             |----collate.py
|              |                             |             |----l2norm.py
|              |                             |----ssd.py
|----options----|----detection----|----RetinaNet----|----eval_options.py
|                                 |                 |----test_options.py
|                                 |                 |----train_options.py
|                                 |----SSD----|----eval_options.py
|                                             |----test_options.py
|                                             |----train_options.py
|----results----|----SSD----|----COCO----|----coco_bbox_results.json
|               |           |----VOC----|----annot_cache----|----XXX_pr.pkl
|               |           |           |----det----|----det_test_xxx.txt(eg: car AP)
|               |           |           |----annots.pkl
|               |           |           |----detections.pkl
|               |           |           |----visualize.txt
|               |           |----XX(name: 000478)_XX(coco or voc).jpg
|               |----RetinaNet----|----COCO----|----coco_bbox_results.json
|               |                 |----VOC----|----annot_cache----|----XXX_pr.pkl
|                                 |           |----det----|----det_test_xxx.txt(eg: car)
|                                 |           |----annots.pkl
|                                 |           |----detections.pkl
|                                 |----XX(name:000478)_XX(coco or voc).jpg            
|----tensorboard(Loss Visualization)
|----tools----|----classification----|----eval.py
|             |                      |----train.py
|             |                      |----test.py
|             |----detection----|----RetinaNet----|----eval_coco.py
|                               |                 |----eval_voc.py
|                               |                 |----train.py
|                               |                 |----visualize.py
|                               |----SSD----|----eval_coco.py
|                                           |----eval_voc.py
|                                           |----train.py
|                                           |----visualize.py
|             |----get_logger.py
|----utils----|----optimizer.py
|             |----path.py
|             |----scheduler.py
|----FolderOrganization.txt
|----main.py
|----README.md
|----requirements.txt

Run the program

Since I did not have much time to tune the models, all of them achieve lower accuracy than reported in the original papers. If you want to use the models from this project, you will need to retune the hyperparameters yourself.

1.classification

  • Network architecture still to be reproduced (not yet finished):

    1).EfficientNet

1).LeNet5(models/classification/lenet5.py)[1]

LeNet5

 I added nn.BatchNorm2d() because the accuracy without it was poor.
 basenet: lenet5 (image size: 32 * 32 * 3)
 dataset: cifar
 batch_size: 32
 optim: SGD
 lr: 0.01
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: MultiStepLR
 milestones: [15, 20, 30]
 gamma: 0.1
 epoch: 30

| epochs | time | avg top1 acc (%) | avg top5 acc (%) |
|--------|----------|------------------|------------------|
| 30 | 0h11m44s | 62.21 | 95.97 |
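For illustration, this is roughly what inserting nn.BatchNorm2d() into a LeNet5-style feature extractor looks like. The channel sizes and classifier head below are assumptions for a CIFAR-sized input, not necessarily what models/classification/lenet5.py does:

```python
import torch
import torch.nn as nn

class LeNet5BN(nn.Module):
    """LeNet5-style network with BatchNorm after each conv (illustrative sketch)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5),    # 32x32x3 -> 28x28x6
            nn.BatchNorm2d(6),                 # the added normalization layer
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                   # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),   # 14x14 -> 10x10
            nn.BatchNorm2d(16),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                   # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(inplace=True),
            nn.Linear(120, 84),
            nn.ReLU(inplace=True),
            nn.Linear(84, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Example: a CIFAR-sized batch (32x32 RGB) gives logits of shape (4, 10).
logits = LeNet5BN()(torch.randn(4, 3, 32, 32))
```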

2).AlexNet(models/classification/alexnet.py)[2]

AlexNet

 I added nn.BatchNorm2d() because the accuracy without it was poor.
 basenet: AlexNet (image size: 224 * 224 * 3)
 dataset: cifar
 batch_size: 32
 optim: SGD
 gamma: 0.1
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: MultiStepLR
 milestones: [15, 20, 30]
 lr: 0.01
 epoch: 30
| epochs | time | avg top1 acc (%) | avg top5 acc (%) |
|--------|----------|------------------|------------------|
| 30 | 0h22m44s | 86.27 | 99.00 |
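The result tables report average top-1/top-5 accuracy, which models/classification/utils/accuracy.py and AverageMeter.py track during evaluation. The exact code there may differ, but the metric is typically computed like this:

```python
import torch

def topk_accuracy(output: torch.Tensor, target: torch.Tensor, topk=(1, 5)):
    """Return the top-k accuracies (in %) for the given logits and labels."""
    maxk = max(topk)
    # Indices of the k largest logits per sample, shape (N, maxk).
    _, pred = output.topk(maxk, dim=1)
    correct = pred.eq(target.unsqueeze(1))  # (N, maxk) boolean matrix
    return [correct[:, :k].any(dim=1).float().mean().item() * 100.0 for k in topk]

# Example: random logits for 8 samples over 10 classes.
logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
top1, top5 = topk_accuracy(logits, labels)
```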

3).VGG(models/classification/vgg.py)[3]

VGG

 I added nn.BatchNorm2d() and used transfer learning because the accuracy without them was poor.
 basenet: vgg16 (image size: 224 * 224 * 3)
 dataset: cifar
 batch_size: 32
 optim: SGD
 lr: 0.01
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: MultiStepLR
 milestones: [15, 20, 30]
 gamma: 0.1
 epoch: 30
| epochs | time | avg top1 acc (%) | avg top5 acc (%) |
|--------|----------|------------------|------------------|
| 30 | 1h23m43s | 76.56 | 96.44 |
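The VGG run uses transfer learning. Below is a minimal sketch of what that usually looks like with torchvision: load ImageNet-pretrained vgg16 weights and replace the final layer for CIFAR's 10 classes. The project's vgg.py may do this differently:

```python
import torch.nn as nn
import torchvision

# Load vgg16 with ImageNet-pretrained weights (transfer learning starting point).
model = torchvision.models.vgg16(pretrained=True)

# Replace the last fully connected layer so it predicts CIFAR's 10 classes.
in_features = model.classifier[-1].in_features
model.classifier[-1] = nn.Linear(in_features, 10)

# Optionally freeze the convolutional features and fine-tune only the classifier.
for param in model.features.parameters():
    param.requires_grad = False
```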

4).ResNet(models/classification/resnet.py)[4]

ResNet

 basenet: resnet18 
 dataset: ImageNet
 batch_size: 32
 optim: SGD
 lr: 0.001
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: MultiStepLR
 milestones: [15, 20, 30]
 gamma: 0.1
 epoch: 30
| epochs trained | time/epoch | top1 acc (%) | top5 acc (%) |
|----------------|------------|--------------|--------------|
| 5 | 3h49min35s | 50.21 | 75.59 |
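The MultiStepLR runs above multiply the learning rate by gamma at the listed milestone epochs. A minimal sketch of wiring that scheduler into a training loop (the model is a placeholder):

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import MultiStepLR

model = nn.Linear(10, 2)                      # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=1e-4)
# lr is multiplied by gamma=0.1 after epochs 15, 20 and 30.
scheduler = MultiStepLR(optimizer, milestones=[15, 20, 30], gamma=0.1)

for epoch in range(30):
    # ... run one training epoch here ...
    scheduler.step()                          # advance the schedule once per epoch
```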

5).ResNeXt(models/classification/resnet.py, includes: resnext50_32x4d, resnext101_32x8d)[5]

ResNeXt

 basenet: resnext50_32x4d
 dataset: ImageNet
 batch_size: 32
 optim: SGD
 lr: 0.001
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: ReduceLROnPlateau
 patience: 2
 epoch: 30
 pretrained: True
| epochs trained | time/epoch | top1 acc (%) | top5 acc (%) |
|----------------|------------|--------------|--------------|
| 7 | 4h5min16s | 72.28 | 91.56 |
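Several of the ImageNet runs instead use ReduceLROnPlateau with patience 2, i.e. the learning rate is reduced once the validation metric stops improving for two epochs. A minimal sketch (the validation accuracy value is a placeholder):

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(10, 2)                              # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# mode="max" because we monitor validation accuracy; patience=2 epochs without improvement.
scheduler = ReduceLROnPlateau(optimizer, mode="max", factor=0.1, patience=2)

for epoch in range(30):
    # ... train for one epoch, then evaluate ...
    val_top1 = 0.0                                    # placeholder validation accuracy
    scheduler.step(val_top1)                          # scheduler reacts to the metric
```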

6).GoogLeNet(models/classification/googlenet.py)[6]

GoogLeNet

 basenet: GoogLeNet 
 dataset: ImageNet
 batch_size: 32
 optim: SGD
 lr: 0.01
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: ReduceLROnPlateau
 patience: 2
 epoch: 30
 pretrained: True
| epochs trained | time/epoch | top1 acc (%) | top5 acc (%) |
|----------------|------------|--------------|--------------|
| 5 | 3h54min31s | 42.70 | 69.34 |

7).MobileNet(models/classification/mobilenet_v2.py or mobilenet_v3.py)

a).MobileNet_v2[7]

MobileNet_v2

 basenet: MobileNet_v2
 dataset: ImageNet
 batch_size: 32
 optim: SGD
 lr: 0.001
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: ReduceLROnPlateau
 patience: 2
 epoch: 30
 pretrained: True
| epochs trained | time/epoch | top1 acc (%) | top5 acc (%) |
|----------------|------------|--------------|--------------|
| 5 | 3h58min3s | 66.90 | 88.19 |

b).MobileNet_v3[8]

(1).Large

MobileNet_v3

 basenet: MobileNet_v3_large
 dataset: ImageNet
 batch_size: 32
 optim: SGD
 lr: 0.001
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: ReduceLROnPlateau
 patience: 2
 epoch: 30
 pretrained: True
| epochs trained | time/epoch | top1 acc (%) | top5 acc (%) |
|----------------|------------|--------------|--------------|
| 5 | 3h58min13s | 71.15 | 90.32 |

(2).Small

MobileNet_v3

 basenet: MobileNet_v3_small
 dataset: ImageNet
 batch_size: 32
 optim: SGD
 lr: 0.001
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: ReduceLROnPlateau
 patience: 2
 epoch: 30
 pretrained: True
| epochs trained | time/epoch | top1 acc (%) | top5 acc (%) |
|----------------|------------|--------------|--------------|
| 5 | 3h54min38s | 68.89 | 88.92 |

8).ShuffleNet v2(models/classification/shufflenet.py)[9]

ShuffleNet_v2

 basenet: ShuffleNet_v2_x0_5
 dataset: ImageNet
 batch_size: 32
 optim: SGD
 lr: 0.001
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: ReduceLROnPlateau
 patience: 2
 epoch: 30
 pretrained: True
| epochs trained | time/epoch | top1 acc (%) | top5 acc (%) |
|----------------|------------|--------------|--------------|
| 5 | 3h52min0s | 55.61 | 78.84 |

Run

```bash
#!/bin/bash
conda activate base
# XXX.py is one of train.py, eval.py or test.py
python /data/PycharmProject/Simple-CV-Pytorch-master/tools/classification/XXX.py
```

2.object detection

Although both detectors use the COCO and VOC datasets, they preprocess them differently, so each model has its own data loaders and its own train, test and eval scripts. A sketch of a typical detection collate function is shown below.
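Because each image can contain a different number of objects, detection dataloaders need a custom collate function (each model keeps one in its utils/collate.py). The sketch below only illustrates the general idea and is not the repo's exact implementation:

```python
import torch

def detection_collate(batch):
    """Stack images into one tensor but keep per-image annotation lists separate.

    Each dataset item is assumed to be (image_tensor, boxes_tensor), where
    boxes_tensor has shape (num_objects, 5) = [x1, y1, x2, y2, class_id].
    """
    images, targets = zip(*batch)
    return torch.stack(images, dim=0), list(targets)

# Usage with a DataLoader (dataset is whatever VOC/COCO wrapper you use):
# loader = torch.utils.data.DataLoader(dataset, batch_size=16,
#                                      shuffle=True, collate_fn=detection_collate)
```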

  • Network architectures still to be reproduced (not yet finished):

    • CenterNet
    • FCOS
    • YOLO series
    • Faster RCNN

1.SSD(models/detection/SSD/ssd.py)[10]

SSD

 Network: ssd
 backbone: vgg+add_extras
 loss: cls(cross_entropy_loss)+reg(smooth_l1_loss)
 dataset: voc
 batch_size: 16
 optim: SGD
 lr: 0.001
 scheduler: adjust_learning_rate
 epoch: 115
| epochs | batch norm | mean AP (%) | download | Baidu Yun key |
|--------|------------|-------------|----------|---------------|
| 115 | False | 75.4 | Link | xwaw |
| 115 | True | 76.2 | Link | 2xzk |
visualize

ssd_voc_visualize
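The "adjust_learning_rate" scheduler named in the SSD configurations is not a built-in PyTorch scheduler; in many SSD implementations it is a simple step decay applied at fixed milestones. A hedged sketch of that idea (the gamma and step values below are placeholders, not necessarily the repo's actual values):

```python
def adjust_learning_rate(optimizer, base_lr, gamma, step):
    """Step decay: lr = base_lr * gamma**step (placeholder milestones/gamma)."""
    lr = base_lr * (gamma ** step)
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr
    return lr

# Example: after the second decay milestone, lr = 0.001 * 0.1**2 = 1e-5.
# adjust_learning_rate(optimizer, base_lr=0.001, gamma=0.1, step=2)
```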


 Network: ssd
 backbone: vgg+add_extras
 loss: cls(cross_entropy_loss)+reg(smooth_l1_loss)
 dataset: coco
 batch_size: 16
 optim: SGD
 lr: 0.001
 scheduler: adjust_learning_rate
 epoch: 55
| epochs | batch norm | AP@IoU=0.5 (%) | download | Baidu Yun key |
|--------|------------|----------------|----------|---------------|
| 55 | False | 38.0 | Link | j6wn |
| 55 | True | 37.7 | Link | 7i64 |
visualize

ssd_coco_visualize
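Both SSD configurations combine a cross-entropy classification loss with a smooth L1 box-regression loss. The sketch below only shows how the two terms are combined on anchors that have already been matched to ground truth; anchor matching and hard negative mining, which the repo's box_head/loss.py presumably handles, are omitted:

```python
import torch
import torch.nn.functional as F

def ssd_loss(cls_logits, box_preds, cls_targets, box_targets):
    """Combined SSD-style loss on anchors that have already been matched.

    cls_logits:  (num_anchors, num_classes) class scores
    box_preds:   (num_anchors, 4) predicted box offsets
    cls_targets: (num_anchors,) class id per anchor, 0 = background
    box_targets: (num_anchors, 4) regression targets
    """
    cls_loss = F.cross_entropy(cls_logits, cls_targets, reduction="sum")

    positive = cls_targets > 0                     # only foreground anchors regress boxes
    reg_loss = F.smooth_l1_loss(box_preds[positive],
                                box_targets[positive], reduction="sum")

    num_pos = positive.sum().clamp(min=1).float()  # normalize by the number of positives
    return (cls_loss + reg_loss) / num_pos
```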


2.RetinaNet(models/detection/RetinaNet/RetinaNet.py)[11]

RetinaNet


Backbones (diagrams in images/icon/): ResNet (diagram not shown), DarkNetTiny, DarkNet19, DarkNet53

 Network: RetinaNet
 backbone: ResNet50
 neck: FPN
 loss: Focal Loss
 dataset: voc
 batch_size: 4
 optim: Adam
 lr: 0.0001
 scheduler: WarmupCosineSchedule
 epoch: 80
| epochs | AP (%) | download | Baidu Yun key |
|--------|--------|----------|---------------|
| 80 | 70.1 | Link | dww8 |
visualize

retinanet_voc_visualize


 Network: RetinaNet
 backbone: ResNet50
 neck: FPN
 loss: Focal Loss
 dataset: coco
 batch_size: 4
 optim: Adam
 lr: 0.0001
 scheduler: ReduceLROnPlateau
 patience: 3
 epoch: 30
 pretrained: True
| epochs | AP (%) | download | Baidu Yun key |
|--------|--------|----------|---------------|
| 30 | 29.3 | Link | 5vak |
visualize

retinanet_coco_visualize
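Both RetinaNet configurations use Focal Loss [11] for classification. Below is a minimal sketch of the sigmoid focal loss with the paper's default alpha=0.25 and gamma=2.0; the repo's models/detection/RetinaNet/loss/Loss.py may normalize or reduce differently:

```python
import torch

def focal_loss(logits, targets, alpha: float = 0.25, gamma: float = 2.0):
    """Sigmoid focal loss (Lin et al., 2017) over per-anchor, per-class logits.

    logits, targets: tensors of the same shape; targets are 0/1 labels.
    """
    probs = torch.sigmoid(logits)
    # Standard binary cross-entropy per element.
    ce = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")
    # p_t is the model's probability of the true class.
    p_t = probs * targets + (1.0 - probs) * (1.0 - targets)
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    # Down-weight easy examples by (1 - p_t)^gamma.
    return (alpha_t * (1.0 - p_t) ** gamma * ce).sum()

# Example: 100 anchors x 20 classes of random logits and 0/1 targets.
loss = focal_loss(torch.randn(100, 20), torch.randint(0, 2, (100, 20)).float())
```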


Run

```bash
#!/bin/bash
conda activate base
# XXX is SSD or RetinaNet; XXX.py is train.py, eval_coco.py, eval_voc.py or visualize.py
python /data/PycharmProject/Simple-CV-Pytorch-master/tools/detection/XXX/XXX.py
```

3.semantic segmentation

  • Network architectures to be reproduced:
    • FCN
    • DeepLab
    • U-Net

references

[1] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.

[2] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Advances in neural information processing systems, 2012, 25.

[3] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.

[4] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.

[5] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1492-1500.

[6] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1-9.

[7] Sandler M, Howard A, Zhu M, et al. MobileNetV2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 4510-4520.

[8] Howard A, Sandler M, Chu G, et al. Searching for MobileNetV3[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 1314-1324.

[9] Ma N, Zhang X, Zheng H T, et al. ShuffleNet V2: Practical guidelines for efficient CNN architecture design[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 116-131.

[10] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//European conference on computer vision. Springer, Cham, 2016: 21-37.

[11] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE international conference on computer vision. 2017: 2980-2988.
