edgeyolo's Introduction

EdgeYOLO: anchor-free, edge-friendly

1 Intro
2 Updates
3 Coming Soon
4 Models
5 Quick Start
  5.1 setup
  5.2 inference
  5.3 train
  5.4 evaluate
  5.5 export onnx & tensorrt
6 Cite EdgeYOLO
7 Bugs found currently

Intro

  • On embedded devices such as the Nvidia Jetson AGX Xavier, EdgeYOLO reaches 34 FPS with 50.6% AP on the COCO2017 dataset and 25.9% AP on VisDrone2019 (input size 640x640, batch=16, post-processing included). The smaller model, EdgeYOLO-S, reaches 53 FPS with 44.1% AP and 63.3% AP0.5 (SOTA among P5 small models) on COCO2017.
  • We provide a more effective data augmentation method for training.
  • Detection performance on small and medium objects is improved by using RH loss during the last few training epochs.
  • Our pre-print paper is released on arXiv.

Updates

[2023/2/20]

  1. TensorRT cpp inference console demo (lib opencv and qt5 required)
  2. Fix bugs when exporting models using Version 7.X TensorRT

[2023/2/19]

  1. Publish TensorRT int8 export code with Calibration (torch2trt is required)

Coming Soon

  • evaluate.py does not support TensorRT models yet; we will update it in the near future
  • MNN deployment code
  • More different models
  • C++ code for TensorRT inference with UI
  • EdgeYOLO-mask for segmentation task
  • Simple but effective pretrain method

Models

  • models trained on COCO2017-train
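| Model               | Size | mAP val 0.5:0.95 | mAP val 0.5 | FPS (AGX Xavier, trt fp16, batch=16, include NMS) | Params train / infer (M) | Download |
|---------------------|------|------------------|-------------|---------------------------------------------------|--------------------------|----------|
| EdgeYOLO-Tiny-LRELU | 416  | 33.1             | 50.5        | 206                                               | 7.6 / 7.0                | github   |
| EdgeYOLO-Tiny-LRELU | 640  | 37.8             | 56.7        | 109                                               | 7.6 / 7.0                | github   |
| EdgeYOLO-Tiny       | 416  | 37.2             | 55.4        | 136                                               | 5.8 / 5.5                | github   |
| EdgeYOLO-Tiny       | 640  | 41.4             | 60.4        | 67                                                | 5.8 / 5.5                | github   |
| EdgeYOLO-S          | 640  | 44.1             | 63.3        | 53                                                | 9.9 / 9.3                | github   |
| EdgeYOLO-M          | 640  | 47.5             | 66.6        | 46                                                | 19.0 / 17.8              | github   |
| EdgeYOLO            | 640  | 50.6             | 69.8        | 34                                                | 41.2 / 40.5              | github   |
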
  • models trained on VisDrone2019 (pretrained backbone on COCO2017-train)

We use the VisDrone2019-DET dataset in COCO format for training.

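| Model               | Size | mAP val 0.5:0.95 | mAP val 0.5 | Download       |
|---------------------|------|------------------|-------------|----------------|
| EdgeYOLO-Tiny-LRELU | 416  | 12.1             | 22.8        | github         |
| EdgeYOLO-Tiny-LRELU | 640  | 18.5             | 33.6        | github         |
| EdgeYOLO-Tiny       | 416  | 14.9             | 27.3        | github         |
| EdgeYOLO-Tiny       | 640  | 21.8             | 38.5        | github         |
| EdgeYOLO-S          | 640  | 23.6             | 40.8        | github         |
| EdgeYOLO-M          | 640  | 25.0             | 42.9        | github         |
| EdgeYOLO            | 640  | 25.9             | 43.9        | github(legacy) |
| EdgeYOLO            | 640  | 26.4             | 44.8        | github(new)    |
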
Some of our detection results on COCO2017:

[Figure: COCO2017 detection results]

Quick Start

setup

git clone https://github.com/LSH9832/edgeyolo.git
cd edgeyolo
pip install -r requirements.txt

If you use TensorRT, please make sure torch2trt and the TensorRT Development Toolkit (version > 7.1.3.0) are installed.

git clone https://github.com/NVIDIA-AI-IOT/torch2trt.git
cd torch2trt
python setup.py install

or to make sure you use the same version of torch2trt as ours, download here

inference

First download weights here

python detect.py --weights edgeyolo_coco.pth --source XXX.mp4 --fp16

# all options
python detect.py --weights edgeyolo_coco.pth 
                 --source /XX/XXX.mp4     # or dir with images, such as /dataset/coco2017/val2017    (jpg/jpeg, png, bmp, webp is available)
                 --conf-thres 0.25 
                 --nms-thres 0.5 
                 --input-size 640 640 
                 --batch 1 
                 --save-dir ./output/detect/imgs    # if you press "s", the current frame will be saved in this dir
                 --fp16 
                 --no-fuse                # do not reparameterize model
                 --no-label               # do not draw label with class name and confidence
                 --mp                     # use multi-process to show images more smoothly when batch > 1
                 --fps 30                 # max fps limitation, valid only when option --mp is used
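
The option listing above is annotated for reading and cannot be pasted into a shell as-is, because the lines have no continuations and carry inline comments. As a rough sketch, the same invocation can be assembled as an argument list in Python. The helper name `build_detect_command` is hypothetical and not part of this repository; it covers only a subset of the flags listed above and assumes `detect.py` is in the current directory.

```python
import shlex


def build_detect_command(weights, source, conf_thres=0.25, nms_thres=0.5,
                         input_size=(640, 640), batch=1, fp16=False,
                         save_dir=None):
    """Assemble the argument list for one detect.py run (hypothetical helper).

    Only flags that appear in the README listing are emitted.
    """
    cmd = [
        "python", "detect.py",
        "--weights", weights,
        "--source", source,
        "--conf-thres", str(conf_thres),
        "--nms-thres", str(nms_thres),
        # --input-size takes two separate values
        "--input-size", str(input_size[0]), str(input_size[1]),
        "--batch", str(batch),
    ]
    if save_dir is not None:
        cmd += ["--save-dir", save_dir]
    if fp16:
        cmd.append("--fp16")
    return cmd


cmd = build_detect_command("edgeyolo_coco.pth", "XXX.mp4", fp16=True)
# shlex.join quotes each argument so the printed line is shell-safe
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with paths that contain spaces.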

train

  • First, prepare your dataset and create a dataset config file (./params/dataset/XXX.yaml). COCO, VOC, VisDrone and DOTA formats are supported. Make sure the config file contains:

type: "coco"                        # dataset format (lowercase); COCO, VOC, VisDrone and DOTA formats are currently supported
dataset_path: "/dataset/coco2017"   # root dir of your dataset

kwargs:
  suffix: "jpg"        # suffix of your dataset's images
  use_cache: true      # test on i5-12490f: Total loading time: 52s -> 10s(seg enabled) and 39s -> 4s(seg disabled)

train:
  image_dir: "images/train2017"                   # train set image dir
  label: "annotations/instances_train2017.json"   # train set label file(format with single label file) or directory(multi label files)

val:
  image_dir: "images/val2017"                     # evaluate set image dir
  label: "annotations/instances_val2017.json"     # evaluate set label file or directory

test:
  test_dir: "test2017"     # test set image dir (not used in code now, but will)

segmentaion_enabled: true  # whether this dataset has segmentation labels and you are going to use them instead of bbox labels

names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
        'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
        'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
        'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
        'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
        'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
        'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
        'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
        'hair drier', 'toothbrush']    # category names
  • Then edit the training config file ./params/train/train_XXX.yaml
  • Finally, start training:
python train.py --cfg ./params/train/train_XXX.yaml
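
Before starting a long training run, it can help to sanity-check the dataset config. The sketch below works on an already-loaded dict and reports missing entries. The function name `check_dataset_config` is hypothetical, the set of required keys is inferred from the example config above rather than from the training code, and the lowercase type strings other than "coco" are assumptions.

```python
# Keys taken from the example config above; whether the training code
# strictly requires every one of them is an assumption.
REQUIRED_TOP_LEVEL = ("type", "dataset_path", "kwargs", "train", "val", "names")

# Only "coco" is confirmed by the example; the other spellings are assumed.
ASSUMED_TYPES = ("coco", "voc", "visdrone", "dota")


def check_dataset_config(cfg):
    """Return a list of problems found in cfg; an empty list means none."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in cfg:
            problems.append(f"missing key: {key}")

    ds_type = cfg.get("type")
    if ds_type is not None and ds_type not in ASSUMED_TYPES:
        problems.append(f"unexpected dataset type: {ds_type!r}")

    # Each split that is present needs an image directory and a label entry.
    for split in ("train", "val"):
        if split in cfg:
            section = cfg[split] or {}
            for field in ("image_dir", "label"):
                if field not in section:
                    problems.append(f"{split}: missing {field}")
    return problems


example = {
    "type": "coco",
    "dataset_path": "/dataset/coco2017",
    "kwargs": {"suffix": "jpg", "use_cache": True},
    "train": {"image_dir": "images/train2017",
              "label": "annotations/instances_train2017.json"},
    "val": {"image_dir": "images/val2017"},  # label deliberately left out
    "names": ["person", "bicycle", "car"],
}
for problem in check_dataset_config(example):
    print(problem)
```

If PyYAML is available, the dict can be obtained with `yaml.safe_load(open(path))` before calling the check.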

evaluate

python evaluate.py --weights edgeyolo_coco.pth --dataset params/dataset/XXX.yaml --batch 16 --device 0

# all options
python evaluate.py --weights edgeyolo_coco.pth 
                   --dataset params/dataset/XXX.yaml 
                   --batch 16   # batch size for each gpu
                   --device 0
                   --input-size 640 640   # height, width

export onnx & tensorrt

  • ONNX
python export.py --onnx --weights edgeyolo_coco.pth --batch 1

# all options
python export.py --onnx   # or --onnx-only if tensorrt and torch2trt are not installed
                 --weights edgeyolo_coco.pth 
                 --input-size 640 640   # height, width
                 --batch 1
                 --opset 11
                 --no-simplify    # do not simplify this model

it generates

output/export/edgeyolo_coco/640x640_batch1.onnx
  • TensorRT
# fp16
python export.py --trt --weights edgeyolo_coco.pth --batch 1 --workspace 8

# int8
python export.py --trt --weights edgeyolo_coco.pth --batch 1 --workspace 8 --int8 --dataset params/dataset/coco.yaml --num-imgs 1024

# all options
python export.py --trt                       # you can add --onnx and relative options to export both models
                 --weights edgeyolo_coco.pth
                 --input-size 640 640        # height, width
                 --batch 1
                 --workspace 10              # (GB)
                 --no-fp16        # fp16 mode in default, use this option to disable it(fp32)
                 --int8           # int8 mode, the following options are needed for calibration
                 --dataset params/dataset/coco.yaml   # generates calibration images from its val images(upper limit:5120)
                 --train          # use train images instead of val images(upper limit:5120)
                 --all            # use all images(upper limit:5120)
                 --num-imgs 512   # (upper limit:5120)

it generates

(optional) output/export/edgeyolo_coco/640x640_batch1.onnx
output/export/edgeyolo_coco/640x640_batch1_fp16/int8.pt       # for python inference
output/export/edgeyolo_coco/640x640_batch1_fp16/int8.engine   # for c++ inference
output/export/edgeyolo_coco/640x640_batch1_fp16/int8.json     # for c++ inference
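
The output names listed above follow a visible pattern: weights file stem, input size, batch size, and precision. The sketch below reproduces that pattern so a script can locate the exported files. The function name `expected_export_files` is hypothetical, and the naming rule is inferred only from the examples in this README. In particular, the height-by-width order in the file name and the name used for an fp32 export are not confirmed here, so fp32 is not handled.

```python
from pathlib import PurePosixPath


def expected_export_files(weights, input_size=(640, 640), batch=1,
                          precision=None, root="output/export"):
    """Guess the export paths from the README examples (hypothetical helper).

    precision=None -> ONNX export; "fp16" or "int8" -> TensorRT export.
    """
    stem = PurePosixPath(weights).stem          # edgeyolo_coco.pth -> edgeyolo_coco
    height, width = input_size                  # order assumed: height, width
    base = PurePosixPath(root) / stem / f"{height}x{width}_batch{batch}"

    if precision is None:
        return [f"{base}.onnx"]
    if precision not in ("fp16", "int8"):
        raise ValueError("only 'fp16' and 'int8' names appear in the README")
    # .pt is used for Python inference, .engine and .json for C++ inference
    return [f"{base}_{precision}{ext}" for ext in (".pt", ".engine", ".json")]


for path in expected_export_files("edgeyolo_coco.pth", precision="int8"):
    print(path)
```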

Benchmark of TensorRT Int8 Model

  • Environment: TensorRT Version 8.2.5.1, Windows, i5-12490F, RTX 3060 12GB
  • Increasing the workspace size and the number of calibration images may improve performance

COCO2017-TensorRT-int8

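| Int8 Model | Size | Calibration image number | Workspace (GB) | mAP val 0.5:0.95 | mAP val 0.5 | FPS (RTX 3060, trt int8, batch=16, include NMS) |
|------------|------|--------------------------|----------------|------------------|-------------|--------------------------------------------------|
| Tiny-LRELU | 416  | 512                      | 8              | 31.5             | 48.7        | 730                                              |
| Tiny-LRELU | 640  | 512                      | 8              | 36.4             | 55.5        | 360                                              |
| Tiny       | 416  | 512                      | 8              | 34.9             | 53.1        | 549                                              |
| Tiny       | 640  | 512                      | 8              | 39.8             | 59.5        | 288                                              |
| S          | 640  | 512                      | 8              | 42.4             | 61.8        | 233                                              |
| M          | 640  | 512                      | 8              | 45.2             | 64.2        | 211                                              |
| L          | 640  | 512                      | 8              | 49.1             | 68.0        | 176                                              |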
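
To put the int8 numbers in context, the short sketch below compares the mAP 0.5:0.95 values at input size 640 from the Models section with the int8 values in this benchmark. It assumes that the "L" row corresponds to the full EdgeYOLO model. FPS is left out on purpose, since the two tables were measured on different hardware.

```python
# mAP val 0.5:0.95 at input size 640, copied from the two tables in this README.
# Mapping "L" to the full EdgeYOLO model is an assumption.
BASELINE_MAP = {"Tiny-LRELU": 37.8, "Tiny": 41.4, "S": 44.1, "M": 47.5, "L": 50.6}
INT8_MAP = {"Tiny-LRELU": 36.4, "Tiny": 39.8, "S": 42.4, "M": 45.2, "L": 49.1}


def map_drop(baseline, quantized):
    """Return the absolute mAP drop per model, rounded to one decimal place."""
    return {name: round(baseline[name] - quantized[name], 1) for name in baseline}


for name, drop in map_drop(BASELINE_MAP, INT8_MAP).items():
    print(f"{name}: -{drop} mAP")
```

With these figures the drop ranges from 1.4 to 2.3 mAP points, with the M model losing the most.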

for python inference

python detect.py --trt --weights output/export/edgeyolo_coco/640x640_batch1_int8.pt --source XXX.mp4

# all options
python detect.py --trt 
                 --weights output/export/edgeyolo_coco/640x640_batch1_int8.pt 
                 --source XXX.mp4
                 --legacy         # use if images were normalized with "img = img / 255" when the model was trained
                 --use-decoder    # use if the model is an original yolox tensorrt model from before version 0.3.0
                 --mp             # use multi-process to show images more smoothly when batch > 1
                 --fps 30         # max fps limitation, valid only when option --mp is used

for c++ inference

# build
cd cpp/console/linux
mkdir build && cd build
cmake ..
make -j4

# help
./yolo -?
./yolo --help

# run
# ./yolo [json file] [source] [--conf] [--nms] [--loop] [--no-label]
./yolo ../../../../output/export/edgeyolo_coco/640x640_batch1_int8.json ~/Videos/test.avi --conf 0.25 --nms 0.5 --loop --no-label

Cite EdgeYOLO

 @article{edgeyolo2023,
  title={EdgeYOLO: An Edge-Real-Time Object Detector},
  author={Shihan Liu and Junlin Zha and Jian Sun and Zhuo Li and Gang Wang},
  journal={arXiv preprint arXiv:2302.07483},
  year={2023}
}

Bugs found currently

  • Training sometimes raises the following error. Downgrading PyTorch to version 1.8.0 might solve this problem.
File "XXX/edgeyolo/edgeyolo/train/loss.py", line 667, in dynamic_k_matching
_, pos_idx = torch.topk(cost[gt_idx], k=dynamic_ks[gt_idx].item(), largest=False)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
  • For the DOTA dataset, only single-GPU training is supported for now. Please do not train on DOTA in distributed mode, or the model will not be trained correctly.
  • Converting to a TensorRT fp16 model with version 8.4.X.X or higher can sometimes lose a lot of precision. Please use TensorRT version 7.X.X.X or 8.2.X.X.
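
The TensorRT version notes in this README (newer than 7.1.3.0 for setup, 7.X.X.X or 8.2.X.X recommended for fp16 export, possible precision loss from 8.4.X.X) can be turned into a small check. This is only a sketch of those stated rules; the function names are hypothetical, and versions the README does not mention are reported as untested rather than guessed at.

```python
def parse_version(text):
    """Turn a dotted version string such as '8.2.5.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def tensorrt_advice(version):
    """Classify a TensorRT version according to the notes in this README."""
    v = parse_version(version)
    if v <= (7, 1, 3, 0):
        return "too old: the setup section asks for a version above 7.1.3.0"
    if v[:2] >= (8, 4):
        return "caution: fp16 export may lose a lot of precision"
    if v[0] == 7 or v[:2] == (8, 2):
        return "recommended"
    return "untested: not mentioned in this README"


for candidate in ("7.1.3.0", "7.2.3.4", "8.2.5.1", "8.4.1.5"):
    print(candidate, "->", tensorrt_advice(candidate))
```

When the TensorRT Python package is installed, the string to check is typically available as `tensorrt.__version__`.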
