dynamicdet's Introduction

DynamicDet [arXiv]

This repo contains the official implementation of "DynamicDet: A Unified Dynamic Architecture for Object Detection".

Performance

MS COCO

Model          Easy / Hard   Size   FLOPs    FPS   AP (val)   AP (test)
Dy-YOLOv7      90% / 10%     640    112.4G   110   51.4%      52.1%
Dy-YOLOv7      50% / 50%     640    143.2G   96    52.7%      53.3%
Dy-YOLOv7      10% / 90%     640    174.0G   85    53.3%      53.8%
Dy-YOLOv7      0% / 100%     640    181.7G   83    53.5%      53.9%
Dy-YOLOv7-X    90% / 10%     640    201.7G   98    53.0%      53.3%
Dy-YOLOv7-X    50% / 50%     640    248.9G   78    54.2%      54.4%
Dy-YOLOv7-X    10% / 90%     640    296.1G   65    54.7%      55.0%
Dy-YOLOv7-X    0% / 100%     640    307.9G   64    54.8%      55.0%
Dy-YOLOv7-W6   90% / 10%     1280   384.2G   74    54.9%      55.2%
Dy-YOLOv7-W6   50% / 50%     1280   480.8G   58    55.9%      56.1%
Dy-YOLOv7-W6   10% / 90%     1280   577.4G   48    56.4%      56.7%
Dy-YOLOv7-W6   0% / 100%     1280   601.6G   46    56.5%      56.8%

Table Notes
  • FPS is measured on the same machine with 1 NVIDIA V100 GPU, with batch_size = 1, no_trace and fp16.

  • More results can be found in the paper.
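
The FLOPs column is consistent with a weighted average of two per-image costs: one for images that stop after the first detector and one for images that run both. The sketch below illustrates that arithmetic only. The easy-path cost of 104.7G is not stated in the table; it is inferred here from the 90% / 10% and 0% / 100% rows of Dy-YOLOv7, and `expected_flops` is a hypothetical helper name, not part of the repository.

```python
def expected_flops(easy_fraction, easy_gflops, hard_gflops):
    """Average per-image cost when `easy_fraction` of images take the cheaper path."""
    if not 0.0 <= easy_fraction <= 1.0:
        raise ValueError("easy_fraction must be in [0, 1]")
    return easy_fraction * easy_gflops + (1.0 - easy_fraction) * hard_gflops


# Assumed values: 181.7G is the 0% / 100% row; 104.7G is inferred, not quoted.
EASY_GFLOPS = 104.7
HARD_GFLOPS = 181.7

for easy in (0.9, 0.5, 0.1, 0.0):
    cost = expected_flops(easy, EASY_GFLOPS, HARD_GFLOPS)
    print(f"{easy:.0%} easy: {cost:.1f}G")
```

Under these assumptions the loop reproduces the four Dy-YOLOv7 FLOPs entries to one decimal place, which suggests the column reports an average over the easy/hard split rather than a per-image constant.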

Quick Start

Installation

cd DynamicDet
conda install pytorch=1.11 cudatoolkit=11.3 torchvision -c pytorch
pip install -r requirements.txt

Data preparation

Download the MS COCO dataset images (train, val, test) and labels, and organize them as follows:

├── coco
│   ├── train2017.txt
│   ├── val2017.txt
│   ├── test-dev2017.txt
│   ├── images
│   │   ├── train2017
│   │   ├── val2017
│   │   ├── test2017
│   ├── labels
│   │   ├── train2017
│   │   ├── val2017
│   ├── annotations
│   │   ├── instances_val2017.json
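
Before starting a long training run it can help to confirm that the layout above is in place. The following is a minimal sketch of such a check; `missing_coco_entries` is a hypothetical helper, and the `coco` root path passed to it is an assumption, since the tree does not show where `coco` sits relative to the repository.

```python
from pathlib import Path

# Entries copied from the directory tree above.
EXPECTED = [
    "train2017.txt",
    "val2017.txt",
    "test-dev2017.txt",
    "images/train2017",
    "images/val2017",
    "images/test2017",
    "labels/train2017",
    "labels/val2017",
    "annotations/instances_val2017.json",
]


def missing_coco_entries(root):
    """Return the expected files and directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_coco_entries("coco")  # assumed location of the dataset root
    print("layout OK" if not missing else f"missing: {missing}")
```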

Training

Step1: Training cascaded detector

  • Single GPU training

    python train_step1.py --workers 8 --device 0 --batch-size 16 --epochs 300 --img 640 --cfg cfg/dy-yolov7-step1.yaml --weight '' --data data/coco.yaml --hyp hyp/hyp.scratch.p5.yaml --name dy-yolov7-step1
  • Multiple GPU training (OURS, RECOMMENDED 🚀)

    python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 train_step1.py --workers 8 --device 0,1,2,3,4,5,6,7 --sync-bn --batch-size 128 --epochs 300 --img 640 --cfg cfg/dy-yolov7-step1.yaml --weight '' --data data/coco.yaml --hyp hyp/hyp.scratch.p5.yaml --name dy-yolov7-step1

Step2: Training adaptive router

python train_step2.py --workers 4 --device 0 --batch-size 1 --epochs 2 --img 640 --adam --cfg cfg/dy-yolov7-step2.yaml --weight runs/train/dy-yolov7-step1/weights/last.pt --data data/coco.yaml --hyp hyp/hyp.finetune.dynamic.adam.yaml --name dy-yolov7-step2

Getting the dynamic thresholds for variable-speed inference

python get_dynamic_thres.py --device 0 --batch-size 1 --img-size 640 --cfg cfg/dy-yolov7-step2.yaml --weight weights/dy-yolov7.pt --data data/coco.yaml --task val

Testing

python test.py --img-size 640 --batch-size 1 --conf 0.001 --iou 0.65 --device 0 --cfg cfg/dy-yolov7-step2.yaml --weight weights/dy-yolov7.pt --data data/coco.yaml --dy-thres <DY_THRESHOLD>

Inference

python detect.py --cfg cfg/dy-yolov7-step2.yaml --weight weights/dy-yolov7.pt --num-classes 80 --source <IMAGE/VIDEO> --device 0 --dy-thres <DY_THRESHOLD>

Citation

If you find this repo useful in your research, please consider citing the following paper:

@inproceedings{lin2023dynamicdet,
  title={DynamicDet: A Unified Dynamic Architecture for Object Detection},
  author={Lin, Zhihao and Wang, Yongtao and Zhang, Jinhe and Chu, Xiaojie},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}

License

This project is free for academic research use only; commercial use requires authorization. For commercial licensing, please contact [email protected].

dynamicdet's Issues

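A question about variable-speed inference

Dear author,

[image]
Eq. (14) gives the proportion of images classified as "hard". Suppose it is 0.75, meaning 75% of the images are classified as "hard". In that case, shouldn't
[image]
compute the 25% quantile, so that 75% of the images have a difficulty score above the resulting threshold? Passing 0.75 directly into the percentile function does not look right to me.
[image]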
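
To make the question concrete, here is a minimal sketch of how the threshold depends on the quantile chosen. It assumes that each image gets a difficulty score where higher means harder, and that an image is routed as "hard" when its score exceeds the threshold. Neither convention is confirmed by the text here, and both `percentile` and `threshold_for_hard_fraction` are hypothetical helpers, not functions from the repository.

```python
def percentile(values, q):
    """q-th percentile (q in 0..100) with linear interpolation between order statistics."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def threshold_for_hard_fraction(scores, hard_fraction):
    """Threshold such that roughly `hard_fraction` of the scores lie above it."""
    return percentile(scores, 100.0 * (1.0 - hard_fraction))


scores = [i / 100 for i in range(100)]  # synthetic scores 0.00 .. 0.99
thres = threshold_for_hard_fraction(scores, 0.75)
share_hard = sum(s > thres for s in scores) / len(scores)
print(thres, share_hard)
```

With these synthetic scores, the 25th percentile leaves 75% of the images above the threshold, while the 75th percentile would leave only 25% above it. If the score or the comparison is defined the other way round (for example, lower scores meaning harder), the 75th percentile would be the right choice, so the answer depends on the convention used in the code.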

Error when running python train_step2.py

Hello, I ran into the following error when training on my own data.
Running python train_step2.py fails with:
Traceback (most recent call last):
  File "train_step2.py", line 551, in <module>
    train(hyp, opt, device, tb_writer)
  File "train_step2.py", line 160, in train
    optimizer.load_state_dict(ckpt['optimizer'])
  File "/opt/conda/lib/python3.8/site-packages/torch/optim/optimizer.py", line 141, in load_state_dict
    raise ValueError("loaded state dict has a different number of "
ValueError: loaded state dict has a different number of parameter groups
The error occurs when loading the model produced by train_step1.py.
The command was: python train_step2.py --weight runs/train/exp5/weights/last.pt --name dy-yolov7-step2
Tracing the code, I found that pg0 is empty, so the new optimizer has only 2 parameter groups, whereas the optimizer state saved by train_step1.py has 3:
pg0, pg1, pg2 = [], [], []  # optimizer parameter groups
for k, v in model.named_modules():
    if 'router' in k:
        if hasattr(v, 'bias') and isinstance(v.bias, nn.Parameter):
            pg2.append(v.bias)  # biases
        if isinstance(v, nn.BatchNorm2d):
            pg0.append(v.weight)  # no decay
        elif hasattr(v, 'weight') and isinstance(v.weight, nn.Parameter):
            pg1.append(v.weight)  # apply decay

if opt.adam:
    optimizer = optim.AdamW(pg1, lr=hyp['lr0'], weight_decay=hyp['weight_decay'], betas=(hyp['momentum'], 0.999))  # adjust beta1 to momentum
else:
    optimizer = optim.SGD(pg1, lr=hyp['lr0'], weight_decay=hyp['weight_decay'], momentum=hyp['momentum'], nesterov=True)
if len(pg0):
    optimizer.add_param_group({'params': pg0, 'weight_decay': 0})  # add pg0 without weight_decay
if len(pg2):
    optimizer.add_param_group({'params': pg2, 'weight_decay': 0})  # add pg2 (biases)
logger.info('Optimizer groups: %g .bias, %g conv.weight, %g other' % (len(pg2), len(pg1), len(pg0)))
del pg0, pg1, pg2

Thank you!
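
One way to see the mismatch before `load_state_dict` raises is to compare the parameter-group layout of the two optimizer states. The sketch below works on plain dicts shaped like the output of `optimizer.state_dict()` (a `param_groups` list whose entries each carry a `params` list), so it runs without PyTorch. `optimizer_state_is_compatible` is a hypothetical helper, and the toy dicts are made up to mirror the 3-group versus 2-group situation described above.

```python
def optimizer_state_is_compatible(current_state, saved_state):
    """Check that two optimizer state dicts have the same parameter-group layout."""
    current_groups = current_state["param_groups"]
    saved_groups = saved_state["param_groups"]
    if len(current_groups) != len(saved_groups):
        return False  # the case reported in the traceback above
    return all(
        len(cur["params"]) == len(old["params"])
        for cur, old in zip(current_groups, saved_groups)
    )


# Toy stand-ins: step 1 saved three groups, step 2 builds only two.
step1_state = {"state": {}, "param_groups": [{"params": [0, 1]}, {"params": [2]}, {"params": [3]}]}
step2_state = {"state": {}, "param_groups": [{"params": [0]}, {"params": [1]}]}

print(optimizer_state_is_compatible(step2_state, step1_state))
```

In a training script, a check like this could gate the call to `optimizer.load_state_dict(ckpt['optimizer'])`, so that a fresh optimizer is used when the layouts differ. Whether skipping the saved optimizer state is the intended behavior for train_step2.py is something for the maintainers to confirm.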

The implementation of Dy-YOLOv7 is developed by the YOLOv7 [45] framework, with two identical detectors. The implementation of dynamic two-stage detectors is developed by the open-source CBNet [24] framework, with two identical backbones and a shared neck and head.

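[image]
Dear author, what does this sentence mean? Is it that Dy-YOLOv7 follows the YOLOv7 framework and contains two identical YOLOv7 detectors, while the dynamic two-stage detector follows the CBNet architecture, with two identical backbones and a shared neck and head? If so, the dynamic two-stage detector does not seem to match the DynamicDet architecture.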

Can it be extended to anchor-free detectors?

Dear Author:
Hello, first of all, thank you for your excellent work.
You say “This dynamic architecture can be easily adapted to mainstream detectors, e.g., Faster R-CNN and YOLO”. I was wondering whether it could also be extended to anchor-free detectors such as CornerNet and CenterNet.

Why do you use segmentation labels instead of bounding boxes in the COCO dataset?

Hello,
This is indeed a fascinating piece of work, and I appreciate the outstanding contribution you have made to the community. However, I have a question. I noticed that the labels in this repository use segmentation instead of bounding boxes. Could you explain the reason behind this choice? If my dataset is labeled with bounding boxes, would it still be compatible? Moreover, were the performances of other methods compared in the article also trained with segmentation labels? To my knowledge, both v7 and v5 have used bounding box labels.

cuda10.2

Hello, does the environment have to use CUDA 11 or later?

Evaluation metrics explanation

Hi Team,

I was wondering how you are computing the metrics for evaluation. I was going through metrics.py file and came across ap_per_class function which seems to be computing the average precision for each class in an image. (FYI - my custom dataset only has 1 class with a lot of objects of that class in a single image)
I wanted to understand what *stats is (the parameter passed in the function) in test.py? And how does it help in being able to assign a predicted class to a ground truth?

Also,
I wanted to know how you are associating a particular predicted class with a ground truth? Is it solely based on the highest iou values? If yes, what if you assign a ground truth to a particular predicted class (and eliminate it from the iteration once it is assigned) and find a higher iou to another predicted class further down the iteration?

Thanks!
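
As an illustration of the ordering concern raised above, here is a generic sketch of one-to-one IoU matching for the single-class case. It is not taken from metrics.py or test.py, and the repository may match predictions differently; `box_iou`, `match_by_iou`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are all assumptions made for the example. Candidate pairs are visited from highest IoU down, so a ground truth is never claimed by a weaker prediction when a better one exists.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_by_iou(preds, gts, iou_thres=0.5):
    """Map prediction index -> ground-truth index, each used at most once."""
    pairs = sorted(
        ((box_iou(p, g), pi, gi) for pi, p in enumerate(preds) for gi, g in enumerate(gts)),
        reverse=True,  # best overlaps first
    )
    used_preds, used_gts, matches = set(), set(), {}
    for iou, pi, gi in pairs:
        if iou < iou_thres:
            break  # all remaining pairs overlap even less
        if pi in used_preds or gi in used_gts:
            continue
        matches[pi] = gi
        used_preds.add(pi)
        used_gts.add(gi)
    return matches


gts = [(0, 0, 10, 10)]
preds = [(0, 0, 10, 12), (0, 0, 10, 10)]  # the second prediction fits better
print(match_by_iou(preds, gts))
```

Here the ground truth goes to the second prediction even though the first one appears earlier in the list, and the first prediction is left unmatched.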

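A question about Fig. 7

[image]
Dear author,
In Figure 7, the routers are trained with three different optimization strategies. Why do they have the same inference time when running with batch size 1? Or is it that the network's inference time and the easy/hard split ratio are held fixed, and only the change in AP is observed?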

Config for custom data

Hello,
My dataset has only one object class. I have modified the coco.yaml file accordingly to set the data paths, nc = 1, and my class name (which is represented by 0 in the txt files).

Apart from that, do I need to alter any other cfg/hyp files?
I see cfg/dy-yolov7-step1.yaml has an nc variable which I think I would change. Any other changes that I’d need to make?

Continue training model on new data

Hello!
I have a fully trained model (75 epochs) on a certain dataset and I tested it on another dataset. I now want to further train my model on this test dataset. Should I just run:

python train_step1.py --weights 'path/to/best_75_epochs.pt' (Or should the training be continued using last.pt?)

Can I run this on fewer epochs or should it be the same as best.pt? And will this new model be considered to be trained with 75 epochs + new training epochs or just new training epochs on new data?

Also, train_step2.py will then use this new model's last.pt weights, right?

Thanks!

Early stopping or using epoch_xx.pt

Hello,

Is there an early stopping mechanism for DynamicDet? Alternatively, could I use intermediate weights in runs/train/weights, such as epoch_xx.pt, for train_step2 and inference?

Thanks.

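May I ask what the prospects and practical value of this work are?

First, can I understand this work as designing a network with variable computational cost, one that uses fewer compute resources on easy images and more on hard images?
If so, then on a resource-constrained device there will not be enough compute as soon as a hard image comes in. And on a device with ample compute, since so much compute is available, wouldn't a more complex model process both hard and easy images at about the same, fairly fast, speed?
So what is the practical value? Is it to shave off a few milliseconds on devices with ample compute? Or is it that, for example, a server providing remote detection for others could previously run only two instances of a complex model, but with this kind of model could run three, that is, one more?
I hope you can answer. Thank you!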
