MS-DETR

This repository is the official implementation of "MS-DETR: Efficient DETR Training with Mixed Supervision"

Authors: Chuyang Zhao, Yifan Sun, Wenhao Wang, Qiang Chen, Errui Ding, Yi Yang, Jingdong Wang

Implementations

  • The Paddle version of MS-DETR will be available in PaddleDetection.

  • The PyTorch implementation of MS-DETR architecture (a) (Figure 4 in main paper) is available here.

Introduction

DETR accomplishes end-to-end object detection through iteratively generating multiple object candidates based on image features and promoting one candidate for each ground-truth object. The traditional training procedure using one-to-one supervision in the original DETR lacks direct supervision for the object detection candidates.

We aim to improve DETR training efficiency by explicitly supervising the candidate generation procedure through a mix of one-to-one and one-to-many supervision. Our approach, MS-DETR, is simple: it places one-to-many supervision on the object queries of the primary decoder that is used for inference. In comparison to existing DETR variants with one-to-many supervision, such as Group DETR and Hybrid DETR, our approach does not need additional decoder branches or object queries. The object queries of the primary decoder in our approach directly benefit from one-to-many supervision and thus are superior in object candidate prediction. Experimental results show that our approach outperforms related DETR variants, such as DN-DETR, Hybrid DETR, and Group DETR, and that combining it with related DETR variants further improves performance.
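
The difference between the two kinds of supervision can be illustrated with a minimal sketch in plain Python. It is not the repository's implementation: the score matrix, the `top_k` and `threshold` values, and both function names are assumptions chosen for illustration, and the brute-force one-to-one matcher stands in for the Hungarian matching that DETR-style detectors typically use.

```python
from itertools import permutations


def one_to_one_assign(scores):
    """Give each ground-truth object exactly one query, maximizing the total score.

    scores[q][g] is a hypothetical matching score between query q and
    ground-truth object g. Brute force over permutations, so only suitable
    for tiny inputs.
    """
    num_queries = len(scores)
    num_gt = len(scores[0])
    best_total, best = float("-inf"), None
    for queries in permutations(range(num_queries), num_gt):
        total = sum(scores[q][g] for g, q in enumerate(queries))
        if total > best_total:
            best_total, best = total, queries
    return {g: [q] for g, q in enumerate(best)}


def one_to_many_assign(scores, top_k=2, threshold=0.4):
    """Give each ground-truth object up to top_k queries scoring at least threshold."""
    num_gt = len(scores[0])
    assignment = {}
    for g in range(num_gt):
        ranked = sorted(range(len(scores)), key=lambda q: scores[q][g], reverse=True)
        assignment[g] = [q for q in ranked[:top_k] if scores[q][g] >= threshold]
    return assignment


# 4 queries (rows) x 2 ground-truth objects (columns), made-up scores.
scores = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.7],
    [0.3, 0.3],
]
o2o = one_to_one_assign(scores)
o2m = one_to_many_assign(scores)
print(o2o)  # {0: [0], 1: [2]}
print(o2m)  # {0: [0, 1], 1: [2]}
```

In this toy example, one-to-one supervision labels only queries 0 and 2 as positives, while one-to-many supervision also labels query 1 as a positive for the first object. That extra positive is the kind of direct signal for candidate generation that the paragraph above describes.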

Model Zoo

We provide checkpoints for the following models:

Name     Baseline           Backbone  Queries  Epochs  mAP   Download
MS-DETR  Deformable-DETR    R50       300      12      47.6  model
MS-DETR  Deformable-DETR++  R50       300      12      48.8  model
MS-DETR  Deformable-DETR++  R50       900      12      50.0  model

Installation

We tested our code with Python 3.10, PyTorch 1.12.1, and CUDA 11.3. Please install PyTorch first, following the official instructions.

  1. Clone the repository.
git clone https://github.com/Atten4Vis/MS-DETR.git
cd MS-DETR
  2. Install dependencies.
pip install -r requirements.txt
  3. Compile MSDeformAttn CUDA operators.
cd models/ops
python setup.py build install

Data

We use the COCO-2017 dataset for training and evaluation. Please download and organize the dataset as follows:

coco_path/
  ├── train2017/
  ├── val2017/
  └── annotations/
  	├── instances_train2017.json
  	└── instances_val2017.json
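
Before launching a run, the layout above can be checked with a short script. This is a minimal sketch and not part of the repository: the function name `missing_coco_entries` and the constant `EXPECTED_ENTRIES` are hypothetical, and the check only confirms that the listed entries exist, not that their contents are valid.

```python
from pathlib import Path

# Entries taken from the directory tree above, relative to the dataset root.
EXPECTED_ENTRIES = (
    "train2017",
    "val2017",
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
)


def missing_coco_entries(coco_path):
    """Return the expected entries that are absent under coco_path."""
    root = Path(coco_path)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


# Replace "coco_path" with the directory that is passed to --coco_path.
missing = missing_coco_entries("coco_path")
if missing:
    print("Missing entries: " + ", ".join(missing))
else:
    print("Dataset layout looks complete.")
```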

Run

Training

Train MS-DETR with 8 GPUs based on Deformable-DETR:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 python -u main.py \
   --output_dir $EXP_DIR \
   --with_box_refine \
   --two_stage \
   --dim_feedforward 2048 \
   --epochs 12 \
   --lr_drop 11 \
   --coco_path=$coco_path \
   --num_queries 300 \
   --use_ms_detr \
   --use_aux_ffn \
   --cls_loss_coef 1 \
   --o2m_cls_loss_coef 2

Other training scripts are available in ./scripts.
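
The two coefficient flags in the command above suggest that the one-to-one and one-to-many classification losses are combined as a weighted sum, as is common in DETR-style codebases. The sketch below illustrates that reading only; it is an assumption about what the flags mean, and the function name, the dictionary keys, and the loss values are all hypothetical.

```python
def combine_losses(losses, weights):
    """Weighted sum over the loss terms that have a weight; others are ignored."""
    return sum(weights[name] * value for name, value in losses.items() if name in weights)


# Assumed to mirror --cls_loss_coef 1 and --o2m_cls_loss_coef 2.
weights = {"o2o_cls": 1.0, "o2m_cls": 2.0}
# Made-up per-term loss values for one training step.
losses = {"o2o_cls": 0.5, "o2m_cls": 0.25}

total = combine_losses(losses, weights)
print(total)  # 1.0
```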

Evaluation

Evaluate MS-DETR with 8 GPUs:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 python -u main.py \
    --coco_path=$coco_path \
    --with_box_refine \
    --two_stage \
    --dim_feedforward 2048 \
    --num_queries 300 \
    --use_ms_detr \
    --use_aux_ffn \
    --resume $EXP_DIR/checkpoint.pth \
    --eval

Acknowledgement

Our code is based on Deformable-DETR, H-DETR, and DETA. We thank the authors for their great work.

Citation

If you use MS-DETR in your research or wish to refer to the baseline results published here, please use the following BibTeX entry.

@article{zhao2024ms,
  title={MS-DETR: Efficient DETR Training with Mixed Supervision},
  author={Zhao, Chuyang and Sun, Yifan and Wang, Wenhao and Chen, Qiang and Ding, Errui and Yang, Yi and Wang, Jingdong},
  journal={arXiv preprint arXiv:2401.03989},
  year={2024}
}
