Giter Club home page Giter Club logo

deco's Introduction

DECO

DECO: Query-Based End-to-End Object Detection with ConvNets

Xinghao Chen*, Siwei Li*, Yijing Yang, Yunhe Wang (*Equal Contribution)

arXiv 2023

[arXiv] [BibTeX]

Updates

  • 2024/02/04: Pre-trained models and codes of DECO are released both in Pytorch and Mindspore.

Overview

Detection ConvNet (DECO) is a simple yet effective query-based end-to-end object detection framework, which is composed of a backbone and convolutional encoder-decoder architecture. Our DECO model enjoys the similar favorable attributes as DETR. We compare the proposed DECO against prior detectors on the challenging COCO benchmark. Despite its simplicity, our DECO achieves competitive performance in terms of detection accuracy and running speed. Specifically, with the ResNet-50 and ConvNeXt-Tiny backbone, DECO obtains 38.6% and 40.8% AP on COCO val set with 35 and 28 FPS respectively. We hope the proposed DECO brings another perspective for designing object detection framework.


Main Results

Here we provide the pretrained DECO weights.

Backbone Pretrain Epochs Queries AP (%) Download
R-50 ImageNet-1k 150 100 38.8 deco_r50_150e

Installation

pip install torch==1.8.0 torchvision==0.9.0
pip install pycocotools
pip install timm

Training

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --backbone resnet50 --batch_size 2 --coco_path {your_path_to_coco} --output_dir {your_path_for_outputs} # 4 gpus example

By default, we use 4 GPUs with total batch size as 8 for training DECO with ResNet-50 backbone.

Evaluation

Model evaluation can be done as follows:

python eval.py --backbone resnet50 --batch_size 1 --coco_path {your_path_to_coco} --ckpt_path {your_path_to_pretrained_ckpt}
Results of DECO with ResNet-50 backbone:
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.388
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.588
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.411
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.199
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.431
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.555
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.320
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.522
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.556
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.297
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.607
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.798

Citation

If you find our work helpful for your research, please consider citing the following BibTeX entry.

@misc{chen2023deco,
      title={DECO: Query-Based End-to-End Object Detection with ConvNets}, 
      author={Xinghao Chen and Siwei Li and Yijing Yang and Yunhe Wang},
      year={2023},
      eprint={2312.13735},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

deco's People

Contributors

xinghaochen avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

deco's Issues

Some question about Table 1 and Table 2

Your work is excellent, but there are two questions about the paper:

  • Question 1. Are all the comparison algorithms in Table 1 only half-trained epochs, such as 150 epochs for fair comparison?

  • Question 2. Are all the comparison algorithms in Table 2 only train 150 epochs?

About DecoDecoderLayer

What do query pos and tgt stand for?

class DecoDecoderLayer(nn.Module):
    ……
    def forward(self, tgt, memory, query_pos: Optional[Tensor] = None):
        # SIM
        b, d, h, w = memory.shape
        tgt2 = tgt + query_pos

About DecoderLayer

Excuse me,how Decoder handles multi-scale features that we get from the CCFM(RT-DETR) of Encoder?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.