OmDet-Turbo

[Paper 📄] [Model 🗂️]

Fast and accurate open-vocabulary end-to-end object detection


πŸ—“οΈ Updates

  • 03/25/2024: Inference code and a pretrained OmDet-Turbo-Tiny model released.
  • 03/12/2024: GitHub open-source project created.

🔗 Related Works

If you are interested in our research, we welcome you to explore our other wonderful projects.

🔆 How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection (AAAI24)  🏠Github Repository

🔆 OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network (IET Computer Vision)


📖 Introduction

This repository is the official PyTorch implementation for OmDet-Turbo, a fast transformer-based open-vocabulary object detection model.

⭐️Highlights

  1. OmDet-Turbo is a transformer-based real-time open-vocabulary detector that combines strong open-vocabulary detection (OVD) capabilities with fast inference. It targets efficient detection in open-vocabulary scenarios while maintaining high detection performance.
  2. We introduce the Efficient Fusion Head, a fast multimodal fusion module designed to lighten the computational burden on the encoder and reduce the time consumed by the head with ROI.
  3. The OmDet-Turbo-Base model achieves state-of-the-art zero-shot performance on the ODinW and OVDEval datasets, with AP scores of 30.1 and 26.86, respectively.
  4. The inference speed of OmDet-Turbo-Base on the COCO val2017 dataset reaches 100.2 FPS on an A100 GPU.

For more details, check out our paper, Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head.

[Figure: model_structure]


⚡️ Inference Speed

Comparison of inference speeds for each component in the tiny-size model.

[Figure: speed]


πŸ› οΈ How To Install

Follow the Installation Instructions to set up the environment for OmDet-Turbo.


🚀 How To Run

  1. Download our pretrained model and the CLIP checkpoints.
  2. Create a folder named resources and put the downloaded models into it.
  3. Run run_demo.py; images with the predicted results are saved to the ./outputs folder.
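The steps above can be wrapped in a small pre-flight check. This is a minimal sketch, not part of the repository: the helper names `check_layout` and `run_demo` are hypothetical, and it assumes run_demo.py is launched from the repository root with no arguments. It does not check for specific checkpoint file names, since those are not listed here.

```python
from pathlib import Path
import subprocess
import sys


def check_layout(root: Path) -> list[str]:
    """Return a list of problems with the folder layout described above."""
    problems = []
    resources = root / "resources"
    if not resources.is_dir():
        problems.append("missing folder: resources")
    elif not any(resources.iterdir()):
        problems.append("resources is empty: put the downloaded models here")
    if not (root / "run_demo.py").is_file():
        problems.append("missing script: run_demo.py")
    return problems


def run_demo(root: Path) -> int:
    """Run run_demo.py if the layout looks right; return its exit code."""
    problems = check_layout(root)
    if problems:
        for problem in problems:
            print(problem, file=sys.stderr)
        return 1
    # Assumption: the script takes no arguments and is run from the repo root.
    return subprocess.run([sys.executable, "run_demo.py"], cwd=root).returncode
```

Calling `run_demo(Path.cwd())` from the repository root reports anything missing before the demo starts, which is cheaper than waiting for a failed checkpoint load.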

run_demo.py already uses a language cache during inference. For more details, open and read the run_demo.py script.
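As a rough illustration of what a language cache buys, the sketch below memoizes text embeddings so that each distinct label is encoded only once across calls. It is a generic pattern under stated assumptions, not the code in run_demo.py: `LanguageCache` and `toy_encoder` are hypothetical names, and the toy encoder stands in for a real text encoder.

```python
from typing import Callable, Dict, List, Sequence


class LanguageCache:
    """Memoize text embeddings so each distinct label is encoded once."""

    def __init__(self, encode: Callable[[str], List[float]]) -> None:
        self._encode = encode  # hypothetical text encoder
        self._cache: Dict[str, List[float]] = {}

    def get(self, labels: Sequence[str]) -> List[List[float]]:
        embeddings = []
        for label in labels:
            if label not in self._cache:
                # Only labels not seen before reach the encoder.
                self._cache[label] = self._encode(label)
            embeddings.append(self._cache[label])
        return embeddings


calls: List[str] = []


def toy_encoder(text: str) -> List[float]:
    # Stand-in for a real text encoder; records every call it receives.
    calls.append(text)
    return [float(len(text))]


cache = LanguageCache(toy_encoder)
cache.get(["cat", "dog"])
cache.get(["dog", "person"])  # "dog" is served from the cache
print(len(calls))  # prints 3
```

When the same label set is reused across many images, the text encoder runs once per label rather than once per image.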


📦 Model Zoo

Performance on COCO and LVIS is evaluated under the zero-shot setting.

Model             Backbone  Pre-Train Data  COCO  LVIS  FPS (pytorch/trt)  Weight
OmDet-Turbo-Tiny  Swin-T    O365,GoldG      42.5  30.3  21.5/140.0         weight
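To read the FPS column as per-image latency, divide 1000 ms by the frame rate. The short calculation below does this for the two figures in the table; it assumes "trt" refers to a TensorRT deployment and that both numbers were measured under comparable conditions, neither of which is stated here.

```python
def latency_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per image."""
    return 1000.0 / fps


pytorch_fps, trt_fps = 21.5, 140.0  # values from the Model Zoo table

print(f"PyTorch: {latency_ms(pytorch_fps):.1f} ms/image")  # 46.5 ms/image
print(f"trt:     {latency_ms(trt_fps):.1f} ms/image")      # 7.1 ms/image
print(f"speedup: {trt_fps / pytorch_fps:.1f}x")            # 6.5x
```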

πŸ“ Main Results

[Figure: main_result]


Citation

Please consider citing our papers if you use our projects:

@article{zhao2024real,
  title={Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head},
  author={Zhao, Tiancheng and Liu, Peng and He, Xuan and Zhang, Lu and Lee, Kyusong},
  journal={arXiv preprint arXiv:2403.06892},
  year={2024}
}
@article{zhao2024omdet,
  title={OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network},
  author={Zhao, Tiancheng and Liu, Peng and Lee, Kyusong},
  journal={IET Computer Vision},
  year={2024},
  publisher={Wiley Online Library}
}
