
t-rex's Introduction

A picture speaks volumes, as do the words that frame it.



Introduction Video 🎥

Turn on the music if possible 🎧

trex2_ongithub.mp4


1. Introduction 📚

Object detection, the ability to locate and identify objects within an image, is a cornerstone of computer vision, pivotal to applications ranging from autonomous driving to content moderation. A notable limitation of traditional object detection models is their closed-set nature: they are trained on a predetermined set of categories and can recognize only those categories. The training process itself is arduous, demanding expert knowledge, extensive datasets, and intricate model tuning to achieve desirable accuracy. Moreover, the introduction of a novel object category exacerbates these challenges, requiring the entire process to be repeated.

T-Rex2 addresses these limitations by integrating both text and visual prompts in one model, thereby harnessing the strengths of both modalities. The synergy of text and visual prompts equips T-Rex2 with robust zero-shot capabilities, making it a versatile tool in the ever-changing landscape of object detection.

What Can T-Rex Do 📝

T-Rex2 is well-suited for a variety of real-world applications, including but not limited to: agriculture, industry, livestock and wildlife monitoring, biology, medicine, OCR, retail, electronics, transportation, logistics, and more. T-Rex2 mainly supports three major workflows: interactive visual prompt, generic visual prompt, and text prompt. Together they cover most application scenarios that require object detection.

workflows.mp4

2. Try Demo 🎮

We have opened an online demo for T-Rex2. Check out our demo here

3. API Usage Examples📚

We are now offering free API access to T-Rex2. For educators, students, and researchers, we offer an API with generous usage quotas to support your educational and research endeavors. You can request API access here: request API.

Setup

Install the API package and acquire the API token from the email.

```bash
git clone https://github.com/IDEA-Research/T-Rex.git
cd T-Rex
pip install dds-cloudapi-sdk==0.1.1
pip install -v -e .
```

Interactive Visual Prompt API

  • In the interactive visual prompt workflow, users provide visual prompts in box or point format on a given image to specify the objects to be detected.

```bash
python demo_examples/interactive_inference.py --token <your_token>
```

  • You should get the following visualization results at demo_vis/
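For reference, a box prompt is commonly expressed as [x1, y1, x2, y2] pixel coordinates and a point prompt as [x, y]. The sketch below shows one plausible way to assemble such a prompt payload in plain Python; the function name and dictionary keys are illustrative assumptions, not the SDK's actual interface — see demo_examples/interactive_inference.py for the real structure.

```python
# Illustrative only: the keys below mirror the general idea of an interactive
# visual prompt (boxes/points drawn on the target image itself); the actual
# dds-cloudapi-sdk payload may differ.

def make_interactive_prompt(image_path, boxes=None, points=None):
    """Assemble a visual prompt: boxes are [x1, y1, x2, y2], points are [x, y]."""
    prompt = {"prompt_image": image_path, "type": "rect" if boxes else "point"}
    if boxes:
        prompt["rects"] = boxes
    if points:
        prompt["points"] = points
    return prompt

prompt = make_interactive_prompt("assets/cats.jpg", boxes=[[10, 20, 120, 200]])
print(prompt["type"])        # rect
print(len(prompt["rects"]))  # 1
```

Because the prompt is drawn on the target image itself, the same image path serves as both the prompt image and the detection target in this workflow.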

Generic Visual Prompt API

  • In the generic visual prompt workflow, users provide visual prompts on one reference image and detect objects on another image.

```bash
python demo_examples/generic_inference.py --token <your_token>
```

  • You should get the following visualization results at demo_vis/

Customize Visual Prompt Embedding API

In this workflow, you can customize a visual embedding for an object category using multiple images. With this embedding, you can detect that category in any image.

```bash
python demo_examples/customize_embedding.py --token <your_token>
```

  • You should get a download link for this visual prompt embedding in safetensors format. Save it and use it for embedding inference.
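Conceptually, the customized embedding aggregates visual features from all example images into a single category-level vector. The toy sketch below illustrates that aggregation with mean pooling over plain Python lists; the real service computes the embedding server-side and returns it as a safetensors file, so this is only a mental model, not the API's implementation.

```python
def mean_pool(embeddings):
    """Average several per-image feature vectors into one category embedding."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(vec[i] for vec in embeddings) / n for i in range(dim)]

# Three made-up 4-d feature vectors for one object category:
feats = [[1.0, 0.0, 2.0, 4.0],
         [3.0, 0.0, 2.0, 0.0],
         [2.0, 3.0, 2.0, 2.0]]
print(mean_pool(feats))  # [2.0, 1.0, 2.0, 2.0]
```

The more example images you supply, the less the pooled embedding is dominated by any single view of the object, which is why this API accepts multiple images per category.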

Embedding Inference API

With the visual prompt embedding generated by the previous API, you can detect objects in any image.

```bash
python demo_examples/embedding_inference.py --token <your_token>
```

4. Local Gradio Demo with API🎨

4.1. Setup

  • Install the T-Rex2 API package if you haven't done so
  • Install gradio and other dependencies

```bash
# install gradio and other dependencies
pip install gradio==4.22.0
pip install gradio-image-prompter
```
4.2. Run the Gradio Demo

```bash
python gradio_demo.py --trex2_api_token <your_token>
```

4.3. Basic Operations

  • Draw Box: Draw a box on the image to specify the object to be detected. Drag the left mouse button to draw a box.
  • Draw Point: Draw a point on the image to specify the object to be detected. Click the left mouse button to draw a point.
  • Interactive Visual Prompt: Provide visual prompts in box or point format on a given image to specify the object to be detected. The Input Target Image and the Interactive Visual Prompt Image should be the same.
  • Generic Visual Prompt: Provide visual prompts on multiple reference images and detect on the other image.
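UI widgets like gradio-image-prompter typically report a box drag as two corner points, which may arrive in any order depending on the drag direction. Normalizing them into a canonical [x1, y1, x2, y2] box (top-left then bottom-right) is a common preprocessing step; the sketch below assumes nothing about the widget's actual event format.

```python
def to_canonical_box(p_start, p_end):
    """Convert two drag corners (in any order) into [x1, y1, x2, y2]
    with x1 <= x2 and y1 <= y2."""
    (xa, ya), (xb, yb) = p_start, p_end
    return [min(xa, xb), min(ya, yb), max(xa, xb), max(ya, yb)]

# A drag from bottom-right to top-left still yields a well-formed box:
print(to_canonical_box((120, 200), (10, 30)))  # [10, 30, 120, 200]
```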

5. Related Works

🔥 We release the training and inference code and demo link of DINOv, which can handle in-context visual prompts for open-set and referring detection & segmentation. Check it out!

BibTeX 📚

@misc{jiang2024trex2,
      title={T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy}, 
      author={Qing Jiang and Feng Li and Zhaoyang Zeng and Tianhe Ren and Shilong Liu and Lei Zhang},
      year={2024},
      eprint={2403.14610},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}


t-rex's Issues

The release of code

Thanks for your impressive work! When do you expect to release the code?

python demo_examples/interactive_inference.py --token <your_token>

Excuse me, when I use your tool like this:
python demo_examples/interactive_inference.py --token <your_token>
what should the token be?
I set it as an image dir, and I got an error like this:

```
Traceback (most recent call last):
  File "demo_examples/interactive_inference.py", line 53, in <module>
    results = trex2.interactve_inference(prompts)
  File "/home/cl/devData2/gitclone/T-Rex/trex/model_wrapper.py", line 79, in interactve_inference
    image=self.client.upload_file(prompt["prompt_image"]),
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/dds_cloudapi_sdk/client.py", line 67, in upload_file
    rsp = requests.post(sign_url, json=data, headers=headers, timeout=2)
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/urllib3/connectionpool.py", line 496, in _make_request
    conn.request(
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/urllib3/connection.py", line 399, in request
    self.putheader(header, value)
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/urllib3/connection.py", line 313, in putheader
    super().putheader(header, *values)
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/http/client.py", line 1229, in putheader
    values[i] = one_value.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 26-27: ordinal not in range(256)
```

Off topic.

GroundingDINO is similar to this work, but it is hard to deploy in real engineering applications, and its inference is not as fast as YOLO-World's. Transformers are difficult to accelerate at inference time, and the cross-modal text-loading approach is cumbersome to speed up.

Difference w.r.t GLEE?

Hi! Great work!

It seems a concurrent work, GLEE, is very similar to yours. Some of the reported results in their paper seem to be even better. Can you elaborate on the differences or compare the results? Thanks!

How to Edit in Annotation on photo after run the model

First: thanks a lot for your great work.
Second: how can I edit the annotations on a photo after running the model? For example, how can I delete the box in red and keep the rest of the boxes?
Third: by any chance, would it be possible to release a desktop version of this awesome project?

Querying Coordinate Output Capabilities in T-Rex Project Web Demo

Hello, I greatly appreciate the work you've put into the T-Rex(2) project; it has been very inspiring to me. I am interested in knowing if it is possible to directly obtain the coordinates of the bounding boxes or the center points when using the web demo for testing. Or, would I need to fork the project and run it on my own to achieve this functionality?

Can T-Rex deal with real-time webcam frames provided by OpenCV and only get the instance mask of the object within my rectangle?

Hi, @Mountchicken and @spacewalk01 !

I've tried the demo of T-Rex on this website https://deepdataspace.com/playground/ivp, and the result is awesome!
(screenshot attached)

I have three questions about T-Rex which are also written in the screenshot above:

  1. Is it possible to only get the instance mask of the object within my rectangle?
  2. Is it compatible with the real-time webcam frames provided by OpenCV, and can it maintain the same intuitive interaction method when dealing with real-time webcam frames?
  3. Can it keep the impressive detection effect even after the object disappears from the real-time video frames and appears again?

Thank you for your wonderful work and looking forward to your reply!

Question about CA44 results

Hi, congratulations on the wonderful T-Rex and T-Rex2 projects!

I'd like to know whether the results for CA44 in the T-Rex paper were reported on the validation set or the test set. Also, do you have the specific results for Figure 7 (the values of MAE and NMAE on CA44)? Many thanks!

License

Hi,
Thanks for releasing this great work! Might it be possible to switch to an open source license?
Thank you!

Release Code

Thanks for your awesome work! When will you release the new version, and will code be available?

Release code

Hello

Congrats, really impressed with your work.

I wanted to ask, is there any plan to release the code and model?

Thanks

API

Dear author:
I have applied for an API. May I ask how long it will take to receive a response?

only one class visual prompt for once

For generic visual prompt mode, it only supports one class of prompt at a time, right? Is it possible to detect different classes simultaneously with multiple prompts for different classes?

ImportError: cannot import name 'BatchPointInfer' from 'dds_cloudapi_sdk'

Thank you for your wonderful work.

I installed trex according to "Setup".
The following error occurred when running "from dds_cloudapi_sdk import BatchRectInfer, BatchPointInfer, BatchEmbdInfer":
ImportError: cannot import name 'BatchPointInfer' from 'dds_cloudapi_sdk' (/home/nvidia/miniconda3/envs/demo/lib/python3.8/site-packages/dds_cloudapi_sdk/__init__.py)

How can I fix it?

Thank you.

code

can you release the training and model code ?

what is inference time?

Hello, I would like to know the inference time per image. Also, how many images can it infer at once for free?

Requesting customize_embedding api failed without any error message

Using the token obtained through email, I can successfully run "python demo_examples/generic_inference.py --token <your_token>", but "python demo_examples/customize_embedding.py --token <your_token>" always fails without printing any error message (screenshot attached).

bad result on unexpected prompt

Thank you for your wonderful work. I tried it on some images and got excellent results on common classes. However, when I prompted it with an uncommon object like 'rock', it detected the wrong objects (common objects in the image; screenshot attached).

Will the T-Rex-1 model file be provided?

As I have been following your work from the beginning with the T-Rex-1 model, all of this work is extremely excellent. Would it be possible to provide the T-Rex-1 model file, so that further development will be much easier?

Could you show me two test samples in FSC147?

(two sample images attached)

1121.jpg and 7621.jpg dominate the final MAE on the FSC-147 dataset. For example, there is a gap of MAE=3000/1190 without '1123.jpg'. I cannot achieve a good result using your demo.

Object tracking in video

Hi, I have read in a closed topic (#19 (comment)) that T-Rex 2 can be applied to videos too, but I don't see how in your demo. Could you please explain how to do that?

Additionally, do you have any suggestions on how to use your excellent object detection for object tracking? Could you suggest a package for object tracking maybe using your COCO results on the first frame as input? Thank you!

Textual Prompting examples

Thank you for making this available.

Is it possible to textually prompt the model? I could not find any examples in the API docs or a way to do it in the online demo.

Cheers

Question about the reported result on FSC147

Thanks for your great work.
I would like to ask whether the evaluation results on FSC147 shown in the report come from models trained on the FSC147 dataset. Also, is the model architecture not going to be published?

Release the inference code and pre-trained models

First of all, congrats on your great work!
I'm very interested in applying your work to few-shot data labeling; it would help a lot in reducing annotation costs. I hope that you can release the inference code and pre-trained models.
Again, great work!

Dataset

If you don't mind, can you tell me which dataset you used?

About demo for T-Rex2

Thanks for your great contribution!
I tried the sample images on the T-Rex2 model and it works well. However, when I upload my own images, T-Rex2 cannot detect anything, not even a single object (please refer to the attached image).
Could you please confirm this matter? Or are there any tips I should pay more attention to?
Thank you for your support in advance:)
