
t-rex's Introduction

A picture speaks volumes, as do the words that frame it.



Introduction Video 🎥

Turn on the music if possible 🎧

trex2_ongithub.mp4


1. Introduction 📚

Object detection, the ability to locate and identify objects within an image, is a cornerstone of computer vision, pivotal to applications ranging from autonomous driving to content moderation. A notable limitation of traditional object detection models is their closed-set nature: they are trained on a predetermined set of categories and can recognize only those categories. The training process itself is arduous, demanding expert knowledge, extensive datasets, and intricate model tuning to achieve desirable accuracy. Moreover, the introduction of a novel object category exacerbates these challenges, requiring the entire process to be repeated.

T-Rex2 addresses these limitations by integrating both text and visual prompts in one model, thereby harnessing the strengths of both modalities. The synergy of text and visual prompts equips T-Rex2 with robust zero-shot capabilities, making it a versatile tool in the ever-changing landscape of object detection.

What Can T-Rex Do 📝

T-Rex2 is well-suited for a variety of real-world applications, including but not limited to: agriculture, industry, livestock and wildlife monitoring, biology, medicine, OCR, retail, electronics, transportation, logistics, and more. T-Rex2 mainly supports three major workflows: interactive visual prompt, generic visual prompt, and text prompt. Together they cover most application scenarios that require object detection.

workflows.mp4

2. Try Demo 🎮

We have opened an online demo for T-Rex2. Check out our demo here

3. API Usage Examples📚

We are now offering free API access to T-Rex2. For educators, students, and researchers, we offer an API with generous usage quotas to support your educational and research endeavors. You can request API access here: request API.

Setup

Install the API package and acquire the API token from the email.

```bash
git clone https://github.com/IDEA-Research/T-Rex.git
cd T-Rex
pip install dds-cloudapi-sdk==0.1.1
pip install -v -e .
```

Interactive Visual Prompt API

  • In the interactive visual prompt workflow, users provide visual prompts in box or point format on a given image to specify the objects to be detected.

```bash
python demo_examples/interactive_inference.py --token <your_token>
```

  • You should get the following visualization results at demo_vis/
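For reference, a box prompt is commonly expressed as [x1, y1, x2, y2] pixel coordinates and a point prompt as [x, y]. The sketch below shows one plausible way to assemble such a prompt payload in plain Python; the function name and dictionary keys are illustrative assumptions, not the SDK's actual interface — see demo_examples/interactive_inference.py for the real structure.

```python
# Illustrative only: the keys below mirror the general idea of an interactive
# visual prompt (boxes/points drawn on the target image itself); the actual
# dds-cloudapi-sdk payload may differ.

def make_interactive_prompt(image_path, boxes=None, points=None):
    """Assemble a visual prompt: boxes are [x1, y1, x2, y2], points are [x, y]."""
    prompt = {"prompt_image": image_path, "type": "rect" if boxes else "point"}
    if boxes:
        prompt["rects"] = boxes
    if points:
        prompt["points"] = points
    return prompt

prompt = make_interactive_prompt("assets/cats.jpg", boxes=[[10, 20, 120, 200]])
print(prompt["type"])        # rect
print(len(prompt["rects"]))  # 1
```

Because the prompt is drawn on the target image itself, the same image path serves as both the prompt image and the detection target in this workflow.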

Generic Visual Prompt API

  • In the generic visual prompt workflow, users provide visual prompts on one reference image and detect objects on another image.

```bash
python demo_examples/generic_inference.py --token <your_token>
```

  • You should get the following visualization results at demo_vis/

Customize Visual Prompt Embedding API

In this workflow, you can customize a visual embedding for an object category using multiple images. With this embedding, you can detect that category in any image.

```bash
python demo_examples/customize_embedding.py --token <your_token>
```

  • You should get a download link for this visual prompt embedding in safetensors format. Save it and use it for embedding inference.
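Conceptually, the customized embedding aggregates visual features from all example images into a single category-level vector. The toy sketch below illustrates that aggregation with mean pooling over plain Python lists; the real service computes the embedding server-side and returns it as a safetensors file, so this is only a mental model, not the API's implementation.

```python
def mean_pool(embeddings):
    """Average several per-image feature vectors into one category embedding."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(vec[i] for vec in embeddings) / n for i in range(dim)]

# Three made-up 4-d feature vectors for one object category:
feats = [[1.0, 0.0, 2.0, 4.0],
         [3.0, 0.0, 2.0, 0.0],
         [2.0, 3.0, 2.0, 2.0]]
print(mean_pool(feats))  # [2.0, 1.0, 2.0, 2.0]
```

The more example images you supply, the less the pooled embedding is dominated by any single view of the object, which is why this API accepts multiple images per category.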

Embedding Inference API

With the visual prompt embedding generated by the previous API, you can detect objects in any image.

```bash
python demo_examples/embedding_inference.py --token <your_token>
```

4. Local Gradio Demo with API🎨

4.1. Setup

  • Install the T-Rex2 API package if you haven't done so
  • Install gradio and other dependencies

```bash
# install gradio and other dependencies
pip install gradio==4.22.0
pip install gradio-image-prompter
```
4.2. Run the Gradio Demo

```bash
python gradio_demo.py --trex2_api_token <your_token>
```

4.3. Basic Operations

  • Draw Box: Draw a box on the image to specify the object to be detected. Drag the left mouse button to draw a box.
  • Draw Point: Draw a point on the image to specify the object to be detected. Click the left mouse button to draw a point.
  • Interactive Visual Prompt: Provide visual prompts in box or point format on a given image to specify the object to be detected. The Input Target Image and the Interactive Visual Prompt Image should be the same.
  • Generic Visual Prompt: Provide visual prompts on multiple reference images and detect on the other image.
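UI widgets like gradio-image-prompter typically report a box drag as two corner points, which may arrive in any order depending on the drag direction. Normalizing them into a canonical [x1, y1, x2, y2] box (top-left then bottom-right) is a common preprocessing step; the sketch below assumes nothing about the widget's actual event format.

```python
def to_canonical_box(p_start, p_end):
    """Convert two drag corners (in any order) into [x1, y1, x2, y2]
    with x1 <= x2 and y1 <= y2."""
    (xa, ya), (xb, yb) = p_start, p_end
    return [min(xa, xb), min(ya, yb), max(xa, xb), max(ya, yb)]

# A drag from bottom-right to top-left still yields a well-formed box:
print(to_canonical_box((120, 200), (10, 30)))  # [10, 30, 120, 200]
```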

5. Related Works

🔥 We release the training and inference code and demo link of DINOv, which can handle in-context visual prompts for open-set and referring detection & segmentation. Check it out!

BibTeX 📚

@misc{jiang2024trex2,
      title={T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy}, 
      author={Qing Jiang and Feng Li and Zhaoyang Zeng and Tianhe Ren and Shilong Liu and Lei Zhang},
      year={2024},
      eprint={2403.14610},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}


t-rex's Issues

The release of code

Thanks for your impressive work! When do you expect to release the code?

python demo_examples/interactive_inference.py --token <your_token>

Excuse me, when I use your tool like this:
python demo_examples/interactive_inference.py --token <your_token>
what should the token be?
I set it as an image dir, and I got an error like this:

```
Traceback (most recent call last):
  File "demo_examples/interactive_inference.py", line 53, in <module>
    results = trex2.interactve_inference(prompts)
  File "/home/cl/devData2/gitclone/T-Rex/trex/model_wrapper.py", line 79, in interactve_inference
    image=self.client.upload_file(prompt["prompt_image"]),
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/dds_cloudapi_sdk/client.py", line 67, in upload_file
    rsp = requests.post(sign_url, json=data, headers=headers, timeout=2)
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/urllib3/connectionpool.py", line 496, in _make_request
    conn.request(
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/urllib3/connection.py", line 399, in request
    self.putheader(header, value)
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/site-packages/urllib3/connection.py", line 313, in putheader
    super().putheader(header, *values)
  File "/home/cl/devData2/anaconda3/envs/trex/lib/python3.8/http/client.py", line 1229, in putheader
    values[i] = one_value.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 26-27: ordinal not in range(256)
```

Off topic.

GroundingDINO is similar to this work, but it is hard to deploy in real engineering applications, and its inference is not as fast as YOLO-World's. Transformers are difficult to accelerate at inference time, and the cross-modal text-loading approach is cumbersome to speed up.

Difference w.r.t GLEE?

Hi! Great work!

It seems a concurrent work, GLEE, is very similar to yours. Some of the reported results in their paper seem to be even better. Can you elaborate on the differences or compare the results? Thanks!

How to Edit in Annotation on photo after run the model

First: thanks a lot for your great work.
Second: how can I edit the annotations on a photo after running the model? For example, how can I delete the box in red and keep the rest of the boxes?
Third: by any chance, would it be possible to release a desktop version of this awesome project?

Querying Coordinate Output Capabilities in T-Rex Project Web Demo

Hello, I greatly appreciate the work you've put into the T-Rex(2) project; it has been very inspiring to me. I am interested in knowing if it is possible to directly obtain the coordinates of the bounding boxes or the center points when using the web demo for testing. Or, would I need to fork the project and run it on my own to achieve this functionality?

Can T-Rex deal with real-time webcam frames provided by OpenCV and only get the instance mask of the object within my rectangle?

Hi, @Mountchicken and @spacewalk01 !

I've tried the demo of T-Rex on this website https://deepdataspace.com/playground/ivp, and the result is awesome!
(screenshot attached)

I have three questions about T-Rex which are also written in the screenshot above:

  1. Is it possible to only get the instance mask of the object within my rectangle?
  2. Is it compatible with the real-time webcam frames provided by OpenCV, and can it maintain the same intuitive interaction method when dealing with real-time webcam frames?
  3. Can it keep the impressive detection effect even after the object disappears from the real-time video frames and appears again?

Thank you for your wonderful work and looking forward to your reply!

Question about CA44 results

Hi, congratulations on the wonderful T-Rex and T-Rex2 projects!

I'd like to know whether the results for CA44 in the T-Rex paper were reported on the validation set or the test set. Also, do you have the specific results for Figure 7 (the values of MAE and NMAE on CA44)? Many thanks!

License

Hi,
Thanks for releasing this great work! Might it be possible to switch to an open source license?
Thank you!

Release Code

Thanks for your awesome work! When will you release the new version, and will code be available?

Release code

Hello

Congrats, really impressed with your work.

I wanted to ask, is there any plan to release the code and model?

Thanks

API

Dear author:
I have applied for an API. May I ask how long it will take to receive a response?

only one class visual prompt for once

For generic visual prompt mode, it only supports one class of prompt at a time, right? Is it possible to detect different classes simultaneously with multiple prompts for different classes?

ImportError: cannot import name 'BatchPointInfer' from 'dds_cloudapi_sdk'

Thank you for your wonderful work.

I installed trex according to "Setup".
The following error occurred when running "from dds_cloudapi_sdk import BatchRectInfer, BatchPointInfer, BatchEmbdInfer":
ImportError: cannot import name 'BatchPointInfer' from 'dds_cloudapi_sdk' (/home/nvidia/miniconda3/envs/demo/lib/python3.8/site-packages/dds_cloudapi_sdk/__init__.py)

How can I fix it?

Thank you.

code

can you release the training and model code ?

what is inference time?

Hello, I would like to know the inference time per image. Also, how many images can it infer at once for free?

Requesting customize_embedding api failed without any error message

Using the token obtained through email, I can successfully run "python demo_examples/generic_inference.py --token <your_token>", but "python demo_examples/customize_embedding.py --token <your_token>" always fails without printing any error message (screenshot attached).

bad result on unexpected prompt

Thank you for your wonderful work. I tried it on some images and got excellent results on common classes. However, when I prompted it with an uncommon object like 'rock', it detected the wrong objects (common objects in the image; screenshot attached).

Will the T-Rex-1 model file be provided?

As I have been following your work from the beginning with the T-Rex-1 model, all of this work is extremely excellent. Would it be possible to provide the T-Rex-1 model file, so that further development will be much easier?

Could you show me two test samples in FSC147?

(two sample images attached)

1121.jpg and 7621.jpg dominate the final MAE on the FSC-147 dataset. For example, there is a gap of MAE=3000/1190 without '1123.jpg'. I cannot achieve a good result using your demo.

Object tracking in video

Hi, I have read in a closed topic (#19 (comment)) that T-Rex 2 can be applied to videos too, but I don't see how in your demo. Could you please explain how to do that?

Additionally, do you have any suggestions on how to use your excellent object detection for object tracking? Could you suggest a package for object tracking maybe using your COCO results on the first frame as input? Thank you!

Textual Prompting examples

Thank you for making this available.

Is it possible to textually prompt the model? I could not find any examples in the API docs or a way to do it in the online demo.

Cheers

Question about the reported result on FSC147

Thanks for your great work.
I would like to ask whether the evaluation results on FSC147 shown in the report come from models trained on the FSC147 dataset. Also, is the model architecture not going to be published?

Release the inference code and pre-trained models

First of all, congrats on your great work!
I'm very interested in applying your work to few-shot data labeling; it would help a lot in reducing annotation costs. I hope that you can release the inference code and pre-trained models.
Again, great work!

Dataset

If you don't mind, can you tell me which dataset you used?

About demo for T-Rex2

Thanks for your great contribution!
I tried the sample images on the T-Rex2 model and it works well. However, when I upload my own images, T-Rex2 cannot detect anything, not even a single object (please refer to the attached image).
Could you please confirm this matter? Or are there any tips I should pay more attention to?
Thank you for your support in advance:)
