seqtr's People

Contributors

chenwei746, seanzhuh

seqtr's Issues

Source of tokenizer files?

Thanks for your great work!
I am new to VG and want to know the source of several files (ix_to_token.pkl, token_to_ix.pkl, and word_emb.npz) under work_dir/data/annotations/dataset_name/. Did you define these vocabularies and embeddings yourself, or take them from other works?
Thanks again!
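
For anyone poking at the same files, here is a minimal inspection sketch in Python. The dataset name in the paths and the array key printed at the end are assumptions for illustration, not confirmed by the repo:

    import pickle
    import numpy as np

    # "refcoco-unc" is only an example dataset_name under work_dir/data/annotations/
    with open("work_dir/data/annotations/refcoco-unc/token_to_ix.pkl", "rb") as f:
        token_to_ix = pickle.load(f)
    with open("work_dir/data/annotations/refcoco-unc/ix_to_token.pkl", "rb") as f:
        ix_to_token = pickle.load(f)
    emb = np.load("work_dir/data/annotations/refcoco-unc/word_emb.npz")

    # Vocabulary size and the names of the arrays stored in the .npz archive
    print(len(token_to_ix), emb.files)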

Version for packages?

Dear author,
Could you please share the versions you used for each of the following packages:
torch, torchvision, mmdet, and mmcv-full

Thank you so much!

ImportError: Cannot import name 'Config' from 'mmcv'

I am trying to run SeqTR with python=3.7, torch=1.13.1, and torchvision=0.14.1. When I run the training script:

python tools/train.py configs/seqtr/segmentation/seqtr_mask_[DATASET_NAME].py --cfg-options ema=True

I get the import error. After looking into mmcv's issues, they say mmcv>=2.0.0 removed those training-related modules. I tried downgrading mmcv to 1.x but got another error:

AssertionError: MMCV==1.7.1 is used but incompatible. Please install mmcv>=2.0.0rc4, <2.1.0.

So could you tell me exactly which versions of python/torch/torchvision/mmcv I should install to solve these problems?
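
For context (not an official answer): in mmcv/mmcv-full 1.x, Config is a top-level export, while in mmcv 2.x it moved to MMEngine, so the two imports below are mutually exclusive across major versions. The AssertionError quoted above is typically raised by an OpenMMLab 3.x-series package (e.g. mmdet 3.x) checking the installed mmcv, so mmcv and mmdet generally have to be downgraded together.

    # mmcv/mmcv-full 1.x -- the import path SeqTR's code expects:
    from mmcv import Config
    cfg = Config.fromfile("configs/seqtr/detection/seqtr_det_refcoco-unc.py")

    # mmcv >= 2.0 -- Config moved out of mmcv into mmengine:
    # from mmengine.config import Config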

Error in AutoRegressiveTransformer

Hello, I set this code up, but at runtime an error occurs at line 419 of seqtr/core/layers/transformer.py: RuntimeError: The size of the 2D attn_mask is not correct. In other words, the mask generated at line 416 is wrong. How can I solve this? Could you also let me know the environment configuration you used?
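
For reference, this RuntimeError is PyTorch's MultiheadAttention shape check: a 2D attn_mask must have shape (target_len, source_len). A standalone illustration of what triggers it (not SeqTR's actual code):

    import torch

    mha = torch.nn.MultiheadAttention(embed_dim=256, num_heads=8)
    tgt = torch.randn(5, 1, 256)  # (tgt_len, batch, embed_dim)
    mem = torch.randn(7, 1, 256)  # (src_len, batch, embed_dim)

    good_mask = torch.zeros(5, 7, dtype=torch.bool)  # (tgt_len, src_len): accepted
    bad_mask = torch.zeros(5, 5, dtype=torch.bool)   # wrong width for cross-attention

    out, _ = mha(tgt, mem, mem, attn_mask=good_mask)
    # mha(tgt, mem, mem, attn_mask=bad_mask)  # raises: The size of the 2D attn_mask ...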

multi-task

Hi, here are some questions about multi-task:
KeyError: "RefCOCOgUMD: 'GenerateMaskVertices is not in the PIPELINES registry'"
Thank you very much for your project; I look forward to more code and configuration for multi-task.

mixed datasets

Hi, thanks for the awesome work.
Datasets and most annotations can be downloaded normally by following the README, but I did not find the mixed annotations in the provided Google Drive link. Have I missed something? Thanks in advance.

ImportError

Thank you so much for open-sourcing such an excellent project, but I have a problem:
cannot import name 'imshow_expr_bbox' from 'seqtr.core'

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 1.0 and 2.0 branches for each repo:

Repo              OpenMMLab 1.0 branch   OpenMMLab 2.0 branch
MMEngine          -                      0.x
MMCV              1.x                    2.x
MMDetection       0.x, 1.x, 2.x          3.x
MMAction2         0.x                    1.x
MMClassification  0.x                    1.x
MMSegmentation    0.x                    1.x
MMDetection3D     0.x                    1.x
MMEditing         0.x                    1.x
MMPose            0.x                    1.x
MMDeploy          0.x                    1.x
MMTracking        0.x                    1.x
MMOCR             0.x                    1.x
MMRazor           0.x                    1.x
MMSelfSup         0.x                    1.x
MMRotate          1.x                    1.x
MMYOLO            -                      0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

size mismatch for head.transformer.seq_positional_encoding.embedding.weight:

Dear Author,
I am trying to take the RefCOCOg model (pre-trained + fine-tuned SeqTR segmentation), test it on the RefCOCO dataset, and visualize the results.

The command I run is:

python tools/inference.py /home/chch3470/SeqTR/configs/seqtr/segmentation/seqtr_segm_refcoco-unc.py /home/chch3470/SeqTR/work_dir/segm_best.pth --output-dir="/home/chch3470/SeqTR/attention_map_output" --with-gt --which-set="testA"

I get the error below. Do you have any idea why it happens? Is the RefCOCOg (pre-trained + fine-tuned SeqTR segmentation) model based on YOLO or DarkNet? If it is based on YOLO, which configs should we use? Also, should we change the vis_encs (currently the codebase only provides darknet.py for vis_encs)?

I can visualize the provided models for the detection tasks, so I believe my basic setup is correct.

RuntimeError: Error(s) in loading state_dict for SeqTR:
size mismatch for lan_enc.embedding.weight: copying a param with shape torch.Size([12692, 300]) from checkpoint, the shape in current model is torch.Size([10344, 300]).
size mismatch for head.transformer.seq_positional_encoding.embedding.weight: copying a param with shape torch.Size([25, 256]) from checkpoint, the shape in current model is torch.Size([37, 256]).
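
The two mismatched shapes are the language vocabulary size (12692 vs 10344 rows) and the sequence length of the positional encoding (25 vs 37 entries), i.e. the checkpoint and the config were built for different datasets/tasks. As a diagnostic only (not a recommended fix, since skipped layers stay randomly initialized), the generic PyTorch way to load everything else is to filter the state dict first; `model` below is assumed to be the SeqTR model already built from the config:

    import torch

    ckpt = torch.load("work_dir/segm_best.pth", map_location="cpu")
    state = ckpt["state_dict"]
    model_state = model.state_dict()

    # Keep only parameters whose names and shapes match the current model.
    filtered = {k: v for k, v in state.items()
                if k in model_state and v.shape == model_state[k].shape}
    model.load_state_dict(filtered, strict=False)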

Visualization

Hi,

Congratulations!

I want to visualize the attention weights of segmentation points similar to Fig. 5.

According to the paper: "We visualize the cross attention map averaged over decoder layers and attention heads in Fig. 5.", but I am not sure how to incorporate these weights into the original image.

Would you like to share the script or provide a workable idea?

Thanks~
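
One workable idea, as a hedged sketch rather than the authors' script: average the cross-attention maps over layers and heads, reshape to the visual feature grid, upsample to the image size, and alpha-blend. Here attn_weights, feat_h, feat_w, and image are assumed inputs you would collect yourself (e.g. via forward hooks on the decoder):

    import matplotlib.pyplot as plt
    import torch.nn.functional as F

    # attn_weights: (num_layers, num_heads, num_queries, feat_h * feat_w)
    # image: the original HxWx3 numpy array
    attn = attn_weights.mean(dim=(0, 1))          # average over layers and heads
    attn = attn[0].reshape(1, 1, feat_h, feat_w)  # pick one predicted point/query
    attn = F.interpolate(attn, size=image.shape[:2], mode="bilinear",
                         align_corners=False)

    plt.imshow(image)
    plt.imshow(attn[0, 0].cpu().numpy(), alpha=0.5, cmap="jet")
    plt.axis("off")
    plt.savefig("attention_overlay.png", bbox_inches="tight")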

Pretrained model

Great work! Could you please provide the pre-trained model? Thanks!

setting "is_crowd = 1" for multiple masks/ polygons resulting in inaccurate evaluation?

Hi, thanks for sharing the great work. I have a question about the is_crowd flag: why do you need to set it to 1 for multiple masks/polygons when loading the data?
https://github.com/sean-zhuh/SeqTR/blob/36f74bb9da4bcf81775f9f3bb3e54b170860c536/seqtr/datasets/pipelines/loading.py#L126

It looks like, if is_crowd=1, the IoU computation from pycocotools uses a modified criterion that treats the union of gt_mask and pred_mask as pred_mask alone, resulting in a higher number than the standard IoU definition.

https://github.com/sean-zhuh/SeqTR/blob/36f74bb9da4bcf81775f9f3bb3e54b170860c536/seqtr/apis/test.py#L19

(See the note in pycocotools:
https://github.com/cocodataset/cocoapi/blob/8c9bcc3cf640524c4c20a9c40e89cb6a2f2fa0e9/PythonAPI/pycocotools/mask.py#L65)

Do I understand this correctly? Thanks for your help!
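
A small self-contained check of the two pycocotools criteria (this only demonstrates the library behaviour, not whether SeqTR's usage is intentional):

    import numpy as np
    from pycocotools import mask as maskUtils

    gt = np.ones((10, 10), dtype=np.uint8)   # ground truth covers the whole grid
    dt = np.zeros((10, 10), dtype=np.uint8)
    dt[:5] = 1                               # prediction covers half of it

    gt_rle = maskUtils.encode(np.asfortranarray(gt))
    dt_rle = maskUtils.encode(np.asfortranarray(dt))

    print(maskUtils.iou([dt_rle], [gt_rle], [0]))  # iscrowd=0 -> standard IoU: 0.5
    print(maskUtils.iou([dt_rle], [gt_rle], [1]))  # iscrowd=1 -> inter/area(dt): 1.0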

Model weights cannot be downloaded :(

Hi, thanks for sharing your awesome work!
I am trying to download your model weights, but the Baidu site keeps failing.
Could you provide the weights on a different platform, preferably Google Drive?

Thanks in advance!

Memory and BatchSize

Hi, thanks for the wonderful work.
I am curious about why SeqTR is so memory-efficient. As shown in the config file, SeqTR is trained with a batch size of 128 on a single 32GB GPU! However, for object detectors like DETR, the batch size on each GPU is quite limited. Could you please give some insights about this? Thanks in advance.

Bug in "./seqtr/api/train.py", line 94, in the accuracy function

Thanks to the author for providing clear code. When the model is trained with "python tools/train.py configs/seqtr/segmentation/seqtr_mask_[DATASET_NAME].py --cfg-options ema=True", the accuracy function returns only 3 values, which causes training to fail. Judging from the code, "batch_ie" isn't a significant parameter; it seems to be a leftover, so I deleted the code referring to "batch_ie" in "./seqtr/api/train.py" and training then works fine. Could the author describe what "batch_ie" is for?
It would also be nice if the author provided trained weights for the model. Thank you!

Multi-task config files

Hello, I am reading through your code and have two questions.

1. Is multi-task training carried out jointly for the detection and segmentation tasks, or do the two need to be trained separately?

2. The multi-task config files, e.g. configs/seqtr/multi-task/seqtr_multi-task_refcocog-google.py, depend on the config file '../../base/datasets/multi-task/refcocog-google.py', which is not included in this project. Will you release this part of the configuration, or could you tell me how to modify the configs myself?

Training with 16GB

Thanks for sharing your code.
I have 8 GPUs with 16 GB each. How can I train and evaluate SeqTR?

inference api

Thanks for sharing this great work!

How can I perform single-image, single-query inference? I only see scripts for batch testing and training.

Best
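
For what it's worth, another issue in this thread runs tools/inference.py on a single config/checkpoint pair, which may be the closest existing entry point; a command of the same shape (paths here are placeholders):

python tools/inference.py configs/seqtr/detection/seqtr_det_refcoco-unc.py work_dir/det_best.pth --output-dir=vis_out --with-gt --which-set="testA"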

Errors in finetuning

After completing pre-training, I fine-tuned on refcoco-unc and got the following error message:
File "SeqTR/seqtr/utils/checkpoint.py", line 57, in load_pretrained_checkpoint
state, ema_state = ckpt['state_dict'], ckpt['ema_state_dict']
KeyError: 'ema_state_dict'
Even after fixing this, I still hit many errors (e.g. lan_enc.embedding.weight, model.head) in load_pretrained_checkpoint().
Could you please check this?
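
A minimal defensive rewrite of the failing line, assuming the checkpoint is a plain dict (pre-training checkpoints evidently lack the 'ema_state_dict' key):

    # seqtr/utils/checkpoint.py, load_pretrained_checkpoint -- sketch of a guard:
    state = ckpt["state_dict"]
    ema_state = ckpt.get("ema_state_dict")  # None when saved without EMA weights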

Metrics don't match

Hi, the validation metrics in train_log don't seem to match the metrics I get by separately running test.py on the val split.

size mismatch for lan_enc.embedding.weight

Hello, thanks for your work.
I was trying to test this model on a dataset using this command: python tools/test.py configs/seqtr/detection/seqtr_det_refcoco-unc.py --load-from data/weights/weight_2/det_best.pth --cfg-options ema=True. I'm using the RefCOCO (pre-trained + fine-tuned SeqTR detection) weight file for this.
But I keep getting this error :

RuntimeError: Error(s) in loading state_dict for SeqTR:
        size mismatch for lan_enc.embedding.weight: copying a param with shape torch.Size([10344, 300]) from checkpoint, the shape in current model is torch.Size([27, 300]).

Any ideas?

inconsistency error

Hello, may I ask how the inconsistency error between the two separate tasks in Table 6 of the paper was calculated?

Customized dataset?

Hi, thanks for the awesome work.
Could I ask how we can obtain the token_to_ix.pkl, ix_to_token.pkl, and word_emb.npz files needed to build a customized dataset?
Thank you so much!
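
Pending an official answer, here is a hedged sketch of how such files could be produced for a custom dataset. The .npz key name 'emb' and the use of 300-d word vectors are assumptions; `expressions` is a list of referring-expression strings and `glove` a token-to-vector dict you would load yourself:

    import pickle
    import numpy as np

    tokens = sorted({w for expr in expressions for w in expr.lower().split()})
    token_to_ix = {tok: ix for ix, tok in enumerate(tokens)}
    ix_to_token = {ix: tok for tok, ix in token_to_ix.items()}
    # Zero vector for out-of-vocabulary tokens; 300-d to match word_emb's shape.
    word_emb = np.stack([glove.get(tok, np.zeros(300, dtype=np.float32))
                         for tok in tokens])

    with open("token_to_ix.pkl", "wb") as f:
        pickle.dump(token_to_ix, f)
    with open("ix_to_token.pkl", "wb") as f:
        pickle.dump(ix_to_token, f)
    np.savez("word_emb.npz", emb=word_emb)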

weights for flickr30k?

flickr30k's word_emb has shape [14746, 300], but the downloaded weights have shape [10344, 300]. Can you provide weights matching the former?

Boundary values of seq_in

Hello, I'd like to ask you a question:
seq_in[seq_in != self.end].clamp_(min=0, max=self.end-1)
This line clamps the top-left and bottom-right coordinates of the target bbox between a min and a max, provided seq_in != self.num_bin (e.g. self.end = 1000). What happens when seq_in == self.end?
For example, with seq_in = [806, 59, 1000, 233] and self.end = 1000, the 1000 is skipped by the mask and never clamped. Doesn't that then conflict with the target labels [X1, Y1, X2, Y2, 1000]? How should this be resolved?

I'd appreciate an answer when you have time. Many thanks!
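
A standalone illustration of the masked clamp that line performs (rewritten with an explicit assignment, and taking self.end = 1000 from the question), showing that a coordinate equal to self.end escapes the clamp and would collide with the end-of-sequence token:

    import torch

    end = 1000                                   # self.end, the EOS / num_bins token
    seq_in = torch.tensor([806, 59, 1000, 233])  # the example from the question

    mask = seq_in != end
    seq_in[mask] = seq_in[mask].clamp(min=0, max=end - 1)
    print(seq_in)  # tensor([ 806,   59, 1000,  233]) -- the 1000 is never clamped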

About SeqTR's performance on VG

Hello, I noticed that your Tab. 1 and Tab. 2 compare against VG models. I have a couple of questions:

  1. Are the SeqTR (ours) results without an asterisk in Tab. 1 trained on each dataset separately, or trained jointly on multiple datasets?
  2. I noticed that the annotation files you provide seem to be missing a large number of training samples, possibly the RES annotations. For example, gref_umd has only 42,226 training samples, whereas this dataset should have 80k+ training samples for the VG task. Also, looking at your train_log (the log of the "SeqTR detection 81.23 85.00 76.08 model & log" entry), the number of iterations is 300+, which under the batch-size-128 setting indeed corresponds to roughly 40k training samples in total. So on the REC task, does your model train on less than half of TransVG's data?

Looking forward to your reply!

Chinese-language training

Hello, does your model support the REC task with Chinese-language input?
