
unilseg's Introduction

Universal Segmentation at Arbitrary Granularity with Language Instruction

Yong Liu, Cairong Zhang, Yitong Wang, Jiahao Wang, Yujiu Yang, Yansong Tang

This repository contains the official implementation of "Universal Segmentation at Arbitrary Granularity with Language Instruction" [CVPR 2024].

Paper


📖 Abstract

This paper aims to achieve universal segmentation at arbitrary semantic levels. Despite significant progress in recent years, specialist segmentation approaches remain limited to specific tasks and data distributions. Retraining a new model to adapt to new scenarios or settings incurs expensive computation and time costs, which raises the demand for a versatile and universal segmentation model that can cater to various granularities. Although some attempts have been made to unify different segmentation tasks or to generalize to various scenarios, limitations in the definition of paradigms and input-output spaces make it difficult for them to accurately understand content at arbitrary granularity. To this end, we present UniLSeg, a universal segmentation model that can perform segmentation at any semantic level under the guidance of language instructions. To train UniLSeg, we reorganize a group of tasks from their original diverse distributions into a unified data format, where images with texts describing the segmentation targets are the input and the corresponding masks are the output. Combined with an automatic annotation engine that exploits numerous unlabeled data, UniLSeg achieves excellent performance on various tasks and settings, surpassing both specialist and other unified segmentation models.


📖 Pipeline

We have open-sourced the general inference code and the UniLSeg-20 model weights (without fine-tuning on task-specific datasets). If you find any bugs caused by carelessness on our part in organizing the code, feel free to contact us and point them out!

Installation

Install the required packages:

conda create -n UniLSeg python=3.7
conda activate UniLSeg
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge -y
pip install -r requirements.txt
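
As a quick sanity check (our suggestion, not part of the official instructions), you can verify that the expected PyTorch build is installed and that CUDA is visible:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# Expected output on a machine with a CUDA 11.3-compatible driver: 1.10.1 True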

Usage

  • Pretrained Weights

    We have provided the pretrained UniLSeg-20 model weights (without fine-tuning on task-specific datasets) as well as other pre-trained backbone weights. Please download them from here and put them under the current path.

General Inference

You can run general inference with the following command:

python general_inference.py  --img <IMG_PATH> --exp <'EXPRESSION'> --sp <MASK_SAVE_PATH>
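
For example, a hypothetical invocation (the image path, expression, and save path below are placeholders for illustration, not files shipped with this repo) could look like:

python general_inference.py --img ./examples/dog.jpg --exp 'the dog on the left' --sp ./outputs/dog_mask.png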

Cite

If you find our work helpful, we'd appreciate it if you could cite our paper in your work.

@article{liu2023universal,
  title={Universal Segmentation at Arbitrary Granularity with Language Instruction},
  author={Liu, Yong and Zhang, Cairong and Wang, Yitong and Wang, Jiahao and Yang, Yujiu and Tang, Yansong},
  journal={arXiv preprint arXiv:2312.01623},
  year={2023}
}


unilseg's Issues

I get poor results

Is there anything I need to adjust?
Why are the results I get so different from yours?
(result screenshots attached)

Code question

  if not self.cfg.aux_loss:
      pred = torch.bmm(query_output, pixel_output.flatten(2)) 
      pred = rearrange(pred, 'b l (h w) -> b l h w', h=h, w=w)   
  else:
      for l, q in enumerate(query_output):
          final_output = []
          pred = torch.bmm(query_output[l], pixel_output.flatten(2))
          pred = rearrange(pred, 'b l (h w) -> b l h w', h=h, w=w)
          final_output.append(pred)
  return pred.detach()

Why is only the final pred returned here? What is the purpose of the other five preds?
Could you please resolve my confusion? Thank you very much!
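
For readers hitting the same question: a minimal sketch of how per-layer predictions are usually accumulated for auxiliary supervision (our guess at the intended logic, not the authors' confirmed code) would move final_output outside the loop so that every layer's prediction is kept:

import torch
from einops import rearrange

def decode_preds(query_output, pixel_output, h, w, aux_loss=True):
    # query_output: per-layer query embeddings, each of shape (B, L, C)
    # pixel_output: visual features of shape (B, C, H, W)
    if not aux_loss:
        pred = torch.bmm(query_output[-1], pixel_output.flatten(2))  # (B, L, H*W)
        return rearrange(pred, 'b l (h w) -> b l h w', h=h, w=w)
    final_output = []  # accumulated across ALL layers, not reset per iteration
    for q in query_output:
        pred = torch.bmm(q, pixel_output.flatten(2))
        pred = rearrange(pred, 'b l (h w) -> b l h w', h=h, w=w)
        final_output.append(pred)
    return final_output  # each entry can receive an auxiliary loss during training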

AP for PartSeg

Hi @workforai et al.,

thanks for your CVPR'24 work. For part segmentation, may I ask if the conventional AP metric (apart from the IoU reported in the paper) could be reported as well? Looking forward to the code & checkpoint. Thanks & best,

Segmenting all objects in an image

I noticed that the bpe_simple_vocab_16e6.txt.gz file does not contain prompts such as "object" or "all objects". If I want to segment all objects in an image, what prompt should I use?

Evaluation on Semantic and Open-Vocabulary Segmentation

Thank you for your outstanding work! The model seems to be more tailored towards Referring Image Segmentation, and I'm still somewhat confused about testing for Semantic Segmentation (SS) and Open-Vocabulary Segmentation (OVS). Although the paper mentions that "Semantic segmentation and open-vocabulary segmentation can be reformulated as language-guided paradigm by replacing output layers with computing the similarity between visual and linguistic embeddings," the process still appears unclear to me.

From what I understand, the model seems to output a mask by calculating the similarity between the activated visual features and content-aware linguistic embedding. However, I'm unsure how this is evaluated in SS or OVS. Here's my guess:

For example, in Open-Vocabulary Segmentation, for a given image, we need to identify which categories are present (say, M categories). Then, for each category, the similarity calculation is performed between the activated visual features and content-aware linguistic embedding, ultimately outputting M masks. These masks are then merged to create the final semantic segmentation map.

Could you please confirm if this understanding is correct? If not, could you provide more details on how the model operates for these tasks?

Thank you for your assistance!
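
To make the guess above concrete, here is a minimal sketch of the merging step it describes (the function name and threshold are our assumptions for illustration, not the repo's API): run the model once per category prompt, stack the M mask score maps, and take a pixel-wise argmax to form the final semantic map.

import torch

def merge_category_masks(mask_scores, score_threshold=0.5):
    # mask_scores: (M, H, W) tensor, one sigmoid score map per category prompt
    scores, labels = mask_scores.max(dim=0)      # best-scoring category per pixel
    semantic_map = labels.clone()
    semantic_map[scores < score_threshold] = -1  # -1 marks background / no match
    return semantic_map

# Hypothetical usage: masks[i] = model(image, text=category_names[i])
# semantic_map = merge_category_masks(torch.stack(masks))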
