
vita's Introduction

VITA: Video Instance Segmentation via Object Token Association (NeurIPS 2022)


Miran Heo*, Sukjun Hwang*, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim (*equal contribution)

[arXiv] [BibTeX]


Updates

  • Jan 20, 2023: Our new online VIS method "GenVIS" is available here!
  • Sep 14, 2022: VITA is accepted to NeurIPS 2022!
  • Aug 15, 2022: Code and pretrained weights are now available! Thanks for your patience :)

Installation

See installation instructions.

Getting Started

We provide a script, train_net_vita.py, that can train all the configs provided in VITA.

To train a model with "train_net_vita.py" on VIS, first set up the corresponding datasets following Preparing Datasets for VITA.

Then run with the COCO-pretrained weights from the Model Zoo:

python train_net_vita.py --num-gpus 8 \
  --config-file configs/youtubevis_2019/vita_R50_bs8.yaml \
  MODEL.WEIGHTS vita_r50_coco.pth

To evaluate a model's performance, use

python train_net_vita.py \
  --config-file configs/youtubevis_2019/vita_R50_bs8.yaml \
  --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
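In both commands, the trailing MODEL.WEIGHTS <path> is not a flag but a KEY VALUE override. Detectron2-based training scripts usually collect such trailing tokens into `args.opts` and apply them with `cfg.merge_from_list(args.opts)`; I assume train_net_vita.py follows the same pattern as Mask2Former's train_net.py. The sketch below only illustrates the pairing logic on a plain nested dict. `apply_overrides` is a hypothetical name, and unlike yacs it does not cast values to the type of the existing key.

```python
from typing import Any, Dict, List


def apply_overrides(cfg: Dict[str, Any], opts: List[str]) -> Dict[str, Any]:
    """Apply a flat [KEY1, VALUE1, KEY2, VALUE2, ...] list to a nested dict.

    Hypothetical helper illustrating the idea behind yacs' merge_from_list;
    values are stored as strings, unlike the real implementation.
    """
    if len(opts) % 2 != 0:
        raise ValueError(f"Overrides must come in KEY VALUE pairs, got: {opts}")
    for key, value in zip(opts[0::2], opts[1::2]):
        node = cfg
        *parents, leaf = key.split(".")
        for name in parents:
            # Walk existing sub-nodes only, e.g. cfg["MODEL"].
            if name not in node:
                raise KeyError(f"Unknown config node: {name}")
            node = node[name]
        if leaf not in node:
            raise KeyError(f"Unknown config key: {key}")
        node[leaf] = value
    return cfg


if __name__ == "__main__":
    cfg = {"MODEL": {"WEIGHTS": ""}}
    apply_overrides(cfg, ["MODEL.WEIGHTS", "vita_r50_coco.pth"])
    print(cfg["MODEL"]["WEIGHTS"])  # vita_r50_coco.pth
```

Rejecting unknown keys mirrors the behavior you would want from a config system: a typo such as MODEL.WEIGHT fails loudly instead of being silently ignored.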

Model Zoo

Pretrained weights on COCO

| Name | R-50 | R-101 | Swin-L |
|------|------|-------|--------|
| VITA | model | model | model |

YouTubeVIS-2019

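| Name | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
|------|----------|----|------|------|-----|------|----------|
| VITA | R-50 | 49.8 | 72.6 | 54.5 | 49.4 | 61.0 | model |
| VITA | Swin-L | 63.0 | 86.9 | 67.9 | 56.3 | 68.1 | model |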

YouTubeVIS-2021

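| Name | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
|------|----------|----|------|------|-----|------|----------|
| VITA | R-50 | 45.7 | 67.4 | 49.5 | 40.9 | 53.6 | model |
| VITA | Swin-L | 57.5 | 80.6 | 61.0 | 47.7 | 62.6 | model |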

OVIS

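| Name | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
|------|----------|----|------|------|-----|------|----------|
| VITA | R-50 | 19.6 | 41.2 | 17.4 | 11.7 | 26.0 | model |
| VITA | Swin-L | 27.7 | 51.9 | 24.9 | 14.9 | 33.0 | model |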
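To make the backbone comparison across the three benchmarks explicit, the snippet below computes the absolute AP gain of Swin-L over R-50 from the Model Zoo values. It is plain arithmetic on the published numbers and adds no new results.

```python
# AP values copied from the Model Zoo tables (R-50 vs. Swin-L).
RESULTS_AP = {
    "YouTubeVIS-2019": {"R-50": 49.8, "Swin-L": 63.0},
    "YouTubeVIS-2021": {"R-50": 45.7, "Swin-L": 57.5},
    "OVIS": {"R-50": 19.6, "Swin-L": 27.7},
}


def backbone_gain(results: dict) -> dict:
    """Absolute AP gain of Swin-L over R-50 per benchmark, rounded to 0.1."""
    return {name: round(ap["Swin-L"] - ap["R-50"], 1) for name, ap in results.items()}


if __name__ == "__main__":
    for name, gain in backbone_gain(RESULTS_AP).items():
        print(f"{name}: +{gain} AP")
```

This gives +13.2, +11.8, and +8.1 AP on YouTubeVIS-2019, YouTubeVIS-2021, and OVIS respectively.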

License

The majority of VITA is licensed under an Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), IFC (Apache-2.0 License), Mask2Former (MIT License), and Deformable-DETR (Apache-2.0 License).

Citing VITA

If you use VITA in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@inproceedings{VITA,
  title={VITA: Video Instance Segmentation via Object Token Association},
  author={Heo, Miran and Hwang, Sukjun and Oh, Seoung Wug and Lee, Joon-Young and Kim, Seon Joo},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

Acknowledgement

Our code is largely based on Detectron2, IFC, Mask2Former, and Deformable DETR. We are truly grateful for their excellent work.

vita's People

Contributors

miranheo, sukjunhwang


vita's Issues

Dataset preparation

What is the purpose of "STEP-2: Prepare annotations for combined data"? Why do we need to use the COCO dataset again for the second round of fine-tuning, even though it was already used for pre-training?

self.training when calling swin attention in object encoder

In vita.py, in the implementation of encode_frame_query, why is the condition "if self.training or layer_idx % 2 == 0"? I think including self.training means that training always falls through to the vanilla window_attn, which makes the Swin (shifted-window) attention ineffective. Although all config files use a clip size of 6 and a window size of 6, this seems wrong to me. Can you explain?
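Taking the condition exactly as quoted in the issue, the sketch below enumerates which branch each encoder layer would take. It only illustrates the boolean logic: `shifted_window_attn` is a placeholder name, not an identifier from vita.py, and whether the shift matters at all when the window size equals the clip length (both 6 in the configs) depends on details of vita.py not shown here.

```python
def attention_branch(training: bool, layer_idx: int) -> str:
    """Mirror the quoted condition: `if self.training or layer_idx % 2 == 0`.

    Branch names are illustrative placeholders, not identifiers from vita.py.
    """
    if training or layer_idx % 2 == 0:
        return "window_attn"
    return "shifted_window_attn"


if __name__ == "__main__":
    for training in (True, False):
        branches = [attention_branch(training, i) for i in range(4)]
        print(f"training={training}: {branches}")
```

As written, every layer takes the unshifted branch while training, and layers alternate only at inference, which is the asymmetry the issue asks about.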

Some training details.

I have some questions about VITA (excellent work!).

  1. What happens when training without COCO joint training? How much improvement does COCO joint training bring?
  2. Performance drops by about 10 points when the detector is frozen. However, in GenVIS, performance seems fine with Mask2Former (M2F) frozen.

Error in detectron2

I set up the environment exactly according to your requirements, but training fails with the following error:

Traceback (most recent call last):
  File "/home/fanghao/anaconda3/envs/vita/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/fanghao/detectron2/detectron2/engine/launch.py", line 126, in _distributed_worker
    main_func(*args)
  File "/home/fanghao/VITA/train_net_vita.py", line 306, in main
    return trainer.train()
  File "/home/fanghao/detectron2/detectron2/engine/defaults.py", line 484, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/fanghao/detectron2/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/fanghao/detectron2/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/home/fanghao/detectron2/detectron2/engine/train_loop.py", line 421, in run_step
    with autocast(dtype=self.precision):
TypeError: __init__() got an unexpected keyword argument 'dtype'

Training works after I delete "dtype=self.precision" in detectron2/detectron2/engine/train_loop.py, but it is not clear whether this has side effects. I suspect the problem comes from a mismatch between the detectron2 version the authors used and the latest detectron2. I suggest re-running the project with the latest detectron2 to check whether you hit the same problem.
If you do, I hope you can update your code to work with the latest detectron2, or bundle your detectron2 version inside the VITA project, as the VNext project does. Alternatively, simply deleting "dtype=self.precision" also works.

Thank you!
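The TypeError means the autocast class detectron2 calls does not accept a `dtype` keyword, which usually points to a PyTorch release older than the one the installed detectron2 targets (as far as I know, `dtype` was added to `torch.cuda.amp.autocast` in PyTorch 1.10; check the release notes for your version). Rather than deleting the argument, a more conservative workaround is to pass `dtype` only when the callable accepts it. The sketch below shows that feature check with `inspect.signature`, using two stand-in classes instead of real PyTorch so it runs anywhere; `make_autocast`, `OldAutocast`, and `NewAutocast` are hypothetical names.

```python
import inspect
from typing import Any, Callable


def make_autocast(autocast_cls: Callable[..., Any], dtype: Any) -> Any:
    """Build an autocast context, passing `dtype` only if it is accepted.

    Hypothetical helper: older autocast implementations reject `dtype`.
    A callable that only takes **kwargs is treated as not accepting it.
    """
    params = inspect.signature(autocast_cls).parameters
    if "dtype" in params:
        return autocast_cls(dtype=dtype)
    return autocast_cls()


class OldAutocast:
    """Stand-in for an autocast that predates the `dtype` argument."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled
        self.dtype = "float16"  # implicit, fixed precision

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False


class NewAutocast(OldAutocast):
    """Stand-in for an autocast that accepts an explicit `dtype`."""

    def __init__(self, enabled: bool = True, dtype: str = "float16") -> None:
        super().__init__(enabled)
        self.dtype = dtype


if __name__ == "__main__":
    for cls in (OldAutocast, NewAutocast):
        with make_autocast(cls, dtype="bfloat16") as ctx:
            print(cls.__name__, ctx.dtype)
```

Whether dropping `dtype` matters depends on the configured precision: if it is float16 (which I believe is the default for detectron2's AMPTrainer), the old-style autocast behaves the same; a bfloat16 setting would silently fall back to float16.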

OVIS Dataset Error

I can successfully train and run inference on YouTubeVIS-2019 and 2021, but it fails on OVIS. Training on OVIS runs without errors, but an error is raised after the final .pth checkpoint is saved and evaluation starts. Here is the error:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
  File "/anaconda3/envs/vita/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/detectron2/detectron2/engine/launch.py", line 123, in _distributed_worker
    main_func(*args)
  File "/VITA/train_net_vita.py", line 301, in main
    return trainer.train()
  File "/detectron2/detectron2/engine/defaults.py", line 484, in train
    super().train(self.start_iter, self.max_iter)
  File "/detectron2/detectron2/engine/train_loop.py", line 165, in train
    self.after_train()
  File "/detectron2/detectron2/engine/train_loop.py", line 174, in after_train
    h.after_train()
  File "/detectron2/detectron2/engine/hooks.py", line 561, in after_train
    self._do_eval()
  File "/detectron2/detectron2/engine/hooks.py", line 529, in _do_eval
    results = self._func()
  File "/detectron2/detectron2/engine/defaults.py", line 453, in test_and_save_results
    self._last_eval_results = self.test(self.cfg, self.model)
  File "/VITA/train_net_vita.py", line 235, in test
    data_loader = cls.build_test_loader(cfg, dataset_name)
  File "/VITA/train_net_vita.py", line 119, in build_test_loader
    return build_detection_test_loader(cfg, dataset_name, mapper=mapper)
  File "/detectron2/detectron2/config/config.py", line 207, in wrapped
    explicit_args = _get_args_from_config(from_config, *args, **kwargs)
  File "/detectron2/detectron2/config/config.py", line 245, in _get_args_from_config
    ret = from_config_func(*args, **kwargs)
  File "/VITA/vita/data/build.py", line 199, in _test_loader_from_config
    dataset = get_detection_dataset_dicts(
  File "/VITA/vita/data/build.py", line 92, in get_detection_dataset_dicts
    dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
  File "/VITA/vita/data/build.py", line 92, in <listcomp>
    dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
  File "/detectron2/detectron2/data/catalog.py", line 58, in get
    return f()
  File "/VITA/vita/data/datasets/ytvis.py", line 294, in <lambda>
    DatasetCatalog.register(name, lambda: load_ytvis_json(json_file, image_root, name))
  File "/VITA/vita/data/datasets/ytvis.py", line 156, in load_ytvis_json
    ytvis_api = YTVOS(json_file)
  File "/VITA/vita/data/datasets/ytvis_api/ytvos.py", line 63, in __init__
    self.createIndex()
  File "/VITA/vita/data/datasets/ytvis_api/ytvos.py", line 71, in createIndex
    for ann in self.dataset['annotations']:
TypeError: 'NoneType' object is not iterable

Could the authors please help? Thanks!
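The last frame shows that self.dataset['annotations'] exists but is None, i.e. the evaluation JSON being loaded contains an "annotations" key with a null value (as can happen for a split whose ground truth is withheld). One option is to check which JSON file the OVIS evaluation dataset is registered with; another, assuming you are willing to patch the index-building step, is to treat a null value as an empty list. `build_video_index` below is a hypothetical stand-in for that step in ytvos.py, not the repository's code.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


def build_video_index(dataset: Dict[str, Any]) -> Dict[int, List[dict]]:
    """Group annotations by video_id, tolerating a null "annotations" field.

    Hypothetical stand-in for the createIndex step in ytvos.py.
    """
    vid_to_anns: Dict[int, List[dict]] = defaultdict(list)
    # `dataset.get(...) or []` maps both a missing key and an explicit null to [].
    for ann in dataset.get("annotations") or []:
        vid_to_anns[ann["video_id"]].append(ann)
    return dict(vid_to_anns)


if __name__ == "__main__":
    valid_json = json.loads('{"videos": [{"id": 1}], "annotations": null}')
    print(build_video_index(valid_json))  # {}
```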

COCO pretrained weight

How were the COCO pretrained weights, such as vita_r50_coco.pth, obtained?
Were they converted directly from the .pkl files provided by the Mask2Former project into .pth files?
Or was the whole VITA model trained on COCO? If so, why is COCO used again when training on YouTube-VIS?

ytvis_2019 submit error

When I submit the result.json file produced by the authors' model, or by my own trained model, to CodaLab (https://codalab.lisn.upsaclay.fr/competitions/7682#participate-submit_results), the following error occurs:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Running command git clone -q https://github.com/youtubevos/cocoapi.git /tmp/pip-install-rtgaknkx/pycocotools
Traceback (most recent call last):
  File "/tmp/codalab/tmppYOyAn/run/program/evaluate.py", line 68, in <module>
    res = gts.loadRes(submit_file)
  File "/opt/conda/lib/python3.7/site-packages/pycocotools/ytvos.py", line 217, in loadRes
    anns = json.load(open(resFile))
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/codalab/tmppYOyAn/run/input/res/results.json'
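The evaluator in the traceback opens res/results.json, so it appears to expect the uploaded archive to contain a file named exactly results.json at its top level. A sketch for packaging predictions that way with Python's standard zipfile module, storing the file as results.json regardless of its local name; the paths and the archive name submission.zip are placeholders.

```python
import json
import os
import tempfile
import zipfile


def package_submission(results_path: str, zip_path: str) -> None:
    """Zip a predictions file so it sits at the archive root as results.json."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname drops the local directories and renames the stored file.
        zf.write(results_path, arcname="results.json")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        results_path = os.path.join(tmp, "result.json")
        with open(results_path, "w") as f:
            json.dump([], f)  # placeholder: an empty prediction list
        zip_path = os.path.join(tmp, "submission.zip")
        package_submission(results_path, zip_path)
        with zipfile.ZipFile(zip_path) as zf:
            print(zf.namelist())  # ['results.json']
```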

How to use multiple GPUs for inference?

I tried to run demo.py on a sequence with 399 frames, but got the following error:
RuntimeError: CUDA out of memory. Tried to allocate 23.72 GiB (GPU 0; 23.70 GiB total capacity; 595.83 MiB already allocated; 20.43 GiB free; 1.63 GiB reserved in total by PyTorch)
How can I use multiple GPUs? Thanks.
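A generic way to bound peak GPU memory on a long video, independent of multi-GPU support, is to run the model on shorter, optionally overlapping clips. This is not a procedure described in the source: VITA associates object tokens across the whole video, so predictions from separate clips would still have to be linked afterwards (the overlap can help with that). The sketch below covers only the frame-index bookkeeping; `chunk_ranges` is a hypothetical helper.

```python
from typing import List, Tuple


def chunk_ranges(num_frames: int, chunk_size: int, overlap: int = 0) -> List[Tuple[int, int]]:
    """Split [0, num_frames) into [start, end) windows of at most chunk_size frames.

    Consecutive windows share `overlap` frames, which can help link instances
    across chunk boundaries afterwards.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if num_frames <= 0:
        return []
    step = chunk_size - overlap
    ranges = []
    start = 0
    while True:
        end = min(start + chunk_size, num_frames)
        ranges.append((start, end))
        if end == num_frames:
            break
        start += step
    return ranges


if __name__ == "__main__":
    print(chunk_ranges(399, 100))  # [(0, 100), (100, 200), (200, 300), (300, 399)]
```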

Demo

Thanks for sharing your work. Can you please provide the correct command to test the model on images and videos? I tested with:

python demo_vita/demo.py --video-input clips/0d40b015f4.mp4 --output out/

but I only get a video without segmentation, plus this line in the terminal:

[ERROR:[email protected]] global obsensor_uvc_stream_channel.cpp:156 getStreamChannelGroup Camera index out of range

Thanks in advance.

about data preparation

Dear Sir,

Thanks for your outstanding work!

Where can I find the convert_coco2ytvis.py file? I did not see it in your repo.
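For context, judging by the file name, such a conversion presumably turns each COCO image into a one-frame video in YouTube-VIS format. The sketch below is a hypothetical illustration of that mapping, not the missing convert_coco2ytvis.py. Field names follow the public YouTube-VIS JSON layout as I understand it (videos with file_names and length; annotations with per-frame segmentations, bboxes, and areas lists), so verify them against the repository's data loader. A real script would likely also remap or filter COCO's 80 categories to the target benchmark's label set; that step is omitted here.

```python
from typing import Any, Dict


def coco_to_ytvis(coco: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a COCO instances dict into YouTube-VIS style, one frame per video.

    Hypothetical sketch: each image becomes a video of length 1, and each
    annotation's per-frame fields are wrapped in single-element lists.
    """
    videos = [
        {
            "id": img["id"],
            "width": img["width"],
            "height": img["height"],
            "length": 1,
            "file_names": [img["file_name"]],
        }
        for img in coco["images"]
    ]
    annotations = [
        {
            "id": ann["id"],
            "video_id": ann["image_id"],  # image id doubles as video id
            "category_id": ann["category_id"],
            "iscrowd": ann.get("iscrowd", 0),
            "segmentations": [ann["segmentation"]],
            "bboxes": [ann["bbox"]],
            "areas": [ann["area"]],
        }
        for ann in coco["annotations"]
    ]
    # Categories are passed through unchanged (no label remapping here).
    return {"videos": videos, "annotations": annotations, "categories": coco["categories"]}


if __name__ == "__main__":
    coco = {
        "images": [{"id": 7, "width": 640, "height": 480, "file_name": "000000000007.jpg"}],
        "annotations": [{"id": 1, "image_id": 7, "category_id": 3,
                         "segmentation": [[0, 0, 10, 0, 10, 10]],
                         "bbox": [0, 0, 10, 10], "area": 50.0}],
        "categories": [{"id": 3, "name": "car"}],
    }
    out = coco_to_ytvis(coco)
    print(out["videos"][0]["file_names"], out["annotations"][0]["bboxes"])
```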
