sukjunhwang / vita


VITA: Video Instance Segmentation via Object Token Association (NeurIPS 2022)

License: Apache License 2.0

Python 89.47% Shell 0.11% C++ 1.04% Cuda 9.38%


Miran Heo*, Sukjun Hwang*, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim (*equal contribution)

[arXiv] [BibTeX]

Updates

Jan 20, 2023: Our new online VIS method "GenVIS" is available here!
Sep 14, 2022: VITA is accepted to NeurIPS 2022!
Aug 15, 2022: Code and pretrained weights are now available! Thanks for your patience :)

Installation

See installation instructions.

Getting Started

We provide a script, train_net_vita.py, that trains all the configs provided in VITA.

To train a model with "train_net_vita.py" on VIS, first set up the corresponding datasets following Preparing Datasets for VITA.

Then run with COCO pretrained weights in the Model Zoo:

python train_net_vita.py --num-gpus 8 \
  --config-file configs/youtubevis_2019/vita_R50_bs8.yaml \
  MODEL.WEIGHTS vita_r50_coco.pth

To evaluate a model's performance, use

python train_net_vita.py \
  --config-file configs/youtubevis_2019/vita_R50_bs8.yaml \
  --eval-only MODEL.WEIGHTS /path/to/checkpoint_file

Model Zoo

Pretrained weights on COCO

Name	R-50	R-101	Swin-L
VITA	model	model	model

YouTubeVIS-2019

Name	Backbone	AP	AP50	AP75	AR1	AR10	Download
VITA	R-50	49.8	72.6	54.5	49.4	61.0	model
VITA	Swin-L	63.0	86.9	67.9	56.3	68.1	model

YouTubeVIS-2021

Name	Backbone	AP	AP50	AP75	AR1	AR10	Download
VITA	R-50	45.7	67.4	49.5	40.9	53.6	model
VITA	Swin-L	57.5	80.6	61.0	47.7	62.6	model

OVIS

Name	Backbone	AP	AP50	AP75	AR1	AR10	Download
VITA	R-50	19.6	41.2	17.4	11.7	26.0	model
VITA	Swin-L	27.7	51.9	24.9	14.9	33.0	model

License

The majority of VITA is licensed under an Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), IFC (Apache-2.0 License), Mask2Former (MIT License), and Deformable-DETR (Apache-2.0 License).

Citing VITA

If you use VITA in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@inproceedings{VITA,
  title={VITA: Video Instance Segmentation via Object Token Association},
  author={Heo, Miran and Hwang, Sukjun and Oh, Seoung Wug and Lee, Joon-Young and Kim, Seon Joo},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

Acknowledgement

Our code is largely based on Detectron2, IFC, Mask2Former, and Deformable DETR. We are truly grateful for their excellent work.


vita's Issues

Dataset preparation

Why is "STEP-2: Prepare annotations for combined data" needed? Why do we use the COCO dataset again for the second round of fine-tuning, even though it has already been used for pre-training?

Looking forward to the open-source release of your excellent work.

self.training when calling swin attention in object encoder

In the implementation of encode_frame_query in vita.py, why is the condition "if self.training or layer_idx % 2 == 0"? I think including self.training always routes to the vanilla window_attn during training, which makes the Swin attention ineffective. All config files use a clip size of 6 and a window size of 6, but this still seems wrong to me. Can you explain?
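The condition quoted in this issue can be checked in isolation. The sketch below is not code from the repository: the function names and the "window" / "shifted" labels are hypothetical, and only the boolean expression is taken from the issue text. It assumes the else branch is the shifted-window path, which the issue implies but does not show.

```python
def takes_first_branch(training: bool, layer_idx: int) -> bool:
    """Evaluate the condition exactly as quoted in the issue."""
    return training or layer_idx % 2 == 0


def branch_per_layer(training: bool, num_layers: int) -> list:
    """Label each layer by the branch it would take.

    "window"  = first branch (vanilla window attention, per the issue)
    "shifted" = else branch (assumed here to be the shifted-window path)
    """
    return [
        "window" if takes_first_branch(training, idx) else "shifted"
        for idx in range(num_layers)
    ]


print(branch_per_layer(training=True, num_layers=4))
print(branch_per_layer(training=False, num_layers=4))
```

Under these assumptions, every layer takes the first branch while training, and the branches alternate only in evaluation mode, which is the behavior the question is about.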

About "Maximum number of frames" in Table 4.

Hi @sukjunhwang Excellent work!

How is "Max Frames" in Table 4 calculated? How were the values 2677, 1392, and 741 obtained?

How to get the train.json, val.json of ytvis_2021?

FileNotFoundError: [Errno 2] No such file or directory: '/media/wuhan/disk1/dataset/ytvis_2021/train.json'

Some training details.

I have some questions about the excellent VITA.

How does the model perform when trained without COCO joint training? How much improvement does COCO joint training bring?
The performance drops by about 10 points when the detector is frozen. However, in GenVIS, performance seems fine with Mask2Former (m2f) frozen.

Error in detectron2

I installed the environment exactly according to your requirements, but an error occurred during training:

Traceback (most recent call last):
  File "/home/fanghao/anaconda3/envs/vita/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/fanghao/detectron2/detectron2/engine/launch.py", line 126, in _distributed_worker
    main_func(*args)
  File "/home/fanghao/VITA/train_net_vita.py", line 306, in main
    return trainer.train()
  File "/home/fanghao/detectron2/detectron2/engine/defaults.py", line 484, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/fanghao/detectron2/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/fanghao/detectron2/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/home/fanghao/detectron2/detectron2/engine/train_loop.py", line 421, in run_step
    with autocast(dtype=self.precision):
TypeError: __init__() got an unexpected keyword argument 'dtype'

I can train after deleting "dtype=self.precision" in detectron2/detectron2/engine/train_loop.py, but it is not clear whether this has any side effects. I suspect the problem is caused by a mismatch between the detectron2 version the author used and the current version of detectron2. I suggest running the project again under the latest detectron2 to check whether you hit the same problem.
If you do, I hope you can update your code to work with the latest detectron2, or embed your current detectron2 version into the VITA project, as the VNext project does. Of course, it is also fine to simply delete "dtype=self.precision".

Thank you!

How to get the AP for youtube_vis?

@sukjunhwang

How can I visualize the segmentation results and generate bitmask PNGs for a trained model?

OVIS Dataset Error

Training and inference succeed on YouTubeVIS 2019 and 2021, but fail on OVIS. The OVIS training itself runs without error, but an error is reported when the final .pth file is written and evaluation starts. The error is:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
  File "/anaconda3/envs/vita/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/detectron2/detectron2/engine/launch.py", line 123, in _distributed_worker
    main_func(*args)
  File "/VITA/train_net_vita.py", line 301, in main
    return trainer.train()
  File "/detectron2/detectron2/engine/defaults.py", line 484, in train
    super().train(self.start_iter, self.max_iter)
  File "/detectron2/detectron2/engine/train_loop.py", line 165, in train
    self.after_train()
  File "/detectron2/detectron2/engine/train_loop.py", line 174, in after_train
    h.after_train()
  File "/detectron2/detectron2/engine/hooks.py", line 561, in after_train
    self._do_eval()
  File "/detectron2/detectron2/engine/hooks.py", line 529, in _do_eval
    results = self._func()
  File "/detectron2/detectron2/engine/defaults.py", line 453, in test_and_save_results
    self._last_eval_results = self.test(self.cfg, self.model)
  File "/VITA/train_net_vita.py", line 235, in test
    data_loader = cls.build_test_loader(cfg, dataset_name)
  File "/VITA/train_net_vita.py", line 119, in build_test_loader
    return build_detection_test_loader(cfg, dataset_name, mapper=mapper)
  File "/detectron2/detectron2/config/config.py", line 207, in wrapped
    explicit_args = _get_args_from_config(from_config, *args, **kwargs)
  File "/detectron2/detectron2/config/config.py", line 245, in _get_args_from_config
    ret = from_config_func(*args, **kwargs)
  File "/VITA/vita/data/build.py", line 199, in _test_loader_from_config
    dataset = get_detection_dataset_dicts(
  File "/VITA/vita/data/build.py", line 92, in get_detection_dataset_dicts
    dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
  File "/VITA/vita/data/build.py", line 92, in <listcomp>
    dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
  File "/detectron2/detectron2/data/catalog.py", line 58, in get
    return f()
  File "/VITA/vita/data/datasets/ytvis.py", line 294, in <lambda>
    DatasetCatalog.register(name, lambda: load_ytvis_json(json_file, image_root, name))
  File "/VITA/vita/data/datasets/ytvis.py", line 156, in load_ytvis_json
    ytvis_api = YTVOS(json_file)
  File "/VITA/vita/data/datasets/ytvis_api/ytvos.py", line 63, in __init__
    self.createIndex()
  File "/VITA/vita/data/datasets/ytvis_api/ytvos.py", line 71, in createIndex
    for ann in self.dataset['annotations']:
TypeError: 'NoneType' object is not iterable

Could the author please help me? Thanks!
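The last frame of the traceback shows createIndex iterating over self.dataset['annotations'], so the error means that key is present in the loaded JSON but holds null (None in Python). This would be the case if the OVIS validation file ships without annotations, which is an assumption here and not something the issue confirms. A minimal sketch of a guard follows; annotations_or_empty and index_annotations_by_video are hypothetical helpers, not functions from the repository's ytvos.py.

```python
import json


def annotations_or_empty(dataset: dict) -> list:
    """Return the annotation list, treating a missing or null entry as empty."""
    annotations = dataset.get("annotations")
    return [] if annotations is None else annotations


def index_annotations_by_video(dataset: dict) -> dict:
    """Group annotations by video_id without failing on null annotations."""
    by_video = {}
    for ann in annotations_or_empty(dataset):
        by_video.setdefault(ann["video_id"], []).append(ann)
    return by_video


# A JSON null becomes None after loading, which is what the traceback hit.
unlabeled = json.loads('{"videos": [{"id": 1}], "annotations": null}')
labeled = {"annotations": [{"id": 10, "video_id": 1}, {"id": 11, "video_id": 1}]}

print(index_annotations_by_video(unlabeled))
print(len(index_annotations_by_video(labeled)[1]))
```

A guard like this only avoids the crash. If the split really has no ground truth, AP cannot be computed locally and predictions would have to be scored by the benchmark's evaluation server.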

Dear author, I could not find FPS figures for VITA in your paper. Could you share the FPS at different sizes?

COCO pretrained weight

How can I obtain the COCO pretrained weights, such as vita_r50_coco.pth?
Are they produced by directly converting the pkl file provided by the Mask2Former project into a pth file?
Or by training the whole VITA model on COCO? If so, why train on COCO again when training on YouTube-VIS?

ytvis_2019 submit error

When I submit the results.json file produced by the author's model or by my own trained model to CodaLab (https://codalab.lisn.upsaclay.fr/competitions/7682#participate-submit_results), the following error occurs:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Running command git clone -q https://github.com/youtubevos/cocoapi.git /tmp/pip-install-rtgaknkx/pycocotools
Traceback (most recent call last):
  File "/tmp/codalab/tmppYOyAn/run/program/evaluate.py", line 68, in <module>
    res = gts.loadRes(submit_file)
  File "/opt/conda/lib/python3.7/site-packages/pycocotools/ytvos.py", line 217, in loadRes
    anns = json.load(open(resFile))
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/codalab/tmppYOyAn/run/input/res/results.json'
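The path in the error suggests the evaluation script looks for a file named exactly results.json at the top level of the unpacked submission. A nested folder or a different file name inside the archive would produce this error, although the issue does not say how the archive was built. A minimal sketch of packaging under that assumption follows; make_submission is a hypothetical helper, not part of VITA or CodaLab.

```python
import zipfile
from pathlib import Path


def make_submission(results_path, zip_path):
    """Zip a results file so it sits at the archive root as results.json."""
    results_path = Path(results_path)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # arcname drops any parent directories and fixes the file name.
        archive.write(results_path, arcname="results.json")
    return zip_path


def archive_members(zip_path):
    """List the member names stored in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

Calling archive_members on the archive before uploading is a quick check: under the assumption above, it should return exactly one entry, results.json.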

How to use multiple GPUs for inference?

I tried to test a sequence with 399 frames using demo.py, but got the following error:
RuntimeError: CUDA out of memory. Tried to allocate 23.72 GiB (GPU 0; 23.70 GiB total capacity; 595.83 MiB already allocated; 20.43 GiB free; 1.63 GiB reserved in total by PyTorch)
How can I use multiple GPUs? Thanks.

Demo

Thanks for sharing your work. Could you provide the correct command to test the model on an image and on a video? I tested with:

python demo_vita/demo.py --video-input clips/0d40b015f4.mp4 --output out/

but I only get a video without segmentation, and this line in the terminal:

[ERROR:[email protected]] global obsensor_uvc_stream_channel.cpp:156 getStreamChannelGroup Camera index out of range

Thanks in advance.

about data preparation

Dear Sir,

Thanks for your outstanding work!

I would like to know where to find the convert_coco2ytvis.py file; I did not see it in your repo.
