
afb-urr's Introduction

Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

This repository is the official implementation of Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement (NeurIPS 2020). It is designed for the semi-supervised video object segmentation (VOS) task.

[NeurIPS Page] [Paper] [Supplementary]

Paper corrections: Our feature map generated by the encoders has 1024 channels and 1/16 of the original image size.
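As a quick sanity check on this correction, the encoder output shape for a standard 480p DAVIS frame (854×480) works out as follows:

```python
# Feature-map size implied by the paper correction: the encoder output has
# 1024 channels at 1/16 of the input resolution.
def encoder_output_shape(height, width, channels=1024, stride=16):
    """Return (C, H // stride, W // stride) for an input frame."""
    return (channels, height // stride, width // stride)

# A 480p DAVIS frame (854x480) yields a 1024 x 30 x 53 feature map.
print(encoder_output_shape(480, 854))
```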

1. Requirements

We built and tested the repository on Python 3.6.9 and Ubuntu 18.04 with one NVIDIA 1080Ti card (11 GB memory). Running on Windows or macOS is possible with minor modifications. An NVIDIA GPU and a CUDA environment are required. To install the requirements, run:

pip3 install -r requirements.txt

Install the torch_scatter package following the official instructions. Our version is 2.0.4.

2. Evaluation

DAVIS17-TrainVal

  1. Download and extract DAVIS17-TrainVal dataset.
  2. Download the pretrained DAVIS17 checkpoint.
  3. Run:
python3 eval.py --level 1 --resume /path/to/checkpoint.pth/ --dataset /path/to/dir/

To reproduce the segmentation scores, you can use the official evaluation tool from the DAVIS benchmark.

YouTube-VOS18

  1. Download and extract YouTube-VOS18 dataset.
  2. Download the pretrained YouTube-VOS18 checkpoint.
  3. Run:
python3 eval.py --level 2 --resume /path/to/checkpoint.pth/ --dataset /path/to/dir/ --update-rate 0.05

Attention: Directly submitting our results to the YouTube-VOS CodaLab for evaluation would pollute the leaderboard. We encourage you to submit your own results.

Long Videos

  1. Download and extract Long Videos dataset.
  2. Download the pretrained YouTube-VOS18 checkpoint above.
  3. Run:
python3 eval.py --level 3 --resume /path/to/checkpoint.pth/ --dataset /path/to/dir/ --update-rate 0.05

To reproduce the segmentation scores, you can use the same tool from the DAVIS benchmark.

Your Own Video

Prepare your video frames and the first-frame annotation following the data structure of the long videos page. You can inspect the data structure without downloading it; you only need to provide the first-frame annotation.
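As a sketch, the layout for a custom clip modeled on the long-video dataset structure looks like the following. The folder and file names here are illustrative; check the long videos page for the exact naming:

```python
import os
import tempfile

# Illustrative data layout for a custom video: frames under JPEGImages,
# a single first-frame mask under Annotations.
root = tempfile.mkdtemp()
video = "my_video"  # hypothetical clip name
os.makedirs(os.path.join(root, "JPEGImages", video))
os.makedirs(os.path.join(root, "Annotations", video))

# Video frames as JPEGs; only the first frame needs an annotation mask.
for i in range(3):
    open(os.path.join(root, "JPEGImages", video, f"{i:05d}.jpg"), "w").close()
open(os.path.join(root, "Annotations", video, "00000.png"), "w").close()

print(sorted(os.listdir(os.path.join(root, "JPEGImages", video))))
```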

Run with the same parameters as the long videos setting.

Options for Evaluation

  1. --gpu: GPU id to run (default: 0).
  2. --viz: Enable output overlays along with the estimated masks (default: False).
  3. --budget: The number of features that can be stored in total (default: 300000 for 1080Ti).

By default, the segmentation results will be saved in ./output.
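To get a feel for what --budget controls, here is a rough memory estimate, assuming each stored feature is a 1024-channel float32 vector (the 1024 channels come from the paper correction above; the implementation's exact storage layout may differ):

```python
# Rough memory footprint of the adaptive feature bank for a given --budget.
# Assumption: each stored feature is a 1024-channel float32 vector; the
# actual key/value layout in the implementation may differ.
def feature_bank_bytes(budget, channels=1024, bytes_per_value=4):
    return budget * channels * bytes_per_value

gib = feature_bank_bytes(300_000) / 2**30
print(f"{gib:.2f} GiB")  # ~1.14 GiB for the default 1080Ti budget
```

Lowering --budget is therefore a direct way to trade segmentation quality for GPU memory on smaller cards.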

3. Training

Pre-training on Static Images

  1. Download the following datasets (COCO is the largest one). You don't have to download all of them; by default, our pre-training code skips datasets that don't exist.
  2. Run unify_pretrain_dataset.py to convert them into a uniform format (following the DAVIS layout).
python3 unify_pretrain_dataset.py --name NAME --src /path/to/dataset/dir/ --dst /path/to/output
  1. MSRA10K: --name MSRA10K
  2. ECSSD: --name ECSSD
  3. PASCAL-S: --name PASCAL-S
  4. PASCAL VOC2012: --name PASCALVOC2012
  5. COCO: --name COCO. API pycocotools is required.

You may need minor modifications to the dataset paths. Descriptions of useful options:

  1. --palette: Path to the palette image. We provide a template in assets/mask_palette.png, following the DAVIS17 format.
  2. --worker: Number of parallel threads to accelerate the conversion (default: 20).
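The skip-missing-datasets behavior mentioned above can be sketched as follows. The dataset names match the converter's --name values, but the root path is hypothetical and the actual check in the repository may differ:

```python
import os

# Sketch: skip pre-training datasets that are not present on disk.
# The exact check in the repository may differ.
datasets = ["COCO", "ECSSD", "MSRA10K", "PASCAL-S", "PASCALVOC2012"]
root = "/path/to/pretrain"  # hypothetical root directory

available = [d for d in datasets if os.path.isdir(os.path.join(root, d))]
print("Using datasets:", available)
```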

After the conversion process, you can start pre-training the model:

python3 train.py --level 0 --dataset /path/to/pretrain/ --lr 1e-5 --scheduler-step 3 --total-epoch 12 --log
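If --scheduler-step behaves like a standard step decay (an assumption; the decay factor 0.5 below is illustrative and not taken from the code), the learning rate over the 12 pre-training epochs would evolve roughly like this:

```python
# Illustrative step-decay schedule matching the --lr and --scheduler-step
# flags. The decay factor gamma is an assumption for illustration only.
def lr_at_epoch(base_lr, epoch, step, gamma=0.5):
    return base_lr * gamma ** (epoch // step)

for epoch in (0, 3, 6, 11):
    print(epoch, lr_at_epoch(1e-5, epoch, step=3))
```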

The pre-training process may take days to weeks; you can download our checkpoint to save time.

Training on DAVIS17

Download the semi-supervised TrainVal 480p data from the DAVIS website, then run:

python3 train.py --level 1 --new --resume /path/to/PreTrain/checkpoint.pth --dataset /path/to/DAVIS17/ --lr 4e-6 --scheduler-step 200 --total-epoch 1000 --log

Training on YouTube-VOS

Download the training set of the YouTube-VOS dataset, then run:

python3 train.py --level 2 --new --resume /path/to/PreTrain/checkpoint.pth --dataset /path/to/YouTubeVOS/train --lr 4e-6 --scheduler-step 30 --total-epoch 150 --log

4. License

This repository is released for academic use only. If you want to use our code in commercial products, please contact [email protected] in advance. If you use our code, please cite our paper:

@inproceedings{NEURIPS2020_liangVOS,
 author = {Liang, Yongqing and Li, Xin and Jafari, Navid and Chen, Jim},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},
 pages = {3430--3441},
 publisher = {Curran Associates, Inc.},
 title = {Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement},
 url = {https://proceedings.neurips.cc/paper/2020/file/234833147b97bb6aed53a8f4f1c7a7d8-Paper.pdf},
 volume = {33},
 year = {2020}
}

5. Update Logs

  • 2022/04/24 Update the evaluation script for long video benchmark.

afb-urr's People

Contributors

xmlyqing00

afb-urr's Issues

_version_cuda.so: undefined symbol:

Do you know what caused this problem when I was running your code?
OSError: /GPUFS/.conda/envs/tor17/lib/python3.6/site-packages/torch_scatter/_version_cuda.so: undefined symbol: _ZN3c105ErrorC1ENS_14SourceLocationESs

static images datasets

Are all the static image datasets used for pre-training just the training sets, or both the training and validation sets?
For COCO, is pre-training done on the COCO 2014 training set or the COCO 2017 training set?

About long time video validation

Hi,

Thank you for your nice work! I found the long time video dataset here.

Could you tell me how to evaluate the results? The annotation files are not complete.

out of memory

Are there any settings to reduce memory consumption? When I run with a 2080 Ti, it always runs out of memory. Also, am I right that your code doesn't support multiple GPUs?

ValueError: num_samples should be a positive integer value, but got num_samples=0

Hi,

Ok so, I'm trying to train a custom dataset, either from scratch or from the checkpoint.
When I run:
python train.py --level 0 --dataset dataset_path\ --seed 1 --lr 1e-5 --scheduler-step 3 --new --total-epoch 12 --log --budget 1873920

OR:

python train.py --level 0 --dataset dataset_path\ --seed 1 --lr 1e-5 --scheduler-step 3 --resume checkpoint_path\checkpoint.pth --total-epoch 12 --log --budget 1873920

I get the ValueError mentioned in the title.
I'm guessing this is because it's looking for COCO, ECSSD, MSRA10K, PASCAL-S, or PASCALVOC2012 and is only getting my custom dataset, but I would've thought it should still work given the file directories are set out as required.

My dataset_path folder has an Annotations folder, an ImageSets folder, and a JPEGImages folder; inside each of those are folders 0, 1, and 2 with the rest of the files, just like the DAVIS dataset layout.

What have I done wrong? Really looking forward to using this algorithm :)

A difference between your paper and code.

In your paper, you refer to the backbone output resolution from resnet.layer3, whose output stride is 8. But in your code, you don't change the original ResNet-50 strides, leading to an output stride of 16.
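For reference, the feature-map sizes implied by the two strides, for a 480×854 frame:

```python
# Feature-map spatial size for the two output strides discussed above.
def feature_map_size(h, w, stride):
    return (h // stride, w // stride)

print(feature_map_size(480, 854, 8))   # stride 8  -> (60, 106)
print(feature_map_size(480, 854, 16))  # stride 16 -> (30, 53)
```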

Wrong Directory Name

Two bugs in unify_pretrain_dataset.py:

1. For MSRA10K: on lines 67-68, the directory name 'MSRA10K_Imgs_GT' is misspelled as 'MARA10K_Imgs_GT'.

2. For COCO: on lines 167 and 178, the directory name for the annotations in the downloaded file (2021.12.27) is 'annotations 2' instead of 'annotations'.

No such file called dataset.txt!

Is this "dataset.txt" created by unify_pretrain_dataset.py or by another .py file you didn't release?
PS: glob is a built-in Python package, so there is no need to list it in requirements.txt.

Update: I'm fine now. The "dataset.txt" contains dataset names like this:

COCO
ECSSD
MSRA10K
PASCAL-S
PASCALVOC2012

about q and p

[screenshot from the paper]

Hi, thanks for your great work! What's the difference between p and q here? What do q and p stand for?


Hi, a question about the time of the pretrain step

I finished collecting the five static image datasets, following your README.

11/08 18:02:44 114763 imgs are used for PreTrain. They are from ['COCO', 'ECSSD', 'MSRA10K', 'PASCAL-S', 'PASCALVOC2012'].
pretrain dataset length is  114763
11/08 18:02:44 Load level 0 dataset: 114763 training cases.
11/08 18:02:51 Random seed: 1604829771

The pre-training dataset has 114763 images, close to the "136032" in your paper. But my pre-training time is about 12 hours, which is far from the "days to weeks" in your README. Did I make a mistake somewhere?

Main Train:   1%|               | 879/114763 [05:07<10:48:59,  2.92it/s, loss=0.38932 (0.64092 0.56776)]

Why does the dataloader of DAVIS need to randomly sample images from a video sequence?

Hello, thanks for sharing such a complete code.

Maybe due to my inadequate understanding, I have some questions about the train process, I hope to get your answers.
In the training dataloader DAVIS_Train_DS, each time 6 randomly shuffled images are sampled from a video sequence, as mentioned in the paper [4.2 Main training on the benchmark datasets]. But in DAVIS_Test_DS, the images are read in order.

I want to know, what is the purpose of random sampling? Are there any obvious advantages to this approach?
If I read five or six consecutive images randomly, e.g. [random_id : random_id + 6], will there be any negative effects?

Thanks.

def __getitem__(self, idx):
    video_name = self.dataset_list[idx]
    img_dir = os.path.join(self.root, 'JPEGImages', '480p', video_name)
    mask_dir = os.path.join(self.root, 'Annotations', '480p', video_name)
    img_list = sorted(glob(os.path.join(img_dir, '*.jpg')))
    mask_list = sorted(glob(os.path.join(mask_dir, '*.png')))
    idx_list = list(range(len(img_list)))
    random.shuffle(idx_list)
    idx_list = idx_list[:self.clip_n]
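The quoted sampling logic, and the consecutive-window alternative asked about, can be compared with a small self-contained sketch (clip_n = 6, and the frame list is dummy data for illustration):

```python
import random

random.seed(0)
frames = list(range(60))  # dummy frame indices for one video
clip_n = 6

# Random sampling, as in DAVIS_Train_DS: shuffle all indices, keep clip_n.
idx = frames[:]
random.shuffle(idx)
random_clip = idx[:clip_n]

# Consecutive alternative: a random window of clip_n adjacent frames.
start = random.randrange(len(frames) - clip_n + 1)
consecutive_clip = frames[start:start + clip_n]

print(sorted(random_clip), consecutive_clip)
```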
