Giter Club home page Giter Club logo

piwivos's Introduction

Fast Video Object Segmentation by Pixel-Wise Feature Comparison (PiWiVOS)

Code for my final degree thesis Fast Video Object Segmentation by Pixel-Wise Feature Comparison (PiWiVOS).

This final degree thesis tackles the task of One-Shot Video Object Segmentation, where multiple objects have to be separated from the background using the ground truth masks for them in the very first frame only. Objects' large pose and scale variations throughout the sequence, alongside occlusions happening among them, make this task extremely challenging. Fast Video Object Segmentation by Pixel-Wise Feature Comparison—which is trained and tested on the well-known DAVIS dataset—goes a step further, and besides achieving comparable results with state-of-the-art methods, it works one order of magnitude faster than them, or even two in some cases.

Dependencies

This version of the project (updated from the original one for better reproducibility) has been build using:

Usage

There are two main scripts: train.py, which serves to train a model; and test.py, which is used to evaluate a checkpoint and optionally export the predicted masks.

train.py

The complete usage can be seen typing:

$ train.py -h

This script has many arguments which control specific parts of our method. Section 4.2 of the Thesis introduces all these parameters, and Chapter 5 presents a complete study of their optimal values, in which they are set by default.

Apart from method-specific parameters, the most important arguments are:

  • --job_name JOB_NAME: Used to identify the job and create a log directory for it at logs/JOB_NAME, in which tensorboard logs and checkpoints will be stored.

  • --path PATH: Path to the DAVIS dataset. Defaults to data/DAVIS.

  • --model_name ['piwivos', 'piwivosf']: Name of the model to use. PiWiVOS uses a resnet50 backbone while PiWiVOS-F uses a resnet34 and has lower output resolution. See Chapter 5 of the Thesis for more information. Defaults to 'piwivos'.

The script trains the model from a pre-trained ResNet using the official DAVIS 2017 train set, and validates using the val one.

test.py

The complete usage can be seen typing:

$ test.py -h

The main arguments are:

  • --path PATH: Path to the DAVIS dataset. Defaults to data/DAVIS.

  • --checkpoint_path CHECKPOINT_PATH: Path to the checkpoint file (.pth) to evaluate. Defaults to checkpoints/piwivos/piwivos.pth following this repository's structure.

  • --model_name ['piwivos', 'piwivosf']: Name of the model to use. Must match with the loaded checkpoint. Defaults to 'piwivos'.

  • --image_set ['val', 'test-dev', 'test-challenge']: Set of images on which to evaluate the model. Defaults to 'val'.

  • --export: When set, the script exports the predicted masks in the disk. These are stored in a results subdirectory side by side the evaluated checkpoint.

Data

PiWiVOS is trained and evaluated using the DAVIS 2017 semi-supervised 480p dataset, which can be downloaded from this link.

Nonetheless, our code can also be used with different DAVIS data. In first place, our dataloader supports the DAVIS 2016 semi-supervised 480p dataset, which is a subset of the DAVIS 2017 version and contains single-object sequences, being an easier task. However, if the user wants to perform this task on the (larger) DAVIS 2017 dataset, the dataloader has also an option to merge individual object masks into a "single-object mask".

See the DAVIS dataloader.

Results

Results reported by this repository's checkpoints are slightly better than the ones in the Thesis strictly due to seeding and possible library updates.

Model Name J Mean F Mean G Mean (J&F)
PiWiVOS 67.95% 74.93% 71.42%
PiWiVOS-F 56.17% 54.46% 55.32%

Table: Results on the val set. See Thesis for the original results on val and test-dev sets.

Citation

You can cite our work using:

@phdthesis{Palliser Sans_2019,
	title={Fast Video Object Segmentation by Pixel-Wise Feature Comparison},
	url={http://hdl.handle.net/2117/169370},
	abstractNote={This final degree thesis tackles the task of One-Shot Video Object Segmentation, where multiple objects have to be separated from the background only having the ground truth masks for them in the very first frame. Their large pose and scale variations throughout the sequence, and the occlusions happening between them make this task very difficult to solve. Fast Video Object Segmentation by Pixel-Wise Feature Comparison goes a step further, and besides achieving comparable results with state-of-the-art methods, it works one order of magnitude faster than them, or even two in some cases.},
	school={UPC, Centre de Formació Interdisciplinària Superior, Departament de Teoria del Senyal i Comunicacions},
	author={Palliser Sans, Rafel},
	year={2019},
	month={May},
}

piwivos's People

Contributors

rafelps avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.