Giter Club home page Giter Club logo

civos's Introduction

CiVOS : Revisiting Click-Based Interactive Video Object Segmentation

Stephane Vujasinovic, Sebastian Bullinger, Stefan Becker, Norbert Scherer-Negenborn, Michael Arens, Rainer Stiefelhagen

ICIP 2022


📰 New Project (18/09/2023):
READMem: Robust Embedding Association for a Diverse Memory in Unconstrained Video Object Segmentation

TL;DR: We manage the memory of STM like sVOS methods to better deal with long video. To attain long-term performance we estimate the inter-frame diversity of the base memory and integrate the embeddings of an incoming frame into the memory if it enhances the diversity. In return, we are able to limit the number of memory slots and deal with unconstrained video sequences without hindering the performance on short sequences and alleviate the need for a sampling interval.


Paper

Abstract

While current methods for interactive Video Object Segmentation (iVOS) rely on scribble-based interactions to generate precise object masks, we propose a Click-based interactive Video Object Segmentation (CiVOS) framework to simplify the required user workload as much as possible. CiVOS builds on de-coupled modules reflecting user interaction and mask propagation. The interaction module converts click-based interactions into an object mask, which is then inferred to the remaining frames by the propagation module. Additional user interactions allow for a refinement of the object mask. The approach is extensively evaluated on the popular interactive DAVIS dataset, but with an inevitable adaptation of scribble-based interactions with click-based counterparts. We consider several strategies for generating clicks during our evaluation to reflect various user inputs and adjust the DAVIS performance metric to perform a hardware-independent comparison. The presented CiVOS pipeline achieves competitive results, although requiring a lower user workload.

Architecture

[Paper] [ArXiv] [PDF]

@INPROCEEDINGS{Vujasinović_2021_ICIP,
  author={Vujasinović, Stéphane and Bullinger, Sebastian and Becker, Stefan and Scherer-Negenborn, Norbert and Arens, Michael and Stiefelhagen, Rainer},
  booktitle={2022 IEEE International Conference on Image Processing (ICIP)}, 
  title={Revisiting Click-Based Interactive Video Object Segmentation}, 
  year={2022},
  pages={2756-2760},
  doi={10.1109/ICIP46576.2022.9897460}}

Setting up the environment

  1. The framework is built with Python 3.7 and relies on the following packages:
    • NumPy 1.21.4
    • SciPy 1.7.2
    • PyTorch 1.10.0
    • torchvision 0.11.1
    • OpenCV 4.5.4 (opencv-python-headless if you don't want to use the demo)
    • Cython 0.29.24
    • scikit-learn 0.20.3
    • scikit-image 0.18.3
    • scipy 1.7.2
    • Pillow 8.4.0
    • imgaug 0.4.0
    • albumentations 1.10
    • tqdm 4.62.3
    • PyYaml 6.0
    • easydict 1.9
    • future 0.18.2
    • cffi 1.15.0
    • davis-interactive 1.0.4
    • networkx 2.6.3 for DAVIS
    • gdown 4.2.0 for downloading pretrained models
    • tensorboard 2.4.1
  2. Download the DAVIS dataset download_datasets.py
  3. Download the pretrained models download_models.py

Guide for Demo

  1. Adapt the paths and variables in Demo.yml
  2. Launch CiVOS_Demo.py (Nota bene: only 1 object can be segmented in the Demo)
  3. Mouse and keyboard bindings:
    • Positive interaction: left mouse click
    • Negative interaction: right mouse click
    • Predict a mask of the object of interest for the video sequence: space bar
    • Visualize the results with the keys x(forward direction) and y(backward direction)
    • Quit the demo with key q

How to evaluate on DAVIS

  1. Adapt the paths and variables of EXAMPLE_DEBUGGING.yml
  2. Adapt and lauch the bash file CiVOS_evaluation_script_example.sh
  3. Read .csv files results with Summarize_with_DAVIS_arbitrary_report.py

Results

Quantitative evaluation on the interactive DAVIS 2017 validation set.

Methods Training interaction Testing interaction R-AUC-J&F AUC-J&F J&F@60s
MANet Scribbles Scribbles 0.72 0.79 0.79
ATNet Scribbles Scribbles 0.75 0.80 0.80
MiVOS Scribbles Scribbles 0.81 0.87 0.88
GIS-RAmap Scribbles Scribbles 0.79 0.86 0.87
MiVOS Clicks Clicks 0.70 0.78 0.79
CiVOS Clicks Clicks 0.76 0.83 0.84

R-AUC-J&F results on the DAVIS 2017 validation set for CiVOS by generating clicks in three different ways.

Maximal Number of Clicks 1 2 3 4 5 6 7
Interaction Strategy 1 0.69 - - - - - -
Interaction Strategy 2 0.72 0.76 0.76 0.75 0.75 0.75 0.76
Interaction Strategy 3 0.74 0.77 0.78 0.78 0.78 0.78 0.78

Credits

RiTM: GitHub, Paper

MiVOS: GitHub, Paper

DeepLabV3Plus: GitHub, Paper

DAVIS-interactive: GitHub, Project

civos's People

Contributors

sbcv avatar vujas-eteph avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.