Bilateral Reference for High-Resolution Dichotomous Image Segmentation

[Figures: DIS-Sample_1, DIS-Sample_2]

This repo is the official implementation of "Bilateral Reference for High-Resolution Dichotomous Image Segmentation" (arXiv 2024).

Authors: Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, & Nicu Sebe.

[arXiv] [code] [stuff]

Our BiRefNet has achieved SOTA on many similar HR tasks:

DIS, COD, and HRSOD (see the Papers with Code leaderboard badges for each task).

Try our online demos for inference:

  • Inference and evaluation of your given weights: Open In Colab
  • Online Inference with GUI with adjustable resolutions: Hugging Face Spaces

Third-Party Creations

For edge devices with less computing power, we provide a lightweight version with swin_v1_tiny as the backbone, which is more than 4x faster and more than 5x smaller. Details can be found in this issue and the links there.

Several third-party applications have been built on our BiRefNet. Many thanks for their contributions to the community!
Choose the one you like and try it with clicks instead of code:

  1. Applications:

    • Thanks fal.ai/birefnet: this project on fal.ai encapsulates BiRefNet online with more useful options in UI and API to call the model.

    • Thanks ZHO-ZHO-ZHO/ComfyUI-BiRefNet-ZHO: this project further improves the UI for BiRefNet in ComfyUI, especially for video data.

      [Video: app-comfyUI_ZHO.mp4]
    • Thanks viperyl/ComfyUI-BiRefNet: this project packs BiRefNet as ComfyUI nodes and makes this SOTA model easier to use for everyone.

  2. More Visual Comparisons

    [Video: video-from_twitter_toyxyz3_2.mp4]
    [Video: video-from_twitter_toyxyz3_1.mp4]

Usage

Environment Setup

# PyTorch==2.0.1 is used for faster training with compilation.
conda create -n dis python=3.9 -y && conda activate dis
pip install -r requirements.txt

Dataset Preparation

Download the combined training / test sets I have organized from DIS--COD--HRSOD, the individual official sets in the single_ones folder, or their official pages. The same files are also available on my BaiduDisk: DIS--COD--HRSOD.

Weights Preparation

Download the backbone weights from my Google Drive folder or their official pages.

Run

# Train & Test & Evaluation
./sub.sh RUN_NAME GPU_NUMBERS_FOR_TRAINING GPU_NUMBERS_FOR_TEST
# See train.sh / test.sh for only training / test-evaluation.
# After the evaluation, run `gen_best_ep.py` to select the best ckpt by a specific metric (choose from Sm, wFm, HCE (DIS only)).
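The metric-based checkpoint selection mentioned in the last comment can be sketched as follows. This is a minimal illustration only, not the logic of `gen_best_ep.py`: the function `best_epoch`, the layout of the score dictionary, and every number in it are hypothetical. It assumes Sm and wFm are higher-is-better and HCE is lower-is-better.

```python
# Direction of each metric (assumption: Sm and wFm are maximized, HCE is minimized).
HIGHER_IS_BETTER = {"Sm": True, "wFm": True, "HCE": False}


def best_epoch(scores, metric):
    """Return the epoch whose score is best for the chosen metric."""
    if metric not in HIGHER_IS_BETTER:
        raise ValueError(f"unknown metric: {metric}")
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(scores, key=lambda epoch: scores[epoch][metric])


# Made-up scores for three hypothetical checkpoints (epoch -> metrics).
scores = {
    480: {"Sm": 0.90, "wFm": 0.85, "HCE": 1100},
    490: {"Sm": 0.91, "wFm": 0.84, "HCE": 1050},
    500: {"Sm": 0.89, "wFm": 0.86, "HCE": 1200},
}

for metric in ("Sm", "wFm", "HCE"):
    print(metric, "->", best_epoch(scores, metric))
```

Different metrics can favor different checkpoints, which is why the metric has to be chosen explicitly.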

Well-trained weights:

Download the BiRefNet_{TASK}_{EPOCH}.pth from [stuff].

The results might be a bit different from those in the original paper; you can see them in the performances_all_ckpts folder in stuff. The original setup (A100-80G x 8) is very costly and many people cannot afford it (including myself....), so I re-trained BiRefNet on a single A100-40G only and achieved performance on the same level (even better). This means you can directly train the model on a single GPU with 36.5G+ memory. BTW, 5.5G of GPU memory is needed for inference at 1024x1024. (I personally paid a lot for renting an A100-40G to re-train BiRefNet on the three tasks... T_T. Hope it can help you.)

If you have more, or more powerful, GPUs, you can set the GPU IDs and increase the batch size in config.py to accelerate training. The scripts adapt to these settings automatically, so you can switch seamlessly between single-card and multi-card training. Enjoy it :)
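As a rough picture of what switching between single-card and multi-card training involves, here is a small sketch. It is not taken from config.py or the training scripts: the function `plan_training`, the comma-separated GPU ID format, and the idea of scaling the total batch size with the GPU count are all assumptions made for illustration.

```python
def plan_training(gpu_ids: str, per_gpu_batch_size: int) -> dict:
    """Derive a training plan from a comma-separated GPU ID string (hypothetical format)."""
    ids = [int(tok) for tok in gpu_ids.split(",") if tok.strip()]
    if not ids:
        raise ValueError("at least one GPU ID is required")
    return {
        "gpu_ids": ids,
        # More than one GPU implies multi-card (distributed) training.
        "distributed": len(ids) > 1,
        # Assumption: each GPU processes its own batch of the same size.
        "total_batch_size": per_gpu_batch_size * len(ids),
    }


print(plan_training("0", 4))        # single-card plan
print(plan_training("0,1,2,3", 4))  # multi-card plan
```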

Some of my messages:

This project was originally built for DIS only. Update by update, it has grown larger, with many functions embedded together. You can now use it for any binary image segmentation task, such as DIS/COD/SOD, medical image segmentation, anomaly segmentation, etc. You can easily turn the following on or off (usually in config.py):

  • Multi-GPU training: turn on/off with one variable.
  • Backbone choices: Swin_v1, PVT_v2, ConvNets, ...
  • Weighted losses: BCE, IoU, SSIM, MAE, Reg, ...
  • Adversarial loss for binary segmentation (proposed in my previous work MCCL).
  • Training tricks: multi-scale supervision, freezing backbone, multi-scale input...
  • Data collator: loading all in memory, smooth combination of different datasets for combined training and test.
  • ...

I really hope you enjoy this project and use it in more works to achieve new SOTAs.
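To illustrate the weighted-losses option from the list above, here is a minimal pure-Python sketch that combines BCE and a soft IoU loss with per-term weights. It is an assumption-laden illustration, not the repository's loss code: the function names, the `weights` dictionary, and the flat-list representation of a mask are all hypothetical, and a real training setup would use tensors.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flattened probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def iou_loss(pred, target, eps=1e-7):
    """Soft IoU loss: 1 - intersection / union on probabilities."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (inter + eps) / (union + eps)


LOSS_TERMS = {"bce": bce_loss, "iou": iou_loss}


def weighted_loss(pred, target, weights):
    """Sum the enabled loss terms; a weight of 0 switches a term off."""
    return sum(
        w * LOSS_TERMS[name](pred, target)
        for name, w in weights.items()
        if w > 0
    )


target = [1.0, 0.0, 1.0, 0.0]
good = [0.9, 0.1, 0.8, 0.2]
bad = [0.2, 0.8, 0.3, 0.9]
weights = {"bce": 1.0, "iou": 0.5}  # hypothetical weights
print(weighted_loss(good, target, weights) < weighted_loss(bad, target, weights))
```

Keeping the weights in one dictionary means a loss term can be disabled by setting its weight to 0, without touching the training loop.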

Quantitative Results

[Figure: quan_dis]

[Figure: quan_cod_hrsod]

Qualitative Results

[Figure: qual_dis]

[Figure: qual_cod]

Citation

@article{zheng2024birefnet,
  title={Bilateral Reference for High-Resolution Dichotomous Image Segmentation},
  author={Zheng, Peng and Gao, Dehong and Fan, Deng-Ping and Liu, Li and Laaksonen, Jorma and Ouyang, Wanli and Sebe, Nicu},
  journal={arXiv},
  year={2024}
}

Contact

For any questions, discussion, or even complaints, feel free to open an issue here or send me an e-mail ([email protected]).
