
Language-driven Semantic Segmentation (LSeg)

This repo contains the official PyTorch implementation of the paper Language-driven Semantic Segmentation.

ICLR 2022

Authors: Boyi Li, Kilian Q. Weinberger, Serge Belongie, Vladlen Koltun, Rene Ranftl

Overview

We present LSeg, a novel model for language-driven semantic image segmentation. LSeg uses a text encoder to compute embeddings of descriptive input labels (e.g., "grass" or "building") together with a transformer-based image encoder that computes dense per-pixel embeddings of the input image. The image encoder is trained with a contrastive objective to align pixel embeddings to the text embedding of the corresponding semantic class. The text embeddings provide a flexible label representation in which semantically similar labels map to similar regions in the embedding space (e.g., "cat" and "furry"). This allows LSeg to generalize to previously unseen categories at test time, without retraining or even requiring a single additional training sample. We demonstrate that our approach achieves highly competitive zero-shot performance compared to existing zero- and few-shot semantic segmentation methods, and even matches the accuracy of traditional segmentation algorithms when a fixed label set is provided.
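The pixel-text matching described above can be sketched in a few lines. This is a minimal illustration, not the repo's code: the encoder outputs are replaced by random tensors, and `assign_labels` is a hypothetical helper, but the normalize/dot-product/argmax logic mirrors how per-pixel embeddings are compared against label embeddings.

```python
import torch
import torch.nn.functional as F

def assign_labels(pixel_emb, text_emb):
    """Assign each pixel the label whose text embedding is most similar.

    pixel_emb: (H, W, D) dense per-pixel embeddings from the image encoder
    text_emb:  (K, D) embeddings of the K input labels from the text encoder
    Returns:   (H, W) tensor of label indices
    """
    # Normalize so the dot product below is cosine similarity
    pixel_emb = F.normalize(pixel_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # (H, W, K) similarity of every pixel to every label
    logits = torch.einsum("hwd,kd->hwk", pixel_emb, text_emb)
    return logits.argmax(dim=-1)

# Toy example with random embeddings (D=512 matches CLIP ViT-B/32)
H, W, D, K = 4, 4, 512, 3
pred = assign_labels(torch.randn(H, W, D), torch.randn(K, D))
print(pred.shape)  # torch.Size([4, 4])
```

Because labels enter only through their text embeddings, swapping in a new label set at test time requires no retraining.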

Please check our Video Demo (4k) to further showcase the capabilities of LSeg.

Usage

Installation

Option 1:

pip install -r requirements.txt

Option 2:

conda install ipython
pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2
pip install git+https://github.com/zhanghang1989/PyTorch-Encoding/
pip install pytorch-lightning==1.3.5
pip install opencv-python
pip install imageio
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
pip install altair
pip install streamlit
pip install --upgrade protobuf
pip install timm
pip install tensorboardX
pip install matplotlib
pip install test-tube
pip install wandb

Option 3 (WSL2 or Ubuntu):

Do NOT install CUDA via sudo apt install cuda. Uninstall any existing system-wide CUDA first, then install the toolkit inside conda:

conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit

Follow these steps to set up CUDA in WSL2:
https://jrkwon.com/2022/11/22/cuda-and-cudnn-inside-a-conda-env/

sudo apt-get install -y libgl1-mesa-dev libglib2.0-0 g++

conda create --name langseg python=3.11
conda activate langseg
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install git+https://github.com/zhanghang1989/PyTorch-Encoding/
# If the above doesn't work, you may follow the following instruction to install PyTorch-Encoding
# https://hangzhang.org/PyTorch-Encoding/notes/compile.html
pip install pytorch-lightning opencv-python imageio ftfy regex tqdm altair streamlit timm tensorboardX matplotlib test-tube wandb ipython ninja
pip install git+https://github.com/openai/CLIP.git


Data Preparation

By default, for training, testing and demo, we use ADE20k.

python prepare_ade20k.py
unzip ../datasets/ADEChallengeData2016.zip

Note: for the demo, if you want to use random inputs, you can skip data loading and comment out the code at link.
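As a minimal sketch of what a random input looks like (the shape here is an assumption; check the demo code for the size it actually expects), you can replace the loaded image with a random tensor:

```python
import torch

# Hypothetical stand-in for a loaded ADE20k image: one 3-channel
# 480x480 tensor of random values instead of real pixel data.
image = torch.randn(1, 3, 480, 480)

# Labels remain free-form text: ADE20k class names, or any words
# you want to segment against.
labels = ["grass", "building", "sky", "other"]

print(image.shape)  # torch.Size([1, 3, 480, 480])
```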

🌻 Try demo now

Download Demo Model

| name | backbone | text encoder | url |
|------|----------|--------------|-----|
| Model for demo | ViT-L/16 | CLIP ViT-B/32 | download |

You may need to upgrade this checkpoint first:

python -m pytorch_lightning.utilities.upgrade_checkpoint checkpoints/demo_e200.ckpt

👉 Option 1: Running interactive app

Download the demo model and place it in the checkpoints folder as checkpoints/demo_e200.ckpt.

Then run:

streamlit run lseg_app.py

👉 Option 2: Jupyter Notebook

Download the demo model and place it in the checkpoints folder as checkpoints/demo_e200.ckpt.

Then follow lseg_demo.ipynb to play around with LSeg. Enjoy!

Training and Testing Example

Training: Backbone = ViT-L/16, Text Encoder from CLIP ViT-B/32

bash train.sh

Testing: Backbone = ViT-L/16, Text Encoder from CLIP ViT-B/32

bash test.sh

Zero-shot Experiments

Data Preparation

Please follow HSNet to prepare the data and put all datasets in data/Dataset_HSN.

Pascal-5i

for fold in 0 1 2 3; do
python -u test_lseg_zs.py --backbone clip_resnet101 --module clipseg_DPT_test_v2 --dataset pascal \
--widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold ${fold} --nshot 0 \
--weights checkpoints/pascal_fold${fold}.ckpt 
done

COCO-20i

for fold in 0 1 2 3; do
python -u test_lseg_zs.py --backbone clip_resnet101 --module clipseg_DPT_test_v2 --dataset coco \
--widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold ${fold} --nshot 0 \
--weights checkpoints/coco_fold${fold}.ckpt 
done

FSS

python -u test_lseg_zs.py --backbone clip_vitl16_384 --module clipseg_DPT_test_v2 --dataset fss \
--widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold 0 --nshot 0 \
--weights checkpoints/fss_l16.ckpt 
python -u test_lseg_zs.py --backbone clip_resnet101 --module clipseg_DPT_test_v2 --dataset fss \
--widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold 0 --nshot 0 \
--weights checkpoints/fss_rn101.ckpt 

Model Zoo

| dataset | fold | backbone | text encoder | performance | url |
|---------|------|----------|--------------|-------------|-----|
| pascal | 0 | ResNet101 | CLIP ViT-B/32 | 52.8 | download |
| pascal | 1 | ResNet101 | CLIP ViT-B/32 | 53.8 | download |
| pascal | 2 | ResNet101 | CLIP ViT-B/32 | 44.4 | download |
| pascal | 3 | ResNet101 | CLIP ViT-B/32 | 38.5 | download |
| coco | 0 | ResNet101 | CLIP ViT-B/32 | 22.1 | download |
| coco | 1 | ResNet101 | CLIP ViT-B/32 | 25.1 | download |
| coco | 2 | ResNet101 | CLIP ViT-B/32 | 24.9 | download |
| coco | 3 | ResNet101 | CLIP ViT-B/32 | 21.5 | download |
| fss | - | ResNet101 | CLIP ViT-B/32 | 84.7 | download |
| fss | - | ViT-L/16 | CLIP ViT-B/32 | 87.8 | download |

If you find this repo useful, please cite:

@inproceedings{li2022languagedriven,
  title={Language-driven Semantic Segmentation},
  author={Boyi Li and Kilian Q Weinberger and Serge Belongie and Vladlen Koltun and Rene Ranftl},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=RriDjddCLN}
}

Acknowledgement

Thanks to the code bases of DPT, PyTorch Lightning, CLIP, PyTorch-Encoding, Streamlit, and Wandb.
