
cagnet-zero-shot-semantic-segmentation's Introduction

CaGNet: Context-aware Feature Generation for Zero-shot Semantic Segmentation

Code for our ACM MM 2020 paper "Context-aware Feature Generation for Zero-shot Semantic Segmentation".

Created by Zhangxuan Gu, Siyuan Zhou, Li Niu*, Zihan Zhao, Liqing Zhang*.

Paper Link: [arXiv]

News

In our journal extension CaGNetv2 [arXiv, github], we extend pixel-wise feature generation and finetuning to patch-wise feature generation and finetuning.

Visualization on Pascal-VOC

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{Gu2020CaGNet,
  title={Context-aware Feature Generation for Zero-shot Semantic Segmentation},
  author={Zhangxuan Gu and Siyuan Zhou and Li Niu and Zihan Zhao and Liqing Zhang},
  booktitle={ACM International Conference on Multimedia},
  year={2020}
}

Introduction

Existing semantic segmentation models rely heavily on dense pixel-wise annotations. To reduce the annotation burden, we focus on a challenging task named zero-shot semantic segmentation, which aims to segment unseen objects with zero annotations. This can be achieved by transferring knowledge across categories via semantic word embeddings. In this paper, we propose a novel context-aware feature generation method for zero-shot segmentation named CaGNet. In particular, based on the observation that a pixel-wise feature depends heavily on its contextual information, we insert a contextual module into the segmentation network to capture pixel-wise contextual information, which guides the process of generating more diverse and context-aware features from semantic word embeddings. Our method achieves state-of-the-art results on three benchmark datasets for zero-shot segmentation.
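
To make the idea above more concrete, here is a minimal, illustrative sketch (not the released implementation; all module names, layer sizes, and tensor shapes are assumptions): a contextual module turns backbone feature maps into pixel-wise latent codes, and a generator maps a semantic word embedding plus that latent code to a pixel-wise visual feature.

# Illustrative sketch only -- NOT the repository's implementation.
import torch
import torch.nn as nn

class ContextualModule(nn.Module):
    """Turns backbone feature maps into a pixel-wise contextual latent code."""
    def __init__(self, in_dim=2048, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_dim, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, latent_dim, 1),
        )

    def forward(self, feat):              # feat: (B, in_dim, H, W)
        return self.net(feat)             # latent: (B, latent_dim, H, W)

class PixelFeatureGenerator(nn.Module):
    """Maps (word embedding, contextual latent) to a pixel-wise visual feature."""
    def __init__(self, emb_dim=300, latent_dim=16, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim + latent_dim, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, feat_dim),
        )

    def forward(self, word_emb, latent):
        # word_emb: (B, emb_dim, H, W), broadcast per pixel from the label map;
        # latent:   (B, latent_dim, H, W) from the contextual module.
        x = torch.cat([word_emb, latent], dim=1).permute(0, 2, 3, 1)
        return self.mlp(x).permute(0, 3, 1, 2)   # fake feature: (B, feat_dim, H, W)

# Features synthesized for unseen categories can then be used to train the
# classifier, so that it can recognize classes with zero pixel annotations.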

Overview of Our CaGNet

Experiments

Basic Settings

  • Inductive or Transductive:

    Inductive -> No test samples (images and annotations) are available during training (including finetuning).

  • Generalized or Non-generalized:

    Generalized -> Both seen and unseen categories can appear in test samples.

  • Baselines:

    SPNet [github, paper] & ZS3Net [github, paper]

  • Backbone Network:

    DeepLabV2 (ResNet-101) pre-trained on ImageNet (following SPNet)

  • Semantic Word Embedding:

    Word2vec (300-dim) & FastText (300-dim); a hedged sketch of how such class embeddings can be built is given right after this list.

  • Datasets:

    • Pascal-Context

      Samples: 4998 train / 5105 test

      Split: 33 classes, including 29 seen / 4 unseen (cow, motorbike, sofa, cat)

    • COCO-Stuff

      Samples: 118288 train / 5001 test

      Split: 182 classes including 167 seen / 15 unseen (following SPNet)

    • Pascal-VOC and SBD (Semantic Boundary Dataset)

      Samples: 11685 train / 1449 test

      Split: 20 classes including 15 seen / 5 unseen (following SPNet)

  • "Background" or Not:

    ZS3Net uses the word embedding of "background" as the semantic representation of all categories (e.g., sky and ground) belonging to "background", which seems a little unreasonable, while SPNet ignores "background" in both training and testing. Although including "background" can bring a large performance gain, we follow SPNet and ignore it at all times.

  • Additional Operation on Train Samples:

    Since training images may contain pixels that do not belong to any seen category (e.g., unseen categories, background, or unlabeled regions), we mark the annotations of these pixels as 'ignored' so that only seen categories are visible during training (including finetuning); a minimal sketch of this relabeling is given right after this list.
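
The following is a minimal sketch of the 'ignored' relabeling described above, assuming the conventional ignore index of 255 and a hypothetical set of seen-class ids (this is not the repository's exact code):

import numpy as np

IGNORE_INDEX = 255                      # common convention; assumed here
seen_classes = {0, 1, 2, 5, 7}          # hypothetical seen-class ids

def mask_unseen(label_map: np.ndarray) -> np.ndarray:
    """Return a copy of the label map with all non-seen pixels set to IGNORE_INDEX."""
    out = label_map.copy()
    seen_mask = np.isin(out, list(seen_classes))
    out[~seen_mask] = IGNORE_INDEX
    return out

# The training loss (e.g. nn.CrossEntropyLoss(ignore_index=IGNORE_INDEX))
# then skips these pixels entirely.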
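
As promised above, here is a hedged sketch of one way the per-class word embeddings can be prepared (not the authors' script; the embedding file name and class list below are placeholders): look each class name up in a pretrained fasttext/word2vec model and average the word vectors of multi-word names.

import numpy as np
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("wiki-news-300d-1M.vec")  # placeholder fasttext file
class_names = ["aeroplane", "potted plant", "tv monitor"]        # placeholder class list

def class_embedding(name):
    """Average the 300-dim vectors of all in-vocabulary words in the class name."""
    vecs = [kv[w] for w in name.split() if w in kv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(kv.vector_size)

embeddings = np.stack([class_embedding(c) for c in class_names])  # shape (num_classes, 300)
np.save("word_embeddings.npy", embeddings)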

Results

“ST” in the following tables stands for the self-training strategy mentioned in ZS3Net.
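
In the tables, hIoU denotes the harmonic mean of S-mIoU (seen classes) and U-mIoU (unseen classes), the standard metric for generalized zero-shot evaluation. A quick sanity check against the CaGNet row of the Pascal-VOC table:

def hiou(s_miou, u_miou):
    """Harmonic mean of seen and unseen mIoU."""
    return 2 * s_miou * u_miou / (s_miou + u_miou)

print(hiou(0.7840, 0.2659))  # ~0.3971, matching the table's 0.3972 up to rounding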

Our Results on Pascal-Context dataset

Method hIoU mIoU pixel acc. mean acc. S-mIoU U-mIoU
SPNet 0 0.2938 0.5793 0.4486 0.3357 0
SPNet-c 0.0718 0.3079 0.5790 0.4488 0.3514 0.0400
ZS3Net 0.1246 0.3010 0.5710 0.4442 0.3304 0.0768
CaGNet 0.2061 0.3347 0.5924 0.4900 0.3610 0.1442
ZS3Net+ST 0.1488 0.3102 0.5725 0.4532 0.3398 0.0953
CaGNet+ST 0.2252 0.3352 0.5961 0.4962 0.3644 0.1630

Our Results on COCO-Stuff dataset

Method hIoU mIoU pixel acc. mean acc. S-mIoU U-mIoU
SPNet 0.0140 0.3164 0.5132 0.4593 0.3461 0.0070
SPNet-c 0.1398 0.3278 0.5341 0.4363 0.3518 0.0873
ZS3Net 0.1495 0.3328 0.5467 0.4837 0.3466 0.0953
CaGNet 0.1819 0.3345 0.5658 0.4845 0.3549 0.1223
ZS3Net+ST 0.1620 0.3367 0.5631 0.4862 0.3489 0.1055
CaGNet+ST 0.1946 0.3372 0.5676 0.4854 0.3555 0.1340

Our Results on Pascal-VOC dataset

Method hIoU mIoU pixel acc. mean acc. S-mIoU U-mIoU
SPNet 0.0002 0.5687 0.7685 0.7093 0.7583 0.0001
SPNet-c 0.2610 0.6315 0.7755 0.7188 0.7800 0.1563
ZS3Net 0.2874 0.6164 0.7941 0.7349 0.7730 0.1765
CaGNet 0.3972 0.6545 0.8068 0.7636 0.7840 0.2659
ZS3Net+ST 0.3328 0.6302 0.8095 0.7382 0.7802 0.2115
CaGNet+ST 0.4366 0.6577 0.8164 0.7560 0.7859 0.3031

Please note that our reproduced results of SPNet on the Pascal-VOC dataset are obtained using their released model and code with careful tuning, but they are still lower than their reported results.

Hardware Dependency

Our released code currently supports training on either a single GPU or multiple GPUs. To obtain satisfactory training results, we advise that each GPU card have at least 32GB of memory and that the batch size be no smaller than 8.

The results in the conference paper / this repository are obtained on a single 32GB GPU with batch size 8. If you use multiple GPUs (each ≥ 32GB) to train CaGNet, you may achieve even better results.

Getting Started

Installation

1. Clone this repository.

git clone https://github.com/bcmi/CaGNet-Zero-Shot-Semantic-Segmentation.git

2. Create a Python environment for CaGNet via conda.

conda env create -f CaGNet_environment.yaml

3. Download the datasets.

  • Pascal-VOC

    --> CaGNet_VOC2012_data.tar : BCMI-Cloud or BaiduNetDisk (extraction code: beau)

    1. download the above .tar file into directory ./dataset/voc12/
    2. uncompress it to form ./dataset/voc12/images/ and ./dataset/voc12/annotations/
  • Pascal-Context

    --> CaGNet_context_data.tar : BCMI-Cloud or BaiduNetDisk (extraction code: rk29)

    1. download the above .tar file into directory ./dataset/context/
    2. uncompress it to form ./dataset/context/images/ and ./dataset/context/annotations/
  • COCO-Stuff

    1. follow the setup instructions on the COCO-Stuff homepage to obtain two folders: images and annotations.
    2. move the above two folders into directory ./dataset/cocostuff/ to form ./dataset/cocostuff/images/ and ./dataset/cocostuff/annotations/

4. Download the pre-trained weights and our best models into the directory ./trained_models/

  • deeplabv2 pretrained weight for Pascal-VOC and Pascal-Context

    --> deeplabv2_resnet101_init.pth : BCMI-Cloud or BaiduNetDisk (extraction code: 5o0m)

  • SPNet pretrained weight for COCO-Stuff

    --> spnet_cocostuff_init.pth : BCMI-Cloud or BaiduNetDisk (extraction code: qjpo)

  • our best model on Pascal-VOC

    --> voc12_ourbest.pth : BCMI-Cloud or BaiduNetDisk (extraction code: nxj4)

  • our best model on Pascal-Context

    --> context_ourbest.pth : BCMI-Cloud or BaiduNetDisk (extraction code: 0x2i)

  • our best model on COCO-Stuff

    --> cocostuff_ourbest.pth : BCMI-Cloud or BaiduNetDisk (extraction code: xl88)

Training

1. Train on the Pascal-VOC dataset

python train.py --config ./configs/voc12.yaml --schedule step1
python train.py --config ./configs/voc12_finetune.yaml --schedule mixed

2. Train on the Pascal-Context dataset

python train.py --config ./configs/context.yaml --schedule step1
python train.py --config ./configs/context_finetune.yaml --schedule mixed

3. Train on the COCO-Stuff dataset

python train.py --config ./configs/cocostuff.yaml --schedule step1
python train.py --config ./configs/cocostuff_finetune.yaml --schedule mixed

Testing

1. Test our best model on the Pascal-VOC dataset

python train.py --config ./configs/voc12.yaml --init_model ./trained_models/voc12_ourbest.pth --val

2. Test our best model on the Pascal-Context dataset

python train.py --config ./configs/context.yaml --init_model ./trained_models/context_ourbest.pth --val

3. Test our best model on the COCO-Stuff dataset

python train.py --config ./configs/cocostuff.yaml --init_model ./trained_models/cocostuff_ourbest.pth --val

Visualization

COMING SOON !

Try on Custom Data

COMING SOON !

Acknowledgement

Some of the code is built upon FUNIT and SPNet. Many thanks to them for their great work!

If you run into any problems or find any bugs, don't hesitate to open an issue on GitHub or make a pull request!

CaGNet is freely available for non-commercial use and may be redistributed under these conditions. For commercial queries, please drop us an e-mail and we will send you the detailed agreement.


cagnet-zero-shot-semantic-segmentation's Issues

about dataset

Why can images containing unseen classes be used as training samples? Although the pixels belonging to unseen classes are marked as 'ignored', the network can still utilize this pixel information as context. So why not follow zero-shot detection, where images that contain any unseen objects are removed from the training set?
Thanks very much! Looking forward to your reply.

model initialization

Hi thanks for the code!

I have a question concerning the model initialization for training on cocostuff dataset.
In the configs/cocostuff.yaml file, it seems you are using the SPNet trained model as initialization. Is that fair?


problems for self training

Good job, and it helps me a lot! You mentioned that you used the self-training strategy in the training process. However, I'm confused about which stage of the training process the self-training strategy is introduced in. Thank you very much!

Questions about Finetuning part

Hi, thanks for this excellent work! However, I have one problem with the settings in the finetuning part. In the paper, you mention "...The corresponding label map is Y(s/m) with pixel-wise label vector...", which contains both the seen and unseen parts. Does this mean that the ground-truth label map of the unseen classes is involved during the finetuning process? Otherwise, how is the label map of the unseen classes generated during finetuning?

Should there be one or two steps in self-training

Hi, I'm currently trying to run self-training to see its improvement on performance. Since the self-training configuration is not given here, I suppose all I need to do is change "--schedule step1(mixed)" to "--schedule st(st_transfer)" and change "lr_transfer" to "lr_st_transfer". (Please correct me if I'm wrong or if there are more configurations that need to be modified.) Could you please give the proper learning rate for self-training and the corresponding max iterations (ITER_MAX_ST_TRANSFER in the config file)?

In train.py I saw there are choices for self-training and self-training transfer. I don't think it is meaningful to do self-training on seen categories only. Shall I just go for the "st_transfer" step after getting the model from the "mixed" schedule (i.e. step1 -> mixed -> st_mixed)? Or do I need to add self-training before st_mixed (i.e. step1 -> mixed -> st -> st_mixed)?

One more possibility is adding "st" before "mixed" (i.e. step1 -> st -> mixed -> st_mixed), but it seems meaningless to have self-training right after step 1.

Thank you!

How to train custom data

How can I train on custom data? Also, in the folder 'dataset/voc12', how are the .npy and .pkl files generated?

Problem downloading resources

Hi Zhangxuan Gu et al.,

I am facing a problem downloading the pretrained weights provided on both BCMI-Cloud and BaiduNetDisk. The BCMI link seems to be unreachable, and BaiduNetDisk requires downloading an application and signing up for a Baidu account, which is confusing.

Are there any other simple ways to reach those resources?

Thank you very much for the precious publication!


Train classes problem

In your VOC train_list there are 10582 images, and I find that they contain the 15 seen_cls as well as the 5 novel_cls.
So, in the training loop, do you use all 20 classes or only the 15 seen_cls?
Thanks!

Generate word vector

Hello, I want to train on my own dataset; how can I generate the word vectors? Can you give me some reference? Thank you very much!

mean acc of unseen is NaN

Hi, I checked the fasttext values against your pickled embedding values, and it seems they do not match. I directly use the word embeddings from fasttext, but the mean acc of unseen classes is NaN for my custom data. How can I fix it? Thanks.

Finetune training problem

When I ran python train.py --config ./configs/voc12_finetune.yaml --schedule mixed, the following error occurred:

Original Traceback (most recent call last):
  File "/data/gongdawei/anaconda3/envs/CaGNet/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/data/gongdawei/anaconda3/envs/CaGNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/gongdawei/Research/CaGNet/model.py", line 97, in forward
    self.get_loss_B()
  File "/data/gongdawei/Research/CaGNet/model.py", line 162, in get_loss_B
    self.loss_B_KLD = self.loss_KLD * self.hp['lambda_B_KLD']
  File "/data/gongdawei/anaconda3/envs/CaGNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 576, in __getattr__
    type(self).__name__, name))
AttributeError: 'OurModel' object has no attribute 'loss_KLD'
How can I solve it?
Many thanks!

Train and val data path

I downloaded the data following installation step 3, using 'CaGNet_VOC2012_data.tar'. At train.py line 166, I printed the train and val data:
[screenshot]
and the printed results are:
[screenshot]
The data path is '/home/schoudhu/zss/SBD/dataset/'; how can I change the path to mine?
Thanks

Calculation of evaluation

It's a great work and it helps me a lot. I would like to know why the background should be filtered out when calculating acc or IoU on VOC, which leads to significantly higher acc and IoU. Thank you!
