zjulearning / rmi

This is the code for the NeurIPS 2019 paper Region Mutual Information Loss for Semantic Segmentation.

License: MIT License

Python 83.78% Shell 6.73% C++ 5.22% Cuda 4.27%

Region Mutual Information Loss for Semantic Segmentation

Introduction
Features and TODO
Installation
Training
Evaluation and Inference
Experiments
Citations
Acknowledgements

Introduction

This is the code for the NeurIPS 2019 paper Region Mutual Information Loss for Semantic Segmentation.

This paper proposes a region mutual information (RMI) loss to model the dependencies among pixels. RMI represents each pixel by the pixel itself together with its neighboring pixels. Then for each pixel in an image, we get a multi-dimensional point that encodes the relationship between pixels, and the image is cast into a multi-dimensional distribution of these high-dimensional points. The prediction and ground truth can thus achieve high-order consistency through maximizing the mutual information (MI) between their multi-dimensional distributions.
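
As a rough illustration of the representation described above, the sketch below turns every position of a small 2D map that has a full square window into one multi-dimensional point (9 dimensions for a 3x3 window). It is a plain-Python toy, not the repository's implementation; the function name `region_points` and the `size` parameter are assumptions made for this example.

```python
def region_points(image, size=3):
    """Map each full size x size window of `image` to one flat point.

    `image` is a list of equally long rows. The result has one entry per
    valid window position; each entry is a list of size * size values.
    """
    height, width = len(image), len(image[0])
    points = []
    for top in range(height - size + 1):
        for left in range(width - size + 1):
            window = [
                image[top + dy][left + dx]
                for dy in range(size)
                for dx in range(size)
            ]
            points.append(window)
    return points


# A 4x4 map yields 2x2 = 4 points, each 9-dimensional.
toy = [[(4 * r + c) / 16 for c in range(4)] for r in range(4)]
pts = region_points(toy)
print(len(pts), len(pts[0]))  # 4 9
```

Applying the same construction to the prediction and to the ground truth gives two sets of points whose statistical dependence can then be compared.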

Features and TODO

Support different segmentation models, i.e., DeepLabv3, DeepLabv3+, PSPNet
Multi-GPU training
Multi-GPU Synchronized BatchNorm
Support different backbones, e.g., Mobilenet, Xception
Model pretrained on MS-COCO
Distributed training

We are open to pull requests.

Installation

Install dependencies

Please install PyTorch 1.1.0 and Python 3.6.5. We highly recommend using our prebuilt PyTorch docker image, zhaosssss/torch_lab.

docker pull zhaosssss/torch_lab:1.1.0

If you have not installed docker, see https://docs.docker.com/.

After you install docker and pull our image, cd to the script directory and run

./docker.sh

to create a running docker container.

If you do not want to use docker, try

pip install -r requirements.txt

However, this is not suggested.

Prepare data

Generally, directories are organized as follows:

|
|--dataset (save the dataset) 
|--models  (save the output checkpoints)
|--github  (save the code)
|--|
|--|--RMI  (the RMI code repository)
|--|--|--crf
|--|--|--dataloaders
|--|--|--losses
...

Download the PASCAL VOC training/validation data (2 GB tar file) and the augmented segmentation data, then extract them into the dataset directory.
cd to the github directory and clone the RMI repo.

As for the CamVid dataset, you can download it from SegNet-Tutorial. This is a processed version of the original CamVid dataset.

Training

See script/train.sh for detailed information. Before you start training, specify the following variables in script/train.sh.

pre_dir, where you save your output checkpoints. If you organize the dir as we suggest, it should be pre_dir=models.
data_dir, where you save your dataset. Besides, you should put the lists of the images in the dataset in a certain directory, check dataloaders/datasets/pascal.py to find how we organize the input pipeline.

You can find more information about the arguments of the code in parser_params.py.

python parser_params.py --help

usage: parser_params.py [-h] [--resume RESUME] [--checkname CHECKNAME]
                        [--save_ckpt_steps SAVE_CKPT_STEPS]
                        [--max_ckpt_nums MAX_CKPT_NUMS]
                        [--model_dir MODEL_DIR] [--output_dir OUTPUT_DIR]
                        [--seg_model {deeplabv3,deeplabv3+,pspnet}]
                        [--backbone {resnet50,resnet101,resnet152,resnet50_beta,resnet101_beta,resnet152_beta}]
                        [--out_stride OUT_STRIDE] [--batch_size N]
                        [--accumulation_steps N] [--test_batch_size N]
                        [--dataset {pascal,coco,cityscapes,camvid}]
                        [--train_split {train,trainaug,trainval,val,test}]
                        [--data_dir DATA_DIR] [--use_sbd] [--workers N]
                        ...
                        [--rmi_pool_size RMI_POOL_SIZE]
                        [--rmi_pool_stride RMI_POOL_STRIDE]
                        [--rmi_radius RMI_RADIUS]
                        [--crf_iter_steps CRF_ITER_STEPS]
                        [--local_rank LOCAL_RANK] [--world_size WORLD_SIZE]
                        [--dist_backend DIST_BACKEND]
                        [--multiprocessing_distributed]

After you set all the arguments properly, you can simply cd to RMI/script and run

./train.sh

to start training.

Monitoring the training process through tensorboard

tensorboard --logdir=your_logdir --port=your_port

GPU memory usage

Training a DeepLabv3 model with output_stride=16, crop_size=513, and batch_size=16 needs 4 GTX 1080 GPUs (8 GB each), 2 GTX TITAN X GPUs (12 GB each), or 1 TITAN RTX GPU (24 GB).
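
The arithmetic behind the three setups can be sketched as follows. This assumes the global batch of 16 is split evenly across GPUs, which is an assumption about the training script rather than something stated here; the helper name `per_gpu_batch` is hypothetical.

```python
def per_gpu_batch(global_batch, num_gpus):
    """Split a global batch evenly across GPUs; reject uneven splits."""
    if global_batch % num_gpus != 0:
        raise ValueError("global batch must be divisible by the GPU count")
    return global_batch // num_gpus


# (GPU count, memory per GPU in GB) for the three setups listed above.
setups = {"GTX 1080": (4, 8), "GTX TITAN X": (2, 12), "TITAN RTX": (1, 24)}
for name, (count, mem_gb) in setups.items():
    share = per_gpu_batch(16, count)
    print(f"{name}: {share} images per GPU, {count * mem_gb} GB in total")
```

Under that assumption each setup processes 4, 8, or 16 images per GPU.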

Evaluation and Inference

See script/eval.sh and script/inference.sh for detailed information.

You should also specify some variables in the scripts.

data_dir, where you save your dataset.
resume, where your checkpoints are located.
output_dir, where the output data will be saved.

Then run

./eval.sh

./inference.sh

Experiments

Some selected qualitative results on PASCAL VOC 2012 val set. Segmentation results of DeepLabv3+&RMI have richer details than DeepLabv3+&CE, e.g., small bumps of the airplane wing, branches of plants, limbs of cows and sheep, and so on.

Citations

If our paper and code are beneficial to your work, please cite:

@inproceedings{2019_zhao_rmi,
  author    = {Shuai Zhao and
               Yang Wang and
               Zheng Yang and
               Deng Cai},
  title     = {Region Mutual Information Loss for Semantic Segmentation},
  booktitle = {NeurIPS},
  year      = {2019},
}

If other related work in our code or paper also helps you, please cite the corresponding papers.

Acknowledgements

rmi's Issues

two classification tasks with output channel = 1

Hello, thank you for your excellent work. I have a question: when my model has a single output channel for a two-class task, is the RMI loss still meaningful?

Use CRF as post-process in image segmentation

Hello, thank you for your code.
I have a question to ask you. My network output is y_pred: (10242, 36), where 10242 is the number of pixels and 36 is the number of categories; y_pred can be read as the probability that each pixel belongs to each class. y_true: (10242, 36), one-hot. How do you use CRF for post-processing?

The intuition behind RMI loss

Since the intuition behind the RMI loss is to model the dependencies among pixels, the improvement in boundary segmentation should not be obvious. Is that right?

Effect of resizing image & mask without keeping width-height ratio in the data augmentation

Hello
How are you?
Thanks for contribution to this project.
I'm not sure whether the RMI loss works well when we resize the image and mask to the input size (NxN pixels) without keeping the width-height ratio in the data augmentation step.
I am working on an image segmentation project.
There are many images and masks of different sizes in my dataset.
The dataloader resizes them to the input size (e.g. 256x256) before feeding them into the model.
So the original width-height ratio of the image and mask is not kept.
Even in such a case, does the RMI loss work well?

Training a DeepLabv3 model with output_stride=16, crop_size=513, and batch_size=8 on two 2080Ti GPUs

Hi, I only have two 2080Ti GPUs with memory 11G per gpu. I'd like to train the baseline deeplabv3 with resnet-101 as backbone and batch_size=8 per gpu (for 2 gpus, global batch_size=16):

input the gpu (seperate by comma (,) ): 0,1
using gpus 0,1

0  --  deeplabv3
1  --  deeplabv3+
2  --  pspnet
choose the base network: 0

0  --  resnet_v1_50
1  --  resnet_v1_101
2  --  resnet_v1_152
choose the base network: 1
The backbone is resnet101
The base model is deeplabv3

0  --  softmax cross entropy loss.
1  --  sigmoid binary cross entropy loss.
2  --  bce and RMI loss.
3  --  Affinity field loss.
5  --  Pyramid loss.
input the loss type of the first stage: 2

0 -- PASCAL VOC2012 dataset
1 -- Cityscapes
2 -- CamVid
input the dataset: 0

input the batch_size (4, 8, 12 or 16): 8
The data dir is /workspace/data/PASCAL_VOC2012/VOCdevkit/VOC2012, the batch size is 8.
make the directory /workspace/pyroom/RMISegLoss/rmi_model/rmi_re_pascal_r3_pw1_st4_si4_bp513-8_net0-1-0.5_n
Namespace(accumulation_steps=1, backbone='resnet101', base_size=513, batch_size=8, bn_mom=0.05, checkname='deeplab-resnet', crf_iter_steps=1, crop_size=513, cuda=True, data_dir='/workspace/data/PASCAL_VOC2012/VOCdevkit/VOC2012', dataset='pascal', dist_backend='nccl', distributed=True, epochs=23, eval_interval=2, freeze_bn=False, ft=False, gpu_ids=[0, 1], init_global_step=0, init_lr=0.007, local_rank=0, loss_type=2, loss_weight_lambda=0.5, lr_multiplier=10.0, lr_scheduler='poly', main_gpu=0, max_ckpt_nums=15, model_dir='/workspace/pyroom/RMISegLoss/rmi_model/rmi_re_pascal_r3_pw1_st4_si4_bp513-8_net0-1-0.5_n', momentum=0.9, multiprocessing_distributed=False, nesterov=False, no_cuda=False, no_val=False, out_stride=16, output_dir='/home/zhaoshuai/models/deeplabv3_cbl_2/', proc_name='rmi_model/rmi_re_pascal_r3_pw1_st4_si4_bp513-8_net0-1-0.5_n', resume='None', rmi_pool_size=4, rmi_pool_stride=4, rmi_pool_way=1, rmi_radius=3, save_ckpt_steps=500, seed=1, seg_model='deeplabv3', slow_start_lr=0.0001, slow_start_steps=1500, start_epoch=0, sync_bn=True, test_batch_size=8, train_split='trainaug', use_balanced_weights=False, use_sbd=False, weight_decay=0.0001, workers=8, world_size=2)
INFO:PyTorch: Using PASCAL VOC dataset, the training batch size 8 and crop size is 513.
Number of image_lists in trainaug: 10582
Number of image_lists in val: 1449
Restore parameters from the /root/.encoding/models/resnet101-2a57e44d.pth
INFO:PyTorch: Using Region Mutual Information Loss.
INFO:PyTorch: The batch norm layer is Hang Zhang's <class 'model.sync_bn.syncbn.BatchNorm2d'>
INFO:PyTorch: Using poly learning rate scheduler!
INFO:PyTorch: Starting Epoch: 0
INFO:PyTorch: Total Epoches: 23

I wonder whether this is equivalent to training a DeepLabv3 model with output_stride=16, crop_size=513, and batch_size=16 on a single TITAN RTX GPU. Will it achieve similar convergence in 23 epochs?

Does the batch_size matter? If so, how can I adjust other hyperparams with batch_size=8, like epochs, lr as well as the lr_scheduler?

The RMI loss provides negative loss

Hello,
Firstly thanks for this work

I'm currently using the RMI loss in my own segmentation toolbox,
but I found that the RMI loss produces negative values.

I just copied all the code from rmi.py and rmi_utils.py, then used the RMI loss instead of the cross-entropy loss.

Is it normal for the RMI loss to be negative at the beginning of training?

Thank you.

Negative RMI loss

Out of the box, i'm seeing negative RMI loss. Is that expected? I'm using the provided docker image.

save model into /home/dcg-adlr-atao-source.cosmos318/sources/RMI/rmi_model/rmi_re_pascal_r3_pw1_st4_si4_bp513-8_net0-0-0.5_n
Namespace(accumulation_steps=1, backbone='resnet50', base_size=513, batch_size=8, bn_mom=0.0003, checkname='deeplab-resnet', crf_iter_steps=1, crop_size=513, cuda=True, data_dir='/home/dcg-adlr-atao-data.cosmos277/data/PASCAL/2012/VOCdevkit/VOC2012', dataset='pascal', dist_backend='nccl', distributed=False, epochs=23, eval_interval=2, freeze_bn=False, ft=False, gpu_ids=[0], init_global_step=0, init_lr=0.007, local_rank=0, loss_type=2, loss_weight_lambda=0.5, lr_multiplier=10.0, lr_scheduler='poly', main_gpu=0, max_ckpt_nums=15, model_dir='/home/dcg-adlr-atao-source.cosmos318/sources/RMI/rmi_model/rmi_re_pascal_r3_pw1_st4_si4_bp513-8_net0-0-0.5_n', momentum=0.9, multiprocessing_distributed=False, nesterov=False, no_cuda=False, no_val=False, out_stride=16, output_dir='/home/zhaoshuai/models/deeplabv3_cbl_2/', proc_name='rmi_model/rmi_re_pascal_r3_pw1_st4_si4_bp513-8_net0-0-0.5_n', resume='None', rmi_pool_size=4, rmi_pool_stride=4, rmi_pool_way=1, rmi_radius=3, save_ckpt_steps=500, seed=1, seg_model='deeplabv3', slow_start_lr=0.0001, slow_start_steps=1500, start_epoch=0, sync_bn=False, test_batch_size=8, train_split='trainaug', use_balanced_weights=False, use_sbd=False, weight_decay=4e-05, workers=8, world_size=1)
INFO:PyTorch: Using PASCAL VOC dataset, the training batch size 8 and crop size is 513.
Number of image_lists in trainaug: 10582
Number of image_lists in val: 1449
Restore parameters from the /home/atao/.encoding/models/resnet101-2a57e44d.pth
INFO:PyTorch: Using Region Mutual Information Loss.
INFO:PyTorch: Using poly learning rate scheduler!
INFO:PyTorch: Starting Epoch: 0
INFO:PyTorch: Total Epoches: 23
INFO:PyTorch: epoch=1/23, steps=20, loss=-30.94986, learning_rate=0.00019, train_miou=0.02007, px_accuracy=0.22422 (20.791 sec)
INFO:PyTorch: epoch=1/23, steps=40, loss=-31.49414, learning_rate=0.00028, train_miou=0.02949, px_accuracy=0.44150 (15.590 sec)
INFO:PyTorch: epoch=1/23, steps=60, loss=-32.38523, learning_rate=0.00038, train_miou=0.03192, px_accuracy=0.51059 (15.593 sec)
INFO:PyTorch: epoch=1/23, steps=80, loss=-32.04794, learning_rate=0.00047, train_miou=0.03653, px_accuracy=0.55031 (15.519 sec)
INFO:PyTorch: epoch=1/23, steps=100, loss=-30.86725, learning_rate=0.00056, train_miou=0.04386, px_accuracy=0.57324 (15.507 sec)
INFO:PyTorch: epoch=1/23, steps=120, loss=-31.81583, learning_rate=0.00065, train_miou=0.05846, px_accuracy=0.59686 (15.437 sec)
INFO:PyTorch: epoch=1/23, steps=140, loss=-32.34855, learning_rate=0.00074, train_miou=0.07140, px_accuracy=0.62079 (15.426 sec)
...
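
The learning rates in the log above are consistent with a linear warm-up from slow_start_lr=0.0001 toward init_lr=0.007 over slow_start_steps=1500 (all three values appear in the printed Namespace). The sketch below reproduces the logged values under that assumption. The function name `warmup_lr` is hypothetical, and the poly decay that the scheduler also applies is left out, so later steps will diverge from this formula.

```python
def warmup_lr(step, init_lr=0.007, slow_start_lr=0.0001, slow_start_steps=1500):
    """Linear interpolation from slow_start_lr to init_lr during warm-up."""
    if step >= slow_start_steps:
        return init_lr
    return slow_start_lr + (init_lr - slow_start_lr) * step / slow_start_steps


# Learning rates copied from the log lines above (step -> learning_rate).
logged = {20: 0.00019, 40: 0.00028, 60: 0.00038, 80: 0.00047,
          100: 0.00056, 120: 0.00065, 140: 0.00074}
for step, lr in logged.items():
    print(step, round(warmup_lr(step), 5), lr)
```

Each computed value matches the logged one after rounding to five decimals.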

The RMI loss does not change too much

I tested the rmi loss with random inputs and I found the rmi loss does not change too much. Is it normal? My test code is as follows.

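import numpy as np
import torch

# RMILoss is the class defined in the repository's rmi.py;
# adjust the import to match where you copied it.

logits = np.random.randn(5, 3, 32, 32)
labels = np.random.randint(0, 3, size=(5, 32, 32))

logits = torch.from_numpy(logits.astype(np.float32))
labels = torch.from_numpy(labels.astype(np.int32))

rmiloss = RMILoss(num_classes=3)(logits, labels)
print(rmiloss)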

Hi! Can the RMI loss be used to measure the similarity of two natural images?

RMI

import java.io.*;
import java.net.*;
// Server class
class Server {
    public static void main(String[] args)
    {
        ServerSocket server = null;
        try {
            // server is listening on port 1234
            server = new ServerSocket(1234);
            server.setReuseAddress(true);
            // running infinite loop for getting
            // client request
            while (true) {
                // socket object to receive incoming client
                // requests
                Socket client = server.accept();
                // Displaying that new client is connected
                // to server
                System.out.println("New client connected"
                                   + client.getInetAddress()
                                         .getHostAddress());
                // create a new thread object
                ClientHandler clientSock
                    = new ClientHandler(client);
                // This thread will handle the client
                // separately
                new Thread(clientSock).start();
            }
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        finally {
            if (server != null) {
                try {
                    server.close();
                }
                catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
    // ClientHandler class
    private static class ClientHandler implements Runnable {
        private final Socket clientSocket;
        // Constructor
        public ClientHandler(Socket socket)
        {
            this.clientSocket = socket;
        }
        public void run()
        {
            PrintWriter out = null;
            BufferedReader in = null;
            try {
                // get the outputstream of client
                out = new PrintWriter(
                    clientSocket.getOutputStream(), true);
                // get the inputstream of client
                in = new BufferedReader(
                    new InputStreamReader(
                        clientSocket.getInputStream()));
                String line;
                while ((line = in.readLine()) != null) {

Not available to download ResNet101 pretrained model.

Hello there,
I tried to download the ResNet101 pretrained model using the model_store.py script,

but the download was not possible.

Can you provide the pretrained model some other way?
Otherwise, it is not possible to reproduce your work.
I just trained from scratch on the CamVid dataset and got around 60% mIoU using the RMI loss.

thanks.

RuntimeError: CUDA error: an illegal memory access was encountered

Hello, thanks for your work! I always hit the following error. Could you help me figure out what causes it? Thanks!

Please do not raise issues here. This repo is archived and will no longer be maintained. Some third-party links may be helpful:

https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/paddleseg/models/losses/rmi_loss.py

https://github.com/NVIDIA/semantic-segmentation/tree/main/loss

cholesky_cuda: For batch 0: U(1,1) is zero, singular U.

Great work!

We meet an issue caused by the computation of chol = torch.cholesky(matrix). We have pasted the error information as shown below,

RuntimeError:     cholesky_cuda: For batch 0: U(1,1) is zero, singular U.
2020-08-11T12:37:40.000Z /container_e2240_1583898264103_325873_01_000010: rmi_now = 0.5 * log_det_by_cholesky(appro_var + diag_matrix.type_as(appro_var) * _POS_ALPHA)
2020-08-11T12:37:40.000Z /container_e2240_1583898264103_325873_01_000010:       File "/teamdrive/yuyua/code/segmentation/mmsegmentation/mmseg/models/losses/rmi_loss.py", line 118, in log_det_by_cholesky
2020-08-11T12:37:40.000Z /container_e2240_1583898264103_325873_01_000010: chol = torch.cholesky(matrix)
2020-08-11T12:37:40.000Z /container_e2240_1583898264103_325873_01_000010:         chol = torch.cholesky(matrix)
2020-08-11T12:37:40.000Z /container_e2240_1583898264103_325873_01_000010: chol = torch.cholesky(matrix)
2020-08-11T12:37:40.000Z /container_e2240_1583898264103_325873_01_000010: RuntimeError    chol = torch.cholesky(matrix)
2020-08-11T12:37:40.000Z /container_e2240_1583898264103_325873_01_000010: RuntimeError: cholesky_cuda: For batch 0: U(1,1) is zero, singular U.
2020-08-11T12:37:40.000Z /container_e2240_1583898264103_325873_01_000010: : cholesky_cuda: For batch 0: U(1,1) is zero, singular U.
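
The traceback above calls `log_det_by_cholesky(appro_var + diag_matrix * _POS_ALPHA)`, that is, a log-determinant computed from a Cholesky factor after adding a small multiple of the identity. The sketch below illustrates that general technique in plain Python to show why a singular matrix makes the factorization fail and how a diagonal shift avoids it. It is not the repository's code: the function names and the value `alpha=5e-4` are assumptions chosen for this example, and the actual value of `_POS_ALPHA` is not shown in the traceback.

```python
import math


def cholesky(matrix):
    """Return lower-triangular L with L * L^T == matrix (lists of lists)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= 0.0:
                    raise ValueError(f"pivot {i} is not positive")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def log_det_sketch(matrix, alpha=0.0):
    """log(det(matrix + alpha * I)) computed as 2 * sum(log(diag(L)))."""
    n = len(matrix)
    shifted = [
        [matrix[i][j] + (alpha if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    lower = cholesky(shifted)
    return 2.0 * sum(math.log(lower[i][i]) for i in range(n))


well_conditioned = [[4.0, 2.0], [2.0, 3.0]]  # determinant is 8
print(log_det_sketch(well_conditioned), math.log(8.0))

singular = [[1.0, 1.0], [1.0, 1.0]]  # determinant is 0
try:
    log_det_sketch(singular)
except ValueError as err:
    print("factorization failed:", err)

# The hypothetical shift alpha makes the matrix positive definite again.
print(log_det_sketch(singular, alpha=5e-4))
```

If the shifted matrix is still numerically singular (for example in half precision), a larger shift or a higher-precision dtype may be needed; which one applies here cannot be told from the traceback alone.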

Sample weights for RMI loss

Some augmentations (e.g. random-angle rotation) leave parts of the image and mask without valid content.
To deal with such cases I usually use per-pixel weights (0. for holes, 1. for valid parts) and multiply the per-pixel loss by those weights.

But the RMI loss uses "high dimension points", and the final loss has a shape incompatible with the original labels.

Could you please suggest the best way to exclude such "holes" from the loss (i.e. multiply by a per-pixel weight)?

Benchmarking on Cityscapes

Hello,

Do you have any results on the Cityscapes dataset?

I just wonder whether the RMI loss will bring better performance on the Cityscapes dataset.

Thank you

About setting of baseline

Hi, your work is amazing and has helped me a lot!
I noticed that the paper compares DeepLab with your results on different datasets. However, when I ran your code with batch size=16, the DeepLabv3+ model, and a resnet101 backbone on the VOC2012 dataset, I only got 0.772 mIoU after about 30k iterations (just the default setting). Can you tell me how to set the hyper-parameters to get the desired result (including 78.8 with cross-entropy loss and a higher mIoU with your proposed loss)?
This is my result on the val dataset:

Thank you very much for your excellent work!

How to weight some examples if loss may be negative

In most cases when we have loss >= 0 with shape [batch_size] and we want to weight up importance of some examples we would multiply loss by weight. E.g. loss = [0.1, 0.3], weights = [2., 1.], weighted_loss = [0.2, 0.3]

But how should we do that for RMI loss that may be negative?
E.g. loss = [-0.1, -0.3], weights = [2., 1.], weighted_loss = [-0.2, -0.3]. In this example weighted loss will be smaller instead of expected "larger".

Should we multiply loss by weights or divide?
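
One way to reason about the question: what drives the update is the gradient, and the gradient of w * loss is w times the gradient of the loss whatever the sign of the loss value. The toy below checks this numerically with a loss that is negative near its minimum. It is a generic sketch unrelated to the RMI implementation; the function names and the quadratic loss are assumptions for illustration.

```python
def loss(theta):
    """A toy loss that is negative near its minimum at theta = 3."""
    return (theta - 3.0) ** 2 - 10.0


def grad(fn, theta, eps=1e-6):
    """Central finite-difference derivative of fn at theta."""
    return (fn(theta + eps) - fn(theta - eps)) / (2.0 * eps)


theta = 2.0
weight = 2.0
plain = grad(loss, theta)
weighted = grad(lambda t: weight * loss(t), theta)

print(loss(theta))      # -9.0: the loss value itself is negative
print(plain, weighted)  # about -2.0 and -4.0: the gradient is doubled
```

In this toy, multiplying by the weight still doubles the example's pull on the parameters even though the weighted loss value becomes more negative.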

Comparison with the Potential Energy Loss on Gibbs distribution

Hello
How are you?
Thanks for contributing to this project.
Did u look at this paper?
https://www.mdpi.com/2072-4292/13/3/454
The author of this paper says that the Potential Energy Loss on Gibbs distribution outperforms your RMI loss.
Excuse me, is that really true?
I think that u can easily implement the PE loss and compare it with your RMI loss.
If u can implement the PE loss, could u share the code?
Thanks.