rmi's Introduction

Region Mutual Information Loss for Semantic Segmentation

Introduction

This is the code for the NeurIPS 2019 paper Region Mutual Information Loss for Semantic Segmentation.

This paper proposes a region mutual information (RMI) loss to model the dependencies among pixels. RMI represents each pixel by the pixel itself together with its neighboring pixels. Each pixel in an image thus yields a multi-dimensional point that encodes the relationship between pixels, and the image is cast into a multi-dimensional distribution of these points. The prediction and the ground truth can then achieve high-order consistency by maximizing the mutual information (MI) between their multi-dimensional distributions.
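The idea of representing a pixel by itself plus its neighbors can be sketched with NumPy. This is only an illustration, not the repository's implementation: the function name `neighborhood_points`, the choice of a square 3x3 window, and the handling of borders (windows that do not fit are dropped) are all assumptions.

```python
import numpy as np

def neighborhood_points(image, radius=3):
    """Stack each pixel with the neighbors inside a radius x radius window.

    image: 2-D array of shape (H, W).
    Returns an array of shape (radius * radius, H - radius + 1, W - radius + 1);
    the vector [:, i, j] is the multi-dimensional point built from the window
    whose top-left corner is pixel (i, j).
    """
    h, w = image.shape
    new_h, new_w = h - (radius - 1), w - (radius - 1)
    # One shifted view of the image per offset inside the window.
    shifted = [
        image[dy:dy + new_h, dx:dx + new_w]
        for dy in range(radius)
        for dx in range(radius)
    ]
    return np.stack(shifted, axis=0)

image = np.arange(16, dtype=np.float32).reshape(4, 4)
points = neighborhood_points(image, radius=3)
print(points.shape)     # (9, 2, 2)
print(points[:, 0, 0])  # the 9 values of the top-left 3x3 window
```

Applying the same construction to the prediction and to the ground truth gives two sets of 9-dimensional points whose distributions can then be compared.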

img_intro

Features and TODO

  • Support different segmentation models, i.e., DeepLabv3, DeepLabv3+, PSPNet
  • Multi-GPU training
  • Multi-GPU Synchronized BatchNorm
  • Support different backbones, e.g., Mobilenet, Xception
  • Model pretrained on MS-COCO
  • Distributed training

We are open to pull requests.

Installation

Install dependencies

Please install PyTorch 1.1.0 and Python 3.6.5. We highly recommend using our prebuilt PyTorch docker image, zhaosssss/torch_lab.

docker pull zhaosssss/torch_lab:1.1.0

If you have not installed docker, see https://docs.docker.com/.

After you install docker and pull our image, cd to the script directory and run

./docker.sh

to create a running docker container.

If you do not want to use docker, try

pip install -r requirements.txt

However, this is not suggested.

Prepare data

Generally, directories are organized as follows:

|
|--dataset (save the dataset) 
|--models  (save the output checkpoints)
|--github  (save the code)
|--|
|--|--RMI  (the RMI code repository)
|--|--|--crf
|--|--|--dataloaders
|--|--|--losses
...
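As a convenience, the top-level folders of this layout could be created with a few lines of Python. This is a sketch under assumptions: the helper name `make_layout` is hypothetical, and the RMI repository itself is expected to be cloned into `github` afterwards rather than created here.

```python
import tempfile
from pathlib import Path

def make_layout(root):
    """Create the suggested top-level directories under root."""
    root = Path(root)
    # dataset: the data, models: output checkpoints, github: the code
    for sub in ("dataset", "models", "github"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())

# Demonstrated in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = make_layout(tmp)
print(created)  # ['dataset', 'github', 'models']
```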

As for the CamVid dataset, you can download it from SegNet-Tutorial. This is a processed version of the original CamVid dataset.

Training

See script/train.sh for detailed information. Before you start training, you should specify some variables in script/train.sh.

  • pre_dir, where you save your output checkpoints. If you organize the dir as we suggest, it should be pre_dir=models.

  • data_dir, where you save your dataset. You should also put the image lists of the dataset in a specific directory; check dataloaders/datasets/pascal.py to see how we organize the input pipeline.

You can find more information about the arguments of the code in parser_params.py.

python parser_params.py --help

usage: parser_params.py [-h] [--resume RESUME] [--checkname CHECKNAME]
                        [--save_ckpt_steps SAVE_CKPT_STEPS]
                        [--max_ckpt_nums MAX_CKPT_NUMS]
                        [--model_dir MODEL_DIR] [--output_dir OUTPUT_DIR]
                        [--seg_model {deeplabv3,deeplabv3+,pspnet}]
                        [--backbone {resnet50,resnet101,resnet152,resnet50_beta,resnet101_beta,resnet152_beta}]
                        [--out_stride OUT_STRIDE] [--batch_size N]
                        [--accumulation_steps N] [--test_batch_size N]
                        [--dataset {pascal,coco,cityscapes,camvid}]
                        [--train_split {train,trainaug,trainval,val,test}]
                        [--data_dir DATA_DIR] [--use_sbd] [--workers N]
                        ...
                        [--rmi_pool_size RMI_POOL_SIZE]
                        [--rmi_pool_stride RMI_POOL_STRIDE]
                        [--rmi_radius RMI_RADIUS]
                        [--crf_iter_steps CRF_ITER_STEPS]
                        [--local_rank LOCAL_RANK] [--world_size WORLD_SIZE]
                        [--dist_backend DIST_BACKEND]
                        [--multiprocessing_distributed]
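To show how a few of these flags might be declared and parsed, here is a minimal argparse sketch. It is not the repository's parser_params.py: only a handful of the flags from the usage text are reproduced, and the types and default values are assumptions made for illustration.

```python
import argparse

def build_parser():
    # Illustrative subset only; parser_params.py in the repository is authoritative.
    parser = argparse.ArgumentParser(description="RMI argument sketch")
    parser.add_argument("--seg_model", default="deeplabv3",
                        choices=["deeplabv3", "deeplabv3+", "pspnet"])
    parser.add_argument("--dataset", default="pascal",
                        choices=["pascal", "coco", "cityscapes", "camvid"])
    # The integer types and defaults below are assumptions.
    parser.add_argument("--out_stride", type=int, default=16)
    parser.add_argument("--rmi_radius", type=int, default=3)
    parser.add_argument("--rmi_pool_size", type=int, default=4)
    parser.add_argument("--rmi_pool_stride", type=int, default=4)
    return parser

args = build_parser().parse_args(["--seg_model", "pspnet", "--rmi_radius", "5"])
print(args.seg_model, args.rmi_radius, args.rmi_pool_size)  # pspnet 5 4
```

Flags that are not passed on the command line keep their defaults, which is why `rmi_pool_size` is still 4 in the example.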

After you set all the arguments properly, you can simply cd to RMI/script and run

./train.sh

to start training.

  • Monitoring the training process through tensorboard
tensorboard --logdir=your_logdir --port=your_port

img_ten

  • GPU memory usage

Training a DeepLabv3 model with output_stride=16, crop_size=513, and batch_size=16 needs 4 GTX 1080 GPUs (8 GB each), 2 GTX TITAN X GPUs (12 GB each), or 1 TITAN RTX GPU (24 GB).

Evaluation and Inference

See script/eval.sh and script/inference.sh for detailed information.

You should also specify some variables in the scripts.

  • data_dir, where you save your dataset.

  • resume, where your checkpoints are located.

  • output_dir, where the output data will be saved.

Then run

./eval.sh

or

./inference.sh

Experiments

img_res01 img_res02

img_res03

Some selected qualitative results on PASCAL VOC 2012 val set. Segmentation results of DeepLabv3+&RMI have richer details than DeepLabv3+&CE, e.g., small bumps of the airplane wing, branches of plants, limbs of cows and sheep, and so on.

Citations

If our paper and code are beneficial to your work, please cite:

@inproceedings{2019_zhao_rmi,
  author    = {Shuai Zhao and
               Yang Wang and
               Zheng Yang and
               Deng Cai},
  title     = {Region Mutual Information Loss for Semantic Segmentation},
  booktitle = {NeurIPS},
  year      = {2019},
}

If other related work in our code or paper also helps you, please cite the corresponding papers.

Acknowledgements

img_cad

rmi's People

Contributors

mzhaoshuai

rmi's Issues

Use CRF as post-process in image segmentation

Hello, thank you for your code.
I have a question: my network output is y_pred with shape (10242, 36), where 10242 is the number of pixels and 36 is the number of categories, so y_pred gives the probability that each pixel belongs to each class. y_true has shape (10242, 36) and is one-hot encoded. How do you use CRF for post-processing?

The intuition behind RMI loss

Since the intuition behind the RMI loss is to model the dependencies among pixels, the improvement in boundary segmentation should not be obvious. Is that right?

Effect of resizing image & mask without keeping width-height ratio in the data augmentation

Hello,
How are you?
Thanks for your contribution to this project.
I'm NOT sure whether the RMI loss would work well when we resize the image and mask to the input size (NxN pixels) without keeping the width-height ratio in the data augmentation step.
I am working on an image segmentation project.
There are many images and masks with different sizes in my dataset.
The dataloader resizes the data to the input size (e.g., 256x256) before feeding it into the model.
So the original width-height ratio of the image and mask is NOT kept.
Even in such a case, does the RMI loss work well?

Training a DeepLabv3 model with output_stride=16, crop_size=513, and batch_size=8 on two 2080Ti GPUs

Hi, I only have two 2080Ti GPUs with 11 GB of memory each. I'd like to train the baseline DeepLabv3 with ResNet-101 as the backbone and batch_size=8 per GPU (for 2 GPUs, global batch_size=16):

input the gpu (seperate by comma (,) ): 0,1
using gpus 0,1

0  --  deeplabv3
1  --  deeplabv3+
2  --  pspnet
choose the base network: 0

0  --  resnet_v1_50
1  --  resnet_v1_101
2  --  resnet_v1_152
choose the base network: 1
The backbone is resnet101
The base model is deeplabv3

0  --  softmax cross entropy loss.
1  --  sigmoid binary cross entropy loss.
2  --  bce and RMI loss.
3  --  Affinity field loss.
5  --  Pyramid loss.
input the loss type of the first stage: 2

0 -- PASCAL VOC2012 dataset
1 -- Cityscapes
2 -- CamVid
input the dataset: 0

input the batch_size (4, 8, 12 or 16): 8
The data dir is /workspace/data/PASCAL_VOC2012/VOCdevkit/VOC2012, the batch size is 8.
make the directory /workspace/pyroom/RMISegLoss/rmi_model/rmi_re_pascal_r3_pw1_st4_si4_bp513-8_net0-1-0.5_n
Namespace(accumulation_steps=1, backbone='resnet101', base_size=513, batch_size=8, bn_mom=0.05, checkname='deeplab-resnet', crf_iter_steps=1, crop_size=513, cuda=True, data_dir='/workspace/data/PASCAL_VOC2012/VOCdevkit/VOC2012', dataset='pascal', dist_backend='nccl', distributed=True, epochs=23, eval_interval=2, freeze_bn=False, ft=False, gpu_ids=[0, 1], init_global_step=0, init_lr=0.007, local_rank=0, loss_type=2, loss_weight_lambda=0.5, lr_multiplier=10.0, lr_scheduler='poly', main_gpu=0, max_ckpt_nums=15, model_dir='/workspace/pyroom/RMISegLoss/rmi_model/rmi_re_pascal_r3_pw1_st4_si4_bp513-8_net0-1-0.5_n', momentum=0.9, multiprocessing_distributed=False, nesterov=False, no_cuda=False, no_val=False, out_stride=16, output_dir='/home/zhaoshuai/models/deeplabv3_cbl_2/', proc_name='rmi_model/rmi_re_pascal_r3_pw1_st4_si4_bp513-8_net0-1-0.5_n', resume='None', rmi_pool_size=4, rmi_pool_stride=4, rmi_pool_way=1, rmi_radius=3, save_ckpt_steps=500, seed=1, seg_model='deeplabv3', slow_start_lr=0.0001, slow_start_steps=1500, start_epoch=0, sync_bn=True, test_batch_size=8, train_split='trainaug', use_balanced_weights=False, use_sbd=False, weight_decay=0.0001, workers=8, world_size=2)
INFO:PyTorch: Using PASCAL VOC dataset, the training batch size 8 and crop size is 513.
Number of image_lists in trainaug: 10582
Number of image_lists in val: 1449
Restore parameters from the /root/.encoding/models/resnet101-2a57e44d.pth
INFO:PyTorch: Using Region Mutual Information Loss.
INFO:PyTorch: The batch norm layer is Hang Zhang's <class 'model.sync_bn.syncbn.BatchNorm2d'>
INFO:PyTorch: Using poly learning rate scheduler!
INFO:PyTorch: Starting Epoch: 0
INFO:PyTorch: Total Epoches: 23

I wonder whether this is equivalent to training a DeepLabv3 model with output_stride=16, crop_size=513, and batch_size=16 on a single TITAN RTX GPU. Will it achieve similar convergence in 23 epochs?

Does the batch_size matter? If so, how should I adjust other hyperparameters, such as epochs, lr, and the lr_scheduler, with batch_size=8?

The RMI loss provides negative loss

Hello,
First, thanks for this work.

I'm currently using the RMI loss in my own segmentation toolbox, but I found that it produces negative loss values.

I just copied all the code from rmi.py and rmi_utils.py, then used the RMI loss instead of the cross entropy loss.

Is it normal for the RMI loss to be negative at the beginning of training?

Thank you.

Negative RMI loss

Out of the box, I'm seeing negative RMI loss. Is that expected? I'm using the provided docker image.

save model into /home/dcg-adlr-atao-source.cosmos318/sources/RMI/rmi_model/rmi_re_pascal_r3_pw1_st4_si4_bp513-8_net0-0-0.5_n
Namespace(accumulation_steps=1, backbone='resnet50', base_size=513, batch_size=8, bn_mom=0.0003, checkname='deeplab-resnet', crf_iter_steps=1, crop_size=513, cuda=True, data_dir='/home/dcg-adlr-atao-data.cosmos277/data/PASCAL/2012/VOCdevkit/VOC2012', dataset='pascal', dist_backend='nccl', distributed=False, epochs=23, eval_interval=2, freeze_bn=False, ft=False, gpu_ids=[0], init_global_step=0, init_lr=0.007, local_rank=0, loss_type=2, loss_weight_lambda=0.5, lr_multiplier=10.0, lr_scheduler='poly', main_gpu=0, max_ckpt_nums=15, model_dir='/home/dcg-adlr-atao-source.cosmos318/sources/RMI/rmi_model/rmi_re_pascal_r3_pw1_st4_si4_bp513-8_net0-0-0.5_n', momentum=0.9, multiprocessing_distributed=False, nesterov=False, no_cuda=False, no_val=False, out_stride=16, output_dir='/home/zhaoshuai/models/deeplabv3_cbl_2/', proc_name='rmi_model/rmi_re_pascal_r3_pw1_st4_si4_bp513-8_net0-0-0.5_n', resume='None', rmi_pool_size=4, rmi_pool_stride=4, rmi_pool_way=1, rmi_radius=3, save_ckpt_steps=500, seed=1, seg_model='deeplabv3', slow_start_lr=0.0001, slow_start_steps=1500, start_epoch=0, sync_bn=False, test_batch_size=8, train_split='trainaug', use_balanced_weights=False, use_sbd=False, weight_decay=4e-05, workers=8, world_size=1)
INFO:PyTorch: Using PASCAL VOC dataset, the training batch size 8 and crop size is 513.
Number of image_lists in trainaug: 10582
Number of image_lists in val: 1449
Restore parameters from the /home/atao/.encoding/models/resnet101-2a57e44d.pth
INFO:PyTorch: Using Region Mutual Information Loss.
INFO:PyTorch: Using poly learning rate scheduler!
INFO:PyTorch: Starting Epoch: 0
INFO:PyTorch: Total Epoches: 23
INFO:PyTorch: epoch=1/23, steps=20, loss=-30.94986, learning_rate=0.00019, train_miou=0.02007, px_accuracy=0.22422 (20.791 sec)
INFO:PyTorch: epoch=1/23, steps=40, loss=-31.49414, learning_rate=0.00028, train_miou=0.02949, px_accuracy=0.44150 (15.590 sec)
INFO:PyTorch: epoch=1/23, steps=60, loss=-32.38523, learning_rate=0.00038, train_miou=0.03192, px_accuracy=0.51059 (15.593 sec)
INFO:PyTorch: epoch=1/23, steps=80, loss=-32.04794, learning_rate=0.00047, train_miou=0.03653, px_accuracy=0.55031 (15.519 sec)
INFO:PyTorch: epoch=1/23, steps=100, loss=-30.86725, learning_rate=0.00056, train_miou=0.04386, px_accuracy=0.57324 (15.507 sec)
INFO:PyTorch: epoch=1/23, steps=120, loss=-31.81583, learning_rate=0.00065, train_miou=0.05846, px_accuracy=0.59686 (15.437 sec)
INFO:PyTorch: epoch=1/23, steps=140, loss=-32.34855, learning_rate=0.00074, train_miou=0.07140, px_accuracy=0.62079 (15.426 sec)
...
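As a side note on reading this log, the learning_rate column is consistent with a linear warm-up from slow_start_lr=0.0001 toward init_lr=0.007 over slow_start_steps=1500, the values shown in the Namespace dump. The sketch below reproduces the logged numbers under that assumption; the function name `warmup_lr` is hypothetical, the actual scheduler in the repository may be implemented differently, and what happens after the warm-up (the poly decay) is deliberately left out.

```python
def warmup_lr(step, init_lr=0.007, slow_start_lr=0.0001, slow_start_steps=1500):
    """Linearly interpolate from slow_start_lr to init_lr during warm-up.

    Assumption: after warm-up this sketch simply returns init_lr; the real
    'poly' schedule would decay the rate from there.
    """
    if step >= slow_start_steps:
        return init_lr
    return slow_start_lr + (init_lr - slow_start_lr) * step / slow_start_steps

for step in (20, 40, 60, 80, 100, 120, 140):
    print(step, round(warmup_lr(step), 5))
```

Rounded to five decimals, these values match the learning_rate entries logged at the same steps.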

The RMI loss does not change too much

I tested the RMI loss with random inputs and found that it does not change much. Is that normal? My test code is as follows.

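import numpy as np
import torch
# RMILoss is assumed to be importable from the repository's loss code.

# Random logits of shape (batch, classes, height, width) and random integer labels.
logits = np.random.randn(5, 3, 32, 32)
labels = np.random.randint(0, 3, size=(5, 32, 32))

logits = torch.from_numpy(logits.astype(np.float32))
labels = torch.from_numpy(labels.astype(np.int32))

rmiloss = RMILoss(num_classes=3)(logits, labels)
print(rmiloss)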

RMI

import java.io.*;
import java.net.*;

// Server class
class Server {
    public static void main(String[] args)
    {
        ServerSocket server = null;
        try {
            // server is listening on port 1234
            server = new ServerSocket(1234);
            server.setReuseAddress(true);
            // running infinite loop for getting
            // client request
            while (true) {
                // socket object to receive incoming client
                // requests
                Socket client = server.accept();
                // Displaying that new client is connected
                // to server
                System.out.println("New client connected"
                    + client.getInetAddress()
                        .getHostAddress());
                // create a new thread object
                ClientHandler clientSock
                    = new ClientHandler(client);
                // This thread will handle the client
                // separately
                new Thread(clientSock).start();
            }
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        finally {
            if (server != null) {
                try {
                    server.close();
                }
                catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
    // ClientHandler class
    private static class ClientHandler implements Runnable {
        private final Socket clientSocket;
        // Constructor
        public ClientHandler(Socket socket)
        {
            this.clientSocket = socket;
        }
        public void run()
        {
            PrintWriter out = null;
            BufferedReader in = null;
            try {
                // get the outputstream of client
                out = new PrintWriter(
                    clientSocket.getOutputStream(), true);
                // get the inputstream of client
                in = new BufferedReader(
                    new InputStreamReader(
                        clientSocket.getInputStream()));
                String line;
                while ((line = in.readLine()) != null) {

Not available to download ResNet101 pretrained model.

Hello there,
I tried to download the ResNet101 pretrained model using the model_store.py script,

but the download failed.

Can you provide the pretrained model some other way?
Otherwise, it is not possible to reproduce your work.
I just trained from scratch on the CamVid dataset and got around 60% mIoU using the RMI loss.

thanks.

cholesky_cuda: For batch 0: U(1,1) is zero, singular U.

Great work!

We hit an issue caused by the computation of chol = torch.cholesky(matrix). The error information is pasted below.

RuntimeError:     cholesky_cuda: For batch 0: U(1,1) is zero, singular U.
2020-08-11T12:37:40.000Z /container_e2240_1583898264103_325873_01_000010: rmi_now = 0.5 * log_det_by_cholesky(appro_var + diag_matrix.type_as(appro_var) * _POS_ALPHA)
2020-08-11T12:37:40.000Z /container_e2240_1583898264103_325873_01_000010:       File "/teamdrive/yuyua/code/segmentation/mmsegmentation/mmseg/models/losses/rmi_loss.py", line 118, in log_det_by_cholesky
2020-08-11T12:37:40.000Z /container_e2240_1583898264103_325873_01_000010: chol = torch.cholesky(matrix)
2020-08-11T12:37:40.000Z /container_e2240_1583898264103_325873_01_000010:         chol = torch.cholesky(matrix)
2020-08-11T12:37:40.000Z /container_e2240_1583898264103_325873_01_000010: chol = torch.cholesky(matrix)
2020-08-11T12:37:40.000Z /container_e2240_1583898264103_325873_01_000010: RuntimeError    chol = torch.cholesky(matrix)
2020-08-11T12:37:40.000Z /container_e2240_1583898264103_325873_01_000010: RuntimeError: cholesky_cuda: For batch 0: U(1,1) is zero, singular U.
2020-08-11T12:37:40.000Z /container_e2240_1583898264103_325873_01_000010: : cholesky_cuda: For batch 0: U(1,1) is zero, singular U.
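The failure mode in this traceback can be reproduced outside PyTorch. The sketch below is a NumPy analogue, not the repository's code: it computes a log-determinant through a Cholesky factor, shows that a singular matrix makes the factorization fail, and shows that adding a small multiple of the identity to the diagonal avoids the failure. The jitter value `1e-3` is an arbitrary illustrative choice and is not the `_POS_ALPHA` used in the traceback.

```python
import numpy as np

def log_det_by_cholesky(matrix):
    """log(det(matrix)) for a symmetric positive definite matrix.

    With matrix = L @ L.T, det(matrix) = prod(diag(L)) ** 2, so the
    log-determinant is 2 * sum(log(diag(L))).
    """
    chol = np.linalg.cholesky(matrix)
    return 2.0 * np.sum(np.log(np.diagonal(chol)))

singular = np.zeros((2, 2))  # not positive definite
try:
    log_det_by_cholesky(singular)
    failed = False
except np.linalg.LinAlgError:
    failed = True
print("factorization failed:", failed)

jitter = 1e-3  # hypothetical value, for illustration only
stabilized = singular + jitter * np.eye(2)
value = log_det_by_cholesky(stabilized)
print(value)  # log(1e-3 * 1e-3), a negative number
```

The example also shows that a log-determinant is negative whenever the determinant is below 1, so a term of the form 0.5 * log_det_by_cholesky(...) can itself be negative.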

Sample weights for RMI loss

Some augmentations (e.g., rotation by a random angle) leave parts of the image and mask without valid content.
To deal with such cases, I usually use per-pixel weights (0. for holes, 1. for valid parts) and multiply the per-pixel loss by those weights.

But the RMI loss uses "high dimension points", and the final loss has a shape incompatible with the original labels.

Could you please suggest the best way to exclude such "holes" from the loss (i.e., multiply by a pixel weight)?

Benchmarking on Cityscapes

Hello,

Do you have any results on the Cityscapes dataset?

I just wonder whether the RMI loss brings better performance on Cityscapes.

Thank you

About setting of baseline

Hi, your work is amazing and has helped me a lot!
I noticed that the paper compares DeepLab with your results on different datasets. However, when I ran your code with batch size=16, the DeepLabv3+ model, and a ResNet-101 backbone on the VOC2012 dataset, I only got 0.772 mIoU after about 30k iterations (just the default settings). Can you tell me how to set the hyper-parameters to get the desired results (including 78.8 with the cross entropy loss and a higher mIoU with your proposed loss)?
This is my result on the val set:
image
Thank you very much for your excellent work!

How to weight some examples if loss may be negative

In most cases, when we have loss >= 0 with shape [batch_size] and want to increase the importance of some examples, we multiply the loss by a weight. E.g. loss = [0.1, 0.3], weights = [2., 1.], weighted_loss = [0.2, 0.3]

But how should we do that for the RMI loss, which may be negative?
E.g. loss = [-0.1, -0.3], weights = [2., 1.], weighted_loss = [-0.2, -0.3]. In this example the weighted loss becomes smaller instead of the expected "larger".

Should we multiply the loss by the weights or divide by them?
