
irn's Introduction

Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations

Outline

The code of:

Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations, Jiwoon Ahn, Sunghyun Cho, and Suha Kwak, CVPR 2019 [Paper]

This repository contains a framework for learning instance segmentation with image-level class labels as supervision. The key component of our approach is Inter-pixel Relation Network (IRNet) that estimates two types of information: a displacement vector field and a class boundary map, both of which are in turn used to generate pseudo instance masks from CAMs.
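As a rough illustration of that last step, here is a minimal NumPy sketch (ours, not the repository's implementation, which operates on sparse per-image affinities) of propagating CAM scores by a random walk over affinities derived from the boundary map; beta and exp_times mirror the hyper-parameters of the same names in run_sample.py:

import numpy as np

def propagate_cam(cam, affinity, beta=10, exp_times=8):
    # cam: (num_classes, n_pixels) CAM scores; affinity: (n_pixels, n_pixels)
    # pairwise affinities derived from the predicted class boundary map.
    trans = affinity ** beta                                      # sharpen affinities
    trans /= np.maximum(trans.sum(axis=1, keepdims=True), 1e-5)   # row-normalize
    for _ in range(exp_times):                                    # 2^exp_times walk steps,
        trans = trans @ trans                                     # via repeated squaring
    return cam @ trans                                            # diffuse each class map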

Citation

If you find the code useful, please consider citing our paper using the following BibTeX entry.

@InProceedings{Ahn_2019_CVPR,
author = {Ahn, Jiwoon and Cho, Sunghyun and Kwak, Suha},
title = {Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}

Prerequisite

  • Python 3.7, PyTorch 1.1.0, and more in requirements.txt
  • PASCAL VOC 2012 devkit
  • NVIDIA GPU with more than 1024MB of memory

Usage

Install python dependencies

pip install -r requirements.txt

Download PASCAL VOC 2012 devkit

Run run_sample.py or make your own script

python run_sample.py
  • You can either manually edit the file or specify command-line arguments.
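For example, pointing the script at your local devkit (the path below is illustrative):

python run_sample.py --voc12_root /path/to/VOCdevkit/VOC2012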

Train Mask R-CNN or DeepLab with the generated pseudo labels

TO DO

  • Training code for MS-COCO
  • Code refactoring
  • IRNet v2


irn's Issues

comparison with AffinityNet

Hi,
In the train_irn step, I removed the displacement loss and kept only the boundary loss.
I notice the boundary loss is similar to the AffinityNet loss you published at CVPR 2018, even though the details differ somewhat. But the semantic mIoU is only 37+%, which is even worse than the CAM result (50%), compared to the AffinityNet result (59%).
So I am confused about the reason for such a gap given the same idea and a similar loss. Do you have any suggestions? Thanks!

Log files for training

Hi,
Can you share the log files for your training? I am unable to reproduce the performance of IRN reported in the paper using the default hyper-parameters (also mentioned here [Link]).

For instance segmentation, instead of 37.7 AP@0.5, I am getting the following:

step.eval_ins_seg: Wed Aug 14 09:55:44 2019
0.5iou: {'ap': array([0.0402722 , 0.        , 0.04831983, 0.02532846, 0.01264213,
       0.21497569, 0.13079764, 0.06767052, 0.00229753, 0.08129419,
       0.01570647, 0.05994737, 0.03092302, 0.26370536, 0.02019956,
       0.02099569, 0.0646912 , 0.16558015, 0.23535844, 0.1566734 ]), 'map': 0.08286894241843508}

and for semantic segmentation, instead of 66.5 mIoU, I am getting:

step.eval_sem_seg: Wed Aug 14 10:15:06 2019
0.12114407058527121 0.08625727491374735
0.2459830480445712 0.30624211370783205
{'iou': array([0.79259865, 0.43975817, 0.27018399, 0.42519734, 0.34189571,
       0.43639392, 0.57453956, 0.48851971, 0.41510347, 0.26892431,
       0.54274295, 0.37697739, 0.40495999, 0.47331797, 0.5605337 ,
       0.51401678, 0.39511615, 0.63538235, 0.40350322, 0.50775112,
       0.48067896]), 'miou': 0.4641950199739483}

Thanks.

path index

Hi, Jiwoon Ahn,

I would like to know: what is the path index in the code, and which part of the paper does it correspond to?

Additionally, when will the training details be released? Looking forward to following your work.

Thanks.

COCO training code

"Training code for MS-COCO" is on the TODOs.
Any plans to release this code soonish so as to include in ECCV2020 experiments ?

How to get the initial displacement field?

Thanks for your attention! I am confused about the initial displacement field.

In Figure 5 of your paper, the "center" image is an initial displacement field. What do the different colors mean, and how is the field obtained? Does it have any relationship with the CAM of the corresponding image?

Looking forward to your reply.

About the function of the "Instance Map"

I think it is possible to capture instance segmentation masks using only "CAM" and "Pairwise Affinities",
because the purpose of the "Instance Map" is to distinguish instances, and "Pairwise Affinities" also serves this function.
Using only these two modules would also make the algorithm simpler. Can you tell me why the "Instance Map" cannot be omitted? Thank you for your reply!


Asking about the Mask R-CNN training strategy

Hi, Jiwoon Ahn
After converting the pseudo labels to COCO-style annotations, I trained Mask R-CNN with a ResNet-50-FPN backbone.

But the performance I got is slightly lower than reported: my mAP50 is 45.0.

I'd like to ask about your Mask R-CNN training strategy: what kind of data augmentation did you adopt?

Thank you!

On the number of convolutional filters in IRNet

I noticed that the numbers of convolutional filters in IRNet (in both the class boundary branch and the displacement branch) differ from the settings in your original paper. May I ask which setting worked better in your experiments? Best wishes.

How to calculate loss?

Hi, I have some questions about how the loss is calculated. Since all labels are generated according to centres (Figure 3 in the paper), how are these centres determined in the images?

About the results being unstable across runs

In run_sample.py, add a seed. The specific code is as follows:
import argparse
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"
from misc import pyutils
import torch
import numpy as np
import random

def setup_seed(seed):
    print("random seed is set to", seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)
    torch.backends.cudnn.deterministic = True

if __name__ == '__main__':

    parser = argparse.ArgumentParser()

    # Environment
    parser.add_argument("--num_workers", default=os.cpu_count()//2, type=int)
    parser.add_argument("--voc12_root", default="/disk4/xxx/2022-02-08-wang-peak/irn-master/VOC2012", type=str,
                        help="Path to VOC 2012 Devkit, must contain ./JPEGImages as subdirectory.")

    # Dataset
    parser.add_argument("--train_list", default="voc12/train_aug.txt", type=str)
    parser.add_argument("--val_list", default="voc12/val.txt", type=str)
    parser.add_argument("--infer_list", default="voc12/train.txt", type=str,
                        help="voc12/train_aug.txt to train a fully supervised model, "
                             "voc12/train.txt or voc12/val.txt to quickly check the quality of the labels.")
    parser.add_argument("--chainer_eval_set", default="train", type=str)
    parser.add_argument("--seed", default=15, type=int)

    # Class Activation Map
    parser.add_argument("--cam_network", default="net.resnet50_cam", type=str)
    parser.add_argument("--cam_crop_size", default=512, type=int)
    parser.add_argument("--cam_batch_size", default=16, type=int)
    parser.add_argument("--cam_num_epoches", default=5, type=int)
    parser.add_argument("--cam_learning_rate", default=0.1, type=float)
    parser.add_argument("--cam_weight_decay", default=1e-4, type=float)
    parser.add_argument("--cam_eval_thres", default=0.15, type=float)
    parser.add_argument("--cam_scales", default=(1.0, 0.5, 1.5, 2.0),
                        help="Multi-scale inferences")

    # Mining Inter-pixel Relations
    parser.add_argument("--conf_fg_thres", default=0.30, type=float)
    parser.add_argument("--conf_bg_thres", default=0.05, type=float)

    # Inter-pixel Relation Network (IRNet)
    parser.add_argument("--irn_network", default="net.resnet50_irn", type=str)
    parser.add_argument("--irn_crop_size", default=512, type=int)
    parser.add_argument("--irn_batch_size", default=32, type=int)
    parser.add_argument("--irn_num_epoches", default=3, type=int)
    parser.add_argument("--irn_learning_rate", default=0.1, type=float)
    parser.add_argument("--irn_weight_decay", default=1e-4, type=float)

    # Random Walk Params
    parser.add_argument("--beta", default=10)
    parser.add_argument("--exp_times", default=8,
                        help="Hyper-parameter that controls the number of random walk iterations,"
                             "The random walk is performed 2^{exp_times}.")
    parser.add_argument("--ins_seg_bg_thres", default=0.25)
    parser.add_argument("--sem_seg_bg_thres", default=0.25)

    # Output Path
    parser.add_argument("--log_name", default="sample_train_eval", type=str)
    parser.add_argument("--cam_weights_name", default="sess/res50_cam.pth", type=str)
    parser.add_argument("--irn_weights_name", default="sess/res50_irn.pth", type=str)
    parser.add_argument("--cam_out_dir", default="result/cam", type=str)
    parser.add_argument("--ir_label_out_dir", default="result/ir_label", type=str)
    parser.add_argument("--sem_seg_out_dir", default="result/sem_seg", type=str)
    parser.add_argument("--ins_seg_out_dir", default="result/ins_seg", type=str)

    # Step
    parser.add_argument("--train_cam_pass", default=True)
    parser.add_argument("--make_cam_pass", default=True)
    parser.add_argument("--eval_cam_pass", default=True)
    parser.add_argument("--cam_to_ir_label_pass", default=False)
    parser.add_argument("--train_irn_pass", default=False)
    parser.add_argument("--make_ins_seg_pass", default=False)
    parser.add_argument("--eval_ins_seg_pass", default=False)
    parser.add_argument("--make_sem_seg_pass", default=False)
    parser.add_argument("--eval_sem_seg_pass", default=False)

    args = parser.parse_args()
    setup_seed(args.seed)
    os.makedirs("sess", exist_ok=True)
    os.makedirs(args.cam_out_dir, exist_ok=True)
    os.makedirs(args.ir_label_out_dir, exist_ok=True)
    os.makedirs(args.sem_seg_out_dir, exist_ok=True)
    os.makedirs(args.ins_seg_out_dir, exist_ok=True)

    pyutils.Logger(args.log_name + '.log')
    print(vars(args))

    if args.train_cam_pass is True:
        import step.train_cam

        timer = pyutils.Timer('step.train_cam:')
        step.train_cam.run(args)

    if args.make_cam_pass is True:
        import step.make_cam

        timer = pyutils.Timer('step.make_cam:')
        step.make_cam.run(args)

    if args.eval_cam_pass is True:
        import step.eval_cam

        timer = pyutils.Timer('step.eval_cam:')
        step.eval_cam.run(args)

    if args.cam_to_ir_label_pass is True:
        import step.cam_to_ir_label

        timer = pyutils.Timer('step.cam_to_ir_label:')
        step.cam_to_ir_label.run(args)

    if args.train_irn_pass is True:
        import step.train_irn

        timer = pyutils.Timer('step.train_irn:')
        step.train_irn.run(args)

    if args.make_ins_seg_pass is True:
        import step.make_ins_seg_labels

        timer = pyutils.Timer('step.make_ins_seg_labels:')
        step.make_ins_seg_labels.run(args)

    if args.eval_ins_seg_pass is True:
        import step.eval_ins_seg

        timer = pyutils.Timer('step.eval_ins_seg:')
        step.eval_ins_seg.run(args)

    if args.make_sem_seg_pass is True:
        import step.make_sem_seg_labels

        timer = pyutils.Timer('step.make_sem_seg_labels:')
        step.make_sem_seg_labels.run(args)

    if args.eval_sem_seg_pass is True:
        import step.eval_sem_seg

        timer = pyutils.Timer('step.eval_sem_seg:')
        step.eval_sem_seg.run(args)

int32 error

Change it to the following:

def load_img_name_list(dataset_path):
    img_name_list = np.loadtxt(dataset_path, dtype=str)
    img_name_list = np.array(img_name_list, dtype=float)
    return img_name_list

CAM_to_irlabel and train_irn

Hello, I would like to ask how to set the parameters for cam_to_irlabel, train_irn, and make_seg_labels. After using these steps, the performance improvement has been minimal: I have tried many parameter settings, but the performance does not change much.

How to process test data?

Hi,
For train/val data, the CAMs are first filtered by the ground-truth classification labels, and then the final segmentation is obtained by argmax after normalizing the remaining CAMs.
But how should test data be handled? Should I generate test classification labels to do a similar filtering, or multiply the class probability with the corresponding CAM?
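For reference, a minimal sketch of the train/val procedure described above (our own illustration, not the repository's code; the 0.15 background score mirrors the cam_eval_thres default in run_sample.py):

import numpy as np

def cams_to_seg(cams, image_labels, bg_thres=0.15):
    # cams: (20, H, W) raw CAMs; image_labels: (20,) multi-hot image-level labels.
    keep = np.nonzero(image_labels)[0]                   # filter by image-level labels
    kept = cams[keep]
    kept = kept / (kept.max(axis=(1, 2), keepdims=True) + 1e-5)  # normalize per class
    bg = np.full((1,) + kept.shape[1:], bg_thres)        # constant background score
    seg = np.argmax(np.concatenate([bg, kept]), axis=0)  # 0 = background
    # map kept-channel indices back to VOC class ids (1..20, 0 is background)
    return np.where(seg == 0, 0, keep[seg - 1] + 1)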

Time cost of generating one pseudo instance mask

Hi,

After testing IRNet, I found that it takes about 3 seconds to generate one pseudo instance mask on my machine.
I searched around and found no one mentioning efficiency here, or even in the WSIS community.
Or maybe I missed some paper/post.

I understand that for the final goal it is the inference time that matters, not the time to generate one pseudo instance mask.
But is there any way I can make it faster? Why does no one seem to care about this?

Thanks

Training on own dataset

Thanks for the great work. I wish to train the network on the Berkeley DeepDrive dataset, where we have 2D bounding boxes in JSON files.
What would be the steps? Is there a data converter available? I am trying to get the dataset into PASCAL VOC 2012 format.

thanks

about the search indices

for x in range(1, max_radius):
    search_dirs.append((0, x))

for y in range(1, max_radius):
    for x in range(-max_radius + 1, max_radius):
        if x * x + y * y < max_radius ** 2:
            search_dirs.append((y, x))

Thanks for sharing the work. I think search_dirs covers only a half circle instead of a full circle; I'm not sure whether I understand it correctly.
Looking forward to your reply.

Help For the CAMs

I'm sorry to bother you @jiwoon-ahn, but I have a little trouble with the code you shared here.
In the first step, generating class activation maps, I converted the *.npy files and found pictures like Fig. 1.
What should I do to get results like those in your CVPR 2018 paper, as in Fig. 2?
Also, the classification network is trained for only about 5 epochs; I don't know whether that is enough. (I really don't know the reason; sincerely asking for help.)
I'm looking forward to your reply. Thanks a lot.

About train_aug.txt

Congratulations! This is really good work!

As I was running your code, I found that the train_aug.txt file is used to train the CAM network. Where does this file come from? And why not directly use the VOC2012 trainval set?

Thanks a lot!

I have run your code, but the results are not as good as yours.

I have run your code, but the results are not as good as yours. Do you have any special tricks for running the code? Thanks.

Instance segmentation + training dataset (AP@0.5): mine 35.7, yours 37.7;
Semantic segmentation + training dataset (mIoU): mine 66.0, yours 66.5;

get AssertionError when running eval_ins_seg.py

Traceback (most recent call last):
  File "run_sample.py", line 119, in <module>
    step.eval_ins_seg.run(args)
  File "/home/maskrcnn-benchmark/irn/step/eval_ins_seg.py", line 10, in run
    gt_masks = [dataset.get_example_by_keys(i, (1,))[0] for i in range(len(dataset))]
  File "/home/irn/step/eval_ins_seg.py", line 10, in <listcomp>
    gt_masks = [dataset.get_example_by_keys(i, (1,))[0] for i in range(len(dataset))]
  File "/home/anaconda3/envs/deeplab/lib/python3.6/site-packages/chainercv/chainer_experimental/datasets/sliceable/getter_dataset.py", line 89, in get_example_by_keys
    cache[getter_index] = self._getters[getter_index](index)
  File "/home/anaconda3/envs/deeplab/lib/python3.6/site-packages/chainercv/datasets/voc/voc_instance_segmentation_dataset.py", line 66, in _get_annotations
    label_img, inst_img)
  File "/home/anaconda3/envs/deeplab/lib/python3.6/site-packages/chainercv/datasets/voc/voc_utils.py", line 55, in image_wise_to_instance_wise
    assert lbl != -1
AssertionError

OMP: Error #13: Assertion failure at z_Linux_util.cpp(2361)

In step.make_cam, the following error occurred:

OMP: Error #13: Assertion failure at z_Linux_util.cpp(2361).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.

Has anyone else seen the same issue?

Please kindly advise on how to fix this one.

Thanks a lot

BTW, I was running run_sample.py with NVIDIA-SMI 410.79, Driver Version 410.79, and CUDA version 10.0.

Tuning GN using inference data?

Dear Jiwoon, in the file 'train_irn.py', I noticed that in the latest commit GN is tuned using the inference data (see the linked location). Is this right in the weakly supervised instance segmentation setting? I think the validation set should not be touched except for evaluation, rather than used for training or tuning parameters. I'm also curious what would be affected by this: will the mAP be improved? Thanks

Performance is poor after retraining Mask R-CNN

Hi,
I took the instance-level pseudo labels generated by running make_ins_seg_labels.py and kept the instance masks whose scores are higher than 0.
Then I converted these labels from *.npy to COCO-style JSON annotations and trained the standard Mask R-CNN with ResNet-50-FPN.
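For readers attempting the same conversion, here is a minimal sketch (ours, not the poster's script; the field choices are illustrative) of turning one binary instance mask into a COCO-style annotation with pycocotools:

import numpy as np
from pycocotools import mask as mask_utils

def instance_to_coco_ann(binary_mask, ann_id, image_id, category_id):
    # binary_mask: (H, W) bool/uint8 mask for a single instance.
    rle = mask_utils.encode(np.asfortranarray(binary_mask.astype(np.uint8)))
    rle['counts'] = rle['counts'].decode('ascii')  # make RLE JSON-serializable
    return {'id': ann_id, 'image_id': image_id, 'category_id': category_id,
            'segmentation': rle, 'area': float(mask_utils.area(rle)),
            'bbox': [float(v) for v in mask_utils.toBbox(rle)], 'iscrowd': 0}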
However, the performance I got is noticeably lower: box AP50 is 45.8 and mask AP50 is 22.6.
I noticed that the number of instances in the pseudo labels is about 2/3 of the ground-truth instance count for the train_aug set.
Did I miss something in reproducing the Mask R-CNN performance with pseudo labels?

Thanks a lot!

Pre-trained models

Hi, first of all, thanks for the amazing work.

I was wondering if you intend to provide pretrained models, mainly the CAM ResNet and IRNet.

Thanks

Comparison with AffinityNet: implementation details in your paper

Hi, in your paper you have the following part:

"Comparison to AffinityNet: For a fair comparison, we modified AffinityNet [1] by replacing its backbone with ResNet50 as in our IRNet. Then we compare IRNet with the modified AffinityNet in terms of the accuracy of pseudo segmentation labels (Table 2) and performance of DeepLab [5] trained with these pseudo labels (Table 4)."

Could you provide more implementation details for this part using this repo? Looking forward to your reply!

How many epochs for IRN?

I have trained CAM for 5 epochs and get 48.3 mIoU,
but the mAP and mIoU are low for sem_seg and ins_seg when I train IRN for 3 epochs.

Nothing was changed, except that I modified train_irn.py at line 40:
model = torch.nn.DataParallel(model).cuda()

deeplab-v2 and CRF

Hi Jiwoon,
the VGG-16 and ResNet-101 variants of the original DeepLab v2 have somewhat different architectures (e.g. the design of the ASPP module). I was wondering: in your implementation with ResNet-50, did you use the ResNet-101 variant as the reference, or the VGG-based one? Also, from A.2 it seems that you used a CRF to compute the upper bound. Did you also use a CRF after fully supervised training on the pseudo labels?
Thanks in advance,
Nikita

Training is so slow after first epoch

Hello,

We are using a custom dataset with this repo, and training the CAM network is very slow: after the first epoch, it shows an estimated finish time 2.5 days away.

Our training dataset has 8960 images. The batch size is 4.

Have you ever faced this problem? Thank you.

When will the code be released?

Hi Jiwoon Ahn, congratulations! I'm really interested in your code and can't wait to try it out. So when are you going to release the code? Thank you! The paper was great!

Inter-pixel relation mining. Point neighborhood.

Hi! Thanks for the great work!
Why do you take only half of the circle in the get_search_paths_dst method of the PathIndex class:

        for x in range(1, max_radius):
            search_dirs.append((0, x))

        for y in range(1, max_radius):
            for x in range(-max_radius + 1, max_radius):
                if x * x + y * y < max_radius ** 2:
                    search_dirs.append((y, x))

Maybe I'm missing something? Thanks for the explanation! :)
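For what it's worth, a quick check (our own reading, not an official answer) is consistent with affinities being symmetric, so each unordered pixel pair only needs to be enumerated once: the half disc plus its mirror image covers the full neighborhood:

max_radius = 5
search_dirs = [(0, x) for x in range(1, max_radius)]
for y in range(1, max_radius):
    for x in range(-max_radius + 1, max_radius):
        if x * x + y * y < max_radius ** 2:
            search_dirs.append((y, x))

# All nonzero displacements inside the disc of radius max_radius.
full_disc = {(y, x) for y in range(-max_radius + 1, max_radius)
             for x in range(-max_radius + 1, max_radius)
             if 0 < x * x + y * y < max_radius ** 2}
mirrored = {(-y, -x) for (y, x) in search_dirs}
assert set(search_dirs) | mirrored == full_disc  # half disc + mirror = full disc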

using own dataset

I am trying to adapt the code to my own dataset. However, I am really struggling, since I am not a pro at Python.

How can I generate cls_labels.npy for a different dataset? The script make_cls_labels.py does not work, and it makes use of .xml files. Is there an easier way to generate a dictionary with image-level labels?

cls_labels_dict = np.load('voc12/cls_labels.npy', allow_pickle=True).item()
print(cls_labels_dict) # 2011003271: array([0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)}
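If it helps, here is a minimal sketch (ours; the labels.csv format below is hypothetical) of building such a dictionary for a custom dataset and saving it in the same format:

import numpy as np

# Hypothetical input: each line of labels.csv is "image_id,3 7 12"
# (an image id followed by its space-separated class indices).
def make_cls_labels(csv_path, num_classes, out_path):
    labels = {}
    with open(csv_path) as f:
        for line in f:
            if not line.strip():
                continue
            name, ids = line.strip().split(',', 1)
            onehot = np.zeros(num_classes, dtype=np.float32)
            for i in ids.split():
                onehot[int(i)] = 1.0
            labels[name] = onehot
    np.save(out_path, labels)  # read back with np.load(..., allow_pickle=True).item()

make_cls_labels('labels.csv', 20, 'my_dataset/cls_labels.npy')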

Also, my images don't share the same naming conventions as VOC12, so this part of the code creates a ton of problems:
def decode_int_filename(int_filename):
    s = str(int(int_filename))


about the paper

Congratulations! May I ask when you will upload the paper to arXiv? I've been keeping track of weakly supervised learning.

Performance Gap and Hyper-parameter Settings

Hi Jiwoon Ahn,
Your paper is very good and I'm really interested in it. I've already tried your code, but I cannot achieve the same performance as the paper. Would you please help me figure out where the problem is?

In my experiments, the learning rates of both CAM and IRN are set to 0.1, while the other hyper-parameters follow the default settings in run_sample.py. My performance is as follows:

| model | task                  | my exp. | reported |
|-------|-----------------------|---------|----------|
| CAM   | semantic segmentation | 48.1    | 48.3     |
| IRN   | semantic segmentation | 64.9    | 66.5     |
| IRN   | instance segmentation | 32.4    | 37.7     |

The CAM models have similar performance, but there are performance gaps between the IRN models in both tasks.

There may be two possible reasons for the gap.

  1. I notice the hyper-parameter settings in the paper and the code are not exactly the same. exp_times is set to 8 in the code, while in the paper it is set to 256 (which also does not work in my case).
  2. Another possible problem is that multi-scale testing is only used for CAM, but not for IRN.

Would you please point out the differences between my experiments and yours that may result in the gap? Thank you!

About the L_fg^D loss

Thank you for the very good work!
I have a question about the L_fg^D loss: in Section 4.3, why is the difference for (i, j) in D defined as D(x_i) - D(x_j) rather than D(x_j) - D(x_i)?
I'm very confused about this point and looking forward to your reply.
Thank you very much
