
pytorch-segmentation-detection's Introduction

Image Segmentation and Object Detection in Pytorch

Pytorch-Segmentation-Detection is a library for image segmentation and object detection in PyTorch. It provides results reported on common segmentation/detection benchmarks, pretrained models, and scripts to reproduce them.

Segmentation

PASCAL VOC 2012

Implemented models were tested on the Restricted PASCAL VOC 2012 validation dataset (RV-VOC12) or the full PASCAL VOC 2012 validation dataset (VOC12), and trained on the PASCAL VOC 2012 training data plus the additional Berkeley segmentation data for PASCAL VOC 12.

You can find all the scripts that were used for training and evaluation here.

This code has been used to train networks with the following performance:

| Model | Test data | Mean IOU | Mean pix. accuracy | Pixel accuracy | Inference time (512x512 px. image) | Model Download Link | Related paper |
|---|---|---|---|---|---|---|---|
| Resnet-18-8s | RV-VOC12 | 59.0 | in prog. | in prog. | 28 ms. | Dropbox | DeepLab |
| Resnet-34-8s | RV-VOC12 | 68.0 | in prog. | in prog. | 50 ms. | Dropbox | DeepLab |
| Resnet-50-16s | VOC12 | 66.5 | in prog. | in prog. | in prog. | in prog. | DeepLab |
| Resnet-50-8s | VOC12 | 67.0 | in prog. | in prog. | in prog. | in prog. | DeepLab |
| Resnet-50-8s-deep-sup | VOC12 | 67.1 | in prog. | in prog. | in prog. | in prog. | DeepLab |
| Resnet-101-16s | VOC12 | 68.6 | in prog. | in prog. | in prog. | in prog. | DeepLab |
| PSP-Resnet-18-8s | VOC12 | 68.3 | n/a | n/a | n/a | in prog. | PSPnet |
| PSP-Resnet-50-8s | VOC12 | 73.6 | n/a | n/a | n/a | in prog. | PSPnet |

Some qualitative results:

[Image: example segmentation results on PASCAL VOC 2012]

Endovis 2017

Implemented models were trained on the Endovis 2017 segmentation dataset; sequence number 3 was used for validation and was not included in the training data.

The code for training and validating the models is also provided in the library.

Additional qualitative results can be found in this YouTube playlist.

Binary Segmentation

| Model | Test data | Mean IOU | Mean pix. accuracy | Pixel accuracy | Inference time (512x512 px. image) | Model Download Link |
|---|---|---|---|---|---|---|
| Resnet-9-8s | Seq # 3 * | 96.1 | in prog. | in prog. | 13.3 ms. | Dropbox |
| Resnet-18-8s | Seq # 3 | 96.0 | in prog. | in prog. | 28 ms. | Dropbox |
| Resnet-34-8s | Seq # 3 | in prog. | in prog. | in prog. | 50 ms. | in prog. |

(*) The Resnet-9-8s network was tested at 0.5 reduced resolution (512 x 640).

Qualitative results (on validation sequence):

[Image: binary segmentation results on the validation sequence]

Multi-class Segmentation

| Model | Test data | Mean IOU | Mean pix. accuracy | Pixel accuracy | Inference time (512x512 px. image) | Model Download Link |
|---|---|---|---|---|---|---|
| Resnet-18-8s | Seq # 3 | 81.0 | in prog. | in prog. | 28 ms. | Dropbox |
| Resnet-34-8s | Seq # 3 | in prog. | in prog. | in prog. | 50 ms. | in prog. |

Qualitative results (on validation sequence):

[Image: multi-class segmentation results on the validation sequence]

Cityscapes

The dataset contains video sequences recorded in street scenes from 50 different cities, with high-quality pixel-level annotations for 5,000 frames. The annotations contain 19 classes, representing cars, roads, traffic signs, and so on.

| Model | Test data | Mean IOU | Mean pix. accuracy | Pixel accuracy | Inference time (512x512 px. image) | Model Download Link |
|---|---|---|---|---|---|---|
| Resnet-18-32s | Validation set | 61.0 | in prog. | in prog. | in prog. | in prog. |
| Resnet-18-8s | Validation set | 60.0 | in prog. | in prog. | 28 ms. | Dropbox |
| Resnet-34-8s | Validation set | 69.1 | in prog. | in prog. | 50 ms. | Dropbox |
| Resnet-50-16s-PSP | Validation set | 71.2 | in prog. | in prog. | in prog. | in prog. |

Qualitative results (on validation sequence):

The whole sequence can be viewed here.

[Image: segmentation results on the Cityscapes validation set]

Installation

This code requires:

  1. Pytorch.

  2. Some libraries, which can be acquired by installing the Anaconda package.

    Alternatively, you can install scikit-image, matplotlib and numpy using pip.

  3. Clone the library:

git clone --recursive https://github.com/warmspringwinds/pytorch-segmentation-detection

Then use this code snippet before you start using the library:

import sys
# update with your path
# All the jupyter notebooks in the repository already have this
sys.path.append("/your/path/pytorch-segmentation-detection/")
sys.path.insert(0, '/your/path/pytorch-segmentation-detection/vision/')

Here we use our fork of pytorch/vision, which might be merged upstream in the future. We have added it as a submodule to our repository.

  4. Download the segmentation or detection models that you want to use manually (links can be found in the tables above). A minimal usage sketch is shown below.
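For reference, here is a minimal inference sketch based on the repository's notebooks. The paths and the checkpoint file name (resnet_34_8s_68.pth, the PASCAL VOC Resnet-34-8s checkpoint) are placeholders for whatever you downloaded; image preprocessing follows the demo notebooks.

import sys
import torch
from torch.autograd import Variable

# Update with your paths; the forked torchvision submodule must come
# before any pip-installed torchvision on sys.path.
sys.path.append("/your/path/pytorch-segmentation-detection/")
sys.path.insert(0, '/your/path/pytorch-segmentation-detection/vision/')

import pytorch_segmentation_detection.models.resnet_dilated as resnet_dilated

fcn = resnet_dilated.Resnet34_8s(num_classes=21)
fcn.load_state_dict(torch.load('resnet_34_8s_68.pth'))
fcn.cuda()
fcn.eval()

# img: a normalized 1 x 3 x H x W float tensor prepared as in the demo notebooks
# logits = fcn(Variable(img.cuda()))     # Variable API of the PyTorch 0.3.x era
# prediction = logits.data.max(1)[1]     # per-pixel class indices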

About

If you use this code in your research, please cite the paper:

@article{pakhomov2017deep,
  title={Deep Residual Learning for Instrument Segmentation in Robotic Surgery},
  author={Pakhomov, Daniil and Premachandran, Vittal and Allan, Max and Azizian, Mahdi and Navab, Nassir},
  journal={arXiv preprint arXiv:1703.08580},
  year={2017}
}

During implementation, some preliminary experiments and notes were also reported.

pytorch-segmentation-detection's People

Contributors

erasaur, peteflorence, randl, warmspringwinds


pytorch-segmentation-detection's Issues

Changing the number of classes

Hi there,

First, thanks a lot for the good work, it's really useful!

I am trying to train the model on only one class (that class + background) using the code from resnet_34_8s_train.ipynb in a .py file. I am confident my dataset only has one class, so I changed the number of classes from 21 to 2, but I get the following error when starting the first iteration:

RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1518238441757/work/torch/lib/THC/generic/THCStorage.cu:58

I just wanted to make sure that for only 1 class I should set number_of_classes = 2 instead of 21, and to check whether you have been able to make your code work with a different number of classes. The full error is below:

  File "<ipython-input-1-574834e79b43>", line 1, in <module>
    runfile('/home/ft_fcnpt/pytorch-segmentation-detection-master/pytorch_segmentation_detection/recipes/pascal_voc/segmentation/py_version2.py', wdir='/home/john/ft_fcnpt/pytorch-segmentation-detection-master/pytorch_segmentation_detection/recipes/pascal_voc/segmentation')

  File "/home/anaconda3/envs/pt27/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "/home/anaconda3/envs/pt27/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 94, in execfile
    builtins.execfile(filename, *where)

  File "/home/ft_fcnpt/pytorch-segmentation-detection-master/pytorch_segmentation_detection/recipes/pascal_voc/segmentation/py_version2.py", line 280, in <module>
    loss.backward()

  File "/home/anaconda3/envs/pt27/lib/python2.7/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)

  File "/home/anaconda3/envs/pt27/lib/python2.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)

  File "/home/anaconda3/envs/pt27/lib/python2.7/site-packages/torch/autograd/function.py", line 91, in apply
    return self._forward_cls.backward(self, *args)

  File "/home/anaconda3/envs/pt27/lib/python2.7/site-packages/torch/nn/_functions/thnn/upsampling.py", line 283, in backward
    grad_input = UpsamplingBilinear2dBackward.apply(grad_output, ctx.input_size, ctx.output_size)

  File "/home/anaconda3/envs/pt27/lib/python2.7/site-packages/torch/nn/_functions/thnn/upsampling.py", line 296, in forward
    grad_output = grad_output.contiguous()
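For anyone hitting the same assert: a device-side assert at the loss/upsampling stage is often (though not always) caused by target masks containing label values outside [0, number_of_classes - 1], for example the PASCAL-style void label 255. A minimal sanity check, assuming the annotation mask is available as an integer tensor named anno (hypothetical name), could look like:

import torch

number_of_classes = 2  # background + the single foreground class

# anno is assumed to be the integer label mask that is fed to the loss
valid = anno[anno != 255]   # drop the usual PASCAL-style void label, if present
assert valid.min() >= 0
assert valid.max() < number_of_classes, \
    "mask contains labels >= number_of_classes; remap them before training"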

Incompatible ResNet arguments

While executing any of the notebooks in: pytorch_segmentation_detection/recipes/pascal_voc/segmentation/*.ipynb
the errors relate to initialization, specifically unknown arguments passed to the ResNet constructor. Tried on multiple cloud / GPU setups with the same output. Maybe there are unchecked files (there could be cached files in your setups; on a clean repo pull, these errors might occur for you as well).
[python2.7, pytorch-0.3.1]:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-a4cbdc8e5706> in <module>()
     36 
     37 print(torch.__version__)
---> 38 fcn = resnet_dilated.Resnet34_8s(num_classes=21)
     39 fcn.load_state_dict(torch.load('resnet_34_8s_68.pth'))
     40 #fcn.load_state_dict(torch.load('resnet34-333f7ec4.pth'))

/models/pytorch-segmentation-detection/pytorch_segmentation_detection/models/resnet_dilated.py in __init__(self, num_classes)
    290         # Load the pretrained weights, remove avg pool
    291         # layer and get the output stride of 8
--> 292         resnet34_8s = models.resnet34(fully_conv=True, pretrained=True, output_stride=8, remove_avg_pool_layer=True)
    293         #resnet34_8s = models.resnet34(pretrained=True)
    294 

/usr/local/lib/python2.7/dist-packages/torchvision/models/resnet.pyc in resnet34(pretrained, **kwargs)
    172         pretrained (bool): If True, returns a model pre-trained on ImageNet
    173     """
--> 174     model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs)
    175     if pretrained:
    176         model.load_state_dict(model_zoo.load_url(model_urls['resnet34']))

TypeError: __init__() got an unexpected keyword argument 'fully_conv'

@warmspringwinds Could you please add a requirements.txt file as well?
It would clear up a lot of confusion.

pip freeze > requirements.txt

Also, if you could provide resnet_34_8_66.pth, it would be helpful for executing resnet_34_8s_demo.ipynb without changes.

Many thanks!
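For what it's worth, this TypeError usually appears when the pip-installed torchvision is imported instead of the repository's vision submodule; only the fork's ResNet accepts fully_conv, output_stride and remove_avg_pool_layer. A quick way to check which torchvision is being picked up (a sketch, not an official fix):

import sys

# Make sure the forked submodule shadows any pip-installed torchvision
sys.path.insert(0, '/your/path/pytorch-segmentation-detection/vision/')

import torchvision
print(torchvision.__file__)  # should point into .../pytorch-segmentation-detection/vision/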

RuntimeError: value cannot be converted to type float without overflow

Hi, I tried to train the model using Python 3, but I got the issue below:

/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/torch/nn/_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/torch/nn/functional.py:2622: UserWarning: nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.")
0.4247354666311015
0.5617590819998219
0.5815541637890524
0.6344758887881029
Traceback (most recent call last):
  File "pytorch_segmentation_detection/recipes/pascal_voc/segmentation/psp_resnet_50_8s_train.py", line 376, in <module>
    optimizer.step()
  File "/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/torch/optim/adam.py", line 107, in step
    p.data.addcdiv_(-step_size, exp_avg, denom)
RuntimeError: value cannot be converted to type float without overflow: (3.52033e-08,-1.14383e-08)

Can someone give me a suggestion?

Optimizer for unet model on Pascal Voc segmentation

Hello,
Can I know the optimizer and its settings to use for a UNet model on Pascal VOC segmentation with focal loss?
Should I use any learning rate schedulers?
Also, is it better to take the mean focal loss or the sum?
I am training the model from scratch.
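The repository does not prescribe a setup for this, but as a reference point, here is a minimal per-pixel focal loss sketch written from the standard focal-loss formula; gamma, the ignore label and the mean-vs-sum reduction are left as choices:

import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0, ignore_index=255, reduction='mean'):
    """logits: N x C x H x W, target: N x H x W with integer class labels."""
    log_p = F.log_softmax(logits, dim=1)                   # N x C x H x W
    ce = F.nll_loss(log_p, target, ignore_index=ignore_index,
                    reduction='none')                      # per-pixel CE, N x H x W
    p_t = torch.exp(-ce)                                   # probability of the true class
    loss = (1 - p_t) ** gamma * ce
    # note: with 'mean', pixels equal to ignore_index contribute zeros to the average
    return loss.mean() if reduction == 'mean' else loss.sum()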

fully_conv in vgg16

Very good repository.

How did you make this work? It seems that vgg16 does not have the fully_conv keyword in torchvision:

vgg16 = models.vgg16(pretrained=True, fully_conv=True)

RuntimeError: Error(s) in loading state_dict for VGG

Using your fork of torchvision and the default installation of PyTorch for Linux / Python 3.6 / CUDA 10:

  1. The init_weights argument in the VGG class was missing.

  2. After fixing (1), the following error was generated:

In [1]: from torchvision import models                                                                               
In [2]: model = models.vgg16(pretrained=True, fully_conv=True)                                                       

RuntimeError                              Traceback (most recent call last)
<ipython-input-2-802ee77a237c> in <module>
----> 1 model = models.vgg16(pretrained=True, fully_conv=True)

~/repositories/github/pytorch-segmentation-detection/vision/torchvision/models/vgg.py in vgg16(pretrained, **kwargs)
    164     model = VGG(make_layers(cfg['D']), **kwargs)
    165     if pretrained:
--> 166         model.load_state_dict(model_zoo.load_url(model_urls['vgg16']))
    167     return model
    168 

~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
    767         if len(error_msgs) > 0:
    768             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 769                                self.__class__.__name__, "\n\t".join(error_msgs)))
    770 
    771     def _named_members(self, get_members_fn, prefix='', recurse=True):

RuntimeError: Error(s) in loading state_dict for VGG:
	size mismatch for classifier.0.weight: copying a param with shape torch.Size([4096, 25088]) from checkpoint, the shape in current model is torch.Size([4096, 512, 7, 7]).
	size mismatch for classifier.3.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096, 1, 1]).
	size mismatch for classifier.6.weight: copying a param with shape torch.Size([1000, 4096]) from checkpoint, the shape in current model is torch.Size([1000, 4096, 1, 1]).
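In case it helps: the shape mismatches come from the fully convolutional classifier (the Linear layers are replaced by Conv2d, so the stock ImageNet checkpoint no longer matches). One common workaround, a sketch rather than the fix intended by the fork, is to reshape the classifier weights into convolution filters before loading (the classic FCN "convolutionalization" trick):

import sys
sys.path.insert(0, '/your/path/pytorch-segmentation-detection/vision/')

import torch.utils.model_zoo as model_zoo
from torchvision import models   # resolves to the forked torchvision inserted above

# Build the fully convolutional VGG without loading weights yet
model = models.vgg16(pretrained=False, fully_conv=True)

# Fetch the stock ImageNet weights and reshape the classifier tensors:
#   4096 x 25088 -> 4096 x 512 x 7 x 7,  4096 x 4096 -> 4096 x 4096 x 1 x 1, ...
state_dict = model_zoo.load_url('https://download.pytorch.org/models/vgg16-397923af.pth')
for key in ['classifier.0.weight', 'classifier.3.weight', 'classifier.6.weight']:
    state_dict[key] = state_dict[key].view(model.state_dict()[key].shape)

model.load_state_dict(state_dict)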

Train on my own dataset without superpixels

First of all, great work!
I assume that in order to use it I have to train the weights on a dataset with superpixel annotations, or at least some bounding boxes.
Am I right?
What do you suggest I do if I have lots of clean videos in which I want to segment some features that are moving?
Should I start with simple bounding-box classification and only then proceed, or is there any shortcut for that?

Thanks

Error when trying to run resnet_34_8s_test

It says

TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
     34 img = Variable(img.cuda())
     35 
---> 36 fcn = resnet_dilated.Resnet34_8s(num_classes=19)
     37 fcn.load_state_dict(torch.load('/home/sawyer/workspace/segmentation/resnet_34_8s_cityscapes_best.pth'))
     38 fcn.cuda()

/home/sawyer/workspace/segmentation/pytorch-segmentation-detection/pytorch_segmentation_detection/models/resnet_dilated.pyc in __init__(self, num_classes)
    293                                          pretrained=True,
    294                                          output_stride=8,
--> 295                                          remove_avg_pool_layer=True)
    296 
    297         # Randomly initialize the 1x1 Conv scoring layer

/home/sawyer/workspace/segmentation/pytorch-segmentation-detection/vision/torchvision/models/resnet.pyc in resnet34(pretrained, **kwargs)
    172         pretrained (bool): If True, returns a model pre-trained on ImageNet
    173     """
--> 174     model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs)
    175     if pretrained:
    176         model.load_state_dict(model_zoo.load_url(model_urls['resnet34']))

TypeError: __init__() got an unexpected keyword argument 'fully_conv'

In both Python 2.7 and 3.5, with torch version 1.0.1.

Error(s) in loading state_dict for Resnet18_8s

@warmspringwinds I am getting the following error for resnet_18_8s_59.pth

RuntimeError: Error(s) in loading state_dict for Resnet18_8s:
size mismatch for resnet18_8s.fc.bias: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([21]).
size mismatch for resnet18_8s.fc.weight: copying a param with shape torch.Size([2, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([21, 512, 1, 1]).

If I change num_classes=21 to num_classes=2, it generates the output without any segmentation (purple screen)

new model implementation

I would like to test a new model. Is there a walkthrough of what I would need to do to test a new model with your system?

about Resnet18_8s

Hello, I am very impressed by your great work! However, I am a little confused when I look at your Resnet18_8s network. I assume Resnet18_8s follows the approach in your paper "Deep Residual Learning for Instrument Segmentation in Robotic Surgery", which employs dilated convolutions to keep the resolution. But in resnet_dilated.py, I could not find any dilated convolutions in the Resnet18_8s class. Could you please give a more detailed explanation of the structure of Resnet18_8s? Many thanks.

adaptive_computation_time

Hi Daniil,
I'm trying to work on image segmentation of microscopy images using pytorch.
I've been trying to work with your examples.
But I'm having an error with resnet_34_8s_train.

ImportError: No module named adaptive_computation_time

I wonder if it's something in the Anaconda environment?

Difference Between Semantic Segmentation and Image Classification

I'm new to implementing CNNs and I'm trying to understand how a model knows whether to perform semantic segmentation (pixelwise) or image classification (one class per image). As far as I can see, the only difference is in the models/resnet_dilated.py file, in the line
resnet34_8s.fc = nn.Conv2d(resnet34_8s.inplanes, num_classes, 1)

whereas most other code has it as
resnet34_8s.fc = nn.Linear(resnet34_8s.fc.in_features, num_classes)

Is this the difference between returning logits of shape [batch x num_classes x H x W] and [batch x num_classes]?
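For reference, the 1x1 convolution head keeps the spatial dimensions while the usual pooled linear head collapses them. A toy sketch illustrating the output shapes (with a random feature map, not the repository's exact code):

import torch
import torch.nn as nn

features = torch.randn(1, 512, 32, 32)    # B x C x H x W feature map from the backbone
num_classes = 21

# Segmentation head: a 1x1 convolution preserves the spatial dimensions
seg_head = nn.Conv2d(512, num_classes, kernel_size=1)
print(seg_head(features).shape)            # torch.Size([1, 21, 32, 32])

# Classification head: global pooling + linear layer collapses them
cls_head = nn.Linear(512, num_classes)
pooled = features.mean(dim=(2, 3))         # B x C
print(cls_head(pooled).shape)              # torch.Size([1, 21])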

about image size of training set

Hello Daniil,
In the training process of your ResNet-8s, I notice that you crop all training images to 224x224 (RandomCropJoint(crop_size=(224, 224))). But you didn't adopt this approach when you trained your FCN-32s model. Is it because the ResNet pretrained model is used for the initial weights, so we need to comply with its input image size (224x224) too? Do you think other input sizes can be used for training without causing an accuracy decline? Please advise. Thanks.

FCN Skip Connections

Hi, FCN-8s/16s (regardless of the base model being VGG/ResNet) should have skip connections for aggregating the features from pooling layers. But, I can't seem to find these in your model definitions.

Unable to run resnet_34_8s_demo.ipynb

TypeError: torch.FloatTensor constructor received an invalid combination of arguments - got (int, int, numpy.int64, numpy.int64), but expected one of:
 * no arguments
 * (int ...)
      didn't match because some of the arguments have invalid types: (int, int, numpy.int64, numpy.int64)
 * (torch.FloatTensor viewed_tensor)
 * (torch.Size size)
 * (torch.FloatStorage data)
 * (Sequence data)

Which version of PyTorch should be used?
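For reference: this TypeError is typical of older PyTorch versions (the notebooks were written around the 0.3.x era) being handed numpy integer sizes. If you stay on such a version, a common workaround is to cast them to plain Python ints before constructing the tensor, roughly like this (the variable names are hypothetical):

import numpy as np
import torch

batch_size, num_classes = 1, 21
height, width = np.int64(512), np.int64(512)   # e.g. values taken from img.shape

# Older tensor constructors reject numpy integer types; cast to plain int first
tensor = torch.FloatTensor(batch_size, num_classes, int(height), int(width))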

CRF implementation

Hi! Thank you for sharing your code :)
Where can I find the CRF implementation in this repo? I couldn't find any results when searching for the 'crf' keyword in this repo.
