fcn-for-semantic-segmentation's Introduction

FCN-for-Semantic-Segmentation

Implementation and performance testing of FCN-16 and FCN-8. In addition, CRFs are used as a post-processing technique and the results are compared.

PAPERS REFERRED :

FULLY CONVOLUTIONAL NETWORKS FOR SEMANTIC SEGMENTATION

AUTHORS : Jonathan Long, Evan Shelhamer, Trevor Darrell
LINK : https://github.com/Gurupradeep/FCN-for-Semantic-Segmentation/blob/master/Paper/long_shelhamer_fcn.pdf

VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE SCALE IMAGE RECOGNITION

AUTHORS : Karen Simonyan, Andrew Zisserman
LINK : https://github.com/Gurupradeep/FCN-for-Semantic-Segmentation/blob/master/Paper/VGG.pdf

Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

AUTHORS : Philipp Krähenbühl, Vladlen Koltun
LINK : https://github.com/Gurupradeep/FCN-for-Semantic-Segmentation/blob/master/Paper/crf.pdf

IMPLEMENTATION STEPS :

1. Converting a classifier to dense FCN :

The model used for semantic segmentation is derived from VGG. VGG on its own is meant for classification, so to make the model suitable for dense prediction we remove the last fully connected layers of VGG and replace them with convolutions. We append a 1x1 convolution with channel dimension 21 to predict scores for each of the PASCAL classes (including background) at each of the coarse output locations, followed by a deconvolution layer to bilinearly upsample the coarse outputs to pixel-dense outputs.
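To illustrate the conversion, here is a minimal NumPy sketch (not the repository's Keras code) showing that a fully connected layer gives the same scores as a convolution whose kernel covers the whole input the classifier expects. The toy sizes and the helper `conv_valid` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes assumed for illustration (VGG16's fc6 maps 512 x 7 x 7 to 4096).
channels, k, n_out = 4, 3, 5

# Weights of a fully connected layer that expects a flattened (channels, k, k) input.
w_fc = rng.standard_normal((n_out, channels * k * k))

# The same weights viewed as n_out convolution kernels of shape (channels, k, k).
w_conv = w_fc.reshape(n_out, channels, k, k)


def conv_valid(x, kernels):
    """Naive 'valid' cross-correlation: x is (C, H, W), kernels is (N, C, k, k)."""
    n, _, kh, kw = kernels.shape
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.empty((n, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(kernels, patch, axes=3)
    return out


# On an input of exactly the size the classifier expects, both give the same scores.
x_small = rng.standard_normal((channels, k, k))
fc_scores = w_fc @ x_small.ravel()
conv_scores = conv_valid(x_small, w_conv)[:, 0, 0]

# On a larger input the convolution yields a coarse grid of scores instead of one vector.
x_large = rng.standard_normal((channels, 6, 8))
score_map = conv_valid(x_large, w_conv)
print(score_map.shape)  # (5, 4, 6)
```

Applied to a larger input, the same weights produce a spatial grid of scores, which is what makes dense prediction possible.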

2. Transferring features of lower level layers to higher layers

We define a new fully convolutional net (FCN) for segmentation that combines layers of the feature hierarchy and refines the spatial precision of the output. While fully convolutionalized classifiers can be fine-tuned to segmentation and even score highly on the standard metric, their output is dissatisfyingly coarse. The 32 pixel stride at the final prediction layer limits the scale of detail in the upsampled output.
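The 32 pixel stride follows from VGG16's five 2x2 max-pooling layers. A small sketch of that arithmetic, assuming 'same'-padded convolutions (so only pooling changes the resolution) and an input size that divides evenly; the helper name `feature_map_sizes` is hypothetical:

```python
# Each of VGG16's five 2x2 max-pooling layers halves the spatial resolution.
POOL_LAYERS = ["pool1", "pool2", "pool3", "pool4", "pool5"]


def feature_map_sizes(input_size):
    """Return {layer: (stride, spatial size)} for a square input that divides evenly."""
    sizes = {}
    stride, size = 1, input_size
    for name in POOL_LAYERS:
        stride *= 2
        size //= 2
        sizes[name] = (stride, size)
    return sizes


for name, (stride, size) in feature_map_sizes(224).items():
    print(f"{name}: stride {stride:2d}, {size}x{size} map")
```

For a 224x224 input this gives a 7x7 map at pool5, so each coarse prediction covers a 32x32 block of the image.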

We address this by adding skips that combine the final prediction layer with lower layers with finer strides. This turns a line topology into a DAG with edges that skip ahead from lower layers to higher ones. As they see fewer pixels, the finer scale predictions should need fewer layers, so it makes sense to make them from shallower net outputs. Combining fine layers and coarse layers lets the model make local predictions that respect global structure.

We first divide the output stride in half by predicting from a 16 pixel stride layer. We add a 1x1 convolution layer on top of pool4 to produce additional class predictions. We fuse this output with the predictions computed on top of conv7 (convolutionalized fc7) at stride 32 by adding a 2x upsampling layer and summing both predictions.

Finally, the stride 16 predictions are upsampled back to the image.
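The upsampling step can be sketched as a transposed convolution with fixed bilinear weights. This is a NumPy illustration under assumptions, not the repository's layer: the helper names are hypothetical, the kernel-size rule `2 * factor - factor % 2` is a common convention rather than something stated here, and in a trained network these weights may be learned.

```python
import numpy as np


def bilinear_kernel(size):
    """2D bilinear interpolation weights for a square kernel of the given size."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    profile = 1 - np.abs(np.arange(size) - center) / factor
    return np.outer(profile, profile)


def upsample_bilinear(scores, factor):
    """Upsample an (H, W, C) score map by an integer factor with a transposed convolution."""
    k = 2 * factor - factor % 2            # kernel size commonly paired with this stride
    kernel = bilinear_kernel(k)[:, :, None]
    h, w, c = scores.shape
    full = np.zeros(((h - 1) * factor + k, (w - 1) * factor + k, c))
    for i in range(h):
        for j in range(w):
            # Each input score stamps a weighted copy of the kernel onto the output.
            full[i * factor:i * factor + k, j * factor:j * factor + k] += kernel * scores[i, j]
    crop = (k - factor) // 2               # trim the border so the output is factor * input
    return full[crop:crop + h * factor, crop:crop + w * factor]


coarse = np.ones((14, 14, 21))             # stride-16 scores for a 224x224 image
dense = upsample_bilinear(coarse, 16)
print(dense.shape)                         # (224, 224, 21)
```

Away from the border, a constant score map stays constant after upsampling, which is the behaviour expected of bilinear interpolation.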

We call this net FCN-16s. FCN-16 has only one skip connection, which transfers information from the 4th max pooling layer. To improve the results further, FCN-8 introduces a second skip connection that transfers information from the 3rd max pooling layer, in addition to the one from the 4th max pooling layer.
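The order of the fusion steps in the two variants can be sketched with NumPy. This only shows the data flow and shapes: nearest-neighbour repetition stands in for the upsampling layers, the score maps are random placeholders, and the function names are hypothetical.

```python
import numpy as np

N_CLASSES = 21  # 20 PASCAL classes plus background


def upsample_nearest(scores, factor):
    """Stand-in for the upsampling layers: repeat each score factor x factor times."""
    return scores.repeat(factor, axis=0).repeat(factor, axis=1)


def fcn16_fuse(score_conv7, score_pool4):
    """FCN-16: 2x upsample the stride-32 scores, add pool4 scores, upsample 16x."""
    fused = upsample_nearest(score_conv7, 2) + score_pool4
    return upsample_nearest(fused, 16)


def fcn8_fuse(score_conv7, score_pool4, score_pool3):
    """FCN-8: repeat the fusion once more with pool3 scores, then upsample 8x."""
    fused16 = upsample_nearest(score_conv7, 2) + score_pool4
    fused8 = upsample_nearest(fused16, 2) + score_pool3
    return upsample_nearest(fused8, 8)


rng = np.random.default_rng(0)
# Random class scores at the shapes a 224x224 input would give, as (H, W, classes).
score_conv7 = rng.standard_normal((7, 7, N_CLASSES))
score_pool4 = rng.standard_normal((14, 14, N_CLASSES))
score_pool3 = rng.standard_normal((28, 28, N_CLASSES))

labels16 = fcn16_fuse(score_conv7, score_pool4).argmax(axis=-1)
labels8 = fcn8_fuse(score_conv7, score_pool4, score_pool3).argmax(axis=-1)
print(labels16.shape, labels8.shape)  # (224, 224) (224, 224)
```

In FCN-8 the final upsampling factor drops from 16 to 8, so the predictions carry detail from a finer layer.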

Plot of FCN-16 Architecture : https://github.com/Gurupradeep/FCN-for-Semantic-Segmentation/blob/master/Plots/FCN-16_withshape.png
Plot of FCN-8 Architecture : https://github.com/Gurupradeep/FCN-for-Semantic-Segmentation/blob/master/Plots/FCN-8with_shapes.png

3. Using CRF as a post-processing technique :

When predicting with the FCN, each pixel is labelled independently of its surrounding pixels, which may result in coarse segmentation. The CRF takes two inputs: the original image and the predicted class probabilities for each pixel. The CRF used here relies on a highly efficient inference algorithm for fully connected CRF models in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels in an arbitrary feature space. It thereby also considers the surrounding pixels when assigning a class to a particular pixel, which gives better semantic segmentation results.
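To make the idea concrete, here is a brute-force NumPy sketch of mean-field inference in a fully connected CRF with a Potts label model. It is not the efficient algorithm from the paper and not the code in CRF.ipynb: it builds the full pixel-by-pixel kernel matrix, so it only works on tiny images, and the function name, weights and bandwidths are all assumptions chosen for illustration.

```python
import numpy as np


def dense_crf_naive(image, probs, n_iters=5, w_bilateral=4.0, w_smooth=2.0,
                    theta_xy=3.0, theta_rgb=10.0, theta_smooth=1.5):
    """Brute-force mean-field inference for a fully connected CRF (Potts model).

    image: (H, W, 3) array; probs: (H, W, L) per-pixel class probabilities.
    All weights and bandwidths are illustrative defaults, not tuned values.
    """
    h, w, n_labels = probs.shape
    ys, xs = np.mgrid[:h, :w]
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)   # (N, 2)
    rgb = image.reshape(-1, 3).astype(float)                          # (N, 3)

    # Squared distances between every pair of pixels in position and colour.
    d_pos = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)
    d_rgb = ((rgb[:, None, :] - rgb[None, :, :]) ** 2).sum(axis=-1)

    # Pairwise potential: a linear combination of two Gaussian kernels.
    kernel = (w_bilateral * np.exp(-d_pos / (2 * theta_xy ** 2) - d_rgb / (2 * theta_rgb ** 2))
              + w_smooth * np.exp(-d_pos / (2 * theta_smooth ** 2)))
    np.fill_diagonal(kernel, 0.0)           # a pixel sends no message to itself

    q = probs.reshape(-1, n_labels).astype(float)
    unary = -np.log(np.clip(q, 1e-8, 1.0))  # unary energy from the network's probabilities
    for _ in range(n_iters):
        # Under a Potts model, neighbours that agree on a label raise its score.
        logits = -unary + kernel @ q
        logits -= logits.max(axis=1, keepdims=True)
        q = np.exp(logits)
        q /= q.sum(axis=1, keepdims=True)
    return q.reshape(h, w, n_labels)


# Toy example: dark left half, bright right half, one mislabelled pixel on the left.
image = np.zeros((6, 6, 3))
image[:, 3:] = 255
probs = np.zeros((6, 6, 2))
probs[:, :3] = [0.9, 0.1]
probs[:, 3:] = [0.1, 0.9]
probs[2, 1] = [0.4, 0.6]

before = probs.argmax(axis=-1)
after = dense_crf_naive(image, probs).argmax(axis=-1)
print(before[2, 1], after[2, 1])  # 1 0
```

The isolated pixel is pulled back to the label of its similar-coloured neighbours, while the boundary between the two halves is kept because the colour-sensitive kernel does not connect pixels of very different colour.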

INSTALLATION OF REQUIRED TOOLS

1. Tensorflow

TensorFlow is used as the backend for Keras. Installation instructions, with and without GPU support, are at the following link: https://www.tensorflow.org/install/install_sources

2. Keras

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation.

To install Keras

sudo pip install keras

3. Matplotlib

Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.

Installing Matplotlib or TensorFlow also installs a number of dependencies such as NumPy. Refer to the following link for installation instructions: https://matplotlib.org/users/installing.html

4. Skimage

Scikit-image is an image processing toolbox for SciPy. It is used for loading and saving images and for applying various transformations to them, such as color to gray and gray to color.

Refer to the following link for installation instructions: http://scikit-image.org/docs/dev/install.html

5. graphviz

This package facilitates the creation and rendering of graph descriptions in the DOT language of the Graphviz graph drawing software from Python. It is required to plot the models in keras.

To install graphviz

sudo pip install graphviz

6. Jupyter Notebook

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.

Refer to the following link for installation instructions: https://www.digitalocean.com/community/tutorials/how-to-set-up-a-jupyter-notebook-to-run-ipython-on-ubuntu-16-04
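Once everything is installed, a quick check like the following can confirm that the Python packages are visible to the interpreter. The import names in `REQUIRED_MODULES` are the usual ones for these tools (for example, scikit-image imports as `skimage`) but are assumptions here, and the helper `check_modules` is hypothetical.

```python
import importlib.util

# Import names assumed for the tools listed above.
REQUIRED_MODULES = ["tensorflow", "keras", "matplotlib", "skimage", "graphviz", "notebook"]


def check_modules(names):
    """Return {module name: True if it can be found on the current Python path}."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

This only checks that each module can be located; it does not import it or verify its version.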

Quick start

Run the following commands in Terminal.

git clone https://github.com/Gurupradeep/FCN-for-Semantic-Segmentation.git
cd FCN-for-Semantic-Segmentation
jupyter notebook

This opens the Jupyter file browser in your web browser, listing all the notebooks in the directory.

FCN-16.ipynb contains code related to implementation of FCN-16.
FCN-8.ipynb contains code related to implementation of FCN-8.
Comparison_of_fcn8_and_fcn16.ipynb has code which compares results of FCN-8 and FCN-16 models.
CRF.ipynb has code which is used to compare the results after applying CRF on FCN-8 and FCN-16 annotated images.
All the images used can be found in the TestImages folder (https://github.com/Gurupradeep/FCN-for-Semantic-Segmentation/tree/master/TestImages).

Open the respective notebooks and run the cells to reproduce the results. Because we are running in a Jupyter notebook, the results are visible after executing each cell.
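How the comparison notebooks score the two models is not described here. As one possible way to compare predicted label maps against ground truth, the sketch below computes a per-image mean intersection-over-union; the function `mean_iou` and the toy arrays are assumptions for illustration, not code from the notebooks.

```python
import numpy as np


def mean_iou(pred, target, n_classes):
    """Mean intersection-over-union over the classes present in pred or target."""
    ious = []
    for c in range(n_classes):
        pred_c, target_c = pred == c, target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps; skip it rather than score it
        ious.append(np.logical_and(pred_c, target_c).sum() / union)
    return float(np.mean(ious))


# Toy 2x4 label maps with two classes present; one pixel is predicted wrongly.
target = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
pred = np.array([[0, 0, 0, 1],
                 [0, 0, 1, 1]])
print(mean_iou(pred, target, n_classes=21))
```

Here class 0 has IoU 4/5 and class 1 has IoU 3/4, so the mean is about 0.775. Running the same function on FCN-8 and FCN-16 predictions for the same image gives a simple like-for-like number.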

fcn-for-semantic-segmentation's Issues

Coarse output from the network

Hello, thanks for your work. I have tried to rewrite your network using PyTorch, but what I get from the network is a coarse image in which I can only see the outline of my segmentation object. Could you tell me where I went wrong? Thanks!

my model code is like this:

import torchvision.models as models
import torch.nn as nn
import torch.nn.functional as F

# referred to this site: https://github.com/Gurupradeep/FCN-for-Semantic-Segmentation
class MyFCN(nn.Module):
    def __init__(self):
        super().__init__()
        model = models.vgg16(pretrained=True)
        self.backbone_third = model.features[:17]  # (256, 28, 28) third pooling before conv layer
        self.backbone_fourth = model.features[:24] # (512, 14, 14) fourth pooling before conv layer
        self.backbone_fifth = model.features[:31]  # (512, 7, 7) final pooling before conv layer

        self.conv_256_1 = nn.Sequential(
            nn.Conv2d(256, 1, (1, 1), 1),
        )

        self.conv_512_1 = nn.Sequential(
            nn.Conv2d(512, 1, (1, 1), 1),
        )

        # fc6
        self.conv_512_4096 = nn.Sequential(
            nn.Conv2d(512, 4096, (7, 7), 1, 3),
            nn.ReLU(inplace=True),
        )

        # fc7
        self.conv_4096_4096 = nn.Sequential(
            nn.Conv2d(4096, 4096, (1, 1), 1),
            nn.ReLU(inplace=True),
        )

        # score_fr
        self.conv_4096_1 = nn.Sequential(
            nn.Conv2d(4096, 1, (1, 1), 1),
            nn.ReLU(inplace=True),
        )

        # score_2 for 7=>14 and 14=>28
        self.conv_transpose = nn.Sequential(
            nn.ConvTranspose2d(1, 1, (4, 4), 2),
        )

        # final upsample
        self.conv_transpose_8 = nn.Sequential(
            nn.ConvTranspose2d(1, 1, (16, 16), 8),
        )


    def forward(self, x):
        x_from_pooling_3 = self.backbone_third(x)
        x_from_pooling_4 = self.backbone_fourth(x)
        x_from_pooling_5 = self.backbone_fifth(x)

        # pooling 3
        x_3 = self.conv_256_1(x_from_pooling_3)

        # pooling 4
        x_4 = self.conv_512_1(x_from_pooling_4)     # (1, 1, 14, 14)

        # pooling 5
        x_5 = self.conv_512_4096(x_from_pooling_5)  # (1, 4096, 7, 7)
        x_5 = self.conv_4096_4096(x_5)              # (1, 4096, 7, 7)
        x_5 = self.conv_4096_1(x_5)                 # (1, 1, 7, 7)
        x_5 = self.conv_transpose(x_5)              # (1, 1, 16, 16)
        x_5 = F.pad(x_5, (-1, -1, -1, -1))          # crop layer, (1, 1, 14, 14)

        # fusing x_4
        x_fused_1 = x_4 + x_5                       # (1, 1, 14, 14)
        x_fused_1 = self.conv_transpose(x_fused_1)  # (1, 1, 30, 30)
        x_fused_1 = F.pad(x_fused_1, (-1, -1, -1, -1))  # crop layer, (1, 1, 28, 28)

        # fusing x_3
        x_fused_2 = x_3 + x_fused_1
        x_fused_2 = self.conv_transpose_8(x_fused_2)    # (1, 1, 232, 232)
        x_fused_2 = F.pad(x_fused_2, (-4, -4, -4, -4))  # crop layer (1, 1, 224, 224)

        return x_fused_2

and my output is like:

pydensecrf package problem

I want to use CRF.ipynb for post-processing, but I got an error: module 'pydensecrf.densecrf' has no attribute 'DenseCRF2D'.
(Before that I ran: pip install pydensecrf)

Query regarding Image Classification

I just know the basics of neural networks, I tried understanding the paper. Can you give me some links to better understand the same?

Where is the weights.h5 file?

I tried running your code but I always come across an error that says weights.h5 cannot be found. I downloaded another weights file, but using that shows another error saying "you are trying to load a 0 layer model into a 19 layer model". Upon further digging I found out that this has something to do with the version of keras being used.

So my questions are:

Where is the weights.h5 file?
If I can't get that file, what is the version of keras you used to build this project?

Thank you

fcn16_model.load_weights('weights.h5')

fcn16_model.load_weights('weights.h5'): please explain the file 'weights.h5'.

Why does the CRF code produce output like this?

Hello, thanks for sharing the code. I have a question: why does my output look like this when I use the CRF code?

PS: My training images are three-channel images, and my prediction image is already a colored three-channel image. I passed the original image, the colored image and the output image to the CRF function. I don't know if that is a mistake?