
sppnet-pytorch's Introduction

sppnet-pytorch

An SPP layer can be added to a CNN between the convolutional layers and the fully-connected layers, so that the model accepts input images of different sizes. We use this structure in the paper Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond.

The function spatial_pyramid_pool() in spp_layer.py is self-contained and can be added to your own models.
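
As a rough usage sketch, the block below restates the pooling logic with the signature `spatial_pyramid_pool(previous_conv, num_sample, previous_conv_size, out_pool_size)` that appears in the code quoted in the issues further down this page, and calls it from a small model. `TinyNet`, its layer sizes, and the pyramid levels `(4, 2, 1)` are hypothetical choices for illustration and are not taken from this repository.

```python
import math

import torch
import torch.nn as nn


def spatial_pyramid_pool(previous_conv, num_sample, previous_conv_size, out_pool_size):
    """Pool a feature map at several grid sizes and concatenate the results."""
    pooled = []
    for n in out_pool_size:
        # Window = ceil(size / n); padding covers the shortfall window * n - size.
        h_wid = math.ceil(previous_conv_size[0] / n)
        w_wid = math.ceil(previous_conv_size[1] / n)
        h_pad = (h_wid * n - previous_conv_size[0] + 1) // 2
        w_pad = (w_wid * n - previous_conv_size[1] + 1) // 2
        pool = nn.MaxPool2d((h_wid, w_wid), stride=(h_wid, w_wid), padding=(h_pad, w_pad))
        pooled.append(pool(previous_conv).reshape(num_sample, -1))
    return torch.cat(pooled, dim=1)


class TinyNet(nn.Module):
    """Hypothetical model: one conv layer, an SPP step, one linear layer."""

    def __init__(self, levels=(4, 2, 1), num_classes=10):
        super().__init__()
        self.levels = levels
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        # The SPP output length depends only on channels and pyramid levels.
        self.fc = nn.Linear(16 * sum(n * n for n in levels), num_classes)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = spatial_pyramid_pool(x, x.size(0), [x.size(2), x.size(3)], self.levels)
        return self.fc(x)


net = TinyNet()
out_a = net(torch.randn(2, 3, 32, 32))
out_b = net(torch.randn(2, 3, 47, 40))
```

Both calls go through the same linear layer even though the inputs differ in size, because the pooled vector has length 16 * (16 + 4 + 1) = 336 in each case.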

See: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Citation

If you find this work useful for your research, please cite:

@article{ouyang2018pedestrian,
  title={Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond},
  author={Ouyang, Xi and Cheng, Yu and Jiang, Yifan and Li, Chun-Liang and Zhou, Pan},
  journal={arXiv preprint arXiv:1804.02047},
  year={2018}
}

and

@inproceedings{he2014spatial,
  title={Spatial pyramid pooling in deep convolutional networks for visual recognition},
  author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
  booktitle={European conference on computer vision},
  pages={346--361},
  year={2014},
  organization={Springer}
}

sppnet-pytorch's People

Contributors

yifanjiang19


sppnet-pytorch's Issues

DataLoader does not support images of different sizes

Hi, I have used your SPP layer implementation, but I found that PyTorch does not support batching tensors of different sizes. So if I want to forward images through the net, I need to set batch_size=1 like this:
dataLoader = DataLoader(dataSet, batch_size=1, shuffle=True)
Training is then really slow, because we backprop for each image separately. Do you have any ideas for this?

Many thanks
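
One possible workaround, sketched here under assumptions rather than taken from this repository, is to group sample indices by image size so that every batch holds tensors of a single shape. The helper `bucket_batches` and the `sizes` list are hypothetical; the resulting list of index lists is the kind of iterable that `DataLoader` accepts through its `batch_sampler` argument.

```python
import random
from collections import defaultdict


def bucket_batches(sizes, batch_size, shuffle=True, seed=0):
    """Group dataset indices so that each batch contains images of one size.

    sizes: list of (height, width) tuples, one per dataset item.
    Returns a list of index lists, each with at most batch_size entries.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for idx, size in enumerate(sizes):
        buckets[size].append(idx)

    batches = []
    for indices in buckets.values():
        if shuffle:
            rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batches.append(indices[start:start + batch_size])
    if shuffle:
        rng.shuffle(batches)  # mix sizes across the epoch
    return batches


# Hypothetical per-item sizes; in practice these would be read from the dataset.
sizes = [(180, 180), (224, 224), (180, 180), (224, 224), (180, 180)]
batches = bucket_batches(sizes, batch_size=2)
# Possible use: DataLoader(dataSet, batch_sampler=batches)
```

The trade-off is that batches near the end of a bucket can be smaller than `batch_size`, and the batches would need to be regenerated with a new seed each epoch to get fresh shuffling.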

Help

Hello, could you tell me how the padding formula is derived? Thank you very much!
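
The padding can be read off the pooling arithmetic in the `spatial_pyramid_pool` code quoted further down this page. For a side of length `size` and `n` output bins, the window is `ceil(size / n)`, so `n` windows span `window * n` cells, and the shortfall `window * n - size` is split across the two sides and rounded up. The sketch below only reproduces that arithmetic. `spp_pool_params` is a hypothetical helper, and the output length uses the standard formula for pooling with stride equal to the window.

```python
import math


def spp_pool_params(size, n):
    """Return (window, pad, output_bins, fits) for one side of the feature map."""
    window = math.ceil(size / n)
    pad = (window * n - size + 1) // 2             # half the shortfall, rounded up
    out = (size + 2 * pad - window) // window + 1  # stride == window
    fits = 2 * pad <= window                       # padding of at most half the window
    return window, pad, out, fits


for size in (13, 10, 7, 9):
    print(size, spp_pool_params(size, 4))
```

For sizes 13, 10, and 7 with 4 bins the output length is 4 in each case. For size 9 the formula asks for a padding of 2 with a window of 3; as far as I know, `nn.MaxPool2d` rejects padding larger than half the window, so that combination would need different handling.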

The difference with AdaptivePooling in PyTorch

Hi, I wonder what the difference is between your implementation of the SPP layer and AdaptiveAvgPool2d or AdaptiveMaxPool2d in the PyTorch library? Doc: AdaptiveAvgPool2d

The standard AdaptiveAvgPool2d takes the pooled output size as a parameter and accepts feature maps of any size, so I think it is already an implementation of the SPP layer.

So, what is the difference between your implementation and those standard functions?
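
For comparison, here is a minimal sketch of a pyramid built from adaptive max pooling. The function name `spp_adaptive` and the levels `(4, 2, 1)` are assumptions for illustration. Adaptive pooling picks the start and end of each bin from the input size, so it needs no explicit window, stride, or padding, whereas the function in this repository computes a fixed window and pads the feature map. The output shapes should match, but individual values may differ near the borders.

```python
import torch
import torch.nn.functional as F


def spp_adaptive(features, levels=(4, 2, 1)):
    """Concatenate adaptive max-pooled grids into one fixed-length vector per sample."""
    n = features.size(0)
    pooled = [F.adaptive_max_pool2d(features, level).reshape(n, -1) for level in levels]
    return torch.cat(pooled, dim=1)


# Two feature maps with different spatial sizes give vectors of equal length.
a = spp_adaptive(torch.randn(2, 8, 13, 13))
b = spp_adaptive(torch.randn(2, 8, 20, 31))
```

With 8 channels and levels 4, 2, and 1, each sample is reduced to 8 * (16 + 4 + 1) = 168 values regardless of the input size.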

Stride calculation

I noticed you used math.ceil in your computation of the pool stride, but the paper uses math.floor. Is there a reason for that?

Why padding in the spatial_pyramid_pool function?

I'm confused about your spatial_pyramid_pool function: I can't understand why it pads.
I think the flow is [n,c,w,h] -> [n,c,a,a] (where a is a value in the variable 'output_num') -> [n,-1], which is then fed to the fc layer. I think the right code is:
h_wid = math.ceil(previous_conv_size[0]/out_pool_size[i])
w_wid = math.ceil(previous_conv_size[1]/out_pool_size[i])
h_str = math.floor(previous_conv_size[0]/out_pool_size[i])
w_str = math.floor(previous_conv_size[1]/out_pool_size[i])
maxpool = nn.MaxPool2d((h_wid, w_wid), stride=(h_str, w_str))
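
To see how the proposal above behaves, the sketch below counts the bins that an unpadded pooling layer would produce with window `ceil(a / n)` and stride `floor(a / n)`. `bins_without_padding` is a hypothetical helper; it uses the standard output-length formula for pooling without padding or dilation and assumes `a >= n`.

```python
import math


def bins_without_padding(a, n):
    """Output length along one side for window=ceil(a/n), stride=floor(a/n)."""
    window = math.ceil(a / n)
    stride = math.floor(a / n)  # requires a >= n so that stride >= 1
    return (a - window) // stride + 1


for a, n in [(13, 3), (13, 2), (10, 4), (7, 4)]:
    print(a, n, bins_without_padding(a, n))
```

The first three cases give exactly `n` bins, but `a = 7, n = 4` gives 6. This suggests that the floor-stride variant does not produce a fixed-length output for every feature-map size, which may be one reason to use equal window and stride with padding instead.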

The size of the input tensor has to be > 26*26!

Because of the stride-2 conv in class SPP_NET, we have to ensure that the input tensor is larger than 26 * 26.

Otherwise, the kernel (4 * 4) of self.conv5 will be larger than the conv5 input, i.e. the conv5 input will be smaller than 4 * 4 if the input x in forward is smaller than 26 * 26.

questions about the training of the sppnet

Hi! I am confused about the training of multi-size images. Based on the paper,

In other words, during training we implement the varying-input-size SPP-net by two fixed-size networks that share parameters

we train each full epoch on one network, and then switch to the other one (keeping all weights) for the next full epoch

My interpretation is as follows: suppose the number of epochs is 15 and the training images fall into 3 size categories, e.g. 180, 224, and 250. During training, epoch 0 uses images of size 180, epoch 1 uses images of size 224, and epoch 2 uses images of size 250. That is, the model is trained on images of size 180, 224, and 250 in turn, right?

Thanks in advance! Looking forward to your help.
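
The reading described in the question, where each full epoch uses a single image size and the sizes rotate in order, can be sketched as below. `size_for_epoch` is a hypothetical helper, and the three sizes are the asker's example; the passage quoted from the paper speaks of two fixed-size networks.

```python
def size_for_epoch(epoch, sizes=(180, 224, 250)):
    """Return the training image size for an epoch, cycling through sizes in order."""
    return sizes[epoch % len(sizes)]


# With 15 epochs, each size is used for 5 full epochs.
schedule = [size_for_epoch(epoch) for epoch in range(15)]
print(schedule)
```

In a training loop, the returned size would decide how the images for that epoch are resized, while the model weights carry over from one epoch to the next.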

I want to know how to use your code.

I want to know how to use your code.
This is my code
```python
import tensorflow as tf
import numpy as np
import os
from PIL import Image
import random
import math
import torch
import torch.nn as nn
from torch.nn import init


class CNN(object):
    def __init__(self, image_height, image_width, max_captcha, char_set, model_save_dir):
        self.image_height = image_height
        self.image_width = image_width
        self.max_captcha = max_captcha
        self.char_set = char_set
        self.char_set_len = len(char_set)
        self.model_save_dir = model_save_dir  # model path
        with tf.name_scope('parameters'):
            self.w_alpha = 0.01
            self.b_alpha = 0.1
        # Initialize the TF placeholders
        with tf.name_scope('data'):
            self.X = tf.placeholder(tf.float32, [None, self.image_height * self.image_width])  # feature vector
            self.Y = tf.placeholder(tf.float32, [None, self.max_captcha * self.char_set_len])  # labels
            self.keep_prob = tf.placeholder(tf.float32)  # dropout keep probability

    @staticmethod
    def convert2gray(img):
        # Convert an RGB image to grayscale; pass 2-D images through unchanged.
        if len(img.shape) > 2:
            r, g, b = img[:, :, 0], img[:, :, 1], img[:, :, 2]
            gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
            return gray
        else:
            return img

    def text2vec(self, text):
        """
        Convert a label to one-hot encoding
        :param text: str
        :return: numpy.array
        """
        text_len = len(text)
        if text_len > self.max_captcha:
            raise ValueError('Captcha can be at most {} characters long'.format(self.max_captcha))

        vector = np.zeros(self.max_captcha * self.char_set_len)

        for i, ch in enumerate(text):
            idx = i * self.char_set_len + self.char_set.index(ch)
            vector[idx] = 1
        return vector

    def spatial_pyramid_pool(self, previous_conv, num_sample, previous_conv_size, out_pool_size):
        '''
        previous_conv: a tensor vector of previous convolution layer
        num_sample: an int number of image in the batch
        previous_conv_size: an int vector [height, width] of the matrix features size of previous convolution layer
        out_pool_size: a int vector of expected output size of max pooling layer

        returns: a tensor vector with shape [1 x n] is the concatenation of multi-level pooling
        '''
        # print(previous_conv.size())
        for i in range(len(out_pool_size)):
            # print(previous_conv_size)
            h_wid = int(math.ceil(previous_conv_size[0] / out_pool_size[i]))
            w_wid = int(math.ceil(previous_conv_size[1] / out_pool_size[i]))
            # Integer division: nn.MaxPool2d expects integer padding
            h_pad = (h_wid * out_pool_size[i] - previous_conv_size[0] + 1) // 2
            w_pad = (w_wid * out_pool_size[i] - previous_conv_size[1] + 1) // 2
            maxpool = nn.MaxPool2d((h_wid, w_wid), stride=(h_wid, w_wid), padding=(h_pad, w_pad))
            x = maxpool(previous_conv)
            if i == 0:
                spp = x.view(num_sample, -1)
                # print("spp size:",spp.size())
            else:
                # print("size:",spp.size())
                spp = torch.cat((spp, x.view(num_sample, -1)), 1)
        return spp

    def model(self):
        x = tf.reshape(self.X, shape=[-1, self.image_height, self.image_width, 1])
        print(">>> input x: {}".format(x))

        # Convolution layer 1
        wc1 = tf.get_variable(name='wc1', shape=[3, 3, 1, 32], dtype=tf.float32,
                              initializer=tf.contrib.layers.xavier_initializer())
        bc1 = tf.Variable(self.b_alpha * tf.random_normal([32]))
        conv1 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(x, wc1, strides=[1, 1, 1, 1], padding='SAME'), bc1))
        conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        conv1 = tf.nn.dropout(conv1, self.keep_prob)

        # Convolution layer 2
        wc2 = tf.get_variable(name='wc2', shape=[3, 3, 32, 64], dtype=tf.float32,
                              initializer=tf.contrib.layers.xavier_initializer())
        bc2 = tf.Variable(self.b_alpha * tf.random_normal([64]))
        conv2 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv1, wc2, strides=[1, 1, 1, 1], padding='SAME'), bc2))
        conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        conv2 = tf.nn.dropout(conv2, self.keep_prob)

        # Convolution layer 3
        wc3 = tf.get_variable(name='wc3', shape=[3, 3, 64, 128], dtype=tf.float32,
                              initializer=tf.contrib.layers.xavier_initializer())
        bc3 = tf.Variable(self.b_alpha * tf.random_normal([128]))
        conv3 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv2, wc3, strides=[1, 1, 1, 1], padding='SAME'), bc3))
        conv3 = tf.nn.max_pool(conv3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        conv3 = tf.nn.dropout(conv3, self.keep_prob)
        print(">>> convolution 3: ", conv3.shape)
        next_shape = conv3.shape[1] * conv3.shape[2] * conv3.shape[3]

        # I want to know how to use your code.

        # Fully connected layer 1
        wd1 = tf.get_variable(name='wd1', shape=[next_shape, 1024], dtype=tf.float32,
                              initializer=tf.contrib.layers.xavier_initializer())
        bd1 = tf.Variable(self.b_alpha * tf.random_normal([1024]))
        dense = tf.reshape(conv3, [-1, wd1.get_shape().as_list()[0]])
        dense = tf.nn.relu(tf.add(tf.matmul(dense, wd1), bd1))
        dense = tf.nn.dropout(dense, self.keep_prob)

        # Fully connected layer 2
        wout = tf.get_variable('name', shape=[1024, self.max_captcha * self.char_set_len], dtype=tf.float32,
                               initializer=tf.contrib.layers.xavier_initializer())
        bout = tf.Variable(self.b_alpha * tf.random_normal([self.max_captcha * self.char_set_len]))

        with tf.name_scope('y_prediction'):
            y_predict = tf.add(tf.matmul(dense, wout), bout)

        return y_predict
```

The code is incomplete

Hello, I tried to run your spp_layer.py, but it went wrong. Maybe the detail of the implementation is right, but we can't demonstrate it correctly.
