yifanjiang19 / sppnet-pytorch

A simple Spatial Pyramid Pooling layer which could be added in CNN

License: Apache License 2.0

Python 100.00%

python pytorch cnn sppnet-pytorch spp-layer cnn-model detection-network deep-learning

sppnet-pytorch's Introduction

sppnet-pytorch

An SPP layer can be inserted into a CNN model between the convolutional layers and the fully-connected layers, so that the model accepts input images of different sizes. We use this structure in the paper Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond.

The function spatial_pyramid_pool() in the file spp_layer.py is self-contained, so it can be added to your own models.
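To see why a fully-connected layer can follow the SPP layer regardless of the input size, it helps to count the layer's outputs. The sketch below assumes that each pyramid level n yields an n x n grid per channel; the helper name spp_output_length is hypothetical and is not part of spp_layer.py.

```python
def spp_output_length(channels, out_pool_size):
    """Length of the flattened SPP vector for one sample.

    Assumes every pyramid level n contributes an n x n grid per channel,
    so the result does not depend on the feature-map height or width.
    """
    return channels * sum(n * n for n in out_pool_size)


# 64 channels with pyramid levels 4, 2 and 1 -> 64 * (16 + 4 + 1)
print(spp_output_length(64, [4, 2, 1]))  # 1344
```

Under that assumption, the first fully-connected layer only needs to know the channel count and the pyramid levels, not the image size.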

See also: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Citation

If you find this work useful for your research, please cite:

@article{ouyang2018pedestrian,
  title={Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond},
  author={Ouyang, Xi and Cheng, Yu and Jiang, Yifan and Li, Chun-Liang and Zhou, Pan},
  journal={arXiv preprint arXiv:1804.02047},
  year={2018}
}

and

@inproceedings{he2014spatial,
  title={Spatial pyramid pooling in deep convolutional networks for visual recognition},
  author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
  booktitle={European conference on computer vision},
  pages={346--361},
  year={2014},
  organization={Springer}
}

sppnet-pytorch's Issues

DataLoader does not support images of different sizes

Hi, I have used your SPP layer implementation, but I found that PyTorch does not support batching tensors of different sizes. So if I want to forward images through the net, I need to set batch_size=1 like this:
dataLoader = DataLoader(dataSet, batch_size=1, shuffle=True)
As a result, training is really slow, because we backprop for each image separately. Do you have any ideas for this?

Many thanks
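One possible workaround, sketched here as an assumption rather than something this repository provides, is to group samples by image size so that every batch contains tensors of a single shape. The helper size_bucketed_batches and its arguments are hypothetical names; it only produces lists of sample indices.

```python
import random
from collections import defaultdict


def size_bucketed_batches(sizes, batch_size, seed=0):
    """Group sample indices so each batch holds a single (height, width)."""
    buckets = defaultdict(list)
    for index, size in enumerate(sizes):
        buckets[tuple(size)].append(index)

    rng = random.Random(seed)
    batches = []
    for indices in buckets.values():
        rng.shuffle(indices)  # shuffle inside each size bucket
        for start in range(0, len(indices), batch_size):
            batches.append(indices[start:start + batch_size])
    rng.shuffle(batches)  # mix the sizes across the epoch
    return batches


sizes = [(224, 224), (180, 180), (224, 224), (180, 180), (224, 224)]
for batch in size_bucketed_batches(sizes, batch_size=2):
    print(batch, {sizes[i] for i in batch})
```

A list of index batches like this could probably be passed to DataLoader through its batch_sampler argument, which expects an iterable that yields one batch of indices at a time.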

Help

Hello, could you tell me how the padding formula is derived? Thank you very much!
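A hedged reading of the formula, based on the spatial_pyramid_pool code quoted further down this page: with a window of ceil(size / n) and a stride equal to the window, n windows cover window * n cells, which can exceed the feature-map size. The difference is split across both sides, and the + 1 rounds the half up. The sketch below reproduces that arithmetic with integer division; spp_pool_params is a hypothetical helper, and the output size uses the standard pooling formula floor((size + 2 * pad - window) / stride) + 1.

```python
import math


def spp_pool_params(size, n):
    """Window, padding and resulting output size along one dimension."""
    window = math.ceil(size / n)          # also used as the stride
    pad = (window * n - size + 1) // 2    # half of the overshoot, rounded up
    out = (size + 2 * pad - window) // window + 1
    return window, pad, out


for n in (4, 2, 1):
    print(n, spp_pool_params(13, n))
# 4 (4, 2, 4)
# 2 (7, 1, 2)
# 1 (13, 0, 1)
```

Note that PyTorch pooling layers reject a padding larger than half the kernel size, so some combinations (for example size 9 with n = 4, which gives window 3 and padding 2) would raise an error rather than produce n outputs.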

The difference with AdaptivePooling in PyTorch

Hi, I wonder what the difference is between your implementation of the SPP layer and AdaptiveAvgPool2d or AdaptiveMaxPool2d in the PyTorch library? Doc: AdaptiveAvgPool2d

The standard AdaptiveAvgPool2d takes the pooled output size as a parameter and accepts feature maps of any size, so I think it is already an implementation of an SPP layer.

So, what is the difference between your implementation and those standard functions?
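One concrete difference is where the pooling bins fall. The sketch below compares the two along a single dimension, assuming that adaptive pooling takes bin i from floor(i * size / n) to ceil((i + 1) * size / n), and that the padded variant uses a window and stride of ceil(size / n) with the padding formula from the spatial_pyramid_pool code quoted on this page. Both helper names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def adaptive_bins(size, n):
    """Half-open bins [start, end) with adaptive-pooling style boundaries."""
    return [(i * size // n, ceil_div((i + 1) * size, n)) for i in range(n)]


def padded_bins(size, n):
    """Bins of a fixed window/stride pooling with symmetric padding,
    clipped to the real feature map (padding never wins a max)."""
    window = ceil_div(size, n)
    pad = (window * n - size + 1) // 2
    bins = []
    for i in range(n):
        start = i * window - pad
        bins.append((max(start, 0), min(start + window, size)))
    return bins


print(adaptive_bins(13, 4))  # [(0, 4), (3, 7), (6, 10), (9, 13)]
print(padded_bins(13, 4))    # [(0, 2), (2, 6), (6, 10), (10, 13)]
```

Both variants produce n outputs here, but the adaptive bins overlap and have equal width, while the padded bins do not overlap and are narrower at the borders.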

error: 'SPP_NET' object has no attribute 'LReLU1'

Stride calculation

I noticed you used math.ceil in your computation of the pool stride, but the paper uses math.floor. Is there a reason for that?
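One plausible reading, which is an assumption and not the author's stated reason, is that a ceil stride gives non-overlapping windows and relies on padding to reach n outputs, while the floor stride reaches n outputs through overlapping windows. The sketch below checks the arithmetic for a 13-wide feature map and n = 4; pooled_size is a hypothetical helper using the standard pooling output formula.

```python
import math


def pooled_size(size, window, stride, pad=0):
    """Number of pooling outputs along one dimension."""
    return (size + 2 * pad - window) // stride + 1


size, n = 13, 4
window = math.ceil(size / n)         # 4
ceil_stride = math.ceil(size / n)    # 4, equal to the window
floor_stride = math.floor(size / n)  # 3, windows overlap by one cell

print(pooled_size(size, window, floor_stride))        # 4
print(pooled_size(size, window, ceil_stride))         # 3, one output short
print(pooled_size(size, window, ceil_stride, pad=2))  # 4, padding fixes it
```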

why padding in spatial_pyramid_pool function

I'm confused about your spatial_pyramid_pool function: I can't understand why it pads.
I think the flow is [n,c,w,h] -> [n,c,a,a] (where a is a value in the variable 'output_num') -> [n,-1], which is then fed to the fc layer. I think the right code is
h_wid = math.ceil(previous_conv_size[0]/out_pool_size[i])
w_wid = math.ceil(previous_conv_size[1]/out_pool_size[i])
h_str = math.floor(previous_conv_size[0]/out_pool_size[i])
w_str = math.floor(previous_conv_size[1]/out_pool_size[i])
maxpool = nn.MaxPool2d((h_wid, w_wid), stride=(h_str, w_str))
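A quick way to evaluate the proposal above is to check, for a range of feature-map sizes, whether a ceil window with a floor stride and no padding really yields the requested number of outputs. The sketch below does this along one dimension; proposed_output_size is a hypothetical helper, and the scanned range of 4 to 20 is arbitrary.

```python
import math


def proposed_output_size(size, n):
    """Outputs of an unpadded pooling with ceil window and floor stride."""
    window = math.ceil(size / n)
    stride = math.floor(size / n)
    return (size - window) // stride + 1


# Sizes for which a 4-bin level would not produce exactly 4 outputs.
mismatches = [s for s in range(4, 21) if proposed_output_size(s, 4) != 4]
print(mismatches)  # [6, 7, 11]
```

So the unpadded variant works for many sizes but not all of them, which may be part of the reason the original function pads instead.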

the size of the input tensor has to be > 26*26 !!!

Because of the stride-2 conv in class SPP_NET,
we have to ensure that the size of the input tensor is > 26 * 26.

Otherwise, the kernel (4 * 4) of self.conv5 will be larger than the conv5 input,
i.e. the conv5 input will be < 4 * 4 if the input x in forward is < 26 * 26.

questions about the training of the sppnet

Hi! I am confused about the training of multi-size images. Based on the paper,

In other words, during training we implement the varying-input-size SPP-net by two fixed-size networks that share parameters

we train each full epoch on one network, and then switch to the other one (keeping all weights) for the next full epoch

My interpretation is as follows: suppose the number of epochs is 15 and the training images are grouped into 3 sizes, such as 180, 224 and 250. During training, epoch 0 uses the images of size 180, epoch 1 uses the images of size 224, and epoch 2 uses the images of size 250. That is, the model is trained on images of size 180, 224 and 250 in turn, right?

Thanks in advance! Looking forward to your help.
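The interpretation above can be written down as a simple schedule. This is only a sketch of that reading of the quoted passages, with a hypothetical helper size_for_epoch and the three sizes from the question; it is not taken from the repository.

```python
def size_for_epoch(epoch, sizes):
    """Pick the training image size for an epoch by cycling through sizes."""
    return sizes[epoch % len(sizes)]


sizes = [180, 224, 250]
print([size_for_epoch(epoch, sizes) for epoch in range(6)])
# [180, 224, 250, 180, 224, 250]
```

With 15 epochs and 3 sizes, each size would be used for 5 full epochs, and the weights carry over at every switch.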

I want to know how to use your code.

I want to know how to use your code.
This is my code
import tensorflow as tf
import numpy as np
import os
from PIL import Image
import random
import math
import torch
import torch.nn as nn
from torch.nn import init

class CNN(object):
    def __init__(self, image_height, image_width, max_captcha, char_set, model_save_dir):
        self.image_height = image_height
        self.image_width = image_width
        self.max_captcha = max_captcha
        self.char_set = char_set
        self.char_set_len = len(char_set)
        self.model_save_dir = model_save_dir  # model path
        with tf.name_scope('parameters'):
            self.w_alpha = 0.01
            self.b_alpha = 0.1
        # initialize the TF placeholders
        with tf.name_scope('data'):
            self.X = tf.placeholder(tf.float32, [None, self.image_height * self.image_width])  # feature vector
            self.Y = tf.placeholder(tf.float32, [None, self.max_captcha * self.char_set_len])  # labels
            self.keep_prob = tf.placeholder(tf.float32)  # dropout value

    @staticmethod
    def convert2gray(img):
        if len(img.shape) > 2:
            r, g, b = img[:, :, 0], img[:, :, 1], img[:, :, 2]
            gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
            return gray
        else:
            return img

    def text2vec(self, text):
        """
        Convert a label to a one-hot encoding
        :param text: str
        :return: numpy.array
        """
        text_len = len(text)
        if text_len > self.max_captcha:
            raise ValueError('The captcha can be at most {} characters long'.format(self.max_captcha))

        vector = np.zeros(self.max_captcha * self.char_set_len)

        for i, ch in enumerate(text):
            idx = i * self.char_set_len + self.char_set.index(ch)
            vector[idx] = 1
        return vector

    def spatial_pyramid_pool(self, previous_conv, num_sample, previous_conv_size, out_pool_size):
        '''
        previous_conv: a tensor vector of previous convolution layer
        num_sample: an int number of image in the batch
        previous_conv_size: an int vector [height, width] of the matrix features size of previous convolution layer
        out_pool_size: a int vector of expected output size of max pooling layer

        returns: a tensor vector with shape [1 x n] is the concatenation of multi-level pooling
        '''
        # print(previous_conv.size())
        for i in range(len(out_pool_size)):
            # print(previous_conv_size)
            h_wid = int(math.ceil(previous_conv_size[0] / out_pool_size[i]))
            w_wid = int(math.ceil(previous_conv_size[1] / out_pool_size[i]))
            # integer division: under Python 3, "/" would give a float padding
            h_pad = (h_wid * out_pool_size[i] - previous_conv_size[0] + 1) // 2
            w_pad = (w_wid * out_pool_size[i] - previous_conv_size[1] + 1) // 2
            maxpool = nn.MaxPool2d((h_wid, w_wid), stride=(h_wid, w_wid), padding=(h_pad, w_pad))
            x = maxpool(previous_conv)
            if (i == 0):
                spp = x.view(num_sample, -1)
                # print("spp size:",spp.size())
            else:
                # print("size:",spp.size())
                spp = torch.cat((spp, x.view(num_sample, -1)), 1)
        return spp

    def model(self):
        x = tf.reshape(self.X, shape=[-1, self.image_height, self.image_width, 1])
        print(">>> input x: {}".format(x))

        # Convolution layer 1
        wc1 = tf.get_variable(name='wc1', shape=[3, 3, 1, 32], dtype=tf.float32,
                              initializer=tf.contrib.layers.xavier_initializer())
        bc1 = tf.Variable(self.b_alpha * tf.random_normal([32]))
        conv1 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(x, wc1, strides=[1, 1, 1, 1], padding='SAME'), bc1))
        conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        conv1 = tf.nn.dropout(conv1, self.keep_prob)

        # Convolution layer 2
        wc2 = tf.get_variable(name='wc2', shape=[3, 3, 32, 64], dtype=tf.float32,
                              initializer=tf.contrib.layers.xavier_initializer())
        bc2 = tf.Variable(self.b_alpha * tf.random_normal([64]))
        conv2 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv1, wc2, strides=[1, 1, 1, 1], padding='SAME'), bc2))
        conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        conv2 = tf.nn.dropout(conv2, self.keep_prob)

        # Convolution layer 3
        wc3 = tf.get_variable(name='wc3', shape=[3, 3, 64, 128], dtype=tf.float32,
                              initializer=tf.contrib.layers.xavier_initializer())
        bc3 = tf.Variable(self.b_alpha * tf.random_normal([128]))
        conv3 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv2, wc3, strides=[1, 1, 1, 1], padding='SAME'), bc3))
        conv3 = tf.nn.max_pool(conv3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        conv3 = tf.nn.dropout(conv3, self.keep_prob)
        print(">>> convolution 3: ", conv3.shape)
        next_shape = conv3.shape[1] * conv3.shape[2] * conv3.shape[3]

        # I want to know how to use your code.

        # Fully connected layer 1
        wd1 = tf.get_variable(name='wd1', shape=[next_shape, 1024], dtype=tf.float32,
                              initializer=tf.contrib.layers.xavier_initializer())
        bd1 = tf.Variable(self.b_alpha * tf.random_normal([1024]))
        dense = tf.reshape(conv3, [-1, wd1.get_shape().as_list()[0]])
        dense = tf.nn.relu(tf.add(tf.matmul(dense, wd1), bd1))
        dense = tf.nn.dropout(dense, self.keep_prob)

        # Fully connected layer 2
        wout = tf.get_variable('name', shape=[1024, self.max_captcha * self.char_set_len], dtype=tf.float32,
                               initializer=tf.contrib.layers.xavier_initializer())
        bout = tf.Variable(self.b_alpha * tf.random_normal([self.max_captcha * self.char_set_len]))

        with tf.name_scope('y_prediction'):
            y_predict = tf.add(tf.matmul(dense, wout), bout)

        return y_predict

The code is incomplete

Hello, I tried to run your spp_layer.py, but it failed. The details of the implementation may be right, but as it stands we can't run it to demonstrate that it works.

Question: No Backward?

I just wonder whether it is appropriate to define a module 'SPP_NET' without a backward method,
because in the forward method you use the self-defined layer 'spp_layer', which does not have a backward implementation.
As shown in http://pytorch.org/tutorials/beginner/examples_autograd/two_layer_net_custom_function.html
Usually in PyTorch, if we want to define a new autograd function, shouldn't we implement forward and backward together?