
pytorch-deform-conv-v2's People

Contributors

4uiiurz1


pytorch-deform-conv-v2's Issues

Why not use grid_sample?

I think PyTorch's grid_sample function can fetch the value at a given position, and that may be more convenient.
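
For reference, a minimal sketch of the suggestion (illustrative shapes only, not the repository's code): F.grid_sample performs the bilinear lookup once the absolute sampling positions are normalized to [-1, 1].

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 3, 8, 8)                    # (b, c, H, W) feature map
    p = torch.rand(1, 8, 8, 2) * 7                 # absolute (x, y) sampling positions
    # normalize to [-1, 1]; with align_corners=True, -1 maps to 0 and +1 to size-1
    grid = 2.0 * p / torch.tensor([x.size(3) - 1, x.size(2) - 1]) - 1.0
    out = F.grid_sample(x, grid, mode='bilinear', align_corners=True)
    print(out.shape)                               # torch.Size([1, 3, 8, 8])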

Meaning of q_lt, q_rb, q_lb, q_rt?

I know roughly that these variables represent some kind of boundary coordinates, but what exactly do they represent? What do "lt" and "rb" mean?
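
For what it's worth, my reading of the naming in deform_conv_v2.py, as a small sketch (the coordinate order is illustrative): for a fractional sampling position p, these are the four integer corners used for bilinear interpolation.

    import torch

    # lt = "left-top":     floor(p), both coordinates rounded down
    # rb = "right-bottom": floor(p) + 1, both coordinates rounded up
    # lb / rt: mixed corners, taking one coordinate from lt and the other from rb
    p = torch.tensor([2.3, 4.7])             # one fractional sampling position
    q_lt = p.floor().long()                  # tensor([2, 4])
    q_rb = q_lt + 1                          # tensor([3, 5])
    q_lb = torch.stack([q_lt[0], q_rb[1]])   # tensor([2, 5])
    q_rt = torch.stack([q_rb[0], q_lt[1]])   # tensor([3, 4])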

How do I train the offset individually?

Thanks for your contribution. I want the feature and offset to be separated, like
forward(self, feature, offset): ...
How can I do this?
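
One option, if you do not need this repository's pure-PyTorch implementation specifically: torchvision's deform_conv2d already takes the feature map and the offset as separate arguments, which matches the forward(self, feature, offset) signature described above. (This is a different implementation, not a change to deform_conv_v2.py; the class here would need its forward refactored so that offset is passed in instead of computed by self.p_conv.)

    import torch
    import torchvision

    feature = torch.randn(1, 16, 32, 32)
    weight = torch.randn(8, 16, 3, 3)                # (out_c, in_c, kH, kW)
    offset = torch.randn(1, 2 * 3 * 3, 32, 32)       # (b, 2*kH*kW, h, w), produced/trained elsewhere
    out = torchvision.ops.deform_conv2d(feature, offset, weight, padding=1)
    print(out.shape)                                 # torch.Size([1, 8, 32, 32])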

ValueError: Modules that have backward hooks assigned can't be compiled: Conv2d(512, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))

Bump. I have been thinking about this for a long time and cannot figure out how to solve it. The full error is as follows:
Traceback (most recent call last):
  File "D:/Users/wwj/project/tph-yolov5-main/train.py", line 637, in <module>
    main(opt)
  File "D:/Users/wwj/project/tph-yolov5-main/train.py", line 534, in main
    train(opt.hyp, opt, device, callbacks)
  File "D:/Users/wwj/project/tph-yolov5-main/train.py", line 355, in train
    callbacks.run('on_train_batch_end', ni, model, imgs, targets, paths, plots, opt.sync_bn)
  File "D:\Users\wwj\project\tph-yolov5-main\utils\callbacks.py", line 76, in run
    logger['callback'](*args, **kwargs)
  File "D:\Users\wwj\project\tph-yolov5-main\utils\loggers\__init__.py", line 86, in on_train_batch_end
    self.tb.add_graph(torch.jit.trace(de_parallel(model), imgs[0:1], strict=False), [])
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 750, in trace
    _module_class,
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 942, in trace_module
    module = make_module(mod, _module_class, _compilation_unit)
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 568, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 1068, in __init__
    submodule, TracedModule, _compilation_unit=None
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 568, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 1068, in __init__
    submodule, TracedModule, _compilation_unit=None
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 568, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 1068, in __init__
    submodule, TracedModule, _compilation_unit=None
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 568, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 1061, in __init__
    + str(orig)
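
For context: deform_conv_v2.py registers backward hooks (register_backward_hook(self._set_lr)) on the offset and modulation convs to scale their gradients, and torch.jit.trace refuses to compile modules that carry backward hooks. A possible workaround sketch; it uses a private nn.Module attribute and removes the 0.1x gradient scaling, so training behaviour changes (alternatively, skip the TensorBoard add_graph callback entirely):

    # strip backward hooks from every submodule before calling torch.jit.trace;
    # _backward_hooks is a private attribute, so treat this as a hack
    for m in model.modules():          # `model` is the module about to be traced
        m._backward_hooks.clear()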

Some questions about the testing process for classification tasks

I have a question. We used deformable conv in a classification task and set the training batch size equal to im2col_step. During testing, we fed different numbers of samples per batch (e.g. one sample at a time, or ten samples at a time) and got different classification results. It seems that how many samples we feed the network at once affects the final results. Why is this happening? Could you kindly give me some advice? What is the relationship between the testing batch size and im2col_step, and between the training batch size and im2col_step?

Implemented as in the article?

First of all, thank you for implementing v2 of this paper and maintaining it.
**Warning: I am mainly a Keras/TF user.**
If I am reading this correctly, x_offset is the original latent space warped (non-rigidly) by offsets found by p_conv. So x_offset has shape [batch_size x height x width x features]. The warp happens only in the height and width dimensions (naturally). You then apply a regular convolution on top of that.

From reading the paper, I think the author intended the offsets to be unique for each filter pixel. That is, the procedure should be:

  1. Find offsets.
  2. Fetch the feature space per filter pixel (should be [batch_size x height x width x features x filter_size]).
  3. Multiply each feature by the relevant weight.
     This way two nearby pixels in the latent space can overlap if they want to.

Am I wrong? It seems like all of the implementations online do something similar to what you did, so I assume I am wrong.
Thanks,
Dan
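
To make the shapes concrete, here is a small sketch of steps 2 and 3 as I read them (channel-last layout and made-up sizes, not this repository's code):

    import torch

    b, h, w, c_in, c_out, k = 2, 8, 8, 16, 32, 3
    # step 2: one deformed sample per kernel tap -> [batch, height, width, features, k*k]
    gathered = torch.randn(b, h, w, c_in, k * k)
    # conv filters with their taps flattened -> [out_features, features, k*k]
    weight = torch.randn(c_out, c_in, k * k)
    # step 3: multiply each sample by its weight and sum over features and taps
    out = torch.einsum('bhwcn,ocn->bhwo', gathered, weight)
    print(out.shape)   # torch.Size([2, 8, 8, 32])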

No module named torch.meshgrid

I have tried torch versions 0.4.0, 0.4.1, and 1.0.0, but there is always an error 'no module named torch.meshgrid'. Could you please help me solve this problem?

Some questions about the network structure

Why is there only one downsampling (MaxPool2d after the first conv2d layer) in your network structure? Can you tell me the reason for designing the network like this? Why not add MaxPool2d after each conv2d layer to reduce the parameters?
If the original images are large, the RAM may not be enough for training. What can I do if there are too many parameters?

What's modulation for?

Actually, I do not understand what modulation does. Could you explain it? Thanks so much.
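
As a rough sketch of what modulation adds in DCN v2 (my own summary, with made-up shapes): each bilinearly sampled value is multiplied by a learned scalar in (0, 1), so the network can down-weight or switch off individual sampling points rather than only move them.

    import torch

    b, c, h, w, k = 1, 4, 8, 8, 3
    sampled = torch.randn(b, c, h, w, k * k)            # sampled values, one per kernel tap
    m = torch.sigmoid(torch.randn(b, 1, h, w, k * k))   # modulation scalars predicted by m_conv
    modulated = sampled * m                              # broadcast over channels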

About the learning rate setting of p_conv and m_conv

You set the gradient of p_conv and m_conv to 0.1 times that of the other layers, but I find the gradient does not change after backward.
I use the following code to test:

    import torch
    import torch.nn as nn
    from deform_conv_v2 import DeformConv2d

    # local copy of the hook that DeformConv2d registers on p_conv/m_conv, with prints added
    def _set_lr(module, grad_input, grad_output):
        print('grad input:', grad_input)
        print('grad output:', grad_output)
        grad_input = (grad_input[i] * 0.1 for i in range(len(grad_input)))
        grad_output = (grad_output[i] * 0.1 for i in range(len(grad_output)))
    x = torch.randn(4, 3, 5, 5)
    y_ = torch.randn(4, 1, 5, 5)
    loss = nn.L1Loss()

    d_conv = DeformConv2d(inc=3, outc=1, modulation=True)

    y = d_conv.forward(x)
    l = loss(y, y_)
    l.backward()

    print('p conv grad:')
    print(d_conv.p_conv.weight.grad)
    print('m conv grad:')
    print(d_conv.m_conv.weight.grad)
    print('conv grad:')
    print(d_conv.conv.weight.grad)

The gradient of p_conv is the same as grad_input, but I think the gradient of p_conv should be 0.1 times grad_input. Am I wrong?
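
For what it's worth, one explanation consistent with what you observe: register_backward_hook only replaces the gradients when the hook returns a new tuple; rebinding the local names (and to generator expressions rather than tuples) has no effect on the backward pass. A minimal sketch of a hook that would actually rescale the gradients:

    def _set_lr(module, grad_input, grad_output):
        # returning a tuple is what makes the hook take effect
        return tuple(g * 0.1 if g is not None else None for g in grad_input)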

A problem with _get_p_0

Firstly, thanks for your work.
I noticed that p_0_x and p_0_y both start from 1, so the first coordinate is (1, 1). But if the kernel_size isn't 1, there will be negative coordinates. Though you use torch.clamp to avoid those negative values, I don't think this is the convolution operation described by the author. I think the coordinates should begin from ((kernel_size-1)/2, (kernel_size-1)/2).

Question about kernel_size

Does kernel_size have to be 3? I set kernel_size=5, but the result of the conv is the same as with kernel_size=3. Why?

Maybe a little problem

I think there is a mistake here:

self.conv = nn.Conv2d(inc, outc, kernel_size=kernel_size, stride=kernel_size, bias=bias)

stride=kernel_size should be stride=stride. Is that right?
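
For what it's worth, a small shape sketch of one reading of why stride=kernel_size may be intentional (assuming x_offset has been rearranged to (b, c, h*ks, w*ks) with each kernel sample in its own spatial cell, which is what the reshaping step in deform_conv_v2.py appears to do): a conv with stride=ks then collapses each ks x ks cell back to one output pixel.

    import torch
    import torch.nn as nn

    b, c, h, w, ks, outc = 1, 4, 5, 5, 3, 8
    x_offset = torch.randn(b, c, h * ks, w * ks)      # each ks x ks cell holds one pixel's samples
    conv = nn.Conv2d(c, outc, kernel_size=ks, stride=ks)
    print(conv(x_offset).shape)                       # torch.Size([1, 8, 5, 5]) -> back to (b, outc, h, w)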

Running the net on a specific GPU

When I want the whole net to run on GPU 2 instead of GPU 0, the current code always puts some tensors on GPU 0, which makes the sum p = p_0 + p_n + offset fail. This is not what I want.

Modified point:

   Add a device parameter to the _get_p_n and _get_p_0 functions.

In file: https://github.com/4uiiurz1/pytorch-deform-conv-v2/blob/master/deform_conv_v2.py

Original code:

    def _get_p_n(self, N, dtype):
        p_n_x, p_n_y = torch.meshgrid(
            torch.arange(-(self.kernel_size-1)//2, (self.kernel_size-1)//2+1),
            torch.arange(-(self.kernel_size-1)//2, (self.kernel_size-1)//2+1))
        # (2N, 1)
        p_n = torch.cat([torch.flatten(p_n_x), torch.flatten(p_n_y)], 0)
        p_n = p_n.view(1, 2*N, 1, 1).type(dtype)

        return p_n

    def _get_p_0(self, h, w, N, dtype):
        p_0_x, p_0_y = torch.meshgrid(
            torch.arange(1, h*self.stride+1, self.stride),
            torch.arange(1, w*self.stride+1, self.stride))
        p_0_x = torch.flatten(p_0_x).view(1, 1, h, w).repeat(1, N, 1, 1)
        p_0_y = torch.flatten(p_0_y).view(1, 1, h, w).repeat(1, N, 1, 1)
        p_0 = torch.cat([p_0_x, p_0_y], 1).type(dtype)

        return p_0

    def _get_p(self, offset, dtype):
        N, h, w = offset.size(1)//2, offset.size(2), offset.size(3)

        # (1, 2N, 1, 1)
        p_n = self._get_p_n(N, dtype)
        # (1, 2N, h, w)
        p_0 = self._get_p_0(h, w, N, dtype)
        p = p_0 + p_n + offset
        return p

Suggested modified code:

    def _get_p_n(self, N, device, dataType):
        p_n_x, p_n_y = torch.meshgrid(
            torch.arange(-(self.kernel_size-1)//2, (self.kernel_size-1)//2+1),
            torch.arange(-(self.kernel_size-1)//2, (self.kernel_size-1)//2+1))
        # (2N, 1)
        p_n = torch.cat([torch.flatten(p_n_x), torch.flatten(p_n_y)], 0)
        p_n = p_n.view(1, 2*N, 1, 1).to(device, dtype=dataType)

        return p_n

    def _get_p_0(self, h, w, N, device, dataType):
        p_0_x, p_0_y = torch.meshgrid(
            torch.arange(1, h*self.stride+1, self.stride),
            torch.arange(1, w*self.stride+1, self.stride))
        p_0_x = torch.flatten(p_0_x).view(1, 1, h, w).repeat(1, N, 1, 1)
        p_0_y = torch.flatten(p_0_y).view(1, 1, h, w).repeat(1, N, 1, 1)
        p_0 = torch.cat([p_0_x, p_0_y], 1).to(device, dtype=dataType)

        return p_0

    def _get_p(self, offset, dtype):
        N, h, w = offset.size(1)//2, offset.size(2), offset.size(3)

        # (1, 2N, 1, 1)
        p_n = self._get_p_n(N, offset.device, offset.dtype)
        # (1, 2N, h, w)
        p_0 = self._get_p_0(h, w, N, offset.device, offset.dtype)
        p = p_0 + p_n + offset
        return p

deform_conv_3d

Hi, I want to use a 3D deform_conv in my study. I read your deform_conv_v2.py and tried to extend the code to 3D, but I have no idea how to modify the lines below:

    q_lt = torch.cat([torch.clamp(q_lt[..., :N], 0, x.size(2)-1), torch.clamp(q_lt[..., N:], 0, x.size(3)-1)], dim=-1).long()
    q_rb = torch.cat([torch.clamp(q_rb[..., :N], 0, x.size(2)-1), torch.clamp(q_rb[..., N:], 0, x.size(3)-1)], dim=-1).long()
    q_lb = torch.cat([q_lt[..., :N], q_rb[..., N:]], dim=-1)
    q_rt = torch.cat([q_rb[..., :N], q_lt[..., N:]], dim=-1)

For 3D data we should use trilinear interpolation, which needs eight sample points, while 2D data only needs four sample points for bilinear interpolation. I have no idea how to get the other six sample locations. Could you help me?

Best wishes,
Meixiang Huang
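
In case it helps, a sketch of how the eight trilinear corners could be built by analogy with the 2D code (my own extension, not the author's; it assumes the coordinate tensor stores the three spatial coordinates in consecutive chunks of size N on the last axis, and that x is 5-D (b, c, H, W, D)):

    import torch

    def get_3d_corners(p, x, N):
        """Return the 8 integer corner coordinates around the fractional positions p."""
        q_low = p.detach().floor()
        q_high = q_low + 1

        def clamp(q):
            return torch.cat([
                torch.clamp(q[..., :N], 0, x.size(2) - 1),        # first spatial axis
                torch.clamp(q[..., N:2 * N], 0, x.size(3) - 1),   # second spatial axis
                torch.clamp(q[..., 2 * N:], 0, x.size(4) - 1),    # third spatial axis
            ], dim=-1).long()

        corners = []
        # enumerate the 2^3 = 8 low/high combinations along the three axes,
        # analogous to q_lt / q_rb / q_lb / q_rt in the 2D case
        for cx in (q_low[..., :N], q_high[..., :N]):
            for cy in (q_low[..., N:2 * N], q_high[..., N:2 * N]):
                for cz in (q_low[..., 2 * N:], q_high[..., 2 * N:]):
                    corners.append(clamp(torch.cat([cx, cy, cz], dim=-1)))
        return corners  # 8 tensors, one per trilinear corner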

A large error when kernel_size > 3.

I ran DCNv2 with torchvision.ops.deform_conv2d and got the same result with kernel_size=3, but a different result when kernel_size>3.
My implementation of DCNv2 is below:

import torch
import torch.nn as nn
import torchvision

def torch_initialize_weights(modules):
    # weight initialization
    for m in modules():
        if isinstance(m, torch.nn.Conv2d):
            torch.nn.init.kaiming_normal_(m.weight, mode='fan_out')
            if m.bias is not None:
                torch.nn.init.zeros_(m.bias)
        elif isinstance(m, torch.nn.BatchNorm2d):
            torch.nn.init.ones_(m.weight)
            torch.nn.init.zeros_(m.bias)
        elif isinstance(m, nn.Linear):
            torch.nn.init.normal_(m.weight, 0, 0.01)
            if m.bias is not None:
                torch.nn.init.zeros_(m.bias)
        elif isinstance(m, torch.nn.ConvTranspose2d):
            torch.nn.init.kaiming_normal_(m.weight, mode='fan_out')
            if m.bias is not None:
                torch.nn.init.zeros_(m.bias)

class TorchDeformableConvV2_split(torch.nn.Module):
    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,
                 stride=1,
                 padding=0,
                 dilation=1,
                 groups=1,
                 bias=False,
                 ):
        super(TorchDeformableConvV2_split, self).__init__()
        self.offset_channel = 2 * kernel_size**2
        self.mask_channel = kernel_size**2

        self.padding = padding
        self.dilation = dilation
        self.groups = groups
        self.stride = stride

        self.conv_offset = torch.nn.Conv2d(in_channels,
                                           2 * kernel_size * kernel_size,
                                           kernel_size=kernel_size,
                                           stride=stride,
                                           padding=self.padding,
                                           bias=True)

        self.conv_modulator = torch.nn.Conv2d(in_channels,
                                              1 * kernel_size * kernel_size,
                                              kernel_size=kernel_size,
                                              stride=stride,
                                              padding=self.padding,
                                              bias=True)

        self.conv_dcn = torchvision.ops.DeformConv2d(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=(kernel_size - 1) // 2 * dilation,
            dilation=dilation,
            groups=groups,
            bias=bias,
        )

        torch_initialize_weights(self.modules)

    def forward(self, x):
        offset = self.conv_offset(x)
        mask = torch.sigmoid(self.conv_modulator(x))
        y = self.conv_dcn(x, offset, mask=mask)
        return y

Is there something wrong with my code?

Is the bilinear kernel wrong?

        g_lt = (1 + (q_lt[..., :N].type_as(p) - p[..., :N])) * (1 + (q_lt[..., N:].type_as(p) - p[..., N:]))
        g_rb = (1 - (q_rb[..., :N].type_as(p) - p[..., :N])) * (1 - (q_rb[..., N:].type_as(p) - p[..., N:]))
        g_lb = (1 + (q_lb[..., :N].type_as(p) - p[..., :N])) * (1 - (q_lb[..., N:].type_as(p) - p[..., N:]))
        g_rt = (1 - (q_rt[..., :N].type_as(p) - p[..., :N])) * (1 + (q_rt[..., N:].type_as(p) - p[..., N:]))

This computation is different from the standard bilinear algorithm. Is it wrong, or is it a different algorithm?
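
One way to see the relationship, as a minimal numerical check (assuming q_lt = floor(p) and q_rb = floor(p) + 1 before any clamping, as in the code above): the signed forms are the usual bilinear weights (1 - |dx|) * (1 - |dy|), with the absolute values resolved by the sign of q - p.

    import torch

    p = torch.rand(1000, 2) * 10          # random fractional (x, y) positions
    q_lt = p.floor()
    q_rb = q_lt + 1
    g_lt_signed = (1 + (q_lt[:, 0] - p[:, 0])) * (1 + (q_lt[:, 1] - p[:, 1]))
    g_lt_abs = (1 - (q_lt[:, 0] - p[:, 0]).abs()) * (1 - (q_lt[:, 1] - p[:, 1]).abs())
    g_rb_signed = (1 - (q_rb[:, 0] - p[:, 0])) * (1 - (q_rb[:, 1] - p[:, 1]))
    g_rb_abs = (1 - (q_rb[:, 0] - p[:, 0]).abs()) * (1 - (q_rb[:, 1] - p[:, 1]).abs())
    print(torch.allclose(g_lt_signed, g_lt_abs), torch.allclose(g_rb_signed, g_rb_abs))  # True True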

Confused about the size of x and offsets

Thanks for sharing your code; I'm confused about the sizes of x and the offsets.
If the shape of x is (b, c, h, w), kernel_size is 3, and padding is 1, the offsets should be (b, 18, h, w), and x_offset (b, c, 3h, 3w) is the deformable form of the original input x? Finally, the output is still (b, c, h, w) after a convolution layer (kernel_size 3, no padding, stride 3)?
Please point out my mistake if my understanding is wrong.
Thank you again, and I look forward to your reply.
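
A quick shape check along these lines, as a minimal sketch (it assumes the DeformConv2d class from deform_conv_v2.py in this repository and its inc/outc/kernel_size/padding/modulation constructor arguments):

    import torch
    from deform_conv_v2 import DeformConv2d   # the class in this repository

    x = torch.randn(2, 16, 32, 32)            # (b, c, h, w)
    dcn = DeformConv2d(inc=16, outc=16, kernel_size=3, padding=1, modulation=True)
    y = dcn(x)
    print(y.shape)                            # expected (b, outc, h, w): torch.Size([2, 16, 32, 32])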
