4uiiurz1 / pytorch-deform-conv-v2

PyTorch implementation of Deformable ConvNets v2 (Modulated Deformable Convolution)

License: MIT License

Python 100.00%

deformable-convnets deformable-convolutional-networks pytorch

pytorch-deform-conv-v2's Issues

Why not use grid_sample?

I think PyTorch's grid_sample function can fetch the value at a given position, which may be more convenient.

Meaning of q_lt, q_rb, q_lb, q_rt?

I know roughly these variables represent some kind of boundary coordinates. But what on earth do they represent? What does "lt" "rb" mean exactly?
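For readers with the same question, here is a minimal sketch of what the four names could stand for. It assumes that `lt`, `rb`, `lb` and `rt` abbreviate left-top, right-bottom, left-bottom and right-top, and that `q_lt` is the floor of the sampling position while `q_rb` is `q_lt + 1`; the way `q_lb` and `q_rt` mix coordinates follows the lines quoted in the 3D issue on this page. The helper `corner_neighbors` is hypothetical and not part of the repository.

```python
import math

def corner_neighbors(px, py):
    """Return the four integer grid points around the fractional point (px, py)."""
    x0, y0 = math.floor(px), math.floor(py)   # "lt": floor on both axes
    x1, y1 = x0 + 1, y0 + 1                   # "rb": floor + 1 on both axes
    return {
        "lt": (x0, y0),
        "rb": (x1, y1),
        "lb": (x0, y1),   # first coordinate from lt, second from rb
        "rt": (x1, y0),   # first coordinate from rb, second from lt
    }

corners = corner_neighbors(2.3, 5.8)
print(corners)
```

Bilinear interpolation then blends the feature values at these four points, weighted by how close each one is to the fractional position.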

Why does the GPU memory keep rising during training?

Eventually, the memory explodes.

How do I train the offset separately?

Thanks for your contribution. I want the feature and the offset to be passed in separately, like
forward(self, feature, offset): ...
How can I do this?

No batch Normalization in deformable conv?

Will you add a deform group parameter next?

The implementation that creates a new feature map and convolves it with stride=kernel_size is so cool.
Will you add a deform group parameter next?

ValueError: Modules that have backward hooks assigned can't be compiled: Conv2d(512, 18, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))

Bump. I have thought about this for a long time and cannot work out how to solve it. The full error is as follows:
Traceback (most recent call last):
  File "D:/Users/wwj/project/tph-yolov5-main/train.py", line 637, in <module>
    main(opt)
  File "D:/Users/wwj/project/tph-yolov5-main/train.py", line 534, in main
    train(opt.hyp, opt, device, callbacks)
  File "D:/Users/wwj/project/tph-yolov5-main/train.py", line 355, in train
    callbacks.run('on_train_batch_end', ni, model, imgs, targets, paths, plots, opt.sync_bn)
  File "D:\Users\wwj\project\tph-yolov5-main\utils\callbacks.py", line 76, in run
    logger['callback'](*args, **kwargs)
  File "D:\Users\wwj\project\tph-yolov5-main\utils\loggers\__init__.py", line 86, in on_train_batch_end
    self.tb.add_graph(torch.jit.trace(de_parallel(model), imgs[0:1], strict=False), [])
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 750, in trace
    _module_class,
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 942, in trace_module
    module = make_module(mod, _module_class, _compilation_unit)
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 568, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 1068, in __init__
    submodule, TracedModule, _compilation_unit=None
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 568, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 1068, in __init__
    submodule, TracedModule, _compilation_unit=None
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 568, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 1068, in __init__
    submodule, TracedModule, _compilation_unit=None
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 568, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 1061, in __init__
    + str(orig)

Some questions about the testing process for classification tasks

I have a question. We used Deformable Conv in classification tasks and set the training batch size equal to im2col_step. During testing, we put different numbers of samples in each test batch (e.g. feeding the test set one sample at a time, or ten samples at a time) and got different classification results. It seems that the number of samples fed to the network at once affects the final classification results. Why is this happening? Could you kindly give me some advice? What is the relationship between the test batch size and im2col_step? And between the training batch size and im2col_step?

Implemented as in the article?

First of all, thank you for implementing v2 of this paper and maintaining it.
Warning: I am mainly a Keras/TF user.
If I am reading this correctly, x_offset is the original latent space warped (non-rigidly) by the offsets found by p_conv, so x_offset has shape [batch_size x height x width x features]. The warp happens only in the height and width dimensions (naturally). You then apply a regular convolution on top of that.

From reading the paper, I think the authors intended the offsets to be unique for each filter pixel. That is, the procedure should be:

1. Find the offsets.
2. Fetch the feature space per filter pixel (this should be [batch_size x height x width x features x filter size]).
3. Multiply each feature by the relevant weight.

This way, the sampling windows of two nearby pixels in the latent space can overlap if needed.

Am I wrong? It seems that all of the implementations online do something similar to what you did, so I assume I am wrong.
Thanks,
Dan

No module named torch.meshgrid

I have tried torch versions 0.4.0, 0.4.1 and 1.0.0, but I always get the error 'no module named torch.meshgrid'. Could you please help me solve this problem?

Some questions about the network structure

Why is there only one downsampling step (MaxPool2d after the first conv2d layer) in your network structure? Can you tell me the reason for designing the network like this? Why not add MaxPool2d after each conv2d layer to reduce the feature map size?
If the original images are large, the RAM may not be enough for training. What can I do if the memory footprint is too large?

Is it useful to use DCN on a 1x1 conv?

Hello, is it useful to use DCN on a 1x1 conv?

What's modulation for?

Actually, I do not understand what modulation does. Could you explain it? Thanks so much.

No such file or directory: 'input/scaled_mnist_train.npz'

About the learning rate setting of p_conv and m_conv

You set the gradient of p_conv and m_conv to 0.1 times that of the other layers, but I find that the gradient is unchanged after backward.
I used the following code to test.

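    import torch
    import torch.nn as nn
    from deform_conv_v2 import DeformConv2d  # module file from this repository

    def _set_lr(module, grad_input, grad_output):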
        print('grad input:', grad_input)
        print('grad output:', grad_output)
        grad_input = (grad_input[i] * 0.1 for i in range(len(grad_input)))
        grad_output = (grad_output[i] * 0.1 for i in range(len(grad_output)))

    x = torch.randn(4, 3, 5, 5)
    y_ = torch.randn(4, 1, 5, 5)
    loss = nn.L1Loss()

    d_conv = DeformConv2d(inc=3, outc=1, modulation=True)

    y = d_conv.forward(x)
    l = loss(y, y_)
    l.backward()

    print('p conv grad:')
    print(d_conv.p_conv.weight.grad)
    print('m conv grad:')
    print(d_conv.m_conv.weight.grad)
    print('conv grad:')
    print(d_conv.conv.weight.grad)

The gradient of p_conv is the same as grad_input, but I expected the gradient of p_conv to be 0.1 times grad_input. Am I wrong?
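One possible explanation, offered as a sketch and not as a statement about the repository's intent: the hook in the snippet only rebinds the local names `grad_input` and `grad_output` and returns nothing, so PyTorch keeps the original gradients. In addition, as documented for `register_full_backward_hook`, a module backward hook receives gradients with respect to the module's inputs and outputs, not its weights. If the goal is a 0.1x gradient on one layer's parameters, a tensor hook registered on each parameter is one way to get it. The layer below is a stand-in `nn.Conv2d`, not the repository's `p_conv`.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for an offset-predicting layer (hypothetical, not the repo's p_conv).
conv_plain = nn.Conv2d(3, 18, kernel_size=3, padding=1)
conv_scaled = copy.deepcopy(conv_plain)

# A tensor hook receives that tensor's gradient and may return a replacement.
conv_scaled.weight.register_hook(lambda grad: grad * 0.1)
conv_scaled.bias.register_hook(lambda grad: grad * 0.1)

x = torch.randn(4, 3, 5, 5)
conv_plain(x).abs().mean().backward()
conv_scaled(x).abs().mean().backward()

# Same input and same weights, so the only difference is the hook's scaling.
ratio_ok = torch.allclose(conv_scaled.weight.grad, 0.1 * conv_plain.weight.grad)
print("weight gradient scaled by 0.1:", ratio_ok)
```

An optimizer parameter group with a smaller `lr` for these parameters is another common way to get a similar effect.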

a problem of _get_p_0

Firstly, thanks for your work.
I noticed that p_0_x and p_0_y both start from 1, so the first coordinate is (1, 1). But if kernel_size is larger than 3, there will be negative coordinates. Though you use torch.clamp to avoid those negative values, I don't think this is the convolution operation described by the authors. I think the coordinates should begin from ((kernel_size-1)/2, (kernel_size-1)/2).
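To make the coordinate question concrete, here is a small sketch that mirrors the index arithmetic of the `_get_p_0` and `_get_p_n` code quoted further down this page, using plain Python ranges instead of tensors. It assumes a zero learned offset and looks at one axis only; `sampling_positions` is a hypothetical helper.

```python
def sampling_positions(h, kernel_size, stride=1):
    """1-D base sampling positions p_0 + p_n, assuming zero learned offset."""
    # Kernel taps around the centre, written as in the quoted _get_p_n.
    p_n = range(-(kernel_size - 1) // 2, (kernel_size - 1) // 2 + 1)
    # Window centres, written as in the quoted _get_p_0 (they start at 1).
    p_0 = range(1, h * stride + 1, stride)
    return [[centre + tap for tap in p_n] for centre in p_0]

k3 = sampling_positions(h=4, kernel_size=3)
k5 = sampling_positions(h=4, kernel_size=5)
print(k3[0])  # [0, 1, 2]
print(k5[0])  # [-1, 0, 1, 2, 3]
```

With kernel_size=3 the first window starts at index 0, while with kernel_size=5 it reaches index -1. Whether index 0 lands on real data or on padding depends on how the input is padded before sampling, which this sketch does not model.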

deform

Question about kernel_size

Can kernel_size only be 3? I set kernel_size=5, but the result of the conv is the same as with kernel_size=3. Why?

Confused with edge value.

pytorch-deform-conv-v2/deform_conv_v2.py

Line 64 in 918742a

floor_p = p - (p - torch.floor(p))

Why is the value at an edge position in the padding area not computed by bilinear interpolation, rather than by sampling the value at the floor position?

Maybe a little problem

I think there is a mistake here:

self.conv = nn.Conv2d(inc, outc, kernel_size=kernel_size, stride=kernel_size, bias=bias)

stride=kernel_size should be stride=stride. Is that right?
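A possible reading of that line, consistent with another issue on this page that praises "creating a new feature map and conv with stride=kernel_size": if the sampled values for each output location are laid out as a k x k tile in an enlarged map, a convolution whose stride equals k visits each tile exactly once. The sketch below illustrates this on a single channel with plain Python lists. The names are hypothetical, and the layer's own stride is assumed to be applied earlier, when the sampling positions are generated.

```python
def conv_stride_k(tiled, weight):
    """Single-channel valid convolution whose stride equals the kernel size."""
    k = len(weight)
    rows, cols = len(tiled) // k, len(tiled[0]) // k
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Each output value is the dot product of the weight with one tile.
            out[i][j] = sum(
                weight[a][b] * tiled[i * k + a][j * k + b]
                for a in range(k) for b in range(k)
            )
    return out

# Enlarged map for a 1x2 output with k=2: two 2x2 tiles side by side.
tiled = [
    [1.0, 2.0, 5.0, 6.0],
    [3.0, 4.0, 7.0, 8.0],
]
weight = [[1.0, 0.0], [0.0, 1.0]]
result = conv_stride_k(tiled, weight)
print(result)  # [[5.0, 13.0]]
```

Under this reading, using stride=stride on the final convolution would make neighbouring windows straddle two tiles.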

I think the weight should be initialized as 0 rather than 0.5

pytorch-deform-conv-v2/deform_conv_v2.py

Line 25 in 9ccc492

nn.init.constant_(self.m_conv.weight, 0.5)

Solve specific GPU problem.

I want the whole net to run on GPU 2 instead of GPU 0. The current code always places some tensors on GPU 0, which makes the sum p = p_0 + p_n + offset fail. That is not what I want.

Modified point:

   add a device parameter to the _get_p_n and _get_p_0 functions.

In file: https://github.com/4uiiurz1/pytorch-deform-conv-v2/blob/master/deform_conv_v2.py

Original code:

    def _get_p_n(self, N, dtype):
        p_n_x, p_n_y = torch.meshgrid(
            torch.arange(-(self.kernel_size-1)//2, (self.kernel_size-1)//2+1),
            torch.arange(-(self.kernel_size-1)//2, (self.kernel_size-1)//2+1))
        # (2N, 1)
        p_n = torch.cat([torch.flatten(p_n_x), torch.flatten(p_n_y)], 0)
        p_n = p_n.view(1, 2*N, 1, 1).type(dtype)

        return p_n

    def _get_p_0(self, h, w, N, dtype):
        p_0_x, p_0_y = torch.meshgrid(
            torch.arange(1, h*self.stride+1, self.stride),
            torch.arange(1, w*self.stride+1, self.stride))
        p_0_x = torch.flatten(p_0_x).view(1, 1, h, w).repeat(1, N, 1, 1)
        p_0_y = torch.flatten(p_0_y).view(1, 1, h, w).repeat(1, N, 1, 1)
        p_0 = torch.cat([p_0_x, p_0_y], 1).type(dtype)

        return p_0

    def _get_p(self, offset, dtype):
        N, h, w = offset.size(1)//2, offset.size(2), offset.size(3)

        # (1, 2N, 1, 1)
        p_n = self._get_p_n(N, dtype)
        # (1, 2N, h, w)
        p_0 = self._get_p_0(h, w, N, dtype)
        p = p_0 + p_n + offset
        return p

Suggested modified code:

    def _get_p_n(self, N, device, dataType):
        p_n_x, p_n_y = torch.meshgrid(
            torch.arange(-(self.kernel_size-1)//2, (self.kernel_size-1)//2+1),
            torch.arange(-(self.kernel_size-1)//2, (self.kernel_size-1)//2+1))
        # (2N, 1)
        p_n = torch.cat([torch.flatten(p_n_x), torch.flatten(p_n_y)], 0)
        p_n = p_n.view(1, 2*N, 1, 1).to(device, dtype=dataType)

        return p_n

    def _get_p_0(self, h, w, N, device, dataType):
        p_0_x, p_0_y = torch.meshgrid(
            torch.arange(1, h*self.stride+1, self.stride),
            torch.arange(1, w*self.stride+1, self.stride))
        p_0_x = torch.flatten(p_0_x).view(1, 1, h, w).repeat(1, N, 1, 1)
        p_0_y = torch.flatten(p_0_y).view(1, 1, h, w).repeat(1, N, 1, 1)
        p_0 = torch.cat([p_0_x, p_0_y], 1).to(device, dtype=dataType)

        return p_0

    def _get_p(self, offset, dtype):
        N, h, w = offset.size(1)//2, offset.size(2), offset.size(3)

        # (1, 2N, 1, 1)
        p_n = self._get_p_n(N, offset.device, offset.dtype)
        # (1, 2N, h, w)
        p_0 = self._get_p_0(h, w, N, offset.device, offset.dtype)
        p = p_0 + p_n + offset
        return p

Have you tested the speed of deform conv?

Is there a large speed gap between deformable conv and normal conv?

deform_conv_3d

Hi, I want to use 3D deform_conv in my study. I read your 'deform_conv_v2.py' and tried to extend the code to 3D deform_conv, but I have no idea how to modify the code below:
q_lt = torch.cat([torch.clamp(q_lt[..., :N], 0, x.size(2)-1), torch.clamp(q_lt[..., N:], 0, x.size(3)-1)], dim=-1).long()
q_rb = torch.cat([torch.clamp(q_rb[..., :N], 0, x.size(2)-1), torch.clamp(q_rb[..., N:], 0, x.size(3)-1)], dim=-1).long()
q_lb = torch.cat([q_lt[..., :N], q_rb[..., N:]], dim=-1)
q_rt = torch.cat([q_rb[..., :N], q_lt[..., N:]], dim=-1)

For 3D data we should use trilinear interpolation, which needs eight sample points, while for 2D data we only need four sample points to perform bilinear interpolation. I have no idea how to get the other six sample locations. Could you help me?

Best wishes,
Meixiang Huang
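One way to think about the 3D case, offered as a sketch and not as a drop-in extension of `deform_conv_v2.py`: each of the eight corners takes either the floor value or the floor + 1 value on each axis, so the corners can be enumerated as all 2^3 combinations, and each corner's weight is the product of per-axis linear weights. `trilinear_corners` is a hypothetical helper that works on a single point.

```python
import itertools
import math

def trilinear_corners(p):
    """Return (corner, weight) pairs for a fractional 3-D point p = (z, y, x)."""
    lo = [math.floor(c) for c in p]
    frac = [c - l for c, l in zip(p, lo)]
    pairs = []
    for bits in itertools.product((0, 1), repeat=3):
        corner = tuple(l + b for l, b in zip(lo, bits))
        # Per axis: weight (1 - frac) for the lower corner, frac for the upper one.
        weight = 1.0
        for b, f in zip(bits, frac):
            weight *= f if b else (1.0 - f)
        pairs.append((corner, weight))
    return pairs

pairs = trilinear_corners((1.25, 2.5, 3.75))
for corner, weight in pairs:
    print(corner, round(weight, 6))
```

In tensor form, the same pattern would mix coordinates from `q_lt` and `q_rb` along three axes, in the way `q_lb` and `q_rt` mix them along two.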

A large error when kernel_size > 3.

I compared dcnv2 against torchvision.ops.deform_conv2d and got the same result with kernel_size=3,
but a different result when kernel_size > 3.
My implementation of dcnv2 is below:

import torch
import torchvision

def torch_initialize_weights(modules):
    # weight initialization
    for m in modules():
        if isinstance(m, torch.nn.Conv2d):
            torch.nn.init.kaiming_normal_(m.weight, mode='fan_out')
            if m.bias is not None:
                torch.nn.init.zeros_(m.bias)
        elif isinstance(m, torch.nn.BatchNorm2d):
            torch.nn.init.ones_(m.weight)
            torch.nn.init.zeros_(m.bias)
        elif isinstance(m, torch.nn.Linear):
            torch.nn.init.normal_(m.weight, 0, 0.01)
            if m.bias is not None:
                torch.nn.init.zeros_(m.bias)
        elif isinstance(m, torch.nn.ConvTranspose2d):
            torch.nn.init.kaiming_normal_(m.weight, mode='fan_out')
            if m.bias is not None:
                torch.nn.init.zeros_(m.bias)

class TorchDeformableConvV2_split(torch.nn.Module):
    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,
                 stride=1,
                 padding=0,
                 dilation=1,
                 groups=1,
                 bias=False,
                 ):
        super().__init__()
        self.offset_channel = 2 * kernel_size**2
        self.mask_channel = kernel_size**2

        self.padding = padding
        self.dilation = dilation
        self.groups = groups
        self.stride = stride

        self.conv_offset = torch.nn.Conv2d(in_channels,
                                           2 * kernel_size * kernel_size,
                                           kernel_size=kernel_size,
                                           stride=stride,
                                           padding=self.padding,
                                           bias=True)

        self.conv_modulator = torch.nn.Conv2d(in_channels,
                                              1 * kernel_size * kernel_size,
                                              kernel_size=kernel_size,
                                              stride=stride,
                                              padding=self.padding,
                                              bias=True)

        self.conv_dcn = torchvision.ops.DeformConv2d(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=(kernel_size - 1) // 2 * dilation,
            dilation=dilation,
            groups=groups,
            bias=bias,
        )

        torch_initialize_weights(self.modules)

    def forward(self, x):
        offset = self.conv_offset(x)
        mask = torch.sigmoid(self.conv_modulator(x))
        y = self.conv_dcn(x, offset, mask=mask)
        return y

Is there something wrong with my code?

Is

    g_lt = (1 + (q_lt[..., :N].type_as(p) - p[..., :N])) * (1 + (q_lt[..., N:].type_as(p) - p[..., N:]))
    g_rb = (1 - (q_rb[..., :N].type_as(p) - p[..., :N])) * (1 - (q_rb[..., N:].type_as(p) - p[..., N:]))
    g_lb = (1 + (q_lb[..., :N].type_as(p) - p[..., :N])) * (1 - (q_lb[..., N:].type_as(p) - p[..., N:]))
    g_rt = (1 - (q_rt[..., :N].type_as(p) - p[..., :N])) * (1 + (q_rt[..., N:].type_as(p) - p[..., N:]))

bilinear kernel is wrong?

        g_lt = (1 + (q_lt[..., :N].type_as(p) - p[..., :N])) * (1 + (q_lt[..., N:].type_as(p) - p[..., N:]))
        g_rb = (1 - (q_rb[..., :N].type_as(p) - p[..., :N])) * (1 - (q_rb[..., N:].type_as(p) - p[..., N:]))
        g_lb = (1 + (q_lb[..., :N].type_as(p) - p[..., :N])) * (1 - (q_lb[..., N:].type_as(p) - p[..., N:]))
        g_rt = (1 - (q_rt[..., :N].type_as(p) - p[..., :N])) * (1 + (q_rt[..., N:].type_as(p) - p[..., N:]))

This computation differs from the standard bilinear algorithm. Is it wrong, or is it another algorithm?
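A quick numerical check can help here. Assuming `q_lt` is the floor of `p` and `q_rb` is `q_lt + 1` (an assumption, since those lines are not quoted in this issue), the expressions above reduce to the usual bilinear weights: `1 + (q_lt - p)` equals `1 - frac` and `1 - (q_rb - p)` equals `frac`. The scalar sketch below evaluates both forms for one point; the function names are hypothetical.

```python
import math

def issue_weights(px, py):
    """Weights written the same way as the g_* expressions, for scalars."""
    ltx, lty = math.floor(px), math.floor(py)
    rbx, rby = ltx + 1, lty + 1
    g_lt = (1 + (ltx - px)) * (1 + (lty - py))
    g_rb = (1 - (rbx - px)) * (1 - (rby - py))
    g_lb = (1 + (ltx - px)) * (1 - (rby - py))
    g_rt = (1 - (rbx - px)) * (1 + (lty - py))
    return g_lt, g_rb, g_lb, g_rt

def textbook_weights(px, py):
    """Standard bilinear weights from the fractional parts, in the same order."""
    fx, fy = px - math.floor(px), py - math.floor(py)
    return (1 - fx) * (1 - fy), fx * fy, (1 - fx) * fy, fx * (1 - fy)

a = issue_weights(2.25, 5.5)
b = textbook_weights(2.25, 5.5)
print(a)
print(b)
```

The check does not cover positions that were clamped at the border, where the two forms may differ.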

Confused about the size of x and offsets

Thanks for sharing your code. I'm confused about the sizes of x and the offsets.
If the shape of x is (b,c,h,w), kernel_size is 3 and padding is 1, should the offsets be (b,18,h,w), and is x_offset, of shape (b,c,3h,3w), the deformed form of the original input x? Finally, is the output still (b,c,h,w) after a convolution layer (kernel_size 3, no padding, stride 3)?
Please point out the mistake if my understanding is wrong.
Thank you again; I look forward to your reply.
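The shape bookkeeping described in this question can be written down as a small helper. This is a sketch under the assumptions stated in the question (stride 1, padding chosen so that the offset map keeps the input's height and width) plus one more: the final convolution maps `c` input channels to `outc` output channels, so the output has `outc` channels, which equals `c` only when the two are set equal. `deform_shapes` is a hypothetical name.

```python
def deform_shapes(b, c, h, w, outc, kernel_size=3):
    """Tensor shapes at each stage, assuming the offset map keeps (h, w)."""
    n = kernel_size * kernel_size            # sampling points per location
    return {
        "x": (b, c, h, w),
        "offset": (b, 2 * n, h, w),          # two coordinates per sampling point
        "x_offset": (b, c, h * kernel_size, w * kernel_size),
        "out": (b, outc, h, w),              # conv with kernel = stride = kernel_size
    }

shapes = deform_shapes(b=4, c=3, h=5, w=5, outc=1)
for name, shape in shapes.items():
    print(name, shape)
```

For kernel_size=3 the offset map has 18 channels, which matches the `Conv2d(512, 18, ...)` layer named in the traceback issue on this page.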

Is the memory usage not friendly?

Firstly, thank you for your great work. I have a deeper understanding of deformable convolution after reading this script. But I find that this version uses a huge amount of memory on the GPU compared with a CUDA version (https://github.com/CharlesShang/DCNv2). Could you tell me why?

Thank you.