4uiiurz1 / pytorch-deform-conv-v2
PyTorch implementation of Deformable ConvNets v2 (Modulated Deformable Convolution)
License: MIT License
I think PyTorch's grid_sample function can fetch the value at a given position, which may be more convenient.
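A minimal sketch of that idea, assuming pixel coordinates in the input's own frame. torch.nn.functional.grid_sample takes (x, y) = (column, row) locations normalized to [-1, 1]; with align_corners=True, -1 and +1 are the centres of the first and last pixels. The helper name sample_bilinear is hypothetical, and padding_mode="zeros" treats out-of-range samples as zero, which is not the same as the clamping this repo uses.

```python
import torch
import torch.nn.functional as F


def sample_bilinear(feat, rows, cols):
    """Hypothetical helper: bilinearly sample feat at fractional pixel coordinates.

    feat: (B, C, H, W); rows, cols: (B, Ho, Wo). Returns (B, C, Ho, Wo).
    """
    _, _, h, w = feat.shape
    # align_corners=True maps -1 / +1 to the centres of the first / last pixels
    gx = 2.0 * cols / (w - 1) - 1.0
    gy = 2.0 * rows / (h - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1)  # grid_sample expects (x, y) = (col, row)
    return F.grid_sample(feat, grid, mode="bilinear", padding_mode="zeros", align_corners=True)


# value at (r, c) is 4*r + c, so bilinear sampling reproduces it exactly
feat = torch.arange(12.0).view(1, 1, 3, 4)
out = sample_bilinear(feat, torch.tensor([[[0.5]]]), torch.tensor([[[1.25]]]))
```

Because the test map is linear in (row, col), the sampled value at row 0.5, column 1.25 should be 4 * 0.5 + 1.25 = 3.25.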
I know these variables roughly represent some kind of boundary coordinates, but what exactly do they represent? What do "lt" and "rb" mean?
Finally, the memory explodes.
Thanks for your contribution. I want the feature and the offset to be passed separately, like
forward(self, feature, offset): ...
How can I do this?
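A hypothetical sketch of such an interface (the class name DeformConvExternalOffset and the offset layout are assumptions, not this repo's API): forward receives the offsets, samples the input with grid_sample at p = p_0 + p_n + offset, tiles the k x k samples of each location into an (H*k, W*k) map, and applies a k x k conv with stride k. It assumes stride 1, "same" padding, odd kernel_size, H and W greater than 1, and an offset tensor of shape (B, 2*k*k, H, W) with all row offsets first. For comparison, torchvision.ops.DeformConv2d also takes the offset as a forward argument, forward(input, offset, mask=None).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeformConvExternalOffset(nn.Module):
    """Hypothetical deformable conv whose offsets are passed to forward().

    Assumes stride 1, 'same' padding, odd kernel_size, H and W > 1, and offset of
    shape (B, 2*k*k, H, W): all row offsets first, then all column offsets.
    """

    def __init__(self, inc, outc, kernel_size=3):
        super().__init__()
        self.k = kernel_size
        # k x k conv with stride k: applied exactly once per k x k tile of samples
        self.conv = nn.Conv2d(inc, outc, kernel_size, stride=kernel_size, bias=False)

    def forward(self, feature, offset):
        b, c, h, w = feature.shape
        k, n = self.k, self.k * self.k
        opts = dict(device=feature.device, dtype=feature.dtype)
        r = torch.arange(k, **opts) - (k - 1) // 2
        tap_r, tap_c = torch.meshgrid(r, r, indexing="ij")  # p_n, each (k, k)
        # p = p_0 + p_n + offset, row and column parts each (B, N, H, W)
        rows = torch.arange(h, **opts).view(1, 1, h, 1) + tap_r.reshape(1, n, 1, 1) + offset[:, :n]
        cols = torch.arange(w, **opts).view(1, 1, 1, w) + tap_c.reshape(1, n, 1, 1) + offset[:, n:]
        # grid_sample wants (x, y) = (col, row) normalised to [-1, 1]
        grid = torch.stack([2 * cols / (w - 1) - 1, 2 * rows / (h - 1) - 1], dim=-1)
        sampled = F.grid_sample(feature, grid.reshape(b, n * h, w, 2), align_corners=True)
        # (B, C, N*H, W) -> (B, C, k, k, H, W) -> tiles of shape (H*k, W*k)
        sampled = sampled.reshape(b, c, k, k, h, w)
        x_offset = sampled.permute(0, 1, 4, 2, 5, 3).reshape(b, c, h * k, w * k)
        return self.conv(x_offset)


torch.manual_seed(0)
m = DeformConvExternalOffset(3, 4, kernel_size=3)
x = torch.randn(2, 3, 6, 7)
y = m(x, torch.zeros(2, 18, 6, 7))  # zero offsets: should match a plain 3x3 conv
```

With all offsets at zero, the output should agree (up to float rounding) with F.conv2d(x, m.conv.weight, padding=1).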
No batch normalization in the deformable conv?
The implementation of creating a new feature map and convolving it with stride=kernel_size is so cool.
Will you add a deformable group parameter next?
Bump. I have been thinking about this for a long time and still don't know how to solve it. The full error is as follows:
Traceback (most recent call last):
  File "D:/Users/wwj/project/tph-yolov5-main/train.py", line 637, in <module>
    main(opt)
  File "D:/Users/wwj/project/tph-yolov5-main/train.py", line 534, in main
    train(opt.hyp, opt, device, callbacks)
  File "D:/Users/wwj/project/tph-yolov5-main/train.py", line 355, in train
    callbacks.run('on_train_batch_end', ni, model, imgs, targets, paths, plots, opt.sync_bn)
  File "D:\Users\wwj\project\tph-yolov5-main\utils\callbacks.py", line 76, in run
    logger['callback'](*args, **kwargs)
  File "D:\Users\wwj\project\tph-yolov5-main\utils\loggers\__init__.py", line 86, in on_train_batch_end
    self.tb.add_graph(torch.jit.trace(de_parallel(model), imgs[0:1], strict=False), [])
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 750, in trace
    _module_class,
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 942, in trace_module
    module = make_module(mod, _module_class, _compilation_unit)
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 568, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 1068, in __init__
    submodule, TracedModule, _compilation_unit=None
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 568, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 1068, in __init__
    submodule, TracedModule, _compilation_unit=None
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 568, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 1068, in __init__
    submodule, TracedModule, _compilation_unit=None
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 568, in make_module
    return _module_class(mod, _compilation_unit=_compilation_unit)
  File "D:\Users\wwj\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 1061, in __init__
    + str(orig)
I have a question. We used Deformable Conv in classification tasks and set the training batch size equal to im2col_step. During testing, we fed different numbers of samples per test batch (e.g. one sample at a time, or ten samples at a time) and got different classification results. It seems that the number of samples we feed to the network each time affects the final classification results. Why is this happening? Could you kindly give me some advice? What is the relationship between the testing batch size and im2col_step? And between the training batch size and im2col_step?
First of all, thank you for implementing v2 of this paper and for maintaining it.
(Warning: I am mainly a Keras/TF user.)
If I am reading this correctly, x_offset is the original latent space warped (non-rigidly) by the offsets predicted by p_conv. So x_offset has shape [batch_size x height x width x features]. The warp happens only in the height and width dimensions (naturally). You then apply a regular convolution on top of that.
From reading the paper, I think the authors intended the offsets to be unique for each filter pixel. That is, the procedure should be:
Am I wrong? It seems like all of the implementations online do something similar to what you did, so I assume I am wrong.
Thanks,
Dan
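A small pure-Python illustration of how per-tap offsets enter the sampling, based on p = p_0 + p_n + offset and on the offset tensor having 2N channels (N = kernel_size squared): each output location gets its own (d_row, d_col) pair for every kernel tap, so the k x k samples are not one rigid warp of the whole window. The function name sampling_positions and the list-of-pairs offset layout are hypothetical; odd kernel_size is assumed.

```python
def sampling_positions(p0, kernel_size, offsets):
    """Hypothetical helper: positions read by one output location.

    p0: (row, col) of the window centre; offsets: list of k*k (d_row, d_col)
    pairs, one per kernel tap in row-major order. Assumes odd kernel_size.
    """
    half = (kernel_size - 1) // 2
    taps = [(i - half, j - half) for i in range(kernel_size) for j in range(kernel_size)]
    assert len(offsets) == len(taps), "one (d_row, d_col) pair per tap"
    # p = p_0 + p_n + offset_n, evaluated independently for every tap n
    return [(p0[0] + ti + oi, p0[1] + tj + oj) for (ti, tj), (oi, oj) in zip(taps, offsets)]


regular = sampling_positions((5, 5), 3, [(0, 0)] * 9)
offsets = [(0, 0)] * 9
offsets[4] = (0.5, -1.5)  # move only the centre tap
deformed = sampling_positions((5, 5), 3, offsets)
```

Only the centre tap moves in the second call (to (5.5, 3.5)); the other eight taps keep their regular grid positions.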
I have tried torch versions 0.4.0, 0.4.1 and 1.0.0, but I always get the error 'no module named torch.meshgrid'. Could you please help me solve the problem?
Why is there only one downsampling step (MaxPool2d after the first conv2d layer) in your network? Can you tell me the reason for this design? Why not add MaxPool2d after each conv2d layer to reduce parameters?
If the original images are large, there may not be enough RAM for training. What can I do if there are too many parameters?
Hello, is it useful to use DCN on a 1x1 conv?
Actually, I do not understand what modulation does. Could you explain it? Thanks so much.
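In the DCNv2 formulation, each kernel tap k gets a modulation scalar m_k in (0, 1) alongside its offset, and the output at p is y(p) = sum_k w_k * x(p + p_k + dp_k) * m_k. The mask is typically a sigmoid over a conv's output, so a tap can be faded out (m_k near 0) as well as moved. A pure-Python sketch for a single output location; the function name and the assumption that sampled values are already interpolated are illustrative only.

```python
import math


def modulated_deform_sum(weights, sampled, mask_logits):
    """Hypothetical helper: y = sum_k w_k * x(p + p_k + dp_k) * m_k for one location.

    sampled: values already bilinearly sampled at the deformed positions.
    mask_logits: raw outputs of the mask branch; m_k = sigmoid(logit_k).
    """
    masks = [1.0 / (1.0 + math.exp(-z)) for z in mask_logits]
    return sum(w * v * m for w, v, m in zip(weights, sampled, masks))


half = modulated_deform_sum([1, 1, 1], [2, 4, 6], [0, 0, 0])        # every m_k = 0.5
muted = modulated_deform_sum([1, 1, 1], [2, 4, 6], [0, -100, 0])    # middle tap ~switched off
```

With zero logits every mask is 0.5, giving 6.0; a very negative logit suppresses the middle tap, giving about 4.0.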
You set the gradients of p_conv and m_conv to 0.1 times those of the other layers, but I find that the gradient does not change after backward.
I used the following code to test it:
import torch
import torch.nn as nn
from deform_conv_v2 import DeformConv2d  # this repo's module


def _set_lr(module, grad_input, grad_output):
    print('grad input:', grad_input)
    print('grad output:', grad_output)
    # These lines only rebind local names to generator objects;
    # the hook itself returns None.
    grad_input = (grad_input[i] * 0.1 for i in range(len(grad_input)))
    grad_output = (grad_output[i] * 0.1 for i in range(len(grad_output)))


x = torch.randn(4, 3, 5, 5)
y_ = torch.randn(4, 1, 5, 5)
loss = nn.L1Loss()
d_conv = DeformConv2d(inc=3, outc=1, modulation=True)
y = d_conv(x)
l = loss(y, y_)
l.backward()
print('p conv grad:')
print(d_conv.p_conv.weight.grad)
print('m conv grad:')
print(d_conv.m_conv.weight.grad)
print('conv grad:')
print(d_conv.conv.weight.grad)
The gradient of p_conv is the same as grad_input, but I think it should be 0.1 times grad_input. Am I wrong?
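One likely explanation: a backward hook that returns None leaves the gradients unchanged, and the hook above only rebinds local names. Below is a sketch of two alternatives, using a plain nn.Conv2d as a stand-in for p_conv (an assumption; any module behaves the same): a tensor hook on each parameter that returns the scaled gradient, or a separate optimizer parameter group with a 0.1x learning rate. The parameter-group route avoids hooks entirely. Note that with adaptive optimizers such as Adam, scaling the gradient by 0.1 barely changes the update, whereas a smaller learning rate does.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins: p_conv for the offset branch, main_conv for the rest of the network.
p_conv = nn.Conv2d(3, 18, kernel_size=3, padding=1)
main_conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
x = torch.randn(2, 3, 5, 5)


def weight_grad(scale_hook: bool) -> torch.Tensor:
    p_conv.zero_grad()
    handles = []
    if scale_hook:
        # A tensor hook may return a replacement gradient; here it is scaled by 0.1.
        handles = [p.register_hook(lambda g: g * 0.1) for p in p_conv.parameters()]
    p_conv(x).sum().backward()
    for h in handles:
        h.remove()
    return p_conv.weight.grad.clone()


g_plain = weight_grad(False)
g_scaled = weight_grad(True)
print(torch.allclose(g_scaled, 0.1 * g_plain))  # True

# Alternative: leave gradients alone and give the offset branch its own learning rate.
base_lr = 0.01
opt = torch.optim.SGD(
    [{"params": main_conv.parameters()},
     {"params": p_conv.parameters(), "lr": base_lr * 0.1}],
    lr=base_lr,
)
```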
Firstly, thanks for your work.
I noticed that p_0_x and p_0_y both start from 1, so the first coordinate is (1, 1). But if kernel_size isn't 1, there will be negative coordinates. Though you use torch.clamp to avoid those negative values, I don't think this is the convolution operation described by the authors. I think the coordinates should begin from ((kernel_size-1)/2, (kernel_size-1)/2).
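A pure-Python check of the point above, assuming the input is zero-padded by (kernel_size-1)//2 before sampling, so valid padded coordinates start at row 0. The tap range mirrors the torch.arange call in _get_p_n; the helper name first_window_rows is hypothetical.

```python
def first_window_rows(kernel_size, start):
    """Hypothetical helper: rows read by the first output location with zero offsets.

    start: p_0 of the first output (the repo's _get_p_0 starts at 1).
    """
    lo = -(kernel_size - 1) // 2        # same range as torch.arange in _get_p_n
    hi = (kernel_size - 1) // 2 + 1
    return [start + n for n in range(lo, hi)]


k3_repo = first_window_rows(3, 1)   # 3x3 window starting at p_0 = 1
k5_repo = first_window_rows(5, 1)   # 5x5 window starting at p_0 = 1
k5_centred = first_window_rows(5, (5 - 1) // 2)
```

With start=1, a 3x3 window covers rows 0..2, but a 5x5 window reaches row -1, which clamping to [0, size-1] folds back onto row 0. Starting at (kernel_size-1)//2 keeps the first window at rows 0..4.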
Does kernel_size only support 3? I set kernel_size=5, but the result of the conv is the same as with kernel_size=3. Why?
pytorch-deform-conv-v2/deform_conv_v2.py, line 64 in 918742a
Why is the value at edge positions in the padding area not computed by bilinear interpolation, instead of just taking the value at the floor position?
I think there is a mistake here:
self.conv = nn.Conv2d(inc, outc, kernel_size=kernel_size, stride=kernel_size, bias=bias)
stride=kernel_size should be stride=stride. Is that right?
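stride=kernel_size appears to be intentional rather than a typo: x_offset stores the k x k samples of each output location as its own k x k tile, so a k x k conv with stride k applies the kernel exactly once per tile, while the module's stride is applied when building p_0 (torch.arange(1, h*self.stride+1, self.stride)). The sketch below uses zero offsets and gathers the windows with F.unfold instead of the repo's bilinear sampling (an assumption for illustration; tiled_patches is a hypothetical name), and checks that the tiled conv matches an ordinary padded conv.

```python
import torch
import torch.nn.functional as F


def tiled_patches(x, k):
    """Hypothetical helper: lay every k x k window of x (zero-padded by k//2) out as a
    k x k tile, giving a (B, C, H*k, W*k) map. Assumes odd k."""
    b, c, h, w = x.shape
    cols = F.unfold(x, kernel_size=k, padding=k // 2)      # (B, C*k*k, H*W)
    cols = cols.reshape(b, c, k, k, h, w)                   # (B, C, ki, kj, H, W)
    return cols.permute(0, 1, 4, 2, 5, 3).reshape(b, c, h * k, w * k)


torch.manual_seed(0)
x = torch.randn(2, 3, 6, 7)
weight = torch.randn(4, 3, 3, 3)
tiled = tiled_patches(x, 3)
out_tiled = F.conv2d(tiled, weight, stride=3)   # one kernel application per tile
out_plain = F.conv2d(x, weight, padding=1)      # ordinary 3x3 "same" convolution
print(torch.allclose(out_tiled, out_plain, atol=1e-4))  # True
```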
pytorch-deform-conv-v2/deform_conv_v2.py, line 25 in 9ccc492
I want the whole network to run on GPU2 instead of GPU0, but the current code always creates some tensors on GPU0, so the sum p = p_0 + p_n + offset fails.
Suggested fix: add a device parameter to the _get_p_n and _get_p_0 functions.
In file: https://github.com/4uiiurz1/pytorch-deform-conv-v2/blob/master/deform_conv_v2.py
Current code:
# Methods of DeformConv2d in deform_conv_v2.py
def _get_p_n(self, N, dtype):
    p_n_x, p_n_y = torch.meshgrid(
        torch.arange(-(self.kernel_size-1)//2, (self.kernel_size-1)//2+1),
        torch.arange(-(self.kernel_size-1)//2, (self.kernel_size-1)//2+1))
    # (2N, 1)
    p_n = torch.cat([torch.flatten(p_n_x), torch.flatten(p_n_y)], 0)
    # If dtype is a type string such as 'torch.cuda.FloatTensor', .type(dtype)
    # places the result on the current CUDA device, not on offset's device.
    p_n = p_n.view(1, 2*N, 1, 1).type(dtype)
    return p_n

def _get_p_0(self, h, w, N, dtype):
    p_0_x, p_0_y = torch.meshgrid(
        torch.arange(1, h*self.stride+1, self.stride),
        torch.arange(1, w*self.stride+1, self.stride))
    p_0_x = torch.flatten(p_0_x).view(1, 1, h, w).repeat(1, N, 1, 1)
    p_0_y = torch.flatten(p_0_y).view(1, 1, h, w).repeat(1, N, 1, 1)
    p_0 = torch.cat([p_0_x, p_0_y], 1).type(dtype)
    return p_0

def _get_p(self, offset, dtype):
    N, h, w = offset.size(1)//2, offset.size(2), offset.size(3)
    # (1, 2N, 1, 1)
    p_n = self._get_p_n(N, dtype)
    # (1, 2N, h, w)
    p_0 = self._get_p_0(h, w, N, dtype)
    p = p_0 + p_n + offset
    return p
Modified code:
def _get_p_n(self, N, device, dataType):
    p_n_x, p_n_y = torch.meshgrid(
        torch.arange(-(self.kernel_size-1)//2, (self.kernel_size-1)//2+1),
        torch.arange(-(self.kernel_size-1)//2, (self.kernel_size-1)//2+1))
    # (2N, 1)
    p_n = torch.cat([torch.flatten(p_n_x), torch.flatten(p_n_y)], 0)
    p_n = p_n.view(1, 2*N, 1, 1).to(device, dtype=dataType)
    return p_n

def _get_p_0(self, h, w, N, device, dataType):
    p_0_x, p_0_y = torch.meshgrid(
        torch.arange(1, h*self.stride+1, self.stride),
        torch.arange(1, w*self.stride+1, self.stride))
    p_0_x = torch.flatten(p_0_x).view(1, 1, h, w).repeat(1, N, 1, 1)
    p_0_y = torch.flatten(p_0_y).view(1, 1, h, w).repeat(1, N, 1, 1)
    p_0 = torch.cat([p_0_x, p_0_y], 1).to(device, dtype=dataType)
    return p_0

def _get_p(self, offset, dtype):
    N, h, w = offset.size(1)//2, offset.size(2), offset.size(3)
    # (1, 2N, 1, 1)
    p_n = self._get_p_n(N, offset.device, offset.dtype)
    # (1, 2N, h, w)
    p_0 = self._get_p_0(h, w, N, offset.device, offset.dtype)
    p = p_0 + p_n + offset
    return p
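A variant of the same fix, sketched as free functions with hypothetical names get_p_n / get_p_0: passing device= and dtype= to torch.arange builds the grids directly on the offset's device, so nothing is first created on the default device and then copied. The indexing="ij" argument of torch.meshgrid (PyTorch 1.10 and later) matches the older default behaviour.

```python
import torch


def get_p_n(kernel_size, device, dtype):
    """Kernel-tap offsets p_n, shape (1, 2N, 1, 1), created directly on `device`."""
    r = torch.arange(-(kernel_size - 1) // 2, (kernel_size - 1) // 2 + 1, device=device, dtype=dtype)
    p_n_x, p_n_y = torch.meshgrid(r, r, indexing="ij")
    n = kernel_size * kernel_size
    return torch.cat([p_n_x.flatten(), p_n_y.flatten()], 0).view(1, 2 * n, 1, 1)


def get_p_0(h, w, n, stride, device, dtype):
    """Window centres p_0, shape (1, 2N, h, w), starting at 1 as in the code above."""
    xs = torch.arange(1, h * stride + 1, stride, device=device, dtype=dtype)
    ys = torch.arange(1, w * stride + 1, stride, device=device, dtype=dtype)
    p_0_x, p_0_y = torch.meshgrid(xs, ys, indexing="ij")
    p_0_x = p_0_x.reshape(1, 1, h, w).repeat(1, n, 1, 1)
    p_0_y = p_0_y.reshape(1, 1, h, w).repeat(1, n, 1, 1)
    return torch.cat([p_0_x, p_0_y], 1)


# Usage: everything follows offset's device and dtype (e.g. cuda:2 in practice).
offset = torch.zeros(2, 18, 4, 5)
p = (get_p_0(4, 5, 9, 1, offset.device, offset.dtype)
     + get_p_n(3, offset.device, offset.dtype)
     + offset)
```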
Is there a large speed gap between deformable conv and normal conv?
Hi, I want to use 3D deformable convolution in my study. I read your 'deform_conv_v2.py' and tried to extend it to 3D, but I have no idea how to modify the code below:
q_lt = torch.cat([torch.clamp(q_lt[..., :N], 0, x.size(2)-1), torch.clamp(q_lt[..., N:], 0, x.size(3)-1)], dim=-1).long()
q_rb = torch.cat([torch.clamp(q_rb[..., :N], 0, x.size(2)-1), torch.clamp(q_rb[..., N:], 0, x.size(3)-1)], dim=-1).long()
q_lb = torch.cat([q_lt[..., :N], q_rb[..., N:]], dim=-1)
q_rt = torch.cat([q_rb[..., :N], q_lt[..., N:]], dim=-1)
For 3D data we should use trilinear interpolation, which needs eight sample points, while for 2D data only four sample points are needed for bilinear interpolation. I have no idea how to get the other six sample locations. Could you help me?
Best wishes,
Meixiang Huang
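One way to generalise the four corners: in 2D, lt/rb/lb/rt are the 2^2 ways of choosing floor or floor+1 for each coordinate; in 3D the eight corners are the 2^3 such choices, each weighted by the product of (1 - |q - p|) over the three axes. A pure-Python sketch for a single point; the function name trilinear_corners is hypothetical, and it omits the clamping to the volume bounds that the 2D code does with torch.clamp.

```python
import itertools
import math


def trilinear_corners(p):
    """Hypothetical helper: the 8 integer neighbours of p = (d, h, w) and their weights."""
    lo = [math.floor(c) for c in p]
    corners = []
    # each bit picks floor (0) or floor + 1 (1) along one axis
    for bits in itertools.product((0, 1), repeat=3):
        q = tuple(l + b for l, b in zip(lo, bits))
        weight = 1.0
        for qc, pc in zip(q, p):
            weight *= 1 - abs(qc - pc)
        corners.append((q, weight))
    return corners


corners = trilinear_corners((0.5, 1.25, 2.75))
# trilinear interpolation reproduces any linear function exactly
value = sum(wt * (d + 2 * h + 3 * w) for (d, h, w), wt in corners)
```

With repeat=2 the same loop yields the 2D corners: bits (0, 0) is lt, (1, 1) is rb, (0, 1) is lb and (1, 0) is rt.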
I ran dcnv2 with torchvision.ops.deform_conv2d and got the same result with kernel_size=3, but a different result when kernel_size>3.
My implementation of dcnv2 is below:
import torch
import torch.nn as nn
import torchvision


def torch_initialize_weights(modules):
    # weight initialization; `modules` is a callable such as `self.modules`
    for m in modules():
        if isinstance(m, torch.nn.Conv2d):
            torch.nn.init.kaiming_normal_(m.weight, mode='fan_out')
            if m.bias is not None:
                torch.nn.init.zeros_(m.bias)
        elif isinstance(m, torch.nn.BatchNorm2d):
            torch.nn.init.ones_(m.weight)
            torch.nn.init.zeros_(m.bias)
        elif isinstance(m, nn.Linear):
            torch.nn.init.normal_(m.weight, 0, 0.01)
            if m.bias is not None:
                torch.nn.init.zeros_(m.bias)
        elif isinstance(m, torch.nn.ConvTranspose2d):
            torch.nn.init.kaiming_normal_(m.weight, mode='fan_out')
            if m.bias is not None:
                torch.nn.init.zeros_(m.bias)


class TorchDeformableConvV2_split(torch.nn.Module):
    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,
                 stride=1,
                 padding=0,
                 dilation=1,
                 groups=1,
                 bias=False,
                 ):
        super().__init__()
        self.offset_channel = 2 * kernel_size**2
        self.mask_channel = kernel_size**2
        self.padding = padding
        self.dilation = dilation
        self.groups = groups
        self.stride = stride
        # The offset and mask branches use self.padding, while conv_dcn uses
        # (kernel_size - 1) // 2 * dilation; their spatial output sizes must match.
        self.conv_offset = torch.nn.Conv2d(in_channels,
                                           2 * kernel_size * kernel_size,
                                           kernel_size=kernel_size,
                                           stride=stride,
                                           padding=self.padding,
                                           bias=True)
        self.conv_modulator = torch.nn.Conv2d(in_channels,
                                              1 * kernel_size * kernel_size,
                                              kernel_size=kernel_size,
                                              stride=stride,
                                              padding=self.padding,
                                              bias=True)
        self.conv_dcn = torchvision.ops.DeformConv2d(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=(kernel_size - 1) // 2 * dilation,
            dilation=dilation,
            groups=groups,
            bias=bias,
        )
        torch_initialize_weights(self.modules)

    def forward(self, x):
        offset = self.conv_offset(x)
        mask = torch.sigmoid(self.conv_modulator(x))
        y = self.conv_dcn(x, offset, mask=mask)
        return y
Is there something wrong with my code?
g_lt = (1 + (q_lt[..., :N].type_as(p) - p[..., :N])) * (1 + (q_lt[..., N:].type_as(p) - p[..., N:]))
g_rb = (1 - (q_rb[..., :N].type_as(p) - p[..., :N])) * (1 - (q_rb[..., N:].type_as(p) - p[..., N:]))
g_lb = (1 + (q_lb[..., :N].type_as(p) - p[..., :N])) * (1 - (q_lb[..., N:].type_as(p) - p[..., N:]))
g_rt = (1 - (q_rt[..., :N].type_as(p) - p[..., :N])) * (1 + (q_rt[..., N:].type_as(p) - p[..., N:]))
This computation looks different from the standard bilinear interpolation formula. Is it wrong, or is it a different algorithm?
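The g_* expressions appear to be ordinary bilinear weights written in another form. Assuming, as in the repo, that q_lt = floor(p) and q_rb = q_lt + 1 (lb and rt mix the two), q_lt - p lies in (-1, 0], so 1 + (q_lt - p) equals 1 - |q_lt - p|; likewise 1 - (q_rb - p) equals 1 - |q_rb - p|. The pure-Python check below compares both forms for a few points, ignoring the clamping at the border; the function names are hypothetical.

```python
import math


def repo_style_weights(px, py):
    """g_lt, g_rb, g_lb, g_rt written as in the code above, for one point."""
    lt = (math.floor(px), math.floor(py))
    rb = (lt[0] + 1, lt[1] + 1)
    lb = (lt[0], rb[1])
    rt = (rb[0], lt[1])
    g_lt = (1 + (lt[0] - px)) * (1 + (lt[1] - py))
    g_rb = (1 - (rb[0] - px)) * (1 - (rb[1] - py))
    g_lb = (1 + (lb[0] - px)) * (1 - (lb[1] - py))
    g_rt = (1 - (rt[0] - px)) * (1 + (rt[1] - py))
    return g_lt, g_rb, g_lb, g_rt


def textbook_weights(px, py):
    """Standard bilinear weights from the fractional parts dx, dy (same corner order)."""
    dx, dy = px - math.floor(px), py - math.floor(py)
    return (1 - dx) * (1 - dy), dx * dy, (1 - dx) * dy, dx * (1 - dy)


points = [(0.3, 0.7), (2.0, 5.5), (-0.25, 1.9)]
pairs = [(repo_style_weights(px, py), textbook_weights(px, py)) for px, py in points]
```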
Thanks for sharing your code. I'm confused about the sizes of x and the offsets.
If the shape of x is (b, c, h, w), kernel_size is 3 and padding is 1, should the offsets be (b, 18, h, w), and is x_offset (b, c, 3h, 3w) the deformed version of the original input x? And is the final output still (b, c, h, w) after a convolution layer (kernel_size 3, no padding, stride 3)?
Please point out my mistake if my understanding is wrong.
Thank you again; I look forward to your reply.
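A pure-Python shape walk-through for the case in the question, assuming the offset and mask branches produce the same spatial size as the sampling grid; the helper name dcn_shapes is hypothetical. The final conv maps c input channels to outc output channels, so the output is (b, outc, h, w), which equals (b, c, h, w) only when outc == c.

```python
def dcn_shapes(b, c, h, w, outc, kernel_size=3, padding=1, stride=1):
    """Hypothetical helper: tensor shapes inside a modulated deformable conv."""
    k, n = kernel_size, kernel_size * kernel_size
    # output grid of the sampling step (same formula as an ordinary conv)
    ho = (h + 2 * padding - k) // stride + 1
    wo = (w + 2 * padding - k) // stride + 1
    return {
        "offset": (b, 2 * n, ho, wo),        # one (d_row, d_col) pair per tap
        "mask": (b, n, ho, wo),              # one modulation scalar per tap (v2)
        "x_offset": (b, c, ho * k, wo * k),  # k x k tile of samples per location
        "out": (b, outc, ho, wo),            # k x k conv, stride k, no padding
    }


s = dcn_shapes(2, 16, 8, 8, outc=32)
# the stride-k conv over x_offset brings each k x k tile back to one pixel
tile_conv_h = (s["x_offset"][2] - 3) // 3 + 1
```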
Firstly, thank you for your great work. I have a deeper understanding of deformable convolution after reading this script. But I find this version uses far more GPU memory than a CUDA version (https://github.com/CharlesShang/DCNv2). Could you tell me why?
Thank you.
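A rough back-of-the-envelope sketch, assuming stride 1 and float32: a pure-PyTorch implementation materializes x_offset of shape (b, c, h*k, w*k), which is k^2 times the input, and autograd keeps such intermediates (likely including the per-corner gathers, which are of similar size) for the backward pass. A fused CUDA kernel can compute the sampled values on the fly instead. The helper names are hypothetical, and this counts only one tensor, so real usage is higher.

```python
def tensor_mib(shape, bytes_per_elem=4):
    """Size in MiB of a dense tensor with the given shape (float32 by default)."""
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem / 2**20


def x_offset_mib(b, c, h, w, kernel_size, bytes_per_elem=4):
    """x_offset holds kernel_size**2 sampled values per input value: (b, c, h*k, w*k)."""
    k = kernel_size
    return tensor_mib((b, c, h * k, w * k), bytes_per_elem)


inp = tensor_mib((1, 64, 128, 128))      # input feature map
xo = x_offset_mib(1, 64, 128, 128, 3)    # x_offset for a 3x3 kernel
```

For a 1 x 64 x 128 x 128 input this gives 4 MiB for the input and 36 MiB for x_offset alone, a factor of 9 for a 3x3 kernel.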