
dynconv's People

Contributors

thomasverelst


dynconv's Issues

About multi-gpu training

Thanks for your awesome work! Is multi-GPU training supported, and if so, how? Training ResNet-101 on ImageNet with a single GPU is unacceptably slow.

Exception occurred:

File "/home/lym/Compare experiment new/classification/main_cifar.py", line 232, in <module>
    main()
File "/home/lym/Compare experiment new/classification/main_cifar.py", line 72, in main
    model = net_module(sparse=args.budget >= 0, pretrained=args.pretrained).to(device=device)
RuntimeError: CUDA error: the launch timed out and was terminated

The code appears to be buggy even though the environment configuration is fine; the server has an RTX 4090 with 24 GB of GPU memory. The failing line is:
model = net_module(sparse=args.budget >= 0, pretrained=args.pretrained).to(device=device)

question about the sparsity_target

Hello, this is brilliant work! I want to use the binary Gumbel-softmax in my own work, but I ran into a problem.
I applied the soft mask to the first layer only (i.e. I applied the generated mask to the features after the first layer) and observed a strange phenomenon: the Gumbel noise seemed to influence the training process too much. When I plotted the sparsity loss alone, I found I usually could not reach the sparsity target I had set. Is this behavior expected?
temp=5.0: (screenshot of the sparsity loss)
temp=1.0: (screenshot of the sparsity loss, later)
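To make question concrete: the sparsity objective I am computing is essentially a quadratic penalty on the deviation of the mean mask activation from the target. This is my own simplification with illustrative names, not necessarily the repo's exact loss:

```python
# Minimal sketch of a quadratic sparsity penalty (names and exact form
# are illustrative, not the repo's actual implementation).
def sparsity_loss(mask_fraction, target):
    # mask_fraction: mean of the soft mask over all spatial positions
    return (mask_fraction - target) ** 2

# Penalty grows as the executed fraction drifts from the budget.
loss = sparsity_loss(0.7, 0.4)
```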

Training on Google Colab

Hello,

Thank you for your effort in making your great work open source. I wanted to ask whether you have a version of the code compatible with Google Colab, so that it can be run without a local GPU?

A question about soft-mask calculation

Wonderful job! I studied your paper and code these days, and they were very enlightening to me.

I have a question about the code that calculates the soft mask via soft = self.maskconv(x). I'm not quite sure why this (conv + fc) network was chosen to calculate the soft mask. Thank you for your kind help.

/annot/valid.json is missing

Thanks for your inspiring work on dynamic convolution. I am interested in this exciting work and tried to test the efficiency of dynconv. I ran the test code for pose estimation and met an exception: No such file or directory: '$mpii_root/annot/valid.json'.
I wonder how I can get this file.

BTW, when I set stride = 2 for conv3x3_dw, it often crashed with 'THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/THC/THCCachingHostAllocator.cpp line=296 error=77 : an illegal memory access was encountered'. Could you help me debug these issues? Thanks in advance.

Pengyu Zhang.

Mask calculation

Insightful work!!!
While studying your paper, I ran into some questions (my English is not very good, and I don't mean to be critical, I'm just confused):

  1. The first problem concerns Figure 2. After the sigmoid, every value is >= 0, yet the figure still uses a threshold of 0 to make the decision. From the paper, I think the threshold should either be 0.5, or no sigmoid should be used.

  2. The second problem is about the code:
        if gumbel_noise:
            eps = self.eps
            U1, U2 = torch.rand_like(x), torch.rand_like(x)
            g1 = -torch.log(-torch.log(U1 + eps) + eps)
            g2 = -torch.log(-torch.log(U2 + eps) + eps)
            x = x + g1 - g2

        soft = torch.sigmoid(x / gumbel_temp)
        hard = ((soft >= 0.5).float() - soft).detach() + soft

However, the paper says: "Note that this formulation has no logarithms or exponentials in the forward pass, typically expensive computations on hardware platforms".
So, in the code, why not just threshold the logits with x >= 0 and skip the sigmoid operation?
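For context: without Gumbel noise, the two decision rules coincide, since the sigmoid is monotone and sigmoid(0) = 0.5 — which is presumably what the paper means by having no exponentials in the forward pass. A small illustrative sketch in plain Python:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

logits = [-1.3, -0.2, 0.0, 0.7, 2.1]
# Thresholding the sigmoid output at 0.5 ...
hard_via_sigmoid = [1 if sigmoid(v) >= 0.5 else 0 for v in logits]
# ... gives the same decisions as thresholding the raw logits at 0.
hard_via_logits = [1 if v >= 0 else 0 for v in logits]
```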


Thanks for your kind help!

Questions about mask generation

Hi @thomasverelst

Congrats, nice work! I have two questions out of curiosity:

  1. Forward pass: Why did you choose to sample from the Bernoulli distribution instead of the Gumbel-softmax? To my knowledge, sampling from the Bernoulli distribution introduces a bias in the gradient estimation which could make optimization trickier. I understand that you would not be able to use sparse convolutions in the training but I wonder if there is another reason.

  2. Have you tried annealing the temperature parameter to less than 1?
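To clarify question 2, the kind of schedule I have in mind is an exponential decay of the Gumbel temperature toward a floor. This is purely hypothetical; the function name and all constants are illustrative, not from the repo:

```python
import math

# Hypothetical Gumbel-temperature schedule: exponential decay from
# temp_0 toward a floor temp_min (all constants illustrative).
def gumbel_temperature(epoch, temp_0=5.0, temp_min=0.5, decay=0.05):
    return max(temp_min, temp_0 * math.exp(-decay * epoch))

temps = [round(gumbel_temperature(e), 3) for e in (0, 20, 100)]
```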

license

Thanks for your work.
What type of license does this project have? MIT, GPL, or something else?

Questions about mask usage in convolution

Hi,

Thanks for your great work!

I have some questions about how the mask is used in your convolution operation. I'm wondering what the purpose of assigning mask to conv_module.__mask__ is: I checked that conv_module(x) does not consult the conv_module.__mask__ attribute when it runs.

def conv1x1(conv_module, x, mask, fast=False):
    w = conv_module.weight.data
    mask.flops_per_position += w.shape[0] * w.shape[1]
    conv_module.__mask__ = mask
    return conv_module(x)

Therefore, I can't see how the masks are applied during the network's forward propagation, for example in the BasicBlock:

x = dynconv.conv3x3(self.conv1, x, None, mask_dilate)
x = dynconv.bn_relu(self.bn1, self.relu, x, mask_dilate)
x = dynconv.conv3x3(self.conv2, x, mask_dilate, mask)
x = dynconv.bn_relu(self.bn2, None, x, mask)
out = identity + dynconv.apply_mask(x, mask)

It seems that only the mask in dynconv.apply_mask(x, mask) has any effect.
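My current reading, sketched below as a simplified, hypothetical reconstruction (not the repo's actual code): in the non-accelerated path the convolution runs densely, apply_mask zeroes the masked-out positions afterwards, and __mask__ serves only bookkeeping such as the FLOP counting visible in conv1x1 above.

```python
# Simplified, hypothetical view of the non-accelerated path: the conv
# output is computed densely, then apply_mask zeroes masked positions.
def apply_mask(x, hard_mask):
    # x, hard_mask: H x W nested lists; hard_mask entries are 0.0 or 1.0
    return [[v * m for v, m in zip(xr, mr)] for xr, mr in zip(x, hard_mask)]

x = [[1.0, 2.0], [3.0, 4.0]]
mask = [[1.0, 0.0], [0.0, 1.0]]
out = apply_mask(x, mask)  # masked-out positions become 0.0
```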

About the "Classification with efficient sparse MobileNetV2"

Excellent work !!!

Recently I have studied your paper and code, which were very enlightening to me. I sincerely think that your work is of great significance for the study of dynamic convolutions. Thank you very much for your excellent work!

By the way, could you please tell me when the code for "Classification with efficient sparse MobileNetV2" will be published? I expect it would be wonderful as well!

Thank you very much!

About pose environment

Hi. Thanks for your work. I am currently running the pose demo, but building the lib folder fails with
"make: *** No targets specified and no makefile found. Stop".
What should I do about it?

Ponder_Cost_Plotting

File "main_cifar.py", line 224, in validate
    viz.plot_ponder_cost(meta['masks'])
File "/Data2/xyz/dynconv-master/classification/utils/viz.py", line 26, in plot_ponder_cost
    ponder_cost = ponder_cost_map(masks)
File "/Data2/xyz/dynconv-master/classification/dynconv/utils.py", line 23, in ponder_cost_map
    return out.squeeze(0).cpu().numpy()
AttributeError: 'NoneType' object has no attribute 'squeeze'

Why does this error appear while plotting the ponder cost map?
