deltacnn's Introduction

DeltaCNN

DeltaCNN caches intermediate feature maps from previous frames to accelerate inference of new frames by only processing updated pixels. DeltaCNN can be used as a drop-in replacement for most layers of a CNN by simply replacing the PyTorch layers with the DeltaCNN equivalent. Model weights and inference logic can be reused without the need for retraining. All layers are implemented in CUDA; other devices are currently not supported.
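
The caching idea rests on the linearity of convolution: the output for a new frame equals the cached output of the previous frame plus the convolution of the frame difference. The following pure-Python sketch shows this in 1D. It is only an illustration of the principle, not DeltaCNN's CUDA implementation; the helper conv1d and the sample values are assumptions made for this example.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation on plain Python lists."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

kernel = [1, 2, 1]
prev_frame = [4, 4, 5, 9, 9, 3]
next_frame = [4, 4, 5, 7, 9, 3]  # only one "pixel" changed

# Delta between frames: zero wherever nothing changed.
delta = [n - p for n, p in zip(next_frame, prev_frame)]

cached_out = conv1d(prev_frame, kernel)  # computed for the previous frame
delta_out = conv1d(delta, kernel)        # nonzero only near the changed pixel
updated_out = [c + d for c, d in zip(cached_out, delta_out)]

full_out = conv1d(next_frame, kernel)    # reference: recompute everything
assert updated_out == full_out
```

Since the delta is zero for unchanged pixels, a sparse implementation can skip them entirely, which is where the speedup comes from.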

A preprint of the paper is available on arXiv.

Find more information about the project on the project website.

1 Setup

Prerequisites

DeltaCNN depends on:

Please install these packages before installing DeltaCNN.

Install DeltaCNN Framework

  • Navigate to DeltaCNN root directory
  • Run python setup.py install --user (This can take a few minutes)

2 Example project

example/mobilenetv2_webcam_example.py contains a simple example that showcases all steps needed to replace PyTorch's CNN layers with their DeltaCNN equivalents. In this example, all steps required to port a network are marked with the comments # added and # replaced by. In the main file, we load the original CNN and the DeltaCNN variant, and run both on webcam video input. Play around with DCThreshold.t_default to see how performance and accuracy change with different values. For the sake of simplicity, we avoided steps like fusing batch normalization layers with convolutional layers or tuning thresholds for each layer individually.

3 Using DeltaCNN in your project

Using DeltaCNN in your CNN project should in most cases be as easy as replacing all layers in the CNN with the DeltaCNN equivalent and adding a dense-to-sparse (DCSparsify()) layer at the beginning and a sparse-to-dense (DCDensify()) layer at the end. However, some things need to be considered when replacing the layers.

3.1 Replacing Layers

Nonlinear layers need unique instances for every location they are used in the model as they cache input/output feature maps at the current stage. To be safe, create a unique instance for every use of a layer in the model. For example, this toy model can be converted as follows.

####### PyTorch
from torch import nn
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(...)
        self.conv2 = nn.Conv2d(...)
        self.conv3 = nn.Conv2d(...)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        return self.relu(self.conv3(x))
####### DeltaCNN
import deltacnn
class CNN(deltacnn.DCModule):
    def __init__(self):
        super(CNN, self).__init__()
        self.sparsify = deltacnn.DCSparsify()
        self.conv1 = deltacnn.DCConv2d(...)
        self.conv2 = deltacnn.DCConv2d(...)
        self.conv3 = deltacnn.DCConv2d(...)
        self.relu1 = deltacnn.DCActivation(activation="relu")
        self.relu2 = deltacnn.DCActivation(activation="relu")
        self.relu3 = deltacnn.DCActivation(activation="relu")
        self.densify = deltacnn.DCDensify()

    def forward(self, x):
        x = self.sparsify(x)
        x = self.relu1(self.conv1(x))
        x = self.relu2(self.conv2(x))
        return self.densify(self.relu3(self.conv3(x)))

or simply:

####### DeltaCNN simplified
import deltacnn
class CNN(deltacnn.DCModule):
    def __init__(self):
        super(CNN, self).__init__()
        self.sparsify = deltacnn.DCSparsify()
        self.conv1 = deltacnn.DCConv2d(..., activation="relu")
        self.conv2 = deltacnn.DCConv2d(..., activation="relu")
        self.conv3 = deltacnn.DCConv2d(..., activation="relu", dense_out=True)

    def forward(self, x):
        x = self.sparsify(x)
        x = self.conv1(x)
        x = self.conv2(x)
        return self.conv3(x)
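
Why nonlinear layers need their own state can be seen in a small pure-Python sketch. ReLU is not linear, so applying it to a delta does not yield the output delta; the layer has to remember the accumulated input from earlier frames. The class DeltaReLU below is a hypothetical illustration written for this README, not DeltaCNN's DCActivation.

```python
def relu(v):
    return max(v, 0.0)

class DeltaReLU:
    """Toy stateful activation: takes input deltas, returns output deltas."""

    def __init__(self):
        self.accumulated_in = None  # cache of the summed inputs seen so far

    def __call__(self, delta_in):
        if self.accumulated_in is None:
            self.accumulated_in = [0.0] * len(delta_in)
        delta_out = []
        for i, d in enumerate(delta_in):
            before = relu(self.accumulated_in[i])
            self.accumulated_in[i] += d
            delta_out.append(relu(self.accumulated_in[i]) - before)
        return delta_out

layer = DeltaReLU()
first = layer([-2.0, 1.0, 3.0])   # first frame: delta equals the full input
second = layer([3.0, -2.0, 0.0])  # second frame: change relative to frame one

# Dense reference for frame two: ReLU of the summed inputs [1.0, -1.0, 3.0].
dense_second = [relu(v) for v in [1.0, -1.0, 3.0]]
reconstructed = [a + b for a, b in zip(first, second)]
assert reconstructed == dense_second
```

If one such instance were shared between two positions in the network, both positions would write to the same cache, which is why every use gets its own instance.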

3.2 Perform custom operations not supported by DeltaCNN

If you want to add layers not included in DeltaCNN or apply operations on the feature maps directly, be aware of the feature maps used in DeltaCNN. DeltaCNN propagates only Delta updates between the layers. The output of a DeltaCNN layer consists of a Delta tensor and an update mask. Be careful when directly accessing these values, as skipped pixels are not initialized and contain random values.
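
The relation between the Delta tensor and the update mask can be sketched in pure Python as follows. This is a simplified model under stated assumptions: the threshold is applied per element (the real kernels work per pixel), the cache starts at zero, and None stands in for uninitialized memory. The functions sparsify and densify are illustrative names, not the library's layers.

```python
def sparsify(frame, prev, threshold):
    """Return (delta, mask); delta[i] is None where the pixel was skipped."""
    delta, mask = [], []
    for i, value in enumerate(frame):
        diff = value - prev[i]
        if abs(diff) > threshold:
            delta.append(diff)
            mask.append(True)
            prev[i] = value     # cache only what was actually propagated
        else:
            delta.append(None)  # stands in for uninitialized memory
            mask.append(False)
    return delta, mask

def densify(delta, mask, accumulated):
    """Add masked deltas onto the accumulated dense output in place."""
    for i, updated in enumerate(mask):
        if updated:             # never read delta[i] when the mask is False
            accumulated[i] += delta[i]
    return accumulated

prev_input = [0.0, 0.0, 0.0, 0.0]
dense_out = [0.0, 0.0, 0.0, 0.0]

d1, m1 = sparsify([1.0, 2.0, 3.0, 4.0], prev_input, threshold=0.5)
densify(d1, m1, dense_out)

d2, m2 = sparsify([1.25, 2.0, 5.0, 4.0], prev_input, threshold=0.5)
densify(d2, m2, dense_out)
```

In the second frame only the third value passes the threshold, so only that entry of the delta is valid. Any custom operation that reads the delta without consulting the mask would pick up the placeholder values.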

If you apply custom operations onto the feature maps, the safest way is to add a DCDensify() layer, apply your operation and then convert the features back to Delta features using DCSparsify(). For example:

####### PyTorch
from torch import nn
class Normalize(nn.Module):
    def forward(self, x):
        return x / x.max()
####### DeltaCNN
from deltacnn import DCDensify, DCSparsify, DCModule
class Normalize(DCModule):
    def __init__(self):
        super(Normalize, self).__init__()
        self.densify = DCDensify()
        self.sparsify = DCSparsify()

    def forward(self, x):
        x = self.densify(x)
        x = x / x.max()
        return self.sparsify(x)

3.3 Weights and features memory layout

DeltaCNN kernels only support the torch.channels_last memory format. Furthermore, DeltaCNN expects a specific memory layout for the weights used in convolutional layers. Thus, after loading the weights from disk, process the filters before the first call, and be sure to convert the network input to the channels-last memory format.

import torch
from deltacnn import DCModule

class MyDCModel(DCModule):
   ...

device = "cuda:0"
model = MyDCModel(...)
load_weights(model, weights_path) # weights are stored in PyTorch standard format
model.to(device, memory_format=torch.channels_last) # set the network in channels last mode
model.process_filters() # convert filters into DeltaCNN format

for frame in video:
   frame = frame.to(device).contiguous(memory_format=torch.channels_last)
   out = model(frame)
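
What the channels-last format means for memory addressing can be sketched without PyTorch. The helper functions below are hypothetical and only reproduce the stride arithmetic of the two layouts for a tensor of logical shape (N, C, H, W); they do not call into DeltaCNN.

```python
def contiguous_strides(n, c, h, w):
    """Element strides (N, C, H, W) for the default NCHW layout."""
    return (c * h * w, h * w, w, 1)

def channels_last_strides(n, c, h, w):
    """Element strides (N, C, H, W) for torch.channels_last (NHWC in memory)."""
    return (h * w * c, 1, w * c, c)

def offset(index, strides):
    """Linear memory offset of a 4D index under the given strides."""
    return sum(i * s for i, s in zip(index, strides))

shape = (1, 3, 2, 2)
nchw = contiguous_strides(*shape)
nhwc = channels_last_strides(*shape)

# Offsets of the three channels of the pixel at y=0, x=1:
nchw_offsets = [offset((0, ch, 0, 1), nchw) for ch in range(3)]
nhwc_offsets = [offset((0, ch, 0, 1), nhwc) for ch in range(3)]
```

In channels-last layout, all channels of one pixel sit next to each other in memory (offsets 3, 4, 5 here, versus 1, 5, 9 in NCHW), which fits a scheme that processes or skips whole pixels.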

3.4 Custom thresholds

The easiest way to try DeltaCNN is to set a global threshold via the DCThreshold.t_default variable before instantiating the model. Good starting points are thresholds in the range of 0.05 to 0.3, but this can vary strongly depending on the network and the noise of the video. If video noise is an issue, specify a larger threshold for the DCSparsify layer using the delta_threshold parameter and compensate for it using update mask dilation. For example: DCSparsify(delta_threshold=0.3, dilation=15). Thresholds can also be loaded from JSON files containing the threshold index as key. Set the path to the thresholds using DCThreshold.path = <path> and load the thresholds after predicting the first frame. For example:

for frame_idx, frame in enumerate(video):
   frame = frame.to(device).contiguous(memory_format=torch.channels_last)
   out = model(frame)
   
   if frame_idx == 0:
       DCThreshold.path = threshold_path
       DCThreshold.load_thresholds() 
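
The update mask dilation mentioned above can be pictured as growing every active pixel into a small neighborhood, so that pixels next to a strong change are also refreshed even when a high threshold would have dropped them. The sketch below is a pure-Python illustration; the radius parameter is an assumption of this example, and how DeltaCNN interprets the value passed as dilation is not specified here.

```python
def dilate_mask(mask, radius):
    """Mark every pixel within `radius` (Chebyshev distance) of an active pixel."""
    height, width = len(mask), len(mask[0])
    out = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    out[ny][nx] = True
    return out

mask = [
    [False, False, False, False, False],
    [False, False, False, False, False],
    [False, False, True, False, False],
    [False, False, False, False, False],
    [False, False, False, False, False],
]
dilated = dilate_mask(mask, radius=1)
active_before = sum(map(sum, mask))
active_after = sum(map(sum, dilated))
```

With a radius of 1, the single active pixel grows into a 3x3 block, so the number of updated pixels rises from 1 to 9.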

3.5 Tuning thresholds

On first call, all buffers are allocated in the size of the current input, the layers and logging layers are initialized and all truncation layers register their thresholds in the DCThreshold class. Optimizing the thresholds in a front-to-back manner can be done by iterating over all items stored in the ordered dictionary DCThreshold.t. For example:

sequence = load_video()
DCThreshold.t_default = 0.0
model = init_model()
ref_loss = calc_loss(model, sequence)
max_per_layer_loss_increase = 1.001 # some random number
step_size = 2 # some random number

for key in DCThreshold.t.keys():
    start_loss = calc_loss(model, sequence)
    DCThreshold.t[key] = 0.001 # some random number

    while calc_loss(model, sequence) < start_loss * max_per_layer_loss_increase:
        DCThreshold.t[key] *= step_size
    DCThreshold.t[key] /= step_size # since loss with prev threshold was already too large, go back a step

For better ways to tune the thresholds, please read the respective section in the DeltaCNN paper.
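
To experiment with the front-to-back search without a GPU, the loop can be run against a synthetic loss. Everything in the sketch below is a stand-in: thresholds replaces DCThreshold.t, calc_loss is a made-up function that grows with each threshold, and the sensitivity values are arbitrary. Unlike the snippet above, the search is bounded by max_steps and keeps the last value that stayed within the budget.

```python
from collections import OrderedDict

# Hypothetical stand-ins: two registered thresholds and a synthetic loss.
thresholds = OrderedDict([("layer0", 0.0), ("layer1", 0.0)])
sensitivity = {"layer0": 4.0, "layer1": 0.5}  # made-up per-layer sensitivity

def calc_loss(t):
    """Toy loss that grows with each threshold; replaces a real evaluation."""
    return 1.0 + sum(sensitivity[k] * v * v for k, v in t.items())

def tune(t, max_increase=1.05, start=0.001, step=2.0, max_steps=20):
    for key in t:
        budget = calc_loss(t) * max_increase  # loss allowed for this layer
        candidate = start
        best = 0.0
        for _ in range(max_steps):            # bounded search
            t[key] = candidate
            if calc_loss(t) >= budget:
                break
            best = candidate
            candidate *= step
        t[key] = best                         # last value that stayed in budget
    return t

ref_loss = calc_loss(thresholds)
tune(thresholds)
final_loss = calc_loss(thresholds)
```

In this toy setup the less sensitive layer ends up with the larger threshold, which mirrors the intent of tuning each truncation layer individually.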

Supported operations

DeltaCNN focuses on end-to-end sparse inference and therefore comes with common CNN layers besides convolutions. Yet, being a small research project, it does not provide all layers you might need or even support the provided layers in all possible configurations. If you want to use a layer that is not included in DeltaCNN, please open an issue. If you have some experience with CUDA, you can add new layers to DeltaCNN yourself - please consider creating a pull request to make DeltaCNN even better.

As a rough overview, DeltaCNN features the following layers:

  • DCSparsify / DCDensify: convert dense features to sparse delta features + update mask and back
  • DCConv2d: Kernel sizes of 1x1, 3x3 and 5x5. All convolutions support striding of 1x1 and 2x2 as well as dilation of any factor and depthwise convolutions. Additionally, kernel size of 7x7 with a stride of 2x2 is also implemented for ResNet. All kernels can be used in float16 and float32 mode. However, as DeltaCNN does not support Tensor Cores (which cuDNN automatically uses in 16 bit mode), performance comparisons against cuDNN should be done in 32 bit mode for apples to apples comparisons.
  • DCActivation: ReLU, ReLU6, LeakyReLU, Sigmoid and Swish.
  • DCMaxPooling, DCAdaptiveAveragePooling: Average and maximum are supported for different kernel sizes.
  • DCUpsamplingNearest2d: By factors of 2, 4, 8 or 16.
  • DCBatchNorm2d: BatchNorm parameters are converted into scale and offset on initialization.
  • DCAdd: Adding two tensors (e.g. skip connection)
  • DCConcatenate: Concatenating two tensors along channel dimension (e.g. skip connection)
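
The conversion of BatchNorm parameters into a scale and an offset follows from the inference-mode BatchNorm formula. The sketch below shows the arithmetic for a single channel in plain Python; the function names are illustrative and are not taken from DeltaCNN.

```python
import math

def bn_to_scale_offset(gamma, beta, running_mean, running_var, eps=1e-5):
    """Collapse inference-mode BatchNorm into y = x * scale + offset."""
    scale = gamma / math.sqrt(running_var + eps)
    offset = beta - running_mean * scale
    return scale, offset

def batch_norm(x, gamma, beta, running_mean, running_var, eps=1e-5):
    """Reference inference-mode BatchNorm for one channel."""
    return (x - running_mean) / math.sqrt(running_var + eps) * gamma + beta

params = dict(gamma=1.5, beta=0.25, running_mean=2.0, running_var=4.0)
scale, offset = bn_to_scale_offset(**params)

x = 3.0
folded = x * scale + offset
reference = batch_norm(x, **params)
assert abs(folded - reference) < 1e-9
```

Because the offset cancels in the difference between two frames, a delta d maps to d * scale, which is what makes this form convenient for delta propagation.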

Tips & Tricks

  • As a starting point, we suggest using a small global threshold, or even 0, and iteratively increasing the threshold on the input until the accuracy decreases. Try using an update mask dilation on the first layer together with high thresholds to compensate for noise. Afterwards, try increasing the global threshold to the maximum that does not significantly reduce accuracy. Use this threshold as a baseline when fine-tuning individual truncation thresholds.

  • Fusing batch normalization layers together with convolutional layers can have a large impact on performance.

  • Switch between DeltaCNN and cuDNN inference mode without changing the layers by setting DCConv2d.backend to DCBackend.deltacnn or DCBackend.cudnn.
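
Fusing a batch normalization layer into the preceding convolution, as suggested in the tips above, amounts to rescaling the convolution weights and adjusting the bias. The following sketch checks the algebra for a 1x1 convolution at a single pixel in pure Python. The helpers fuse_conv_bn and conv_pixel are hypothetical and not part of DeltaCNN.

```python
import math

def fuse_conv_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel BatchNorm into conv weights and bias.

    weights: one list of weights per output channel; other args: per-channel lists.
    """
    fused_w, fused_b = [], []
    for w, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        fused_w.append([wi * scale for wi in w])
        fused_b.append((b - m) * scale + bt)
    return fused_w, fused_b

def conv_pixel(inputs, weights, bias):
    """1x1 convolution at a single pixel: one dot product per output channel."""
    return [sum(x * wi for x, wi in zip(inputs, w)) + b for w, b in zip(weights, bias)]

weights = [[0.5, -1.0], [2.0, 0.25]]
bias = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.0, 0.5]
mean, var = [0.3, -0.1], [1.0, 2.0]
pixel = [1.0, 2.0]

# Unfused: convolution followed by inference-mode BatchNorm.
conv_out = conv_pixel(pixel, weights, bias)
unfused = [
    (y - m) / math.sqrt(v + 1e-5) * g + bt
    for y, g, bt, m, v in zip(conv_out, gamma, beta, mean, var)
]

# Fused: a single convolution with adjusted parameters.
fw, fb = fuse_conv_bn(weights, bias, gamma, beta, mean, var)
fused = conv_pixel(pixel, fw, fb)
assert all(abs(a - b) < 1e-9 for a, b in zip(fused, unfused))
```

The fused layer produces the same result with one layer less to execute, which is the source of the performance gain.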

Cite

@article{parger2022deltacnn,
    title = {DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos},
    author = {Mathias Parger, Chengcheng Tang, Christopher D. Twigg, Cem Keskin, Robert Wang, Markus Steinberger},
    journal = {CVPR 2022},
    year = {2022},
    month = jun
}

License

DeltaCNN is released under the CC BY-NC 4.0 license. See LICENSE for additional details about it. See also our Terms of Use and Privacy Policy.

deltacnn's Issues

build failure on windows

I am trying to install DeltaCNN on Windows 10. I use Anaconda, Python 3.9, PyTorch 1.13, CUDA 11.7, and VS 2022.

On the first try, I get the following error message:

running install
F:\anaconda3\envs\privacy\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
F:\anaconda3\envs\privacy\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing src\torchdeltacnn.egg-info\PKG-INFO
writing dependency_links to src\torchdeltacnn.egg-info\dependency_links.txt
writing top-level names to src\torchdeltacnn.egg-info\top_level.txt
reading manifest file 'src\torchdeltacnn.egg-info\SOURCES.txt'
adding license file 'LICENSE.txt'
writing manifest file 'src\torchdeltacnn.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
creating build\lib.win-amd64-cpython-39
creating build\lib.win-amd64-cpython-39\deltacnn
copying src\deltacnn\cuda_kernels.py -> build\lib.win-amd64-cpython-39\deltacnn
copying src\deltacnn\filter_conversion.py -> build\lib.win-amd64-cpython-39\deltacnn
copying src\deltacnn\logging_layers.py -> build\lib.win-amd64-cpython-39\deltacnn
copying src\deltacnn\sparse_layers.py -> build\lib.win-amd64-cpython-39\deltacnn
copying src\deltacnn\utils.py -> build\lib.win-amd64-cpython-39\deltacnn
copying src\deltacnn\__init__.py -> build\lib.win-amd64-cpython-39\deltacnn
running build_ext
Traceback (most recent call last):
  File "E:\Code\ppvas\reference\DeltaCNN-main\setup.py", line 44, in <module>
    setup(
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
    return run_commands(dist)
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
    dist.run_commands()
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\_distutils\dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\dist.py", line 1208, in run_command
    super().run_command(command)
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
    cmd_obj.run()
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\command\install.py", line 74, in run
    self.do_egg_install()
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\command\install.py", line 123, in do_egg_install
    self.run_command('bdist_egg')
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\dist.py", line 1208, in run_command
    super().run_command(command)
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
    cmd_obj.run()
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\command\bdist_egg.py", line 165, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\command\bdist_egg.py", line 151, in call_command
    self.run_command(cmdname)
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\dist.py", line 1208, in run_command
    super().run_command(command)
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
    cmd_obj.run()
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\command\install_lib.py", line 11, in run
    self.build()
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\_distutils\command\install_lib.py", line 112, in build
    self.run_command('build_ext')
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\dist.py", line 1208, in run_command
    super().run_command(command)
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
    cmd_obj.run()
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\command\build_ext.py", line 84, in run
    _build_ext.run(self)
  File "F:\anaconda3\envs\privacy\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 346, in run
    self.build_extensions()
  File "F:\anaconda3\envs\privacy\lib\site-packages\torch\utils\cpp_extension.py", line 485, in build_extensions
    compiler_name, compiler_version = self._check_abi()
  File "F:\anaconda3\envs\privacy\lib\site-packages\torch\utils\cpp_extension.py", line 875, in _check_abi
    raise UserWarning(msg)
UserWarning: It seems that the VC environment is activated but DISTUTILS_USE_SDK is not set.This may lead to multiple activations of the VC env.Please set `DISTUTILS_USE_SDK=1` and try again.

Then I ran the command "set DISTUTILS_USE_SDK=1" and tried again, which produced the following error message:

running install
F:\anaconda3\envs\privacy\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
F:\anaconda3\envs\privacy\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing src\torchdeltacnn.egg-info\PKG-INFO
writing dependency_links to src\torchdeltacnn.egg-info\dependency_links.txt
writing top-level names to src\torchdeltacnn.egg-info\top_level.txt
reading manifest file 'src\torchdeltacnn.egg-info\SOURCES.txt'
adding license file 'LICENSE.txt'
writing manifest file 'src\torchdeltacnn.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
running build_ext
error: [WinError 2] The system cannot find the file specified

It does not specify which file cannot be found.

On Linux, it works. I have no idea how to solve it.

RuntimeError using delta_cudnn mode

Hi! I encountered a runtime error when I tried to run in delta_cudnn mode, as shown below:

Traceback (most recent call last):
  File "/home/efficientdet/validate_dc.py", line 258, in <module>
    main()
  File "/home/efficientdet/validate_dc.py", line 254, in main
    validate(args)
  File "/home/efficientdet/validate_dc.py", line 197, in validate
    class_out_dc, box_out_dc = dc_model(input)
  File "/home/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/efficientdet/effdet/efficientdet_deltacnn.py", line 598, in forward
    x = self.sparsify(x) ## added 
  File "/home/liandongze/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/efficientdet/src/deltacnn/sparse_layers.py", line 1260, in forward
    self.prev_in[tiled_mask] = input
RuntimeError: shape mismatch: value tensor of shape [3, 640, 640] cannot be broadcast to indexing result of shape [461448]

The same code has no issue running in cudnn mode and deltacnn mode.
This error seems to occur when it's running on the second frame, any idea on what might be the problem?

Thank you!

Lower speed after using large batch size

Hi, authors! Thanks for your great work! But I have a question about the FPS at a large batch size. We have tested the latency at batchsize=1 on high-end GPUs, whose result is aligned with the reported speedup in Table 1. However, when we increase the batch size to 32 (or smaller, like 4, 16) as Table 1 does, the latency by dense or sparse inference is larger than cuDNN, which is against the reported results in Table 1. And the memory overhead is much larger than cuDNN. The experiment is conducted with YOLOv5s on MOT16, tested on a Tesla V100 GPU. The input size is set as (1088, 608), and we also tested the input size of (640, 640), whose result is similar. I'd appreciate it greatly if you could give some explanations!

No module named 'deltacnn.cuda'

Hello. Thanks for your great work. When I ran your example, I got the following error:
No module named 'deltacnn.cuda'
I successfully built the module using
python setup.py install --user
It can find package deltacnn, but it cannot find the cuda submodule.
How could I fix it?

Issue with threshold tuning

Hi, I am back again :)

I am trying to do some threshold tuning for my model, but encountered the problem that DCThreshold.t is empty after running the model. It's odd because I have checked that it is running in deltacnn mode, so it should at least have the layers with default threshold 0, right?
Any idea what could be the problem? Or suggestion on how I should debug the issue? (other checks?)

Thank you!

Creating custom layer

Hi! Back with more questions again :)

  1. I am attempting to write an upsample-bilinear layer and several 3D layers, so I am trying to understand the CUDA code. Starting with the easier one, I am looking at DCUpsamplingNearest2d first, and specifically this kernel function:

    __global__ void deltacnn_sparse_upsample_kernel(const scalar_t * __restrict__ val_in, scalar_t * __restrict__ val_out, const uint32_t *mask_in, uint32_t *mask_out, Dimensions dim) {

    The configuration of the kernel seems to be numBlocks=(B*W*H)/2 and threadsPerBlock=2, so (B*W*H)/2 pixels of the input will be processed in parallel? Each pixel will be assigned to one thread, so accessing the corresponding mask values or val_in/val_out values for that pixel requires indexing by the global thread index?

  2. I have not fully understood the kernel code yet, but it means that the upsample layer is not being processed in tiles? So only the convolution layer is processed in tiles, and nonlinear layers are processed pixel-by-pixel?

  3. For very sparse processing mode (in the case of convolution), all filter weights will have to be loaded just like the dense processing mode, just that they are only used for computation on an array of active pixels instead of the entire tile, is that right?

THANK YOU!!

Error when running setup script

Dear,

When attempting to install DeltaCNN via the setup.py, the following error occurs:

"ModuleNotFoundError: No module named 'deltacnn.cuda'"

This error occurs on both Linux and Windows. All packages were installed as required, but the error persisted.
Could you please help us?

With kind regards,

Ruben

Clarification needed on dense accumulation

In section 2.1 :
"The dense results are only accumulated at the final layer. This way, we also accelerate non-convolutional layers like pooling, up-sampling and activations."

This seems contrary to Figure 2 and point just next to it:

"Furthermore, we avoid switching between sparse and dense computations and only need to cache accumulated values for nonlinear layers, reducing the number of caches, without loss in accuracy."

In figure 2 we can see "Dense Accumulated Values" for "Sparse Activation", "Sparse Pooling" apart from "Output Accumulation".

Can you please elaborate? Is dense accumulation happening at non-linear layers as well? Thanks.

General questions regarding the paper

Hi, I am back with more questions :)

  1. In Section 2.2, it was stated that “we use a spatial (per pixel) sparsity, since structured sparsity better suits SIMD architectures than per value sparsity”, what does “per value sparsity” refer to here?

  2. In Section 3.2, it was stated that a benefit of using tiled convolutions is that “input features and filter parameters can be kept local and reused multiple times”, could you elaborate more on this?

  3. In Section 3.3, it was stated that “we also need to limit the increase in accuracy when tuning for thresholds to avoid overfitting on the small subset of the training set”. Why would there be an increase in accuracy when increasing the threshold results in an increase of the loss?

  4. In the supplementary material, Section 1.2, it was stated that “This mode loads only pixels of the filter weights that are required and iterates over an array of active pixels contrary to iterating over all pixels and checking the update flag”, could you elaborate more on this? For very sparse mode, if checking of update flag is not needed, how does it know which pixel is updated?

  5. In the same section, regarding depth-wise convolutions, it was stated that “input values are used less often and only by a single thread, removing the necessity of keeping data local to a CTA”, what does it mean by keeping data local to a CTA?
    Depth-wise convolutions use per-pixel sparse operation, could you elaborate more on this? So it decides pixel by pixel if update is needed, if it is needed, then convolution is performed on this one pixel?

  6. Regarding results in Table 2, for RTX 3090 b=1, are the results obtained using a single GPU?
    I have also observed that, for MOT16-d1, GTX 1050 b=1, the difference between the FPS of cuDNN and ‘ours dense’ is 10.5 vs 11.6 (a rather small difference), but there is a large difference when the threshold is tuned, which is ‘ours sparse’, with an FPS of 27.4. However, for RTX 3090 b=1, the difference between cuDNN and ‘ours dense’ is rather large, 23.3 vs 45.0, but further tuning the threshold only increases the FPS to 45.2. Are there any reasons for these observations? To clarify, for ‘ours dense’ and ‘ours sparse’, both are performed with sparse operations, but ‘ours dense’ has negative thresholds so no values are truncated, while ‘ours sparse’ has tuned thresholds, is that right?

Thank you so much!

ImportError: /home/.local/lib/python3.9/site-packages/torchdeltacnn-0.0.0-py3.9-linux-x86_64.egg/deltacnn/cuda.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor6deviceEv

Hi, I am trying to use the mobilenetv2_deltacnn provided to do some inferencing, but encountered the following error:

Traceback (most recent call last):
  File "/home/example/mobilenetv2_inference.py", line 48, in <module>
    from deltacnn.sparse_layers import DCBackend, DCConv2d, DCThreshold
  File "/home/.local/lib/python3.9/site-packages/torchdeltacnn-0.0.0-py3.9-linux-x86_64.egg/deltacnn/__init__.py", line 6, in <module>
    from .cuda_kernels import sparse_conv, sparse_deconv, sparse_pooling
  File "/home/.local/lib/python3.9/site-packages/torchdeltacnn-0.0.0-py3.9-linux-x86_64.egg/deltacnn/cuda_kernels.py", line 7, in <module>
    from deltacnn.cuda import sparse_conv_bias_wrapper_masked, sparse_deconv_bias_wrapper_masked
ImportError: /home/.local/lib/python3.9/site-packages/torchdeltacnn-0.0.0-py3.9-linux-x86_64.egg/deltacnn/cuda.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor6deviceEv

I am using PyTorch 1.11.0 and CUDA 11.3.
Could you please advise on how to resolve this? Thank you!

About (pretrained) HRNet and Pose-ResNet

Hello.
Thank you for your excellent paper and implementation.
Now that you have released your MobileNetV2 implementation, do you have any plans to release any of the (trained) HR-Net, Pose-ResNet, or EfficientDet used in the experiments in the paper?

Issues when adding layers not included in DeltaCNN

Hi, I am trying to add the layer BatchNormAct2d, as shown below, into a DeltaCNN model:

class BatchNormAct2d(nn.BatchNorm2d):
    def __init__(
            self,
            num_features,
            eps=1e-5,
            momentum=0.1,
            affine=True,
            track_running_stats=True,
            apply_act=True,
            act_layer=nn.ReLU,
            inplace=True,
            drop_layer=None,
            device=None,
            dtype=None
    ):
        factory_kwargs = {'device': device, 'dtype': dtype}
        super(BatchNormAct2d, self).__init__(
            num_features, eps=eps, momentum=momentum, affine=affine, track_running_stats=track_running_stats,
            **factory_kwargs
        )

        self.drop = nn.Identity()
        if act_layer is not None and apply_act:
            act_args = dict(inplace=True) if inplace else {}
            self.act = act_layer(**act_args)
        else:
            self.act = nn.Identity()

    def forward(self, x):
        _assert(x.ndim == 4, f'expected 4D input (got {x.ndim}D input)')

        if self.momentum is None:
            exponential_average_factor = 0.0
        else:
            exponential_average_factor = self.momentum

        if self.training and self.track_running_stats:
            if self.num_batches_tracked is not None:  
                self.num_batches_tracked = self.num_batches_tracked + 1  
                if self.momentum is None:  
                    exponential_average_factor = 1.0 / float(self.num_batches_tracked)
                else: 
                    exponential_average_factor = self.momentum

        if self.training:
            bn_training = True
        else:
            bn_training = (self.running_mean is None) and (self.running_var is None)

        x = F.batch_norm(
            x,
            self.running_mean if not self.training or self.track_running_stats else None,
            self.running_var if not self.training or self.track_running_stats else None,
            self.weight,
            self.bias,
            bn_training,
            exponential_average_factor,
            self.eps,
        )
        x = self.drop(x)
        x = self.act(x)
        return x

My attempt to modify it is as shown below:

class DCIdentity(DCModule):
    def __init__(self) -> None:
        super(DCIdentity, self).__init__()

    def forward(self, x):
        return x

class DCBatchNormAct2d(DCBatchNorm2d): 
    def __init__(
            self,
            num_features,
            eps=1e-5,
            momentum=0.1,
            affine=True,
            track_running_stats=True,
            apply_act=True,
            act_layer="relu",
            inplace=True,
            drop_layer=None,
            device=None,
            dtype=None
    ):
        factory_kwargs = {'device': device, 'dtype': dtype}
        super(DCBatchNormAct2d, self).__init__(
            num_features, eps=eps, momentum=momentum, affine=affine, track_running_stats=track_running_stats,
            **factory_kwargs
        )
        self.densify = DCDensify()  
        self.drop = DCIdentity()  
        if act_layer is not None and apply_act:
            self.act = DCActivation(act_layer, inplace)
        else:
            self.act = DCIdentity() 
        self.sparsify = DCSparsify() 

    def forward(self, x):
        x = self.densify(x)  
        _assert(x.ndim == 4, f'expected 4D input (got {x.ndim}D input)')

        if self.momentum is None:
            exponential_average_factor = 0.0
        else:
            exponential_average_factor = self.momentum

        bn_training = (self.running_mean is None) and (self.running_var is None)

        x = F.batch_norm(  
            x,
            self.running_mean if not self.training or self.track_running_stats else None,
            self.running_var if not self.training or self.track_running_stats else None,
            self.weight,
            self.bias,
            bn_training,
            exponential_average_factor,
            self.eps,
        )
        x = self.sparsify(x)  
        x = self.drop(x)
        x = self.act(x)
        return x  

However, I have encountered the following CUDA error (the class DCBatchNormAct2d is in norm_act_dc.py):

Traceback (most recent call last):
  File "/home/efficientdet/validate_deltacnn.py", line 251, in <module>
    main()
  File "/home/efficientdet/validate_deltacnn.py", line 247, in main
    validate(args)
  File "/home/efficientdet/validate_deltacnn.py", line 217, in validate
    output_dc = dc_model(input, img_info=target)  
  File "/home/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/efficientdet/effdet/bench_deltacnn.py", line 104, in forward
    class_out, box_out = self.model(x)
  File "/home/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/efficientdet/effdet/efficientdet_deltacnn.py", line 577, in forward
    x = self.backbone(x)
  File "/home/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/efficientdet/timm/models/efficientnet_deltacnn.py", line 240, in forward
    x = b(x)
  File "/home/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/anaconda3/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/efficientdet/timm/models/efficientnet_blocks_dc.py", line 148, in forward
    x = self.bn2(x)
  File "/home/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/efficientdet/timm/models/layers/norm_act_dc.py", line 72, in forward
    x = self.densify(x)  
  File "/home/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/efficientdet/src/deltacnn/sparse_layers.py", line 1308, in forward
    self.prev_out = torch.zeros_like(input)
RuntimeError: CUDA error: an illegal memory access was encountered

I would like to ask whether my modification to the layer is correct. Thank you so much!
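
A general debugging step that may help with a traceback like the one above, independent of DeltaCNN: CUDA kernels run asynchronously, so an illegal memory access is often reported at a later call than the one that caused it. The `torch.zeros_like` frame may therefore not be the real culprit. A minimal sketch, assuming the validation script can be edited before `torch` is imported:

```python
import os

# CUDA_LAUNCH_BLOCKING=1 makes every kernel launch synchronous, so the Python
# traceback points at the call that actually failed. It has to be set before
# CUDA is initialized, so the safest place is before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch (and deltacnn) only after the variable is set
```

Running with this setting is slower, so it is only meant for locating the failing layer, not for benchmarking.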

memory requirement for DeltaCNN

Hi! I am trying to work out how much additional memory DeltaCNN requires.

To check that I am on the right track: the extra memory would come from the single-channel masks for every layer in the model (which are propagated together with the output feature maps), as well as the buffers for the nonlinear layers. Each buffer would be the same size as the feature map in the respective layer. Nonlinear layers like sparsify, densify, and pooling require one extra buffer, whereas activation requires two extra buffers.
So for instance, if the input is (1,3,640,640), the sparsify layer would require a buffer of the same shape (1,3,640,640), is that right? That would mean the first sparsify layer alone already stores 1,228,800 extra values (1 × 3 × 640 × 640)?

Thank you!
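
The arithmetic in the question can be checked with a few lines of Python. This is only a sketch of the calculation, not a statement about which buffers DeltaCNN actually allocates; the 4-byte element size assumes float32 storage, and `buffer_bytes` is a hypothetical helper name.

```python
from math import prod

def buffer_bytes(shape, bytes_per_element=4):
    """Size in bytes of one dense buffer (float32 assumed by default)."""
    return prod(shape) * bytes_per_element

input_shape = (1, 3, 640, 640)
num_values = prod(input_shape)          # 1 * 3 * 640 * 640
size_bytes = buffer_bytes(input_shape)  # assumes 4-byte float32 elements

# A single-channel mask at the same resolution has one entry per pixel.
mask_values = prod((1, 1, 640, 640))

print(num_values)          # 1228800
print(size_bytes / 2**20)  # 4.6875 (MiB)
print(mask_values)         # 409600
```

Under these assumptions, one buffer for the first layer is about 4.7 MiB. Summing the same quantity over the feature-map shapes of all caching layers would give a rough estimate for the whole model.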

RuntimeError: Kernel sizes other than 7x7, 5x5, 3x3 and 1x1 not supported. got 1x3

Hi, thank you so much for sharing the great work!

I have been trying to implement the training code for EfficientDet with DeltaCNN. However, I have encountered the following RuntimeError:

Kernel sizes other than 7x7, 5x5, 3x3 and 1x1 not supported. got 1x3
Traceback (most recent call last):
File "/home/efficientdet/train_deltacnn.py", line 669, in <module>
main()
File "/home/efficientdet/train_deltacnn.py", line 417, in main
train_metrics = train_epoch(
File "/home/efficientdet/train_deltacnn.py", line 552, in train_epoch
output = model(input, target)
File "/home/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/efficientdet/effdet/bench_deltacnn.py", line 136, in forward
class_out, box_out = self.model(x)
File "/home/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/efficientdet/effdet/efficientdet_deltacnn.py", line 577, in forward
x = self.backbone(x)
File "/home/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/efficientdet/timm/models/efficientnet_deltacnn.py", line 620, in forward
x = b(x)
File "/home/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/anaconda3/lib/python3.9/site-packages/torch/nn/modules/container.py", line 119, in forward
input = module(input)
File "/home/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/efficientdet/timm/models/efficientnet_blocks_dc.py", line 143, in forward
x = self.conv_dw(x)
File "/home/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/efficientdet/src/deltacnn/sparse_layers.py", line 448, in forward
return self._forward_delta_conv(input, first_iter)
File "/home/efficientdet/src/deltacnn/sparse_layers.py", line 613, in _forward_delta_conv
out = sparse_conv(
File "/home/efficientdet/src/deltacnn/cuda_kernels.py", line 40, in sparse_conv
sparse_conv_bias_wrapper_masked(x, filter, bias, out, mask, out_mask, stride, padding, dilation, groups, pad_mode_int, sub_tile_sparsity)
RuntimeError: Caught an unknown exception!
Killing subprocess 113535
Traceback (most recent call last):
File "/home/anaconda3/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/anaconda3/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py", line 340, in <module>
main()
File "/home/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py", line 326, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/home/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/anaconda3/bin/python', '-u', 'train_deltacnn.py', '--local_rank=0', '/home/datasets/unzipped/mot16', '--dataset', 'mot', '--model', 'efficientdet_d0', '--num-classes', '12', '-b', '4', '--lr', '.09', '--warmup-epochs', '5', '--amp', '--sync-bn', '--model-ema']' returned non-zero exit status 1.

I believe the line that caused the error is the following, where a depthwise convolution is used:
self.conv_dw = DCConv2d(32, 32, 3, stride=1, dilation=1, padding_mode='zeros', groups=32)

Could you please advise me on what might have caused this error?
Thank you!
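
One way to narrow this down is to list every layer whose declared kernel is not one of the square sizes named in the error message. The sketch below is an assumption-laden helper, not part of DeltaCNN: `find_unsupported` and `as_pair` are hypothetical names, and the set of supported sizes is taken only from the message text. Stand-in objects are used so it runs without PyTorch installed.

```python
from types import SimpleNamespace

# Square sizes listed in the error message; treated here as the supported set.
SUPPORTED_SIZES = {1, 3, 5, 7}

def as_pair(kernel_size):
    """Normalize an int or a 1-/2-element sequence to a (height, width) tuple."""
    if isinstance(kernel_size, int):
        return (kernel_size, kernel_size)
    sizes = tuple(kernel_size)
    return sizes * 2 if len(sizes) == 1 else sizes

def find_unsupported(named_layers):
    """Return (name, (kh, kw)) for layers whose kernel is not a supported square."""
    bad = []
    for name, layer in named_layers:
        if not hasattr(layer, "kernel_size"):
            continue  # layers without a kernel (e.g. activations) are skipped
        kh, kw = as_pair(layer.kernel_size)
        if kh != kw or kh not in SUPPORTED_SIZES:
            bad.append((name, (kh, kw)))
    return bad

# Stand-in layers (hypothetical names) in place of a real model.
layers = [
    ("conv_dw", SimpleNamespace(kernel_size=(3, 3))),
    ("conv_sep", SimpleNamespace(kernel_size=(1, 3))),
    ("act", SimpleNamespace()),
]
print(find_unsupported(layers))  # [('conv_sep', (1, 3))]
```

With a real model, `model.named_modules()` yields the same kind of (name, module) pairs. Pooling layers also expose `kernel_size`, so filter by layer type first if only convolutions matter.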

About resnet50

Hello. Would you release a ResNet50 inference demo?
