
torch-pruning's Introduction


Towards Any Structural Pruning


[Documentation & Tutorials] [FAQ]

Torch-Pruning (TP) is a library for structural pruning.

For more technical details, please refer to our CVPR'23 paper:

DepGraph: Towards Any Structural Pruning
Gongfan Fang, Xinyin Ma, Mingli Song, Michael Bi Mi, Xinchao Wang
Learning and Vision Lab, National University of Singapore

Contact Us:

Please do not hesitate to open an issue if you encounter any problems with the library or the paper.
You can also join our Discord or WeChat group for a chat.


Installation

Torch-Pruning is compatible with both PyTorch 1.x and 2.x versions. However, it is highly recommended to use PyTorch 2.0.

pip install torch-pruning 

or

git clone https://github.com/VainF/Torch-Pruning.git

Quickstart

Here we provide a quick start for Torch-Pruning. More detailed explanations can be found in the Tutorials.

How It Works

In structural pruning, a "Group" is defined as the minimal removable unit within deep networks. Most groups are composed of multiple layers that are interdependent and need to be pruned together in order to maintain the integrity of the resulting structures. However, deep networks often have complex dependencies among their layers, making structural pruning a challenging task. This work addresses this challenge by introducing an automated mechanism called "DepGraph." DepGraph allows for seamless parameter grouping and facilitates pruning in various types of deep networks.

A Minimal Example of DepGraph

Please make sure that AutoGrad is enabled for your model, i.e., the forward pass is not wrapped in torch.no_grad() and parameters do not have .requires_grad=False.

import torch
from torchvision.models import resnet18
import torch_pruning as tp

model = resnet18(pretrained=True).eval()

# 1. Build dependency graph for resnet18
DG = tp.DependencyGraph().build_dependency(model, example_inputs=torch.randn(1,3,224,224))

# 2. Group coupled layers for model.conv1
group = DG.get_pruning_group( model.conv1, tp.prune_conv_out_channels, idxs=[2, 6, 9] )

# 3. Prune grouped layers altogether
if DG.check_pruning_group(group): # avoid full pruning, i.e., channels=0.
    group.prune()
    
# 4. Save & Load
model.zero_grad() # clear gradients
torch.save(model, 'model.pth') # We cannot use .state_dict here because the model structure has changed.
model = torch.load('model.pth') # load the pruned model

The above example demonstrates the basic pruning pipeline with DepGraph. The target layer model.conv1 is coupled with multiple layers, which therefore need to be removed simultaneously during structural pruning. To observe the cascading effect of pruning operations, we can print the group and see how one pruning operation "triggers" others. In the following outputs, "A => B" indicates that pruning operation "A" triggers pruning operation "B". group[0] refers to the pruning root passed to DG.get_pruning_group. For more details about grouping, please refer to Wiki - DepGraph & Group.

--------------------------------
          Pruning Group
--------------------------------
[0] prune_out_channels on conv1 (Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)) => prune_out_channels on conv1 (Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)), idxs=[2, 6, 9] (Pruning Root)
[1] prune_out_channels on conv1 (Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)) => prune_out_channels on bn1 (BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)), idxs=[2, 6, 9]
[2] prune_out_channels on bn1 (BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)) => prune_out_channels on _ElementWiseOp_20(ReluBackward0), idxs=[2, 6, 9]
[3] prune_out_channels on _ElementWiseOp_20(ReluBackward0) => prune_out_channels on _ElementWiseOp_19(MaxPool2DWithIndicesBackward0), idxs=[2, 6, 9]
[4] prune_out_channels on _ElementWiseOp_19(MaxPool2DWithIndicesBackward0) => prune_out_channels on _ElementWiseOp_18(AddBackward0), idxs=[2, 6, 9]
[5] prune_out_channels on _ElementWiseOp_19(MaxPool2DWithIndicesBackward0) => prune_in_channels on layer1.0.conv1 (Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)), idxs=[2, 6, 9]
[6] prune_out_channels on _ElementWiseOp_18(AddBackward0) => prune_out_channels on layer1.0.bn2 (BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)), idxs=[2, 6, 9]
[7] prune_out_channels on _ElementWiseOp_18(AddBackward0) => prune_out_channels on _ElementWiseOp_17(ReluBackward0), idxs=[2, 6, 9]
[8] prune_out_channels on _ElementWiseOp_17(ReluBackward0) => prune_out_channels on _ElementWiseOp_16(AddBackward0), idxs=[2, 6, 9]
[9] prune_out_channels on _ElementWiseOp_17(ReluBackward0) => prune_in_channels on layer1.1.conv1 (Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)), idxs=[2, 6, 9]
[10] prune_out_channels on _ElementWiseOp_16(AddBackward0) => prune_out_channels on layer1.1.bn2 (BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)), idxs=[2, 6, 9]
[11] prune_out_channels on _ElementWiseOp_16(AddBackward0) => prune_out_channels on _ElementWiseOp_15(ReluBackward0), idxs=[2, 6, 9]
[12] prune_out_channels on _ElementWiseOp_15(ReluBackward0) => prune_in_channels on layer2.0.downsample.0 (Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)), idxs=[2, 6, 9]
[13] prune_out_channels on _ElementWiseOp_15(ReluBackward0) => prune_in_channels on layer2.0.conv1 (Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)), idxs=[2, 6, 9]
[14] prune_out_channels on layer1.1.bn2 (BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)) => prune_out_channels on layer1.1.conv2 (Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)), idxs=[2, 6, 9]
[15] prune_out_channels on layer1.0.bn2 (BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)) => prune_out_channels on layer1.0.conv2 (Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)), idxs=[2, 6, 9]
--------------------------------

How to scan all groups (Advanced):

We can use DG.get_all_groups(ignored_layers, root_module_types) to scan and prune all groups sequentially. Each group will begin with a layer that matches one type in the root_module_types parameter. Note that DG.get_all_groups is only responsible for grouping and does not have any knowledge or understanding of which parameters should be pruned. Therefore, it is necessary to specify the pruning idxs using group.prune(idxs=idxs). This feature is useful when you want to implement your own pruning algorithms.

for group in DG.get_all_groups(ignored_layers=[model.conv1], root_module_types=[nn.Conv2d, nn.Linear]):
    # handle groups in sequential order
    idxs = [2,4,6] # your pruning indices
    group.prune(idxs=idxs)
    print(group)

High-level Pruners

With DepGraph, we developed several high-level pruners in this repository to facilitate effortless pruning. By specifying the desired channel pruning ratio, a pruner will scan all prunable groups, estimate the importance, prune the entire model, and fine-tune it using your own training code. For detailed information on this process, please refer to this tutorial, which shows how to implement a slimming pruner from scratch. Additionally, a more practical example is available in benchmarks/main.py.

import torch
from torchvision.models import resnet18
import torch_pruning as tp

model = resnet18(pretrained=True)
example_inputs = torch.randn(1, 3, 224, 224)

# 1. Importance criterion
imp = tp.importance.GroupTaylorImportance() # or GroupNormImportance(p=2), GroupHessianImportance(), etc.

# 2. Initialize a pruner with the model and the importance criterion
ignored_layers = []
for m in model.modules():
    if isinstance(m, torch.nn.Linear) and m.out_features == 1000:
        ignored_layers.append(m) # DO NOT prune the final classifier!

pruner = tp.pruner.MetaPruner( # We can always choose MetaPruner if sparse training is not required.
    model,
    example_inputs,
    importance=imp,
    pruning_ratio=0.5, # remove 50% channels, ResNet18 = {64, 128, 256, 512} => ResNet18_Half = {32, 64, 128, 256}
    # pruning_ratio_dict = {model.conv1: 0.2, model.layer2: 0.8}, # customized pruning ratios for layers or blocks
    ignored_layers=ignored_layers,
)

# 3. Prune & finetune the model
base_macs, base_nparams = tp.utils.count_ops_and_params(model, example_inputs)
if isinstance(imp, tp.importance.GroupTaylorImportance):
    # Taylor expansion requires gradients for importance estimation
    loss = model(example_inputs).sum() # A dummy loss, please replace this line with your loss function and data!
    loss.backward() # before pruner.step()

pruner.step()
macs, nparams = tp.utils.count_ops_and_params(model, example_inputs)
# finetune the pruned model here
# finetune(model)
# ...

Global Pruning

With the option of global pruning (global_pruning=True), adaptive sparsity will be allocated to different layers based on their global rank of importance. While this strategy can offer performance advantages, it also carries the potential of overly pruning specific layers, resulting in a substantial decline in overall performance. If you're not very familiar with pruning, it's recommended to begin with global_pruning=False.
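For reference, a minimal sketch of enabling it, reusing the model, importance, and ignored_layers from the example above (only the global_pruning flag changes):

pruner = tp.pruner.MetaPruner(
    model,
    example_inputs,
    importance=imp,
    pruning_ratio=0.5,             # overall target; per-layer sparsity is allocated adaptively
    global_pruning=True,           # rank channel importance across all layers instead of per layer
    ignored_layers=ignored_layers,
)
pruner.step()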

Sparse Training

Some pruners like BNScalePruner and GroupNormPruner support sparse training. This can be easily achieved by inserting pruner.update_regularizer() before training, and pruner.regularize(model) between loss.backward() and optimizer.step(). The pruner will accumulate the regularization gradients to .grad.

for epoch in range(epochs):
    model.train()
    pruner.update_regularizer() # <== initialize regularizer
    for i, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        out = model(data)
        loss = F.cross_entropy(out, target)
        loss.backward()
        pruner.regularize(model) # <== for sparse training, between loss.backward() and optimizer.step()
        optimizer.step()

Interactive Pruning

All high-level pruners offer support for interactive pruning. You can utilize the method pruner.step(interactive=True) to retrieve all the groups and interactively prune them by calling group.prune(). This feature is particularly useful when you want to have control over or monitor the pruning process.

for i in range(iterative_steps):
    for group in pruner.step(interactive=True): # Warning: groups must be handled sequentially. Do not keep them as a list.
        print(group) 
        # do whatever you like with the group 
        dep, idxs = group[0] # get the idxs
        target_module = dep.target.module # get the root module
        pruning_fn = dep.handler # get the pruning function
        group.prune()
        # group.prune(idxs=[0, 2, 6]) # It is even possible to change the pruning behaviour with the idxs parameter
    macs, nparams = tp.utils.count_ops_and_params(model, example_inputs)
    # finetune your model here
    # finetune(model)
    # ...

Soft Pruning

It is easy to implement soft pruning by leveraging interactive=True: parameters are zeroed out instead of being removed. An example can be found in tests/test_soft_pruning.py.
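A rough sketch of the idea (not the test script itself; for brevity it only zeros the root entry of each group and assumes the module's weight is indexed by output channel):

for group in pruner.step(interactive=True):
    dep, idxs = group[0]                 # the pruning root, as in the interactive example above
    module = dep.target.module
    module.weight.data[idxs] = 0         # zero the selected channels instead of removing them
    if getattr(module, 'bias', None) is not None:
        module.bias.data[idxs] = 0
    # group.prune() is NOT called, so all parameter shapes stay unchanged;
    # a complete soft-pruning pass would also zero the coupled layers in the group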

Group-level Pruning

With DepGraph, it is easy to design "group-level" criteria that estimate the importance of a whole group rather than a single layer. This feature can also be used to sparsify coupled layers, making all the to-be-pruned parameters consistently sparse. In Torch-Pruning, all pruners work at the group level. Check the following results to see how grouping improves pruning performance; a minimal sketch of probing a group-level score follows the list below.

  • Pruning a ResNet50 pre-trained on ImageNet-1K without fine-tuning.
  • Pruning a Vision Transformer pre-trained on ImageNet-1K without fine-tuning.
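Concretely, a group-level score can be probed by calling an importance object on a whole group. A minimal sketch, reusing the DG and model from the DepGraph example above (the exact call signature may differ slightly between versions):

imp = tp.importance.GroupNormImportance(p=2)   # L2 norm aggregated over all layers in the group
group = DG.get_pruning_group(model.conv1, tp.prune_conv_out_channels,
                             idxs=list(range(model.conv1.out_channels)))
group_importance = imp(group)                  # one score per channel, shared by the whole group
print(group_importance.shape)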

Modify module attributes or forward function

In some implementations, the model forward pass may rely on static attributes. For example, in convformer_s18 from timm we have:

class Scale(nn.Module):
    """
    Scale vector by element multiplications.
    """

    def __init__(self, dim, init_value=1.0, trainable=True, use_nchw=True):
        super().__init__()
        self.shape = (dim, 1, 1) if use_nchw else (dim,) # static shape, which should be updated after pruning
        self.scale = nn.Parameter(init_value * torch.ones(dim), requires_grad=trainable)

    def forward(self, x):
        return x * self.scale.view(self.shape) # => x * self.scale.view(-1, 1, 1), this works for pruning

Here the forward function relies on self.shape during the forward pass. However, the true shape changes after pruning and must be updated manually.
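One possible way to patch it after pruning is a small post-processing loop like the sketch below (illustrative only; it assumes the Scale module defined above and derives the new shape from the pruned parameter):

for m in model.modules():
    if isinstance(m, Scale):
        dim = m.scale.shape[0]                                # channel count after pruning
        m.shape = (dim, 1, 1) if len(m.shape) == 3 else (dim,)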

Save and Load

Method 1:

The following script saves the whole model object (structure+weights) as a 'model.pth'.

model.zero_grad() # Remove gradients
torch.save(model, 'model.pth') # without .state_dict
model = torch.load('model.pth') # load the pruned model

Method 2 (Experimental Features):

Re-create pruned models from unpruned ones using tp.state_dict and tp.load_state_dict.

# save the pruned state_dict, which includes both pruned parameters and modified attributes
state_dict = tp.state_dict(pruned_model) # the pruned model, e.g., a resnet-18-half
torch.save(state_dict, 'pruned.pth')

# create a new model, e.g. resnet18
new_model = resnet18().eval()

# load the pruned state_dict into the unpruned model.
loaded_state_dict = torch.load('pruned.pth', map_location='cpu')
tp.load_state_dict(new_model, state_dict=loaded_state_dict)

Refer to tests/test_serialization.py for a ViT example. In this example, we prune the model and modify some attributes such as model.hidden_dims.

Low-level Pruning Functions

Although it is possible to manually prune your model using low-level functions, this approach can be cumbersome and time-consuming due to the need for meticulous management of dependencies. Therefore, we strongly recommend utilizing the high-level pruners mentioned earlier to streamline and simplify the pruning process. These pruners provide a more convenient and efficient way to perform pruning on your models. To manually prune the model.conv1 of a ResNet-18, the pruning pipeline should look like this:

tp.prune_conv_out_channels( model.conv1, idxs=[2,6,9] )

# fix the broken dependencies manually
tp.prune_batchnorm_out_channels( model.bn1, idxs=[2,6,9] )
tp.prune_conv_in_channels( model.layer2[0].conv1, idxs=[2,6,9] )
...

The following pruning functions are available:

'prune_conv_out_channels',
'prune_conv_in_channels',
'prune_depthwise_conv_out_channels',
'prune_depthwise_conv_in_channels',
'prune_batchnorm_out_channels',
'prune_batchnorm_in_channels',
'prune_linear_out_channels',
'prune_linear_in_channels',
'prune_prelu_out_channels',
'prune_prelu_in_channels',
'prune_layernorm_out_channels',
'prune_layernorm_in_channels',
'prune_embedding_out_channels',
'prune_embedding_in_channels',
'prune_parameter_out_channels',
'prune_parameter_in_channels',
'prune_multihead_attention_out_channels',
'prune_multihead_attention_in_channels',
'prune_groupnorm_out_channels',
'prune_groupnorm_in_channels',
'prune_instancenorm_out_channels',
'prune_instancenorm_in_channels',

Customized Layers

Please refer to examples/transformers/prune_hf_swin.py, which implements a new pruner for the customized module SwinPatchMerging. A simpler example is available at tests/test_customized_layer.py.

Benchmarks

Our results on {ResNet-56 / CIFAR-10 / 2.00x}

| Method | Base (%) | Pruned (%) | $\Delta$ Acc (%) | Speed Up |
|---|---|---|---|---|
| NIPS [1] | - | - | -0.03 | 1.76x |
| Geometric [2] | 93.59 | 93.26 | -0.33 | 1.70x |
| Polar [3] | 93.80 | 93.83 | +0.03 | 1.88x |
| CP [4] | 92.80 | 91.80 | -1.00 | 2.00x |
| AMC [5] | 92.80 | 91.90 | -0.90 | 2.00x |
| HRank [6] | 93.26 | 92.17 | -0.09 | 2.00x |
| SFP [7] | 93.59 | 93.36 | +0.23 | 2.11x |
| ResRep [8] | 93.71 | 93.71 | +0.00 | 2.12x |
| Ours-L1 | 93.53 | 92.93 | -0.60 | 2.12x |
| Ours-BN | 93.53 | 93.29 | -0.24 | 2.12x |
| Ours-Group | 93.53 | 93.77 | +0.38 | 2.13x |

Latency

Latency test on ResNet-50, Batch Size=64.

[Iter 0]        Pruning ratio: 0.00,         MACs: 4.12 G,   Params: 25.56 M,        Latency: 45.22 ms +- 0.03 ms
[Iter 1]        Pruning ratio: 0.05,         MACs: 3.68 G,   Params: 22.97 M,        Latency: 46.53 ms +- 0.06 ms
[Iter 2]        Pruning ratio: 0.10,         MACs: 3.31 G,   Params: 20.63 M,        Latency: 43.85 ms +- 0.08 ms
[Iter 3]        Pruning ratio: 0.15,         MACs: 2.97 G,   Params: 18.36 M,        Latency: 41.22 ms +- 0.10 ms
[Iter 4]        Pruning ratio: 0.20,         MACs: 2.63 G,   Params: 16.27 M,        Latency: 39.28 ms +- 0.20 ms
[Iter 5]        Pruning ratio: 0.25,         MACs: 2.35 G,   Params: 14.39 M,        Latency: 34.60 ms +- 0.19 ms
[Iter 6]        Pruning ratio: 0.30,         MACs: 2.02 G,   Params: 12.46 M,        Latency: 33.38 ms +- 0.27 ms
[Iter 7]        Pruning ratio: 0.35,         MACs: 1.74 G,   Params: 10.75 M,        Latency: 31.46 ms +- 0.20 ms
[Iter 8]        Pruning ratio: 0.40,         MACs: 1.50 G,   Params: 9.14 M,         Latency: 29.04 ms +- 0.19 ms
[Iter 9]        Pruning ratio: 0.45,         MACs: 1.26 G,   Params: 7.68 M,         Latency: 27.47 ms +- 0.28 ms
[Iter 10]       Pruning ratio: 0.50,         MACs: 1.07 G,   Params: 6.41 M,         Latency: 20.68 ms +- 0.13 ms
[Iter 11]       Pruning ratio: 0.55,         MACs: 0.85 G,   Params: 5.14 M,         Latency: 20.48 ms +- 0.21 ms
[Iter 12]       Pruning ratio: 0.60,         MACs: 0.67 G,   Params: 4.07 M,         Latency: 18.12 ms +- 0.15 ms
[Iter 13]       Pruning ratio: 0.65,         MACs: 0.53 G,   Params: 3.10 M,         Latency: 15.19 ms +- 0.01 ms
[Iter 14]       Pruning ratio: 0.70,         MACs: 0.39 G,   Params: 2.28 M,         Latency: 13.47 ms +- 0.01 ms
[Iter 15]       Pruning ratio: 0.75,         MACs: 0.29 G,   Params: 1.61 M,         Latency: 10.07 ms +- 0.01 ms
[Iter 16]       Pruning ratio: 0.80,         MACs: 0.18 G,   Params: 1.01 M,         Latency: 8.96 ms +- 0.02 ms
[Iter 17]       Pruning ratio: 0.85,         MACs: 0.10 G,   Params: 0.57 M,         Latency: 7.03 ms +- 0.04 ms
[Iter 18]       Pruning ratio: 0.90,         MACs: 0.05 G,   Params: 0.25 M,         Latency: 5.81 ms +- 0.03 ms
[Iter 19]       Pruning ratio: 0.95,         MACs: 0.01 G,   Params: 0.06 M,         Latency: 5.70 ms +- 0.03 ms
[Iter 20]       Pruning ratio: 1.00,         MACs: 0.01 G,   Params: 0.06 M,         Latency: 5.71 ms +- 0.03 ms

Please refer to benchmarks for more details.

Series of Works

DepGraph: Towards Any Structural Pruning [Project] [Paper]
Gongfan Fang, Xinyin Ma, Mingli Song, Michael Bi Mi, Xinchao Wang
CVPR 2023

LLM-Pruner: On the Structural Pruning of Large Language Models [Project] [arXiv]
Xinyin Ma, Gongfan Fang, Xinchao Wang
NeurIPS 2023

Structural Pruning for Diffusion Models [Project] [arxiv]
Gongfan Fang, Xinyin Ma, Xinchao Wang
NeurIPS 2023

DeepCache: Accelerating Diffusion Models for Free [Project] [Arxiv]
Xinyin Ma, Gongfan Fang, and Xinchao Wang
CVPR 2024

0.1% Data Makes Segment Anything Slim [Project] [Arxiv]
Zigeng Chen, Gongfan Fang, Xinyin Ma, Xinchao Wang
Preprint 2023

Citation

@inproceedings{fang2023depgraph,
  title={Depgraph: Towards any structural pruning},
  author={Fang, Gongfan and Ma, Xinyin and Song, Mingli and Mi, Michael Bi and Wang, Xinchao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={16091--16101},
  year={2023}
}

torch-pruning's People

Contributors

eltociear, flyzxm5177, ghimiredhikura, hollylee2000, horseee, hovavalon, hyunseok-kim0, jonnykong, miocio-nora, nus-lv-admin, pleb631, serjio42, trouble404, vainf, xiwuchen


torch-pruning's Issues

Keyerror: MkldnnConvolutionBackward with 1D Layers

Hello,

I encountered an issue when trying to use pruning.DependencyGraph function with a CNN that has Conv1d, BatchNorm1d and F.avg_pool1d.

CNN Class

class Conv1DNet(nn.Module):
    def __init__(self, num_classes=10):
        super(Conv1DNet, self).__init__()
 
        self.conv1 = nn.Conv1d(1, 64, kernel_size=3, stride=2, padding=1, bias=False)
        self.bn1 = nn.BatchNorm1d(64)
        self.linear = nn.Linear(512, num_classes)
 
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.avg_pool1d(out, 2*16*281)
        feature = out.view(out.size(0), -1)
        out = self.linear(feature)
        return out

Code run

model = Conv1DNet()

DG = pruning.DependencyGraph(model, fake_input=torch.randn(1,1, 144000))

pruning_plan = DG.get_pruning_plan(model.conv1, pruning.prune_conv, idxs=[2, 6, 9] )
print(pruning_plan)

pruning_plan.exec()

Output

Traceback (most recent call last):
  File "issue_torch_pruning.py", line 27, in <module>
    DG = pruning.DependencyGraph(model, fake_input=torch.randn(1,1, 144000) )
  File "C:\Users\User\Anaconda3\envs\env-gpu\lib\site-packages\torch_pruning\dependency.py", line 186, in __init__
    self.build_dependency(model, fake_input)
  File "C:\Users\User1\Anaconda3\envs\env-gpu\lib\site-packages\torch_pruning\dependency.py", line 321, in build_dependency
    self._traverse_graph(out.grad_fn)
  File "C:\Users\User1\Anaconda3\envs\env-gpu\lib\site-packages\torch_pruning\dependency.py", line 437, in _traverse_graph
    _recursively_detect_dependencies(begin_node, 0)
  File "C:\Users\User1\Anaconda3\envs\env-gpu\lib\site-packages\torch_pruning\dependency.py", line 434, in _recursively_detect_dependencies
    _recursively_detect_dependencies(u[0], path_id)
  File "C:\Users\User1\Anaconda3\envs\env-gpu\lib\site-packages\torch_pruning\dependency.py", line 434, in _recursively_detect_dependencies
    _recursively_detect_dependencies(u[0], path_id)
  File "C:\Users\User1\Anaconda3\envs\env-gpu\lib\site-packages\torch_pruning\dependency.py", line 407, in _recursively_detect_dependencies
    node_module = self.grad_fn_to_module[node]
KeyError: <MkldnnConvolutionBackward object at 0x000001A1AE08B448>

Thanks for your help !

Functionality to add rounding of filters number for pruning

Hi. I think it would be useful to add functionality to round the number of pruned channels to a provided multiple (32 or 16, for example). I've made it locally in prune/strategy.py. It really accelerates inference speed!
If it could be useful to others, I can try to make a pull request with this functionality this week. Any thoughts?
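Roughly, what I mean is something like this (just a sketch with made-up names, not the library's API):

def round_n_to_prune(total_channels, n_to_prune, round_to=16):
    # keep the number of remaining channels a multiple of `round_to`
    remaining = total_channels - n_to_prune
    remaining = max(round_to, (remaining // round_to) * round_to)
    return total_channels - remaining

# e.g. round_n_to_prune(64, 20) == 32, so 32 channels remain (a multiple of 16)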

problem in assigning different pruning indices to different layers in RESNET-56

I was trying to prune Resnet56

Code given below for the model

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import init
import math

class DownsampleA(nn.Module):  

  def __init__(self, nIn, nOut, stride):
    super(DownsampleA, self).__init__() 
    self.avg = nn.AvgPool2d(kernel_size=1, stride=stride)   

  def forward(self, x):   
    x = self.avg(x)  
    return torch.cat((x, x.mul(0)), 1)  

class DownsampleC(nn.Module):     

  def __init__(self, nIn, nOut, stride):
    super(DownsampleC, self).__init__()
    assert stride != 1 or nIn != nOut
    self.conv = nn.Conv2d(nIn, nOut, kernel_size=1, stride=stride, padding=0, bias=False)

  def forward(self, x):
    x = self.conv(x)
    return x

class ResNetBasicblock(nn.Module):
  expansion = 1
  """
  RexNet basicblock (https://github.com/facebook/fb.resnet.torch/blob/master/models/resnet.lua)
  """
  def __init__(self, inplanes, planes, stride=1, downsample=None):
    super(ResNetBasicblock, self).__init__()

    self.conv_a = nn.Conv2d(inplanes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
    self.bn_a = nn.BatchNorm2d(planes)

    self.conv_b = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
    self.bn_b = nn.BatchNorm2d(planes)

    self.downsample = downsample

  def forward(self, x):
    residual = x

    basicblock = self.conv_a(x)
    basicblock = self.bn_a(basicblock)
    basicblock = F.relu(basicblock, inplace=True)

    basicblock = self.conv_b(basicblock)
    basicblock = self.bn_b(basicblock)

    if self.downsample is not None:
      residual = self.downsample(x)
    
    return F.relu(residual + basicblock, inplace=True)

class CifarResNet(nn.Module):
  """
  ResNet optimized for the Cifar dataset, as specified in
  https://arxiv.org/abs/1512.03385.pdf
  """
  def __init__(self, block, depth, num_classes):
    """ Constructor
    Args:
      depth: number of layers.
      num_classes: number of classes
      base_width: base width
    """
    super(CifarResNet, self).__init__()

    #Model type specifies number of layers for CIFAR-10 and CIFAR-100 model
    assert (depth - 2) % 6 == 0, 'depth should be one of 20, 32, 44, 56, 110'
    layer_blocks = (depth - 2) // 6
    print ('CifarResNet : Depth : {} , Layers for each block : {}'.format(depth, layer_blocks))

    self.num_classes = num_classes

    self.conv_1_3x3 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=False)
    self.bn_1 = nn.BatchNorm2d(16)

    self.inplanes = 16
    self.stage_1 = self._make_layer(block, 16, layer_blocks, 1)
    self.stage_2 = self._make_layer(block, 32, layer_blocks, 2)
    self.stage_3 = self._make_layer(block, 64, layer_blocks, 2)
    self.avgpool = nn.AvgPool2d(8)
    self.classifier = nn.Linear(64*block.expansion, num_classes)

    for m in self.modules():
      if isinstance(m, nn.Conv2d):
        n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
        m.weight.data.normal_(0, math.sqrt(2. / n))
        #m.bias.data.zero_()
      elif isinstance(m, nn.BatchNorm2d):
        m.weight.data.fill_(1)
        # m.bias.data.zero_()
      elif isinstance(m, nn.Linear):
        init.kaiming_normal(m.weight)
        # m.bias.data.zero_()

  def _make_layer(self, block, planes, blocks, stride=1):
    downsample = None
    if stride != 1 or self.inplanes != planes * block.expansion:
      downsample = DownsampleA(self.inplanes, planes * block.expansion, stride)

    layers = []
    layers.append(block(self.inplanes, planes, stride, downsample))
    self.inplanes = planes * block.expansion
    for i in range(1, blocks):
      layers.append(block(self.inplanes, planes))

    return nn.Sequential(*layers)

  def forward(self, x):
    x = self.conv_1_3x3(x)
    x = F.relu(self.bn_1(x), inplace=True)
    x = self.stage_1(x)
    x = self.stage_2(x)
    x = self.stage_3(x)
    x = self.avgpool(x)
    x = x.view(x.size(0), -1)
    return self.classifier(x)

def resnet56(num_classes=10):
  """Constructs a ResNet-56 model for CIFAR-10 (by default)
  Args:
    num_classes (uint): number of classes
  """
  model = CifarResNet(ResNetBasicblock, 56, num_classes)
  return model

When we prune the very first layer of the ResNet (named 'conv_1_3x3' here, before the residual blocks), that layer is connected to the second conv layer of every residual block, so those layers also get pruned along with 'conv_1_3x3'. But when I try to assign different indices to different conv layers, they all get assigned the same indices.

What I mean is, let's say we have:
conv-layer-2 of resnet block-1 -> LYR_X (name to refer to later)
and
conv-layer-2 of resnet block-2 -> LYR_Y (name to refer to later)
They are also connected through the skip connections, as this is a ResNet.

I generate a pruning plan for pruning 'conv_1_3x3', let's say at indices [2,3,4].

Due to the dependency graph, LYR_X and LYR_Y also get assigned the same pruning indices [2,3,4],
BUT I want to assign different pruning indices to LYR_X and LYR_Y:
LYR_X -> [3,5,9]
LYR_Y -> [2,6,8]

Earlier you suggested manually changing the indices:

A temporary fix:
You can create a pruning plan, and modify the index of pruning_conv and pruning_related_xxx manually.

So I tried doing this:

pruning_plan.plan[0][1][:] = [2,8] # -> conv_1_3x3
pruning_plan.plan[5][1][:] = [3,6] # -> LYR_X 

print(pruning_plan.plan[0], pruning_plan.plan[5])

but for both layers I was getting [3,6], instead of different indices for different layers.

What I have observed is that it assigns the last assigned indices to all the layers; here that is [3,6].

Can you please tell me how I can assign different indices to different layers?

fc_node.inputs[0])

Sometimes I get an error in the form:

File "/workspace/code/Development/Workflow.py", line 843, in run_single_pruning_experiment
amount=pruning_spec["amount"], params= run_params[0])
File "/workspace/code/Development/Workflow.py", line 645, in prune_conv_layers_of_model
DG.build_dependency(model, example_inputs=inp)
File "/opt/conda/lib/python3.7/site-packages/torch_pruning/dependency.py", line 341, in build_dependency
self.update_index()
File "/opt/conda/lib/python3.7/site-packages/torch_pruning/dependency.py", line 512, in update_index
self._set_fc_index_transform( node )
File "/opt/conda/lib/python3.7/site-packages/torch_pruning/dependency.py", line 523, in _set_fc_index_transform
feature_channels = _get_out_channels_of_in_node(fc_node.inputs[0])
IndexError: list index out of range

I don't understand the reason for the problem; I don't change my model, and sometimes it works and sometimes it doesn't. So there is probably a bug in the code. Did anyone else face this problem?

[Not an Issue] Thank you

Hi all,

This is not an issue, but a thank you for this amazing project.

I have tested several PyTorch pruning libraries and written my own, and so far this is the best library that really delivers what it promises: a smaller/faster model without too much accuracy loss, even for complicated architectures.

So thank you :)

Feel free to close this issue after you read our appreciation.

Prune Conv to FC bug

Hi @VainF,
After pruning, conv -> linear, shouldn't the shape for the linear be (8,12) instead of (8, 15) ? Here is a minimal working example to reproduce:

import sys, os
sys.path.append(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_pruning as tp

def seed_everything(seed: int):
    import random, os
    import numpy as np
    import torch
    
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = True
    
seed_everything(42)

class NN(nn.Module):

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=2, out_channels=4, kernel_size=3)
        self.linear1 = nn.Linear(in_features=16, out_features=8)
        
    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = torch.flatten(x)
        x = self.linear1(x)
        x = F.relu(x)
        return x

x = torch.randn(1,2,4,4)
model = NN()
strategy = tp.strategy.RandomStrategy()
idxs = strategy(model.conv1.weight, amount=0.25)
DG = tp.DependencyGraph()
DG.build_dependency(model, example_inputs=x)

print(model.conv1.weight.shape) # (4, 2, 3, 3)
print(model.linear1.weight.shape) # (8, 16) 

pruning_plan = DG.get_pruning_plan(model.conv1, tp.prune_conv, idxs=idxs)
pruning_plan.exec()

print(model.conv1.weight.shape) # Expected: (3, 2, 3, 3) / Res: (3, 2, 3, 3)
print(model.linear1.weight.shape) # Expected: (8, 12) / Res: (8, 15) 

Are quantized networks supported?

Hi,
I'm curious to know whether the quantized version of networks are supported, as today I tried that and faced this issue :

QuantizedResnet18 took 35.105 ms [min/max: 35.1/35.1] ms for one forward pass!
Size (MB): 22.23 (initial 87.9)
Number of Parameters: 0.0M
normal resnet took 3624.206 ms [min/max: 3624.2/3624.2] ms 
start of pruning...
Traceback (most recent call last):
  File "d:\Codes\face\python\FV\Pruning\prune.py", line 91, in <module>
    model = prune_model(model)
  File "d:\Codes\face\python\FV\Pruning\prune.py", line 76, in prune_model
    prune_conv( m.conv1, block_prune_probs[blk_id] )
  File "d:\Codes\face\python\FV\Pruning\prune.py", line 58, in prune_conv
    weight = conv.weight.detach().cpu().numpy()
AttributeError: 'function' object has no attribute 'detach'

Seems like quantized operators are not supported. Is that true, or am I missing something?
Thanks in advance

Global unstructured pruning

Hi, how can I implement global unstructured pruning using this library? It seems I can only prune individual layers and not the entire model

Thanks

Just specifying first layer in transformer, whole model is getting pruned

By specifying just the first layer, the pruning tool is pruning the whole model. A code snippet:

pruning_idxs = strategy(model.electra.embeddings.word_embeddings.weight, amount=0.4)
pruning_plan = DG.get_pruning_plan(model.electra.embeddings.word_embeddings, tp.prune_embedding, idxs=pruning_idxs)

No idea why it's pruning layers other than the specified one.

How to prune VGGNet like networks which incorporate linear layers?

Hi @VainF,
What's the best way to prune VGGNet-like architectures?
I found myself re-adding the classifier, since adding linear layers to the list of prunable layers will also prune the classifier at the end.
Currently I'm doing:

import torch
import torch.nn as nn
import torch_pruning as tp
import random

def random_prune(model, example_inputs, output_transform):
    model.cpu().eval()
    prunable_module_type = ( nn.Conv2d, nn.BatchNorm2d, nn.Linear )
    prunable_modules = [ m for m in model.modules() if isinstance(m, prunable_module_type) ]
    ori_size = tp.utils.count_params( model )
    DG = tp.DependencyGraph().build_dependency( model, example_inputs=example_inputs, output_transform=output_transform )
    for layer_to_prune in prunable_modules:
        # select a layer
        if isinstance( layer_to_prune, nn.Conv2d ):
            prune_fn = tp.prune_conv
        elif isinstance(layer_to_prune, nn.BatchNorm2d):
            prune_fn = tp.prune_batchnorm
        elif isinstance(layer_to_prune, nn.Linear):
            prune_fn = tp.prune_linear
            
        ch = tp.utils.count_prunable_channels( layer_to_prune )
        rand_idx = random.sample( list(range(ch)), min( ch//2, 10 ) )
        plan = DG.get_pruning_plan( layer_to_prune, prune_fn, rand_idx)
        plan.exec()

    print(model)
    with torch.no_grad():
        out = model( example_inputs )
        if output_transform:
            out = output_transform(out)
        print( "  Params: %s => %s"%( ori_size, tp.utils.count_params(model) ) )
        print( "  Output: ", out.shape )
        print("------------------------------------------------------\n")
    return model

Here is the toy model I'm using at the moment :

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, self.fc1.in_features)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

And this is how it's used:

example_inputs = torch.randn(1, 3, 32, 32)
output_transform=None
net2 = random_prune(net, example_inputs=example_inputs, output_transform=output_transform)
net2.fc3 = nn.Linear(net2.fc2.out_features, 10)
print(net2)

What's the best way of pruning VGG-like networks in this case?
Thank you very much in advance

Refactor into PyTorch BasePruningMethod ?

Dear @VainF,

Awesome work with this library.
I wondered if you could refactor your library to match the prune API provided by PyTorch.

It would make integration into upstream libraries simpler.

Best,
T.C

Not able to handle Conv-FC Dependency

I used the model below, trained it, and pruned all conv layers with prob 0.3, as given in the code for the L1 norm.

I gave the same input as in the ResNet example, but it is not handling the conv-fc dependency.

class my_model(nn.Module):
  def __init__(self):
    super(my_model,self).__init__()
    self.conv1 = nn.Conv2d(3,16,kernel_size=3,stride=1,padding=1)
    self.conv2 = nn.Conv2d(16,32,kernel_size=3,stride=1,padding=1)
    self.conv3 = nn.Conv2d(32,64,kernel_size=3,stride=1,padding=1)
    self.pool = nn.MaxPool2d(2, 2)
    self.fc1 = nn.Linear(4*4*64,64)
    self.fc2 = nn.Linear(64,10)
  def forward(self,inp):
    ab = self.pool(F.relu(self.conv1(inp)))
    ab = self.pool(F.relu(self.conv2(ab)))
    ab = self.pool(F.relu(self.conv3(ab)))
    ab = ab.view(ab.shape[0],-1)
    ab = F.relu(self.fc1(ab))
    ab = F.relu(self.fc2(ab))
    return ab
    

I got this warning
Warning: Unrecognized Conv-FC Dependency. Please handle the dependency manually
Warning: Unrecognized Conv-FC Dependency. Please handle the dependency manually

Can you please help with this?

After pruning, the Resnet18 has negative channel (RuntimeError: Given groups=1, expected weight to be at least 1 at dimension 0, but got weight of size [0, 119, 3, 3] instead)

Hi,
This is a follow-up to the previous issue #7, which was fixed earlier. However, after the model is successfully pruned (the parameters are down to 4.7M from the initial 21.8M), the model fails to do a forward pass and I get this error:

Traceback (most recent call last):
  File "d:\Codes\face\python\FV\Pruning\prune.py", line 67, in <module>
    out = model(img_fake)
  File "C:\Users\User\Anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "d:\codes\face\python\FV\models.py", line 212, in forward
    x = self.layer4(x)
  File "C:\Users\User\Anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\User\Anaconda3\Lib\site-packages\torch\nn\modules\container.py", line 117, in forward
    input = module(input)
  File "C:\Users\User\Anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "d:\codes\face\python\FV\models.py", line 139, in forward
    out = self.conv1(out)
  File "C:\Users\User\Anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\User\Anaconda3\Lib\site-packages\torch\nn\modules\conv.py", line 419, in forward
    return self._conv_forward(input, self.weight)
  File "C:\Users\User\Anaconda3\Lib\site-packages\torch\nn\modules\conv.py", line 416, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, expected weight to be at least 1 at dimension 0, but got weight of size [0, 119, 3, 3] instead

Looking at the model after pruning, I noticed:

      (bn0): BatchNorm2d(119, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(119, -105, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(-105, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(-105, 151, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(151, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(119, 151, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(151, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

The Conv2d has negative output/input channels, and so does the BatchNorm! This seems to be causing the issue.

issues in assigning different prune indices to different conv layers

I set pruning_prob to 0.3. When we prune only the second conv layer, or both the first and second conv layers, the dimensions come out to be 32, but they should be 45, since 64 - int(64*0.3) = 45.

Just to emphasize the point, I have given:
-> the original arch
-> the arch when only the first conv layer in the residual block is pruned (this part is fine)
-> the arch when only the second conv layer in the residual block is pruned (this is the problem)

Here I have explained only the first residual block, but the same applies to further blocks as well.

I think it is pruning twice, since 45 - int(45*0.3) = 32, which is what we are getting.
Please resolve the issue.

---------------------Before Pruning----------------------------------------

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (linear): Linear(in_features=512, out_features=10, bias=True)
)

--------------------------After Pruning | only first conv layer in residual block------------------------

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 45, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(45, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(45, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 45, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(45, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(45, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 90, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(90, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(90, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 90, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(90, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(90, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 180, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(180, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(180, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 180, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(180, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(180, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 359, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(359, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(359, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 359, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(359, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(359, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (linear): Linear(in_features=512, out_features=10, bias=True)
)


----------------- After pruning | only second conv layer in residual block ------------------------

ResNet(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (1): BasicBlock(
      (conv1): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 63, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(63, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(32, 63, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(63, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(63, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 63, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(63, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(63, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 126, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(126, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(63, 126, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(126, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(126, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 126, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(126, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(126, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 252, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(252, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(126, 252, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(252, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(252, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 252, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(252, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (linear): Linear(in_features=252, out_features=10, bias=True)
)

About L1_Norm Sort

Thank you for your wonderful work!
It seems that there is no sparse training to determine how to select channels; the weights are simply sorted. If the weights have both positive and negative values and go through the following code:
L1_norm = np.sum(weight, axis=(1,2,3))
the result can also be close to 0.
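To illustrate the concern with made-up numbers (not code from the library): a filter whose positive and negative weights cancel gets a near-zero score from the signed sum, even though its absolute (true L1) norm is large.

import numpy as np

weight = np.array([[[[0.5, -0.5],
                     [0.5, -0.5]]]])            # one filter whose weights cancel out
print(np.sum(weight, axis=(1, 2, 3)))           # [0.]  signed sum is close to 0
print(np.sum(np.abs(weight), axis=(1, 2, 3)))   # [2.]  the L1 norm is large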

Feature Request - Grouped convolutions

Hi,

@VainF Thank you very much for this project, great work !

I was wondering if you are planning on adding support for conv layers with an arbitrary groups parameter (currently there is only support when groups=in_channels=out_channels, a known issue in the README)?

Thank you in advance !

Error while trying to prune Lenet-5 architecture with MNIST dataset

DG.build_dependency(model, example_inputs=th.randn(1,1,28,28))
File "/usr/local/lib/python3.6/dist-packages/torch_pruning/dependency.py", line 309, in build_dependency
self.update_index()
File "/usr/local/lib/python3.6/dist-packages/torch_pruning/dependency.py", line 315, in update_index
self._set_fc_index_transform( node )
File "/usr/local/lib/python3.6/dist-packages/torch_pruning/dependency.py", line 437, in _set_fc_index_transform
feature_channels = _get_in_node_out_channels(fc_node.inputs[0])
IndexError: list index out of range

KeyError while attempting to prune yolov5 model

Hello,

I am trying to prune yolov5 model using Torch_Pruning. It fails with error message:
KeyError: Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))

Detailed Traceback:
Traceback (most recent call last):
File "models/prune_TP_git_issue.py", line 117, in
new_model = prune_model(model, img)
File "models/prune_TP_git_issue.py", line 64, in prune_model
prune_conv(mm3, SPARSITY)
File "models/prune_TP_git_issue.py", line 48, in prune_conv
raise e
File "models/prune_TP_git_issue.py", line 46, in prune_conv
plan = DG.get_pruning_plan(conv, tp.prune_conv, prune_index)
File "/home/tfs/venv_yolov5_Torch_Pruning/lib/python3.6/site-packages/torch_pruning-0.2.2-py3.6.egg/torch_pruning/dependency.py", line 330, in get_pruning_plan
KeyError: Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))

Attaching the prune_TP.py code:
(a) save the code under the models dir
(b) download the pre-trained model and place it under the weights dir
(c) invoke the code from the main dir as follows:
$ python models/prune_TP_git_issue.py

Also attaching the logfile from this run. It looks like the DG.module_to_node dictionary is not built correctly; it seems to have only one entry:
------------ BEGIN : DG.module_to_node ---------------------
{Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1)): <Node: (model.24.m.2 (Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))), None)>}
------------ END : DG.module_to_node ---------------------

prune_TP_git_issue.yolov5.py.zip
log.prune_TP_git_issue.txt
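
A small diagnostic sketch that may help narrow this down (my own suggestion, not an official debugging tool): after build_dependency, list the Conv2d layers that were not recorded in DG.module_to_node, so any layer that would raise the KeyError can be spotted before calling get_pruning_plan.

# Assumes DG has already been built with DG.build_dependency(model, example_inputs=...)
traced = set(DG.module_to_node.keys())
for name, m in model.named_modules():
    if isinstance(m, torch.nn.Conv2d) and m not in traced:
        print("not traced:", name, m)   # these layers were not reached during the forward trace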

Resnet Pruning Confusion

Could you please clarify: when pruning a ResNet, if we prune conv1 in a ResNet block and, say, remove indices (1, 3, 4) in that layer (according to the L1 norm), do the layers that depend on this layer get pruned at indices (1, 3, 4) only, or is the L1 norm applied again for the dependent layers?
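
One way to see the answer directly (a sketch using the DependencyGraph API that appears throughout these issues; the layer and indices below are just examples matching the question): print the pruning plan and check which indices each dependent layer receives.

import torch
import torch_pruning as tp
from torchvision.models import resnet18

model = resnet18().eval()
DG = tp.DependencyGraph()
DG.build_dependency(model, example_inputs=torch.randn(1, 3, 224, 224))

# Ask for indices (1, 3, 4) on one conv and print the resulting plan; the printout
# lists the pruning operation and indices propagated to every coupled layer.
plan = DG.get_pruning_plan(model.layer1[0].conv1, tp.prune_conv, idxs=[1, 3, 4])
print(plan)

In a plan printed this way, the dependent layers are listed with the propagated indices (possibly remapped by an index transform for operations such as concatenation or flattening) rather than with a freshly applied L1 ranking.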

KeyError on Simple Model

Hey there! Thanks for the repo and the great work

I tried it with a simple model:

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        
        self.conv1 =  nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)

    def forward(self, x):
        print("x: ", x.shape)
        y = self.conv1(x)
        print("y: ", y.shape)
        x = F.relu(y)
        z = self.conv2(x)
        print("z: ", z.shape)
        return x


net = Net()

inn = torch.randn((1,3, 256, 256))
out = net(inn)



import torch_pruning as tp
net.eval().cpu()

input_tensor = inn.clone().cpu()
# 1. setup strategy (L1 Norm)
strategy = tp.strategy.L1Strategy() # or tp.strategy.RandomStrategy()

# 2. build layer dependency for resnet18
DG = tp.DependencyGraph()
DG.build_dependency(net, example_inputs=input_tensor)
print("modules!: ", net.modules())
excluded_layers = [ ] # list(model.model[-1].modules())
num_params_before_pruning = tp.utils.count_params( net )
for m in net.modules():

    if isinstance( m, torch.nn.Conv2d ):
        print("m.groups: ", m.groups)
        if m.groups < 2:
            prune_fn = tp.prune_conv
            idxss = strategy(m.weight, amount=0.1)
            pruning_plan = DG.get_pruning_plan( m, prune_fn, idxs=idxss )
            print(pruning_plan)
            pruning_plan.exec()
    else:
        continue


num_params_after_pruning = tp.utils.count_params( net )
print( "  Params: %s => %s"%( num_params_before_pruning, num_params_after_pruning))

Running this code gives me the KeyError mentioned in the title.


Am I doing something wrong here?

Thanks in advance
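
One thing that stands out in the Net above (an observation, not a confirmed diagnosis): forward computes z = self.conv2(x) but returns x, so the traced output never depends on conv2, and a layer that is unreachable from the output presumably cannot be registered in the dependency graph. A version whose output actually uses conv2 would look like:

    def forward(self, x):
        y = self.conv1(x)
        x = F.relu(y)
        z = self.conv2(x)
        return z  # return the tensor that depends on conv2 so the layer appears in the trace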

About weight mismatch

Hi @VainF! Thank you for your work, but I have run into a strange problem:
After pruning my own model, the weights no longer match.
(RuntimeError: Given groups=1, weight of size [31, 32, 3, 3], expected input[16, 31, 38, 38] to have 32 channels, but got 31 channels instead)
Therefore, I checked the model's weights, and the results are as follows:
(0): Conv2d(3, 31, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): PReLU(num_parameters=1)
(2): Conv2d(32, 31, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): PReLU(num_parameters=1)
(4): Conv2d(32, 31, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): PReLU(num_parameters=1)
(6): Conv2d(32, 31, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): PReLU(num_parameters=1)
(8): Conv2d(32, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): PReLU(num_parameters=1)

The code I used is as follows:

def pruning():
    Model_saved = './Weight/'
    model = torch.load(Model_saved + 'Model.pth').cpu()

    DG = tp.DependencyGraph()
    DG.build_dependency(model, example_inputs=torch.randn(1, 3, 38, 38).float().cpu())

    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            mode = tp.prune_conv
        elif isinstance(m, nn.Linear):
            mode = tp.prune_linear
        elif isinstance(m, nn.BatchNorm2d):
            mode = tp.prune_batchnorm
        else:
            continue

        weight = m.weight.detach().cpu().numpy()
        out_channels = weight.shape[0]
        L1_norm = np.sum(np.abs(weight))
        num_pruned = int(out_channels * 0.2)
        prune_index = np.argsort(L1_norm)[:num_pruned].tolist()
        pruning_plan = DG.get_pruning_plan(m, mode, idxs=prune_index)
        print(pruning_plan)
        pruning_plan.exec()
    return model

In fact, all the Inception blocks in my model run into this problem, and I'm not sure of the specific reason.
I hope I can get your advice; thank you for your help.
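
One detail worth noting in the snippet above (an observation, not a confirmed diagnosis of the mismatch): np.sum(np.abs(weight)) is called without an axis argument, so L1_norm collapses to a single scalar rather than one score per output channel, and the subsequent argsort cannot rank channels as intended. A per-channel score keeps the output-channel axis separate, and the axis differs by layer type:

# Per-output-channel L1 scores; weight shapes: Conv2d (out, in, kH, kW), Linear (out, in), BatchNorm2d (out,)
if isinstance(m, nn.Conv2d):
    L1_norm = np.sum(np.abs(weight), axis=(1, 2, 3))
elif isinstance(m, nn.Linear):
    L1_norm = np.sum(np.abs(weight), axis=1)
else:  # nn.BatchNorm2d
    L1_norm = np.abs(weight)
prune_index = np.argsort(L1_norm)[:num_pruned].tolist()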

Unable to prune

When I try to prune https://github.com/chenjun2hao/DDRNet.pytorch,
I get the following error.

def prune_model(model):
    model.cpu()
    DG = tp.DependencyGraph().build_dependency(model, torch.randn((1,3,1024,2048)))
    def prune_conv(conv, amount=0.2):
        strategy = tp.strategy.L1Strategy()
        pruning_index = strategy(conv.weight, amount=amount)
        print(pruning_index)
        # weight = conv.weight.detach().cpu().numpy()
        # out_channels = weight.shape[0]
        # L1_norm = np.sum( np.abs(weight), axis=(1,2,3))
        # num_pruned = int(out_channels * amount)
        # pruning_index = np.argsort(L1_norm)[:num_pruned].tolist() # remove filters with small L1-Norm
        plan = DG.get_pruning_plan(conv, tp.prune_conv, pruning_index)
        plan.exec()

    prunable_modules = [ m for m in model.modules() if isinstance(m, nn.Conv2d) ]
    for layer_to_prune in prunable_modules:
        print(layer_to_prune)
        prune_conv(layer_to_prune, 0.5)

    return model

=> loading final_layer.bn1.num_batches_tracked from pretrained model
=> loading final_layer.conv1.weight from pretrained model
=> loading final_layer.bn2.weight from pretrained model
=> loading final_layer.bn2.bias from pretrained model
=> loading final_layer.bn2.running_mean from pretrained model
=> loading final_layer.bn2.running_var from pretrained model
=> loading final_layer.bn2.num_batches_tracked from pretrained model
=> loading final_layer.conv2.weight from pretrained model
=> loading final_layer.conv2.bias from pretrained model

Number of Parameters before pruner: 5.7M
torch.Size([1, 3, 128, 256])
Traceback (most recent call last):
File "./tools/pruner_new.py", line 142, in
main()
File "./tools/pruner_new.py", line 134, in main
prune_model(model)
File "./tools/pruner_new.py", line 84, in prune_model
prune_conv(layer_to_prune, 0.8)
File "./tools/pruner_new.py", line 63, in prune_conv
plan = DG.get_pruning_plan(conv, tp.prune_conv, pruning_index)
File "/algdata02/yiming.yu/DDRNet.pytorch_pruner/envp_20210903/lib/python3.7/site-packages/torch_pruning/dependency.py", line 378, in get_pruning_plan
root_node = self.module_to_node[module]
KeyError: Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)

Model Size of Pruned model

I'm just starting out with model pruning and your work really helps a lot. I would really like to know how you calculated the size of the pruned model. Thank you.
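
For reference, a rough sketch of how model size is often estimated (an assumption on my part, not necessarily the exact method used in the examples): count the remaining parameters and multiply by the bytes per parameter.

num_params = tp.utils.count_params(model)     # helper that appears elsewhere in these issues
size_mb = num_params * 4 / (1024 ** 2)        # assuming float32 weights, 4 bytes each
print("Model size: %.2f MB" % size_mb)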

RecursionError during pruning

Hi, thanks for a wonderful tool. I am trying to test it out with a pretrained model from here. However, I am encountering the following error:

module name: encode1.conv0
pruning_idxs: [4, 5, 8, 9, 13, 14, 16, 17, 18, 19, 21, 23, 24, 26, 28, 30, 31, 32, 35, 36, 37, 38, 39, 40, 41, 43, 46, 48, 49, 53, 56, 58]
Traceback (most recent call last):
  File "/home/nikhil/projects/green_comp_neuro/FastSurfer/FastSurferCNN/torch_prune_test.py", line 159, in <module>
    load_pretrained(pretrained_ckpt, params_model, model, dummy_data, save_path)
  File "/home/nikhil/projects/green_comp_neuro/FastSurfer/FastSurferCNN/torch_prune_test.py", line 120, in load_pretrained
    model = torch_prune(model, dummy_data, params_model['prune_type'], params_model['prune_percent'])
  File "/home/nikhil/projects/green_comp_neuro/FastSurfer/FastSurferCNN/torch_prune_test.py", line 92, in torch_prune
    pruning_plan = DG.get_pruning_plan( module, tp.prune_conv, idxs=pruning_idxs )
  File "../../Torch-Pruning/torch_pruning/dependency.py", line 398, in get_pruning_plan
    _fix_denpendency_graph(root_node, pruning_fn, idxs)
  File "../../Torch-Pruning/torch_pruning/dependency.py", line 397, in _fix_denpendency_graph
    _fix_denpendency_graph(dep.broken_node, dep.handler, new_indices)
  File "../../Torch-Pruning/torch_pruning/dependency.py", line 397, in _fix_denpendency_graph
    _fix_denpendency_graph(dep.broken_node, dep.handler, new_indices)
  File "../../Torch-Pruning/torch_pruning/dependency.py", line 397, in _fix_denpendency_graph
    _fix_denpendency_graph(dep.broken_node, dep.handler, new_indices)
  [Previous line repeated 990 more times]
  File "../../Torch-Pruning/torch_pruning/dependency.py", line 387, in _fix_denpendency_graph
    new_indices = dep.index_transform(indices)
  File "../../Torch-Pruning/torch_pruning/dependency.py", line 148, in __call__
    if self.reverse==True:
RecursionError: maximum recursion depth exceeded in comparison

The network architecture is based on this paper. Here is a figure showing the details:
[architecture figure omitted]

Below is my test script that uses the model definition and pretrained weights from the model repo

# IMPORTS
import argparse
import nibabel as nib
import numpy as np
from datetime import datetime
import time
import sys
import os
import glob
import os.path as op
import logging
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
import torch.nn.functional as F

from torch.autograd import Variable
from torch.utils.data.dataloader import DataLoader
from torchvision import transforms, utils

from scipy.ndimage.filters import median_filter, gaussian_filter
from skimage.measure import label, regionprops
from skimage.measure import label

from collections import OrderedDict
from os import makedirs

from models.networks import FastSurferCNN
import pandas as pd

# torch-pruning
sys.path.append('../../Torch-Pruning')
import torch_pruning as tp

def options_parse():
    """
    Command line option parser
    """
    parser = argparse.ArgumentParser()

    # Options for model parameters setup (only change if model training was changed)
    parser.add_argument('--num_filters', type=int, default=64,
                        help='Filter dimensions for DenseNet (all layers same). Default=64')
    parser.add_argument('--num_classes_ax_cor', type=int, default=79,
                        help='Number of classes to predict in axial and coronal net, including background. Default=79')
    parser.add_argument('--num_classes_sag', type=int, default=51,
                        help='Number of classes to predict in sagittal net, including background. Default=51')
    parser.add_argument('--num_channels', type=int, default=7,
                        help='Number of input channels. Default=7 (thick slices)')
    parser.add_argument('--kernel_height', type=int, default=5, help='Height of Kernel (Default 5)')
    parser.add_argument('--kernel_width', type=int, default=5, help='Width of Kernel (Default 5)')
    parser.add_argument('--stride', type=int, default=1, help="Stride during convolution (Default 1)")
    parser.add_argument('--stride_pool', type=int, default=2, help="Stride during pooling (Default 2)")
    parser.add_argument('--pool', type=int, default=2, help='Size of pooling filter (Default 2)')

    sel_option = parser.parse_args()

    return sel_option

def torch_prune(model,dummy_data,prune_type,prune_percent):

    print(f'compressing model with prune type: {prune_type}, sparsity: {prune_percent}')

    # 1. setup strategy (L1 Norm)
    strategy = tp.strategy.L1Strategy() # or tp.strategy.RandomStrategy()

    # 2. build layer dependency for resnet18
    DG = tp.DependencyGraph()
    DG.build_dependency(model, example_inputs=dummy_data)

    # 3. get a pruning plan from the dependency graph.
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Conv2d):
            print(f'module name: {name}')
           
            pruning_idxs = strategy(module.weight, amount=prune_percent) # or manually selected pruning_idxs=[2, 6, 9, ...]
            print(f'pruning_idxs: {pruning_idxs}')
            pruning_plan = DG.get_pruning_plan( module, tp.prune_conv, idxs=pruning_idxs )
            print(pruning_plan)

            # 4. execute this plan (prune the model)
            pruning_plan.exec()


def load_pretrained(pretrained_ckpt, params_model, model):
    model_state = torch.load(pretrained_ckpt, map_location=params_model["device"])
    new_state_dict = OrderedDict()

    # FastSurfer model specific configs
    for k, v in model_state["model_state_dict"].items():

        if k[:7] == "module." and not params_model["model_parallel"]:
            new_state_dict[k[7:]] = v

        elif k[:7] != "module." and params_model["model_parallel"]:
            new_state_dict["module." + k] = v

        else:
            new_state_dict[k] = v

    model.load_state_dict(new_state_dict)
    model.eval()
    
    return model

if __name__ == "__main__":

    args = options_parse() 

    plane = "Axial"
    pretrained_ckpt = f'../checkpoints/{plane}_Weights_FastSurferCNN/ckpts/Epoch_30_training_state.pkl'

    # Put it onto the GPU or CPU
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

    # Set up model for axial and coronal networks
    params_model = {'num_channels': args.num_channels, 'num_filters': args.num_filters,
                      'kernel_h': args.kernel_height, 'kernel_w': args.kernel_width,
                      'stride_conv': args.stride, 'pool': args.pool,
                      'stride_pool': args.stride_pool, 'num_classes': args.num_classes_ax_cor,
                      'kernel_c': 1, 'kernel_d': 1,
                      'model_parallel': False,
                      'device': device
                      }

    # Select the model
    model = FastSurferCNN(params_model)
    model.to(device)
 
    # Load pretrained weights
    model = load_pretrained(pretrained_ckpt, params_model, model)

    # Prune model
    dummy_data = torch.ones(1, 7, 256, 256)
    model = torch_prune(model, dummy_data, prune_type='L1', prune_percent=0.5)

    # Save pruned model
    # save_path = f'./{plane}_pruned.pth'
    # torch.save(model, save_path)

I will appreciate any help or suggestions! Thanks!
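
For what it's worth, one thing I would try first (an assumption, not a confirmed fix for this architecture): Python's default recursion limit is 1000, which is close to the ~990 repeated frames in the traceback, so raising the limit may distinguish a genuinely deep but finite dependency chain from an unbounded loop in the index transforms.

import sys

# Raise the interpreter's recursion limit before building the pruning plan;
# if the error persists at a much higher limit, the recursion is likely unbounded.
sys.setrecursionlimit(10000)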

how to get the definition of model class after pruning

Thanks for the excellent work! In general, deployment needs the definition of the model class. Only getting the model weights and architecture via torch.save(model, save_path) may cause trouble on other machines (for example). Is there a way to get the definition of the pruned model? Thanks :D

Recursion error in update_index

The recursion seems to have no termination condition. When running prune_resnet18_cifar10.py, this error is raised inside update_index; what is going on?
maximum recursion depth exceeded while calling a Python object

How do I compress the fully connected model

Hello, is it possible to compress a model like this?

class DNN(nn.Module):    # DNN network
    def __init__(self, input_size, num_classes, HIDDEN_UNITS):
        super().__init__()
        self.fc1 = nn.Linear(input_size, HIDDEN_UNITS)
        self.fc2 = nn.Linear(HIDDEN_UNITS, num_classes)
    def forward(self, x):
        x = F.relu(self.fc1(x))
        y_hat = self.fc2(x)
        return y_hat
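
A minimal sketch of how one might try this with the legacy DependencyGraph API used throughout these issues (assuming the DNN class above is defined with its nn/F imports; the sizes below are hypothetical, and whether L1Strategy and tp.prune_linear behave for Linear layers exactly as for convs is an assumption):

import torch
import torch_pruning as tp

model = DNN(input_size=128, num_classes=10, HIDDEN_UNITS=256)
DG = tp.DependencyGraph()
DG.build_dependency(model, example_inputs=torch.randn(1, 128))

strategy = tp.strategy.L1Strategy()
idxs = strategy(model.fc1.weight, amount=0.2)                    # rank hidden units by L1 norm
plan = DG.get_pruning_plan(model.fc1, tp.prune_linear, idxs=idxs)
plan.exec()                                                      # fc1.out_features and fc2.in_features shrink together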

How to assign 'example_inputs'? I get the following error

File "E:\workspace\Graduatio-Project\Torch-Pruning-master\torch_pruning\dependency.py", line 443, in _set_fc_index_transform
    stride = fc_in_features // feature_channels
ZeroDivisionError: integer division or modulo by zero

Can't build dependency graph for model with multiple inputs.

I have a model which takes 2 inputs, an image and embeddings.
Here are the example inputs that I use:

in1 = torch.rand(size=(1, 3, 256, 256))  
in2 = torch.rand(size=(512, 1))
out = model(in1, in2)

This is how I pass the 2 inputs. Now, for building the dependency graph, here is what I've tried:

strategy = tp.strategy.L1Strategy()
example_inputs=(Xt, embeds)

DG = tp.DependencyGraph()
DG.build_dependency(G, example_inputs=example_inputs)

Also, for ONNX and TensorFlow I faced the same problem, but I solved it by putting "*" ahead of the inputs:

out = G(*example_inputs)

This works. But DG.build_dependency(G, example_inputs=*example_inputs) gives an error. Let me know if something is unclear.
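
For what it's worth, example_inputs=*example_inputs is a Python syntax error regardless of the library (keyword arguments cannot be unpacked with *). A later issue below notes that lists and dictionaries of example inputs are accepted by build_dependency, so one sketch is to pass the tuple through unchanged and let the library call the model with it; whether the installed version supports this is not confirmed here:

example_inputs = (in1, in2)                                  # the two inputs shown above
DG = tp.DependencyGraph()
DG.build_dependency(model, example_inputs=example_inputs)    # pass the tuple itself, without unpacking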

Error when pruning keeps only a single filter

Error

   File "/home/reda/miniconda3/envs/yolov5/lib/python3.8/site-packages/torch_pruning/dependency.py", line 375, in get_pruning_plan
    _fix_denpendency_graph(root_node, pruning_fn, idxs)
  File "/home/reda/miniconda3/envs/yolov5/lib/python3.8/site-packages/torch_pruning/dependency.py", line 368, in _fix_denpendency_graph
    if len(new_indices)==0:
TypeError: object of type 'int' has no len()

What the node looks like

<Node: (6.m.0.cv2.conv (Conv2d(128, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)), <MkldnnConvolutionBackward object at 0x7f05ceabb2e0>)>

It looks like pruning this layer leaves a single-channel Conv2d, and new_indices then takes the value of a single int rather than a list.

Rounding bug

Hi @VainF. There is one more bug, found in Torch-Pruning/torch_pruning/prune/strategy.py in def round_pruning_amount.
The problem arises in cases like this:

round_pruning_amount(total_parameters=16, n_to_prune=1, round_to=16)
>> 16

After such a pruning step, the layer would be left with no parameters at all, which is an error. The correct behavior is to return a pruning amount of 0.
So I propose adding a correction line in def round_pruning_amount after line 11 of Torch-Pruning/torch_pruning/prune/strategy.py:
elif total_parameters <= round_to: return 0

We could also change the entire rounding function to clearer but rougher rounding logic:

def round_pruning_amount(total_parameters, n_to_prune, round_to):
    """round the parameter amount after pruning to an integer multiple of `round_to`.
    """
    n_remain = round_to*max(int(total_parameters - n_to_prune)//round_to, 1)
    return max(total_parameters - n_remain, 0)

Both variants would fix the existing bug. What do you think?
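
A quick sanity check of the proposed function, using the case above and the 30/1/8 case reported in the next issue (0 leaves the layer untouched instead of emptying it; 6 leaves 24 channels, a multiple of 8):

round_pruning_amount(total_parameters=16, n_to_prune=1, round_to=16)
>> 0
round_pruning_amount(total_parameters=30, n_to_prune=1, round_to=8)
>> 6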

Bug with rounding

Hi @VainF. I've found a bug in the rounding function round_pruning_amount from strategy.py.
The case has parameters total_parameters=30, n_to_prune=1, round_to=8: the current function returns -2, which raises an error; the correct return value is 6.
My suggestion is to add an extra condition on line 14 of strategy.py:
if (compensation < round_to // 2 or n_to_prune + compensation < round_to) and after_pruning > round_to:

torch.jit.trace of pruned model fails when L2Norm is involved

Hello,

I am attempting to use Torch-Pruning to prune an SSD model.

Note that I use this fork:
https://github.com/dkurt/ssd.pytorch/tree/opencv_support

Once I have pruned away some of the conv filters in the vgg layers, I get the following error from torch.jit.trace on the pruned model:

$ python prune_TP_git_issue.py --model ssd300_mAP_77.43_v2.pth
. . .
File "prune_TP_git_issue.py", line 62, in
model_output = torch.jit.trace(model, torch_image)
File "/home/tfs/venv_ssd.pytorch/lib/python3.6/site-packages/torch/jit/init.py", line 882, in trace
check_tolerance, _force_outplace, _module_class)
File "/home/tfs/venv_ssd.pytorch/lib/python3.6/site-packages/torch/jit/init.py", line 1034, in trace_module
module._c._create_method_from_trace(method_name, func, example_inputs, var_lookup_fn, _force_outplace)
File "/home/tfs/venv_ssd.pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 530, in call
result = self._slow_forward(*input, **kwargs)
File "/home/tfs/venv_ssd.pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 516, in _slow_forward
result = self.forward(*input, **kwargs)
File "/nas4/tfs/ssd.pytorch/ssd.py", line 89, in forward
s = self.L2Norm(x)
File "/home/tfs/venv_ssd.pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 530, in call
result = self._slow_forward(*input, **kwargs)
File "/home/tfs/venv_ssd.pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 516, in _slow_forward
result = self.forward(*input, **kwargs)
File "/nas4/tfs/ssd.pytorch/layers/modules/l2norm.py", line 23, in forward
out = self.weight.view(1, -1, 1, 1) * x
RuntimeError: The size of tensor a (512) must match the size of tensor b (256) at non-singleton dimension 1

Attaching the python file.
prune_TP_git_issue.ssd.pytorch.py.zip
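
A manual patch sketch for this specific mismatch (an assumption about the fork's L2Norm module, which appears to store one weight per input channel; keep_idxs below is a hypothetical list of the channel indices that survived pruning in the conv feeding L2Norm):

import torch

l2norm = model.L2Norm                     # the module raising the size mismatch in the traceback
l2norm.weight = torch.nn.Parameter(l2norm.weight.data.clone()[keep_idxs])  # slice to surviving channels
if hasattr(l2norm, "n_channels"):
    l2norm.n_channels = len(keep_idxs)    # assumption: the module also tracks its channel count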

pruning Resnet18 fails (KeyError: Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False))

Hi, thanks a lot for your kind and great contribution.
I am currently trying to prune a custom resnet18 model which was trained for face recognition.
The model is pretty much the same as the normal resnet18, with some minor differences (you can see the actual model definition here).

Here's my model, if you are interested:
<bound method Module.__repr__ of ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (prelu): PReLU(num_parameters=1)
  (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): IRBlock(
      (bn0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (se): SEBlock(
        (avg_pool): AdaptiveAvgPool2d(output_size=1)
        (fc): Sequential(
          (0): Linear(in_features=64, out_features=4, bias=True)
          (1): PReLU(num_parameters=1)
          (2): Linear(in_features=4, out_features=64, bias=True)
          (3): Sigmoid()
        )
      )
    )
    (1): IRBlock(
      (bn0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (se): SEBlock(
        (avg_pool): AdaptiveAvgPool2d(output_size=1)
        (fc): Sequential(
          (0): Linear(in_features=64, out_features=4, bias=True)
          (1): PReLU(num_parameters=1)
          (2): Linear(in_features=4, out_features=64, bias=True)
          (3): Sigmoid()
        )
      )
    )
  )
  (layer2): Sequential(
    (0): IRBlock(
      (bn0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (se): SEBlock(
        (avg_pool): AdaptiveAvgPool2d(output_size=1)
        (fc): Sequential(
          (0): Linear(in_features=128, out_features=8, bias=True)
          (1): PReLU(num_parameters=1)
          (2): Linear(in_features=8, out_features=128, bias=True)
          (3): Sigmoid()
        )
      )
    )
    (1): IRBlock(
      (bn0): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (se): SEBlock(
        (avg_pool): AdaptiveAvgPool2d(output_size=1)
        (fc): Sequential(
          (0): Linear(in_features=128, out_features=8, bias=True)
          (1): PReLU(num_parameters=1)
          (2): Linear(in_features=8, out_features=128, bias=True)
          (3): Sigmoid()
        )
      )
    )
  )
  (layer3): Sequential(
    (0): IRBlock(
      (bn0): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (se): SEBlock(
        (avg_pool): AdaptiveAvgPool2d(output_size=1)
        (fc): Sequential(
          (0): Linear(in_features=256, out_features=16, bias=True)
          (1): PReLU(num_parameters=1)
          (2): Linear(in_features=16, out_features=256, bias=True)
          (3): Sigmoid()
        )
      )
    )
    (1): IRBlock(
      (bn0): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (se): SEBlock(
        (avg_pool): AdaptiveAvgPool2d(output_size=1)
        (fc): Sequential(
          (0): Linear(in_features=256, out_features=16, bias=True)
          (1): PReLU(num_parameters=1)
          (2): Linear(in_features=16, out_features=256, bias=True)
          (3): Sigmoid()
        )
      )
    )
  )
  (layer4): Sequential(
    (0): IRBlock(
      (bn0): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (se): SEBlock(
        (avg_pool): AdaptiveAvgPool2d(output_size=1)
        (fc): Sequential(
          (0): Linear(in_features=512, out_features=32, bias=True)
          (1): PReLU(num_parameters=1)
          (2): Linear(in_features=32, out_features=512, bias=True)
          (3): Sigmoid()
        )
      )
    )
    (1): IRBlock(
      (bn0): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (prelu): PReLU(num_parameters=1)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (se): SEBlock(
        (avg_pool): AdaptiveAvgPool2d(output_size=1)
        (fc): Sequential(
          (0): Linear(in_features=512, out_features=32, bias=True)
          (1): PReLU(num_parameters=1)
          (2): Linear(in_features=32, out_features=512, bias=True)
          (3): Sigmoid()
        )
      )
    )
  )
  (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (dropout): Dropout(p=0.5, inplace=False)
  (fc): Linear(in_features=25088, out_features=512, bias=True)
  (bn3): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)>

I used your prune_model() function from examples/prune_resnet18_cifar10.py#L83 and only changed resnet.BasicBlock to IRBlock and the input size from 32 to 112 to match my model; the rest is the same.
Here is the whole script:

import numpy as np
import torch
import torch_pruning as pruning
from models import resnet18, load_model, BasicBlock, Bottleneck, IRBlock, SEBlock, ResNet

def prune_model(model):
    model.cpu()
    # my resnet18 was trained on 112x112 images, so we changed 32 to 112
    DG = pruning.DependencyGraph().build_dependency( model, torch.randn(1, 3, 112, 112))
    def prune_conv(conv, pruned_prob):
        weight = conv.weight.detach().cpu().numpy()
        out_channels = weight.shape[0]
        L1_norm = np.sum(weight, axis=(1, 2, 3))
        num_pruned = int(out_channels * pruned_prob)
        prune_index = np.argsort(L1_norm)[:num_pruned].tolist() # remove filters with small L1-Norm
        plan = DG.get_pruning_plan(conv, pruning.prune_conv, prune_index)
        plan.exec()
    
    block_prune_probs = [0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3]
    blk_id = 0
    for m in model.modules():
        if isinstance( m, IRBlock):
            prune_conv( m.conv1, block_prune_probs[blk_id] )
            prune_conv( m.conv2, block_prune_probs[blk_id] )
            blk_id+=1
    return model    

# load the resnet18 model : 
model = resnet18(pretrained=False, use_se=True)
model = load_model(model, 'BEST_checkpoint_r18.tar')
model.eval()
# prune  the model   
prune_model(model)

But upon running this snippet of code, I get this error:

Traceback (most recent call last):
  File "d:\Codes\face\python\FV\Pruning\prune.py", line 52, in <module>
    prune_model(model)
  File "d:\Codes\face\python\FV\Pruning\prune.py", line 46, in prune_model
    prune_conv( m.conv1, block_prune_probs[blk_id] )
  File "d:\Codes\fac_ver\python\FV\Pruning\prune.py", line 39, in prune_conv
    plan = DG.get_pruning_plan(conv, pruning.prune_conv, prune_index)
  File "C:\Users\User\Anaconda3\Lib\site-packages\torch_pruning\dependency.py", line 328, in get_pruning_plan
    root_node = self.module_to_node[module]
KeyError: Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)

Could you please tell me what I'm missing here?
Thanks a lot in advance

ShuffleNet architecture

Hello Gongfan,
I would like to learn how I could extend your code to work on the ShuffleNet (1_0, 1_5, 2_0) architectures. Could you please provide pointers to where in your code the changes must be introduced?
Thank you.

Peter

PS: I have spent some time playing with your code, but I could not figure out how to incorporate the channel split and channel shuffle operations of the ShuffleNet architecture into the pruning operations.

nparams_to_prune not defined in structured.py

nparams_to_prune is not defined in the first return statement of
the PReLUPruning.prune_params() function in structured.py:

class PReLUPruning(BasePruningFunction):
    @staticmethod
    def prune_params(layer: nn.PReLU, idxs: list) -> nn.Module:
        if layer.num_parameters == 1:
            return layer, nparams_to_prune
        keep_idxs = list(set(range(layer.num_parameters)) - set(idxs))
        layer.num_parameters = layer.num_parameters - len(idxs)
        layer.weight = torch.nn.Parameter(layer.weight.data.clone()[keep_idxs])
        return layer

    @staticmethod
    def calc_nparams_to_prune(layer: nn.PReLU, idxs: Sequence[int]) -> int:
        nparams_to_prune = 0 if layer.num_parameters == 1 else len(idxs)
        return nparams_to_prune
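
A minimal correction sketch (an assumption about the intended behavior, matching the single-value return at the end of the function): when the PReLU has a single shared parameter, there is nothing to slice, so the early return should hand back the layer unchanged.

    @staticmethod
    def prune_params(layer: nn.PReLU, idxs: list) -> nn.Module:
        if layer.num_parameters == 1:
            return layer  # shared parameter: nothing to prune, so no nparams_to_prune is needed here
        keep_idxs = list(set(range(layer.num_parameters)) - set(idxs))
        layer.num_parameters = layer.num_parameters - len(idxs)
        layer.weight = torch.nn.Parameter(layer.weight.data.clone()[keep_idxs])
        return layer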

Yolov5 and Detectron R101FPN model for pruning

@JonnyKong @VainF thanks for sharing your repo, I have the following queries:

  1. Can we perform pruning on the YOLOv5 v4 s/l/m versions of the model?
  2. Can a Detectron pretrained model like FasterRCNN-R101FPN be pruned? If so, what are the steps to follow?

Thanks in advance.

Is it necessary to transfer model to cpu?

Hello. In torch_pruning/dependency.py there is a line model.eval().cpu(). With this I can't use the RAFT model (an optical flow model) that I'm currently researching; it fails on

raise RuntimeError("module must have its parameters and buffers "
                                   "on device {} (device_ids[0]) but found one of "
                                   "them on device: {}".format(self.src_device_obj, t.device))

even if I transfer it to the CPU myself. But if I comment out that model.eval().cpu() line, the program passes through
DG.build_dependency(model, example_inputs=[torch.randn(1, 3, 440, 1024), torch.randn(1, 3, 440, 1024)])
just fine. So, is the model.eval().cpu() line necessary in torch_pruning? Does torch_pruning work on the CPU only?

Thanks in advance.
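
A pattern that may sidestep the question (a sketch on my part, not a statement about what the library requires): build the dependency graph on an unwrapped CPU copy of the model, run the pruning plans, and only then move the model back to the GPU (re-wrapping it in DataParallel if needed).

import torch
import torch_pruning as tp

model = model.cpu().eval()                # if the model is wrapped in DataParallel, use model.module here
DG = tp.DependencyGraph()
DG.build_dependency(model, example_inputs=[torch.randn(1, 3, 440, 1024),
                                           torch.randn(1, 3, 440, 1024)])
# ... build and execute the pruning plans here ...
model = model.to("cuda")                  # move back once the structure has been changed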

Default pruning_dimension is not supported for non tensor example inputs

From build_dependency:

if pruning_dim >= 0:
    pruning_dim = pruning_dim - len(example_inputs.size())

pruning_dim is 1 by default, and even though using a list or a dictionary of inputs is supported, an exception occurs here since example_inputs is a list and doesn't have a size attribute.

I think it could be fixed this way:

if isinstance(example_inputs, torch.Tensor):
    pruning_dim = pruning_dim - len(example_inputs.size())
elif isinstance(example_inputs, (tuple, list)):
    pruning_dim = pruning_dim - len(example_inputs[0].size())
else:
    raise Exception("pruning with non negative dimension is not supported for input of type {}".format(str(type(example_inputs))))

If anyone familiar with the DependencyGraph code has a better idea, I would be glad to hear it.

help on object has no attribute 'name'

I am trying to prune the EfficientNet by lukemelas and I get:
AttributeError: 'SwishImplementationBackward' object has no attribute 'name'
I tried a lot of solutions that came to mind, but they all failed.
I also tried to find some documentation on

<class 'AccumulateGrad'>
<class 'ViewBackward'>
<class 'ViewBackward'>
<class 'MeanBackward1'>
<class 'ViewBackward'>

but I couldn't find any.
Swish looks like this, by the way:

class SwishImplementation(torch.autograd.Function):
    @staticmethod
    def forward(ctx, i):
        result = i * torch.sigmoid(i)
        ctx.save_for_backward(i)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        i = ctx.saved_variables[0]
        sigmoid_i = torch.sigmoid(i)
        return grad_output * (sigmoid_i * (1 + i * (1 - sigmoid_i)))


class MemoryEfficientSwish(nn.Module):
    def forward(self, x):
        return SwishImplementation.apply(x)

I appreciate any help on this... (I've spent about 5 hours on it already, and I'm pretty sure I'll spend another after this, but I want to give it a try.)

Or if anyone has succeeded with filter pruning on EfficientNet, I would love to hear about your experience...

thanks
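
One possible workaround (my own suggestion, not an official fix): replace MemoryEfficientSwish with a plain nn.Module built only from standard ops before building the dependency graph, so the autograd trace contains ordinary grad_fn nodes instead of the custom SwishImplementationBackward:

import torch
import torch.nn as nn

class PlainSwish(nn.Module):
    def forward(self, x):
        return x * torch.sigmoid(x)   # same math as the custom Function, traced through standard ops

def replace_swish(module):
    for name, child in module.named_children():
        if isinstance(child, MemoryEfficientSwish):   # the module shown above
            setattr(module, name, PlainSwish())
        else:
            replace_swish(child)

replace_swish(model)   # call this before DG.build_dependency(model, ...)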
