
dynamic-network-surgery's Introduction

Dynamic network surgery

Dynamic network surgery is a very effective method for DNN pruning. To better use it with python and matlab, you may also need a classic version of the Caffe framework. For the convolutional and fully-connected layers to be pruned, change their layer types to "CConvolution" and "CInnerProduct" respectively. Then, pass "cconvolution_param" and "cinner_product_param" messages to these modified layers for better pruning performance.

Example usage

Below is an example of pruning the "ip1" layer of LeNet-5:

layer {
  name: "ip1"
  type: "CInnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  cinner_product_param {
    gamma: 0.0001
    power: 1
    c_rate: 4
    iter_stop: 14000  
    weight_mask_filler {
      type: "constant"
      value: 1
    }
    bias_mask_filler {
      type: "constant"
      value: 1
    }        
  }   
}
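Going by the layer code quoted in the issues below, c_rate sets the pruning threshold relative to the layer's weight-magnitude statistics (mu and std), gamma and power make mask updates increasingly infrequent as training proceeds, and iter_stop is the iteration after which the masks are frozen. A rough NumPy sketch of the mask-update rule, for illustration only (mu, std, and the random draw r mirror what the layer computes internally):

import numpy as np

def update_mask(weight, mask, mu, std, c_rate, gamma, power, iteration, iter_stop):
    # Mask updates are attempted with probability (1 + gamma*iter)^(-power),
    # which decays over iterations, and stop entirely once iter_stop is reached.
    r = np.random.rand()
    if (1.0 + gamma * iteration) ** (-power) <= r or iteration >= iter_stop:
        return mask
    threshold = max(mu + c_rate * std, 0.0)
    new_mask = mask.copy()
    new_mask[(mask == 1) & (np.abs(weight) <= 0.9 * threshold)] = 0  # prune
    new_mask[(mask == 0) & (np.abs(weight) > 1.1 * threshold)] = 1   # splice back
    return new_mask

Judging from the same code, masked-out weights are skipped in the forward pass but continue to be stored and updated, which is what allows them to be spliced back in later.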

Citation

Please cite our work in your publications if it helps your research:

@inproceedings{guo2016dynamic,		
  title = {Dynamic Network Surgery for Efficient DNNs},
  author = {Guo, Yiwen and Yao, Anbang and Chen, Yurong},
  booktitle = {Advances in neural information processing systems (NIPS)},
  year = {2016}
} 

and do not forget about Caffe:

@article{jia2014caffe,
  Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  Journal = {arXiv preprint arXiv:1408.5093},
  Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  Year = {2014}
}

Enjoy your own surgeries!

dynamic-network-surgery's People

Contributors

yiwenguo


dynamic-network-surgery's Issues

AlexNet training process and hyperparameters

Hi Yiwen,

I was trying to prune AlexNet but have made little progress. Would you please share the detailed training process and hyperparameters?

The training tricks you provided in #12 are very useful, but I still cannot reproduce the results in your paper. Here are some problems I've encountered during the pruning process:

  • Problem 1. Conv layers are easy to prune, but ip layers are not.

    Let's name the AlexNet layers to be pruned conv1, conv2, conv3, conv4, conv5, ip1, ip2, and ip3. The conv layers and ip layers are put into different pruning groups, so I first trained conv1 to conv5 (type: "CConvolution") together while leaving ip1 to ip3 as normal inner-product layers (type: "InnerProduct"). This step was successful.

    But when I move on to fine-tune and prune the ip1 layer (hyperparameters of the CConvolution layers are kept the same, and ip2 and ip3 are still of type "InnerProduct"), it does not converge, and the accuracy does not increase before the loss blows up (the loss always blows up to 87.3365; I don't know why it is this number).

  • Problem 2. Hyperparameters.

    c_rate
    I used [-0.7, 0.5] for the c_rate of the conv layers (a negative c_rate for the first conv layer) and [0.5, 1.6] for the ip layers when pruning and fine-tuning, so that each layer's pruning rate is roughly the same as the one circled in the picture below.

    [image: per-layer pruning rates]

    learning rate (lr)
    10^-5 to 10^-6. A larger lr makes the loss blow up very quickly. Sometimes even 10^-5 lasts only a few hundred to a few thousand iterations (batch size 1024) before the loss blows up.

    With the above parameters and the training process from Problem 1, training does not converge.

    I think lr and c_rate are the two most important hyperparameters during pruning. Is my understanding correct?

  • Problem 3. Pruning rate degradation.
    In order to get a converged result, I used a smaller c_rate for the ip layers so that around 40% of the total parameters are kept. Now training converges and the accuracy is around 56%-57%.

    But if I compare the caffemodel snapshots from the early stage of pruning with those from the late stage, around 40% of the total parameters are kept in the early-stage model, while 100% are kept in the late-stage model, which means the pruning fails.

    I think this is because mu and std are only calculated in the first iteration; after tens of thousands of iterations they have changed dramatically. Do you think this could be the reason for this problem?

Thank you very much for your patience!
It would be a great help if a more detailed training process could be provided.

Thanks!

mu and std are NaN

I0606 14:51:26.174983 20195 solver.cpp:269] Solving Oxford102_VGG_CNN_S
I0606 14:51:26.174996 20195 solver.cpp:270] Learning Rate Policy: step
I0606 14:51:26.180789 20195 solver.cpp:314] Iteration 0, Testing net (#0)
I0606 14:51:28.089752 20195 blocking_queue.cpp:50] Data layer prefetch queue empty
I0606 14:52:14.187464 20195 solver.cpp:363] Test net output #0: accuracy = 0.00967742
I0606 14:52:14.189384 20195 compress_conv_layer.cu:170] -nan -nan 0
I0606 14:52:14.234892 20195 compress_conv_layer.cu:170] -nan -nan 0
I0606 14:52:14.284571 20195 compress_conv_layer.cu:170] -nan -nan 0
I0606 14:52:14.313534 20195 compress_conv_layer.cu:170] -nan -nan 0
I0606 14:52:14.370311 20195 compress_conv_layer.cu:170] -nan -nan 0
I0606 14:52:14.434296 20195 compress_inner_product_layer.cu:171] -nan -nan 0
I0606 14:52:14.467793 20195 compress_inner_product_layer.cu:171] -nan -nan 0
I0606 14:52:14.473366 20195 compress_inner_product_layer.cu:171] -nan -nan 0

source code question

Hi,
Thanks for your work!
I found that the code in compress_inner_product_layer.cpp at lines 139, 140, 145, and 146 might be problematic:
this->mu += fabs(weight[k]);
this->std += weight[k]*weight[k];

this->mu += fabs(bias[k]);
this->std += bias[k]*bias[k];

Should they be the following, with the mask multiplied in?
this->mu += fabs(weight[k]*weightMask[k]);
this->std += weight[k]*weight[k]*weightMask[k];

this->mu += fabs(bias[k]*biasMask[k]);
this->std += bias[k]*bias[k]*biasMask[k];

Thanks,
Kai

Unable to train new model

When training a model, the following error occurs:
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0327 18:35:26.106482 11582 layer_factory.hpp:68] Check failed: registry.count(type) == 0 (1 vs. 0) Layer type CConvolution already registered.
*** Check failure stack trace: ***
Aborted (core dumped)

Any idea how to fix this?

how to load pretrained model?

Hello, when I want to use your code to prune LeNet-5, I replace ip1 and ip2 with your cinner_product_layer.
But because cinner_product_layer has 4 blobs (weight, bias, weight mask, bias mask) while inner_product_layer has only 2, Caffe can't load the pretrained LeNet-5 model lenet_iter_10000.model from examples/mnist into this surgery net.
So how do I load a pretrained model in order to prune it? Should I write Python or MATLAB code to transfer the parameters of the pretrained model to your surgery net manually?
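A common workaround is to copy the pretrained weight and bias blobs into the surgery net with pycaffe, leaving the two extra mask blobs as initialized by their fillers. A rough sketch (the prototxt and caffemodel names are placeholders for your own files; deploy-style prototxts are easiest here):

import caffe

# Pretrained LeNet-5 (2 blobs per layer) and the surgery net (4 blobs for C layers).
src = caffe.Net('lenet.prototxt', 'examples/mnist/lenet_iter_10000.caffemodel', caffe.TEST)
dst = caffe.Net('lenet_surgery.prototxt', caffe.TEST)

for name, src_blobs in src.params.items():
    if name not in dst.params:
        continue
    # Copy only as many blobs as both layers share (weight and bias);
    # the weight/bias mask blobs keep the values set by their mask fillers.
    for i in range(min(len(src_blobs), len(dst.params[name]))):
        dst.params[name][i].data[...] = src_blobs[i].data

dst.save('lenet_surgery_init.caffemodel')

The saved lenet_surgery_init.caffemodel can then be passed to caffe train with the -weights flag for pruning and fine-tuning.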

Cannot Compress Model

DNS is a good method and thank you for sharing your code!

My question is:
Compilation and installation were successful and did not take much effort (only your code was used and the original Caffe code was not added, so I assume your code can be used as a standalone package).
But when I tried to compress LeNet-5 as suggested in the README (I only changed the "ip1" layer's type to "CInnerProduct" and added the "cinner_product_param" part), training did not converge, and the output caffemodel is 3.2 MB, even larger than the original 1.7 MB.

So I was wondering whether you have encountered this kind of problem before and what I might be doing wrong.

The following is the prototxt file, as in the Caffe examples, with only the "CInnerProduct" part changed:

name: "LeNet"
layer {
 name: "mnist"
 type: "Data"
 top: "data"
 top: "label"
 include {
  phase: TRAIN
 }
 transform_param {
  scale: 0.00390625
 }
 data_param {
  source: "examples/mnist/mnist_train_lmdb"
  batch_size: 64
  backend: LMDB
 }
}
layer {
 name: "mnist"
 type: "Data"
 top: "data"
 top: "label"
 include {
  phase: TEST
 }
 transform_param {
  scale: 0.00390625
 }
 data_param {
  source: "examples/mnist/mnist_test_lmdb"
  batch_size: 100
  backend: LMDB
 }
}
layer {
 name: "conv1"
 type: "Convolution"
 bottom: "data"
 top: "conv1"
 param {
  lr_mult: 1
 }
 param {
  lr_mult: 2
 }
 convolution_param {
  num_output: 20
  kernel_size: 5
  stride: 1
  weight_filler {
   type: "xavier"
  }
  bias_filler {
   type: "constant"
  }
 }
}
layer {
 name: "pool1"
 type: "Pooling"
 bottom: "conv1"
 top: "pool1"
 pooling_param {
  pool: MAX
  kernel_size: 2
  stride: 2
 }
}
layer {
 name: "conv2"
 type: "Convolution"
 bottom: "pool1"
 top: "conv2"
 param {
  lr_mult: 1
 }
 param {
  lr_mult: 2
 }
 convolution_param {
  num_output: 50
  kernel_size: 5
  stride: 1
  weight_filler {
   type: "xavier"
  }
  bias_filler {
   type: "constant"
  }
 }
}
layer {
 name: "pool2"
 type: "Pooling"
 bottom: "conv2"
 top: "pool2"
 pooling_param {
  pool: MAX
  kernel_size: 2
  stride: 2
 }
}
layer {
 name: "ip1"
 type: "CInnerProduct"
 bottom: "pool2"
 top: "ip1"
 param {
  lr_mult: 1
 }
 param {
  lr_mult: 2
 }
 inner_product_param {
  num_output: 500
  weight_filler {
   type: "xavier"
  }
  bias_filler {
   type: "constant"
  }
 }
 cinner_product_param {
  gamma: 0.0001
  power: 1
  c_rate: 4
  iter_stop: 14000
  weight_mask_filler {
   type: "constant"
   value: 1
  }
  bias_mask_filler {
   type: "constant"
   value: 1
  }
 }
}
layer {
 name: "relu1"
 type: "ReLU"
 bottom: "ip1"
 top: "ip1"
}
layer {
 name: "ip2"
 type: "InnerProduct"
 bottom: "ip1"
 top: "ip2"
 param {
  lr_mult: 1
 }
 param {
  lr_mult: 2
 }
 inner_product_param {
  num_output: 10
  weight_filler {
   type: "xavier"
  }
  bias_filler {
   type: "constant"
  }
 }
}
layer {
 name: "accuracy"
 type: "Accuracy"
 bottom: "ip2"
 bottom: "label"
 top: "accuracy"
 include {
  phase: TEST
 }
}
layer {
 name: "loss"
 type: "SoftmaxWithLoss"
 bottom: "ip2"
 bottom: "label"
 top: "loss"
}

The output from iteration 9000 to iteration 10000 is as follows (accuracy lingering around 0.1135):

I0602 04:05:34.931988 15322 solver.cpp:314] Iteration 9000, Testing net (#0)
I0602 04:05:35.897229 15322 solver.cpp:363] Test net output #0: accuracy = 0.1135
I0602 04:05:35.897274 15322 solver.cpp:363] Test net output #1: loss = 2.30104 (* 1 = 2.30104 loss)
I0602 04:05:35.906638 15322 solver.cpp:226] Iteration 9000, loss = 2.30204
I0602 04:05:35.906673 15322 solver.cpp:242] Train net output #0: loss = 2.30204 (* 1 = 2.30204 loss)
I0602 04:05:35.906682 15322 solver.cpp:521] Iteration 9000, lr = 0.00617924
I0602 04:05:37.375916 15322 solver.cpp:226] Iteration 9100, loss = 2.2923
I0602 04:05:37.376133 15322 solver.cpp:242] Train net output #0: loss = 2.2923 (* 1 = 2.2923 loss)
I0602 04:05:37.376145 15322 solver.cpp:521] Iteration 9100, lr = 0.00615496
I0602 04:05:38.845537 15322 solver.cpp:226] Iteration 9200, loss = 2.30995
I0602 04:05:38.845561 15322 solver.cpp:242] Train net output #0: loss = 2.30995 (* 1 = 2.30995 loss)
I0602 04:05:38.845568 15322 solver.cpp:521] Iteration 9200, lr = 0.0061309
I0602 04:05:40.314781 15322 solver.cpp:226] Iteration 9300, loss = 2.31165
I0602 04:05:40.314803 15322 solver.cpp:242] Train net output #0: loss = 2.31165 (* 1 = 2.31165 loss)
I0602 04:05:40.314811 15322 solver.cpp:521] Iteration 9300, lr = 0.00610706
I0602 04:05:41.782209 15322 solver.cpp:226] Iteration 9400, loss = 2.29439
I0602 04:05:41.782232 15322 solver.cpp:242] Train net output #0: loss = 2.29439 (* 1 = 2.29439 loss)
I0602 04:05:41.782239 15322 solver.cpp:521] Iteration 9400, lr = 0.00608343
I0602 04:05:43.237807 15322 solver.cpp:314] Iteration 9500, Testing net (#0)
I0602 04:05:44.201413 15322 solver.cpp:363] Test net output #0: accuracy = 0.1135
I0602 04:05:44.201436 15322 solver.cpp:363] Test net output #1: loss = 2.30121 (* 1 = 2.30121 loss)
I0602 04:05:44.210533 15322 solver.cpp:226] Iteration 9500, loss = 2.30612
I0602 04:05:44.210551 15322 solver.cpp:242] Train net output #0: loss = 2.30612 (* 1 = 2.30612 loss)
I0602 04:05:44.210559 15322 solver.cpp:521] Iteration 9500, lr = 0.00606002
I0602 04:05:45.679636 15322 solver.cpp:226] Iteration 9600, loss = 2.30252
I0602 04:05:45.679658 15322 solver.cpp:242] Train net output #0: loss = 2.30252 (* 1 = 2.30252 loss)
I0602 04:05:45.679666 15322 solver.cpp:521] Iteration 9600, lr = 0.00603682
I0602 04:05:47.147786 15322 solver.cpp:226] Iteration 9700, loss = 2.29213
I0602 04:05:47.147809 15322 solver.cpp:242] Train net output #0: loss = 2.29213 (* 1 = 2.29213 loss)
I0602 04:05:47.147817 15322 solver.cpp:521] Iteration 9700, lr = 0.00601382
I0602 04:05:48.616607 15322 solver.cpp:226] Iteration 9800, loss = 2.29719
I0602 04:05:48.616629 15322 solver.cpp:242] Train net output #0: loss = 2.29719 (* 1 = 2.29719 loss)
I0602 04:05:48.616637 15322 solver.cpp:521] Iteration 9800, lr = 0.00599102
I0602 04:05:50.084087 15322 solver.cpp:226] Iteration 9900, loss = 2.2912
I0602 04:05:50.084110 15322 solver.cpp:242] Train net output #0: loss = 2.2912 (* 1 = 2.2912 loss)
I0602 04:05:50.084120 15322 solver.cpp:521] Iteration 9900, lr = 0.00596843
I0602 04:05:51.538485 15322 solver.cpp:399] Snapshotting to binary proto file examples/mnist/lenet_iter_10000.caffemodel
I0602 04:05:51.553609 15322 solver.cpp:684] Snapshotting solver state to binary proto fileexamples/mnist/lenet_iter_10000.solverstate
I0602 04:05:51.606297 15322 solver.cpp:295] Iteration 10000, loss = 2.29934
I0602 04:05:51.606360 15322 solver.cpp:314] Iteration 10000, Testing net (#0)
I0602 04:05:52.568142 15322 solver.cpp:363] Test net output #0: accuracy = 0.1135
I0602 04:05:52.568188 15322 solver.cpp:363] Test net output #1: loss = 2.30109 (* 1 = 2.30109 loss)
I0602 04:05:52.568197 15322 solver.cpp:300] Optimization Done.
I0602 04:05:52.568205 15322 caffe.cpp:184] Optimization Done.

Thank you very much!

The size of the model doubles

Hi, thanks for your great work. I am trying to prune AlexNet trained on ImageNet (learning rate = 0.03). I use a compression rate of c_rate = 18 for both the convolutional and fully connected layers and keep the other parameters the same as in the examples of your project. I see the size of the sparse model roughly double from 233 MB to 438 MB. Could you help me understand why? Thank you so much.
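One likely reason for the size increase: each CConvolution and CInnerProduct layer stores two extra mask blobs with the same shapes as the weight and bias blobs (4 blobs instead of 2, as other issues here also note), and the caffemodel format keeps all blobs dense, so the file roughly doubles regardless of how many weights are masked out. A quick pycaffe check (the prototxt and caffemodel names are placeholders):

import caffe

net = caffe.Net('alexnet_surgery.prototxt', 'alexnet_surgery.caffemodel', caffe.TEST)

for name, blobs in net.params.items():
    # C layers are assumed to carry [weight, bias, weight mask, bias mask];
    # plain layers carry only weight and bias.
    print(name, 'blob element counts:', [b.data.size for b in blobs])
    if len(blobs) == 4:
        kept = float((blobs[2].data != 0).mean())
        print('  fraction of weights kept by the mask: %.4f' % kept)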

train result question

Using dynamic network surgery, I can train LeNet-5 on the MNIST dataset to 99.07% accuracy. But when I inspect the compression results, I find that many parameters are not 0, only very close to 0. Does the program not write out the corresponding zeros? There are also some differences between my training results and the caffemodel downloaded from your project. Can you tell me how to fix this, or how to use dynamic network surgery to train LeNet-5 well?
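For reference: the C layers apply the masks on the fly (the forward/backward pass uses the masked copy weight_tmp_), while the stored weight blob itself is not zeroed, so pruned entries stay small but nonzero in the saved caffemodel. A pycaffe sketch that applies the masks explicitly (paths are placeholders; the blob order [weight, bias, weight mask, bias mask] is assumed):

import caffe

net = caffe.Net('lenet_surgery.prototxt', 'lenet_surgery_iter_14000.caffemodel', caffe.TEST)

for name, blobs in net.params.items():
    if len(blobs) == 4:  # a CConvolution / CInnerProduct layer
        blobs[0].data[...] *= blobs[2].data  # weight *= weight mask
        blobs[1].data[...] *= blobs[3].data  # bias   *= bias mask

net.save('lenet_surgery_masked.caffemodel')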

c_rate setting

Hello, is there an algorithm or formula for setting the c_rate value in cconvolution_param or cinner_product_param? I tried different values, and this repo behaves strangely with float c_rate values. The value of 4 in the sample prototxt for LeNet doesn't work in my case. Can you help me come up with the right value for c_rate?

help

I do not understand what you mean by the following sentence:
“Dynamic network surgery is a very effective method for DNN pruning. To better use it with python and matlab, you may also need a classic version of the Caffe framework.”
Can you explain it in detail? The Caffe code you provide is missing the examples and matlab directories, which causes many problems when building it. Do you mean that we should use your files to replace the corresponding ones in a standard Caffe tree, or something else?

The compile problem

Hello! Thanks for your work. Could you help me fix this compile problem? When I run make all -j8, the following errors occur:
src/caffe/layers/compress_conv_layer.cpp: In instantiation of ‘void caffe::CConvolutionLayer::Forward_cpu(const std::vector<caffe::Blob>&, const std::vector<caffe::Blob>&) [with Dtype = float]’:
src/caffe/layers/compress_conv_layer.cpp:205:1: required from here
src/caffe/layers/compress_conv_layer.cpp:81:36: error: ‘class caffe::CConvolutionLayer’ has no member named ‘iter_’
if (this->std==0 && this->iter_==0){
^
src/caffe/layers/compress_conv_layer.cpp:118:26: error: ‘class caffe::CConvolutionLayer’ has no member named ‘iter_’

Is the relevant .hpp file missing? I cannot find compress_conv_layer.hpp in the package. Thanks for your help!

Pruned model size is the same as the original model

@yiwenguo Thanks for your nice sharing!
This work can efficiently compress the number of parameters in LeNet-5 and AlexNet by factors of 108 and 17.7 respectively. However, I found that the size of the pruned models is the same as that of the original models. Could you please share the code that makes the pruned model smaller than the original one?
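A note on file size: the caffemodel format stores every blob densely, so zeroed weights do not shrink the file by themselves; the compression factors in the paper count nonzero parameters. Exporting the masked weights to a sparse format is a separate post-processing step. A SciPy sketch (paths are placeholders; it assumes the masks have already been applied to the weights, e.g. as in the sketch a few issues above):

import caffe
from scipy import sparse

net = caffe.Net('lenet_surgery.prototxt', 'lenet_surgery_masked.caffemodel', caffe.TEST)

for name, blobs in net.params.items():
    w = blobs[0].data
    w2d = w.reshape(w.shape[0], -1)          # flatten conv kernels to 2-D
    kept = float((w2d != 0).mean())
    print('%s: %.2f%% of the weights kept' % (name, 100 * kept))
    # Store the weights in CSR form; only nonzeros (plus indices) go to disk.
    sparse.save_npz('%s_weights_csr.npz' % name, sparse.csr_matrix(w2d))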

the hyper-parameters in the paper

Hi, I have looked at the code, but I am not quite sure how the hyper-parameters in the paper were obtained.
For example:

if (pow(1+(this->gamma)*(this->iter_),-(this->power))>r && (this->iter_)<(this->iter_stop_)) {
  for (unsigned int k = 0; k < this->blobs_[0]->count(); ++k) {
    if (weightMask[k]==1 && fabs(weight[k])<=0.9*std::max(mu+crate*std,Dtype(0)))
      weightMask[k] = 0;
    else if (weightMask[k]==0 && fabs(weight[k])>1.1*std::max(mu+crate*std,Dtype(0)))
      weightMask[k] = 1;
  }
  if (this->bias_term_) {
    for (unsigned int k = 0; k < this->blobs_[1]->count(); ++k) {
      if (biasMask[k]==1 && fabs(bias[k])<=0.9*std::max(mu+crate*std,Dtype(0)))
        biasMask[k] = 0;
      else if (biasMask[k]==0 && fabs(bias[k])>1.1*std::max(mu+crate*std,Dtype(0)))
        biasMask[k] = 1;
    }
  }
}

In particular, the condition fabs(weight[k]) <= 0.9*std::max(mu+crate*std, Dtype(0)):
how was this rule derived? Does the threshold 0.9*std::max(mu+crate*std, Dtype(0)) work for all networks? I am not sure and would appreciate an explanation.

Lifecycle of using Dynamic-Network-Surgery

Hi @yiwenguo,

Thank you so much for sharing your work with the community!

I do have a caffemodel that is trained on my own dataset. It follows an architecture that is similar to Resnet 101, but has some extra layers. I am about to use DNS to see how much I am able to compress it.

This is how I plan to go about it. Can you please confirm whether my understanding of how to use DNS is correct:

INPUT: My trained model (caffemodel and prototxt)
Step 1: Modify the convolution layers in my prototxt to CConvolution and fc layers to CInnerProduct and pass appropriate messages to the modified layers.
Step 2: Finetune my network for some iterations.
caffe train -solver my_modified.prototxt -weights my_trained_model.caffemodel
Step 3: Do the post processing that you mentioned here on the caffe model that gets generated in Step 2.
OUTPUT: Smaller caffemodel, from step 3

Thanks,
Joseph

A minor problem in LayerSetup of compress_conv_layer.cpp

In LayerSetup of compress_conv_layer.cpp, you have the following lines (you are using the bias_mask_filler for the weightMask):

// Intialize and fill the weightmask
this->blobs_[1].reset(new Blob<Dtype>(this->blobs_[0]->shape()));
shared_ptr<Filler<Dtype> > bias_mask_filler(GetFiller<Dtype>(
    cconv_param.bias_mask_filler()));
bias_mask_filler->Fill(this->blobs_[1].get()); 

which should be

// Initialize and fill the weightMask
this->blobs_[1].reset(new Blob<Dtype>(this->blobs_[0]->shape()));
shared_ptr<Filler<Dtype> > weight_mask_filler(GetFiller<Dtype>(
    cconv_param.weight_mask_filler()));
weight_mask_filler->Fill(this->blobs_[1].get()); 

How to resume the training using solverstate?

Hi,
I tried to resume the pruning training using the solver state. After resumption, the mu and std values are zero, so pruning does not happen (the X factor is always 1 after resuming). It seems that the mu and std values are not stored. Is it possible to resume training, or to store the mu and std values? How can I resume the pruning training from a solver state?

train problem

When I fine-tune a ResNet model, I get this error:
I0814 15:49:26.345660 3407 net.cpp:774] Copying source layer conv1
F0814 15:49:26.345674 3407 net.cpp:777] Check failed: target_blobs.size() == source_layer.blobs_size() (4 vs. 2) Incompatible number of blobs for layer conv1

This may be because there are only 2 blobs (weights and bias) in the convolution layer's params in the original caffemodel, while the cconvolution layer has 4, so the copy from the source layer to the target cconvolution layer fails.

Could you give an example of the prototxt? Thank you!

No regularization?

Is there no regularization used in the models? How can we then ensure that the weights become closer to zero?

Fixing your code in the newest Caffe branch

Thanks for your paper and code.
As far as I know, your code is based on caffe-rc2, or maybe an even older Caffe branch. I am interested in porting your code to the newest Caffe branch, but they have different code structures, so I would like to know which layers you modified.
Could you give me some clues?

Best wishes!

training problem

Thanks for your work. I used your code to prune ResNet-18, but I find that it does not work. Could you share a solver prototxt for pruning a network?

Error: ‘SolverAction’ has not been declared

Hello! Thanks for your work. Could you help me fix this compile problem? When I run make all, I get the following errors:
In file included from src/caffe/util/signal_handler.cpp:7:0:
./include/caffe/util/signal_handler.h:12:17: error: ‘SolverAction’ has not been declared
SignalHandler(SolverAction::Enum SIGINT_action,
^
./include/caffe/util/signal_handler.h:12:36: error: expected ‘)’ before ‘SIGINT_action’
SignalHandler(SolverAction::Enum SIGINT_action,
^
./include/caffe/util/signal_handler.h:15:3: error: ‘ActionCallback’ does not name a type
ActionCallback GetActionFunction();
^
./include/caffe/util/signal_handler.h:17:3: error: ‘SolverAction’ does not name a type
SolverAction::Enum CheckForSignals() const;
^
./include/caffe/util/signal_handler.h:18:3: error: ‘SolverAction’ does not name a type
SolverAction::Enum SIGINT_action_;
^
./include/caffe/util/signal_handler.h:19:3: error: ‘SolverAction’ does not name a type
SolverAction::Enum SIGHUP_action_;
^
src/caffe/util/signal_handler.cpp:88:29: error: expected constructor, destructor, or type conversion before ‘(’ token
SignalHandler::SignalHandler(SolverAction::Enum SIGINT_action,
^
src/caffe/util/signal_handler.cpp:99:1: error: ‘SolverAction’ does not name a type
SolverAction::Enum SignalHandler::CheckForSignals() const {
^
src/caffe/util/signal_handler.cpp:111:1: error: ‘ActionCallback’ does not name a type
ActionCallback SignalHandler::GetActionFunction() {
^
make: *** [.build_release/src/caffe/util/signal_handler.o] Error 1

The Caffe version is the classic one. Thank you!

question in Backward code

Hi, thanks for your great work. I have some doubts about the backward code:

1. template <typename Dtype>
2. void CConvolutionLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
3.       const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
4. 	const Dtype* weightTmp = this->weight_tmp_.cpu_data();  
5. 	const Dtype* weightMask = this->blobs_[2]->cpu_data();
6. 	Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();
7.   for (int i = 0; i < top.size(); ++i) {
8.     const Dtype* top_diff = top[i]->cpu_diff();    
9.     // Bias gradient, if necessary.
10.     if (this->bias_term_ && this->param_propagate_down_[1]) {
11. 			const Dtype* biasMask = this->blobs_[3]->cpu_data();
12.       Dtype* bias_diff = this->blobs_[1]->mutable_cpu_diff();			
13. 			for (unsigned int k = 0;k < this->blobs_[1]->count(); ++k) {
14. 				bias_diff[k] = bias_diff[k]*biasMask[k];
15. 			}
16.       for (int n = 0; n < this->num_; ++n) {
17.         this->backward_cpu_bias(bias_diff, top_diff + top[i]->offset(n));
18.       }
19.     }
20.     if (this->param_propagate_down_[0] || propagate_down[i]) {
21. 			const Dtype* bottom_data = bottom[i]->cpu_data();
22. 			Dtype* bottom_diff = bottom[i]->mutable_cpu_diff();	
23. 			for (unsigned int k = 0;k < this->blobs_[0]->count(); ++k) {
24. 				weight_diff[k] = weight_diff[k]*weightMask[k];
25. 			}
26.       for (int n = 0; n < this->num_; ++n) {
27.         // gradient w.r.t. weight. Note that we will accumulate diffs.
28.         if (this->param_propagate_down_[0]) {
29.           this->weight_cpu_gemm(bottom_data + bottom[i]->offset(n),
30.               top_diff + top[i]->offset(n), weight_diff);
31.         }
32.         // gradient w.r.t. bottom data, if necessary.
33.         if (propagate_down[i]) {
34.           this->backward_cpu_gemm(top_diff + top[i]->offset(n), weightTmp,
35.               bottom_diff + bottom[i]->offset(n));
36.         }
37.       }
38.     }
39.   }
40. }

To my understanding of Caffe, the diff of the weight blob is always set to 0 before each iteration. That is to say, weight_diff[k] and bias_diff[k] are always 0 before backward_cpu_bias and weight_cpu_gemm, so the operations on line 14 and line 24 are redundant. What are they really meant to do? Should it be weightTmp instead of weight_diff on line 24?

Thanks very much!

compilation error

Hi,
I got the compilation error pasted below; does anyone know how I can solve it?

dynamic-network-surgery$ make all -j8
find: ‘examples’: No such file or directory
find: ‘matlab/+caffe/private’: No such file or directory
find: ‘examples’: No such file or directory
find: ‘matlab/’: No such file or directory
find: ‘examples’: No such file or directory
PROTOC src/caffe/proto/caffe.proto
CXX src/caffe/util/db.cpp
CXX src/caffe/util/upgrade_proto.cpp
CXX src/caffe/util/db_leveldb.cpp
CXX src/caffe/util/blocking_queue.cpp
CXX src/caffe/util/benchmark.cpp
CXX src/caffe/util/insert_splits.cpp
CXX src/caffe/util/math_functions.cpp
CXX src/caffe/util/cudnn.cpp
CXX src/caffe/util/io.cpp
CXX src/caffe/util/hdf5.cpp
CXX src/caffe/util/db_lmdb.cpp
CXX src/caffe/util/im2col.cpp
CXX src/caffe/parallel.cpp
CXX src/caffe/internal_thread.cpp
CXX src/caffe/net.cpp
CXX src/caffe/layer.cpp
CXX src/caffe/layer_factory.cpp
CXX src/caffe/common.cpp
CXX src/caffe/syncedmem.cpp
src/caffe/layer_factory.cpp: In function ‘boost::shared_ptr<caffe::Layer > caffe::GetCConvolutionLayer(const caffe::LayerParameter&)’:
src/caffe/layer_factory.cpp:57:24: error: ‘CConvolutionParameter_Engine_CUDNN’ was not declared in this scope
} else if (engine == CConvolutionParameter_Engine_CUDNN) {
^
src/caffe/layer_factory.cpp: In function ‘boost::shared_ptr<caffe::Layer > caffe::GetTanHLayer(const caffe::LayerParameter&) [with Dtype = double]’:
src/caffe/layer_factory.cpp:187:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
src/caffe/layer_factory.cpp: In function ‘boost::shared_ptr<caffe::Layer > caffe::GetTanHLayer(const caffe::LayerParameter&) [with Dtype = float]’:
src/caffe/layer_factory.cpp:187:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
src/caffe/layer_factory.cpp: In function ‘boost::shared_ptr<caffe::Layer > caffe::GetSoftmaxLayer(const caffe::LayerParameter&) [with Dtype = double]’:
src/caffe/layer_factory.cpp:164:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
src/caffe/layer_factory.cpp: In function ‘boost::shared_ptr<caffe::Layer > caffe::GetSoftmaxLayer(const caffe::LayerParameter&) [with Dtype = float]’:
src/caffe/layer_factory.cpp:164:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
src/caffe/layer_factory.cpp: In function ‘boost::shared_ptr<caffe::Layer > caffe::GetSigmoidLayer(const caffe::LayerParameter&) [with Dtype = double]’:
src/caffe/layer_factory.cpp:141:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
src/caffe/layer_factory.cpp: In function ‘boost::shared_ptr<caffe::Layer > caffe::GetSigmoidLayer(const caffe::LayerParameter&) [with Dtype = float]’:
src/caffe/layer_factory.cpp:141:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
src/caffe/layer_factory.cpp: In function ‘boost::shared_ptr<caffe::Layer > caffe::GetReLULayer(const caffe::LayerParameter&) [with Dtype = double]’:
src/caffe/layer_factory.cpp:118:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
src/caffe/layer_factory.cpp: In function ‘boost::shared_ptr<caffe::Layer > caffe::GetReLULayer(const caffe::LayerParameter&) [with Dtype = float]’:
src/caffe/layer_factory.cpp:118:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
src/caffe/layer_factory.cpp: In function ‘boost::shared_ptr<caffe::Layer > caffe::GetPoolingLayer(const caffe::LayerParameter&) [with Dtype = double]’:
src/caffe/layer_factory.cpp:95:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
src/caffe/layer_factory.cpp: In function ‘boost::shared_ptr<caffe::Layer > caffe::GetPoolingLayer(const caffe::LayerParameter&) [with Dtype = float]’:
src/caffe/layer_factory.cpp:95:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
src/caffe/layer_factory.cpp: In function ‘boost::shared_ptr<caffe::Layer > caffe::GetConvolutionLayer(const caffe::LayerParameter&) [with Dtype = double]’:
src/caffe/layer_factory.cpp:39:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
src/caffe/layer_factory.cpp: In function ‘boost::shared_ptr<caffe::Layer > caffe::GetConvolutionLayer(const caffe::LayerParameter&) [with Dtype = float]’:
src/caffe/layer_factory.cpp:39:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
Makefile:519: recipe for target '.build_release/src/caffe/layer_factory.o' failed
make: *** [.build_release/src/caffe/layer_factory.o] Error 1
make: *** Waiting for unfinished jobs....
