csyhhu / l-obs Goto Github PK

Codes for Layer-wise Optimal Brain Surgeon

License: MIT License

Python 100.00%

l-obs's Issues

Some confusion about the repository

I studied the paper and the code and ran into some problems; I hope you can help.

I have run lenet300-100/LOBS.py several times, and the compression ratio always equals zero, even after I deleted all generated files except the original parameters. It's worth noting that I modified your code only slightly.
Could you list the versions of the packages your code uses? lenet300-100/LOBS.py appears to use Python 3, while the ResNet-50 code uses Python 2. I would also like to know the versions of the other packages, such as numpy and tensorflow.
Thanks for your reply!
BaiYin
1-27-2018

Hi BaiYin,

Thanks for using the code. For the first question, let me check; I will update you as soon as possible. For the second question, I am using Python 2 for both lenet300-100 and ResNet. The only difference may be whether there is a () after 'print'. I think up-to-date numpy and tensorflow are fine. If anything complains about a deprecated function, please let me know and I will update my code.

Shangyu,
1-27-2018

Hi BaiYin,

Sorry for my carelessness. I think I have solved the problem of the compression ratios all being zero; it was a minor bracket error. I have updated lenet300-100/L-OBS.py; please sync it and you should see the difference. You could also check your generated pruned weights and calculate the compression ratio yourself instead of just relying on my printed info.
Please let me know if there are any other problems.
Thanks again for using our code. I will refine the code in February using PyTorch. If you are interested, please keep an eye out.

Shangyu,
1-27-2018
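
For readers who want to check the ratio themselves, as suggested above, here is a minimal sketch. It assumes the ratio is defined as non-zero parameters over total parameters (the "nnz/total" convention printed by the validation script quoted later on this page); the function name `compression_ratio` and the toy arrays are hypothetical and not taken from the repository.

```python
import numpy as np

def compression_ratio(arrays):
    """Fraction of parameters that are non-zero across all given arrays."""
    nnz = sum(int(np.count_nonzero(a)) for a in arrays)
    total = sum(a.size for a in arrays)
    # float() keeps the division fractional under Python 2 as well.
    return nnz / float(total)

# Toy stand-ins for pruned weights and biases (illustrative values only).
weights = np.array([[0.0, 1.5, 0.0], [2.0, 0.0, 0.0]])
biases = np.array([0.1, 0.0, 0.3])
ratio = compression_ratio([weights, biases])
print("compression ratio (nnz/total): %.4f" % ratio)
```

In practice the arrays would come from the saved .npy files, loaded with np.load.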

bug in prune-weights-ResNet18.py?

Hi,

I was looking into the ResNet18 pytorch code, and noticed the following sequence:

	if layer_name == 'fc':
		layer_type = 'F'
	else:
		layer_type = 'R'

	if layer_type == 'C':
                [...]

Unless I'm wrong, this means that none of the layers will be treated as Conv for this code.
Is this behaviour intended?

Thanks,
Dan
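
A small sketch that reproduces the observation. The helper `layer_type_as_quoted` is hypothetical and only copies the assignment logic quoted above; the layer names are illustrative guesses, not read from the repository.

```python
def layer_type_as_quoted(layer_name):
    """Mirror of the quoted assignment: only 'F' or 'R' can be produced."""
    if layer_name == 'fc':
        return 'F'
    return 'R'

# Illustrative layer names (assumed, not taken from the repository).
names = ['conv1', 'layer1.0.conv1', 'layer4.1.conv2', 'fc']
types = {name: layer_type_as_quoted(name) for name in names}

# The branch `if layer_type == 'C'` is never taken with this logic.
conv_branch_reachable = 'C' in types.values()
print(types, conv_branch_reachable)
```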

Tensorflow-Prune on GPU

Is there a way to run your TensorFlow code for LeNet-300-100 on GPU, as you did when computing the Hessian for AlexNet? Pruning on CPU takes a very long time.

Where do you use the tolerable error threshold?

Some questions about re-implementing L-OBS on Lenet5

Hi,

Thank you very much for sharing your code.

I am trying to re-implement your L-OBS algorithm for learning purposes. I have successfully used it in a fully connected neural network. However, I encountered some problems when applying it to a CNN (LeNet-5 here), so I am looking for solutions in your code.

I found that the LeNet-5 model in your code (in the dev branch) is different from the standard one. Is this the model used for pruning in the paper, or is it just an example? If it is just an example, could you please give me some help implementing the L-OBS algorithm on LeNet-5 (mainly the feature map special combination problem)?

Hope to get your help. I look forward to hearing from you soon.

Best wishes,
Hui

ValueError: GraphDef cannot be larger than 2GB.

Thank you very much for sharing the code publicly.

I tweaked the code to run it on GPU without requiring much memory (I can send a PR).
When I ran calculate_hessian_inverse.py, I got the following error:

ValueError: GraphDef cannot be larger than 2GB.

I could still create the Hessian inverse matrices of ResNet-50 by running calculate_hessian_inverse.py several times to cover the remaining layers.

Have you encountered this issue? I could not figure out why the graph keeps growing.

Difficulty in reproducing NIPS'17 results for AlexNet using PyTorch code

Hi Shangyu,

Thanks a lot for sharing PyTorch code for applying LOBS on various ImageNet CNNs.

I could run the code perfectly after a couple of minor error/syntax corrections required due to Python version differences (2.xx vs 3.xx).

I kept almost all the default settings. I pruned AlexNet successfully and validated it on the entire ImageNet validation set. However, I could not reproduce the numbers published in your NIPS'17 paper for AlexNet.

In the NIPS'17 paper, for 11% CR, AlexNet achieves a top-1 error of 50.04% and a top-5 error of 26.87% without retraining. However, when I ran your PyTorch code, the resulting AlexNet only achieved a top-1 error of 70.37% and a top-5 error of 45.97% without retraining. These error rates are much higher than the numbers reported in the paper. Below is the terminal output from running the validate-AlexNet.py script:

`[adpatil2@csl-420-07 ImageNet]$ python validate-AlexNet.py

Overall compression rate (nnz/total): 0.127041
==> Preparing data..
/data/L-OBS/PyTorch/ImageNet/utils.py:135: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
input_var = torch.autograd.Variable(input, volatile=True)
/data/L-OBS/PyTorch/ImageNet/utils.py:136: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
target_var = torch.autograd.Variable(target, volatile=True)
/data/L-OBS/PyTorch/ImageNet/utils.py:144: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
losses.update(loss.data[0], input.size(0))
Test: [0/1000] Loss 4.3182 (4.3182) Prec@1 38.000 (38.000) Prec@5 60.000 (60.000)
Test: [200/1000] Loss 4.3394 (4.2516) Prec@1 24.000 (29.144) Prec@5 46.000 (53.652)
Test: [400/1000] Loss 4.3978 (4.2476) Prec@1 30.000 (29.631) Prec@5 54.000 (54.354)
Test: [600/1000] Loss 4.5474 (4.2397) Prec@1 18.000 (29.601) Prec@5 46.000 (54.326)
Test: [800/1000] Loss 4.6500 (4.2459) Prec@1 18.000 (29.630) Prec@5 44.000 (54.102)

Prec@1 29.630 Prec@5 54.030
`

Could you please help me understand why we are seeing such a difference? Am I running your code incorrectly? Did you carry out any additional fine-tuning or processing to achieve the published NIPS'17 results? Is that fine-tuning code not part of the PyTorch code released here?

Please let me know what you think. I am looking forward to hearing from you.

Ameya

Basic OBS algorithm

@csyhhu
Hello, I am an undergraduate and need to use the most basic OBS algorithm for my graduation thesis. Yours is the only OBS code I have found on the Internet. I would like to adapt your code, but I find it a bit hard to follow. Do you have an implementation of the basic OBS algorithm (OBD would also work)? If so, please feel free to comment; my email is [email protected]. Sorry for any interruption, and thank you very much.
Hope to get your help!

Best wishes
fei

A question about the PyTorch implementation

Hi, I modified your code to prune a ResNet18 model trained on the CIFAR10 dataset, and I am confused by the function adjust_mean_var in utils.py. It puts the model in train mode but performs no backward operation. What does this function do?

AlexNet - TinyImagenet

First of all thanks for sharing the code.
I used your PyTorch code for AlexNet on Tiny ImageNet. When I tried to retrain the network after pruning, I got a NaN loss. This is because the output becomes very large in magnitude (+- 10e39). Do you have any idea about this?

Found some errors when using L-OBS to prune deep CNNs (AlexNet, VGG, etc.)

Hi Shangyu,
Thanks for sharing the code!
After reading your paper, I ran some experiments. I found that the code in L-OBS/lenet300-100 does a good job of pruning lenet300-100 (some small bugs exist, but they are easy to correct).
But I found an error in L-OBS/Resnet50/calculate_hessian_inverse.py, in the Hessian matrix calculation:

def calculate_hessian_fc_tf(layer_inputs):
	a = tf.expand_dims(layer_inputs, axis=-1)
	# print 'a shape: %s' %a.get_shape()
	a = tf.concat([a, tf.ones([tf.shape(a)[0], 1, 1])], axis=1)
	# print 'a shape: %s' %a.get_shape()
	# print 'get_patches_op shape: %s' %get_patches_op.get_shape()
	b = tf.expand_dims(layer_inputs, axis=1)
	b = tf.concat([b, tf.ones([tf.shape(b)[0], 1, 1])], axis=2)
	# print 'b shape: %s' %b.get_shape()
	outprod = tf.multiply(a, b)
	# print 'outprod shape: %s' %outprod.get_shape()
	return tf.reduce_mean(outprod, axis=0)  # Average Hessian matrix over the batch_size axis

My understanding is that, before calculating the Hessian inverse, the Hessian matrix should be divided by dataset_size, which equals batch_size*num_batch, instead of by batch_size. Is that right?
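
To make the normalisation question concrete, here is a minimal NumPy sketch. It is not the repository's code: the names `batch_hessian` and `dataset_hessian` are hypothetical, and it assumes the layer-wise Hessian is the mean outer product of the bias-augmented layer inputs, as the quoted TensorFlow function computes per batch. It accumulates per-batch sums and divides once by the total number of samples.

```python
import numpy as np

def batch_hessian(layer_inputs):
    """Mean outer product of bias-augmented inputs over one batch."""
    ones = np.ones((layer_inputs.shape[0], 1))
    x = np.hstack([layer_inputs, ones])      # [batch, n_in + 1]
    return x.T @ x / x.shape[0]              # [n_in + 1, n_in + 1]

def dataset_hessian(batches):
    """Accumulate un-normalised sums, then divide by the dataset size."""
    total = None
    n_samples = 0
    for batch in batches:
        summed = batch_hessian(batch) * batch.shape[0]
        total = summed if total is None else total + summed
        n_samples += batch.shape[0]
    return total / n_samples

rng = np.random.default_rng(0)
data = rng.normal(size=(12, 3))
batches = [data[:5], data[5:]]               # deliberately unequal batch sizes
full = batch_hessian(data)                   # whole dataset as one batch
accumulated = dataset_hessian(batches)
print(np.allclose(full, accumulated))
```

With equal batch sizes, averaging the per-batch means gives the same matrix; whether the repository does that averaging elsewhere is not visible from the quoted function alone.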
Some errors also occur in L-OBS/Resnet50/prune_weights.py:

def prune_weights_fc(weights, biases, hessian_inverse, CR):
	n_hidden_1 = int(weights.shape[0])
	n_hidden_2 = int(weights.shape[1])
	gate_w = np.ones([n_hidden_1, n_hidden_2])
	gate_b = np.ones([n_hidden_2])

	sensitivity = np.array([])

	for i in range(n_hidden_2):
		sensitivity = np.hstack(
			(sensitivity, 0.5 * (np.hstack((weights.T[i], biases[i])) ** 2) / np.diag(hessian_inverse)))

	sorted_index = np.argsort(sensitivity)  # Sort from small to big

	# Begin pruning
	n_total = int(n_hidden_1 * n_hidden_2)
	n_total_prune = int(n_hidden_1 * n_hidden_2 * (1 - CR))
	for i in range(n_total_prune):
		prune_index = sorted_index[i]
		x_index = prune_index / (n_hidden_1 + 1)  # next layer num  0----n_hidden_2
		y_index = prune_index % (n_hidden_1 + 1)  # this layer num  0----n_hidden_1

		if y_index == n_hidden_1:  # b
			delta_w = (-biases[x_index] / (hessian_inverse[y_index][y_index])) * hessian_inverse.T[y_index]
			gate_b[x_index] = 0
		else:
			delta_w = (-weights[x_index][y_index] / hessian_inverse[y_index][y_index]) * hessian_inverse.T[y_index]
			gate_w[x_index][y_index] = 0
			weights[x_index] = weights[x_index] + delta_w[0: -1].T
			'''
			I think it should be:
			delta_w = (-weights[y_index][x_index] / hessian_inverse[y_index][y_index]) * hessian_inverse.T[y_index]
			gate_w[y_index][x_index] = 0
			weights.T[x_index] = weights.T[x_index] + delta_w[0: -1]
			'''

		biases[x_index] = biases[x_index] + delta_w[-1]

		# Watch info
		if i % n_total == 0 and i != 0:
			CR = int(100 - (i / n_total) * 5)
			print '[%s] Now prune to CR: %d' % (datetime.now(), CR)

	weights = weights * gate_w
	biases = biases * gate_b

	if not os.path.exists('pruned_weights/%s/' % layer_name):
		os.mkdir('pruned_weights/%s/' % layer_name)
	np.save('pruned_weights/%s/weights.npy' % (layer_name, CR), weights)
	np.save('pruned_weights/%s/biases.npy' % (layer_name, CR), biases)

After I corrected the bugs and modified some code to use the GPU to calculate hessian_inverse (after testing, using the GPU does not affect the result of hessian_inverse), I pruned ResNet-50 with a prune percentage of 60%, and the output of the fc layer is all NaN. Something must be wrong.
Have you tested this code, or is it just toy code? I found lots of bugs in it, and it did not work when I used it to prune deep CNNs. I have tested ResNet-50 (output NaN), AlexNet (precision declines a lot), and VGG16 (precision declines a lot).
Waiting for your answer.
Best wishes!
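
For reference, the indexing concern raised above can be illustrated with a small NumPy sketch of one OBS pruning step on a fully connected layer. This is a sketch under stated assumptions, not the repository's implementation: the function names are hypothetical, weights are assumed to have shape [n_in, n_out], and one inverse Hessian of size (n_in + 1) x (n_in + 1) is assumed to be shared by every output unit. It uses the standard OBS formulas: sensitivity L_q = w_q^2 / (2 [H^-1]_qq) and update delta_w = -(w_q / [H^-1]_qq) H^-1 e_q.

```python
import numpy as np

def obs_sensitivity(weights, biases, hessian_inverse):
    """Sensitivity of every parameter, ordered output unit by output unit."""
    params = np.vstack([weights, biases[None, :]])   # [n_in + 1, n_out]
    diag = np.diag(hessian_inverse)[:, None]         # [n_in + 1, 1]
    return (0.5 * params ** 2 / diag).T.ravel()      # length n_out * (n_in + 1)

def obs_prune_one(weights, biases, hessian_inverse, flat_index):
    """Zero one parameter in place and compensate the rest of its column."""
    n_in = weights.shape[0]
    # Integer division: flat index -> (output unit, input index or bias slot).
    out_idx, in_idx = divmod(flat_index, n_in + 1)
    column = np.append(weights[:, out_idx], biases[out_idx])
    scale = column[in_idx] / hessian_inverse[in_idx, in_idx]
    column = column - scale * hessian_inverse[:, in_idx]
    column[in_idx] = 0.0                             # remove rounding residue
    weights[:, out_idx] = column[:-1]
    biases[out_idx] = column[-1]
    return out_idx, in_idx

rng = np.random.default_rng(0)
x = np.hstack([rng.normal(size=(50, 4)), np.ones((50, 1))])
h_inv = np.linalg.inv(x.T @ x / 50 + 1e-6 * np.eye(5))   # damped for stability
w, b = rng.normal(size=(4, 3)), rng.normal(size=3)
sens = obs_sensitivity(w, b, h_inv)
q = int(np.argmin(sens))
out_idx, in_idx = obs_prune_one(w, b, h_inv, q)
```

Note the use of divmod: under Python 3, a plain `/` on the flat index would produce a float and break the array indexing.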

Question about dimension of Hessian matrix of conv layers

Hi Shangyu,

Thanks for sharing the code!

I have a question about the way the Hessian matrix is calculated in the "calculate_hessian_conv_tf" function in calculate_hessian_inverse.py.

Consider a convolutional layer consisting of 128 filters, each with dimensions 3x3x64. Here the number of input channels is 64, while the number of output channels is 128.

For this layer, the Hessian matrix computed by your function "calculate_hessian_conv_tf" has dimensions 64x64. However, following your NIPS 2017 paper, I think the dimensions of the Hessian matrix in this case should be (3x3x64)x(3x3x64) = 576x576, because there are 3x3x64 = 576 weights in each filter.

Furthermore, when I used the 64x64 Hessian computed above to compute sensitivities, I got a dimension mismatch error in the function "prune_weights_conv". Specifically, it complains about the following line:

The error is that "row_kernel" has shape (576,) while "diag_hess_inv" has shape (64,). I think this error occurs because the Hessian matrix computed here has dimensions 64x64 instead of 576x576.

Please let me know what I am missing here and how you resolved such mismatch in your ResNet-50 simulations.

Thanks for your help.

Ameya Patil
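
The 576x576 shape argued for above can be illustrated with a small NumPy sketch. It rests on assumptions that are not confirmed by the repository: stride 1, no padding, no bias term, NHWC input layout, and a Hessian defined as the mean outer product of flattened input patches. The function name `conv_layer_hessian` is hypothetical.

```python
import numpy as np

def conv_layer_hessian(layer_inputs, kh, kw):
    """Mean outer product of flattened kh x kw x C input patches."""
    n, h, w, c = layer_inputs.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    # One shifted view per kernel offset; stacking gives every patch.
    shifted = [layer_inputs[:, i:i + out_h, j:j + out_w, :]
               for i in range(kh) for j in range(kw)]
    patches = np.stack(shifted, axis=3)           # [n, out_h, out_w, kh*kw, c]
    patches = patches.reshape(-1, kh * kw * c)    # one row per patch
    return patches.T @ patches / patches.shape[0]

rng = np.random.default_rng(0)
inputs = rng.normal(size=(2, 5, 5, 64))           # 64 input channels
hessian = conv_layer_hessian(inputs, 3, 3)
print(hessian.shape)
```

Under these assumptions the matrix side equals the number of weights in one filter, independent of the number of output channels.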

MNIST precision after pruning = MNIST precision before pruning

I was running your MNIST code on TensorFlow. I commented out the Hessian computation and edge cuts (lines 219-248), so that I load your precomputed weights after pruning. This gives me a test accuracy of 0.9839.
I was not sure, so I commented out the pruned weights and biases (lines 257-266) and replaced them with the original weights and biases from BEFORE pruning, i.e. lines 175-185. This gives me exactly the same result. Are the pruned weights loaded correctly, or is something wrong with the pruned weights stored in your folders?

Update:
I also replaced the post-pruning lines (lines 257-266) with this:
weights = {
	'fc1': tf.Variable(np.random.rand(784, 300).astype('float32')),
	'fc2': tf.Variable(np.random.rand(300, 100).astype('float32')),
	'fc3': tf.Variable(np.random.rand(100, 10).astype('float32'))
}
biases = {
	'fc1': tf.Variable(np.random.rand(300).astype('float32')),
	'fc2': tf.Variable(np.random.rand(100).astype('float32')),
	'fc3': tf.Variable(np.random.rand(10).astype('float32'))
}
The accuracy is also 0.9836.

ValueError: GraphDef cannot be larger than 2GB.

Hi Shangyu

When I tried your code to calculate the Hessian matrix, I got the error ValueError: GraphDef cannot be larger than 2GB.

How can I solve the problem?

thanks

Setya W.P
