csyhhu / l-obs
Codes for Layer-wise Optimal Brain Surgeon
License: MIT License
After studying the paper and the code, I ran into some problems and hope you can help me.
Hi BaiYin,
Thanks for using the code. For the first question, let me check, and I will get back to you as soon as possible. For the second question, I am using Python 2 for both lenet300-100 and ResNet. The only difference may be whether there are parentheses after 'print'. I think up-to-date NumPy and TensorFlow are fine. If you see any warnings about deprecated functions, please let me know and I will update my code.
Shangyu,
1-27-2018
Hi BaiYin,
Sorry for my carelessness. I think I have solved the problem of the compression ratios all being zero; it was a minor bracket error. I have updated lenet300-100/L-OBS.py; please sync it and you should see the difference. You could also check your generated pruned weights and calculate the compression ratio yourself instead of relying only on my printed info.
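Checking the compression ratio directly from the saved arrays takes only a few lines. The sketch below assumes the pruned parameters have already been loaded into a dict of NumPy arrays (for example with np.load on the saved .npy files); the helper name `compression_ratio` is hypothetical and not part of the repository. It reports nnz/total, and uses float division so it gives the same answer under Python 2 and Python 3.

```python
import numpy as np

def compression_ratio(params):
    """Return (nonzero count) / (total count) over all arrays in `params`.

    `params` is assumed to map parameter names to NumPy arrays,
    e.g. pruned weights and biases loaded from .npy files.
    """
    nnz = sum(int(np.count_nonzero(a)) for a in params.values())
    total = sum(int(a.size) for a in params.values())
    return float(nnz) / total  # float() avoids integer division on Python 2

# Small hand-made arrays for illustration (not real pruned weights):
pruned = {
    "fc1/weights": np.array([[0.0, 1.5], [0.0, 0.0]]),
    "fc1/biases": np.array([0.2, 0.0]),
}
print(compression_ratio(pruned))  # 2 nonzeros out of 6 entries
```

Counting biases together with weights is a design choice; drop the bias arrays from the dict if the ratio should cover weights only.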
Please let me know if there are any other problems.
Thanks again for using our code. I will refine the code in February using PyTorch. If you are interested, please stay tuned.
Shangyu,
1-27-2018
Hi,
I was looking into the ResNet18 PyTorch code and noticed the following sequence:
if layer_name == 'fc':
    layer_type = 'F'
else:
    layer_type = 'R'
if layer_type == 'C':
    [...]
Unless I'm wrong, this means that none of the layers will be treated as Conv for this code.
Is this behaviour intended?
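If the intent is for convolutional layers to take the 'C' branch, one option is to derive the type from the layer name. The sketch below is hypothetical: it assumes ResNet18 layer names follow the torchvision convention (`conv1`, `layer1.0.conv2`, `layer2.0.downsample.0`, `fc`), and the 'R' fallback simply mirrors the snippet above; what 'R' is meant to denote in the repository is not stated here.

```python
def layer_type_from_name(layer_name):
    """Classify a layer by its name (hypothetical rule for torchvision-style ResNet names)."""
    if layer_name == 'fc':
        return 'F'
    # Convolutions: the stem conv, the convs inside residual blocks,
    # and the 1x1 conv at index 0 of each downsample branch.
    if layer_name.split('.')[-1].startswith('conv') or layer_name.endswith('downsample.0'):
        return 'C'
    return 'R'

for name in ['conv1', 'layer1.0.conv2', 'layer2.0.downsample.0', 'layer1.0.bn1', 'fc']:
    print(name, layer_type_from_name(name))
```

Checking `isinstance(module, torch.nn.Conv2d)` on the actual modules would be more robust than name matching, at the cost of needing the model object at that point in the code.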
Thanks,
Dan
Is there a way to run your TensorFlow code for LeNet-300-100 on a GPU, as you did when computing the Hessian for AlexNet? Pruning on the CPU takes a very long time.
Hi,
Thank you very much for sharing your code.
I am trying to re-implement your L-OBS algorithm for learning purposes. I have successfully applied it to a fully connected neural network. However, I ran into some problems when applying it to a CNN (LeNet-5 here), so I looked for solutions in your code.
I found that the LeNet-5 model in your code (in the dev branch) differs from the standard one. Is this the model that was pruned in the paper, or is it just an example? If it is just an example, could you please give me some help implementing L-OBS on LeNet-5 (mainly the special combination of feature maps)?
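One way to think about partially connected feature maps: an output map that only sees a subset of input channels has an input vector made of just those channels' patch entries (plus the bias 1), so its layer-wise Hessian is the corresponding sub-block of the full patch Hessian. The sketch below assumes patches are flattened in (kh, kw, C_in) order with a trailing bias entry; the helper name is hypothetical and the channel subsets are made up, not the connection table from the LeNet-5 paper or code from this repository.

```python
import numpy as np

def restricted_hessian(H_full, kh, kw, c_in, channels):
    """Sub-block of a conv-layer Hessian for an output map wired to `channels` only.

    H_full: (kh*kw*c_in + 1) x (kh*kw*c_in + 1), with patch entries in
    (kh, kw, c_in) order followed by a bias entry (assumed layout).
    """
    # Flat index of patch entry (i, j, c) is (i*kw + j)*c_in + c.
    idx = [(i * kw + j) * c_in + c
           for i in range(kh) for j in range(kw) for c in channels]
    idx.append(kh * kw * c_in)  # keep the bias entry (last row/column)
    idx = np.array(idx)
    return H_full[np.ix_(idx, idx)]

# A 5x5 kernel over 6 input maps, with one output map connected to maps 0, 1, 2 only:
H_full = np.eye(5 * 5 * 6 + 1)  # placeholder Hessian, just to show the shapes
print(restricted_hessian(H_full, 5, 5, 6, [0, 1, 2]).shape)  # 5*5*3 + 1 = 76
```

Each output map then gets its own Hessian inverse; with full connectivity all output maps share one.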
Hope to get your help. I look forward to hearing from you soon.
Best wishes,
Hui
Thank you very much for sharing the code publicly.
I tweaked the code to run on the GPU without requiring much memory (I can send a PR).
When I ran calculate_hessian_inverse.py, I got the following error:
ValueError: GraphDef cannot be larger than 2GB.
I was able to create the Hessian inverse matrices for ResNet-50 by re-running calculate_hessian_inverse.py several times for the remaining layers.
Have you encountered this issue? I could not figure out why the graph keeps growing.
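In TensorFlow 1.x, this error means the serialized graph has exceeded the protobuf size limit. Common causes are creating new ops inside the per-batch or per-layer loop, or embedding large NumPy arrays in the graph as constants (for example by passing them to tf.constant or directly into an op). Building the Hessian op once and feeding activations through a tf.placeholder avoids the growth; alternatively, the accumulation can be done outside the graph entirely. The sketch below takes the second route in plain NumPy. The class name is hypothetical, and it assumes each layer's inputs arrive batch by batch as 2-D arrays.

```python
import numpy as np

class HessianAccumulator:
    """Streams H = (1/n) * sum_i x_i x_i^T over a dataset, with x_i = [input_i; 1]."""

    def __init__(self, dim):
        self.sum_outer = np.zeros((dim + 1, dim + 1))  # +1 row/column for the bias input
        self.count = 0

    def update(self, layer_inputs):
        # layer_inputs: (batch, dim) activations feeding the layer for one batch
        ones = np.ones((layer_inputs.shape[0], 1))
        x = np.hstack([layer_inputs, ones])
        self.sum_outer += x.T.dot(x)
        self.count += x.shape[0]

    def hessian(self):
        # Divide by the total number of samples seen, not by one batch size.
        return self.sum_outer / self.count

# Usage: feed batches of any size; nothing is added to a TF graph.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))
acc = HessianAccumulator(5)
for chunk in (data[:30], data[30:60], data[60:]):
    acc.update(chunk)
print(acc.hessian().shape)  # (6, 6)
```

The per-batch activations can still come from a session run; only the outer-product accumulation moves to the host.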
Hi Shangyu,
Thanks a lot for sharing the PyTorch code for applying L-OBS to various ImageNet CNNs.
I was able to run the code after a couple of minor error/syntax corrections required by Python version differences (2.x vs 3.x).
I kept almost all the default settings. I pruned AlexNet successfully and validated it on the entire ImageNet validation set. However, I could not reproduce the AlexNet numbers published in your NIPS'17 paper.
In the NIPS'17 paper, at 11% CR, AlexNet achieves a top-1 error of 50.04% and a top-5 error of 26.87% without retraining. However, when I ran your PyTorch code, the resulting AlexNet had a top-1 error of 70.37% and a top-5 error of 45.97% without retraining. These error rates are much higher than those reported in the paper. Please find below the terminal output from running the validate-AlexNet.py script:
[adpatil2@csl-420-07 ImageNet]$ python validate-AlexNet.py
Overall compression rate (nnz/total): 0.127041
==> Preparing data..
/data/L-OBS/PyTorch/ImageNet/utils.py:135: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  input_var = torch.autograd.Variable(input, volatile=True)
/data/L-OBS/PyTorch/ImageNet/utils.py:136: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  target_var = torch.autograd.Variable(target, volatile=True)
/data/L-OBS/PyTorch/ImageNet/utils.py:144: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  losses.update(loss.data[0], input.size(0))
Test: [0/1000] Loss 4.3182 (4.3182) Prec@1 38.000 (38.000) Prec@5 60.000 (60.000)
Test: [200/1000] Loss 4.3394 (4.2516) Prec@1 24.000 (29.144) Prec@5 46.000 (53.652)
Test: [400/1000] Loss 4.3978 (4.2476) Prec@1 30.000 (29.631) Prec@5 54.000 (54.354)
Test: [600/1000] Loss 4.5474 (4.2397) Prec@1 18.000 (29.601) Prec@5 46.000 (54.326)
Test: [800/1000] Loss 4.6500 (4.2459) Prec@1 18.000 (29.630) Prec@5 44.000 (54.102)
Could you please help me understand why we are seeing such a difference? Am I running your code incorrectly? Did you carry out any additional fine-tuning or processing to achieve the published NIPS'17 results? Is that fine-tuning code not part of the released PyTorch code here?
Please let me know what you think. I am looking forward to hearing from you.
Ameya
@csyhhu
Hello, I am an undergraduate student. I need to use the most basic OBS algorithm for my graduation thesis, and yours is the only OBS code I have found online. I would like to adapt your code, but it is a bit abstruse. Do you have an implementation of the basic OBS algorithm (OBD would also work)? If so, please do not hesitate to reply; my email is [email protected] Please forgive the interruption. Thank you very much.
Hope to get your help!
Best wishes
fei
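For reference, the classic OBS step (Hassibi and Stork) for a weight vector w with Hessian H removes the weight q with the smallest saliency L_q = w_q^2 / (2 [H^-1]_qq) and adjusts all weights by δw = -(w_q / [H^-1]_qq) H^-1 e_q. OBD instead uses the diagonal approximation 0.5 H_qq w_q^2 and leaves the other weights unchanged. The NumPy sketch below implements one step of each under these textbook definitions on a toy quadratic error; it is an illustration, not code from this repository.

```python
import numpy as np

def obs_step(w, H_inv):
    """One Optimal Brain Surgeon step: prune one weight and compensate the rest."""
    saliency = 0.5 * w ** 2 / np.diag(H_inv)
    q = int(np.argmin(saliency))
    delta = -(w[q] / H_inv[q, q]) * H_inv[:, q]
    w_new = w + delta
    w_new[q] = 0.0  # delta already cancels w[q]; set it exactly to remove round-off
    return w_new, q, saliency[q]

def obd_step(w, H):
    """One Optimal Brain Damage step: prune one weight using diag(H), no compensation."""
    saliency = 0.5 * np.diag(H) * w ** 2
    q = int(np.argmin(saliency))
    w_new = w.copy()
    w_new[q] = 0.0
    return w_new, q, saliency[q]

# Toy quadratic error around a trained minimum: E(w + dw) - E(w) = 0.5 * dw^T H dw
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
H = A.dot(A.T) + np.eye(5)  # symmetric positive definite
w = rng.normal(size=5)
w_obs, q, s = obs_step(w, np.linalg.inv(H))
dw = w_obs - w
print(q, s, 0.5 * dw.dot(H).dot(dw))  # predicted and actual error increase agree
```

Repeating the step (updating H^-1 or recomputing it after each removal) prunes further weights; L-OBS applies the same formulas per layer, with H built from that layer's inputs.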
Hi, I modified your code to prune a ResNet18 model trained on CIFAR-10, and I was confused by the function adjust_mean_var in utils.py. In your code you put the model in train mode, but there is no backward operation in this function. What does this code do?
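In PyTorch, a BatchNorm layer in train mode updates its running_mean and running_var buffers during the forward pass itself, with no backward pass involved. With the default momentum of 0.1, the update is running = (1 - momentum) * running + momentum * batch_stat, and running_var uses the unbiased batch variance. A forward-only loop in train mode (typically under torch.no_grad()) therefore re-estimates the BN statistics for the pruned weights. The body of adjust_mean_var is not shown here, so that this is its purpose is an assumption; the NumPy sketch below only mimics the buffer update rule, and the function name is hypothetical.

```python
import numpy as np

def bn_train_forward_stats(running_mean, running_var, batch, momentum=0.1):
    """Return (running_mean, running_var) as a BatchNorm layer in train mode
    would update them after one forward pass on `batch` of shape (N, C)."""
    batch_mean = batch.mean(axis=0)
    batch_var = batch.var(axis=0, ddof=1)  # unbiased estimate, as stored in running_var
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    running_var = (1 - momentum) * running_var + momentum * batch_var
    return running_mean, running_var

# Start from stale statistics and stream batches whose true mean/variance are 3 and 4.
rng = np.random.default_rng(0)
mean, var = np.zeros(8), np.ones(8)
for _ in range(200):
    mean, var = bn_train_forward_stats(mean, var, rng.normal(3.0, 2.0, size=(256, 8)))
print(mean.round(2), var.round(2))  # both drift toward the new statistics
```

After pruning, the activations feeding each BN layer shift, so refreshing these buffers can recover accuracy without any weight updates.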
First of all, thanks for sharing the code.
I used your PyTorch code for AlexNet on Tiny ImageNet. When I tried to retrain the network after pruning, I got a NaN loss. This is because the output becomes very large in magnitude (±10e39). Do you have any idea about this?
Hi Shangyu,
Thanks for sharing the code!
After reading your paper, I did some experiments. I found that the code in L-OBS/lenet300-100 does a good job of pruning LeNet-300-100 (some small bugs exist, but they are easy to correct).
However, I found an error in L-OBS/Resnet50/calculate_hessian_inverse.py when calculating the Hessian matrix:
def calculate_hessian_fc_tf(layer_inputs):
    a = tf.expand_dims(layer_inputs, axis=-1)
    # print 'a shape: %s' %a.get_shape()
    a = tf.concat([a, tf.ones([tf.shape(a)[0], 1, 1])], axis=1)
    # print 'a shape: %s' %a.get_shape()
    # print 'get_patches_op shape: %s' %get_patches_op.get_shape()
    b = tf.expand_dims(layer_inputs, axis=1)
    b = tf.concat([b, tf.ones([tf.shape(b)[0], 1, 1])], axis=2)
    # print 'b shape: %s' %b.get_shape()
    outprod = tf.multiply(a, b)
    # print 'outprod shape: %s' %outprod.get_shape()
    return tf.reduce_mean(outprod, axis=0)  # Average hessian matrix at axis batch_size
My understanding is that before calculating the Hessian inverse, the Hessian matrix should be divided by dataset_size, which equals batch_size*num_batch, instead of by batch_size. Is that right?
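Whether the Hessian is divided by batch_size or by batch_size*num_batch changes it only by a constant factor c. In the sensitivity 0.5 * w_q^2 / [H^-1]_qq and the update δw = -(w_q / [H^-1]_qq) H^-1 e_q, scaling H by c scales every sensitivity by c (so the pruning order is unchanged) and leaves δw unchanged. The NumPy check below illustrates this on random data; the helper name is hypothetical. The invariance holds only for an exact inverse: if a damping term such as H + αI (α a hypothetical constant here) or a truncated pseudo-inverse is used, the normalization does matter.

```python
import numpy as np

def lobs_quantities(w, H):
    """Sensitivities and the compensating update for pruning the least-sensitive weight."""
    H_inv = np.linalg.inv(H)
    sens = 0.5 * w ** 2 / np.diag(H_inv)
    q = int(np.argmin(sens))
    delta = -(w[q] / H_inv[q, q]) * H_inv[:, q]
    return sens, q, delta

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 6))
H = A.dot(A.T) + 0.1 * np.eye(6)  # symmetric positive definite stand-in Hessian
w = rng.normal(size=6)
c = 50.0  # e.g. num_batch: the ratio between the two normalizations

sens1, q1, d1 = lobs_quantities(w, H)
sens2, q2, d2 = lobs_quantities(w, c * H)
print(q1 == q2, np.allclose(sens2, c * sens1), np.allclose(d1, d2))
```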
Some errors also occur in L-OBS/Resnet50/prune_weights.py:
def prune_weights_fc(weights, biases, hessian_inverse, CR):
    n_hidden_1 = int(weights.shape[0])
    n_hidden_2 = int(weights.shape[1])
    gate_w = np.ones([n_hidden_1, n_hidden_2])
    gate_b = np.ones([n_hidden_2])
    sensitivity = np.array([])
    for i in range(n_hidden_2):
        sensitivity = np.hstack(
            (sensitivity, 0.5 * (np.hstack((weights.T[i], biases[i])) ** 2) / np.diag(hessian_inverse)))
    sorted_index = np.argsort(sensitivity)  # Sort from small to big
    # Begin pruning
    n_total = int(n_hidden_1 * n_hidden_2)
    n_total_prune = int(n_hidden_1 * n_hidden_2 * (1 - CR))
    for i in range(n_total_prune):
        prune_index = sorted_index[i]
        x_index = prune_index / (n_hidden_1 + 1)  # next layer num 0----n_hidden_2
        y_index = prune_index % (n_hidden_1 + 1)  # this layer num 0----n_hidden_1
        if y_index == n_hidden_1:  # b
            delta_w = (-biases[x_index] / (hessian_inverse[y_index][y_index])) * hessian_inverse.T[y_index]
            gate_b[x_index] = 0
        else:
            delta_w = (-weights[x_index][y_index] / hessian_inverse[y_index][y_index]) * hessian_inverse.T[y_index]
            gate_w[x_index][y_index] = 0
        weights[x_index] = weights[x_index] + delta_w[0: -1].T
        '''
        I think it should be:
        delta_w = (-weights[y_index][x_index] / hessian_inverse[y_index][y_index]) * hessian_inverse.T[y_index]
        gate_w[y_index][x_index] = 0
        weights.T[x_index] = weights.T[x_index] + delta_w[0: -1]
        '''
        biases[x_index] = biases[x_index] + delta_w[-1]
        # Watch info
        if i % n_total == 0 and i != 0:
            CR = int(100 - (i / n_total) * 5)
            print '[%s] Now prune to CR: %d' % (datetime.now(), CR)
    weights = weights * gate_w
    biases = biases * gate_b
    if not os.path.exists('pruned_weights/%s/' % layer_name):
        os.mkdir('pruned_weights/%s/' % layer_name)
    np.save('pruned_weights/%s/weights.npy' % (layer_name, CR), weights)
    np.save('pruned_weights/%s/biases.npy' % (layer_name, CR), biases)
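For comparison, here is a self-contained NumPy sketch of the same routine with the correction proposed above applied: the weight matrix is indexed as [input, output], and the flat index is split with integer arithmetic (in Python 3, prune_index / (n_hidden_1 + 1) yields a float, which cannot be used as an index). Progress printing and file saving are left out; in the quoted version, `layer_name` is undefined inside the function and each save path passes two arguments to a single %s. The sketch assumes weights of shape (n_in, n_out), one shared hessian_inverse of shape (n_in + 1, n_in + 1) with the bias entry last, and CR as the fraction of weights kept. It is a sketch, not the repository's fixed code.

```python
import numpy as np

def prune_weights_fc(weights, biases, hessian_inverse, CR):
    """L-OBS pruning of one fully connected layer (sketch with the [input, output] fix)."""
    weights = np.array(weights, dtype=np.float64)  # (n_in, n_out), copied
    biases = np.array(biases, dtype=np.float64)    # (n_out,), copied
    n_hidden_1, n_hidden_2 = weights.shape
    gate_w = np.ones_like(weights)
    gate_b = np.ones_like(biases)
    diag = np.diag(hessian_inverse)

    # Row i holds [w_{:, i}; b_i] for output unit i, matching the original hstack
    # layout, so the flat index is i * (n_hidden_1 + 1) + j.
    theta = np.vstack([weights, biases[None, :]]).T  # (n_out, n_in + 1)
    sensitivity = 0.5 * theta ** 2 / diag[None, :]
    sorted_index = np.argsort(sensitivity, axis=None)  # small to big, flattened

    n_total_prune = int(n_hidden_1 * n_hidden_2 * (1 - CR))
    for prune_index in sorted_index[:n_total_prune]:
        x_index, y_index = divmod(int(prune_index), n_hidden_1 + 1)  # (output, input)
        if y_index == n_hidden_1:  # bias of output unit x_index
            w_q = biases[x_index]
            gate_b[x_index] = 0
        else:
            w_q = weights[y_index, x_index]
            gate_w[y_index, x_index] = 0
        delta_w = (-w_q / hessian_inverse[y_index, y_index]) * hessian_inverse[:, y_index]
        weights[:, x_index] += delta_w[:-1]  # column x_index is weights.T[x_index]
        biases[x_index] += delta_w[-1]

    # Later updates can revive already-pruned entries, so re-apply the masks.
    return weights * gate_w, biases * gate_b
```

With hessian_inverse set to the identity, the update only touches the pruned entry, so the routine reduces to magnitude pruning; that is a convenient sanity check before using a real Hessian.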
After correcting the bugs and modifying some code to use the GPU to calculate hessian_inverse (in my tests, using the GPU does not affect the result of hessian_inverse), I pruned ResNet-50 with a pruning percentage of 60%, and the output of the fc layer is all NaN. Something must be going wrong.
Have you tested this code, or is it purely toy code? I found lots of bugs in it, and it did not work when I used it to prune deep CNNs. I have tested ResNet-50 (output NaN), AlexNet (precision declines a lot), and VGG-16 (precision declines a lot).
Waiting for your answer.
Best wishes!
Hi Shangyu,
Thanks for sharing the code!
I have a question about the way the Hessian matrix is calculated in the "calculate_hessian_conv_tf" function in calculate_hessian_inverse.py.
Consider a convolutional layer consisting of 128 filters, each of dimensions 3x3x64. Here the number of input channels is 64, while the number of output channels is 128.
For this layer, the Hessian matrix computed by your function "calculate_hessian_conv_tf" has dimensions 64x64. However, following your NIPS 2017 paper, I think the Hessian matrix in this case should be (3x3x64)x(3x3x64) = 576x576, because there are 3x3x64 = 576 weights in each filter.
Furthermore, when I used the 64x64 Hessian computed above to compute sensitivities, I got a dimension mismatch error in the function "prune_weights_conv". Specifically, it complains about the following line:
sensitivity = (0.5 * (row_kernel ** 2)) / diag_hess_inv  # I am ignoring biases here
The error is that "row_kernel" has shape (576,) while "diag_hess_inv" has shape (64,). I think this is because the Hessian matrix computed here is 64x64 instead of 576x576.
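For an ordinary convolution, each output value is a dot product between one filter and one kh x kw x C_in input patch, so a layer-wise Hessian over a filter's weights has one row per patch entry. For the layer described above that is 576x576, or 577x577 when the bias is appended. The NumPy sketch below computes this patch Hessian under stated assumptions: NHWC input, stride 1, no padding, and patches flattened in (kh, kw, C_in) order to match a TensorFlow [kh, kw, C_in, C_out] kernel reshaped per filter. The function name is hypothetical, and sliding_window_view requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_layer_hessian(inputs, kh, kw):
    """H = mean over all patches of x x^T, with x = [flattened patch; 1].

    inputs: (N, H, W, C_in) activations feeding the conv layer.
    Assumes stride 1 and no padding; returns a (kh*kw*C_in + 1)-square matrix.
    """
    c_in = inputs.shape[-1]
    # (N, H-kh+1, W-kw+1, C_in, kh, kw): every kh x kw window, window dims last
    windows = sliding_window_view(inputs, (kh, kw), axis=(1, 2))
    # Reorder each window to (kh, kw, C_in) and flatten it into one row.
    patches = windows.transpose(0, 1, 2, 4, 5, 3).reshape(-1, kh * kw * c_in)
    x = np.hstack([patches, np.ones((patches.shape[0], 1))])  # bias input
    return x.T.dot(x) / x.shape[0]

acts = np.random.default_rng(0).normal(size=(2, 8, 8, 64))
print(conv_layer_hessian(acts, 3, 3).shape)  # (577, 577)
```

All 128 filters see the same patches, so they can share this one Hessian; row_kernel for a filter is then its 3x3x64 kernel flattened in the same (kh, kw, C_in) order.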
Please let me know what I am missing here and how you resolved such mismatch in your ResNet-50 simulations.
Thanks for your help.
Ameya Patil
I was running your MNIST code in TensorFlow. I commented out the Hessian computation and edge cuts (lines 219-248), so that I load your precomputed pruned weights. This gives me a test accuracy of 0.9839.
I was not sure about this, so I commented out the pruned weights and biases (lines 257-266) and replaced them with the original weights and biases before pruning, i.e. lines 175-185. This gives me exactly the same result. Are the pruned weights being loaded correctly, or is something wrong with the computed pruned weights in your folders?
Update:
I also replaced the pruned-weight lines (lines 257-266) with this:
weights = {
    'fc1': tf.Variable(np.random.rand(784, 300).astype('float32')),
    'fc2': tf.Variable(np.random.rand(300, 100).astype('float32')),
    'fc3': tf.Variable(np.random.rand(100, 10).astype('float32'))
}
biases = {
    'fc1': tf.Variable(np.random.rand(300).astype('float32')),
    'fc2': tf.Variable(np.random.rand(100).astype('float32')),
    'fc3': tf.Variable(np.random.rand(10).astype('float32'))
}
The accuracy is still 0.9836.
Hi Shangyu
When I used your code to calculate the Hessian matrix, I got the error ValueError: GraphDef cannot be larger than 2GB.
How can I solve this problem?
Thanks,
Setya W.P