Comments (4)
Hi @ajithAI, thank you for the feedback! We are actively working on adding more comparison info across all of the optimization categories and will be filling it in over the coming weeks. As a general rule, pruning gives roughly a 2X performance gain on top of quantization; at least that's what we aim for internally when creating the models and the engine. The sparsity levels are all around 80% for the high-performance ResNet-50 models. We'll be sure to make this information more accessible in the blogs and tutorials going forward!
Additionally, we'd love to hear any feedback on the new model pages we're rolling out, as we'll be doing one for ResNet-50 soon. We just launched the YOLOv3 one here, so please let us know what additional information would be important for you on that page.
We have a new UI coming out for the SparseZoo in the next few weeks that will make all of these comparisons easier and list out the level of pruning for each model. Let us know if you'd like to be an alpha tester on that, as we'll be making an announcement in our Slack and Discord communities before pushing publicly!
Can you explain a bit more what you mean by specifying the constraint on the pruning ratio?
from sparseml.
Hi @markurtz, thanks for your explanation. So the unpruned ResNet-50 model gives over 1,000 FPS throughput (which is great when compared with the NVIDIA T4's 5,563 FPS), and taking advantage of pruning to achieve 2,090 FPS on CPU is pretty amazing. Thanks for the information!
Regarding the YOLOv3 example, I will try to go with the entire flow and will let you know my experience.
I can have a look at the SparseZoo UI, but I'm not sure how deep I can go because of my current bandwidth.
Pruning ratio constraint: NVIDIA can only sparsify 50% of a model with their latest Ampere family; they can't sparsify less, and they can't sparsify more. There are constraints on how much we can sparsify depending on the accelerator. In addition, in some use cases even a 10% drop in accuracy is bearable. In cases like that, where throughput is the real interest, can we prune a model beyond the usual limits? Say I need a model with 90% pruning where any accuracy loss is fine; here I have a constraint on the pruning ratio. In the Neural Magic application, can we specify (min, max) pruning ratios, or a target for desired throughput? Say I need 5,000 FPS and any extent of pruning is fine. Something like that.
And is there any paper I can read about the Neural Magic pruning method? I'm thinking broadly about how Neural Magic's pruning methodologies could be applied to FPGA accelerators, where we can program at the hardware level.
Ah makes a lot of sense, thanks @ajithAI!
For the pruning ratio, yes, you're free to specify even more sparsity by editing the recipes we have or creating one from scratch. All of the recipes are set up with sectioned sparsity variables at the top; increasing these will give the result you desire. The DeepSparse engine generally has an exponential relationship between sparsity and performance, provided everything is compute-bound. If layers are memory-bound, such as with depthwise convolutions, then sparsity won't give much speedup. This is some of the core technology we're working on improving, though: executing more of the network depthwise to make the model more compute-bound.
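To give a rough feel for what raising a sparsity target does to a weight tensor, here is a minimal magnitude-pruning sketch in plain NumPy. This is an illustrative stand-in, not SparseML's actual implementation; the function name and shapes are hypothetical.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction `sparsity` of `weights`."""
    if sparsity <= 0.0:
        return weights.copy()
    k = int(np.floor(sparsity * weights.size))  # number of weights to remove
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest magnitude; everything at or below it is pruned.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
pruned = magnitude_prune(w, 0.90)  # prune ~90% of the weights
achieved = 1.0 - np.count_nonzero(pruned) / pruned.size  # close to 0.90
```

Raising the sparsity argument (the analog of the recipe's sparsity variables) zeroes a larger fraction of weights, which is what lets a sparsity-aware engine skip more work in compute-bound layers.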
Our pruning methodologies follow gradual magnitude pruning, as we have found this to be the most consistent and to give the best results. The one caveat is that it takes more training time than other methods. Song Han's 2015 paper is probably the best to go through for this: https://arxiv.org/abs/1506.02626
Hello @ajithAI
As there has been no further commentary, I am going to go ahead and close this thread out. But if you have more comments, please re-open and we'd love to chat. Lastly, if you have not starred our sparseml repo already, and you feel inclined, please do! Thank you in advance for your support! https://github.com/neuralmagic/sparseml/
Best, Jeannie / Neural Magic