
Comments (4)

markurtz commented on May 22, 2024

Hi @ajithAI, thank you for the feedback! We are actively working on filling in more info for comparisons within all of the optimization categories and will begin adding more over the coming weeks. Right now, as a general rule, pruning gives roughly 2X more performance on top of quantization; at least, that's what we aim for internally when creating the models and the engine. The sparsity levels are all around 80% for the high-performance ResNet-50 models. We'll be sure to make this information more accessible in future blogs and tutorials!

Additionally, we'd love to hear any feedback on the new model pages we're rolling out, as we'll be doing one for ResNet-50 soon. We just launched the YOLOv3 one here, so please let us know what additional information would be important for you on that page.

We have a new UI coming out for the SparseZoo in the next few weeks that will make all of these comparisons easier and list the level of pruning for each model. Let us know if you'd like to be an alpha tester on that, as we'll be making an announcement in our Slack and Discord communities before pushing it publicly!

Can you explain a bit more what you mean by specifying the constraint on the pruning ratio?


ajithAI commented on May 22, 2024

Hi @markurtz, thanks for your explanation. So the un-pruned ResNet-50 model gives over 1,000 FPS throughput (which is great when compared with the NVIDIA T4's throughput of 5,563 FPS). And taking advantage of pruning to achieve 2,090 FPS on CPU is pretty amazing. Thanks for the information!

Regarding the YOLOv3 example, I will try to go through the entire flow and will let you know my experience.

I can have a look at the SparseZoo UI, but I am not sure how deep I can go into it because of my current bandwidth.

Pruning ratio constraint: NVIDIA can sparsify only 50% of a model with their latest Ampere family. They can't sparsify less, and they can't sparsify more. There are constraints on how much we can sparsify depending on the accelerator. In addition, in some use cases even a 10% drop in accuracy is bearable. In cases like that, where throughput is the real interest, can we prune a model beyond the usual limits? Say I need a model with 90% pruning and any accuracy loss is fine; in this case, I have a constraint on the pruning ratio. In the Neural Magic application, can we specify (min, max) pruning ratios, or a target for a desired throughput? Say I need 5,000 FPS and any extent of pruning is fine. Something like that.

And is there any paper I can read about Neural Magic's pruning method? I am just thinking broadly about how Neural Magic's pruning methodologies could be applied to FPGA accelerators, where we can program at the hardware level.


markurtz commented on May 22, 2024

Ah, that makes a lot of sense. Thanks, @ajithAI!

For the pruning ratio, yes, you're free to specify even more sparsity by editing the recipes we have or creating one from scratch. All of the recipes are set up with sectioned sparsity variables at the top; increasing these will give the result you desire. The DeepSparse engine generally has an exponential relationship between sparsity and performance, provided everything is compute-bound. If layers are memory-bound, as with depthwise convolutions, then sparsity won't give much speedup. This is some of the core technology we're working on improving, though: executing more of the network depth-wise (rather than layer by layer) so that the model becomes more compute-bound.
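For reference, a minimal sketch of what such a recipe edit might look like, using SparseML's GMPruningModifier. The variable names, epochs, and steps_per_epoch below are illustrative assumptions rather than values from an official recipe, so treat it as a starting point:

```python
# Sketch: raising a SparseML recipe's sparsity variables to 90% and applying
# the recipe during training. All hyperparameters here are illustrative.
import torch
from torchvision.models import resnet50
from sparseml.pytorch.optim import ScheduledModifierManager

# Sparsity variables sit at the top of the recipe; raising final_sparsity
# from a typical ~0.80 to 0.90 prunes more aggressively.
recipe = """
init_sparsity: 0.05
final_sparsity: 0.90

modifiers:
    - !GMPruningModifier
        init_sparsity: eval(init_sparsity)
        final_sparsity: eval(final_sparsity)
        start_epoch: 1.0
        end_epoch: 40.0
        update_frequency: 0.5
        params: __ALL_PRUNABLE__
"""

model = resnet50()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# The manager wraps the optimizer so pruning masks are updated on schedule.
manager = ScheduledModifierManager.from_yaml(recipe)
optimizer = manager.modify(model, optimizer, steps_per_epoch=1000)

# ... run the usual training loop here ...

manager.finalize(model)  # make the pruning masks permanent
```

After training, the pruned model can be exported to ONNX and run in DeepSparse to measure the actual speedup at the new sparsity level.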

Our pruning methodology follows gradual magnitude pruning, as we have found it to be the most consistent and to give the best results. The one caveat is that it takes more training time compared to other methods. Song Han's 2015 paper is probably the best one to go through for this: https://arxiv.org/abs/1506.02626
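For readers who want the intuition behind gradual magnitude pruning, here is a small framework-level sketch. Two caveats: Han et al. (2015) describe iterative prune-and-retrain, and the cubic sparsity schedule below is the common GMP formulation from Zhu & Gupta (2017), used here as an assumption since the thread doesn't spell out Neural Magic's exact schedule:

```python
# Sketch of gradual magnitude pruning (GMP): periodically raise the sparsity
# target and zero the smallest-magnitude weights, letting training recover
# in between. The cubic schedule follows Zhu & Gupta (2017), an assumption.
import torch


def gmp_sparsity(step: int, total_steps: int,
                 init_sparsity: float = 0.05,
                 final_sparsity: float = 0.90) -> float:
    """Cubic ramp from init_sparsity to final_sparsity over total_steps."""
    progress = min(step / total_steps, 1.0)
    return final_sparsity + (init_sparsity - final_sparsity) * (1.0 - progress) ** 3


def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a 0/1 mask that zeros the `sparsity` fraction of smallest weights."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()


# Inside a training loop, after each optimizer step:
# sparsity = gmp_sparsity(step, total_pruning_steps)
# for name, param in model.named_parameters():
#     if param.dim() > 1:  # prune conv/linear weights, skip biases and norms
#         param.data.mul_(magnitude_mask(param.data, sparsity))
```

On FPGAs, masks like these could in principle drive custom sparse kernels; the hard part, as the Ampere 2:4 (50%) example above suggests, is mapping unstructured sparsity onto whatever patterns the hardware supports.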


jeanniefinks commented on May 22, 2024

Hello @ajithAI
As there has been no further commentary, I am going to go ahead and close this thread out. But if you have more comments, please re-open; we'd love to chat. Lastly, if you have not starred our sparseml repo already and feel inclined, please do! Thank you in advance for your support! https://github.com/neuralmagic/sparseml/

Best, Jeannie / Neural Magic

