Comments (4)
Hi @ajithAI, thank you for the feedback! We are actively working on adding more comparison info across all of the optimization categories and will be filling it in over the coming weeks. As a general rule, pruning gives roughly a 2X performance gain on top of quantization; at least that's what we aim for internally when creating the models and the engine. The sparsity levels are all around 80% for the high-performance ResNet-50 models. We'll be sure to make this information more accessible in the blogs and tutorials going forward!
Additionally, we'd love to hear any feedback on the new model pages we're rolling out, as we'll be doing one for ResNet-50 soon. We just launched the YOLOv3 one here, so please let us know what additional information would be important for you on that page.
We have a new UI coming out for the SparseZoo in the next few weeks that will make all of these comparisons easier and list out the level of pruning for each model. Let us know if you'd like to be an alpha tester on that, as we'll be making an announcement in our Slack and Discord communities before pushing publicly!
Can you explain a bit more what you mean by specifying the constraint on the pruning ratio?
from sparseml.
Hi @markurtz, thanks for your explanation. So the unpruned ResNet-50 model gives over 1,000 FPS throughput (which is great when compared with the NVIDIA T4's 5,563 FPS), and taking advantage of pruning to achieve 2,090 FPS on CPU is pretty amazing. Thanks for the information!
Regarding the YOLOv3 example, I will try to go with the entire flow and will let you know my experience.
I can have a look at the SparseZoo UI, but I'm not sure how deep I can go because of my current bandwidth.
Pruning ratio constraint: NVIDIA can only sparsify 50% of a model with their latest Ampere family; they can't sparsify less, and they can't sparsify more. There are constraints on how much we can sparsify depending on the accelerator. In addition, in some use cases even a 10% drop in accuracy is bearable. In cases like that, where throughput is the real interest, can we prune a model beyond the usual limits? Say I need a model with 90% pruning where any accuracy loss is fine; here I have a constraint on the pruning ratio. In the Neural Magic application, can we specify (min, max) pruning ratios, or a target for desired throughput? Say I need 5,000 FPS and any extent of pruning is fine. Something like that.
And is there any paper I can read about the Neural Magic pruning method? I'm thinking broadly about how Neural Magic's pruning methodologies could be applied to FPGA accelerators, where we can program at the hardware level.
Ah makes a lot of sense, thanks @ajithAI!
For the pruning ratio, yes, you're free to specify even more sparsity by editing the recipes we have or creating one from scratch. All of the recipes are set up with sectioned sparsity variables at the top; increasing these will give the result you desire. The DeepSparse engine generally has an exponential relationship between sparsity and performance, provided everything is compute-bound. If layers are memory-bound, such as with depthwise convolutions, then sparsity won't give much speedup. This is some of the core technology we're working on improving, though: executing more of the network depthwise to make the model more compute-bound.
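To give a rough feel for what raising a sparsity target does to a weight tensor, here is a minimal magnitude-pruning sketch in plain NumPy. This is an illustrative stand-in, not SparseML's actual implementation; the function name and shapes are hypothetical.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction `sparsity` of `weights`."""
    if sparsity <= 0.0:
        return weights.copy()
    k = int(np.floor(sparsity * weights.size))  # number of weights to remove
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest magnitude; everything at or below it is pruned.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
pruned = magnitude_prune(w, 0.90)  # prune ~90% of the weights
achieved = 1.0 - np.count_nonzero(pruned) / pruned.size  # close to 0.90
```

Raising the sparsity argument (the analog of the recipe's sparsity variables) zeroes a larger fraction of weights, which is what lets a sparsity-aware engine skip more work in compute-bound layers.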
Our pruning methodologies follow gradual magnitude pruning, as we have found this to be the most consistent and to give the best results. The one caveat is that it takes more training time than other methods. Song Han's 2015 paper is probably the best to go through for this: https://arxiv.org/abs/1506.02626
Hello @ajithAI
As there has been no further commentary, I am going to go ahead and close this thread out. But if you have more comments, please re-open and we'd love to chat. Lastly, if you have not starred our sparseml repo already, and you feel inclined, please do! Thank you in advance for your support! https://github.com/neuralmagic/sparseml/
Best, Jeannie / Neural Magic