
Comments (2)

mgoin commented on May 12, 2024

Hi @jz-exwzd thanks for opening this issue and being patient.

I was able to run your notebook on my system:

{'L1_data_cache_size': 32768, 'L1_instruction_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 25952256, 'architecture': 'x86_64', 'available_cores_per_socket': 18, 'available_num_cores': 18, 'available_num_hw_threads': 36, 'available_num_numa': 1, 'available_num_sockets': 1, 'available_sockets': 1, 'available_threads_per_core': 2, 'cores_per_socket': 18, 'isa': 'avx512', 'num_cores': 18, 'num_hw_threads': 36, 'num_numa': 1, 'num_sockets': 1, 'threads_per_core': 2, 'vendor': 'GenuineIntel', 'vendor_id': 'Intel', 'vendor_model': 'Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz', 'vnni': False}

Using your setup, I saw performance similar to what you reported and to the saved outputs of the notebook cells. We have some tuning work to do for "edge-case" batch sizes like 15 and 17, but we are less sure how to improve general performance for low-sparsity models.

One note on the engine benchmarking in the notebook: not many iterations are measured for each scenario, and since each inference takes single-digit milliseconds there is a lot of jitter. I recommend running benchmarks for a few seconds to get an accurate measurement, e.g.
engine.benchmark(inputs, num_iterations=200, num_warmup_iterations=100)
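The warmup-then-measure pattern behind that call can be sketched generically in plain Python. This is an illustrative sketch of the technique, not the DeepSparse implementation; the function name `benchmark` and the returned statistics are chosen for this example.

```python
import statistics
import time

def benchmark(fn, num_iterations=200, num_warmup_iterations=100):
    """Time fn over many iterations after a warmup phase.

    Warmup lets caches, frequency scaling, and allocators settle so the
    measured iterations reflect steady-state latency rather than jitter.
    """
    for _ in range(num_warmup_iterations):
        fn()
    samples_ms = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "stdev_ms": statistics.stdev(samples_ms),
    }

stats = benchmark(lambda: sum(range(10_000)))
print(stats["mean_ms"], stats["stdev_ms"])
```

With only a handful of iterations, `stdev_ms` can be a large fraction of `mean_ms`; hundreds of iterations shrink the noise in the mean.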

Answers to your direct questions:

  1. In the attached pruning recipe, final_sparsity is set to 60%, which is quite low compared to the 80-90% models we usually produce on the SparseZoo, often with quantization as well. The simple answer is that the models we publish are much sparser than 60%, so the performance difference is larger. How effective sparsity is depends on the model (60% sparsity is more meaningful on BERT than on ResNet, for instance), but generally, the less compute-bound a model is, the less effective sparsity will be. On top of the sparsity, the input to the model is quite small at 3x32x32, which also leaves little room for speedup when there isn't much compute to remove.

  2. The DeepSparse engine does have different sets of algorithms that activate for identified structures, especially in CNN models like ResNet and MobileNet. The specific behavior you mentioned is likely tied to the batch size being divisible by 16. Batch sizes of 15 and 17 take a different code path, and that path still needs tuning. On multi-socket systems there are even more edge cases, since evenly divisible batch sizes are needed to distribute work evenly across sockets. Uniformly increasing throughput as batch size grows is the hope, but modern systems are quite heterogeneous, so this is difficult to achieve. We will keep working on this.
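The intuition in point 1 can be made concrete with a back-of-the-envelope, Amdahl-style bound: if a fraction of the compute lives in prunable layers and a fraction of those weights is zeroed, the ideal speedup is capped by the compute that remains. The function below is an illustrative ceiling I am sketching for this discussion, not a DeepSparse API; real kernels achieve less than this bound.

```python
def ideal_speedup(sparsity, prunable_fraction=1.0):
    """Upper-bound speedup from skipping zeroed weights.

    sparsity: fraction of weights pruned in the prunable layers (0..1).
    prunable_fraction: share of total compute spent in those layers.
    Amdahl-style: the unprunable compute limits the overall speedup.
    """
    remaining = (1.0 - prunable_fraction) + prunable_fraction * (1.0 - sparsity)
    return 1.0 / remaining

# 60% vs 90% sparsity when essentially all compute is prunable:
print(round(ideal_speedup(0.60), 2))  # 2.5
print(round(ideal_speedup(0.90), 2))  # 10.0
```

This is one way to see why a 60% model leaves far less headroom than the 80-90% models mentioned above, even before kernel efficiency enters the picture.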
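On point 2, a common caller-side workaround for divisibility-sensitive kernels is to pad the batch up to the next multiple and discard the padded outputs afterward. The divisible-by-16 figure and the `pad_batch` helper below are assumptions for illustration, not part of the DeepSparse API.

```python
import math

def pad_batch(batch, multiple=16):
    """Pad a non-empty batch up to a multiple of `multiple` by repeating
    the last sample; return the padded batch and the real sample count
    so padded outputs can be sliced off after inference."""
    real = len(batch)
    target = math.ceil(real / multiple) * multiple
    padded = batch + [batch[-1]] * (target - real)
    return padded, real

padded, real = pad_batch(list(range(15)))
print(len(padded), real)  # 16 15
```

This trades a small amount of wasted compute on the padding rows for landing on the fast, evenly divisible code path.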

Hope this was of help and thanks again for the detailed report. Let me know if you have more questions.

  • Michael


jz-exwzd commented on May 12, 2024

Hi Michael,

Thank you for your detailed reply. It is very informative. I am glad that I reached out to the team about this issue.

I generally agree with the replies to both questions, especially the first. Essentially, there needs to be sufficient redundancy in the model for pruning to exploit before DeepSparse can deliver a speedup. I suppose there is not much point in pruning a model that is not very complex to begin with.

Thank you once again and keep up the good work.

Best regards,
Chai Jiazheng

