
Comments (2)

mgoin commented on May 12, 2024

Hi @jz-exwzd thanks for opening this issue and being patient.

I was able to run your notebook on my system:

{'L1_data_cache_size': 32768, 'L1_instruction_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 25952256, 'architecture': 'x86_64', 'available_cores_per_socket': 18, 'available_num_cores': 18, 'available_num_hw_threads': 36, 'available_num_numa': 1, 'available_num_sockets': 1, 'available_sockets': 1, 'available_threads_per_core': 2, 'cores_per_socket': 18, 'isa': 'avx512', 'num_cores': 18, 'num_hw_threads': 36, 'num_numa': 1, 'num_sockets': 1, 'threads_per_core': 2, 'vendor': 'GenuineIntel', 'vendor_id': 'Intel', 'vendor_model': 'Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz', 'vnni': False}

Using your setup, I saw performance similar to what you reported and to the saved outputs of the notebook cells. We have some tuning work to do for "edge-case" batch sizes like 15 and 17, but we are less sure how to improve general performance for low-sparsity models.

One note on the engine benchmarking in the notebook: not many iterations are measured for each scenario, and since each inference takes single-digit milliseconds there is a lot of jitter. I recommend running benchmarks for a few seconds to get an accurate measurement, e.g.
engine.benchmark(inputs, num_iterations=200, num_warmup_iterations=100)
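The warmup-then-measure pattern behind that call can be sketched generically in plain Python. This is an illustrative sketch of the technique, not the DeepSparse implementation; the function name `benchmark` and the returned statistics are chosen for this example.

```python
import statistics
import time

def benchmark(fn, num_iterations=200, num_warmup_iterations=100):
    """Time fn over many iterations after a warmup phase.

    Warmup lets caches, frequency scaling, and allocators settle so the
    measured iterations reflect steady-state latency rather than jitter.
    """
    for _ in range(num_warmup_iterations):
        fn()
    samples_ms = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "stdev_ms": statistics.stdev(samples_ms),
    }

stats = benchmark(lambda: sum(range(10_000)))
print(stats["mean_ms"], stats["stdev_ms"])
```

With only a handful of iterations, `stdev_ms` can be a large fraction of `mean_ms`; hundreds of iterations shrink the noise in the mean.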

Answers to your direct questions:

  1. In the attached pruning recipe, final_sparsity is set to 60%, which is quite low compared to the 80-90% models we usually produce on the SparseZoo, often with quantization as well. The simple answer is that the models we publish are much sparser than 60%, so the performance difference is larger. How effective sparsity is depends on the model (60% sparsity is more meaningful on BERT than on ResNet, for instance), but generally, the less compute-bound a model is, the less effective sparsity will be. On top of the sparsity, the input to the model is quite small at 3x32x32, which also leaves little room for speedup when there isn't much compute to remove.

  2. The DeepSparse engine does have different sets of algorithms that activate for identified structures, especially in CNN models like ResNet and MobileNet. The specific behavior you mentioned is likely tied to the batch size being divisible by 16. Batch sizes of 15 and 17 take a different code path, and that path still needs tuning. On multi-socket systems there are even more edge cases, since evenly divisible batch sizes are needed to distribute work evenly across sockets. Uniformly increasing throughput as batch size grows is the hope, but modern systems are quite heterogeneous, so this is difficult to achieve. We will keep working on this.
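The intuition in point 1 can be made concrete with a back-of-the-envelope, Amdahl-style bound: if a fraction of the compute lives in prunable layers and a fraction of those weights is zeroed, the ideal speedup is capped by the compute that remains. The function below is an illustrative ceiling I am sketching for this discussion, not a DeepSparse API; real kernels achieve less than this bound.

```python
def ideal_speedup(sparsity, prunable_fraction=1.0):
    """Upper-bound speedup from skipping zeroed weights.

    sparsity: fraction of weights pruned in the prunable layers (0..1).
    prunable_fraction: share of total compute spent in those layers.
    Amdahl-style: the unprunable compute limits the overall speedup.
    """
    remaining = (1.0 - prunable_fraction) + prunable_fraction * (1.0 - sparsity)
    return 1.0 / remaining

# 60% vs 90% sparsity when essentially all compute is prunable:
print(round(ideal_speedup(0.60), 2))  # 2.5
print(round(ideal_speedup(0.90), 2))  # 10.0
```

This is one way to see why a 60% model leaves far less headroom than the 80-90% models mentioned above, even before kernel efficiency enters the picture.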
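On point 2, a common caller-side workaround for divisibility-sensitive kernels is to pad the batch up to the next multiple and discard the padded outputs afterward. The divisible-by-16 figure and the `pad_batch` helper below are assumptions for illustration, not part of the DeepSparse API.

```python
import math

def pad_batch(batch, multiple=16):
    """Pad a non-empty batch up to a multiple of `multiple` by repeating
    the last sample; return the padded batch and the real sample count
    so padded outputs can be sliced off after inference."""
    real = len(batch)
    target = math.ceil(real / multiple) * multiple
    padded = batch + [batch[-1]] * (target - real)
    return padded, real

padded, real = pad_batch(list(range(15)))
print(len(padded), real)  # 16 15
```

This trades a small amount of wasted compute on the padding rows for landing on the fast, evenly divisible code path.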

Hope this was of help and thanks again for the detailed report. Let me know if you have more questions.

  • Michael


jz-exwzd commented on May 12, 2024

Hi Michael,

Thank you for your detailed reply. It is very informative. I am glad that I reached out to the team about this issue.

I generally agree with the replies to both questions, especially the first. Essentially, there needs to be sufficient redundancy in the model for pruning to exploit before DeepSparse can deliver a speedup. I suppose there is not much point in pruning a model that is not very complex to begin with.

Thank you once again and keep up the good work.

Best regards,
Chai Jiazheng

