neuralmagic / deepsparse Goto Github PK

View Code? Open in Web Editor NEW

2.9K 53.0 167.0 140.41 MB

Sparsity-aware deep learning inference runtime for CPUs

Home Page: https://neuralmagic.com/deepsparse/

License: Other

Makefile 0.13% Python 99.67% Dockerfile 0.20%

machinelearning onnx inference computer-vision object-detection pruning quantization pretrained-models nlp cpus

deepsparse's Introduction

DeepSparse

Sparsity-aware deep learning inference runtime for CPUs

DeepSparse is a CPU inference runtime that takes advantage of sparsity to accelerate neural network inference. Coupled with SparseML, our optimization library for pruning and quantizing your models, DeepSparse delivers exceptional inference performance on CPU hardware.

✨NEW✨ DeepSparse LLMs

Neural Magic is excited to announce initial support for performant LLM inference in DeepSparse with:

sparse kernels for speedups and memory savings from unstructured sparse weights.
8-bit weight and activation quantization support.
efficient usage of cached attention keys and values for minimal memory movement.

Try It Now

Install (requires Linux):

pip install -U deepsparse-nightly[llm]

Run inference:

from deepsparse import TextGeneration
pipeline = TextGeneration(model="zoo:mpt-7b-dolly_mpt_pretrain-pruned50_quantized")

prompt="""
Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: what is sparsity? ### Response:
"""
print(pipeline(prompt, max_new_tokens=75).generations[0].text)

# Sparsity is the property of a matrix or other data structure in which a large number of elements are zero and a smaller number of elements are non-zero. In the context of machine learning, sparsity can be used to improve the efficiency of training and prediction.

Check out the TextGeneration documentation for usage details and get the latest sparsified LLMs on our HF Collection.

Sparsity 🤝 Performance

Developed in collaboration with IST Austria, our recent paper details a new technique called Sparse Fine-Tuning, which allows us to prune MPT-7B to 60% sparsity during fine-tuning without drop in accuracy. With our new support for LLMs, DeepSparse accelerates the sparse-quantized model 7x over the dense baseline:

Learn more about our Sparse Fine-Tuning research.

Check out the model running live on Hugging Face.

LLM Roadmap

Following this initial launch, we are rapidly expanding our support for LLMs, including:

Productizing Sparse Fine-Tuning: Enable external users to apply sparse fine-tuning to their datasets via SparseML.
Expanding model support: Apply our sparse fine-tuning results to Llama 2 and Mistral models.
Pushing for higher sparsity: Improving our pruning algorithms to reach even higher sparsity.

Computer Vision and NLP Models

In addition to LLMs, DeepSparse supports many variants of CNNs and Transformer models, such as BERT, ViT, ResNet, EfficientNet, YOLOv5/8, and many more! Take a look at the Computer Vision and Natural Language Processing domains of SparseZoo, our home for optimized models.

Installation

Install via PyPI (optional dependencies detailed here):

pip install deepsparse

To experiment with the latest features, there is a nightly build available using pip install deepsparse-nightly or you can clone and install from source using pip install -e path/to/deepsparse.

System Requirements

Hardware: x86 AVX2, AVX-512, AVX-512 VNNI and ARM v8.2+
Operating System: Linux
Python: 3.8-3.11
ONNX versions 1.5.0-1.15.0, ONNX opset version 11 or higher

For those using Mac or Windows, we recommend using Linux containers with Docker.

Deployment APIs

DeepSparse includes three deployment APIs:

Engine is the lowest-level API. With Engine, you compile an ONNX model, pass tensors as input, and receive the raw outputs.
Pipeline wraps the Engine with pre- and post-processing. With Pipeline, you pass raw data and receive the prediction.
Server wraps Pipelines with a REST API using FastAPI. With Server, you send raw data over HTTP and receive the prediction.

Engine

The example below downloads a 90% pruned-quantized BERT model for sentiment analysis in ONNX format from SparseZoo, compiles the model, and runs inference on randomly generated input. Users can provide their own ONNX models, whether dense or sparse.

from deepsparse import Engine

# download onnx, compile
zoo_stub = "zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none"
compiled_model = Engine(model=zoo_stub, batch_size=1)

# run inference (input is raw numpy tensors, output is raw scores)
inputs = compiled_model.generate_random_inputs()
output = compiled_model(inputs)
print(output)

# > [array([[-0.3380675 ,  0.09602544]], dtype=float32)] << raw scores

Pipeline

Pipelines wrap Engine with pre- and post-processing, enabling you to pass raw data and receive the post-processed prediction. The example below downloads a 90% pruned-quantized BERT model for sentiment analysis in ONNX format from SparseZoo, sets up a pipeline, and runs inference on sample data.

from deepsparse import Pipeline

# download onnx, set up pipeline
zoo_stub = "zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none"  
sentiment_analysis_pipeline = Pipeline.create(
  task="sentiment-analysis",    # name of the task
  model_path=zoo_stub,          # zoo stub or path to local onnx file
)

# run inference (input is a sentence, output is the prediction)
prediction = sentiment_analysis_pipeline("I love using DeepSparse Pipelines")
print(prediction)
# > labels=['positive'] scores=[0.9954759478569031]

Server

Server wraps Pipelines with REST APIs, enabling you to set up a model-serving endpoint running DeepSparse. This enables you to send raw data to DeepSparse over HTTP and receive the post-processed predictions. DeepSparse Server is launched from the command line and configured via arguments or a server configuration file. The following downloads a 90% pruned-quantized BERT model for sentiment analysis in ONNX format from SparseZoo and launches a sentiment analysis endpoint:

deepsparse.server \
  --task sentiment-analysis \
  --model_path zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none

Sending a request:

import requests

url = "http://localhost:5543/v2/models/sentiment_analysis/infer" # Server's port default to 5543
obj = {"sequences": "Snorlax loves my Tesla!"}

response = requests.post(url, json=obj)
print(response.text)
# {"labels":["positive"],"scores":[0.9965094327926636]}

Additional Resources

Use Cases Page for more details on supported tasks
Pipelines User Guide for Pipeline documentation
Server User Guide for Server documentation
Benchmarking User Guide for benchmarking documentation
Cloud Deployments and Demos
User Guide for more detailed documentation

Product Usage Analytics

DeepSparse gathers basic usage telemetry, including, but not limited to, Invocations, Package, Version, and IP Address, for Product Usage Analytics purposes. Review Neural Magic's Products Privacy Policy for further details on how we process this data.

To disable Product Usage Analytics, run:

export NM_DISABLE_ANALYTICS=True

Confirm that telemetry is shut off through info logs streamed with engine invocation by looking for the phrase "Skipping Neural Magic's latest package version check."

Community

Get In Touch

For more general questions about Neural Magic, complete this form.

License

DeepSparse Community is free to use and is licensed under the Neural Magic DeepSparse Community License. Some source code, example files, and scripts included in the DeepSparse GitHub repository or directory are licensed under the Apache License Version 2.0 as noted.
DeepSparse Enterprise requires a Trial License or can be fully licensed for production, commercial applications.

Cite

Find this project useful in your research or other communications? Please consider citing:

@misc{kurtic2023sparse,
      title={Sparse Fine-Tuning for Inference Acceleration of Large Language Models}, 
      author={Eldar Kurtic and Denis Kuznedelev and Elias Frantar and Michael Goin and Dan Alistarh},
      year={2023},
      url={https://arxiv.org/abs/2310.06927},
      eprint={2310.06927},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{kurtic2022optimal,
      title={The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models}, 
      author={Eldar Kurtic and Daniel Campos and Tuan Nguyen and Elias Frantar and Mark Kurtz and Benjamin Fineran and Michael Goin and Dan Alistarh},
      year={2022},
      url={https://arxiv.org/abs/2203.07259},
      eprint={2203.07259},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@InProceedings{
    pmlr-v119-kurtz20a, 
    title = {Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks}, 
    author = {Kurtz, Mark and Kopinsky, Justin and Gelashvili, Rati and Matveev, Alexander and Carr, John and Goin, Michael and Leiserson, William and Moore, Sage and Nell, Bill and Shavit, Nir and Alistarh, Dan}, 
    booktitle = {Proceedings of the 37th International Conference on Machine Learning}, 
    pages = {5533--5543}, 
    year = {2020}, 
    editor = {Hal Daumé III and Aarti Singh}, 
    volume = {119}, 
    series = {Proceedings of Machine Learning Research}, 
    address = {Virtual}, 
    month = {13--18 Jul}, 
    publisher = {PMLR}, 
    pdf = {http://proceedings.mlr.press/v119/kurtz20a/kurtz20a.pdf},
    url = {http://proceedings.mlr.press/v119/kurtz20a.html}
}

@article{DBLP:journals/corr/abs-2111-13445,
  author    = {Eugenia Iofinova and Alexandra Peste and Mark Kurtz and Dan Alistarh},
  title     = {How Well Do Sparse Imagenet Models Transfer?},
  journal   = {CoRR},
  volume    = {abs/2111.13445},
  year      = {2021},
  url       = {https://arxiv.org/abs/2111.13445},
  eprinttype = {arXiv},
  eprint    = {2111.13445},
  timestamp = {Wed, 01 Dec 2021 15:16:43 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2111-13445.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

All Thanks To Our Contributors

deepsparse's People

Contributors

Stargazers

Watchers

Forkers

alexm-nm ranalytica rxflamel tdeboissiere jrmeyer b1sounours yangwang92 adrianosantospb oluwabukolaadegboro jawaechan markurtz jadielam hzitoun baby123456 motus monastersky z-y00 yash1996 idontcalculate vixadd phucnhs sbhmajum369 shrejais ibrandiay clementpoiret shenmayufei skalskip syed-cbot techthiyanes manikant92 inquestgeronimo mdmmn378 adbmd deanofthewebb tdg5 chibukach zeyus loudjine dumpmemory vishalbelsare bin2000 dbarbuzzi haodeqi dnth databill86 neophack heatdh christhaliyath sunilsurineni eldarkurtic singularperturbation k2m5t2 rui-shen-afk joskid sararijo solderzzc gandalf012 williamsnick606 maheshs11 marcus-arcadius apurvnagvenkar mwess danieltanfh95 tejagollapudi c-nr corner4world ajunlonglive gg-big-org gpalrepo standardgalactic codeaudit ayushexel vikasojha666 zengxianxian727 jimbog farhanbadr09 tamminhdiep97 stjordanis vwulf shenyang70s seanigami senstar-hsoleimani jazib-sudo djord97 arpitjain799 alihssan iq-scm rogersaloo mengjin001 brunoscaglione safaribookingshub kal729 eamon-cai dineshchitlangia vpoul trungmaster5 gouravjha7 zeroxclem havietisov kp-forks

deepsparse's Issues

Please make an ARM version for Raspberry Pi 4

Hello,

I'm BM. A very nice and polite guy.

Please make an ARM version for Raspberry Pi 4

getting low fps & inference issue

1.i used this repo
https://github.com/neuralmagic/deepsparse/tree/main/examples/ultralytics-yolo
& this command
!python annotate.py
zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94
--source "/content/loc1min.mp4"
--quantized-inputs
--image-shape 416 416
--save-dir '/content/ops/'
--model-config '/content/coco128.yaml'
--device 'cpu'

& im getting low fps on cpu, (yolov5s model) its normal fps or should we get 50-60 fps? bcz you have mentioned that model will be 10x faster. but its very less.

i trained model usiing sparsml repo on coco128 data for 40 epochs & converted .pth model into onnx & tried same inference script
!python annotate.py
/content/sparseml/integrations/ultralytics-yolov5/yolov5/runs/train/exp2/weights/best.onnx
--source "/content/loc1min.mp4"
--quantized-inputs
--image-shape 416 416
--save-dir '/content/ops/'
--model-config '/content/coco128.yaml'
--device 'cpu'
& getting this issue.

What's wrong here? my goal is to use a custom data train model with sparceml & do inference using deepspare.

Is Deepsparse specially optimized for certain model architectures regardless of the sparsity?

Hi there,

I have been experimenting with the DeepSparse engine, and this is my second issue. I thought initially that DeepSparse engine is a general engine designed to exploit the sparsity in a model to achieve faster inference speed. However, recently I discovered that regardless of the sparsity of the model, model's architecture seems to play a bigger role in the final inference speed achieved by the DeepSparse engine. For example, when I compared the Resnet models' inference speed with and without the DeepSparse engine (all models having zero sparsity) , the inference speed using the DeepSparse engine is much faster despite the zero sparsity. This is the same with the EfficientNet models and the MobileNet models. But, the previously described behavior is not observed in networks like ResNext, SeResNext, ViT, etc. I have a feeling that when using DeepSparse engine, pruning/ high model's sparsity plays a secondary role, and the main reason for the speed up is the model's architecture.

May I know what is the view of the NeuralMagic team about the observation/ opinion above regarding the DeepSparse engine?

Thank you.

ERROR: Failed building wheel for deepsparse & unable to install deepsparse on windows as well as on macos.

Describe the bug
A clear and concise description of what the bug is.

Expected behavior
A clear and concise description of what you expected to happen.

Environment
Include all relevant environment information:

Mac OS
Python version 3.9.7:
DeepSparse version latest
CPU - m1 chip

To Reproduce
Exact steps to reproduce the behavior:
conda created environment
installed onnx - and pip installed deepsparse

Errors
error: Native Mac is currently unsupported for the DeepSparse Engine. Please run on a Linux system or within a Linux container on Mac. More info can be found in our docs here: https://docs.neuralmagic.com/deepsparse/source/hardware.html
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for deepsparse
Failed to build deepsparse
ERROR: Could not build wheels for deepsparse, which is required to install pyproject.toml-based projects

Any workplan for supporting ARM CPUs

Great contribution.
It would be better if deepsparse could support ARM devices.

Does a C API exist for deepsparse or is this python only and are all benchmarks via python?

Just a quick question. Is it possible to use deepsparse for inference directly in other languages e.g. C++, C# or similar? Or is all code written in python?

Inference is faster with ONNX runtime

Bug description
I tried training aware pruning on a custom transformer model, reaching the desired accuracy and sparsity (65% total). I then exported the model via ModuleExporter. When I ran the model via the DeepSparse Engine, I got a slightly higher latency compared to when I ran the same exported model via the ONNX runtime.

Expected behavior
The inference latency of the DeepSparse Engine should be much lower than the inference latency obtained from running the model via the ONNX runtime.

Environment
Include all relevant environment information:

OS: Debian GNU/Linux 10
Python version: 3.8.13
DeepSparse version or commit hash: 1.0.2
ML framework version(s): torch 1.9.1
Other Python package versions: sparseml 1.0.1, NumPy 1.21.0, ONNX 1.10.1
CPU info:

{'L1_data_cache_size': 32768, 'L1_instruction_cache_size': 32768, 'L2_cache_size': 1048576,
'L3_cache_size': 25952256, 'architecture': 'x86_64', 'available_cores_per_socket': 4,
'available_num_cores': 4, 'available_num_hw_threads': 8, 'available_num_numa': 1,
'available_num_sockets': 1, 'available_sockets': 1, 'available_threads_per_core': 2,
'cores_per_socket': 4, 'isa': 'avx512', 'num_cores': 4, 'num_hw_threads': 8, 'num_numa': 1,
'num_sockets': 1, 'threads_per_core': 2, 'vendor': 'GenuineIntel',
'vendor_id': 'Intel', 'vendor_model': 'Intel(R) Xeon(R) CPU @ 3.10GHz', 'vnni': True}

To Reproduce
One can skip the training part and randomly zero out some of a weights of a trained transformer model in PyTorch and try executing the ONNX converted model via the engine and also via the ONNX runtime.

YOLOv3 examples link in the root README leads to a 404

The public https://github.com/neuralmagic/deepsparse/tree/main/examples file has the following link to YOLOv3 examples.
https://github.com/neuralmagic/deepsparse/blob/main/examples/ultralytics-yolov3/
This link leads to a GitHub 404 page, indicating that the target directory does not exist.

I expected the link to bring me to a valid page. Perhaps it should be https://github.com/neuralmagic/deepsparse/tree/main/examples/ultralytics-yolo ?

how to save result from engine.run output?

Hi, I want to use your quant and sparse yolov5s model.
when I run compile_file engine (engine.run) I get a list of array of this shapes:
(1,3,40,40,85) , (1,3,20,20,85),..
how can I get croped image (as array) from this output?(as yolo)

yolo itself use non_max_suppression on model output (it has such shape: (1,2550,85) ) and then some post processing on it. how about your model?

because of I need to integrate it with other codes, I want to get and save result from model without annotate.py . pls don't suggest it!

Large Activation Buffers Constraint & Solving for Model Processing/Performance Conflicts

v.0.2 Known Issues

In rare cases where a tensor, used as the input or output to an operation, is larger than 2GB, the engine can segfault. Users should decrease the batch size as a workaround.
In some cases, models running complicated pre- or post-processing steps could diminish the DeepSparse Engine performance by up to a factor of 10x due to hyperthreading, as two engine threads can run on the same physical core. Address the performance issue by trying the following recommended solutions in order of preference:
1. Enable thread binding
If that does not give performance benefit or you want to try additional options:
1. Use the numactl utility to prevent the process from running on hyperthreads.
2. Manually set the thread affinity in Python as follows:
```
import os
from deepsparse.cpu import cpu_architecture
ARCH = cpu_architecture()

if ARCH.vendor == "GenuineIntel":
    os.sched_setaffinity(0, range(ARCH.num_physical_cores()))
elif ARCH.vendor == "AuthenticAMD":
    os.sched_setaffinity(0, range(0, 2*ARCH.num_physical_cores(), 2))
else:
    raise RuntimeError(f"Unknown CPU vendor {ARCH.vendor}")
```

deepsparse installation fail

Describe the bug
Unable to install deepsparse engine.

Expected behavior
Should have installed from the pip install deepsparse command.

Environment
Include all relevant environment information:

OS : Windows 11
Python version : 3.9

To Reproduce
pip install deepsparse

Errors
Please see the attached error log.

error.log

Unable to install deepsparse-transformers dependencies

The bug
Unable to install deepsparse-transformers dependencies

Environment

OS release='4.15.0-117-generic', version=118-Ubuntu SMP, machine='x86_64':
Python 3.6.12 :
DeepSparse version 0.12.1:
ML framework version(s) torch-1.10.2:
Other Python package versions transformers-4.18.0 :
CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture():
{'vendor': 'GenuineIntel', 'isa': 'avx2', 'vnni': False, 'num_sockets': 2, 'available_sockets': 2, 'cores_per_socket': 12, 'available_cores_per_socket': 12, 'threads_per_core': 2, 'available_threads_per_core': 2, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 262144, 'L3_cache_size': 31457280}

To Reproduce
Steps to reproduce the behavior:
1.pip install deepsparse
2.pip install https://github.com/neuralmagic/transformers/releases/download/nightly/transformers-4.18.0.dev0-py3-none-any.whl

Errors
I wanted to try deepsparse.transformers for summarization task but as an output I got error

which lead me to the error I am writing you about

I tried to find transformers-4.18.0.dev0-py3-none-any.whl following the path but I can not find the folder releases in transformers
https://github.com/neuralmagic/transformers

Is there plan to support sparse NN influence on mobile devices, such as Arm CPUs?

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Unable to install deepsparse

trying : pip install sparseml[torchvision] deepsparse
got the following error,

PS: Used in new env as well, same error , although this command works fine in colab.
Help me out, what am i missing here ?

shold we need to do all epochs training? can we resume training?

i stopped model training at 150 epoch & trying to convert it into onnx weight, but im getting issue? so should i need to do continue training? whats the impact if we stop earlier & do resume training>

Custom ONNX with LSTM and dynamic batch size

I am trying to compile an LSTM based ONNX model, but the kernel dies. It works with CNN based ONNX. Also, it is possible to have a model with dynamic batch size?
Ubuntu - 18.04
Python - 3.8
ONNX - 1.9.0
deepsparse - 0.12.1

Inference slowed down by pruning and quantization

Hi, I'm trying to use DeepSparse to run 1D CNNs for audio processing. I was benchmarking the performance depending on whether or not pruning (with 0.9 sparsity) and/or quantization is used, i.e. four different models. I'm using an AVX512 CPU (no VNNI).

I would expect the ordering of performance (higher is better) to be

vanilla < quantized <= quantized+pruned
vanilla < pruned <= quantized+pruned

If I understand correctly, quantized+pruned might not improve performance because "sparse quantization" only works with VNNI as stated here.

However, the inference times I measured for the models are:

vanilla: 0.029604
quantized: 0.03177
pruned+quantized: 0.096895
pruned: 0.13405
which does not correspond to the expectations. I got these times using my own custom benchmark, but using deepsparse.benchmark path-to-model.onnx -nstreams 1 -s sync -b 1 confirmed the times as well.

Here are .onnx files of the models I was using -- I hope the naming scheme is clear.

I pruned/quantized the models using SparseML, where I built the recipes using these templates:

base_recipe_template = """
version: 0.1.0
modifiers:
    - !EpochRangeModifier
        start_epoch: 0.0
        end_epoch: 2

    - !LearningRateModifier
        start_epoch: 0
        end_epoch: 2
        init_lr: 0.005
        lr_class:  ExponentialLR
        lr_kwargs:
            gamma: 0.9
"""

pruning_template = """
    - !GMPruningModifier
        start_epoch: 0
        end_epoch: 1
        update_frequency: 1.0
        init_sparsity: 0.05
        final_sparsity: 0.9
        mask_type: block4
        params: {layers_to_prune}
"""

quantization_template = """
    - !QuantizationModifier
        start_epoch: 0.0
"""

where layers_to_prune = sparseml.pytorch.utils.get_prunable_layers(model). I only add pruning_template for the pruned models and quantization_template for the quantized models.

I initialize the model randomly and train on random data (torch.randn_like(...)).

Expected behavior: quantization and pruning should decrease inference time, not increase it.

Environment
Include all relevant environment information:

OS [e.g. Ubuntu 18.04]: Debian GNU/Linux 11 (bullseye)
Python version [e.g. 3.7]: 3.8.12
DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 0.12.2
ML framework version(s) [e.g. torch 1.7.1]: Torch 1.9.1
Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]: SparseML 0.12.2, ONNX 1.10.1,
CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture(): {'vendor': 'GenuineIntel', 'isa': 'avx512', 'vnni': False, 'num_sockets': 1, 'available_sockets': 1, 'cores_per_socket': 1, 'available_cores_per_socket': 1, 'threads_per_core': 2, 'available_threads_per_core': 2, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 40370176}

To Reproduce: Run the .onnx files (with batch size 1) and measure latency, e.g. using deepsparse.benchmark path-to-model.onnx -nstreams 1 -s sync -b 1

Am I doing something wrong? Why is this happening?

Run Dense Teacher Creation instruction Data download failed

Describe the bug
Run the instruction
sparseml.transformers.token_classification \ --output_dir models/teacher \ --model_name_or_path zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none \ --recipe zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none?recipe_type=transfer-token_classification \ --recipe_args '{"init_lr":0.00003}' \ --dataset_name conll2003 --per_device_train_batch_size 32 \ --per_device_eval_batch_size 32 --preprocessing_num_workers 6 \ --do_train --do_eval --evaluation_strategy epoch --fp16 \ --save_strategy epoch --save_total_limit 1

https://neuralmagic.com/use-cases/sparse-named-entity-recognition/

Always prompt
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.neuralmagic.com', port=443): Max retries exceeded with url: /models/download/nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none/vocab.txt?release_version=0.7.0 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f3747532a90>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

Expected behavior

Download data automated.

Environment

use the https://github.com/neuralmagic/deepsparse/tree/main/docker Dokerfile

docker build -t deepsparse_docker .

docker run -itd --gpus all -v $(pwd):/root/deepsparse -p 5543:5543 --name deepsparse deepsparse_docker

docker exec -it deepsparse bash

>>> import deepsparse.cpu
>>> print(deepsparse.cpu.cpu_architecture())

{'L1_data_cache_size': 32768, 'L1_instruction_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 28835840, 'architecture': 'x86_64', 'available_cores_per_socket': 20, 'available_num_cores': 40, 'available_num_hw_threads': 80, 'available_num_numa': 2, 'available_num_sockets': 2, 'available_sockets': 2, 'available_threads_per_core': 2, 'cores_per_socket': 20, 'isa': 'avx512', 'num_cores': 40, 'num_hw_threads': 80, 'num_numa': 2, 'num_sockets': 2, 'threads_per_core': 2, 'vendor': 'GenuineIntel', 'vendor_id': 'Intel', 'vendor_model': 'Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz', 'vnni': True}

To Reproduce
Exact steps to reproduce the behavior:

Errors
If applicable, add a full print-out of any errors or exceptions that are raised or include screenshots to help explain your problem.

Additional context
Add any other context about the problem here. Also include any relevant files.

YOLOv5 pruned_quant-aggressive_94 exception

Describe the bug
I was trying to run demo code with YOLOv5 pruned_quant-aggressive_94 model on g4dn.x2large and encountered this exception.

Stack trace

  | 2021-12-16T15:36:11.889+01:00 | Overwriting original model shape (640, 640) to (800, 800)
  | 2021-12-16T15:36:11.889+01:00 | Original model path: /mnt/pylot/unleash_models/yolov5_optimised/yolov5-s/pruned_quant-aggressive_94.onnx, new temporary model saved to /tmp/tmpd8kad_7r
  | 2021-12-16T15:36:11.890+01:00 | DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.9.1 (afc7e831) (release) (optimized) (system=avx512, binary=avx512)
  | 2021-12-16T15:36:13.559+01:00 | DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.9.1 (afc7e831) (release) (optimized)
  | 2021-12-16T15:36:13.559+01:00 | Date: 12-16-2021 @ 14:36:13 UTC
  | 2021-12-16T15:36:13.559+01:00 | OS: Linux ip-10-0-2-22.ap-southeast-2.compute.internal 4.14.173-137.229.amzn2.x86_64 #1 SMP Wed Apr 1 18:06:08 UTC 2020
  | 2021-12-16T15:36:13.559+01:00 | Arch: x86_64
  | 2021-12-16T15:36:13.559+01:00 | CPU: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
  | 2021-12-16T15:36:13.559+01:00 | Vendor: GenuineIntel
  | 2021-12-16T15:36:13.559+01:00 | Cores/sockets/threads: [4, 1, 8]
  | 2021-12-16T15:36:13.559+01:00 | Available cores/sockets/threads: [4, 1, 8]
  | 2021-12-16T15:36:13.559+01:00 | L1 cache size data/instruction: 32k/32k
  | 2021-12-16T15:36:13.559+01:00 | L2 cache size: 1Mb
  | 2021-12-16T15:36:13.559+01:00 | L3 cache size: 35.75Mb
  | 2021-12-16T15:36:13.559+01:00 | Total memory: 30.9605G
  | 2021-12-16T15:36:13.559+01:00 | Free memory: 14.6592G
  | 2021-12-16T15:36:13.559+01:00 | Assertion at ./src/include/wand/jit/pooling/common.hpp:239
  | 2021-12-16T15:36:13.559+01:00 | Backtrace:
  | 2021-12-16T15:36:13.560+01:00 | 0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 1# wand::detail::assert_fail(char const*, char const*, int) in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 2# 0x00007F4B71E55271 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 3# 0x00007F4B71E55125 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 4# 0x00007F4B71E554FD in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 5# 0x00007F4B71E5A4E0 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 6# 0x00007F4B71E5A89A in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 7# 0x00007F4B71E5CDE8 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 8# 0x00007F4B7101F93B in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 9# 0x00007F4B7101FAF9 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 10# 0x00007F4B7101B9D5 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 11# 0x00007F4B71042618 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 12# 0x00007F4B71042C91 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 13# 0x00007F4B71070667 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 14# 0x00007F4B70BFA76B in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 15# 0x00007F4B70BEA8FC in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 16# 0x00007F4B70BD7A4F in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 17# 0x00007F4B71156499 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 18# 0x00007F4B70C0A3EF in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 19# 0x00007F4B70C28DCD in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 20# 0x00007F4B70C28EF3 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 21# 0x00007F4B70C295B3 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 22# 0x00007F4B71FB8E10 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 23# 0x00007F4CFA2C06DB in /lib/x86_64-linux-gnu/libpthread.so.0
  | 2021-12-16T15:36:13.560+01:00 | Please email a copy of this stack trace and any additional information to: [email protected]

Environment

Ubuntu 18.04
Python 3.8
ML framework version(s)

torch @ https://download.pytorch.org/whl/cu110/torch-1.7.1%2Bcu110-cp38-cp38-linux_x86_64.whl
torchvision @ https://download.pytorch.org/whl/cu110/torchvision-0.8.2%2Bcu110-cp38-cp38-linux_x86_64.whl

Other Python package versions

sparseml==0.9.0
sparsezoo==0.9.0
numpy==1.21.4
onnx==1.9.0
onnxruntime==1.7.0

Is there any chance you could help me out to debug that issue?

Could not build wheels for deepsparse, which is required to install pyproject.toml-based projects

im using windows 10 , python 3.7 & creayed new venv for this project,
so window system dosnt have support of deepsparse?

Deep Sparse vs onnx models

Hello,
I have trained several pruned models and saved the weights (using other pruning methods) and converted the saved models to onnx (using torch). I'm interested in comparing their inference times. The results are confusing as the trends do not stay the same when I change the batch size. Also, for some batch size and model combinations, onnx is faster than Deep Sparse which is confusing. I was wondering if there is an explanation for that, or I'm missing something.

Converting onnx model to deepsparse

Hi,
I'm trying to convert an onnx model to a deepsparse model, here is the code:

from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs
onnx_filepath = "fom.onnx"
batch_size = 1

# Generate random sample input
inputs = generate_random_inputs(onnx_filepath, batch_size)

# Compile and run
engine = compile_model(onnx_filepath, batch_size)
outputs = engine.run(inputs)

**Environment**
Include all relevant environment information:
1. Ubuntu 18.04:
2. Python version 3.7.9. :
3. DeepSparse version 0.8.0 :
4. torch 1.9.0+cu102:
5. Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]:
6. CPU {'vendor': 'GenuineIntel', 'isa': 'avx512', 'vnni': True, 'num_sockets': 2, 'available_sockets': 2, 'cores_per_socket': 18, 'available_cores_per_socket': 18, 'threads_per_core': 2, 'available_threads_per_core': 2, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 25952256}

**Errors**
[     INFO            onnx.py: 128 - generate_random_inputs() ] -- generating random input #0 of shape = [1, 3, 256, 256]
[     INFO            onnx.py: 128 - generate_random_inputs() ] -- generating random input #1 of shape = [1, 10, 2]
[     INFO            onnx.py: 128 - generate_random_inputs() ] -- generating random input #2 of shape = [1, 10, 2]
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.8.0 (68df72e1) (release) (optimized) (system=avx512, binary=avx512)
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.8.0 (68df72e1) (release) (optimized)
Date: 12-05-2021 @ 12:58:29 EST
OS: Linux visiongpu49 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 13:41:54 UTC 2021
Arch: x86_64
CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Vendor: GenuineIntel
OS: Linux visiongpu49 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 13:41:54 UTC 2021
Arch: x86_64
CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Vendor: GenuineIntel
Cores/sockets/threads: [36, 2, 72]
Available cores/sockets/threads: [36, 2, 72]
L1 cache size data/instruction: 32k/32k
L2 cache size: 1Mb
L3 cache size: 24.75Mb
Total memory: 507.367G
Free memory: 22.4387G

Assertion at ./src/include/wand/engine/compute/planner.hpp:131

Backtrace:
0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
1# 0x00007F36EB17C234 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
2# 0x00007F36EB185889 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
3# 0x00007F36EB185982 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
4# 0x00007F36EB18AA8A in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
5# 0x00007F36EB18AB00 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
6# 0x00007F36EA7E985D in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
7# 0x00007F36EA7EE443 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
8# 0x00007F36EA76BD6B in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
9# 0x00007F36EA75AB3F in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
10# 0x00007F36EA75C1C1 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
11# 0x00007F36EADA9668 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
12# 0x00007F36EADAC0A2 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
13# 0x00007F36EADAF3B9 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
14# 0x00007F36EA73B76C in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
15# 0x00007F36EA7414C3 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
16# 0x00007F36EA6FB982 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
17# 0x00007F36EA6FBC05 in /data/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
18# deepsparse::ort_engine::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int, int, wand::safe_type<wand::parallel::use_current_affinity_tag, bool>, std::shared_ptr<wand::parallel::scheduler_factory_t>) in /data/lib/python3.7/site-packages/deepsparse/avx512/libdeepsparse.so
19# 0x00007F3771031D1B in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
20# 0x00007F3771031F39 in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
21# 0x00007F377105D5C5 in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
22# 0x00007F377104B250 in /data/lib/python3.7/site-packages/deepsparse/avx512/deepsparse_engine.so
23# _PyMethodDef_RawFastCallDict in python

Please email a copy of this stack trace and any additional information to: [email protected]
Aborted

Do you have any ideas why the code is failing?

ImportError: cannot import name 'arrays_to_bytes' from 'deepsparse.utils'

Describe the bug
Trying to run the server-client example.

Environment
Include all relevant environment information:
Ubuntu 18.04

Python version : 3.8
Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]:

deepsparse 0.1.1

CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:

{'vendor': 'GenuineIntel', 'isa': 'avx2', 'vnni': False, 'num_sockets': 1, 'available_sockets': 1, 'cores_per_socket': 8, 'available_cores_per_socket': 8, 'threads_per_core': 1, 'available_threads_per_core': 1, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 262144, 'L3_cache_size': 12582912}

To Reproduce
from deepsparse.utils import arrays_to_bytes, bytes_to_arrays

Errors
Traceback (most recent call last):
File "server.py", line 62, in
from deepsparse.utils import arrays_to_bytes, bytes_to_arrays
ImportError: cannot import name 'arrays_to_bytes' from 'deepsparse.utils'

Cannot import deepsparse from WSL: cannot get cpu topology

Describe the bug

For testing purposes, I want to try if my code works on Windows Subsystem for Linux (WSL2). I'm using Ubuntu 18.04LTS.

Once on Ubuntu on WSL, I create a new python virtual env, then pip install deepsparse.

After that, while trying to import deepsparse I get:

>>> import deepsparse
arch.bin: ./src/include/cpu_info/cpu_info.hpp:515: std::shared_ptr<cpu_info::topology> cpu_info::detect_topology_from_cpuid_api(): Assertion `!thread.exists' failed.
Traceback (most recent call last):
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 119, in _parse_arch_bin
    info_str = subprocess.check_output(file_path).decode("utf-8")
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/arch.bin' died with <Signals.SIGABRT: 6>.
 
During handling of the above exception, another exception occurred:
 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/__init__.py", line 28, in <module>
    from .engine import *
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/engine.py", line 44, in <module>
    from deepsparse.lib import init_deepsparse_lib
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/lib.py", line 27, in <module>
    CORES_PER_SOCKET, AVX_TYPE, VNNI = cpu_details()
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 216, in cpu_details
    arch = cpu_architecture()
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 148, in cpu_architecture
    arch = _parse_arch_bin()
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 47, in __call__
    self.memo[args] = self.f(*args)
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 123, in _parse_arch_bin
    raise OSError(
OSError: neuralmagic: encountered exception while trying read arch.bin: Command '/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/arch.bin' died with <Signals.SIGABRT: 6>.

Expected behavior

Maybe it should work on WSL :)

Environment
Include all relevant environment information:

OS [e.g. Ubuntu 18.04]: Ubuntu 18.04LTS (On Windows 10, WSL2)
Python version [e.g. 3.7]: 3.9
DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 0.8.0
ML framework version(s) [e.g. torch 1.7.1]: 1.10.0
Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]:
CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:

This is basically what's not working

To Reproduce
Exact steps to reproduce the behavior:

On windows 10, activate WSL
Install Ubuntu 18.04 from microsoft store
On Ubuntu, create virtual env (I personnaly use mamba or conda)
pip install deepsparse
import deepsparse

Errors
If applicable, add a full print-out of any errors or exceptions that are raised or include screenshots to help explain your problem.

>>> import deepsparse
arch.bin: ./src/include/cpu_info/cpu_info.hpp:515: std::shared_ptr<cpu_info::topology> cpu_info::detect_topology_from_cpuid_api(): Assertion `!thread.exists' failed.
Traceback (most recent call last):
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 119, in _parse_arch_bin
    info_str = subprocess.check_output(file_path).decode("utf-8")
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/arch.bin' died with <Signals.SIGABRT: 6>.
 
During handling of the above exception, another exception occurred:
 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/__init__.py", line 28, in <module>
    from .engine import *
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/engine.py", line 44, in <module>
    from deepsparse.lib import init_deepsparse_lib
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/lib.py", line 27, in <module>
    CORES_PER_SOCKET, AVX_TYPE, VNNI = cpu_details()
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 216, in cpu_details
    arch = cpu_architecture()
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 148, in cpu_architecture
    arch = _parse_arch_bin()
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 47, in __call__
    self.memo[args] = self.f(*args)
  File "/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/cpu.py", line 123, in _parse_arch_bin
    raise OSError(
OSError: neuralmagic: encountered exception while trying read arch.bin: Command '/home/cp264607/mambaforge/envs/hsf/lib/python3.9/site-packages/deepsparse/arch.bin' died with <Signals.SIGABRT: 6>.

Additional context
Add any other context about the problem here. Also include any relevant files.

How to derive the probability of the predictions in deepsparse? For classification for example

Additional context
Related-To: AICoE/elyra-aidevsecops-tutorial#297

Huggingface base Wav2Vec2 model crashing

Describe the bug
Hello,

I am trying to compile the onnx-converted model of a sparse Huggingface base Wav2Vec2 model (where sparsity was obtained via unstructured magnitude pruning) through compile_model :

dse_network = compile_model(onnx_filepath, batch_size=batch_size, num_cores=1, num_streams=1)

My kernel crashed and I received the following message:

Backtrace:
0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
1# 0x00007FFB125A27C4 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
2# 0x00007FFB125A8906 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
3# 0x00007FFB125A89F2 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
4# 0x00007FFB125B12FA in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
5# 0x00007FFB125B1370 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
6# 0x00007FFB11B1F76D in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
7# 0x00007FFB11B25BCF in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
8# 0x00007FFB11A92015 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
9# 0x00007FFB11A81939 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
10# 0x00007FFB11A82AF1 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
11# 0x00007FFB1213F938 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
12# 0x00007FFB121423B3 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
13# 0x00007FFB121456B9 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
14# 0x00007FFB11A6312B in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
15# 0x00007FFB11A6B3CE in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
16# 0x00007FFB11A11C1A in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
17# 0x00007FFB11A11ED5 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
18# deepsparse::ort_engine::init(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, std::shared_ptrwand::parallel::scheduler_factory_t) in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/libdeepsparse.so
19# 0x00007FFBE3641649 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
20# 0x00007FFBE364184B in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
21# 0x00007FFBE36788B6 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
22# 0x00007FFBE364B0F9 in /opt/conda/lib/python3.9/site-packages/deepsparse/avx512/deepsparse_engine.so
23# 0x0000561F0FD79B66 in /opt/conda/bin/python

Please email a copy of this stack trace and any additional information to: [email protected]

Environment
Include all relevant environment information:

OS : Ubuntu 18.04.5 LTS
Python version [e.g. 3.7]: Python 3.9.4
DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 1.0.2
ML framework version(s) [e.g. torch 1.7.1]: torch 1.11.0
Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]: onnxruntime 1.12.0, onnx 1.12.0,
CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:

>>> import deepsparse.cpu
>>> print(deepsparse.cpu.cpu_architecture())

{'L1_data_cache_size': 32768, 'L1_instruction_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 31719424, 'architecture': 'x86_64', 'available_cores_per_socket': 19, 'available_num_cores': 38, 'available_num_hw_threads': 76, 'available_num_numa': 2, 'available_num_sockets': 2, 'available_sockets': 2, 'available_threads_per_core': 2, 'cores_per_socket': 19, 'isa': 'avx512', 'num_cores': 38, 'num_hw_threads': 76, 'num_numa': 2, 'num_sockets': 2, 'threads_per_core': 2, 'vendor': 'GenuineIntel', 'vendor_id': 'Intel', 'vendor_model': 'Intel(R) Xeon(R) Gold 6161 CPU @ 2.20GHz', 'vnni': False}

Would you please have any solution?
Thank you

ModuleNotFoundError: No module named 'utils'

Describe the bug
Download the yolo example to try.
https://github.com/neuralmagic/deepsparse/tree/main/examples/ultralytics-yolo

But, after run pip3 install -r .\requirements.txt, throw an error

ERROR: Command errored out with exit status 1:
   command: 'c:\users\user\appdata\local\programs\python\python39\python.exe' 'c:\users\user\appdata\local\programs\python\python39\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py' get_requires_for_build_wheel 'C:\Users\user\AppData\Local\Temp\tmp7thtj5al'
       cwd: C:\Users\user\AppData\Local\Temp\pip-install-dwp60hed\deepsparse_e6f915f0eb184b6b86733212e02460ac
  Complete output (18 lines):
  Traceback (most recent call last):
    File "c:\users\user\appdata\local\programs\python\python39\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 280, in <module>
      main()
    File "c:\users\user\appdata\local\programs\python\python39\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 263, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "c:\users\user\appdata\local\programs\python\python39\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 114, in get_requires_for_build_wheel
      return hook(config_settings)
    File "C:\Users\user\AppData\Local\Temp\pip-build-env-0_dlvtg5\overlay\Lib\site-packages\setuptools\build_meta.py", line 
177, in get_requires_for_build_wheel
      return self._get_build_requires(
    File "C:\Users\user\AppData\Local\Temp\pip-build-env-0_dlvtg5\overlay\Lib\site-packages\setuptools\build_meta.py", line 
159, in _get_build_requires
      self.run_setup()
    File "C:\Users\user\AppData\Local\Temp\pip-build-env-0_dlvtg5\overlay\Lib\site-packages\setuptools\build_meta.py", line 
281, in run_setup
      super(_BuildMetaLegacyBackend,
    File "C:\Users\user\AppData\Local\Temp\pip-build-env-0_dlvtg5\overlay\Lib\site-packages\setuptools\build_meta.py", line 
174, in run_setup
      exec(compile(code, __file__, 'exec'), locals())
    File "setup.py", line 24, in <module>
      from utils.artifacts import (
  ModuleNotFoundError: No module named 'utils'

Environment
Include all relevant environment information:

OS [e.g. Ubuntu 18.04]: Window 10
Python version [e.g. 3.7]: 3.96

Crash when importing

I installed deepsparse using Pip. When I try to import it, Python immediately crashes:

(venv) vvolhejn@eu-login-21 ~> python3
Python 3.8.5 (default, Sep 27 2021, 10:10:37)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import deepsparse
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.12.1 (18c5ee67) (release) (optimized)
Date: 05-24-2022 @ 15:40:28 CEST
OS: Linux eu-login-21 3.10.0-1160.62.1.el7.x86_64 #1 SMP Tue Apr 5 16:57:59 UTC 2022
Arch: x86_64
CPU:
Vendor:
Cores/sockets/threads: [0, 0, 0]
Available cores/sockets/threads: [0, 0, 0]
L1 cache size data/instruction: 0k/0k
L2 cache size: 0Mb
L3 cache size: 0Mb
Total memory: 47.349G
Free memory: 8.72411G

Assertion at src/lib/core/cpu.cpp:273

Backtrace:
 0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /cluster/home/vvolhejn/venv/lib64/python3.8/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
 1# wand::detail::assert_fail(char const*, char const*, int) in /cluster/home/vvolhejn/venv/lib64/python3.8/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
 2# 0x00002B061C304E9B in /cluster/home/vvolhejn/venv/lib64/python3.8/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
 3# 0x00002B061C30513C in /cluster/home/vvolhejn/venv/lib64/python3.8/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
 4# 0x00002B06015189C3 in /lib64/ld-linux-x86-64.so.2
 5# 0x00002B060151D59E in /lib64/ld-linux-x86-64.so.2
 6# 0x00002B06015187D4 in /lib64/ld-linux-x86-64.so.2
 7# 0x00002B060151CB8B in /lib64/ld-linux-x86-64.so.2
 8# 0x00002B06020F8FAB in /lib64/libdl.so.2
 9# 0x00002B06015187D4 in /lib64/ld-linux-x86-64.so.2
10# 0x00002B06020F95AD in /lib64/libdl.so.2
11# dlopen in /lib64/libdl.so.2
12# _PyImport_FindSharedFuncptr in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
13# _PyImport_LoadDynamicModuleWithSpec in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
14# 0x00002B06018D0449 in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
15# 0x00002B060180CE03 in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
16# PyVectorcall_Call in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
17# _PyEval_EvalFrameDefault in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
18# _PyEval_EvalCodeWithName in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
19# _PyFunction_Vectorcall in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
20# _PyEval_EvalFrameDefault in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
21# 0x00002B0601798209 in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
22# _PyEval_EvalFrameDefault in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0
23# 0x00002B0601798209 in /cluster/apps/nss/gcc-8.2.0/python/3.8.5/x86_64/lib64/libpython3.8.so.1.0

Please email a copy of this stack trace and any additional information to: [email protected]
fish: Job 1, 'python3' terminated by signal SIGABRT (Abort)

Environment
Include all relevant environment information:

OS [e.g. Ubuntu 18.04]: CentOS Linux release 7.9.2009 (Core)
Python version [e.g. 3.7]: 3.8.5
DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 0.12.1 (18c5ee67)
ML framework version(s) [e.g. torch 1.7.1]: n/a
Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]:

numpy: 1.22.3
onnx: 1.10.1
onnxruntime: 1.11.1, also tried 1.10.0 with the same results

CPU info:

{
"vendor" : "GenuineIntel",
"isa" : "avx2",
"vnni" : false,
"num_sockets" : 1,
"available_sockets" : 0,
"cores_per_socket" : 0,
"available_cores_per_socket" : 0,
"threads_per_core" : 0,
"available_threads_per_core" : 0,
"L1_instruction_cache_size" : 32768,
"L1_data_cache_size" : 32768,
"L2_cache_size" : 262144,
"L3_cache_size" : 6291456
}

Is deepsparse only for CPU?

Is deepsparse useful only for optimisations on CPU?
Can I use deepsparse for memory optimizations on GPU also? If yes can you please share some tutorial for it.

Thanks

no better speed on yolo quant

Hi!
How is it going?

At first ,thanks for your good repo and helping to make better and faster model.
I use your yolo example for getting better speed, and I compare base, pruned and quant models as you said. but all result were aproximatly same .
there is no vnni warning, and my server is ubuntu 18
my code is:

import os

models :

yolov5s_base = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none"

yolov5s_pruned ="zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96"

yolov5s_pruned_quant = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94"

source_img = "img.bmp"

print("\n base inference:\n")
bash_cmd = f"python annotate.py {yolov5s_base} --source {source_img} --image-shape 640 640 "
os.system(bash_cmd)

print("\n pruned inference:\n")
bash_cmd = f"python annotate.py {yolov5s_pruned } --source {source_img} --image-shape 640 640 "
os.system(bash_cmd)

print("\n pruned_quant inference:\n")
bash_cmd = f"python annotate.py {yolov5s_pruned_quant} --source {source_img} --quantized-inputs --image-shape 640 640 "
os.system(bash_cmd)

when I run this code in bash script, I get this results:

base inference:

2022-03-08 20:28:15 main INFO Results will be saved to annotation_results/deepsparse-annotations-8
model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none downloaded to /home/fteam/.cache/sparsezoo/cdaaf2c9-a2f1-45d2-841d-45ce123e7b25/model.onnx
2022-03-08 20:28:17 main INFO Compiling DeepSparse model for /home/fteam/.cache/sparsezoo/cdaaf2c9-a2f1-45d2-841d-45ce123e7b25/model.onnx
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx512, binary=avx512)
2022-03-08 20:28:18 main INFO Inference 0 processed in 128.20696830749512 ms
2022-03-08 20:28:18 main INFO Results saved to annotation_results/deepsparse-annotations-8

pruned inference:

2022-03-08 20:28:19 main INFO Results will be saved to annotation_results/deepsparse-annotations-9
model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96 downloaded to /home/fteam/.cache/sparsezoo/c13e55cb-dd6c-4492-a079-8986af0b65e6/model.onnx
2022-03-08 20:28:21 main INFO Compiling DeepSparse model for /home/fteam/.cache/sparsezoo/c13e55cb-dd6c-4492-a079-8986af0b65e6/model.onnx
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx512, binary=avx512)
2022-03-08 20:28:23 main INFO Inference 0 processed in 124.91464614868164 ms
2022-03-08 20:28:23 main INFO Results saved to annotation_results/deepsparse-annotations-9

pruned_quant inference:

2022-03-08 20:28:24 main INFO Results will be saved to annotation_results/deepsparse-annotations-10
model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 downloaded to /home/fteam/.cache/sparsezoo/aabc828b-c199-4766-95e1-53f2abd0fdd3/model.onnx
2022-03-08 20:28:26 main INFO Compiling DeepSparse model for /home/fteam/.cache/sparsezoo/aabc828b-c199-4766-95e1-53f2abd0fdd3/model.onnx
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx512, binary=avx512)
2022-03-08 20:28:28 main INFO Inference 0 processed in 114.76516723632812 ms
2022-03-08 20:28:28 main INFO Results saved to annotation_results/deepsparse-annotations-10

as you see quant pruned has no more speed !
pls guide me to get faster result
thanks!

Export yolov5s-quantized-prunning model to onnx

I'm trying export model YOLOv5s Pruned Quantized to ONNX. ONNX version this model from model:zoo doesn't work with opencv dnn. So I tried download .pt version and export to onnx with dynamic option.

here is command:
sparseml.yolov5.export_onnx --weights model.pt --dynamic --simplify --include onnx

And I get this error:

File "/opt/conda/bin/sparseml.yolov5.export_onnx", line 8, in <module>
    sys.exit(export())
  File "/opt/conda/lib/python3.8/site-packages/sparseml/yolov5/scripts.py", line 60, in export
    export_run(**vars(opt))
  File "/opt/conda/lib/python3.8/site-packages/yolov5/export.py", line 712, in export_run
    main(opt)
  File "/opt/conda/lib/python3.8/site-packages/yolov5/export.py", line 706, in main
    run(**vars(opt))
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/yolov5/export.py", line 595, in run
    model, extras = load_checkpoint(type_='ensemble', weights=weights, device=device)  # load FP32 model
  File "/opt/conda/lib/python3.8/site-packages/yolov5/export.py", line 531, in load_checkpoint
    state_dict = load_state_dict(model, state_dict, run_mode=not ensemble_type, exclude_anchors=exclude_anchors)
  File "/opt/conda/lib/python3.8/site-packages/yolov5/export.py", line 555, in load_state_dict
    model.load_state_dict(state_dict, strict=not run_mode)  # load
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
	Unexpected key(s) in state_dict: "model.24.anchor_grid".

Deepsparse BERT Classification model.

I want to convert a BERT classification model twith Deepsparse. I'm unable to find any appropriate examples for the.

I have an ONNX converted a model fine tuned with Hugging face API.
Any help with this regard is highly appreciated.

Thanks,
Subhasis

The AVX instruction set is unknown. Set NM_ARCH to one of avx512,avx2 to continue.

Hello, thanks for your work. I got an issue when executing the example. Not sure if I misunderstand anything.
Sorry that I mislabel it to a bug, but I don't know how to delete this label.

Describe the bug
A clear and concise description of what the bug is.
To execute the example - ultralytics-yolo, it would have the error as following.

The AVX instruction set is unknown. Set NM_ARCH to one of avx512,avx2 to continue.

Expected behavior
A clear and concise description of what you expected to happen.
It could successfully have object detection output.

Environment
Include all relevant environment information:

OS [e.g. Ubuntu 18.04]: Ubuntu 16.04
Python version [e.g. 3.7]: 3.9.10
DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 0.11.0
ML framework version(s) [e.g. torch 1.7.1]: torch 1.9.0
Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]:
- Onnx: 1.10.1
- sparseml: 0.11.0
- numpy: 1.22.3
CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:

>>> import deepsparse.cpu
>>> print(deepsparse.cpu.cpu_architecture())

I can't execute the above command, but got the same error.
My CPU is Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz.

To Reproduce
Exact steps to reproduce the behavior:

python annotate.py   zoo:cv/detection/yolov5-s/pytorch/ultr
alytics/coco/pruned_quant-aggressive_94  --source /home/dev/Documents/teco/experiment/tflite1/test1.jpg  --image-shape 416 416 --device cpu
# or
python annotate.py -h

Errors

Traceback (most recent call last):
File "/home/dev/Documents/teco/experiment/deepsparse/examples/ultralytics-yolo/annotate.py", line 119, in
from deepsparse import compile_model
File "/home/dev/mambaforge/envs/tflite1-env/lib/python3.9/site-packages/deepsparse/init.py", line 33, in
from .engine import *
File "/home/dev/mambaforge/envs/tflite1-env/lib/python3.9/site-packages/deepsparse/engine.py", line 44, in
from deepsparse.lib import init_deepsparse_lib
File "/home/dev/mambaforge/envs/tflite1-env/lib/python3.9/site-packages/deepsparse/lib.py", line 27, in
CORES_PER_SOCKET, AVX_TYPE, VNNI = cpu_details()
File "/home/dev/mambaforge/envs/tflite1-env/lib/python3.9/site-packages/deepsparse/cpu.py", line 242, in cpu_details
arch = cpu_architecture()
File "/home/dev/mambaforge/envs/tflite1-env/lib/python3.9/site-packages/deepsparse/cpu.py", line 184, in cpu_architecture
raise OSError(
OSError: Neural Magic: The AVX instruction set is unknown. Set NM_ARCH to one of avx512,avx2 to continue.

Additional context
Add any other context about the problem here. Also include any relevant files.

yolov5 speed is very slow

Hi, I test yolov5s model in ubuntu 16.04 but the speed is very slow as below figure. It warns that VNNI instructions not detected, quantization speedup not well supported. What't the problem and how should I solve it ?

My cpu information as below figure shown

Can deepsparse run on windows 10?

Hi, I would like to try deepsparse with yolov5 on windows 10.
Could the existing code run on windows 10?

I tried but so far no success.

Can't run quickstart on DigitalOcean

Describe the bug
Can't run quickstart https://github.com/neuralmagic/deepsparse#quickstart-with-sparsezoo-onnx-models on DigitalOcean droplet with "Premium Intel" CPU

Expected behavior
No error message

Environment

OS Ubuntu 20.04 (LTS) x64
Python version 3.8
DeepSparse version 0.5.1

To Reproduce
Exact steps to reproduce the behavior:

apt update
apt-get -y install python3-venv build-essential cmake
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install deepsparse

Run "ResNet-50 Dense" example linked above

Errors
OSError: neuralmagic: encountered exception while trying read arch.bin: Command '/root/venv/lib/python3.8/site-packages/deepsparse/arch.bin' returned non-zero exit status 1.

Additional context
The binary doesn't output anything:


(venv) root@ubuntu-s-1vcpu-2gb-intel-lon1-01:~# /root/venv/lib/python3.8/site-packages/deepsparse/arch.bin
(venv) root@ubuntu-s-1vcpu-2gb-intel-lon1-01:~#

Possible to use DeepSparse with `tf.keras.applications` models?

Hi.

I would be interested to know if this is possible.

Expected Acceleration

What is the maximum expected acceleration from Deepsparse? (Assuming that the model is sparsified further with sparseml as well)

In the documentation here, achieving GPU-level performance on CPUs is promised, however in this article (Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference), the maximum acceleration on top of cpu is stated as 2.5x. So it is 2.5 times faster than running on cpu which is not very close to gpu performance.

Thanks!

Comparison with OpenVino

Hi, is there anyone compares DeepSparse with OpenVino?

CPU usage only at 50% instead of 100%

Describe the bug
I installed DeepSparse on a different machine than the one causing trouble in my last issue, but I'm running into another problem: DeepSparse is only using 50% of the CPU, namely only one of the two cores (vCPUs, rather).
This happens no matter the setting of num_cores in the deepsparse.compile_model() call. I've tried everything between 0 and 4.

Expected behavior
The CPU usage should be close to 100%. This is indeed what happens when running using other frameworks such as ONNX Runtime.

Environment
Include all relevant environment information:

OS [e.g. Ubuntu 18.04]: Debian GNU/Linux 11 (bullseye)
Python version [e.g. 3.7]: 3.8.12
DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 0.12.2
ML framework version(s) [e.g. torch 1.7.1]: TensorFlow 2.8.0
Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]: tf2onnx 1.11.1
CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture(): {'vendor': 'GenuineIntel', 'isa': 'avx512', 'vnni': False, 'num_sockets': 1, 'available_sockets': 1, 'cores_per_socket': 1, 'available_cores_per_socket': 1, 'threads_per_core': 2, 'available_threads_per_core': 2, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 40370176}

To Reproduce
Run the following script:

import psutil

import deepsparse
import tf2onnx
import tensorflow as tf
import numpy as np
import onnx

print(deepsparse.cpu_architecture())

hidden_size = 512
n_layers = 5

orig_model = tf.keras.Sequential(
    [tf.keras.layers.Input(shape=(hidden_size,))]
    + [
        tf.keras.layers.Dense(hidden_size, activation=tf.nn.relu)
        for _ in range(n_layers)
    ]
)


input_signature = [
    tf.TensorSpec([1] + orig_model.input.shape[1:], dtype=np.float32, name="input")
]

onnx_model, _ = tf2onnx.convert.from_keras(orig_model, input_signature, opset=13)
onnx_filepath = "/tmp/debug.onnx"

onnx.save(onnx_model, onnx_filepath)

batch_size = 32

engine = deepsparse.compile_model(
    onnx_filepath,
    batch_size=batch_size,
    num_cores=3,
    # util.get_n_cpus_available()
)

data = np.random.randn(batch_size, *orig_model.input.shape[1:]).astype(np.float32)

psutil.cpu_percent()  # Run once to initialize

for i in range(500):
    engine.run([data])

print("Average CPU usage:", psutil.cpu_percent())

This is the output:

2022-06-20 14:26:39.778197: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/openvino_2022/tools/compile_tool:/opt/intel/openvino_2022/runtime/3rdparty/tbb/lib::/opt/intel/openvino_2022/runtime/3rdparty/hddl/lib:/opt/intel/openvino_2022/runtime/lib/intel64:/opt/intel/openvino_2022.1.0.643/tools/compile_tool:/opt/intel/openvino_2022.1.0.643/runtime/3rdparty/tbb/lib::/opt/intel/openvino_2022.1.0.643/runtime/3rdparty/hddl/lib:/opt/intel/openvino_2022.1.0.643/runtime/lib/intel64
2022-06-20 14:26:39.778240: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
{'vendor': 'GenuineIntel', 'isa': 'avx512', 'vnni': False, 'num_sockets': 1, 'available_sockets': 1, 'cores_per_socket': 1, 'available_cores_per_socket': 1, 'threads_per_core': 2, 'available_threads_per_core': 2, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 40370176}
2022-06-20 14:26:41.876161: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/openvino_2022/tools/compile_tool:/opt/intel/openvino_2022/runtime/3rdparty/tbb/lib::/opt/intel/openvino_2022/runtime/3rdparty/hddl/lib:/opt/intel/openvino_2022/runtime/lib/intel64:/opt/intel/openvino_2022.1.0.643/tools/compile_tool:/opt/intel/openvino_2022.1.0.643/runtime/3rdparty/tbb/lib::/opt/intel/openvino_2022.1.0.643/runtime/3rdparty/hddl/lib:/opt/intel/openvino_2022.1.0.643/runtime/lib/intel64
2022-06-20 14:26:41.876214: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-06-20 14:26:41.876248: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (n1-west1-1): /proc/driver/nvidia/version does not exist
2022-06-20 14:26:41.876626: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-20 14:26:42.060953: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2022-06-20 14:26:42.061236: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-06-20 14:26:42.063087: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1164] Optimization results for grappler item: graph_to_optimize
  function_optimizer: function_optimizer did nothing. time = 0.016ms.
  function_optimizer: function_optimizer did nothing. time = 0.001ms.

WARNING:tensorflow:From /home/vaclav/venv3.8/lib/python3.8/site-packages/tf2onnx/tf_loader.py:711: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
2022-06-20 14:26:42.173088: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2022-06-20 14:26:42.173281: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-06-20 14:26:42.202669: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1164] Optimization results for grappler item: graph_to_optimize
  constant_folding: Graph size after: 28 nodes (-10), 37 edges (-10), time = 14.961ms.
  function_optimizer: function_optimizer did nothing. time = 0.004ms.
  constant_folding: Graph size after: 28 nodes (0), 37 edges (0), time = 4.993ms.
  function_optimizer: function_optimizer did nothing. time = 0.002ms.

DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.12.2 (13bc2991) (release) (optimized) (system=avx512, binary=avx512)
Average CPU usage: 49.6

Inspecting htop while the script is running also tells me the CPU usage is around 50%.

Overly large text annotation

Describe the bug
When running annotate.py in https://github.com/neuralmagic/deepsparse/tree/main/examples/ultralytics-yolo
the text annotation on the inference window is too large.

The text annotation (images_per_sec) on the inference window is so large to the point some of the texts are not visible because they exceed the window size.

Screenshot attached 👇

Expected behavior
All texts should be visible within the window.

Deepsparse model compilation error on YOLOv7

Bug Description
I am trying to sparsify and run YOLOv7 on Deepsparse. I used SparseML to sparsify the model, and was able to get a sparsity of 0.75. Exporting this sparse YOLOv7 to an onnx model and running it on OpenVino was successful so the model was created fine (albeit with only moderate performance improvements).
Running the same onnx model on Deepsparse however results in an error while compiling the model: compile_model(onnx_filepath, batch_size)

Expected behavior
The onnx model should compile successfully and then run on Deepsparse.

Environment

OS: Ubuntu 20.04.3 LTS
Python version: 3.7.13
DeepSparse version or commit hash: 1.0.2
ML framework version(s): torch 1.9.0, torchvision 0.10.0
Other Python package versions: SparseML 1.0.1, numpy 1.19.5, ONNX 1.10.1
CPU info -
{'L1_data_cache_size': 32768, 'L1_instruction_cache_size': 32768, 'L2_cache_size': 262144, 'L3_cache_size': 36700160, 'architecture': 'x86_64', 'available_cores_per_socket': 12, 'available_num_cores': 24, 'available_num_hw_threads': 24, 'available_num_numa': 2, 'available_num_sockets': 2, 'available_sockets': 2, 'available_threads_per_core': 1, 'cores_per_socket': 12, 'isa': 'avx2', 'num_cores': 24, 'num_hw_threads': 24, 'num_numa': 2, 'num_sockets': 2, 'threads_per_core': 1, 'vendor': 'GenuineIntel', 'vendor_id': 'Intel', 'vendor_model': 'Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz', 'vnni': False}

To Reproduce
Exact steps to reproduce the behavior:

from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs

onnx_filepath = "pruned_finalized_144_torch.onnx"
batch_size = 1
inputs = generate_random_inputs(onnx_filepath, batch_size)

engine = compile_model(onnx_filepath, batch_size) #<---- error

Errors
2022-08-16 13:22:33 deepsparse.utils.onnx INFO Generating input 'images', type = float32, shape = [1, 3, 640, 640]
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 1.0.2 (7dc5fa34) (release) (optimized) (system=avx2, binary=avx2)
[nm_ort 7fe93a051340 >ERROR< supported_subgraphs /home/ubuntu/build/nyann/src/onnxruntime_neuralmagic/supported/subgraphs.cc:858] ==== FAILED TO COMPILE ====
Unexpected exception message: map::at
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 1.0.2 (7dc5fa34) (release) (optimized)
Date: 08-16-2022 @ 13:22:34 UTC
OS: Linux AZJAIVISIONGPUL05 5.11.0-1028-azure #31~20.04.2-Ubuntu SMP Tue Jan 18 08:46:15 UTC 2022
Arch: x86_64
CPU: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
Vendor: GenuineIntel
Cores/sockets/threads: [24, 2, 24]
Available cores/sockets/threads: [24, 2, 24]
L1 cache size data/instruction: 32k/32k
L2 cache size: 0.25Mb
L3 cache size: 35Mb
Total memory: 440.897G
Free memory: 339.802G

Assertion at /home/ubuntu/build/nyann/src/onnxruntime_neuralmagic/nm_execution_provider.cc:76

Backtrace:
0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
1# 0x00007FE85F146492 in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
2# 0x00007FE85F147F2C in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
3# 0x00007FE85F410261 in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
4# 0x00007FE85FAA40B8 in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
5# 0x00007FE85FAA6ACC in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
6# 0x00007FE85FAA9D99 in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
7# 0x00007FE85F3F094B in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
8# 0x00007FE85F3F8BCE in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
9# 0x00007FE85F39F73D in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
10# 0x00007FE85F39F9D5 in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
11# deepsparse::ort_engine::init(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, std::shared_ptrwand::parallel::scheduler_factory_t) in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/libdeepsparse.so
12# 0x00007FE8E9282309 in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/deepsparse_engine.so
13# 0x00007FE8E92826DE in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/deepsparse_engine.so
14# 0x00007FE8E92C7D0D in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/deepsparse_engine.so
15# 0x00007FE8E9298A74 in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/deepsparse/avx2/deepsparse_engine.so
16# _PyMethodDef_RawFastCallDict in /data3/anaconda3/envs/export_yolov7/bin/python
17# _PyObject_FastCallDict in /data3/anaconda3/envs/export_yolov7/bin/python
18# 0x000055E951FD01C3 in /data3/anaconda3/envs/export_yolov7/bin/python
19# PyObject_Call in /data3/anaconda3/envs/export_yolov7/bin/python
20# 0x000055E951F5EF94 in /data3/anaconda3/envs/export_yolov7/bin/python
21# 0x000055E951FDB847 in /data3/anaconda3/envs/export_yolov7/bin/python
22# 0x00007FE86CCFD907 in /data3/anaconda3/envs/export_yolov7/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-37m-x86_64-linux-gnu.so
23# _PyObject_FastCallKeywords in /data3/anaconda3/envs/export_yolov7/bin/python

Please email a copy of this stack trace and any additional information to: [email protected]
Aborted

Additional context

Sparse ONNX model on which the error appeared: link
Sparse pytorch model form which the ONNX model was created: link

Using DeepSparse in the browser - Web Assembly port

Web browsers are generally very resource constrained environments, which makes them a perfect place to use an engine like DeepSparse. Web Assembly is slowly becoming more mature, with features like threads and simd becoming available in major browsers, and with more features on the way soon: https://webassembly.org/roadmap/

The ONNX runtime has recently been ported to Web Assembly: https://github.com/microsoft/onnxruntime/tree/master/js/web and I've used it to create a demos of popular ML models running completely in the browser - including OpenAI's CLIP, for example: https://github.com/josephrocca/clip-image-sorter The problem is always that inference is quite slow, and in my experience WebGL tends to crash easily (WebGPU may help fix this when it is released), or the device just doesn't have enough GPU memory for the model, which forces me to use CPU-based backends.

So I'm wondering if the team has considered porting the DeepSparse engine to wasm via Emscripten?

Converting nanodet onnx model to deepsparse

Dear @jeanniefinks ,

Firstly, thanks for sharing your work.
We are trying to apply sparseml to NanoDet-Plus-m that is considered the most suitable for edge devices til now.

Here are some steps I have been trying:

Download pytorch (.pth) model then convert to onnx model. I even tried: sparseml.onnx_export, I was able to convert to model.onnx, but still failed in the next step.
Convert onnx model to deepsparse. It is similar to the issue #218
Use both deepsparse and sparsify

I already tried on varying environments:

OS: Ubuntu16.4/18.4
CPU: avx avx2, grep -o 'avx[^ ]*' /proc/cpuinfo
Varying deepsparse, onnx/onnxruntine versions
torch: 1.8.2+cpu

Code to produce error:

>>> from deepsparse import compile_model
>>> from deepsparse.utils import generate_random_inputs
>>> batch_size = 1
>>> onnx_filepath = "checkpoints/nanodet-plus-m_320.onnx"
>>> inputs = generate_random_inputs(onnx_filepath, batch_size)
[INFO            onnx.py:176 ] Generating input 'data', type = float32, shape = [1, 3, 320, 320]
>>> engine = compile_model(onnx_filepath, batch_size)

Error:

DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized) (system=avx2, binary=avx2)
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0 (c2458ea3) (release) (optimized)
Date: 03-08-2022 @ 08:50:06 UTC
OS: Linux 7a5617e3c49d 4.15.0-166-generic #174-Ubuntu SMP Wed Dec 8 19:07:44 UTC 2021
Arch: x86_64
CPU: Intel(R) Xeon(R) CPU E5-2623 v3 @ 3.00GHz
Vendor: GenuineIntel
Cores/sockets/threads: [8, 2, 16]
Available cores/sockets/threads: [8, 2, 16]
L1 cache size data/instruction: 32k/32k
L2 cache size: 0.25Mb
L3 cache size: 10Mb
Total memory: 127.793G
Free memory: 10.5767G

Assertion at ./src/include/wand/utility/pyramidal/task_graph_utils.hpp:133

Backtrace:
 0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
 1# 0x00007F2DCE3D0D08 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
 2# 0x00007F2DCE3D5487 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
 3# 0x00007F2DCE3D9B76 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
 4# 0x00007F2DCE311F6F in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
 5# 0x00007F2DCE3140A5 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
 6# 0x00007F2DCE315444 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
 7# 0x00007F2DCE315819 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
 8# 0x00007F2DCE2C6E1B in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
 9# 0x00007F2DCE228704 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
10# 0x00007F2DCE228A32 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
11# 0x00007F2DCE228B78 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
12# 0x00007F2DCE228D5D in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
13# 0x00007F2DCE228FA8 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
14# 0x00007F2DCE229010 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
15# 0x00007F2DCD82BD47 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
16# 0x00007F2DCD8320CF in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
17# 0x00007F2DCD7AA52B in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
18# 0x00007F2DCD79A109 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
19# 0x00007F2DCD79B2C1 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
20# 0x00007F2DCDE266B8 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
21# 0x00007F2DCDE290CC in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
22# 0x00007F2DCDE2C399 in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0
23# 0x00007F2DCD77B9AB in /usr/local/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.10.0

Please email a copy of this stack trace and any additional information to: [email protected]
Aborted

It seems that you have your own onnxruntime?
Could you examine the NanoDet-Plus-m model? I really appreciate your time.

Kernel restart when running jupyter notebooks

Hi, I tried to run the following codes and it seems that it could run smoothly on my mac/terminal, but always died if I run in jupyter notebook:

from sparseml.pytorch.models import ModelRegistry
from sparseml.pytorch.datasets import ImagenetteDataset, ImagenetteSize

The error information:

DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.4.0 (bc00bf8b) (release) (optimized)
Date: 06-05-2021 @ 03:09:41 EDT
OS: Linux gv02.nyu.cluster 4.18.0-193.28.1.el8_2.x86_64 neuralmagic/sparseml#1 SMP Fri Oct 16 13:38:49 EDT 2020
Arch: x86_64
CPU:
Vendor:
Cores/sockets/threads: [0, 0, 0]
Available cores/sockets/threads: [0, 0, 0]
L1 cache size data/instruction: 0k/0k
L2 cache size: 0Mb
L3 cache size: 0Mb
Total memory: 377.337G
Free memory: 334.654G

Assertion at src/lib/core/cpu.cpp:263
Backtrace:
 0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /ext3/miniconda3/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.6.0
 1# wand::detail::assert_fail(char const*, char const*, int) in /ext3/miniconda3/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.6.0
 2# 0x0000148A66A5E51C in /ext3/miniconda3/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.6.0
 3# 0x0000148A66A5EEDD in /ext3/miniconda3/lib/python3.7/site-packages/deepsparse/avx512/libonnxruntime.so.1.6.0
 4# 0x0000148AF53F8783 in /lib64/ld-linux-x86-64.so.2
 5# 0x0000148AF53FD24F in /lib64/ld-linux-x86-64.so.2
 6# _dl_catch_exception in /lib/x86_64-linux-gnu/libc.so.6
 7# 0x0000148AF53FC81A in /lib64/ld-linux-x86-64.so.2
 8# 0x0000148AF4BD4F96 in /lib/x86_64-linux-gnu/libdl.so.2
 9# _dl_catch_exception in /lib/x86_64-linux-gnu/libc.so.6
10# _dl_catch_error in /lib/x86_64-linux-gnu/libc.so.6
11# 0x0000148AF4BD5745 in /lib/x86_64-linux-gnu/libdl.so.2
12# dlopen in /lib/x86_64-linux-gnu/libdl.so.2
13# _PyImport_FindSharedFuncptr in /ext3/miniconda3/bin/python
14# _PyImport_LoadDynamicModuleWithSpec in /ext3/miniconda3/bin/python
15# 0x000055B382CAAE49 in /ext3/miniconda3/bin/python
16# _PyMethodDef_RawFastCallDict in /ext3/miniconda3/bin/python
17# _PyCFunction_FastCallDict in /ext3/miniconda3/bin/python
18# _PyEval_EvalFrameDefault in /ext3/miniconda3/bin/python
19# _PyEval_EvalCodeWithName in /ext3/miniconda3/bin/python
20# _PyFunction_FastCallKeywords in /ext3/miniconda3/bin/python
21# _PyEval_EvalFrameDefault in /ext3/miniconda3/bin/python
22# _PyFunction_FastCallKeywords in /ext3/miniconda3/bin/python
23# _PyEval_EvalFrameDefault in /ext3/miniconda3/bin/python

version:
I tried to install sparseml using pip install sparseml, and it will install torch with version 1.8.1+cu102 (which I found strange since the doc said sparseml requires <=1.8.0). I also tried to downgrade torch to 1.8.0 but the same error still happens.
The error appears both on CPU or GPU.

Failed to deploy deepsparse in the cluster.

Describe the bug
Failed to deploy deepsparse in the cluster.

Expected behavior

Environment
Include all relevant environment information:

OS [e.g. Ubuntu 18.04]: ubi8
Python version [e.g. 3.7]: py38
DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 0.7.0
ML framework version(s) [e.g. torch 1.7.1]:
Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]:
CPU info - output of deepsparse/src/deepsparse/arch.bin or output of cpu_architecture() as follows:

>>> import deepsparse.cpu
>>> print(deepsparse.cpu.cpu_architecture())

To Reproduce
Image: https://quay.io/repository/thoth-station/neural-magic-deepsparse

Errors

[2021-09-29 16:26:02 +0000] [22] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/opt/app-root/lib64/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/opt/app-root/lib64/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/opt/app-root/lib64/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/opt/app-root/lib64/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/opt/app-root/lib64/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/opt/app-root/lib64/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
  File "/usr/lib64/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/app-root/src/wsgi.py", line 61, in <module>
    from src.neural_magic_model import Model as NeuralMagicModel
  File "/opt/app-root/src/src/neural_magic_model.py", line 27, in <module>
    from deepsparse import compile_model
  File "/opt/app-root/lib64/python3.8/site-packages/deepsparse/__init__.py", line 28, in <module>
    from .engine import *
  File "/opt/app-root/lib64/python3.8/site-packages/deepsparse/engine.py", line 44, in <module>
    from deepsparse.lib import init_deepsparse_lib
  File "/opt/app-root/lib64/python3.8/site-packages/deepsparse/lib.py", line 27, in <module>
    CORES_PER_SOCKET, AVX_TYPE, VNNI = cpu_details()
  File "/opt/app-root/lib64/python3.8/site-packages/deepsparse/cpu.py", line 216, in cpu_details
    arch = cpu_architecture()
  File "/opt/app-root/lib64/python3.8/site-packages/deepsparse/cpu.py", line 167, in cpu_architecture
    raise OSError(
OSError: neuralmagic: cannot determine avx instruction set. Set NM_ARCH to one of avx2,avx512 to continue.

Additional context
Related-To: AICoE/elyra-aidevsecops-tutorial#297

pruned models run faster than unpruned models only when batch size is of certain size (2^n)

Describe the bug
Hi, I have been experimenting with pruning with SparseML and inference with DeepSparse.
There are two bugs/ questions that I would like to ask here:

I have found that for my own pruned models, they run slower on DeepSparse with batch size 1 than the unpruned version.
In fact, the pruned models' speed exceeds the unpruned version when the batch size is >=16.
For models downloaded from the SparseZoo, the pruned model is always faster than the unpruned version even at batch size==1.
Is there any known explanation for this?
For both SparseZoo pruned models and my own pruned models, when doing inference on DeepSparse, the speed is higher when using batch size of size 2^n, starting from 16.
If I change the batch size to 15 or 17 for example, the pruned models' speed decreases abruptly compared to the batch size 16 inference time.
This is not observed for unpruned models. The speed is relatively uniform across different batch sizes.
Is this an expected behavior of the DeepSparse engine?

Expected behavior

Pruned models should be faster than unpruned models on DeepSparse regardless of the batch size.
The inference speed on DeepSparse should be uniform regardless of the batch size.

Environment
Include all relevant environment information:

OS [e.g. Ubuntu 18.04]: Amazon Linux AMI or Ubuntu 18.04 (both tested)
Python version [e.g. 3.7]: Python 3.6.13 or 3.8.5
DeepSparse version or commit hash [e.g. 0.1.0, f7245c8]: 0.11.1
ML framework version(s) [e.g. torch 1.7.1]: torch 1.9
Other Python package versions [e.g. SparseML, Sparsify, numpy, ONNX]: SparseML: 0.11.0
CPU info:
{'vendor': 'GenuineIntel', 'isa': 'avx512', 'vnni': False, 'num_sockets': 1, 'available_sockets': 1, 'cores_per_socket': 2, 'available_cores_per_socket': 2, 'threads_per_core': 2, 'available_threads_per_core': 2, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 1048576, 'L3_cache_size': 37486592}
or
{'vendor': 'GenuineIntel', 'isa': 'avx2', 'vnni': False, 'num_sockets': 2, 'available_sockets': 2, 'cores_per_socket': 20, 'available_cores_per_socket': 20, 'threads_per_core': 2, 'available_threads_per_core': 2, 'L1_instruction_cache_size': 32768, 'L1_data_cache_size': 32768, 'L2_cache_size': 262144, 'L3_cache_size': 52428800}

To Reproduce
Run the notebook with the corresponding one-shot pruning recipe inside the zip file.
oneshot_pruning.zip
(I show an example of one-shot pruning because it is faster to reproduce, but the same issue can be reproduced with training-aware
pruning.)

Use `deepsparse/examples/ultralytics-yolo/benchmark.py` to benchmark yolov5 base model with torch engine

Hello team. I was trying to use your yolov5 benchmarking script today, to compare fps achieved by default yolov5 model in pytorch and your optimized version running in deepsparse engine. Up until now, I was using your script to measure the performance of prooned and quantized models (as you may know as I was posting about it on your Slack) but I was using my own inference pipeline to measure the performance of the default yolov5 model. However, I'm starting to increasingly come to the conclusion that this comparison is unfair.

In your benchmark you use input in a specific format and size which allows you to reliably compare results.
I used video as input in my default yolov5 test. Today I accidentally made a test on another video and got radically different results. I'm starting to come to the conclusion that the resolution of the input video matters, because of the preprocessing that has to be done on each frame.

As a result, I decided to use your script, hoping that it will allow me to eliminate the variable - related to the input. And I tried using your script in this way:

python benchmark.py \
    zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none \
    --engine torch \
    --batch-size 1 \
    --num-iterations 500 \
    --num-warmup-iterations 100

Unfortunately, I was unsuccessful and the execution ended with an exception:

Loading torch model for zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none
Traceback (most recent call last):
  File "benchmark.py", line 518, in <module>
    main()
  File "benchmark.py", line 514, in main
    benchmark_yolo(args)
  File "benchmark.py", line 448, in benchmark_yolo
    model, has_postprocessing = _load_model(args)
  File "benchmark.py", line 381, in _load_model
    model = torch.load(args.model_filepath)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 658, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 231, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 212, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none'

Is that by design? Wouldn't enabling such a benchmark make sense? Or am I just doing something wrong?

YOLOv5l model crashing on c5.12xlarge for batch size > 8

From community slack https://discuss-neuralmagic.slack.com/archives/C020FPF3MQX/p1657890578280219:

mt
8:09 AM
Hello!
I was using deepsparse on a checkpoint of a yolov5l model generated by --one-shot on a c5.12xlarge and got the following error for batch size >=8
2022-07-14 20:06:54 deepsparse.benchmark.benchmark_model INFO Thread pinning to cores enabled
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.12.2 (13bc2991) (release) (optimized) (system=avx512, binary=avx512)
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.12.2 (13bc2991) (release) (optimized)
Date: 07-14-2022 @ 20:06:59 UTC
OS: Linux data-workstation 5.4.0-1072-aws #77~18.04.1-Ubuntu SMP Thu Apr 7 21:38:47 UTC 2022
Arch: x86_64
CPU: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
Vendor: GenuineIntel
Cores/sockets/threads: [24, 1, 48]
Available cores/sockets/threads: [24, 1, 48]
L1 cache size data/instruction: 32k/32k
L2 cache size: 1Mb
L3 cache size: 35.75Mb
Total memory: 92.2119G
Free memory: 90.8325G

Assertion at src/lib/engine/execution/pyramidal/exec_graph_utils.cpp:240

Backtrace:
0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
1# wand::detail::assert_fail(char const*, char const*, int) in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
2# 0x00007FEC9F562EBA in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
3# 0x00007FEC9F565E3A in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
4# 0x00007FEC9F550117 in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
5# 0x00007FEC9F4C6C01 in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
6# 0x00007FEC9F4C8502 in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
7# 0x00007FEC9F4C8563 in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
8# 0x00007FECA0C4A040 in /home/mt/.local/lib/python3.6/site-packages/deepsparse/avx512/libonnxruntime.so.1.10.0
9# 0x00007FEDCFE776DB in /lib/x86_64-linux-gnu/libpthread.so.0
10# clone in /lib/x86_64-linux-gnu/libc.so.6

Please email a copy of this stack trace and any additional information to: [email protected]
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.12.2 (13bc2991) (release) (optimized)deepsparse_testing.sh: line 2: 30994 Aborted deepsparse.benchmark -b $i checkpoints/logo_l_pruned_quant.onnx
I tried it with the nightly version as well, but it did not work. The process was just killed.

Unable to startup the server for image classification

When I run the script below, the error occur

deepsparse.server \
    --model_path "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned-moderate" \
    --task image_classification

ValueError: unsupported task given of image_classification for serve model config task='image_classification' model_path='zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned-moderate' batch_size=1 alias=None kwargs={} engine='deepsparse' num_cores=None scheduler='async'
.

neuralmagic / deepsparse Goto Github PK

deepsparse's Introduction

DeepSparse

Sparsity-aware deep learning inference runtime for CPUs

✨NEW✨ DeepSparse LLMs

Try It Now

Sparsity 🤝 Performance

LLM Roadmap

Computer Vision and NLP Models

Installation

System Requirements

Deployment APIs

Engine

Pipeline

Server

Additional Resources

Product Usage Analytics

Community

Get In Touch

License

Cite

All Thanks To Our Contributors

deepsparse's People

Contributors

Stargazers

Watchers

Forkers

deepsparse's Issues

v.0.2 Known Issues

models :

Recommend Projects

Recommend Topics

Recommend Org