Comments (3)
Hey @pradeepdev-1995, this document would be a good starting point: https://github.com/neuralmagic/deepsparse/blob/main/docs/use-cases/nlp/text-classification.md. Just make sure to export to ONNX using SparseML.
The `/data/onnx_model` directory contains the ONNX model file and the tokenizer files.
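Before constructing the pipeline, it can help to verify that the deployment directory actually contains what DeepSparse will look for. This is a minimal sketch; the file names below (`model.onnx`, `config.json`, `tokenizer.json`) are an assumption based on a typical Hugging Face / SparseML deployment layout, and the exact set DeepSparse expects may differ:

```python
from pathlib import Path

def check_deployment_dir(path: str) -> list:
    """Return the expected deployment files that are missing from `path`.

    The file names are an assumption based on a typical deployment
    layout (model.onnx, config.json, tokenizer.json).
    """
    expected = ["model.onnx", "config.json", "tokenizer.json"]
    root = Path(path)
    return [name for name in expected if not (root / name).is_file()]

missing = check_deployment_dir("/data/onnx_model")
if missing:
    print("missing files:", missing)
```

If anything is reported missing, the `Pipeline.create` call will fail before inference ever runs.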
I then tried this code:
```python
from deepsparse import Pipeline

# compile the local ONNX deployment with batch size 1
sparsezoo_stub = "/data/onnx_model"
pipeline = Pipeline.create(
    task="text-classification",
    model_path=sparsezoo_stub,  # sparsezoo stub or path to local ONNX
    batch_size=1,  # default batch size is 1
)

# run inference
sequences = [["Which is the best gaming laptop under 40k?"]]
prediction = pipeline(sequences)
print(prediction)
```
but the following error occurs:
```
---------------------------------------------------------------------------
DecodeError                               Traceback (most recent call last)
Cell In[2], line 5
      3 # download onnx from sparsezoo and compile with batch size 1
      4 sparsezoo_stub = "/data/onnx_model"
----> 5 pipeline = Pipeline.create(
      6     task="text-classification",
      7     model_path=sparsezoo_stub,  # sparsezoo stub or path to local ONNX
      8     batch_size=1  # default batch size is 1
      9 )
     11 # run inference
     12 sequences = [[
     13     "Which is the best gaming laptop under 40k?"
     14 ]]

File /tmp/pip_packages/deepsparse/base_pipeline.py:210, in BasePipeline.create(task, **kwargs)
    204     buckets = pipeline_constructor.create_pipeline_buckets(
    205         task=task,
    206         **kwargs,
    207     )
    208     return BucketingPipeline(pipelines=buckets)
--> 210 return pipeline_constructor(**kwargs)

File /tmp/pip_packages/deepsparse/transformers/pipelines/text_classification.py:152, in TextClassificationPipeline.__init__(self, top_k, return_all_scores, **kwargs)
    145 def __init__(
    146     self,
    147     *,
    (...)
    150     **kwargs,
    151 ):
--> 152     super().__init__(**kwargs)
    154     self._top_k = _get_top_k(top_k, return_all_scores, self.config.num_labels)
    155     self._return_all_scores = return_all_scores

File /tmp/pip_packages/deepsparse/transformers/pipelines/pipeline.py:110, in TransformersPipeline.__init__(self, sequence_length, trust_remote_code, config, tokenizer, **kwargs)
    105 self._delay_overwriting_inputs = (
    106     kwargs.pop("_delay_overwriting_inputs", None) or False
    107 )
    108 self._temp_model_directory = None
--> 110 super().__init__(**kwargs)

File /tmp/pip_packages/deepsparse/pipeline.py:199, in Pipeline.__init__(self, model_path, engine_type, batch_size, num_cores, num_streams, scheduler, input_shapes, context, executor, benchmark, _delay_engine_initialize, **kwargs)
    196 self._engine_args["scheduler"] = scheduler
    197 self._engine_args["num_streams"] = num_streams
--> 199 self.onnx_file_path = self.setup_onnx_file_path()
    201 if _delay_engine_initialize:
    202     self.engine = None

File /tmp/pip_packages/deepsparse/transformers/pipelines/pipeline.py:153, in TransformersPipeline.setup_onnx_file_path(self)
    141 self.tokenizer = AutoTokenizer.from_pretrained(
    142     deployment_path,
    143     trust_remote_code=self._trust_remote_code,
    144     model_max_length=self.sequence_length,
    145 )
    147 if not self._delay_overwriting_inputs:
    148     # overwrite onnx graph to given required input shape
    149     (
    150         onnx_path,
    151         self.onnx_input_names,
    152         self._temp_model_directory,
--> 153     ) = overwrite_transformer_onnx_model_inputs(
    154         onnx_path, max_length=self.sequence_length
    155     )
    157 if not self.config or not self.tokenizer:
    158     raise RuntimeError(
    159         "Invalid config or tokenizer provided. Please provide "
    160         "paths to the files or ensure they exist in the `model_path` provided. "
    161         "See `tokenizer` and `config` arguments for details."
    162     )

File /tmp/pip_packages/deepsparse/transformers/helpers.py:110, in overwrite_transformer_onnx_model_inputs(path, batch_size, max_length, inplace)
     92 """
     93 Overrides an ONNX model's inputs to have the given batch size and sequence lengths.
     94 Assumes that these are the first and second shape indices of the given model inputs
    (...)
    105     `inplace=False`, else None)
    106 """
    107 # overwrite input shapes
    108 # if > 2Gb model is to be modified in-place, operate
    109 # exclusively on the model graph
--> 110 model = onnx.load(path, load_external_data=not inplace)
    111 initializer_input_names = set([node.name for node in model.graph.initializer])
    112 external_inputs = [
    113     inp for inp in model.graph.input if inp.name not in initializer_input_names
    114 ]

File /tmp/pip_packages/onnx/__init__.py:170, in load_model(f, format, load_external_data)
    156 """Loads a serialized ModelProto into memory.
    157
    158 Args:
    (...)
    167     Loaded in-memory ModelProto.
    168 """
    169 s = _load_bytes(f)
--> 170 model = load_model_from_string(s, format=format)
    172 if load_external_data:
    173     model_filepath = _get_file_path(f)

File /tmp/pip_packages/onnx/__init__.py:212, in load_model_from_string(***failed resolving arguments***)
    202 """Loads a binary string (bytes) that contains serialized ModelProto.
    203
    204 Args:
    (...)
    209     Loaded in-memory ModelProto.
    210 """
    211 del format  # Unused
--> 212 return _deserialize(s, ModelProto())

File /tmp/pip_packages/onnx/__init__.py:143, in _deserialize(s, proto)
    140 if not (hasattr(proto, "ParseFromString") and callable(proto.ParseFromString)):
    141     raise TypeError(f"No ParseFromString method is detected. Type is {type(proto)}")
--> 143 decoded = typing.cast(Optional[int], proto.ParseFromString(s))
    144 if decoded is not None and decoded != len(s):
    145     raise google.protobuf.message.DecodeError(
    146         f"Protobuf decoding consumed too few bytes: {decoded} out of {len(s)}"
    147     )

DecodeError: Error parsing message
```
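The `DecodeError` is raised inside `onnx.load`, which means protobuf could not parse `model.onnx` as a `ModelProto` at all: the file on disk is not a valid ONNX file. Common causes are a Git LFS pointer file checked out in place of the real weights, an HTML error page saved by an interrupted download, or a truncated export. A stdlib-only sketch for peeking at the first bytes (the signatures below are heuristics, not an exhaustive check):

```python
def diagnose_onnx_bytes(path: str) -> str:
    """Best-effort guess at why an ONNX file fails protobuf decoding.

    Heuristics only: a Git LFS pointer file starts with
    'version https://git-lfs', and an HTML error page starts with '<'.
    """
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(b"version https://git-lfs"):
        return "git-lfs pointer (run `git lfs pull` to fetch the real file)"
    if head.lstrip().startswith(b"<"):
        return "looks like HTML/XML, not an ONNX protobuf"
    if not head:
        return "file is empty"
    return "no obvious signature; file may be truncated or mis-exported"
```

If none of these signatures match, re-exporting the model with SparseML (as suggested above) and comparing file sizes is the next step.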