clip-onnx's Introduction

Hi there 👋

I'm Maxim, a young Data Scientist and creative programmer working with Time Series and NLP. I develop models for practical applications, create libraries for other programmers, and frequently participate in professional activities.

Want to know more about me?

I have successfully completed Yandex Lyceum, as well as courses at Tinkoff Education. Now I participate in hackathons to develop my hard and soft skills.

๐Ÿ† GitHub Profile Trophy

trophy

๐Ÿ“ˆ GitHub Stats


Martin's GitHub Stats

๐Ÿ’ผ Skills

More Skills


clip-onnx's People

Contributors

blahblahhhj, lednik7

clip-onnx's Issues

Performance is inconsistent with the original model

Hi, thanks for providing this useful tool!
However, I found that the result produced by the generated ONNX model is inconsistent with the original CLIP model.
Here is the code I used to test the original model:

import clip
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)

image = preprocess(Image.open("CLIP.png")).unsqueeze(0).cpu() # [1, 3, 224, 224]
text = clip.tokenize(["a diagram", "a dog", "a cat"]).cpu() # [3, 77]

image_features = model.encode_image(image)
text_features = model.encode_text(text)

logits_per_image, logits_per_text = model(image, text)
probs = logits_per_image.softmax(dim=-1).detach().cpu().numpy()

print("Label probs:", probs) 

The result is: Label probs: [[0.9927937 0.00421069 0.00299573]]

However, when using the onnx model, the result is: Label probs: [[0.41456965 0.29270944 0.29272085]].

Could you help me with this? Thanks!
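
For reference, here is a minimal sketch of the ONNX side of the comparison, assuming the two towers were exported as visual.onnx and textual.onnx with input name "input" (as in the library's default export settings; adjust names and dtypes if yours differ):

import numpy as np
import onnxruntime as ort

visual_sess = ort.InferenceSession("visual.onnx", providers=["CPUExecutionProvider"])
textual_sess = ort.InferenceSession("textual.onnx", providers=["CPUExecutionProvider"])

# Reuse image/text from the PyTorch snippet above; the dtypes must match
# whatever the exported graphs expect (commonly float32 / int64).
image_onnx = image.detach().cpu().numpy().astype(np.float32)
text_onnx = text.detach().cpu().numpy().astype(np.int64)

onnx_image_features = visual_sess.run(None, {"input": image_onnx})[0]
onnx_text_features = textual_sess.run(None, {"input": text_onnx})[0]

# If the export is faithful, both should agree with PyTorch to float tolerance.
print(np.abs(onnx_image_features - image_features.detach().numpy()).max())
print(np.abs(onnx_text_features - text_features.detach().numpy()).max())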

Textual .forward() override bug.

Hi! First of all, thanks a lot for a great example project!
However, I'm pretty sure I have found a bug. onnx_model.encode_text() produces results that are not consistent with regular model output.
There is a statement in .utils:
take features from the eot embedding (eot_token is the highest number in each sequence).

But ruclip uses eos_id = 3, which is clearly not the highest token id in the sequence.
This change was made after you released your repo, but there was no version bump.
So I tried tracing the model with a hardcoded EOS id and the original .where line from ruclip's encode_text:
x = x[torch.arange(x.shape[0]), torch.where(text == 3)[1]] @ self.text_projection
and it worked. I will submit a pull request with a fix if I have free time later, but for now I just wanted you to know that running the default version of your notebook produces incorrect text encoding results.
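
To make the difference concrete, here is a self-contained toy sketch (random tensors, not the repo's actual wrapper) of the two pooling strategies:

import torch

# Toy tensors for illustration: batch of 2 sequences, hidden size 4.
x = torch.randn(2, 7, 4)  # final hidden states (batch, seq_len, hidden)
text = torch.tensor([[1, 5, 2, 3, 0, 0, 0],
                     [1, 9, 3, 0, 0, 0, 0]])  # eos_id = 3 ends each text
text_projection = torch.randn(4, 4)

# Original CLIP pooling: argmax over token ids, which only works when the
# EOT token has the highest id in the vocabulary (true for OpenAI's BPE).
pooled_clip = x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ text_projection

# ruclip-style pooling with a fixed eos_id: take the first position where the
# token equals eos_id; argmax over the int-cast mask returns the first match.
eos_id = 3
eos_pos = (text == eos_id).int().argmax(dim=-1)
pooled_fixed = x[torch.arange(x.shape[0]), eos_pos] @ text_projection
print(eos_pos)  # tensor([3, 2])

Compared with torch.where(text == 3)[1], the argmax form also traces more robustly, since its output shape is always (batch,) and does not depend on how many matches occur.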

Replace the "torch.einsum" operator

q, k, v = (torch.einsum("tbh, oh -> tbo", x, self.attn.in_proj_weight) + self.attn.in_proj_bias).contiguous().chunk(
3, dim=-1)

@Lednik7 Thanks for your great work on CLIP-ONNX. Regarding the PyTorch operator torch.einsum: if we don't want to use this operator, do you have other code to replace it?
This operator is not friendly to some inference engines, such as NVIDIA TensorRT, so an einsum-free alternative would be better.
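
Not speaking for the author, but this particular einsum is just a linear projection over the last axis, so it can be rewritten with torch.nn.functional.linear (a plain MatMul/Gemm, which TensorRT supports). A quick equivalence check:

import torch
import torch.nn.functional as F

t, b, h = 5, 2, 8
x = torch.randn(t, b, h)
in_proj_weight = torch.randn(3 * h, h)  # o = 3h, as in nn.MultiheadAttention
in_proj_bias = torch.randn(3 * h)

# einsum version from the snippet above:
qkv_einsum = torch.einsum("tbh, oh -> tbo", x, in_proj_weight) + in_proj_bias

# einsum-free version: "tbh, oh -> tbo" is x @ W.T over the last dimension.
qkv_linear = F.linear(x, in_proj_weight, in_proj_bias)

print(torch.allclose(qkv_einsum, qkv_linear, atol=1e-6))  # True
q, k, v = qkv_linear.contiguous().chunk(3, dim=-1)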

How to get a dynamic-length text ONNX model?

I used CLIP-ONNX to convert CLIP into a text ONNX model and an image ONNX model, but I found that the text ONNX model cannot handle a dynamic input text length:

[screenshot: the exported text model has a fixed input length of 77]

77 is my input text length, but I would like it to support dynamic lengths.

In the code "DEFAULT_EXPORT",
dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}},
I changed it to: dynamic_axes={'input': {0: 'batch_size', 1: "sequence"}, 'output': {0: 'batch_size'}}
but it doesn't work.

How can I get a dynamic-input text model?
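
The likely root cause: CLIP's text tower hard-codes 77 positions, so the exported graph bakes in a (77, width) positional-embedding constant and a 77x77 causal attention mask. Marking axis 1 as dynamic in dynamic_axes changes the declared input shape but not those internals. Unless you rewrite the text wrapper to slice both of them to the runtime length before export, the simplest workaround is to keep the fixed-length model and pad or truncate token ids to 77. A minimal sketch (pad_tokens is a hypothetical helper, not part of CLIP-ONNX):

import numpy as np

CONTEXT_LENGTH = 77  # fixed by CLIP's positional embedding

def pad_tokens(tokens: np.ndarray, pad_id: int = 0) -> np.ndarray:
    """Pad (or truncate) a (batch, seq) array of token ids to (batch, 77)."""
    batch, seq = tokens.shape
    if seq >= CONTEXT_LENGTH:
        # Note: if you truncate, make sure the EOT token survives, since
        # CLIP pools the text features at its position.
        return tokens[:, :CONTEXT_LENGTH]
    out = np.full((batch, CONTEXT_LENGTH), pad_id, dtype=tokens.dtype)
    out[:, :seq] = tokens
    return out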

ERROR: missing-standard-symbolic-function

[CLIP ONNX] Start convert visual model
============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 1 ERROR ========================
ERROR: missing-standard-symbolic-function

Exporting the operator 'aten::unflatten' to ONNX opset version 11 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues.
None

Traceback (most recent call last):
  File "convert.py", line 124, in <module>
    visual_path, textual_path = make_onnx(vis_model_dir, text_model_dir)
  File "convert.py", line 37, in make_onnx
    onnx_model.convert2onnx(image, text, verbose=True)
  File "/nvme/nvme0//clip/CLIP-ONNX/clip_onnx/clip_converter.py", line 102, in convert2onnx
    self.convert_visual(visual_input, visual_wrapper, visual_export_params)
  File "/nvme/nvme0//clip/CLIP-ONNX/clip_onnx/clip_converter.py", line 52, in convert_visual
    torch.onnx.export(visual,
  File "/nvme/nvme0/anaconda3/envs/py3/lib/python3.8/site-packages/torch/onnx/utils.py", line 506, in export
    _export(
  File "/nvme/nvme0/anaconda3/envs/py3/lib/python3.8/site-packages/torch/onnx/utils.py", line 1548, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/nvme/nvme0/anaconda3/envs/py3/lib/python3.8/site-packages/torch/onnx/utils.py", line 1117, in _model_to_graph
    graph = _optimize_graph(
  File "/nvme/nvme0/anaconda3/envs/py3/lib/python3.8/site-packages/torch/onnx/utils.py", line 665, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "/nvme/nvme0/anaconda3/envs/py3/lib/python3.8/site-packages/torch/onnx/utils.py", line 1901, in _run_symbolic_function
    raise errors.UnsupportedOperatorError(
torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::unflatten' to ONNX opset version 11 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues.
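
aten::unflatten simply has no symbolic function at opset 11; newer PyTorch versions can export it at higher opsets. So the usual fix is to raise opset_version above the 11 used in the export parameters (CLIP-ONNX's converter takes export params, so you may be able to pass it there if your version exposes them). A sketch with hypothetical names (visual_model and dummy_image stand in for the wrapped visual tower and its example input):

import torch

torch.onnx.export(
    visual_model,                 # wrapped CLIP visual tower (hypothetical name)
    (dummy_image,),               # e.g. a (1, 3, 224, 224) float32 tensor
    "visual.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
    opset_version=14,             # raise above 11; 14+ usually suffices here
)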

Can't use CUDAExecutionProvider

Hey, I'm trying to use the code on GPU and I encountered two problems:

  1. When running pip install git+https://github.com/Lednik7/CLIP-ONNX.git I got the following error (tried on multiple machines):
    ERROR: Could not find a version that satisfies the requirement torch==1.10.0+cu111 (from clip-onnx)

I fixed it by installing that version of torch myself with pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html, and then running the rest of the installation.

  2. After I installed the package, I tried to run the example in the readme with CPUExecutionProvider and it worked fine, but when I try to run it on GPU with CUDAExecutionProvider I get the following error message (again on different machines):

2022-01-31 20:57:03.234399301 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
2022-01-31 20:57:03.872349008 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

I can't figure out what the problem is. Any help?
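
That warning typically means onnxruntime's GPU build is missing, or its CUDA/cuDNN versions don't match what is installed on the machine. A quick diagnostic sketch:

import onnxruntime as ort

# Lists the providers compiled into the installed wheel.
print(ort.get_available_providers())
# A GPU wheel prints something like:
#   ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
# If you only see ['CPUExecutionProvider'], swap the wheel:
#   pip uninstall onnxruntime && pip install onnxruntime-gpu
# If CUDAExecutionProvider is listed but session creation still falls back with
# the warning above, the system CUDA/cuDNN versions don't match the ones the
# wheel was built against (see the requirements page linked in the warning).

Also note that having both onnxruntime and onnxruntime-gpu installed in the same environment is a common cause of this fallback; keep only the GPU wheel.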
