clip-onnx's Introduction

Hi there 👋

I'm Maxim, a young Data Scientist and creative programmer working with Time Series and NLP. I develop models for practical applications, create libraries for other programmers, and frequently participate in professional activities.

Want to know more about me?

I have successfully completed Yandex Lyceum, as well as courses at Tinkoff Education. Now I participate in hackathons to develop my hard and soft skills.

๐Ÿ† GitHub Profile Trophy

trophy

๐Ÿ“ˆ GitHub Stats


Martin's GitHub Stats

๐Ÿ’ผ Skills

More Skills


clip-onnx's People

Contributors

blahblahhhj, lednik7

clip-onnx's Issues

Performance is inconsistent with the original model

Hi, thanks for providing this useful tool!
However, I found that the result produced by the generated ONNX model is inconsistent with the original CLIP model.
Here is the code I used to test the original model:

import clip
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)

image = preprocess(Image.open("CLIP.png")).unsqueeze(0).cpu() # [1, 3, 224, 224]
text = clip.tokenize(["a diagram", "a dog", "a cat"]).cpu() # [3, 77]

image_features = model.encode_image(image)
text_features = model.encode_text(text)

logits_per_image, logits_per_text = model(image, text)
probs = logits_per_image.softmax(dim=-1).detach().cpu().numpy()

print("Label probs:", probs) 

The result is: Label probs: [[0.9927937 0.00421069 0.00299573]]

However, when using the onnx model, the result is: Label probs: [[0.41456965 0.29270944 0.29272085]].

Could you help me with this? Thanks!
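
For reference, here is a minimal sketch of the ONNX side of the comparison, assuming the two towers were exported as visual.onnx and textual.onnx with input name "input" (as in the library's default export settings; adjust names and dtypes if yours differ):

import numpy as np
import onnxruntime as ort

visual_sess = ort.InferenceSession("visual.onnx", providers=["CPUExecutionProvider"])
textual_sess = ort.InferenceSession("textual.onnx", providers=["CPUExecutionProvider"])

# Reuse image/text from the PyTorch snippet above; the dtypes must match
# whatever the exported graphs expect (commonly float32 / int64).
image_onnx = image.detach().cpu().numpy().astype(np.float32)
text_onnx = text.detach().cpu().numpy().astype(np.int64)

onnx_image_features = visual_sess.run(None, {"input": image_onnx})[0]
onnx_text_features = textual_sess.run(None, {"input": text_onnx})[0]

# If the export is faithful, both should agree with PyTorch to float tolerance.
print(np.abs(onnx_image_features - image_features.detach().numpy()).max())
print(np.abs(onnx_text_features - text_features.detach().numpy()).max())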

Textual .forward() override bug.

Hi! First of all, thanks a lot for a great example project!
However, I'm pretty sure I have found a bug. onnx_model.encode_text() produces results that are not consistent with regular model output.
There is a statement in .utils:
take features from the eot embedding (eot_token is the highest number in each sequence).

But ruclip uses eos_id = 3, which is clearly not the highest token id in the sequence.
This change was made after you released your repo, but there was no version bump.
So I tried tracing the model with a hardcoded EOS id and the original .where line from ruclip's encode_text:
x = x[torch.arange(x.shape[0]), torch.where(text == 3)[1]] @ self.text_projection
and it worked. I will submit a pull request with a fix if I have free time later, but for now I just wanted you to know that running the default version of your notebook produces incorrect text encoding results.
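
To make the difference concrete, here is a self-contained toy sketch (random tensors, not the repo's actual wrapper) of the two pooling strategies:

import torch

# Toy tensors for illustration: batch of 2 sequences, hidden size 4.
x = torch.randn(2, 7, 4)  # final hidden states (batch, seq_len, hidden)
text = torch.tensor([[1, 5, 2, 3, 0, 0, 0],
                     [1, 9, 3, 0, 0, 0, 0]])  # eos_id = 3 ends each text
text_projection = torch.randn(4, 4)

# Original CLIP pooling: argmax over token ids, which only works when the
# EOT token has the highest id in the vocabulary (true for OpenAI's BPE).
pooled_clip = x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ text_projection

# ruclip-style pooling with a fixed eos_id: take the first position where the
# token equals eos_id; argmax over the int-cast mask returns the first match.
eos_id = 3
eos_pos = (text == eos_id).int().argmax(dim=-1)
pooled_fixed = x[torch.arange(x.shape[0]), eos_pos] @ text_projection
print(eos_pos)  # tensor([3, 2])

Compared with torch.where(text == 3)[1], the argmax form also traces more robustly, since its output shape is always (batch,) and does not depend on how many matches occur.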

Replace the "torch.einsum" operator

q, k, v = (torch.einsum("tbh, oh -> tbo", x, self.attn.in_proj_weight) + self.attn.in_proj_bias).contiguous().chunk(
3, dim=-1)

@Lednik7 Thanks for your great work on CLIP-ONNX. Regarding the PyTorch operator torch.einsum: if we don't want to use this operator, do you have other code to replace it?
This operator is not friendly to some inference engines, such as NVIDIA TensorRT, so an einsum-free alternative would be better.
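
Not speaking for the author, but this particular einsum is just a linear projection over the last axis, so it can be rewritten with torch.nn.functional.linear (a plain MatMul/Gemm, which TensorRT supports). A quick equivalence check:

import torch
import torch.nn.functional as F

t, b, h = 5, 2, 8
x = torch.randn(t, b, h)
in_proj_weight = torch.randn(3 * h, h)  # o = 3h, as in nn.MultiheadAttention
in_proj_bias = torch.randn(3 * h)

# einsum version from the snippet above:
qkv_einsum = torch.einsum("tbh, oh -> tbo", x, in_proj_weight) + in_proj_bias

# einsum-free version: "tbh, oh -> tbo" is x @ W.T over the last dimension.
qkv_linear = F.linear(x, in_proj_weight, in_proj_bias)

print(torch.allclose(qkv_einsum, qkv_linear, atol=1e-6))  # True
q, k, v = qkv_linear.contiguous().chunk(3, dim=-1)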

How to get a dynamic-length text ONNX model?

I used CLIP-ONNX to convert CLIP into a text ONNX model and an image ONNX model, but I found that the text ONNX model cannot handle a dynamic input text length:

[screenshot: the exported text model has a fixed input length of 77]

77 is my input text length, but I would like it to support dynamic lengths.

In the code "DEFAULT_EXPORT",
dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}},
I changed it to: dynamic_axes={'input': {0: 'batch_size', 1: "sequence"}, 'output': {0: 'batch_size'}}
but it doesn't work.

How can I get a dynamic-input text model?
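
The likely root cause: CLIP's text tower hard-codes 77 positions, so the exported graph bakes in a (77, width) positional-embedding constant and a 77x77 causal attention mask. Marking axis 1 as dynamic in dynamic_axes changes the declared input shape but not those internals. Unless you rewrite the text wrapper to slice both of them to the runtime length before export, the simplest workaround is to keep the fixed-length model and pad or truncate token ids to 77. A minimal sketch (pad_tokens is a hypothetical helper, not part of CLIP-ONNX):

import numpy as np

CONTEXT_LENGTH = 77  # fixed by CLIP's positional embedding

def pad_tokens(tokens: np.ndarray, pad_id: int = 0) -> np.ndarray:
    """Pad (or truncate) a (batch, seq) array of token ids to (batch, 77)."""
    batch, seq = tokens.shape
    if seq >= CONTEXT_LENGTH:
        # Note: if you truncate, make sure the EOT token survives, since
        # CLIP pools the text features at its position.
        return tokens[:, :CONTEXT_LENGTH]
    out = np.full((batch, CONTEXT_LENGTH), pad_id, dtype=tokens.dtype)
    out[:, :seq] = tokens
    return out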

ERROR: missing-standard-symbolic-function

[CLIP ONNX] Start convert visual model
============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 1 ERROR ========================
ERROR: missing-standard-symbolic-function

Exporting the operator 'aten::unflatten' to ONNX opset version 11 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues.
None

Traceback (most recent call last):
  File "convert.py", line 124, in <module>
    visual_path, textual_path = make_onnx(vis_model_dir, text_model_dir)
  File "convert.py", line 37, in make_onnx
    onnx_model.convert2onnx(image, text, verbose=True)
  File "/nvme/nvme0//clip/CLIP-ONNX/clip_onnx/clip_converter.py", line 102, in convert2onnx
    self.convert_visual(visual_input, visual_wrapper, visual_export_params)
  File "/nvme/nvme0//clip/CLIP-ONNX/clip_onnx/clip_converter.py", line 52, in convert_visual
    torch.onnx.export(visual,
  File "/nvme/nvme0/anaconda3/envs/py3/lib/python3.8/site-packages/torch/onnx/utils.py", line 506, in export
    _export(
  File "/nvme/nvme0/anaconda3/envs/py3/lib/python3.8/site-packages/torch/onnx/utils.py", line 1548, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/nvme/nvme0/anaconda3/envs/py3/lib/python3.8/site-packages/torch/onnx/utils.py", line 1117, in _model_to_graph
    graph = _optimize_graph(
  File "/nvme/nvme0/anaconda3/envs/py3/lib/python3.8/site-packages/torch/onnx/utils.py", line 665, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "/nvme/nvme0/anaconda3/envs/py3/lib/python3.8/site-packages/torch/onnx/utils.py", line 1901, in _run_symbolic_function
    raise errors.UnsupportedOperatorError(
torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::unflatten' to ONNX opset version 11 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues.
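
aten::unflatten simply has no symbolic function at opset 11; newer PyTorch versions can export it at higher opsets. So the usual fix is to raise opset_version above the 11 used in the export parameters (CLIP-ONNX's converter takes export params, so you may be able to pass it there if your version exposes them). A sketch with hypothetical names (visual_model and dummy_image stand in for the wrapped visual tower and its example input):

import torch

torch.onnx.export(
    visual_model,                 # wrapped CLIP visual tower (hypothetical name)
    (dummy_image,),               # e.g. a (1, 3, 224, 224) float32 tensor
    "visual.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
    opset_version=14,             # raise above 11; 14+ usually suffices here
)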

Can't use CUDAExecutionProvider

Hey, I'm trying to use the code on GPU and I encountered two problems:

  1. When running pip install git+https://github.com/Lednik7/CLIP-ONNX.git I got the following error (tried on multiple machines):
    ERROR: Could not find a version that satisfies the requirement torch==1.10.0+cu111 (from clip-onnx)

I fixed it by installing that version of torch myself with pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html, and then running the rest of the installation.

  2. After I installed the package, I tried to run the example in the readme with CPUExecutionProvider and it worked fine, but when I try to run it on GPU with CUDAExecutionProvider I get the following error message (again on different machines):

2022-01-31 20:57:03.234399301 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
2022-01-31 20:57:03.872349008 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

I can't figure out what the problem is. Any help?
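
That warning typically means onnxruntime's GPU build is missing, or its CUDA/cuDNN versions don't match what is installed on the machine. A quick diagnostic sketch:

import onnxruntime as ort

# Lists the providers compiled into the installed wheel.
print(ort.get_available_providers())
# A GPU wheel prints something like:
#   ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
# If you only see ['CPUExecutionProvider'], swap the wheel:
#   pip uninstall onnxruntime && pip install onnxruntime-gpu
# If CUDAExecutionProvider is listed but session creation still falls back with
# the warning above, the system CUDA/cuDNN versions don't match the ones the
# wheel was built against (see the requirements page linked in the warning).

Also note that having both onnxruntime and onnxruntime-gpu installed in the same environment is a common cause of this fallback; keep only the GPU wheel.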
