
Comments (6)

kha-white commented on May 20, 2024

Ok, so I don't know exactly how to do the inference in ONNX (I played around with it a bit, but it seemed rather tricky, so I abandoned/postponed it). I'll just tell you what I know.

This model has an encoder-decoder architecture. The encoder takes an image as input and outputs a feature vector (this is encoder_last_hidden_state). This feature vector is then passed to the decoder, which is run iteratively, outputting one token at a time until it reaches a special token indicating the end of the sequence. At each step, the decoder is fed the sequence of all the tokens it has output so far (this is decoder_input_ids). The decoder outputs the tokens as logits - each token is represented by a vector of 6144 values corresponding to the vocab, as you correctly noticed. You could take the argmax of the logits vector to get the most probable token, but what actually happens is that the logits are converted to probabilities and the top N hypotheses are kept at each step (beam search). This is done by HuggingFace's generate method, which is unfortunately quite complex.

The tricky part is replicating the beam search (although it could be replaced with a simpler greedy search at the cost of some accuracy) and getting all the little details right when passing the tensors around.
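The greedy variant of that loop is small enough to sketch. This is a sketch, not manga-ocr's actual code: it assumes a merged ONNX export that takes pixel_values and decoder_input_ids and returns logits, and the bos_id/eos_id defaults (2/3) are placeholders that must be checked against the tokenizer's special-token ids.

```python
import numpy as np

# Greedy decoding against a merged encoder-decoder ONNX export: the whole
# model is re-run each step with all tokens generated so far. `session` is
# anything with onnxruntime InferenceSession's run() interface.
# bos_id/eos_id are assumptions - check the tokenizer's special tokens.
def greedy_decode(session, pixel_values, bos_id=2, eos_id=3, max_len=300):
    token_ids = [bos_id]
    for _ in range(max_len):
        (logits,) = session.run(
            ["logits"],
            {
                "pixel_values": pixel_values,
                "decoder_input_ids": np.array([token_ids], dtype=np.int64),
            },
        )
        next_id = int(logits[0, -1].argmax())  # argmax over the 6144-way vocab
        if next_id == eos_id:
            break
        token_ids.append(next_id)
    return token_ids[1:]  # drop the BOS token
```

Beam search would instead keep the top-N partial sequences at each step, but a greedy loop like this is usually enough to validate that an export works at all.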

BTW, I suppose there is something wrong with both your ONNX export and that bot's. I think there should be separate ONNX files for the encoder and the decoder, since the encoder is run only once per inference, while the decoder is run iteratively until the end of the sequence is reached.

from manga-ocr.

kha-white commented on May 20, 2024

Conversion to ONNX is possible and has been done, but inference in other languages is not trivial, mainly because HuggingFace's generate method (basically beam search) would have to be reimplemented. Tokenizer and post-processing are relatively easy to substitute; tokenizer's decode can be replaced with a look-up table and the rest are some rather simple operations on strings.
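A sketch of that substitution, assuming a BERT-style vocab.txt with one WordPiece token per line; the exact post-processing manga-ocr applies should be checked against its source rather than taken from this example:

```python
# Replace the tokenizer's decode with a plain look-up table built from
# vocab.txt: token id -> token string, skip special tokens, join, and strip
# WordPiece continuation markers. The [..] special-token convention and the
# "##" prefix are assumptions from BERT-style tokenizers.
def load_vocab(path="vocab.txt"):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

def decode(token_ids, vocab):
    pieces = [vocab[i] for i in token_ids]
    # drop special tokens like [CLS], [SEP], [PAD]
    pieces = [p for p in pieces if not (p.startswith("[") and p.endswith("]"))]
    return "".join(pieces).replace("##", "")
```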


Mar2ck commented on May 20, 2024

An automated conversion of the model to ONNX can be found here: https://huggingface.co/kha-white/manga-ocr-base/blob/refs%2Fpr%2F3/model.onnx


LancerComet commented on May 20, 2024

@kha-white Thanks for your reply. I have converted an ONNX model with the following input and output shapes:

Input Details:
Name: pixel_values, Shape: ['batch_size', 'num_channels', 'height', 'width'], Type: tensor(float)
Name: decoder_input_ids, Shape: ['batch_size', 'decoder_sequence_length'], Type: tensor(int64)

Output Details:
Name: logits, Shape: ['batch_size', 'decoder_sequence_length', 6144], Type: tensor(float)

I understand that pixel_values is a tensor of the bitmap, but I am unclear about decoder_input_ids. Does it come from data in the manga109 training set?

logits has a final dimension of 6144; I am currently guessing it is to be used in conjunction with vocab.txt.
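For reference, pixel_values can be built without the HuggingFace image processor. The 224x224 size and the 0.5 mean/std normalization below are the ViT image-processor defaults and are assumptions here; verify them against the model's preprocessor_config.json:

```python
import numpy as np
from PIL import Image

# Turn a PIL image into the NCHW float tensor the encoder expects.
# Sizes and normalization constants are assumed ViT defaults - check
# preprocessor_config.json for the real values.
def preprocess(img: Image.Image) -> np.ndarray:
    img = img.convert("L").convert("RGB")         # grayscale content, 3 channels
    img = img.resize((224, 224), Image.BILINEAR)
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - 0.5) / 0.5                           # normalize to [-1, 1]
    return x.transpose(2, 0, 1)[None]             # HWC -> NCHW with batch dim
```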

@Mar2ck Wow, I didn't know there is even a bot that turns models into ONNX automatically! But I see it is a little different from mine:

Input Details:
Name: pixel_values, Shape: ['batch_size', 'num_channels', 'height', 'width'], Type: tensor(float)
Name: decoder_input_ids, Shape: ['batch_size', 'decoder_sequence_length'], Type: tensor(int64)

Output Details:
Name: logits, Shape: ['batch_size', 'decoder_sequence_length', 6144], Type: tensor(float)
Name: encoder_last_hidden_state, Shape: ['batch_size', 'encoder_sequence_length', 768], Type: tensor(float)

It has encoder_last_hidden_state in its outputs, which mine doesn't. I have no clue why that happens.


LancerComet commented on May 20, 2024

@kha-white Thank you for your reply, I'm starting to understand the whole workflow. I am currently looking for a solution for the generate function. I saw some stuff about BeamSearch in the onnxruntime repository but haven't delved into it yet. As for the ONNX model issue, it is indeed possible to create two separate models; I had merged them during export because I didn't have a deep understanding of the model at the time. Thank you again for your response.


mayocream commented on May 20, 2024

@LancerComet Hi, would you like to share your method for exporting the pre-trained model to ONNX format? I am getting the errors below when exporting with optimum-cli:

$ optimum-cli export onnx --model kha-white/manga-ocr-base bin/
Framework not specified. Using pt to export to ONNX.
Automatic task detection to image-to-text-with-past.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
/Users/mayo/miniconda3/lib/python3.11/site-packages/transformers/models/vit/feature_extraction_vit.py:28: FutureWarning: The class ViTFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use ViTImageProcessor instead.
  warnings.warn(
Traceback (most recent call last):
  File "/Users/mayo/miniconda3/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/commands/optimum_cli.py", line 163, in main
    service.run()
  File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/commands/export/onnx.py", line 232, in run
    main_export(
  File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/exporters/onnx/__main__.py", line 399, in main_export
    onnx_config, models_and_onnx_configs = _get_submodels_and_onnx_configs(
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/exporters/onnx/__main__.py", line 82, in _get_submodels_and_onnx_configs
    onnx_config = onnx_config_constructor(
                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/exporters/onnx/base.py", line 623, in with_past
    return cls(
           ^^^^
  File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/exporters/onnx/model_configs.py", line 1231, in __init__
    super().__init__(
  File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/exporters/onnx/config.py", line 322, in __init__
    raise ValueError(
ValueError: The decoder part of the encoder-decoder model is bert which does not need past key values.

Updated:

Got the export to succeed. Run:
optimum-cli export onnx --model kha-white/manga-ocr-base --task vision2seq-lm bin/

