Comments (6)
Ok, so I don't know exactly how to do the inference in ONNX (I played around with it a bit, but it seemed rather tricky and I abandoned/postponed it), so I'll just tell you what I know.
This model has an encoder-decoder architecture. The encoder takes an image as input and outputs a feature vector (this is encoder_last_hidden_state). This feature vector is then passed to the decoder, which runs iteratively, outputting one token at a time until it reaches a special token marking the end of the sequence. At each step, the decoder is fed the sequence of all the tokens it has output so far (this is decoder_input_ids). The decoder outputs tokens as logits: each token is represented by a vector of 6144 values corresponding to the vocab, as you correctly noticed. You could take the argmax of the logits vector to get the most probable token, but what actually happens is that the logits are converted to probabilities and the top N hypotheses are kept at each step (beam search). This is done by HuggingFace's generate method, which is unfortunately quite complex.
The tricky part is replicating the beam search (although it could be replaced with a simpler greedy search at the cost of some accuracy) and getting all the little details right when passing the tensors around.
BTW, I suppose there is something wrong with both your and that bot's ONNX exports. I think there should be separate ONNX files for the encoder and the decoder, since the encoder is run only once per inference, while the decoder is run iteratively until the end of the sequence is reached.
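The loop described above (encoder once, decoder repeatedly) can be sketched in Python. This is a minimal greedy-decoding sketch, not the actual manga-ocr implementation: decoder_logits is a dummy stand-in for the decoder ONNX session, and the start/end token ids are assumptions that would need to be checked against the model's config.

```python
import numpy as np

VOCAB_SIZE = 6144
START_TOKEN = 2  # assumed decoder start id - check the model config
END_TOKEN = 3    # assumed end-of-sequence id - check the tokenizer config

def decoder_logits(encoder_state, token_ids):
    # Stand-in for the decoder ONNX session. A real implementation would call
    # something like decoder_session.run(...) with the encoder output and the
    # tokens generated so far. This dummy just emits tokens 11, 12, 13 and
    # then the end token, purely for illustration.
    logits = np.zeros((1, len(token_ids), VOCAB_SIZE), dtype=np.float32)
    next_tok = END_TOKEN if len(token_ids) >= 4 else len(token_ids) + 10
    logits[0, -1, next_tok] = 1.0
    return logits

def greedy_decode(encoder_state, max_len=300):
    # The encoder runs once; the decoder runs once per generated token,
    # each time fed every token produced so far.
    token_ids = [START_TOKEN]
    while len(token_ids) < max_len:
        logits = decoder_logits(encoder_state, token_ids)
        next_token = int(np.argmax(logits[0, -1]))  # last position only
        token_ids.append(next_token)
        if next_token == END_TOKEN:
            break
    return token_ids
```

A greedy loop like this is the simplest replacement for generate; swapping in beam search means keeping several candidate sequences at each step instead of one.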
from manga-ocr.
Conversion to ONNX is possible and has been done, but inference in other languages is not trivial, mainly because HuggingFace's generate method (basically beam search) would have to be reimplemented. The tokenizer and post-processing are relatively easy to substitute: the tokenizer's decode can be replaced with a look-up table, and the rest is some rather simple operations on strings.
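A sketch of what that look-up-table decode might look like, assuming the usual BERT-style vocab.txt layout (one token per line, row index = token id) and a simple convention of dropping bracketed special tokens; manga-ocr's actual post-processing may differ:

```python
def load_vocab(lines):
    # lines: the contents of vocab.txt split into lines;
    # the row index is the token id.
    return {i: tok for i, tok in enumerate(lines)}

def decode(token_ids, id_to_token):
    # Assumed convention: special tokens look like [CLS], [SEP], [PAD]
    # and are dropped; the rest are concatenated into the output string.
    tokens = (id_to_token[i] for i in token_ids)
    return "".join(t for t in tokens
                   if not (t.startswith("[") and t.endswith("]")))
```

For example, with a toy vocab ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "あ", "い"], decoding [2, 4, 5, 3] yields "あい".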
An automated conversion of the model to ONNX can be found here https://huggingface.co/kha-white/manga-ocr-base/blob/refs%2Fpr%2F3/model.onnx
@kha-white Thanks for your reply. I have converted an ONNX model with the following input and output shapes:
Input Details:
Name: pixel_values, Shape: ['batch_size', 'num_channels', 'height', 'width'], Type: tensor(float)
Name: decoder_input_ids, Shape: ['batch_size', 'decoder_sequence_length'], Type: tensor(int64)
Output Details:
Name: logits, Shape: ['batch_size', 'decoder_sequence_length', 6144], Type: tensor(float)
I understand that pixel_values is a tensor of the bitmap, but I am unclear about decoder_input_ids. Does it come from data in the Manga109 training set? logits is a tensor of length 6144; I am currently guessing it is to be used in conjunction with vocab.txt.
@Mar2ck Wow, I didn't realize there is even a bot that turns the model into ONNX automatically! But I see it is a little different from mine:
Input Details:
Name: pixel_values, Shape: ['batch_size', 'num_channels', 'height', 'width'], Type: tensor(float)
Name: decoder_input_ids, Shape: ['batch_size', 'decoder_sequence_length'], Type: tensor(int64)
Output Details:
Name: logits, Shape: ['batch_size', 'decoder_sequence_length', 6144], Type: tensor(float)
Name: encoder_last_hidden_state, Shape: ['batch_size', 'encoder_sequence_length', 768], Type: tensor(float)
It has encoder_last_hidden_state in its outputs, which mine doesn't have. I have no clue why that happens.
@kha-white Thank you for your reply; I'm starting to understand the whole workflow. I am currently looking for a solution for the generate function. I saw some stuff about BeamSearch in the onnxruntime repository, but I haven't delved into it yet. As for the ONNX model issue, it is actually possible to create two separate models, but I had merged them during export because I didn't have a deep understanding of the model at the time. Again, thank you very much for your response.
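For reference, the core of a beam search (the part of generate that would need reimplementing) can be sketched fairly compactly. step_logprobs is a stand-in for a decoder call that returns log-probabilities over the vocab for the last position, and the token ids are placeholders; a production version would also need length penalties and the other details generate handles.

```python
import numpy as np

END_TOKEN = 3  # assumed end-of-sequence id

def beam_search(step_logprobs, start_token=2, num_beams=3, max_len=20):
    # beams: list of (token_ids, cumulative log-probability)
    beams = [([start_token], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == END_TOKEN:
                candidates.append((tokens, score))  # finished beams carry over
                continue
            logp = step_logprobs(tokens)  # log-probs over vocab, shape (vocab,)
            # expand each beam with its num_beams best next tokens
            for tok in np.argsort(logp)[-num_beams:]:
                candidates.append((tokens + [int(tok)],
                                   score + float(logp[tok])))
        # keep only the num_beams highest-scoring candidates
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:num_beams]
        if all(t[-1] == END_TOKEN for t, _ in beams):
            break
    return beams[0][0]  # highest-scoring sequence
```

With num_beams=1 this degenerates to greedy search, which is why greedy is the natural fallback mentioned earlier in the thread.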
@LancerComet Hi, would you mind sharing how you exported the pre-trained model to ONNX format? I am getting the errors below when exporting with optimum-cli:
$ optimum-cli export onnx --model kha-white/manga-ocr-base bin/
Framework not specified. Using pt to export to ONNX.
Automatic task detection to image-to-text-with-past.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
/Users/mayo/miniconda3/lib/python3.11/site-packages/transformers/models/vit/feature_extraction_vit.py:28: FutureWarning: The class ViTFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use ViTImageProcessor instead.
warnings.warn(
Traceback (most recent call last):
File "/Users/mayo/miniconda3/bin/optimum-cli", line 8, in <module>
sys.exit(main())
^^^^^^
File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/commands/optimum_cli.py", line 163, in main
service.run()
File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/commands/export/onnx.py", line 232, in run
main_export(
File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/exporters/onnx/__main__.py", line 399, in main_export
onnx_config, models_and_onnx_configs = _get_submodels_and_onnx_configs(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/exporters/onnx/__main__.py", line 82, in _get_submodels_and_onnx_configs
onnx_config = onnx_config_constructor(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/exporters/onnx/base.py", line 623, in with_past
return cls(
^^^^
File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/exporters/onnx/model_configs.py", line 1231, in __init__
super().__init__(
File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/exporters/onnx/config.py", line 322, in __init__
raise ValueError(
ValueError: The decoder part of the encoder-decoder model is bert which does not need past key values.
Update:
Got the export to succeed. Run:
optimum-cli export onnx --model kha-white/manga-ocr-base --task vision2seq-lm bin/