
latex-ocr's People

Contributors

frankfrank9, frankier, freed-wu, jcgoran, joepdejong, katie-lim, kxxt, llxlr, lukas-blecher, moetayuko, muyuuuu, r-haecker, rainyl, titc, yongwookha, zhouzq-thu


latex-ocr's Issues

Deficiencies in recognition

It always recognizes 0 as () or O. It seems that the simpler the formula, the less likely the prediction is to succeed.

Installation Help

I'm trying to install using the
pip install -r requirements.txt
command, but it seems like my computer is stuck in an infinite loop. I successfully installed PyTorch and Python 3.7 before running the requirements line. At first it was unsuccessful and the error message recommended trying --user. No dice. I appreciate your help! I'm excited to try out your code.

What is the --tokenizer path_to_tokenizer argument?

What does --tokenizer path_to_tokenizer refer to in this command?
python dataset/dataset.py --equations path_to_textfile --images path_to_images --tokenizer path_to_tokenizer --out dataset.pkl
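
As the usage elsewhere in these issues suggests, the tokenizer argument points at a tokenizer.json file that dataset/dataset.py itself can generate from an equations file, e.g.:

python dataset/dataset.py --equations path_to_textfile --vocab-size 8000 --out tokenizer.json

(The --vocab-size value here is the one used in the CROHME issue below; treat it as an example, not a requirement.)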

snip failed

I found some problems that happen in the process of snipping:

/Users/lmy86263/SourceTreeRepo/OCR-Latex/LaTeX-OCR/utils/utils.py:84: RuntimeWarning: invalid value encountered in true_divide
data = (data-data.min())/(data.max()-data.min())*255
/Users/lmy86263/SourceTreeRepo/OCR-Latex/LaTeX-OCR/utils/utils.py:94: RuntimeWarning: Degrees of freedom <= 0 for slice
if rect[..., -1].var() == 0:
/usr/local/lib/python3.8/site-packages/numpy/core/_methods.py:221: RuntimeWarning: invalid value encountered in true_divide
arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/usr/local/lib/python3.8/site-packages/numpy/core/_methods.py:253: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
height and width must be > 0

(the same block of warnings repeats for every snip attempt)

This problem causes the LaTeX prediction to fail.
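
The first warning points at a division by zero in the min-max normalization: a snip of a single flat color has data.max() == data.min(). A minimal defensive sketch of that normalization (the function name and uint8 output are assumptions, not the repo's code):

import numpy as np

def normalize_snip(data: np.ndarray) -> np.ndarray:
    # Flat images (max == min) would divide by zero and produce NaNs,
    # which can later invalidate the crop rectangle
    # ("height and width must be > 0").
    rng = data.max() - data.min()
    if rng == 0:
        return np.zeros_like(data, dtype=np.uint8)
    return ((data - data.min()) / rng * 255).astype(np.uint8)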

Munch attribute error

[screenshot of the error]

I get the above error when trying to run pix2tex.py. How can I resolve this? I am running on Windows 10.

Model trained with the latest commit seems to be not working

Hi, I retrained the model with your latest commit c7898ab, and when I tried to run pix2tex.py I got the errors below. It seems it's not able to load the trained model. Do you have any idea about that?

Traceback (most recent call last):
  File "pix2tex.py", line 136, in <module>
    args, *objs = initialize(arguments)
  File "pix2tex.py", line 49, in initialize
    model.load_state_dict(torch.load(args.checkpoint, map_location=args.device))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1224, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Model:
Missing key(s) in state_dict: "decoder.net.attn_layers.layers.0.1.to_out.0.weight", "decoder.net.attn_layers.layers.0.1.to_out.0.bias", "decoder.net.attn_layers.layers.1.1.to_out.0.weight", "decoder.net.attn_layers.layers.1.1.to_out.0.bias", "decoder.net.attn_layers.layers.2.1.net.0.proj.weight", "decoder.net.attn_layers.layers.2.1.net.0.proj.bias", "decoder.net.attn_layers.layers.3.1.to_out.0.weight", "decoder.net.attn_layers.layers.3.1.to_out.0.bias", "decoder.net.attn_layers.layers.4.1.to_out.0.weight", "decoder.net.attn_layers.layers.4.1.to_out.0.bias", "decoder.net.attn_layers.layers.5.1.net.0.proj.weight", "decoder.net.attn_layers.layers.5.1.net.0.proj.bias", "decoder.net.attn_layers.layers.6.1.to_out.0.weight", "decoder.net.attn_layers.layers.6.1.to_out.0.bias", "decoder.net.attn_layers.layers.7.1.to_out.0.weight", "decoder.net.attn_layers.layers.7.1.to_out.0.bias", "decoder.net.attn_layers.layers.8.1.net.0.proj.weight", "decoder.net.attn_layers.layers.8.1.net.0.proj.bias", "decoder.net.attn_layers.layers.9.1.to_out.0.weight", "decoder.net.attn_layers.layers.9.1.to_out.0.bias", "decoder.net.attn_layers.layers.10.1.to_out.0.weight", "decoder.net.attn_layers.layers.10.1.to_out.0.bias", "decoder.net.attn_layers.layers.11.1.net.0.proj.weight", "decoder.net.attn_layers.layers.11.1.net.0.proj.bias".
Unexpected key(s) in state_dict: "encoder.patch_embed.backbone.stages.0.blocks.2.conv1.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm1.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm1.bias", "encoder.patch_embed.backbone.stages.0.blocks.2.conv2.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm2.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm2.bias", "encoder.patch_embed.backbone.stages.0.blocks.2.conv3.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm3.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm3.bias", "encoder.patch_embed.backbone.stages.1.blocks.3.conv1.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm1.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm1.bias", "encoder.patch_embed.backbone.stages.1.blocks.3.conv2.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm2.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm2.bias", "encoder.patch_embed.backbone.stages.1.blocks.3.conv3.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm3.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.7.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.7.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.7.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.8.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.8.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.8.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm3.bias", "decoder.net.attn_layers.layers.0.1.to_out.weight", "decoder.net.attn_layers.layers.0.1.to_out.bias", "decoder.net.attn_layers.layers.1.1.to_out.weight", "decoder.net.attn_layers.layers.1.1.to_out.bias", "decoder.net.attn_layers.layers.2.1.net.0.0.weight", "decoder.net.attn_layers.layers.2.1.net.0.0.bias", "decoder.net.attn_layers.layers.3.1.to_out.weight", "decoder.net.attn_layers.layers.3.1.to_out.bias", "decoder.net.attn_layers.layers.4.1.to_out.weight", "decoder.net.attn_layers.layers.4.1.to_out.bias", "decoder.net.attn_layers.layers.5.1.net.0.0.weight", "decoder.net.attn_layers.layers.5.1.net.0.0.bias", "decoder.net.attn_layers.layers.6.1.to_out.weight", "decoder.net.attn_layers.layers.6.1.to_out.bias", "decoder.net.attn_layers.layers.7.1.to_out.weight", "decoder.net.attn_layers.layers.7.1.to_out.bias", "decoder.net.attn_layers.layers.8.1.net.0.0.weight", "decoder.net.attn_layers.layers.8.1.net.0.0.bias", "decoder.net.attn_layers.layers.9.1.to_out.weight", "decoder.net.attn_layers.layers.9.1.to_out.bias", "decoder.net.attn_layers.layers.10.1.to_out.weight", "decoder.net.attn_layers.layers.10.1.to_out.bias", "decoder.net.attn_layers.layers.11.1.net.0.0.weight", "decoder.net.attn_layers.layers.11.1.net.0.0.bias".
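
The mismatched names (to_out vs to_out.0, net.0.0 vs net.0.proj) look like the decoder was built by a different x_transformers version than the one the checkpoint was trained with; that is an inference from the key names, not something the report confirms. A small diagnostic sketch to list the differing keys (model is assumed to be constructed as in pix2tex.initialize; the path is a placeholder):

import torch

def diff_state_dict(model, ckpt_path):
    # Compare the model's expected parameter names with the checkpoint's.
    ckpt = torch.load(ckpt_path, map_location='cpu')
    model_keys, ckpt_keys = set(model.state_dict()), set(ckpt)
    print('missing from checkpoint:', sorted(model_keys - ckpt_keys)[:10])
    print('unexpected in checkpoint:', sorted(ckpt_keys - model_keys)[:10])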

test failed with this file: pix2tex.py

Hi, hello, I am a newbie. I would like to ask how to use pix2tex.py for testing. I entered the image path as shown in the screenshot below:
[screenshot of the command and output]
Thank you very much for your help!

gui does not show the original text

Hey guys,

thanks for working on this, it's a cool project. I have installed it and am using the GUI on Windows 10. Here is what I see:

[screenshot of the GUI]

There is no upper part of the GUI. Perhaps it's just the copy of the screenshot, so it's nothing, but I'm asking whether this might impact the functionality.

Finally, I would like to know how I can turn the LaTeX OCR output:

$\scriptstyle\pi(\infty)\;=\;1\times1$

from

[image of the original formula]

back into a readable format.

Thanks!
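
For turning a predicted LaTeX string back into a rendered formula, one minimal option (an illustration, not the project's method) is matplotlib's built-in mathtext, which handles a subset of LaTeX; commands outside that subset, such as \scriptstyle, need a full LaTeX install via the text.usetex rcParam:

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(3, 1))
# Render the math string centered on a small canvas and save it as an image.
fig.text(0.5, 0.5, r'$\pi(\infty) = 1\times1$', ha='center', va='center', fontsize=20)
fig.savefig('preview.png', dpi=200)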

Code understanding

I want to understand the whole codebase. Please direct me to where I can find help.

training time and equipment

Hi, thanks for sharing your code. I want to use this dataset to train a similar model, so I'd like to know how long your model was trained for and what kind of machine you used.

How to OCR a low-width LaTeX image

[input image and OCR result screenshots]

If the image width is too low, the OCR result will be useless.

I have tried reducing patch_size to 8, but this error occurred:

Exception has occurred: RuntimeError
The size of tensor a (33) must match the size of tensor b (129) at non-singleton dimension 1
  File "F:\code\LaTeX-OCR\models.py", line 81, in forward_features
    x += self.pos_embed[:, pos_emb_ind]
  File "F:\code\LaTeX-OCR\train.py", line 48, in train
    encoded = encoder(im.to(device))
  File "F:\code\LaTeX-OCR\train.py", line 88, in <module>
    train(args)

I have struggled with this issue for several days. Please tell me what I can do in this situation.

Thank you very much!
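
One workaround to try before changing patch_size (a sketch, not the repo's preprocessing): pad narrow crops onto a white canvas up to a minimum width, so the patch grid stays within what the positional embedding was trained on. The minimum width value here is a placeholder:

from PIL import Image

def pad_min_width(img: Image.Image, min_w: int = 128) -> Image.Image:
    # Center the formula on a white canvas instead of feeding a too-narrow crop.
    if img.width >= min_w:
        return img
    canvas = Image.new('L', (min_w, img.height), 255)
    canvas.paste(img.convert('L'), ((min_w - img.width) // 2, 0))
    return canvas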

No module named 'PyQt5.QtWebEngineWidgets'

I'm getting this error every time I try to run gui.py:

Traceback (most recent call last):
  File "gui.py", line 6, in <module>
    from PyQt5.QtWebEngineWidgets import QWebEngineView
ModuleNotFoundError: No module named 'PyQt5.QtWebEngineWidgets'

Regarding installed PyQt5-related packages, this is what I have:

PyQt5==5.15.5
PyQt5-Qt5==5.15.2
PyQt5-sip==12.9.0
PyQtWebEngine-Qt5==5.15.2
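
The installed list has the Qt binaries (PyQtWebEngine-Qt5) but not the Python bindings that actually provide PyQt5.QtWebEngineWidgets; installing the wrapper package should supply the missing module:

pip install PyQtWebEngine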

Training?

Really nice work.
Can you please provide a proper pipeline for training on one's own data?
I tried to use your formula images and config (from Google Drive) to train, but got this error:
RuntimeError: merge_sort: failed to synchronize: device-side assert triggered
Can you help with training?
Thanks
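
One generic debugging step for device-side asserts (a suggestion, not a confirmed cause): they frequently come from out-of-range indices, such as token ids larger than the embedding table, and the asynchronous CUDA launch hides the real stack trace. Rerunning synchronously surfaces it:

CUDA_LAUNCH_BLOCKING=1 python train.py ...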

Model fails for a simple tex

Hi,

it's a great project, many thanks for it! The model needs some work, though. I found it failing for relatively simple examples like this one: $\mathcal{X} \times \Theta \to \overbar{R}$, spitting out very different results in consecutive prediction attempts. Fundamentally, this issue could be mitigated by introducing confidence thresholds...

Cheers,
Wojtek

Compatibility : Missing key(s)

Hi, author.
Thanks for your work. It will be really convenient to convert formulas to LaTeX code offline with this software.
I installed its requirements under Manjaro Linux today; however, it threw the error below when I executed the command python gui.py:

Traceback (most recent call last):
  File ".../LaTeX-OCR/gui.py", line 274, in <module>
    ex = App(arguments)
  File ".../LaTeX-OCR/gui.py", line 26, in __init__
    self.initModel()
  File ".../LaTeX-OCR/gui.py", line 33, in initModel
    args, *objs = pix2tex.initialize(self.args)
  File ".../LaTeX-OCR/pix2tex.py", line 55, in initialize
    model.load_state_dict(torch.load(args.checkpoint, map_location=args.device))
  File "/usr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
        Missing key(s) in state_dict: "decoder.net.attn_layers.layers.2.1.net.3.weight", "decoder.net.attn_layers.layers.2.1.net.3.bias", "decoder.net.attn_layers.layers.5.1.net.3.weight", "decoder.net.attn_layers.layers.5.1.net.3.bias", "decoder.net.attn_layers.layers.8.1.net.3.weight", "decoder.net.attn_layers.layers.8.1.net.3.bias", "decoder.net.attn_layers.layers.11.1.net.3.weight", "decoder.net.attn_layers.layers.11.1.net.3.bias". 
        Unexpected key(s) in state_dict: "decoder.net.attn_layers.layers.2.1.net.2.weight", "decoder.net.attn_layers.layers.2.1.net.2.bias", "decoder.net.attn_layers.layers.5.1.net.2.weight", "decoder.net.attn_layers.layers.5.1.net.2.bias", "decoder.net.attn_layers.layers.8.1.net.2.weight", "decoder.net.attn_layers.layers.8.1.net.2.bias", "decoder.net.attn_layers.layers.11.1.net.2.weight", "decoder.net.attn_layers.layers.11.1.net.2.bias". 

I would like to know what caused this problem and how I can run the code correctly.
Thanks.
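
A hedged observation: key names that differ by one index (net.2 versus net.3) usually mean the installed x_transformers build constructs slightly different decoder modules than the one used to train the checkpoint. Reinstalling the pinned versions is a reasonable first attempt:

pip install -r requirements.txt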

Missing key(s) in state_dict

raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
Missing key(s) in state_dict:

How to get the data?

Data
We need paired data for the network to learn. Luckily there is a lot of LaTeX code on the internet, e.g. wikipedia, arXiv. We also use the formulae from the im2latex-100k dataset. All of it can be found here.

Where is the Wikipedia data, and how do I use it?
Where is the arXiv data, and how do I use it?

Model training speed

How fast should training of the model be? I'm using the data provided in the Google Drive link and the default.yaml config. At the current rate, I need more than a day to train the model. Is there a way to shorten that time considerably?

Use GPU to train

Dear author,
Hello! I would like to ask how to use a GPU to train on my own dataset.
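
As a quick sanity check before training (the exact device selection lives in train.py and the YAML config, so this is only a precondition, not the full answer), confirm that PyTorch can see the GPU:

import torch

print(torch.cuda.is_available())   # must be True for GPU training
print(torch.cuda.device_count())   # number of visible GPUs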

Can't snip outside window (i3-wm)

I just found your great tool, and it seems that the way you capture a window does not work with tiling window managers like i3-wm. The snipping seems to open a separate window instead of creating a layer on top of all windows.

[screenshot: pix2tex]

"--no-cuda" does not work

When using the --no-cuda argument, it returns an error.

(env) λ python pix2tex.py --no-cuda
Traceback (most recent call last):
  File "H:\pytlat\ocr\pix2tex.py", line 84, in <module>
    args, model, tokenizer = initialize(args)
  File "H:\pytlat\ocr\pix2tex.py", line 33, in initialize
    model.load_state_dict(torch.load(args.checkpoint))
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 853, in _load
    result = unpickler.load()
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 845, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 834, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 151, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 135, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I use torch 1.7+cpu; no CUDA build is installed, so I can't use CUDA.
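
The error message itself names the fix, and the initialize function shown in other tracebacks here already applies it: pass map_location when loading, so CUDA-saved tensors land on the CPU. A minimal sketch (the checkpoint path is a placeholder and model is assumed already constructed):

import torch

state = torch.load('checkpoints/weights.pth', map_location=torch.device('cpu'))
model.load_state_dict(state)  # now works on a CPU-only machine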

Feature extraction

Hi,

Can we use your trained model to extract math features from an image? I mean, can we take the layer before the prediction layer for feature extraction?

[image_resizer.pth] RuntimeError: Error(s) in loading state_dict for Model

Hello, I tried using the image_resizer.pth weights because my data often has large image sizes, but when I modified the checkpoint argument of pix2tex to image_resizer.pth, I got the runtime error shown in the attached screenshot. Has anyone tried image_resizer.pth? Any suggestions for solving this issue?

[screenshot of the error]

Thank you in advance.

Here's how I modified the checkpoint argument:
[screenshot of the modified argument]

Error: Index out of range in self during the model training

I tried to train the model on the CPU, but received the error below; I'm not sure what could be the cause.

Loss: 1.0180: 2%|█▉ | 421/18013 [09:41<6:45:16, 1.38s/it]
Traceback (most recent call last):
  File "train.py", line 94, in <module>
    train(args)
  File "train.py", line 52, in train
    loss = decoder(tgt_seq, mask=tgt_mask, context=encoded)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/x_transformers/autoregressive_wrapper.py", line 102, in forward
    out = self.net(xi, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/x_transformers/x_transformers.py", line 738, in forward
    x += self.pos_emb(x)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/x_transformers/x_transformers.py", line 107, in forward
    return self.emb(n)[None, :, :]
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 156, in forward
    return F.embedding(
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/functional.py", line 1916, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
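
An IndexError inside nn.Embedding means some index reached a table that is too small. Here the failing frame is the positional embedding (pos_emb), so the most likely cause is a target sequence longer than max_seq_len; a token id at or above num_tokens would raise the same error in the word embedding. A sanity-check sketch, assuming a tokenizer.json produced by dataset/dataset.py and hypothetical config values and file names:

from tokenizers import Tokenizer

tok = Tokenizer.from_file('tokenizer.json')   # placeholder path
max_seq_len = 512                             # placeholder: from your YAML config
# Longest tokenized equation in the training text file, plus BOS/EOS tokens,
# must fit inside the positional embedding table.
longest = max(len(tok.encode(line).ids) for line in open('math.txt'))
assert longest + 2 <= max_seq_len, (longest, max_seq_len)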

Getting unbalanced latex equations as results

Hi, I tried with your recent commit; this time I tested with different images, but it doesn't seem to fix the issue. I am still getting unbalanced LaTeX equations. You also mentioned that you trained a new model, but did you upload it to the Drive?

  • I also observed that predicting the same image multiple times produces different LaTeX results. Is that the expected behaviour?

[sample image: patent_258_2]

Broken links in README

The wikipedia and arXiv links under the Data header are broken. (They weren't prefixed with https://.)

Completely unusable

At first, when I tried a complicated formula, it would get stuck for a long time and then return a wrong result.

Later, I found that even the simplest cases are not recognized by this program.

[screenshot: latexocr]

The problem of mismatched evaluation metrics

Hi, thank you for your excellent work. I reproduced your work with the config file named default.yaml but cannot get the same result (BLEU=0.74). I also found that the training loss increased after a few epochs. Can you give some advice?

[training curve screenshot]

Runtime error while running

I am getting the following error:

RuntimeError: Error(s) in loading state_dict for ResNetV2:
    size mismatch for head.fc.weight: copying a param with shape torch.Size([21, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([22, 1024, 1, 1]).
    size mismatch for head.fc.bias: copying a param with shape torch.Size([21]) from checkpoint, the shape in current model is torch.Size([22]).

I don't really know how to resolve this issue, please help.

What settings to achieve BLEU: 0.88?

Hi Lukas,

Thanks for the work.

I trained on the same dataset you mentioned in README.
But I only get BLEU: 0.719, ED: 3.18e-01. After that, the training diverges and the BLEU decreases.
I would like to reproduce your training to get that BLEU: 0.88.

Thanks,
Hung

Generating the CROHME tokenizer.json throws an error; how to fix it?

(tf_1.12) root@f15b165683e6:/home/code/LaTeX-OCR# python dataset/dataset.py --equations latex-ocr-data/crohme/CROHME_math.txt --vocab-size 8000 --out crohme-tokenizer.json
Generate tokenizer
Traceback (most recent call last):
  File "dataset/dataset.py", line 244, in <module>
    generate_tokenizer(args.equations, args.out, args.vocab_size)
  File "dataset/dataset.py", line 228, in generate_tokenizer
    trainer = BpeTrainer(special_tokens=["[PAD]", "[BOS]", "[EOS]"], vocab_size=vocab_size, show_progress=True)
TypeError: 'str' object cannot be interpreted as an integer

How can I fix it?
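
The TypeError says vocab_size reached BpeTrainer as a string: tokenizers expects an int, and argparse delivers strings unless told otherwise. A plausible fix in dataset/dataset.py's argument parser (a sketch; the real parser may already differ in newer commits):

import argparse

parser = argparse.ArgumentParser()
# Declaring the type makes '--vocab-size 8000' arrive as an int,
# which BpeTrainer requires.
parser.add_argument('--vocab-size', type=int, default=8000)

Alternatively, casting at the call site, vocab_size=int(vocab_size), works the same way.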

Use model on Android?

Hi! Your model works great on PC, but is it possible to use it on an Android device?
As far as I know, the model has to be converted to TorchScript format to work on a mobile device, but that's not enough. We also need to port the call_model function from the pix2tex.py script to the Android app, because the model requires a specific image resize to work. How can we do that? Thank you :)
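
A minimal starting point for the TorchScript half of this (a sketch under strong assumptions: model is the loaded pix2tex model, the input shape is invented, and the autoregressive decoder generally cannot be traced as-is and needs torch.jit.script or a rewritten decode loop):

import torch

dummy = torch.randn(1, 1, 64, 256)               # hypothetical grayscale input
traced_encoder = torch.jit.trace(model.encoder, dummy)
traced_encoder.save('encoder.pt')                # loadable from PyTorch Mobile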

Convert to onnx model

Your work is very helpful, thank you! But when I try to convert this PyTorch model to an ONNX file, I run into some errors. Have you tried this? Thanks!
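
For reference, the usual entry point is torch.onnx.export; a sketch only (module names, shapes, and dynamic axes are assumptions, and the decoder's generation loop typically needs special handling):

import torch

torch.onnx.export(
    model.encoder,                               # `model` assumed already loaded
    torch.randn(1, 1, 64, 256),                  # hypothetical input size
    'encoder.onnx',
    input_names=['image'], output_names=['features'],
    dynamic_axes={'image': {0: 'batch', 2: 'height', 3: 'width'}})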

gui.py cannot capture screen in the second monitor

Hi, thank you for the excellent work. I've met some problems when using the GUI. I am using Ubuntu 20.04 with a KDE desktop. After starting the screen snip (button or Alt+S), I can only start the area selection on the main monitor, and the opacity only changes on the main monitor. If I press my mouse on the main screen and drag it to the second screen, the program can accurately select the area, but the selection rectangle only shows on the main screen. I cannot select an area if I first press my mouse on the second screen.
I slightly modified the gui.py script, and I'm quite sure these changes are not related to this problem. Anyway, here are the changes I made: ImageGrab from PIL kept throwing errors, so I changed from PIL import ImageGrab to import pyscreenshot as ImageGrab. I also removed the all_screens parameter in line 252, img = ImageGrab.grab(bbox=(x1, y1, x2, y2), all_screens=True), since this parameter is only available on Windows.

Help to speed up inference processing

Hi authors,
Thank you so much for this awesome project. It works very well. Currently I have an issue with time consumption: running about 50 cropped formula images takes about 9 s in total. I run the model via the call_model function in pix2tex.py on a 2080 Ti GPU. Do you have any ideas for speeding up inference? Thank you so much.
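
If the 50 crops are currently sent through call_model one at a time, much of the 9 s is per-call overhead; batching same-sized, already-preprocessed crops into one forward pass usually helps. A sketch only (call_model's real signature is in pix2tex.py, and the batched call here is hypothetical):

import torch

@torch.no_grad()
def predict_batch(model, crops):
    # crops: list of equal-sized, preprocessed image tensors (C, H, W)
    batch = torch.stack(crops)   # one (N, C, H, W) batch
    return model(batch)          # hypothetical batched forward/generate call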

Return unbalanced latex equations

I tested with a lot of images, but for most of them the result is an unbalanced LaTeX equation. Do you have any idea how to resolve this?
I have attached the produced LaTeX equation and the source image for reference.

$$U_{s2A_{-}k}=\bar{\cal B}{s}{s\bar{G}^{+}}\cdot\bar{Y}{s2A{-}\bar{k}}+\bar{\cal B}{r}\frac{\displaystyle\frac{\displaystyle\cal L}{m}^{2}}{\displaystyle\frac{\displaystyle\cal L}{\displaystyle\frac{\displaystyle\cal E}{m}}{\displaystyle\cal P}{s}\cdot\bar{\cal A}{2}\cdot\bar{\cal I}{s22}\cdot\bar{Y}{s2\bar{\cal B}{-}\bar{\cal F}{s}}{\displaystyle\cal E}{s2\bar{\cal F}{-}\displaystyle\cal A}^{-}\left(\frac{\displaystyle\frac{\displaystyle\frac{\displaystyle\cal E}{m}\cdot\bar{\cal S}{s}}{\displaystyle\cal E}{s}^{2}}\right)\cdot\bar{\mathrm Y}{s4\bar{\cal B}{s2}\cdot\bar{\cal A}{s2}\displaystyle\cal E{s}\bar{\cal F}_{s}\right},$$
[source image: patent_8_60]
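
Since the failure mode is structural (unbalanced braces) rather than wrong symbols, a cheap post-check can at least flag bad outputs for a retry. A small sketch, not part of the repo:

def is_balanced(tex: str) -> bool:
    # Reject outputs with unbalanced {} or mismatched \left/\right pairs.
    depth = 0
    for ch in tex:
        if ch == '{':
            depth += 1
        elif ch == '}':
            if depth == 0:
                return False
            depth -= 1
    return depth == 0 and tex.count(r'\left') == tex.count(r'\right')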

Unexpected key(s) in state_dict and size mismatch when running python3 pix2tex.py

MacBook CPU inference:

python3 pix2tex.py
/usr/local/lib/python3.9/site-packages/albumentations/augmentations/transforms.py:913: FutureWarning: This class has been deprecated. Please use ImageCompression
  warnings.warn(
Traceback (most recent call last):
  File "/mnt/disk/LaTeX-OCR/pix2tex.py", line 144, in <module>
    args, *objs = initialize(arguments)
  File "/mnt/disk/LaTeX-OCR/pix2tex.py", line 55, in initialize
    model.load_state_dict(torch.load(args.checkpoint, map_location=args.device))
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
Unexpected key(s) in state_dict: "encoder.patch_embed.backbone.stem.conv.weight", "encoder.patch_embed.backbone.stem.norm.weight", "encoder.patch_embed.backbone.stem.norm.bias", "encoder.patch_embed.backbone.stages.0.blocks.0.downsample.conv.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.downsample.norm.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.downsample.norm.bias", "encoder.patch_embed.backbone.stages.0.blocks.0.conv1.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm1.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm1.bias", "encoder.patch_embed.backbone.stages.0.blocks.0.conv2.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm2.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm2.bias", "encoder.patch_embed.backbone.stages.0.blocks.0.conv3.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm3.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm3.bias", "encoder.patch_embed.backbone.stages.0.blocks.1.conv1.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm1.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm1.bias", "encoder.patch_embed.backbone.stages.0.blocks.1.conv2.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm2.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm2.bias", "encoder.patch_embed.backbone.stages.0.blocks.1.conv3.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm3.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm3.bias", "encoder.patch_embed.backbone.stages.1.blocks.0.downsample.conv.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.downsample.norm.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.downsample.norm.bias", "encoder.patch_embed.backbone.stages.1.blocks.0.conv1.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm1.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm1.bias", "encoder.patch_embed.backbone.stages.1.blocks.0.conv2.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm2.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm2.bias", "encoder.patch_embed.backbone.stages.1.blocks.0.conv3.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm3.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm3.bias", "encoder.patch_embed.backbone.stages.1.blocks.1.conv1.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm1.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm1.bias", "encoder.patch_embed.backbone.stages.1.blocks.1.conv2.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm2.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm2.bias", "encoder.patch_embed.backbone.stages.1.blocks.1.conv3.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm3.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm3.bias", "encoder.patch_embed.backbone.stages.1.blocks.2.conv1.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm1.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm1.bias", "encoder.patch_embed.backbone.stages.1.blocks.2.conv2.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm2.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm2.bias", "encoder.patch_embed.backbone.stages.1.blocks.2.conv3.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm3.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.0.downsample.conv.weight", 
"encoder.patch_embed.backbone.stages.2.blocks.0.downsample.norm.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.downsample.norm.bias", "encoder.patch_embed.backbone.stages.2.blocks.0.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.0.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.0.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.1.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.1.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.1.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.2.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.2.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.2.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.3.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.3.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.3.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.4.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.4.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.4.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.5.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.5.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.5.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm3.bias", 
"encoder.patch_embed.backbone.stages.2.blocks.6.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.6.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.6.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm3.bias".
size mismatch for encoder.patch_embed.proj.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 1, 16, 16]).

Snipping failing everytime

I just finally managed to get it to launch (I was getting torch errors), but it shows "prediction failed" each and every time. Even expressions like 1=2-1 are not being captured. I don't think this software is that lame; there must be some problem.
Also, I am using it in a virtual environment, if that matters. I am attaching a screenshot with the error messages.

[screenshot of the error messages]

Is there any solution for bad prediction of images that have a long width?

I'm working with your great code! Thanks :)

I already finished training with my own LaTeX data, and using pix2tex.py I get output for my own test set.

Most of the test set is predicted well, but some images with a relatively long width are predicted badly.

Are there any tips for this problem (like using a small patch size, etc.)?
