
latex-ocr's People

Contributors

frankfrank9, frankier, freed-wu, jcgoran, joepdejong, katie-lim, kxxt, llxlr, lukas-blecher, moetayuko, muyuuuu, r-haecker, rainyl, titc, yongwookha, zhouzq-thu


latex-ocr's Issues

Deficiencies in recognition

It always recognizes 0 as () or O. It seems that the simpler the formula, the less likely the prediction is to succeed.

Installation Help

I'm trying to install using the
pip install -r requirements.txt
command, but it seems like my computer is stuck in an infinite loop. I successfully installed PyTorch and Python 3.7 before running the requirements line. At first it was unsuccessful and the error message recommended trying --user. No dice. I appreciate your help! I'm excited to try out your code.

What is the --tokenizer path_to_tokenizer argument?

What does --tokenizer path_to_tokenizer refer to in this command?
python dataset/dataset.py --equations path_to_textfile --images path_to_images --tokenizer path_to_tokenizer --out dataset.pkl
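
As the usage elsewhere in these issues suggests, the tokenizer argument points at a tokenizer.json file that dataset/dataset.py itself can generate from an equations file, e.g.:

python dataset/dataset.py --equations path_to_textfile --vocab-size 8000 --out tokenizer.json

(The --vocab-size value here is the one used in the CROHME issue below; treat it as an example, not a requirement.)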

snip failed

I found some problems that happen in the process of snipping:

/Users/lmy86263/SourceTreeRepo/OCR-Latex/LaTeX-OCR/utils/utils.py:84: RuntimeWarning: invalid value encountered in true_divide
data = (data-data.min())/(data.max()-data.min())*255
/Users/lmy86263/SourceTreeRepo/OCR-Latex/LaTeX-OCR/utils/utils.py:94: RuntimeWarning: Degrees of freedom <= 0 for slice
if rect[..., -1].var() == 0:
/usr/local/lib/python3.8/site-packages/numpy/core/_methods.py:221: RuntimeWarning: invalid value encountered in true_divide
arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/usr/local/lib/python3.8/site-packages/numpy/core/_methods.py:253: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
height and width must be > 0

(the same block of warnings repeats for every snip attempt)

This problem causes the LaTeX prediction to fail.
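
The first warning points at a division by zero in the min-max normalization: a snip of a single flat color has data.max() == data.min(). A minimal defensive sketch of that normalization (the function name and uint8 output are assumptions, not the repo's code):

import numpy as np

def normalize_snip(data: np.ndarray) -> np.ndarray:
    # Flat images (max == min) would divide by zero and produce NaNs,
    # which can later invalidate the crop rectangle
    # ("height and width must be > 0").
    rng = data.max() - data.min()
    if rng == 0:
        return np.zeros_like(data, dtype=np.uint8)
    return ((data - data.min()) / rng * 255).astype(np.uint8)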

Munch attribute error

[screenshot of the error]

I get the above error when trying to run pix2tex.py. How can I resolve this? I am running on Windows 10.

Model trained with the latest commit seems to be not working

Hi, I retrained the model with your latest commit c7898ab, and when I tried to run pix2tex.py I got the errors below. It seems it's not able to load the trained model. Do you have any idea about that?

Traceback (most recent call last):
  File "pix2tex.py", line 136, in <module>
    args, *objs = initialize(arguments)
  File "pix2tex.py", line 49, in initialize
    model.load_state_dict(torch.load(args.checkpoint, map_location=args.device))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1224, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Model:
Missing key(s) in state_dict: "decoder.net.attn_layers.layers.0.1.to_out.0.weight", "decoder.net.attn_layers.layers.0.1.to_out.0.bias", "decoder.net.attn_layers.layers.1.1.to_out.0.weight", "decoder.net.attn_layers.layers.1.1.to_out.0.bias", "decoder.net.attn_layers.layers.2.1.net.0.proj.weight", "decoder.net.attn_layers.layers.2.1.net.0.proj.bias", "decoder.net.attn_layers.layers.3.1.to_out.0.weight", "decoder.net.attn_layers.layers.3.1.to_out.0.bias", "decoder.net.attn_layers.layers.4.1.to_out.0.weight", "decoder.net.attn_layers.layers.4.1.to_out.0.bias", "decoder.net.attn_layers.layers.5.1.net.0.proj.weight", "decoder.net.attn_layers.layers.5.1.net.0.proj.bias", "decoder.net.attn_layers.layers.6.1.to_out.0.weight", "decoder.net.attn_layers.layers.6.1.to_out.0.bias", "decoder.net.attn_layers.layers.7.1.to_out.0.weight", "decoder.net.attn_layers.layers.7.1.to_out.0.bias", "decoder.net.attn_layers.layers.8.1.net.0.proj.weight", "decoder.net.attn_layers.layers.8.1.net.0.proj.bias", "decoder.net.attn_layers.layers.9.1.to_out.0.weight", "decoder.net.attn_layers.layers.9.1.to_out.0.bias", "decoder.net.attn_layers.layers.10.1.to_out.0.weight", "decoder.net.attn_layers.layers.10.1.to_out.0.bias", "decoder.net.attn_layers.layers.11.1.net.0.proj.weight", "decoder.net.attn_layers.layers.11.1.net.0.proj.bias".
Unexpected key(s) in state_dict: "encoder.patch_embed.backbone.stages.0.blocks.2.conv1.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm1.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm1.bias", "encoder.patch_embed.backbone.stages.0.blocks.2.conv2.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm2.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm2.bias", "encoder.patch_embed.backbone.stages.0.blocks.2.conv3.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm3.weight", "encoder.patch_embed.backbone.stages.0.blocks.2.norm3.bias", "encoder.patch_embed.backbone.stages.1.blocks.3.conv1.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm1.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm1.bias", "encoder.patch_embed.backbone.stages.1.blocks.3.conv2.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm2.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm2.bias", "encoder.patch_embed.backbone.stages.1.blocks.3.conv3.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm3.weight", "encoder.patch_embed.backbone.stages.1.blocks.3.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.7.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.7.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.7.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.7.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.8.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.8.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.8.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.8.norm3.bias", "decoder.net.attn_layers.layers.0.1.to_out.weight", "decoder.net.attn_layers.layers.0.1.to_out.bias", "decoder.net.attn_layers.layers.1.1.to_out.weight", "decoder.net.attn_layers.layers.1.1.to_out.bias", "decoder.net.attn_layers.layers.2.1.net.0.0.weight", "decoder.net.attn_layers.layers.2.1.net.0.0.bias", "decoder.net.attn_layers.layers.3.1.to_out.weight", "decoder.net.attn_layers.layers.3.1.to_out.bias", "decoder.net.attn_layers.layers.4.1.to_out.weight", "decoder.net.attn_layers.layers.4.1.to_out.bias", "decoder.net.attn_layers.layers.5.1.net.0.0.weight", "decoder.net.attn_layers.layers.5.1.net.0.0.bias", "decoder.net.attn_layers.layers.6.1.to_out.weight", "decoder.net.attn_layers.layers.6.1.to_out.bias", "decoder.net.attn_layers.layers.7.1.to_out.weight", "decoder.net.attn_layers.layers.7.1.to_out.bias", "decoder.net.attn_layers.layers.8.1.net.0.0.weight", "decoder.net.attn_layers.layers.8.1.net.0.0.bias", "decoder.net.attn_layers.layers.9.1.to_out.weight", "decoder.net.attn_layers.layers.9.1.to_out.bias", "decoder.net.attn_layers.layers.10.1.to_out.weight", "decoder.net.attn_layers.layers.10.1.to_out.bias", "decoder.net.attn_layers.layers.11.1.net.0.0.weight", "decoder.net.attn_layers.layers.11.1.net.0.0.bias".
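
The mismatched names (to_out vs to_out.0, net.0.0 vs net.0.proj) look like the decoder was built by a different x_transformers version than the one the checkpoint was trained with; that is an inference from the key names, not something the report confirms. A small diagnostic sketch to list the differing keys (model is assumed to be constructed as in pix2tex.initialize; the path is a placeholder):

import torch

def diff_state_dict(model, ckpt_path):
    # Compare the model's expected parameter names with the checkpoint's.
    ckpt = torch.load(ckpt_path, map_location='cpu')
    model_keys, ckpt_keys = set(model.state_dict()), set(ckpt)
    print('missing from checkpoint:', sorted(model_keys - ckpt_keys)[:10])
    print('unexpected in checkpoint:', sorted(ckpt_keys - model_keys)[:10])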

test failed with this file: pix2tex.py

Hi, hello, I am a newbie. I would like to ask how to use pix2tex.py for testing. I entered the image path as shown in the screenshot below:
[screenshot of the command and output]
Thank you very much for your help!

gui does not show the original text

Hey guys,

thanks for working on this, it's a cool project. I have installed it and am using the GUI on Windows 10. Here is what I see:

[screenshot of the GUI]

There is no upper part of the GUI. Perhaps it's just the copy of the screenshot, so it's nothing, but I'm asking whether this might impact the functionality.

Finally, I would like to know how I can turn the LaTeX OCR output:

$\scriptstyle\pi(\infty)\;=\;1\times1$

from

[image of the original formula]

back into a readable format.

Thanks!
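
For turning a predicted LaTeX string back into a rendered formula, one minimal option (an illustration, not the project's method) is matplotlib's built-in mathtext, which handles a subset of LaTeX; commands outside that subset, such as \scriptstyle, need a full LaTeX install via the text.usetex rcParam:

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(3, 1))
# Render the math string centered on a small canvas and save it as an image.
fig.text(0.5, 0.5, r'$\pi(\infty) = 1\times1$', ha='center', va='center', fontsize=20)
fig.savefig('preview.png', dpi=200)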

Code understanding

I want to understand the whole codebase. Please direct me to where I can find help.

training time and equipment

Hi, thanks for sharing your code. I want to use this dataset to train a similar model, so I'd like to know how long your model was trained for and what kind of machine you used.

How to OCR a low-width LaTeX image

[input image and OCR result screenshots]

If the image width is too low, the OCR result will be useless.

I have tried reducing patch_size to 8, but this error occurred:

Exception has occurred: RuntimeError
The size of tensor a (33) must match the size of tensor b (129) at non-singleton dimension 1
  File "F:\code\LaTeX-OCR\models.py", line 81, in forward_features
    x += self.pos_embed[:, pos_emb_ind]
  File "F:\code\LaTeX-OCR\train.py", line 48, in train
    encoded = encoder(im.to(device))
  File "F:\code\LaTeX-OCR\train.py", line 88, in <module>
    train(args)

I have struggled with this issue for several days. Please tell me what I can do in this situation.

Thank you very much!
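
One workaround to try before changing patch_size (a sketch, not the repo's preprocessing): pad narrow crops onto a white canvas up to a minimum width, so the patch grid stays within what the positional embedding was trained on. The minimum width value here is a placeholder:

from PIL import Image

def pad_min_width(img: Image.Image, min_w: int = 128) -> Image.Image:
    # Center the formula on a white canvas instead of feeding a too-narrow crop.
    if img.width >= min_w:
        return img
    canvas = Image.new('L', (min_w, img.height), 255)
    canvas.paste(img.convert('L'), ((min_w - img.width) // 2, 0))
    return canvas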

No module named 'PyQt5.QtWebEngineWidgets'

I'm getting this error every time I try to run gui.py:

Traceback (most recent call last):
  File "gui.py", line 6, in <module>
    from PyQt5.QtWebEngineWidgets import QWebEngineView
ModuleNotFoundError: No module named 'PyQt5.QtWebEngineWidgets'

Regarding installed PyQt5-related packages, this is what I have:

PyQt5==5.15.5
PyQt5-Qt5==5.15.2
PyQt5-sip==12.9.0
PyQtWebEngine-Qt5==5.15.2
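
The installed list has the Qt binaries (PyQtWebEngine-Qt5) but not the Python bindings that actually provide PyQt5.QtWebEngineWidgets; installing the wrapper package should supply the missing module:

pip install PyQtWebEngine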

Training?

Really nice work.
Can you please provide a proper pipeline for training on one's own data?
I tried to use your formula images and config (from Google Drive) to train, but got this error:
RuntimeError: merge_sort: failed to synchronize: device-side assert triggered
Can you help with training?
Thanks
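
One generic debugging step for device-side asserts (a suggestion, not a confirmed cause): they frequently come from out-of-range indices, such as token ids larger than the embedding table, and the asynchronous CUDA launch hides the real stack trace. Rerunning synchronously surfaces it:

CUDA_LAUNCH_BLOCKING=1 python train.py ...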

Model fails for a simple tex

Hi,

it's a great project, many thanks for it! The model needs some work, though. I found it failing for relatively simple examples like this one: $\mathcal{X} \times \Theta \to \overbar{R}$, spitting out very different results in consecutive prediction attempts. Fundamentally, this issue could be mitigated by introducing confidence thresholds...

Cheers,
Wojtek

Compatibility : Missing key(s)

Hi, author.
Thanks for your work. It will be really convenient to convert formulas to LaTeX code offline with this software.
I installed its requirements under Manjaro Linux today; however, it threw the error below when I executed the command python gui.py:

Traceback (most recent call last):
  File ".../LaTeX-OCR/gui.py", line 274, in <module>
    ex = App(arguments)
  File ".../LaTeX-OCR/gui.py", line 26, in __init__
    self.initModel()
  File ".../LaTeX-OCR/gui.py", line 33, in initModel
    args, *objs = pix2tex.initialize(self.args)
  File ".../LaTeX-OCR/pix2tex.py", line 55, in initialize
    model.load_state_dict(torch.load(args.checkpoint, map_location=args.device))
  File "/usr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
        Missing key(s) in state_dict: "decoder.net.attn_layers.layers.2.1.net.3.weight", "decoder.net.attn_layers.layers.2.1.net.3.bias", "decoder.net.attn_layers.layers.5.1.net.3.weight", "decoder.net.attn_layers.layers.5.1.net.3.bias", "decoder.net.attn_layers.layers.8.1.net.3.weight", "decoder.net.attn_layers.layers.8.1.net.3.bias", "decoder.net.attn_layers.layers.11.1.net.3.weight", "decoder.net.attn_layers.layers.11.1.net.3.bias". 
        Unexpected key(s) in state_dict: "decoder.net.attn_layers.layers.2.1.net.2.weight", "decoder.net.attn_layers.layers.2.1.net.2.bias", "decoder.net.attn_layers.layers.5.1.net.2.weight", "decoder.net.attn_layers.layers.5.1.net.2.bias", "decoder.net.attn_layers.layers.8.1.net.2.weight", "decoder.net.attn_layers.layers.8.1.net.2.bias", "decoder.net.attn_layers.layers.11.1.net.2.weight", "decoder.net.attn_layers.layers.11.1.net.2.bias". 

I would like to know what caused this problem and how I can run the code correctly.
Thanks.
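
A hedged observation: key names that differ by one index (net.2 versus net.3) usually mean the installed x_transformers build constructs slightly different decoder modules than the one used to train the checkpoint. Reinstalling the pinned versions is a reasonable first attempt:

pip install -r requirements.txt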

Missing key(s) in state_dict

raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
Missing key(s) in state_dict:

How to get the data?

Data
We need paired data for the network to learn. Luckily there is a lot of LaTeX code on the internet, e.g. wikipedia, arXiv. We also use the formulae from the im2latex-100k dataset. All of it can be found here.

Where is the Wikipedia data, and how do I use it?
Where is the arXiv data, and how do I use it?

Model training speed

How fast should training of the model be? I'm using the data provided in the Google Drive link and the default.yaml config. At the current rate, I need more than a day to train the model. Is there a way to shorten that time considerably?

Use GPU to train

Dear author,
Hello! I would like to ask how to use a GPU to train on my own dataset.
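
As a quick sanity check before training (the exact device selection lives in train.py and the YAML config, so this is only a precondition, not the full answer), confirm that PyTorch can see the GPU:

import torch

print(torch.cuda.is_available())   # must be True for GPU training
print(torch.cuda.device_count())   # number of visible GPUs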

Can't snip outside window (i3-wm)

I just found your great tool, and it seems that the way you capture a window does not work with tiling window managers like i3-wm. The snipping seems to open a separate window instead of creating a layer on top of all windows.

[screenshot: pix2tex]

"--no-cuda" does not work

When using the --no-cuda argument, it returns an error.

(env) λ python pix2tex.py --no-cuda
Traceback (most recent call last):
  File "H:\pytlat\ocr\pix2tex.py", line 84, in <module>
    args, model, tokenizer = initialize(args)
  File "H:\pytlat\ocr\pix2tex.py", line 33, in initialize
    model.load_state_dict(torch.load(args.checkpoint))
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 853, in _load
    result = unpickler.load()
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 845, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 834, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 151, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "H:\pytlat\env\lib\site-packages\torch\serialization.py", line 135, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I use torch 1.7+cpu; no CUDA build is installed, so I can't use CUDA.
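
The error message itself names the fix, and the initialize function shown in other tracebacks here already applies it: pass map_location when loading, so CUDA-saved tensors land on the CPU. A minimal sketch (the checkpoint path is a placeholder and model is assumed already constructed):

import torch

state = torch.load('checkpoints/weights.pth', map_location=torch.device('cpu'))
model.load_state_dict(state)  # now works on a CPU-only machine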

Feature extraction

Hi,

Can we use your trained model to extract math features from an image? I mean, can we take the layer before the prediction layer for feature extraction?

[image_resizer.pth] RuntimeError: Error(s) in loading state_dict for Model

Hello, I tried using the image_resizer.pth weights because my data often has large image sizes, but when I modified the checkpoint argument of pix2tex to image_resizer.pth, I got the runtime error shown in the attached screenshot. Has anyone tried image_resizer.pth? Any suggestions for solving this issue?

[screenshot of the error]

Thank you in advance.

Here's how I modified the checkpoint argument:
[screenshot of the modified argument]

Error: Index out of range in self during the model training

I tried to train the model on the CPU, but received the error below; I'm not sure what could be the cause.

Loss: 1.0180: 2%|█▉ | 421/18013 [09:41<6:45:16, 1.38s/it]
Traceback (most recent call last):
  File "train.py", line 94, in <module>
    train(args)
  File "train.py", line 52, in train
    loss = decoder(tgt_seq, mask=tgt_mask, context=encoded)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/x_transformers/autoregressive_wrapper.py", line 102, in forward
    out = self.net(xi, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/x_transformers/x_transformers.py", line 738, in forward
    x += self.pos_emb(x)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/x_transformers/x_transformers.py", line 107, in forward
    return self.emb(n)[None, :, :]
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 156, in forward
    return F.embedding(
  File "/home/devops/Envs/latex_ocr/lib/python3.8/site-packages/torch/nn/functional.py", line 1916, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
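
An IndexError inside nn.Embedding means some index reached a table that is too small. Here the failing frame is the positional embedding (pos_emb), so the most likely cause is a target sequence longer than max_seq_len; a token id at or above num_tokens would raise the same error in the word embedding. A sanity-check sketch, assuming a tokenizer.json produced by dataset/dataset.py and hypothetical config values and file names:

from tokenizers import Tokenizer

tok = Tokenizer.from_file('tokenizer.json')   # placeholder path
max_seq_len = 512                             # placeholder: from your YAML config
# Longest tokenized equation in the training text file, plus BOS/EOS tokens,
# must fit inside the positional embedding table.
longest = max(len(tok.encode(line).ids) for line in open('math.txt'))
assert longest + 2 <= max_seq_len, (longest, max_seq_len)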

Getting unbalanced latex equations as results

Hi, I tried with your recent commit; this time I tested with different images, but it doesn't seem to fix the issue. I am still getting unbalanced LaTeX equations. You also mentioned that you trained a new model, but did you upload it to the Drive?

  • I also observed that predicting the same image multiple times produces different LaTeX results. Is that the expected behaviour?

[sample image: patent_258_2]

Broken links in README

The wikipedia and arXiv links under the Data header are broken. (They weren't prefixed with https://.)

Completely unusable

At first, when I tried a complicated formula, it would get stuck for a long time and then return a wrong result.

Later, I found that even the simplest cases are not recognized by this program.

[screenshot: latexocr]

The problem of mismatched evaluation metrics

Hi, thank you for your excellent work. I reproduced your work with the config file named default.yaml but cannot get the same result (BLEU=0.74). I also found that the training loss increased after a few epochs. Can you give some advice?

[training curve screenshot]

Runtime error while running

I am getting the following error:

RuntimeError: Error(s) in loading state_dict for ResNetV2:
    size mismatch for head.fc.weight: copying a param with shape torch.Size([21, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([22, 1024, 1, 1]).
    size mismatch for head.fc.bias: copying a param with shape torch.Size([21]) from checkpoint, the shape in current model is torch.Size([22]).

I don't really know how to resolve this issue, please help.

What settings to achieve BLEU: 0.88?

Hi Lukas,

Thanks for the work.

I trained on the same dataset you mentioned in README.
But I only get BLEU: 0.719, ED: 3.18e-01. After that, the training diverges and the BLEU decreases.
I would like to reproduce your training to get that BLEU: 0.88.

Thanks,
Hung

Generating the CROHME tokenizer.json throws an error; how to fix it?

(tf_1.12) root@f15b165683e6:/home/code/LaTeX-OCR# python dataset/dataset.py --equations latex-ocr-data/crohme/CROHME_math.txt --vocab-size 8000 --out crohme-tokenizer.json
Generate tokenizer
Traceback (most recent call last):
  File "dataset/dataset.py", line 244, in <module>
    generate_tokenizer(args.equations, args.out, args.vocab_size)
  File "dataset/dataset.py", line 228, in generate_tokenizer
    trainer = BpeTrainer(special_tokens=["[PAD]", "[BOS]", "[EOS]"], vocab_size=vocab_size, show_progress=True)
TypeError: 'str' object cannot be interpreted as an integer

How can I fix it?
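
The TypeError says vocab_size reached BpeTrainer as a string: tokenizers expects an int, and argparse delivers strings unless told otherwise. A plausible fix in dataset/dataset.py's argument parser (a sketch; the real parser may already differ in newer commits):

import argparse

parser = argparse.ArgumentParser()
# Declaring the type makes '--vocab-size 8000' arrive as an int,
# which BpeTrainer requires.
parser.add_argument('--vocab-size', type=int, default=8000)

Alternatively, casting at the call site, vocab_size=int(vocab_size), works the same way.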

Use model on Android?

Hi! Your model works great on PC, but is it possible to use it on an Android device?
As far as I know, the model has to be converted to TorchScript format to work on a mobile device, but that's not enough. We also need to port the call_model function from the pix2tex.py script to the Android app, because the model requires a specific image resize to work. How can we do that? Thank you :)
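
A minimal starting point for the TorchScript half of this (a sketch under strong assumptions: model is the loaded pix2tex model, the input shape is invented, and the autoregressive decoder generally cannot be traced as-is and needs torch.jit.script or a rewritten decode loop):

import torch

dummy = torch.randn(1, 1, 64, 256)               # hypothetical grayscale input
traced_encoder = torch.jit.trace(model.encoder, dummy)
traced_encoder.save('encoder.pt')                # loadable from PyTorch Mobile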

Convert to onnx model

Your work is very helpful, thank you! But when I try to convert this PyTorch model to an ONNX file, I run into some errors. Have you tried this? Thanks!
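
For reference, the usual entry point is torch.onnx.export; a sketch only (module names, shapes, and dynamic axes are assumptions, and the decoder's generation loop typically needs special handling):

import torch

torch.onnx.export(
    model.encoder,                               # `model` assumed already loaded
    torch.randn(1, 1, 64, 256),                  # hypothetical input size
    'encoder.onnx',
    input_names=['image'], output_names=['features'],
    dynamic_axes={'image': {0: 'batch', 2: 'height', 3: 'width'}})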

gui.py cannot capture screen in the second monitor

Hi, thank you for the excellent work. I've met some problems when using the GUI. I am using Ubuntu 20.04 with a KDE desktop. After starting the screen snip (button or Alt+S), I can only start the area selection on the main monitor, and the opacity only changes on the main monitor. If I press my mouse on the main screen and drag it to the second screen, the program can accurately select the area, but the selection rectangle only shows on the main screen. I cannot select an area if I first press my mouse on the second screen.
I slightly modified the gui.py script, and I'm quite sure these changes are not related to this problem. Anyway, here are the changes I made: ImageGrab from PIL kept throwing errors, so I changed from PIL import ImageGrab to import pyscreenshot as ImageGrab. I also removed the all_screens parameter in line 252, img = ImageGrab.grab(bbox=(x1, y1, x2, y2), all_screens=True), since this parameter is only available on Windows.

Help to speed up inference processing

Hi authors,
Thank you so much for this awesome project. It works very well. Currently I have an issue with time consumption: running about 50 cropped formula images takes about 9 s in total. I run the model via the call_model function in pix2tex.py on a 2080 Ti GPU. Do you have any ideas for speeding up inference? Thank you so much.
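
If the 50 crops are currently sent through call_model one at a time, much of the 9 s is per-call overhead; batching same-sized, already-preprocessed crops into one forward pass usually helps. A sketch only (call_model's real signature is in pix2tex.py, and the batched call here is hypothetical):

import torch

@torch.no_grad()
def predict_batch(model, crops):
    # crops: list of equal-sized, preprocessed image tensors (C, H, W)
    batch = torch.stack(crops)   # one (N, C, H, W) batch
    return model(batch)          # hypothetical batched forward/generate call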

Return unbalanced latex equations

I tested with a lot of images, but for most of them the result is an unbalanced LaTeX equation. Do you have any idea how to resolve this?
I have attached the produced LaTeX equation and the source image for reference.

$$U_{s2A_{-}k}=\bar{\cal B}{s}{s\bar{G}^{+}}\cdot\bar{Y}{s2A{-}\bar{k}}+\bar{\cal B}{r}\frac{\displaystyle\frac{\displaystyle\cal L}{m}^{2}}{\displaystyle\frac{\displaystyle\cal L}{\displaystyle\frac{\displaystyle\cal E}{m}}{\displaystyle\cal P}{s}\cdot\bar{\cal A}{2}\cdot\bar{\cal I}{s22}\cdot\bar{Y}{s2\bar{\cal B}{-}\bar{\cal F}{s}}{\displaystyle\cal E}{s2\bar{\cal F}{-}\displaystyle\cal A}^{-}\left(\frac{\displaystyle\frac{\displaystyle\frac{\displaystyle\cal E}{m}\cdot\bar{\cal S}{s}}{\displaystyle\cal E}{s}^{2}}\right)\cdot\bar{\mathrm Y}{s4\bar{\cal B}{s2}\cdot\bar{\cal A}{s2}\displaystyle\cal E{s}\bar{\cal F}_{s}\right},$$
[source image: patent_8_60]
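
Since the failure mode is structural (unbalanced braces) rather than wrong symbols, a cheap post-check can at least flag bad outputs for a retry. A small sketch, not part of the repo:

def is_balanced(tex: str) -> bool:
    # Reject outputs with unbalanced {} or mismatched \left/\right pairs.
    depth = 0
    for ch in tex:
        if ch == '{':
            depth += 1
        elif ch == '}':
            if depth == 0:
                return False
            depth -= 1
    return depth == 0 and tex.count(r'\left') == tex.count(r'\right')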

Unexpected key(s) in state_dict and size mismatch when running python3 pix2tex.py

MacBook CPU inference:

python3 pix2tex.py
/usr/local/lib/python3.9/site-packages/albumentations/augmentations/transforms.py:913: FutureWarning: This class has been deprecated. Please use ImageCompression
  warnings.warn(
Traceback (most recent call last):
  File "/mnt/disk/LaTeX-OCR/pix2tex.py", line 144, in <module>
    args, *objs = initialize(arguments)
  File "/mnt/disk/LaTeX-OCR/pix2tex.py", line 55, in initialize
    model.load_state_dict(torch.load(args.checkpoint, map_location=args.device))
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
Unexpected key(s) in state_dict: "encoder.patch_embed.backbone.stem.conv.weight", "encoder.patch_embed.backbone.stem.norm.weight", "encoder.patch_embed.backbone.stem.norm.bias", "encoder.patch_embed.backbone.stages.0.blocks.0.downsample.conv.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.downsample.norm.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.downsample.norm.bias", "encoder.patch_embed.backbone.stages.0.blocks.0.conv1.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm1.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm1.bias", "encoder.patch_embed.backbone.stages.0.blocks.0.conv2.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm2.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm2.bias", "encoder.patch_embed.backbone.stages.0.blocks.0.conv3.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm3.weight", "encoder.patch_embed.backbone.stages.0.blocks.0.norm3.bias", "encoder.patch_embed.backbone.stages.0.blocks.1.conv1.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm1.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm1.bias", "encoder.patch_embed.backbone.stages.0.blocks.1.conv2.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm2.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm2.bias", "encoder.patch_embed.backbone.stages.0.blocks.1.conv3.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm3.weight", "encoder.patch_embed.backbone.stages.0.blocks.1.norm3.bias", "encoder.patch_embed.backbone.stages.1.blocks.0.downsample.conv.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.downsample.norm.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.downsample.norm.bias", "encoder.patch_embed.backbone.stages.1.blocks.0.conv1.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm1.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm1.bias", "encoder.patch_embed.backbone.stages.1.blocks.0.conv2.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm2.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm2.bias", "encoder.patch_embed.backbone.stages.1.blocks.0.conv3.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm3.weight", "encoder.patch_embed.backbone.stages.1.blocks.0.norm3.bias", "encoder.patch_embed.backbone.stages.1.blocks.1.conv1.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm1.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm1.bias", "encoder.patch_embed.backbone.stages.1.blocks.1.conv2.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm2.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm2.bias", "encoder.patch_embed.backbone.stages.1.blocks.1.conv3.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm3.weight", "encoder.patch_embed.backbone.stages.1.blocks.1.norm3.bias", "encoder.patch_embed.backbone.stages.1.blocks.2.conv1.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm1.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm1.bias", "encoder.patch_embed.backbone.stages.1.blocks.2.conv2.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm2.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm2.bias", "encoder.patch_embed.backbone.stages.1.blocks.2.conv3.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm3.weight", "encoder.patch_embed.backbone.stages.1.blocks.2.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.0.downsample.conv.weight", 
"encoder.patch_embed.backbone.stages.2.blocks.0.downsample.norm.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.downsample.norm.bias", "encoder.patch_embed.backbone.stages.2.blocks.0.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.0.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.0.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.0.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.1.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.1.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.1.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.1.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.2.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.2.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.2.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.2.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.3.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.3.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.3.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.3.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.4.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.4.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.4.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.4.norm3.bias", "encoder.patch_embed.backbone.stages.2.blocks.5.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.5.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.5.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.5.norm3.bias", 
"encoder.patch_embed.backbone.stages.2.blocks.6.conv1.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm1.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm1.bias", "encoder.patch_embed.backbone.stages.2.blocks.6.conv2.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm2.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm2.bias", "encoder.patch_embed.backbone.stages.2.blocks.6.conv3.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm3.weight", "encoder.patch_embed.backbone.stages.2.blocks.6.norm3.bias".
size mismatch for encoder.patch_embed.proj.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 1, 16, 16]).

Snipping failing everytime

I just finally managed to get it to launch (I was getting torch errors), but it shows "prediction failed" each and every time. Even expressions like 1=2-1 are not being captured. I don't think this software is that lame; there must be some problem.
Also, I am using it in a virtual environment, if that matters. I am attaching a screenshot with the error messages.

[screenshot of the error messages]

Is there any solution for bad prediction of images that have a long width?

I'm working with your great code! Thanks :)

I already finished training with my own LaTeX data, and using pix2tex.py I get output for my own test set.

Most of the test set is predicted well, but some images with a relatively long width are predicted badly.

Are there any tips for this problem (like using a small patch size, etc.)?
