Comments (9)
I can confirm that problem. The release zip files are provided by GitHub, so there is nothing we can do to fix it. But the tar.gz file is fine, so there is a working alternative.
from tessdata.
The tessdata repository is huge, and the original ZIP format had a 4 GB limit on various things (uncompressed size of a file, compressed size of a file, and total size of the archive), as well as a limit of 65,535 entries in a ZIP archive.
So it seems you chose the wrong technology. Use git clone --depth 1 https://github.com/tesseract-ocr/tessdata.git to download all data.
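A build system that fetches the data regularly can script the shallow clone instead of downloading an archive. The following is a minimal Python sketch, assuming git is installed and on PATH. The helper names shallow_clone_command and clone_tessdata are hypothetical and not part of any Tesseract tooling.

```python
import subprocess
from pathlib import Path

TESSDATA_URL = "https://github.com/tesseract-ocr/tessdata.git"


def shallow_clone_command(url: str, dest: Path, depth: int = 1) -> list[str]:
    """Build the argv for a shallow clone (history truncated to `depth` commits)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return ["git", "clone", "--depth", str(depth), url, str(dest)]


def clone_tessdata(dest: Path) -> None:
    """Run the shallow clone; raises CalledProcessError if git exits non-zero."""
    # check=True turns a failed clone (non-zero exit status) into an exception,
    # so the build stops instead of continuing with missing data.
    subprocess.run(shallow_clone_command(TESSDATA_URL, dest), check=True)
```

Calling clone_tessdata(Path("tessdata")) fetches only the latest commit into a new ./tessdata directory; the target directory must not already exist or must be empty.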
The tar.gz file is about 565 MiB, the zip file 543 MiB. That is far below the limit, but the zip file seems to be too small.
git clone is a good alternative which has several advantages compared to downloading a zip or tar.gz file.
I am also seeing the same issue with 4.0.0. I opened an issue earlier, but don't see it now.
Is the root problem understood? I don't think the zip file we have been downloading has been updated in some time.
Our build system downloads the file quite often, and the issue came up only very recently, within the last few days.
Also, we had similar issues with the tar.gz file.
Thanks,
Mabry
Is the root problem understood?
Ask Microsoft, the owner of GitHub. The zip files are created automatically by GitHub.
Also, we had similar issues with the tar.gz file.
In my test, the tar.gz file worked fine.
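Whether a downloaded tar.gz file is complete can be checked locally instead of relying on the download tool's exit status. Decompressing the whole gzip stream makes Python's gzip module verify the CRC-32 and length stored in the gzip trailer, and a truncated file fails with EOFError, roughly what gzip -t does. A minimal sketch; the helper name gzip_is_complete is hypothetical.

```python
import gzip
import zlib


def gzip_is_complete(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the gzip file at `path` decompresses fully and its CRC checks out."""
    try:
        with gzip.open(path, "rb") as f:
            # Read to the end; the gzip module checks the trailer's CRC-32
            # and size once the end of the compressed stream is reached.
            while f.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream ends before the end-of-stream marker (truncated file).
        # OSError: unreadable file, bad header (gzip.BadGzipFile) or CRC mismatch.
        # zlib.error: corrupt compressed data.
        return False
    return True
```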
I have now tried downloading the zip file several times using wget. The resulting files had different sizes, although wget reported no download error. All downloads finished after 1:40 (100 seconds), so maybe the GitHub servers have a fixed time limit.
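Because wget reported no error for the partial files, a check of the archive itself is more reliable. A ZIP archive keeps its central directory at the very end of the file, so a download that was cut off early usually cannot even be opened, and ZipFile.testzip() additionally reads every member and verifies its CRC. A minimal Python sketch; the helper name zip_is_complete is hypothetical.

```python
import zipfile
import zlib


def zip_is_complete(path: str) -> bool:
    """Return True if `path` is a readable ZIP archive whose members all pass the CRC check."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads every member and returns the name of the first
            # bad one, or None if all CRCs match.
            return zf.testzip() is None
    except (zipfile.BadZipFile, zlib.error, EOFError, OSError):
        # BadZipFile is typical for a truncated download: the
        # end-of-central-directory record at the end of the file is missing.
        return False
```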
More tests confirmed that the download terminates after exactly 100 seconds, resulting in a partial file.
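To repeat the timing measurement without wget, a download can be timed and its byte count compared with the Content-Length header when the server sends one. This is a minimal sketch using only the Python standard library; timed_download and DownloadResult are hypothetical names. It does not assume that GitHub sends Content-Length for generated archives: when the header is absent, expected_bytes is None and only a check of the archive itself can tell whether the file is complete.

```python
import shutil
import time
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class DownloadResult:
    bytes_received: int
    expected_bytes: Optional[int]  # None if the server sent no Content-Length
    seconds: float

    @property
    def looks_truncated(self) -> bool:
        # Only decidable when the server announced a length.
        return self.expected_bytes is not None and self.bytes_received < self.expected_bytes


def timed_download(url: str, dest: str, timeout: float = 600.0) -> DownloadResult:
    """Stream `url` to `dest`, recording elapsed wall-clock time and size."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        length = resp.headers.get("Content-Length")
        shutil.copyfileobj(resp, out, 1 << 20)  # copy in 1 MiB chunks
        received = out.tell()
    return DownloadResult(
        bytes_received=received,
        expected_bytes=int(length) if length is not None else None,
        seconds=time.monotonic() - start,
    )
```

Running it several times against the same archive URL (GitHub's usual pattern is https://github.com/tesseract-ocr/tessdata/archive/<tag>.zip, assumed here) and comparing seconds and bytes_received across runs would show whether the transfers consistently stop after about 100 seconds.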
I am closing this issue as we cannot solve it, but there is a good alternative way to get the data using git clone (see above).
Hi!
Thank you for your reply. I will use git clone instead. As for tar.gz, as far as I remember it did not work either, but I'm not sure.
Anyway, the main thing is that I can use tessdata somehow!
Thank you for your explanations and your time :)
Related Issues
- Modern Greek data issues
- Cannot extract tessdata
- Arabic issue
- Which library recognizes operators and numbers?
- VietOCR - how to manually config language file if I don't have write access to C:\Program Files\Tesseract-OCR\tessdata folder?
- orc Portugues Brazil not found
- Which font is used for Bengali tessdata?
- Error: LSTM requested, but not present!! Loading tesseract
- size of eng.traineddata best/fast/...
- Tessdata on Homebrew
- Select screen area bug
- OCR by chi_tra_vert or chi_sim_vert returns garbled results
- Python: pytesseract does not recognize language Romanian characters on converting PDF files (that contains photocopied images)
- Failed to load list of training filenames from data/eng/list.train
- hin.traindata,devnagri.traindata
- About the identification of national currency symbol icons
- Need new trained-data for Myanmar.
- Word list in eng.traineddata
- Failed loading language 'eng'