Comments (9)

stweil avatar stweil commented on July 18, 2024 1

I can confirm that problem. The release zip files are provided by GitHub, so there is nothing we can do to fix it. But the tar.gz file is fine, so there is a working alternative.

from tessdata.

zdenop avatar zdenop commented on July 18, 2024 1

The tessdata repository is huge, and:

ZIP format had a 4 GB limit on various things (uncompressed size of a file, compressed size of a file, and total size of the archive), as well as a limit of 65,535 entries in a ZIP archive.

so it seems you chose to use the wrong technology. Use git clone --depth 1 https://github.com/tesseract-ocr/tessdata.git for downloading all data.
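For a build system, the shallow clone suggested above could be scripted roughly like this. It is a minimal sketch, assuming `git` is on the PATH; the helper names `shallow_clone_command` and `shallow_clone` and the destination directory `tessdata` are made up for illustration, and only the URL and `--depth 1` come from the command above.

```python
import subprocess

# Repository URL taken from the git clone command quoted in this thread.
TESSDATA_URL = "https://github.com/tesseract-ocr/tessdata.git"


def shallow_clone_command(url, dest, depth=1):
    """Build the argument list for a shallow git clone (hypothetical helper)."""
    return ["git", "clone", "--depth", str(depth), url, dest]


def shallow_clone(url=TESSDATA_URL, dest="tessdata"):
    """Run the clone; check=True raises CalledProcessError if git fails."""
    subprocess.run(shallow_clone_command(url, dest), check=True)
```

Calling `shallow_clone()` fetches only the latest commit, so a failed transfer shows up as a non-zero exit status instead of a silently shortened file.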

stweil avatar stweil commented on July 18, 2024

The tar.gz file is about 565 MiB. The zip file is 543 MiB, which is far below the limit, but that size seems too small.
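A size that looks too small can be checked directly instead of guessed at. The following is only a sketch using Python's standard `zipfile` module; the function name `is_intact_zip` is an assumption, not something from this repository. A truncated zip normally lacks the end-of-central-directory record at the end of the file, so opening it fails.

```python
import zipfile


def is_intact_zip(path):
    """Return True if the archive opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads all members and returns the first bad name, or None.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        # Raised when the central directory is missing, e.g. after a cut-off download.
        return False
```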

git clone is a good alternative which has several advantages compared to downloading a zip or tar.gz file.

mabrydozier avatar mabrydozier commented on July 18, 2024

I am also seeing the same issue with 4.0.0. I opened an issue earlier, but don't see it now.

Is the root problem understood? I don't think the zip file we have been downloading has been updated in some time.

Our build system downloads the file quite often, so the issue must have appeared very recently, within the last few days.

Also, we had similar issues with the tar.gz file.

Thanks,

Mabry

stweil avatar stweil commented on July 18, 2024

> Is the root problem understood?

Ask Microsoft, the owners of GitHub. The zip files are created automatically by GitHub.

> Also, we had similar issues with the tar.gz file.

In my test, the tar.gz file worked fine.
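Anyone unsure whether their tar.gz download was complete could test it along these lines. This is a rough sketch with Python's standard `gzip` module, and the name `is_complete_gzip` is invented here. It only checks the gzip layer: a stream that was cut off ends before the end-of-stream marker, and reading it to the end raises an error.

```python
import gzip
import zlib


def is_complete_gzip(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream can be decompressed without error."""
    try:
        with gzip.open(path, "rb") as stream:
            # Read and discard the data; only the absence of errors matters.
            while stream.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: stream ended early. OSError also covers a bad gzip header
        # or an unreadable file. zlib.error: corrupted compressed data.
        return False
    return True
```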

stweil avatar stweil commented on July 18, 2024

I have now tried several downloads of the zip file using wget. The resulting files had different sizes, although there was no download error. All downloads finished after 1:40 (100 seconds), so maybe the GitHub servers have a fixed time limit.

stweil avatar stweil commented on July 18, 2024

More tests confirmed that the download terminates after exactly 100 seconds, resulting in a partial file.
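The measurement can be repeated without wget. Below is a minimal sketch using only the Python standard library; `timed_copy` and `download` are hypothetical names, and the archive URL is left as a parameter. It assumes the server may omit the `Content-Length` header, in which case there is no expected size to compare against and a cut-off transfer can look like a normal end of data.

```python
import time
import urllib.request


def timed_copy(src, dst, chunk_size=1 << 20):
    """Copy src to dst in chunks; return (bytes copied, elapsed seconds)."""
    start = time.monotonic()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total, time.monotonic() - start


def download(url, path):
    """Download url to path and report the received size, expected size and duration."""
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        expected = response.headers.get("Content-Length")  # may be absent
        received, seconds = timed_copy(response, out)
    return {
        "received": received,
        "expected": int(expected) if expected is not None else None,
        "seconds": seconds,
    }
```

If several runs report a duration close to 100 seconds and different `received` values, that matches the behaviour described above.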

stweil avatar stweil commented on July 18, 2024

I am closing this issue because we cannot solve it, but there is a good alternative way to get the data: git clone (see above).

V-kto avatar V-kto commented on July 18, 2024

Hi!

Thank you for your reply. I will use git clone instead. As far as I remember, the tar.gz file did not work either, but I'm not sure.

Anyway, the main thing is that I can use tessdata somehow!

Thank you for your explanations and your time :)
