Comments (18)

dickreuter commented on May 24, 2024

from tesserocr.

winstxnhdw commented on May 24, 2024

The maintainer is long gone. Anyway, since you are on Windows, you shouldn't need to pre-install Tesseract: on Windows, the Tesseract library is bundled with the tesserocr wheel. See here. You may still need to install the relevant tessdata, though.

zdenop commented on May 24, 2024

tesserocr supports Tesseract 5; see the tesserocr code.

Building tesserocr from source (tesserocr-2.6.2.tar.gz) also requires the Tesseract development files to be installed (or Leptonica and Tesseract to be built from source); otherwise the tesserocr build fails. Details are in the README.
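
As a rough pre-flight check before a source build, you can ask pkg-config whether the development files are visible. This is a sketch, not part of tesserocr: it assumes pkg-config is installed and that the modules are registered as `tesseract` and `lept`, so adjust the names if your distribution uses different ones.

```python
import shutil
import subprocess
from typing import Optional


def pkg_config_version(module: str) -> Optional[str]:
    """Return the version pkg-config reports for `module`, or None if it is not found."""
    if shutil.which("pkg-config") is None:
        return None  # pkg-config itself is missing, so a source build would likely fail too
    result = subprocess.run(
        ["pkg-config", "--modversion", module],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # no .pc file for this module on the pkg-config search path
    return result.stdout.strip()


if __name__ == "__main__":
    # Assumed pkg-config module names for Tesseract and Leptonica.
    for module in ("tesseract", "lept"):
        version = pkg_config_version(module)
        print(f"{module}: {version if version else 'not found'}")
```

If either module prints "not found", installing the distribution's -dev packages (or building the libraries yourself) is the usual next step before retrying pip.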

winstxnhdw commented on May 24, 2024

He clearly isn't building tesserocr from source, so there's no need for him to install leptonica and tesseract.

dickreuter commented on May 24, 2024

winstxnhdw commented on May 24, 2024

@dickreuter I have sent you a PR regarding the pipeline.

winstxnhdw commented on May 24, 2024

Also, I noticed that you have libleptonica and libtesseract in your Ubuntu Docker builds. You can remove them safely for faster builds and a smaller image size as they are now bundled into the tesserocr installation.

zdenop commented on May 24, 2024

If this is correct:

Downloading tesserocr-2.6.2.tar.gz

then he is 100% building from source. Maybe not intentionally, but this is source code, not a wheel (binary build)...

winstxnhdw commented on May 24, 2024

Collecting tesserocr (from -r requirements.txt (line 31))

The log here already tells you that he is doing a pip install from requirements.txt. Also, circling back to your earlier point, there's no need to install leptonica and tesseract anymore. The README is outdated.

I am using tesserocr without installing those dependencies in my Examplify app.

zdenop commented on May 24, 2024

And??? pip invokes a build from source if it does not find a wheel... Are you familiar with the tools you are trying to use?
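
To illustrate the fallback being described: within a release, pip prefers a wheel whose tags match the interpreter and platform, and if none matches it downloads the sdist and compiles it locally. The function below is a deliberately simplified, hypothetical model (it ignores abi3, version ordering across releases, and most of pip's tag rules), and the file names in the example are made up for illustration.

```python
from __future__ import annotations

from typing import Iterable, Optional


def wheel_tags(filename: str) -> tuple[set[str], set[str], set[str]]:
    """Split a wheel file name (PEP 427) into its python, abi and platform tag sets."""
    stem = filename[: -len(".whl")]
    python, abi, platform = stem.split("-")[-3:]
    return set(python.split(".")), set(abi.split(".")), set(platform.split("."))


def pick_file(files: Iterable[str], python_tag: str, platform_tag: str) -> Optional[str]:
    """Simplified model of pip's choice within ONE release of a package."""
    files = list(files)
    # 1. Prefer any wheel whose tags match the target interpreter and platform.
    for name in files:
        if name.endswith(".whl"):
            pythons, _abis, platforms = wheel_tags(name)
            if python_tag in pythons and (platform_tag in platforms or "any" in platforms):
                return name
    # 2. Otherwise fall back to the sdist, which pip must build from source.
    for name in files:
        if name.endswith(".tar.gz"):
            return name
    return None


if __name__ == "__main__":
    # Hypothetical file list: a Linux-only wheel plus the source archive.
    release = [
        "tesserocr-2.6.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "tesserocr-2.6.2.tar.gz",
    ]
    print(pick_file(release, "cp311", "win_amd64"))              # tesserocr-2.6.2.tar.gz
    print(pick_file(release, "cp311", "manylinux_2_17_x86_64"))  # the manylinux wheel
```

On a Windows target the model returns the .tar.gz, which is the situation where pip needs a compiler plus the Tesseract and Leptonica development files.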

zdenop commented on May 24, 2024

What exactly is outdated in README?

winstxnhdw commented on May 24, 2024

And??? pip invokes a build from source if it does not find a wheel...

Why does this matter? OP is using Windows and installing with pip, obviously expecting a binary build, which exists. It's just that the maintainer's setup.py doesn't pull the Windows wheels for whatever reason.

What exactly is outdated in README?

The entire requirements section. Instead, he should move it into a section specifically for building from source / development.

zdenop commented on May 24, 2024

The entire requirements section.

Seriously?? This one?

pip
Download the wheel file corresponding to your Windows platform and Python installation from [simonflueckiger/tesserocr-windows_build/releases](https://github.com/simonflueckiger/tesserocr-windows_build/releases) and install them via:

> pip install <package_name>.whl

Do you understand that text? What is outdated there? Please state facts, not vague accusations.

Just that the maintainer's setup.py doesn't pull

tesserocr (the project where this issue was created) NEVER produced a Windows binary version. It was always built externally.

the wheels for Windows for whatever reason.

whatever the reason => the latest Windows wheel is 2.6.0
And it is not a problem if somebody knows how to write requirements.txt correctly.

winstxnhdw commented on May 24, 2024

It is truly amazing how you missed this entire part

Requires libtesseract (>=3.04) and libleptonica (>=1.71).

On Debian/Ubuntu:

$ apt-get install tesseract-ocr libtesseract-dev libleptonica-dev pkg-config
You may need to manually compile tesseract for a more recent version. Note that you may need to update your LD_LIBRARY_PATH environment variable to point to the right library versions in case you have multiple tesseract/leptonica installations.
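
When several tesseract/leptonica installations coexist, it can help to see which library names are resolvable at all. A minimal sketch using the standard library's `ctypes.util.find_library`; its search only approximates what the dynamic loader does when tesserocr is imported, and trying both `lept` and `leptonica` reflects an assumption that Leptonica's library name has differed between releases.

```python
import ctypes.util
import os
from typing import Dict, Iterable, Optional


def locate_libraries(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each library name (without the 'lib' prefix) to what find_library returns."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
    # Try both assumed Leptonica names alongside Tesseract.
    for name, found in locate_libraries(["tesseract", "lept", "leptonica"]).items():
        print(f"{name:>10} -> {found or 'not found'}")
```

If the reported versions are not the ones you built, adjusting LD_LIBRARY_PATH (as the README suggests) is the place to start.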

tesserocr (the project where this issue was created) NEVER produced a Windows binary version. It was always built externally.

Exactly, and that's the problem. If a project commits to supporting a platform, the maintainer should support it well.

zdenop commented on May 24, 2024

It is truly amazing how you missed this entire part

I did not miss it. It is correct and relevant. Or do you claim you can run tesserocr on Debian without these libraries???

Exactly, and that's the problem. If a project commits to supporting a platform, the maintainer should support it well.

It is not a problem. E.g. tesseract and leptonica support many platforms, but they never provide binary packages, just source code.

winstxnhdw commented on May 24, 2024

Or do you claim you can run tesserocr on Debian without these libraries???

I am just saying that there is no longer a need to explicitly install these dependencies. You were even a participant on the PR for this change.

It is not a problem. E.g. tesseract and leptonica support many platforms, but they never provide binary packages, just source code.

We can agree to disagree then. I believe it's the maintainer's responsibility to ensure that the DX (developer experience) of installing their libraries is always seamless. In one of my projects, I made sure to bundle the NVIDIA cuBLAS and cuDNN libraries with the wheel. I know some people may argue that this can be a redundant install if the user already has the dependencies on their machine, but relying on the user's PATH to resolve these dependencies properly usually just leads to pain, in my experience and that of many others.

To reiterate, the only reason I and many others use this library instead of pytesseract is because the OCR engine is bundled within the installation. That brings many advantages. For one, I don't have to add a layer to my Docker image for installing these dependencies, and I don't have to worry about whether my OS has installed the dependencies in the PATH that tesserocr expects.

zdenop commented on May 24, 2024

I am just saying that there is no longer a need to explicitly install

... until you start facing problems; see e.g. #337. Other problems were reported on Mac. Distributing your own binary libraries on Linux is not a good idea. The Linux philosophy is to use system shared libraries => tesserocr should be linked against the system Leptonica and Tesseract, not against a custom build of them.
pip install --no-binary tesserocr tesserocr is the right way to install tesserocr on Linux and similar systems (macOS, FreeBSD). Windows is a different problem because... it is Windows.
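
One way to tell which of the two situations you are in is to look for libraries vendored inside the installed distribution. The sketch below rests on an assumption about wheel-repair tooling: auditwheel (Linux) and delvewheel (Windows) copy dependencies into a `<name>.libs/` directory, and delocate (macOS) into a `.dylibs/` directory. An empty result is consistent with a build linked against the system libraries, e.g. after the --no-binary install above.

```python
from __future__ import annotations

from importlib import metadata


def vendored_libraries(dist_name: str = "tesserocr") -> list[str]:
    """List files an installed distribution ships in wheel-repair 'libs' folders.

    Assumption: repaired wheels place copies of shared libraries under
    '<name>.libs/' (auditwheel, delvewheel) or '.dylibs/' (delocate).
    """
    try:
        files = metadata.files(dist_name) or []
    except metadata.PackageNotFoundError:
        return []  # the distribution is not installed at all
    markers = (".libs/", ".dylibs/")
    return sorted(f.as_posix() for f in files if any(m in f.as_posix() for m in markers))


if __name__ == "__main__":
    libs = vendored_libraries()
    if libs:
        print("Bundled libraries found:")
        for path in libs:
            print("  ", path)
    else:
        print("No bundled libraries found (or tesserocr is not installed).")
```

Running it once after a plain `pip install tesserocr` and once after the --no-binary install makes the difference between the two setups visible.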

...pytesseract is because the OCR engine is bundled within the installation

pytesseract does not bundle the OCR engine: it wraps the tesseract executable (i.e. you need to install tesseract separately), while tesserocr wraps (and links) the tesseract library. As far as I understand, pytesseract went this way to avoid problems with distributing binary libraries, dependencies, security, etc. (i.e. it leaves all these problems to the tesseract packagers)...
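
The architectural difference can be sketched as two hypothetical helpers; neither is pytesseract's or tesserocr's actual code. The first shells out to the `tesseract` command-line program, which must be installed separately and be on PATH; the second calls into libtesseract in-process through tesserocr's `file_to_text` helper, so it only works if the extension module and its linked libraries load.

```python
import shutil
import subprocess


def ocr_via_executable(image_path: str, lang: str = "eng", executable: str = "tesseract") -> str:
    """pytesseract-style: run the tesseract CLI in a child process and read stdout."""
    exe = shutil.which(executable)
    if exe is None:
        raise FileNotFoundError(f"{executable!r} not found on PATH; install Tesseract separately")
    # Passing 'stdout' as the output base makes the CLI print the recognized text.
    result = subprocess.run(
        [exe, image_path, "stdout", "-l", lang],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def ocr_via_library(image_path: str, lang: str = "eng") -> str:
    """tesserocr-style: call libtesseract in-process through the compiled extension."""
    import tesserocr  # imported lazily; fails if the extension or its libraries cannot load
    return tesserocr.file_to_text(image_path, lang=lang)
```

The first helper depends on an executable being discoverable at run time; the second depends on shared libraries being resolvable at import time, which is exactly where the bundled-versus-system debate in this thread comes in.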

I believe it's the maintainer's responsibility to ensure that the DX (developer experience) of installing their libraries is always seamless

No. It is the packager's responsibility. Packager != maintainer. There is a split of tasks and responsibilities, and that is right.
GTK, Pango, GNOME, and KDE maintainers do not care whether you are able to install their products/libraries on Windows etc... The same applies to Windows or macOS apps and libraries.

winstxnhdw commented on May 24, 2024

pytesseract does not bundle the OCR engine: it wraps the tesseract executable (i.e. you need to install tesseract separately), while tesserocr wraps (and links) the tesseract library.

You misread me. I am saying that I prefer tesserocr over pytesseract because it links the tesseract library.

... until you start facing problems; see e.g. #337.

Isn't this issue caused by the maintainer failing to pre-compile tesseract in the proper environment?

GTK, Pango, GNOME, and KDE maintainers do not care whether you are able to install their products/libraries on Windows etc...

And you're right, they don't have to, because they do not explicitly support these platforms. tesserocr, by contrast, explicitly mentions support for these platforms in its README. In this case, the library is playing the role of the packager.

All I am saying is that tesserocr's DX is almost there. Just update the README and fix the automated CI pipelines that pre-compile the tesseract library so that everyone gets the full feature set.
