Comments (18)
from tesserocr.
The maintainer is long gone. Anyways, since you are on Windows, you shouldn't need to pre-install Tesseract. For Windows, the Tesseract model is bundled with the tesserocr wheel. See here. You may still need to install the relevant tessdata though.
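One rough way to check this on a given machine is to ask the installed tesserocr which Tesseract it is linked against and which language data it can see. The sketch below assumes that `tesserocr.tesseract_version()` and `tesserocr.get_languages()` behave as they do in current releases; `tesserocr_status` is a hypothetical helper name, not part of the library.

```python
def tesserocr_status():
    """Return linked-library and tessdata info, or None if tesserocr cannot be imported."""
    try:
        import tesserocr
    except ImportError:
        # Covers "not installed" as well as "DLL load failed" on Windows.
        return None
    # get_languages() returns the tessdata path in use and the languages found there.
    tessdata_path, languages = tesserocr.get_languages()
    return {
        "version": tesserocr.tesseract_version(),
        "tessdata_path": tessdata_path,
        "languages": languages,
    }


if __name__ == "__main__":
    print(tesserocr_status())
```

An empty `languages` list would suggest that the engine loaded but no tessdata was found at the reported path.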
from tesserocr.
tesserocr supports tesseract 5 - see the tesserocr code.
Building tesserocr from source (tesserocr-2.6.2.tar.gz) also requires the tesseract development files to be installed (or leptonica and tesseract to be built from source); otherwise the tesserocr build fails. Details are in the README.
from tesserocr.
He clearly isn't building tesserocr from source, so there's no need for him to install leptonica and tesseract.
from tesserocr.
@dickreuter I have sent you a PR regarding the pipeline.
from tesserocr.
Also, I noticed that you have libleptonica and libtesseract in your Ubuntu Docker builds. You can remove them safely for faster builds and a smaller image size as they are now bundled into the tesserocr installation.
from tesserocr.
If this is correct:
Downloading tesserocr-2.6.2.tar.gz
then he is 100% building from source. Maybe not intentionally, but this is source code - not a wheel (binary build)...
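The distinction can be read straight off the filename in the pip log. This is a minimal sketch assuming only the usual packaging conventions (`.whl` for wheels, `.tar.gz` or `.zip` for source distributions); `dist_kind` is a hypothetical helper, and the wheel filename in the usage lines is made up for illustration.

```python
def dist_kind(filename: str) -> str:
    """Classify a distribution file named in a pip log as 'wheel', 'sdist' or 'unknown'."""
    name = filename.lower()
    if name.endswith(".whl"):
        return "wheel"  # prebuilt binary, nothing is compiled locally
    if name.endswith((".tar.gz", ".zip")):
        return "sdist"  # source archive, pip has to build it
    return "unknown"


if __name__ == "__main__":
    print(dist_kind("tesserocr-2.6.2.tar.gz"))  # sdist
    # Hypothetical wheel filename, used only to show the other branch.
    print(dist_kind("tesserocr-2.6.0-cp311-cp311-win_amd64.whl"))  # wheel
```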
from tesserocr.
Collecting tesserocr (from -r requirements.txt (line 31))
The log here already tells you that he is doing a pip install from requirements.txt. Also, circling back to your earlier point, there's no need to install leptonica and tesseract anymore. The README is outdated.
I am using tesserocr without installing those dependencies in my Examplify app.
from tesserocr.
And??? pip invokes a build from source if it does not find a wheel... Are you familiar with the tools you are trying to use?
from tesserocr.
What exactly is outdated in the README?
from tesserocr.
> And??? pip invokes a build from source if it does not find a wheel...
Why does this matter? OP is using Windows and installing with pip, obviously expecting a binary build, which there is. Just that the maintainer's setup.py doesn't pull the wheels for Windows for whatever reason.
> What exactly is outdated in the README?
The entire requirements section. Instead, he should move that to a section specifically for building from source / development.
from tesserocr.
> The entire requirements section.
Seriously?? This one?
> pip
> Download the wheel file corresponding to your Windows platform and Python installation from [simonflueckiger/tesserocr-windows_build/releases](https://github.com/simonflueckiger/tesserocr-windows_build/releases) and install them via:
> > pip install <package_name>.whl
Do you understand that text? What is outdated there? Please state facts, not vague accusations.
> Just that the maintainer's setup.py doesn't pull
tesserocr (this project, where the issue was created) NEVER produced a Windows binary version. It was always created externally.
> the wheels for Windows for whatever reason.
Whatever the reason => the latest Windows wheel is 2.6.0.
And it is not a problem if somebody knows how to write requirements.txt correctly.
from tesserocr.
It is truly amazing how you missed this entire part:
> Requires libtesseract (>=3.04) and libleptonica (>=1.71).
> On Debian/Ubuntu:
> $ apt-get install tesseract-ocr libtesseract-dev libleptonica-dev pkg-config
> You may need to manually compile tesseract for a more recent version. Note that you may need to update your LD_LIBRARY_PATH environment variable to point to the right library versions in case you have multiple tesseract/leptonica installations.

> tesserocr (this project, where the issue was created) NEVER produced a Windows binary version. It was always created externally.
Exactly, and that's the problem. If you are going to commit to supporting a platform, the maintainer should do it well.
from tesserocr.
> It is truly amazing how you missed this entire part
I did not miss it. It is correct and relevant. Or do you claim you can run tesserocr on Debian without these libraries???
> Exactly, and that's the problem. If you are going to commit to supporting a platform, the maintainer should do it well.
It is not a problem. E.g. tesseract and leptonica support many platforms, but they never provide binary packages, just source code.
from tesserocr.
> Or do you claim you can run tesserocr on Debian without these libraries???
I am just saying that there is no longer a need to explicitly install these dependencies. You were even a participant on the PR for this change.
> It is not a problem. E.g. tesseract and leptonica support many platforms, but they never provide binary packages, just source code.
We can agree to disagree then. I believe it's the maintainer's responsibility to ensure that the DX for installing their libraries should always be seamless. In one of my projects, I made sure to bundle the nvidia cublas and cudnn libraries along with the wheel. I know some people may argue that it could be a redundant install if the user already has the dependencies installed on the machine, but relying on the user's PATH to properly resolve these dependencies, in my experience and that of many others, usually just leads to pain.
To reiterate, the only reason why I, and many others, are using this library instead of pytesseract is because the OCR engine is bundled within the installation. That can lead to many advantages. For one, I don't have to add a layer to my Docker image for installing these dependencies, and I don't have to worry about whether my OS has or has not installed the dependencies in the PATH that tesserocr is expecting.
from tesserocr.
> I am just saying that there is no longer a need to explicitly install
... until you start to face the problems - see e.g. #337. Other problems were reported for Mac. Distributing your own binary libraries on Linux is not a good idea. The Linux philosophy is to use system shared libraries => tesserocr should be linked against the system leptonica and tesseract and not against its own custom build.
pip install --no-binary tesserocr tesserocr
is the right way to install tesserocr on Linux and similar systems (macOS, FreeBSD). Windows is a different problem because ... it is Windows.
> ...pytesseract is because the OCR engine is bundled within the installation
pytesseract does not bundle OCR - it wraps the tesseract executable (i.e. you need to install tesseract separately), while tesserocr wraps (and links) the tesseract library. As far as I understand, pytesseract decided to go this way to avoid problems with distributing binary libraries, dependencies, security etc. (i.e. it leaves all problems to the tesseract packagers)...
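The executable-versus-library distinction can be illustrated with the standard library alone. This is only a sketch: `locate_tesseract` is a hypothetical helper, and the lookups are assumptions about a typical system install. `shutil.which` searches PATH for a `tesseract` executable (the kind of thing a wrapper around the command-line tool depends on), while `ctypes.util.find_library` asks the platform for a system-wide shared library (the kind of thing a source build would link against).

```python
import shutil
from ctypes.util import find_library


def locate_tesseract():
    """Report where, if anywhere, the system exposes tesseract as a program and as a library."""
    return {
        # Full path of the `tesseract` executable on PATH, or None.
        "executable": shutil.which("tesseract"),
        # Name or path of a system shared library, or None.
        "shared_library": find_library("tesseract"),
    }


if __name__ == "__main__":
    print(locate_tesseract())
```

A library copy bundled inside a wheel would not normally be visible to `find_library`, and on Windows the DLL name may differ, so a `None` here does not by itself mean that tesserocr cannot load its engine.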
> I believe it's the maintainer's responsibility to ensure that the DX for installing their libraries should always be seamless
No. It is a packager's responsibility. Packager != maintainer. There is a split of tasks and responsibilities, and that is right.
GTK, pango, gnome, KDE maintainers do not care if you are able to install their products/libraries on Windows etc... The same problem exists with Windows or Mac OS apps&libs.
from tesserocr.
> pytesseract does not bundle OCR - it wraps the tesseract executable (i.e. you need to install tesseract separately), while tesserocr wraps (and links) the tesseract library.
You misread me. I am saying that I prefer tesserocr over pytesseract because it links the tesseract library.
> ... until you start to face the problems - see e.g. #337.
Is this issue not because the maintainer failed to properly pre-compile tesseract in the proper environment?
> GTK, pango, gnome, KDE maintainers do not care if you are able to install their products/libraries on Windows etc..
And you're right, they don't have to because they do not explicitly support these platforms. This is unlike tesserocr, which explicitly mentions support for these platforms in the README. In this case, this library is playing the role of the Packager.
All I am saying is that tesserocr's DX is almost there. Just update the README and fix the automated CIs that pre-compile the tesseract library so that everyone gets the full feature set.
from tesserocr.