Comments (6)
Ok, after some testing I found that there is a compatibility issue between the latest version of Tesseract in conda-forge and tesserocr.
The versions on the left are the latest ones, which fail with ImportError: DLL load failed while importing tesserocr: The specified module could not be found. The versions on the right work:
libarchive 3.7.2-h6f8411a_0 --> 3.6.2-h6f8411a_1
tesseract 5.3.2-hb328096_1 --> 5.3.2-hae9691c_0
If you use conda install tesseract=5.3.2=hae9691c_0 to install that build specifically, the issue is gone.
This is a temporary fix, but I am not sure whether the error comes from the conda-forge build of Tesseract or whether it is a problem with tesserocr itself.
from tesserocr.
I guess you're installing tesserocr via conda-forge? Unfortunately the tesserocr build there has been broken for a while (since 2 or 3 versions ago). The original maintainer isn't active on it, and I'm not a conda user myself. If anyone wants to take over maintenance responsibilities, that would be great.
from tesserocr.
I also found that libarchive is required to run Tesseract on Windows. I think as long as you have a working libarchive on the path (and its dependencies; I had to add openssl as well), your tesseract.exe should work.
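One way to check this is to look for the DLLs on the PATH from Python. This is only a minimal sketch: the helper name and the file name patterns (archive*.dll, libcrypto*.dll) are assumptions about how the libraries are named on a given install, not something confirmed in this thread.

```python
import fnmatch
import os
from pathlib import Path


def find_on_path(patterns, path_value=None):
    """Return {pattern: [matching files]} across all directories on PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = {pattern: [] for pattern in patterns}
    for entry in path_value.split(os.pathsep):
        directory = Path(entry)
        if not entry or not directory.is_dir():
            continue
        try:
            candidates = list(directory.iterdir())
        except OSError:
            # Skip directories we are not allowed to read.
            continue
        for candidate in candidates:
            for pattern in patterns:
                # Compare in lower case, since Windows file names are case-insensitive.
                if fnmatch.fnmatch(candidate.name.lower(), pattern.lower()):
                    hits[pattern].append(str(candidate))
    return hits


if __name__ == "__main__":
    # Hypothetical patterns; adjust them to the DLL names in your environment.
    for pattern, files in find_on_path(["archive*.dll", "libcrypto*.dll"]).items():
        print(pattern, "->", files or "not found on PATH")
```

An empty list for a pattern suggests that the directory holding that DLL still needs to be added to PATH.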
If you want to try our build of the Tesserocr stack you can pull it from here. I've tested it on Linux and Windows using the Post Install steps.
With big thanks to @sirfz for that documentation.
from tesserocr.
libarchive and curl (which needs openssl) are not needed; they are optional dependencies for Tesseract.
libarchive could be used for compressed traineddata, but hardly anybody uses it. People prefer speed over saving space.
curl is used by the tesseract executable for opening online images, and the executable is not wrapped by tesserocr.
Both features (saving space and reading online images) could be replaced by native Python functions, so adding them as a dependency of tesserocr makes no sense.
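To illustrate that last point, here is a minimal standard-library sketch. The helper names and the choice of gzip as the compression format are assumptions made for illustration; none of this is part of tesserocr.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path


def fetch_image_bytes(url, timeout=30):
    """Download an image and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def unpack_traineddata(compressed_path, target_dir):
    """Decompress a gzip-compressed traineddata file into target_dir."""
    compressed_path = Path(compressed_path)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # "eng.traineddata.gz" -> "eng.traineddata"
    target = target_dir / compressed_path.with_suffix("").name
    with gzip.open(compressed_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

The downloaded bytes can then be opened with Pillow and passed to tesserocr as usual, and the unpacked directory can be used as the tessdata path.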
from tesserocr.
There is an option in Tesseract to disable libarchive, DISABLE_ARCHIVE, which is set to OFF by default. If libarchive is not present at build time, the build doesn't throw an error, but tesseract.exe expects the dependency to be available at start-up. I will try a rebuild without these dependencies.
from tesserocr.
Confirmed that these settings removed the need to ship extra dependencies.
For CMake:
DISABLE_ARCHIVE=ON
DISABLE_CURL=ON
For Autotools:
--without-archive
--without-curl
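A quick way to sanity-check such a build is to inspect the output of tesseract --version. This sketch assumes the executable reports linked optional libraries on lines starting with "Found libarchive" or "Found libcurl", which recent Tesseract 5 builds appear to do; treat the exact output format and the helper names as assumptions.

```python
import subprocess


def optional_libs_reported(version_text):
    """Return the optional libraries that the version text reports as found."""
    found = set()
    for line in version_text.splitlines():
        stripped = line.strip().lower()
        if stripped.startswith("found libarchive"):
            found.add("libarchive")
        elif stripped.startswith("found libcurl"):
            found.add("libcurl")
    return found


def check_tesseract(executable="tesseract"):
    """Run `<executable> --version` and report linked optional libraries."""
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=True
    )
    # Read both streams in case the version info is written to stderr.
    return optional_libs_reported(result.stdout + "\n" + result.stderr)
```

If the rebuild worked as intended, check_tesseract() should return an empty set.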
from tesserocr.