Comments (3)
I've compared the outputs produced by:
- tesseract 3.03 with leptonica-1.70
- tesseract 3.04.01 with leptonica-1.73
For some reason, the output of the older version (3.03) seems to be of better quality, with fewer misspelled words and fewer wrongly recognised characters. This might be due to some non-obvious tesseract configuration in the first setup that changed with the fresh installation of the newer version.
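One way to put a number on "better quality" is to score each version's output against a hand-checked reference transcription. The following is only a minimal sketch using Python's standard `difflib`. The function names and sample strings are hypothetical and are not taken from the cogstack-pipeline tests, and the OCR outputs shown are made up for illustration.

```python
import difflib
from typing import List


def char_similarity(reference: str, ocr_text: str) -> float:
    """Return a similarity ratio in [0, 1] between reference text and OCR output."""
    matcher = difflib.SequenceMatcher(None, reference, ocr_text, autojunk=False)
    return matcher.ratio()


def misrecognised_words(reference: str, ocr_text: str) -> List[str]:
    """Return OCR words that replace, or are inserted between, reference words."""
    ref_words = reference.split()
    ocr_words = ocr_text.split()
    matcher = difflib.SequenceMatcher(None, ref_words, ocr_words, autojunk=False)
    wrong: List[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" mark OCR words with no equal reference word
        if tag in ("replace", "insert"):
            wrong.extend(ocr_words[j1:j2])
    return wrong


# Hypothetical sample data (not real tesseract output)
reference = "The patient was admitted on Monday"
old_output = "The patient was admitted on Monday"   # stand-in for 3.03
new_output = "The patlent was adrnitted on Monday"  # stand-in for 3.04.01

for name, text in (("3.03", old_output), ("3.04.01", new_output)):
    print(name, round(char_similarity(reference, text), 3),
          misrecognised_words(reference, text))
```

Running the same scoring on both installations with identical input images would help separate a genuine regression from a configuration difference.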
from cogstack-pipeline.
I suspect this is a result of #3
@jstuczyn I think in any case #30 is likely to close this?
Using the new version, Tesseract 4.0, solves this issue: the text is recognised correctly.
The new version of tesseract is already in the dev branch via PR #65.
Therefore, this test has been re-enabled in dev in commit 2c3bdd0.