Comments (8)
It is unclear who invented the name frk for Frankish. Maybe it should be renamed.
frk is the ISO 639-3 code for Frankish.
from tessdata.
https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
FYI, the source of the 'Language' column in the tables is the old Google Code download page. Ray uploaded the official traineddata files to that old page, and Zdenko added a few third-party files.
I should have explained my question better. Why is German Fraktur called Frankish? Neither the characters, nor the words, nor the fonts used are Frankish. Without hints from others I would never have thought of using frk for German Fraktur.
It seems frk was trained on a modern German corpus and a small number of fonts.
@stweil, maybe you want to close this issue?
Do you think that frk is the right name? Or should it be renamed, maybe deu_old or deu_frak (as people are used to that name)? "Frankish" is definitely the wrong description for the current frk.
Is 'frk' only for German Fraktur?
I expect that the frk LSTM model will work quite well with Fraktur text in other languages, too. But the word list of frk is mainly based on German words (I estimate more than 95 % of the 473228 words are German). The list also includes a few words from English, Spanish, French, Latin, Russian and other languages. Many of them would not be expected in Fraktur text (jQuery, motherboard, ...). The German words show many of the known problems: ß/B, ii/ü and other confusions, lower case substantives (which should always be upper case in German), upper case adjectives (which should normally be lower case), random words in all upper case, lots of web sites (also not typical for Fraktur) and so on.
@theraysmith, it would be really interesting to know more details of the process which leads to that and also the other word lists. They look like extracts from random web sites. I don't think that good word lists for Fraktur can be produced like that.
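To make the kinds of problems listed above easier to count, here is a minimal Python sketch. It assumes the frk word list has already been extracted to plain text, one word per line; the category names, the regular expression and the function names `classify` and `summarize` are my own illustration and are not part of Tesseract. It only catches the mechanical cases (web addresses, all upper case, mixed case like jQuery, non-letter characters). Wrong noun capitalization or ß/B confusions would need a real German dictionary.

```python
import re
from collections import Counter

# Hypothetical heuristic: entries that look like web addresses.
URL_LIKE = re.compile(r"^(https?://|www\.)|\.(com|de|org|net)$", re.IGNORECASE)
# Lower case letter directly followed by upper case, as in "jQuery".
MIXED_CASE = re.compile(r"[a-z][A-Z]")


def classify(word: str) -> str:
    """Return a rough category for one word list entry."""
    if URL_LIKE.search(word):
        return "web"
    if len(word) > 1 and word.isalpha() and word.isupper():
        return "all_upper"
    if MIXED_CASE.search(word):
        return "mixed_case"
    if not word.isalpha():
        return "non_alpha"
    return "ok"


def summarize(words):
    """Count how many entries fall into each category."""
    return Counter(classify(w) for w in words)


if __name__ == "__main__":
    sample = ["Haus", "HAUS", "jQuery", "www.example.com", "Straße"]
    for category, count in summarize(sample).items():
        print(category, count)
```

Running `summarize` over the whole list would give a rough count per category, which might help decide whether filtering the existing list is worthwhile or whether a new list from real Fraktur sources is needed.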