In general, Calamari is not designed to predict single characters; it is designed to predict a complete sequence of characters (a sentence or text line) as a whole.
To predict a single character, a simple classification network might be better suited (see e.g. the MNIST examples).
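To make the sequence-versus-single-character distinction concrete: as far as we know, Calamari's line models are trained with a CTC loss, so the network emits one label distribution per horizontal frame, and a decoder collapses those frames into a character sequence. The sketch below shows greedy CTC decoding in plain Python. The blank index (0) and the toy alphabet are assumptions for illustration, not Calamari's internals.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 of the alphabet is reserved for the CTC blank


def ctc_greedy_decode(frame_probs, alphabet, blank=BLANK):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the most likely label for every frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # Merge consecutive duplicates (e.g. a a a -> a).
    collapsed = [label for label, _ in groupby(best)]
    # Remove blanks; a blank between two equal labels keeps them distinct.
    return "".join(alphabet[label] for label in collapsed if label != blank)


if __name__ == "__main__":
    alphabet = ["<blank>", "a", "b"]  # hypothetical toy alphabet
    frames = [
        [0.1, 0.8, 0.1], [0.1, 0.7, 0.2], [0.9, 0.05, 0.05],
        [0.2, 0.6, 0.2], [0.1, 0.1, 0.8], [0.1, 0.2, 0.7], [0.8, 0.1, 0.1],
    ]
    print(ctc_greedy_decode(frames, alphabet))  # prints "aab"
```

Because the whole line is decoded at once, a single-character image gives the model very little context, which is one reason a plain classifier suits that task better.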
If you still want to use Calamari:
There are many parameters that could possibly affect the accuracy. You could try to increase the --batch_size, e.g. to 128. Moreover, a different network structure (--network) could be useful.
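A minimal sketch of assembling such a training call from Python. Only --batch_size and --network come from the advice above; the `calamari-train` entry point with a `--files` glob and the network spec string are assumptions about the Calamari 1.x command line (check `calamari-train --help` for your version), and the `train/*.png` path is hypothetical.

```python
import shlex
import shutil
import subprocess

# Assumed example of the 1.x --network spec syntax; not verified against your version.
NETWORK_SPEC = "cnn=40:3x3,pool=2x2,cnn=60:3x3,pool=2x2,lstm=200,dropout=0.5"


def build_train_command(files_glob, batch_size=128, network=NETWORK_SPEC):
    """Return the argument list for calamari-train (assumed 1.x flags)."""
    return [
        "calamari-train",
        "--files", files_glob,            # assumption: 1.x file-based input
        "--batch_size", str(batch_size),  # flag named in the advice above
        "--network", network,             # flag named in the advice above
    ]


def run(cmd, dry_run=True):
    """Print the command; only execute it if asked and the tool is installed."""
    print(shlex.join(cmd))
    if not dry_run and shutil.which(cmd[0]):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(build_train_command("train/*.png"))  # hypothetical path, dry run only
```

Keeping the command as a list (rather than one shell string) avoids quoting problems with the comma-separated network spec and the glob.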
You can also try to limit the alphabet (e.g. to 1000 characters, just for testing) to check whether Calamari is able to learn a smaller charset.
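One way to build such a reduced test set, sketched under the assumption that the ground truth is available as a list of text lines: keep the N most frequent characters and drop every line that contains anything outside that set. The function names are hypothetical helpers, not part of Calamari.

```python
from collections import Counter


def top_n_charset(lines, n):
    """Return the n most frequent characters (spaces included) across all lines."""
    counts = Counter(ch for line in lines for ch in line)
    return {ch for ch, _ in counts.most_common(n)}


def filter_lines(lines, charset):
    """Keep only lines whose characters all belong to charset."""
    return [line for line in lines if set(line) <= charset]


if __name__ == "__main__":
    lines = ["aab", "abc", "aaa", "bd"]
    charset = top_n_charset(lines, 2)
    print(sorted(charset), filter_lines(lines, charset))
```

For Korean text, `n=1000` would keep roughly the 1000 most common syllable blocks; the filtered lines (and their images) then form the smaller training set.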
from calamari.
@ChWick Thank you for your advice. I changed my data to sequences of characters.
@ChWick I trained using the data provided by Tesseract
(https://github.com/tesseract-ocr/langdata/blob/master/kor/kor.training_text).
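Calamari's file-based training input pairs each line image (e.g. `line_000000.png`) with a ground-truth file of the same basename ending in `.gt.txt`. The sketch below turns a line-per-sample text such as kor.training_text into those `.gt.txt` files. Rendering the matching images (for example with Tesseract's text2image) is a separate step not shown here, and the `line_` naming scheme is hypothetical.

```python
import tempfile
from pathlib import Path


def write_ground_truth(text, out_dir, prefix="line"):
    """Write one <prefix>_NNNNNN.gt.txt per non-empty line of text; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Strip surrounding whitespace and skip blank lines.
    lines = [raw.strip() for raw in text.splitlines() if raw.strip()]
    paths = []
    for i, line in enumerate(lines):
        path = out / f"{prefix}_{i:06d}.gt.txt"
        path.write_text(line, encoding="utf-8")  # UTF-8 keeps Hangul intact
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for p in write_ground_truth("10 연락 미용\n\n넷째 발표\n", d):
            print(p.name, p.read_text(encoding="utf-8"))
```

The image for each line must then be saved next to its `.gt.txt` file with the same basename so the two can be matched.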
Training works quite well.
#00070662: loss=1.50009847 ler=0.01098767 dt=0.05774912s
PRED: '10 연락 미용 톈진 강릉 끙 홍콩 월간 라 큰술 란 잇는 의회 쪄'
TRUE: '10 연락 미용 톈진 강릉 끙 홍콩 월간 라 큰술 란 잇는 의회 쪄'
#00070663: loss=1.54514930 ler=0.01124408 dt=0.05764973s
PRED: '넷째 발표 되며 ( 바향 모퉁이 세괌 16 뒤에 등 자료실 알뜰 늠름한'
TRUE: '넷째 발표 되며 ( 방향 모퉁이 세괌 16 뒤에 등 자료실 알뜰 늠름한'
#00070664: loss=1.40779295 ler=0.01045460 dt=0.05747745s
PRED: '카를로스 신지식 과 보다는 곳 수 바깥 역할 벼룩 질문 . 꿰어 중'
TRUE: '카를로스 신지식 과 보다는 곳 수 바깥 역할 벼룩 질문 . 꿰어 중'
#00070665: loss=1.44664021 ler=0.01071776 dt=0.05732183s
PRED: '쟌느 분 코뮌 디앤샵 건의 반침 19 헌법 법령 프톨레마이오스 > 골'
TRUE: '쟌느 분 코뮌 디앤샵 건의 방침 19 헌법 법령 프톨레마이오스 > 골'
#00070666: loss=1.44412356 ler=0.01071776 dt=0.05723174s
PRED: '17 숙박 조각 다룬다 커스텀 최저가 것이 사건 맥 답하기 뻘 탭'
TRUE: '17 숙박 조각 다룬다 커스텀 최저가 것이 사건 맥 답하기 뻘 탭'
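The `ler` value in the log is, as we understand it, the label (character) error rate: the edit distance between PRED and TRUE divided by the length of TRUE. How Calamari averages it across iterations is not shown in this log. A plain-Python sketch of the per-line computation, with NFC normalization so precomposed and decomposed Hangul compare equal:

```python
import unicodedata


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def char_error_rate(pred, truth):
    """Per-line character error rate after Unicode NFC normalization."""
    pred = unicodedata.normalize("NFC", pred)
    truth = unicodedata.normalize("NFC", truth)
    return levenshtein(pred, truth) / max(len(truth), 1)


if __name__ == "__main__":
    pred = "넷째 발표 되며 ( 바향 모퉁이 세괌 16 뒤에 등 자료실 알뜰 늠름한"
    truth = "넷째 발표 되며 ( 방향 모퉁이 세괌 16 뒤에 등 자료실 알뜰 늠름한"
    print(char_error_rate(pred, truth))
```

For iteration #00070663 the only mistake is '바' for '방', one substitution in a 39-character line, so the per-line rate is about 0.026.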
My sample prediction (on a sentence that is not in my training dataset) also looks good:
TRUE: 원대복귀 조치에 따라 둘은 육군으로 돌아가게 됐다.
PRE: 원대복귀 조치에 따라 둘은 육군으로 돌아가게 됐다.
Thanks again
P.S. Your README.md says that modules to segment pages into lines will be available soon, and in the meantime recommends using the OCRopy scripts. However, those scripts do not work that well for me.
When will this module be available?
@a41888936 I'm very glad you got this working! Unfortunately, the line segmentation part of our complete OCR workflow also relies on the OCRopy scripts, so this module won't help you either.
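For context on what a line segmenter has to do, here is a deliberately naive horizontal projection-profile sketch. This is not OCRopy's algorithm or anything shipped with Calamari; it assumes a deskewed, binarized page given as rows of 0/1 pixels (1 = ink) and simply cuts wherever a row contains no ink.

```python
def segment_lines(binary, min_height=1):
    """Return half-open (top, bottom) row ranges of text lines in a binary page."""
    profile = [sum(row) for row in binary]  # ink pixels per row
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink and start is None:
            start = y                      # a line begins
        elif not ink and start is not None:
            if y - start >= min_height:    # ignore specks shorter than min_height
                lines.append((start, y))
            start = None
    if start is not None and len(profile) - start >= min_height:
        lines.append((start, len(profile)))  # line touching the bottom edge
    return lines


if __name__ == "__main__":
    page = [[0, 0], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
    print(segment_lines(page))
```

Real pages with skew, touching ascenders and descenders, or multiple columns break this approach quickly, which is why dedicated tools such as the OCRopy scripts are used instead.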