Comments (20)

rohitsaluja22 commented on July 20, 2024

Thanks for your interest.
Remove the file CPair to turn off auto-corrections. Auto-corrections (and CPair) depend on the OCR system and domain.
I have no idea about CLA. If you have any issues related to Qt and its GUI, I can try to help.

from openocrcorrect.

NikhilKhuje2797 commented on July 20, 2024

Sir, if we load one text file in the GUI and then try to load the next page by clicking on "+", it doesn't work.

NikhilKhuje2797 commented on July 20, 2024

As per your test cases, when we load a text file the corresponding image gets loaded automatically, but that doesn't happen with my data.

rohitsaluja22 commented on July 20, 2024

The file names for the text file and the image should be the same, and they should follow the syntax "page-i.txt" and "page-i.jpeg", where i goes from 1 to no_of_pages.
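
A quick way to check a folder against this naming rule is sketched below. It is only an illustration and not part of openocrcorrect; the function name `check_pages` is made up here, and it assumes the names must match "page-i.txt" / "page-i.jpeg" exactly as described above.

```python
import re

# Pattern for the naming rule described above: page-i.txt / page-i.jpeg
PAGE_NAME = re.compile(r"^page-(\d+)\.(txt|jpeg)$")

def check_pages(names):
    """Return (page numbers that have both files, list of problems)."""
    found = {"txt": set(), "jpeg": set()}
    problems = []
    for name in names:
        m = PAGE_NAME.match(name)
        if m is None:
            problems.append(f"unexpected name: {name}")
        else:
            found[m.group(2)].add(int(m.group(1)))
    # Pages that have a text file but no image, or the other way round
    for i in sorted(found["txt"] ^ found["jpeg"]):
        missing = "jpeg" if i in found["txt"] else "txt"
        problems.append(f"page-{i}.{missing} is missing")
    return sorted(found["txt"] & found["jpeg"]), problems
```

To run it on a real folder, pass the file names, for example `check_pages(os.listdir("my_book"))` after `import os`, where `my_book` is a placeholder for your own data folder.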

rohitsaluja22 commented on July 20, 2024

You can move from page-1.txt to page-2.txt by clicking on "Page(CtrlShftR)>>". This will also change the image from page-1.jpeg to page-2.jpeg.
Use "Open" (to the right of +) to load only the first file. Do not use +; it loads only the text file.

NikhilKhuje2797 commented on July 20, 2024

Hello sir,
I have a doubt regarding the colour marking in the system.
As per the documentation and your test cases, only wrongly spelled words are marked in colour, and correct words appear in the normal colour. But with my data even correct words are colour-marked, so I cannot distinguish between wrong and correct words by looking at the colours.
The documentation says the "Correct" folder contains correct pages. So does this folder contain manually corrected samples?
What all is necessary if I want to run the same process on my data and get the same results as shown in yours?

Please guide.
Thank you

rohitsaluja22 commented on July 20, 2024

  • Which 2 OCR systems do you use? The quality of the colour coding depends on how different the models and training data of the two OCR systems are. The more different they are, the better the quality.

  • The samples were corrected using our software. For the demo, we cannot keep them in the folder "Corrected", so we just moved them from "Corrected" to "Correct".

  • Everything is given in the Readme. Read it carefully. I agree that it's tedious, but once understood it saves a lot of time.
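
To get a rough feel for how much two OCR outputs of the same page disagree (the first point above), something like the following can help. This is a sketch using Python's standard `difflib`; it is not the algorithm the tool itself uses for colour coding, and the function name `disagreement` is hypothetical.

```python
import difflib

def disagreement(ocr_a: str, ocr_b: str):
    """Return (words of ocr_a that ocr_b does not reproduce, similarity ratio)."""
    a, b = ocr_a.split(), ocr_b.split()
    # Compare word by word; autojunk=False avoids heuristics on long pages
    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    differing = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            differing.extend(a[i1:i2])
    return differing, sm.ratio()
```

A ratio close to 1.0 with almost no differing words means the two systems agree nearly everywhere on that page, including on their mistakes, so their disagreement carries little signal about which words are wrong.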

NikhilKhuje2797 commented on July 20, 2024

So sir, you used Indsenz and Google Doc OCR outputs to get the quality difference?

rohitsaluja22 commented on July 20, 2024

Yes, for Sanskrit. Which OCR systems are you using, and what language are you working on?

NikhilKhuje2797 commented on July 20, 2024

So, for better quality of spell checking, it is mandatory for me to use two different OCRs.
I am working only with the Hindi language.
So the folder Book3hindi contains outputs from two different OCRs?

NikhilKhuje2797 commented on July 20, 2024

I am using Tesseract OCR and Google Doc OCR.

NikhilKhuje2797 commented on July 20, 2024

Sir, I have combined my dictionary with yours, took sample pages converted by Google Doc and Tesseract OCR, and loaded them in the system. The same issue of wrong words not being shown in colour is happening. I have also created IEOCR and GEOCR folders for the data.
Please guide.
Thank you

rohitsaluja22 commented on July 20, 2024

Yes, you should try Indsenz and Tesseract, or Indsenz and Google Doc.
Tesseract and Google Doc are both from Google, which is probably the reason you are not getting good results.

Or send me your folder structure by mail. I can check if something else is wrong.

NikhilKhuje2797 commented on July 20, 2024

Indsenz offers only a premium version, which is not affordable for me. Can you suggest another OCR to use in combination with Tesseract?
Thank you

NikhilKhuje2797 commented on July 20, 2024

Sir, I have correct words in the file like पडे़, लडे़, पडे, लडे,
but when I click the spellcheck button, they automatically become पड़ए, लड़ए,
even though my dictionary doesn't contain these words (पड़ए, लड़ए).
What should I do to correct this?
Thank you
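
When two Devanagari words look alike on screen but behave differently, listing their Unicode code points can show whether the letters, vowel signs and nukta are stored in a different order. The snippet below is only an inspection aid built on the standard `unicodedata` module; it does not explain or change what the spellcheck button does, and the function name `codepoints` is made up for this example.

```python
import unicodedata

def codepoints(word):
    """Return (code point, Unicode name) for every character in word."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in word]

# Compare a word from the file with what the spellcheck produced
for word in ["पडे़", "पड़ए"]:
    print(word)
    for cp, name in codepoints(word):
        print("   ", cp, name)
```

Running it on a word before and after the spellcheck shows exactly which characters were replaced or reordered, which is useful detail to include when reporting the problem.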

NikhilKhuje2797 commented on July 20, 2024

Sir, the tool is working well now. I have set up my data according to the standard names.
Thank you.

rohitsaluja22 commented on July 20, 2024

Cool, all the best.
Please reply with which OCR engines you are using and then close the issue.

NikhilKhuje2797 commented on July 20, 2024

Tesseract-OCR and Google Doc OCR.
Thank you
