Comments (5)

GFoley83 avatar GFoley83 commented on June 3, 2024

@Tentacule Just bumping this one.

I ran a quick test with the latest version of PgsToSrt and Subtitle Edit with Tesseract 5.3.3.
I can't seem to replicate the accuracy of Subtitle Edit using PgsToSrt, even with "Fix OCR errors" unchecked. Some of the conversions, which should be pretty basic, come out as gibberish (see screenshot).

Command I'm using for PgsToSrt is:
dotnet PgsToSrt.dll --input "file.mkv" --tracklanguage eng --tesseractdata "C:\Program Files\Tesseract-OCR\tessdata" --tesseractlanguage eng

Any ideas here?

image

image

from pgstosrt.

Tentacule avatar Tentacule commented on June 3, 2024

I won't add "Fix OCR errors" for now because this functionality is not included in LibSE.

I have done some tests. It looks like an issue on Windows; it works fine when run on Linux. I'll investigate.

GFoley83 avatar GFoley83 commented on June 3, 2024

I just flicked you an email.

I don't think "Fix OCR errors" will make a difference anyway as I had it disabled in SE (see first screenshot) and it still converted the PGS subs almost perfectly. Issue is something else.

Thanks for looking into it.

Tentacule avatar Tentacule commented on June 3, 2024

There was an issue in the Windows Tesseract DLL. I tried another one and it looks good now.

Here is a new release with this change: PgsToStr-1.4.5.zip

GFoley83 avatar GFoley83 commented on June 3, 2024

Can confirm that 1.4.5 fixes it. Does a much better job at conversion with no random gibberish to be seen.
Tested with eng.traineddata from:

Commands:

dotnet "PgsToSrt-1.4.5\\PgsToSrt.dll" --input "file.mkv" --tracklanguage eng --tesseractdata "C:\\Program Files\\Tesseract-OCR\\tessdata" --tesseractlanguage eng

dotnet "PgsToSrt-1.4.5\\PgsToSrt.dll" --input "file.mkv" --tracklanguage eng --tesseractdata "C:\\Program Files\\Tesseract-OCR\\tessdata_best" --tesseractlanguage eng

In the test I ran on the English subtitles for the movie Blade, tessdata_best took just under 4 minutes with mostly perfect results, while tessdata took 1 minute and made only a few very minor mistakes, e.g. a capital "I" instead of "i".
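
For anyone who wants to repeat the tessdata vs tessdata_best timing comparison above, here is a minimal Python sketch. It only uses the PgsToSrt flags quoted in this thread (--input, --tracklanguage, --tesseractdata, --tesseractlanguage). The helper names build_command, time_run and compare are made up for illustration and are not part of PgsToSrt, and the paths you pass in are assumptions about your own setup.

```python
import subprocess
import time


def build_command(dll, input_file, tessdata_dir, language="eng"):
    """Build the PgsToSrt argument list using only the flags quoted in this thread."""
    return [
        "dotnet", str(dll),
        "--input", str(input_file),
        "--tracklanguage", language,
        "--tesseractdata", str(tessdata_dir),
        "--tesseractlanguage", language,
    ]


def time_run(command):
    """Run one conversion and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)  # raises CalledProcessError on a non-zero exit
    return time.perf_counter() - start


def compare(dll, input_file, tessdata_dirs):
    """Time the same conversion once per tessdata directory (hypothetical helper)."""
    return {
        str(directory): time_run(build_command(dll, input_file, directory))
        for directory in tessdata_dirs
    }
```

Calling compare with the DLL path, "file.mkv" and the two tessdata directories from the commands above should return a dict of elapsed seconds per directory. If both runs write to the same output file, copy the .srt aside between runs so the accuracy can be compared as well as the time.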
