Comments (5)
@Tentacule Just bumping this one.
I've run a bit of a test with the latest version of PgsToSrt and Subtitle Edit with Tesseract 5.3.3.
I can't seem to replicate the accuracy of Subtitle Edit using PgsToSrt, even with "Fix OCR errors" unchecked. Some of the conversions, which should be pretty basic, come out as gibberish (see screenshot).
Command I'm using for PgsToSrt is:
dotnet PgsToSrt.dll --input "file.mkv" --tracklanguage eng --tesseractdata "C:\Program Files\Tesseract-OCR\tessdata" --tesseractlanguage eng
Any ideas here?
from pgstosrt.
I won't add "Fix OCR errors" for now because this functionality is not included in LibSE.
I have done some tests and it looks like an issue on Windows; it works fine when run on Linux. I'll investigate.
from pgstosrt.
I just flicked you an email.
I don't think "Fix OCR errors" will make a difference anyway as I had it disabled in SE (see first screenshot) and it still converted the PGS subs almost perfectly. Issue is something else.
Thanks for looking into it.
from pgstosrt.
There was an issue with the Windows Tesseract DLL. I tried another one and it looks good now.
Here is a new release with this change: PgsToStr-1.4.5.zip
from pgstosrt.
Can confirm that 1.4.5 fixes it. Does a much better job at conversion with no random gibberish to be seen.
Tested with eng.traineddata from both tessdata and tessdata_best.
Commands:
dotnet "PgsToSrt-1.4.5\\PgsToSrt.dll" --input "file.mkv" --tracklanguage eng --tesseractdata "C:\\Program Files\\Tesseract-OCR\\tessdata" --tesseractlanguage eng
dotnet "PgsToSrt-1.4.5\\PgsToSrt.dll" --input "file.mkv" --tracklanguage eng --tesseractdata "C:\\Program Files\\Tesseract-OCR\\tessdata_best" --tesseractlanguage eng
On the test I ran with the English subtitles for the movie Blade, tessdata_best took just under 4 minutes with mostly perfect results, while tessdata took 1 minute and had only a few very minor mistakes, e.g. capital "I" instead of "i".
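For anyone who wants to repeat the tessdata vs tessdata_best timing on their own files, here is a rough Python sketch. It only wraps the same command line shown above and measures wall-clock time. The helper names (build_command, time_run, compare) are made up for illustration and are not part of PgsToSrt, and the paths are the ones from my machine, so adjust them.

```python
import subprocess
import time


def build_command(dll, input_file, tessdata_dir, language="eng"):
    """Build the PgsToSrt command line using only the flags shown in this thread."""
    return [
        "dotnet", dll,
        "--input", input_file,
        "--tracklanguage", language,
        "--tesseractdata", tessdata_dir,
        "--tesseractlanguage", language,
    ]


def time_run(command, runner=subprocess.run):
    """Run the command and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    runner(command, check=True)  # raises CalledProcessError if the conversion fails
    return time.perf_counter() - start


def compare(dll, input_file, tesseract_root, models=("tessdata", "tessdata_best")):
    """Time one conversion per model directory (hypothetical helper)."""
    results = {}
    for model in models:
        # Assumes the model folders sit side by side under the Tesseract install dir.
        command = build_command(dll, input_file, tesseract_root + "\\" + model)
        results[model] = time_run(command)
    return results
```

Calling compare(r"PgsToSrt-1.4.5\PgsToSrt.dll", "file.mkv", r"C:\Program Files\Tesseract-OCR") would run both conversions in turn. Both runs presumably write to the same default output file, so copy the .srt aside between runs if you want to compare the text as well as the timing.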
from pgstosrt.