Comments (9)
Yep, you nailed it! I mounted only the big eng model file (not the directory), and it works!
from stirling-pdf.
Sounds like a good fix. By the way, when I convert a PDF to Word after first running OCR on it, the OCR layer disappears and the Word document contains only an image. Is this intended?
Also, amazing app! You should post it on r/selfhosted. A PDF manipulator is one of the most requested apps I've seen over the years. The closest is hrconvert, but that doesn't work very well and its UI is dated.
Weird. Are you able to give me the PDF to test against, or is it private? I can't reproduce this on my side.
Maybe it's how the bigger eng data works. I will test with that as well tonight or tomorrow.
Since I am using this at work, that PDF is private. However, it happens with every PDF I have tried. I will try the fast model and let you know.
Edit: Same issue with the fast model. Is there any way I can send you debug logs or similar? Just point me in the right direction.
I think I found the issue.
Mounting that directory removes some needed files that are already there, so you end up missing them.
I will change the Dockerfile this weekend to ensure those files are kept on mount.
I have a fix that creates a temp folder during the build and copies everything from it to the final folder on container startup.
It does mean you won't be able to delete any of the old files in that folder, but you can now add new ones without problems.
I'm also renaming eng.traineddata to English-Lite.traineddata.
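For anyone curious why deleting the default files won't stick with this kind of fix, here is a minimal Python sketch of the startup copy step. It assumes the step simply copies every file from the build-time temp folder into the (possibly volume-mounted) final folder and overwrites same-named files. The function name restore_defaults and the folder arguments are hypothetical; the real fix lives in the project's Docker setup and may work differently.

```python
import shutil
from pathlib import Path


def restore_defaults(staging_dir: Path, target_dir: Path) -> list[str]:
    """Hypothetical startup step: copy every file from the build-time
    staging folder into the target folder (e.g. a mounted volume).

    Files the user added to the target are left untouched. Default files
    are copied back on every start, so deleting them has no lasting effect.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(staging_dir.rglob("*")):
        if src.is_file():
            rel = src.relative_to(staging_dir)
            dest = target_dir / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            # Assumption: same-named files in the target are overwritten.
            shutil.copy2(src, dest)
            copied.append(rel.as_posix())
    return copied
```

Running it a second time after removing a default file from the target brings that file back, while any extra model files you mounted alongside it are kept.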
It's not ideal, but the backend for PDF-to-Word conversion is LibreOffice, so sadly I can't change how it handles the OCR layer.
You would have to raise an issue with LibreOffice directly to get that fixed.
The OCR tech I'm using in the backend has different ways of rendering the text; see here.
So I will try this use case and see if I can get it working. I'll track the issue here: https://github.com/Frooodle/Stirling-PDF/issues/118
Also, thanks for the comments! I did post on Reddit when I first started this app:
https://www.reddit.com/r/selfhosted/comments/10pexhn/new_browserbased_pdf_editor_github_link/
I plan to post again when I release V1.0.0 (once I finally add PDF cropping, PDF signing, and improved PDF image importing).
Feel free to make a post for me, though :') haha
Fixed with extra language support in the latest patch.