Comments (18)
Okay, so after diving into pdftoppm error output, I am afraid I do not want to raise the MissingFontError.
The thing is that pdftoppm is configured to use a fallback when something fishy is up with the font. People using pdftoppm right now could end up with new exceptions in their applications.
What I did is add a new strict parameter to the function call, which you must set to True. If you want to catch the MissingFontError exception, it will be under the PDFSyntaxError name, and it can also be thrown if your PDF is malformed.
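For anyone reading along, here is a minimal sketch of opting in to the strict behaviour. It assumes a pdf2image release that has the strict parameter (the thread suggests 1.2.0 onward) and a working poppler install. The helper name pages_or_none is made up for illustration, and the imports sit inside the function so that defining it does not require pdf2image.

```python
def pages_or_none(pdf_path):
    """Convert a PDF in strict mode; return None if poppler reported a problem.

    Hypothetical helper, not part of pdf2image.
    """
    # Imported lazily so this sketch can be defined without pdf2image installed.
    from pdf2image import convert_from_path
    from pdf2image.exceptions import PDFSyntaxError

    try:
        # strict=True asks pdf2image to raise instead of silently continuing.
        return convert_from_path(pdf_path, strict=True)
    except PDFSyntaxError as exc:
        # Per the comment above, a missing font and a malformed PDF
        # both surface under this exception name.
        print(f"pdftoppm reported a problem: {exc}")
        return None
```

Calling pages_or_none("test.pdf") should then either return the list of page images or print the reported problem and return None.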
from pdf2image.
Are you familiar with docker? It's going to be hard to give you a test pdf as I have no idea what fonts you have on your system. But I could create an example script/pdf in a docker container for you.
Got it.
I think pdf2image should raise a custom MissingFont exception in that case. Would that be satisfying to you?
I know the maintainers are active on IRC, I'll try to contact them.
You can expect a "fixed" version in the next few days.
User tsdgeos recommended opening a bug, so I'll do that. In the meantime, most Linux installations do not ship with the latest version of poppler, so fixing it in a future version would yield very little value for most people. I'll probably go forward with matching the error text for this version and change the error handling should a more tasteful solution present itself.
EDIT: Opened an issue on their repo, https://gitlab.freedesktop.org/poppler/poppler/issues/682
Closing for lack of activity and I think this is reasonably fixed. Feel free to reopen if you experience related issues.
Interesting, could you provide a sample PDF to test a potential fix?
Oh yeah a test case in a docker container would be awesome!
Cool, here's the docker container and steps to reproduce.
Start bash in the container. (This has your library and Python 3 installed.)
    docker run -it bpdev97/pdf2image-font-example /bin/bash
Create a PNG from test.pdf:
    pdftoppm -r 150 -png test.pdf out
You should see a bunch of font missing errors. pdf2image masks these errors.
Yeah for sure, that would work great. Thanks for being so responsive!
So I did a quick prototype:
    for uid, proc in processes:
        data, err = proc.communicate()

        # pdftoppm reports a missing font on stderr
        if b"Couldn't find a font for" in err:
            raise MissingFontError()

        if output_folder is not None:
            images += __load_from_output_folder(output_folder, uid)
        else:
            images += parse_buffer_func(data)
Which returns:
    Traceback (most recent call last):
      File "test2.py", line 3, in <module>
        convert_from_path('../test.pdf')
      File "/app/pdf2image/pdf2image/pdf2image.py", line 71, in convert_from_path
        raise MissingFontError()
    pdf2image.exceptions.MissingFontError
But I dislike the reliance on someone at poppler not changing the error message...
Thoughts? I'm looking into the docs to find an error code that would be less prone to change.
I'll dig a little more, but I didn't see anything earlier that you could catch other than that error text. I agree it sucks if you have to rely on them not changing the error message though.
I thought you could maybe use the exit code of pdftoppm, but it appears to return 0 even though the man page says otherwise. Okay cool, thank you!
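To check the exit-code behaviour yourself, a sketch along these lines captures both the return code and stderr of a child process. The helper run_capture and the stand-in child command are assumptions for illustration (the child just writes a made-up font message to stderr and exits normally); swap in a real pdftoppm command line, such as the one from the reproduction steps, if poppler is installed.

```python
import subprocess
import sys


def run_capture(cmd):
    """Run cmd and return (exit_code, stderr_bytes). Hypothetical helper."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return proc.returncode, proc.stderr


# Stand-in for pdftoppm: complains on stderr but still exits with status 0.
child_code = 'import sys; sys.stderr.write("Couldn\'t find a font for ExampleFont\\n")'
code, err = run_capture([sys.executable, "-c", child_code])

# A zero exit code alone does not prove that stderr is empty.
print("exit code:", code)
print("stderr:", err.decode("utf8", "ignore").strip())
```

If the real tool behaves like the stand-in, the exit code says nothing about the font problem, and inspecting stderr is the only signal left.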
That will work, thanks!
Version 1.2.0 is now available on PyPI.
Feel free to close the issue once you have confirmed it works as expected.
Thank you.
Okay, I looked at your fix. That should work; however, one improvement I would make is to also bubble up the error text in the exception.
Excellent idea, I will push an updated version as soon as I can.
Version 1.2.1 is now available on PyPI.
The exception message is now the stderr of pdftoppm.
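Roughly, carrying stderr in the exception could look like the sketch below. This is not the library's actual code: the MissingFontError class here is a local stand-in, check_stderr is a hypothetical helper, and the sample stderr bytes are made up.

```python
class MissingFontError(Exception):
    """Local stand-in for illustration; not imported from pdf2image."""


# Marker text taken from the prototype earlier in this thread.
FONT_MARKER = b"Couldn't find a font for"


def check_stderr(err: bytes) -> None:
    """Raise with the decoded stderr as the message if the marker is present."""
    if FONT_MARKER in err:
        # Undecodable bytes are dropped rather than raising a second error.
        raise MissingFontError(err.decode("utf8", "ignore"))


try:
    # Made-up stderr content, shaped like the message matched above.
    check_stderr(b"Couldn't find a font for 'ExampleFont'\n")
except MissingFontError as exc:
    message = str(exc)
    print(message)
```

The caller then sees which font was involved straight from the exception message instead of having to rerun pdftoppm by hand.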