Comments (18)

Belval avatar Belval commented on July 29, 2024 3

Okay, so after diving into pdftoppm error output, I am afraid I do not want to raise the MissingFontError.

The thing is that pdftoppm is configured to use a fallback when something fishy is up with the font. People using pdftoppm right now could end up with new exceptions in their applications.

What I did instead is add a new strict parameter to the function call; you must set it to True to get exceptions. The missing-font case is then raised as PDFSyntaxError rather than MissingFontError, and the same exception can also be thrown if your PDF is malformed.
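To illustrate the opt-in idea, here is a minimal sketch of gating an exception behind a strict flag so that existing callers see no new errors. It is not pdf2image's actual code: `check_stderr`, `FONT_MARKER`, and the sample stderr line are hypothetical, and `PDFSyntaxError` is a local stand-in so the snippet runs without the library installed.

```python
class PDFSyntaxError(Exception):
    """Local stand-in for the exception named in this thread."""


# Marker text taken from the prototype posted in this thread.
# Assumption: poppler keeps this wording.
FONT_MARKER = b"Couldn't find a font for"


def check_stderr(stderr: bytes, strict: bool = False) -> None:
    """Raise only when the caller opted in with strict=True."""
    if strict and FONT_MARKER in stderr:
        raise PDFSyntaxError()


# Made-up stderr line for illustration; the real format may differ.
sample = b"Couldn't find a font for 'Helvetica'\n"

# Default behaviour: the warning is ignored, so old callers are unaffected.
check_stderr(sample)

# Opt-in behaviour: the same warning now surfaces as an exception.
try:
    check_stderr(sample, strict=True)
except PDFSyntaxError:
    print("strict mode raised PDFSyntaxError")
```

Keeping the default at False is what protects existing applications: only callers who ask for strict behaviour need to handle the new exception.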

from pdf2image.

bpdev97 avatar bpdev97 commented on July 29, 2024 1

Are you familiar with docker? It's going to be hard to give you a test pdf as I have no idea what fonts you have on your system. But I could create an example script/pdf in a docker container for you.

Belval avatar Belval commented on July 29, 2024 1

Got it.

I think pdf2image should raise a custom MissingFont exception in that case. Would that be satisfying to you?

Belval avatar Belval commented on July 29, 2024 1

I know the maintainers are active on IRC, I'll try to contact them.

You can expect a "fixed" version in the next few days.

Belval avatar Belval commented on July 29, 2024 1

User tsdgeos recommended opening a bug, so I'll do that. In the meantime, most Linux installations do not ship with the latest version of poppler, so a fix in a future poppler release would yield very little value for most people. I'll probably go forward with matching the error text for now and change the error handling should a more tasteful solution present itself.

screenshot from 2018-12-12 10-03-27

EDIT: Opened an issue on their repo, https://gitlab.freedesktop.org/poppler/poppler/issues/682

Belval avatar Belval commented on July 29, 2024 1

Closing for lack of activity and I think this is reasonably fixed. Feel free to reopen if you experience related issues.

Belval avatar Belval commented on July 29, 2024

Interesting, could you provide a sample PDF to test a potential fix?

Belval avatar Belval commented on July 29, 2024

Oh yeah a test case in a docker container would be awesome!

bpdev97 avatar bpdev97 commented on July 29, 2024

Cool, here's the docker container and steps to reproduce.

Start bash in the container. (This has your library and python 3 installed)
docker run -it bpdev97/pdf2image-font-example /bin/bash

Create a png from test.pdf
pdftoppm -r 150 -png test.pdf out

You should see a bunch of font missing errors. pdf2image masks these errors.

bpdev97 avatar bpdev97 commented on July 29, 2024

Yeah for sure, that would work great. Thanks for being so responsive!

Belval avatar Belval commented on July 29, 2024

So I did a quick prototype:

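for uid, proc in processes:
    # Wait for pdftoppm to finish and collect its stdout and stderr
    data, err = proc.communicate()

    # Fail loudly if pdftoppm reported a font it could not find
    if b"Couldn't find a font for" in err:
        raise MissingFontError()

    if output_folder is not None:
        images += __load_from_output_folder(output_folder, uid)
    else:
        images += parse_buffer_func(data)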

Which returns:

Traceback (most recent call last):
  File "test2.py", line 3, in <module>
    convert_from_path('../test.pdf')
  File "/app/pdf2image/pdf2image/pdf2image.py", line 71, in convert_from_path
    raise MissingFontError()
pdf2image.exceptions.MissingFontError

But I dislike the reliance on someone at poppler not changing the error message...

Thoughts? I am looking through the docs for an error code that would be less prone to change.

bpdev97 avatar bpdev97 commented on July 29, 2024

I'll dig a little more, but I didn't see anything earlier that you could catch other than that error text. I agree it sucks if you have to rely on them not changing the error message though.

bpdev97 avatar bpdev97 commented on July 29, 2024

I thought you could maybe use the exit code of pdftoppm but it appears to return 0 even though the man page says otherwise. Okay cool, thank you!

screen shot 2018-12-11 at 10 54 39 pm
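To make the exit-code observation concrete, here is a small sketch that runs a child process and reports both its exit code and its stderr. The helper name `run_and_report` and the stand-in child command are hypothetical; the stand-in only mimics a tool that prints a warning yet exits with 0, so the snippet runs without poppler installed.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (exit_code, stderr_text)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return proc.returncode, proc.stderr.decode("utf-8", "ignore")


# Stand-in child: writes a warning to stderr but still exits with 0.
# With poppler installed, cmd could instead be the pdftoppm call from the
# reproduction steps: ["pdftoppm", "-r", "150", "-png", "test.pdf", "out"].
fake_tool = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('font warning\\n')",
]

code, err = run_and_report(fake_tool)
print("exit code:", code)
print("stderr:", repr(err))
```

When the exit code stays at 0, the only remaining signal is the stderr text, which is why the thread falls back to matching the message.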

bpdev97 avatar bpdev97 commented on July 29, 2024

That will work, thanks!

Belval avatar Belval commented on July 29, 2024

Version 1.2.0 is now available on PyPI.

Feel free to close the issue once you have ensured it works as expected.

Thank you.

bpdev97 avatar bpdev97 commented on July 29, 2024

Okay, I looked at your fix. That should work; however, one improvement I would make is to also bubble up the error text in the exception.

Belval avatar Belval commented on July 29, 2024

Excellent idea, I will push an updated version as soon as I can.

Belval avatar Belval commented on July 29, 2024

Version 1.2.1 is now available on PyPI.

The exception message is now the stderr of pdftoppm.
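As a rough illustration of what carrying stderr in the exception message gives the caller, the sketch below raises an exception whose message is the decoded stderr and then reads it back with `str(exc)`. The helpers `raise_with_stderr` and `describe_failure` are hypothetical, and `PDFSyntaxError` is a local stand-in rather than the class shipped by pdf2image.

```python
class PDFSyntaxError(Exception):
    """Local stand-in so the sketch runs without pdf2image installed."""


def raise_with_stderr(stderr: bytes) -> None:
    # Decode leniently: stderr may contain bytes that are not valid UTF-8.
    raise PDFSyntaxError(stderr.decode("utf-8", "ignore"))


def describe_failure(stderr: bytes) -> str:
    """Show how a caller can surface the tool's own error text."""
    try:
        raise_with_stderr(stderr)
    except PDFSyntaxError as exc:
        return f"conversion failed: {exc}"


print(describe_failure(b"Couldn't find a font for 'Helvetica'"))
```

With the message attached, a caller can log or display the tool's own wording instead of a bare exception name.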
