Comments (9)

ionFreeman commented on September 25, 2024

from pdftotext.

jalan commented on September 25, 2024

Glad you like the library!

I agree that it would be nice to have wheels built automatically for all platforms, but the tools in this space are lacking:

  • cibuildwheel is in beta, but it seems stable enough for building wheels. Because we require shared libraries here, we also need to bundle those, which raises the problems below
  • auditwheel can help with the Linux bundling, but it doesn't seem to get much use, and its build is currently failing
  • delocate can help with the macOS bundling, but it identifies itself as alpha quality and sees little use in the wild
  • there doesn't seem to be any tool out there to help with the bundling on Windows, so that would have to be done manually anyway. I don't own any Windows system to troubleshoot on
  • should we just bundle poppler itself, or all its dependencies (libtiff, libpng, fontconfig, freetype, ...) too?
  • what versions of libc and libstdc++ should we build with for Linux? To get good compatibility across lots of distros, we have to build on something as old as CentOS 5. But that means we have to use an older version of poppler, complete with all its bugs

As long as the Python packaging community doesn't really address this issue (they haven't updated the relevant documentation for over five years!) and the tooling is in a sad state, I would rather keep the status quo. It requires one extra step by the user, but it is simple and reliable and should work on any Unix-like system that has poppler.

Mattwmaster58 commented on September 25, 2024

I didn't realize that this was such a broad issue. Thank you for sharing your position on this. Unfortunately I'm not very experienced when it comes to packaging binary extensions/building them, so thus far I've been unable to build this on Windows.

SonGokussj4 commented on September 25, 2024

Hi. I've got a problem. This is the only package that works for my type of PDFs.

I have developer rights on my PC, so I've installed python3-dev, the poppler dev packages, and so on.
I installed pdftotext into my virtual environment and it runs without a problem.

But when my colleague opens my shared folder from a different PC (he doesn't have dev rights) and activates the environment, import pdftotext fails because it can't find libpoppler-cpp.so.

Can it be copied somewhere? Or... How would I provide this script to Linux colleagues?
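
One way to narrow this down on the colleague's machine is to check whether the system can locate the poppler C++ library at all before importing the module. The following is only a rough diagnostic sketch: the helper names are made up for illustration, and `ctypes.util.find_library` uses its own search heuristics, which may not match exactly what the dynamic loader does when Python imports the extension.

```python
import ctypes.util


def find_shared_library(name):
    """Return what the system reports for a library, or None if not found.

    `name` is given without the "lib" prefix or the ".so" suffix.
    """
    return ctypes.util.find_library(name)


def explain_lookup(name):
    """Build a short, human-readable message about a library lookup."""
    found = find_shared_library(name)
    if found is None:
        return (
            f"lib{name} not found; install it system-wide or add its "
            f"directory to LD_LIBRARY_PATH"
        )
    return f"lib{name} found as {found}"


if __name__ == "__main__":
    # "poppler-cpp" corresponds to the libpoppler-cpp.so named in the error.
    print(explain_lookup("poppler-cpp"))
```

If the lookup fails, the virtual environment alone is not enough: the shared folder holds only the compiled Python extension, while poppler itself lives in the system library directories of the PC where it was installed.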

ionFreeman commented on September 25, 2024

> I didn't realize that this was such a broad issue. Thank you for sharing your position on this. Unfortunately I'm not very experienced when it comes to packaging binary extensions/building them, so thus far I've been unable to build this on Windows.

I'm currently unable to build it on Windows. Did you ever figure it out? I guess building Poppler, really, is the problem, which seems to work for these guys.

jalan commented on September 25, 2024

If you need to use this library on Windows, right now you can

  • use conda to install poppler as described in the README, or
  • build poppler yourself and figure out how to get it all working. See #72 for one example

I don't have or use Windows, so even getting it working in conda was a challenge!

The link you provided is for tsdgeos/poppler_mirror. tsdgeos is the maintainer of poppler, so of course he knows how to build it on Windows! 😄 This issue is more about the difficulty of packaging and distributing, and since I don't have Windows, well...

bauerj commented on September 25, 2024

> there doesn't seem to be any tool out there to help with the bundling on Windows, so that would have to be done manually anyway.

That used to be true but now there is https://github.com/adang1345/delvewheel 🥳

> delvewheel is a command-line tool for creating Python wheel packages for Windows that have DLL dependencies that may not be present on the target system. It is functionally similar to auditwheel (for Linux) and delocate (for Mac OS).

It should be relatively easy to integrate into cibuildwheel using something like CIBW_REPAIR_WHEEL_COMMAND=delvewheel repair.

I'm going to look into this if no one else is doing so already.
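
As a rough sketch of that integration, the snippet below prepares the environment variables and invokes cibuildwheel from Python. The variable names and the `{dest_dir}` / `{wheel}` placeholders come from cibuildwheel's documented options. The helper names are made up for illustration, and a real build would probably also need to tell delvewheel where the poppler DLLs live (for example through its `--add-path` option), which is not shown here.

```python
import os
import subprocess
import sys


def delvewheel_env(base_env=None):
    """Return a copy of the environment with delvewheel set as the repair step."""
    env = dict(os.environ if base_env is None else base_env)
    # Install delvewheel into the build environment before each build.
    env["CIBW_BEFORE_BUILD_WINDOWS"] = "pip install delvewheel"
    # cibuildwheel substitutes {dest_dir} and {wheel} itself.
    env["CIBW_REPAIR_WHEEL_COMMAND_WINDOWS"] = (
        "delvewheel repair -w {dest_dir} {wheel}"
    )
    return env


def build_windows_wheels(project_dir="."):
    """Run cibuildwheel for Windows; requires cibuildwheel to be installed."""
    cmd = [sys.executable, "-m", "cibuildwheel", "--platform", "windows", project_dir]
    return subprocess.run(cmd, env=delvewheel_env(), check=True)
```

Calling `build_windows_wheels()` on a Windows machine or CI runner would then build the wheels and pass each one through `delvewheel repair`.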

> should we just bundle poppler itself, or all its dependencies (libtiff, libpng, fontconfig, freetype, ...) too?

Generally speaking, a binary wheel should contain all dependencies that would not be found on a vanilla system.
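
To see what a given wheel actually bundles, it helps to remember that a wheel is just a zip archive. The sketch below lists the shared libraries inside one. The function names and the suffix heuristic are assumptions for illustration, not part of any packaging tool.

```python
import posixpath
import zipfile

# Suffixes that usually mark native libraries on Windows and macOS.
NATIVE_SUFFIXES = (".dll", ".pyd", ".dylib")


def is_shared_library(archive_name):
    """Guess from the file name whether an archive entry is a native library."""
    base = posixpath.basename(archive_name).lower()
    # Linux libraries may carry a version suffix, e.g. libfoo.so.1
    return base.endswith(NATIVE_SUFFIXES) or base.endswith(".so") or ".so." in base


def bundled_libraries(wheel):
    """Return the sorted names of shared libraries inside a wheel.

    `wheel` may be a path or a file-like object, as accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(wheel) as archive:
        return sorted(n for n in archive.namelist() if is_shared_library(n))
```

Running `bundled_libraries` on a repaired wheel should show poppler and its dependencies next to the extension module, while an unrepaired wheel would list only the extension module itself.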

> what versions of libc and libstdc++ should we build with for linux?

There are three different compatibility sets (and corresponding OS images): manylinux1, manylinux2010 and manylinux2014. If possible, manylinux1 (the oldest) should be targeted.
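
Each of these sets is defined by a minimum glibc version: 2.5 for manylinux1 (PEP 513), 2.12 for manylinux2010 (PEP 571) and 2.17 for manylinux2014 (PEP 599). As an illustrative sketch with made-up helper names, the following checks which sets a given system could run. It looks only at glibc and ignores the other requirements in those PEPs, such as the libstdc++ symbol versions.

```python
import platform

# Minimum glibc version required by each manylinux compatibility set.
MANYLINUX_GLIBC = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}


def compatible_policies(glibc_version):
    """Return the manylinux sets whose glibc floor is met by `glibc_version`."""
    return [
        name
        for name, minimum in MANYLINUX_GLIBC.items()
        if glibc_version >= minimum
    ]


def local_glibc_version():
    """Return the local glibc version as (major, minor), or None if not glibc."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    major, minor = version.split(".")[:2]
    return (int(major), int(minor))


if __name__ == "__main__":
    version = local_glibc_version()
    if version is None:
        print("not a glibc system")
    else:
        print(version, compatible_policies(version))
```

The older the targeted set, the more systems can install the wheel, which is why manylinux1 is the preferred target when the build allows it.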

> But then that means we have to use an older version of poppler, complete with all its bugs

Not necessarily. It should be possible to compile poppler against older libc versions.

In general, I agree with your assessment of the wheel packaging toolchain. It really is a mess.

bauerj commented on September 25, 2024

Okay, so I managed to get Windows builds working. This actually revealed an issue in delvewheel but @adang1345 released a fixed version immediately!

I will send in a PR shortly.

grahamperrin commented on September 25, 2024
% uname -sKU
FreeBSD 1400053 1400053
% which pdftotext
/usr/local/bin/pdftotext
% pkg provides /usr/local/bin/pdftotext
Name    : poppler-utils-21.12.0
Desc    : Poppler's xpdf-workalike command line utilities
Repo    : FreeBSD
Filename: usr/local/bin/pdftotext
% pkg info --list textproc/py-pdftotext
py38-pdftotext-2.2.2:
        /usr/local/lib/python3.8/site-packages/pdftotext-2.2.2-py3.8.egg-info/PKG-INFO
        /usr/local/lib/python3.8/site-packages/pdftotext-2.2.2-py3.8.egg-info/SOURCES.txt
        /usr/local/lib/python3.8/site-packages/pdftotext-2.2.2-py3.8.egg-info/dependency_links.txt
        /usr/local/lib/python3.8/site-packages/pdftotext-2.2.2-py3.8.egg-info/top_level.txt
        /usr/local/lib/python3.8/site-packages/pdftotext.cpython-38.so
        /usr/local/share/licenses/py38-pdftotext-2.2.2/LICENSE
        /usr/local/share/licenses/py38-pdftotext-2.2.2/MIT
        /usr/local/share/licenses/py38-pdftotext-2.2.2/catalog.mk
% 

https://www.freshports.org/graphics/poppler-utils/

https://www.freshports.org/textproc/py-pdftotext/
