jcushman commented on July 29, 2024

Do you have example code that reproduces this error? pdf.load() is working for me with your supplied file.

adamestein commented on July 29, 2024

I'm assuming you have a little script that passes the file to pdf.load(). Could you attach it? That way I can run exactly what you did with the same file. If it still fails for me, that could indicate something other than the file is causing the issue; if it works, I can trace the difference between what you did and what my code is doing. It's also possible that by removing the private information from the PDF file, I also removed whatever was causing the problem. It's been so long since I submitted this that I don't remember. I think I verified that the issue still occurred with the PDF I attached, but I can't say for sure.

jacksongs commented on July 29, 2024

I seem to be getting this error as well. PDF file here. I have also tested on this

  • pdfquery 0.4.3
  • pdfminer 20170720
  • python 3.6
  • OSX 10.12.6

Error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-6d31003dedab> in <module>()
     13     pdf = pdfquery.PDFQuery("../"+name)
     14     pdf.load()
---> 15     tree = pdf.get_tree()
     16     #tree.write("current.xml", pretty_print=True)
     17 

~/anaconda3/lib/python3.6/site-packages/pdfquery/pdfquery.py in get_tree(self, *page_numbers)
    485                 else:
    486                     pages = enumerate(self.get_layouts())
--> 487                 for n, page in pages:
    488                     page = self._xmlize(page)
    489                     page.set('page_index', obj_to_string(n))

~/anaconda3/lib/python3.6/site-packages/pdfquery/pdfquery.py in <genexpr>(.0)
    606     def get_layouts(self):
    607         """ Get list of PDFMiner Layout objects for each page. """
--> 608         return (self.get_layout(page) for page in self._cached_pages())
    609 
    610     def _cached_pages(self, target_page=-1):

~/anaconda3/lib/python3.6/site-packages/pdfquery/pdfquery.py in get_layout(self, page)
    601         self.interpreter.process_page(page)
    602         layout = self.device.get_result()
--> 603         layout = self._add_annots(layout, page.annots)
    604         return layout
    605 

~/anaconda3/lib/python3.6/site-packages/pdfquery/pdfquery.py in _add_annots(self, layout, annots)
    647                     annot = self._set_hwxy_attrs(annot)
    648                 try:
--> 649                     annot['URI'] = resolve1(annot['A'])['URI']
    650                 except KeyError:
    651                     pass

TypeError: string indices must be integers
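The last frame hints at what goes wrong: on line 649 one of the subscriptions is applied to a string instead of a dictionary (most plausibly the resolved /A action entry), so Python raises TypeError rather than KeyError, and the `except KeyError` clause does not catch it. The sketch below reproduces that pattern with plain dicts and shows a defensive variant. It is an illustration under assumptions, not pdfquery's actual code or fix: `extract_uri` and `extract_uri_original` are hypothetical names, and the `resolve` parameter is a hypothetical stand-in for pdfminer's `resolve1` (identity by default, so the sketch runs without pdfminer installed).

```python
from typing import Any, Callable


def _identity(obj: Any) -> Any:
    # Hypothetical stand-in for pdfminer's resolve1; real code would
    # dereference indirect PDF objects here.
    return obj


def extract_uri_original(annot: Any, resolve: Callable[[Any], Any] = _identity) -> Any:
    """Mirror of the pattern on line 649: only KeyError is handled."""
    try:
        return resolve(annot['A'])['URI']
    except KeyError:
        return None


def extract_uri(annot: Any, resolve: Callable[[Any], Any] = _identity) -> Any:
    """Defensive variant: return None unless the action is a dict with a URI."""
    if not isinstance(annot, dict):
        return None
    action = resolve(annot.get('A'))
    if not isinstance(action, dict):
        # A plain string here is what would make action['URI'] raise
        # "TypeError: string indices must be integers".
        return None
    return action.get('URI')


if __name__ == '__main__':
    link = {'A': {'URI': 'http://example.com'}}
    odd = {'A': 'not-an-action-dict'}
    print(extract_uri(link))  # http://example.com
    print(extract_uri(odd))   # None
    try:
        extract_uri_original(odd)
    except TypeError as exc:
        print('original pattern raises TypeError:', exc)
```

Widening the handler to `except (KeyError, TypeError):` would be a smaller patch with the same effect for these inputs; the `isinstance` checks just make the expected shape of the annotation explicit.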
