Comments (3)
Do you have example code that reproduces this error? pdf.load() is working for me with your supplied file.
from pdfquery.
I'm assuming you have a small script that passes the file to pdf.load(). Could you attach it? That way, I can run exactly what you ran, on the same file. If it fails for me too, that could indicate something else is causing the issue. If it works, I can trace the difference between what you did and what my code is doing.
It's also possible that by removing the private information from the PDF, I also removed whatever was causing the problem. It's been so long since I submitted this that I don't remember. I think I verified that the issue still occurred with the PDF I attached, but I can't say for sure.
I seem to be getting this error as well. PDF file here. I have also tested on this
- pdfquery 0.4.3
- pdfminer 20170720
- python 3.6
- OSX 10.12.6
Error message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-6d31003dedab> in <module>()
13 pdf = pdfquery.PDFQuery("../"+name)
14 pdf.load()
---> 15 tree = pdf.get_tree()
16 #tree.write("current.xml", pretty_print=True)
17
~/anaconda3/lib/python3.6/site-packages/pdfquery/pdfquery.py in get_tree(self, *page_numbers)
485 else:
486 pages = enumerate(self.get_layouts())
--> 487 for n, page in pages:
488 page = self._xmlize(page)
489 page.set('page_index', obj_to_string(n))
~/anaconda3/lib/python3.6/site-packages/pdfquery/pdfquery.py in <genexpr>(.0)
606 def get_layouts(self):
607 """ Get list of PDFMiner Layout objects for each page. """
--> 608 return (self.get_layout(page) for page in self._cached_pages())
609
610 def _cached_pages(self, target_page=-1):
~/anaconda3/lib/python3.6/site-packages/pdfquery/pdfquery.py in get_layout(self, page)
601 self.interpreter.process_page(page)
602 layout = self.device.get_result()
--> 603 layout = self._add_annots(layout, page.annots)
604 return layout
605
~/anaconda3/lib/python3.6/site-packages/pdfquery/pdfquery.py in _add_annots(self, layout, annots)
647 annot = self._set_hwxy_attrs(annot)
648 try:
--> 649 annot['URI'] = resolve1(annot['A'])['URI']
650 except KeyError:
651 pass
TypeError: string indices must be integers
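The traceback shows that only KeyError is caught around resolve1(annot['A'])['URI'], so when the resolved action entry is a string rather than a dictionary, the lookup raises TypeError instead. The following is a minimal, hypothetical sketch of a more defensive lookup. safe_annot_uri is an illustrative name, not part of pdfquery, and the resolve parameter only stands in for pdfminer's resolve1 so the example runs without a real PDF. Whether silently skipping such annotations is the right behavior for pdfquery is an assumption, not a confirmed upstream fix.

```python
from collections.abc import Mapping
from typing import Any, Callable, Optional


def _identity(obj: Any) -> Any:
    """Default resolver: return the object unchanged."""
    return obj


def safe_annot_uri(
    annot: Mapping,
    resolve: Callable[[Any], Any] = _identity,
) -> Optional[Any]:
    """Return the URI from a link annotation's action entry, or None.

    Hypothetical helper, not part of pdfquery. `resolve` stands in for
    pdfminer's resolve1(); the identity default lets the sketch run
    without a real PDF.
    """
    if "A" not in annot:
        # No action entry at all: the case the original KeyError handler covers.
        return None
    action = resolve(annot["A"])
    if not isinstance(action, Mapping):
        # The case from the traceback: the action resolved to something that
        # is not a dictionary (for example a str), so indexing it with 'URI'
        # raises TypeError rather than KeyError.
        return None
    # Actions without a URI entry (e.g. internal GoTo links) yield None.
    return action.get("URI")


if __name__ == "__main__":
    print(safe_annot_uri({"A": {"URI": "https://example.com"}}))  # https://example.com
    print(safe_annot_uri({"A": "not-a-dict"}))  # None
    print(safe_annot_uri({}))  # None
```

The design choice here is to check the type of the resolved action explicitly instead of widening the except clause to TypeError, so that unrelated type errors elsewhere would still surface.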
Related Issues (20)
- Can't get coordinates.
- Pseudo classes not working
- How does pdfquery determine the index?
- can load the pages I need
- Can't concat str to bytes
- ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
- PdfQuery | .extract problem
- loading file with filecache AttributeError: 'NoneType' object has no attribute 'writestr'
- windows only: pdfquery is locking the opended pdf-file
- Extract all words with their coordinates.
- cache collision
- can't concat str to bytes EASY FIX -- please update!
- recommend you use pdfminer rather than pdfquery
- Not able to detect horizontal lines properly.
- Coordinates to locator
- Is this project still alive?
- Python 2 dependency problem: pyquery
- Support for password protected pdf files
- AttributeError: module 'pdfquery' has no attribute 'PDFQuery'
- TypeError: 'PDFObjRef' object is not subscriptable