Comments (7)
Hi! I can't really debug this without the PDF that's causing a problem for you -- can you share it?
from pdfquery.
I would love to, but it's proprietary and confidential. Sorry :(
FYI, experienced a different problem this time:
>>> pdf = pdfquery.PDFQuery("input/2015/12-Dec/17-Dec/17-12.pdf")
>>> pdf.load()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/pdfquery/pdfquery.py", line 385, in load
self.tree = self.get_tree(*_flatten(page_numbers))
File "/usr/local/lib/python2.7/site-packages/pdfquery/pdfquery.py", line 486, in get_tree
pages = enumerate(self.get_layouts())
File "/usr/local/lib/python2.7/site-packages/pdfquery/pdfquery.py", line 608, in get_layouts
return (self.get_layout(page) for page in self._cached_pages())
File "/usr/local/lib/python2.7/site-packages/pdfquery/pdfquery.py", line 636, in _cached_pages
self._pages += list(self._pages_iter)
File "/usr/local/lib/python2.7/site-packages/pdfminer/pdfpage.py", line 100, in create_pages
yield klass(document, objid, tree)
File "/usr/local/lib/python2.7/site-packages/pdfminer/pdfpage.py", line 53, in __init__
self.mediabox = resolve1(self.attrs['MediaBox'])
KeyError: 'MediaBox'
Any update on the "'PDFObjRef' object does not support indexing" issue?
I experienced this same issue, and also cannot share the PDF being used unfortunately.
I have a similar problem.
>>> pdf.load()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/anaconda3/lib/python3.7/site-packages/pdfquery/pdfquery.py", line 385, in load
self.tree = self.get_tree(*_flatten(page_numbers))
File "/anaconda3/lib/python3.7/site-packages/pdfquery/pdfquery.py", line 487, in get_tree
for n, page in pages:
File "/anaconda3/lib/python3.7/site-packages/pdfquery/pdfquery.py", line 608, in <genexpr>
return (self.get_layout(page) for page in self._cached_pages())
File "/anaconda3/lib/python3.7/site-packages/pdfquery/pdfquery.py", line 601, in get_layout
self.interpreter.process_page(page)
File "/anaconda3/lib/python3.7/site-packages/pdfminer/pdfinterp.py", line 852, in process_page
self.render_contents(page.resources, page.contents, ctm=ctm)
File "/anaconda3/lib/python3.7/site-packages/pdfminer/pdfinterp.py", line 864, in render_contents
self.execute(list_value(streams))
File "/anaconda3/lib/python3.7/site-packages/pdfminer/pdfinterp.py", line 888, in execute
func(*args)
File "/anaconda3/lib/python3.7/site-packages/pdfminer/pdfinterp.py", line 772, in do_TJ
self.device.render_string(self.textstate, seq, self.ncs, self.graphicstate.copy())
File "/anaconda3/lib/python3.7/site-packages/pdfminer/pdfdevice.py", line 87, in render_string
scaling, charspace, wordspace, rise, dxscale, ncs, graphicstate)
File "/anaconda3/lib/python3.7/site-packages/pdfminer/pdfdevice.py", line 105, in render_string_horizontal
ncs, graphicstate)
File "/anaconda3/lib/python3.7/site-packages/pdfminer/converter.py", line 121, in render_char
textwidth = font.char_width(cid)
File "/anaconda3/lib/python3.7/site-packages/pdfminer/pdffont.py", line 525, in char_width
return self.widths[cid] * self.hscale
TypeError: unsupported operand type(s) for *: 'PDFObjRef' and 'float'
Here too. Same problem:
File "pdfqueryparser.py", line 4, in <module>
pdf.load()
File "/usr/local/lib/python3.7/site-packages/pdfquery/pdfquery.py", line 385, in load
self.tree = self.get_tree(*_flatten(page_numbers))
File "/usr/local/lib/python3.7/site-packages/pdfquery/pdfquery.py", line 487, in get_tree
for n, page in pages:
File "/usr/local/lib/python3.7/site-packages/pdfquery/pdfquery.py", line 608, in <genexpr>
return (self.get_layout(page) for page in self._cached_pages())
File "/usr/local/lib/python3.7/site-packages/pdfquery/pdfquery.py", line 603, in get_layout
layout = self._add_annots(layout, page.annots)
File "/usr/local/lib/python3.7/site-packages/pdfquery/pdfquery.py", line 647, in _add_annots
annot = self._set_hwxy_attrs(annot)
File "/usr/local/lib/python3.7/site-packages/pdfquery/pdfquery.py", line 665, in _set_hwxy_attrs
attr['x0'] = bbox[0]
TypeError: 'PDFObjRef' object does not support indexing
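Both TypeErrors above look like the same thing: a value that is still an indirect reference (a PDFObjRef) gets indexed or multiplied as if it were already a plain list or number. As a rough sketch of the idea, and not the actual pdfquery or pdfminer code, resolving references before use might look like the following. `FakeObjRef` and `resolve_refs` are hypothetical names made up for this illustration; the only real call this mirrors is the `resolve1(...)` that shows up in the pdfminer traceback earlier in the thread.

```python
class FakeObjRef:
    """Hypothetical stand-in for an indirect reference (not pdfminer's class)."""

    def __init__(self, target):
        self._target = target

    def resolve(self):
        # Return the object this reference points to.
        return self._target


def resolve_refs(obj):
    """Follow references until a plain value is reached, recursing into lists and dicts."""
    while hasattr(obj, "resolve"):
        obj = obj.resolve()
    if isinstance(obj, list):
        return [resolve_refs(item) for item in obj]
    if isinstance(obj, dict):
        return {key: resolve_refs(value) for key, value in obj.items()}
    return obj


# A bounding box stored as a reference, with one coordinate that is itself a reference.
bbox = FakeObjRef([FakeObjRef(10.0), 20.0, 110.0, 220.0])

# Indexing bbox directly would fail; resolving first gives a plain list.
resolved = resolve_refs(bbox)
x0 = resolved[0]
```

Whether a fix along these lines belongs in pdfquery's `_set_hwxy_attrs` or in pdfminer's font-width handling is a guess on my part without a PDF that reproduces the problem.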