Comments (8)
The problem is that with the -table option, documents are parsed a little differently, so the templates need to be different.
It's simple to get the static version of xpdf. No dependencies.
from invoice2data.
OK, I understand your point. I hope the Linux distros update their packages soon.
Debian sid currently does not package pdftotext in the xpdf package, but rather in the poppler-utils package. Which version of that has the -table option?
Debian Sid has the required package, but it's called xpdf, not poppler.
I know the requirement sucks until Debian Sid is out. But as you saw in your other issue, the -table option makes a big difference for almost all invoices that use tables (and almost all invoices do). If we allow two versions, we need to redo most matching templates.
The download is just a single binary, so it's not impossible.
- Ubuntu already has the correct version. (Xenial)
- OSX also has it via Homebrew
- Debian doesn't, but will get it next year. (or can just add the bin from the website)
Do we need a better error message until then?
If you go to https://packages.debian.org/sid/amd64/xpdf/filelist you will see that the pdftoxxxx utils are no longer there. However, if you go to https://packages.debian.org/sid/amd64/poppler-utils/filelist you will find them all there.
Ah. They re-organized it. That should be added as a note. It would be even better if they added a backports package.
It will now try both with and without the -table option and give a warning.
I guess anyone adding new templates should use the newer version, to avoid the templates breaking in the coming months.
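For readers who want to picture the fallback, here is a minimal sketch of the "try with -table, then without, and warn" behaviour. It is not the actual invoice2data code: the names `build_command` and `to_text` are hypothetical, and it assumes that a pdftotext build without -table support exits with a non-zero status when given that flag.

```python
import subprocess
import warnings


def build_command(pdf_path, use_table):
    """Build a pdftotext command line that writes the text to stdout ("-")."""
    cmd = ["pdftotext"]
    if use_table:
        cmd.append("-table")
    return cmd + [pdf_path, "-"]


def to_text(pdf_path, run=subprocess.run):
    """Extract text with -table if possible, otherwise without it.

    `run` is injectable (hypothetical design choice) so the fallback can be
    exercised without pdftotext being installed.
    """
    try:
        result = run(build_command(pdf_path, True), capture_output=True, check=True)
        return result.stdout
    except subprocess.CalledProcessError:
        # Assumption: an unsupported flag makes pdftotext exit non-zero.
        warnings.warn(
            "pdftotext failed with -table; retrying without it. "
            "Templates written for table output may not match."
        )
    result = run(build_command(pdf_path, False), capture_output=True, check=True)
    return result.stdout
```

Calling `to_text("invoice.pdf")` returns the extracted text as bytes. If pdftotext is not installed at all, `subprocess.run` raises `FileNotFoundError`, which this sketch deliberately leaves unhandled.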