Comments (9)
Okay, upon further examination I see that PS2PDF is performing the cropping, despite it being a feature of the Fujitsu driver. Now to work out how to use the driver cropping and not ps2pdf.....
from sane-scan-pdf.
Thank you for a great script.
I'm happy people are finding it useful.
I had no trouble using this script on my Mac (OS X 10.14.3) once I installed the dependencies in Brew. In order to install scanadf I had to compile the Sane frontends, but this went smoothly.
Nice to hear.
However, the Fujitsu driver (for my Scansnap S510M) supports brightness and contrast (see https://fossies.org/dox/sane-backends-1.0.27/fujitsu_8c_source.html etc) and I suspect that I can use those options by tweaking your script?
Yes, that should be quite easy to add. I might also add a way to "pass through" options to the driver.
Lastly, the OCR option works perfectly except that the resulting PDF is extremely large i.e. A0 or A1 in size even if the original is A5. Any thoughts on this would be appreciated although I suspect it's an issue with how one of the dependencies works on Macs.
When using ocr, tesseract does the conversion into PDF. Does this size issue happen consistently? What version of tesseract do you have installed?
Okay, upon further examination I see that PS2PDF is performing the cropping, despite it being a feature of the Fujitsu driver. Now to work out how to use the driver cropping and not ps2pdf.....
The --crop option actually does set the Fujitsu driver --sw-crop=yes option. It then asks ps2pdf to respect the bounding box in the Postscript data. I actually can't remember why I added this, or if this is a bug, but if you think this is the problem try commenting out this line, and see what happens:
https://github.com/rocketraman/sane-scan-pdf/blob/master/scan#L188
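To make the two stages easier to picture, here is a minimal sketch of how a single crop flag could fan out to both tools. The function name crop_options is hypothetical, and the use of -dEPSCrop (Ghostscript's switch for sizing the page to the EPS bounding box) is an assumption about what the linked line does, not a copy of it.

```shell
# Hypothetical sketch: derive per-tool options from one crop setting.
# crop_options is an illustrative name, not a function in the real script.
crop_options() {
  crop=$1
  if [ "$crop" = "yes" ]; then
    # Driver option, spelled as quoted in this thread.
    scan_opts="--sw-crop=yes"
    # Assumed Ghostscript switch: size the PDF page to the EPS bounding box.
    ps2pdf_opts="-dEPSCrop"
  else
    scan_opts=""
    ps2pdf_opts=""
  fi
  printf 'scan: %s\nps2pdf: %s\n' "$scan_opts" "$ps2pdf_opts"
}

crop_options yes
```

If the bounding-box switch is dropped while the driver still crops, the image would be cropped but the PDF page could stay at the default paper size, which is one way to tell the two stages apart.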
Also, I find the cropping to be a little too random i.e. sometimes it is too aggressive, however I guess that's simply how the driver works.
You could try commenting out the deletion of the intermediate outputs, as mentioned in #8 (comment), to determine where in the pipeline the issue is occurring. If it turns out the aggressive crop is a result of a post-scan stage (like ps2pdf), then post the intermediate outputs somewhere (send them to me privately if you wish) and I can take a look.
Check out the code in branch issue-9 -- it should allow you to pass through any driver option you like with -xo (short for eXtended option). For example:
scan -xo "--brightness 50 --contrast -10" -o scan.pdf
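For anyone curious how such a pass-through can work, the following is a rough sketch, not the code from the issue-9 branch. The function name collect_driver_args is hypothetical; it simply gathers every -xo value into one string that a wrapper could later hand to the scanner command.

```shell
# Hypothetical sketch: gather the values of all -xo flags into one string.
# collect_driver_args is an illustrative name, not part of the real script.
collect_driver_args() {
  extra=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -xo)
        # -xo needs a value; stop if it is missing.
        [ "$#" -ge 2 ] || return 1
        extra="$extra $2"
        shift 2
        ;;
      *)
        # Every other argument is ignored in this sketch.
        shift
        ;;
    esac
  done
  # Drop the leading space added by the first append.
  printf '%s\n' "${extra# }"
}

collect_driver_args -xo "--brightness 50 --contrast -10" -o scan.pdf
```

A wrapper would then expand the result unquoted so that the shell splits it back into separate arguments for the driver; that is why values containing spaces of their own would not survive this simple approach.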
The --crop option actually does set the Fujitsu driver --sw-crop=yes option. It then asks ps2pdf to respect the bounding box in the Postscript data. I actually can't remember why I added this, or if this is a bug.
Just tried it, and it isn't a bug -- it's necessary to get the PDF to respect the size of the driver image output.
In my local testing, I do notice that tesseract (with the --ocr option) does a slightly poorer job at setting the bounding box correctly, and does crop a bit beyond what the driver has output. If you were using the --ocr option with --crop, can you try it without --ocr?
Tesseract creating very large PDF page sizes is a known issue, with a suggested solution:
"Set the dpi of the input images. Use mogrify from ImageMagick or similar."
tesseract-ocr/tesseract#150
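A quick bit of arithmetic shows why a missing dpi value could produce the page sizes I reported. This is only a sketch: the 70 dpi fallback is my assumption about what tesseract uses when the image carries no resolution, and px_to_mm is a made-up helper name.

```shell
# Convert a pixel count to millimetres at a given resolution, rounded.
# 1 inch = 25.4 mm, so mm = px * 25.4 / dpi; scaled by 10 to stay in integers.
px_to_mm() {
  px=$1
  dpi=$2
  echo $(( (px * 254 + dpi * 5) / (dpi * 10) ))
}

# An A5 page scanned at 300 dpi is roughly 1748 x 2480 pixels.
echo "at 300 dpi: $(px_to_mm 1748 300) x $(px_to_mm 2480 300) mm"
# Assumed fallback of 70 dpi when the image has no resolution metadata.
echo "at 70 dpi: $(px_to_mm 1748 70) x $(px_to_mm 2480 70) mm"

# The suggested fix would then be something like (ImageMagick, not run here):
#   mogrify -units PixelsPerInch -density 300 page.png
```

At 300 dpi this gives 148 x 210 mm, which is A5. At the assumed 70 dpi it gives 634 x 900 mm, which falls between A1 (594 x 841 mm) and A0 (841 x 1189 mm).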
I have noticed that my cropping issue is that sometimes scans are cropped to exactly Letter size, even if they are A4 and A4 is specified. It appears that there is a page size setting somewhere that I am overlooking, although I do not know why I sometimes do end up with A4 PDFs (assuming I don't use OCR/tesseract).
Tesseract creating very large PDF page sizes is a known issue, with a suggested solution:
"Set the dpi of the input images. Use mogrify from ImageMagick or similar."
tesseract-ocr/tesseract#150
Thanks for the link, I'll take a look.
I have noticed that my cropping issue is that sometimes scans are cropped to exactly Letter size, even if they are A4 and A4 is specified. It appears that there is a page size setting somewhere that I am overlooking, although I do not know why I sometimes do end up with A4 PDFs (assuming I don't use OCR/tesseract).
It's possible you are running into #8.
I've merged the extended option support into master, closing. I will re-open a separate issue for the tesseract problem once I look into it.
@IBMPortablePc The tesseract issue is also fixed, as per #12.