Since every page will spawn a new instance of the scan_perpa


Fast scanning ADF with long post-processing steps will consume all resources (sane-scan-pdf issue, closed)

rocketraman commented on September 24, 2024

Fast scanning ADF with long post-processing steps will consume all resources

from sane-scan-pdf.

Comments (12)

MoD01 commented on September 24, 2024

> Has running out of resources actually been an issue for you, or is this more of an academic concern?

I use my Raspberry Pi 4 because my ScanSnap has no WebDAV or FTP feature. The Pi runs out of resources very quickly.

@rocketraman Can you please add sem as an additional requirement in the README? The lack of this information cost me some time debugging the bottleneck, until I found this closed ticket telling me that installing sem solves the problem :)


rocketraman commented on September 24, 2024

Yup, that was the intended behavior to parallelize the processing. Has running out of resources actually been an issue for you, or is this more of an academic concern? I find it difficult to believe a scanner could scan pages fast enough to cause a problem.


jarrodsfarrell commented on September 24, 2024

Yeah. We have a Fujitsu that can scan up to 60 PPM. I was doing some testing on a laptop with the scanner on duplex, producing ~78 pages, and it spawned an absurd number of tesseract processes that consumed two-thirds of the laptop's 16 GB of RAM and kept the CPU pegged at 100%, with every tesseract process working at a crawl.


rocketraman commented on September 24, 2024

Nice scanner :-) Ok, good thing to fix.


rocketraman commented on September 24, 2024

@jarrodsfarrell Probably the easiest way I've found to do this is to use sem from the GNU parallel project, but it will introduce another (optional) dependency. It's widely available, so I don't have a problem with adding this, but would that work for your situation?
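To illustrate the idea, here is a minimal sketch of throttling per-page jobs with sem. It is not the code from the pull request: `process_page.sh`, `max_jobs`, and the semaphore name are hypothetical, and the worker only writes a marker file where a real one would run deskew and OCR. If sem is not installed, the sketch falls back to processing pages one at a time.

```shell
#!/usr/bin/env bash
# Sketch: bound the number of concurrent per-page jobs with GNU parallel's sem.
# process_page.sh is a hypothetical stand-in for the real per-page script.

workdir=$(mktemp -d)
max_jobs=2                 # assumed cap; tune to the machine's CPU and RAM
sem_id="scan-demo-$$"      # private semaphore name for this run

# Hypothetical per-page worker; a real one would run deskew, OCR, and so on.
cat > "$workdir/process_page.sh" <<'EOF'
#!/bin/sh
echo "processed page $1" > "$2/page-$1.txt"
EOF

for page in 1 2 3 4 5; do
  if command -v sem >/dev/null 2>&1; then
    # Returns at once while fewer than max_jobs are running; otherwise blocks
    # until a slot frees up, which throttles the loop that feeds it.
    sem --id "$sem_id" -j "$max_jobs" sh "$workdir/process_page.sh" "$page" "$workdir"
  else
    # Fallback when sem is not installed: process pages one at a time.
    sh "$workdir/process_page.sh" "$page" "$workdir"
  fi
done

# Wait for any jobs still running under this semaphore.
if command -v sem >/dev/null 2>&1; then
  sem --id "$sem_id" --wait
fi

count=$(ls "$workdir"/page-*.txt | wc -l | tr -d ' ')
echo "pages processed: $count"
```

Because the loop blocks whenever all slots are busy, a fast ADF can no longer pile up more post-processing jobs than the cap allows.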


jarrodsfarrell commented on September 24, 2024

Taking a look at the project's man page, it seems perfectly fine to use, and having another dependency is a non-issue.


rocketraman commented on September 24, 2024

@jarrodsfarrell Can you grab the changes in pull #5 and see if that solves your problem? If it works for you, I'll merge it.


jarrodsfarrell commented on September 24, 2024

Currently at work, but I will give it a try whenever I can.


jarrodsfarrell commented on September 24, 2024

Unfortunately, we no longer have the 60 PPM scanner, so I'm using a 25 PPM model instead.

Regardless, it seems like using sem is an overall good change. I think it's even letting the OCR step work a bit faster than running all the tesseract processes at once (less task-switching?), and pauses between scans are noticeably shorter (the scan process doesn't have to fight as much for resources?). An additional bonus is that the scrolling console output is a good indicator that work is still being done, instead of staying still until the tesseract processes begin quitting.

~~Anyways, should the last argument be erroring like this?~~

USER@HOST:~/Workspace/sane-scan-pdf$ ./scan -d -m color --crop --deskew --ocr out.pdf
Unknown argument: out.pdf

Never mind. It'd help if I read the documentation.


rocketraman commented on September 24, 2024

Thanks for reporting and testing. I'll merge this.


rocketraman commented on September 24, 2024

> @rocketraman Can you please add sem as an additional requirement in the README? The lack of this information cost me some time debugging the bottleneck, until I found this closed ticket telling me that installing sem solves the problem :)

It's already listed under optional requirements, but perhaps this issue deserves a more extensive call out.


rocketraman commented on September 24, 2024

@MoD01 I added an explanatory line in features for future people in your situation...
