mzucker / noteshrink
Convert scans of handwritten notes to beautiful, compact PDFs
Home Page: https://mzucker.github.io/2016/09/20/noteshrink.html
License: MIT License
Hello,
I wanted to try noteshrink.
After installation, I tried it with the JPEG files in the examples folder.
Every file fails with the same error: TypeError: unique() got an unexpected keyword argument 'return_counts'
I don't know how to fix this.
Thanks.
steph@Pergolesi $ noteshrink notesA1.jpg
opened notesA1.jpg
getting palette...
Traceback (most recent call last):
File "/usr/local/bin/noteshrink", line 9, in <module>
load_entry_point('noteshrink==0.1.0', 'console_scripts', 'noteshrink')()
File "build/bdist.linux-x86_64/egg/noteshrink.py", line 582, in main
File "build/bdist.linux-x86_64/egg/noteshrink.py", line 558, in notescan_main
File "build/bdist.linux-x86_64/egg/noteshrink.py", line 381, in get_palette
File "build/bdist.linux-x86_64/egg/noteshrink.py", line 106, in get_bg_color
TypeError: unique() got an unexpected keyword argument 'return_counts'
steph@Pergolesi $ python -V
Python 2.7.6
steph@Pergolesi $ cat /proc/version
Linux version 3.19.0-32-generic (buildd@lgw01-43) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #37~14.04.1-Ubuntu SMP Thu Oct 22 09:41:40 UTC 2015
steph@Pergolesi $ pip list
apt-xapian-index (0.45)
apturl (0.4.1ubuntu4)
argparse (1.2.1)
BeautifulSoup (3.2.1)
beautifulsoup4 (4.4.1)
chardet (2.0.1)
colorama (0.2.5)
command-not-found (0.3)
configglue (1.1.2)
configobj (4.7.2)
configparser (3.3.0r2)
cssselect (0.9.1)
cssutils (0.9.10)
debtagshw (0.1)
decorator (3.4.0)
defer (1.0.6)
deluge (1.3.6)
dirspec (13.10)
dnspython (1.11.1)
duplicity (0.6.23)
ecdsa (0.13)
Electrum (2.0.2)
eventlet (0.13.0)
feedparser (5.1.3)
googleplaydownloader (1.7)
greenlet (0.4.2)
html5lib (0.999)
httplib2 (0.8)
ipython (1.2.1)
Jinja2 (2.7.2)
kaa-base (0.99.1)
kaa-metadata (0.7.8)
lockfile (0.8)
lxml (3.3.3)
Mako (0.9.1)
MarkupSafe (0.18)
matplotlib (1.3.1)
mechanize (0.2.5)
Mirage (0.9.5.1)
mysql (0.0.1)
mysql-connector-python (2.0.4)
MySQL-python (1.2.3)
ndg-httpsclient (0.3.2)
nemo-emblems (0.0.1)
netifaces (0.8)
nose (1.3.1)
noteshrink (0.1.0)
numpy (1.8.2)
oauthlib (0.6.1)
oneconf (0.3.7.14.04.1)
PAM (0.4.2)
pandas (0.13.1)
paramiko (1.10.1)
pbkdf2 (1.3)
pdfshuffler (0.6.0)
pexpect (3.1)
Pillow (2.6.1)
pip (1.5.4)
piston-mini-client (0.7.5)
protobuf (2.5.0)
pyasn1 (0.1.7)
pyasn1-modules (0.0.5)
pychm (0.8.4)
pycrypto (2.6.1)
pycups (1.9.66)
pycurl (7.19.3)
pydns (2.3.6)
pygobject (3.12.0)
pyinotify (0.9.4)
pyOpenSSL (0.13)
pyparsing (2.0.1)
pyPdf (1.13)
pyserial (2.6)
pysmbc (1.0.14.1)
pysqlite (1.0.1)
pysrt (1.0.1)
python-apt (0.9.3.5ubuntu2)
python-dateutil (1.5)
python-debian (0.1.21-nmu2ubuntu2)
python-epson-printer (1.3)
python-escpos (1.0.1)
python-libtorrent (0.16.13)
pytz (2012c)
pyusb (1.0.0b1)
pyxdg (0.25)
pyzmq (14.0.1)
qrcode (5.1)
reportlab (3.0)
requests (2.2.1)
requests-oauthlib (0.6.1)
scipy (0.13.3)
sessioninstaller (0.0.0)
setuptools (3.3)
simplegeneric (0.8.1)
six (1.5.2)
slowaes (0.1a1)
sympy (0.7.4.1)
system-service (0.1.6)
tlslite (0.4.8)
tornado (3.1.1)
tweepy (3.5.0)
Twisted-Core (13.2.0)
Twisted-Names (13.2.0)
Twisted-Web (13.2.0)
urllib3 (1.7.1)
uTidylib (0.2)
vboxapi (1.0)
wsgiref (0.1.2)
wxPython (2.8.12.1)
wxPython-common (2.8.12.1)
zope.interface (4.0.5)
Tried running noteshrink on a JPEG saved from MacOS Preview, with the following result:
Larss-MacBook-Pro:shrink larsga$ noteshrink IMG_544*
opened IMG_5442.JPG
getting palette...
Traceback (most recent call last):
File "/usr/local/bin/noteshrink", line 9, in <module>
load_entry_point('noteshrink==0.1.1', 'console_scripts', 'noteshrink')()
File "/Library/Python/2.7/site-packages/noteshrink.py", line 584, in main
notescan_main(options=get_argument_parser().parse_args())
File "/Library/Python/2.7/site-packages/noteshrink.py", line 560, in notescan_main
palette = get_palette(samples, options)
File "/Library/Python/2.7/site-packages/noteshrink.py", line 383, in get_palette
bg_color = get_bg_color(samples, 6)
File "/Library/Python/2.7/site-packages/noteshrink.py", line 108, in get_bg_color
unique, counts = np.unique(packed, return_counts=True)
TypeError: unique() got an unexpected keyword argument 'return_counts'
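Both tracebacks above fail on np.unique(packed, return_counts=True). The return_counts argument was added in NumPy 1.9, and the first report lists numpy 1.8.2, so an outdated NumPy is the likely cause; upgrading NumPy is the simplest fix. Where that is not possible, a minimal fallback sketch could look like this (the helper name unique_with_counts is hypothetical and not part of noteshrink):

```python
import numpy as np


def unique_with_counts(values):
    """Return (unique_values, counts), also on NumPy older than 1.9."""
    values = np.asarray(values).ravel()
    try:
        return np.unique(values, return_counts=True)
    except TypeError:
        # Older NumPy: map each element to the index of its unique value,
        # then count how often each index occurs.
        unique, inverse = np.unique(values, return_inverse=True)
        return unique, np.bincount(inverse)


unique, counts = unique_with_counts([3, 1, 3, 2, 3])
print(unique.tolist(), counts.tolist())
```

The fallback branch only runs when the installed NumPy rejects return_counts, so newer installations keep using the built-in path.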
Hello,
FDNCRED and I have rewritten your project in C# and VB.NET
with the Accord.NET Framework. Please add a link to your README
under derived works.
https://github.com/Phreak87/NoteShrink
Remove this issue.
Hi,
depending on the input, the output may be shrunk even more when converted to vector graphics, as you also noticed in https://mzucker.github.io/2018/05/14/maptrace.html :)
Here is a quick and dirty example:
./noteshrink.py -e ".pnm" -P "convert %i %o" -c "potrace -b pdf -o %o %i" examples/tree.jpg
Maybe the example can be added to the README?
Unfortunately, potrace only supports binary images.
Ciao,
Antonio
I just wanted to warn you that I have seen your code in another repo. Its noteshrink.py looks the same, except that line 7 reads "Created by Georgy Perevozchikov (gosha20777) 2018."
Python 2.7.12 (default, Jul 1 2016, 15:12:24)
Traceback (most recent call last):
File "./noteshrink.py", line 585, in <module>
main()
File "./noteshrink.py", line 582, in main
notescan_main(options=get_argument_parser().parse_args())
File "./noteshrink.py", line 557, in notescan_main
samples = sample_pixels(img, options)
File "./noteshrink.py", line 340, in sample_pixels
pixels = img.reshape((-1, 3))
ValueError: total size of new array must be unchanged
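The reshape((-1, 3)) call in sample_pixels only works when the array holds exactly three channels per pixel. One plausible cause of this ValueError (an assumption, since the input file is not shown) is a grayscale or RGBA image. A minimal sketch of a guard, using the hypothetical helper name as_rgb_array:

```python
import numpy as np


def as_rgb_array(img):
    """Coerce a grayscale, RGB or RGBA pixel array to shape (H, W, 3)."""
    img = np.asarray(img)
    if img.ndim == 2:
        # Grayscale: repeat the single channel three times.
        return np.stack([img] * 3, axis=-1)
    if img.ndim == 3 and img.shape[2] == 1:
        return np.repeat(img, 3, axis=2)
    if img.ndim == 3 and img.shape[2] == 4:
        # RGBA: drop the alpha channel.
        return img[:, :, :3]
    if img.ndim == 3 and img.shape[2] == 3:
        return img
    raise ValueError("unsupported image shape: {}".format(img.shape))


gray = np.zeros((4, 5), dtype=np.uint8)
pixels = as_rgb_array(gray).reshape((-1, 3))
print(pixels.shape)
```

With Pillow, calling convert('RGB') on the opened image before turning it into an array should achieve the same thing.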
I have installed all the requirements except notescan, which isn't pip-installable.
I got this weird error:
getting palette...
/usr/lib/python3.6/site-packages/noteshrink.py:132: RuntimeWarning: invalid value encountered in true_divide
saturation = delta.astype(np.float32) / cmax.astype(np.float32)
applying palette...
saving page0000.png...
Traceback (most recent call last):
File "/usr/bin/noteshrink", line 11, in <module>
load_entry_point('noteshrink==0.1.1', 'console_scripts', 'noteshrink')()
File "/usr/lib/python3.6/site-packages/noteshrink.py", line 584, in main
notescan_main(options=get_argument_parser().parse_args())
File "/usr/lib/python3.6/site-packages/noteshrink.py", line 564, in notescan_main
save(output_filename, labels, palette, dpi, options)
File "/usr/lib/python3.6/site-packages/noteshrink.py", line 455, in save
output_img.putpalette(palette.flatten())
File "/usr/lib/python3.6/site-packages/PIL/Image.py", line 1483, in putpalette
data = bytes(data)
TypeError: only integer arrays with one element can be converted to an index
Maybe this is a Python 3.6 problem.
Regards,
bitwave
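The final TypeError is raised inside Pillow's putpalette when it calls bytes() on the NumPy array returned by palette.flatten(). A possible workaround, offered as a sketch rather than as the project's fix, is to hand Pillow a plain list of Python integers. The helper name palette_to_int_list is hypothetical:

```python
import numpy as np


def palette_to_int_list(palette):
    """Flatten an (N, 3) palette into a list of Python ints in 0..255."""
    flat = np.asarray(palette, dtype=np.float64).reshape(-1)
    # Round and clamp so every entry is a valid 8-bit channel value.
    flat = np.clip(np.rint(flat), 0, 255).astype(np.uint8)
    return flat.tolist()


palette = np.array([[255, 255, 255], [0, 128, 64]], dtype=np.uint8)
print(palette_to_int_list(palette))
```

The call in save would then read output_img.putpalette(palette_to_int_list(palette)).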
Hi, I like the idea very much, and I see some people asking for a C++ version, so I would like to port it to C++. If I do, will you accept the PR?
When you run noteshrink on Windows (I am using Windows 10), noteshrink fails on the subprocess.call to convert:
running PDF command "convert page0000.png output.pdf"...
Invalid Parameter - output.pdf
The reason appears to be given in this stackoverflow question:
https://stackoverflow.com/questions/41860668/why-does-this-python-subprocess-command-only-work-when-shell-true-on-windows
Windows ships another, older program called convert, which on my PC sits at C:\Windows\System32\convert.exe
Possible solutions:
shell=True
OR
changing the command to magick. The current ImageMagick release notes say:
The "magick" command is the new primary command of the Shell API, replacing the old "convert" command. This allows you to create a 'magick script' of the form "#!/path/to/command/magick -script", or pipe options into a command "magick -script -", as a background process.
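As a sketch of the second option, the lookup below prefers magick and ignores a convert found inside System32. The function name pick_imagemagick and the injectable which parameter are assumptions made for illustration; shutil.which requires Python 3.3 or newer.

```python
import shutil


def pick_imagemagick(which=shutil.which):
    """Return the path of an ImageMagick executable, or None if none is found."""
    magick = which("magick")
    if magick:
        return magick
    convert = which("convert")
    # Skip the unrelated Windows disk utility that lives in System32.
    if convert and "system32" not in convert.lower():
        return convert
    return None


tool = pick_imagemagick()
if tool is None:
    print("ImageMagick not found")
else:
    print('PDF command: "{}" %i %o'.format(tool))
```

The printed string follows the %i/%o placeholder form that noteshrink's -c option takes, so it could be passed along as the PDF command.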
I recently found noteshrink and I'm fond of its utility! Thanks for giving it to the world!
In order to confidently make noteshrink compatible with Python 3 (per issue #4), introducing unit tests would be beneficial. They would provide a baseline for comparing behavior and help when altering existing features or introducing new ones over time.
I would like to help with this, but I want to feel this out before I get the horse too far ahead of the cart! The code structure seems amenable to unit testing for the most part.
Got this error while running. I wanted to work on a small bill I got from a grocery shop.
./noteshrink.py bill.jpg
opened bill.jpg
Traceback (most recent call last):
File "./noteshrink.py", line 587, in <module>
main()
File "./noteshrink.py", line 584, in main
notescan_main(options=get_argument_parser().parse_args())
File "./noteshrink.py", line 559, in notescan_main
samples = sample_pixels(img, options)
File "./noteshrink.py", line 342, in sample_pixels
pixels = img.reshape((-1, 3))
ValueError: cannot reshape array of size 1 into shape (3)
When get_fg_mask can't find any foreground pixel in the sample (e.g. a white page, like the attached image), K-means clustering raises an exception (in its randint call, to be precise).
I'm not sure which behavior would be desirable in this case: detecting the blank page and skipping it, or writing the white page to the output.
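One way to handle the blank-page case is sketched below: when no foreground samples exist, return a palette containing only the background color, so the page is written out as plain background. Everything here is an assumption for illustration: the function name, the cluster callable (any function that takes the foreground samples and a cluster count and returns the cluster centers), and the stand-in first_k used in place of real K-means.

```python
import numpy as np


def palette_with_blank_page_guard(samples, fg_mask, bg_color, num_colors, cluster):
    """Build a palette, falling back to background-only on a blank page."""
    fg = samples[fg_mask].astype(np.float32)
    if fg.shape[0] == 0:
        # Blank page: nothing to cluster, so the palette is just the background.
        return np.asarray([bg_color], dtype=np.uint8)
    # Never ask for more clusters than there are foreground samples.
    k = min(num_colors - 1, fg.shape[0])
    centers = cluster(fg, k)
    return np.vstack((bg_color, centers)).astype(np.uint8)


def first_k(fg, k):
    """Stand-in for a real clustering routine: take the first k samples."""
    return fg[:k]


white = np.full((10, 3), 255, dtype=np.uint8)
no_fg = np.zeros(10, dtype=bool)
blank_palette = palette_with_blank_page_guard(white, no_fg, white[0], 8, first_k)
print(blank_palette.shape)
```

With SciPy, the cluster argument could wrap scipy.cluster.vq.kmeans and return the first element of its result, which is the codebook of centers.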
Hi Matt, awesome work and a very enjoyable write-up, thank you.
I've looked over the code briefly, and it's very neatly structured and compartmentalised. None of the dependencies look like no-gos for Python3, and the file-handling seems to be compartmentalised by PIL/Pillow.
So, I'm wondering why this is Python 2 only? If it's merely personal preference and all it would take are some modernisations from __future__ and some print function calls (or six, if it came to that), would you accept pull requests to make this work on both?
I have some scanned brochures in PDF format. Is it possible to run noteshrink on a PDF file to create a clean, beautiful PDF?
Ignore
warning: error opening C:\Users\Kartikey
warning: error opening Kushwah\Downloads\Compressed\noteshrink-master\image.jpg
running PDF command "convert output.pdf"...
Invalid drive specification.
warning: PDF command failed
Pillow 10.4 error:
PIL\Image.py", line 1087, in convert
im = self.im.convert(mode, dither)
ValueError: conversion not supported
Can this be translated from Python to C++? I need to deploy it on a mobile device.
Hi, I was trying to run your app on old church documents to improve their readability. Unfortunately, I can't find settings that remove the noisy background.
Two of them are here:
https://www.dropbox.com/sh/u3noc0dkd17r0a0/AABR0yDH07qG63oDMtENL8FBa?dl=0
I was trying with settings:
-p up to 50
-n up to 128
-v down to 5
-s down to 5
and the results weren't satisfactory.
Could you treat it as a case study?
By default it tries to convert the output image to PDF, using a command-line program called "convert". This doesn't seem to be one of the listed dependencies - can someone explain what it is and how to install it?
BTW on macOS I can get it working with sips using this parameter:
-c 'sips %i -s format pdf --out %o'
@mzucker
I am afraid that there are parts of the world that cannot use your program simply because there is no LICENSE file or any other information about the legal status of the code.
I would recommend GPLv3 or MIT license. But there are more options depending on what aspect of the legal status of the code is your concern.
Cheers!
I downloaded release 0.1, installed the requirements via pip, and ran make.
make
mkdir -p example_output && \
cd example_output && \
../noteshrink.py -O -g -w -s 20 -v 30 -o notesA.pdf ../examples/notesA*.jpg
File "../noteshrink.py", line 159
print ' running "{}"...'.format(cmd),
^
SyntaxError: invalid syntax
makefile:9: recipe for target 'example_output/notesA.pdf' failed
make: *** [example_output/notesA.pdf] Error 1
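The SyntaxError points at a Python 2 print statement, which suggests that make picked up a Python 3 interpreter; running the script under Python 2 avoids the error. For reference, here is a sketch of the same line written so that it also runs on Python 3. The wrapper function announce and the example command are assumptions for illustration.

```python
import sys


def announce(cmd, stream=sys.stdout):
    """Print the progress message without a trailing newline.

    On Python 2.6/2.7 this form needs `from __future__ import print_function`
    as the first statement of the module.
    """
    print('  running "{}"...'.format(cmd), end=' ', file=stream)
    stream.flush()


announce('convert page0000.png output.pdf')
```

The end=' ' argument approximates the trailing comma of the Python 2 statement, which keeps the cursor on the same line for the message that follows.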
Hello,
First of all, I have to say that this work is insanely good. I've been trying to apply it to pictures taken with a camera, but since it is mainly aimed at scanned documents, I've been getting fairly noisy results (images below). Do you have any tips for handling documents with uneven lighting (not as uniform as the light from a scanner)? Would it be possible to get a clean white background in the image?
Thank you in advance
I need to automate image processing from Python and send the results to a Node.js app.
I would prefer to have a small script that I can plug into a Node command to automate the process from the server.
Any clue how to make a CLI tool + documentation for this software?
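Other reports on this page already invoke noteshrink as a command-line program (for example noteshrink notesA1.jpg, and -o notesA.pdf in the Makefile output), so a thin wrapper may be all that is needed. The sketch below only assumes that the noteshrink command is on the PATH; the function names and file names are placeholders.

```python
import subprocess


def build_noteshrink_command(image_paths, pdf_name="output.pdf"):
    """Build the argument list for one noteshrink run."""
    return ["noteshrink", "-o", pdf_name] + list(image_paths)


def shrink(image_paths, pdf_name="output.pdf"):
    """Run noteshrink and return its exit code (0 means success)."""
    return subprocess.call(build_noteshrink_command(image_paths, pdf_name))


print(build_noteshrink_command(["scan1.jpg", "scan2.jpg"], "notes.pdf"))
```

A Node.js server could launch the same command line through its child_process module instead of going through Python.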
I have got this error:
File "E:\projects\noteshrink\noteshrink.py", line 590, in <module>
main()
File "E:\projects\noteshrink\noteshrink.py", line 586, in main
notescan_main(options=get_argument_parser().parse_args())
File "E:\projects\noteshrink\noteshrink.py", line 561, in notescan_main
palette = get_palette(samples, options)
File "E:\projects\noteshrink\noteshrink.py", line 387, in get_palette
centers, _ = kmeans(samples[fg_mask].astype(np.float32),
File "C:\Users\bebag\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\cluster\vq.py", line 454, in kmeans
guess = _kpoints(obs, k)
File "C:\Users\bebag\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\cluster\vq.py", line 480, in _kpoints
idx = np.random.choice(data.shape[0], size=k, replace=False)
File "mtrand.pyx", line 902, in numpy.random.mtrand.RandomState.choice
ValueError: a must be greater than 0 unless no samples are taken
Since reading the article I have been thinking about ways to improve the results:
Not increasing the intensity of the background color. On a few example pictures the background becomes garish and over-saturated.
Using the CIE color space for clustering. It is much more accurate for calculating perceived color distance, and will result in superior clustering. CIE is implemented by the colormath library.
Using a walking median to calculate the background color. Could be done individually by color. Would result in a background color not stripped of the least significant bits.
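To make the color-space suggestion concrete, here is a rough sketch of an sRGB to CIELAB conversion written directly with NumPy (not with the colormath library mentioned above). It assumes that "CIE" refers to CIELAB with a D65 white point. The function name srgb_to_lab is hypothetical; clustering would then run on the returned L*, a*, b* values instead of on RGB.

```python
import numpy as np

# Linear sRGB -> XYZ matrix and D65 reference white.
RGB_TO_XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                       [0.2126729, 0.7151522, 0.0721750],
                       [0.0193339, 0.1191920, 0.9503041]])
WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def srgb_to_lab(rgb):
    """Convert 8-bit sRGB values of shape (..., 3) to CIELAB."""
    c = np.asarray(rgb, dtype=np.float64) / 255.0
    # Undo the sRGB gamma curve.
    linear = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    # Convert to XYZ and normalize by the reference white.
    xyz = linear.dot(RGB_TO_XYZ.T) / WHITE_D65
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3,
                 np.cbrt(xyz),
                 xyz / (3 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)


lab = srgb_to_lab([[255, 255, 255], [0, 0, 0]])
print(np.round(lab, 2))
```

Euclidean distance between two rows of the result is the classic CIE76 color difference, which is what a K-means step on these values would effectively minimize.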
Hi, I get this error when trying to process an image received via WhatsApp:
:/usr/local/src/noteshrink$ python noteshrink.py /mnt/Large/whats0001.png
opened /mnt/Large/whats0001.png
getting palette...
noteshrink.py:132: RuntimeWarning: invalid value encountered in divide
saturation = delta.astype(np.float32) / cmax.astype(np.float32)
applying palette...
saving page0000.png...
done
running PDF command "convert page0000.png output.pdf"...
wrote output.pdf
You can download the image here: https://www.dropbox.com/s/6a0jwcz53mr0m70/whats0001.jpeg?dl=0
I'm using Python 2.7.12.
These are the dependencies versions:
Installed /usr/local/lib/python2.7/dist-packages/noteshrink-0.1.1-py2.7.egg
Processing dependencies for noteshrink==0.1.1
Searching for Pillow==3.1.2
Best match: Pillow 3.1.2
Pillow 3.1.2 is already the active version in easy-install.pth
Using /usr/lib/python2.7/dist-packages
Searching for scipy==0.17.0
Best match: scipy 0.17.0
scipy 0.17.0 is already the active version in easy-install.pth
Using /usr/lib/python2.7/dist-packages
Searching for numpy==1.11.0
Best match: numpy 1.11.0
numpy 1.11.0 is already the active version in easy-install.pth
Using /usr/lib/python2.7/dist-packages
Finished processing dependencies for noteshrink==0.1.1
Regards
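The RuntimeWarning in the log above comes from the quoted division: for pure black pixels cmax is 0, so the expression evaluates 0/0. The run still finishes ("wrote output.pdf"), so the warning looks harmless here. As a sketch (the function name saturation_and_value is an assumption), the same quantities can be computed without triggering it:

```python
import numpy as np


def saturation_and_value(rgb):
    """Return per-pixel HSV saturation and value for 8-bit RGB input."""
    rgb = np.asarray(rgb)
    cmax = rgb.max(axis=-1).astype(np.float32)
    cmin = rgb.min(axis=-1).astype(np.float32)
    delta = cmax - cmin
    saturation = np.zeros_like(cmax)
    # Divide only where cmax is non-zero; black pixels keep saturation 0.
    np.divide(delta, cmax, out=saturation, where=cmax > 0)
    return saturation, cmax / 255.0


s, v = saturation_and_value([[0, 0, 0], [255, 0, 0], [100, 100, 100]])
print(s.tolist(), v.tolist())
```

The where argument skips the division for the masked-out entries, leaving the zeros that were written into the output array beforehand.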
I've got this error:
Traceback (most recent call last):
File "E:\projects\noteshrink\noteshrink.py", line 590, in <module>
main()
File "E:\projects\noteshrink\noteshrink.py", line 586, in main
notescan_main(options=get_argument_parser().parse_args())
File "E:\projects\noteshrink\noteshrink.py", line 561, in notescan_main
palette = get_palette(samples, options)
File "E:\projects\noteshrink\noteshrink.py", line 387, in get_palette
centers, _ = kmeans(samples[fg_mask].astype(np.float32),
File "C:\Users\bebag\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\cluster\vq.py", line 454, in kmeans
guess = _kpoints(obs, k)
File "C:\Users\bebag\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\cluster\vq.py", line 480, in _kpoints
idx = np.random.choice(data.shape[0], size=k, replace=False)
File "mtrand.pyx", line 902, in numpy.random.mtrand.RandomState.choice
ValueError: a must be greater than 0 unless no samples are taken
Hi, are you aware of gscan2pdf (http://gscan2pdf.sourceforge.net/) or paperwork (https://github.com/twostairs/paperwork)? It would be wonderful to integrate your work into a full note-scanning-to-PDF application. Regards