mzucker / noteshrink

Convert scans of handwritten notes to beautiful, compact PDFs

Home Page: https://mzucker.github.io/2016/09/20/noteshrink.html

License: MIT License

Python 95.11% Makefile 4.89%

noteshrink's Issues

Error with noteshrink : TypeError: unique() got an unexpected keyword argument 'return_counts'

Hello,

I wanted to try noteshrink.
After installation, I tried it with the JPG files in the examples folder.
Every file fails with the same error: TypeError: unique() got an unexpected keyword argument 'return_counts'

I don't know what to do.

Thanks.

steph@Pergolesi $ noteshrink notesA1.jpg

opened notesA1.jpg
getting palette...
Traceback (most recent call last):
File "/usr/local/bin/noteshrink", line 9, in <module>
load_entry_point('noteshrink==0.1.0', 'console_scripts', 'noteshrink')()
File "build/bdist.linux-x86_64/egg/noteshrink.py", line 582, in main
File "build/bdist.linux-x86_64/egg/noteshrink.py", line 558, in notescan_main
File "build/bdist.linux-x86_64/egg/noteshrink.py", line 381, in get_palette
File "build/bdist.linux-x86_64/egg/noteshrink.py", line 106, in get_bg_color
TypeError: unique() got an unexpected keyword argument 'return_counts'

steph@Pergolesi $ python -V
Python 2.7.6

steph@Pergolesi $ cat /proc/version
Linux version 3.19.0-32-generic (buildd@lgw01-43) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #37~14.04.1-Ubuntu SMP Thu Oct 22 09:41:40 UTC 2015

steph@Pergolesi $ pip list
apt-xapian-index (0.45)
apturl (0.4.1ubuntu4)
argparse (1.2.1)
BeautifulSoup (3.2.1)
beautifulsoup4 (4.4.1)
chardet (2.0.1)
colorama (0.2.5)
command-not-found (0.3)
configglue (1.1.2)
configobj (4.7.2)
configparser (3.3.0r2)
cssselect (0.9.1)
cssutils (0.9.10)
debtagshw (0.1)
decorator (3.4.0)
defer (1.0.6)
deluge (1.3.6)
dirspec (13.10)
dnspython (1.11.1)
duplicity (0.6.23)
ecdsa (0.13)
Electrum (2.0.2)
eventlet (0.13.0)
feedparser (5.1.3)
googleplaydownloader (1.7)
greenlet (0.4.2)
html5lib (0.999)
httplib2 (0.8)
ipython (1.2.1)
Jinja2 (2.7.2)
kaa-base (0.99.1)
kaa-metadata (0.7.8)
lockfile (0.8)
lxml (3.3.3)
Mako (0.9.1)
MarkupSafe (0.18)
matplotlib (1.3.1)
mechanize (0.2.5)
Mirage (0.9.5.1)
mysql (0.0.1)
mysql-connector-python (2.0.4)
MySQL-python (1.2.3)
ndg-httpsclient (0.3.2)
nemo-emblems (0.0.1)
netifaces (0.8)
nose (1.3.1)
noteshrink (0.1.0)
numpy (1.8.2)
oauthlib (0.6.1)
oneconf (0.3.7.14.04.1)
PAM (0.4.2)
pandas (0.13.1)
paramiko (1.10.1)
pbkdf2 (1.3)
pdfshuffler (0.6.0)
pexpect (3.1)
Pillow (2.6.1)
pip (1.5.4)
piston-mini-client (0.7.5)
protobuf (2.5.0)
pyasn1 (0.1.7)
pyasn1-modules (0.0.5)
pychm (0.8.4)
pycrypto (2.6.1)
pycups (1.9.66)
pycurl (7.19.3)
pydns (2.3.6)
pygobject (3.12.0)
pyinotify (0.9.4)
pyOpenSSL (0.13)
pyparsing (2.0.1)
pyPdf (1.13)
pyserial (2.6)
pysmbc (1.0.14.1)
pysqlite (1.0.1)
pysrt (1.0.1)
python-apt (0.9.3.5ubuntu2)
python-dateutil (1.5)
python-debian (0.1.21-nmu2ubuntu2)
python-epson-printer (1.3)
python-escpos (1.0.1)
python-libtorrent (0.16.13)
pytz (2012c)
pyusb (1.0.0b1)
pyxdg (0.25)
pyzmq (14.0.1)
qrcode (5.1)
reportlab (3.0)
requests (2.2.1)
requests-oauthlib (0.6.1)
scipy (0.13.3)
sessioninstaller (0.0.0)
setuptools (3.3)
simplegeneric (0.8.1)
six (1.5.2)
slowaes (0.1a1)
sympy (0.7.4.1)
system-service (0.1.6)
tlslite (0.4.8)
tornado (3.1.1)
tweepy (3.5.0)
Twisted-Core (13.2.0)
Twisted-Names (13.2.0)
Twisted-Web (13.2.0)
urllib3 (1.7.1)
uTidylib (0.2)
vboxapi (1.0)
wsgiref (0.1.2)
wxPython (2.8.12.1)
wxPython-common (2.8.12.1)
zope.interface (4.0.5)

TypeError: unique() got an unexpected keyword argument 'return_counts'

Tried running noteshrink on a JPEG saved from MacOS Preview, with the following result:

Larss-MacBook-Pro:shrink larsga$ noteshrink IMG_544*
opened IMG_5442.JPG
  getting palette...
Traceback (most recent call last):
  File "/usr/local/bin/noteshrink", line 9, in <module>
    load_entry_point('noteshrink==0.1.1', 'console_scripts', 'noteshrink')()
  File "/Library/Python/2.7/site-packages/noteshrink.py", line 584, in main
    notescan_main(options=get_argument_parser().parse_args())
  File "/Library/Python/2.7/site-packages/noteshrink.py", line 560, in notescan_main
    palette = get_palette(samples, options)
  File "/Library/Python/2.7/site-packages/noteshrink.py", line 383, in get_palette
    bg_color = get_bg_color(samples, 6)
  File "/Library/Python/2.7/site-packages/noteshrink.py", line 108, in get_bg_color
    unique, counts = np.unique(packed, return_counts=True)
TypeError: unique() got an unexpected keyword argument 'return_counts'
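
Both tracebacks above stop at the same call, `np.unique(packed, return_counts=True)`. The `return_counts` argument was added in NumPy 1.9, and the `pip list` output in the first report shows numpy 1.8.2, so upgrading NumPy is the likely fix. Where upgrading is not an option, the counts can be computed by hand. A minimal sketch, assuming a 1-D array such as `packed`; `unique_with_counts` is a hypothetical helper and not part of noteshrink:

```python
import numpy as np


def unique_with_counts(values):
    """Return (sorted unique values, counts) without using return_counts."""
    values = np.asarray(values).ravel()
    if values.size == 0:
        return values, np.zeros(0, dtype=np.intp)
    sorted_vals = np.sort(values)
    # True at every position where a new run of equal values starts.
    starts = np.concatenate(([True], sorted_vals[1:] != sorted_vals[:-1]))
    unique = sorted_vals[starts]
    # Run lengths are the gaps between consecutive run starts.
    start_idx = np.flatnonzero(starts)
    counts = np.diff(np.concatenate((start_idx, [sorted_vals.size])))
    return unique, counts
```

With this helper, the failing line could be written as `unique, counts = unique_with_counts(packed)`.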

Add C# and VB.Net project to derived works in your readme

Hello,

FDNCRED and I have rewritten your project in C# and VB.NET
using the Accord.NET Framework. Please add a link to your README
under derived works.
https://github.com/Phreak87/NoteShrink

New Design

Remove this issue.

Convert the result to vector graphics

Hi,

depending on the input, the output may be shrunk even more when converted to vector graphics, as you also noticed in https://mzucker.github.io/2018/05/14/maptrace.html :)

Here is a quick and dirty example:
./noteshrink.py -e ".pnm" -P "convert %i %o" -c "potrace -b pdf -o %o %i" examples/tree.jpg

Maybe the example can be added to the README?

Unfortunately potrace only supports binary images.

Ciao,
Antonio

Plagiarism

I just wanted to warn you that I have seen your code in another repo. Its noteshrink.py looks the same, except that line 7 reads "Created by Georgy Perevozchikov (gosha20777) 2018."

Value error on reshaping image

Python 2.7.12 (default, Jul 1 2016, 15:12:24)

Traceback (most recent call last):
  File "./noteshrink.py", line 585, in <module>
    main()
  File "./noteshrink.py", line 582, in main
    notescan_main(options=get_argument_parser().parse_args())
  File "./noteshrink.py", line 557, in notescan_main
    samples = sample_pixels(img, options)
  File "./noteshrink.py", line 340, in sample_pixels
    pixels = img.reshape((-1, 3))
ValueError: total size of new array must be unchanged

I have installed all requirements except notescanm, which isn't pip-installable.
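
The traceback above fails at `img.reshape((-1, 3))`, which only works when the array holds exactly three values per pixel. One plausible cause (an assumption, since the report does not include the input file) is an image that is grayscale or has an alpha channel. A minimal sketch of coercing such an array to three channels before reshaping; `as_rgb_array` is a hypothetical helper, not a noteshrink function:

```python
import numpy as np


def as_rgb_array(img):
    """Coerce an image array to shape (H, W, 3)."""
    img = np.asarray(img)
    if img.ndim == 2:
        # Grayscale: repeat the single channel three times.
        return np.stack([img, img, img], axis=-1)
    if img.ndim == 3 and img.shape[2] == 4:
        # RGBA: drop the alpha channel.
        return img[:, :, :3]
    if img.ndim == 3 and img.shape[2] == 3:
        return img
    raise ValueError('unsupported image shape: {}'.format(img.shape))
```

The reshape would then read `pixels = as_rgb_array(img).reshape((-1, 3))`. Converting the image with Pillow's `convert('RGB')` before building the array is another way to reach the same shape.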

TypeError: only integer arrays with one element can be converted to an index

got this weird error:

  getting palette...
/usr/lib/python3.6/site-packages/noteshrink.py:132: RuntimeWarning: invalid value encountered in true_divide
  saturation = delta.astype(np.float32) / cmax.astype(np.float32)
  applying palette...
  saving page0000.png...
Traceback (most recent call last):
  File "/usr/bin/noteshrink", line 11, in <module>
    load_entry_point('noteshrink==0.1.1', 'console_scripts', 'noteshrink')()
  File "/usr/lib/python3.6/site-packages/noteshrink.py", line 584, in main
    notescan_main(options=get_argument_parser().parse_args())
  File "/usr/lib/python3.6/site-packages/noteshrink.py", line 564, in notescan_main
    save(output_filename, labels, palette, dpi, options)
  File "/usr/lib/python3.6/site-packages/noteshrink.py", line 455, in save
    output_img.putpalette(palette.flatten())
  File "/usr/lib/python3.6/site-packages/PIL/Image.py", line 1483, in putpalette
    data = bytes(data)
TypeError: only integer arrays with one element can be converted to an index

Maybe this is a python3.6 problem

Regards,
bitwave
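
The last frame of the traceback shows Pillow calling `bytes(data)` on the NumPy array returned by `palette.flatten()`. On Python 3 that call appears to go through the array's integer-conversion path, which would explain the message; this reading is an inference from the traceback, not something confirmed against the Pillow source. A minimal sketch of a workaround that hands Pillow a plain list of integers instead; `palette_to_list` is a hypothetical helper:

```python
import numpy as np


def palette_to_list(palette):
    """Flatten an (N, 3) palette into a list of ints in the range 0..255."""
    # Round and clip first so float palettes cannot wrap around in uint8.
    clipped = np.clip(np.rint(palette), 0, 255)
    return clipped.astype(np.uint8).flatten().tolist()
```

The failing line would become `output_img.putpalette(palette_to_list(palette))`.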

Will you accept the PR if I port it to C++? :)

Hi, I like the idea very much, and I see some people asking for a C++ version, so I would like to port it to C++. If I do, will you accept the PR?

Issues with Windows convert

When you run noteshrink on Windows (I am using Windows 10), it fails on the subprocess.call to convert:

running PDF command "convert page0000.png output.pdf"...
Invalid Parameter - output.pdf

The reason appears to be given in this stackoverflow question:
https://stackoverflow.com/questions/41860668/why-does-this-python-subprocess-command-only-work-when-shell-true-on-windows

Windows has another old program called convert sitting on my PC in C:\Windows\System32\convert.exe

Possible solutions involve using shell=True or changing the command to magick.

I note from the current release notes that:

magick
The "magick" command is the new primary command of the Shell API, replacing the old "convert" command. This allows you to create a 'magick script' of the form "#!/path/to/command/magick -script", or pipe options into a command "magick -script -", as a background process.
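
The second option can be sketched as a lookup that prefers `magick` when it is on the PATH and falls back to `convert` otherwise. This is only an illustration: `pick_imagemagick` and `default_pdf_cmd` are hypothetical names, and the `%i %o` template is assumed from the command shown in the log above.

```python
import shutil


def pick_imagemagick(which=shutil.which):
    """Return 'magick' if it is on PATH, else 'convert', else None."""
    for name in ('magick', 'convert'):
        if which(name):
            return name
    return None


def default_pdf_cmd(which=shutil.which):
    """Build a PDF command template from whichever executable was found."""
    exe = pick_imagemagick(which)
    if exe is None:
        return None
    return exe + ' %i %o'
```

On a Windows machine without ImageMagick 7, the fallback would still resolve to the System32 `convert.exe`, so the lookup helps only when `magick` is actually installed.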

Unit testing

I recently found noteshrink and I'm fond of its utility! Thanks for giving it to the world!

In order to confidently make noteshrink compatible with Python 3 (per issue #4), introduction of unit tests would be beneficial. This will provide a baseline to compare behaviors as well as help with alteration of existing features/introduction of new features over time.

I would like to help with this, but I want to feel this out before I get the horse too far ahead of the cart! The code structure seems amenable to unit testing for the most part.

ValueError: cannot reshape array of size 1 into shape (3)

Got this error while running. I wanted to work on a small bill I got from a grocery shop.

./noteshrink.py bill.jpg
opened bill.jpg
Traceback (most recent call last):
  File "./noteshrink.py", line 587, in <module>
    main()
  File "./noteshrink.py", line 584, in main
    notescan_main(options=get_argument_parser().parse_args())
  File "./noteshrink.py", line 559, in notescan_main
    samples = sample_pixels(img, options)
  File "./noteshrink.py", line 342, in sample_pixels
    pixels = img.reshape((-1, 3))
ValueError: cannot reshape array of size 1 into shape (3)

K-means clustering exception with white pages

When get_fg_mask can't find any foreground pixel in the sample (e.g. a white page, like the attached image), K-means clustering raises an exception (in the randint call inside it, to be precise).

I'm not sure which behavior would be desirable in this case: checking for it and skipping the page, or writing the white page to the output.
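
Either behavior needs the same check first: are there enough foreground samples to seed the clusters? A minimal sketch of that check, assuming `samples` is an (N, 3) array and `fg_mask` a boolean array of length N; `foreground_or_none` is a hypothetical helper, and what the caller does with `None` (skip the page or emit a background-only palette) is left open:

```python
import numpy as np


def foreground_or_none(samples, fg_mask, num_clusters):
    """Return foreground samples as float32, or None if too few for k-means."""
    fg = samples[fg_mask]
    if fg.shape[0] < num_clusters:
        # Blank or nearly blank page: not enough points to pick initial centers.
        return None
    return fg.astype(np.float32)
```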

Multi-version Support?

Hi Matt, awesome work and a very enjoyable write-up, thank you.

I've looked over the code briefly, and it's very neatly structured and compartmentalised. None of the dependencies look like no-gos for Python3, and the file-handling seems to be compartmentalised by PIL/Pillow.

So, I'm wondering why this is Python2 only? If it's merely personal preference and all it would take are some modernisations from __future__ and some print function calls (or six, if it came to that), would you accept pull requests to make this work on both?

What about 'noteshrinking' a pdf?

I have some scanned brochures in PDF format. Is it possible to use noteshrink app over a PDF file to create a clean beautiful pdf?

Ignore

Warning: Error opening image

warning: error opening C:\Users\Kartikey
warning: error opening Kushwah\Downloads\Compressed\noteshrink-master\image.jpg
running PDF command "convert output.pdf"...
Invalid drive specification.
warning: PDF command failed

Pillow 10.4 error:
PIL\Image.py", line 1087, in convert
im = self.im.convert(mode, dither)
ValueError: conversion not supported

Can translate python to c++?

Can this be translated from Python to C++? It needs to be deployed on a mobile device.

Old documents

Hi, I was trying to run your app on old church documents to enhance their readability. Unfortunately, I can't find settings that remove the noisy background.

Two of them are here:
https://www.dropbox.com/sh/u3noc0dkd17r0a0/AABR0yDH07qG63oDMtENL8FBa?dl=0

I tried these settings:
-p up to 50
-n up to 128
-v down to 5
-s down to 5
but the results weren't satisfactory.

Could you treat it as a case study?

PDF converter requirement

By default it tries to convert the output image to PDF, using a command-line program called "convert". This doesn't seem to be one of the listed dependencies - can someone explain what it is and how to install it?

BTW on macOS I can get it working with sips using this parameter:
-c 'sips %i -s format pdf --out %o'
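
The `-c` template works through two placeholders: `%i` stands for the input image names and `%o` for the output PDF name. A minimal sketch of how such a template might expand, assuming plain string substitution (consistent with log lines like `running PDF command "convert page0000.png output.pdf"`, but not confirmed against the source); `expand_pdf_cmd` is a hypothetical name:

```python
def expand_pdf_cmd(template, inputs, output):
    """Substitute %o with the output name and %i with the input names."""
    return template.replace('%o', output).replace('%i', ' '.join(inputs))


# Example with the sips template from this issue.
cmd = expand_pdf_cmd('sips %i -s format pdf --out %o',
                     ['page0000.png'], 'output.pdf')
```

Here `cmd` is `sips page0000.png -s format pdf --out output.pdf`. Under this assumption, file names containing spaces would need quoting before substitution.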

Licensing information

@mzucker
I am afraid that there are parts of the world that cannot use your program simply because there is no LICENSE file or any other information about the legal status of the code.
I would recommend GPLv3 or MIT license. But there are more options depending on what aspect of the legal status of the code is your concern.

Cheers!

SyntaxError: invalid syntax

I downloaded release 0.1, installed the requirements via pip, and ran make.

make
mkdir -p example_output && \
cd example_output && \
../noteshrink.py -O -g -w -s 20 -v 30 -o notesA.pdf ../examples/notesA*.jpg
  File "../noteshrink.py", line 159
    print '  running "{}"...'.format(cmd),
                            ^
SyntaxError: invalid syntax
makefile:9: recipe for target 'example_output/notesA.pdf' failed
make: *** [example_output/notesA.pdf] Error 1
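
The caret points at a Python 2 `print` statement, so the script is most likely being run by a Python 3 interpreter, where `print` is a function. A minimal sketch of the equivalent function-call form; `announce` is a hypothetical wrapper used only for illustration. On Python 2 the same code would also need `from __future__ import print_function` as the first statement of the file.

```python
import sys


def announce(cmd, stream=None):
    """Print a progress message and stay on the same line."""
    if stream is None:
        stream = sys.stdout
    # Python 2 original: print '  running "{}"...'.format(cmd),
    # The trailing comma is roughly equivalent to end=' ' here.
    print('  running "{}"...'.format(cmd), end=' ', file=stream)
    stream.flush()
```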

Doc photos taken by cameras

Hello,

First of all, I have to say that this work is insanely good. I've been trying to apply it to pictures taken with a camera, but since it is mainly aimed at scanned documents, the results have been fairly noisy (images below). Do you have any tips for handling documents with uneven lighting (not as uniform as the light from a scanner)? Would it be possible to get a clean white background on the image?

Thank you in advance

original

with noteshrink

a CLI to automatize the image creation

I need to automate image generation from Python and send the images to a Node.js app.

I would prefer to have a small script that I can plug into a Node command, so the process can be automated from the server.

Any clue how to make a CLI tool, plus documentation, for this software?

ValueError: a must be greater than 0 unless no samples are taken

I have got this error:

  File "E:\projects\noteshrink\noteshrink.py", line 590, in <module>
    main()
  File "E:\projects\noteshrink\noteshrink.py", line 586, in main
    notescan_main(options=get_argument_parser().parse_args())
  File "E:\projects\noteshrink\noteshrink.py", line 561, in notescan_main
    palette = get_palette(samples, options)
  File "E:\projects\noteshrink\noteshrink.py", line 387, in get_palette
    centers, _ = kmeans(samples[fg_mask].astype(np.float32),
  File "C:\Users\bebag\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\cluster\vq.py", line 454, in kmeans
    guess = _kpoints(obs, k)
  File "C:\Users\bebag\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\cluster\vq.py", line 480, in _kpoints
    idx = np.random.choice(data.shape[0], size=k, replace=False)
  File "mtrand.pyx", line 902, in numpy.random.mtrand.RandomState.choice
ValueError: a must be greater than 0 unless no samples are taken

Feature requests to improve readability

Since reading the article, I have thought of a few ways to improve the results:

Not increasing the intensity of the background color. On a few example pictures the background becomes garish and over-saturated.
Using the CIE color space for clustering. It is much more accurate for calculating perceived color distance, and will result in superior clustering. CIE is implemented by the colormath library.
Using a walking median to calculate the background color. Could be done individually by color. Would result in a background color not stripped of the least significant bits.

Error: noteshrink.py:132: RuntimeWarning: invalid value encountered in divide

Hi, I receive this error when trying to process an image received via WhatsApp:

:/usr/local/src/noteshrink$ python noteshrink.py /mnt/Large/whats0001.png
opened /mnt/Large/whats0001.png
  getting palette...
noteshrink.py:132: RuntimeWarning: invalid value encountered in divide
  saturation = delta.astype(np.float32) / cmax.astype(np.float32)
  applying palette...
  saving page0000.png...
  done

running PDF command "convert page0000.png output.pdf"...
  wrote output.pdf

You can download the image here: https://www.dropbox.com/s/6a0jwcz53mr0m70/whats0001.jpeg?dl=0

I'm using python 2.7.12
These are the dependencies versions:

Installed /usr/local/lib/python2.7/dist-packages/noteshrink-0.1.1-py2.7.egg
Processing dependencies for noteshrink==0.1.1
Searching for Pillow==3.1.2
Best match: Pillow 3.1.2
Pillow 3.1.2 is already the active version in easy-install.pth

Using /usr/lib/python2.7/dist-packages
Searching for scipy==0.17.0
Best match: scipy 0.17.0
scipy 0.17.0 is already the active version in easy-install.pth

Using /usr/lib/python2.7/dist-packages
Searching for numpy==1.11.0
Best match: numpy 1.11.0
numpy 1.11.0 is already the active version in easy-install.pth

Using /usr/lib/python2.7/dist-packages
Finished processing dependencies for noteshrink==0.1.1

Regards
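
The log above ends with `wrote output.pdf`, so the message is a warning and the run itself completed. It comes from the division `delta / cmax`, which produces 0/0 for pure black pixels where `cmax` is zero. A minimal sketch of a division that skips those entries and leaves them at 0, assuming `delta` and `cmax` are arrays of equal shape; `safe_saturation` is a hypothetical name:

```python
import numpy as np


def safe_saturation(delta, cmax):
    """Compute delta / cmax, returning 0 wherever cmax is 0."""
    delta = np.asarray(delta, dtype=np.float32)
    cmax = np.asarray(cmax, dtype=np.float32)
    out = np.zeros_like(delta)
    # Divide only where the denominator is non-zero; other entries keep 0.
    np.divide(delta, cmax, out=out, where=cmax != 0)
    return out
```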

ValueError: a must be greater than 0 unless no samples are taken

I've got this error:

Traceback (most recent call last):
  File "E:\projects\noteshrink\noteshrink.py", line 590, in <module>
    main()
  File "E:\projects\noteshrink\noteshrink.py", line 586, in main
    notescan_main(options=get_argument_parser().parse_args())
  File "E:\projects\noteshrink\noteshrink.py", line 561, in notescan_main
    palette = get_palette(samples, options)
  File "E:\projects\noteshrink\noteshrink.py", line 387, in get_palette
    centers, _ = kmeans(samples[fg_mask].astype(np.float32),
  File "C:\Users\bebag\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\cluster\vq.py", line 454, in kmeans
    guess = _kpoints(obs, k)
  File "C:\Users\bebag\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\cluster\vq.py", line 480, in _kpoints
    idx = np.random.choice(data.shape[0], size=k, replace=False)
  File "mtrand.pyx", line 902, in numpy.random.mtrand.RandomState.choice
ValueError: a must be greater than 0 unless no samples are taken

Integrate to other full note - scan - pdf softwares

Hi, are you aware of gscan2pdf (http://gscan2pdf.sourceforge.net/) or paperwork (https://github.com/twostairs/paperwork)? It would be wonderful to integrate your work into a full note-scan-PDF application. Regards
