noteshrink's People

Contributors

fcladera, mzucker

noteshrink's Issues

Unit testing

I recently found noteshrink and I'm fond of its utility! Thanks for giving it to the world!

In order to confidently make noteshrink compatible with Python 3 (per issue #4), introduction of unit tests would be beneficial. This will provide a baseline to compare behaviors as well as help with alteration of existing features/introduction of new features over time.

I would like to help with this, but I want to feel this out before I get the horse too far ahead of the cart! The code structure seems amenable to unit testing for the most part.

Doc photos taken by cameras

Hello,

First of all, I have to say that this work is insanely good. I've been trying to apply it to pictures taken with a camera, but since it is mainly aimed at scanned documents, I've been getting fairly noisy results (images below). Do you have any tips for handling documents with fairly uneven lighting (not as uniform as the light from a scanner)? Would it be possible to get a clean white background on the image?

Thank you in advance

original
Saved_file copy 3

with noteshrink
page0000

Multi-version Support?

Hi Matt, awesome work and a very enjoyable write-up, thank you.

I've looked over the code briefly, and it's very neatly structured and compartmentalised. None of the dependencies look like no-gos for Python3, and the file-handling seems to be compartmentalised by PIL/Pillow.

So, I'm wondering why this is Python2 only? If it's merely personal preference and all it would take are some modernisations from __future__ and some print function calls (or six, if it came to that), would you accept pull requests to make this work on both?

ValueError: a must be greater than 0 unless no samples are taken

I've got this error:

Traceback (most recent call last):
  File "E:\projects\noteshrink\noteshrink.py", line 590, in <module>
    main()
  File "E:\projects\noteshrink\noteshrink.py", line 586, in main
    notescan_main(options=get_argument_parser().parse_args())
  File "E:\projects\noteshrink\noteshrink.py", line 561, in notescan_main
    palette = get_palette(samples, options)
  File "E:\projects\noteshrink\noteshrink.py", line 387, in get_palette
    centers, _ = kmeans(samples[fg_mask].astype(np.float32),
  File "C:\Users\bebag\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\cluster\vq.py", line 454, in kmeans
    guess = _kpoints(obs, k)
  File "C:\Users\bebag\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\cluster\vq.py", line 480, in _kpoints
    idx = np.random.choice(data.shape[0], size=k, replace=False)
  File "mtrand.pyx", line 902, in numpy.random.mtrand.RandomState.choice
ValueError: a must be greater than 0 unless no samples are taken

TypeError: unique() got an unexpected keyword argument 'return_counts'

Tried running noteshrink on a JPEG saved from MacOS Preview, with the following result:

Larss-MacBook-Pro:shrink larsga$ noteshrink IMG_544*
opened IMG_5442.JPG
  getting palette...
Traceback (most recent call last):
  File "/usr/local/bin/noteshrink", line 9, in <module>
    load_entry_point('noteshrink==0.1.1', 'console_scripts', 'noteshrink')()
  File "/Library/Python/2.7/site-packages/noteshrink.py", line 584, in main
    notescan_main(options=get_argument_parser().parse_args())
  File "/Library/Python/2.7/site-packages/noteshrink.py", line 560, in notescan_main
    palette = get_palette(samples, options)
  File "/Library/Python/2.7/site-packages/noteshrink.py", line 383, in get_palette
    bg_color = get_bg_color(samples, 6)
  File "/Library/Python/2.7/site-packages/noteshrink.py", line 108, in get_bg_color
    unique, counts = np.unique(packed, return_counts=True)
TypeError: unique() got an unexpected keyword argument 'return_counts'

Error with noteshrink : TypeError: unique() got an unexpected keyword argument 'return_counts'

Hello,

I wanted to try noteshrink.
After installation, I tried it with the jpg files in the examples folder.
All files give the same error: TypeError: unique() got an unexpected keyword argument 'return_counts'

I don't know how to fix this.

Thanks.


steph@Pergolesi $ noteshrink notesA1.jpg

opened notesA1.jpg
getting palette...
Traceback (most recent call last):
  File "/usr/local/bin/noteshrink", line 9, in <module>
    load_entry_point('noteshrink==0.1.0', 'console_scripts', 'noteshrink')()
  File "build/bdist.linux-x86_64/egg/noteshrink.py", line 582, in main
  File "build/bdist.linux-x86_64/egg/noteshrink.py", line 558, in notescan_main
  File "build/bdist.linux-x86_64/egg/noteshrink.py", line 381, in get_palette
  File "build/bdist.linux-x86_64/egg/noteshrink.py", line 106, in get_bg_color
TypeError: unique() got an unexpected keyword argument 'return_counts'


steph@Pergolesi $ python -V
Python 2.7.6

steph@Pergolesi $ cat /proc/version
Linux version 3.19.0-32-generic (buildd@lgw01-43) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #37~14.04.1-Ubuntu SMP Thu Oct 22 09:41:40 UTC 2015

steph@Pergolesi $ pip list
apt-xapian-index (0.45)
apturl (0.4.1ubuntu4)
argparse (1.2.1)
BeautifulSoup (3.2.1)
beautifulsoup4 (4.4.1)
chardet (2.0.1)
colorama (0.2.5)
command-not-found (0.3)
configglue (1.1.2)
configobj (4.7.2)
configparser (3.3.0r2)
cssselect (0.9.1)
cssutils (0.9.10)
debtagshw (0.1)
decorator (3.4.0)
defer (1.0.6)
deluge (1.3.6)
dirspec (13.10)
dnspython (1.11.1)
duplicity (0.6.23)
ecdsa (0.13)
Electrum (2.0.2)
eventlet (0.13.0)
feedparser (5.1.3)
googleplaydownloader (1.7)
greenlet (0.4.2)
html5lib (0.999)
httplib2 (0.8)
ipython (1.2.1)
Jinja2 (2.7.2)
kaa-base (0.99.1)
kaa-metadata (0.7.8)
lockfile (0.8)
lxml (3.3.3)
Mako (0.9.1)
MarkupSafe (0.18)
matplotlib (1.3.1)
mechanize (0.2.5)
Mirage (0.9.5.1)
mysql (0.0.1)
mysql-connector-python (2.0.4)
MySQL-python (1.2.3)
ndg-httpsclient (0.3.2)
nemo-emblems (0.0.1)
netifaces (0.8)
nose (1.3.1)
noteshrink (0.1.0)
numpy (1.8.2)
oauthlib (0.6.1)
oneconf (0.3.7.14.04.1)
PAM (0.4.2)
pandas (0.13.1)
paramiko (1.10.1)
pbkdf2 (1.3)
pdfshuffler (0.6.0)
pexpect (3.1)
Pillow (2.6.1)
pip (1.5.4)
piston-mini-client (0.7.5)
protobuf (2.5.0)
pyasn1 (0.1.7)
pyasn1-modules (0.0.5)
pychm (0.8.4)
pycrypto (2.6.1)
pycups (1.9.66)
pycurl (7.19.3)
pydns (2.3.6)
pygobject (3.12.0)
pyinotify (0.9.4)
pyOpenSSL (0.13)
pyparsing (2.0.1)
pyPdf (1.13)
pyserial (2.6)
pysmbc (1.0.14.1)
pysqlite (1.0.1)
pysrt (1.0.1)
python-apt (0.9.3.5ubuntu2)
python-dateutil (1.5)
python-debian (0.1.21-nmu2ubuntu2)
python-epson-printer (1.3)
python-escpos (1.0.1)
python-libtorrent (0.16.13)
pytz (2012c)
pyusb (1.0.0b1)
pyxdg (0.25)
pyzmq (14.0.1)
qrcode (5.1)
reportlab (3.0)
requests (2.2.1)
requests-oauthlib (0.6.1)
scipy (0.13.3)
sessioninstaller (0.0.0)
setuptools (3.3)
simplegeneric (0.8.1)
six (1.5.2)
slowaes (0.1a1)
sympy (0.7.4.1)
system-service (0.1.6)
tlslite (0.4.8)
tornado (3.1.1)
tweepy (3.5.0)
Twisted-Core (13.2.0)
Twisted-Names (13.2.0)
Twisted-Web (13.2.0)
urllib3 (1.7.1)
uTidylib (0.2)
vboxapi (1.0)
wsgiref (0.1.2)
wxPython (2.8.12.1)
wxPython-common (2.8.12.1)
zope.interface (4.0.5)
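
A likely cause, judging from the package list above: it shows numpy 1.8.2, and as far as I know the `return_counts` argument of `np.unique` only arrived in NumPy 1.9.0. The sketch below is a plain version check under that assumption; `version_tuple` and `supports_return_counts` are hypothetical helper names, not part of noteshrink or NumPy.

```python
import re


def version_tuple(version):
    """Turn a version string such as '1.8.2' into (1, 8, 2)."""
    # Keep only the leading dotted integers, so '1.11.0rc1' -> (1, 11, 0).
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognised version: {!r}".format(version))
    return tuple(int(part) for part in match.group(0).split("."))


def supports_return_counts(numpy_version):
    """True if this NumPy version should accept np.unique(..., return_counts=True).

    Assumes the keyword was introduced in NumPy 1.9.0.
    """
    return version_tuple(numpy_version) >= (1, 9)
```

In practice the argument would be `numpy.__version__`; if the check fails, upgrading NumPy is probably the simplest way out.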

Issues with Windows convert

When you run noteshrink on Windows (I am using Windows 10), it fails on the subprocess.call to convert:

running PDF command "convert page0000.png output.pdf"...
Invalid Parameter - output.pdf

The reason appears to be given in this stackoverflow question:
https://stackoverflow.com/questions/41860668/why-does-this-python-subprocess-command-only-work-when-shell-true-on-windows

Windows has another old program called convert sitting on my PC in C:\Windows\System32\convert.exe

Possible solutions involve using:
shell=True

OR

changing the command to magick. I note from the current release notes that:

magick
The "magick" command is the new primary command of the Shell API, replacing the old "convert" command. This allows you to create a 'magick script' of the form "#!/path/to/command/magick -script", or pipe options into a command "magick -script -", as a background process.
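
The second option could look roughly like this: prefer `magick` when it is on the PATH and only fall back to `convert` otherwise. This is a minimal sketch, not noteshrink's code; `pick_pdf_command` is a hypothetical name, and the `%i`/`%o` placeholders are assumed to mean input images and output PDF, as in the command templates quoted elsewhere on this page.

```python
import shutil


def pick_pdf_command(which=shutil.which):
    """Return a PDF command template, preferring ImageMagick 7's `magick`.

    `which` is injectable so the executable lookup can be faked in tests.
    """
    if which("magick"):
        # `magick` cannot be confused with C:\Windows\System32\convert.exe.
        return "magick %i %o"
    # Fall back to the older ImageMagick entry point.
    return "convert %i %o"
```

Resolving the executable by name avoids `shell=True`, so no extra shell quoting is involved.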

TypeError: only integer arrays with one element can be converted to an index

got this weird error:

  getting palette...
/usr/lib/python3.6/site-packages/noteshrink.py:132: RuntimeWarning: invalid value encountered in true_divide
  saturation = delta.astype(np.float32) / cmax.astype(np.float32)
  applying palette...
  saving page0000.png...
Traceback (most recent call last):
  File "/usr/bin/noteshrink", line 11, in <module>
    load_entry_point('noteshrink==0.1.1', 'console_scripts', 'noteshrink')()
  File "/usr/lib/python3.6/site-packages/noteshrink.py", line 584, in main
    notescan_main(options=get_argument_parser().parse_args())
  File "/usr/lib/python3.6/site-packages/noteshrink.py", line 564, in notescan_main
    save(output_filename, labels, palette, dpi, options)
  File "/usr/lib/python3.6/site-packages/noteshrink.py", line 455, in save
    output_img.putpalette(palette.flatten())
  File "/usr/lib/python3.6/site-packages/PIL/Image.py", line 1483, in putpalette
    data = bytes(data)
TypeError: only integer arrays with one element can be converted to an index

Maybe this is a python3.6 problem

Regards,
bitwave

Value error on reshaping image

Python 2.7.12 (default, Jul 1 2016, 15:12:24)

Traceback (most recent call last):
  File "./noteshrink.py", line 585, in <module>
    main()
  File "./noteshrink.py", line 582, in main
    notescan_main(options=get_argument_parser().parse_args())
  File "./noteshrink.py", line 557, in notescan_main
    samples = sample_pixels(img, options)
  File "./noteshrink.py", line 340, in sample_pixels
    pixels = img.reshape((-1, 3))
ValueError: total size of new array must be unchanged

test1

I have installed all requirements except notescanm, which isn't pip-installable.

SyntaxError: invalid syntax

I downloaded release 0.1, installed the requirements via pip, and ran make.

make
mkdir -p example_output && \
cd example_output && \
../noteshrink.py -O -g -w -s 20 -v 30 -o notesA.pdf ../examples/notesA*.jpg
  File "../noteshrink.py", line 159
    print '  running "{}"...'.format(cmd),
                            ^
SyntaxError: invalid syntax
makefile:9: recipe for target 'example_output/notesA.pdf' failed
make: *** [example_output/notesA.pdf] Error 1
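
The line the interpreter rejects is a Python 2 print statement, which Python 3 no longer accepts; the trailing comma there suppresses the newline. A minimal sketch of the equivalent function call follows. `announce` is a hypothetical wrapper used only for illustration, not a function in noteshrink.

```python
import sys


def announce(cmd, stream=None):
    """Print a progress message without a trailing newline."""
    if stream is None:
        stream = sys.stdout
    # Python 2 statement:  print '  running "{}"...'.format(cmd),
    # The trailing comma becomes end=' ' in the function form.
    # On Python 2.7, add `from __future__ import print_function` as the
    # first statement of the module so the same call works there too.
    print('  running "{}"...'.format(cmd), end=' ', file=stream)
    stream.flush()
```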

A CLI to automate image creation

I need to automate image generation from Python and send the results to a Node.js app.

I would prefer to have a small script that I can plug into a node command, so the process can be automated from the server.

Any pointers on how to make a CLI tool, plus documentation, for this software?
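
Other reports on this page already invoke noteshrink from a shell (for example `noteshrink notesA1.jpg`, or `../noteshrink.py ... -o notesA.pdf ../examples/notesA*.jpg`), so a server process can probably just spawn it. A minimal sketch, assuming a `noteshrink` executable is on the PATH and that `-o` names the output PDF as in the make example; `build_noteshrink_args` and `run_noteshrink` are hypothetical helpers.

```python
import subprocess


def build_noteshrink_args(images, pdf_name="output.pdf", executable="noteshrink"):
    """Build an argument list; passing a list avoids shell quoting issues."""
    if not images:
        raise ValueError("at least one input image is required")
    return [executable, "-o", pdf_name] + list(images)


def run_noteshrink(images, pdf_name="output.pdf"):
    """Run noteshrink and return its exit status (0 on success)."""
    return subprocess.call(build_noteshrink_args(images, pdf_name))
```

A Node.js server could spawn the same argument list directly with its own child-process facilities instead of going through Python.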

Error: noteshrink.py:132: RuntimeWarning: invalid value encountered in divide

Hi, I receive this error when trying to process an image received via WhatsApp:

:/usr/local/src/noteshrink$ python noteshrink.py /mnt/Large/whats0001.png
opened /mnt/Large/whats0001.png
  getting palette...
noteshrink.py:132: RuntimeWarning: invalid value encountered in divide
  saturation = delta.astype(np.float32) / cmax.astype(np.float32)
  applying palette...
  saving page0000.png...
  done

running PDF command "convert page0000.png output.pdf"...
  wrote output.pdf

You can download the image here: https://www.dropbox.com/s/6a0jwcz53mr0m70/whats0001.jpeg?dl=0

I'm using Python 2.7.12.
These are the dependency versions:

Installed /usr/local/lib/python2.7/dist-packages/noteshrink-0.1.1-py2.7.egg
Processing dependencies for noteshrink==0.1.1
Searching for Pillow==3.1.2
Best match: Pillow 3.1.2
Pillow 3.1.2 is already the active version in easy-install.pth

Using /usr/lib/python2.7/dist-packages
Searching for scipy==0.17.0
Best match: scipy 0.17.0
scipy 0.17.0 is already the active version in easy-install.pth

Using /usr/lib/python2.7/dist-packages
Searching for numpy==1.11.0
Best match: numpy 1.11.0
numpy 1.11.0 is already the active version in easy-install.pth

Using /usr/lib/python2.7/dist-packages
Finished processing dependencies for noteshrink==0.1.1

Regards
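
Note that the log above ends with "done" and "wrote output.pdf", so this looks like a warning rather than a failure. The flagged line divides `delta` by `cmax`, and `cmax` is 0 for a pure black pixel, which gives 0/0. Below is a per-pixel sketch in plain Python that guards the division. It assumes the usual HSV definitions (`delta = cmax - cmin`); `saturation_value` is a hypothetical name, not noteshrink's function.

```python
def saturation_value(r, g, b):
    """Return (saturation, value) for one 8-bit RGB pixel, HSV-style."""
    cmax = max(r, g, b)
    cmin = min(r, g, b)
    delta = cmax - cmin
    # A pure black pixel has cmax == 0; define its saturation as 0
    # instead of evaluating 0 / 0.
    saturation = 0.0 if cmax == 0 else delta / float(cmax)
    value = cmax / 255.0
    return saturation, value
```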

ValueError: cannot reshape array of size 1 into shape (3)

Got this error while running. I wanted to work on a small bill I got from a grocery shop.

./noteshrink.py bill.jpg
opened bill.jpg
Traceback (most recent call last):
  File "./noteshrink.py", line 587, in <module>
    main()
  File "./noteshrink.py", line 584, in main
    notescan_main(options=get_argument_parser().parse_args())
  File "./noteshrink.py", line 559, in notescan_main
    samples = sample_pixels(img, options)
  File "./noteshrink.py", line 342, in sample_pixels
    pixels = img.reshape((-1, 3))
ValueError: cannot reshape array of size 1 into shape (3)

What about 'noteshrinking' a pdf?

I have some scanned brochures in PDF format. Is it possible to run noteshrink on a PDF file to create a clean, beautiful PDF?

Convert the result to vector graphics

Hi,

depending on the input, the output may be shrunk even more when converted to vector graphics, as you also noticed in https://mzucker.github.io/2018/05/14/maptrace.html :)

Here is a quick and dirty example:
./noteshrink.py -e ".pnm" -P "convert %i %o" -c "potrace -b pdf -o %o %i" examples/tree.jpg

Maybe the example can be added to the README?

Unfortunately potrace only supports binary images.

Ciao,
Antonio

PDF converter requirement

By default it tries to convert the output image to PDF, using a command-line program called "convert". This doesn't seem to be one of the listed dependencies - can someone explain what it is and how to install it?

BTW on macOS I can get it working with sips using this parameter:
-c 'sips %i -s format pdf --out %o'
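
For context, `convert` is one of the command-line tools shipped with ImageMagick. Judging from the templates quoted on this page, `%i` stands for the input image(s) and `%o` for the output PDF. A minimal sketch of how such a template might be expanded; `expand_pdf_command` is a hypothetical helper, not noteshrink's actual implementation.

```python
def expand_pdf_command(template, input_files, output_file):
    """Substitute %i (input files) and %o (output file) into a command template."""
    cmd = template.replace("%o", output_file)
    return cmd.replace("%i", " ".join(input_files))
```

With the template `convert %i %o`, one input `page0000.png` and output `output.pdf`, this yields the same command string that appears in the logs on this page.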

Feature requests to improve readability

I thought a bit since I read the article, and thought of ways to improve the results:

  • Not increasing the intensity of the background color. On a few example pictures the background becomes garish and over-saturated.

  • Using the CIE color space for clustering. It is much more accurate for calculating perceived color distance, and will result in superior clustering. CIE is implemented by the colormath library.

  • Using a walking median to calculate the background color. Could be done individually by color. Would result in a background color not stripped of the least significant bits.
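
To make the second suggestion concrete, here is a sketch of a perceptual color distance. It assumes "CIE color space" means CIE 1976 L\*a\*b\* with a D65 white point, and it uses the standard sRGB conversion formulas in plain Python rather than the colormath library. The function names are hypothetical.

```python
import math

# D65 reference white, the usual companion to sRGB.
_WHITE = (0.95047, 1.0, 1.08883)


def _srgb_to_linear(channel):
    """Undo the sRGB transfer curve for one 8-bit channel."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t):
    """Piecewise cube-root function from the CIELAB definition."""
    delta = 6.0 / 29.0
    return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0


def rgb_to_lab(r, g, b):
    """Convert an 8-bit sRGB color to CIE L*a*b* (D65)."""
    rl, gl, bl = (_srgb_to_linear(v) for v in (r, g, b))
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    fx, fy, fz = (_f(v / w) for v, w in zip((x, y, z), _WHITE))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def delta_e76(rgb1, rgb2):
    """Euclidean distance in L*a*b* (the CIE76 color difference)."""
    lab1, lab2 = rgb_to_lab(*rgb1), rgb_to_lab(*rgb2)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))
```

Clustering on the L\*a\*b\* triples instead of raw RGB would then make the Euclidean distance used by k-means track perceived difference more closely. Whether that improves noteshrink's output in practice is untested here.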

K-means clustering exception with white pages

When get_fg_mask can't find any foreground pixel in the sample (e.g. a white page, like the attached image), K-means clustering raises an exception (in the randint call inside it, to be precise).

I'm not sure which behavior would be desirable in this case: check for it and skip the page, or write the white page to the output.
breaker
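
The "check" option might be sketched as follows: test the mask before clustering and fall back to a background-only palette for a blank page. Plain Python lists stand in for the NumPy arrays, and `cluster` is a stand-in for the k-means step; `palette_for_page` and its parameters are hypothetical names. The "a must be greater than 0" ValueError reported earlier on this page appears to be the same failure.

```python
def palette_for_page(samples, fg_mask, bg_color, cluster):
    """Return [bg_color] + cluster centers, or just [bg_color] for a blank page.

    `cluster` is any callable mapping a non-empty list of foreground
    samples to a list of colors (a stand-in for the k-means step).
    """
    foreground = [s for s, is_fg in zip(samples, fg_mask) if is_fg]
    if not foreground:
        # Blank page: skip clustering, since there is nothing to sample from.
        return [bg_color]
    return [bg_color] + list(cluster(foreground))
```

With a one-color palette the page would still be written to the output as a plain background, which corresponds to the second behavior mentioned above.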

Warning: Error opening image

warning: error opening C:\Users\Kartikey
warning: error opening Kushwah\Downloads\Compressed\noteshrink-master\image.jpg
running PDF command "convert output.pdf"...
Invalid drive specification.
warning: PDF command failed
