rhsimplex / image-match
🎇 Quickly search over billions of images
This is the example from the quick start. A TypeError, "TypeError: unorderable types: dict() > dict()", occurred when I ran it. Have you ever encountered this problem?
The code is:
`from elasticsearch import Elasticsearch
from image_match.elasticsearch_driver import SignatureES
es = Elasticsearch()
ses = SignatureES(es)
ses.add_image('https://pixabay.com/static/uploads/photo/2012/11/28/08/56/mona-lisa-67506_960_720.jpg')
ses.add_image('https://upload.wikimedia.org/wikipedia/commons/e/e0/Caravaggio_-_Cena_in_Emmaus.jpg')
ses.add_image('https://c2.staticflickr.com/8/7158/6814444991_08d82de57e_z.jpg')
list = ses.search_image('https://pixabay.com/static/uploads/photo/2012/11/28/08/56/mona-lisa-67506_960_720.jpg')
print(list)`
When it was executed, "TypeError: unorderable types: dict() > dict()" occurred. The detailed output is as follows.
`runfile('/Users/lvchangtao/local-image-match/storeSearching.py', wdir='/Users/lvchangtao/local-image-match')
/Users/lvchangtao/anaconda3/lib/python3.5/site-packages/image_match/goldberg.py:402: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
lower_y_lim:upper_y_lim]) # no smoothing here as in the paper
Traceback (most recent call last):
File "", line 1, in
File "/Users/lvchangtao/anaconda3/lib/python3.5/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 714, in runfile
execfile(filename, namespace)
File "/Users/lvchangtao/anaconda3/lib/python3.5/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 89, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/Users/lvchangtao/local-image-match/storeSearching.py", line 20, in
list = ses.search_image('https://pixabay.com/static/uploads/photo/2012/11/28/08/56/mona-lisa-67506_960_720.jpg')
File "/Users/lvchangtao/anaconda3/lib/python3.5/site-packages/image_match/signature_database_base.py", line 270, in search_image
r = sorted(np.unique(result).tolist(), key=itemgetter('dist'))
File "/Users/lvchangtao/anaconda3/lib/python3.5/site-packages/numpy/lib/arraysetops.py", line 198, in unique
ar.sort()
TypeError: unorderable types: dict() > dict()`
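A possible explanation, offered as a sketch rather than as the library's actual fix: in Python 3, dicts cannot be compared with < or >, so a sort-based de-duplication (which np.unique performs, as the traceback shows) fails on a list of dicts. The example below de-duplicates by a key instead and then sorts by 'dist'. The 'id' field and the sample values are assumptions made for illustration.

```python
from operator import itemgetter


def dedupe_and_sort(results, key="id"):
    """Drop records that share the same `key`, then sort by 'dist'."""
    seen = {}
    for rec in results:
        # Keep the first record seen for each key value.
        seen.setdefault(rec[key], rec)
    return sorted(seen.values(), key=itemgetter("dist"))


# Hypothetical search results; field names other than 'dist' are assumed.
results = [
    {"id": "a", "dist": 0.30},
    {"id": "b", "dist": 0.00},
    {"id": "a", "dist": 0.30},
]
unique_results = dedupe_and_sort(results)
```

Because nothing here compares one dict with another, the same code behaves identically on Python 2 and Python 3.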
When I use image-match on CentOS, it takes up a lot of memory and is eventually killed by the system.
Documentation says:
We have a Docker image that takes care of setting up image_match and elasticsearch
Looking in the Dockerfile and setup.py, I see no creation of an ElasticSearch container.
Perhaps the documentation should mention that it only creates a containerized environment where you can use the Python REPL to execute programs that use the image-match and elasticsearch libraries.
I would suggest, however, adding an explanation of how to create an ElasticSearch container:
docker run -P -d elasticsearch
And possibly linking it (or whatever is relevant in the latest Docker version) to the created image-match container would be a lot more useful.
At the very least, a better explanation of the expected usage pattern of this container should be added.
Could you please add a method to remove an image from the index? I would like to keep the index up to date with my list of images (which is constantly changing).
Is there a way to skip indexing an image if it has already been indexed? I have some images in elasticsearch, and I need to index some others. However, some of those images were already indexed. Is there a way to make sure that images which have a 100% match in the DB are not indexed again?
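One way to approach this, sketched under assumptions: search before adding, and skip the insert when an existing record is at (nearly) zero distance. The helper name add_if_new, the tolerance value, and the FakeIndex stand-in are all hypothetical; the only things taken from this page are the add_image and search_image calls and the 'dist' key in search results.

```python
def add_if_new(ses, path, tolerance=1e-6):
    """Add `path` only if no indexed image is within `tolerance` of it.

    `ses` is any object exposing search_image(path) and add_image(path);
    search_image is assumed to return dicts with a 'dist' entry.
    Returns True if the image was added.
    """
    matches = ses.search_image(path)
    if any(m["dist"] <= tolerance for m in matches):
        return False
    ses.add_image(path)
    return True


class FakeIndex:
    """In-memory stand-in used only to exercise add_if_new (hypothetical)."""

    def __init__(self):
        self.paths = []

    def add_image(self, path):
        self.paths.append(path)

    def search_image(self, path):
        # Pretend that an identical path is an exact match (distance 0).
        return [{"path": p, "dist": 0.0} for p in self.paths if p == path]


index = FakeIndex()
first = add_if_new(index, "a.jpg")
second = add_if_new(index, "a.jpg")
```

The trade-off is one extra search per insert, which may matter when indexing in bulk.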
Hi
This project is great, thank you for your open source.
In use, I find it slightly inconvenient.
When used together with ES, it would help if add_image accepted an extra data field in JSON format,
such as:
add_image('image url', 'json data')
The extra data would then come back together with the search results.
That way, when adding pictures, you could store some extra information with the image and get it back directly when searching.
Just my own small experience.
We are hoping to add this feature.
Thank you
While I'm trying to run this code:
from image_match.goldberg import ImageSignature
gis = ImageSignature()
a = gis.generate_signature('/home/francesco/Scrivania/2.jpg')
b = gis.generate_signature('/home/francesco/Scrivania/1.jpg')
gis.normalized_distance(a, b)
I get this error:
~$ python '/home/francesco/Scrivania/imagematch.py'
Traceback (most recent call last):
File "/home/francesco/Scrivania/imagematch.py", line 3, in <module>
a = gis.generate_signature('/home/francesco/Scrivania/2.jpg')
File "/usr/local/lib/python2.7/dist-packages/image_match/goldberg.py", line 164, in generate_signature
n=self.n, window=image_limits)
File "/usr/local/lib/python2.7/dist-packages/image_match/goldberg.py", line 341, in compute_grid_points
x_coords = np.linspace(window[0][0], window[0][1], n + 2, dtype=int)[1:-1]
TypeError: linspace() got an unexpected keyword argument 'dtype'
I attempt to import the following:
from image_match.goldberg import ImageSignature
And it raises an issue apparently related to cairo--actually originating in the cairocffi library:
File "/Users/[User]/anaconda/lib/python3.5/site-packages/cairocffi/__init__.py", line 46, in <module>
cairo = dlopen(ffi, 'cairo', 'cairo-2')
raise OSError("dlopen() failed to load a library: %s" % ' / '.join(names))
OSError: dlopen() failed to load a library: cairo / cairo-2
I installed cairo using the MacPorts option and installation seemed successful.
Any ideas?
Something is wrong with scikit-image and travis
Adding elasticsearch 5.3.0 to easy-install.pth file
Installed /home/travis/miniconda/envs/test-environment/lib/python3.5/site-packages/elasticsearch-5.3.0-py3.5.egg
Searching for scikit-image<0.13,>=0.12
Reading https://pypi.python.org/simple/scikit-image/
Downloading https://pypi.python.org/packages/86/d0/b0192dc9a544da90f2d9150bcd84b981c6873e42a1f752b6affb89180ad8/scikit-image-0.12.3.tar.gz#md5=04ea833383e0b6ad5f65da21292c25e1
Best match: scikit-image 0.12.3
Processing scikit-image-0.12.3.tar.gz
Writing /tmp/easy_install-ids2h2tl/scikit-image-0.12.3/setup.cfg
Running scikit-image-0.12.3/setup.py -q bdist_egg --dist-dir /tmp/easy_install-ids2h2tl/scikit-image-0.12.3/egg-dist-tmp-mqepoz_1
error: SandboxViolation: mkdir('/home/travis/.config', 511) {}
The package setup script has attempted to modify files on your system
that are not within the EasyInstall build area, and has been aborted.
This package cannot be safely installed by EasyInstall, and may not
support alternate installation locations even if you run its setup
script by hand. Please inform the package's author and the EasyInstall
maintainers to find out if a fix or workaround is available.
The command "python setup.py install" failed and exited with 1 during .
Your build has been stopped.
I was hoping to get some clarification on the intended use case for this. Should it strictly be used for duplicate detection, or can it also be used to identify similar images? This page seems to suggest that it can be used to measure image similarity. However, when I try it on the attached images, it does not seem to agree with the intuition that two images of shoes should be significantly more similar than an image of a shoe and something else.
The distance between the first and second image seems to be 0.71422605625006175 but the distance between the first and third is 0.70043762770711271.
To improve development speed and PRs review we can add some integration tests.
We can test image-match as a "blackbox", having a directory with images on one side, and expected fingerprints on the other. We can always work on unit tests later.
Added some new features, @sbellem will need a little help from you =)
OS: Ubuntu 16.04
Python 3.5.2
I'm just trying to get the basic example working from the Documentation.
I installed using sudo pip3 install . while in the image-match directory. Here's the code I'm running in test.py:
from elasticsearch import Elasticsearch
from image_match.elasticsearch_driver import SignatureES
es = Elasticsearch()
ses = SignatureES(es)
ses.add_image('http://i.imgur.com/KUuRtTc.jpg')
The exception / error I'm getting is a chain of exceptions, with the most recent one being
elasticsearch.exceptions.ConnectionError:
ConnectionError(<urllib3.connection.HTTPConnection object at 0x7fb8425c27b8>:
Failed to establish a new connection: [Errno 111] Connection refused)
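"Connection refused" usually means nothing is listening on the target port, so a quick reachability check can help narrow this down before involving image-match at all. The following is a minimal sketch using only the standard library; the helper name port_is_open is made up for illustration, and the check assumes Elasticsearch runs on localhost with its default HTTP port 9200.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 9200 is Elasticsearch's default HTTP port; host and port are assumptions
# about the local setup.
elasticsearch_reachable = port_is_open("localhost", 9200)
```

If this reports False, the Elasticsearch service is probably not running (or is bound to a different host or port), and Elasticsearch() with no arguments will fail in the same way.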
Causes an error (see #64 )
It seems like ElasticSearch has a 10k rows limit.
Forgive the basic question but does image-match handle this?
Can't seem to figure out how to get our docker set up to run for a larger database.
Currently the entire signature is stored as an array, and the words are stored as 32-bit(?) integers. Investigate compression schemes.
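As one candidate scheme to investigate, here is a sketch that packs three signature entries into each byte. It assumes every entry is an integer in the range -2..2 (the sample record on this page shows only 0 and 2, so the full range is an assumption); five possible values means three entries fit in one byte, because 5^3 = 125 < 256. The function names are hypothetical.

```python
def pack_signature(values):
    """Pack values in {-2..2} three per byte using base-5 digits."""
    packed = bytearray()
    for i in range(0, len(values), 3):
        chunk = values[i:i + 3]
        byte = 0
        for v in reversed(chunk):
            byte = byte * 5 + (v + 2)  # shift -2..2 to 0..4
        packed.append(byte)
    return bytes(packed)


def unpack_signature(packed, length):
    """Inverse of pack_signature; `length` is the original value count."""
    values = []
    for byte in packed:
        for _ in range(3):
            values.append(byte % 5 - 2)
            byte //= 5
    return values[:length]


# Made-up sample, not a real signature.
sample = [0, 0, 2, 2, -1, 1, -2]
packed = pack_signature(sample)
restored = unpack_signature(packed, len(sample))
```

Whether this helps in practice depends on how the backend stores arrays, which would need measuring.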
Join the future of Python
for import
`from pymongo.mongo_client import MongoClient
from os import listdir
from os.path import isfile, join
from image_match.goldberg import ImageSignature
from image_match.signature_database_base import make_record
from image_match.mongodb_driver import SignatureMongo`
there is this error:
Traceback (most recent call last):
File "/home/mehdi/ws/temp/image_match/src/sample3.py", line 7, in <module>
from image_match.mongodb_driver import SignatureMongo
File "/home/mehdi/venvs/image_match/lib/python3.5/site-packages/image_match/mongodb_driver.py", line 1, in <module>
from signature_database_base import SignatureDatabaseBase
ImportError: No module named 'signature_database_base'
in file image_match/mongodb_driver.py
changing
from signature_database_base import SignatureDatabaseBase
from signature_database_base import normalized_distance
from multiprocessing import cpu_count, Process, Queue
from multiprocessing.managers import Queue as managerQueue
import numpy as np
to
from .signature_database_base import SignatureDatabaseBase
from .signature_database_base import normalized_distance
from multiprocessing import cpu_count, Process, Queue
from multiprocessing.managers import Queue as managerQueue
import numpy as np
another error occurs:
Traceback (most recent call last):
File "/home/mehdi/ws/temp/image_match/src/sample3.py", line 7, in <module>
from image_match.mongodb_driver import SignatureMongo
File "/home/mehdi/venvs/image_match/lib/python3.5/site-packages/image_match/mongodb_driver.py", line 4, in <module>
from multiprocessing.managers import Queue as managerQueue
ImportError: cannot import name 'Queue'
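A likely cause, stated as an assumption: on Python 2 the name Queue inside multiprocessing.managers is the standard Queue module, which Python 3 renamed to queue, so the import no longer resolves. A version-tolerant sketch is shown below; the alias manager_queue and the drain helper are made up for illustration, and it is an assumption that the driver only needs the module for its Empty exception.

```python
try:
    import queue as manager_queue  # Python 3 name of the module
except ImportError:
    import Queue as manager_queue  # Python 2 name of the module


def drain(q):
    """Collect everything currently in q without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except manager_queue.Empty:
            return items


q = manager_queue.Queue()
for value in (1, 2, 3):
    q.put(value)
drained = drain(q)
```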
I had this issue, but have now found a solution.
OS: Ubuntu 16.04
Python 3.5.2
While installing the needed modules, I ran into a version mismatch error
after entering
sudo pip3 install cairocffi
I got
AssertionError: version mismatch, 1.5.2 != 1.8.3
Full error message from pip install
Solution
Downgrade cffi to previous version and attempt to install cairocffi again.
sudo pip3 uninstall cffi
sudo pip3 install cffi==1.5.2
sudo pip3 install cairocffi
I'm asking this question since I think it might benefit other people.
I made a driver that stores the hashes in a PostgreSQL database. My test database contains 25k images. The query process looks roughly like this:
Choose an image to query database for
Compute the image's signature and lookup words (let's say the words = [12895189, 2517912795, 72159172, 1275215791, ...])
Get IDs and signatures of relevant images, using the lookup words in a query similar to this:
SELECT DISTINCT image.image_id, image.image_signature
FROM image
INNER JOIN image_signature_lookup ON image.image_id = image_signature_lookup.image_id
WHERE image_signature_lookup.word IN (12895189, 2517912795, 72159172, 1275215791, ...)
(for a test image this returns about 7.5k images)
image.image_id, image_signature_lookup.image_id and image_signature_lookup.word are all indexed.
Compute the final distances with ProcessPoolExecutor and numpy, assemble the actual search results
A test query made this way takes 1.2 s. What I'm worried about is scaling this solution: for every 10k images, the database grows by about 75 MB in size, and over 600 000 lookup records are created.
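The candidate-lookup step described above can be sketched end to end with an in-memory database. This example swaps in SQLite (from the Python standard library) for PostgreSQL so that it runs anywhere; the table and column names follow the query above, while the column types, index names, and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image (
        image_id INTEGER PRIMARY KEY,
        image_signature TEXT
    );
    CREATE TABLE image_signature_lookup (
        image_id INTEGER REFERENCES image(image_id),
        word INTEGER
    );
    CREATE INDEX idx_lookup_word ON image_signature_lookup(word);
    CREATE INDEX idx_lookup_image ON image_signature_lookup(image_id);
""")

# Made-up rows: images 1 and 2 share a word with the query, image 3 does not.
conn.executemany("INSERT INTO image VALUES (?, ?)",
                 [(1, "sig-1"), (2, "sig-2"), (3, "sig-3")])
conn.executemany("INSERT INTO image_signature_lookup VALUES (?, ?)",
                 [(1, 12895189), (1, 72159172), (2, 72159172), (3, 555)])

query_words = [12895189, 2517912795, 72159172, 1275215791]
placeholders = ", ".join("?" for _ in query_words)
sql = (
    "SELECT DISTINCT image.image_id, image.image_signature "
    "FROM image "
    "INNER JOIN image_signature_lookup "
    "ON image.image_id = image_signature_lookup.image_id "
    "WHERE image_signature_lookup.word IN (" + placeholders + ") "
    "ORDER BY image.image_id"
)
# Candidates are the images sharing at least one lookup word with the query.
candidates = conn.execute(sql, query_words).fetchall()
```

Note that the columns are qualified with their table name, since image_id exists in both tables. Only the candidates returned here would go on to the exact distance computation.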
My questions are:
What N and k values, besides the default ones, could bring down the database size while still being useful for detecting near duplicates (+- JPEG artifacts etc.)?
Hi,
when I call ses.add_image I get this error:
cannot resize an array that references or is referenced
by another array in this way. Use the resize function
It's a pretty big dependency; I think many people would be happy with PIL / Pillow alone.
Is it possible to use this to find sub images in larger images?
So for instance if I have a picture of 20 books arranged neatly, and want to find 1 of those books, can this be used, and can we return a bounded box for that image?
I modified the code to add a field.
When adding a picture, you can now attach a JSON text description at the same time.
The description is returned when the record is read back.
But I'm not familiar with git, so I do not know how to submit the change to you.
So I am providing a compressed package instead. Please see whether you can merge it.
I hope this will not add to your work.
download url:
http://7jpsbs.com1.z0.glb.clouddn.com/image-match.tar.gz
avg_grey[i, j] = np.mean(image[lower_x_lim:upper_x_lim, lower_y_lim:upper_y_lim]) # no smoothing here as in the paper
lower_x_lim, upper_x_lim, lower_y_lim, and upper_y_lim are causing the slice to throw an exception in Python 3.5.
Does anyone else have this issue?
Hello,
I have a question about record generation. I don't understand why a word key like simple_word_0 can be set and then used for search.
From the code:
def insert_single_record(self, rec):
    """Insert an image record.

    Must be implemented by derived class.

    Args:
        rec (dict): an image record. Will be in the format returned by
            make_record

            For example, rec could have the form:

            {'path': 'https://pixabay.com/static/uploads/photo/2012/11/28/08/56/mona-lisa-67506_960_720.jpg',
             'signature': [0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 2, 0 ... ]
             'simple_word_0': 42252475,
             'simple_word_1': 23885671,
             'simple_word_10': 9967839,
             'simple_word_11': 4257902,
             'simple_word_12': 28651959,
             'simple_word_13': 33773597,
             'simple_word_14': 39331441,
             'simple_word_15': 39327300,
             'simple_word_16': 11337345,
             'simple_word_17': 9571961,
             'simple_word_18': 28697868,
             'simple_word_19': 14834907,
             'simple_word_2': 7434746,
             'simple_word_20': 37985525,
             'simple_word_21': 10753207,
             'simple_word_22': 9566120,
             ...
             'metadata': {...}
             }

            The number of simple words corresponds to the attribute N
    """
Looking at the implementation details in the elasticsearch and mongodb drivers, I found that image-match's search steps are:
1. Generate the words for the query image, e.g. { 'simple_word_0': 42252475, 'simple_word_1': 23885671, 'simple_word_10': 9967839}
2. Find the records with a word matched.
3. From the word-matched record list, compare the signature and keep the high-score ones.
Here, what I can't understand is steps 2 and 3. The word keys are assigned by order, so how can they match another picture with some differences (translation or rotation)? But image-match actually does so; the only reason I can imagine is that phash can preserve feature position. Is that true?
If that is true, another question comes up: if phash can preserve feature position, why not use a solution like simhash online query (http://www.wwwconference.org/www2007/papers/paper215.pdf)?
If the signature is split into several tables/blocks, the number of comparisons can be reduced.
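To make the question concrete, here is a small sketch of positional word matching. It assumes, without confirmation from the library's source, that a stored record becomes a candidate when at least one simple_word_i has the same value at the same position i as in the query. The function name and the second record are made up; the first record's values come from the docstring above.

```python
def matching_word_keys(query_rec, stored_rec):
    """Return the simple_word_* keys whose values agree in both records."""
    return sorted(
        key for key, value in query_rec.items()
        if key.startswith("simple_word_") and stored_rec.get(key) == value
    )


# Query values are from the docstring above; the stored record is made up.
query = {"simple_word_0": 42252475, "simple_word_1": 23885671,
         "simple_word_10": 9967839}
stored = {"simple_word_0": 42252475, "simple_word_1": 11111111,
          "simple_word_10": 9967839}

shared = matching_word_keys(query, stored)
is_candidate = len(shared) > 0
```

Under this assumption, a word only matches if the same region of both images produces the same value, which is consistent with the observation that heavily translated or rotated images are hard to find.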
Hi @rhsimplex, I'm back with another question. When I search the Elasticsearch backend using only part of an image (1/2 or 1/3 of the image), it hardly ever finds a matching image. I want to solve this problem to some extent. Could you give me some advice?
Hello, I have a weird problem. I have two images, A and B (B is a bigger, higher-resolution version of A). If I use ImageSignature to calculate the normalized distance between the two I get 0.314299892917, which is pretty good, showing that they are a match.
Now, here is the problem: if I add image A to elasticsearch using ses.add_image('A.jpg') and then use ses.search_image('B.jpg'), I get no results. I tried changing the distance_cutoff to 0.99 and got a bunch of results, BUT these results did not include A.jpg, and all the results in this scenario had a distance of at least 0.60... I KNOW image A is there because if I ses.search_image('A.jpg') I get a perfect match.
Hi, I have a question. I tried to search for an image but got nothing in the result. This is my code:
es = Elasticsearch("192.168.20.35:9200")
ses = SignatureES(es)
list = ses.search_image('https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg/687px-Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg')
print(list)
And I got [] as the result.
Per this suggestion:
You might want to look at Morelikethis queries to boost performance. I worked on a proprietary version of this and at the time Lucene performance dropped off nearly linearly with the number of query terms.
We used MoreLikeThis to reduce our queries count to the 30-40 most statistically interesting terms. The one hiccup being an issue in Lucene [1] where the term cache wasn't operating properly. We just added our own image query term cache and a custom MLT query to leverage it, which gave us a 10x speed bump over any other methods we tried.
The interestingness of the terms is assessed on a per-term basis though, so you might see a relevance drop for some types of image if you set MoreLikeThis to use too few terms.
[1] https://issues.apache.org/jira/browse/LUCENE-1690
Look into MoreLikeThis instead of BoolQuery
I am not an Elasticsearch expert, but I think that when a document is indexed, the 'path' field, which contains the image URL, is stored as a text field, which means that Elasticsearch tokenizes the URL internally. As a consequence, I'm not able to query an image URL via a term query to get an exact match (it returns no results). I am able to use a match query; however, this also returns other documents that have similar URLs, which is not optimal. Is there any way the 'path' field can be stored as a keyword field, so that the URLs do not get tokenized and term queries work?
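For reference, this is roughly what such a mapping could look like, written as a sketch under assumptions: it targets Elasticsearch 5.x/6.x, where the keyword field type exists and mappings are nested under a document type, and it uses 'image' as the type name because that is what appears in request paths such as POST /tester2/image elsewhere on this page. The mapping would have to be supplied when the index is created, before any documents are added; the helper exact_path_query is hypothetical.

```python
import json

# Assumed index body for Elasticsearch 5.x/6.x. The 'image' doc type is an
# assumption based on request paths seen in other reports.
index_body = {
    "mappings": {
        "image": {
            "properties": {
                "path": {"type": "keyword"}
            }
        }
    }
}


def exact_path_query(url):
    """Build a term query, which compares the whole untokenized value."""
    return {"query": {"term": {"path": url}}}


body_json = json.dumps(index_body)
example_query = exact_path_query("http://i.imgur.com/KUuRtTc.jpg")
```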
Right now exceptions regarding corrupt images need to be caught using except xml.etree.ElementTree.ParseError, which looks rather... abstract.
Image match should throw its own type of exception in such cases.
Related: #59
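Until the library offers something like this itself, a thin wrapper can provide the same effect on the caller's side. The sketch below is an illustration only: CorruptImageError and generate_signature_safely are hypothetical names, and fake_generate stands in for a real signature function that fails in the way described above.

```python
import xml.etree.ElementTree as ET


class CorruptImageError(Exception):
    """Hypothetical library-specific error for unreadable images."""


def generate_signature_safely(generate, path):
    """Call generate(path), translating ParseError into CorruptImageError."""
    try:
        return generate(path)
    except ET.ParseError as exc:
        raise CorruptImageError("could not read image: %s" % path) from exc


def fake_generate(path):
    # Stand-in that fails the way the report describes.
    raise ET.ParseError("syntax error")


try:
    generate_signature_safely(fake_generate, "broken.jpg")
    outcome = "ok"
except CorruptImageError as err:
    outcome = str(err)
```

Chaining with "from exc" keeps the original ParseError available for debugging while callers only need to catch one exception type.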
Hi, I have two identical images, but the algorithm outputs two different signatures.
First image:
Second image:
Code below:
from image_match.goldberg import ImageSignature
gis = ImageSignature()
test01 = gis.generate_signature('test01.jpg')
test02 = gis.generate_signature('test02.jpg')
gis.normalized_distance(test01, test02)
which outputs
0.70823708184882128
Is that right?
First of all, thanks for this project!
includes a database backend that easily scales to billions of images and supports sustained high rates of image insertion: up to 10,000 images/s on our cluster!
I was wondering what that cluster would look like. How many and what types of nodes for image-match and for Elasticsearch, as well as CPU and memory for each. I'm thinking of using Google's container engine (Kubernetes) for deployment and need to estimate cost.
I cannot pull docker image at ascribe/image-match. [not found]
Need to write documentation for new feature #63
Does it have a buildpack for Heroku?
I am using a VMware CentOS 7 system and installed all the software following the guide doc.
Then I tested the following code:
import elasticsearch
from image_match.elasticsearch_driver import SignatureES
es = elasticsearch.Elasticsearch()
es.indices.create('tester2')
ses = SignatureES(es, index='tester2')
ses.add_image('https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg/687px-Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg')
list = ses.search_image('https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg/687px-Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg')
print(list)
Then I got this error:
POST /tester2/image?refresh=false [status:406 request:0.003s]
Traceback (most recent call last):
File "/home/diters/PycharmProjects/imageMatch/imageMatchServer.py", line 6, in <module>
ses.add_image('https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg/687px-Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg')
File "/usr/local/python3/lib/python3.6/site-packages/image_match/signature_database_base.py", line 203, in add_image
self.insert_single_record(rec, refresh_after=refresh_after)
File "/usr/local/python3/lib/python3.6/site-packages/image_match/elasticsearch_driver.py", line 88, in insert_single_record
self.es.index(index=self.index, doc_type=self.doc_type, body=rec, refresh=refresh_after)
File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/client/__init__.py", line 279, in index
_make_path(index, doc_type, id), params=params, body=body)
File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/transport.py", line 329, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 109, in perform_request
self._raise_error(response.status, raw_data)
File "/usr/local/python3/lib/python3.6/site-packages/elasticsearch/connection/base.py", line 108, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.TransportError: <exception str() failed>
Hi,
I tried to test image-match, but I'm getting this error while trying to search the db for an image:
`elasticsearch.exceptions.RequestError: TransportError(400, u'parsing_exception', u'no [query] registered for [filtered]')`
I'm using Elasticsearch 5. Is it supported?
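For background: the filtered query was removed in Elasticsearch 5.0, and the documented replacement is a bool query with must and filter clauses. The sketch below shows that rewrite on a plain dict. The helper name filtered_to_bool and the sample query are made up for illustration, and this is not necessarily how the library builds or would fix its queries.

```python
def filtered_to_bool(query):
    """Rewrite {'filtered': {'query': q, 'filter': f}} as a bool query.

    Queries of any other shape are returned unchanged.
    """
    if "filtered" not in query:
        return query
    inner = query["filtered"]
    bool_body = {}
    if "query" in inner:
        bool_body["must"] = inner["query"]
    if "filter" in inner:
        bool_body["filter"] = inner["filter"]
    return {"bool": bool_body}


# Made-up pre-5.0 query, for illustration only.
old_query = {
    "filtered": {
        "query": {"match_all": {}},
        "filter": {"term": {"simple_word_0": 42252475}},
    }
}
new_query = filtered_to_bool(old_query)
```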
Hi
I intend to try out this project in my application, but I have a question about the add image call. Does it actually store the image on the Elasticsearch server? I hope not. What I want to do is store the images elsewhere, possibly AWS S3, and use the URLs to add images through your app to Elasticsearch and then search them whenever I want. If you store the images as well, that means double storage for my app. So I just want to know.
After running sudo pip install image_match,
it gets stuck forever and shows only this:
Processing /home/pi/image-match
Collecting scikit-image<0.13,>=0.12 (from image-match==1.1.2)
Using cached scikit-image-0.12.3.tar.gz
The same happens when trying to build from source.
Hi there,
I have a set of 4000 images which I want to create into a cluster. My images are a large set of images taken from various fixed cameras (might move a small, small bit due to wind), some at day some at night, and they might have people, dogs, cats, etc. I am trying to create clusters based on the camera (i.e. clusters of images all taken by the same camera).
I'm planning on using HDBSCAN for this:
http://hdbscan.readthedocs.io/en/latest/basic_hdbscan.html
I've got image-match running and have done the following modifications to the library to attempt and get a complete distance matrix:
I have tried setting distance_cutoff of SignatureDatabaseBase() to 1.0, and size of SignatureES() to 4000, but I seem to be getting a sparse 4000x4000 matrix.
Is there any easy way to get the full distance matrix?
Also, any hints on when increasing k, N and n_grid is correct for more precise results?
I also noticed some images contain specific textual labels embedded in the image in the same places (like date/time and camera name). Since these labels aren't big, I'm pretty sure they're mostly ignored here - am I right?
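One way to get a dense matrix is to bypass the search backend and compare the signatures pairwise. The sketch below is written in plain Python and assumes the normalized distance is ||a - b|| / (||a|| + ||b||); that definition is an assumption and should be checked against the library. The sample signatures are tiny and made up; real ones are much longer.

```python
import math


def normalized_distance(a, b):
    """||a - b|| / (||a|| + ||b||), the assumed distance definition."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    denom = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return diff / denom if denom else 0.0


def distance_matrix(signatures):
    """Dense, symmetric matrix of pairwise normalized distances."""
    n = len(signatures)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = normalized_distance(signatures[i], signatures[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Tiny made-up signatures; real ones are much longer.
signatures = [[2, 0, -1, 1], [2, 0, -1, 1], [-2, 0, 1, -1]]
matrix = distance_matrix(signatures)
```

For 4000 images this is about 8 million pairs, so a vectorized NumPy version would likely be preferable in practice. A matrix of this shape can be passed to HDBSCAN with metric='precomputed'.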
Thanks for the awesome package. Is there a rationale for choosing to implement the digital signature from "An image signature for any kind of image, Wong et al" versus pHash?
Is it possible to ignore a feature, e.g. 'color - black', in the comparison query?
I'm trying to compare two images, for example a black hand and a white hand; they could be identical if the color can be ignored. Is this possible? Can any properties be set in the queries?
Thanks.