
pyes's Introduction

pyes - Python ElasticSearch

Web: http://pypi.python.org/pypi/pyes/
Download: http://pypi.python.org/pypi/pyes/
Source: http://github.com/aparo/pyes/
Documentation: http://pyes.rtfd.org/
Keywords: search, elasticsearch, distributed search

--

[![Gitter](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/aparo/pyes?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

pyes has been a pythonic way to use ElasticSearch since 2010.

This version requires ElasticSearch 1.x or above. It is a pre-release of pyes 1.x. Have a look at the migration documentation to upgrade your code for ElasticSearch 1.x.

We are working to provide full support for ElasticSearch 1.x (check the develop branch: we are using the git-flow workflow), which will have:

  • a connection layer based on the official ElasticSearch client (to be confirmed)
  • full support for ElasticSearch 1.x (old-version support removed due to incompatibility with older ES releases)
  • migration from multi_field to <field>.fields
  • refactoring of old code to be more pythonic
  • performance improvements

Features

  • Python 3 support (HTTP only; the thrift library is not available on Python 3)
  • Thrift/HTTP protocols
  • Bulk insert/delete
  • Index management
  • All search query types
  • Facet Support
  • Aggregation Support
  • Geolocalization support
  • Highlighting
  • Percolator
  • River support

Changelog

  1. 0.99.0:

    Migrated much of the code to ElasticSearch 1.x

    Full coverage of current queries

  2. 0.99:

    Added aggregations

    Fixes for Python 3 compatibility

    Upgraded code to use ElasticSearch 1.x or above

  3. 0.90.1:

    Bug fix release for some Python 3 regressions

  4. 0.90.0:

    A lot of improvements.

    Python 3 support.

Migration to version 0.99

CustomScoreQuery has been removed. FunctionScoreQuery and its functions cover the previous functionality. For scripting, use ScriptScoreFunction.
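As raw query bodies, the migration looks roughly like this (a sketch of the serialized JSON shapes, not the pyes classes themselves; the script and field names are illustrative):

```python
# Old ES 0.x style: custom_score wrapping a query with an inline script.
old_custom_score = {
    "custom_score": {
        "query": {"match_all": {}},
        "script": "_score * doc['boost'].value",
    }
}

# New ES 1.x style: function_score with a script_score function entry,
# which is what FunctionScoreQuery + ScriptScoreFunction serialize to
# (approximately; exact serialization may differ by version).
new_function_score = {
    "function_score": {
        "query": {"match_all": {}},
        "functions": [
            {"script_score": {"script": "_score * doc['boost'].value"}}
        ],
    }
}
```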

TODO

  • much more documentation
  • add coverage
  • add jython native client protocol

License

This software is licensed under the New BSD License. See the LICENSE file in the top distribution directory for the full license text.

pyes's People

Contributors

aguereca, akheron, alambert, alstrong, andreiz, aparo, brombomb, dalbani, fiedzia, g-clef, gsakkis, ieure, jukart, lins05, maciejkula, mastermind2k, matterkkila, mchruszcz, merrellb, mikluko, mindflayer, paykroyd, rboulton, samekmichal, scoursen, smaddineni, stevencdavis, thmttch, vinodc, zebuline


pyes's Issues

Unit tests require a running ElasticSearch instance

This is really wrong; the tests should use mock (or a similar tool) to stub out network access and return canned responses. There's nothing wrong with testing against a real ElasticSearch instance, but it shouldn't be the default behavior, as it makes it very difficult to write and run tests.
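To illustrate the point, a minimal sketch of the suggested approach using the stdlib's unittest.mock; cluster_health and its injected opener are hypothetical stand-ins, not pyes code:

```python
import io
import json
from unittest import mock

def cluster_health(opener, url="http://localhost:9200/_cluster/health"):
    # Hypothetical helper standing in for a pyes call; the opener is
    # injected so tests can stub out the network entirely.
    with opener(url) as fp:
        return json.load(fp)

# A canned response instead of a live ElasticSearch node:
fake_opener = mock.Mock(
    return_value=io.BytesIO(json.dumps({"status": "green"}).encode())
)
health = cluster_health(fake_opener)
```

No ElasticSearch process is needed: the test asserts both on the parsed response and on the URL the code tried to hit.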

Operator is ignored in TextQuery

Using the operator keyword argument when constructing a TextQuery doesn't work; the or operator is always used. This seems to be caused by a commented-out section in the add_query method.

No release tags in Git

If there is some way to get tags for the different releases of pyes, that would be really great. The only tag I see is 0.12.1. I'm trying to find the source for 0.16.0, and it's much harder than it should be because there are no tags.

Inconsistent spelling of indexes and indices

Elasticsearch consistently spells the plural of index as "indices". However, there are many places in pyes where the spelling "indexes" is used (normally as a parameter name). I'd like to change that to "indices", so that users don't have to guess which spelling to use. Unfortunately, that will break any existing code which uses named parameters to call these methods.

We could implement a fallback, so that either spelling is permissible, but this seems like far more trouble than it's worth. I'd prefer to just incompatibly change the code now, before too many clients start using it.
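For reference, the fallback could be as small as a keyword-alias shim like this hypothetical decorator (not pyes code; search below is a toy stand-in):

```python
import functools
import warnings

def accept_indexes_alias(func):
    # Hypothetical shim: accept the legacy `indexes` kwarg as an alias
    # for `indices`, warning on use and rejecting ambiguous calls.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if "indexes" in kwargs:
            if "indices" in kwargs:
                raise TypeError("pass either 'indices' or 'indexes', not both")
            warnings.warn("'indexes' is deprecated; use 'indices'",
                          DeprecationWarning, stacklevel=2)
            kwargs["indices"] = kwargs.pop("indexes")
        return func(*args, **kwargs)
    return wrapper

@accept_indexes_alias
def search(query, indices=None):
    return indices
```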

Any better ideas?

(If I were writing elasticsearch myself, I'd go for the spelling indexes, but it's far too late for that kind of discussion; what's important now is that pyes is consistent in its usage.)

QueryFilter

How can I build a filter with QueryFilter (passing something like "apple OR oranges")?
Thanks for any info ...
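Not an authoritative answer, but a query filter is essentially a filter wrapping a query; as a raw body it serializes to roughly this shape, with query_string handling the "apple OR oranges" syntax (the exact serialized form may vary by pyes/ES version):

```python
# Rough sketch of the body a QueryFilter produces: a filter whose
# payload is a query_string query (shape is approximate).
query_filter_body = {
    "query": {
        "query_string": {"query": "apple OR oranges"}
    }
}
```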

The documentation is inadequate.

I am unable to use pyes. pyes.connect() doesn't exist, and I have to use pyes.connection_http.connect(). The resulting object has no methods, and the client object is never used again in the documentation: it's created and then ignored. Every operation is performed on the conn object, which, as far as I know, cannot be created for an HTTP connection (or, at least, there's no mention of how in the documentation).

It's unfortunate that I can't even find the few lines it takes to try pyes out, but hopefully this is easily fixed.

Thanks!

how to access a field in a multi-nested mapping

Hi,

I would like to access a field in a multi-nested mapping.

Example :

DECLARATION

mapping = {
    u"profile_list": {
        'type': 'nested',
        u"affectation_list": {
            "type": "nested",
            "properties": {
                u"employee": {"type": "string", 'index': 'not_analyzed'},
                u"cancel_status": {"type": "string", 'index': 'not_analyzed'},
            }
        },
        u"planning_list": {
            "type": "nested",
            "properties": {
                "planning_date": {"type": "date", 'index': 'not_analyzed'}
            }
        },
        u"text1": {'type': 'string', 'index': 'not_analyzed'},
        u"text2": {'type': 'string', 'index': 'not_analyzed'}
    }
}

conn.put_mapping(document_type, {'properties':mapping}, [db_index])

conn.index({
    u"profile_list": [
        {
            u"affectation_list": [
                {u"employee": u"ef4zef4ze", u"cancel_status": u"cancel_cpm"},
                {u"employee": u"ezf4zef", u"cancel_status": u"cancel_emp"}
            ],
            u"planning_list": [],
            u"text1": u"titi",
            u"text2": u"toto"
        }
    ]
}, db_index, document_type)

conn.index({
    u"profile_list": [
        {
            u"affectation_list": [
                {u"employee": u"ef4zef4ze", u"cancel_status": u"cancel_cpm"},
                {u"employee": u"ezf4zef", u"cancel_status": u"--"}
            ],
            u"planning_list": [],
            u"text1": u"tutu",
            u"text2": u"tete"
        }
    ]
}, db_index, document_type)

conn.refresh([db_index])

QUERY

"""1st"""
q = {
    "query": {
        "nested": {
            "path": "profile_list",
            "score_mode": "avg",
            "query": {
                "nested": {
                    "path": "affectation_list",
                    "score_mode": "avg",
                    "query": {
                        "term": {"profile_list.affectation_list.cancel_status": u"--"}
                    }
                }
            }
        }
    }
}

"""2nd"""
q = {
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "nested": {
                    "path": "profile_list",
                    "score_mode": "avg",
                    "query": {
                        "term": {"profile_list.affectation_list.cancel_status": u"--"}
                    }
                }
            }
        }
    }
}

"""3rd"""
q = pyes.NestedQuery(u"profile_list", pyes.TermQuery(u"profile_list.affectation_list.cancel_status", u"--"))

EXECUTION

result = conn.search(query=q, indices=[db_index], doc_types=[document_type], size=1000000)

ERROR MESSAGE

For the 1st and 2nd queries, query.py fails at res = {"query": self.query.serialize()} with AttributeError: 'dict' object has no attribute 'serialize'.

The 3rd query returns result = [].

Any ideas?

Thx

When connecting via thrift, the elasticsearch server emits a StreamCorruptedException

Detailed exception stack trace

    java.io.StreamCorruptedException: invalid data length: -2147418111
    at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:42)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:282)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:51)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:540)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:274)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:261)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:350)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)

Significantly worse performance vs. urllib2

While working out some performance kinks in my ElasticSearch deployment, I discovered that performance of pyes is significantly worse (2x) vs. using urllib2 directly. I've attempted to eliminate as many variables as possible from these numbers.

Here's what I used to benchmark:

from urllib2 import urlopen, Request
from itertools import islice

import simplejson as json

import pyes
from ostrich import stats

HOST = 'localhost:9200'


@stats.time('pyes')
def search_pyes(conn, search):
    res = conn.search(search, indices='places', doc_types='geojson')
    list(islice(res, 25))


@stats.time('urllib')
def search_urllib(search):
    fp = urlopen(Request('http://%s/places/geojson/_search' % HOST,
                         json.dumps(search),
                         {'Content-Type': 'application/json'}))
    try:
        json.load(fp)
    except Exception, ex:
        stats.incr('errors')
    finally:
        fp.close()


if __name__ == '__main__':
    conn = pyes.ES([HOST])
    search = {'sort': [{'_score': 'desc'}], 'query': {'filtered': {'filter': {'and': [{'term': {'_deleted': False}}, {'geo_bounding_box': {'geometry.coordinates': {'bottom_right': {'lat': 47.609380000000002, 'lon': -122.34135000000001}, 'top_left': {'lat': 47.610700000000001, 'lon': -122.34406}}}}]}, 'query': {'match_all': {}}}}, 'size': 25, 'from': 0, 'fields': ['_source']}
    for x in xrange(15):
        search_pyes(conn, search)
        search_urllib(search)

    print json.dumps(stats.stats(reset=False), default=stats.json_encoder)

This uses python-ostrich to capture performance information.

Here's the timing information for pyes and urllib2:

{"pyes": {"count": 15,
          "p9999": 2619,
          "p999": 2619,
          "p99": 2619,
          "p90": 2619,
          "p75": 2619,
          "p50": 2619,
          "p25": 2619,
          "histogram": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15],
          "minimum": 2252,
          "maximum": 2436,
          "average": 2337,
          "standard_deviation": 46},
 "urllib": {"count": 15,
            "p9999": 1192,
            "p999": 1192,
            "p99": 1192,
            "p90": 1192,
            "p75": 1192,
            "p50": 1192,
            "p25": 917,
            "minimum": 862,
            "maximum": 1136,
            "average": 944,
            "standard_deviation": 68,
            "histogram": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 8]}}

testDelete in errors.py fails to raise error

testDelete runs the following:

curl -XDELETE http://127.0.0.1:9200/test-index/flibble/asdf

But this does not return an error if asdf doesn't exist - so there is nothing to raise.

Using create_river or delete_river throws an error with any of the River subclasses.

When trying to use the create_river or delete_river functions with any of the River subclasses, the following error is thrown:

Traceback (most recent call last):
  File "rivers.py", line 18, in testCreateCouchDBRiver
    result = self.conn.create_river(test_river, river_name='test_index')
  File "/usr/lib/python2.7/site-packages/pyes-0.14.0-py2.7.egg/pyes/es.py", line 582, in create_river
    river_name = river.name
AttributeError: 'CouchDBRiver' object has no attribute 'name'

I have created a fork with a possible fix.

How can I make a top-level filter?

How can I make a top-level filter, like this (so the filter is not inside the query)?
I want to filter only hits, not facets.

{
    "query": {
        "match_all": {}
    },
    "filter": {
        "term": {"company": "univerza v mariboru"}
    },
    "facets": {
        "company": {
            "terms": {"field": "company"}
        }
    }
}

thanks for any info ...

Implement nested queries

Hello,

ElasticSearch 0.17.0 added the nested objects feature with new nested queries.
Has anyone started to implement this in pyes? It would be great!

Search object requires query

Hi,

The query.Search class tries to serialise the query even when none was provided, resulting in an unhandled exception like this:

/Users/michal/dev/capital/branches/targeted_device/pyes/query.pyc in serialize(self)
    129 
    130         """
--> 131         res = {"query": self.query.serialize()}
    132         if self.filter:
    133             res['filter'] = self.filter.serialize()

query parameters are ignored in mlt queries

When making an mlt query through ES.morelikethis, query parameters are ignored. Here the size parameter is ignored:

>>> options = {"size": "1"}
>>> res = conn.morelikethis('index', 'article', id='4c59c437dbe1afb1d8004f01', fields=['title'], **options)
>>> len(res['hits']['hits'])  # should be 1
10

Also, if you had options = {"size": "1", "fields": ["title", "summary"]}, you would get a TypeError from Python because of the name clash on fields.

Highlighting question

When adding highlighting to a Search, the whole field which has a match is highlighted, instead of the substring of the field which matched the search term:

firstQ = PrefixQuery(UserIndex.first_name, term)
lastQ = PrefixQuery(UserIndex.last_name, term)

q = BoolQuery(should=[firstQ, lastQ])

search = Search(q)
search.add_highlight(UserIndex.first_name)

results = conn.search(search, indices = UserIndex.index_name)

So for the search term John, a field containing Johnson will be highlighted as <em>Johnson</em> instead of <em>John</em>son.

Is it the PrefixQuery that is causing problems in conjunction with highlighting?

unable to create an index

I try creating an HTTP connection:
conn = pyes.connect(['http://127.0.0.1:9200/'])

And then to create an index:
conn.create_index("corpus")

Here is what I get:
  File "populateSearch.py", line 11, in
    conn.create_index("corpus")
  File "/usr/lib/python2.6/site-packages/pyes-0.12.0-py2.6.egg/pyes/connection.py", line 158, in _client_call
    conn = self._ensure_connection()
  File "/usr/lib/python2.6/site-packages/pyes-0.12.0-py2.6.egg/pyes/connection.py", line 169, in _ensure_connection
    conn = self.connect()
  File "/usr/lib/python2.6/site-packages/pyes-0.12.0-py2.6.egg/pyes/connection.py", line 183, in connect
    self._timeout, self._recycle)
  File "/usr/lib/python2.6/site-packages/pyes-0.12.0-py2.6.egg/pyes/connection.py", line 33, in init
    host, port = server.split(":")
ValueError: too many values to unpack

Cannot read the results of a bulk operation

Actions in a bulk operation can succeed or fail individually, but pyes gives no way to read the result of a bulk operation after it has been sent.

How about returning True from ES.index(..., bulk=True) call if the bulk was sent, and making the result of the last bulk available as ES.last_bulk_result?
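A sketch of the proposed behavior as a standalone toy class; flush_every and the injected send callback are stand-ins for pyes internals, not actual pyes code:

```python
class BulkTracker:
    """Toy model of the proposal: index(..., bulk=True) returns True once
    the buffered batch is sent, and the last response stays readable."""

    def __init__(self, flush_every=2, send=None):
        # `send` stands in for the HTTP round trip to the _bulk endpoint.
        self._send = send or (lambda ops: {
            "items": [{"index": {"status": 201}} for _ in ops]
        })
        self.flush_every = flush_every
        self._pending = []
        self.last_bulk_result = None

    def index(self, doc, bulk=False):
        if not bulk:
            return self._send([doc])
        self._pending.append(doc)
        if len(self._pending) >= self.flush_every:
            self.last_bulk_result = self._send(self._pending)
            self._pending = []
            return True   # the batch was sent
        return False      # still buffering
```

Callers can then check the return value to know when a flush happened and inspect last_bulk_result for per-action success or failure.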

Add delete_by_query() into class ES

Could you add the following method to the ES class?

def delete_by_query(self, index, doc_type, query):
    """
    Delete a typed JSON documents from a specific index based on query
    """        
    path = self._make_path([index, doc_type, '_query'])
    return self._send_request('DELETE', path, query)

difficult to pass 'from' to ES.search()

Since `from` is a reserved word in Python, you can't just call:

ES.search(query=..., size=10, from=100)

I had to create a dict of params and then call ES.search(**params) to be able to pass it.
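The dict-unpacking workaround can be sketched like this; search here is a hypothetical stand-in for ES.search, not the real signature:

```python
def search(query, **params):
    # Hypothetical stand-in for ES.search: whatever arrives via **params
    # (including the otherwise-unwritable `from`) goes into the body.
    body = {"query": query}
    body.update(params)
    return body

# search(..., from=100) would be a SyntaxError, but dict
# unpacking sidesteps the reserved word:
body = search({"match_all": {}}, **{"from": 100, "size": 10})
```

A friendlier API could also accept an alias such as a start keyword and translate it internally.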

dump_curl option forces output to be in /tmp

Currently, the dump_curl option forces the output to be placed in /tmp, and to have .sh appended to it.

I think it would be much cleaner to simply allow the user to specify the filename directly. For bonus points, if a filehandle (or any other object with a write() method) is passed in the system should just use that, allowing the user to send output to any desired destination.

On IRC, clintongormley suggested that being able to send the output to a file based on the process ID would be a good idea; I'm not sure if it's worth building this in, but it could be done by allowing a %p in the filename to be converted to the PID.
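The %p idea amounts to a one-line substitution; a hypothetical helper, not pyes code:

```python
import os

def resolve_dump_path(template):
    # Sketch of the suggestion above: substitute %p in a user-supplied
    # dump_curl filename with the current process ID.
    return template.replace("%p", str(os.getpid()))
```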

Allow bulk deletes

ElasticSearch supports bulk deletes, with a sizable performance boost, but the delete() method doesn't accept bulk=True.
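For context, the _bulk endpoint accepts delete actions as metadata-only newline-delimited JSON. A sketch of building such a payload (HTTP wiring omitted; the action shape follows the ES 0.x/1.x bulk format):

```python
import json

def bulk_delete_payload(index, doc_type, ids):
    # Each delete action is a single metadata line; unlike index actions,
    # no source line follows. The body must end with a trailing newline.
    lines = [
        json.dumps({"delete": {"_index": index, "_type": doc_type, "_id": doc_id}})
        for doc_id in ids
    ]
    return "\n".join(lines) + "\n"
```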

es.get_mapping() fails when mapping contains "_source" or "_boost" field.

When I get a mapping that I put with a "_source" field in it (http://www.elasticsearch.org/guide/reference/mapping/source-field.html), pyes throws an exception like the following.

Traceback (most recent call last):
  File "es_index.py", line 140, in 
    kpi.create_all_mappings(index_name=INDEX_NAME, del_map_before_put=False)
  File "es_index.py", line 89, in create_all_mappings
    mapping = es.get_mapping(doc_type, index_name)
  File "/usr/local/lib/python2.6/dist-packages/pyes-0.15.0-py2.6.egg/pyes/es.py", line 542, in get_mapping
    self.mappings = Mapper(result)
  File "/usr/local/lib/python2.6/dist-packages/pyes-0.15.0-py2.6.egg/pyes/mappings.py", line 313, in __init__
    self._process(data)
  File "/usr/local/lib/python2.6/dist-packages/pyes-0.15.0-py2.6.egg/pyes/mappings.py", line 322, in _process
    self.indexes[indexname][docname] = get_field(docname, docdata)
  File "/usr/local/lib/python2.6/dist-packages/pyes-0.15.0-py2.6.egg/pyes/mappings.py", line 307, in get_field
    return ObjectField(name=name, **data)
TypeError: __init__() got an unexpected keyword argument '_source'

When I read the mapping using a browser, it shows the "_source" field fine.

Problem running attachment tests

I might be running these incorrectly, but I am getting a few failures and a few errors:

(pyes) ± nosetests **/*py --failed                                                                                              on master
..EE.......F.FFF..............F....E...................E
======================================================================
ERROR: test_TermQuery (pyes.tests.attachments.QueryAttachmentTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/attachments.py", line 73, in setUp
    self.conn.put_mapping("test-type", {"test-type":{'properties':mapping}}, ["test-index"])
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/es.py", line 529, in put_mapping
    return self._send_request('PUT', path, mapping)
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/es.py", line 220, in _send_request
    raise_if_error(response.status, decoded)
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/convert_errors.py", line 67, in raise_if_error
    raise excClass(msg, status, result)
MapperParsingException: No handler for type [attachment] declared on field [attachment]
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 200 31
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index HTTP/1.1" 200 31
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/_mapping HTTP/1.1" 500 99
--------------------- >> end captured logging << ---------------------

======================================================================
ERROR: test_filesave (pyes.tests.attachments.TestFileSaveTestCase)

Traceback (most recent call last):
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/attachments.py", line 22, in test_filesave
    self.conn.put_mapping("test-type", {"test-type":{'properties':mapping}}, ["test-index"])
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/es.py", line 529, in put_mapping
    return self._send_request('PUT', path, mapping)
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/es.py", line 220, in _send_request
    raise_if_error(response.status, decoded)
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/convert_errors.py", line 67, in raise_if_error
    raise excClass(msg, status, result)
MapperParsingException: No handler for type [attachment] declared on field [my_attachment]
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 200 31
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index HTTP/1.1" 200 31
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/_mapping HTTP/1.1" 500 102
--------------------- >> end captured logging << ---------------------

======================================================================
ERROR: test_CustomScoreQueryJS (pyes.tests.queries.QuerySearchTestCase)

Traceback (most recent call last):
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/queries.py", line 122, in test_CustomScoreQueryJS
    result = self.conn.search(query=q, indexes=["test-pindex"])
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/es.py", line 735, in search
    return self._query_call("_search", body, indexes, doc_types, query_params)
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/es.py", line 248, in _query_call
    response = self._send_request('GET', path, body, querystring_args)
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/es.py", line 220, in _send_request
    raise_if_error(response.status, decoded)
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/convert_errors.py", line 67, in raise_if_error
    raise excClass(msg, status, result)
SearchPhaseExecutionException: Failed to execute phase [query], total failure; shardFailures {[l1nyp5FgSp6swhnBhIx_Dw][test-pindex][0]: SearchParseException[[test-pindex][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"custom_score": {"lang": "js", "query": {"match_all": {}}, "script": "parseFloat(_score
(5+doc.position.value))"}}}]]]; nested: ElasticSearchIllegalArgumentException[script_lang not supported [js]]; }{[l1nyp5FgSp6swhnBhIx_Dw][test-pindex][1]: SearchParseException[[test-pindex][1]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"custom_score": {"lang": "js", "query": {"match_all": {}}, "script": "parseFloat(_score
(5+doc.position.value))"}}}]]]; nested: ElasticSearchIllegalArgumentException[script_lang not supported [js]]; }{[l1nyp5FgSp6swhnBhIx_Dw][test-pindex][2]: SearchParseException[[test-pindex][2]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"custom_score": {"lang": "js", "query": {"match_all": {}}, "script": "parseFloat(_score*(5+doc.position.value))"}}}]]]; nested: ElasticSearchIllegalArgumentException[script_lang not supported [js]]; }{[l1nyp5FgSp6swhnBhIx_Dw][test-pindex][3]: SearchParseException[[test-pindex][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"custom_score": {"lang": "js", "query": {"match_all": {}}, "script": "parseFloat(_score*(5+doc.position.value))"}}}]]]; nested: ElasticSearchIllegalArgumentException[script_lang not supported [js]]; }{[l1nyp5FgSp6swhnBhIx_Dw][test-pindex][4]: SearchParseException[[test-pindex][4]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"custom_score": {"lang": "js", "query": {"match_all": {}}, "script": "parseFloat(_score*(5+doc.position.value))"}}}]]]; nested: ElasticSearchIllegalArgumentException[script_lang not supported [js]]; }
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 400 32
pyes.urllib3.connectionpool: DEBUG: "GET /test-pindex/_search HTTP/1.1" 500 1940
--------------------- >> end captured logging << ---------------------

======================================================================
ERROR: Failure: ImportError (No module named mock)

Traceback (most recent call last):
  File "/Users/dash/.virtualenvs/pyes/lib/python2.6/site-packages/nose/loader.py", line 390, in loadTestsFromName
    addr.filename, addr.module)
  File "/Users/dash/.virtualenvs/pyes/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/Users/dash/.virtualenvs/pyes/lib/python2.6/site-packages/nose/importer.py", line 86, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/tests_extra.py", line 8, in
    from tests import ESTestCase
  File "/Users/dash/.virtualenvs/pyes/src/check/tests.py", line 6, in
    from mock import patch, Mock
ImportError: No module named mock
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-pindex HTTP/1.1" 200 31
--------------------- >> end captured logging << ---------------------

======================================================================
FAIL: Test error reported by deleting a missing document.

Traceback (most recent call last):
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/errors.py", line 71, in testDelete
    "asdf")
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/pyestest.py", line 37, in checkRaises
    "Expected exception %s not raised" % excClass
AssertionError: Expected exception <class 'pyes.exceptions.NotFoundException'> not raised
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 400 32
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 400 32
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index/flibble/asdf HTTP/1.1" 200 64
--------------------- >> end captured logging << ---------------------

======================================================================
FAIL: test_GeoBoundingBoxFilter (pyes.tests.geoloc.GeoQuerySearchTestCase)

Traceback (most recent call last):
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/geoloc.py", line 70, in test_GeoBoundingBoxFilter
    self.assertEquals(result2['hits']['total'], 1)
AssertionError: 0 != 1
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-mindex HTTP/1.1" 400 33
pyes.urllib3.connectionpool: DEBUG: "PUT /test-mindex HTTP/1.1" 200 31
pyes.urllib3.connectionpool: DEBUG: "PUT /test-mindex/test-type/_mapping HTTP/1.1" 200 31
pyes.urllib3.connectionpool: DEBUG: "PUT /test-mindex/test-type/1 HTTP/1.1" 200 64
pyes.urllib3.connectionpool: DEBUG: "PUT /test-mindex/test-type/2 HTTP/1.1" 200 64
pyes.urllib3.connectionpool: DEBUG: "POST /test-mindex/_refresh HTTP/1.1" 200 60
pyes.urllib3.connectionpool: DEBUG: "GET /_cluster/health HTTP/1.1" 200 228
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 400 32
pyes.urllib3.connectionpool: DEBUG: "GET /test-mindex/_search HTTP/1.1" 200 262
pyes.urllib3.connectionpool: DEBUG: "GET /test-mindex/_search HTTP/1.1" 200 104
--------------------- >> end captured logging << ---------------------

======================================================================
FAIL: test_GeoDistanceFilter (pyes.tests.geoloc.GeoQuerySearchTestCase)

Traceback (most recent call last):
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/geoloc.py", line 59, in test_GeoDistanceFilter
    self.assertEquals(result['hits']['total'], 1)
AssertionError: 0 != 1
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 400 32
pyes.urllib3.connectionpool: DEBUG: "GET /test-mindex/_search HTTP/1.1" 200 263
pyes.urllib3.connectionpool: DEBUG: "GET /test-mindex/_search HTTP/1.1" 200 104
--------------------- >> end captured logging << ---------------------

======================================================================
FAIL: test_GeoPolygonFilter (pyes.tests.geoloc.GeoQuerySearchTestCase)

Traceback (most recent call last):
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/geoloc.py", line 90, in test_GeoPolygonFilter
    self.assertEquals(result['hits']['total'], 1)
AssertionError: 0 != 1
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 400 32
pyes.urllib3.connectionpool: DEBUG: "GET /test-mindex/_search HTTP/1.1" 200 262
pyes.urllib3.connectionpool: DEBUG: "GET /test-mindex/_search HTTP/1.1" 200 104
--------------------- >> end captured logging << ---------------------

======================================================================
FAIL: testMLT (pyes.tests.indexing.IndexingTestCase)

Traceback (most recent call last):
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/indexing.py", line 145, in testMLT
    ], u'total': 2, u'max_score': 0.19178301})
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/pyestest.py", line 24, in assertResultContains
    self.assertEquals(value, result[key])
AssertionError: [{u'_type': u'test-type', u'_index': u'test-index', u'_score': 0.19178301, u'_source': {u'name': u'Joe Tested'}, u'_version': 1, u'_id': u'3'}, {u'_type': u'test-type', u'_index': u'test-index', u'_score': 0.19178301, u'_source': {u'name': u'Joe Tester'}, u'_version': 1, u'_id': u'2'}] != [{u'_score': 0.19178301, u'_type': u'test-type', u'_id': u'3', u'_source': {u'name': u'Joe Tested'}, u'_index': u'test-index'}, {u'_score': 0.19178301, u'_type': u'test-type', u'_id': u'2', u'_source': {u'name': u'Joe Tester'}, u'_index': u'test-index'}]
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 400 32
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 400 32
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index2 HTTP/1.1" 400 33
pyes.urllib3.connectionpool: DEBUG: "DELETE /another-index HTTP/1.1" 400 35
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index HTTP/1.1" 200 31
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index2 HTTP/1.1" 200 31
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/1 HTTP/1.1" 200 63
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/2 HTTP/1.1" 200 63
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/3 HTTP/1.1" 200 63
pyes.urllib3.connectionpool: DEBUG: "POST /test-index/_refresh HTTP/1.1" 200 60
pyes.urllib3.connectionpool: DEBUG: "GET /_cluster/health HTTP/1.1" 200 228
pyes.urllib3.connectionpool: DEBUG: "GET /test-index/test-type/1/_mlt?min_doc_freq=1&fields=name&min_term_freq=1 HTTP/1.1" 200 330
--------------------- >> end captured logging << ---------------------


Ran 56 tests in 41.128s

FAILED (errors=4, failures=5)

List of doubles doesn't work

When a query returns a doc that has a list of doubles, pyes fails. I suspect it happens with any list of non-strings.

The fix is in the string_to_datetime(self, obj) method of ESJsonDecoder: just add a check to make sure we only try to take the length of strings.

def string_to_datetime(self, obj):
    """Transform a datetime string into a datetime object."""
    if isinstance(obj, basestring) and len(obj) == 19:
        try:
            return datetime(*time.strptime(obj, "%Y-%m-%dT%H:%M:%S")[:6])
        except ValueError:
            pass
    return obj

Incorrect error handling with HTTP

It looks like the convert_errors() code has only been tested with Thrift and fails badly with HTTP. As an example, create_index_if_missing() is completely broken because convert_errors raises:

pyes.exceptions.ElasticSearchException: RemoteTransportException[[Brand, Abigail][inet[/127.0.0.1:9300]][indices/createIndex]]; nested: IndexAlreadyExistsException[[places] Already exists];

whereas the expected exception is IndexAlreadyExistsException.
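One possible shape for the fix, sketched here with illustrative class and helper names (not the actual pyes API): pick the innermost nested exception name out of the HTTP error string and map it to a local exception class.

```python
import re

# Illustrative local exception classes (the real ones live in pyes.exceptions).
class ElasticSearchException(Exception):
    pass

class IndexAlreadyExistsException(ElasticSearchException):
    pass

# Map remote exception names to local classes (illustrative subset).
EXCEPTIONS = {
    "IndexAlreadyExistsException": IndexAlreadyExistsException,
}

def raise_remote_error(message):
    """Scan the ES error string for exception names, innermost first,
    and raise the matching local class, falling back to the generic one."""
    for name in reversed(re.findall(r"(\w+Exception)\[", message)):
        if name in EXCEPTIONS:
            raise EXCEPTIONS[name](message)
    raise ElasticSearchException(message)
```

With the error string from above, this raises IndexAlreadyExistsException instead of the generic ElasticSearchException.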

DotDict instances can't be deep-copied by copy.deepcopy()

Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05) 
Type "copyright", "credits" or "license" for more information.

IPython 0.11 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from pyes.es import DotDict

In [2]: a = DotDict()

In [3]: b = dict()

In [4]: from copy import deepcopy

In [5]: deepcopy(b)
Out[5]: {}

In [6]: deepcopy(a)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/alambert/biff/grok/ in ()
----> 1 deepcopy(a)

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.pyc in deepcopy(x, memo, _nil)
    180                     reductor = getattr(x, "__reduce_ex__", None)
    181                     if reductor:
--> 182                         rv = reductor(2)
    183                     else:
    184                         reductor = getattr(x, "__reduce__", None)

TypeError: 'NoneType' object is not callable

Looking at copy.py from Python 2.7.1 (http://svn.python.org/view/python/tags/r271/Lib/copy.py?view=markup) gives a bit more detail on what's going wrong:

78      copier = getattr(cls, "__copy__", None)
79      if copier:
80          return copier(x)
81  
82      reductor = dispatch_table.get(cls)
83      if reductor:
84          rv = reductor(x)
85      else:
86          reductor = getattr(x, "__reduce_ex__", None)
87          if reductor:
88              rv = reductor(2)
89          else:
90              reductor = getattr(x, "__reduce__", None)
91              if reductor:
92                  rv = reductor()
93              else:
94                  raise Error("un(shallow)copyable object of type %s" % cls)
95  
In [2]: from pyes.es import DotDict

In [3]: a = DotDict()

In [4]: reductor = getattr(a, "__reduce_ex__")

In [5]: reductor(2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/alambert/biff/grok/ in ()
----> 1 reductor(2)

TypeError: 'NoneType' object is not callable

In [6]: reductor == None
Out[6]: False

In [7]: type(reductor)
Out[7]: builtin_function_or_method

In [8]: 
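The likely cause: DotDict's __getattr__ returns None for any missing attribute, including the dunder probes (such as __getstate__) that deepcopy's reduce machinery makes, so a None gets called. A minimal sketch of a fix, assuming DotDict is a plain dict subclass: raise AttributeError for missing names instead of returning None.

```python
import copy

class DotDict(dict):
    """Minimal sketch of a dict with attribute access. The key point: raise
    AttributeError for missing attributes instead of returning None, so the
    dunder probes made during copy.deepcopy() still behave correctly."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value
```

With this variant, deepcopy(DotDict()) succeeds and returns an equal DotDict.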

Bulk index handles docs differently than normal index

Bulk index currently assumes that doc is an object, while normal index also allows it to be a json-encoded string.

Replace:

    self.bulk_data.write(json.dumps(doc, cls=self.encoder))

with:

    if isinstance(doc, dict):
        doc = json.dumps(doc, cls=self.encoder)
    self.bulk_data.write(doc)
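The same normalization as a self-contained sketch (helper name hypothetical): accept either a dict or an already JSON-encoded string and return the string to write into the bulk buffer.

```python
import json

def normalize_doc(doc, encoder=None):
    """Return the JSON string for a bulk line: encode dicts with the
    given encoder class, pass pre-encoded strings through unchanged."""
    if isinstance(doc, dict):
        return json.dumps(doc, cls=encoder)
    return doc
```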

Possible to disable thrift dependency

Thrift is pretty annoying, since on Windows you must have a compiler registered in order to install it. Is there any way to disable the thrift dependency if you don't care about the higher performance?

Thanks

Missing TermsQuery class

There is a "Terms Query" type in ES, but it is not available in pyes. I created one for my own purposes; I'm posting it here in the hope it is useful:

class TermsQuery(TermQuery):
    _internal_name = "terms"

    def __init__(self, *args, **kwargs):
        super(TermsQuery, self).__init__(*args, **kwargs)

    def add(self, field, value, minimum_match=1):
        if not isinstance(value, list):
            raise InvalidParameterQuery("value %r must be a valid list" % value)
        self._values[field] = value
        if minimum_match:
            self._values['minimum_match'] = int(minimum_match)

Thanks for making pyes available.

get mapping and then put it back in new index

When I run the following (first get the index mapping, then put it back into a newly created index):

    mapping = elastic.get_mapping('job', 'test')
    elastic.delete_index_if_exists('test')
    elastic.create_index('test')
    elastic.put_mapping('job', mapping, 'test')

it does not work (error: no method to_json), because "mapping" is of type pyes.es.DotDict, which defines __getattr__. And in put_mapping there is:

    if hasattr(mapping, "to_json"):
        mapping = mapping.to_json()

This workaround works, but it's annoying:

    elastic.put_mapping('job', dict(elastic.get_mapping('job', 'test')), 'test')

Bug?
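Until that's fixed, a small recursive converter (hypothetical helper, not part of pyes) avoids the manual dict(...) cast and also handles nested mappings:

```python
def undot(obj):
    """Recursively convert DotDict-like mappings into plain dicts so that
    hasattr(mapping, "to_json") probes in put_mapping don't misfire."""
    if isinstance(obj, dict):
        return dict((k, undot(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return [undot(v) for v in obj]
    return obj
```

Then elastic.put_mapping('job', undot(mapping), 'test') works on arbitrarily nested mappings.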

start size query. default argument prob

In the Query class, start is set to 0 in the constructor, and then:

    if self.start:
        res['from'] = self.start

Since 0 is falsy, setting start to 0 never adjusts the value; 'from' is simply omitted, so it will always behave as 0.

I found the same problem with add_highlight and number_of_fragments.
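A minimal sketch of one fix (not the real pyes class): use None as the "unset" sentinel instead of 0, so an explicit start (or fragment count) of 0 is still serialized.

```python
class Query(object):
    """Illustrative sketch only: default start/size to None and test
    `is not None` instead of truthiness when serializing."""

    def __init__(self, start=None, size=None):
        self.start = start
        self.size = size

    def serialize(self):
        res = {}
        if self.start is not None:  # instead of `if self.start:`
            res['from'] = self.start
        if self.size is not None:
            res['size'] = self.size
        return res
```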

utils.py requires Django

Not nice for using with Pylons.

Please move the django imports from module level to inside the get_values method (since get_values is Django-specific anyway).
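The requested change can be sketched like this (signature hypothetical): the Django import moves inside the function body, so merely importing the module no longer requires Django to be installed.

```python
def get_values(instance, field_names):
    """Django-specific helper sketch: the import is deferred to call time,
    so Pylons (or any non-Django) code can import this module freely."""
    from django.db import models  # deferred import; only needed here
    return dict((f.name, getattr(instance, f.name))
                for f in instance._meta.fields if f.name in field_names)
```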

Install with buildout fails

setup.py does:

    import pyes as distmeta

which requires urllib3 to be installed before setup.py finishes processing, so the install fails before the requirements are installed.
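A common workaround, sketched here with a hypothetical helper: parse the version string out of the source file instead of importing the package from setup.py.

```python
import re

def read_version(path):
    """Extract __version__ from a source file without importing the package,
    so setup.py needs none of the runtime dependencies (e.g. urllib3)."""
    with open(path) as f:
        match = re.search(r'__version__\s*=\s*[\'"]([^\'"]+)[\'"]', f.read())
    if not match:
        raise RuntimeError("Unable to find __version__ string in %s" % path)
    return match.group(1)
```

setup.py would then call read_version("pyes/__init__.py") instead of importing pyes.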
