
pyes's Introduction

pyes - Python ElasticSearch

Web: http://pypi.python.org/pypi/pyes/
Download: http://pypi.python.org/pypi/pyes/
Source: http://github.com/aparo/pyes/
Documentation: http://pyes.rtfd.org/
Keywords: search, elasticsearch, distributed search

--

[![Gitter](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/aparo/pyes?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

pyes has been a pythonic way to use ElasticSearch since 2010.

This version requires ElasticSearch 1.x or above. It is a pre-release of pyes 1.x. Have a look at the migration documentation to upgrade your code for ElasticSearch 1.x.

We are working to provide full support for ElasticSearch 1.x (check the develop branch: we are using the git-flow workflow), which will have:

  • a connection layer based on the official ElasticSearch client (to be confirmed)
  • full support for ElasticSearch 1.x (old-version support removed due to incompatibility with older ES releases)
  • migration from multi_field to <field>.fields
  • refactoring of old code to be more pythonic
  • performance improvements

Features

  • Python 3 support (HTTP only; the thrift library is not available on Python 3)
  • Thrift/HTTP protocols
  • Bulk insert/delete
  • Index management
  • All search query types
  • Facet Support
  • Aggregation Support
  • Geolocalization support
  • Highlighting
  • Percolator
  • River support

Changelog

  1. 0.99.0:

    Migrated much of the code to ElasticSearch 1.x

    Full coverage of current queries

  2. 0.99:

    Added aggregations

    Fixes for Python 3 compatibility

    Upgraded code to use ElasticSearch 1.x or above

  3. 0.90.1:

    Bug fix release for some Python 3 regressions

  4. 0.90.0:

    A lot of improvements.

    Python 3 support.

Migration to version 0.99

CustomScoreQuery has been removed. FunctionScoreQuery and its functions cover the previous functionality. For scripting, use ScriptScoreFunction.
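As raw query bodies, the migration looks roughly like this (a sketch of the serialized JSON shapes, not the pyes classes themselves; the script and field names are illustrative):

```python
# Old ES 0.x style: custom_score wrapping a query with an inline script.
old_custom_score = {
    "custom_score": {
        "query": {"match_all": {}},
        "script": "_score * doc['boost'].value",
    }
}

# New ES 1.x style: function_score with a script_score function entry,
# which is what FunctionScoreQuery + ScriptScoreFunction serialize to
# (approximately; exact serialization may differ by version).
new_function_score = {
    "function_score": {
        "query": {"match_all": {}},
        "functions": [
            {"script_score": {"script": "_score * doc['boost'].value"}}
        ],
    }
}
```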

TODO

  • much more documentation
  • add coverage
  • add jython native client protocol

License

This software is licensed under the New BSD License. See the LICENSE file in the top distribution directory for the full license text.

pyes's People

Contributors

aguereca, akheron, alambert, alstrong, andreiz, aparo, brombomb, dalbani, fiedzia, g-clef, gsakkis, ieure, jukart, lins05, maciejkula, mastermind2k, matterkkila, mchruszcz, merrellb, mikluko, mindflayer, paykroyd, rboulton, samekmichal, scoursen, smaddineni, stevencdavis, thmttch, vinodc, zebuline


pyes's Issues

Unit tests require a running ElasticSearch instance

This is really wrong; the tests should use mock (or a similar tool) to stub out network access and return canned responses. There's nothing wrong with testing against a real ElasticSearch instance, but it shouldn't be the default behavior, as it makes it very difficult to write and run tests.
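To illustrate the point, a minimal sketch of the suggested approach using the stdlib's unittest.mock; cluster_health and its injected opener are hypothetical stand-ins, not pyes code:

```python
import io
import json
from unittest import mock

def cluster_health(opener, url="http://localhost:9200/_cluster/health"):
    # Hypothetical helper standing in for a pyes call; the opener is
    # injected so tests can stub out the network entirely.
    with opener(url) as fp:
        return json.load(fp)

# A canned response instead of a live ElasticSearch node:
fake_opener = mock.Mock(
    return_value=io.BytesIO(json.dumps({"status": "green"}).encode())
)
health = cluster_health(fake_opener)
```

No ElasticSearch process is needed: the test asserts both on the parsed response and on the URL the code tried to hit.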

Operator is ignored in TextQuery

Using the operator keyword argument when constructing a TextQuery doesn't work; the or operator is always used. This seems to be caused by a commented-out section in the add_query method.

No release tags in Git

If there is some way to get tags for the different releases of pyes, that would be really great. The only tag I see is 0.12.1. I'm trying to find the source for 0.16.0, and it's much harder than it should be because there are no tags.

Inconsistent spelling of indexes and indices

Elasticsearch consistently spells the plural of index as "indices". However, there are many places in pyes where the spelling "indexes" is used (normally as a parameter name). I'd like to change that to "indices", so that users don't have to guess which spelling to use. Unfortunately, that will break any existing code which uses named parameters to call these methods.

We could implement a fallback, so that either spelling is permissible, but this seems like far more trouble than it's worth. I'd prefer to just incompatibly change the code now, before too many clients start using it.
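For reference, the fallback could be as small as a keyword-alias shim like this hypothetical decorator (not pyes code; search below is a toy stand-in):

```python
import functools
import warnings

def accept_indexes_alias(func):
    # Hypothetical shim: accept the legacy `indexes` kwarg as an alias
    # for `indices`, warning on use and rejecting ambiguous calls.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if "indexes" in kwargs:
            if "indices" in kwargs:
                raise TypeError("pass either 'indices' or 'indexes', not both")
            warnings.warn("'indexes' is deprecated; use 'indices'",
                          DeprecationWarning, stacklevel=2)
            kwargs["indices"] = kwargs.pop("indexes")
        return func(*args, **kwargs)
    return wrapper

@accept_indexes_alias
def search(query, indices=None):
    return indices
```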

Any better ideas?

(If I were writing elasticsearch myself, I'd go for the spelling indexes, but it's far too late for that kind of discussion; what's important now is that pyes is consistent in its usage.)

QueryFilter

How can I build a filter with QueryFilter (passing something like "apple OR oranges")?
Thanks for any info ...
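Not an authoritative answer, but a query filter is essentially a filter wrapping a query; as a raw body it serializes to roughly this shape, with query_string handling the "apple OR oranges" syntax (the exact serialized form may vary by pyes/ES version):

```python
# Rough sketch of the body a QueryFilter produces: a filter whose
# payload is a query_string query (shape is approximate).
query_filter_body = {
    "query": {
        "query_string": {"query": "apple OR oranges"}
    }
}
```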

The documentation is inadequate.

I am unable to use pyes. pyes.connect() doesn't exist, and I have to use pyes.connection_http.connect(). The resulting object has no methods, and the client object is never used again in the documentation: it's created and then ignored. Every operation is performed on the conn object, which, as far as I know, cannot be created for an HTTP connection (or, at least, there's no mention of how in the documentation).

It's unfortunate that I can't even find the few lines it takes to try pyes out, but hopefully this is easily fixed.

Thanks!

how to access a field in a multi-nested mapping

Hi,

I would like to access a field in a multi-nested mapping.

Example :

DECLARATION

mapping = {
    u"profile_list": {
        'type': 'nested',
        u"affectation_list": {
            "type": "nested",
            "properties": {
                u"employee": {"type": "string", 'index': 'not_analyzed'},
                u"cancel_status": {"type": "string", 'index': 'not_analyzed'},
            }
        },
        u"planning_list": {
            "type": "nested",
            "properties": {
                "planning_date": {"type": "date", 'index': 'not_analyzed'}
            }
        },
        u"text1": {'type': 'string', 'index': 'not_analyzed'},
        u"text2": {'type': 'string', 'index': 'not_analyzed'}
    }
}

conn.put_mapping(document_type, {'properties':mapping}, [db_index])

conn.index({
    u"profile_list": [
        {
            u"affectation_list": [
                {u"employee": u"ef4zef4ze", u"cancel_status": u"cancel_cpm"},
                {u"employee": u"ezf4zef", u"cancel_status": u"cancel_emp"}
            ],
            u"planning_list": [],
            u"text1": u"titi",
            u"text2": u"toto"
        }
    ]
}, db_index, document_type)

conn.index({
    u"profile_list": [
        {
            u"affectation_list": [
                {u"employee": u"ef4zef4ze", u"cancel_status": u"cancel_cpm"},
                {u"employee": u"ezf4zef", u"cancel_status": u"--"}
            ],
            u"planning_list": [],
            u"text1": u"tutu",
            u"text2": u"tete"
        }
    ]
}, db_index, document_type)

conn.refresh([db_index])

QUERY

"""1st"""
q = {
    "query": {
        "nested": {
            "path": "profile_list",
            "score_mode": "avg",
            "query": {
                "nested": {
                    "path": "affectation_list",
                    "score_mode": "avg",
                    "query": {
                        "term": {"profile_list.affectation_list.cancel_status": u"--"}
                    }
                }
            }
        }
    }
}

"""2nd"""
q = {
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "nested": {
                    "path": "profile_list",
                    "score_mode": "avg",
                    "query": {
                        "term": {"profile_list.affectation_list.cancel_status": u"--"}
                    }
                }
            }
        }
    }
}

"""3rd"""
q = pyes.NestedQuery(u"profile_list", pyes.TermQuery(u"profile_list.affectation_list.cancel_status", u"--"))

EXECUTION

result = conn.search(query=q, indices=[db_index], doc_types=[document_type], size=1000000)

ERROR MESSAGE

For the 1st and 2nd queries, query.py fails at res = {"query": self.query.serialize()} with AttributeError: 'dict' object has no attribute 'serialize'.

The 3rd query returns result = [].

Any ideas?

Thx

When connecting via thrift, the elasticsearch server emits a StreamCorruptedException

Detailed exception stack trace

    java.io.StreamCorruptedException: invalid data length: -2147418111
    at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:42)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:282)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:51)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:540)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:274)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:261)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:350)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)

Significantly worse performance vs. urllib2

While working out some performance kinks in my ElasticSearch deployment, I discovered that performance of pyes is significantly worse (2x) vs. using urllib2 directly. I've attempted to eliminate as many variables as possible from these numbers.

Here's what I used to benchmark:

from urllib2 import urlopen, Request
from itertools import islice

import simplejson as json

import pyes
from ostrich import stats

HOST = 'localhost:9200'


@stats.time('pyes')
def search_pyes(conn, search):
    res = conn.search(search, indices='places', doc_types='geojson')
    list(islice(res, 25))


@stats.time('urllib')
def search_urllib(search):
    fp = urlopen(Request('http://%s/places/geojson/_search' % HOST,
                         json.dumps(search),
                         {'Content-Type': 'application/json'}))
    try:
        json.load(fp)
    except Exception, ex:
        stats.incr('errors')
    finally:
        fp.close()


if __name__ == '__main__':
    conn = pyes.ES([HOST])
    search = {'sort': [{'_score': 'desc'}], 'query': {'filtered': {'filter': {'and': [{'term': {'_deleted': False}}, {'geo_bounding_box': {'geometry.coordinates': {'bottom_right': {'lat': 47.609380000000002, 'lon': -122.34135000000001}, 'top_left': {'lat': 47.610700000000001, 'lon': -122.34406}}}}]}, 'query': {'match_all': {}}}}, 'size': 25, 'from': 0, 'fields': ['_source']}
    for x in xrange(15):
        search_pyes(conn, search)
        search_urllib(search)

    print json.dumps(stats.stats(reset=False), default=stats.json_encoder)

This uses python-ostrich to capture performance information.

Here's the timing information for pyes and urllib2:

{"pyes": {"count": 15,
          "p9999": 2619,
          "p999": 2619,
          "p99": 2619,
          "p90": 2619,
          "p75": 2619,
          "p50": 2619,
          "p25": 2619,
          "histogram": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15],
          "minimum": 2252,
          "maximum": 2436,
          "average": 2337,
          "standard_deviation": 46},
 "urllib": {"count": 15,
            "p9999": 1192,
            "p999": 1192,
            "p99": 1192,
            "p90": 1192,
            "p75": 1192,
            "p50": 1192,
            "p25": 917,
            "minimum": 862,
            "maximum": 1136,
            "average": 944,
            "standard_deviation": 68,
            "histogram": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 8]}}

testDelete in errors.py fails to raise error

testDelete runs the following:

curl -XDELETE http://127.0.0.1:9200/test-index/flibble/asdf

But this does not return an error if asdf doesn't exist - so there is nothing to raise.

Using create_river or delete_river throws an error with any of the River subclasses.

When trying to use the create_river or delete_river functions with any of the River subclasses, the following error is thrown:

Traceback (most recent call last):
  File "rivers.py", line 18, in testCreateCouchDBRiver
    result = self.conn.create_river(test_river, river_name='test_index')
  File "/usr/lib/python2.7/site-packages/pyes-0.14.0-py2.7.egg/pyes/es.py", line 582, in create_river
    river_name = river.name
AttributeError: 'CouchDBRiver' object has no attribute 'name'

I have created a fork with a possible fix.

How can I make a top-level filter?

How can I make a top-level filter, like this (so the filter is not inside the query)?
I want to filter only hits, not facets.

{
    "query": {
        "match_all": {}
    },
    "filter": {
        "term": {"company": "univerza v mariboru"}
    },
    "facets": {
        "company": {
            "terms": {"field": "company"}
        }
    }
}

thanks for any info ...

Implement nested queries

Hello,

ElasticSearch 0.17.0 added the nested objects feature with new nested queries.
Has anyone started to implement this in pyes? It would be great!

Search object requires query

Hi,

The query.Search class tries to serialise the query even when none was provided, resulting in an unhandled exception like this:

/Users/michal/dev/capital/branches/targeted_device/pyes/query.pyc in serialize(self)
    129 
    130         """
--> 131         res = {"query": self.query.serialize()}
    132         if self.filter:
    133             res['filter'] = self.filter.serialize()

query parameters are ignored in mlt queries

When making an mlt query through ES.morelikethis, query parameters are ignored. Here the size parameter is ignored:

>>> options = {"size": "1"}
>>> res = conn.morelikethis('index', 'article', id='4c59c437dbe1afb1d8004f01', fields=['title'], **options)
>>> len(res['hits']['hits'])  # should be 1
10

Also, if you had options = {"size": "1", "fields": ["title", "summary"]}, you would get a TypeError from Python because of the name clash on fields.

Highlighting question

When adding highlighting to a Search, the whole field which has a match is highlighted, instead of the substring of the field which matched the search term:

firstQ = PrefixQuery(UserIndex.first_name, term)
lastQ = PrefixQuery(UserIndex.last_name, term)

q = BoolQuery(should=[firstQ, lastQ])

search = Search(q)
search.add_highlight(UserIndex.first_name)

results = conn.search(search, indices = UserIndex.index_name)

So for the search term John, a field containing Johnson will be highlighted as <em>Johnson</em> instead of <em>John</em>son.

Is it the PrefixQuery that is causing problems in conjunction with highlighting?

unable to create an index

I try creating an HTTP connection:
conn = pyes.connect(['http://127.0.0.1:9200/'])

And then to create an index:
conn.create_index("corpus")

Here is what I get:
  File "populateSearch.py", line 11, in
    conn.create_index("corpus")
  File "/usr/lib/python2.6/site-packages/pyes-0.12.0-py2.6.egg/pyes/connection.py", line 158, in _client_call
    conn = self._ensure_connection()
  File "/usr/lib/python2.6/site-packages/pyes-0.12.0-py2.6.egg/pyes/connection.py", line 169, in _ensure_connection
    conn = self.connect()
  File "/usr/lib/python2.6/site-packages/pyes-0.12.0-py2.6.egg/pyes/connection.py", line 183, in connect
    self._timeout, self._recycle)
  File "/usr/lib/python2.6/site-packages/pyes-0.12.0-py2.6.egg/pyes/connection.py", line 33, in init
    host, port = server.split(":")
ValueError: too many values to unpack

Cannot read the results of a bulk operation

Actions in a bulk operation can succeed or fail individually, but pyes gives no way to read the result of a bulk operation after it has been sent.

How about returning True from ES.index(..., bulk=True) call if the bulk was sent, and making the result of the last bulk available as ES.last_bulk_result?
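A sketch of the proposed behavior as a standalone toy class; flush_every and the injected send callback are stand-ins for pyes internals, not actual pyes code:

```python
class BulkTracker:
    """Toy model of the proposal: index(..., bulk=True) returns True once
    the buffered batch is sent, and the last response stays readable."""

    def __init__(self, flush_every=2, send=None):
        # `send` stands in for the HTTP round trip to the _bulk endpoint.
        self._send = send or (lambda ops: {
            "items": [{"index": {"status": 201}} for _ in ops]
        })
        self.flush_every = flush_every
        self._pending = []
        self.last_bulk_result = None

    def index(self, doc, bulk=False):
        if not bulk:
            return self._send([doc])
        self._pending.append(doc)
        if len(self._pending) >= self.flush_every:
            self.last_bulk_result = self._send(self._pending)
            self._pending = []
            return True   # the batch was sent
        return False      # still buffering
```

Callers can then check the return value to know when a flush happened and inspect last_bulk_result for per-action success or failure.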

Add delete_by_query() into class ES

Could you add the following method to the ES class?

def delete_by_query(self, index, doc_type, query):
    """
    Delete a typed JSON documents from a specific index based on query
    """        
    path = self._make_path([index, doc_type, '_query'])
    return self._send_request('DELETE', path, query)

difficult to pass 'from' to ES.search()

Since `from` is a reserved word in Python, you can't just call:

ES.search(query=..., size=10, from=100)

I had to create a dict of params and then call ES.search(**params) to be able to pass it.
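The dict-unpacking workaround can be sketched like this; search here is a hypothetical stand-in for ES.search, not the real signature:

```python
def search(query, **params):
    # Hypothetical stand-in for ES.search: whatever arrives via **params
    # (including the otherwise-unwritable `from`) goes into the body.
    body = {"query": query}
    body.update(params)
    return body

# search(..., from=100) would be a SyntaxError, but dict
# unpacking sidesteps the reserved word:
body = search({"match_all": {}}, **{"from": 100, "size": 10})
```

A friendlier API could also accept an alias such as a start keyword and translate it internally.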

dump_curl option forces output to be in /tmp

Currently, the dump_curl option forces the output to be placed in /tmp, and to have .sh appended to it.

I think it would be much cleaner to simply allow the user to specify the filename directly. For bonus points, if a filehandle (or any other object with a write() method) is passed in the system should just use that, allowing the user to send output to any desired destination.

On IRC, clintongormley suggested that being able to send the output to a file based on the process ID would be a good idea; I'm not sure if it's worth building this in, but it could be done by allowing a %p in the filename to be converted to the PID.
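The %p idea amounts to a one-line substitution; a hypothetical helper, not pyes code:

```python
import os

def resolve_dump_path(template):
    # Sketch of the suggestion above: substitute %p in a user-supplied
    # dump_curl filename with the current process ID.
    return template.replace("%p", str(os.getpid()))
```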

Allow bulk deletes

ElasticSearch supports bulk deletes, with a sizable performance boost, but the delete() method doesn't accept bulk=True.
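For context, the _bulk endpoint accepts delete actions as metadata-only newline-delimited JSON. A sketch of building such a payload (HTTP wiring omitted; the action shape follows the ES 0.x/1.x bulk format):

```python
import json

def bulk_delete_payload(index, doc_type, ids):
    # Each delete action is a single metadata line; unlike index actions,
    # no source line follows. The body must end with a trailing newline.
    lines = [
        json.dumps({"delete": {"_index": index, "_type": doc_type, "_id": doc_id}})
        for doc_id in ids
    ]
    return "\n".join(lines) + "\n"
```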

es.get_mapping() fails when mapping contains "_source" or "_boost" field.

When I get a mapping that I put with a "_source" field in it (http://www.elasticsearch.org/guide/reference/mapping/source-field.html), pyes throws an exception like the following.

Traceback (most recent call last):
  File "es_index.py", line 140, in 
    kpi.create_all_mappings(index_name=INDEX_NAME, del_map_before_put=False)
  File "es_index.py", line 89, in create_all_mappings
    mapping = es.get_mapping(doc_type, index_name)
  File "/usr/local/lib/python2.6/dist-packages/pyes-0.15.0-py2.6.egg/pyes/es.py", line 542, in get_mapping
    self.mappings = Mapper(result)
  File "/usr/local/lib/python2.6/dist-packages/pyes-0.15.0-py2.6.egg/pyes/mappings.py", line 313, in __init__
    self._process(data)
  File "/usr/local/lib/python2.6/dist-packages/pyes-0.15.0-py2.6.egg/pyes/mappings.py", line 322, in _process
    self.indexes[indexname][docname] = get_field(docname, docdata)
  File "/usr/local/lib/python2.6/dist-packages/pyes-0.15.0-py2.6.egg/pyes/mappings.py", line 307, in get_field
    return ObjectField(name=name, **data)
TypeError: __init__() got an unexpected keyword argument '_source'

When I read the mapping using a browser, it shows the "_source" field fine.

Problem running attachment tests

I might be running these incorrectly, but I am getting a few failures and a few errors:

(pyes) ± nosetests **/*py --failed                                                                                              on master
..EE.......F.FFF..............F....E...................E
======================================================================
ERROR: test_TermQuery (pyes.tests.attachments.QueryAttachmentTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/attachments.py", line 73, in setUp
    self.conn.put_mapping("test-type", {"test-type":{'properties':mapping}}, ["test-index"])
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/es.py", line 529, in put_mapping
    return self._send_request('PUT', path, mapping)
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/es.py", line 220, in _send_request
    raise_if_error(response.status, decoded)
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/convert_errors.py", line 67, in raise_if_error
    raise excClass(msg, status, result)
MapperParsingException: No handler for type [attachment] declared on field [attachment]
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 200 31
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index HTTP/1.1" 200 31
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/_mapping HTTP/1.1" 500 99
--------------------- >> end captured logging << ---------------------

======================================================================
ERROR: test_filesave (pyes.tests.attachments.TestFileSaveTestCase)

Traceback (most recent call last):
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/attachments.py", line 22, in test_filesave
    self.conn.put_mapping("test-type", {"test-type":{'properties':mapping}}, ["test-index"])
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/es.py", line 529, in put_mapping
    return self._send_request('PUT', path, mapping)
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/es.py", line 220, in _send_request
    raise_if_error(response.status, decoded)
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/convert_errors.py", line 67, in raise_if_error
    raise excClass(msg, status, result)
MapperParsingException: No handler for type [attachment] declared on field [my_attachment]
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 200 31
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index HTTP/1.1" 200 31
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/_mapping HTTP/1.1" 500 102
--------------------- >> end captured logging << ---------------------

======================================================================
ERROR: test_CustomScoreQueryJS (pyes.tests.queries.QuerySearchTestCase)

Traceback (most recent call last):
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/queries.py", line 122, in test_CustomScoreQueryJS
    result = self.conn.search(query=q, indexes=["test-pindex"])
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/es.py", line 735, in search
    return self._query_call("_search", body, indexes, doc_types, query_params)
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/es.py", line 248, in _query_call
    response = self._send_request('GET', path, body, querystring_args)
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/es.py", line 220, in _send_request
    raise_if_error(response.status, decoded)
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/convert_errors.py", line 67, in raise_if_error
    raise excClass(msg, status, result)
SearchPhaseExecutionException: Failed to execute phase [query], total failure; shardFailures {[l1nyp5FgSp6swhnBhIx_Dw][test-pindex][0]: SearchParseException[[test-pindex][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"custom_score": {"lang": "js", "query": {"match_all": {}}, "script": "parseFloat(_score
(5+doc.position.value))"}}}]]]; nested: ElasticSearchIllegalArgumentException[script_lang not supported [js]]; }{[l1nyp5FgSp6swhnBhIx_Dw][test-pindex][1]: SearchParseException[[test-pindex][1]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"custom_score": {"lang": "js", "query": {"match_all": {}}, "script": "parseFloat(_score
(5+doc.position.value))"}}}]]]; nested: ElasticSearchIllegalArgumentException[script_lang not supported [js]]; }{[l1nyp5FgSp6swhnBhIx_Dw][test-pindex][2]: SearchParseException[[test-pindex][2]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"custom_score": {"lang": "js", "query": {"match_all": {}}, "script": "parseFloat(_score*(5+doc.position.value))"}}}]]]; nested: ElasticSearchIllegalArgumentException[script_lang not supported [js]]; }{[l1nyp5FgSp6swhnBhIx_Dw][test-pindex][3]: SearchParseException[[test-pindex][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"custom_score": {"lang": "js", "query": {"match_all": {}}, "script": "parseFloat(_score*(5+doc.position.value))"}}}]]]; nested: ElasticSearchIllegalArgumentException[script_lang not supported [js]]; }{[l1nyp5FgSp6swhnBhIx_Dw][test-pindex][4]: SearchParseException[[test-pindex][4]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query": {"custom_score": {"lang": "js", "query": {"match_all": {}}, "script": "parseFloat(_score*(5+doc.position.value))"}}}]]]; nested: ElasticSearchIllegalArgumentException[script_lang not supported [js]]; }
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 400 32
pyes.urllib3.connectionpool: DEBUG: "GET /test-pindex/_search HTTP/1.1" 500 1940
--------------------- >> end captured logging << ---------------------

======================================================================
ERROR: Failure: ImportError (No module named mock)

Traceback (most recent call last):
  File "/Users/dash/.virtualenvs/pyes/lib/python2.6/site-packages/nose/loader.py", line 390, in loadTestsFromName
    addr.filename, addr.module)
  File "/Users/dash/.virtualenvs/pyes/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/Users/dash/.virtualenvs/pyes/lib/python2.6/site-packages/nose/importer.py", line 86, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/tests_extra.py", line 8, in
    from tests import ESTestCase
  File "/Users/dash/.virtualenvs/pyes/src/check/tests.py", line 6, in
    from mock import patch, Mock
ImportError: No module named mock
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-pindex HTTP/1.1" 200 31
--------------------- >> end captured logging << ---------------------

======================================================================
FAIL: Test error reported by deleting a missing document.

Traceback (most recent call last):
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/errors.py", line 71, in testDelete
    "asdf")
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/pyestest.py", line 37, in checkRaises
    "Expected exception %s not raised" % excClass
AssertionError: Expected exception <class 'pyes.exceptions.NotFoundException'> not raised
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 400 32
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 400 32
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index/flibble/asdf HTTP/1.1" 200 64
--------------------- >> end captured logging << ---------------------

======================================================================
FAIL: test_GeoBoundingBoxFilter (pyes.tests.geoloc.GeoQuerySearchTestCase)

Traceback (most recent call last):
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/geoloc.py", line 70, in test_GeoBoundingBoxFilter
    self.assertEquals(result2['hits']['total'], 1)
AssertionError: 0 != 1
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-mindex HTTP/1.1" 400 33
pyes.urllib3.connectionpool: DEBUG: "PUT /test-mindex HTTP/1.1" 200 31
pyes.urllib3.connectionpool: DEBUG: "PUT /test-mindex/test-type/_mapping HTTP/1.1" 200 31
pyes.urllib3.connectionpool: DEBUG: "PUT /test-mindex/test-type/1 HTTP/1.1" 200 64
pyes.urllib3.connectionpool: DEBUG: "PUT /test-mindex/test-type/2 HTTP/1.1" 200 64
pyes.urllib3.connectionpool: DEBUG: "POST /test-mindex/_refresh HTTP/1.1" 200 60
pyes.urllib3.connectionpool: DEBUG: "GET /_cluster/health HTTP/1.1" 200 228
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 400 32
pyes.urllib3.connectionpool: DEBUG: "GET /test-mindex/_search HTTP/1.1" 200 262
pyes.urllib3.connectionpool: DEBUG: "GET /test-mindex/_search HTTP/1.1" 200 104
--------------------- >> end captured logging << ---------------------

======================================================================
FAIL: test_GeoDistanceFilter (pyes.tests.geoloc.GeoQuerySearchTestCase)

Traceback (most recent call last):
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/geoloc.py", line 59, in test_GeoDistanceFilter
    self.assertEquals(result['hits']['total'], 1)
AssertionError: 0 != 1
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 400 32
pyes.urllib3.connectionpool: DEBUG: "GET /test-mindex/_search HTTP/1.1" 200 263
pyes.urllib3.connectionpool: DEBUG: "GET /test-mindex/_search HTTP/1.1" 200 104
--------------------- >> end captured logging << ---------------------

======================================================================
FAIL: test_GeoPolygonFilter (pyes.tests.geoloc.GeoQuerySearchTestCase)

Traceback (most recent call last):
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/geoloc.py", line 90, in test_GeoPolygonFilter
    self.assertEquals(result['hits']['total'], 1)
AssertionError: 0 != 1
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 400 32
pyes.urllib3.connectionpool: DEBUG: "GET /test-mindex/_search HTTP/1.1" 200 262
pyes.urllib3.connectionpool: DEBUG: "GET /test-mindex/_search HTTP/1.1" 200 104
--------------------- >> end captured logging << ---------------------

======================================================================
FAIL: testMLT (pyes.tests.indexing.IndexingTestCase)

Traceback (most recent call last):
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/indexing.py", line 145, in testMLT
    ], u'total': 2, u'max_score': 0.19178301})
  File "/Users/dash/Projects/fd/the_code/vendor/src/pyes/pyes/tests/pyestest.py", line 24, in assertResultContains
    self.assertEquals(value, result[key])
AssertionError: [{u'_type': u'test-type', u'_index': u'test-index', u'_score': 0.19178301, u'_source': {u'name': u'Joe Tested'}, u'_version': 1, u'_id': u'3'}, {u'_type': u'test-type', u'_index': u'test-index', u'_score': 0.19178301, u'_source': {u'name': u'Joe Tester'}, u'_version': 1, u'_id': u'2'}] != [{u'_score': 0.19178301, u'_type': u'test-type', u'_id': u'3', u'_source': {u'name': u'Joe Tested'}, u'_index': u'test-index'}, {u'_score': 0.19178301, u'_type': u'test-type', u'_id': u'2', u'_source': {u'name': u'Joe Tester'}, u'_index': u'test-index'}]
-------------------- >> begin captured logging << --------------------
pyes: DEBUG: Connecting to 127.0.0.1:9200
pyes: INFO: Starting new HTTP connection (1): 127.0.0.1
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 400 32
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index HTTP/1.1" 400 32
pyes.urllib3.connectionpool: DEBUG: "DELETE /test-index2 HTTP/1.1" 400 33
pyes.urllib3.connectionpool: DEBUG: "DELETE /another-index HTTP/1.1" 400 35
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index HTTP/1.1" 200 31
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index2 HTTP/1.1" 200 31
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/1 HTTP/1.1" 200 63
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/2 HTTP/1.1" 200 63
pyes.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/3 HTTP/1.1" 200 63
pyes.urllib3.connectionpool: DEBUG: "POST /test-index/_refresh HTTP/1.1" 200 60
pyes.urllib3.connectionpool: DEBUG: "GET /_cluster/health HTTP/1.1" 200 228
pyes.urllib3.connectionpool: DEBUG: "GET /test-index/test-type/1/_mlt?min_doc_freq=1&fields=name&min_term_freq=1 HTTP/1.1" 200 330
--------------------- >> end captured logging << ---------------------


Ran 56 tests in 41.128s

FAILED (errors=4, failures=5)

List of doubles doesn't work

When a query returns a doc that has a list of doubles, pyes fails. I suspect it happens with any list of non-strings.

The fix is in the string_to_datetime(self, obj) method of ESJsonDecoder: just add a check to make sure we only try to take the length of strings.

def string_to_datetime(self, obj):
    """Transform a datetime string into a datetime object."""
    if isinstance(obj, basestring) and len(obj) == 19:
        try:
            return datetime(*time.strptime(obj, "%Y-%m-%dT%H:%M:%S")[:6])
        except ValueError:
            pass
    return obj

Incorrect error handling with HTTP

It looks like the convert_errors() code has only been tested with Thrift and fails badly with HTTP. As an example, create_index_if_missing() is completely broken because convert_errors raises:

pyes.exceptions.ElasticSearchException: RemoteTransportException[[Brand, Abigail][inet[/127.0.0.1:9300]][indices/createIndex]]; nested: IndexAlreadyExistsException[[places] Already exists];

whereas the expected exception is IndexAlreadyExistsException.
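One possible shape for the fix, sketched here with illustrative class and helper names (not the actual pyes API): pick the innermost nested exception name out of the HTTP error string and map it to a local exception class.

```python
import re

# Illustrative local exception classes (the real ones live in pyes.exceptions).
class ElasticSearchException(Exception):
    pass

class IndexAlreadyExistsException(ElasticSearchException):
    pass

# Map remote exception names to local classes (illustrative subset).
EXCEPTIONS = {
    "IndexAlreadyExistsException": IndexAlreadyExistsException,
}

def raise_remote_error(message):
    """Scan the ES error string for exception names, innermost first,
    and raise the matching local class, falling back to the generic one."""
    for name in reversed(re.findall(r"(\w+Exception)\[", message)):
        if name in EXCEPTIONS:
            raise EXCEPTIONS[name](message)
    raise ElasticSearchException(message)
```

With the error string from above, this raises IndexAlreadyExistsException instead of the generic ElasticSearchException.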

DotDict instances can't be deep-copied by copy.deepcopy()

Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05) 
Type "copyright", "credits" or "license" for more information.

IPython 0.11 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from pyes.es import DotDict

In [2]: a = DotDict()

In [3]: b = dict()

In [4]: from copy import deepcopy

In [5]: deepcopy(b)
Out[5]: {}

In [6]: deepcopy(a)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/alambert/biff/grok/ in ()
----> 1 deepcopy(a)

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.pyc in deepcopy(x, memo, _nil)
    180                     reductor = getattr(x, "__reduce_ex__", None)
    181                     if reductor:
--> 182                         rv = reductor(2)
    183                     else:
    184                         reductor = getattr(x, "__reduce__", None)

TypeError: 'NoneType' object is not callable

Looking at copy.py from Python 2.7.1 (http://svn.python.org/view/python/tags/r271/Lib/copy.py?view=markup) gives a bit more detail on what's going wrong:

78      copier = getattr(cls, "__copy__", None)
79      if copier:
80          return copier(x)
81  
82      reductor = dispatch_table.get(cls)
83      if reductor:
84          rv = reductor(x)
85      else:
86          reductor = getattr(x, "__reduce_ex__", None)
87          if reductor:
88              rv = reductor(2)
89          else:
90              reductor = getattr(x, "__reduce__", None)
91              if reductor:
92                  rv = reductor()
93              else:
94                  raise Error("un(shallow)copyable object of type %s" % cls)
95  
In [2]: from pyes.es import DotDict

In [3]: a = DotDict()

In [4]: reductor = getattr(a, "__reduce_ex__")

In [5]: reductor(2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/alambert/biff/grok/ in ()
----> 1 reductor(2)

TypeError: 'NoneType' object is not callable

In [6]: reductor == None
Out[6]: False

In [7]: type(reductor)
Out[7]: builtin_function_or_method

In [8]: 
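The likely cause: DotDict's __getattr__ returns None for any missing attribute, including the dunder probes (such as __getstate__) that deepcopy's reduce machinery makes, so a None gets called. A minimal sketch of a fix, assuming DotDict is a plain dict subclass: raise AttributeError for missing names instead of returning None.

```python
import copy

class DotDict(dict):
    """Minimal sketch of a dict with attribute access. The key point: raise
    AttributeError for missing attributes instead of returning None, so the
    dunder probes made during copy.deepcopy() still behave correctly."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value
```

With this variant, deepcopy(DotDict()) succeeds and returns an equal DotDict.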

Bulk index handles docs differently than normal index

Bulk index currently assumes that doc is an object, while normal index also allows it to be a json-encoded string.

Replace:

    self.bulk_data.write(json.dumps(doc, cls=self.encoder))

with:

    if isinstance(doc, dict):
        doc = json.dumps(doc, cls=self.encoder)
    self.bulk_data.write(doc)
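The same normalization as a self-contained sketch (helper name hypothetical): accept either a dict or an already JSON-encoded string and return the string to write into the bulk buffer.

```python
import json

def normalize_doc(doc, encoder=None):
    """Return the JSON string for a bulk line: encode dicts with the
    given encoder class, pass pre-encoded strings through unchanged."""
    if isinstance(doc, dict):
        return json.dumps(doc, cls=encoder)
    return doc
```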

Possible to disable thrift dependency

Thrift is pretty annoying, since on Windows you must have a compiler registered in order to install it. Is there any way to disable the thrift dependency if you don't care about the higher performance?

Thanks

Missing TermsQuery class

There is a "Terms Query" type in ES, but it is not available in pyes. I created one for my own purposes; I'm posting it here in the hope it is useful:

class TermsQuery(TermQuery):
    _internal_name = "terms"

    def __init__(self, *args, **kwargs):
        super(TermsQuery, self).__init__(*args, **kwargs)

    def add(self, field, value, minimum_match=1):
        if not isinstance(value, list):
            raise InvalidParameterQuery("value %r must be a valid list" % value)
        self._values[field] = value
        if minimum_match:
            self._values['minimum_match'] = int(minimum_match)

Thanks for making pyes available.

get mapping and then put it back in new index

When I run the following (first get the index mapping, then put it back into a newly created index):

    mapping = elastic.get_mapping('job', 'test')
    elastic.delete_index_if_exists('test')
    elastic.create_index('test')
    elastic.put_mapping('job', mapping, 'test')

it does not work (error: no method to_json), because "mapping" is of type pyes.es.DotDict, which defines __getattr__. And in put_mapping there is:

    if hasattr(mapping, "to_json"):
        mapping = mapping.to_json()

This workaround works, but it's annoying:

    elastic.put_mapping('job', dict(elastic.get_mapping('job', 'test')), 'test')

Bug?
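Until that's fixed, a small recursive converter (hypothetical helper, not part of pyes) avoids the manual dict(...) cast and also handles nested mappings:

```python
def undot(obj):
    """Recursively convert DotDict-like mappings into plain dicts so that
    hasattr(mapping, "to_json") probes in put_mapping don't misfire."""
    if isinstance(obj, dict):
        return dict((k, undot(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return [undot(v) for v in obj]
    return obj
```

Then elastic.put_mapping('job', undot(mapping), 'test') works on arbitrarily nested mappings.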

start size query. default argument prob

In the Query class, start is set to 0 in the constructor, and then:

    if self.start:
        res['from'] = self.start

Since 0 is falsy, setting start to 0 never adjusts the value; 'from' is simply omitted, so it will always behave as 0.

I found the same problem with add_highlight and number_of_fragments.
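A minimal sketch of one fix (not the real pyes class): use None as the "unset" sentinel instead of 0, so an explicit start (or fragment count) of 0 is still serialized.

```python
class Query(object):
    """Illustrative sketch only: default start/size to None and test
    `is not None` instead of truthiness when serializing."""

    def __init__(self, start=None, size=None):
        self.start = start
        self.size = size

    def serialize(self):
        res = {}
        if self.start is not None:  # instead of `if self.start:`
            res['from'] = self.start
        if self.size is not None:
            res['size'] = self.size
        return res
```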

utils.py requires Django

Not nice for using with Pylons.

Please move the django imports from module level to inside the get_values method (since get_values is Django-specific anyway).
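The requested change can be sketched like this (signature hypothetical): the Django import moves inside the function body, so merely importing the module no longer requires Django to be installed.

```python
def get_values(instance, field_names):
    """Django-specific helper sketch: the import is deferred to call time,
    so Pylons (or any non-Django) code can import this module freely."""
    from django.db import models  # deferred import; only needed here
    return dict((f.name, getattr(instance, f.name))
                for f in instance._meta.fields if f.name in field_names)
```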

Install with buildout fails

setup.py does:

    import pyes as distmeta

which requires urllib3 to be installed before setup.py finishes processing, so the install fails before the requirements are installed.
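A common workaround, sketched here with a hypothetical helper: parse the version string out of the source file instead of importing the package from setup.py.

```python
import re

def read_version(path):
    """Extract __version__ from a source file without importing the package,
    so setup.py needs none of the runtime dependencies (e.g. urllib3)."""
    with open(path) as f:
        match = re.search(r'__version__\s*=\s*[\'"]([^\'"]+)[\'"]', f.read())
    if not match:
        raise RuntimeError("Unable to find __version__ string in %s" % path)
    return match.group(1)
```

setup.py would then call read_version("pyes/__init__.py") instead of importing pyes.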
