elasticsearch-analysis-string2int's Introduction

String2Integer Analysis for Elasticsearch

--------------------------------------------------
| String2Integer Analysis Plugin | Elasticsearch |
--------------------------------------------------
| 1.5.4                          | 2.3.4         |
| 1.5.3                          | 2.3.3         |
| 1.4.1                          | 2.2.1         |
| 1.3.0                          | 1.0.0         |
| 1.2.0                          | 0.90.x        |
| 1.0.0                          | 0.19.x        |
--------------------------------------------------

The plugin includes a string2int analyzer, a string2int tokenizer, and a string2int token filter.

This plugin helps save memory and reduce the size of your index. An index often contains entity fields, for example people's names or job titles. These fields are usually set to not_analyzed so that each value is used as a whole, and their values often change slightly. Faceting over such fields needs care, because memory usage becomes a headache, especially when a field contains a lot of terms. Converting these long string entities into numbers reduces memory usage and can make faceting possible where it otherwise would not be.

The plugin uses Redis to store the mapping between each entity and its number. Numbers are assigned in auto-increment style, and a local in-memory cache speeds up indexing.
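The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain dict stands in for Redis, and the class name `String2IntMapper` and its fields are hypothetical, not the plugin's actual implementation or data layout.

```python
class String2IntMapper:
    """Assigns each distinct string an auto-incremented integer.

    `store` stands in for Redis (a plain dict) and `cache` plays the role
    of the local in-memory cache. Both are illustrative assumptions.
    """

    def __init__(self):
        self.store = {}         # stand-in for the Redis mapping: string -> number
        self.counter = 0        # stand-in for an auto-increment counter
        self.cache = {}         # local cache consulted before the store
        self.store_lookups = 0  # how often the store had to be consulted

    def to_int(self, value):
        # Fast path: a cache hit avoids a round trip to the store.
        if value in self.cache:
            return self.cache[value]
        self.store_lookups += 1
        number = self.store.get(value)
        if number is None:
            # First time this string is seen: assign the next number.
            self.counter += 1
            number = self.counter
            self.store[value] = number
        self.cache[value] = number
        return number


mapper = String2IntMapper()
ids = [mapper.to_int(name) for name in ["medcl", "michael jackson", "medcl"]]
print(ids)                   # [1, 2, 1]
print(mapper.store_lookups)  # 2
```

The repeated "medcl" is answered from the local cache, which is the effect the in-memory cache is meant to have during indexing.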

How to use this plugin?

NOTICE:

Since v2.0, Elasticsearch ships with a Java Security Manager, so you need to configure the security policy to allow Jedis access: https://www.elastic.co/guide/en/elasticsearch/reference/2.2/modules-scripting-security.html#_disable_the_java_security_manager

1. Step one: add a custom analysis type in elasticsearch.yml

index:
  analysis:
    analyzer:
      my_string2int:
          type: string2int
          redis_server: "127.0.0.1"
          redis_port: 6379
          redis_key: "index1_type1_name1"

The redis_key works like a catalog for your entities.
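One way to picture the catalog is as a separate string-to-number table per redis_key. The sketch below assumes that each redis_key keeps its own mapping and its own numbering, which the description suggests but does not spell out; the `to_int` helper and the name "alice" are hypothetical.

```python
from collections import defaultdict

# One independent string -> number table per redis_key (catalog).
catalogs = defaultdict(dict)


def to_int(redis_key, value):
    table = catalogs[redis_key]
    # Existing entities keep their number; a new entity gets the next
    # free number within this catalog only.
    return table.setdefault(value, len(table) + 1)


print(to_int("index1_type1_name1", "medcl"))  # 1
print(to_int("index1_type1_name1", "alice"))  # 2
print(to_int("index1_type2_name2", "alice"))  # 1
```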

2. Step two: create an index and a type, and make sure the field's index_analyzer is the string2int analyzer we just defined.
curl -XPOST http://localhost:9200/index/string2int/_mapping -d'
{
    "string2int": {
        "_meta": {
            "author": "medcl"
        },
        "_all": {
            "analyzer": "ik"
        },
        "_source": {
            "enabled": false
        },
        "properties": {
            "author": {
                "type": "string",
                "index_analyzer": "string2int",
                "search_analyzer": "keyword",
                "include_in_all": false,
                "store":true
            }
        }
    }
}'

3. Step three: the field is named "author", so let's index some people

 curl -XPOST http://localhost:9200/index/string2int/1 -d'
 {"author":"medcl"}'

 curl -XPOST http://localhost:9200/index/string2int/2 -d'
 {"author":"michael jackson"}'

 ...

4. Step four: run a facet query

curl -XPOST http://localhost:9200/index/string2int/_search -d'
{
    "query": {
        "query_string": {
            "query": "*"
        }
    },
    "facets": {
        "author": {
            "terms": {
                "field": "author"
            }
        }
    }
}'

Response:

{
    "took": 9,
    "timed_out": false,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 3,
        "max_score": 1.0,
        "hits": [
            {"_index": "index", "_type": "string2int", "_id": "1", "_score": 1.0},
            {"_index": "index", "_type": "string2int", "_id": "2", "_score": 1.0},
            {"_index": "index", "_type": "string2int", "_id": "3", "_score": 1.0}
        ]
    },
    "facets": {
        "author": {
            "_type": "terms",
            "missing": 0,
            "total": 3,
            "other": 0,
            "terms": [
                {"term": "6", "count": 2},
                {"term": "7", "count": 1}
            ]
        }
    }
}

The next step is to translate the numbers back into the original strings. Alternatively, you can change the field's mapping and set store to true to get the original values back directly.
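Translating the numbers back can be done on the client side once the facet response arrives. The following is a minimal sketch: the `id_to_name` table is hypothetical (which name belongs to which number is made up for illustration), and how the reverse mapping is actually read from Redis is not covered here.

```python
# Hypothetical reverse mapping (number -> original string). In practice it
# would have to be loaded from wherever the mapping is stored.
id_to_name = {"6": "medcl", "7": "michael jackson"}

# The "terms" part of the facet response shown above.
facet_terms = [{"term": "6", "count": 2}, {"term": "7", "count": 1}]


def restore_terms(terms, reverse_map):
    # Keep each count and swap the numeric term for its original string.
    # Unknown numbers are left untouched so nothing is silently dropped.
    return [
        {"term": reverse_map.get(t["term"], t["term"]), "count": t["count"]}
        for t in terms
    ]


restored = restore_terms(facet_terms, id_to_name)
print(restored)
```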

The default values are listed below; change them to suit your environment first:

redis
    redis_server: "127.0.0.1"
    redis_port: 6379
    redis_key: "default_key"
    local_mem_cache: "true"
    use_lru_cache: "true"
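As a rough illustration of how defaults and your own settings combine, the sketch below overlays user-supplied values on the defaults listed above. The `resolve_settings` helper is hypothetical, and rejecting unknown keys is a choice made for this sketch, not documented plugin behavior.

```python
# Defaults as listed above.
DEFAULTS = {
    "redis_server": "127.0.0.1",
    "redis_port": 6379,
    "redis_key": "default_key",
    "local_mem_cache": "true",
    "use_lru_cache": "true",
}


def resolve_settings(overrides):
    # Catch typos early: only keys that have a default are accepted here.
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")
    # Values given by the user win over the defaults.
    return {**DEFAULTS, **overrides}


settings = resolve_settings({"redis_key": "index1_type1_name1"})
print(settings["redis_key"])   # index1_type1_name1
print(settings["redis_port"])  # 6379
```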

Another example, this time using the token filter:

index:
  analysis:
    filter:
      my_string2int:
          type: "string2int"
          redis_server: "127.0.0.1"
          redis_port: 6379
          redis_key: "index1_type2_name2"

    analyzer:
      custom_string2int:
          type: custom
          tokenizer: whitespace
          filter: [my_string2int,lowercase]
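To see what the custom_string2int analyzer chain above would roughly do to a value, here is a small simulation in Python. It assumes the token filter maps each token to its own number; the function names are hypothetical and a dict again stands in for Redis.

```python
def whitespace_tokenize(text):
    # Rough equivalent of the whitespace tokenizer: split on whitespace.
    return text.split()


def string2int_filter(tokens, table):
    # Each token gets its own number; unseen tokens get the next free one.
    return [str(table.setdefault(tok, len(table) + 1)) for tok in tokens]


def lowercase_filter(tokens):
    return [tok.lower() for tok in tokens]


table = {}  # stand-in for the Redis mapping under one redis_key
tokens = whitespace_tokenize("Michael Jackson Michael")
tokens = string2int_filter(tokens, table)
tokens = lowercase_filter(tokens)  # no effect on digit-only tokens
print(tokens)  # ['1', '2', '1']
```

Under these assumptions, a lowercase filter placed after the conversion only ever sees digits, so "Michael" and "michael" would receive different numbers. If case-insensitive entities are wanted, putting lowercase before the string2int filter may be worth considering.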
