elasticsearch-analysis-lc-pinyin's People

Contributors

gitchennan

elasticsearch-analysis-lc-pinyin's Issues

Problem when searching for "baidu" on version 5.5.2

I tested locally with the demo provided by lc and found that when searching for "baidu", the entry "百度" scores lower than "百度糯米".
Here are my query results:
{
  "took": 14,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 2.8384802,
    "hits": [
      {
        "_index": "index",
        "_type": "brand",
        "_id": "8",
        "_score": 2.8384802,
        "_source": {
          "name": "百度糯米"
        },
        "highlight": {
          "name": [
            "百度糯米"
          ]
        }
      },
      {
        "_index": "index",
        "_type": "brand",
        "_id": "1",
        "_score": 0.8271048,
        "_source": {
          "name": "百度"
        },
        "highlight": {
          "name": [
            "百度"
          ]
        }
      }
    ]
  }
}

My question is: why do the ES version used in the demo and ES 5.5.2 return different results? In the demo, "百度" ranks first and "百度糯米" second.
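The thread does not give a cause. One plausible factor, offered as an assumption rather than a diagnosis: Elasticsearch 5.x scores with BM25 by default, and with 5 shards and only about a dozen documents, each shard computes IDF and average field length from its own documents only. The minimal Python sketch below uses the Lucene-style BM25 formula (default k1 = 1.2, b = 0.75, ignoring Lucene's lossy field-length encoding and boosts) to show that the shorter field wins under shared statistics but can lose under hypothetical shard-local ones.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Lucene-style BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(idf: float, tf: float, field_len: float, avg_field_len: float,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 contribution of one term; fields longer than average are penalized."""
    norm = 1 - b + b * field_len / avg_field_len
    return idf * tf * (k1 + 1) / (tf + k1 * norm)


# Same statistics for both docs: the shorter field (百度, 2 positions) wins.
same_idf = bm25_idf(doc_count=12, doc_freq=2)
print(bm25_term_score(same_idf, 1, 2, 3) > bm25_term_score(same_idf, 1, 4, 3))  # True

# Hypothetical shard-local statistics: 百度 alone on its shard,
# 百度糯米 on a shard holding three documents.
short_doc = bm25_term_score(bm25_idf(doc_count=1, doc_freq=1), 1, 2, 2)
long_doc = bm25_term_score(bm25_idf(doc_count=3, doc_freq=1), 1, 4, 3)
print(long_doc > short_doc)  # True: per-shard statistics reverse the order
```

To test this hypothesis on a small index, adding `search_type=dfs_query_then_fetch` to the search request makes Elasticsearch collect term statistics across all shards before scoring.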

The plugin does not handle words containing polyphonic characters correctly

When a word contains a polyphonic character, does the plugin turn every reading of that character into a term? For example:
GET /_analyze?analyzer=lc_index&text=成长&pretty
{
  "tokens" : [
    { "token" : "成", "start_offset" : 0, "end_offset" : 1, "type" : "word", "position" : 0 },
    { "token" : "cheng", "start_offset" : 0, "end_offset" : 1, "type" : "word", "position" : 0 },
    { "token" : "c", "start_offset" : 0, "end_offset" : 1, "type" : "word", "position" : 0 },
    { "token" : "长", "start_offset" : 1, "end_offset" : 2, "type" : "word", "position" : 1 },
    { "token" : "zhang", "start_offset" : 1, "end_offset" : 2, "type" : "word", "position" : 1 },
    { "token" : "z", "start_offset" : 1, "end_offset" : 2, "type" : "word", "position" : 1 },
    { "token" : "chang", "start_offset" : 1, "end_offset" : 2, "type" : "word", "position" : 1 },
    { "token" : "c", "start_offset" : 1, "end_offset" : 2, "type" : "word", "position" : 1 }
  ]
}
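Clearly, only one reading of 长 should appear here: 成长 is read cheng zhang, so chang should not appear.
Does the plugin not support a dictionary of polyphonic words? Or is there a way to work around this? Thanks.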
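Whether lc-pinyin can load a word-level polyphone dictionary is not stated in the thread. A common technique in pinyin converters, shown here as a hypothetical Python sketch and not as the plugin's code, is to look up whole words first (forward maximum matching) and fall back to every per-character reading only for characters outside any known word. Both dictionaries below are made up for illustration.

```python
from __future__ import annotations

# Hypothetical dictionaries; lc-pinyin's real data files may differ.
WORD_READINGS = {"成长": ["cheng", "zhang"]}
CHAR_READINGS = {"成": ["cheng"], "长": ["zhang", "chang"]}


def readings(text: str) -> list[list[str]]:
    """Return candidate readings for each character of text.

    Known words (longest match first) contribute exactly one reading per
    character; any other character falls back to all of its readings,
    which is what yields extra terms for polyphonic characters.
    """
    max_word = max(map(len, WORD_READINGS), default=1)
    result: list[list[str]] = []
    i = 0
    while i < len(text):
        for size in range(min(max_word, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size > 1 and word in WORD_READINGS:
                result.extend([r] for r in WORD_READINGS[word])
                i += size
                break
        else:
            # No multi-character word starts here: use every known reading.
            result.append(CHAR_READINGS.get(text[i], []))
            i += 1
    return result


print(readings("成长"))  # [['cheng'], ['zhang']]
print(readings("长"))    # [['zhang', 'chang']]
```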

How to install lc-pinyin

The README only explains how to use the plugin, not how to install it. Could you add installation instructions?

completion suggest with lc_index is extremely computation-heavy

When I use the ES completion suggester and index data with lc_index, the ES process hangs and CPU goes straight to 100%. What could be the cause?
The field mapping is:

"suggestText": {
  "type": "completion",
  "analyzer": "lc_index",
  "search_analyzer": "lc_search",
  "payloads": true,
  "preserve_separators": false,
  "preserve_position_increments": true,
  "max_input_length": 50
}
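lc_index works normally on fields whose type is not completion, and other analyzers such as ik_max_word and ik_smart also index completion fields normally. Only when lc_index is used on a completion field does CPU usage spike and indexing become extremely slow.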
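The thread gives no diagnosis. One plausible explanation, assuming the completion field indexes every path through the analyzer's token graph, is combinatorial: lc_index stacks several alternative tokens at each position (the 成长 output in the polyphone issue shows 3 at position 0 and 5 at position 1), so the number of paths is the product of the per-position counts. Even with max_input_length at 50, a 50-character input at 3 tokens per position would have 3^50 (about 7 × 10^23) paths. This Python sketch only counts paths; it does not model Elasticsearch internals.

```python
from __future__ import annotations

import math
from collections import Counter


def count_paths(tokens: list[tuple[str, int]]) -> int:
    """Count distinct token sequences through a graph in which tokens that
    share a position are alternatives (each token spans one position)."""
    per_position = Counter(pos for _, pos in tokens)
    return math.prod(per_position.values())


# The 成长 tokens exactly as returned by _analyze in the polyphone issue.
CHENG_ZHANG = [("成", 0), ("cheng", 0), ("c", 0),
               ("长", 1), ("zhang", 1), ("z", 1), ("chang", 1), ("c", 1)]
print(count_paths(CHENG_ZHANG))  # 15

# Hypothetical 10-character input with 3 alternatives per position.
ten_chars = [(t, pos) for pos in range(10) for t in ("char", "pinyin", "initial")]
print(count_paths(ten_chars))  # 59049
```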

Use optimal segmentation in search mode

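Search-mode segmentation currently uses reverse maximum matching and ignores how many single letters are left over after segmentation. It is being changed to forward matching, with backtracking to find the optimal segmentation.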
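The change described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the plugin's code: SYLLABLES is a tiny hypothetical table (a real pinyin table has a few hundred syllables), any letter that does not start a syllable is kept as a single-letter initial, and "optimal" is taken to mean the fewest leftover single letters, then the fewest pieces.

```python
from __future__ import annotations

from functools import lru_cache

# Hypothetical, deliberately tiny syllable table.
SYLLABLES = frozenset({"bai", "ba", "ai", "a", "du", "yun", "dong",
                       "pa", "ping", "an", "yin", "hang"})


def segment(text: str, syllables: frozenset[str] = SYLLABLES) -> list[str]:
    """Split a pinyin string by forward matching with backtracking.

    At each offset, every matching syllable is tried (longest first), plus a
    lone letter as a fallback; the split with the fewest leftover single
    letters wins, with ties broken by the fewest pieces.
    """
    max_len = max(map(len, syllables))

    @lru_cache(maxsize=None)
    def best(i: int) -> tuple[int, int, tuple[str, ...]]:
        # (leftover single letters, piece count, pieces) for text[i:]
        if i == len(text):
            return (0, 0, ())
        candidates = []
        for j in range(min(len(text), i + max_len), i, -1):
            piece = text[i:j]
            is_syllable = piece in syllables
            if is_syllable or j == i + 1:
                singles, count, rest = best(j)
                candidates.append((singles + int(not is_syllable), count + 1, (piece,) + rest))
        return min(candidates)

    return list(best(0)[2])


print(segment("baidu"))    # ['bai', 'du']
print(segment("yundong"))  # ['yun', 'dong']
```

Memoizing `best` on the offset keeps the backtracking linear in the number of (offset, candidate) pairs instead of exponential in the input length.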

Why doesn't this support English search?

For example, the dictionary contains the keyword "iphone 6s", but searching for "iphone" returns nothing. Why?

{
    "query": {
        "match": {
          "keyword": {
            "query": "iphone",
            "analyzer": "lc_search",
            "type": "phrase"
          }
        }
    }
}

Problem configuring the tokenizer in settings

Tokenizer
lc_index: parameter mode: full_pinyin, first_letter, chinese_char
lc_search: parameter mode: smart_pinyin, single_letter

That is what you documented, but how do I actually use this mode parameter?

This is my setting:

{
  "number_of_shards": 5,
  "number_of_replicas": 1,
  "index": {
    "settings": {
      "analysis": {
        "analyzer": {
          "lc_analyzer": {
            "type": "custom",
            "tokenizer": {
              "lc_index":{
                "mode":"full_pinyin"
              }
            },
            "filter": [
              "lc_full_pinyin"
            ]
          }
        }
      }
    }
  }
}

This is my mapping:

{
  "news": {
    "properties": {
      "newsId": {
        "type": "long"
      },
      "cityId": {
        "type": "integer"
      },
      "desId": {
        "type": "string"
      },
      "newsTitle": {
        "type": "string",
        "store": true,
        "analyzer": "lc_analyzer",
        "search_analyzer": "lc_search"
      },
      "newsTitlePinYin": {
        "type": "string",
        "store": true,
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "newsTitleJianPin": {
        "type": "string",
        "store": true,
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "newsTitleSource": {
        "type": "string",
        "store": true,
        "index": "not_analyzed"
      },
      "newsTitlePinYinSource": {
        "type": "string",
        "store": true,
        "index": "not_analyzed"
      },
      "newsTitleJianPinSource": {
        "type": "string",
        "store": true,
        "index": "not_analyzed"
      },
      "newsAbstract": {
        "type": "string",
        "store": true,
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "newsEditor": {
        "type": "string",
        "store": true,
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "editorNickName":{
        "type": "string",
        "store": true,
        "index": "not_analyzed"
      },
      "editorAvatar":{
        "type": "string",
        "store": true,
        "index": "not_analyzed"
      },
      "publishTime":{
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      },
      "newsTitleSuggest": {
        "type": "completion",
        "payloads": true,
        "analyzer": "ik_smart",
        "search_analyzer": "ik_smart"
      }
    }
  }
}

Here is the error:

{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "analyzer [lc_analyzer] not found for field [newsTitle]"
}
],
"type": "mapper_parsing_exception",
"reason": "analyzer [lc_analyzer] not found for field [newsTitle]"
},
"status": 400
}

=========================== A different setting does not work either =======================

{
  "number_of_shards": 5,
  "number_of_replicas": 1,
  "index": {
    "settings": {
      "analysis": {
        "analyzer": {
          "lc_analyzer": {
            "type": "custom",
            "tokenizer": "lc_index",
            "filter": [
              "lc_full_pinyin"
            ]
          }
        }
      }
    }
  }
}

I get the same error as above: analyzer [lc_analyzer] not found for field [newsTitle].
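The thread contains no answer. Two things in the posted settings look suspicious; treat them as assumptions to verify rather than a confirmed fix. First, the analysis block sits under index.settings instead of directly under settings (or settings.index). Second, the first attempt puts an inline object in tokenizer, whereas an Elasticsearch custom analyzer takes a tokenizer name, and a parameterized tokenizer is declared separately under analysis.tokenizer. The Python sketch below builds a request body in that shape and checks for both problems. The tokenizer name my_lc_tokenizer is hypothetical, "type": "lc_index" assumes the plugin registers its tokenizer under that name, and passing mode as a list is an assumption about the parameter format.

```python
from __future__ import annotations

import json


def lc_index_body(mode: list[str]) -> dict:
    """Request body for PUT /<index> that declares a parameterized tokenizer."""
    return {
        "settings": {
            "number_of_shards": 5,
            "number_of_replicas": 1,
            "analysis": {
                "tokenizer": {
                    # Hypothetical name; "type" and the "mode" format are assumptions.
                    "my_lc_tokenizer": {"type": "lc_index", "mode": mode},
                },
                "analyzer": {
                    "lc_analyzer": {
                        "type": "custom",
                        "tokenizer": "my_lc_tokenizer",  # a name, not an inline object
                        "filter": ["lc_full_pinyin"],    # kept from the original setting
                    },
                },
            },
        }
    }


def settings_problems(body: dict) -> list[str]:
    """Heuristic check for a misplaced analysis block or an inline tokenizer."""
    settings = body.get("settings", {})
    analysis = settings.get("analysis") or settings.get("index", {}).get("analysis")
    if analysis is None:
        return ["no analysis block under settings or settings.index"]
    problems = []
    for name, cfg in analysis.get("analyzer", {}).items():
        if not isinstance(cfg.get("tokenizer"), str):
            problems.append(f"analyzer {name}: tokenizer must be a name")
    return problems


print(json.dumps(lc_index_body(["full_pinyin"]), indent=2))
```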

Same configuration as the demo, so why does searching for "yundo" return no results?

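Hi, I want to support search by Chinese characters, full pinyin, and pinyin initials. Below are my configuration and indexed data. Searching for "yd", "yundong", and "运动" all return results, so why does "yundo" not?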

curl -XPUT http://192.168.0.101:9200/index/ -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik_letter_smart": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": [
            "lc_first_letter"
          ]
        },
        "ik_py_smart": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": [
            "lc_full_pinyin"
          ]
        }
      }
    }
  }
}'

curl -XPOST http://192.168.0.101:9200/index/_mapping/brand -d'
{
  "brand": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "lc_index",
        "search_analyzer": "lc_search",
        "term_vector": "with_positions_offsets"
      }
    }
  }
}'

curl -XPOST http://192.168.0.101:9200/index/brand/1 -d'{"name":"百度"}'
curl -XPOST http://192.168.0.101:9200/index/brand/8 -d'{"name":"百度糯米"}'
curl -XPOST http://192.168.0.101:9200/index/brand/2 -d'{"name":"阿里巴巴"}'
curl -XPOST http://192.168.0.101:9200/index/brand/3 -d'{"name":"腾讯科技"}'
curl -XPOST http://192.168.0.101:9200/index/brand/4 -d'{"name":"网易游戏"}'
curl -XPOST http://192.168.0.101:9200/index/brand/9 -d'{"name":"大众点评"}'
curl -XPOST http://192.168.0.101:9200/index/brand/10 -d'{"name":"携程旅行网"}'
curl -XPOST http://192.168.0.101:9200/index/brand/11 -d'{"name":"运动"}'
curl -XPOST http://192.168.0.101:9200/index/brand/12 -d'{"name":"运动鞋"}'
curl -XPOST http://192.168.0.101:9200/index/brand/13 -d'{"name":"运动鞋 男"}'
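The thread does not show what lc_search produces for "yundo". As a hypothetical illustration only: suppose the search analyzer splits "yundo" into yun, d and o (running GET /_analyze?analyzer=lc_search&text=yundo would show the real tokens), and suppose lc_index emits each character, its full pinyin and its first letter at one position, as in the 成长 output of the polyphone issue. Then under an all-terms or phrase match, o has nothing to match in 运动. The Python sketch models that comparison; READINGS is a made-up one-reading table.

```python
from __future__ import annotations

# Hypothetical reading table; the plugin's real dictionary is not shown in the thread.
READINGS = {"运": ["yun"], "动": ["dong"]}


def index_tokens(text: str) -> list[tuple[str, int]]:
    """Mimic the lc_index pattern from the 成长 example: each character, each of
    its full pinyin readings and that reading's first letter, at one position."""
    tokens = []
    for pos, ch in enumerate(text):
        tokens.append((ch, pos))
        for py in READINGS.get(ch, []):
            tokens.append((py, pos))
            tokens.append((py[0], pos))
    return tokens


def unmatched_terms(query_terms: list[str], text: str) -> list[str]:
    """Query terms that never occur among the indexed tokens of text."""
    indexed = {tok for tok, _ in index_tokens(text)}
    return [t for t in query_terms if t not in indexed]


print(unmatched_terms(["yun", "dong"], "运动"))    # []
print(unmatched_terms(["y", "d"], "运动"))         # []
print(unmatched_terms(["yun", "d", "o"], "运动"))  # ['o']
```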

Ambiguous segmentation with lc_search

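When using lc_search, I want to match by pinyin initials: typing payh should find 平安银行, but because it is segmented into pa, y, h, the correct result is not found.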
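The ambiguity can be made concrete with a small hypothetical Python sketch. With a syllable table that contains both pa and a (each a valid pinyin syllable), "payh" splits either as the syllable pa followed by the initials y and h, or as p, a, y, h, which are the initials of 平安银行. A search-time tokenizer that commits to a single split can only match one of these readings. The table below is made up and lists only the entries this example needs.

```python
from __future__ import annotations

# Hypothetical syllable table; only the entries needed for this example.
SYLLABLES = {"pa", "a", "ping", "an", "yin", "hang"}


def all_segmentations(text: str) -> list[list[str]]:
    """Every way to split text into known syllables or single-letter initials."""
    if not text:
        return [[]]
    splits = []
    for j in range(1, len(text) + 1):
        piece = text[:j]
        if piece in SYLLABLES or j == 1:
            for rest in all_segmentations(text[j:]):
                splits.append([piece] + rest)
    return splits


print(all_segmentations("payh"))  # [['p', 'a', 'y', 'h'], ['pa', 'y', 'h']]
```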

Elasticsearch 5.4.3 is not supported

Below is the error. Also, could you provide a binary package that works directly after unzipping, without compiling?

[2017-07-04T09:38:28,839][ERROR][o.e.b.Bootstrap ] Exception
java.lang.IllegalArgumentException: plugin [analysis-lc-pinyin] is incompatible with version [5.4.3]; was designed for version [5.3.0]
at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:146) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Spawner.spawnNativePluginControllers(Spawner.java:86) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:167) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:350) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:67) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) [elasticsearch-5.4.3.jar:5.4.3]
[2017-07-04T09:38:28,852][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [node-1] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: plugin [analysis-lc-pinyin] is incompatible with version [5.4.3]; was designed for version [5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:127) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:67) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.cli.Command.main(Command.java:88) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) ~[elasticsearch-5.4.3.jar:5.4.3]
Caused by: java.lang.IllegalArgumentException: plugin [analysis-lc-pinyin] is incompatible with version [5.4.3]; was designed for version [5.3.0]
at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:146) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Spawner.spawnNativePluginControllers(Spawner.java:86) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:167) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:350) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.4.3.jar:5.4.3]
... 6 more
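The message comes from Elasticsearch's plugin loader. In the 5.x series, the elasticsearch.version key in a plugin's plugin-descriptor.properties must name exactly the running Elasticsearch version, so a build made for 5.3.0 is refused by 5.4.3 and typically has to be rebuilt against the target version. The Python sketch below only mirrors that rule to make it concrete. It is not Elasticsearch's code, and the descriptor text is a trimmed, hypothetical example.

```python
from __future__ import annotations


def parse_properties(text: str) -> dict[str, str]:
    """Minimal parser for key=value lines (ignores blanks and # comments)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def check_compatible(descriptor: str, running_version: str) -> None:
    """Raise if the plugin was not built for exactly the running version."""
    props = parse_properties(descriptor)
    built_for = props["elasticsearch.version"]
    if built_for != running_version:
        raise ValueError(
            f"plugin [{props['name']}] is incompatible with version "
            f"[{running_version}]; was designed for version [{built_for}]"
        )


DESCRIPTOR = """
# trimmed, hypothetical descriptor
name=analysis-lc-pinyin
elasticsearch.version=5.3.0
"""

check_compatible(DESCRIPTOR, "5.3.0")  # passes silently
try:
    check_compatible(DESCRIPTOR, "5.4.3")
except ValueError as err:
    print(err)
```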