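An intelligent Chinese pinyin analysis plugin for Elasticsearch that supports mixed search across full pinyin, first letters, and Chinese characters.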

License: Artistic License 2.0

Java 100.00%

elasticsearch-analysis-lc-pinyin's Issues

I tested locally with the DEMO provided by lc and found that when searching for "baidu", the entry "百度" scores lower than "百度糯米".
Here are my query results:
{
  "took": 14,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 2.8384802,
    "hits": [
      {
        "_index": "index",
        "_type": "brand",
        "_id": "8",
        "_score": 2.8384802,
        "_source": {
          "name": "百度糯米"
        },
        "highlight": {
          "name": [
            "百度糯米"
          ]
        }
      },
      {
        "_index": "index",
        "_type": "brand",
        "_id": "1",
        "_score": 0.8271048,
        "_source": {
          "name": "百度"
        },
        "highlight": {
          "name": [
            "百度"
          ]
        }
      }
    ]
  }
}

The question is: why do the query results differ between the ES version used in the DEMO and ES 5.5.2? In the DEMO's results, "百度" ranks first and "百度糯米" second.

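The plugin has problems with words containing polyphonic characters

When a word contains a polyphonic character, are all readings of that character converted into terms? For example: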
GET /_analyze?analyzer=lc_index&text=成长&pretty
{
  "tokens" : [
    { "token" : "成", "start_offset" : 0, "end_offset" : 1, "type" : "word", "position" : 0 },
    { "token" : "cheng", "start_offset" : 0, "end_offset" : 1, "type" : "word", "position" : 0 },
    { "token" : "c", "start_offset" : 0, "end_offset" : 1, "type" : "word", "position" : 0 },
    { "token" : "长", "start_offset" : 1, "end_offset" : 2, "type" : "word", "position" : 1 },
    { "token" : "zhang", "start_offset" : 1, "end_offset" : 2, "type" : "word", "position" : 1 },
    { "token" : "z", "start_offset" : 1, "end_offset" : 2, "type" : "word", "position" : 1 },
    { "token" : "chang", "start_offset" : 1, "end_offset" : 2, "type" : "word", "position" : 1 },
    { "token" : "c", "start_offset" : 1, "end_offset" : 2, "type" : "word", "position" : 1 }
  ]
}
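Clearly, both readings of 长 should not appear here (成长 is read cheng zhang, so the extra chang term is wrong).
Does the plugin not support a dictionary for polyphonic words? Or is there some way to handle this case? Thanks.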
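
One common way to avoid spurious readings is to look up a word-level pronunciation dictionary first and fall back to per-character readings only when no word matches. The sketch below illustrates that idea only. The mini-dictionaries `CHAR_READINGS` and `WORD_READINGS` and the function `pinyin_terms` are hypothetical and are not part of the plugin.

```python
# Hypothetical mini-dictionaries for illustration only.
CHAR_READINGS = {"成": ["cheng"], "长": ["zhang", "chang"], "度": ["du"]}
WORD_READINGS = {"成长": ["cheng", "zhang"], "长度": ["chang", "du"]}
MAX_WORD_LEN = max(len(word) for word in WORD_READINGS)


def pinyin_terms(text):
    """Return (character, readings) pairs, preferring word-level readings."""
    result = []
    i = 0
    while i < len(text):
        matched = False
        # Try the longest dictionary word (two or more characters) at position i.
        for length in range(min(MAX_WORD_LEN, len(text) - i), 1, -1):
            word = text[i:i + length]
            if word in WORD_READINGS:
                for ch, reading in zip(word, WORD_READINGS[word]):
                    result.append((ch, [reading]))
                i += length
                matched = True
                break
        if not matched:
            ch = text[i]
            # No word matched: fall back to every known reading of the character.
            result.append((ch, CHAR_READINGS.get(ch, [])))
            i += 1
    return result


print(pinyin_terms("成长"))
print(pinyin_terms("长度"))
```

With this approach 长 yields only zhang inside 成长 and only chang inside 长度, while a standalone 长 still yields both readings.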

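How do I reference this plugin in a Java project?

How to install lc-pinyin

The README only explains how to use the plugin, not how to install it. Could installation instructions be added?

Pinyin suggestions return Chinese text that is not a prefix of the content but a word inside it?

For example, when I search for "ali", I get both "阿里巴巴" and "你是阿里". Is it possible to return only entries that start with "阿里", rather than everything that contains "阿里"? Thanks.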
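
One possible approach, offered as an untested sketch rather than a confirmed answer, is Elasticsearch's `span_first` query, which only matches spans that end within the first N positions of a field. Span queries are term-level, so the search tokens have to be supplied already segmented. The sketch assumes that lc_index places the pinyin of each character at consecutive positions starting from 0 (as the `_analyze` output elsewhere on this page suggests) and that "ali" corresponds to the tokens "a" and "li". The helper name `prefix_only_query` is hypothetical.

```python
import json


def prefix_only_query(field, tokens):
    """Build a span_first query matching `tokens` only at the start of `field`."""
    clauses = [{"span_term": {field: token}} for token in tokens]
    if len(clauses) == 1:
        match = clauses[0]
    else:
        # Require the tokens to be adjacent and in the given order.
        match = {"span_near": {"clauses": clauses, "slop": 0, "in_order": True}}
    # `end` is the maximum end position, so the match must start at position 0.
    return {"query": {"span_first": {"match": match, "end": len(tokens)}}}


print(json.dumps(prefix_only_query("name", ["a", "li"]), ensure_ascii=False, indent=2))
```

Under these assumptions "阿里巴巴" would match while "你是阿里" would not, because in the latter the tokens "a" and "li" sit at positions 2 and 3.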

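When lc_search tokenizes, can Chinese text also be converted to pinyin? Currently it is only split into single characters

As in the title. Thanks.

Completion suggest with lc_index indexing is extremely CPU-intensive

When using the ES completion suggester and indexing data with lc_index, the ES process hangs and CPU usage goes straight to 100%. What could be the cause?
The field mapping is as follows: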

"suggestText": {
  "type": "completion",
  "analyzer": "lc_index",
  "search_analyzer": "lc_search",
  "payloads": true,
  "preserve_separators": false,
  "preserve_position_increments": true,
  "max_input_length": 50
}
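lc_index works fine when indexing fields whose type is not completion, and other analyzers such as ik_max_word and ik_smart also index completion fields without problems. Only when lc_index is used on a completion field does CPU usage become enormous and indexing extremely slow.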
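
A possible explanation, not confirmed in this thread: a completion field is built from the analyzed token stream, and when several tokens share one position, every combination across positions may have to be considered. The `_analyze` output for lc_index elsewhere on this page shows at least three tokens per character (the character, its full pinyin, and its first letter), so the number of combinations would grow exponentially with input length. The sketch below only counts those combinations. `count_paths` and the per-position counts are illustrative assumptions, not measurements.

```python
from math import prod


def count_paths(tokens_per_position):
    """Number of token paths if every alternative at each position can combine."""
    return prod(tokens_per_position)


# Assume three alternatives per character: character, full pinyin, first letter.
for length in (4, 10, 20):
    print(length, count_paths([3] * length))
```

Four characters already give 81 paths, and 20 characters give 3**20, roughly 3.5 billion. If this is indeed the cause, shortening the indexed input or using an analyzer that emits one token per position for the completion field should reduce the load.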

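Use optimal segmentation in search mode

Search mode currently segments with reverse maximum matching and does not take into account how many single letters are left over after segmentation. This is changed so that matching uses a forward matching algorithm and backtracks to obtain the optimal segmentation.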
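
The idea can be sketched as follows. This is not the plugin's implementation: the syllable table `SYLLABLES` is a tiny hypothetical sample, the function `segment` is illustrative, and memoised recursion stands in for explicit backtracking. Candidates are tried longest first from the left, and the split with the fewest leftover single letters wins (ties go to the split with fewer tokens).

```python
from functools import lru_cache

# Tiny hypothetical syllable table; a real one has several hundred entries.
SYLLABLES = frozenset({"gan", "gang", "gu", "pa", "yun", "dong"})
MAX_LEN = max(len(s) for s in SYLLABLES)


def segment(text):
    """Split `text` into pinyin syllables, minimising leftover single letters."""

    @lru_cache(maxsize=None)
    def best(pos):
        # Returns (single_letter_count, token_count, tokens) for text[pos:].
        if pos == len(text):
            return (0, 0, ())
        candidates = []
        # Forward matching: try the longest piece first, then shorter ones.
        for length in range(min(MAX_LEN, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if length == 1 or piece in SYLLABLES:
                singles, count, rest = best(pos + length)
                leftover = 0 if piece in SYLLABLES else 1
                candidates.append((singles + leftover, count + 1, (piece,) + rest))
        # Every viable split is scored, so a poor early choice is undone here.
        return min(candidates)

    return list(best(0)[2])


print(segment("gangu"))
print(segment("payh"))
```

A purely greedy forward match would split "gangu" into "gang" plus a stray "u", while the scored search returns "gan", "gu". The input "payh" still comes out as "pa", "y", "h", which shows why a separate first-letter mode is useful for abbreviations.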

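No support for ES 6.x?

Why doesn't this support English search?

For example, the dictionary contains the keyword "iphone 6s", but searching for "iphone" returns nothing. Why is that?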

{
    "query": {
        "match": {
          "keyword": {
            "query": "iphone",
            "analyzer": "lc_search",
            "type": "phrase"
          }
        }
    }
}

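How to make it compatible with the new Elasticsearch 6.0

I have Elasticsearch 6.0.0 installed, but the latest version of this plugin targets 5.3. How can I make them compatible?

Problem configuring the tokenizer in settings

Tokenizer
lc_index: parameter mode: full_pinyin, first_letter, chinese_char
lc_search: parameter mode: smart_pinyin, single_letter
The above is what you documented, but how is this mode actually used in practice?

Here are my settings: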

{
  "number_of_shards": 5,
  "number_of_replicas": 1,
  "index": {
    "settings": {
      "analysis": {
        "analyzer": {
          "lc_analyzer": {
            "type": "custom",
            "tokenizer": {
              "lc_index":{
                "mode":"full_pinyin"
              }
            },
            "filter": [
              "lc_full_pinyin"
            ]
          }
        }
      }
    }
  }
}

Here is my mapping:

{
  "news": {
    "properties": {
      "newsId": {
        "type": "long"
      },
      "cityId": {
        "type": "integer"
      },
      "desId": {
        "type": "String"
      },
      "newsTitle": {
        "type": "string",
        "store": true,
        "analyzer": "lc_analyzer",
        "search_analyzer": "lc_search"
      },
      "newsTitlePinYin": {
        "type": "string",
        "store": true,
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "newsTitleJianPin": {
        "type": "string",
        "store": true,
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "newsTitleSource": {
        "type": "string",
        "store": true,
        "index": "not_analyzed"
      },
      "newsTitlePinYinSource": {
        "type": "string",
        "store": true,
        "index": "not_analyzed"
      },
      "newsTitleJianPinSource": {
        "type": "string",
        "store": true,
        "index": "not_analyzed"
      },
      "newsAbstract": {
        "type": "string",
        "store": true,
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "newsEditor": {
        "type": "string",
        "store": true,
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "editorNickName":{
        "type": "string",
        "store": true,
        "index": "not_analyzed"
      },
      "editorAvatar":{
        "type": "string",
        "store": true,
        "index": "not_analyzed"
      },
      "publishTime":{
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      },
      "newsTitleSuggest": {
        "type": "completion",
        "payloads": true,
        "analyzer": "ik_smart",
        "search_analyzer": "ik_smart"
      }
    }
  }
}

And here is the error:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "analyzer [lc_analyzer] not found for field [newsTitle]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "analyzer [lc_analyzer] not found for field [newsTitle]"
  },
  "status": 400
}

=========================== A different setting does not work either =======================

{
  "number_of_shards": 5,
  "number_of_replicas": 1,
  "index": {
    "settings": {
      "analysis": {
        "analyzer": {
          "lc_analyzer": {
            "type": "custom",
            "tokenizer": "lc_index",
            "filter": [
              "lc_full_pinyin"
            ]
          }
        }
      }
    }
  }
}

This produces the same error as above: analyzer [lc_analyzer] not found for field [newsTitle].
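
One thing that may be worth checking, offered as a guess rather than a confirmed fix: in a create-index request body, Elasticsearch looks for `analysis` directly under the top-level `settings` key, and a custom analyzer refers to its tokenizer by name, with parameterised tokenizers declared separately under `analysis.tokenizer`. If the `analysis` block is nested somewhere else, the analyzer may never be registered, which would be consistent with the "not found" error. The sketch below builds such a body. The tokenizer name `my_lc_index` and the helper `build_index_body` are hypothetical, and whether the `lc_index` tokenizer type accepts `mode` in this form is an assumption based on the parameter list quoted in this issue.

```python
import json


def build_index_body(mode="full_pinyin"):
    """Build a create-index body with a named, parameterised tokenizer."""
    return {
        "settings": {
            "number_of_shards": 5,
            "number_of_replicas": 1,
            "analysis": {
                "tokenizer": {
                    # Hypothetical name; "mode" support is an assumption.
                    "my_lc_index": {"type": "lc_index", "mode": mode}
                },
                "analyzer": {
                    "lc_analyzer": {
                        "type": "custom",
                        # Refer to the tokenizer by name, not as a nested object.
                        "tokenizer": "my_lc_index",
                    }
                },
            },
        }
    }


print(json.dumps(build_index_body(), indent=2))
```

The printed JSON would be sent as the body of the index creation request, before the mapping that references `lc_analyzer` is applied.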

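Same configuration as the demo, so why does searching for "yundo" return no results?

Hello, I want to support search by Chinese characters, full pinyin, and abbreviated pinyin. My configuration and indexed data are below. Searching for "yd", "yundong", and "运动" all return data, so why does "yundo" return nothing?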

curl -XPUT http://192.168.0.101:9200/index/ -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik_letter_smart": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": [
            "lc_first_letter"
          ]
        },
        "ik_py_smart": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": [
            "lc_full_pinyin"
          ]
        }
      }
    }
  }
}'

curl -XPOST http://192.168.0.101:9200/index/_mapping/brand -d'
{
  "brand": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "lc_index",
        "search_analyzer": "lc_search",
        "term_vector": "with_positions_offsets"
      }
    }
  }
}'

curl -XPOST http://192.168.0.101:9200/index/brand/1 -d'{"name":"百度"}'
curl -XPOST http://192.168.0.101:9200/index/brand/8 -d'{"name":"百度糯米"}'
curl -XPOST http://192.168.0.101:9200/index/brand/2 -d'{"name":"阿里巴巴"}'
curl -XPOST http://192.168.0.101:9200/index/brand/3 -d'{"name":"腾讯科技"}'
curl -XPOST http://192.168.0.101:9200/index/brand/4 -d'{"name":"网易游戏"}'
curl -XPOST http://192.168.0.101:9200/index/brand/9 -d'{"name":"大众点评"}'
curl -XPOST http://192.168.0.101:9200/index/brand/10 -d'{"name":"携程旅行网"}'
curl -XPOST http://192.168.0.101:9200/index/brand/11 -d'{"name":"运动"}'
curl -XPOST http://192.168.0.101:9200/index/brand/12 -d'{"name":"运动鞋"}'
curl -XPOST http://192.168.0.101:9200/index/brand/13 -d'{"name":"运动鞋 男"}'

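Parameterized support for first-letter search and smart optimal matching in search mode

Search mode currently does not let users choose whether to segment by first letters or by smart optimal matching. Both modes are now selectable through a parameter.

Is 5.1.1 supported?

Ambiguous segmentation with lc_search

When segmenting with lc_search, I want to match by first letters: entering payh should return 平安银行. However, because the input is segmented as pa, y, h, the correct result is not found.

Segmentation and search have problems when an entry contains English

Elasticsearch 5.4.3 is not supported

The error message is below. Also, could you provide a binary package that works right after unzipping, without needing to compile?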

[2017-07-04T09:38:28,839][ERROR][o.e.b.Bootstrap ] Exception
java.lang.IllegalArgumentException: plugin [analysis-lc-pinyin] is incompatible with version [5.4.3]; was designed for version [5.3.0]
at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:146) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Spawner.spawnNativePluginControllers(Spawner.java:86) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:167) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:350) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:67) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) [elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) [elasticsearch-5.4.3.jar:5.4.3]
[2017-07-04T09:38:28,852][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [node-1] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: plugin [analysis-lc-pinyin] is incompatible with version [5.4.3]; was designed for version [5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:127) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:67) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.cli.Command.main(Command.java:88) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) ~[elasticsearch-5.4.3.jar:5.4.3]
Caused by: java.lang.IllegalArgumentException: plugin [analysis-lc-pinyin] is incompatible with version [5.4.3]; was designed for version [5.3.0]
at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:146) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Spawner.spawnNativePluginControllers(Spawner.java:86) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:167) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:350) ~[elasticsearch-5.4.3.jar:5.4.3]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.4.3.jar:5.4.3]
... 6 more
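
The message in the stack trace is raised while Elasticsearch reads the plugin's descriptor and compares the version the plugin was built for with the running version. The sketch below imitates that kind of exact-match check in simplified form. The function names are hypothetical, and the assumption that the descriptor is a properties file with `name` and `elasticsearch.version` keys should be verified against the Elasticsearch version in use.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_plugin_version(descriptor_text, running_version):
    """Raise ValueError unless the descriptor targets exactly `running_version`."""
    props = parse_properties(descriptor_text)
    designed_for = props["elasticsearch.version"]
    if designed_for != running_version:
        raise ValueError(
            "plugin [%s] is incompatible with version [%s]; "
            "was designed for version [%s]"
            % (props["name"], running_version, designed_for)
        )
    return True


descriptor = "name=analysis-lc-pinyin\nelasticsearch.version=5.3.0\n"
try:
    check_plugin_version(descriptor, "5.4.3")
except ValueError as err:
    print(err)
```

If the check really is an exact match, as the error suggests, a plugin built for 5.3.0 has to be rebuilt against the target Elasticsearch version rather than installed as-is.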

gitchennan / elasticsearch-analysis-lc-pinyin