elasticsearch-analysis-morphology's Introduction

This project is no longer supported

Elasticsearch 5.6.x has not been maintained since March 2019. This project is no longer supported, and because Bintray has been sunset, the binary builds are no longer available. The best strategy at this point is to migrate to a supported version of Elasticsearch and another plugin, such as Hunspell.

Morphological Analysis Plugin for Elasticsearch

The Morphological Analysis plugin integrates the Russian and English morphology library for Java and the Lucene framework into Elasticsearch. It adds two new analyzers, “russian_morphology” and “english_morphology”, and two token filters with the same names.

The demo.sh file shows a few examples of the analyzers' behavior.
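
As a rough sketch of how the two token filters might be wired into an index, the script below prints an index body that chains them in a custom analyzer. The filter names come from this README; the analyzer name my_analyzer, the type name type1, the field name body, and the helper morphology_index_body are illustrative assumptions, and the mapping assumes the 5.x `text` field type.

```shell
#!/bin/sh
# Hypothetical helper: prints an index body that applies the plugin's two
# token filters after lowercasing. Every name except the two filter names
# is an illustrative assumption.
morphology_index_body() {
  cat <<'EOF'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "russian_morphology", "english_morphology"]
        }
      }
    }
  },
  "mappings": {
    "type1": {
      "properties": {
        "body": { "type": "text", "analyzer": "my_analyzer" }
      }
    }
  }
}
EOF
}

# Print the body. To create an index on a 5.6.x node with the plugin
# installed, it could be piped to curl, for example:
#   morphology_index_body | curl -XPUT 'http://localhost:9200/rustest' \
#     -H 'Content-Type: application/json' -d @-
morphology_index_body
```

The script only prints JSON, so it can be inspected before anything is sent to a cluster.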

Switching to Hunspell

NOTE: This plugin is available only for Elasticsearch v5.6.x and below. For Elasticsearch 6.0 and above, consider switching to the officially supported hunspell token filter with Russian dictionaries.
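
A minimal sketch of what a hunspell-based replacement could look like, assuming Russian dictionary files (.aff and .dic) have already been placed under the node's config/hunspell/ru_RU directory. The filter name ru_hunspell, the analyzer name ru_hunspell_analyzer, and the helper hunspell_index_settings are hypothetical; only the `hunspell` filter type and its `locale` parameter come from Elasticsearch itself.

```shell
#!/bin/sh
# Hypothetical helper: prints index settings that use the built-in hunspell
# token filter. Assumes dictionaries exist in config/hunspell/ru_RU.
hunspell_index_settings() {
  cat <<'EOF'
{
  "settings": {
    "analysis": {
      "filter": {
        "ru_hunspell": { "type": "hunspell", "locale": "ru_RU" }
      },
      "analyzer": {
        "ru_hunspell_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "ru_hunspell"]
        }
      }
    }
  }
}
EOF
}

# Print the settings; send them with curl -XPUT when creating an index.
hunspell_index_settings
```

Results will differ from the morphology plugin because they depend on the dictionary in use, so existing queries should be re-checked after switching.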

Building from the source

To build the project from source, you need git, Maven, and JDK 8. Then follow a few simple steps:

First you need to build the last commit of russianmorphology that supported Lucene 6.x:

$ git clone https://github.com/AKuznetsov/russianmorphology.git
$ cd russianmorphology
$ git checkout 6fc7e109cb23c88cfb44313275df44117b8b97f7
$ mvn install

If the build produces the following output, we can move on:

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------

Now we can check out the 5.6 branch of this project and build it using the russianmorphology library that we built in the previous step.

$ cd ..
$ git clone https://github.com/imotov/elasticsearch-analysis-morphology.git
$ cd elasticsearch-analysis-morphology
$ git checkout 5.6
$ ./gradlew assemble -Drepos.mavenlocal=true

If you see the following message, the build was successful:

BUILD SUCCESSFUL in 26s
11 actionable tasks: 11 executed

The plugin can be found in build/distributions/analysis-morphology-5.6.16.zip
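
Since the hosted binaries are gone, the locally built zip has to be installed from disk. The sketch below builds the file:// URL form that the 5.x elasticsearch-plugin tool accepts for local archives; the helper name plugin_file_url and the example path are assumptions, not part of this project.

```shell
#!/bin/sh
# Hypothetical helper: converts the path of a built plugin zip into a
# file:// URL suitable for "elasticsearch-plugin install".
plugin_file_url() {
  # Resolve the containing directory to an absolute path so the URL is
  # valid regardless of the current working directory.
  dir="$(cd "$(dirname "$1")" && pwd)"
  printf 'file://%s/%s\n' "$dir" "$(basename "$1")"
}

# Example, run from the Elasticsearch home directory (path is illustrative):
#   bin/elasticsearch-plugin install \
#     "$(plugin_file_url /path/to/elasticsearch-analysis-morphology/build/distributions/analysis-morphology-5.6.16.zip)"
```

The node needs a restart after installation before the new analyzers become available.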

elasticsearch-analysis-morphology's People

Contributors

azubanov, fl00r, imotov, zshamrock

elasticsearch-analysis-morphology's Issues

demo.sh not passed

Hi, I installed your plugin and ran demo.sh. It fails the tests "Should return 1,2,3,4" (returns 4,1,2,3) and "Should return 4, 6" (returns 6,4).
This looks strange. What am I doing wrong?
Elasticsearch 0.90.0 on Debian wheezy
Plugin version 1.1.0

2.0.0 release

Hi. Thanks for the plugin.

When will there be a release for 2.0.0?

I get this error now:

ERROR: Plugin [elasticsearch-analysis-morphology] is incompatible with Elasticsearch [2.0.0]. Was designed for version [2.0.0-beta1]

A search for "ев" finds "ели".

The result does not match the query. Why is that?

Mapping:

        'properties': {
            'id': {'type': 'integer'},
            'name': {'type': 'string', 'analyzer': 'russian_morphology'},
        }

Query:

curl -XPOST 'http://localhost:9200/myindex/tags/_search' -d '{"query": 
                           {"match": {"name": "ев"}}}'

Response:

{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "hits": {
        "hits": [
            {
                "_id": "1160",
                "_index": "myindex",
                "_score": 5.7484603,
                "_source": {
                    "id": 1160,
                    "name": "ели"
                },
                "_type": "tag"
            }
        ],
        "max_score": 5.7484603,
        "total": 1
    },
    "timed_out": false,
    "took": 5
}

The plugin is broken

I tried to run demo.sh on my Elasticsearch 1.4.2 and it fails.

If it works for you, please provide the config and I'll test it on my machine.

Please do not use bundled org.elasticsearch.common.collect.ImmutableList

On Fedora your plugin fails to start with the following error (checked on Elasticsearch 1.6 and 1.7.1):

Sep  9 18:52:53 hubbitus elasticsearch: SEVERE: Exception
Sep  9 18:52:53 hubbitus elasticsearch: java.lang.NoClassDefFoundError: org/elasticsearch/common/collect/ImmutableList
Sep  9 18:52:53 hubbitus elasticsearch: at org.elasticsearch.plugin.analysis.morphology.AnalysisMorphologyPlugin.modules(AnalysisMorphologyPlugin.java:45)
Sep  9 18:52:53 hubbitus elasticsearch: at org.elasticsearch.plugins.PluginsService.modules(PluginsService.java:222)
Sep  9 18:52:53 hubbitus elasticsearch: at org.elasticsearch.plugins.PluginsModule.spawnModules(PluginsModule.java:51)
Sep  9 18:52:53 hubbitus elasticsearch: at org.elasticsearch.common.inject.ModulesBuilder.add(ModulesBuilder.java:44)
Sep  9 18:52:53 hubbitus elasticsearch: at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:182)
Sep  9 18:52:53 hubbitus elasticsearch: at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
Sep  9 18:52:53 hubbitus elasticsearch: at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:77)
Sep  9 18:52:53 hubbitus elasticsearch: at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:245)
Sep  9 18:52:53 hubbitus elasticsearch: at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
Sep  9 18:52:53 hubbitus elasticsearch: Caused by: java.lang.ClassNotFoundException: org.elasticsearch.common.collect.ImmutableList
Sep  9 18:52:53 hubbitus elasticsearch: at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
Sep  9 18:52:53 hubbitus elasticsearch: at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
Sep  9 18:52:53 hubbitus elasticsearch: at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
Sep  9 18:52:53 hubbitus elasticsearch: at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Sep  9 18:52:53 hubbitus elasticsearch: ... 9 more

I assume this is because all bundled dependencies are removed from the lib directory during packaging.

Simply replacing the import of org.elasticsearch.common.collect.ImmutableList with com.google.common.collect.ImmutableList and adding a dependency on guava:

        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>19.0-rc1</version>
        </dependency>

fixes the problem (at least for Elasticsearch 1.7.1).

Elasticsearch 2.3.2 support

Can you please release a version for Elasticsearch 2.3.2?

And why does every minor Elasticsearch version require a new version of the plugin? Is backwards compatibility possible at all? Maybe I can somehow install the 2.3.1 version of the plugin on Elasticsearch 2.3.2?

Thank you.
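
The error messages quoted in these issues show Elasticsearch comparing the version a plugin was built for with the running version. A minimal sketch of that comparison, assuming the target version is stored as the `elasticsearch.version` property in the plugin's plugin-descriptor.properties file; the helper names below are hypothetical and this is not the check Elasticsearch itself runs.

```shell
#!/bin/sh
# Hypothetical helper: prints the value of elasticsearch.version from a
# plugin descriptor file passed as the first argument.
descriptor_es_version() {
  sed -n 's/^elasticsearch\.version=//p' "$1" | head -n 1
}

# Hypothetical helper: succeeds only when the descriptor targets exactly
# the Elasticsearch version given as the second argument.
plugin_matches_es() {
  [ "$(descriptor_es_version "$1")" = "$2" ]
}

# Example (paths are illustrative):
#   plugin_matches_es plugins/analysis-morphology/plugin-descriptor.properties 2.3.2 \
#     || echo "plugin was built for a different Elasticsearch version"
```

Running such a check before an upgrade shows in advance whether a matching plugin build is needed.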

Problems with 0.90.1

It looks like the plugin doesn't work with the latest Elasticsearch. I get timeout errors when I try to update an index. Without the plugin it works fine.

Can't install plugin on elasticsearch 5.6.5

Hello!

I am trying to install the plugin on Elasticsearch 5.6.5.
When I try to use russian_morphology, I get this error:

failed to find global analyzer [russian_morphology]

Steps to reproduce

1 . Run elasticsearch in docker

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:5.6.5

2 . Login to docker image

docker exec -it elasticsearch5 bash

3 . Install Plugin

bin/elasticsearch-plugin install http://dl.bintray.com/content/imotov/elasticsearch-plugins/org/elasticsearch/elasticsearch-analysis-morphology/5.6.5/elasticsearch-analysis-morphology-5.6.5.zip

4 . Ensure plugin is installed

bin/elasticsearch-plugin list

analysis-morphology

5 . Try to use analyzer

curl -X POST \
  http://localhost:9200/_analyze \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -H 'postman-token: 1a4b60b6-7d0d-5962-63c3-1bf0005e2012' \
  -d '{
  "analyzer" : "russian_morphology",
  "text" : "Стандартный анализатор разбил строку по пробелам и перевел все в нижний регистр, анализатор russian — убрал не значимые слова, перевел в нижний регистр и оставил основу слов."
}'

I see this response:

{
    "error": {
        "root_cause": [
            {
                "type": "remote_transport_exception",
                "reason": "[oGrvxj7][127.0.0.1:9300][indices:admin/analyze[s]]"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "failed to find global analyzer [russian_morphology]"
    },
    "status": 400
}

I have a similar issue when creating an index as described here:
https://github.com/imotov/elasticsearch-analysis-morphology/blob/master/demo.sh

6 . Tried restarting Elasticsearch. Same result

apt-get update
apt-get install -y procps
service elasticsearch restart

Log content:
/var/log/elasticsearch/elasticsearch.log

elasticsearch.log

PS. I have seen this issue; it seems similar, but I can't figure it out.
As far as I can see, I am using the correct version of the plugin

#25

Memory usage with the plugin installed

Good day, Igor!

I ran into strange behavior of Elasticsearch (1.4.4 and 1.4.0) with the elasticsearch-analysis-morphology plugin installed. I monitor memory usage with the kopf plugin (https://github.com/lmenezes/elasticsearch-kopf) and top (I am on Linux 3.13.0-46-generic #76-Ubuntu x86_64 x86_64 x86_64 GNU/Linux; elasticsearch runs in docker from the image https://github.com/dockerfile/elasticsearch with java-8-oracle). I start it like this:

/elasticsearch/bin/elasticsearch -Des.config=/data/elasticsearch.yml -Xmx2048M -Xms2048M

There are two scenarios. The first:
Elasticsearch is empty, and the only manual settings are no replicas and one shard per index; there are no existing indices, and no templates or mappings are defined. I create a new empty index without types or parameters:

curl -XPUT 'http://localhost:9200/t_1/'

I watch the memory usage: it rises slightly and returns to the previous level after a couple of seconds. I create 50 more indices with different names in the same way; all of them are created, and memory jumps around but returns almost to the starting level.
All indices are created, they contain no records, the logs look clean, everything is fine. I restart elasticsearch,
all indices start without problems, memory usage jumps during startup and then returns, and green status appears very quickly.

I install the elasticsearch-analysis-morphology 1.2.0 plugin and restart elasticsearch. This is where the problem begins.
Startup looks normal:

[2015-03-04 17:07:22,943][INFO ][node                     ] [esboston] version[1.4.4], pid[1], build[c88f77f/2015-02-19T13:05:36Z]
[2015-03-04 17:07:22,943][INFO ][node                     ] [esboston] initializing ...
[2015-03-04 17:07:22,986][INFO ][plugins                  ] [esboston] loaded [analysis-icu, analysis-morphology], sites [kopf, head]
[2015-03-04 17:07:25,357][INFO ][node                     ] [esboston] initialized
[2015-03-04 17:07:25,364][INFO ][node                     ] [esboston] starting ...
[2015-03-04 17:07:25,512][INFO ][transport                ] [esboston] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/172.17.0.63:9300]}
[2015-03-04 17:07:25,548][INFO ][discovery                ] [esboston] esboston/qdiRDKpETUCbmVhDaf5v0w
[2015-03-04 17:07:29,334][INFO ][cluster.service          ] [esboston] new_master [esboston][qdiRDKpETUCbmVhDaf5v0w][39fd3c14cb16][inet[/172.17.0.63:9300]], reason: zen-disco-join (elected_as_master)
[2015-03-04 17:07:29,444][INFO ][http                     ] [esboston] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.17.0.63:9200]}
[2015-03-04 17:07:29,445][INFO ][node                     ] [esboston] started
[2015-03-04 17:07:32,916][INFO ][gateway                  ] [esboston] recovered [306] indices into cluster_state
...
[2015-03-04 17:16:30,734][INFO ][monitor.jvm              ] [esboston] [gc][old][137][106] duration [5.1s], collections [1]/[5.1s], total [5.1s]/[8m], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [266.2mb]->[266.2mb]/[266.2mb]}{[survivor] [33mb]->[32.9mb]/[33.2mb]}{[old] [1.6gb]->[1.6gb]/[1.6gb]}
[2015-03-04 17:16:39,964][INFO ][monitor.jvm              ] [esboston] [gc][old][139][108] duration [5.1s], collections [1]/[5.1s], total [5.1s]/[8.2m], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [266.2mb]->[266.2mb]/[266.2mb]}{[survivor] [33mb]->[32.9mb]/[33.2mb]}{[old] [1.6gb]->[1.6gb]/[1.6gb]}

I watch the logs, top, and kopf. The logs are clean: the plugin list, the startup, everything looks fine. kopf shows the indices starting one after another, and with each newly started index memory usage grows by about 10 percent, from 100 to 150 megabytes. After 15-17 indices have started, memory usage approaches the critical level, top shows the process actively consuming memory, the fan kicks in, and after a couple more indices have started the following appears in the logs:

[2015-03-04 17:08:26,014][INFO ][monitor.jvm              ] [esboston] [gc][old][46][6] duration [5.8s], collections [1]/[5.9s], total [5.8s]/[15.6s], memory [1.9gb]->[1.7gb]/[1.9gb], all_pools {[young] [266.2mb]->[86.9mb]/[266.2mb]}{[survivor] [29.5mb]->[0b]/[33.2mb]}{[old] [1.6gb]->[1.6gb]/[1.6gb]}
[2015-03-04 17:08:37,696][INFO ][monitor.jvm              ] [esboston] [gc][old][48][8] duration [5.4s], collections [1]/[6.2s], total [5.4s]/[25.7s], memory [1.7gb]->[1.7gb]/[1.9gb], all_pools {[young] [100.4mb]->[125.9mb]/[266.2mb]}{[survivor] [0b]->[0b]/[33.2mb]}{[old] [1.6gb]->[1.6gb]/[1.6gb]}
.......
[2015-03-04 17:13:34,913][INFO ][monitor.jvm              ] [esboston] [gc][old][106][66] duration [5.3s], collections [1]/[5.4s], total [5.3s]/[5.1m], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [266.2mb]->[266.2mb]/[266.2mb]}{[survivor] [32.5mb]->[32.8mb]/[33.2mb]}{[old] [1.6gb]->[1.6gb]/[1.6gb]}
[2015-03-04 17:13:44,365][INFO ][monitor.jvm              ] [esboston] [gc][old][108][68] duration [5.4s], collections [1]/[5.4s], total [5.4s]/[5.2m], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [266.2mb]->[266.2mb]/[266.2mb]}{[survivor] [32.4mb]->[32.6mb]/[33.2mb]}{[old] [1.6gb]->[1.6gb]/[1.6gb]}
[2015-03-04 17:13:53,853][INFO ][monitor.jvm              ] [esboston] [gc][old][110][70] duration [5.4s], collections [1]/[5.4s], total [5.4s]/[5.4m], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [266.2mb]->[266.2mb]/[266.2mb]}{[survivor] [32.4mb]->[32.4mb]/[33.2mb]}{[old] [1.6gb]->[1.6gb]/[1.6gb]}

This can go on for a long time; the indices do not start, and kopf reports 4 shards initializing and then nothing. Sometimes everything goes into swap (a java_pid1.hprof file is created and everything crashes).

Second scenario.
I start an empty elasticsearch as in the first scenario, with no indices, install the morphology and kopf plugins, and restart; everything is fine. Memory usage is minimal and the logs are clean.
I start creating indices with the same command as in the first scenario, without settings or mappings, and without specifying that the index uses the elasticsearch-analysis-morphology plugin.
kopf shows that after each new index is created memory usage grows by about 10% on average, the same 100-150 MB. I created 10 indices, memory usage is around 70%, and I wait. After 5-10 minutes the memory has not been released. There is no data in the indices; 10 empty indices occupy 70% of memory.
The logs contain only messages that the indices were created successfully.
I create more indices, 1, 3, 5: kopf shows that memory has run out, and elasticsearch starts responding slowly to all commands (even "curl http://localhost:9200/" hangs for a couple of seconds).
After a while a java_pid1.hprof file is again created on disk (in the elasticsearch directory) and everything dies.

As an experiment I remove the plugin and restart elasticsearch: all indices start quickly, memory usage is again very low, and new indices are created quickly and in any number.

Summary, with 2 gigabytes of memory allocated:

  • with the plugin installed, the number of created indices rarely exceeds 20, restarting elasticsearch takes a long time, and it periodically crashes
  • without the morphology plugin, more than 300 indices are created and start instantly

Some of my conclusions and thoughts:

  • watching the memory usage, I get the impression that the plugin "attaches" to each index separately, taking a portion of memory each time
  • I have not seen this with java7; maybe I did not notice, or maybe the number of indices never reached a critical level

Thanks if you have read this far =). What would you suggest, in which direction should I dig? I need good search, and I like elasticsearch enormously, but I plan to have more than 500 indices, each with its own data set and morphology-aware search, and I would rather not use hunspell.

Plugin's broken under es v0.19.8

Couldn't initialize index under 0.19.8. Returned exception:

{"error":"RemoteTransportException[[Landslide][inet[/ip:9300]][indices/create]]; nested: IndexCreationException[[myindex] failed to create index]; nested: IllegalArgumentException[Custom Analyzer [default] failed to find filter under name [russian_morphology]]; ","status":500}

searching in rustest index

Hi,
I have installed Elasticsearch on Ubuntu 12.04 with your plugin. Then I launched demo.sh and it works fine. Reading the code of demo.sh, I found some documents being indexed:

curl -XPUT 'http://localhost:9200/rustest/type1/5' -d '{"body": "Январский дефицит платежного баланса Японии превысил $5 млрд"}'
curl -XPUT 'http://localhost:9200/rustest/type1/4' -d '{"body": "Японские автомобили вновь заняли в США первые места в рейтингах"}'

I decided to test stemming:

curl -XPOST 'localhost:9200/rustest/_search?pretty' -d '
{
  "query": { "query_string": { "query": "Япония" }, "analyze_wildcard": true }
}'

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

curl -XPOST 'localhost:9200/rustest/_search?pretty' -d '
{
  "query": { "query_string": { "query": "автомобиль" }, "analyze_wildcard": true }
}'

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

What am I doing wrong?

Plugin is incompatible with version [5.5.1]

java.lang.IllegalArgumentException: plugin [analysis-morphology] is incompatible with version [5.5.1]; was designed for version [5.5.0]

Could you please update it?
Thanks.

Version 5.4.2

Greetings!
Is an update for 5.4.2 planned?

Homonymy resolution

The current implementation generates tokens for all possible variants. Is there any way to resolve homonymy, at least with statistical methods?

Failed to find filter russian_morphology

Good day! I ran into a similar problem and have searched all the known forums. Version 2.0.1. I used your demo.sh. The log is attached. I would be grateful if you could tell me what is wrong. Thank you!

log.txt

Elasticsearch 2.2.0 (IndexCreationException)

This error occurs when running demo.sh:

{"error":"IndexCreationException[[rustest] failed to create index]; nested: IllegalArgumentException[Custom Analyzer [my_analyzer] failed to find filter under name [russian_morphology]]; ","status":400}

I installed it like this:
/usr/share/elasticsearch/bin/plugin install http://dl.bintray.com/content/imotov/elasticsearch-plugins/org/elasticsearch/elasticsearch-analysis-morphology/2.2.0/elasticsearch-analysis-morphology-2.2.0.zip

It reports that the plugin is installed:
/usr/share/elasticsearch/bin/plugin list

Installed plugins in /usr/share/elasticsearch/plugins:
- elasticsearch-analysis-morphology

dpkg -l | grep elasticsearch

ii elasticsearch 2.2.0 all Elasticsearch is a distributed RESTful search engine built for the cloud. Reference documentation can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html and the 'Elasticsearch: The Definitive Guide' book can be found at https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html

What am I doing wrong?

Support for 5.6.6

Hello!

Is support for 5.6.6 expected?
apt is already insistently offering the upgrade.

ES 5.2.1 has been released

/usr/share/elasticsearch/bin/elasticsearch-plugin list
analysis-morphology
Exception in thread "main" java.lang.IllegalArgumentException: Plugin [analysis-morphology] is incompatible with Elasticsearch [5.2.1]. Was designed for version [5.2.0]
at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:108)
at org.elasticsearch.plugins.ListPluginsCommand.execute(ListPluginsCommand.java:64)
at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122)
at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:69)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122)
at org.elasticsearch.cli.Command.main(Command.java:88)
at org.elasticsearch.plugins.PluginCli.main(PluginCli.java:47)

Stemming override ability

Is it possible to define stemmer exceptions? For example, we need the word "Мечел" not to be stemmed, because it is a company name. I've tried to follow this guide with no luck.

"analysis": {
            "analyzer": {
                "ru": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "custom_stem", "russian_morphology", "english_morphology", "ru_stopwords"]
                }
            },
            "filter": {
                "ru_stopwords": {
                    "type": "stop",
                    "stopwords": "<...skipped...>"
                },
                "no_stem": {
                    "type": "keyword_marker",
                    "keywords": ["мечел"]
                },
                "custom_stem": {
                    "type": "stemmer_override",
                    "rules": [ 
                        "мечел=>мечел",
                        "мечеть=>мечеть",
                    ]
                }
            }
        }

Neither no_stem nor custom_stem works; I always get the "мечел" => "мечеть" stem result.

curl -XGET 'http://localhost:12001/posts/_analyze?analyzer=ru' -d 'мечел'
{"tokens":[{"token":"мечеть","start_offset":0,"end_offset":5,"type":"<ALPHANUM>","position":1}]}

Update: "stem_exclusion": ["мечел"] inside analyzer=>ru also doesn't work.

Elasticsearch 2.4.1 support

Can you please release a version for Elasticsearch 2.4.1?

And why does every minor Elasticsearch version require a new version of the plugin? Is backwards compatibility possible at all? Maybe I can somehow install the 2.4.0 version of the plugin on Elasticsearch 2.4.1?

Thank you.

after update to 2.1.1 index won't start

Hello, after updating to plugin 2.1.1 I get this error in the logs:
[2015-12-28 12:49:24,488][WARN ][cluster.action.shard ] [index1] [wordstat-11][15] received shard failed for [wordstat-11][15], node[wiIJXUz_Tn6Q7d_Ui8bLYQ], [P], v[3], s[INITIALIZING], a[id=hQsSCrTGQjyUrimlVhvZLQ], unassigned_info[[reason=ALLOCATION_FAILED], at[2015-12-28T09:49:24.229Z], details[failed to create index, failure IndexCreationException[failed to create index]; nested: IllegalArgumentException[Custom Analyzer [morph_analyzer] failed to find filter under name [russian_morphology]]; ]], indexUUID [CgJne_0ERpSQPkt5hgbE2w], message [failed to create index], failure [IndexCreationException[failed to create index]; nested: IllegalArgumentException[Custom Analyzer [morph_analyzer] failed to find filter under name [russian_morphology]]; ]
[wordstat-11] IndexCreationException[failed to create index]; nested: IllegalArgumentException[Custom Analyzer [morph_analyzer] failed to find filter under name [russian_morphology]];

There are a lot of these errors. What is wrong?

URL for 2.X

Hello!
How can we get the URL for the 2.X version of the plugin? When we use
bin/plugin --install imotov/elasticsearch-analysis-morphology
we get error:
Message:
Error while installing plugin, reason: IllegalArgumentException: Plugin installation assumed to be site plugin, but contains source code, aborting installation.

Can't install with latest elasticsearch version

Plugin [elasticsearch-analysis-morphology] is incompatible with Elasticsearch [2.1.1]. Was designed for version [2.1.0]

There is no version for 2.1.1. Is there a way to install the plugin on 2.1.1?

Elastica\Exception\ResponseException' with message 'IndexCreationException[[tovar] failed to create index]; nested: IllegalArgumentException[Custom Analyzer [my_analyzer] failed to find filter under name [russian_morphology]]

Install
-> Installing analysis-morphology...
Trying http://dl.bintray.com/content/imotov/elasticsearch-plugins/elasticsearch-analysis-morphology-1.0.0.zip...
Downloading ..........DONE
Installed analysis-morphology

Setting
Elastica\Exception\ResponseException' with message 'IndexCreationException[[tovar] failed to create index]; nested: IllegalArgumentException[Custom Analyzer [my_analyzer] failed to find filter under name [russian_morphology]]

Stemming of russian suffixes?

I tried to query by the root of a word form, but it looks like it doesn't work.

Here's a sample: https://gist.github.com/7bec046605fcc9ec8aac

I have the word 'паровозик' in the doc; querying by 'паровоз' returns nothing.
I thought morphology also meant that all word forms would be tokenized and that the plugin covers that. Should I dig into stemming, or is it a bug?

Including the snowball stemmer in the default analyzer didn't help either.

v5.6.12

Hello.
Can you update your plugin to this version?
Thanks!

Elasticsearch 2.1.2 Support

Elasticsearch 2.1.2 has been released. This release contains some bug fixes. I'm not sure, but I don't think the plugin needs to change in any way to make a new release, does it?

"Метал" and "Метан"

Good day!
First of all, many thanks for your work; the plugin is excellent.
Now to the problem. We use ES 2.4.0 and the corresponding plugin.

If we search for "метал", we get:

$ curl -s -XPOST http://xxxxxx/_search?pretty=true -d '{"explain": true, "query": {"query_string": { "fields": [ "name" ], "query": "метал"}}}' | grep weight
          "description" : "weight(name:метать in 692727) [PerFieldSimilarity], result of:",
          "description" : "weight(name:метать in 1025690) [PerFieldSimilarity], result of:",
          "description" : "weight(name:метать in 708124) [PerFieldSimilarity], result of:",
          "description" : "weight(name:метать in 130504) [PerFieldSimilarity], result of:",
          "description" : "weight(name:метать in 547909) [PerFieldSimilarity], result of:",
          "description" : "weight(name:метать in 126728) [PerFieldSimilarity], result of:",
          "description" : "weight(name:метать in 130832) [PerFieldSimilarity], result of:",
          "description" : "weight(name:метать in 46643) [PerFieldSimilarity], result of:",
          "description" : "weight(name:метать in 60715) [PerFieldSimilarity], result of:",
          "description" : "weight(name:метать in 100922) [PerFieldSimilarity], result of:",

If we search for "метан", we get:

uuser@el-master:~$ curl -s -XPOST http://xxxxxx/_search?pretty=true -d '{"explain": true, "query": {"query_string": { "fields": [ "name" ], "query": "метан"}}}' | grep weight
            "description" : "weight(name:метан in 547909) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метать in 547909) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метан in 339757) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метать in 339757) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метан in 1382581) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метать in 1382581) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метан in 1494684) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метать in 1494684) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метан in 602655) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метать in 602655) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метан in 703635) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метать in 703635) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метан in 1090767) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метать in 1090767) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метан in 550806) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метать in 550806) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метан in 240733) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метать in 240733) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метан in 1095751) [PerFieldSimilarity], result of:",
            "description" : "weight(name:метать in 1095751) [PerFieldSimilarity], result of:",

And if we search with the correct spelling, i.e. "металл", everything is fine:

uuser@el-master:~$ curl -s -XPOST http://xxxxxx/_search?pretty=true -d '{"explain": true, "query": {"query_string": { "fields": [ "name" ], "query": "металл"}}}' | grep weight
          "description" : "weight(name:металл in 5668) [PerFieldSimilarity], result of:",
          "description" : "weight(name:металл in 26824) [PerFieldSimilarity], result of:",
          "description" : "weight(name:металл in 50970) [PerFieldSimilarity], result of:",
          "description" : "weight(name:металл in 63638) [PerFieldSimilarity], result of:",
          "description" : "weight(name:металл in 72858) [PerFieldSimilarity], result of:",
          "description" : "weight(name:металл in 77831) [PerFieldSimilarity], result of:",
          "description" : "weight(name:металл in 88821) [PerFieldSimilarity], result of:",
          "description" : "weight(name:металл in 98831) [PerFieldSimilarity], result of:",
          "description" : "weight(name:металл in 131402) [PerFieldSimilarity], result of:",
          "description" : "weight(name:металл in 165408) [PerFieldSimilarity], result of:",

The trouble is that the text may contain "метал.конструкции", and I want to find exactly "метал" without it being mixed up with "метан". The reverse also holds: a search for "метан" finds "метал" as well. So are these two search terms morphologically equivalent?

I am not sure this question is for you (it may be a problem in the underlying code, Lucene, etc.), but perhaps you could explain how we can solve this problem?

Install error with 5.6.8 on FreeBSD 10.4

/usr/local/bin/elasticsearch-plugin install http://dl.bintray.com/content/imotov/elasticsearch-plugins/org/elasticsearch/elasticsearch-analysis-morphology/5.6.8/elasticsearch-analysis-morphology-5.6.8.zip

-> Downloading http://dl.bintray.com/content/imotov/elasticsearch-plugins/org/elasticsearch/elasticsearch-analysis-morphology/5.6.8/elasticsearch-analysis-morphology-5.6.8.zip
[=================================================] 100%  

Exception in thread "main" java.lang.IllegalArgumentException: property [elasticsearch.version] is missing for plugin [head]
	at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:135)
	at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:335)
	at org.elasticsearch.plugins.InstallPluginCommand.jarHellCheck(InstallPluginCommand.java:545)
	at org.elasticsearch.plugins.InstallPluginCommand.verify(InstallPluginCommand.java:527)
	at org.elasticsearch.plugins.InstallPluginCommand.install(InstallPluginCommand.java:570)
	at org.elasticsearch.plugins.InstallPluginCommand.execute(InstallPluginCommand.java:218)
	at org.elasticsearch.plugins.InstallPluginCommand.execute(InstallPluginCommand.java:202)
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:70)
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:134)
	at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:69)
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:134)
	at org.elasticsearch.cli.Command.main(Command.java:90)
	at org.elasticsearch.plugins.PluginCli.main(PluginCli.java:47)

running:
elasticsearch-plugin-head-2015.12.16 = up-to-date with index
elasticsearch5-5.6.8_5 = up-to-date with index

Error injecting constructor

We have a cluster with several nodes running Elasticsearch 1.7. For testing purposes, no document is currently being analyzed with this plugin. I can also confirm that we installed the plugin version that matches our Elasticsearch version.
After installing the plugin on the first node and starting it, everything works as intended. However, after installing it on the second node and trying to start it, we get the following error:

"""
org.elasticsearch.common.inject.CreationException: Guice creation errors:

  1. Error injecting constructor, java.lang.NumberFormatException: For input string: "671904704682572"
    at org.elasticsearch.indices.analysis.morphology.MorphologyIndicesAnalysis.<init>(Unknown Source)
    while locating org.elasticsearch.indices.analysis.morphology.MorphologyIndicesAnalysis

1 error
at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:344)
at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:178)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:93)
at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:70)
at org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:59)
at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:210)
at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:77)
at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:245)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
Caused by: java.lang.NumberFormatException: For input string: "671904704682572"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:583)
at java.lang.Integer.valueOf(Integer.java:766)
at org.apache.lucene.morphology.MorphologyImpl.readSeparators(MorphologyImpl.java:196)
at org.apache.lucene.morphology.MorphologyImpl.readFromInputStream(MorphologyImpl.java:145)
at org.apache.lucene.morphology.MorphologyImpl.<init>(MorphologyImpl.java:38)
at org.apache.lucene.morphology.LuceneMorphology.<init>(LuceneMorphology.java:32)
at org.apache.lucene.morphology.russian.RussianLuceneMorphology.<init>(RussianLuceneMorphology.java:25)
at org.apache.lucene.morphology.russian.RussianAnalyzer.<init>(RussianAnalyzer.java:25)
at org.elasticsearch.indices.analysis.morphology.MorphologyIndicesAnalysis.<init>(MorphologyIndicesAnalysis.java:48)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:200)
at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:193)
at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:830)
at org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
... 9 more
"""

Apparently, while initializing, the morphology plugin tries to convert the string "671904704682572" to an int, which fails because the value is too large for that type. Since the analyzer is not being used to analyze any field, that string does not belong to any document. Even after looking at the plugin code, I do not understand where this number comes from.
Any insight into this odd problem would be greatly appreciated.

Best regards
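
The overflow itself can be reproduced outside Elasticsearch. The sketch below is in Python rather than Java and only mimics the 32-bit range check that `Integer.parseInt` applies; the helper name `parse_java_int` is hypothetical, and the sketch says nothing about where the value in the dictionary data comes from.

```python
# Bounds of Java's 32-bit signed int type.
JAVA_INT_MIN = -2**31
JAVA_INT_MAX = 2**31 - 1


def parse_java_int(text: str) -> int:
    """Hypothetical helper that mimics the range check of Java's Integer.parseInt."""
    value = int(text)
    if not JAVA_INT_MIN <= value <= JAVA_INT_MAX:
        # Java raises NumberFormatException in this situation.
        raise ValueError(f'For input string: "{text}"')
    return value


if __name__ == "__main__":
    for candidate in ("2147483647", "671904704682572"):
        try:
            print(candidate, "->", parse_java_int(candidate))
        except ValueError as exc:
            print(candidate, "-> rejected:", exc)
```

The value from the stack trace is far above 2147483647, so any 32-bit parse of it has to fail regardless of which documents are indexed.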

Does not find "рубля", "рублей", but finds "рубль", "рублем", "рублю"

Hello,
I have run into the following problem:
There is a field named search. It stores a string in Russian.
The configuration for the field is:

search:
  type: string
  analyzer: ru_analyzer

analyzer:
  ru_analyzer:
    type: custom
    tokenizer: standard
    filter: [lowercase, word_delimiter, russian_morphology, english_morphology, ru_stopwords]
filter:
  ru_stopwords:
    type: stop
    stopwords: '<omitted>'

I also use ru_analyzer when searching this field.

Requests to Elasticsearch show that the token "рубль" is generated for "рубля" and "рублей", just as it is for "рублем" and "рублю".
When I search, the queries "рубля" and "рублей" find nothing, while the queries "рублем" and "рублю" find the expected documents.
If I add something to "рубля" or "рублей", for example "рубля монета" or "рублей монета", the expected documents are found.
But for "10 рубля" and "10 рублей", for example, no document is found.

So far I have not found any other words that give the same result.

Is this bug related to the plugin, or is it somewhere in Elasticsearch?
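
One way to narrow this down is to compare what the analyzer emits for the indexed text and for each query string, using the JSON returned by the `_analyze` API. The sketch below only post-processes such responses. The helper names are hypothetical, and the sample is hand-written in the shape of an `_analyze` response (reduced to the two fields the helpers read); it is not real plugin output.

```python
from collections import defaultdict


def tokens_by_position(analyze_response: dict) -> dict:
    """Group the tokens of an _analyze response by their position."""
    grouped = defaultdict(list)
    for entry in analyze_response.get("tokens", []):
        grouped[entry["position"]].append(entry["token"])
    return dict(grouped)


def shared_tokens(indexed: dict, query: dict) -> set:
    """Return the tokens that the indexed text and the query text have in common."""
    indexed_terms = {t for terms in tokens_by_position(indexed).values() for t in terms}
    query_terms = {t for terms in tokens_by_position(query).values() for t in terms}
    return indexed_terms & query_terms


# Hand-written sample for the text "10 рублей"; an assumption, not captured output.
sample = {
    "tokens": [
        {"token": "10", "position": 0},
        {"token": "рубль", "position": 1},
    ]
}

if __name__ == "__main__":
    print(tokens_by_position(sample))
```

If a query form shares no token with the indexed text, or produces several tokens at one position, that difference is a more concrete starting point than the search result alone.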

{"error":"IndexCreationException[[rustest] failed to create index]; nested: IllegalArgumentException[Custom Analyzer [my_analyzer] failed to find filter under name [russian_morphology]]; ","status":400}

Hi. I have installed the plugin, but Elasticsearch does not see it.

org.elasticsearch.indices.IndexCreationException: [rustest] failed to create index
    at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:338)
    at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.execute(MetaDataCreateIndexService.java:371)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:374)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:204)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:167)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Custom Analyzer [my_analyzer] failed to find filter under name [russian_morphology]
    at org.elasticsearch.index.analysis.CustomAnalyzerProvider.build(CustomAnalyzerProvider.java:75)
    at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:221)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
    at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
    at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
    at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
    at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
    at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
    at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
    at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
    at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
    at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
    at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
    at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
    at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
    at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
    at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
    at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
    at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
    at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:52)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
    at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
    at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
    at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
    at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:200)
    at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:193)
    at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:830)
    at org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
    at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
    at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
    at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
    at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
    at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:336)
    ... 7 more

Some words are not found when using wildcards

Hello, Igor.

While using the Russian morphology plugin I ran into the problem described in the subject. Here is a more detailed description:
To find fields that were filled in with errors, I use wildcard characters. Among other examples that work fine, I came across the documents "ЛИИЖТ" and "ЛИИЖТ1". I wanted to see both records in one result set and searched for "ЛИИЖТ*", but got only one document, "ЛИИЖТ1". At the same time, if my database contains the records "ИКЕА" and "ИКЕА1", then "ИКЕА*" returns both documents. I then experimented with "ЛИИЖТ" in various ways and found that the problem affects all words or letter combinations that end in "ЖТ" (here are made-up pairs that are not found):

  • two documents (РАЖТ, РАЖТА), search for РАЖТ*: the result is only one document, РАЖТА
  • two documents (ВЫПУЛЬЖТ, ВЫПУЛЬЖТ1), search for ВЫПУЛЬЖТ*: the result is only one document, ВЫПУЛЬЖТ1
  • two documents (ЖТ, ЖТ1), search for ЖТ*: the result is only one document, ЖТ1

If the search mask is ЛИИЖ*, the results are wrong.

If the search pattern has no ЖТ at all before the * (for example, ЛИИ*), everything is found correctly.

If something follows ЖТ in both documents, everything is found:

  • two documents (РАЖТА, РАЖТА1), search for РАЖТА*: the result is two documents, РАЖТА and РАЖТА1
  • two documents (ВЫПУЛЬЖТА, ВЫПУЛЬЖТА1), search for ВЫПУЛЬЖТ* or ВЫПУЛЬЖТА*: the result is two documents, ВЫПУЛЬЖТА and ВЫПУЛЬЖТА1
  • two documents (ЖТА and ЖТА1), search for ЖТА*: the result is two documents (ЖТА and ЖТА1)

I also found that the same problem occurs with these same letters when using regular expressions:

  • two documents (РАЖТ, РАЖТА), search for Наименование:/РАЖТ*/: the result is only one document, РАЖТА

Correct behavior:

  • two documents (ЛИИЖТА, ЛИИЖТА1), search for Наименование:/ЛИИЖТА*/: the result is two documents, ЛИИЖТА and ЛИИЖТА1

Mapping for the field (I use Russian letters for index and type names and have not run into any problems with that):

"контрагенты": {
    "ленстрой" : {
        "_all" : {"analyzer" : "russian_morphology", "store" : "true", "index_options": "offsets"},
        "dynamic" : "false", /* Do not accept new properties automatically */
        "properties" : {
                "Наименование"  : {"type" : "string", "index" :     "analyzed", "analyzer" : "russian_morphology", "include_in_all": "true", "store" : "true", "index_options": "offsets", "fields" : { "raw" : {"type" : "string", "index" : "not_analyzed", "include_in_all": "false"}} }, // 
        }
    },
analyzer: {
    "analyzer" : {
        "default" : {
            "tokenizer" : "standard",
            "filter" : [
                "lowercase",
                'russian_morphology', 'english_morphology'
            ]
        },
    },
    "filter" : {}
}

Example query:

POST http://serv-japp2.stpr.ru:9200/_search?pretty HTTP/1.1
Host: serv-japp2.stpr.ru:9200

{
    "from" : 0, "size" : 10,
    "query" : {
        "indices" : {
            "indices" : ["контрагенты"],
            "query" : {
                "query_string" : {
                    "fields" : ["Наименование"],
                    "query" : "ЛИИЖТ*",
                    "default_operator" : "and"
                }
            },
            "no_match_query" : "none"
        }
    }
}

Another example query that returns the same results:

POST http://serv-japp2.stpr.ru:9200/_search?pretty HTTP/1.1
Host: serv-japp2.stpr.ru:9200

{
    "from" : 0, "size" : 10,
    "query" : {
                "query_string" : {
                    "fields" : ["Наименование"],
                    "query" : "ЛИИЖТ*",
                    "default_operator" : "and"
                }
            }
}

Component versions:

  • ElasticSearch - 2.1
  • russian_morphology - 2.1
  • Linux version: Oracle Linux 6
  • JDK - 1.8.0_66

Thank you.
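
The mapping above also defines a `not_analyzed` sub-field named `raw`. One thing that could be tried, as an untested assumption and not a fix confirmed in this thread, is to run the wildcard against that sub-field, where values are stored verbatim and the morphology filter is not involved. The sketch below builds such a request body with the term-level `wildcard` query; the helper name `raw_wildcard_query` is hypothetical.

```python
import json


def raw_wildcard_query(field: str, pattern: str, size: int = 10) -> dict:
    """Build a search body with a term-level wildcard query on the `.raw` sub-field."""
    return {
        "from": 0,
        "size": size,
        # The pattern is matched as-is against the stored, unanalyzed value.
        "query": {"wildcard": {f"{field}.raw": pattern}},
    }


if __name__ == "__main__":
    body = raw_wildcard_query("Наименование", "ЛИИЖТ*")
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Matching on a `not_analyzed` field is case-sensitive and applies to the whole field value, so this is a diagnostic aid rather than a replacement for the analyzed search.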

Error ArrayIndexOutOfBoundsException[null] when text contains пм

Create index with settings:

"analysis": {
  "analyzer": {
    "my_analyzer": {
      "type": "custom",
      "tokenizer": "standard",
      "filter": ["lowercase", "russian_morphology", "english_morphology", "my_stopwords"]
    }
  },
  "filter": {
    "my_stopwords": {
      "type": "stop",
      "stopwords": "а,без,более,бы,был,была,были,было,быть,в,вам,вас,весь,во,вот,все,всего,всех,вы,где,да,даже,для,до,его,ее,если,есть,еще,же,за,здесь,и,из,или,им,их,к,как,ко,когда,кто,ли,либо,мне,может,мы,на,надо,наш,не,него,нее,нет,ни,них,но,ну,о,об,однако,он,она,они,оно,от,очень,по,под,при,с,со,так,также,такой,там,те,тем,то,того,тоже,той,только,том,ты,у,уже,хотя,чего,чей,чем,что,чтобы,чье,чья,эта,эти,это,я,a,an,and,are,as,at,be,but,by,for,if,in,into,is,it,no,not,of,on,or,such,that,the,their,then,there,thesethey,this,to,was,will,with"
    }
  }
}

Assign the analyzer 'my_analyzer' to the field 'name'.

When a new document is added and its name contains 'пм', indexing fails with ArrayIndexOutOfBoundsException[null].

ElasticSearch v5.4.2 Install error

When I install the plugin on my instance, I get this error:

root@xxx:/var/www/topvideo# /usr/share/elasticsearch/bin/elasticsearch-plugin install http://dl.bintray.com/content/imotov/elasticsearch-plugins/org/elasticsearch/elasticsearch-analysis-morphology/5.4.2/elasticsearch-analysis-morphology-5.4.2.zip
-> Downloading http://dl.bintray.com/content/imotov/elasticsearch-plugins/org/elasticsearch/elasticsearch-analysis-morphology/5.4.2/elasticsearch-analysis-morphology-5.4.2.zip
[=================================================] 100%  
Exception in thread "main" java.lang.IllegalArgumentException: plugin [analysis-morphology] is incompatible with version [5.4.2]; was designed for version [5.4.1]
at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:146)
at org.elasticsearch.plugins.InstallPluginCommand.verify(InstallPluginCommand.java:428)
at org.elasticsearch.plugins.InstallPluginCommand.install(InstallPluginCommand.java:495)
at org.elasticsearch.plugins.InstallPluginCommand.execute(InstallPluginCommand.java:215)
at org.elasticsearch.plugins.InstallPluginCommand.execute(InstallPluginCommand.java:199)
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:67)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122)
at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:69)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122)
at org.elasticsearch.cli.Command.main(Command.java:88)
at org.elasticsearch.plugins.PluginCli.main(PluginCli.java:47)
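
Both install failures reported on this page come from `PluginInfo.readFromProperties`, which reads each plugin's descriptor properties. The sketch below is a rough Python imitation of the two checks behind those messages, not the actual Elasticsearch code: the function names are hypothetical, the descriptor text is a hand-written sample using the names and versions from the error above, and the real implementation validates more properties than this.

```python
def read_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_plugin(props: dict, node_version: str) -> None:
    """Imitate the version checks; raise ValueError with a similar message."""
    name = props.get("name", "<unknown>")
    designed_for = props.get("elasticsearch.version")
    if designed_for is None:
        raise ValueError(
            f"property [elasticsearch.version] is missing for plugin [{name}]"
        )
    if designed_for != node_version:
        raise ValueError(
            f"plugin [{name}] is incompatible with version [{node_version}]; "
            f"was designed for version [{designed_for}]"
        )


# Hand-written sample descriptor; an assumption, not the shipped file.
SAMPLE = "name=analysis-morphology\nelasticsearch.version=5.4.1\n"

if __name__ == "__main__":
    try:
        check_plugin(read_properties(SAMPLE), "5.4.2")
    except ValueError as exc:
        print(exc)
```

In the FreeBSD report the failing descriptor belongs to the separately installed `head` plugin, so the first check fires before the morphology plugin is even considered.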
