Greeklish Token Filter for ElasticSearch

The Greeklish plugin generates tokens with Latin characters from Greek tokens.

The generated tokens have the same position and the same offset as the original Greek tokens. A detailed example of how to use this plugin can be found in the wiki.

Versions

Greeklish Plugin   ElasticSearch   Branch
7.5.0              7.5.0           7.5.0
5.4.2.1            5.4.2           5.4.2
5.4.0.1            5.4.0           5.4.0
2.4.4.1            2.4.4           2.4.4
0.11               1.5.0           1.5.0
0.10               0.90.2          -
0.9                0.90.0          -
0.8                0.19.3          -

Installation

Build the plugin

sudo apt-get install maven

cd elasticsearch-analysis-greeklish

mvn package

Install the plugin

sudo bin/elasticsearch-plugin install file:///path/to/plugin.zip

After installation, Elasticsearch must be restarted.

Expansions

More than one combination of Latin characters can substitute each character of the Greek alphabet. A Greek token is therefore expanded into as many Greeklish tokens as there are combinations of the Latin substitutes for its characters, which in some cases produces an enormous number of expansions. To prevent this, a threshold on the maximum number of expansions is set. The default value is 20.

The threshold can be changed with the max_expansions setting in elasticsearch.yml. When the threshold is reached, the remaining characters are substituted with the most common variant of each Greek character.

Example usage:

index:
  analysis:
    filter:
      greeklish_analysis:
        type: greeklish
        max_expansions: 15
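
The capping behaviour described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the plugin's actual code: the VARIANTS table is a small, hypothetical mapping (the first entry of each list is assumed to be the most common variant), and expand is a made-up helper name.

```python
# Hypothetical, partial mapping from Greek letters to Latin variants.
# Assumption: the first variant in each list is the "most common" one.
VARIANTS = {
    "β": ["v", "b"],
    "η": ["i", "h"],
    "υ": ["y", "u", "i"],
    "ω": ["o", "w"],
}


def expand(token, max_expansions=20):
    """Return Greeklish expansions of `token`, capped at `max_expansions`."""
    results = [""]
    capped = False
    for ch in token:
        # Characters without an entry are passed through unchanged.
        options = VARIANTS.get(ch, [ch])
        # Once expanding further would exceed the cap, every remaining
        # character is replaced by its first (most common) variant only.
        if capped or len(results) * len(options) > max_expansions:
            capped = True
            options = options[:1]
        results = [prefix + opt for prefix in results for opt in options]
    return results


if __name__ == "__main__":
    print(len(expand("βυω")))                # 2 * 3 * 2 combinations
    print(expand("βυω", max_expansions=4))   # cap hit at the second letter
```

With the default threshold the three-letter token above yields all 12 combinations; with a threshold of 4 only the first letter is fully expanded and the rest fall back to a single variant each.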

Generation of Greek Word Variations

It is difficult to distinguish a Greeklish word from an English one at query time. To return the same results for the different forms of a Greeklish word, we would therefore have to apply a stemmer to both Greeklish and English words. To avoid that, version 0.7 of the plugin introduced a reverse stemmer for Greek words, which generates the different forms of a Greek word (from singular to plural and vice versa) so that a Greeklish version of each form can be produced.

The Greeklish word converter now has two phases. The first phase produces the different forms of a Greek word based on some grammar rules, and the second phase produces the Greeklish version of each of these Greek words.

This functionality is enabled by default. It can be disabled by setting greek_variants to false in the Elasticsearch configuration file.
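
The two phases can be pictured with the following Python sketch. Everything in it is an assumption made for illustration: the single suffix family, the one-to-one LATIN mapping, and the function names are hypothetical and much simpler than whatever grammar rules the plugin really applies.

```python
# Phase 1 data: one hypothetical family of interchangeable endings
# (singular and plural forms of a masculine noun such as "δρομος").
SUFFIX_FAMILIES = [
    ["ος", "ου", "ο", "οι", "ων", "ους"],
]

# Phase 2 data: a hypothetical one-to-one Greek-to-Latin mapping.
LATIN = {
    "δ": "d", "ρ": "r", "ο": "o", "μ": "m", "ς": "s",
    "σ": "s", "υ": "u", "ι": "i", "ω": "o", "ν": "n",
}


def greek_variants(word):
    """Phase 1: generate the other forms of a Greek word from its ending."""
    for family in SUFFIX_FAMILIES:
        matches = [suffix for suffix in family if word.endswith(suffix)]
        if matches:
            # Strip the longest matching ending, then attach every ending.
            stem = word[: -len(max(matches, key=len))]
            return [stem + suffix for suffix in family]
    return [word]  # no rule applies: keep the word as it is


def to_greeklish(word):
    """Phase 2: transliterate one Greek word, character by character."""
    return "".join(LATIN.get(ch, ch) for ch in word)


def convert(word):
    """Run both phases: Greek forms first, then their Greeklish versions."""
    return [to_greeklish(form) for form in greek_variants(word)]


if __name__ == "__main__":
    print(convert("δρομος"))
```

Because the forms are generated on the Greek side, a singular and a plural input end up with the same set of Greeklish tokens, and no stemmer has to run on Latin-character text.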

Example usage:

index:
  analysis:
    filter:
      greeklish_analysis:
        type: greeklish
        max_expansions: 15
        greek_variants: false

Warning

This filter acts only on lowercase Greek characters, so it should be applied after the Greek lowercase filter.
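
As an illustration of that ordering, the Python snippet below builds an index settings body in which a Greek lowercase filter runs before the Greeklish filter. It is a sketch, not a tested configuration: the names greek_lowercase and greek_with_greeklish are hypothetical, the choice of the standard tokenizer is an assumption, and the lowercase filter with language set to greek is Elasticsearch's built-in lowercase token filter.

```python
import json

# Index settings body. "greek_lowercase" and "greek_with_greeklish" are
# hypothetical names chosen for this example.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Built-in lowercase token filter in its Greek variant.
                "greek_lowercase": {"type": "lowercase", "language": "greek"},
                # The Greeklish filter provided by this plugin.
                "greeklish_analysis": {"type": "greeklish", "max_expansions": 15},
            },
            "analyzer": {
                "greek_with_greeklish": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Order matters: lowercase first, then Greeklish.
                    "filter": ["greek_lowercase", "greeklish_analysis"],
                }
            },
        }
    }
}

# Serialize the settings, e.g. to send as the body of an index creation request.
body = json.dumps(settings, indent=2, ensure_ascii=False)
print(body)
```

Token filters are applied in the order they are listed in the analyzer, so the Greeklish filter only ever sees tokens that have already been lowercased.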

Contributors

astathopoulos, bandito, chief, lovemeblender, m-peter, ssbeefeater
