fts-elastic

fts-elastic is a Dovecot full-text search indexing plugin that uses ElasticSearch as a backend.

Dovecot communicates with ElasticSearch using HTTP/JSON queries. It supports automatic indexing and searching of e-mail. For mailboxes with more than 10000 messages, it uses the elastic scroll API.

Requirements

  • Dovecot 2.2+
  • JSON-C
  • ElasticSearch 6.x, 7.x
  • Autoconf 2.53+

Compiling

This plugin needs to compile against the Dovecot source for the version you intend to run it on. A dovecot-devel package is unfortunately insufficient as it does not include the required fts API header files.

You can provide the path to your source tree by passing --with-dovecot= to ./configure.

Install dependencies

# sudo apt install dovecot
sudo apt install gcc make libjson-c-dev dovecot-dev

An example build may look like:

./autogen.sh
./configure --with-dovecot=/usr/lib/dovecot/
make
make install
sudo ln -s /usr/lib/dovecot/lib21_fts_elastic_plugin.so /usr/lib/dovecot/modules/lib21_fts_elastic_plugin.so

Configuration

Create /etc/dovecot/conf.d/90-fts.conf with content:

mail_plugins = $mail_plugins fts fts_elastic

plugin {
  fts = elastic
  fts_elastic = debug url=http://localhost:9200/m/ bulk_size=5000000 refresh=fts rawlog_dir=/var/log/fts-elastic/

  # no: new emails are indexed when the user runs a search
  # yes: every email is indexed when it is delivered
  fts_autoindex = no
  fts_autoindex_exclude = \Junk
  fts_autoindex_exclude2 = \Trash
}

and (re)start dovecot:

dovecot stop; dovecot

Options for the fts_elastic setting:

  • url=<elasticsearch url> Required. ElasticSearch URL including the index name; must end with a slash (/).
  • bulk_size=<positive integer> Maximum size, in bytes, of the bulk requests sent to elastic (default=5000000).
  • refresh={fts,index,never} When to refresh the elastic index so that new emails become searchable:
    • fts: when the dovecot fts plugin requests it (typically before a search)
    • index: after each bulk update, using the ?refresh=true query param (creates inefficient indexes when combined with fts_autoindex=yes)
    • never: leave it to elastic; indexed emails may not be searchable immediately
  • debug Enables HTTP debugging.
  • rawlog_dir=<directory> Directory where HTTP communication with the elasticsearch server is written (useful for debugging the plugin or the elastic schema).
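
The option rules listed above can be sketched as a small validator. This is only an illustration in Python, not the plugin's C parser: the function name parse_fts_elastic is hypothetical, and rejecting unknown options is an assumption, since the plugin's behaviour in that case is not documented here.

```python
VALID_REFRESH = {"fts", "index", "never"}


def parse_fts_elastic(setting: str) -> dict:
    """Parse a space-separated fts_elastic value into a dict of options."""
    opts = {"debug": False, "bulk_size": 5000000}  # documented default
    for token in setting.split():
        key, sep, value = token.partition("=")
        if key == "debug" and not sep:
            opts["debug"] = True
        elif key == "url":
            if not value.endswith("/"):
                raise ValueError("url must end with a slash")
            opts["url"] = value
        elif key == "bulk_size":
            size = int(value)
            if size <= 0:
                raise ValueError("bulk_size must be a positive integer")
            opts["bulk_size"] = size
        elif key == "refresh":
            if value not in VALID_REFRESH:
                raise ValueError("refresh must be fts, index or never")
            opts["refresh"] = value
        elif key == "rawlog_dir":
            opts["rawlog_dir"] = value
        else:
            # Assumption: unknown options are treated as errors.
            raise ValueError(f"unknown option: {token}")
    if "url" not in opts:
        raise ValueError("url is required")
    return opts
```

Running it on the value from the configuration example returns a dict with debug enabled, the URL ending in /m/, and refresh set to fts; a URL without the trailing slash raises ValueError.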

ElasticSearch index

This plugin stores all messages in one elastic index. You can use sharding to support large numbers of users. Because it uses a routing key, updates and searches access only one shard. _id has the form "_id":"uid/mbox-guid/user@domain", for example: "_id":"3/f40efa2f8f44ad54424000006e8130ae/[email protected]"
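
To make the _id layout concrete, here is a minimal Python sketch that builds and splits an identifier of the form uid/mbox-guid/user@domain. The names DocId, format_doc_id and parse_doc_id are hypothetical helpers for illustration and do not come from the plugin.

```python
from typing import NamedTuple


class DocId(NamedTuple):
    uid: int
    box_guid: str
    user: str


def format_doc_id(uid: int, box_guid: str, user: str) -> str:
    """Join the three parts as uid/mbox-guid/user@domain."""
    return f"{uid}/{box_guid}/{user}"


def parse_doc_id(doc_id: str) -> DocId:
    """Split an _id back into its three parts."""
    # Split at most twice so any further "/" stays in the user part.
    uid, box_guid, user = doc_id.split("/", 2)
    return DocId(int(uid), box_guid, user)
```

Splitting from the left with a limit of two keeps the helper usable even if a user name were to contain a slash.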

You can set up the index mapping on Elasticsearch 6.x with the command:

curl -X PUT "http://elasticIP:9200/m?pretty" -H 'Content-Type: application/json' -d "@elastic6-schema.json"

Elasticsearch 7.x uses a different date format parser, so you need to use a different schema:

curl -X PUT "http://elasticIP:9200/m?pretty" -H 'Content-Type: application/json' -d "@elastic7-schema.json"

The box and user fields need to be keyword fields, as you can see in the file elastic-schema.json. Our schema keeps _source enabled because we don't see much storage savings when _source is disabled, and the elastic documentation doesn't recommend disabling it either. This plugin doesn't use _source and explicitly disables it in query responses, but you can use it for better management of and insight into indexed emails, or when you want to use elastic for purposes other than dovecot fts (analysis, spammer detection, ...). _source is also needed if you reindex in elastic.

You can reindex a user's mailbox at any time with doveadm commands:

doveadm fts rescan -u user@domain
doveadm index -u user@domain -q '*'

An example of a pushed document:

{
  "user": "[email protected]",
  "box": "f40efa2f8f44ad54424000006e8130ae",
  "uid": 3,
  "date": "Thu, 08 Jan 2015 00:20:05 +0000",
  "from": "josh <[email protected]>",
  "sender": "Filip Hanes",
  "to": "<[email protected]>",
  "cc": "User <[email protected]>",
  "bcc": "\"Test User\" <[email protected]>",
  "subject": "Test #3",
  "message-id": "<[email protected]>",
  "body": "This is the body of test #3.\n"
}
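
As a rough illustration of how documents like the one above could be packed into bulk requests limited by bulk_size, the following Python sketch follows Elasticsearch's newline-delimited _bulk format. It is not taken from the plugin: the function name bulk_bodies is hypothetical, and index type and routing metadata, which depend on the Elasticsearch version and the plugin's internals, are deliberately left out.

```python
import json
from typing import Iterable, Iterator


def bulk_bodies(docs: Iterable[dict], bulk_size: int = 5000000) -> Iterator[bytes]:
    """Yield NDJSON _bulk bodies, each at most bulk_size bytes where possible."""
    chunk, chunk_len = [], 0
    for doc in docs:
        # _id follows the documented uid/mbox-guid/user@domain form.
        doc_id = f'{doc["uid"]}/{doc["box"]}/{doc["user"]}'
        action = json.dumps({"index": {"_id": doc_id}})
        entry = f"{action}\n{json.dumps(doc)}\n".encode("utf-8")
        # Flush before the chunk would exceed the limit; a single oversized
        # entry is still sent on its own.
        if chunk and chunk_len + len(entry) > bulk_size:
            yield b"".join(chunk)
            chunk, chunk_len = [], 0
        chunk.append(entry)
        chunk_len += len(entry)
    if chunk:
        yield b"".join(chunk)
```

Each yielded body would be the payload of one POST to the index's _bulk endpoint with an NDJSON content type.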

An example search:

curl -X POST "http://elasticIP:9200/m/_search?pretty" -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"user": "[email protected]"}},
        {"term": {"box": "f40efa2f8f44ad54424000006e8130ae"}}
      ],
      "must": [
        {
          "multi_match": {
            "query": "test",
            "operator": "and",
            "fields": ["from","to","cc","bcc","sender","subject","body"]
          }
        }
      ]
    }
  },
  "size": 100
}
'
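
The search above, together with the scroll API used for mailboxes with more than 10000 messages, can be sketched in Python as follows. This is an illustration of the Elasticsearch request flow under stated assumptions, not the plugin's implementation: the post callable is a hypothetical transport that sends a JSON body to a path and returns the decoded response, and the 1m keep-alive and the function names are chosen only for the example.

```python
from typing import Callable, Iterator

# Hypothetical transport: post(path, body) returns the decoded JSON response.
Post = Callable[[str, dict], dict]

SEARCH_FIELDS = ["from", "to", "cc", "bcc", "sender", "subject", "body"]


def build_query(user: str, box: str, text: str) -> dict:
    """Build the bool query used in the example search."""
    return {
        "bool": {
            "filter": [{"term": {"user": user}}, {"term": {"box": box}}],
            "must": [{"multi_match": {"query": text, "operator": "and",
                                      "fields": SEARCH_FIELDS}}],
        }
    }


def scroll_all_hits(post: Post, index: str, query: dict,
                    page_size: int = 10000,
                    keep_alive: str = "1m") -> Iterator[dict]:
    """Yield every hit for query, fetching page_size hits per request."""
    # The first request opens a scroll context and returns the first page.
    resp = post(f"/{index}/_search?scroll={keep_alive}",
                {"query": query, "size": page_size})
    while resp["hits"]["hits"]:
        yield from resp["hits"]["hits"]
        # Follow-up requests pass the scroll id back until a page is empty.
        resp = post("/_search/scroll",
                    {"scroll": keep_alive, "scroll_id": resp["_scroll_id"]})
```

Passing the transport in as a parameter keeps the sketch independent of any HTTP library. A real client would also clear the scroll context once it has finished reading.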

TODO

Thanks

This plugin borrows heavily from dovecot itself, particularly for the automatic detection of dovecot-config (see m4/dovecot.m4). The fts-solr and fts-squat plugins were also used as reference material for understanding the Dovecot FTS API. FTS-lucene was used as a reference for implementing proper rescan.

Contributors

alpianon, atkinsj, bubu, filiphanes, infernix
