bleve's Introduction

A modern text indexing library in Go

Features

  • Index any Go data structure (including JSON)
  • Intelligent defaults backed up by powerful configuration
  • Supported field types:
    • text, number, datetime, boolean, geopoint, geoshape, IP, vector
  • Supported query types:
  • tf-idf Scoring
  • Query time boosting
  • Search result match highlighting with document fragments
  • Aggregations/faceting support:
    • Terms Facet
    • Numeric Range Facet
    • Date Range Facet

Indexing

message := struct{
	Id   string
	From string
	Body string
}{
	Id:   "example",
	From: "[email protected]",
	Body: "bleve indexing is easy",
}

mapping := bleve.NewIndexMapping()
index, err := bleve.New("example.bleve", mapping)
if err != nil {
	panic(err)
}
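// Index returns an error, so check it rather than discarding it.
if err := index.Index(message.Id, message); err != nil {
	panic(err)
}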

Querying

index, _ := bleve.Open("example.bleve")
query := bleve.NewQueryStringQuery("bleve")
searchRequest := bleve.NewSearchRequest(query)
searchResult, _ := index.Search(searchRequest)

Command Line Interface

To install the CLI for the latest release of bleve, run:

$ go install github.com/blevesearch/bleve/v2/cmd/bleve@latest
$ bleve --help
Bleve is a command-line tool to interact with a bleve index.

Usage:
  bleve [command]

Available Commands:
  bulk        bulk loads from newline delimited JSON files
  check       checks the contents of the index
  count       counts the number documents in the index
  create      creates a new index
  dictionary  prints the term dictionary for the specified field in the index
  dump        dumps the contents of the index
  fields      lists the fields in this index
  help        Help about any command
  index       adds the files to the index
  mapping     prints the mapping used for this index
  query       queries the index
  registry    registry lists the bleve components compiled into this executable
  scorch      command-line tool to interact with a scorch index

Flags:
  -h, --help   help for bleve

Use "bleve [command] --help" for more information about a command.

Text Analysis

Bleve includes general-purpose analyzers (customizable) as well as pre-built text analyzers for the following languages:

Arabic (ar), Bulgarian (bg), Catalan (ca), Chinese-Japanese-Korean (cjk), Kurdish (ckb), Danish (da), German (de), Greek (el), English (en), Spanish - Castilian (es), Basque (eu), Persian (fa), Finnish (fi), French (fr), Gaelic (ga), Spanish - Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Armenian (hy), Indonesian (id, in), Italian (it), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Swedish (sv), Turkish (tr)

Text Analysis Wizard

bleveanalysis.couchbase.com

Discussion/Issues

Discuss usage/development of bleve and/or report issues here:

License

Apache License Version 2.0

bleve's People

Contributors

a-little-srdjan, abhinavdangeti, amnonbc, avsej, bcampbell, cascadingradium, deoxxa, dtylman, dtynn, ethantkoenig, gsathya, ikawaha, indraniel, iredmail, metonymic-smokey, moshaad7, mschoch, pavelbazika, pmezard, robmccoll, rvncerr, sacheendra, saljam, shugyousha, slavikm, sreekanth-cb, steveyen, thejas-bhat, tuomassalo, tylerkovacs

bleve's Issues

use protobufs to encode index values

While we can't use them for the index keys, which we craft to get the desired sort order, we should use protobufs to encode the index values. This will make the binary serialization/deserialization less error-prone, more compact, and easier to evolve over time.

create initial wiki pages

initial wiki pages for:

  • building bleve with all the c libraries
  • creating a custom mapping
  • one for each type of field
  • one for each type of analyzer/character filter/tokenizer/token filter
    • these can be stubs initially, but serve as placeholders for adding information over time
  • one for each type of query

change back index entries to just contain list of keys

Currently the back index contains 2 separate lists of more strongly typed data. This should be changed to just a flat list of keys. This will make it easier to introduce new index row types in the future without having to keep updating the way the back index works.

support for facet queries

Initial implementation should just operate at query time.

If we swap the field and term order in the index key we can support faceting at query time. For every document satisfying the original query, we can look up the document in the back index, and find entries for the field that is being faceted. Seems like we don't even have to load that key, just be able to parse the field id and terms. For categorical facets the terms are bucketed and counted. For numerical range facets the parsed terms are bucketed and counted. The top-N facets are then returned with the query results.

index term entry should be able to include hierarchical position data

Currently index term entries are:

't'

Would like to add support for also storing the position of this term in any arrays that were a part of the field path.

Not 100% decided that this must be in the key, but that would be the only way to have some hope of efficiently querying on this information.

The idea is to be able to further qualify queries and say that in addition to other query criteria, matching items must occur in the same parent element.

Consider the following documents in an index.

{
  "name": "a",
  "children": [
    {
      "name": "c",
      "age": 25
    },
    {
      "name": "d",
      "age": 15
    }
  ]
}
{
  "name": "b",
  "children": [
    {
      "name": "c",
      "age": 15
    },
    {
      "name": "d",
      "age": 25
    }
  ]
}

Logically we want to query:
child.name = "c" AND child.age < 20 AND same child

Both documents have a child named "c" and a child whose age is less than 20, but ONLY "b" satisfies both criteria in the same child.

The implementation idea is to include the position in the children array with each entry, and the query criterion "same child" is accomplished by verifying that the matching entries have the same position value.
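A minimal sketch of the "same child" check, assuming each criterion is evaluated independently and yields the set of array positions where it matched. The `Child` type and the `positions` and `MatchesSameChild` helpers are hypothetical names for illustration; this operates on in-memory data, not on bleve index rows.

```go
package main

import "fmt"

// Child mirrors one element of the "children" array in the example documents.
type Child struct {
	Name string
	Age  int
}

// positions returns the set of array positions whose element satisfies pred.
// This stands in for the position data that would be stored with each term entry.
func positions(children []Child, pred func(Child) bool) map[int]bool {
	out := map[int]bool{}
	for i, c := range children {
		if pred(c) {
			out[i] = true
		}
	}
	return out
}

// MatchesSameChild reports whether a single child is both named name and
// younger than maxAge. Each criterion is matched on its own, and the
// "same child" constraint is enforced by requiring a shared position.
func MatchesSameChild(children []Child, name string, maxAge int) bool {
	byName := positions(children, func(c Child) bool { return c.Name == name })
	byAge := positions(children, func(c Child) bool { return c.Age < maxAge })
	for pos := range byName {
		if byAge[pos] {
			return true
		}
	}
	return false
}

func main() {
	a := []Child{{Name: "c", Age: 25}, {Name: "d", Age: 15}}
	b := []Child{{Name: "c", Age: 15}, {Name: "d", Age: 25}}
	fmt.Println("a:", MatchesSameChild(a, "c", 20)) // a: false
	fmt.Println("b:", MatchesSameChild(b, "c", 20)) // b: true
}
```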

ngram filter

options min length, max length
for each input token, compute ngram tokens based on parameters, emit all resulting tokens

create top-level api for indexing

The top-level bleve package should be all one needs to import to achieve the following:

  • create new index
  • open existing index
  • create new default mapping
  • modify default mapping into custom mapping
  • index document/object

cjk_width filter

  • fold fullwidth ASCII variants into the equivalent basic Latin
  • fold halfwidth Katakana variants into the equivalent Kana

See also full ICU analysis
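The first folding rule is simple enough to sketch with the standard library alone, because the fullwidth forms U+FF01 through U+FF5E sit at a fixed offset of 0xFEE0 from ASCII U+0021 through U+007E. The function name `FoldFullwidthASCII` is hypothetical. The halfwidth Katakana rule needs a lookup table (and handling of voiced sound marks) and is deliberately left out of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// FoldFullwidthASCII maps fullwidth ASCII variants (U+FF01..U+FF5E) to
// their basic Latin equivalents (U+0021..U+007E). All other runes,
// including halfwidth Katakana, pass through unchanged.
func FoldFullwidthASCII(s string) string {
	return strings.Map(func(r rune) rune {
		if r >= 0xFF01 && r <= 0xFF5E {
			return r - 0xFEE0
		}
		return r
	}, s)
}

func main() {
	// "\uff42\uff4c\uff45\uff56\uff45" is "bleve" written in fullwidth letters.
	fmt.Println(FoldFullwidthASCII("\uff42\uff4c\uff45\uff56\uff45")) // bleve
}
```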

keyword filter

ability to tag words as keywords
keywords should then be ignored by the stemmer

edge ngram filter

options min length, max length, side (front/back)
for each input token, compute ngram tokens based on parameters, emit all resulting tokens

support prefix search

Two modes:

  1. Return terms which start with this prefix
  2. Return documents which contain a term starting with this prefix
