
Levi

Stream-based full-text search for Node.js and browsers, using LevelDB as the storage backend.

npm install levi

Full-text search using TF-IDF and cosine similarity, plus query-time field boost options. Comes with a configurable text processing pipeline: tokenizer, Porter stemmer and stopword filter.
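For readers new to the technique, here is a minimal sketch of TF-IDF weighting with cosine similarity in plain JavaScript. It is not Levi's implementation. It assumes pre-tokenized documents, raw term counts for tf and log(N / df) for idf (one common variant), and the helper names documentFrequencies, tfidfVector and cosine are hypothetical.

```javascript
// Illustrative TF-IDF + cosine similarity sketch (not Levi's code).
// Assumptions: documents are arrays of already-processed tokens,
// tf = raw term count, idf = log(N / df).

// Count in how many documents each term appears.
function documentFrequencies (docs) {
  const df = new Map()
  for (const tokens of docs) {
    for (const term of new Set(tokens)) df.set(term, (df.get(term) || 0) + 1)
  }
  return df
}

// Turn one token array into a sparse TF-IDF vector (term -> weight).
function tfidfVector (tokens, df, n) {
  const tf = new Map()
  for (const term of tokens) tf.set(term, (tf.get(term) || 0) + 1)
  const vec = new Map()
  for (const [term, count] of tf) {
    const docFreq = df.get(term) || 0
    if (docFreq === 0) continue // term never seen in the corpus: no weight
    vec.set(term, count * Math.log(n / docFreq))
  }
  return vec
}

// Cosine similarity of two sparse vectors: dot(a, b) / (|a| * |b|).
function cosine (a, b) {
  let dot = 0
  for (const [term, w] of a) dot += w * (b.get(term) || 0)
  const norm = v => Math.sqrt([...v.values()].reduce((s, w) => s + w * w, 0))
  const denom = norm(a) * norm(b)
  return denom === 0 ? 0 : dot / denom
}
```

A query vector built with tfidfVector can then be scored against each document vector with cosine; documents sharing no terms with the query score 0, and rarer shared terms contribute more than common ones.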

Levi is built on LevelUP, a fast, asynchronous, transactional storage interface. By default it uses LevelDB on Node.js and IndexedDB in the browser. It also works with a variety of LevelDOWN-compatible backends.

Its stream-based query mechanism, built on Highland, is designed to be memory efficient and extensible: multiple scoring mechanisms can be combined.

API

levi(path, [options])

levi(db, [options])

Create a new Levi instance with a LevelUP database path or instance, or with a SublevelUP section.

var levi = require('levi')

// Levi instance backed by the database at path 'db'
var lv = levi('db')
  .use(levi.tokenizer())
  .use(levi.stemmer())
  .use(levi.stopword())

The text processing plugins levi.tokenizer(), levi.stemmer() and levi.stopword() are required for indexing. They are exposed as ginga plugins, so they can be swapped out for different language configurations.
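To illustrate why pluggable stages make language swapping easy, here is a hypothetical pipeline sketch. createPipeline, tokenizer and stopword below are illustrative stand-ins, not ginga's API or Levi's actual plugins; a stemming stage would slot in the same way and is omitted for brevity.

```javascript
// Hypothetical pipeline: each stage maps an array of tokens to a new array.
// Not ginga's API; it only illustrates composable, swappable stages.
function createPipeline () {
  const stages = []
  return {
    use (stage) {
      stages.push(stage)
      return this // enables chaining: .use(a).use(b)
    },
    run (text) {
      // Start from the raw text as a one-element array and feed it through each stage in order.
      return stages.reduce((tokens, stage) => stage(tokens), [text])
    }
  }
}

// Tokenizer stage: lowercase, then split on runs of non-letter/non-digit characters.
function tokenizer () {
  return tokens => tokens
    .flatMap(t => t.toLowerCase().split(/[^\p{L}\p{N}]+/u))
    .filter(Boolean) // drop empty strings left by leading/trailing separators
}

// Stopword stage: remove tokens found in a (deliberately tiny) word list.
function stopword (words = ['a', 'an', 'is', 'of', 'the']) {
  const set = new Set(words)
  return tokens => tokens.filter(t => !set.has(t))
}

const english = createPipeline().use(tokenizer()).use(stopword())
// Swapping only the stopword list gives a pipeline for another language.
const italian = createPipeline().use(tokenizer()).use(stopword(['il', 'la', 'di']))
```

Because each stage only sees the output of the previous one, replacing a single stage changes the language behaviour without touching the rest of the chain.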

.put(key, value, [options], [callback])

Index the document identified by key. value can be an object or a string. Use an object with fields as value if you want field boost options when searching.

All fields are indexed by default. Set the options.fields object to specify which fields to index.

Accepts an optional callback function; returns a promise if no callback is given.

// string as value
lv.put('a', 'Lorem Ipsum is simply dummy text.', function (err) {
  if (err) return console.error(err)
})

// object fields as value
lv.put('b', {
  id: 'b',
  title: 'Lorem Ipsum',
  body: 'Dummy text of the printing and typesetting industry.'
}, function (err) {
  if (err) return console.error(err)
})

// options.fields
lv.put('c', {
  id: 'c',
  title: 'Hello World',
  body: 'Bla bla bla'
}, {
  fields: { title: true } // index title only
}).then(function () {
  // no callback function given, so put() returns a promise
}).catch(function (err) {
  console.error(err)
})
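The "optional callback, otherwise a promise" convention used throughout this API can be sketched with a small wrapper. wrapOptionalCallback and the in-memory put below are hypothetical and only illustrate the pattern, not Levi's internals.

```javascript
// Hypothetical helper: call `callback(err, result)` if the last argument is a
// function, otherwise return the promise produced by `promiseFn`.
function wrapOptionalCallback (promiseFn) {
  return function (...args) {
    const last = args[args.length - 1]
    if (typeof last !== 'function') return promiseFn(...args)
    const callback = args.pop()
    promiseFn(...args).then(
      result => callback(null, result),
      err => callback(err)
    )
  }
}

// Hypothetical in-memory operation standing in for something like put().
const store = new Map()
const put = wrapOptionalCallback(async (key, value) => {
  if (typeof key !== 'string') throw new TypeError('key must be a string')
  store.set(key, value)
})
```

With this shape, put('a', 1) returns a promise, while put('a', 1, function (err) { ... }) reports through the callback and returns undefined.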

.del(key, [options], [callback])

Delete the document identified by key from the index.

Accepts an optional callback function; returns a promise if no callback is given.

.batch(array, [options], [callback])

Atomic bulk write of put and del operations, similar to the array form of LevelUP's batch().

Accepts an optional callback function; returns a promise if no callback is given.

lv.batch([
  { type: 'put', key: 'a', value: 'Lorem Ipsum is simply dummy text.' },
  { type: 'del', key: 'b' }
], function (err) {
  if (err) return console.error(err)
})

.get(key, [options], [callback])

Fetch a value from the store. Works exactly like LevelUP's get().

Accepts an optional callback function; returns a promise if no callback is given.

.readStream([options])

Obtain a ReadStream of documents, lexicographically sorted by key. Works exactly like LevelUP's readStream().

.searchStream(query, [options])

The main search interface of Levi. Returns a Node-compatible Highland object stream. query can be a string or an object of fields.

Accepts the following options:

  • fields controls field boosts. By default, every field is weighted equally.
  • gt (greater than), gte (greater than or equal) define the lower bound of the key range to be searched.
  • lt (less than), lte (less than or equal) define the upper bound of the key range to be searched.
  • offset number, number of results to skip. Default 0.
  • limit number, maximum number of results to return. Default Infinity.
  • expansions number, maximum number of prefix-match expansions for "search as you type" behaviour. Default 0.
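A rough sketch of how the key-range and paging options could be applied to a set of scored results. inRange and paginate are hypothetical helpers; the sketch assumes plain JavaScript string ordering for keys and that results are ranked by descending score before offset and limit are applied.

```javascript
// Hypothetical key-range check mirroring the gt/gte/lt/lte option semantics.
function inRange (key, { gt, gte, lt, lte } = {}) {
  if (gt !== undefined && !(key > gt)) return false
  if (gte !== undefined && !(key >= gte)) return false
  if (lt !== undefined && !(key < lt)) return false
  if (lte !== undefined && !(key <= lte)) return false
  return true
}

// Rank by descending score, then skip `offset` results and keep at most `limit`.
function paginate (results, { offset = 0, limit = Infinity } = {}) {
  return results
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.score - a.score)
    .slice(offset, offset + limit)
}
```

With bounds gt: '!posts!' and lt: '!posts!~', only keys carrying the '!posts!' prefix pass inRange, since '~' sorts after the usual ASCII letters and digits.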

A "more like this" query can be done by passing a document itself as the query.

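lv.searchStream('lorem ipsum').toArray(function (results) {
  console.log(results) // highland method
})

lv.searchStream('lorem ipsum', {
  fields: { title: 10, '*': 1 } // title field boost. '*' means any field
}).each(function (result) { console.log(result.key, result.score) })

lv.searchStream('lorem ipsum', {
  fields: { title: 1 } // title only
}).each(function (result) { console.log(result.key, result.score) })

// ltgt: only search keys between '!posts!' and '!posts!~'
lv.searchStream('lorem ipsum', {
  gt: '!posts!',
  lt: '!posts!~'
}).each(function (result) { console.log(result.key, result.score) })

// document as query ("more like this")
lv.searchStream({
  title: 'Lorem Ipsum',
  body: 'Dummy text of the printing and typesetting industry.'
}).each(function (result) { console.log(result.key, result.score) })

// maximum 10 expansions. 'ips' may also match 'ipso', 'ipsum' etc.
lv.searchStream('lorem ips', {
  expansions: 10
}).each(function (result) { console.log(result.key, result.score) })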
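The expansions option can be pictured as a prefix lookup over a sorted term dictionary. This sketch assumes an in-memory sorted array of terms and a hypothetical expandPrefix helper; an index stored in LevelDB could do the equivalent with a key-range scan.

```javascript
// Binary search: index of the first element >= target in a sorted array.
function lowerBound (sorted, target) {
  let lo = 0
  let hi = sorted.length
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    if (sorted[mid] < target) lo = mid + 1
    else hi = mid
  }
  return lo
}

// Hypothetical: return up to `expansions` dictionary terms starting with `prefix`.
function expandPrefix (sortedTerms, prefix, expansions) {
  const matches = []
  for (let i = lowerBound(sortedTerms, prefix); i < sortedTerms.length && matches.length < expansions; i++) {
    // In sorted order all prefix matches are contiguous, so stop at the first miss.
    if (!sortedTerms[i].startsWith(prefix)) break
    matches.push(sortedTerms[i])
  }
  return matches
}
```

For the partially typed last word 'ips' and a dictionary containing 'ipso' and 'ipsum', expandPrefix returns both when expansions is at least 2, and nothing when it is 0.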

Each result has the form:

{
  key: 'b',
  score: 0.5972843431749838,
  value: { 
    id: 'b',
    title: 'Lorem Ipsum',
    body: 'Dummy text of the printing and typesetting industry.'
  } 
}
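One plausible way to combine per-field scores under the fields option is a weighted average. boostedScore is hypothetical, and normalising by the total weight used is an assumption rather than Levi's documented behaviour. Fields with neither their own boost nor a '*' entry are ignored, which matches the "title only" example.

```javascript
// Hypothetical: combine per-field similarity scores using boosts such as
// { title: 10, '*': 1 }. The default gives every field equal weight.
function boostedScore (fieldScores, boosts = { '*': 1 }) {
  let weighted = 0
  let totalWeight = 0
  for (const [field, score] of Object.entries(fieldScores)) {
    // A field's own boost wins; otherwise fall back to '*'; otherwise weight 0.
    const boost = Object.prototype.hasOwnProperty.call(boosts, field)
      ? boosts[field]
      : (boosts['*'] || 0)
    weighted += boost * score
    totalWeight += boost
  }
  return totalWeight === 0 ? 0 : weighted / totalWeight
}
```

For per-field scores { title: 0.8, body: 0.2 }, the boosts { title: 10, '*': 1 } pull the combined score towards the title score, while { title: 1 } uses the title score alone.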

.scoreStream(query, [options])

The underlying scoring mechanism of searchStream(). Calculates the relevancy score of each document against query; results are lexicographically sorted by key. Accepts the options fields, gt, gte, lt, lte and expansions.

Useful for combining multiple criteria or scoring mechanisms to build a more advanced search functionality.
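Because scoreStream() emits results sorted by key, two score streams can be combined in a single linear merge pass. This sketch uses arrays in place of streams, and mergeByKey and the weighted-sum combiner in the usage note are hypothetical.

```javascript
// Hypothetical merge of two key-sorted score lists; `combine(scoreA, scoreB)`
// receives 0 for a side where the key is missing.
function mergeByKey (a, b, combine) {
  const out = []
  let i = 0
  let j = 0
  while (i < a.length || j < b.length) {
    if (j >= b.length || (i < a.length && a[i].key < b[j].key)) {
      // Key only present in `a` (or `b` is exhausted).
      out.push({ key: a[i].key, score: combine(a[i].score, 0) })
      i++
    } else if (i >= a.length || b[j].key < a[i].key) {
      // Key only present in `b` (or `a` is exhausted).
      out.push({ key: b[j].key, score: combine(0, b[j].score) })
      j++
    } else {
      // Same key in both inputs: combine both scores.
      out.push({ key: a[i].key, score: combine(a[i].score, b[j].score) })
      i++
      j++
    }
  }
  return out
}
```

For example, mergeByKey(textScores, recencyScores, (x, y) => 0.7 * x + 0.3 * y) would blend a relevancy score with a second, hypothetical recency score while keeping the output sorted by key.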

.pipeline(obj, [callback])

The underlying text processing pipeline for both indexing and querying. Extracts text tokens from a serializable object obj.

Accepts an optional callback function; returns a promise if no callback is given.

lv.pipeline({
  a: 'foo bar is a placeholder name',
  b: ['foo', 'bar'],
  c: 167,
  d: null,
  e: { ghjk: ['printing'] }
}, function (err, tokens) {
  if (err) return console.error(err)
  console.log(tokens)
  // [ 'foo', 'bar', 'placehold', 'name', 'foo', 'bar', 'print' ]
})
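The first step of such a pipeline is collecting text from an arbitrary serializable object. This collectText sketch is hypothetical: it gathers every string found in nested arrays and objects and skips numbers, booleans and null, an assumption chosen to match the example output above. Tokenizing, stemming and stopword removal would follow.

```javascript
// Hypothetical: recursively collect all strings from a serializable value.
function collectText (value, out = []) {
  if (typeof value === 'string') {
    out.push(value)
  } else if (Array.isArray(value)) {
    for (const item of value) collectText(item, out)
  } else if (value !== null && typeof value === 'object') {
    for (const key of Object.keys(value)) collectText(value[key], out)
  }
  // Numbers, booleans, null and undefined contribute no text.
  return out
}
```

Applied to the object in the example, this yields the strings 'foo bar is a placeholder name', 'foo', 'bar' and 'printing', in key order.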

levi.destroy(path, [callback])

Completely remove an existing database at path. On Node.js this deletes the database directory; in the browser it deletes the IndexedDB database.

If you are using a custom Level backend, invoke its corresponding destroy() function to remove the database properly.

Accepts an optional callback function; returns a promise if no callback is given.

License

MIT

Contributors

cshum
