Locality Sensitive Hashing

Scalable MinHash computation

Getting Started

Prerequisites

  • Node.js 8 or higher

Installation

npm i @5app/lsh

Usage

  1. Customise the base class for your dataset
const Lsh = require('@5app/lsh')
const B = 10 // default number of bands
const R = 5 // default band height

class MyDataLsh extends Lsh {
  constructor (bands = B, height = R) { // set default permutation params
    super(bands, height)
  }

  async getColumnIdSlice ({ cursorId, size, ...custom }) {
    // return {size} column ids, starting from cursorId
  }

  async getRowIdSlice ({ cursorId, size, ...custom }) {
    // return {size} row ids, starting from cursorId
  }

  async getRowCount ({ ...custom }) {
    // return the total number of rows
  }

  async getShingles ({ columnIds, rowIds, ...custom }) {
    // return the shingles for the specified columns and rows
  }
  
  async store ({ index, buckets, data, ...custom }) {
    // store a batch of minhashes and bucket info
    // use data object to store in memory
  }
  
  async finalise ({ blocks, columns, rows, stamp, data }) {
    // ... finalise the LSH storage
    // return report object
  }
  
  static get limit () {
    // return permutation limit
  }
  
  static signature (value, index) {
    // return stringified value
  }

  static ignore (bucketId) {
    // return whether this bucket is null
  }

  static format (bucketId, index) {
    // return the formatted bucketId to append to the minhash
  }
}

module.exports = MyDataLsh
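
The `bands` and `height` parameters above suggest the usual LSH banding scheme. The sketch below is not this package's implementation. It is a minimal, standalone illustration that assumes a signature of `bands * height` values, split into `bands` chunks that each yield one bucket id. The helpers `seededHash`, `minHash`, `toBuckets` and `candidateProbability` are hypothetical names introduced for this example only.

```javascript
const crypto = require('crypto')

// Hypothetical helper: hash a shingle with a per-permutation seed to a 32-bit integer.
function seededHash (shingle, seed) {
  const digest = crypto.createHash('md5').update(`${seed}:${shingle}`).digest()
  return digest.readUInt32BE(0)
}

// MinHash signature: for each of the bands * height seeded hashes,
// keep the minimum hash value over all shingles of the item.
function minHash (shingles, bands, height) {
  const signature = []
  for (let seed = 0; seed < bands * height; seed++) {
    let min = Infinity
    for (const shingle of shingles) {
      min = Math.min(min, seededHash(shingle, seed))
    }
    signature.push(min)
  }
  return signature
}

// Banding: split the signature into `bands` chunks of `height` values.
// Each chunk becomes one bucket id, prefixed with its band index.
function toBuckets (signature, bands, height) {
  const buckets = []
  for (let band = 0; band < bands; band++) {
    const rows = signature.slice(band * height, (band + 1) * height)
    buckets.push(`${band}:${rows.join('-')}`)
  }
  return buckets
}

// Probability that two items with Jaccard similarity s share at least one bucket.
function candidateProbability (s, bands, height) {
  return 1 - Math.pow(1 - Math.pow(s, height), bands)
}

const signature = minHash(['the cat', 'cat sat', 'sat on'], 10, 5)
console.log(toBuckets(signature, 10, 5).length) // 10
```

Under these assumptions, more bands make it more likely that similar items share a bucket, while a larger height makes each bucket match stricter.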
  2. Compute and compare your minhashes
const MyDataLsh = require('./myDataLsh')
const { compare, getItemMinHash } = require('./myMethods')
const myDataLsh = new MyDataLsh(10, 10)

// ... (inside an async function)
  
  // compute and store the minhashes of your items
  const size = 25 // size of the blocks to be computed
  const report = await myDataLsh.run(custom, size)
  
  // ...

  // compare the minhashes of two items
  const [ minHashA, minHashB ] = await Promise.all([
    getItemMinHash(itemA.id),
    getItemMinHash(itemB.id)
  ])
  
  const similarity = compare(minHashA, minHashB)
// ...
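
The `compare` function above comes from your own `./myMethods` module, so its shape depends on how you store minhashes. As a minimal sketch, assuming each minhash is kept as an array of values and both arrays have the same length, the fraction of matching positions estimates the Jaccard similarity of the two items:

```javascript
// Hypothetical compare: fraction of positions where the two signatures agree.
// Assumes both minhashes are non-empty arrays of equal length.
function compare (minHashA, minHashB) {
  if (minHashA.length === 0 || minHashA.length !== minHashB.length) {
    throw new Error('signatures must be non-empty and of equal length')
  }
  let matches = 0
  for (let i = 0; i < minHashA.length; i++) {
    if (minHashA[i] === minHashB[i]) matches++
  }
  return matches / minHashA.length
}

console.log(compare([3, 7, 1, 9], [3, 2, 1, 9])) // 0.75
```

If your `store` implementation keeps minhashes in another format (for example a formatted string), the comparison would need to parse that format first.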

Running the tests

npm test

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.
