nilsimsa's Introduction

##What is this?

The nilsimsa module is an implementation of an existing locality-sensitive hashing (LSH) algorithm designed specifically to handle spam filtering. LSH is a method of performing probabilistic dimension reduction of high-dimensional data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items). This differs from conventional hash functions, such as those used in cryptography, because the goal here is to maximize the probability that similar items collide rather than to avoid collisions.

As per the original description:

> A nilsimsa code is something like a hash, but unlike hashes, a small change in the message results in a small change in the nilsimsa code. Such a function is called a locality-sensitive hash.
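
To make the contrast with conventional hashes concrete, the sketch below uses only JDK classes (Java 7+ for `StandardCharsets`, no nilsimsa calls) and counts how many bits differ between the SHA-256 digests of two nearly identical messages. The names `AvalancheDemo` and `differingBits` are made up for this illustration. With a cryptographic hash, a one-character change typically flips roughly half of the 256 output bits, which is the behaviour a locality-sensitive hash is designed to avoid.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class AvalancheDemo {

    // Number of bit positions (0..256) at which the SHA-256 digests of a and b differ.
    public static int differingBits(String a, String b) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            // digest(byte[]) resets the MessageDigest, so it can be reused for the second input
            BigInteger da = new BigInteger(1, sha.digest(a.getBytes(StandardCharsets.UTF_8)));
            BigInteger db = new BigInteger(1, sha.digest(b.getBytes(StandardCharsets.UTF_8)));
            return da.xor(db).bitCount();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // identical input: no differing bits
        System.out.println(differingBits("potatoes are the best", "potatoes are the best"));
        // one extra character: expect a count somewhere near 128, not near 0
        System.out.println(differingBits("potatoes are the best", "potatoes are the best!"));
    }
}
```

A nilsimsa digest of the same two messages would be expected to differ in only a small number of bits.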

##Maven

<dependency>
    <groupId>com.github.rholder</groupId>
    <artifactId>nilsimsa</artifactId>
    <version>1.0.0</version>
</dependency>

##Gradle

compile "com.github.rholder:nilsimsa:1.0.0"

##Quickstart

A minimal sample of some of the functionality would look like:

String first  = new Nilsimsa().update("potatoes are the best".getBytes()).toHexDigest();
String second = new Nilsimsa().update("tomatoes are really the best".getBytes()).toHexDigest();
String third  = new Nilsimsa().update("bananas taste pretty good".getBytes()).toHexDigest();

System.out.println(Nilsimsa.compare(first, third));   //   3
System.out.println(Nilsimsa.compare(second, third));  //  -6
System.out.println(Nilsimsa.compare(first, second));  //  53 -- closest match
System.out.println(Nilsimsa.compare(first, first));   // 128 -- exact match
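
The scores can be read as a bit-level similarity. In the original nilsimsa scheme, and in py-nilsimsa, a digest is 256 bits and the comparison score is 128 minus the number of differing bits, giving a range from -128 to 128. The sketch below recomputes such a score from two hex digests using plain JDK classes. It assumes this port follows the same definition, which is consistent with the "128 -- exact match" result shown in the sample. `ScoreSketch` and `score` are illustrative names, not part of the nilsimsa API.

```java
import java.math.BigInteger;

public class ScoreSketch {

    // Assumed definition: 128 minus the Hamming distance between two 256-bit digests,
    // each given as a 64-character hex string.
    public static int score(String hexA, String hexB) {
        if (hexA.length() != 64 || hexB.length() != 64) {
            throw new IllegalArgumentException("expected 64 hex characters (256 bits)");
        }
        BigInteger a = new BigInteger(hexA, 16);
        BigInteger b = new BigInteger(hexB, 16);
        // XOR leaves a 1 in every position where the digests disagree
        return 128 - a.xor(b).bitCount();
    }

    public static void main(String[] args) {
        String zeros  = String.format("%064d", 0);          // 64 '0' characters
        String ones   = zeros.replace('0', 'f');            // every bit set
        String oneBit = zeros.substring(0, 63) + "1";       // differs from zeros in one bit

        System.out.println(score(zeros, zeros));   // 128: identical digests
        System.out.println(score(zeros, oneBit));  // 127: one differing bit
        System.out.println(score(zeros, ones));    // -128: every bit differs
    }
}
```

Under this reading, a score near 0 means about half the bits differ, which is what two unrelated messages would be expected to produce.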

##Building from source

The nilsimsa module uses a Gradle-based build system. In the instructions below, ./gradlew is invoked from the root of the source tree and serves as a cross-platform, self-contained bootstrap mechanism for the build. The only prerequisites are Git and JDK 1.6+.

check out sources

git clone git://github.com/rholder/nilsimsa.git

compile and test, build all jars

./gradlew build

install all jars into your local Maven cache

./gradlew install

##License

This project is a Java port of py-nilsimsa, which is MIT/X11 licensed. The nilsimsa module is released under version 2.0 of the Apache License.
