Giter Club home page Giter Club logo

h2_alsh's Introduction

H2_ALSH: Homocentric Hypersphere Asymmetric Locality-Sensitive Hashing

drawing

Introduction

This package provides several LSH-base schemes for k-Maximum Inner Product Search (k-MIPS). It is also an implementation of H2_ALSH from the paper as follows:

Qiang Huang, Guihong Ma, Jianlin Feng, Qiong Fang, and Anthony K. H. Tung. Accurate and Fast
Locality-Sensitive Hashing Scheme for Maximum Inner Product Search. In Proceedings of the 24th
ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining, pages 1561-1570, 2018.

Compilation

The package requires g++ with c++11 support. To download and compile the code, type:

git clone [email protected]/HuangQiang/H2_ALSH.git
cd H2_ALSH
make

Datasets

We use four real-life datasets Sift, Gist, Netflix, and Yahoo for comparison. The statistics of the datasets are summarized in the following table:

Datasets #Objects #Queries Dimensionality Data Size
Sift 1,000,000 1,000 128 512.0 MB
Netflix 17,770 1,000 300 21.3 MB
Yahoo 624,961 1,000 300 750.0 MB
Gist 1,000,000 1,000 960 3.8 GB

Run Experiments

Usage: alsh [OPTIONS]

This package supports 12 options to evaluate the performance of H2_ALSH, L2_ALSH,
L2_ALSH2, XBOX, Sign_ALSH, Simple_LSH and Linear_Scan for k-MIPS. The parameters
are introduced as follows.

  -alg    integer    options of algorithms (0 - 11)
  -n      integer    cardinality of dataset
  -d      integer    dimensionality of dataset and query set
  -qn     integer    number of queries
  -K      integer    number of hash tables for Sign_ALSH and Simple_LSH
  -m      integer    extra dimension for L2_ALSH, L2_ALSH2, and Sign_ALSH
  -U      float      a value in (0,1] for L2_ALSH, L2_ALSH2, and Sign_ALSH
  -c0     float      approximation ratio for NN Search (c0 > 1)
  -c      float      approximation ratio for MIP Search (0 < c < 1)
  -ds     string     address of data  set
  -qs     string     address of query set
  -ts     string     address of truth set
  -op     string     output path

We provide all scripts to repeat all experiments reported in SIGKDD 2018. A quick example is shown as follows (run H2_ALSH on Mnist):

./alsh -alg 1 -n 60000 -qn 1000 -d 50 -c0 2.0 -c 0.5 -ds data/Mnist/Mnist.ds -qs data/Mnist/Mnist.q -ts data/Mnist/Mnist.mip -op results/Mnist/

If you would like to get more information to run other algorithms, please check the scripts in the package. When you run the package, please ensure that the path for the dataset, query set, and truth set is correct. Since the package will automatically create folder for the output path, please keep the path as short as possible.

Related Publication

If you use this package for publications, please cite the paper as follows.

@inproceedings{huang2018accurate,
    title={Accurate and Fast Asymmetric Locality-Sensitive Hashing Scheme for Maximum Inner Product Search}
    author={Huang, Qiang and Ma, Guihong and Feng, Jianlin and Fang, Qiong and Tung, Anthony KH},
    booktitle={Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
    pages={1561--1570},
    year={2018},
    organization={ACM}
}

h2_alsh's People

Contributors

huangqiang avatar submission240 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

h2_alsh's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.