Giter Club home page Giter Club logo

learning-to-index-for-nearest-neighbor-search's Introduction

Learning-to-Index-for-Nearest-Neighbor-Search

Chih-Yi Chiu, Member, IEEE, Amorntip Prayoonwong, and Yin-Chih Liao

Abstract

In this study, we present a novel ranking model based on learning the nearest neighbor relationships embedded in the index space. Given a query point, a conventional nearest neighbor search approach calculates the distances to the cluster centroids, before ranking the clusters from near to far based on the distances. The data indexed in the top-ranked clusters are retrieved and treated as the nearest neighbor candidates for the query. However, the loss of quantization between the data and cluster centroids will inevitably harm the search accuracy. To address this problem, the proposed model ranks clusters based on their nearest neighbor probabilities rather than the query-centroid distances to the query. The nearest neighbor probabilities are estimated by employing neural networks to characterize the neighborhood relationships as a nonlinear function, i.e., the density distribution of nearest neighbors with respect to the query. The proposed probability-based ranking model can replace the conventional distance-based ranking model as a coarse filter for candidate clusters, and the nearest neighbor probability can be used to determine the data quantity to be retrieved from the candidate cluster. Our experimental results demonstrated that implementation of the proposed ranking model for two state-of-the-art nearest neighbor quantization and search methods could boost the search performance effectively in billion-scale datasets.

Index Terms—cluster reranking, hash-based indexing, nearest neighbor distribution probability, optimized product quantization, residual vector quantization.

Full Paper

Our full paper is now available online.

View full paper: https://arxiv.org/pdf/1807.02962.pdf

The 1000 NNs in the ground truth of DEEP1B

The 1000 NNs in the ground truth of DEEP1B is available to download from:

https://drive.google.com/file/d/1FMV-oQadE6zUZUt6-WYYRl3beoGoERZJ/view?usp=sharing

Source Code

Our source codes and data for SIFT1B RVQ 4096x4096 is available at:

https://drive.google.com/drive/folders/1LyDFygrQff2DTm0XUwRCAgjgdUin81TQ?usp=sharing

These codes are in the coarse search process. We simulate how to retrieve the candidate second-level clusters for SIFT1B RVQ before performing the ADC. Then, we calculate the top-1 recall along with the candidate second-level clusters.

Only source codes are also provided here : The main procedure : https://github.com/AmorntipPrayoonwong/Learning-to-Index-for-Nearest-Neighbor-Search/blob/master/main_experiment.py

The sub procedures are in the library folder lib : https://github.com/AmorntipPrayoonwong/Learning-to-Index-for-Nearest-Neighbor-Search/tree/master/lib

learning-to-index-for-nearest-neighbor-search's People

Contributors

amorntipprayoonwong avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.