
MA-NSW

MA-NSW is a Go implementation of a multiple-attribute HNSW approximate nearest-neighbour search algorithm, based on HNSW (implemented in C++ in https://github.com/searchivarius/nmslib and described in https://arxiv.org/abs/1603.09320).

Usage

Simple usage example; see the examples folder for more. Both index building and searching can safely be done in parallel from multiple goroutines, and the index can be extended at any time, even while searching.

package main

import (
	hnsw ".." // relative import: build this file from inside the repo's examples folder
	"fmt"
	"github.com/grd/stat"
	"math/rand"
	"time"
)

// NUM: Size of training data
var NUM = 3000

// DIMENSION: Dimension of data
var DIMENSION = 32

// TESTNUM: Size of query data
var TESTNUM = 10

func main() {

	const (
		M              = 16
		efConstruction = 400
		efSearch       = 400
		K              = 100
		distType       = "l2" // l2 or cosine
	)

	var zero hnsw.Point = make([]float32, DIMENSION)

	h := hnsw.New(M, efConstruction, zero, distType)
	h.Grow(NUM)

	provinces := []string{"blue", "red", "green", "yellow"}
	types := []string{"sky", "land", "sea"}
	titles := []string{"boy", "girl"}

	for i := 1; i <= NUM; i++ {
		randomAttr := []string{provinces[rand.Intn(4)], types[rand.Intn(3)], titles[rand.Intn(2)]}
		h.Add(randomPoint(), uint32(i), randomAttr)
		if i%1000 == 0 {
			fmt.Printf("%v points added\n", i)
		}
	}

	fmt.Println("Saving index...")
	err := h.Save("test.ind", true) // true: production mode
	if err != nil {
		panic("Save error!")
	}
	fmt.Println("Done! Loading index...")
	h, timestamp, _ := hnsw.Load("test.ind")
	fmt.Printf("Index loaded, time saved was %v\n", time.Unix(timestamp, 0))

	fmt.Printf("Now searching with HNSW...\n")
	timeRecord := make([]float64, TESTNUM)
	hits := 0
	queries := make([]hnsw.Point, TESTNUM)
	truth := make([][]uint32, TESTNUM)
	// start := time.Now()
	for i := 0; i < TESTNUM; i++ {
		searchAttr := []string{provinces[rand.Intn(4)], types[rand.Intn(3)], titles[rand.Intn(2)]}
		fmt.Printf("Generating queries and calculating true answers using bruteforce search...\n")

		queries[i] = randomPoint()
		resultTruth := h.SearchBrute(queries[i], K, searchAttr)
		truth[i] = make([]uint32, K)
		for j := K - 1; j >= 0; j-- {
			item := resultTruth.Pop()
			truth[i][j] = item.ID
		}

		startSearch := time.Now()
		result := h.Search(queries[i], efSearch, K, searchAttr)
		//result := h.Search(queries[i], efSearch, K, []string{"nil", "nil", "nil"})
		stopSearch := time.Since(startSearch)
		timeRecord[i] = stopSearch.Seconds() * 1000
		fmt.Print("Searching with attributes:")
		fmt.Println(searchAttr)
		if result.Size != 0 {
			for j := 0; j < K; j++ {
				item := result.Pop()
				fmt.Printf("%v  ", item)
				if item != nil {
					fmt.Println(h.GetNodeAttr(item.ID))
					for k := 0; k < K; k++ {
						if item.ID == truth[i][k] {
							hits++
						}
					}
				}
			}
		} else {
			fmt.Println("Can't return any node")
		}

		fmt.Println()
	}

	// stop := time.Since(start)

	data := stat.Float64Slice(timeRecord)
	mean := stat.Mean(data)
	variance := stat.Variance(data)

	fmt.Printf("Mean query time (ms): %v\n", mean)
	fmt.Printf("Variance of query time: %v\n", variance)
	fmt.Printf("%v queries / second (single thread)\n", 1000.0/mean)
	fmt.Printf("Average %d-NN precision: %v\n", K, float64(hits)/(float64(TESTNUM)*float64(K)))
	fmt.Printf("\n")
	fmt.Print(h.Stats())
}

func randomPoint() hnsw.Point {
	var v hnsw.Point = make([]float32, DIMENSION)
	for i := range v {
		v[i] = rand.Float32()
	}
	return v
}
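The hit-counting loop in the example above scans the ground-truth list once per returned item, which is O(K²) per query. A hedged, self-contained sketch of the same recall/precision computation using a set lookup instead (the function name `recallAtK` is mine, not part of the MA-NSW API):

```go
package main

import "fmt"

// recallAtK computes |result ∩ truth| / |truth| with a map-based set,
// replacing the quadratic nested scan used in the example above.
func recallAtK(result, truth []uint32) float64 {
	in := make(map[uint32]bool, len(truth))
	for _, id := range truth {
		in[id] = true
	}
	hits := 0
	for _, id := range result {
		if in[id] {
			hits++
		}
	}
	return float64(hits) / float64(len(truth))
}

func main() {
	truth := []uint32{1, 2, 3, 4}
	result := []uint32{2, 4, 5, 6} // two of four true neighbours found
	fmt.Println(recallAtK(result, truth)) // prints 0.5
}
```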

ma-nsw's People

Contributors

ryanligod


ma-nsw's Issues

Multi-attribute spaces: Calibration for attribute fusion and similarity search

https://ieeexplore.ieee.org/abstract/document/6248021
Abstract:
Recent work has shown that visual attributes are a powerful approach for applications such as recognition, image description and retrieval. However, fusing multiple attribute scores - as required during multi-attribute queries or similarity searches - presents a significant challenge. Scores from different attribute classifiers cannot be combined in a simple way; the same score for different attributes can mean different things. In this work, we show how to construct normalized “multi-attribute spaces” from raw classifier outputs, using techniques based on the statistical Extreme Value Theory. Our method calibrates each raw score to a probability that the given attribute is present in the image. We describe how these probabilities can be fused in a simple way to perform more accurate multiattribute searches, as well as enable attribute-based similarity searches. A significant advantage of our approach is that the normalization is done after-the-fact, requiring neither modification to the attribute classification system nor ground truth attribute annotations. We demonstrate results on a large data set of nearly 2 million face images and show significant improvements over prior work. We also show that perceptual similarity of search results increases by using contextual attributes.
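The fusion step the abstract describes — calibrate each raw classifier score to a probability, then combine the per-attribute probabilities — can be sketched in Go. This is a hedged illustration: the logistic `calibrate` below is a simple stand-in, whereas the paper fits an Extreme Value Theory model to the classifier's decision scores; the product-of-probabilities `fuse` assumes attribute independence.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// calibrate maps a raw classifier score to a pseudo-probability.
// NOTE: a logistic squashing stand-in; the paper instead fits an
// Extreme Value Theory (Weibull) model to the score distribution.
func calibrate(raw float64) float64 {
	return 1.0 / (1.0 + math.Exp(-raw))
}

// fuse combines per-attribute probabilities for a multi-attribute
// query by taking their product (independence assumption).
func fuse(probs []float64) float64 {
	p := 1.0
	for _, q := range probs {
		p *= q
	}
	return p
}

func main() {
	// Raw scores for three attributes on three hypothetical images.
	images := map[string][]float64{
		"img1": {2.0, 1.5, 0.5},
		"img2": {-1.0, 3.0, 2.0},
		"img3": {0.2, 0.1, -0.3},
	}
	type ranked struct {
		id    string
		score float64
	}
	var out []ranked
	for id, raws := range images {
		probs := make([]float64, len(raws))
		for i, r := range raws {
			probs[i] = calibrate(r)
		}
		out = append(out, ranked{id, fuse(probs)})
	}
	// Rank images by fused multi-attribute probability.
	sort.Slice(out, func(i, j int) bool { return out[i].score > out[j].score })
	for _, r := range out {
		fmt.Printf("%s %.3f\n", r.id, r.score)
	}
}
```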

Approximate nearest neighbor algorithm based on navigable small world graphs

https://www.sciencedirect.com/science/article/pii/S0306437913001300#bib8
We propose a novel approach to solving the approximate k-nearest neighbor search problem in metric spaces. The search structure is based on a navigable small world graph with vertices corresponding to the stored elements, edges to links between them, and a variation of greedy algorithm for searching. The navigable small world is created simply by keeping old Delaunay graph approximation links produced at the start of construction. The approach is very universal, defined in terms of arbitrary metric spaces and at the same time it is very simple. The algorithm handles insertions in the same way as queries: by finding approximate neighbors for the inserted element and connecting it to them. Both search and insertion can be done in parallel requiring only local information from the structure. The structure can be made distributed. The accuracy of the probabilistic k-nearest neighbor queries can be adjusted without rebuilding the structure.

The performed simulation for data in the Euclidean spaces shows that the structure built using the proposed algorithm has small world navigation properties with log2(n) insertion and search complexity at fixed accuracy, and performs well at high dimensionality. Simulation on a CoPHiR dataset revealed its high efficiency in case of large datasets (more than an order of magnitude less metric computations at fixed recall) compared to permutation indexes. Only 0.03% of the 10 million 208-dimensional vector dataset is needed to be evaluated to achieve 0.999 recall (virtually exact search). For recall 0.93 processing speed 2800 queries/s can be achieved on a dual Intel X5675 Xeon server node with Java implementation.
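The "variation of greedy algorithm" at the heart of NSW routing can be sketched in a few lines of Go: start at an entry point, repeatedly hop to whichever neighbour is closest to the query, and stop at a local minimum. This is a minimal, self-contained sketch; NSW/HNSW additionally restart from multiple entry points (or descend layers) and keep a candidate queue to return k results.

```go
package main

import "fmt"

// l2sq is the squared Euclidean distance between two points.
func l2sq(a, b []float32) float32 {
	var s float32
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// greedySearch walks the graph from entry, always moving to the
// neighbour closest to q, and stops when no neighbour improves on
// the current node (a local minimum) — the basic NSW routing step.
func greedySearch(points [][]float32, adj [][]int, entry int, q []float32) int {
	cur := entry
	for {
		best, bestD := cur, l2sq(points[cur], q)
		for _, n := range adj[cur] {
			if d := l2sq(points[n], q); d < bestD {
				best, bestD = n, d
			}
		}
		if best == cur {
			return cur // local minimum reached
		}
		cur = best
	}
}

func main() {
	points := [][]float32{{0, 0}, {1, 0}, {2, 0}, {3, 0}}
	adj := [][]int{{1}, {0, 2}, {1, 3}, {2}} // a simple chain graph
	fmt.Println(greedySearch(points, adj, 0, []float32{2.9, 0})) // prints 3
}
```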

Graph based Nearest Neighbor Search: Promises and Failures

https://arxiv.org/abs/1904.02077
Recently, graph based nearest neighbor search gets more and more popular on large-scale retrieval tasks. The attractiveness of this type of approaches lies in its superior performance over most of the known nearest neighbor search approaches as well as its genericness to various metrics. In this paper, the role of two strategies, namely hierarchical structure and graph diversification that are adopted as the key steps in the graph based approaches, is investigated. We find the hierarchical structure could not achieve "much better logarithmic complexity scaling" as it was claimed in the original paper, particularly on high dimensional cases. Moreover, we find that similar high search speed efficiency as the one with hierarchical structure could be achieved with the support of flat k-NN graph after graph diversification. Finally, we point out the difficulty, that is faced by most of the graph based search approaches, is directly linked to "curse of dimensionality".
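The "graph diversification" the paper credits for search speed is typically an occlusion-style pruning rule, as in HNSW's neighbour-selection heuristic: a distance-sorted candidate is kept only if the base point is closer to it than every already-kept neighbour is. A hedged, self-contained sketch of that rule (not MA-NSW's actual edge-selection code):

```go
package main

import "fmt"

func l2sq(a, b []float32) float32 {
	var s float32
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// diversify prunes a candidate list sorted by distance to base: a
// candidate c is kept only if no already-kept neighbour is closer to
// c than base is. Occluded candidates are dropped, spreading edges
// across directions instead of clustering them.
func diversify(base []float32, cands [][]float32, maxM int) []int {
	var kept []int
	for i, c := range cands {
		ok := true
		for _, k := range kept {
			if l2sq(cands[k], c) < l2sq(base, c) {
				ok = false // c is occluded by an existing neighbour
				break
			}
		}
		if ok {
			kept = append(kept, i)
			if len(kept) == maxM {
				break
			}
		}
	}
	return kept
}

func main() {
	base := []float32{0, 0}
	// candidates, pre-sorted by distance to base
	cands := [][]float32{{1, 0}, {1.5, 0.1}, {0, 2}}
	fmt.Println(diversify(base, cands, 2)) // prints [0 2]
}
```

The second candidate is dropped because the first neighbour occludes it; the third, lying in a different direction, is kept.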

A robust multi-attribute search structure

https://ieeexplore.ieee.org/abstract/document/47229/
A multiattribute index structure called the hB-tree is introduced. The hB-tree internode search and growth processes are precisely analogous to the corresponding processes in B-trees. The intranode processes are unique. A k-d tree is used as the structure within nodes for very efficient searching. Node splitting requires that this k-d tree be split. This produces nodes which do not represent brick-like regions in k-space but that can be characterized as holey bricks, i.e. bricks in which subregions have been extracted. Results are presented that guarantee hB-tree users decent storage utilization, reasonable-size index terms, and good search and insert performance regardless of key distribution.
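The intra-node k-d tree the abstract mentions can be illustrated with a minimal k-d tree in Go — insertion cycling the split axis by depth, and nearest-neighbour search that prunes a subtree when its splitting plane is farther than the best match so far. This is a plain k-d tree sketch, not the hB-tree's holey-brick node splitting:

```go
package main

import "fmt"

type kdNode struct {
	point       []float64
	axis        int // dimension this node splits on
	left, right *kdNode
}

// insert adds a point, cycling the split axis with depth.
func insert(n *kdNode, p []float64, depth int) *kdNode {
	if n == nil {
		return &kdNode{point: p, axis: depth % len(p)}
	}
	if p[n.axis] < n.point[n.axis] {
		n.left = insert(n.left, p, depth+1)
	} else {
		n.right = insert(n.right, p, depth+1)
	}
	return n
}

func dist2(a, b []float64) float64 {
	var s float64
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// nearest returns the stored point closest to q, descending the near
// side first and visiting the far side only if the splitting plane is
// closer than the best distance found so far.
func nearest(n *kdNode, q []float64, best []float64, bestD float64) ([]float64, float64) {
	if n == nil {
		return best, bestD
	}
	if d := dist2(n.point, q); d < bestD {
		best, bestD = n.point, d
	}
	diff := q[n.axis] - n.point[n.axis]
	near, far := n.left, n.right
	if diff >= 0 {
		near, far = n.right, n.left
	}
	best, bestD = nearest(near, q, best, bestD)
	if diff*diff < bestD { // plane closer than best: other side may win
		best, bestD = nearest(far, q, best, bestD)
	}
	return best, bestD
}

func main() {
	var root *kdNode
	for _, p := range [][]float64{{2, 3}, {5, 4}, {9, 6}, {4, 7}, {8, 1}, {7, 2}} {
		root = insert(root, p, 0)
	}
	p, _ := nearest(root, []float64{8.1, 2}, nil, 1e18)
	fmt.Println(p) // prints [8 1]
}
```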
