
MA-NSW

MA-NSW is a Go implementation of a multiple-attribute HNSW approximate nearest-neighbour search algorithm, based on HNSW (implemented in C++ in https://github.com/searchivarius/nmslib and described in https://arxiv.org/abs/1603.09320).

Usage

Simple usage example; see the examples folder for more. Both index building and searching can safely be done in parallel from multiple goroutines, and the index can be extended at any time, even while searching.

package main

import (
	hnsw ".." // relative import: build this file from inside the repo's examples folder
	"fmt"
	"github.com/grd/stat"
	"math/rand"
	"time"
)

// NUM: Size of training data
var NUM = 3000

// DIMENSION: Dimension of data
var DIMENSION = 32

// TESTNUM: Size of query data
var TESTNUM = 10

func main() {

	const (
		M              = 16
		efConstruction = 400
		efSearch       = 400
		K              = 100
		distType       = "l2" // l2 or cosine
	)

	var zero hnsw.Point = make([]float32, DIMENSION)

	h := hnsw.New(M, efConstruction, zero, distType)
	h.Grow(NUM)

	provinces := []string{"blue", "red", "green", "yellow"}
	types := []string{"sky", "land", "sea"}
	titles := []string{"boy", "girl"}

	for i := 1; i <= NUM; i++ {
		randomAttr := []string{provinces[rand.Intn(4)], types[rand.Intn(3)], titles[rand.Intn(2)]}
		h.Add(randomPoint(), uint32(i), randomAttr)
		if i%1000 == 0 {
			fmt.Printf("%v points added\n", i)
		}
	}

	fmt.Println("Saving index...")
	err := h.Save("test.ind", true) // true: production mode
	if err != nil {
		panic("Save error!")
	}
	fmt.Println("Done! Loading index...")
	h, timestamp, _ := hnsw.Load("test.ind")
	fmt.Printf("Index loaded, time saved was %v\n", time.Unix(timestamp, 0))

	fmt.Printf("Now searching with HNSW...\n")
	timeRecord := make([]float64, TESTNUM)
	hits := 0
	queries := make([]hnsw.Point, TESTNUM)
	truth := make([][]uint32, TESTNUM)
	// start := time.Now()
	for i := 0; i < TESTNUM; i++ {
		searchAttr := []string{provinces[rand.Intn(4)], types[rand.Intn(3)], titles[rand.Intn(2)]}
		fmt.Printf("Generating queries and calculating true answers using bruteforce search...\n")

		queries[i] = randomPoint()
		resultTruth := h.SearchBrute(queries[i], K, searchAttr)
		truth[i] = make([]uint32, K)
		for j := K - 1; j >= 0; j-- {
			item := resultTruth.Pop()
			truth[i][j] = item.ID
		}

		startSearch := time.Now()
		result := h.Search(queries[i], efSearch, K, searchAttr)
		//result := h.Search(queries[i], efSearch, K, []string{"nil", "nil", "nil"})
		stopSearch := time.Since(startSearch)
		timeRecord[i] = stopSearch.Seconds() * 1000
		fmt.Print("Searching with attributes:")
		fmt.Println(searchAttr)
		if result.Size != 0 {
			for j := 0; j < K; j++ {
				item := result.Pop()
				fmt.Printf("%v  ", item)
				if item != nil {
					fmt.Println(h.GetNodeAttr(item.ID))
					for k := 0; k < K; k++ {
						if item.ID == truth[i][k] {
							hits++
						}
					}
				}
			}
		} else {
			fmt.Println("Can't return any node")
		}

		fmt.Println()
	}

	// stop := time.Since(start)

	data := stat.Float64Slice(timeRecord)
	mean := stat.Mean(data)
	variance := stat.Variance(data)

	fmt.Printf("Mean query time (ms): %v\n", mean)
	fmt.Printf("Variance of query time: %v\n", variance)
	fmt.Printf("%v queries / second (single thread)\n", 1000.0/mean)
	fmt.Printf("Average %d-NN precision: %v\n", K, float64(hits)/(float64(TESTNUM)*float64(K)))
	fmt.Printf("\n")
	fmt.Print(h.Stats())
}

func randomPoint() hnsw.Point {
	var v hnsw.Point = make([]float32, DIMENSION)
	for i := range v {
		v[i] = rand.Float32()
	}
	return v
}
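The hit-counting loop in the example above scans the ground-truth list once per returned item, which is O(K²) per query. A hedged, self-contained sketch of the same recall/precision computation using a set lookup instead (the function name `recallAtK` is mine, not part of the MA-NSW API):

```go
package main

import "fmt"

// recallAtK computes |result ∩ truth| / |truth| with a map-based set,
// replacing the quadratic nested scan used in the example above.
func recallAtK(result, truth []uint32) float64 {
	in := make(map[uint32]bool, len(truth))
	for _, id := range truth {
		in[id] = true
	}
	hits := 0
	for _, id := range result {
		if in[id] {
			hits++
		}
	}
	return float64(hits) / float64(len(truth))
}

func main() {
	truth := []uint32{1, 2, 3, 4}
	result := []uint32{2, 4, 5, 6} // two of four true neighbours found
	fmt.Println(recallAtK(result, truth)) // prints 0.5
}
```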

ma-nsw's People

Contributors

ryanligod


ma-nsw's Issues

Multi-attribute spaces: Calibration for attribute fusion and similarity search

https://ieeexplore.ieee.org/abstract/document/6248021
Abstract:
Recent work has shown that visual attributes are a powerful approach for applications such as recognition, image description and retrieval. However, fusing multiple attribute scores - as required during multi-attribute queries or similarity searches - presents a significant challenge. Scores from different attribute classifiers cannot be combined in a simple way; the same score for different attributes can mean different things. In this work, we show how to construct normalized “multi-attribute spaces” from raw classifier outputs, using techniques based on the statistical Extreme Value Theory. Our method calibrates each raw score to a probability that the given attribute is present in the image. We describe how these probabilities can be fused in a simple way to perform more accurate multiattribute searches, as well as enable attribute-based similarity searches. A significant advantage of our approach is that the normalization is done after-the-fact, requiring neither modification to the attribute classification system nor ground truth attribute annotations. We demonstrate results on a large data set of nearly 2 million face images and show significant improvements over prior work. We also show that perceptual similarity of search results increases by using contextual attributes.
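The fusion step the abstract describes — calibrate each raw classifier score to a probability, then combine the per-attribute probabilities — can be sketched in Go. This is a hedged illustration: the logistic `calibrate` below is a simple stand-in, whereas the paper fits an Extreme Value Theory model to the classifier's decision scores; the product-of-probabilities `fuse` assumes attribute independence.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// calibrate maps a raw classifier score to a pseudo-probability.
// NOTE: a logistic squashing stand-in; the paper instead fits an
// Extreme Value Theory (Weibull) model to the score distribution.
func calibrate(raw float64) float64 {
	return 1.0 / (1.0 + math.Exp(-raw))
}

// fuse combines per-attribute probabilities for a multi-attribute
// query by taking their product (independence assumption).
func fuse(probs []float64) float64 {
	p := 1.0
	for _, q := range probs {
		p *= q
	}
	return p
}

func main() {
	// Raw scores for three attributes on three hypothetical images.
	images := map[string][]float64{
		"img1": {2.0, 1.5, 0.5},
		"img2": {-1.0, 3.0, 2.0},
		"img3": {0.2, 0.1, -0.3},
	}
	type ranked struct {
		id    string
		score float64
	}
	var out []ranked
	for id, raws := range images {
		probs := make([]float64, len(raws))
		for i, r := range raws {
			probs[i] = calibrate(r)
		}
		out = append(out, ranked{id, fuse(probs)})
	}
	// Rank images by fused multi-attribute probability.
	sort.Slice(out, func(i, j int) bool { return out[i].score > out[j].score })
	for _, r := range out {
		fmt.Printf("%s %.3f\n", r.id, r.score)
	}
}
```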

Approximate nearest neighbor algorithm based on navigable small world graphs

https://www.sciencedirect.com/science/article/pii/S0306437913001300#bib8
We propose a novel approach to solving the approximate k-nearest neighbor search problem in metric spaces. The search structure is based on a navigable small world graph with vertices corresponding to the stored elements, edges to links between them, and a variation of greedy algorithm for searching. The navigable small world is created simply by keeping old Delaunay graph approximation links produced at the start of construction. The approach is very universal, defined in terms of arbitrary metric spaces and at the same time it is very simple. The algorithm handles insertions in the same way as queries: by finding approximate neighbors for the inserted element and connecting it to them. Both search and insertion can be done in parallel requiring only local information from the structure. The structure can be made distributed. The accuracy of the probabilistic k-nearest neighbor queries can be adjusted without rebuilding the structure.

The performed simulation for data in the Euclidean spaces shows that the structure built using the proposed algorithm has small world navigation properties with log2(n) insertion and search complexity at fixed accuracy, and performs well at high dimensionality. Simulation on a CoPHiR dataset revealed its high efficiency in case of large datasets (more than an order of magnitude less metric computations at fixed recall) compared to permutation indexes. Only 0.03% of the 10 million 208-dimensional vector dataset is needed to be evaluated to achieve 0.999 recall (virtually exact search). For recall 0.93 processing speed 2800 queries/s can be achieved on a dual Intel X5675 Xeon server node with Java implementation.
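The "variation of greedy algorithm" at the heart of NSW routing can be sketched in a few lines of Go: start at an entry point, repeatedly hop to whichever neighbour is closest to the query, and stop at a local minimum. This is a minimal, self-contained sketch; NSW/HNSW additionally restart from multiple entry points (or descend layers) and keep a candidate queue to return k results.

```go
package main

import "fmt"

// l2sq is the squared Euclidean distance between two points.
func l2sq(a, b []float32) float32 {
	var s float32
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// greedySearch walks the graph from entry, always moving to the
// neighbour closest to q, and stops when no neighbour improves on
// the current node (a local minimum) — the basic NSW routing step.
func greedySearch(points [][]float32, adj [][]int, entry int, q []float32) int {
	cur := entry
	for {
		best, bestD := cur, l2sq(points[cur], q)
		for _, n := range adj[cur] {
			if d := l2sq(points[n], q); d < bestD {
				best, bestD = n, d
			}
		}
		if best == cur {
			return cur // local minimum reached
		}
		cur = best
	}
}

func main() {
	points := [][]float32{{0, 0}, {1, 0}, {2, 0}, {3, 0}}
	adj := [][]int{{1}, {0, 2}, {1, 3}, {2}} // a simple chain graph
	fmt.Println(greedySearch(points, adj, 0, []float32{2.9, 0})) // prints 3
}
```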

Graph based Nearest Neighbor Search: Promises and Failures

https://arxiv.org/abs/1904.02077
Recently, graph based nearest neighbor search gets more and more popular on large-scale retrieval tasks. The attractiveness of this type of approaches lies in its superior performance over most of the known nearest neighbor search approaches as well as its genericness to various metrics. In this paper, the role of two strategies, namely hierarchical structure and graph diversification that are adopted as the key steps in the graph based approaches, is investigated. We find the hierarchical structure could not achieve "much better logarithmic complexity scaling" as it was claimed in the original paper, particularly on high dimensional cases. Moreover, we find that similar high search speed efficiency as the one with hierarchical structure could be achieved with the support of flat k-NN graph after graph diversification. Finally, we point out the difficulty, that is faced by most of the graph based search approaches, is directly linked to "curse of dimensionality".
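The "graph diversification" the paper credits for search speed is typically an occlusion-style pruning rule, as in HNSW's neighbour-selection heuristic: a distance-sorted candidate is kept only if the base point is closer to it than every already-kept neighbour is. A hedged, self-contained sketch of that rule (not MA-NSW's actual edge-selection code):

```go
package main

import "fmt"

func l2sq(a, b []float32) float32 {
	var s float32
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// diversify prunes a candidate list sorted by distance to base: a
// candidate c is kept only if no already-kept neighbour is closer to
// c than base is. Occluded candidates are dropped, spreading edges
// across directions instead of clustering them.
func diversify(base []float32, cands [][]float32, maxM int) []int {
	var kept []int
	for i, c := range cands {
		ok := true
		for _, k := range kept {
			if l2sq(cands[k], c) < l2sq(base, c) {
				ok = false // c is occluded by an existing neighbour
				break
			}
		}
		if ok {
			kept = append(kept, i)
			if len(kept) == maxM {
				break
			}
		}
	}
	return kept
}

func main() {
	base := []float32{0, 0}
	// candidates, pre-sorted by distance to base
	cands := [][]float32{{1, 0}, {1.5, 0.1}, {0, 2}}
	fmt.Println(diversify(base, cands, 2)) // prints [0 2]
}
```

The second candidate is dropped because the first neighbour occludes it; the third, lying in a different direction, is kept.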

A robust multi-attribute search structure

https://ieeexplore.ieee.org/abstract/document/47229/
A multiattribute index structure called the hB-tree is introduced. The hB-tree internode search and growth processes are precisely analogous to the corresponding processes in B-trees. The intranode processes are unique. A k-d tree is used as the structure within nodes for very efficient searching. Node splitting requires that this k-d tree be split. This produces nodes which do not represent brick-like regions in k-space but that can be characterized as holey bricks, i.e. bricks in which subregions have been extracted. Results are presented that guarantee hB-tree users decent storage utilization, reasonable-size index terms, and good search and insert performance regardless of key distribution.
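The intra-node k-d tree the abstract mentions can be illustrated with a minimal k-d tree in Go — insertion cycling the split axis by depth, and nearest-neighbour search that prunes a subtree when its splitting plane is farther than the best match so far. This is a plain k-d tree sketch, not the hB-tree's holey-brick node splitting:

```go
package main

import "fmt"

type kdNode struct {
	point       []float64
	axis        int // dimension this node splits on
	left, right *kdNode
}

// insert adds a point, cycling the split axis with depth.
func insert(n *kdNode, p []float64, depth int) *kdNode {
	if n == nil {
		return &kdNode{point: p, axis: depth % len(p)}
	}
	if p[n.axis] < n.point[n.axis] {
		n.left = insert(n.left, p, depth+1)
	} else {
		n.right = insert(n.right, p, depth+1)
	}
	return n
}

func dist2(a, b []float64) float64 {
	var s float64
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// nearest returns the stored point closest to q, descending the near
// side first and visiting the far side only if the splitting plane is
// closer than the best distance found so far.
func nearest(n *kdNode, q []float64, best []float64, bestD float64) ([]float64, float64) {
	if n == nil {
		return best, bestD
	}
	if d := dist2(n.point, q); d < bestD {
		best, bestD = n.point, d
	}
	diff := q[n.axis] - n.point[n.axis]
	near, far := n.left, n.right
	if diff >= 0 {
		near, far = n.right, n.left
	}
	best, bestD = nearest(near, q, best, bestD)
	if diff*diff < bestD { // plane closer than best: other side may win
		best, bestD = nearest(far, q, best, bestD)
	}
	return best, bestD
}

func main() {
	var root *kdNode
	for _, p := range [][]float64{{2, 3}, {5, 4}, {9, 6}, {4, 7}, {8, 1}, {7, 2}} {
		root = insert(root, p, 0)
	}
	p, _ := nearest(root, []float64{8.1, 2}, nil, 1e18)
	fmt.Println(p) // prints [8 1]
}
```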
