Giter Club home page Giter Club logo

graphknn's Introduction

PyPI version

Graph KNN Python module

Given an undirected graph and a set of terminal (or seed) vertices T, this python package finds, for every vertex, its K nearest neighbors from the set T.

Installation

pip install graphknn

Usage

The main functions are graphknn.algorithm1(W, mask, k) and graphknn.algorithm2(W, mask, k). Both algorithms have the same interface with slightly different implementations. Input:

  • W: n x n matrix of edge weights, of type scipy.sparse.csr_matrix.
  • mask: boolean array of length n indicating which vertices belong to the terminal set T.
  • k: how many nearest neighbors to find for each vertex.

Output:

  • knn: this is an array of size n such that knn[i] is a list of up to k pairs of (distance, terminal_vertex_index). Note that knn[i] is not sorted.

Algorithm 1 is simpler whereas Algorithm 2 has tighter runtime guarantees. We have seen cases where algorithm 1 is faster than algorithm 2 and vice versa, so try both on your data and choose the faster one.

Example

import numpy as np
import scipy.sparse
import graphknn

def build_sparse_undirected_nonnegative_csr_matrix(n):
    W = np.random.random((n,n))
    W = W + W.transpose()
    W[W < 1.5] = np.inf
    return scipy.sparse.csr_matrix(W)


def test_graphknn():
    N = 10
    p = 0.5 
    k = 3
    
    W = build_sparse_undirected_nonnegative_csr_matrix(N)
    mask = np.random.random(N) < p

    print('Graph edges:')
    print(W,'\n')

    print('Terminal indices:')
    print(mask.nonzero()[0], '\n')

    result = graphknn.algorithm1(W, mask, k)

    print('K nearest terminal indices of all vertices:')
    for i in range(len(result)):
        print('result[{0}]:\n{1}'.format(i, sorted(result[i])))

test_graphknn()

Details

A simple solution to the problem of finding the k nearest terminal vertices is to run Dijkstra's algorithm from each of the terminal vertices, forming a |T| by |V| matrix. Then for each vertex i we examine the i-th column of the matrix and pick the k nearest cells (this can be done efficiently using Hoare's selection algorithm). The runtime of this method is O(|T||V|log|V| + |E|). However, this approach is wasteful, since it spends a lot of time finding irrelevant shortest paths from terminals to vertices that are very far from them.

This module implements a faster approach that can be described as performing |T| Dijkstra runs in parallel combined with an early stopping rule that prevents unnecessary traversals. This stopping rule simply stops exploring vertices once we have found shortest paths from k different terminals.

For more details, including a proof of correctness and runtime bounds, see Section 4 and Appendix B of our paper:

Amit Moscovich, Ariel Jaffe, Boaz Nadler Minimax-optimal semi-supervised regression on unknown manifolds, AISTATS (2017).

Please cite our paper if using this code for your research.

graphknn's People

Contributors

mosco avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.