Giter Club home page Giter Club logo

tree-vp's Introduction

Name

Tree::VP - Vantage-Point Tree builder and searcher.

Synopsis

A spellchecker.

my @words = read_file("/usr/share/dict/words", { chomp => 1, binmode => ":utf8" });
my $vptree = Tree::VP->new( distance => \&Text::Levenshtein::XS::distance );
$vptree->build(\@words);

my $r = $vptree->search(query => "amstedam", size => 5);
say "suggestion: " . join " ", map { $_ . " (" . distance($_, $q) . ")" } @{$r->{values}};

Methods

  • new( distance => sub { ... })

    Construct the top-level tree object. The distance function must be able to calculate the distance between any 2 values in the ArrayRef passed to build method. It must return a number range from 0 to Inf. The number "0" meaning that the 2 values are the same, and larger number means that the given 2 values are further away in space.

  • build( ArrayRef[ Val ] )

    Take a ArrayRef of values of whatever type that can be handled by the distance function, and build the tree structure.

  • search( query => Val, size => Int )

    Take a "query", which is just a value of whatever type contained in the tree. And return HashRef that contains the results of top-K nearest nodes according to the distance function. size means the the upper-bound of result size.

  • tree() (a public attribute)

    This points to the underlying tree data structure, which is an instance of Tree::DAG_Node . Since the creation process of VP trees is expensive, it is desired to be able to store the tree structure and re-use the stored state. To achieve this, do something like this:

      # Storing
      my $vptree = Tree::VP->new( distance => \&distance );
      $vptree->build(\@words);
      write_file("/db/tree_stored.db", freeze($vptree->tree));
    
      # Loading and use
      my $tree =  unfreeze(read_file("/db/tree_stored.db"));
      my $vptree = Tree::VP->new( tree => $tree, distance => \&distance );
      $vptree->search(...);
    

    Since we use Tree::DAG_Node objects, the freeze and unfreeze subroutine here needs be able to serealize and unserealize perl objects. Sereal is a good choice, but basically any subroutines that can convert Tree::DAG_Node objects to string and back, can be used. Obviously, the distance function must be the same in order to produce valid response.

See Also

http://en.wikipedia.org/wiki/Vantage-point_tree

Author

Kang-min Liu [email protected]

License

The MIT License.

tree-vp's People

Contributors

gugod avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.