Giter Club home page Giter Club logo

trie's Introduction

GoDoc

Trie

Data structure and relevant algorithms for extremely fast prefix/fuzzy string searching.

Usage

Create a Trie with:

t := trie.New()

Add Keys with:

// Add can take in meta information which can be stored with the key.
// i.e. you could store any information you would like to associate with
// this particular key.
t.Add("foobar", 1)

Find a key with:

node, ok := t.Find("foobar")
meta := node.Meta()
// use meta with meta.(type)

Remove Keys with:

t.Remove("foobar")

Prefix search with:

t.PrefixSearch("foo")

Fast test for valid prefix:

t.HasKeysWithPrefix("foo")

Fuzzy search with:

t.FuzzySearch("fb")

Contributing

Fork this repo and run tests with:

go test

Create a feature branch, write your tests and code and submit a pull request.

License

MIT

trie's People

Contributors

arussellk avatar curlup avatar davidsteinsland avatar deepsourcebot avatar derekparker avatar igungor avatar jamierobertsusa avatar jsign avatar lunewcome avatar prologic avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

trie's Issues

What is mask for?

type Node struct {
	val       rune
	path      string
	term      bool
	depth     int
	meta      interface{}
	mask      uint64
	parent    *Node
	children  map[rune]*Node
	termCount int
}

hello derekparker:
I don't understand the function of the field mask in the Node, could you explain it? And why this trie is so fast. Thank you so much!

FuzzySearch fails on empty string?

Hey, kudos for the good work.

By the way, FuzzySearch fails on empty string input. I will look into your code and see if I can fix it.

package main

import "github.com/derekparker/trie"

func main() {
	t := trie.New()
	t.Add("foo", 1)
	t.FuzzySearch("")
}

Output -

panic: runtime error: index out of range

goroutine 1 [running]:
github.com/derekparker/trie.fuzzycollect(0xc000024080, 0xc000046688, 0x0, 0x20, 0x0, 0x20, 0x20)
	/home/amogh/go/src/github.com/derekparker/trie/trie.go:272 +0x4b0
github.com/derekparker/trie.Trie.FuzzySearch(0xc000024080, 0x1, 0x0, 0x0, 0x4aadf8, 0xc000024180, 0x0)
	/home/amogh/go/src/github.com/derekparker/trie/trie.go:117 +0x96
main.main()
	/home/amogh/go/src/tmp/tmp.go:8 +0xed
exit status 2

High RAM usage

This package is pulled in as a dependency of one of my dependencies, and it's causing unusually high RAM usage. Related issue: https://github.com/prologic/bitcask/issues/67

This looks like a memory leak to me since it's unlikely the garbage collector ran in the short time my program was running, and during that time it managed to slurp up 13 GB of memory.

Two Bug !

bug1: add duplicated key into the trie will incr the size but the key was replaced.
bug2: the meta was not store in the node of last character of key but the new child with val=0x00
`

trie.Add("apple\x00p1", "1") // key contains 0
trie.Add("apple\x00p2", "1") // key contains 0
trie.Add("apple", "1") // after apple inserted, the above 2 keys are dispear.

`

Handle adding distinct meta info for key

Currently there is no way to append meta information for duplicate keys. This should be resolved by storing meta as []interface{}, and append meta for duplicate keys.

defer impact on trie.Add()

I want to share a similar exploration I've made in other project regarding the impact on defer.

I avoid using defer in the Add method with some interesting results. Considering there wasn't a benchmark for Add, I created one. Using this new benchmark, I tested the actual implementation and this new one.

The end results are:

benchmark               old ns/op     new ns/op     delta
BenchmarkAdd/128B-8     55666075      49689240      -10.74%
BenchmarkAdd/256B-8     54631261      50188483      -8.13%
BenchmarkAdd/512B-8     57324361      50005511      -12.77%
BenchmarkAdd/1K-8       55605380      51304633      -7.73%
BenchmarkAdd/2K-8       55761200      50059447      -10.23%
BenchmarkAdd/4K-8       56268353      50719990      -9.86%
BenchmarkAdd/8K-8       55517450      50338050      -9.33%
BenchmarkAdd/16K-8      57047413      50159699      -12.07%
BenchmarkAdd/32K-8      56182766      50119649      -10.79%

benchmark               old allocs     new allocs     delta
BenchmarkAdd/128B-8     306915         306915         +0.00%
BenchmarkAdd/256B-8     306915         306915         +0.00%
BenchmarkAdd/512B-8     306915         306915         +0.00%
BenchmarkAdd/1K-8       306915         306915         +0.00%
BenchmarkAdd/2K-8       306915         306915         +0.00%
BenchmarkAdd/4K-8       306915         306915         +0.00%
BenchmarkAdd/8K-8       306915         306915         +0.00%
BenchmarkAdd/16K-8      306915         306915         +0.00%
BenchmarkAdd/32K-8      306915         306915         +0.00%

benchmark               old bytes     new bytes     delta
BenchmarkAdd/128B-8     14731932      14731929      -0.00%
BenchmarkAdd/256B-8     14731921      14731923      +0.00%
BenchmarkAdd/512B-8     14731922      14731920      -0.00%
BenchmarkAdd/1K-8       14731920      14731920      +0.00%
BenchmarkAdd/2K-8       14731921      14731920      -0.00%
BenchmarkAdd/4K-8       14731921      14731920      -0.00%
BenchmarkAdd/8K-8       14731921      14731920      -0.00%
BenchmarkAdd/16K-8      14731921      14731920      -0.00%
BenchmarkAdd/32K-8      14731920      14731920      +0.00%

This means that not using defer seems to have a considerable impact in ns/op of Add.

What are those sub-benchs? Just applying the same idea of bitcask to test if the meta size has some influence on the results. Seeing the implementation, it shouldn't and the results seem to reflect that. (so we can ignore the sub-benchmarks if you like, I kept them just in case).

How to reproduce these results? (not completely elegant, but enough to show the experiment)

git clone --branch master_avoiddefer https://github.com/jsign/trie.git
rm *.txt ; go test -benchmem -benchtime=5s -bench="BenchmarkAdd\b" > before.txt && go test -benchmem -benchtime=5s -bench=BenchmarkAddWithoutDefer > after.txt && sed -i 's/WithoutDefer//g' after.txt && benchcmp before.txt after.txt

Small explanation:

  • Clone my fork wich has the new benchmark plus both the original a new implementation of Add
  • Run both benchmarks, fix the ouput to be compared with benchcmp.

This idea may apply to other parts where defer sync.Mutex.Unlock() exists.

In the case of bitcask, the impact was close to 4%, here seems to be greater. If I'm not missing something, looks like the Add function is quite fast and defer overhead is relatively big (relatively).

Trie.Keys() panic

desc

new trie object, and call 'Keys'

panic code

trie.New().Keys()

panic stack

panic: runtime error: makeslice: cap out of range

goroutine 1 [running]:
github.com/derekparker/trie.collect(0xc00007e2a0, 0xc0000dde58, 0x0, 0x20)
	/Users/fmt/work/gopath1/src/github.com/derekparker/trie/trie.go:245 +0xad
github.com/derekparker/trie.Trie.PrefixSearch(0x0, 0xc00007e2a0, 0x0, 0x0, 0x0, 0x0, 0x4, 0xc00000e360)
	/Users/fmt/work/gopath1/src/github.com/derekparker/trie/trie.go:143 +0xaa
github.com/derekparker/trie.(*Trie).Keys(...)
	/Users/fmt/work/gopath1/src/github.com/derekparker/trie/trie.go:126

Use of a Read Write Mutex

I was wondering if this library supports concurrency, and saw that a recent commit seems to imply that the support has been added: 1ce4922

However, wouldn't your trie still be open to concurrent read operations while you have mutex locked Add or Remove. In that case, would it make sense to shift to sync.RWMutex.

Also to support custom concurrent walks, it would make sense to have GetChild(rune) *Node function instead of Children() map[rune]*Node. So that a read lock can be taken inside GetChild. Whereas, using Children() would allow the caller to easily ignore an internal mutex lock.

Profiling results

Here are some of the slowest lines in my usage
https://github.com/derekparker/trie/blob/master/trie.go#L72
I am attempting to optimize these by short circuit logic and removing unneeded function calls.

ROUTINE ======================== github.com/derekparker/trie.(*Trie).Find in /home/mdupont/gocode/src/github.com/derekparker/trie/trie.go
     240ms      1.93s (flat, cum) 36.90% of Total
         .          .     67:}
         .          .     68:
         .          .     69:// Finds and returns meta data associated
         .          .     70:// with `key`.
         .          .     71:func (t *Trie) Find(key string) (*Node, bool) {
      90ms      450ms     72:   node := findNode(t.Root(), []rune(key))
         .          .     73:   if node == nil {
         .          .     74:           return nil, false
         .          .     75:   }
         .          .     76:
     150ms      1.48s     77:   node, ok := node.Children()[nul]
         .          .     78:   if !ok || !node.term {
         .          .     79:           return nil, false
         .          .     80:   }
         .          .     81:
         .          .     82:   return node, true

And the second one

ROUTINE ======================== github.com/derekparker/trie.findNode in /home/mdupont/gocode/src/github.com/derekparker/trie/trie.go
      70ms       70ms (flat, cum)  1.34% of Total
         .          .    179:// mask of this node.
         .          .    180:func (n Node) Mask() uint64 {
         .          .    181:   return n.mask
         .          .    182:}
         .          .    183:

Maby we could check if the node was nil before calling it. 
https://github.com/derekparker/trie/blob/master/trie.go#L185
  60ms       60ms    184:func findNode(node *Node, runes []rune) *Node {
     .          .    185:   if node == nil {
     .          .    186:           return nil
     .          .    187:   }
     .          .    188:

  10ms       10ms    189:   if len(runes) == 0 {
     .          .    190:           return node
     .          .    191:   }

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.