wolfgarbe / pruningradixtrie

PruningRadixTrie - 1000x faster Radix trie for prefix search & auto-complete

Home Page: https://seekstorm.com/blog/pruning-radix-trie/

License: MIT License

C# 100.00%

radix-trie patricia-trie radix-tree patricia-tree autocomplete auto-complete auto-completion auto-suggest trie trie-tree tree trees prefix-search

pruningradixtrie's Issues

Capable of backing a web api for autocomplete

I need autocomplete functionality that can handle thousands of words from a given domain within a website.

So my question is whether I can use your data structure in a website with high concurrency, and whether it is writable or read-only.

What you have done here is awesome. The only place where I found something similar is CQEngine in Java, which also has a special trie structure for prefix searches and is built with concurrency in mind.

[BUG] UpdateMaxCounts can contaminate the sorted state

The UpdateMaxCounts function updates the termFrequencyCountChildMax field, which is also the field by which the Node.Children Lists are sorted. Since the sort call comes before the UpdateMaxCounts call in the following code, each layer is not actually guaranteed to be in sorted order.

PruningRadixTrie/PruningRadixTrie/PruningRadixTrie.cs

Lines 126 to 136 in 40b01f3

            else
            {
                curr.Children.Add((term, new Node(termFrequencyCount)));
                //sort children descending by termFrequencyCountChildMax to start lookup with most promising branch
                curr.Children.Sort((x, y) => y.Item2.termFrequencyCountChildMax.CompareTo(x.Item2.termFrequencyCountChildMax));
            }
            termCount++;
            UpdateMaxCounts(nodeList, termFrequencyCount);
        }
        catch (Exception e) { Console.WriteLine("exception: " + term + " " + e.Message); }
    }

And there is no sort call after this UpdateMaxCounts:

PruningRadixTrie/PruningRadixTrie/PruningRadixTrie.cs

Lines 56 to 61 in 40b01f3

    if ((common == term.Length) && (common == key.Length))
    {
        if (node.termFrequencyCount == 0) termCount++;
        node.termFrequencyCount += termFrequencyCount;
        UpdateMaxCounts(nodeList, node.termFrequencyCount);
    }

In practice, this can subtly worsen performance.

Also, because UpdateMaxCounts updates termFrequencyCountChildMax in potentially all ancestor nodes, those layers would need to be sorted as well if every layer is supposed to maintain sorted order.
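One possible way to address this is to re-sort only after the maxima have been propagated, and to do so on every layer along the path. The following is a minimal sketch, not the library's code: `SketchNode`, `SortedStateSketch` and `UpdateMaxCountsAndResort` are hypothetical names, and the node class is a simplified stand-in that models only the fields mentioned above. It assumes `path` holds the nodes from the root down to the node that was just inserted or updated.

```csharp
using System.Collections.Generic;

// Simplified stand-in for the trie's Node class (assumption: only the fields named in this issue).
public class SketchNode
{
    public List<(string key, SketchNode node)> Children = new List<(string key, SketchNode node)>();
    public long termFrequencyCount;
    public long termFrequencyCountChildMax;

    public SketchNode(long termFrequencyCount)
    {
        this.termFrequencyCount = termFrequencyCount;
    }
}

public static class SortedStateSketch
{
    // Hypothetical replacement for the "sort, then UpdateMaxCounts" sequence.
    public static void UpdateMaxCountsAndResort(List<SketchNode> path, long termFrequencyCount)
    {
        // Step 1: propagate the new maximum to every node on the path.
        foreach (SketchNode node in path)
        {
            if (termFrequencyCount > node.termFrequencyCountChildMax)
                node.termFrequencyCountChildMax = termFrequencyCount;
        }

        // Step 2: re-sort the children of every node on the path, descending by
        // termFrequencyCountChildMax, now that the sort key has its final value.
        foreach (SketchNode node in path)
        {
            if (node.Children.Count > 1)
                node.Children.Sort((x, y) =>
                    y.node.termFrequencyCountChildMax.CompareTo(x.node.termFrequencyCountChildMax));
        }
    }
}
```

Re-sorting every layer on each insert costs extra time, so a real fix might instead move only the single changed child to its new position in each list.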

Terms.txt

I believe this file, https://github.com/wolfgarbe/PruningRadixTrie/blob/master/PruningRadixTrie/terms.zip, which is referenced in the Readme, is missing.

Is there Infix search possible

Would it also be possible to search for "croso" and get results for "microsoft"?
In other words, infix search and not only prefix search.

Any plans to put out a nuget package?

Hi,
Thanks for this. I have a custom trie implementation that this could likely replace. Are there any plans to upload this as a NuGet package? I appreciate it's only two classes of code that can easily be copied, but I was just wondering.

Terms.txt file missing

Could you also provide your terms file so that the benchmark can be run?
That would be great.

Thanks a lot for this piece of code!

Store additional data

It would be helpful if we could store additional data alongside each term, such as a database ID or similar.
Maybe there is an easier way, but one suggestion would be to keep all terms in a dictionary inside the trie (or perhaps a complete set of all terms already exists somewhere statically).

Also, is there a specific reason why the result tuple is not a separate, named type? If it were, it might make sense to introduce a generic result as well, where T is the type of the data added when indexing.

something like

AddTerm(string term, int weight, T data)

LookupResult results = Lookup(string term, int limit, int offset)
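
The dictionary idea above could be sketched outside the trie, without changing it. This is only an illustration under stated assumptions: `LookupResult<T>` and `PayloadStore<T>` are hypothetical types, and the sketch assumes the trie's lookup returns a sequence of (term, count) tuples, which are passed in rather than produced here.

```csharp
using System.Collections.Generic;

// Hypothetical typed result that carries the payload next to the term and its weight.
public sealed class LookupResult<T>
{
    public string Term { get; }
    public long Weight { get; }
    public T Data { get; }

    public LookupResult(string term, long weight, T data)
    {
        Term = term;
        Weight = weight;
        Data = data;
    }
}

// Hypothetical side store: maps each indexed term to its additional data.
public sealed class PayloadStore<T>
{
    private readonly Dictionary<string, T> payloads = new Dictionary<string, T>();

    // Call this whenever a term is added to the trie.
    public void Set(string term, T data)
    {
        payloads[term] = data;
    }

    // Converts untyped (term, weight) tuples, assumed to come from the trie lookup,
    // into typed results with the stored payload attached.
    public List<LookupResult<T>> Attach(IEnumerable<(string term, long weight)> trieResults)
    {
        var typed = new List<LookupResult<T>>();
        foreach (var (term, weight) in trieResults)
        {
            // data stays default(T) when no payload was stored for the term.
            payloads.TryGetValue(term, out T data);
            typed.Add(new LookupResult<T>(term, weight, data));
        }
        return typed;
    }
}
```

The trade-off of a side dictionary is that every term string is kept a second time as a dictionary key, so storing the payload directly in the node would use less memory.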

I have ported to Java. Reference in the README?

Hi Mr Garbe,

Thanks for your brilliant work! I have ported the source code to Java, and put it in benldr/JPruningRadixTrie. You might like to reference this in your readme, so that any other Java users can avoid having to port the code themselves?

I have tested it and it seems to work as desired (in any case, I will be actively using this code in the months to come, so I will become aware of any bugs). Note that I did not port the PruningRadixTrie.Benchmark code.

Thanks again for your great work,
Ben

Memory consumption

Is there any way to calculate the memory consumption for a given set of terms?
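
Short of an analytical formula, one rough approach is to measure the managed heap before and after building the structure. The sketch below is an assumption-laden illustration, not part of the library: `MemoryProbe` and `MeasureBytes` are hypothetical names, and the figure it returns is only an approximation, since other allocations and garbage collector behaviour influence it.

```csharp
using System;

public static class MemoryProbe
{
    // Returns the approximate growth of the managed heap, in bytes, caused by
    // building whatever object the delegate returns (for example, a filled trie).
    public static long MeasureBytes(Func<object> build)
    {
        // Force a full collection so that garbage from earlier work is not counted.
        long before = GC.GetTotalMemory(true);

        object structure = build();

        // Collect again to drop temporary objects created while building.
        long after = GC.GetTotalMemory(true);

        // Keep the structure reachable until after the second measurement.
        GC.KeepAlive(structure);
        return after - before;
    }
}
```

To use it, pass a delegate that creates the trie, adds all terms, and returns the trie; dividing the result by the number of terms gives a rough per-term cost for that particular data set.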
