Giter Club home page Giter Club logo

trie's Introduction

Erlang Trie Implementation

The data structure is only for storing keys as strings (lists of integers), but is able to get performance close to the process dictionary when doing key lookups (based on results here with the benchmark here). So, this data structure is (currently) the quickest for lookups on key-value pairs where all keys are strings, if you ignore the process dictionary (which many argue should never be used).

The implementation stores leaf nodes as the string suffix because it is a PATRICIA trie (PATRICIA - Practical Algorithm to Retrieve Information Coded in Alphanumeric, D.R.Morrison (1968)). Storing leaf nodes this way helps avoid single child leafs (compressing the tree a little bit).

The full OTP dict API is supported in addition to other functions. Functions like foldl, iter, itera, and foreach traverse in alphabetical order. Functions like map and foldr traverse in reverse alphabetical order. There are also functions like find_prefix, is_prefix, and is_prefixed that check if a prefix exists within the trie. The functions with a "_similar" suffix like find_similar, foldl_similar, and foldr_similar all operate with trie elements that share a common prefix with the supplied string.

The trie data structure supports string patterns. The functions find_match/2, fold_match/4, and pattern_parse/2 utilize patterns that contain a"*"wildcard character(s) (equivalent to ".+" regex while"**"is forbidden). The function find_match/2 operates on a trie filled with patterns when supplied a string non-pattern, while the function fold_match/4 operates on a trie without patterns when supplied a string pattern. The functions find_match2/2 and pattern2_parse/2 add "?" as an additional wildcard character (with "**", "??", "*?" and "?*" forbidden) that consumes greedily to the next character ("?" must not be the last character in the pattern).

The btrie data structure was added because many people wanted a quick associative data structure for binary keys. However, other alternatives provide better efficiency, so the btrie is best used for functions that can not be found elsewhere (or perhaps extra-long keys)... more testing would be needed to determine the best use-cases of the btrie.

Tests

rebar compile
ERL_LIBS="/path/to/proper" rebar eunit

Author

Michael Truog (mjtruog at protonmail dot com)

License

MIT License

trie's People

Contributors

jcomellas avatar mniec avatar okeuday avatar rleonid avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

trie's Issues

Wrong behavior of trie:is_prefixed/2 ?

Hi,
Just encountered some weird behaviors of trie:is_prefixed/2.

76> Foo = trie:store("abc",value,trie:new()).
{97,97,{{"bc",value}}}
77> trie:is_prefixed("abcdefghijk",Foo).     
false

78> Bar = trie:store("abcd",value,Foo).      
{97,97,
 {{{98,98,{{{99,99,{{{100,100,{{[],value}}},value}}},error}}},
   error}}}
79> trie:is_prefixed("abcdefghijk",Bar).     
true

Although utterly unfamiliar with trie, is this behavior intended?
Or I'm just using it wrong? ๐Ÿ˜‚

After a little source code digging, it seems like L625-626 only matches when calling trie:is_prefixed(Key,Trie) with a pre-existing Key in Trie, which is essentially calling

trie:is_prefixed("abc",Foo).

Maybe change it to this?

{MaybePreT, _} ->
    case lists:prefix(MaybePreT,T) of true -> true; false -> false end;

Will this break the properties of trie? ๐Ÿ˜‚

FYC, some tracing messages:

7> trie:is_prefixed("abc",Foo).
(<0.41.0>) call trie:is_prefixed("abc",{97,97,{{"bc",value}}})
(<0.41.0>) returned from trie:is_prefixed/2 -> true
true

8> trie:is_prefixed("abcdefghijk",Foo).                                                                                   
(<0.41.0>) call trie:is_prefixed("abcdefghijk",{97,97,{{"bc",value}}})
(<0.41.0>) returned from trie:is_prefixed/2 -> false
false

9> trie:is_prefixed("abcdefghijk",Bar).     
(<0.41.0>) call trie:is_prefixed("abcdefghijk",{97,97,{{{98,98,{{{99,99,{{{100,100,{{[],value}}},value}}},error}}},error}}})
(<0.41.0>) call trie:is_prefixed("bcdefghijk",{98,98,{{{99,99,{{{100,100,{{[],value}}},value}}},error}}})
(<0.41.0>) call trie:is_prefixed("cdefghijk",{99,99,{{{100,100,{{[],value}}},value}}})
(<0.41.0>) returned from trie:is_prefixed/2 -> true
(<0.41.0>) returned from trie:is_prefixed/2 -> true
(<0.41.0>) returned from trie:is_prefixed/2 -> true
true

fetch_similar_keys returning too many results

Hi,

It seems like fetch_keys_similar will sometimes return the keys from the whole tree rather than the ones I expect.

Here's an example:

L = [{"00", zeros}, {"11", ones}],
T = trie:from_list(L),
io:fwrite("Data: ~p\n", [L]),
io:fwrite("Keys that start with \"0\": ~p\n", [trie:fetch_keys_similar("0", T)]),

As output I get:

Data: [{"00",zeros},{"11",ones}]
Keys that start with "0": ["00","11"]

It doesn't seem to happen very often and it's quite rare when the trie is big. It doesn't happen with single character keys.

I did a little digging through the code and it seems like it's something to do with fold_similar_node when the child node is a binary, which ends up calling fold_similar_element on the whole trie. I'm not sure if that's helpful but there it is anyway ;-)

Cheers,
Sam.

limiting the number of search results?

Interesting library! For a search for similar things, like fold_similar, has there been any thinking around being able to reduce the number of search results, like being able to specify N, where N is the maximum number of search results one would like to be returned.

The use case is that one might use this library as part of some lookup engine where you might have many entries stored in the trie, but there really is no need to present the entire database (or all entries that match the first character) to the user as he/she searches, but just giving a few and then indicating that there would be others to be found if the user specifies further characters - pretty much how google search does it I guess. I suppose this could be done either in the form of providing search results as an iterator that you could iterate through or to, as mentioned above, being able to specify some offset and limit when performing fold_similar and the likes.

(feel free to close this issue if it's just irrelevant)

** (FunctionClauseError) no function clause matching in :trie.store/3

Hi
Thumbs up for the library. I am trying to use trie in elixir from v1.7.4.

And I just created some small tests, in which I try your samples , and I noticed that for example :trie.new([{"abcdefghijklmnopqrstuvwxyz", 1},{"aac", 2}]) throws an exception:
** (FunctionClauseError) no function clause matching in :trie.store/3 in trie/src/trie.hrl:1005: :trie.store/3

I also tried btrie, and this one is working. Maybe I am doing something wrong.

I tried the trie functionality in erlang shell and it is working, so it seams that it is crashing only in elixir ( I compiled the code before trying)

I have:
Erlang/OTP 21 and Elixir 1.7.3

Thank you,

Utf8 keys?

I have a need for utf8 keys. How far away is support for these?

find_prefixes/2 always returns an empty result set

Not entirely sure weather this is me misusing it but I fail to get find_prefixes/2 to return any data other then an empty list.

3> T1 = btrie:append(<<"this is a test year really!">>, t, btrie:new()).
{116,116,{{<<"his is a test year really!">>,[t]}}}
4> btrie:find_prefixes(<<"this">>, T1).
[]
5> btrie:find_prefix(<<"this">>, T1).
prefix

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.