Giter Club home page Giter Club logo

trie-search's People

Contributors

aviadhahami avatar dependabot[bot] avatar gnithin avatar joshjung avatar mhalle avatar mmontag avatar oaustegard avatar ramneekgill avatar ryoha000 avatar slavik-chapelskyi avatar vamshi29292 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

trie-search's Issues

Example 2 (add items individually or from Array) doesn't work as expected

ts.get('a'); // Returns all 4 items above. 
ts.get('an'); // Returns all 4 items above.
ts.get('and'); // Returns all 3 items above that begin with 'and'
ts.get('andr'); // Returns all 2 items above that begin with 'andr'
ts.get('andre'); // Returns only andrew.

First line we get only one result and for the rest empty array

Exact Match Information in get()

There should be a way to mark which found items are an exact match in the returned results to help with sorting.

This would also make it trivial to add the ability to do an exact match search using quotes (similar to google).

TypeScript fork

I thought I should let you guys know that I have been working a TypeScript version of this library. You may see a version of it here.

While I have forked the project for my own use case, I was wondering if there would be interest in merging this upstream?

Cheers to @joshjung and thanks for the great library 🙂

ignoreCase doesn't really work as expected

The ignoreCase option seems to apply to both insertion and retrieval. Here's how I'd expect it to work:

const trie = new TrieSearch('key', {
  ignoreCase: false,
  splitOnRegEx: /[^a-zA-Z0-9]|(?<=[a-z])(?=[A-Z])/, // 'camelCase' to ['camel', 'Case']
});

trie.add({ key: 'fooBar', value: 123 });
trie.add({ key: 'foo bar', value: 456 });

trie.get('bar'); // should return both items

Looking closer, this might be a bug because you call toLowerCase before and after splitOnRegEx:

https://github.com/joshjung/trie-search/blob/master/src/TrieSearch.js#L88
https://github.com/joshjung/trie-search/blob/master/src/TrieSearch.js#L197

Searching for 'constructor' crashes

Here's a test case that crashes:

diff --git a/test/trie-search.test.js b/test/trie-search.test.js
index 0f21f68..5cbd2ef 100644
--- a/test/trie-search.test.js
+++ b/test/trie-search.test.js
@@ -82,6 +82,10 @@ describe('TrieSearch', function() {
       assert.equal(ts.get('blab').length, 0);
       assert.equal(ts.get('nope').length, 0);
     });
+
+    it('get(\'constructor\') should work', function() {
+      assert.equal(ts.get('constructor').length, 0);
+    });
   });
 
   describe('TrieSearch::addAll(...)should work for an array', function() {

The stack is:

    TypeError: Cannot read properties of undefined (reading 'length')
    
          331 |         accumulator = reducer(accumulator, phrases[i], matches, this);
          332 |       } else {
        > 333 |         ret = ret ? ret.addAll(matches) : new HashArray(haKeyFields).addAll(matches);
              |                                                                      ^
          334 |       }
          335 |     }
          336 |
    
          at JClass.addAll (node_modules/hasharray/src/HashArray.js:70:13)
          at TrieSearch.get (src/TrieSearch.js:333:70)
          at Object.<anonymous> (test/trie-search.test.js:88:23)

Add 'ranking' reducer based on depth and count of matches

If typing in spot rock, the TrieSearch.RANK_REDUCER will sort the results like:

0: 'spot rock'                // exact word match, correct order
1: 'spotify rocks'         // incomplete word match, correct order
2: 'rock spot'               // exact match, incorrect order, no word gap
3: 'spot the rocks'     // exact word match, correct order, with gap word(s)                        
4: 'rock the spot'       // exact word matches, incorrect order, with gap word(s)

Need to give this more thought, but this would definitely be a nice ranking style feature for reducing the found results. Downside is sorting, but if this can done inline with the array concatenations this might actually mean that it just dictates the order in which the arrays are concatenated when done which is a much faster operation than sorting all the items at the end.

Unexpected behaviour when query ends with a space

Consider the following example:

var TrieSearch = require('trie-search');
var trie = new TrieSearch(['name', 'city']);

trie.add({ name: 'a b', city: 'c' });
trie.add({ name: 'd', city: 'c' });

console.log(trie.get('a ')); // => [ { name: 'a b', city: 'c' }, { name: 'd', city: 'c' } ]

Why { name: 'd', city: 'c' } is found?

Add deletion 'misspelling' feature / option

I think it would be fairly trivial to add a feature to the Trie that allows it during lookups to "look ahead" one character to allow for matches on misspellings where a character was missed. For example:

Inserted: "hello"
Searched: "helo"

If we lookahead 1-character at the first "l", we could match on "helo" as well.

get returns undefined

Hi Josh,

When the key is 'constructor' the function get throws an error

TypeError: Cannot read property 'length' of undefined at JClass.addAll (/scripts/node_modules/hasharray/src/HashArray.js:70:13)
at TrieSearch.get (/scripts/node_modules/trie-search/src/TrieSearch.js:314:70)

after debugging the problem was in the function _get in the getCache to be more precise

  _get: function (phrase) {
    phrase = this.options.ignoreCase ? phrase.toLowerCase() : phrase;
    
    var c, node;
    if (this.options.cache && (c = this.getCache.get(phrase))){
      return c.value
    }

i changed it to this

  _get: function (phrase) {
    phrase = this.options.ignoreCase ? phrase.toLowerCase() : phrase;
    
    var c, node;
    if (this.options.cache && this.getCache.has(phrase)){
      return this.getCache.get(phrase).value;
    }

the problem is in the this.getCache.get implementation.

Thanks in advance

Option for max results?

Would it improve speed if the ts.get('foo') could return early when results reach an optional maximum? For example, I might have 24,000 matches but can only display 20 results at a time.

Search in any position of words

For the term ob and the given object array:

const names = [
  {name: 'bill', age: 35},
  {name: 'billy', age: 35},
  {name: 'bob', age: 35},
  {name: 'bobby', age: 22},
  {name: 'bo', age: 35}
];

I'd like to get the result:

[
 {name: 'bob', age: 35},
 {name: 'bob', age: 15},
]

Is it possible now?

Upgrade to ES6: Arrays with added prototype properties break

Note: this bug actually exists in the HashArray code but I'm putting it here since this library is the primary user of HashArray.

For example, if you do this somewhere in your code:

Array.prototype.range = function() {
  ...
};

The HashArray breaks here:

  //-----------------------------------
  // add()
  //-----------------------------------
  addOne: function (obj) {
    var needsDupCheck = false;
    for (var key in this.keyFields) {
      key = this.keyFields[key];
      var inst = this.objectAt(obj, key);

This was written eons ago and really just needs to be updated to ES6 or TypeScript as per someone else's request and fork.

Anyway, putting this ticket here so I can take a look at it when I have some free time. In the meantime, TrieSearch does not work if you do fancy global manipulations to the Array.prototype.

Poor performance

Hi, I was experiencing poor performance with trie-search and I thought I'd post my findings here.

My trie is built from 70932 key phrases (containing 57575 unique tokens).
trie.get(['s']) returns 58587 matches in 1400 ms. This is very slow.

I went into HashArray.js and changed ignoreDuplicates to true.
Now, trie.get(['s']) takes 97 ms, 15x faster.
All the results for my queries (I'm using UNION_REDUCER) are still valid.

I don't know the purpose of ignoreDuplicates here, but it seems like you might be able to get some quick wins for optimization!

Issue with matching last word in a sentence

Hello,
Great job on this library, I appreciate the work so far. I am having some issues with matching the last word in a sentence.

For example:
`
const optionsArray = [
'Hello world',
'Sample test'
];

const trie = new TrieSearch([], { splitOnRegEx: false });

optionsArray.forEach((item) => {
trie.map(item, item);
});

`

Searching for world or test returns 0 results, how can I fix this?

trie.search('world') --> null
trie.search('test') --> null

trie.search('hello') --> Hello world

How to query nested arrays?

I have an item that looks like so:

{
  name: 'Apple',
  quotes: [{
    symbol: 'AAPL:US'
  }, {
    symbol: 'AAPL:SO'
  }]
}

How can I initialize my TreeSearch instance to search on both the name and nested symbol fields?

I've tried new TrieSearch(['name', ['quotes', 'symbol']]) and a couple of other variations without success. Thank for taking a look in advance!

Problem passing an Array of Strings, takes only the first value.

Hi,

I have a structure like this:

categories: [ 0: Object name: "Test", synonyms: ["Hello", "Nice", "String"] 1: Object name: "Susi", synonyms: ["Peter", "Betti", "Christian"] ... ... ]

So I created a TrieSearch with:

new TrieSearch(['name', ['synonyms']], { min: 3 }),

ts.addAll(categories);

However, with ts.get() I only receive the first index of synonyms.

So, ts.get("Hello") will give me the value of index 0 of categories.

But with ts.get("Nice") I won't get anything.

Am I missing something here?

I hope someone can help me.

"constructor" key causes issues with retrieval

This might actually be an issue for https://github.com/joshjung/hash-array

Reproducible example

const TrieSearch = require('trie-search')

const trie = new TrieSearch();
trie.addFromObject({ constructor: 1 });
trie.get('c');

Output

TypeError: this._map[inst].indexOf is not a function
    at JClass.addOne (/tmp/node_modules/hasharray/src/HashArray.js:44:31)
    at JClass.add (/tmp/node_modules/hasharray/src/HashArray.js:60:12)
    at JClass.addAll (/tmp/node_modules/hasharray/src/HashArray.js:71:16)
    at aggregate (/tmp/node_modules/trie-search/src/TrieSearch.js:202:12)
    at aggregate (/tmp/node_modules/trie-search/src/TrieSearch.js:206:11)
    at aggregate (/tmp/node_modules/trie-search/src/TrieSearch.js:206:11)
    at aggregate (/tmp/node_modules/trie-search/src/TrieSearch.js:206:11)
    at aggregate (/tmp/node_modules/trie-search/src/TrieSearch.js:206:11)
    at aggregate (/tmp/node_modules/trie-search/src/TrieSearch.js:206:11)
    at aggregate (/tmp/node_modules/trie-search/src/TrieSearch.js:206:11)

Wrong written values in README

In Example 1, ts.get('andre') should return both andrew and andrea.
Similar mistakes throughout. I fixed a few of those in the PR.

Add 'remove' and 'readd' function

This is going to take a bit of work. Essentially, I envision:

new TrieSearch(keyFields, {+++idField});

Where idField is used to build a map such that inside the TrieSearch instance we have a new property called idToPhrases which maps those fields to all the phrases that contain the object:

this.idToPhrases = {
  [idField]: ['a', 'ab', 'abr', 'abra'...'abracadabra']
} ;

Then, when you call remove(obj) it pulls the id from it, finds all the phrases that map to the object and splices the array in the root lookup tree.

Now, this is great for removal, so I also want to add a 'readd' method. The readd method figures out all the old phrases that map to the object, all the new phrases that map to the object, and then intelligently prunes the root tree doing the minimum number of operations required to do so to improve performance.

Preserve order for matching keys

Hey there, love this library! I'm writing a component that will autocomplete from a list of the top 1000 highest traffic websites. The array that I'm passing to addAll is already sorted from most- to least-popular.

It would be great if trie-search preserved insert order, so if I search for the prefix g google.com would be the search result. Would this be possible? I'm happy to take a look at the source and submit a PR if this sounds generally useful.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.