Giter Club home page Giter Club logo

fuzzysearch's People

Contributors

aguynamedben avatar jeancroy avatar ke4roh avatar mrkishi avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

fuzzysearch's Issues

Proper way to use FuzzySearch.prototype.add()

Hi, like others, I can't believe not more people are aware of this library. I love it. Thanks you for working on it. I'm happy to contribute to the project if this is a bug that can be fixed or better docs are helpful.

I'm trying to use FuzzySearch.prototype.add() to add new documents to my searcher after initial index creation. After calling searcher.add(), I introspect searcher.index.length and searcher.nb_indexed and can see that the document has been added to the index. But when I call searcher.search(), the expected result is missing.

Here's my test code:

    // TEST

    const docs = [
      { _id: 1, title: 'Item 1', domain: 'item1.com' },
      { _id: 2, title: 'Item 2', domain: 'item2.com' },
    ];

    const keys = {
      title: 'title',
      domain: 'domain',
    };

    const searcher = new FuzzySearch({
      source: docs,
      keys: keys,
      identify_item: function(doc) {
        return doc._id;
      },
    });

    let results;

    // First test search
    console.log(`Results for 'title:Item', round 1`);
    results = searcher.search('title:Item');
    results.forEach((result) => {
      console.log(result);
    });
    console.log(`There are ${searcher.nb_indexed} docs`);

    // Add a doc
    console.log(`Adding a doc`);
    const newDoc = { _id: 3, title: 'Item 3', domain: 'item3.com' };
    searcher.add(newDoc, keys);
    // Do I need searcher.dirty = true; here? That doesn't work either

    // Sanity check internals
    console.log(`Doc added, now there are ${searcher.nb_indexed} docs, let's search 'em!`);

    // Second test search
    console.log(`Results for 'title:Item', round 2`);
    results = searcher.search('title:Item');
    results.forEach((result) => {
      console.log(result);
    });

Console output:

Results for 'title:Item', round 1
commandDB.js:76 Object {_id: 1, title: "Item 1", domain: "item1.com"}
commandDB.js:76 Object {_id: 2, title: "Item 2", domain: "item2.com"}
commandDB.js:78 There are 2 docs
commandDB.js:81 Adding a doc
commandDB.js:86 Doc added, now there are 3 docs, let's search 'em!
commandDB.js:89 Results for 'title:Item', round 2
commandDB.js:92 Object {_id: 1, title: "Item 1", domain: "item1.com"}
commandDB.js:92 Object {_id: 2, title: "Item 2", domain: "item2.com"}

The expected result is that the Item 3 document would also show in the results, but it doesn't

Route 1

I noticed that there is a FuzzySearch.prototype.dirty property, and I've tried setting searcher.dirty = true; after search.add() but that doesn't fix the problem, and in fact it resets searcher.nb_indexed back to 2 if I do that. Maybe there is an issue where settings searcher.dirty = true; causes a refresh of the index based on original data, and documents added are lost?

Route 2

Maybe I'm just not adding the documents in the right way?

Other explorations

The issue doesn't seem to be related to the usage of Tagged Search or identify_item.

Any hints or ways you could point me are much appreciated. I'm happy to hunt things down. Thanks again for building this great library.

Score evaluation

I need a way to specify the score result if it fit a minimum, I found that thresh_include not convenient in general, where it depends input length. I think I need to compare the score and its relativity to input length ?

dynamically adjusting json source

In the minimal set up, we have this:

setsource("movies.json");

in my case, I wanted to be something like

setsource("movies.json.php?query"+firstWordOfWhatUserTypedSoFar);

This is because the movies.json in my case 1.6M records and that becomes too much for the js to handle when running your fuzzy search algs. I thought, if I could manage to return a sub set of the movies.json by sending the very first word or whatever he typed up so far, then I can significantly minimize the JSON size being returned by the server.

But I guess this is not possible, or is it?

Sort by Length

Hi Jean,

Another quick check, can i sort the result based on string length instead of alphabetic order.
can we pass in an argument option or do we need to change in FuzzySearch.js

Thanks Again for your work.

Numbers

Small word are currently filtered out. With length 2, we filter words like "a, of".
However when the small word are number they may be significant. Ie 2, 5, 10, 18 ...

Possible solution is to accept any length if first char is a number.

synonyms substitute

Hello Team,

Quick question, do we have a synonym substitute that we could do
for example `'lat' -> 'Geo Coordinates``

--Thanks B

which params should I play with?

Hi Jean,

Please take a look at the 2 screenshots

The first screenshot is showing the search results of "Black Seed and Honey".
The second screenshot shows the same data set when just the "honey" is searched.

Screen Shot 2022-05-10 at 1 34 35 PM

Screen Shot 2022-05-10 at 1 35 31 PM

When honey is searched with "black seed and honey", almost all of its entries are being crashed and pushed down.
When the word "black" is a full match, I cannot criticize it because both "black" and "honey" are 5 letters...
But the following entries should not end up getting more points than the 2nd screenshot I posted above.

Screen Shot 2022-05-10 at 1 46 21 PM

What do I need to achieve so that the one above is less valued than the full word match "honey"?

Also, as the examples below demonstrates, these are very poor matches. What param should be at which value ( in the option ) so the algorithm does not think these are great matches and therefore no scores should be given for those scattered matches.
Screen Shot 2022-05-10 at 1 52 11 PM

As you can easily tell, these have nothing to do with "Black Seed and Honey" and entires like the below ( with perfect & bingo "honey" matches should definitely be above them.

Screen Shot 2022-05-10 at 1 54 34 PM

Sometimes, I think of catching the full results set of the fuzzy search and then run thru a loop ( which I will be writing to weed out the scattered ones and move the bingo words to the top by revalueing them with more scores..

But if we can do this with your params already, then, why get into a second loop and a second scoring process. Right?

License

Please, could you specify under which conditions can we use your library?

Litterals

Sometime it's good to be a little less fuzzy.

  • Filters:
    Prune results using a custom function. Example use case is to filter by a status code.
    Must be done before spending time on computing fuzzy score, must also preserve cache.

  • "literal expression":
    Remove the literal word from the fuzzy search, but build a filter that require it to be present as-is in one of the field (case insensitive indexOf). Let the highlighter be aware of this and act accordingly.

  • field: "literal expression":
    Same, but must exist in this particular field. (see tagged search)

  • literal field:
    Use case is for date / status code. A match is not required. But for this field to contribute to the score, one of the query word must be present (case insensitive indexOf). Does not require usage of "literal" although it should respect it (for example "multi word"). Let the highlighter be aware of this and act accordingly.

  • sigil #word , @word :
    For all use case this syntactic sugar over the tag: syntax. Difference is that it may happen in the middle of a query, and is limited to a single word.

Caching index

Would like to store the Index in local storage once it has built for a lot of entries.
Is there a public API for retrieving / storing?
And any issues known when compressing the index with ez?

score handling

Please take a look at this screenshot.

Screen Shot 2022-03-26 at 6 00 55 AM

You may also play with the actual page:
https://www.islamicity.org/quransearch/lab/scrape-sunnah-com/search/fuzzy-search-client.php and selecting the 2nd option (Search Field: FUZZYSEARCH__topic) . Type Ablution pls and you will see the same thing I posted in the screenshot

My question is...

The first match is a great bingo.. But starting with the 2nd and downward is too far away from the first word. It is better to not show such suggestions. Which settings do I need to play with so that it only shows the first one (Ablutions ) as it has a huge resemblance to what the user has typed and the others far far away from it such as ( Afflictions )?

I think this is a hard one.. Cause trying to make that view better may result sacrificing some of the good suggestions..
If this is not much possible, let me ask you something else...

Why in your demo the entire word Brain is bolded. Should not only the applicable letters to be bold?
Here is the view:
Screen Shot 2022-03-26 at 6 11 20 AM

Should not we expect only the b and r and a to be in bold but not the i and n?

intelligently caching the JSON data

For setting the source and initializing the fuzzyhound, we do this.

function setsource(url, keys, output) {

    $.getJSON(url).then(function (response) {
        fuzzyhound.setOptions({
            source: response,
            keys: keys,
            output_limit: 200,  // Return up to N result, 0 to disable
            //output_map: output,
            output_map: function(root){ final_item = removeTags(root.item); console.log (final_item); return final_item; },    
        })
    });
}


```Let's assume our url is this 

`blah.blah.com/blah.js 
`
And we have also this:  

`$.ajaxSetup({cache: true});
`
This is great for efficiency but, what if the blah.js is changed on the server, ( say a spelling error has been fixed in it ), how do you make sure people who already cached the blah.js earlier in their browser gets the latest and greatest blah.js? 

Would the following trick work?

`url = 'blah.blah.com/blah.js?version=' .getFileModificationDate("... /blah.js");
`
So whenever the file changes, I can change the version number on the URL of the setSource, either programmatically in an unattended fashion or manually. 
 
In that case, would  "$.ajaxSetup({cache: true}); " still be serving the older file? 

Or is there a better way to address this intelligent caching?

`remove()` to complement `add()` ?

Hello, @jeancroy I LOVE your library! It's fantastic. So much better than the many other popular fuzzy search libraries.

A lot of your writing on the theory and algorithms implementation goes over my head and I don't have time to get into that, but I love that it's there. The mere presence of extensive intellectual stuff gives me extra confidence in your library, plus I can tell you really value the things you create, and aim to produce stuff that works well. So I give your library many stars! ⭐ ⭐ ⭐ ⭐ ⭐ ⭐

I also LOVE how you have an add method, it fits my use case perfectly. My use case is adding documents (and queries, but they are shorter, and are easier to handle), to an index inside of a desktop app that archives your browsing history so you can read it when you are offline. I recently added full text search to this (using flexsearch and ndx), but while these are good (and serializedable, to save time on restarting the app), they do not support typos. So I went searching for a fuzzy library, and after a few false starts, ended up here (actually via the GitHub search, not by npm search nor by Google search). I'm really thankful I did end up here.

Even if you can't implement any of these things (such as serialization, or remove), I am still immensely grateful for your library. Thank you for crafting such a thing of beauty and releasing into the world for all to use!

Anyway, I'd like there to be a remove, but I know it's not your job to make it, and I am sure you are in any case super busy. But let me just describe it here for the sake of the issue.

If I added a document to the index using add(), and then later removed() that document from the index, the document would not show up in search results, nor in serializations (such as they may be) of the index.

My particular motivation for this feature is that very often, a webpage (with the same URL) will add additional content (such as comments on a story), and my archiver would like to index those so that the person using the archiver can get (and search for) the most up-to-date version of the content they have seen.

Right now my solution is simply to discard all fz-search results that are older duplicates, but I am worried about the scalability of this, so have delayed deploying it to production. This is because people may re-index a page on the change of its content many times, but if I never remove those pages from the fuzzy search index, my index will end up dominated by pages that I no longer need and are out of date. So you can see I'm very scared of this possibility! It would be terrible!

But it's OK. If that's the case that you cannot implement a remove (or even serialization--I commented on another issue #21 ), I will still use your wonderful fz-search library for providing typeahead and suggestion for search queries (just like Google search), rather than on results. It will be a little sad to not be able to search with typos, but I'm sure I can figure out another way to do it somehow.

Please delete this issue if you already have this functionality (how embarrassing! 😱 )

Performance

Hello,

I was planing on rewriting fuzz-aldrin-plus to c++, until I found this library. It seems to be more suited for my case, however what matters to me is performance. Is it more or less like in fuzz-aldrin-plus? I would test performance myself, but I'm unable to do it for a couple of days (and I can't get this question out of my head).

Tuning for large datasets

Hi. First, thank you for building this promising library, and thanks for your work on fuzzaldrin-plus as well.

My data set is 1mm records (natural language document titles), and I'm providing an auto-complete search that only needs to return 15 results. With fuzzaldrin-plus, there's a settings maxInners which is detailed here. I've found that with 1mm records, this fuzzaldrin-plus config works amazingly well, returning 15 results from 1mm records in 1-2ms.

let fuzzaldrinOptions = {
  debug: false,
  maxResults: 15,
  usePathScoring: false,
  maxInners: 50,
  key: 'title',
};

This is due to the maxInners option and its High positive count mitigation as detailed in the README. Basically, my interpretation is that maxInners tells the algorithm to abort scoring early in the process if there are a ton of potential matches (i.e. searching for "a" across 1mm records).

I'm looking at the tuning options for FuzzySearch. Is there a similar mitigation in FuzzySearch that lets me efficiently escape from a large number of matches early? I tried playing with the thresh_relative_to_best setting, thinking that putting it higher (i.e. 0.8) would select from a smaller set of records, but that didn't change the performance significantly.

Any advice you could provide on tuning for a search across a large number of records would be very helpful. Maybe FuzzySearch doesn't have that mechanism, as it looks like maxInners was added as a later performance enhancement PR to fuzzaldrin-plus, but I figured it was worth asking.

Thanks again for your work on this library.

how do you select the matcing items?

Hi, when the algorithm suggests a few items, how do you select one so it is added into the textbox or submit that selection to my server? By default, when I click/choose an item, the text box gets cleared immediately and the suggestions drop-down disappears. I would like it to behave like a typical Google search... In google, when you type, google makes some auto-suggestions, and as soon as you select an item on the suggestions list, you end up firing a Google search for that selection. Is there something like onClick, onSelect type hook, that passes the selection to a js function as inout and from there, I do whatever I can???

updating the index in the middle of search

Hi Jean,

I've got a 13,000 item JSON. and works great....
but sometimes, I need to add, say 1000-to-2000 more items if I detect a certain pattern in the user input!

The example, you give in the docs is as follows but it is only for a single item addding.

    var data = ["survey","surgery","insurgence"];
    var searcher = new FuzzySearch({source:data});
    var query = "assurance";
    var result = searcher.search(query)
    var newEntry = "surgeon";
    searcher.add(newEntry);
    var newResult = searcher.search("surgeo");

And, this example is only for a single dimension array!
In my case, the json array is multiple item! And I specify which key to search for too ( thru the final arg of setsource function )

When adding 1000 or more items, repeating code such as var newEntry = "surgeon";searcher.add(newEntry); 1000 times does not make sense... I also do not know how to point to a key in a more complex json array.

Is there a way in the API to tell the engine, "here! add this JSON file on top of the first json loaded! " or perhaps, completely change the json with another json in the middle of the search..

Again, the scenario is something like this..

User types a string, typeahead start working based on the original json data set...
user input reaches the 4th word (and because it is a long input or my algorithm detects a certain pattern in the user input), I then replace the org json with another data set, so that the input at that moment is searched against the new json data set... And if the user starts removing words (or deletes that pattern), I can load the first json...

Can I do something like this?

how to honor the matches in short entries greater than long entries?

Is there a built-in option so that when experimented with it, the density of the number of matching characters ( I mean the highlighted matches in red ) is honored more compared to the lengthier entries? I think that would increase the relevancy automatically.

Example screenshot:

Here, I searched for Hajj ... ( in 1.8Mb 13,000 items file ) And the most top of the line entries ( which are below ) ended up around at 100th or so in the suggested items.

  • Hajj (Pilgrimage) 247 @sahih al-Bukhari
  • Hajj: Hajj 261 @Muwatta Malik

They are pretty short and bingo like matches yet plenty of long ones were preceding them.

How can I easily rise them to the top? Or at least near to the top?

Here are the winners...

Screen Shot 2022-04-11 at 11 45 09 AM

Here are the poor little ones being crashed by the winners:

Screen Shot 2022-04-11 at 11 45 26 AM

Clearly, the red density on those short ones are noticeably higher.

Especially this guy:

Screen Shot 2022-04-11 at 11 59 08 AM

What do you think Jean?

I have a feeling this has to do with your compare method.
The solution may be it, but if so, how do I create that comparison?

I use

bonus_match_start: 0.6, 
highlight_bridge_gap: 0 

Must see it in practice, but what I'm asking for could create a tremendous difference in quality especially when main topics and sub-topics are searched like in my case.

Highlight exact search query

Characters that are not present in a search query get highlighted:

Code

import FuzzySearch from 'fz-search'; // v1.0.0

const nodes = [
  {
    value: 'primer'
  },
];

const fuzzy = new FuzzySearch({
  source: nodes,
  keys: 'value',
});
  
const [result] = fuzzy.search('pre').map(
  item => fuzzy.highlight(item.value)
);

console.log(result);

Actual result

<strong class="highlight">prime</strong>r

primer

Expected result

<strong class="highlight">pr</strong>im<strong class="highlight">e</strong>r

primer

Problem

As you can see, symbols im aren't present in a search query, but they are still highlighted.

Is it possible to avoid highlighting of symbols that are not present in a search query?

Scoring items higher / grouping results

Hi Jean, thank you for Fuzzysearch, it's excellent. My feed contains categories and products, I'm wondering if...

  1. I can give a higher score/priority to specific items, so they appear before anything else.
  2. I can group the results, so that matched categories always appear before matched products.

My feed looks like this...

[
{ "type": "P", "id": "123", "title": "Product Name", "descr": "Description", "sku": "ABC123", "price": "19.95", "image": "./images/image.jpg", "category": "Category Name" },
...
{ "type": "C", "id": "456", "title": "Category Name", "descr": "Description" },
...
]

Note, I'm using Twitter typeahead. Grateful for any advice you can give.
Phil

How can i do it with post method? and mysql

I need data from database via post method, so how can i do that?
I've just try as follow:
`<script>
var fuzzyhound = new FuzzySearch();

    $('#demo-query').typeahead({
        minLength: 2,
        highlight: false //let FuzzySearch handle highlight
    },
    {
        name: 'movies',
        source: fuzzyhound,
        templates: {
            suggestion: function(result){return "<div>"+fuzzyhound.highlight(result)+"</div>"}
        }
    });
$.ajaxSetup({cache: true});
function setsource(url, keys, output) {
$.post( url, { query:query,_csrf:_csrf })
    .done(function( response ) fuzzyhound.setOptions({
            source: response,
            keys: keys,
            output_map: output
        })
    });    
}
setsource("test.php");

</script>`
How can i define the keyword (query)?
Thanks

Node.js support?

I notice there's at least one place the library uses window.performance, which fails in node. It's also not published to npm... But it sure looks like a very nice module for autocompletion amongst other things. Any reason it doesn't support node and/or isn't published?

debug mode

Hi Jean,
Just to see which item matched with what score, is it possible modify the code so I can see numbers like

one flew over the cuckoos nest (1975) @ 78 // biggest score in this case was 78
one false move ( 1992) 54 // here the score is 54 and so on...
...
last item 20 // here the last item's score is 20 because field_good_enough setting is set to 20

If I see what score items are matching, I can play with field_good_enough more effectively to fine tune the numbers.

Ability to index documents incrementally

This patch provides the ability to add documents incrementally by using a new add method in conjunction with an identifyItem function. It does this by keeping track of the item ids in the indexmap dictionary where the values are the position into the index of where the item lives.

0001-Provide-the-ability-to-incrementally-add-to-the-inde.patch.zip

Additionally, this patch moves grunt into devDependencies instead of as a dependency.

0002-Make-grunt-a-devDependency.patch.zip

give higher ranking to full words?

i have these items to search through

  • I would like a SeaPass today
  • Can I buy Air2Sea tickets
  • i want to go to the sea

when I search "sea" the results are in this order

  • Can I buy Air2Sea tickets
  • I would like a SeaPass today
  • i want to go to the sea

I would like to give full words that are in the query that match higher ranking is this possible?

here are my configurations

  output_limit: 6,
  output_map: 'alias',
  source: this.items,
  thresh_include: 8,
  minimum_match: 1,
  bonus_token_order: 10,
  bonus_match_start: 10,
  bonus_position_decay: 0,
  highlight_bridge_gap: 2,
  highlight_prefix: true,
  thresh_relative_to_best: 0.5

Search multiple keyword

Hi...

Beforehand, I am really thank to you because of creating this library. This library help me to search keyword based on similarity.
My question :

  1. is this supporting for searching multiple keyword ? I wanna search keyword with "R and B" and "R&B"
  2. is this supporting for searching some keyword like "R and B" => "R&B" ? I wanna improve my searching like 1 => one or 12 => twelve.

"Sorry for my bad grammar"
Thanks

hiding html comments

Is it possible to let fuzzy-search search items like

United States <!--America-->

so the user can type America and match it but only see this in the search results:

United States

This way, we can add lots of helper keys or synonyms in HTML comments to help the search but not confuse the user with those helper words..

Think of it like this

user types "below" and you match "under" because of this:
under <!--below underneath -->

or user types "heaven" or "gardens of eden" and you match "paradise" because of this:
paradise <!--heaven, gardens of eden -->]

etc..

Index Structure

Current Situation

The classic edit distance algorithm is O(NMK)
Where N is number of item. M is average length of item and K the length of search term.

With bit parallelism we are able to shave K.
Which bring us closer to O(N*M) for a single word query.

We still have to visit every character of every items !

We would like an index structure that can reduce N.
That is pre-selecting some good candidates, on which we do the slower but higher quality scoring.

Index structure

We can improve over that using an index data structure.

They basically amount to pre-generating a number of spelling error that we can accept, and organize those results in an efficient way.

Symmetric Delete.

Out of the misspelling we can pre-compute, delete is one of the easiest. In particular because it does not require knowledge of the alphabet where the word exist.

The symmetric delete algorithm apply delete variation computation to both candidate and search term.

3 out of 5 strategy.

One way to understand symmetric delete, is that we are trying to match X out of Y first characters of the word. To do so we generate every subset of X characters at a given delete-distance.

For example, two deletes variations of "hello" and "hills" are:

// del2("hello")
[ 'llo', 'elo', 'ell', 'hlo', 'hll', 'heo', 'hel' ]

// del2("hills")
[ 'lls', 'ils', 'ill', 'hls', 'hll', 'his', 'hil' ]

They share the hll key, as they indeed match from a 3 out of 5 standpoint.

// del2("jello")
[ 'llo', 'elo', 'ell', 'jlo', 'jll', 'jeo', 'jel' ]

Jello and hello share llo, elo, ell keys. And thus is a better match than hello and hills.

Prefix and fallback

For longer words we simply take the first Y characters.

For smaller words we have fallback X out of Y strategies.

We also compute smaller X strategy for long words. (This allow smaller search query to match and is a difference between spell-check and autocomplete.)

Strategies

We following principle "editorial choice"

  • Match must happen toward the start of the word
    (With start that reset at each word)

  • Match must include at least half the characters
    (And more is better)

We consider the following strategy X out of Y:

(XoY) 
3o6,3o5,3o4,3o3
2o4,2o3,2o2
1o1 

Note that, for example, 2o4 include every variation included in 2o3 and some more. So for every X, we only compute one XoY strategy, with the largest Y available.

1o1 index the word by the first letter, and 2o2 is not included because we only consider word > 3 chars.

Those rules are subject to change as we gain more feedback.
For example 3o6 might generate too many variations.

For the search term we compute a single X out of Y strategy, so we use as many character typed as possible.

Absolute maximum on keys with above strategies ans a-z alphabet:
26^3 + 26^2 + 26 = 18278 keys.

Multi word

The multi-word strategy consist of splitting input in word and looping the key generation strategy on each of those words. All of the collected word part point to the same index entry.

This mean we'll have false positive where an entry can match the query using mix and match of word parts.

For now we accept this deficit because it simplify the index structure and it as a pre-selector we prefer to approximate toward false positive, than false negative.

A proper fix might involve two maps:
key => word and word =>entry

This might be needed if each entry is a large corpus and the pre selection part is rendered useless because of word-part soup.

In which case:

For every word of query:

word-part => word => filter for minimum quality.

Put those word together and retrieve entries:

word => entry => filter for minimum quality.
entry => refined score => filter for minimum quality.

Additional reading

This was inspired by this blog post, in particular they make very bold speed claim and the algorythm is relatively easy to understand.

http://blog.faroo.com/2012/06/07/improved-edit-distance-based-spelling-correction/

Considerations.

Javascript come built in with a good hashmap implementation, and we are taking advantage of that.

Data structure is mostly flat compared to say a radix tree, and that can help import/export and thus offset some of the preparation time.

As implemented the index structure has some measure of quality. It might be enough for certain use (IE do not need to fold that into FuzzySearch)

If this is implemented (and turned of) FuzzySearch will become less fuzzy. It may be a good thing. Or if not, it may be a preferable trade off to something like max-inner.

Implementation.

I've made a proof of concept of the above.
https://gist.github.com/jeancroy/9187627975a2632a8f8e4aaae17d358e

The work of integrating something like this into FuzzySearch only make sens if it is both much faster and fast enough for our need.

multiple occurrances of a highlight makes no diff

Hi Jean,

Please take a look at the screen shot which demonstrates the issue more visually.

In my demo, I type "Belief", And many Belief entries match but I want one that has the max "Belief" words to match as #1.

For that, I created a "belief" item in which I put 4 "Belief" words back to back.
But it is still not #1.

Is there a setting to move that kind of match higher?

Screen Shot 2022-04-01 at 8 34 23 PM

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.