lithammer / fuzzysearch Goto Github PK
View Code? Open in Web Editor NEW:pig: Tiny and fast fuzzy search in Go
Home Page: https://pkg.go.dev/github.com/lithammer/fuzzysearch
License: MIT License
:pig: Tiny and fast fuzzy search in Go
Home Page: https://pkg.go.dev/github.com/lithammer/fuzzysearch
License: MIT License
Hello, I think creating the Normalize and NormalizaFold functions might be a good idea.
When you want to reuse a []target you don't need to normalize it again. Just normalize once and then normalize source and use Find().
In fzf
, for example, there's so called "exact search mode", triggered by '
, which searches the exact phrase provided. Would it be possible to have something like this in fuzzysearch
?
I.e., as of now, searching the string ae
in archive, arkenfox, aerc, zambezi
will return all four candidates. However, if an option for strict matching could be added, the result would be only one, aerc
.
Why needed? Well, because exactly of this, sometimes there are just too many candidates :)
What do you think?
First of all, what an awesome lib! Thanks for the work. But I do have this design doubt:
Let's say I wanna compare strings two by two to a target using RankMatchNormalizedFold
, for example:
s1 := "Hello Wor"
s2 := "Hello"
target := "hello world"
fmt.Println(fuzzy.RankMatchNormalizedFold(s1, target)) //2
fmt.Println(fuzzy.RankMatchNormalizedFold(s2, target)) //6
I could easily use math.Min(2,6) -> "Hello Wor"
to find the best string in this case. But it would get me in trouble if any of the strings don't match, eg:
s1 := "foo"
s2 := "Hello"
target := "hello world"
fmt.Println(fuzzy.RankMatchNormalizedFold(s1, target)) //-1
fmt.Println(fuzzy.RankMatchNormalizedFold(s2, target)) //6
Where math.Min(-1,6) -> "foo"
Why using -1 instead of math.MaxInt
if the lower the distance, the closer the strings?
Hello everyone.
I want to ask you why this library uses string type for targets, not a Stringer interface or other. I think devs more often have to search in a slice of objects than in an array of raw strings. I suppose that this change type for targets will be useful.
Hi,
Thank you for maintaining this repo.
Was wondering if you could create a new release please? The last release seems to be using go 1.15 and was in Oct 2020.
Thanks.
It seems it's trying to reslice with invalid index in v1.1.5
panic: runtime error: slice bounds out of range [5:3]
goroutine 399472 [running]:
golang.org/x/text/transform.String({0xe4c3d0, 0x13ce5c0}, {0xc00042d358, 0x3})
/vendor/golang.org/x/text/transform/transform.go:650 +0xbe5
github.com/lithammer/fuzzysearch/fuzzy.stringTransform({0xc00042d358, 0x3}, {0xe4c3d0?, 0x13ce5c0?})
/vendor/github.com/lithammer/fuzzysearch/fuzzy/fuzzy.go:243 +0x5a
github.com/lithammer/fuzzysearch/fuzzy.match({0xc00042d358?, 0x44fe72?}, {0xc0007a1950, 0x13}, {0xe4c3d0, 0x13ce5c0})
When I run go mod tidy
I get the following error:
go mod tidy
go: finding module for package github.com/lithammer/fuzzysearch
project/cmd imports
github.com/lithammer/fuzzysearch: module github.com/lithammer/fuzzysearch@latest found (v1.1.2), but does not contain package github.com/lithammer/fuzzysearch
The import is not resolving correctly in my project. Any ideas?
Hi,
Damerau–LevenshteinDistance is the minimum number of edit operations ( insertions, deletions ,substitutions and transposition) required to change one word to another.It has one more action(transposition) compared to LevenshteinDistance.
I think Damerau–LevenshteinDistance is more practical than LevenshteinDistance. And it's time complexity is also O(N^2), but unfortunately it's space complexity is O(N^2) while LevenshteinDistance's space complexity is O(N)
Do you think it's an acceptable change? If so, I would like to raise a PR to change LevenshteinDistance to Damerau–Levenshtein distance.
Current implementation assumes that strings are indexed by characters, this assumption holds only for 1-byte UTF8 subset (latin alphabet); Go strings are UTF8 strings, so proper way of iterating over characters would be using range
. Right now for non-latin strings this would search byte distance, rendering this package useless (in a non-obvious way).
README.md says:
fuzzy.RankMatch("kitten", "sitting") // 3
But RankMatch
returns -1
always.
package main
import (
"fmt"
"github.com/lithammer/fuzzysearch/fuzzy"
)
func main() {
src := "kitten"
target := "sitting"
m := fuzzy.RankMatch(src, target)
fmt.Println(m)
}
-1
The min
function used in the Levenshtein distance algorithm can be improved by using the CMOV
instruction instead of conditional jumps. The updated function produces better performance than the original min
function.
link: https://gcc.godbolt.org/z/EazrW9zKb
func min2(a, b int) int {
if a < b {
return a
}
return b
}
func min3(a, b, c int) int {
return min2(min2(a, b), c)
}
@Renstrom , I'd like to add some facility for case insensitivity would that be find with you?
I noticed that when doing Match with no normalization or folding the nop transformer does a unecessary roundtrip string -> []byte -> string, this results in 2 allocations, in my case these 2 allocs per operation end up taking 40% of the CPU time and adding a lot of GC pressure
I made a patch that fixes the issue by avoiding the transform.Nop (all tests passed, and the benchmarks confirmed 0 allocs)
noallocnop.txt
Hi,
I'm planning of using your library, but in order to have no future build-issues, I would like to pin versions to avoid the obvious problems. Would you mind creating branches/tags for your releases? The (simple) instructions are on: https://labix.org/gopkg.in
Cheers,
Mark
Can we get a implementation where abc xyz
and xyz abc
match
Hi, thanks for this library!
Currently, it seems that MatchFold can be used for case insensitive matching, but there doesn't seem to be any way to do case insensitive RankFind.
RankFind
still uses plain Find
to determine initial results, then measures Levenshtein distance afterwards. Maybe we can have another RankFindLevenshtein
that uses Levenshtein distance to determine results.
Add this test case to var fuzzyTests
, and run the tests:
{"\xffinvalid UTF-8\xff", "", false, -1},
Result:
--- FAIL: TestFuzzyMatchFold (0.00s)
panic: runtime error: slice bounds out of range [19:15] [recovered]
panic: runtime error: slice bounds out of range [19:15]
goroutine 35 [running]:
testing.tRunner.func1.2({0x1031423e0, 0x14000114018})
/Users/josh/go/1.20/src/testing/testing.go:1526 +0x1c8
testing.tRunner.func1()
/Users/josh/go/1.20/src/testing/testing.go:1529 +0x384
panic({0x1031423e0, 0x14000114018})
/Users/josh/go/1.20/src/runtime/panic.go:884 +0x204
golang.org/x/text/transform.String({0x103153b28, 0x103297c60}, {0x1030cb139, 0xf})
/Users/josh/pkg/mod/golang.org/x/[email protected]/transform/transform.go:650 +0x9e4
github.com/lithammer/fuzzysearch/fuzzy.stringTransform({0x1030cb139, 0xf}, {0x103153b28?, 0x103297c60?})
/Users/josh/x/fuzzysearch/fuzzy/fuzzy.go:242 +0x64
github.com/lithammer/fuzzysearch/fuzzy.match({0x1030cb139?, 0x7?}, {0x0, 0x0}, {0x103153b28, 0x103297c60})
/Users/josh/x/fuzzysearch/fuzzy/fuzzy.go:55 +0x38
github.com/lithammer/fuzzysearch/fuzzy.MatchFold(...)
/Users/josh/x/fuzzysearch/fuzzy/fuzzy.go:41
github.com/lithammer/fuzzysearch/fuzzy.TestFuzzyMatchFold(0x1400011cb60)
/Users/josh/x/fuzzysearch/fuzzy/fuzzy_test.go:65 +0xbc
testing.tRunner(0x1400011cb60, 0x103152088)
/Users/josh/go/1.20/src/testing/testing.go:1576 +0x10c
created by testing.(*T).Run
/Users/josh/go/1.20/src/testing/testing.go:1629 +0x368
exit status 2
FAIL github.com/lithammer/fuzzysearch/fuzzy 0.131s
This existed prior to #53 (phew!). The root cause is that unicodeFoldTransformer.Transform is returning n, n, err
, but when utf8.RuneError is present, nSrc
may differ from nDst
. I'll try to put together a fix sometime soonish.
(Found by fuzzing. Once the fuzz tests make it out of the gate without stumbling, I'll PR them.)
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.