hamedfathi / simmetricscore

A text similarity metric library covering edit-distance measures (Levenshtein, Gotoh, Jaro, etc.) as well as other metrics (e.g. Soundex, Chapman). The library targets .NET Standard and ships with a number of useful extension methods.

License: MIT License

C# 100.00%
levenshtein metrics similarity similarity-score csharp dotnet metric levenshtein-distance jaro jaro-distance

simmetricscore's Introduction

Install-Package SimMetricsCore

dotnet add package SimMetricsCore

SimMetricsCore supports the following algorithms:

BlockDistance
ChapmanLengthDeviation
ChapmanMeanLength
CosineSimilarity
DiceSimilarity
EuclideanDistance
JaccardSimilarity
Jaro 
JaroWinkler 
Levenstein // Default
MatchingCoefficient 
MongeElkan
NeedlemanWunch 
OverlapCoefficient
QGramsDistance
SmithWaterman 
SmithWatermanGotoh
SmithWatermanGotohWindowedAffine

Extension Methods

// GetSimilarity
// [0-1] => [0%-100%] similarity 
double GetSimilarity(this string firstWord, string secondWord, SimMetricType simMetricType = SimMetricType.Levenstein, bool convertToPercentage = false)

// GetSimilarities
// Get similarity score for each input.
IEnumerable<SimMetricResult> GetSimilarities(this string first, IEnumerable<string> second, SimMetricType simMetricType = SimMetricType.Levenstein, bool convertToPercentage = false)
IEnumerable<SimMetricResult> GetSimilarities(this string first, string[] second, SimMetricType simMetricType = SimMetricType.Levenstein, bool convertToPercentage = false)

// GetMinSimilarity
// Returns the first item that has the least similarity.
string GetMinSimilarity(this string first, IEnumerable<string> second, SimMetricType simMetricType = SimMetricType.Levenstein, bool convertToPercentage = false)
SimMetricResult GetMinSimilarityInfo(this string first, IEnumerable<string> second, SimMetricType simMetricType = SimMetricType.Levenstein, bool convertToPercentage = false)

// GetMinSimilarities
// Returns all items that share the lowest similarity score.
// The result can hold several distinct items when their scores are tied.
IEnumerable<string> GetMinSimilarities(this string first, IEnumerable<string> second, SimMetricType simMetricType = SimMetricType.Levenstein, bool convertToPercentage = false)
IEnumerable<SimMetricResult> GetMinSimilaritiesInfo(this string first, IEnumerable<string> second, SimMetricType simMetricType = SimMetricType.Levenstein, bool convertToPercentage = false)

// GetMaxSimilarity
// Returns the first item that has the most similarity.
string GetMaxSimilarity(this string first, IEnumerable<string> second, SimMetricType simMetricType = SimMetricType.Levenstein, bool convertToPercentage = false)
SimMetricResult GetMaxSimilarityInfo(this string first, IEnumerable<string> second, SimMetricType simMetricType = SimMetricType.Levenstein, bool convertToPercentage = false)

// GetMaxSimilarities
// Returns all items that share the highest similarity score.
// The result can hold several distinct items when their scores are tied.
IEnumerable<string> GetMaxSimilarities(this string first, IEnumerable<string> second, SimMetricType simMetricType = SimMetricType.Levenstein, bool convertToPercentage = false)
IEnumerable<SimMetricResult> GetMaxSimilaritiesInfo(this string first, IEnumerable<string> second, SimMetricType simMetricType = SimMetricType.Levenstein, bool convertToPercentage = false)

// ContainsFuzzy
// Raising 'threshold' toward 1.0 makes the comparison stricter.
bool ContainsFuzzy(this string source, string search, double threshold = 0.7, SimMetricType simMetricType = SimMetricType.Levenstein)
// Returns the 'source' items that pass the comparison.
IEnumerable<string> ContainsFuzzy(this IEnumerable<string> source, string search, double threshold = 0.7, SimMetricType simMetricType = SimMetricType.Levenstein)
IEnumerable<string> ContainsFuzzy(this string[] source, string search, double threshold = 0.7, SimMetricType simMetricType = SimMetricType.Levenstein)

SimMetricResult class contains the following data:

public class SimMetricResult
{
    public string Item { get; set; }
    // [0-1] => [0%-100%] similarity 
    public double Score { get; set; }
}
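
To make the [0-1] score range and the 'threshold' parameter more concrete, here is a minimal, self-contained sketch of one common way to turn a Levenshtein edit distance into a normalized similarity. It is an illustration only and not SimMetricsCore's implementation: the class `SimilaritySketch` and its methods `EditDistance`, `Similarity` and `AtLeast` are hypothetical names, and the library may normalize its scores differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for illustration; NOT part of SimMetricsCore.
public static class SimilaritySketch
{
    // Classic Levenshtein edit distance (insert, delete, substitute each cost 1),
    // computed with two rolling rows instead of a full matrix.
    public static int EditDistance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }
            // Swap rows so 'previous' always holds the last completed row.
            (previous, current) = (current, previous);
        }
        return previous[b.Length];
    }

    // Normalized similarity in [0, 1]: 1 - distance / length of the longer string.
    public static double Similarity(string a, string b)
    {
        int longest = Math.Max(a.Length, b.Length);
        if (longest == 0) return 1.0; // two empty strings are treated as identical
        return 1.0 - (double)EditDistance(a, b) / longest;
    }

    // Keeps the candidates whose similarity to 'search' reaches the threshold.
    public static IEnumerable<string> AtLeast(
        IEnumerable<string> candidates, string search, double threshold)
        => candidates.Where(c => Similarity(c, search) >= threshold);
}
```

Under this sketch, `SimilaritySketch.Similarity("kitten", "sitting")` is 1 - 3/7, roughly 0.571, so the pair would not pass a threshold of 0.7, while "kitten" against "sitten" scores 1 - 1/6, roughly 0.833, and would.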

Icons made by Freepik from www.flaticon.com
