FuzzyWuzzy

Fuzzy string matching for PHP, based on the Python library of the same name.

Requirements

  • PHP 5.4 or higher

Installation

Using Composer

composer require wyndow/fuzzywuzzy

Usage

use FuzzyWuzzy\Fuzz;
use FuzzyWuzzy\Process;

$fuzz = new Fuzz();
$process = new Process($fuzz); // $fuzz is optional here, and can be omitted.

Simple Ratio

>>> $fuzz->ratio('this is a test', 'this is a test!')
=> 96

Partial Ratio

>>> $fuzz->partialRatio('this is a test', 'this is a test!')
=> 100

Token Sort Ratio

>>> $fuzz->ratio('fuzzy wuzzy was a bear', 'wuzzy fuzzy was a bear')
=> 90
>>> $fuzz->tokenSortRatio('fuzzy wuzzy was a bear', 'wuzzy fuzzy was a bear')
=> 100

Token Set Ratio

>>> $fuzz->tokenSortRatio('fuzzy was a bear', 'fuzzy fuzzy was a bear')
=> 84
>>> $fuzz->tokenSetRatio('fuzzy was a bear', 'fuzzy fuzzy was a bear')
=> 100

Process

>>> $choices = ['Atlanta Falcons', 'New York Jets', 'New York Giants', 'Dallas Cowboys']
>>> $c = $process->extract('new york jets', $choices, null, null, 2)
=> FuzzyWuzzy\Collection {#205}
>>> $c->toArray()
=> [
     [
       "New York Jets",
       100,
     ],
     [
       "New York Giants",
       78,
     ],
   ]
>>> $process->extractOne('cowboys', $choices)
=> [
     "Dallas Cowboys",
     90,
   ]

extractOne also accepts a scorer as its fourth argument, given as a PHP callable such as [$fuzz, 'ratio'].

>>> $process->extractOne('cowbell', $choices, null, [$fuzz, 'ratio'])
=> [
     "Dallas Cowboys",
     38,
   ]
>>> $process->extractOne('cowbell', $choices, null, [$fuzz, 'tokenSetRatio'])
=> [
     "Dallas Cowboys",
     57,
   ]
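
The [$fuzz, 'ratio'] argument above is an ordinary PHP array callable: an object paired with a method name. As a rough sketch of how a function can accept such a scorer, the following uses a hypothetical pickBest helper and a hypothetical ExampleScorer class built on PHP's similar_text(). Neither is part of FuzzyWuzzy, and the scores will not match those of Fuzz::ratio.

```php
<?php
// Hypothetical helper, not part of FuzzyWuzzy:
// returns [choice, score] for the best-scoring choice.
function pickBest($query, array $choices, callable $scorer)
{
    $best = null;
    foreach ($choices as $choice) {
        // call_user_func accepts closures, function names,
        // and [object, 'method'] pairs alike.
        $score = call_user_func($scorer, $query, $choice);
        if ($best === null || $score > $best[1]) {
            $best = [$choice, $score];
        }
    }
    return $best;
}

// Hypothetical stand-in scorer built on similar_text(); it is not Fuzz::ratio.
class ExampleScorer
{
    public function percent($a, $b)
    {
        similar_text(strtolower($a), strtolower($b), $percent);
        return (int) round($percent);
    }
}

$choices = ['Atlanta Falcons', 'New York Jets', 'New York Giants', 'Dallas Cowboys'];
$scorer = new ExampleScorer();
$best = pickBest('new york jets', $choices, [$scorer, 'percent']);
echo $best[0], ' ', $best[1], "\n"; // New York Jets 100
```

Any callable with the same two-string signature could be swapped in without changing pickBest.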

Caveats

Unicode strings may produce unexpected results. We intend to correct this in future versions.
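
The README does not say which string functions cause this. As a general illustration only, the sketch below shows how PHP's byte-oriented built-ins miscount a UTF-8 string. The utf8Chars helper is hypothetical and not part of the library.

```php
<?php
// Hypothetical helper: split a UTF-8 string into characters
// using PCRE's /u modifier (no mbstring extension needed).
function utf8Chars($s)
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

$word = "caf\xC3\xA9"; // "café" encoded as UTF-8; the last character is 2 bytes

$byteLength = strlen($word);           // counts bytes
$charLength = count(utf8Chars($word)); // counts characters

// levenshtein() compares bytes, so turning the 2-byte "é" into
// the 1-byte "e" costs 2 edits instead of 1.
$byteDistance = levenshtein($word, 'cafe');

echo $byteLength, ' ', $charLength, ' ', $byteDistance, "\n"; // 5 4 2
```

A score computed from byte counts like these would understate the similarity of strings that differ only in accented characters.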

Contributors

mcrumm

fuzzywuzzy's Issues

Get the matching tokens list

Hi, thanks for this wonderful library. I am using token_set_ratio for my task. When a match exceeds a threshold, I would also like to get the tokens that matched. Is there a way to get the list of matching tokens from token_set_ratio?

Update namespace

Use FuzzyWuzzy as the root namespace to more closely match the original library.

Help with a basic example

Can you please provide a simple usage example? The one you provided is for Python, I guess. I'm not getting any output on my screen.
Thanks.

PHP 8.2.8 Fatal error: During inheritance of ArrayAccess

In PHP 8.2.8 I'm getting the following error:

PHP Fatal error:  During inheritance of ArrayAccess: Uncaught ErrorException: Return type of FuzzyWuzzy\Collection::offsetExists($offset) should either be compatible with ArrayAccess::offsetExists(mixed $offset): bool, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /var/www/html/config/bootstrap.php:25

Stack trace:
#0 /var/www/html/vendor/wyndow/fuzzywuzzy/lib/Collection.php(10): {closure}()
#1 /var/www/html/vendor/composer/ClassLoader.php(576): include('...')
#2 /var/www/html/vendor/composer/ClassLoader.php(427): Composer\Autoload\{closure}()
#3 /var/www/html/vendor/wyndow/fuzzywuzzy/lib/StringProcessor.php(56): Composer\Autoload\ClassLoader->loadClass()
#4 /var/www/html/vendor/wyndow/fuzzywuzzy/lib/Fuzz.php(272): FuzzyWuzzy\StringProcessor::split()
#5 /var/www/html/vendor/wyndow/fuzzywuzzy/lib/Fuzz.php(255): FuzzyWuzzy\Fuzz->processAndSort()
#6 /var/www/html/vendor/wyndow/fuzzywuzzy/lib/Fuzz.php(180): FuzzyWuzzy\Fuzz->tokenSort()
#7 /var/www/html/src/Library/FullTitleMatcher.php(98): FuzzyWuzzy\Fuzz->tokenSortRatio()
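
The message refers to the PHP 8.1+ deprecation of ArrayAccess methods declared without return types. Judging from the trace, an error handler at config/bootstrap.php:25 appears to convert that deprecation notice into an ErrorException. The sketch below shows the attribute the message suggests, applied to a hypothetical ExampleCollection class. It is not the source of FuzzyWuzzy\Collection. On PHP versions before 8.0 a line starting with #[ is read as a comment, so the class still parses there.

```php
<?php
// Hypothetical minimal class for illustration only.
class ExampleCollection implements ArrayAccess
{
    private $elements = [];

    #[\ReturnTypeWillChange]
    public function offsetExists($offset)
    {
        return isset($this->elements[$offset]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        return isset($this->elements[$offset]) ? $this->elements[$offset] : null;
    }

    #[\ReturnTypeWillChange]
    public function offsetSet($offset, $value)
    {
        if ($offset === null) {
            $this->elements[] = $value; // $c[] = ... appends
        } else {
            $this->elements[$offset] = $value;
        }
    }

    #[\ReturnTypeWillChange]
    public function offsetUnset($offset)
    {
        unset($this->elements[$offset]);
    }
}

$c = new ExampleCollection();
$c[] = 'New York Jets';
$c['score'] = 90;
```

If PHP 5.4 support is not needed, declaring the return types directly (bool, mixed, void, void, with mixed requiring PHP 8.0) would be the longer-term alternative to the attribute.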

Inverting string order gives different results

$stringA = 'Alpin 4 Grnx';
$stringB = 'AGILIS+ GRNX';
$fuzzy = FuzzyWuzzy::getFuzzy();
echo $fuzzy->tokenSortRatio($stringA, $stringB) . "\n"; // 69
echo $fuzzy->tokenSortRatio($stringB, $stringA) . "\n"; // 60

I presume this is not the intended behaviour: with the JavaScript port of the original Python library, token sort gives the same result for both argument orders.
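
The report does not identify which routine inside the library is order-dependent. As an illustration that such routines exist, PHP's built-in similar_text() is documented to give different results when its arguments are swapped. The sketch below reproduces the PHP manual's example and shows one possible workaround, a hypothetical symmetricScore wrapper that scores both orders and keeps the higher value.

```php
<?php
// similar_text() is order-dependent; these inputs come from the PHP manual.
$ab = similar_text('bafoobar', 'barfoomis'); // 5
$ba = similar_text('barfoomis', 'bafoobar'); // 3

// Hypothetical wrapper: score both argument orders and keep the
// higher value, so the result no longer depends on the order.
function symmetricScore(callable $scorer, $a, $b)
{
    return max(
        call_user_func($scorer, $a, $b),
        call_user_func($scorer, $b, $a)
    );
}

$sym1 = symmetricScore('similar_text', 'bafoobar', 'barfoomis');
$sym2 = symmetricScore('similar_text', 'barfoomis', 'bafoobar');
echo $ab, ' ', $ba, ' ', $sym1, ' ', $sym2, "\n"; // 5 3 5 5
```

The same wrapper should accept an array callable such as [$fuzzy, 'tokenSortRatio'], at the cost of scoring each pair twice.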

Difference with python library

Hi,

Great library, but I notice a difference compared to the Python library. I also use the Node.js port, and its results are the same as the Python library's, but this is not the case in PHP.
This is the example result using the Python library:

choices = ["ACOB751", "ACAB5861"]
process.extract("ACO8751", choices, scorer=fuzz.partial_ratio)
[('ACOB751', 86), ('ACAB5861', 53)]

And the same example in PHP:

$choices = ["ACOB751", "ACAB5861"];
$this->fuzzProcess->extract(
    $choices,
    "ACO8751",
    null,
    [$this->fuzz, 'partialRatio']
);
Collection {#1081
    -elements: array:2 [
        0 => array:2 [
            0 => "ACOB751"
            1 => 42
        ]
        1 => array:2 [
            0 => "ACAB5861"
            1 => 30
        ]
    ]
}

Any advice? Thanks.

Slow matching

I deployed fuzzywuzzy in my PHP code, but matching is rather slow: a short string takes about 2 s, and longer ones take 6 s or more.
What can I do to speed it up? The server has plenty of resources (MySQL database with plenty of memory).
