Giter Club home page Giter Club logo

upm's Introduction

Unsupervised Product Matching Using Combinations and Permutations

A product matching algorithm that groups products into clusters of similarity based on their title using combinations and permutations.

Based on the paper "Unsupervised Product Matching Using Combinations and Permutations" by Leonidas Akritidis, Athanasios Fevgas, Panayiotis Bozanis and Christos Makris https://arxiv.org/pdf/1903.04276.pdf

Structure of Project

bin - Contains the executables
build - Contains the object files (.o)
include - Contains the header files
src - Contains the C++ code
build.bat - Builds the project and creates the main.exe file in bin (for Windows)
build.sh - Builds the project and creates the main.exe file in bin (for Ubuntu)
* Change the directory to the build directory build.bat or build.sh for your system

How to Use

Simply create a Product object that takes the vendor (the retailer to which the product is associated). The vendor parameter is a string. The next parameter is the title of the product, which is also a string. See main.cpp to see the test code.

How it Works

For each product, all of the combinations of the tokens (words) of the title are created up to n choose 6. Each token has its own properties, like its frequency and semantic value (item model, attribute or normal token) and based on this, they get different weights. The combinations also have their own properties, such as their Euclidean distance average and the frequency of the combination among the products. From this, using the equations outlined in the paper, each product is assigned a "dominating cluster", that is the combination that best represents what the product is. The products with the same cluster represent the same product.

Uses

This algorithm aims to solve the issue of how to deal with discrepincies among products across different retailers. Using this algorithm, it should be able identify the correct matching product among a wide range of products.

Reality

In contrast to the aim of this algorithm, it doesn't actually do its job. For example, running the algorithm on the set of data provided by the authors, it is very inaccurate, and it is not my implementation. I acheived similar, if not identical results using the author's algorithm, which can be found at https://github.com/lakritidis/UPM-full

upm's People

Contributors

mascerade avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

upm's Issues

What does the reality disclaimer mean?

Sorry to open this here as I don't know where else to ask this. Did you mean this works better than https://github.com/lakritidis/UPM-full, or the other way around?

Reality

In contrast to the aim of this algorithm, it doesn't actually do its job. For example, running the algorithm on the set of data provided by the authors, it is very inaccurate, and it is not my implementation. I acheived similar, if not identical results using the author's algorithm, which can be found at https://github.com/lakritidis/UPM-full

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.