SSC-IPA

Source code of "Semi-Supervised Clustering with Inaccurate Pairwise Annotations" (Gribel, Gendreau and Vidal, 2021).

Related Article

Semi-Supervised Clustering with Inaccurate Pairwise Annotations: https://arxiv.org/abs/2104.02146

Run

To run the SSC-IPA algorithm, open the Julia REPL and run the following commands:

julia> include("Optimizer.jl")

julia> in = Input(seed, max_it, supervision_flag, prior)

julia> main("dataset", "must_graph", "cannot_graph", in)

Example

julia> include("Optimizer.jl")

julia> in = Input(1234, 50, 1, 0.9)

julia> main("vertebral.data", "vertebral-must.link", "vertebral-cannot.link", in)

Parameters of Input

seed: Numerical seed for the random number generator.

max_it: Maximum number of iterations the algorithm will take.

supervision_flag: Determines if pairwise supervision is used (0: unsupervised algorithm, 1: semi-supervised algorithm).

prior: Prior estimate of the experts' accuracy (a value between 0 and 1; enter -1 to use no prior).
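The ranges listed above can be checked before constructing Input. The following is only a minimal sketch: `check_input_args` is a hypothetical helper that is not part of the SSC-IPA source, and the requirement that max_it be at least 1 is an assumption.

```julia
# Hypothetical helper (not part of SSC-IPA): validates the documented ranges
# of the Input parameters. The seed is not checked because its accepted type
# is not stated in this README.
function check_input_args(max_it, supervision_flag, prior)
    # Assumption: at least one iteration is required.
    max_it >= 1 || throw(ArgumentError("max_it must be at least 1"))
    # 0: unsupervised algorithm, 1: semi-supervised algorithm.
    (supervision_flag == 0 || supervision_flag == 1) ||
        throw(ArgumentError("supervision_flag must be 0 or 1"))
    # prior is either a value in [0, 1] or -1 for no prior.
    (prior == -1 || 0 <= prior <= 1) ||
        throw(ArgumentError("prior must be between 0 and 1, or -1"))
    return true
end
```

For example, `check_input_args(50, 1, 0.9)` passes, while a flag of 2 raises an ArgumentError.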

Parameters of the main function

dataset: Dataset file. Important: You must provide a file with the .data extension along with a labels (ground-truth) file. The labels file must have the .label extension. Example: For a dataset named "vertebral.data", you must provide the "vertebral.label" file in the same folder.

must_graph: Must-link graph file.

cannot_graph: Cannot-link graph file.

Important: The dataset, labels, must-link graph, and cannot-link graph files must be within the /data folder inside the project.
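The file requirements above can be verified before calling main. This is a sketch under stated assumptions: `missing_files` is a hypothetical helper, not part of the SSC-IPA source, and it assumes the labels file name is obtained by swapping the .data extension for .label.

```julia
# Hypothetical helper (not part of SSC-IPA): returns the names of the required
# files that are absent from `data_dir`.
function missing_files(data_dir, dataset, must_graph, cannot_graph)
    endswith(dataset, ".data") ||
        throw(ArgumentError("the dataset file must have the .data extension"))
    # Assumption: "vertebral.data" pairs with "vertebral.label".
    labels = replace(dataset, r"\.data$" => ".label")
    required = [dataset, labels, must_graph, cannot_graph]
    return filter(f -> !isfile(joinpath(data_dir, f)), required)
end
```

Calling `missing_files("data", "vertebral.data", "vertebral-must.link", "vertebral-cannot.link")` from the project root returns an empty vector when all four files are in place.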

Data format

Dataset files. The dataset file has n rows and d columns, where n is the number of data samples and d is the number of features. Each line contains the values of the d features of one data sample, where xij corresponds to the j-th feature of the i-th sample. Feature values are separated by a single space, as depicted in the scheme below:

x11 x12 x13 ... x1d
x21 x22 x23 ... x2d
... ... ... ... ...
xn1 xn2 xn3 ... xnd

Important: The dataset files must have the .data extension.
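As an illustration of the dataset layout, the sketch below loads such a file into a matrix with one row per sample. `read_dataset` is a hypothetical helper, not the loader used by SSC-IPA, and it assumes every feature value parses as a floating-point number.

```julia
# Hypothetical helper (not part of SSC-IPA): reads a space-separated dataset
# file into an n x d matrix (n samples, d features).
function read_dataset(path)
    # Assumption: all feature values are numeric.
    rows = [parse.(Float64, split(line)) for line in eachline(path) if !isempty(strip(line))]
    isempty(rows) && throw(ArgumentError("the dataset file is empty"))
    d = length(rows[1])
    all(r -> length(r) == d, rows) ||
        throw(ArgumentError("all rows must have the same number of features"))
    # hcat stacks samples as columns; permutedims puts them back as rows.
    return permutedims(reduce(hcat, rows))
end
```

With this layout, `size(read_dataset(path))` gives `(n, d)`, which is a quick way to confirm that a file matches the scheme.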

Graph files. A graph file (must-link or cannot-link) has m rows and 3 columns, where m is the number of connections (links) in the graph. The first two columns identify the two data samples joined by an edge, and the third column gives the edge weight. The scheme below describes a graph file, where si and ti are two connected samples, and wi is the corresponding edge weight:

s1 t1 w1
s2 t2 w2
... ... ...
sm tm wm
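The graph layout can be parsed along the following lines. This is a sketch only: `read_graph` is a hypothetical helper, not part of the SSC-IPA source, and it assumes that samples are identified by integers and that weights are numeric. Whether sample identifiers start at 0 or 1 is not stated here, so the sketch returns them unchanged.

```julia
# Hypothetical helper (not part of SSC-IPA): reads a must-link or cannot-link
# file into a vector of (s, t, w) tuples.
function read_graph(path)
    edges = Tuple{Int,Int,Float64}[]
    for line in eachline(path)
        fields = split(line)
        isempty(fields) && continue  # skip blank lines
        length(fields) == 3 ||
            throw(ArgumentError("expected 3 columns, got: $line"))
        # Assumption: integer sample identifiers, numeric weight.
        s = parse(Int, fields[1])
        t = parse(Int, fields[2])
        w = parse(Float64, fields[3])
        push!(edges, (s, t, w))
    end
    return edges
end
```

The length of the returned vector equals m, the number of links in the file.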

Labels files. A labels file lists the ground-truth cluster of each sample of the dataset, where yi corresponds to the label of the i-th sample:

y1
y2
...
yn

Important: The labels files must have the .label extension.
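A labels file should contain one label per sample of the dataset, which can be checked as sketched below. Both functions are hypothetical helpers, not part of the SSC-IPA source. Labels are kept as strings because their type is not stated here.

```julia
# Hypothetical helper (not part of SSC-IPA): reads one label per non-empty line.
function read_labels(path)
    return [String(strip(line)) for line in eachline(path) if !isempty(strip(line))]
end

# Hypothetical helper: confirms that there is exactly one label per sample.
function check_label_count(labels, n_samples)
    length(labels) == n_samples ||
        throw(DimensionMismatch("found $(length(labels)) labels for $n_samples samples"))
    return true
end
```

Here `n_samples` is the number of rows in the matching .data file.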
