svhc's Introduction

Statistically Validated Hierarchical Clustering

Installation

On Ubuntu Linux

$ sudo apt-get install python-pip python-tk
$ sudo pip install svhc

On Mac OS X

The easiest way to install the library is with easy_install (or with port).

$ sudo easy_install pip
$ sudo pip install svhc

If there is a problem with the SSL certificate, run:

$ curl https://bootstrap.pypa.io/get-pip.py | sudo python
$ sudo pip install svhc

If the last command raises conflicts, try:

$ sudo pip install --ignore-installed svhc

Then you can try to call the script:

$ svhc

If the script is not found, you can add its path temporarily with the command:

$ export PATH="/Users/$USER/Library/Python/2.7/bin:$PATH"

or permanently with (untested):

$ echo 'export PATH="$PATH:/Users/$USER/Library/Python/2.7/bin"' >> ~/.bash_profile

On Windows

Install Python 2.7 and pip by following the instructions at the link. Then, from the command prompt, run:

pip install svhc

Usage

Generate a Benchmark

If you want to generate a dataset starting from a factor model, you need a factor loading matrix. An example of such a matrix can be downloaded from the link (pattern_example.dat). Then, to generate the data series, run:

$ svhc_benchmark pattern_example.dat 500 test

where 500 is the length of the data series and test is the output name. The program will produce test_dataSeries_benchmark.dat, the data matrix (500x100), and test_cluster_reference.dat, the list of the nodes that belong to each cluster; each line is a different cluster, and nodes are comma separated.
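
The cluster files described above (one cluster per line, nodes comma separated) can be loaded with a few lines of Python. This is a minimal sketch, not part of svhc: the function name read_clusters is hypothetical, and node labels are kept as strings because the README does not say how they are encoded.

```python
def read_clusters(path):
    """Return a list of clusters; each cluster is a list of node labels (strings)."""
    clusters = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            # One cluster per line, nodes separated by commas
            clusters.append([node.strip() for node in line.split(",")])
    return clusters
```

For instance, read_clusters("test_cluster_reference.dat") would return one list per reference cluster.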

If you want to add noise to the data you can use the optional parameter:

$ svhc_benchmark pattern_example.dat 500 test --noise 0.3

Evaluate Statistically Validated Hierarchical Clusters

To estimate the validated clusters on the data series generated in the previous example, you can run:

$ svhc test_dataSeries_benchmark.dat 1000 test

where 1000 is the number of bootstrap copies, and test is the output name. If you want to use your own dataset, store your objects by column, tab separated, without a header or index numbers. The algorithm will then find the clusters of objects.

A few optional parameters are available:
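
The input layout described above (one object per column, tab separated, no header, no index) can be produced from in-memory data as sketched below. The helper name write_svhc_input and the output file name are hypothetical; the sketch only assumes that each row is one observation and each column is one object, as in the 500x100 benchmark matrix.

```python
def write_svhc_input(rows, path):
    """Write a matrix (rows = observations, columns = objects) as
    tab-separated text with no header line and no index column."""
    if not rows:
        raise ValueError("the dataset is empty")
    width = len(rows[0])
    with open(path, "w") as handle:
        for row in rows:
            if len(row) != width:
                raise ValueError("all rows must have the same number of columns")
            handle.write("\t".join(repr(float(value)) for value in row) + "\n")


# Three observations of two objects (illustrative values only)
write_svhc_input([[1.0, 2.0], [3.5, 4.0], [0.5, -1.0]], "my_dataset.dat")
```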

$ svhc test_dataSeries_benchmark.dat 1000 test --alpha 0.5 --nan 0 --ncpu 1

where alpha is the significance level of the FDR multiple comparison correction (default 0.05); nan is a boolean flag that must be set to 1 if the dataset contains NaN entries; ncpu is the number of cores used to evaluate the bootstrap copies (default 1).
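
To illustrate what alpha controls, here is a sketch of an FDR correction in plain Python. The README does not say which FDR procedure svhc applies; the Benjamini-Hochberg step-up procedure is assumed here purely for illustration, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected
    by the Benjamini-Hochberg procedure at FDR level alpha."""
    m = len(pvalues)
    # Indices of the p-values sorted in increasing order
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) such that p_(k) <= alpha * k / m
    cutoff = 0
    for rank, index in enumerate(order, start=1):
        if pvalues[index] <= alpha * rank / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff
    rejected = [False] * m
    for index in order[:cutoff]:
        rejected[index] = True
    return rejected
```

A smaller alpha makes the threshold stricter, so fewer clusters would pass the validation.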

The program will produce three files: test_Validated_Cluster.dat, the list of validated clusters, one per row, with the nodes of each cluster comma separated; test_pvalue.dat, the list of p-values associated with each cluster; and test_dendrogram.dat, which contains the full information of the dendrogram.
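
When the data come from the benchmark, the validated clusters can be compared with the reference clusters. The sketch below is not part of svhc. It scores each reference cluster by its best Jaccard overlap with any validated cluster, which is one reasonable choice among many; the function names are hypothetical, and both files are assumed to hold one comma-separated cluster per line.

```python
def load_clusters(path):
    """Read one comma-separated cluster per non-empty line."""
    with open(path) as handle:
        return [[n.strip() for n in line.split(",")]
                for line in handle if line.strip()]


def jaccard(a, b):
    """Jaccard index of two node collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / float(len(union)) if union else 0.0


def best_matches(validated, reference):
    """For each reference cluster, return the highest Jaccard overlap
    with any validated cluster (0.0 if nothing was validated)."""
    return [max([jaccard(ref, val) for val in validated] or [0.0])
            for ref in reference]
```

With the files from the examples, best_matches(load_clusters("test_Validated_Cluster.dat"), load_clusters("test_cluster_reference.dat")) gives one score per reference cluster, where 1.0 means an exact match.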

Plot the Dendrogram

To plot the dendrogram with the output of the previous example please run:

$ svhc_plot test_dataSeries_benchmark.dat test_Validated_Cluster.dat test_dendrogram.dat test_pic

The program will produce test_pic.pdf.
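
The three steps (benchmark, validation, plot) can be chained from Python. This is a sketch under the assumption that the scripts are on the PATH and that the output file names follow the pattern shown in the examples; build_commands and run_pipeline are hypothetical helpers, not part of svhc.

```python
import subprocess


def build_commands(pattern, length, n_boot, name, picture):
    """Build the argument lists for the three command-line steps."""
    series = name + "_dataSeries_benchmark.dat"
    clusters = name + "_Validated_Cluster.dat"
    dendrogram = name + "_dendrogram.dat"
    return [
        ["svhc_benchmark", pattern, str(length), name],
        ["svhc", series, str(n_boot), name],
        ["svhc_plot", series, clusters, dendrogram, picture],
    ]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.check_call(command)
```

Calling run_pipeline(build_commands("pattern_example.dat", 500, 1000, "test", "test_pic")) reproduces the sequence of commands used in this section.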

Contributors

cbongiorno
