PIHAM

Probabilistic Inference in Heterogeneous and Attributed Multilayer networks

License: MIT · Made with Python · arXiv: 2301.11226

This repository contains the implementation of the PIHAM model presented in

   [1] Flexible inference in heterogeneous and attributed multilayer networks
        Contisciani M., Hobbhahn M., Power E.A., Hennig P., and De Bacco C. (2024)
        [ ArXiv ]

If you make use of this code, please cite our work in the form of reference [1] above.

What's included

  • src: Contains the Python implementation of the PIHAM algorithm, the code to generate synthetic data, and additional utilities
  • data/input: Contains a synthetic dataset generated using the PIHAM approach
  • data/output: Contains some results

Requirements

To run the code, you need to install the packages listed in requirements.txt. We suggest creating a conda environment with conda create --name PIHAM --no-default-packages, activating it with conda activate PIHAM, and installing all the dependencies by running (inside the PIHAM directory):

pip install -r requirements.txt
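
Putting these steps together, a typical setup looks like:

# create and activate a clean environment, then install the dependencies
conda create --name PIHAM --no-default-packages
conda activate PIHAM
pip install -r requirements.txt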

Perform inference

To perform inference on a given heterogeneous and attributed multilayer network, run:

python main_inference.py

The script takes as input the name of the dataset, the path of the folder where it is stored, and the number of communities K. It then executes the PIHAM algorithm from the file src/model.py using the configuration provided in the src/setting_inference.yaml file.
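
For example, an invocation might look like the following (the flag names here are assumptions used purely for illustration; check main_inference.py for the actual argument names):

# hypothetical flags, shown only to illustrate the three inputs described above
python main_inference.py --data_file <dataset_name>.pt --in_folder data/input/ --K 3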

See the demo Jupyter notebook for an example of how to analyse the output results.

Input format

The data should be stored in a .pt file, which includes:

  • A: An adjacency tensor of dimension L x N x N containing the interactions of every layer
  • X_categorical: A design matrix with the categorical attribute
  • X_poisson: A design matrix with the Poisson attributes
  • X_gaussian: A design matrix with the Gaussian attributes

Here, L is the number of layers and N is the number of nodes.
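
As a rough sketch, and assuming the .pt file is a PyTorch file saved with torch.save that stores these objects in a dictionary (the container type and the encoding of the attribute matrices, e.g. one-hot versus integer labels for the categorical covariate, are assumptions; see src/model.py for the format actually expected), a compatible input file could be assembled along these lines:

import torch

# illustrative sizes: L layers and N nodes
L, N = 3, 100

data = {
    "A": torch.zeros(L, N, N),              # adjacency tensor, one N x N slice per layer
    "X_categorical": torch.zeros(N, 4),     # categorical attribute (assumed one-hot with 4 categories)
    "X_poisson": torch.zeros(N, 1),         # nonnegative discrete attribute
    "X_gaussian": torch.zeros(N, 1),        # real-valued attribute
}

torch.save(data, "data/input/<dataset_name>.pt")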

The code example in this directory is set up to analyze a network with L = 3 layers (one with binary interactions, the second with nonnegative discrete weights, and the third with real values) and three covariates (one categorical, one with nonnegative discrete values, and the last with real values). However, the model can be easily adapted to accommodate datasets with other data types.

Output

The algorithm outputs a compressed .npz file inside the data/output folder. To load the inferred results and display the out-going membership matrix, run:

import numpy as np

# load the archive of inferred parameters and print the out-going membership matrix U
theta = np.load("theta_<file_label>.npz")
print(theta["U"])

The variable theta includes the following parameters inferred by PIHAM:

  • U: The out-going membership matrix of dimension N x K
  • V: The in-coming membership matrix of dimension N x K
  • W: The affinity tensor of dimension L x K x K
  • Hcategorical: The community-covariate matrix related to the categorical attribute of dimension K x Z_categorical
  • Hpoisson: The community-covariate matrix related to the Poisson attributes of dimension K x P_poisson
  • Hgaussian: The community-covariate matrix related to the Gaussian attributes of dimension K x P_gaussian
  • Cov: The covariance matrix
  • Cov_diag: The diagonal matrix of the variances

Here, K is the number of communities, Z_categorical is the number of categories for the categorical attribute, P_poisson is the number of Poisson attributes, and P_gaussian is the number of Gaussian attributes.
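
To get a quick overview of everything stored in the archive, you can iterate over the saved arrays and print their shapes (the file label is a placeholder, as above):

import numpy as np

theta = np.load("theta_<file_label>.npz")
for name in theta.files:  # names of all arrays stored in the .npz archive
    print(name, theta[name].shape)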

Run a cross-validation routine

If you are interested in assessing the prediction performance of PIHAM on a dataset for a given K, run:

python main_cv.py

The script takes as input the following parameters:

  • in_folder: Path of the input folder
  • data_file: Name of the dataset to analyse
  • K: Number of communities
  • NFold: Number of folds for the cross-validation routine
  • cv_type: Type of cross-validation routine
  • out_results: Flag to save the prediction performance
  • --out_mask: Flag to save the masks used during the cross-validation routine to hide entries of A and X
  • --out_inference: Flag to save the inferred parameters during the cross-validation routine

For each fold, the script runs the PIHAM algorithm on the training set to learn its parameters, and evaluates its performance on the test set. This process is repeated NFold times, each time with a different fold as the test set. Various performance metrics are used depending on the type of information being evaluated. The results are saved in a .csv file in the data/output/cv folder.
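
A typical invocation could look like the following (the flag syntax and the cv_type value are assumptions used purely for illustration; check main_cv.py for the actual argument names and accepted values):

# hypothetical invocation of the cross-validation routine
python main_cv.py --in_folder data/input/ --data_file <dataset_name>.pt --K 3 --NFold 5 --cv_type <cv_type> --out_results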

Generate synthetic data

If you want to generate synthetic data using the PIHAM approach, run:

python main_generation.py

The script takes as input the number of independent samples to generate, a random seed, the number of communities K, and the number of nodes N. The code example generates a heterogeneous and attributed network with L = 3 layers (one with binary interactions, the second with nonnegative discrete weights, and the third with real values) and three covariates (one categorical, one with nonnegative discrete values, and the last with real values), using the default parameters specified in the file src/synthetic.py. However, the script can be easily adapted to generate datasets with other data types and parameters.
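
For instance (the flag names are again illustrative assumptions; see main_generation.py for the actual interface):

# hypothetical invocation generating one sample with K = 3 communities and N = 100 nodes
python main_generation.py --n_samples 1 --seed 10 --K 3 --N 100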
