PIHAM

Probabilistic Inference in Heterogeneous and Attributed Multilayer networks

License: MIT · Made with Python · arXiv: 2301.11226

This repository contains the implementation of the PIHAM model presented in

   [1] Flexible inference in heterogeneous and attributed multilayer networks
        Contisciani M., Hobbhahn M., Power E.A., Hennig P., and De Bacco C. (2024)
        [ ArXiv ]

If you make use of this code, please cite our work in the form of reference [1] above.

What's included

  • src: Contains the Python implementation of the PIHAM algorithm, the code to generate synthetic data, and additional utilities
  • data/input: Contains a synthetic dataset generated using the PIHAM approach
  • data/output: Contains some results

Requirements

To run the code, you need to install the packages listed in requirements.txt. We suggest creating a conda environment with conda create --name PIHAM --no-default-packages, activating it with conda activate PIHAM, and installing all the dependencies by running (inside the PIHAM directory):

pip install -r requirements.txt
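
Putting these steps together, a typical setup looks like:

# create and activate a clean environment, then install the dependencies
conda create --name PIHAM --no-default-packages
conda activate PIHAM
pip install -r requirements.txt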

Perform inference

To perform inference on a given heterogeneous and attributed multilayer network, run:

python main_inference.py

The script takes as input the name of the dataset, the path of the folder where it is stored, and the number of communities K. It then executes the PIHAM algorithm from the file src/model.py using the configuration provided in the src/setting_inference.yaml file.
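
For example, an invocation might look like the following (the flag names here are assumptions used purely for illustration; check main_inference.py for the actual argument names):

# hypothetical flags, shown only to illustrate the three inputs described above
python main_inference.py --data_file <dataset_name>.pt --in_folder data/input/ --K 3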

See the demo Jupyter notebook for an example of how to analyse the output results.

Input format

The data should be stored in a .pt file, which includes:

  • A: An adjacency tensor of dimension L x N x N containing the interactions of every layer
  • X_categorical: A design matrix with the categorical attribute
  • X_poisson: A design matrix with the Poisson attributes
  • X_gaussian: A design matrix with the Gaussian attributes

Here, L is the number of layers and N is the number of nodes.
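
As a rough sketch, and assuming the .pt file is a PyTorch file saved with torch.save that stores these objects in a dictionary (the container type and the encoding of the attribute matrices, e.g. one-hot versus integer labels for the categorical covariate, are assumptions; see src/model.py for the format actually expected), a compatible input file could be assembled along these lines:

import torch

# illustrative sizes: L layers and N nodes
L, N = 3, 100

data = {
    "A": torch.zeros(L, N, N),              # adjacency tensor, one N x N slice per layer
    "X_categorical": torch.zeros(N, 4),     # categorical attribute (assumed one-hot with 4 categories)
    "X_poisson": torch.zeros(N, 1),         # nonnegative discrete attribute
    "X_gaussian": torch.zeros(N, 1),        # real-valued attribute
}

torch.save(data, "data/input/<dataset_name>.pt")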

The code example in this directory is set up to analyze a network with L = 3 layers (one with binary interactions, the second with nonnegative discrete weights, and the third with real values) and three covariates (one categorical, one with nonnegative discrete values, and the last with real values). However, the model can be easily adapted to accommodate datasets with other data types.

Output

The algorithm outputs a compressed .npz file inside the data/output folder. To load the inferred results and display the out-going membership matrix, run:

import numpy as np

# load the archive of inferred parameters and print the out-going membership matrix U
theta = np.load("theta_<file_label>.npz")
print(theta["U"])

The variable theta includes the following parameters inferred by PIHAM:

  • U: The out-going membership matrix of dimension N x K
  • V: The in-coming membership matrix of dimension N x K
  • W: The affinity tensor of dimension L x K x K
  • Hcategorical: The community-covariate matrix related to the categorical attribute of dimension K x Z_categorical
  • Hpoisson: The community-covariate matrix related to the Poisson attributes of dimension K x P_poisson
  • Hgaussian: The community-covariate matrix related to the Gaussian attributes of dimension K x P_gaussian
  • Cov: The covariance matrix
  • Cov_diag: The diagonal matrix of the variances

Here, K is the number of communities, Z_categorical is the number of categories for the categorical attribute, P_poisson is the number of Poisson attributes, and P_gaussian is the number of Gaussian attributes.
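
To get a quick overview of everything stored in the archive, you can iterate over the saved arrays and print their shapes (the file label is a placeholder, as above):

import numpy as np

theta = np.load("theta_<file_label>.npz")
for name in theta.files:  # names of all arrays stored in the .npz archive
    print(name, theta[name].shape)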

Run a cross-validation routine

If you are interested in assessing the prediction performance of PIHAM on a dataset for a given K, run:

python main_cv.py

The script takes as input the following parameters:

  • in_folder: Path of the input folder
  • data_file: Name of the dataset to analyse
  • K: Number of communities
  • NFold: Number of folds for the cross-validation routine
  • cv_type: Type of cross-validation routine
  • out_results: Flag to save the prediction performance
  • --out_mask: Flag to save the masks used during the cross-validation routine to hide entries of A and X
  • --out_inference: Flag to save the inferred parameters during the cross-validation routine

For each fold, the script runs the PIHAM algorithm on the training set to learn its parameters, and evaluates its performance on the test set. This process is repeated NFold times, each time with a different fold as the test set. Various performance metrics are used depending on the type of information being evaluated. The results are saved in a .csv file in the data/output/cv folder.
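
A typical invocation could look like the following (the flag syntax and the cv_type value are assumptions used purely for illustration; check main_cv.py for the actual argument names and accepted values):

# hypothetical invocation of the cross-validation routine
python main_cv.py --in_folder data/input/ --data_file <dataset_name>.pt --K 3 --NFold 5 --cv_type <cv_type> --out_results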

Generate synthetic data

If you want to generate synthetic data using the PIHAM approach, run:

python main_generation.py

The script takes as input the number of independent samples to generate, a random seed, the number of communities K, and the number of nodes N. The code example generates a heterogeneous and attributed network with L = 3 layers (one with binary interactions, the second with nonnegative discrete weights, and the third with real values) and three covariates (one categorical, one with nonnegative discrete values, and the last with real values), using the default parameters specified in the file src/synthetic.py. However, the script can be easily adapted to generate datasets with other data types and parameters.
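
For instance (the flag names are again illustrative assumptions; see main_generation.py for the actual interface):

# hypothetical invocation generating one sample with K = 3 communities and N = 100 nodes
python main_generation.py --n_samples 1 --seed 10 --K 3 --N 100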
