
Spectral embedding of dynamic multiplex networks

This repository contains a Python library supporting the work on spectral embedding of dynamic multiplex graphs by Maximilian Baum, Francesco Sanna Passino, and Axel Gandy (Department of Mathematics, Imperial College London).

The library dmprdpg can be installed in editable mode as follows:

pip3 install -e lib/

The library can then be imported in any Python session:

import dmprdpg

A demo on how to use the library can be found in notebooks/test_library.ipynb.

In this work, we explore a spectral embedding method for dynamic graphs with multiple layers. The base model is a network in which a shared set of nodes exhibits connections across a number of different layers, and this multiplex network is observed at a fixed number of points in time. We extend the theory of Unfolded Adjacency Spectral Embedding (UASE) to the dynamic case and plan to provide stability guarantees as well as a central limit theorem.

Background and Notation

We consider an undirected network with $K$ layers observed at $T$ points in time, for $K, T \in \mathbb{N}$. This network can be encoded in a collection of adjacency matrices $\textbf{A} = \lbrace\textbf{A}_{k,t}\rbrace$ where $k = 1,\dots, K$ and $t = 1,\dots, T$. For our model, we adopt the latent position framework, in which the connection probabilities between nodes are determined by each node's latent position in an underlying $d$-dimensional embedding space. Specifically, each node is represented by a position in two different embedding spaces $\mathcal{X} \subset \mathbb{R}^d$ and $\mathcal{Y} \subset \mathbb{R}^d$, where the positions $\textbf{X}^{k}_i \in \mathcal{X}$ are shared across time but differ across layers, and the positions $\textbf{Y}^{t}_j \in \mathcal{Y}$ are shared across layers but vary over time. The connection probability between nodes $i$ and $j$ at time $t$ in layer $k$ is given by the inner product of these positions. We can therefore express the adjacency matrices probabilistically as

$$\textbf{A}_{k,t, i,j} \sim \mathrm{Bernoulli}\left(\textbf{X}^{k \intercal}_i \textbf{Y}^{t}_j\right).$$
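As a rough illustration of this sampling model (not the library's API), the generative step can be sketched in NumPy. The latent positions here are arbitrary placeholders scaled so that all inner products are valid probabilities, and for simplicity the entries are sampled independently rather than symmetrised:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K, T = 100, 2, 3, 4

# Placeholder latent positions: X[k] is n x d (layer-specific, shared over time),
# Y[t] is n x d (time-specific, shared over layers). The uniform(0, 0.7) range
# keeps every inner product X[k][i] . Y[t][j] below 1.
X = rng.uniform(0, 0.7, size=(K, n, d))
Y = rng.uniform(0, 0.7, size=(T, n, d))

# Sample each adjacency matrix A[k, t] entrywise from Bernoulli(X[k] @ Y[t].T).
A = np.zeros((K, T, n, n), dtype=int)
for k in range(K):
    for t in range(T):
        P = X[k] @ Y[t].T  # n x n matrix of connection probabilities
        A[k, t] = (rng.uniform(size=(n, n)) < P).astype(int)
```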

Methodology

Given a set of adjacency matrices $\textbf{A}_{k,t} \in \lbrace 0,1\rbrace^{n \times n}$, we define the doubly unfolded adjacency matrix $\textbf{A} \in \lbrace 0,1\rbrace ^{nK \times nT}$ as:

$$\textbf{A} = \begin{bmatrix} \textbf{A}_{1,1} & \dots & \textbf{A}_{1,T} \\\ \vdots & \ddots & \vdots \\\ \textbf{A}_{K,1} & \dots & \textbf{A}_{K,T} \end{bmatrix}.$$
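The unfolding above can be formed with `numpy.block`; the helper name `unfold` is illustrative, not part of the library:

```python
import numpy as np

def unfold(A):
    """Doubly unfold a (K, T, n, n) array of adjacency matrices into the
    nK x nT block matrix with A[k, t] as its (k, t) block."""
    K, T, n, _ = A.shape
    return np.block([[A[k, t] for t in range(T)] for k in range(K)])
```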

In order to estimate the latent positions $\textbf{X}$ and $\textbf{Y}$, we propose Doubly Unfolded Adjacency Spectral Embedding (DUASE) for dynamic multiplex graphs. Given the realized adjacency matrices $\textbf{A}_{k,t}$, $k = 1,\dots, K$, $t = 1,\dots, T$, we use a truncated SVD of rank $d$ to obtain a low-rank approximation of the doubly unfolded matrix, $\textbf{A} \approx \textbf{UDV}^{\intercal}$, where $\textbf{D}$ contains the top $d$ singular values on its diagonal and $\textbf{U}$ and $\textbf{V}$ contain the corresponding singular vectors. The estimated latent positions for each node are recovered as

$$\hat{\textbf{X}} = \textbf{UD}^{1/2}, \hspace{1cm} \hat{\textbf{Y}} = \textbf{VD}^{1/2}.$$

We then retrieve the layer-specific and time-specific estimates $\hat{\textbf{X}}^k$ and $\hat{\textbf{Y}}^t$ by unstacking $\hat{\textbf{X}}$ and $\hat{\textbf{Y}}$ into $n \times d$ blocks.
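The estimation steps above can be sketched as follows. This is a minimal illustration of the truncated-SVD recipe, not the dmprdpg implementation, and the function name `duase` is assumed:

```python
import numpy as np

def duase(A_unfolded, d):
    """Rank-d truncated SVD of the doubly unfolded adjacency matrix,
    returning Xhat = U D^{1/2} (nK x d) and Yhat = V D^{1/2} (nT x d)."""
    U, s, Vt = np.linalg.svd(A_unfolded, full_matrices=False)
    U, s, Vt = U[:, :d], s[:d], Vt[:d]
    Xhat = U * np.sqrt(s)      # scale each left singular vector by sqrt(s_i)
    Yhat = Vt.T * np.sqrt(s)   # scale each right singular vector by sqrt(s_i)
    return Xhat, Yhat
```

Since the rows of $\hat{\textbf{X}}$ are stacked layer by layer, `Xhat.reshape(K, n, d)` recovers the per-layer estimates $\hat{\textbf{X}}^k$, and likewise `Yhat.reshape(T, n, d)` recovers $\hat{\textbf{Y}}^t$.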

Simulation Results

We have run a number of computer simulations to empirically test the properties of our embedding technique in the simplified case of a stochastic block model. This represents the setting where each node is assigned to one of a fixed number of communities, and all nodes in a community share a latent position. In this case, our model has $1000$ nodes, equally distributed among four communities, with latent positions defined such that the $(i,j)^{th}$ entry of $B^{(k)}_t$ is the connection probability between a node in community $i$ and a node in community $j$ at time $t$ in layer $k$.

$$B^{(1)}_1 = \begin{bmatrix} 0.08 & 0.02 & 0.18 & 0.10\\\ 0.02 & 0.20 & 0.04 & 0.10\\\ 0.18 & 0.04 & 0.02 & 0.02\\\ 0.10 & 0.10 & 0.02 & 0.06 \end{bmatrix}, % B^{(1)}_2 = \begin{bmatrix} 0.16 & 0.16 & 0.04 & 0.10\\\ 0.16 & 0.16 & 0.04 & 0.10\\\ 0.04 & 0.04 & 0.09 & 0.02\\\ 0.10 & 0.10 & 0.02 & 0.06 \end{bmatrix}, % B^{(1)}_3 = \begin{bmatrix} 0.08 & 0.02 & 0.18 & 0.10\\\ 0.02 & 0.20 & 0.04 & 0.10\\\ 0.18 & 0.04 & 0.02 & 0.02\\\ 0.10 & 0.10 & 0.02 & 0.06 \end{bmatrix}$$ $$B^{(2)}_1 = \begin{bmatrix} 0.08 & 0.02 & 0.18 & 0.10\\\ 0.02 & 0.20 & 0.04 & 0.10\\\ 0.18 & 0.04 & 0.02 & 0.02\\\ 0.10 & 0.10 & 0.02 & 0.06 \end{bmatrix}, % B^{(2)}_2 = \begin{bmatrix} 0.16 & 0.16 & 0.04 & 0.10\\\ 0.16 & 0.16 & 0.04 & 0.10\\\ 0.04 & 0.04 & 0.09 & 0.02\\\ 0.10 & 0.10 & 0.02 & 0.06 \end{bmatrix}, % B^{(2)}_3 = \begin{bmatrix} 0.08 & 0.02 & 0.18 & 0.10\\\ 0.02 & 0.20 & 0.04 & 0.10\\\ 0.18 & 0.04 & 0.02 & 0.02\\\ 0.10 & 0.10 & 0.02 & 0.06 \end{bmatrix}$$ $$B^{(3)}_1 = \begin{bmatrix} 0.08 & 0.08 & 0.08 & 0.08\\\ 0.08 & 0.08 & 0.08 & 0.08\\\ 0.08 & 0.08 & 0.08 & 0.08\\\ 0.08 & 0.08 & 0.08 & 0.08 \end{bmatrix}, % B^{(3)}_2 = \begin{bmatrix} 0.08 & 0.08 & 0.08 & 0.08\\\ 0.08 & 0.08 & 0.08 & 0.08\\\ 0.08 & 0.08 & 0.08 & 0.08\\\ 0.08 & 0.08 & 0.08 & 0.08 \end{bmatrix}, % B^{(3)}_3 = \begin{bmatrix} 0.08 & 0.08 & 0.08 & 0.08\\\ 0.08 & 0.08 & 0.08 & 0.08\\\ 0.08 & 0.08 & 0.08 & 0.08\\\ 0.08 & 0.08 & 0.08 & 0.08 \end{bmatrix}$$
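A single layer-time slice of this stochastic block model can be simulated as follows; this sketch uses $B^{(1)}_1$ from the display above and samples a symmetric adjacency matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Community labels: nodes split equally among four communities.
z = np.repeat(np.arange(4), n // 4)

# Block-probability matrix B^{(1)}_1 from the display above.
B = np.array([[0.08, 0.02, 0.18, 0.10],
              [0.02, 0.20, 0.04, 0.10],
              [0.18, 0.04, 0.02, 0.02],
              [0.10, 0.10, 0.02, 0.06]])

# Entrywise connection probabilities P[i, j] = B[z[i], z[j]],
# sampled on the upper triangle and mirrored for an undirected graph.
P = B[z][:, z]
upper = np.triu(rng.uniform(size=(n, n)) < P, k=1)
A = (upper | upper.T).astype(int)
```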

In the figures below, the true latent position for each node is one of four points denoted with an orange $\times$ and the estimated latent positions for each community are plotted by color. The mean embedding for each group is denoted with a red dot. These simulations show that our estimates for each group are generally clustered around the true latent positions and suggest that the desired stability properties are likely to apply.

The errors of our latent position estimators also appear to be normally distributed, and a comparison of estimator variance across networks of different sizes is roughly consistent with $1/n$ scaling. These results are consistent with the central limit theorem we aim to prove.

Contributors

mjbaum, fraspass
