Stress minimization is family of graph layout algorithms that produces nice looking two dimensional visualizations of graphs by minimizing difference between "ideal" desired distance and actual distance of nodes.
For example, given graph of city roads, such algorithm could use ground truth about distances of road segments, and converge on globally optimal solution, producing layouts that match exactly what we have on a map:
For abstract graphs we can assign distance between two nodes based on shortest path length between them (this distance is also called "graph theoretical distance"):
node2vec is an algorithm that assigns a high dimensional vector to each node of a graph. The algorithm tries to preserve similarity between nodes on the graph (similar nodes of the graph are close in the high dimensional space).
Since node2vec
gives us vectors, we can always measure distance between nodes in such
space and use it as "ideal" desired distance of stress minimization algorithm. This repository
explores node2vec graph embeddings with stress minimization.
Why this could be important? The answer is twofold:
-
Stress minimization requeres graph theoretical distance between every single node in the graph. In most cases this requeres at least
O(n^2)
memory andO(n * (m + n))
preprocessing time. Having nice node embeddings could potentially speed up stress minimization driven graph layouts. -
node2vec
embedding algorithms have a lot of hyperparemeters. Stress minimization driven graph layout algorithm can help researchers judge quality of selected embedding by visually exploring produced layouts and quantitatively observing finalstress cost
function value.
This repository uses two graphs: a primitive 10x10
grid graph, and miserables
graph.
It performs various embeddings with node2vec and then performs graph layout, following Graph Drawing by Stochastic Gradient Descent paper. Isntead of graph-theoretical distance we take distance between nodes in embedded space.
For miserables
graph, embeddings in very high dimensional space produced results very similar
to ideal, graph-theoretical distances. Image below used 4,048
dimensions in the embedded space:
However going to smaller number of dimensions results in poor layouts:
For 10x10
grid I wasn't able to find a good embedding, no matter how many dimensions I used.
Image below used 1,048
dimensions for embedding:
If you find a bug in the code, please don't hesitate to ping me.
To setup code make sure you have downloaded and compiled node2vec
tool:
git clone https://github.com/snap-stanford/snap/
cd snap/examples/node2vec
make
This should create an executable node2vec
file. Remember path to it, as it will
be needed later.
Now, clone this repository and install its dependencies:
git clone https://github.com/anvaka/explore_node2vec/
cd explore_node2vec
npm install
Open sequence.sh and modify node_to_vec
variable to point to your
compiled node2vec
.
Create test graph with node createTestGraphs.js
, and launch sequence.sh
to produce
layouts into svg
folder