The louvain-communities-openmp from puzzlef

The world is in the midst of an unprecedented growth of interconnected data, and graph processing systems are expected to play a vital role. Conventional graph algorithms designed for static graphs struggle to efficiently handle the continuous changes and updates that occur within these networks. As these networks grow in complexity, the need for algorithms capable of efficiently analyzing dynamic graph data is increasingly crucial. Our research aims to address the computational challenges posed by the need for real-time insights and scalable processing in dynamic and complex networks.

However, many existing dynamic algorithms are sequential or tailored towards web graphs, do not exploit reducibility or the locality benefits of strongly connected components (SCCs), overestimate the set of affected vertices, and incur high overhead. Their implementations are often not well optimized, do not take advantage of auxiliary information, and do not gracefully tolerate the soft faults that modern architectures introduce. Our dynamic approaches for PageRank and community detection address these issues. Our work has been accepted at IPDPS workshops (3), the Euro-Par conference (1), and the ICPP conference (1). Key outputs from our work include the design of a common framework for dynamic graph algorithms, and techniques to address soft faults in dynamic algorithms.

Publications

📰 Dynamic Batch Parallel Algorithms for Updating PageRank (IPDPSW ParSocial 2022)
📰 Shared-Memory Parallel Algorithms for Community Detection in Dynamic Graphs (Outstanding Paper Award, IPDPSW ADPCM 2024)
📰 Lock-free Computation of PageRank in Dynamic Graphs (IPDPSW ParSocial 2024)
📰 DF* PageRank: Improved Incrementally Expanding Approaches for Updating PageRank on Dynamic Graphs (Accepted at Euro-Par 2024)
📰 Fast Leiden Algorithm for Community Detection in Shared Memory Setting (Accepted at ICPP 2024)

Technical Reports

Manuscripts

Thesis Materials

Software

Tool	Description
📦 nvgraph.sh	CLI for nvGraph, which is a GPU-based graph analytics library written by NVIDIA, using CUDA.
📦 snap-data.sh	CLI for SNAP dataset, which is a collection of more than 50 large networks.
⛏️ graph-properties	List a few graph properties.
⛏️ graph-generate	Perform certain operations upon a fixed graph.
🧵 graphs	A few sample graphs in Matrix Market (.mtx) format.

Others

👨‍🏫 Top Researchers in High Performance Computing
📰 Top Research Papers in High Performance Computing
🎃 Top Conferences in High Performance Computing
📚 Top Journals in High Performance Computing
💵 Travel Grant for Conferences: Mayank Tripathi
🧪 List of Experiments
📰 Research Notes
🧵 Kaggle Datasets

How does Cheong et al.'s multi-GPU Louvain work?

The source code of their implementation is not available online, so let's see what I can find from their paper. They present some interesting profiling results, indicating that most of the runtime is spent in the local-moving phase (when unoptimized).

At the highest level, the original network is partitioned into a number of subnetworks and a set of removed links which consists of the links that join nodes residing in different sub-networks. The Louvain method can then be applied to solve the community detection problem in each of the sub-networks in parallel.

After this, the resulting networks are combined into a single network using the removed links, and then the Louvain method is applied once more on this combined network to obtain the final community results.
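The top-level split described above can be sketched as follows. This is a minimal illustration, not Cheong et al.'s code: the function name `partition_edges` and the node-to-partition map `part_of` are hypothetical, and the paper's actual partitioning scheme is not reproduced here.

```python
def partition_edges(edges, part_of):
    """Split a weighted edge list into per-partition subnetworks and removed links.

    edges   : list of (u, v, w) tuples, one per undirected edge
    part_of : dict mapping node id -> partition id (assumed to be given)
    """
    subnetworks = {}     # partition id -> edges with both endpoints inside it
    removed_links = []   # edges whose endpoints lie in different partitions
    for u, v, w in edges:
        pu, pv = part_of[u], part_of[v]
        if pu == pv:
            subnetworks.setdefault(pu, []).append((u, v, w))
        else:
            removed_links.append((u, v, w))
    return subnetworks, removed_links


# A path of six nodes, cut into two halves.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0), (4, 5, 1.0)]
part_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
subnetworks, removed_links = partition_edges(edges, part_of)
```

Each entry of `subnetworks` could then be processed independently (for example, one per GPU), while `removed_links` is kept aside for the final combining step.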

The second level of parallelism involves visiting nodes in parallel during each iteration of the modularity optimization phase.

The third and lowest level of parallelism involves computing the gain in modularity of inserting a node into each of its neighboring communities in parallel. This level of parallelism is intuitive and would be effective when a node has a large number of neighboring communities.
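To make the per-community gain computation concrete, here is a small sequential sketch. It uses the standard Louvain expression for inserting an isolated node i into community c, ΔQ = k_ic/m − (D_c · k_i)/(2m²), where m is the total edge weight, k_i is the weighted degree of i, k_ic is the weight of edges from i to c, and D_c is the total weighted degree of c without i. The function name and data layout are assumptions for illustration, and the paper's exact formulation may differ.

```python
from collections import defaultdict


def insertion_gains(node, adjacency, community, community_degree, m):
    """Modularity gain of inserting `node` into each of its neighboring communities.

    adjacency        : dict node -> dict neighbor -> weight (no self-loops assumed)
    community        : dict node -> community id
    community_degree : dict community id -> total weighted degree of its members
    m                : total edge weight of the graph
    """
    k_i = sum(adjacency[node].values())
    # Weight of edges from `node` to each neighboring community.
    weight_to = defaultdict(float)
    for nbr, w in adjacency[node].items():
        weight_to[community[nbr]] += w
    own = community[node]
    gains = {}
    # Each iteration is independent of the others, which is what makes
    # this loop a candidate for the lowest level of parallelism.
    for c, k_ic in weight_to.items():
        d_c = community_degree[c] - (k_i if c == own else 0.0)
        gains[c] = k_ic / m - d_c * k_i / (2.0 * m * m)
    return gains


# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3; node 2 is on its own.
adjacency = {
    0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0, 5: 1.0}, 4: {3: 1.0, 5: 1.0}, 5: {3: 1.0, 4: 1.0},
}
community = {0: 0, 1: 0, 2: 2, 3: 1, 4: 1, 5: 1}
community_degree = {0: 4.0, 1: 7.0, 2: 3.0}
gains = insertion_gains(2, adjacency, community, community_degree, m=7.0)
best = max(gains, key=gains.get)
```

In this example node 2 has two neighboring communities, and the larger gain is for community 0, the triangle it belongs to.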

GPU kernel 1 performs two functions. Based on the current community status of the network, the assigned GPU thread converts each neighboring node ID in the data structure to its corresponding community ID. The thread also prepares the key for the GPU radix sort in the next step.

The GPU radix sort arranges the entire array first in order of increasing node ID and then in order of increasing neighboring community ID for array elements with the same node ID. The radix sort in the Thrust library is used in this paper.

With the sorted array, each node is being assigned a GPU thread in GPU kernel 2. The thread goes down the array elements belonging to the node and sums up the weights for adjacent elements with the same neighboring community ID to give the final output of FNC.
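The three steps above (kernel 1, radix sort, kernel 2) can be emulated on the CPU as a sort followed by a segmented sum. This is only a sequential sketch of the idea under an assumed data layout: the function name `neighbor_community_weights` is hypothetical, Python's built-in sort stands in for the Thrust radix sort, and a single pass stands in for the one-thread-per-node kernel.

```python
def neighbor_community_weights(entries, community):
    """Total edge weight from each node to each of its neighboring communities.

    entries   : list of (node, neighbor, weight); an undirected edge appears
                once per endpoint
    community : dict node -> current community id
    """
    # Step 1 ("kernel 1"): replace each neighbor id with its community id
    # and build the sort key (node id, neighboring community id).
    keyed = [((u, community[v]), w) for u, v, w in entries]
    # Step 2 (sort): order by node id, then by neighboring community id.
    keyed.sort(key=lambda item: item[0])
    # Step 3 ("kernel 2"): sum weights of adjacent entries sharing a key.
    totals = []
    for key, w in keyed:
        if totals and totals[-1][0] == key:
            totals[-1] = (key, totals[-1][1] + w)
        else:
            totals.append((key, w))
    return [(u, c, w) for (u, c), w in totals]


entries = [(0, 1, 1.0), (0, 3, 1.5), (0, 2, 2.0), (1, 0, 1.0)]
community = {0: 0, 1: 5, 2: 5, 3: 7}
result = neighbor_community_weights(entries, community)
```

Here nodes 1 and 2 share community 5, so the two entries for node 0 that point to them collapse into a single `(0, 5, 3.0)` record after sorting.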

It appears that Cheong et al. do not perform the aggregation phase on the GPU.

In the paper Scalable multi-node multi-GPU Louvain community detection algorithm for heterogeneous architectures by Bhowmick et al. (Section 6.4.2 Comparison with the work by Cheong et al.):

The GPU is used only to find neighbor communities and best neighbor community, while the other steps of the Louvain algorithm use multi-core CPU.
