Giter Club home page Giter Club logo

grappolo's Introduction

# Grappolo: Parallel clustering using the Louvain method as the serial template

## Description

Grappolo implements a parallel version of the Louvain community detection algorithm, using several heuristics to gain computational speed. There can be a larger memory footprint and some loss of accuracy arising from non-deterministic order of vertex processing and use of different heuristics. In general, we have observed significant gains in speed with minimal impact on clustering accuracy (measure in terms of the final modularity score). Further, Grappolo enables processing of large inputs that would otherwise remain unsolved using the serial Louvain implementation. We also note that the distributed-memory version (Vite) is available for extremely large data sets.


The single slice Grappolo has been divided to 7 folders

/DefineStructure: Contain all .h files from different dirctories
/Utility: check basic_ultil.h and utilityClusteringFunc.h
/BasicCommunitiesDection: check basic_comm.h
/Coloring: check coloring.h and comm_coloring.h
/FullSyncOptimization: check sync_comm.h
/InputsOutput: check input_output.h


/****************************************************/
Makefile will create 3 executable in the /bin folder
	1)	./convertFileToBinary
	2)	./driverForGraphClustering
	3)	./driverForColoring



/****************************************************/
To update code, record each update in the folder.
To updates for each particular type of communities detection

1) Change code in particualr folder
	/ Add different communities detection method
	
2) Change the runMultiPhaseXXX.cpp to capture the changes

3) Update the .h files in /DefineStructure

4) Drivers and other folder can remain unchanged

5) To update the Utility code must be done with care, API should
	stay the same

/****************************************************/
To run the code, it will be in the menu of ./driverForGraphClustering





## Input parameters that can be customized:

`links` A numeric matrix of network edges.

`coloring` (1) An integer between 0 and 3 that controls the distance-1 graph coloring heuristic used to partition independent sets of vertices in a graph for parallel processing. 
  * 0 - No coloring.
  * 1 - (Default) Distance-1 graph coloring. Every vertex receives a color and no two neighbors have the same color.
  * 2 – Distance-1 graph coloring, rebalanced for evenly distributed color classes (#nodes per color).
  * 3 - Incomplete coloring, limited to `numColors`, by default 16.

`numColors` (16): An integer between 1 and 1024. Limits graph coloring. Only used if `coloring=3`, incomplete coloring, is set.

`C_thresh` (1e-6): The threshold value determines how long the algorithm iterates. This value (a real number between 0 and 1; >0) is checked when coloring is enabled. The algorithm will stop iterating when the gain in modularity becomes less than `C_thresh`. A final iteration is performed using the value specified by the `threshold` parameter. Desired value for `C_thresh` should be larger than `threshold` for gains in performance.

`minGraphSize` (1,000): Determines when multi-phase operations should stop. Execution stops when the coarsened graph has collapsed the current graph to a fewer than `minGraphSize` nodes. 

`threshold` (1e-9): The threshold value determines how long the algorithm iterates. It is a real number between 0 and 1 (>0). The algorithm will stop the iterations in the current phase when the gain in modularity is less than `threshold`. The algorithm can enter the next phase based on the size of the coarsened graph. 

`syncType` (0) An integer between 0 and 4 that controls synchronization between threads. Only applies if `coloring=0` (no coloring). Synchronization forces the parallel algorithm to behave similar to the execution of a serial Louvain implementation.
  * 0 - (Default) No sync (gives the best performance in terms of runtime).
  * 1 - Full sync (behaves like a serial algorithm).
  * 2 - Neighborhood sync (a hybrid between 0 and 1).
  * 3 - Early termination (stops processing a vertex if it has not changed its community for the past few iterations – leads to gain in performance).
  * 4 - Full sync with early termination (hybrid of 1 and 3).

`basicOpt` (1) Either 0 or 1, controls the representation of intermediate data structures.
  * 0 – Uses stl::map data structure. While it has a smaller memory footprint and computational efficiency, excessive memory allocations and deallocations can lead to loss of performance.
  * 1 - (Default) Use stl::vector to replace the functionality of stl::map. This option comes at the expense of a larger memory footprint and loss in performance when the algorithm has a large number of communities and does not converge quickly (does not have a good community structure). In general, this option can be faster than the stl::map option for inputs with good community structure.


## Further Details:

The only required parameter is `links`. All other parameters are tuning parameters that control how speed vs accuracy vs memory tradeoffs are made. We specifically note that the original Louvain algorithm is non-deterministic and varies considerably for different input structures. Grappolo inherits these limitations with added complications from parallelization.

## Return Value:

A list with two elements:

  * `modularity` - The modularity of the computed partitioning of the network into a set of non-overlapping clusters (communities or partitions). Modularity is a measure of the connectedness in a given network when partitioned based on a given approach.
  * `communities` - A vector where the i'th value is the cluster number that the i'th node in the links matrix has been assigned to.



grappolo's People

Contributors

exa-graph avatar sg0 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

grappolo's Issues

Cannot convert ‘graph*’ to ‘dGraph*’

Seems there's an issue with malloc line of G in main. Converting this to dGraph breaks more things down the line.

[ 81%] Built target driverForMatrixReordering /home/Development/Apps/grappolo/gpu_graph/gpu_main.cpp: In function ‘int main(int, char**)’: /home/Development/Apps/grappolo/gpu_graph/gpu_main.cpp:134:40: error: cannot convert ‘graph*’ to ‘dGraph*’ 134 | case 7: parse_DirectedEdgeList(G, inFile); break; | ^ | | | graph* In file included from /home/Development/Apps/grappolo/gpu_graph/gpu_main.cpp:97: /home/Development/Apps/grappolo/DefineStructure/input_output.h:12:37: note: initializing argument 1 of ‘void parse_DirectedEdgeList(dGraph*, char*)’ 12 | void parse_DirectedEdgeList(dGraph *G, char *fileName); //Directed graph | ~~~~~~~~^ make[2]: *** [gpu_graph/CMakeFiles/gpu_graph.dir/build.make:9415: gpu_graph/CMakeFiles/gpu_graph.dir/gpu_main.cpp.o] Error 1 make[1]: *** [CMakeFiles/Makefile2:329: gpu_graph/CMakeFiles/gpu_graph.dir/all] Error 2 make: *** [Makefile:130: all] Error 2

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.