DBSCAN for PyCOMPSs is a distributed implementation of the well-known clustering algorithm first proposed in 1996 here. The application is written in Python and parallelised with the COMPSs framework.
In this repository you will find the following files and directories:
- DBSCAN.py: contains the main algorithm and task invocation. It requires the classes included in the /classes/ folder.
- /classes/: contains two modules imported by DBSCAN.py.
  - One of them is a custom-built data class.
  - The second one is a disjoint-set data structure (merge-find set) found here.
- run.sh: shell script to run the algorithm both on localhost and on a cluster with COMPSs installed.
- launchDBSCAN.py: script to run a batch of executions using launch.sh as launcher.
- launch.sh: launcher for a Slurm-based cluster.
- Gen_Data_DBSCAN.py: Python script to generate randomly shaped clustering datasets like the ones in /data/.
- ./data/: a collection of datasets to test the algorithm on.
- ./ext_versions/: contains other DBSCAN implementations that might be useful for benchmarking.
- DBSCAN_Seq.py: sequential naive (all vs all) implementation of the algorithm.
- /kmeans/: contains an implementation of the k-means algorithm in PyCOMPSs, also used for benchmarking.
- script_times.py: post-processing script to gather times from a large batch of executions.
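To give an idea of how a disjoint-set structure and a naive all-vs-all pass fit together, here is a minimal sketch. It is not the code in DBSCAN_Seq.py or /classes/; the names DisjointSet, naive_dbscan, eps and min_points are assumptions made for illustration, and a point counts itself among its neighbours.

```python
import numpy as np


class DisjointSet(object):
    """Minimal merge-find set over the integers 0..n-1 (illustrative only)."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        # Walk up to the root, halving the path along the way.
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def naive_dbscan(points, eps, min_points):
    """Return one label per point: a cluster id, or -1 for noise."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # All-vs-all pairwise distances: O(n^2) time and memory.
    diff = points[:, None, :] - points[None, :, :]
    close = np.sqrt((diff ** 2).sum(axis=2)) <= eps
    # A core point has at least min_points neighbours, itself included.
    core = close.sum(axis=1) >= min_points

    # Core points within eps of each other end up in the same set.
    sets = DisjointSet(n)
    for i in range(n):
        if not core[i]:
            continue
        for j in range(i + 1, n):
            if core[j] and close[i, j]:
                sets.union(i, j)

    # Number the clusters in order of their first core point.
    labels = -np.ones(n, dtype=int)
    ids = {}
    for i in range(n):
        if core[i]:
            labels[i] = ids.setdefault(sets.find(i), len(ids))
    # A border point joins the cluster of its first core neighbour.
    for i in range(n):
        if not core[i]:
            neighbours = np.flatnonzero(close[i] & core)
            if len(neighbours) > 0:
                labels[i] = labels[neighbours[0]]
    return labels


sample = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [50, 50]]
print(naive_dbscan(sample, eps=1.5, min_points=3).tolist())
# prints [0, 0, 0, 1, 1, 1, -1]
```

The quadratic distance matrix is what makes this version useful only as a baseline: it is simple to check by hand, but it does not scale to large datasets.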
To run the application you need:
- Python 2.7.x (with NumPy). COMPSs won't work with Python 3.x.
- COMPSs (latest version). If you are trying to install it, this manual might be useful.
- Pandas 0.21. This is the version I use; older versions may work as well, but they need to support callables for the skiprows argument (for instance in pd.read_csv).
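As an illustration of the Pandas feature mentioned above, the sketch below passes a callable as skiprows to pd.read_csv. How the repository actually loads its datasets is not shown here, so the in-memory CSV, the column names and the skip_odd_rows helper are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical in-memory dataset standing in for a file under ./data/.
csv_text = u"x,y\n0.0,0.1\n1.0,1.1\n2.0,2.1\n3.0,3.1\n4.0,4.1\n"


def skip_odd_rows(index):
    # The callable receives each row index of the file (0 is the header
    # line here) and returns True for the rows that should be skipped.
    return index % 2 == 1


subset = pd.read_csv(io.StringIO(csv_text), skiprows=skip_odd_rows)
print(subset)
```

A callable like this lets a script read only a sample of a large dataset without loading the whole file first.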
For any inquiries or problems when trying to run the algorithm or COMPSs itself, don't hesitate to contact me at: a=carlos.segarra b=bsc.es mailto: a @ b