MetaCluster



MetaCluster is the largest open-source library of nature-inspired optimization methods (Metaheuristic Algorithms) for clustering problems in Python.

  • Free software: GNU General Public License (GPL) v3
  • Provides 3 classes: MetaCluster, MhaKCentersClustering, and MhaKMeansTuner
  • Nature-inspired metaheuristic optimizers (Metaheuristic Algorithms): > 200
  • Objective functions (used as fitness): > 40
  • Supported datasets: 48, drawn from scikit-learn, UCI, ELKI, KEEL...
  • Performance metrics: > 40
  • Methods for detecting the K value (number of clusters): >= 10
  • Documentation: https://metacluster.readthedocs.io/en/latest/
  • Python versions: >= 3.7.x
  • Dependencies: numpy, scipy, scikit-learn, pandas, mealpy, permetrics, plotly, kaleido
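
To make the idea of metaheuristic-based clustering concrete, here is a minimal sketch; it is not MetaCluster's implementation. It assumes that a candidate solution is a flat vector holding the coordinates of k cluster centers, that the fitness is the sum of squared errors (SSE), and it swaps in plain random search for a real metaheuristic such as GWO. All function names are hypothetical.

```python
import random


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total


def decode(solution, k, dim):
    """Split a flat solution vector into k centers of `dim` coordinates each."""
    return [solution[i * dim:(i + 1) * dim] for i in range(k)]


def random_search(points, k, n_iter=500, seed=0):
    """Stand-in optimizer: sample candidate center sets and keep the best one."""
    rng = random.Random(seed)
    dim = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dim)]
    highs = [max(p[d] for p in points) for d in range(dim)]
    best, best_fit = None, float("inf")
    for _ in range(n_iter):
        # One candidate = k * dim numbers, each drawn within its feature's bounds.
        candidate = [rng.uniform(lows[j % dim], highs[j % dim]) for j in range(k * dim)]
        fit = sse(points, decode(candidate, k, dim))
        if fit < best_fit:
            best, best_fit = candidate, fit
    return decode(best, k, dim), best_fit


# Two small, well-separated groups of made-up 2-D points.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, fitness = random_search(points, k=2)
print(centers, fitness)
```

A real metaheuristic replaces the sampling loop with guided updates of a population, but the encoding and the fitness evaluation follow the same pattern.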

Citation Request

Please include these citations if you plan to use this library:

@article{VanThieu2023,
  author = {Van Thieu, Nguyen and Oliva, Diego and Pérez-Cisneros, Marco},
  title = {MetaCluster: An open-source Python library for metaheuristic-based clustering problems},
  journal = {SoftwareX},
  year = {2023},
  pages = {101597},
  volume = {24},
  DOI = {10.1016/j.softx.2023.101597},
}

@article{van2023mealpy,
  title={MEALPY: An open-source library for latest meta-heuristic algorithms in Python},
  author={Van Thieu, Nguyen and Mirjalili, Seyedali},
  journal={Journal of Systems Architecture},
  year={2023},
  publisher={Elsevier},
  doi={10.1016/j.sysarc.2023.102871}
}

Installation

  • Install the release from PyPI:
$ pip install metacluster==1.2.0
  • Install directly from source code
$ git clone https://github.com/thieu1995/metacluster.git
$ cd metacluster
$ python setup.py install
  • If you want to install the development version from GitHub:
$ pip install git+https://github.com/thieu1995/metacluster

After installation, you can import MetaCluster like any other Python module:

$ python
>>> import metacluster
>>> metacluster.__version__
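
If you prefer to check the installation from a script without importing the package, the standard library can report the installed version. This is a small sketch that assumes Python 3.8 or newer (where `importlib.metadata` is available); the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or None when MetaCluster is not installed.
print(installed_version("metacluster"))
```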

Examples

We maintain a dedicated GitHub repository for examples: MetaCluster_examples

Let's go through some basic examples from here:

1. First, load a dataset. You can use the datasets that ship with MetaCluster:

# Load available dataset from MetaCluster
from metacluster import get_dataset

# Try an unknown dataset name
get_dataset("unknown")
# Enter: 1      -> This will list all available datasets

data = get_dataset("Arrhythmia")
  • Or you can load your own dataset:
import pandas as pd
from metacluster import Data

# load X and y
# NOTE MetaCluster accepts numpy arrays only, hence use the .values attribute
dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
data = Data(X, y, name="my-dataset")
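
The slicing above treats the last column as the label and every other column as a feature, after the first CSV column has been consumed as the index. The sketch below reproduces that split with the standard library only, on made-up CSV content, so you can see what ends up in `X` and `y`; the column names are hypothetical.

```python
import csv
import io

# Hypothetical CSV content: an index column, two features, and a label column.
csv_text = """id,f1,f2,label
0,1.0,2.0,0
1,1.5,1.8,0
2,8.0,8.2,1
"""

rows = list(csv.reader(io.StringIO(csv_text)))
body = [row[1:] for row in rows[1:]]  # drop the header row and the index column
X = [[float(v) for v in row[:-1]] for row in body]  # all columns except the last
y = [int(row[-1]) for row in body]  # the last column only

print(X)
print(y)
```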

2. Next, scale your features

Make sure your features are scaled or normalized before clustering. Choose one of the following scalers:

# MinMaxScaler 
data.X, scaler = data.scale(data.X, method="MinMaxScaler", feature_range=(0, 1))

# StandardScaler 
data.X, scaler = data.scale(data.X, method="StandardScaler")

# MaxAbsScaler 
data.X, scaler = data.scale(data.X, method="MaxAbsScaler")

# RobustScaler 
data.X, scaler = data.scale(data.X, method="RobustScaler")

# Normalizer 
data.X, scaler = data.scale(data.X, method="Normalizer", norm="l2")   # "l1" or "l2" or "max"
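
For intuition, here is a minimal pure-Python sketch of what min-max scaling computes: each feature is mapped linearly so that its minimum lands on the lower bound of `feature_range` and its maximum on the upper bound. This illustrates the standard formula only; the function name `min_max_scale` is hypothetical and is not part of MetaCluster.

```python
def min_max_scale(rows, feature_range=(0.0, 1.0)):
    """Rescale each column of `rows` linearly into `feature_range`."""
    lo, hi = feature_range
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    # A constant column has zero span; use 1.0 there to avoid dividing by zero.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        [lo + (v - mn) / sp * (hi - lo) for v, mn, sp in zip(row, mins, spans)]
        for row in rows
    ]


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]]
X_scaled = min_max_scale(X)
print(X_scaled)
```

Scaling matters here because distance-based objectives are otherwise dominated by the features with the largest numeric range.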

3. Next, select the metaheuristic algorithms, their parameters, the list of objectives, and the list of performance metrics

list_optimizer = ["BaseFBIO", "OriginalGWO", "OriginalSMA"]
list_paras = [
    {"name": "FBIO", "epoch": 10, "pop_size": 30},
    {"name": "GWO", "epoch": 10, "pop_size": 30},
    {"name": "SMA", "epoch": 10, "pop_size": 30}
]
list_obj = ["SI", "RSI"]
list_metric = ["BHI", "DBI", "DI", "CHI", "SSEI", "NMIS", "HS", "CS", "VMS", "HGS"]

You can find all supported metaheuristic algorithms at https://github.com/thieu1995/mealpy, and all supported clustering objectives and metrics at https://github.com/thieu1995/permetrics.
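
Because the optimizer names and their parameter dictionaries live in two separate lists, it is easy to let them drift apart. The helper below is a hypothetical sanity check, not part of MetaCluster: it assumes the two lists pair up by position and that every parameter set should define `epoch` and `pop_size`, as in the example above.

```python
def check_config(list_optimizer, list_paras):
    """Raise ValueError if the optimizer and parameter lists are inconsistent."""
    if len(list_optimizer) != len(list_paras):
        raise ValueError(
            f"{len(list_optimizer)} optimizers but {len(list_paras)} parameter sets"
        )
    for name, paras in zip(list_optimizer, list_paras):
        # Assumption: each optimizer needs at least these two settings.
        missing = {"epoch", "pop_size"} - paras.keys()
        if missing:
            raise ValueError(f"{name}: missing {sorted(missing)}")
    return True


list_optimizer = ["BaseFBIO", "OriginalGWO", "OriginalSMA"]
list_paras = [
    {"name": "FBIO", "epoch": 10, "pop_size": 30},
    {"name": "GWO", "epoch": 10, "pop_size": 30},
    {"name": "SMA", "epoch": 10, "pop_size": 30},
]
print(check_config(list_optimizer, list_paras))
```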

If you don't want to read the documentation, you can print all supported options with:

from metacluster import MetaCluster 

# Get all supported methods and print them out
MetaCluster.get_support(name="all")

4. Next, create an instance of MetaCluster class and run it.

model = MetaCluster(list_optimizer=list_optimizer, list_paras=list_paras, list_obj=list_obj, n_trials=3, seed=10)

model.execute(data=data, cluster_finder="elbow", list_metric=list_metric, save_path="history", verbose=False)

model.save_boxplots()
model.save_convergences()

As you can see, you can define several datasets and run the same model on each of them. Remember to give each dataset a name, because the results are saved in a folder named after the dataset. More examples can be found in the MetaCluster_examples repository.
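
Step 4 passes `cluster_finder="elbow"` to choose the number of clusters. As a rough illustration of the elbow idea, the sketch below picks the K at which the SSE curve bends most sharply, measured by the largest second difference. This is one common heuristic and not necessarily how MetaCluster implements it; the function name `elbow_k` and the SSE values are made up for illustration.

```python
def elbow_k(k_values, sse_values):
    """Return the k whose SSE curve bends most sharply (largest second difference)."""
    if len(k_values) != len(sse_values) or len(k_values) < 3:
        raise ValueError("need at least three (k, SSE) pairs of equal length")
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(k_values) - 1):
        # Discrete second difference: large when the drop in SSE suddenly flattens.
        bend = sse_values[i - 1] - 2 * sse_values[i] + sse_values[i + 1]
        if bend > best_bend:
            best_k, best_bend = k_values[i], bend
    return best_k


ks = [1, 2, 3, 4, 5, 6]
sses = [1000.0, 600.0, 200.0, 180.0, 165.0, 155.0]  # made-up values
print(elbow_k(ks, sses))  # prints 3
```

In this made-up curve the SSE falls steeply up to K = 3 and then levels off, so K = 3 is reported as the elbow.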

Support

Official links (questions, problems)

Supported links

1. https://jtemporal.com/kmeans-and-elbow-method/
2. https://medium.com/@masarudheena/4-best-ways-to-find-optimal-number-of-clusters-for-clustering-with-python-code-706199fa957c
3. https://github.com/minddrummer/gap/blob/master/gap/gap.py
4. https://www.tandfonline.com/doi/pdf/10.1080/03610927408827101
5. https://doi.org/10.1016/j.engappai.2018.03.013
6. https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Clustering-Dimensionality-Reduction/Clustering_metrics.ipynb
7. https://elki-project.github.io/
8. https://sci2s.ugr.es/keel/index.php
9. https://archive.ics.uci.edu/datasets
10. https://python-charts.com/distribution/box-plot-plotly/
11. https://plotly.com/python/box-plots/#box-plot-styling-mean--standard-deviation
