mcap provides a model-based clustering approach for very high dimensions (especially when p is much larger than n) via adaptive projections. Clustering is based on full-covariance Gaussian mixture modelling in a lower-dimensional (projected) space. The projection dimension is set adaptively in a data-driven manner based on a cluster stability criterion. Projection variants available so far include PCA and random projections (Gaussian as well as sparse methods).
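To make the projection step more concrete, here is a minimal base-R sketch of the two random projection variants mentioned above. It is not the code used inside mcap. The helper names `gaussian_projection` and `sparse_projection`, the scaling that gives each entry variance 1/q, and the default sparsity s = sqrt(p) (the "very sparse" scheme of Li, Hastie and Church) are assumptions made for illustration only.

```r
# Hypothetical helpers for illustration; not part of the mcap API.

# Dense Gaussian projection: entries drawn i.i.d. from N(0, 1/q).
gaussian_projection <- function(p, q) {
  matrix(rnorm(p * q, sd = 1 / sqrt(q)), nrow = p, ncol = q)
}

# Sparse projection: entries are sqrt(s / q) * {+1, 0, -1} with
# probabilities 1/(2s), 1 - 1/s, 1/(2s), so each entry has variance 1/q.
# The default s = sqrt(p) is an assumption, not a documented mcap setting.
sparse_projection <- function(p, q, s = sqrt(p)) {
  vals <- sample(c(1, 0, -1), size = p * q, replace = TRUE,
                 prob = c(1 / (2 * s), 1 - 1 / s, 1 / (2 * s)))
  matrix(sqrt(s / q) * vals, nrow = p, ncol = q)
}

set.seed(1)
n <- 50    # number of samples
p <- 1000  # original dimension
q <- 20    # projection dimension (fixed here; mcap chooses it adaptively)
X <- matrix(rnorm(n * p), n, p)

Z_gauss  <- X %*% gaussian_projection(p, q)  # n x q projected data
Z_sparse <- X %*% sparse_projection(p, q)    # n x q projected data

dim(Z_gauss)
dim(Z_sparse)
```

In both cases the mixture model would then be fitted to the n x q matrix rather than to the original n x p data.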
See our paper: currently under review
preprint: …
Clone or download the code from GitHub.
Alternatively, you can install mcap directly from GitHub with:
# install.packages("devtools")
devtools::install_github("btaschler/mcap")
Dependencies on other packages:
- for parallelisation: foreach, doParallel, parallel
- for clustering: pcaMethods, nethet, mclust, RandPro, kernlab
- misc: iterators, magrittr, stats, dplyr, tidyverse, utils, methods, data.table, RevoUtilsMath
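Before installing, it can help to see which of the packages listed above are already present. The following is a small base-R sketch; the variable names `deps`, `is_installed` and `missing_deps` are illustrative, and the snippet only reports missing packages without installing anything, since the listed packages may come from different repositories.

```r
# Packages named in the dependency list above.
deps <- c("foreach", "doParallel", "parallel",
          "pcaMethods", "nethet", "mclust", "RandPro", "kernlab",
          "iterators", "magrittr", "stats", "dplyr", "tidyverse",
          "utils", "methods", "data.table", "RevoUtilsMath")

# system.file() returns "" for a package that is not installed.
is_installed <- vapply(deps,
                       function(pkg) nzchar(system.file(package = pkg)),
                       logical(1))

missing_deps <- deps[!is_installed]
if (length(missing_deps) > 0) {
  message("Missing packages: ", paste(missing_deps, collapse = ", "))
} else {
  message("All listed dependencies are installed.")
}
```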
This is a basic example showing how to use mcap to cluster two (known) groups:
library(mcap)
### basic example code
K <- 2 #number of clusters (groups)
n_k <- 200 #number of samples per group
p <- 1000 #number of features (dimension)
A <- matrix(rnorm(n_k*p), n_k, p) #data for group 1
B <- matrix(rnorm(n_k*p, mean = 1), n_k, p) #data for group 2
X <- rbind(A, B) #input matrix
Y <- c(rep(0, n_k), rep(1, n_k)) #known labels
## using PCA projections
model_fit <- MCAPfit(X, k = K, projection = 'PCA', centering_per_group = FALSE,
                     true_labels = Y, parallel = TRUE)
## sparse random projection
model_fit <- MCAPfit(X, k = K, projection = 'li', centering_per_group = FALSE,
                     true_labels = Y, parallel = TRUE)
## adjusted Rand index
print(model_fit$fit_gmm$aRI)
## display assigned cluster labels for each sample
print(model_fit$fit_gmm$model_fit$comp)
## show optimised projection dimension
print(model_fit$fit_q_opt$q_opt)
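For readers unfamiliar with the adjusted Rand index reported above, the following base-R sketch shows how it can be computed from two label vectors using the standard contingency-table formula. The function `adjusted_rand_index` is a hypothetical helper written for this illustration; how mcap computes `aRI` internally is not stated here (mclust, one of the listed dependencies, offers `adjustedRandIndex()` for the same purpose).

```r
# Hypothetical helper for illustration; not part of the mcap API.
# Note: returns NaN in the degenerate case where both labelings
# consist of a single cluster.
adjusted_rand_index <- function(a, b) {
  stopifnot(length(a) == length(b))
  tab <- table(a, b)                          # contingency table of the two labelings
  n <- sum(tab)
  sum_cells <- sum(choose(tab, 2))            # pairs grouped together in both labelings
  sum_rows  <- sum(choose(rowSums(tab), 2))   # pairs grouped together in a
  sum_cols  <- sum(choose(colSums(tab), 2))   # pairs grouped together in b
  expected  <- sum_rows * sum_cols / choose(n, 2)  # value expected by chance
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

truth <- c(rep(0, 5), rep(1, 5))
adjusted_rand_index(truth, truth)            # identical labelings: 1
adjusted_rand_index(truth, 1 - truth)        # same partition, labels swapped: 1
adjusted_rand_index(truth, rep(c(0, 1), 5))  # unrelated labeling: close to 0
```

The index is invariant to how clusters are numbered, which is why the swapped labeling still scores 1.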
For all available versions, see releases. We use Semantic Versioning.
- Bernd Taschler, Sach Mukherjee
List of contributors.
This project is licensed under the GNU General Public License - see the GPL-3.0 for details.
- Konstantinos Perrakis for valuable discussions.
- The coffee machine for mental and physical support.