mixmod

Mixmod is a software package for model-based supervised and unsupervised classification of qualitative, quantitative and mixed data.

Available components:

  • A package for R: Rmixmod
  • A module for Python: Pymixmod
  • A computational library: mixmodLib (C++)
  • A Graphical User Interface: mixmodGUI

Main Statistical functionalities:

  • Likelihood maximization with the EM, CEM and SEM algorithms
  • Parsimonious models
    • 14 models for quantitative data (Gaussian mixture models)
    • 5 models for qualitative data (Multinomial mixture models)
    • 20 models for mixed data (quantitative/qualitative)
    • 8 specific models for High Dimension
  • Selection criteria: BIC, ICL, NEC, CV

Previous repository: https://gforge.inria.fr/projects/mixmod

Folder structure

  • Rmixmod R interface of the C++ mixmod library
  • mixmodLib C++ mixmod library
  • mixmodIOStream C++ library to manage IO
  • mixmodCLI Command Line Interface
  • Pymixmod Python interface
  • mixmodGUI unmaintained
  • mixmodMVC unmaintained

C++ components

Requirements

Requirements for mixmodLib:

sudo apt install -y cmake libeigen3-dev

Extra requirement for mixmodIOStream:

sudo apt install -y libxml++2.6-dev

Compilation

A main CMake file is used to compile mixmodLib, mixmodIOStream and mixmodCLI.

Compilation options:

  • -DCMAKE_INSTALL_PREFIX: installation folder
  • -DCMAKE_BUILD_TYPE: Debug or Release (default)
  • -DCMAKE_CXX_FLAGS: extra c++ compilation flags (optional)
  • -DMIXMOD_BUILD_IOSTREAM: ON or OFF (default). Whether to compile mixmodIOStream
  • -DMIXMOD_BUILD_CLI: ON or OFF (default). Whether to compile mixmodCLI. It requires -DMIXMOD_BUILD_IOSTREAM=ON
  • -DMIXMOD_BUILD_EXAMPLES: ON or OFF (default). Whether to compile the examples
  • -DMIXMOD_ENABLE_OPENMP: OFF or ON (default). Whether to enable OpenMP

Generate makefile:

mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=~/usr/local/ -DCMAKE_CXX_FLAGS="-Wall -Wextra -D_GLIBCXX_ASSERTIONS" ..

Compile:

make install -j2

Examples

See mixmodLib/EXAMPLES for some examples.

Rmixmod

R interface of the C++ mixmod library.

R Requirements

Install the following R packages in order to build Rmixmod:

install.packages(c("Rcpp", "RcppEigen", "devtools"))

Build

In a terminal, run:

./build_rmixmod.sh

It creates a directory named Rmixmod_[version] containing the package archive.

The package can be installed by running:

R CMD INSTALL Rmixmod_[version]/Rmixmod_[version].tar.gz

and checked by running:

R CMD check --as-cran Rmixmod_[version]/Rmixmod_[version].tar.gz

PyMixmod

Python interface of the C++ mixmod library.

Test PyMixmod

See the dedicated README.

Docs

See the doc folder for the papers about Rmixmod, the statistical documentation and a user guide for mixmod.

Citation

Lebret, R., Iovleff, S., Langrognet, F., Biernacki, C., Celeux, G., & Govaert, G. (2015). Rmixmod: The R Package of the Model-Based Unsupervised, Supervised, and Semi-Supervised Classification Mixmod Library. Journal of Statistical Software, 67(6), 1–29. https://doi.org/10.18637/jss.v067.i06

See CITATION.bib

License

mixmod is distributed under the GPL v3 license.

Contributing

  • Use the .clang-format file to format the C++ code.

  • Use lintr to check the R code with the following .lintr file:

    linters: linters_with_defaults(
      line_length_linter(127),
      object_name_linter = NULL
      )
    

mixmod's People

Contributors

jschueller, quentin62

mixmod's Issues

CRAN packages not using the correct compiler in configure scripts

Message from Brian Ripley

That is

C50 Cubist GPBayes RDieHarder RPostgreSQL RcppGSL RhpcBLASctl Rmixmod
Rrdrand SimInf SymTS eaf excursions fRLR ff float getip git2r gsl gslnls
hdf5r island memuse mmap ore poismf randtoolbox rayrender rgeolocate
rmatio rnetcarto rngWELL scrypt survSNP xgboost

These show

checking for gcc... gcc

even on platforms using clang as the compiler. Please re-read

https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Configure-and-cleanup

There need be no 'gcc' compiler, it might be faked (macOS) or very old
(Solaris). The aim of a configure script is to test things using the
same setup as will be used to compile code in the package.

In several cases you are using a C compiler in configure but a C++
compiler to compile the code: they usually have access to different
headers. Looks like this is the case in at least

GPBayes RcppGSL Rmixmod fRLR ff rayrender xgboost

The logs showing

checking for gcc option to accept ISO C89... none needed

need configure rebuilt with a modern autoconf, which will check C11 (and
that or C17 are the default in all the current compilers we know of --
however, that old autoconf test will fail when C23 comes into use).

Please correct before 2023-01-26 to safely retain your package on CRAN.

Some help from Dirk Eddelbuettel

Fellow maintainers,

Pardon the mass-mail intrusion but I was for once more than a little
surprised by the bulk email by Brian Ripley. In case you are wondering what
to do, in my case the change, as best as I can tell, simply consisted of
removing the line AC_PROG_CC in configure.ac as (minimal) src/Makevars.in
for package RcppGSL and RDieHarder does not actually set a compiler. I only
operate on compiler flags.

If any of you have question, please feel free to reach out (maybe by reply
not reply-all) or maybe we start a thread on r-pkg-devel.

Hope this helps, best regards, Dirk

undefined reference to `XEM::ModelOutput::getModel() const'

The class ModelOutput declares a method Model * getModel() const; but it is not defined:
https://github.com/mixmod/mixmod/blob/master/mixmodLib/SRC/mixmod/Kernel/IO/ModelOutput.h#L91

How can I get the Model after a clustering step is done? My code looks like this:

// Prepare Mixmod for clustering
XEM::ClusteringInput * clusteringInput(new XEM::ClusteringInput(nbCluster, dataDescription));
XEM::ModelType modelType(XEM::StringToModelName(covarianceModel_));
clusteringInput->setModelType(&modelType, 0);
clusteringInput->finalize();

// Do the computation
XEM::ClusteringMain clusteringMain(clusteringInput);
clusteringMain.run();

// Extract the results
XEM::ClusteringOutput * clusteringOutput(clusteringMain.getOutput());

XEM::ClusteringModelOutput * clusteringModelOutput(clusteringOutput->getClusteringModelOutput(0));

The goal is to get the logLikelihood and entropy results. I see there is a LikelihoodOutput class, but I need the model to use it.

Degeneracy of the variance-covariance matrix

From: [email protected]

There is a degeneracy in the estimation of variance-covariance matrices.
If we run the following code:

rm(list=ls())

data(iris)
library(Rmixmod)
res <- mixmodCluster(iris[, 3:4], 5, model = mixmodGaussianModel())
plot(res)
res

We can see that the estimate of one of the variance-covariance matrices tends towards a singular matrix (because of the 8 aligned points at the bottom left). This makes the likelihood diverge, so this degenerate model is necessarily the one selected.

I had talked about it a few years ago on the Mixmod list and I was told that the error would be corrected but I see that it is not the case. Now, this distorts a number of estimates.
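The divergence described above can be illustrated with a minimal univariate sketch. It is not taken from mixmod: the function name gaussianLogDensity is hypothetical, and a single variance stands in for the covariance matrix of the issue. When a component's mean sits on a data point and its variance shrinks towards zero, the log-density at that point grows without bound.

```cpp
#include <cmath>

// Log-density of a univariate normal N(mean, variance) evaluated at x.
// Hypothetical helper for illustration only; not part of mixmodLib.
double gaussianLogDensity(double x, double mean, double variance) {
    const double pi = 3.14159265358979323846;
    const double diff = x - mean;
    return -0.5 * std::log(2.0 * pi * variance) - 0.5 * diff * diff / variance;
}
```

Evaluating gaussianLogDensity(0.0, 0.0, v) for v = 1, 1e-4, 1e-8 gives strictly increasing values, which is the one-dimensional analogue of a covariance estimate collapsing onto aligned points.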

downcast error

An error occurs when running the file tests/heterogeneousDAtest.R from Rmixmod:

library(Rmixmod)
data(heterodatatest)
data(heterodatatrain)
learn<-mixmodLearn(heterodatatrain[-1], knownLabels=heterodatatrain$V1)
predict<-mixmodPredict(heterodatatest[-1], classificationRule=learn["bestResult"]) 
missclassified<-sum(as.integer(predict@partition)-as.integer(heterodatatest$V1))

This is reported as an ERROR by the clang-UBSAN and gcc-UBSAN checks on CRAN (tests of memory access errors using the Undefined Behavior Sanitizer): https://www.stats.ox.ac.uk/pub/bdr/memtests/README.txt

Logs on CRAN:
https://www.stats.ox.ac.uk/pub/bdr/memtests/gcc-UBSAN/Rmixmod/tests/heterogeneousDAtest.Rout
https://www.stats.ox.ac.uk/pub/bdr/memtests/clang-UBSAN/Rmixmod/tests/heterogeneousDAtest.Rout

# > predict<-mixmodPredict(heterodatatest[-1], classificationRule=learn["bestResult"])
# Kernel/Parameter/GaussianEDDAParameter.cpp:297:95: runtime error: downcast of address 0x60e00000c0c0 which does not point to an object of type 'GaussianGeneralParameter'
# 0x60e00000c0c0: note: object is of type 'XEM::GaussianDiagParameter'
# 02 00 80 36  20 0b e0 b0 c7 7f 00 00  02 00 00 00 00 00 00 00  02 00 00 00 00 00 00 00  30 e8 06 00
# ^~~~~~~~~~~~~~~~~~~~~~~
#   vptr for 'XEM::GaussianDiagParameter'
# #0 0x7fc7b0333d29 in XEM::GaussianEDDAParameter::initUSER(XEM::Parameter*) Kernel/Parameter/GaussianEDDAParameter.cpp:297
# #1 0x7fc7b03d1abf in XEM::GaussianDiagParameter::initUSER(XEM::Parameter*) Kernel/Parameter/GaussianDiagParameter.cpp:172
# #2 0x7fc7b04b6e16 in XEM::Model::initUSER(XEM::Parameter*) Kernel/Model/Model.cpp:998
# #3 0x7fc7b030a21d in XEM::PredictStrategy::run(XEM::Model*) DiscriminantAnalysis/Predict/PredictStrategy.cpp:59
# #4 0x7fc7b0308fad in XEM::PredictMain::run(XEM::IoMode, int, int) DiscriminantAnalysis/Predict/PredictMain.cpp:141
# #5 0x7fc7b0190024 in predictMain /data/gannet/ripley/R/packages/tests-gcc-SAN/Rmixmod/src/predictMain.cpp:295
# #6 0x57b493 in R_doDotCall /data/gannet/ripley/R/svn/R-devel/src/main/dotcode.c:598
# #7 0x5839ac in do_dotcall /data/gannet/ripley/R/svn/R-devel/src/main/dotcode.c:1281
# #8 0x61e951 in bcEval /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:7105
# #9 0x66b6ff in Rf_eval /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:723
# #10 0x670b85 in R_execClosure /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:1888
# #11 0x6732a4 in Rf_applyClosure /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:1814
# #12 0x66bb48 in Rf_eval /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:846
# #13 0x679581 in do_set /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:2960
# #14 0x66c02c in Rf_eval /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:798
# #15 0x6ea51d in Rf_ReplIteration /data/gannet/ripley/R/svn/R-devel/src/main/main.c:264
# #16 0x6ea51d in Rf_ReplIteration /data/gannet/ripley/R/svn/R-devel/src/main/main.c:200
# #17 0x6eac18 in R_ReplConsole /data/gannet/ripley/R/svn/R-devel/src/main/main.c:314
# #18 0x6ead64 in run_Rmainloop /data/gannet/ripley/R/svn/R-devel/src/main/main.c:1113
# #19 0x6eadb2 in Rf_mainloop /data/gannet/ripley/R/svn/R-devel/src/main/main.c:1120
# #20 0x419368 in main /data/gannet/ripley/R/svn/R-devel/src/main/Rmain.c:29
# #21 0x7fc7c2bddf42 in __libc_start_main (/lib64/libc.so.6+0x23f42)
# #22 0x41baad in _start (/data/gannet/ripley/R/gcc-SAN/bin/exec/R+0x41baad)

The same code was in the file R/mixmodPredict.R, in the oldmixmodpredict example, and was removed:

# '   ## Prediction on the testing data
# '   data(heterodatatest)
# '   prediction <- mixmodPredict(heterodatatest[-1],learn["bestResult"])
# '   # compare prediction with real results
# '   paste("accuracy= ",mean(heterodatatest$V1 == prediction["partition"])*100,"%",sep="")
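The sanitizer report above concerns a downcast to a type the object does not have. The following is a minimal sketch of a checked alternative, not the actual mixmod code: the three structs are simplified, hypothetical stand-ins that only reuse the class names from the report, and asGeneral and canInitFromGeneral are invented helper names. It shows that dynamic_cast returns a null pointer on a type mismatch, whereas a static_cast to the wrong derived type is undefined behaviour.

```cpp
// Hypothetical, simplified stand-ins for the classes named in the UBSan report.
struct Parameter {
    virtual ~Parameter() = default;
};
struct GaussianDiagParameter : Parameter {
    double diagonalScale = 1.0;
};
struct GaussianGeneralParameter : Parameter {
    double fullScale = 2.0;
};

// Checked downcast: returns nullptr when the dynamic type of `p`
// is not GaussianGeneralParameter (hypothetical helper).
const GaussianGeneralParameter* asGeneral(const Parameter* p) {
    return dynamic_cast<const GaussianGeneralParameter*>(p);
}

// True only if `p` really points to a GaussianGeneralParameter.
bool canInitFromGeneral(const Parameter* p) {
    return asGeneral(p) != nullptr;
}
```

With this pattern, a caller handed a GaussianDiagParameter can detect the mismatch and take another code path instead of reading the object through the wrong type.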

autoconf warnings in Rmixmod

Output from running autoreconf:
configure.ac:25: warning: AC_OUTPUT should be used without arguments.
configure.ac:25: You should run autoupdate.

warning: use of bitwise '&' with boolean operands

https://cran.r-project.org/web/checks/check_results_Rmixmod.html

Version: 2.1.6
Check: whether package can be installed
Result: WARN
Found the following significant warnings:
ClusteringInputHandling.cpp:61:12: warning: use of bitwise '&' with boolean operands [-Wbitwise-instead-of-logical]
ClusteringInputHandling.cpp:72:12: warning: use of bitwise '&' with boolean operands [-Wbitwise-instead-of-logical]
Flavors: r-devel-linux-x86_64-debian-clang, r-devel-linux-x86_64-fedora-clang
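
The flagged lines of ClusteringInputHandling.cpp are not reproduced here. As a general illustration of what the warning is about, the sketch below uses hypothetical helper names (isPositive, bothPositiveBitwise, bothPositiveLogical) to show the behavioural difference: bitwise '&' always evaluates both boolean operands, while logical '&&' skips the right operand when the left one is false.

```cpp
// Counts calls to isPositive so the evaluation behaviour is observable.
static int callCount = 0;

// Hypothetical predicate with a side effect (incrementing callCount).
bool isPositive(int value) {
    ++callCount;
    return value > 0;
}

// Bitwise '&' on bool operands: both calls always happen.
// This form is the kind of expression the warning points at.
bool bothPositiveBitwise(int a, int b) {
    return isPositive(a) & isPositive(b);
}

// Logical '&&': the second call is skipped when the first returns false.
bool bothPositiveLogical(int a, int b) {
    return isPositive(a) && isPositive(b);
}
```

Both functions return the same result, but for (-1, 2) the bitwise version calls isPositive twice and the logical version calls it once. Switching to '&&' is therefore only a drop-in fix when the right operand has no side effects that must run.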
