modal-inria / mixtcomp

Model-based clustering package for mixed data

License: Other

C++ 30.07% Makefile 0.20% Shell 0.18% TeX 2.82% CMake 0.98% C 0.26% R 20.56% Python 6.20% Jupyter Notebook 38.72%

statistics mixture-model r cpp clustering missing-data heterogeneous-data mixed-data cran

mixtcomp's Introduction

MixtComp

MixtComp (Mixture Composer) is a model-based clustering package for mixed data originating from the Modal team (Inria Lille).

Mixture model parameters are estimated using an SEM algorithm. Five basic models (Gaussian, Multinomial, Poisson, Weibull, NegativeBinomial) are implemented to handle real, integer and categorical variables, along with two advanced models (Func_CS for functional data and Rank_ISR for rank data). MixtComp natively handles missing data (completely missing or known only by interval).

MixtComp is used as an R package, but its internals are coded in C++ using state-of-the-art libraries for faster computation. It has been engineered around the idea of easy and quick integration of new univariate models, under the conditional independence assumption. New models will eventually become available from research carried out by the Modal team or by other contributors. Currently, the central architecture of MixtComp is built and its functionality has been field-tested through industry partnerships.

Credits

The following people contributed to the development of MixtComp: Vincent Kubicki, Christophe Biernacki, Quentin Grimonprez, Serge Iovleff, Matthieu Marbac-Lourdelle, Étienne Goffinet.

Copyright Inria - Université de Lille - CNRS

Licence

MixtComp is distributed under the AGPL 3.0 licence. For more details about the licences of MixtComp and its dependencies see the LICENCE.md file.

Code organization

MixtComp MixtComp C++ library
JMixtComp C++ executable using JSON files as input/output
RMixtComp Main R package loading RMixtCompIO and RMixtCompUtilities
RMixtCompIO R package linking MixtComp C++ library with Rcpp
RMixtCompUtilities R package containing graphical, formatting and getter functions
RJMixtComp R package using a JMixtComp executable
RMixtCompHier R package containing a hierarchical version of MixtComp
pyMixtComp Minimal python interface using Boost.Python

A description of the links between packages and external libraries can be found here in a text version and here in a visual version

Documentation

Scientific papers about algorithm and models are available in the article folder.

Examples

See https://github.com/vandaele/mixtcomp-notebook for RMixtComp examples.
See pyMixtComp/python/notebooks for pyMixtComp examples.

Other tools

Branches

Two branches are tested with GitHub Actions:

master: this branch is protected; MixtComp must always work on it.
staging: this branch is used for short development cycles (testing new features, bug fixes, ...). Its content is regularly pushed to master when the tests pass.

mixtcomp's Issues

c++ warnings ignoring return value

To correct before 10/05

https://cran.r-project.org/web/checks/check_results_RMixtCompIO.html

Version: 4.0.9
Check: whether package can be installed
Result: WARN
    Found the following significant warnings:
     RGraph.cpp:95:58: warning: ignoring return value of 'std::__cxx11::basic_string<_CharT, _Traits, _Allocator> std::operator+(__cxx11::basic_string<_CharT, _Traits, _Allocator>&&, const __cxx11::basic_string<_CharT, _Traits, _Allocator>&) [with _CharT = char; _Traits = char_traits<char>; _Alloc = allocator<char>]', declared with attribute 'nodiscard' [-Wunused-result]
     RGraph.h:122:41: warning: ignoring return value of 'std::__cxx11::basic_string<_CharT, _Traits, _Allocator> std::operator+(__cxx11::basic_string<_CharT, _Traits, _Allocator>&&, const __cxx11::basic_string<_CharT, _Traits, _Allocator>&) [with _CharT = char; _Traits = char_traits<char>; _Alloc = allocator<char>]', declared with attribute 'nodiscard' [-Wunused-result]
    See ‘/data/gannet/ripley/R/packages/tests-devel/RMixtCompIO.Rcheck/00install.out’ for details.
    * used C++ compiler: ‘g++-13 (GCC) 13.1.0’

Flavor: r-devel-linux-x86_64-fedora-gcc
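The warning means that the result of a string concatenation is computed and then thrown away. The sketch below is an assumption about the kind of statement involved (the actual code at RGraph.cpp:95 and RGraph.h:122 is not shown in this report); joinPath and its arguments are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not taken from RGraph.cpp.
// A statement such as
//     std::string(base) + "/" + name;
// builds a temporary string and discards it, which GCC 13 reports through
// the 'nodiscard' attribute on std::operator+ [-Wunused-result].
// Appending in place (or assigning the result) removes the warning.
inline std::string joinPath(const std::string& base, const std::string& name) {
    std::string path = base;
    path += "/";   // the result is stored in 'path', nothing is discarded
    path += name;
    return path;
}
```

In most cases the discarded expression was meant to be an assignment or an append, so the warning usually points at a real bug rather than a style issue.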

Error with only intervals for gaussian data

data.zip

library(RMixtComp)

dat = readRDS("bugdata.rds")
algo <- createAlgo(nInitPerClass = 1000)
model = list(molybdène = "Gaussian")

res <- mixtCompLearn(dat, model, algo, nClass = 1:3, nRun = 2, criterion = "ICL")

An R error is generated:

 Error in res[[indMax]] : 
  attempt to select less than one element in get1index 
3.
rmcMultiRun(algo, dataList, model, list(), nRun, nCore, verbose) at MIXTCOMP_mixtCompLearn.R#396
2.
classicLearn(data, model, algo, nClass, criterion, nRun, nCore, 
    verbose, mode) at MIXTCOMP_mixtCompLearn.R#283
1.
mixtCompLearn(datMC[datMC$Racine == "ELE100", ], model, algo, 
    nClass = 1:10, nRun = 10, criterion = "ICL")

The error comes from the following code in RMixtCompIO:

  logLikelihood <- sapply(res, function(x) {ifelse(is.null(x$warnLog), x$mixture$lnObservedLikelihood, -Inf)})

  indMax <- which.max(logLikelihood)

  return(res[[indMax]])

If every warnLog is non-null, then logLikelihood should be a vector of -Inf, and this should not generate an error in which.max.

warning: use of bitwise with boolean operands

https://cran.r-project.org/web/checks/check_results_RMixtCompIO.html

Version: 4.0.7
Check: whether package can be installed
Result: WARN
Found the following significant warnings:
lib/Composer/MixtureComposer.h:247:7: warning: use of bitwise '&' with boolean operands [-Wbitwise-instead-of-logical]
optim/include/cppoptlib/solver/../linesearch/morethuente.h:175:20: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical]
Flavors: r-devel-linux-x86_64-debian-clang, r-devel-linux-x86_64-fedora-clang
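Clang emits this warning when '&' or '|' is applied to two bool operands. The two operators are not interchangeable with '&&' and '||': the bitwise forms always evaluate both sides, the logical forms short-circuit. The sketch below (Counter and both functions are hypothetical, not MixtComp code) shows the two warning-free ways of writing the test, depending on whether both checks must run.

```cpp
#include <cassert>

// Hypothetical helper that counts how many checks were actually evaluated.
struct Counter {
    int calls = 0;
    bool check(bool value) { ++calls; return value; }
};

// Logical '&&': the second check is skipped when the first one is false.
inline bool allWithShortCircuit(Counter& c, bool a, bool b) {
    return c.check(a) && c.check(b);
}

// When both checks have side effects that must happen, evaluate them into
// named variables first, then combine the results with '&&'.
inline bool allWithFullEvaluation(Counter& c, bool a, bool b) {
    const bool first = c.check(a);
    const bool second = c.check(b);
    return first && second;
}
```

Before replacing '&' by '&&' it is worth checking whether the right-hand side has side effects that the original code relied on.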

trigger github actions on path events

To reduce the running time of GitHub Actions:

do not run the MixtComp/JMixtComp actions if no changes were made to the C++ code
if a change occurred in an R package, only test the R packages
if a change occurred in the Python package, only test the Python package

https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#example-using-positive-and-negative-patterns-1

Create docker image for universal usage

To make the python-package user-friendly and platform agnostic, it would be useful to create a docker image where it can be built and run directly by the user without any hassle

RMixtCompIO does not compile on clang 17

RMixtCompIO.log

All errors are of the form error: invalid operands to binary expression, for example:

/usr/local/clang-trunk/bin/../include/c++/v1/__algorithm/sort.h:287:15: 
error: invalid operands to binary expression ('const Eigen::MatrixBase<Eigen::Matrix<double, 1, -1>>::Iterator' and 'Eigen::MatrixBase<Eigen::Matrix<double, 1, -1>>::Iterator')

Build with clang17:

create a check directory and copy the tar.gz of RMixtCompIO package in the folder, then run:

docker run -v `pwd`/check:/check ghcr.io/r-hub/containers/clang17:latest r-check
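The message shows a binary operator applied to a const Iterator on the left and a non-const Iterator on the right. One possible cause (an assumption; the full log is needed to confirm it) is an operator declared as a non-const member function, which cannot be called on a const left operand. Cursor below is a hypothetical iterator-like type used only to illustrate the const-qualified form.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical iterator-like type, not the Eigen plugin iterator from MixtComp.
struct Cursor {
    std::ptrdiff_t pos;

    // Without the trailing 'const', 'constCursor < other' does not compile:
    // a non-const member function cannot be called on a const object.
    bool operator<(const Cursor& rhs) const { return pos < rhs.pos; }
    std::ptrdiff_t operator-(const Cursor& rhs) const { return pos - rhs.pos; }
};

// Both helpers use a const left operand, as the standard library may do.
inline bool constLess(const Cursor& lhs, Cursor rhs) { return lhs < rhs; }
inline std::ptrdiff_t constDifference(const Cursor& lhs, Cursor rhs) { return lhs - rhs; }
```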

std::iterator is deprecated in c++17

https://stackoverflow.com/questions/37031805/preparation-for-stditerator-being-deprecated/38103394#38103394
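The usual replacement for inheriting from std::iterator is to declare the five member types directly in the iterator class. The sketch below shows this on a hypothetical StridedIterator (not MixtComp's own iterator); sumEveryOther is an illustrative helper that assumes a vector of even size.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical forward iterator that visits every 'stride'-th element.
class StridedIterator {
public:
    // These five aliases replace the deprecated std::iterator base class.
    using iterator_category = std::forward_iterator_tag;
    using value_type = double;
    using difference_type = std::ptrdiff_t;
    using pointer = const double*;
    using reference = const double&;

    StridedIterator(const double* p, std::ptrdiff_t stride) : p_(p), stride_(stride) {}

    reference operator*() const { return *p_; }
    pointer operator->() const { return p_; }
    StridedIterator& operator++() { p_ += stride_; return *this; }
    StridedIterator operator++(int) { StridedIterator tmp(*this); ++(*this); return tmp; }

    friend bool operator==(const StridedIterator& a, const StridedIterator& b) { return a.p_ == b.p_; }
    friend bool operator!=(const StridedIterator& a, const StridedIterator& b) { return !(a == b); }

private:
    const double* p_;
    std::ptrdiff_t stride_;
};

// Sums the elements at even positions. Assumes v.size() is even, so that the
// end iterator is reached exactly.
inline double sumEveryOther(const std::vector<double>& v) {
    const double* first = v.data();
    return std::accumulate(StridedIterator(first, 2),
                           StridedIterator(first + v.size(), 2), 0.0);
}
```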

Enforce minimum python version and packages' versions

To avoid any unexpected errors on certain platforms, it is better to enforce a minimum python version on users. Also, to be future-proof, specify python package versions.

MC_DETERMINISTIC (RNG seed) behaviour

Not really a bug, but something to know for reproducibility:
the RNG seed (set with the MC_DETERMINISTIC environment variable) is fixed once per session.
So, if you run mixtCompLearn several times with the same seed inside the same R session, you will get different results:

library(RMixtComp)

data(simData)
 
algo <- list(nbBurnInIter = 50, nbIter = 50, nbGibbsBurnInIter = 50,
             nbGibbsIter = 50,  nInitPerClass = 20, nSemTry = 20, confidenceLevel = 0.95)

Sys.setenv(MC_DETERMINISTIC = 42)
resLearn1 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)
Sys.unsetenv("MC_DETERMINISTIC") 

Sys.setenv(MC_DETERMINISTIC = 42)
resLearn2 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)
Sys.unsetenv("MC_DETERMINISTIC") 

resLearn1$mixture$lnObservedLikelihood
# [1] -1036.835
resLearn2$mixture$lnObservedLikelihood
# [1] -1040.155

If you restart R between two runs, you will get the same results:

library(RMixtComp)

data(simData)
 
algo <- list(nbBurnInIter = 50, nbIter = 50, nbGibbsBurnInIter = 50,
             nbGibbsIter = 50,  nInitPerClass = 20, nSemTry = 20, confidenceLevel = 0.95)

Sys.setenv(MC_DETERMINISTIC = 42)
resLearn1 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)
Sys.unsetenv("MC_DETERMINISTIC") 

resLearn1$mixture$lnObservedLikelihood
# [1] -1036.835

# If we close R and restart a new session

library(RMixtComp)

data(simData)
 
algo <- list(nbBurnInIter = 50, nbIter = 50, nbGibbsBurnInIter = 50,
             nbGibbsIter = 50,  nInitPerClass = 20, nSemTry = 20, confidenceLevel = 0.95)

Sys.setenv(MC_DETERMINISTIC = 42)
resLearn1 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)
Sys.unsetenv("MC_DETERMINISTIC") 


resLearn1$mixture$lnObservedLikelihood
# [1] -1036.835

If you change the seed inside a session, it has no effect:

library(RMixtComp)

data(simData)
 
algo <- list(nbBurnInIter = 50, nbIter = 50, nbGibbsBurnInIter = 50,
             nbGibbsIter = 50,  nInitPerClass = 20, nSemTry = 20, confidenceLevel = 0.95)

Sys.setenv(MC_DETERMINISTIC = 42)
resLearn1 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)
Sys.unsetenv("MC_DETERMINISTIC") 

Sys.setenv(MC_DETERMINISTIC = 50)
resLearn2 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)
Sys.unsetenv("MC_DETERMINISTIC") 

resLearn1$mixture$lnObservedLikelihood
# [1] -1036.835
resLearn2$mixture$lnObservedLikelihood
# [1] -1040.155  # the same second results as previously. Changing the seed inside the same session has no effect

If we start a new session and run with a randomSeed of 50

library(RMixtComp)

data(simData)
 
algo <- list(nbBurnInIter = 50, nbIter = 50, nbGibbsBurnInIter = 50,
             nbGibbsIter = 50,  nInitPerClass = 20, nSemTry = 20, confidenceLevel = 0.95)

Sys.setenv(MC_DETERMINISTIC = 50)
resLearn2 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)
Sys.unsetenv("MC_DETERMINISTIC") 

resLearn2$mixture$lnObservedLikelihood
# [1] -1036.879

If you run without a seed and then with a seed inside the same R session, the seed has no effect:

library(RMixtComp)

data(simData)

algo <- list(nbBurnInIter = 50, nbIter = 50, nbGibbsBurnInIter = 50,
             nbGibbsIter = 50,  nInitPerClass = 20, nSemTry = 20, confidenceLevel = 0.95)

resLearn1 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)

Sys.setenv(MC_DETERMINISTIC = 42)
resLearn2 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)
Sys.unsetenv("MC_DETERMINISTIC") 

resLearn1$mixture$lnObservedLikelihood
# [1] -1037.089
resLearn2$mixture$lnObservedLikelihood
# [1] -1037.915 # not the results associated with seed 42

It is because the seed is managed in C++ with a static variable:
https://github.com/modal-inria/MixtComp/blob/master/MixtComp/src/lib/Statistic/RNG.h
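A minimal sketch of how function-level static variables can produce the behaviour described above. This is not the content of RNG.h: nextSeed is a hypothetical function, and its requested argument stands in for the value read from MC_DETERMINISTIC.

```cpp
#include <cassert>

// Hypothetical seed provider illustrating the effect of static state.
inline unsigned int nextSeed(unsigned int requested) {
    // Initialised on the first call only; later values of 'requested' are ignored.
    static const unsigned int initial = requested;
    // Shared by all calls in the process, so it keeps growing between runs.
    static unsigned int nCall = 0;
    return initial + nCall++;
}
```

With this pattern, a second call with the same seed returns a different value, and a call with a new seed still derives its value from the first one. Only a new process (a new R session in the examples above) resets the state.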

build: remove debug and release folder

build: remove the debug and release folders to have a cleaner repo and fewer shell scripts.
Instead, use a more common "build" folder.

This requires access to Jenkins to change the CI.

One test fails on 32-bit: Error: cannot allocate vector of size 381.5 Mb

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: powerpc-apple-darwin10.8.0 (32-bit)

> # MixtComp version 4.0  - july 2019
> # Copyright (C) Inria - Université de Lille - CNRS
> 
> # This program is free software: you can redistribute it and/or modify
> # it under the terms of the GNU Affero General Public License as
> # published by the Free Software Foundation, either version 3 of the
> # License, or (at your option) any later version.
> # This program is distributed in the hope that it will be useful,
> # but WITHOUT ANY WARRANTY; without even the implied warranty of
> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> # GNU Affero General Public License for more details.
> #
> # You should have received a copy of the GNU Affero General Public License
> # along with this program.  If not, see <https://www.gnu.org/licenses/>
> 
> 
> library(testthat)
> library(RMixtCompIO)
> 
> test_check("RMixtCompIO")
R(17741,0xa0dfb620) malloc: *** mmap(size=400003072) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(17741,0xa0dfb620) malloc: *** mmap(size=400003072) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(17741,0xa0dfb620) malloc: *** mmap(size=400003072) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
        partition
zPredict 1 2
       1 0 3
       2 3 0
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 50 ]

══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test.run.R:174:3'): NegativeBinomial model works ────────────────────
Error: cannot allocate vector of size 381.5 Mb
Backtrace:
    ▆
 1. ├─testthat::expect_gte(rand.index(partition, resGen$z), 0.9) at test.run.R:174:2
 2. │ └─testthat::quasi_label(enquo(object), label, arg = "object")
 3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
 4. └─RMixtCompIO:::rand.index(partition, resGen$z)

[ FAIL 1 | WARN 0 | SKIP 0 | PASS 50 ]
Error: Test failures
Execution halted
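The failed allocation of about 400 MB inside rand.index suggests that memory grows with the square of the number of individuals (an assumption; the R source of rand.index is not shown here). The Rand index can also be computed from the contingency table of the two partitions, which only needs memory proportional to the number of distinct label pairs. The sketch below uses hypothetical names and assumes two label vectors of equal length with at least two elements.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Number of unordered pairs among k items.
inline double choose2(double k) { return k * (k - 1.0) / 2.0; }

// Rand index from contingency counts:
// agreements = C(n,2) + 2 * sum_ij C(n_ij,2) - sum_i C(a_i,2) - sum_j C(b_j,2)
inline double randIndex(const std::vector<int>& x, const std::vector<int>& y) {
    std::map<std::pair<int, int>, double> joint;  // n_ij
    std::map<int, double> rows;                   // a_i
    std::map<int, double> cols;                   // b_j
    for (std::size_t i = 0; i < x.size(); ++i) {
        joint[std::make_pair(x[i], y[i])] += 1.0;
        rows[x[i]] += 1.0;
        cols[y[i]] += 1.0;
    }
    double sumJoint = 0.0, sumRows = 0.0, sumCols = 0.0;
    for (const auto& kv : joint) sumJoint += choose2(kv.second);
    for (const auto& kv : rows) sumRows += choose2(kv.second);
    for (const auto& kv : cols) sumCols += choose2(kv.second);
    const double total = choose2(static_cast<double>(x.size()));
    return (total + 2.0 * sumJoint - sumRows - sumCols) / total;
}
```

For the 2 x 2 table printed in the log above (partitions identical up to a label swap), this formula gives a Rand index of 1.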

boost version warning

/usr/local/include/boost/math/tools/config.hpp:23:6: warning: "The minimum language standard to use Boost.Math will be C++14 starting in July 2023 (Boost 1.82 release)" [-W#warnings]
#    warning "The minimum language standard to use Boost.Math will be C++14 starting in July 2023 (Boost 1.82 release)"

Need to update the cmakefile and the docs

clang16 warnings

Message from Brian:

clang 16 is being prepared for release in early March, so it is time to
start reporting its issues. See
https://www.stats.ox.ac.uk/pub/bdr/clang16/

Shortly these logs will appear as 'clang16' additional issues.

Please correct before 2023-02-16 to safely retain your package on CRAN.
(Some have much closer deadlines for other issues.)

README.txt

Tests as for fedora-clang but using clang 16.0.0git rather than 15.0.7.
This is scheduled for release on Mar 7th.

Currently using C23 as the default C standard (but C23 issues are
reported elsewhere).

Other details as
https://www.stats.ox.ac.uk/pub/bdr/Rconfig/r-devel-linux-x86_64-fedora-clang

References for Rf_length in system headers come from including R headers
before system headers rather than after: see 'Writing R Extensions'.

The [-Wenum-constexpr-conversion] errors seen in

RSQLite

are from the old Boost headers.

std::unary_function was deprecated in C++11 and removed in C++17, and finally
in clang 16 in C++17 mode. Boost 1.80 was still using it.

RMixtCompIO.out

using log directory ‘/data/gannet/ripley/R/packages/tests-clang-trunk/RMixtCompIO.Rcheck’

using R Under development (unstable) (2023-01-20 r83646)

using platform: x86_64-pc-linux-gnu (64-bit)

R was compiled by
clang version 16.0.0 (https://github.com/llvm/llvm-project.git 2133e8b9f942f91ec54e28c580fccf6d6b26c62e)
GNU Fortran (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)

running under: Fedora Linux 36 (Workstation Edition)

using session charset: UTF-8

using option ‘--no-stop-on-test-error’

checking for file ‘RMixtCompIO/DESCRIPTION’ ... OK

checking extension type ... Package

this is package ‘RMixtCompIO’ version ‘4.0.8’

package encoding: UTF-8

checking package namespace information ... OK

checking package dependencies ... OK

checking if this is a source package ... OK

checking if there is a namespace ... OK

checking for executable files ... OK

checking for hidden files and directories ... OK

checking for portable file names ... OK

checking for sufficient/correct file permissions ... OK

checking whether package ‘RMixtCompIO’ can be installed ... [454s/374s] ERROR
Installation failed.
See ‘/data/gannet/ripley/R/packages/tests-clang-trunk/RMixtCompIO.Rcheck/00install.out’ for details.

DONE

Status: 1 ERROR
See
‘/data/gannet/ripley/R/packages/tests-clang-trunk/RMixtCompIO.Rcheck/00check.log’
for details.

Command exited with non-zero status 1
Time 6:16.95, 436.39 + 19.82

RMixtCompIO.log

Compiling pyMixtComp produces a .dylib file and not a .so

After compiling, the pyMixtCompBridge library cannot be found at the expected location, build/lib/pyMixtCompBridge.so. Instead, another file, build/lib/pyMixtCompBridge.dylib, is created. This file cannot be read by the Python scripts.

It was created on

MacBook Intel-Chip
boost==1.8.0
boost-python3 (latest version)
python==3.10.x

Use c++17 as default instead of c++

From Brian

random_shuffle was replaced by shuffle in C++11

Full message

That is

BET RMixtCompIO chngpt cit diversityForest fbati fdaPDE
genepop geojsonsf ggraph jsonify keyATM lmSubsets mapdeck
mapscanner matchingMarkets plfm prioritizr quanteda.textmodels
ranger ruimtehol sctransform sgd sirus spatialwidget stream

You will notice that R CMD check is now reporting

checking C++ specification ... NOTE
Specified C++11: please update to current default of C++17

and we investigated for those packages if a non-default C++ standard was
actually needed. For all but 26 out of 1205 it was not. Common issues

random_shuffle was replaced by shuffle in C++11: BET RMixtCompIO cit
ggraph matchingMarkets sctransform sgd stream

bind1st was deprecated in C++11 and removed in C++17: chngpt

diversityForest ranger sirus: define own make_unique (a C++14 addition)
in a namespace but do not qualify the usages.

fbati genepop quanteda.textmodels : 'data' clashes with C++17 headers
which have std::data.

CRAN submissions will now auto-reject packages with this NOTE, so do
update before your next submission.
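For the random_shuffle point, the replacement is std::shuffle with an explicit random engine. The sketch below is illustrative only: shuffledIndices is a hypothetical function, and std::mt19937 is used as the engine for the example, whereas MixtComp may use a different generator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns the indices 0..n-1 in a random order determined by 'seed'.
inline std::vector<int> shuffledIndices(std::size_t n, unsigned int seed) {
    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    std::mt19937 gen(seed);
    // Before: std::random_shuffle(idx.begin(), idx.end());  // removed in C++17
    std::shuffle(idx.begin(), idx.end(), gen);
    return idx;
}
```

Passing the engine explicitly also makes the source of randomness visible at the call site, which helps when results need to be reproducible.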

error with class parameter in plotDataCI and plotDataBoxplot

With class = 2, class number 2 does not have the same color as class number 2 when class = NULL, in both plotDataCI and plotDataBoxplot.
With class = 2, class number 2 has a wrong position in plotDataCI.
Using both grl = TRUE and class = 2 generates an error in plotDataCI.

library(RMixtComp)

data(simData)

algo <- list(
  nInd = 100,
  nbBurnInIter = 100,
  nbIter = 100,
  nbGibbsBurnInIter = 100,
  nbGibbsIter = 100,
  nInitPerClass = 3,
  nSemTry = 20,
  confidenceLevel = 0.95,
  ratioStableCriterion = 0.95,
  nStableCriterion = 10
  )

resLearn <- mixtCompLearn(simData$dataLearn$data.frame[,3:4], simData$model$supervised[3:4], nClass = 2, nRun = 3)

plotDataBoxplot(resLearn, "Gaussian1", grl = FALSE)
plotDataBoxplot(resLearn, "Gaussian1", class = 1, grl = FALSE) 
plotDataBoxplot(resLearn, "Gaussian1", class = 2, grl = FALSE) # wrong color

plotDataBoxplot(resLearn, "Gaussian1", grl = TRUE)
plotDataBoxplot(resLearn, "Gaussian1", class = 1, grl = TRUE) # wrong color 
plotDataBoxplot(resLearn, "Gaussian1", class = 2, grl = TRUE) # wrong color


plotDataCI(resLearn, "Gaussian1", grl = FALSE)
plotDataCI(resLearn, "Gaussian1", class = 1, grl = FALSE) #  wrong position
plotDataCI(resLearn, "Gaussian1", class = 2, grl = FALSE) # wrong color + wrong position

plotDataCI(resLearn, "Gaussian1", grl = TRUE)
plotDataCI(resLearn, "Gaussian1", class = 1, grl = TRUE) # error
plotDataCI(resLearn, "Gaussian1", class = 2, grl = TRUE) # error




plotDataBoxplot(resLearn, "Categorical1", grl = FALSE)
plotDataBoxplot(resLearn, "Categorical1", class = 1, grl = FALSE) 
plotDataBoxplot(resLearn, "Categorical1", class = 2, grl = FALSE) # wrong color

plotDataBoxplot(resLearn, "Categorical1", grl = TRUE)
plotDataBoxplot(resLearn, "Categorical1", class = 1, grl = TRUE) # wrong color 
plotDataBoxplot(resLearn, "Categorical1", class = 2, grl = TRUE) # wrong color


plotDataCI(resLearn, "Categorical1", grl = FALSE)
plotDataCI(resLearn, "Categorical1", class = 1, grl = FALSE) #  wrong position
plotDataCI(resLearn, "Categorical1", class = 2, grl = FALSE) # wrong color + wrong position

plotDataCI(resLearn, "Categorical1", grl = TRUE)
plotDataCI(resLearn, "Categorical1", class = 1, grl = TRUE) # error
plotDataCI(resLearn, "Categorical1", class = 2, grl = TRUE) # error

Python package

Develop PyMixtComp to run C++ executable from python. Reproduce all R functions in python

heatmap* functions generate an error with pkg = "plotly"

require(RMixtCompIO) # for learning a mixture model
dataLearn <- list(var1 = as.character(c(rnorm(50, -2, 0.8), rnorm(50, 2, 0.8))),
                  var2 = as.character(c(rnorm(50, 2), rpois(50, 8))))

model <- list(var1 = list(type = "Gaussian", paramStr = ""),
              var2 = list(type = "Poisson", paramStr = ""))

algo <- list(
  nClass = 2,
  nInd = 100,
  nbBurnInIter = 100,
  nbIter = 100,
  nbGibbsBurnInIter = 100,
  nbGibbsIter = 100,
  nInitPerClass = 3,
  nSemTry = 20,
  confidenceLevel = 0.95,
  ratioStableCriterion = 0.95,
  nStableCriterion = 10,
  mode = "learn"
)

resLearn <- rmcMultiRun(algo, dataLearn, model, nRun = 3)

# plot
heatmapVar(resLearn, pkg = "plotly")
heatmapTikSorted(resLearn, pkg = "plotly")
heatmapClass(resLearn, pkg = "plotly")

Error in matchSignature(signature, fdef) : 
  more elements in the method signature (2) than in the generic signature (1) for function ‘asJSON’

RMixtComp does not work on windows 11

Running the following code on Windows 11 crashes R, or the session must be stopped.
This does not happen on Linux.

The problem is the same with the examples from the documentation.
Those examples are tested by CRAN on a Windows Server 2022 machine (see config), and they work. So the problem seems to be specific to Windows 11.

Data: prostate.csv

data <- read.table("~/Téléchargements/prostate.csv", sep = ";", header = TRUE)
z = data[,1]
data = data[,2:13]
head(data)

library(RMixtComp)

model <- list(Age = "Gaussian", Wt = "Gaussian", PF = "Multinomial", 
              HX = "Multinomial", SBP = "Gaussian", DBP = "Gaussian", 
              EKG = "Multinomial", HG = "Gaussian", SZ = "Gaussian", 
              SG = "Gaussian", AP = "Gaussian", BM = "Multinomial")

algo <- list(nbBurnInIter = 50,
             nbIter = 100,
             nbGibbsBurnInIter = 50,
             nbGibbsIter = 10,
             nInitPerClass = floor(nrow(data)/2),
             nSemTry = 5,
             confidenceLevel = 0.95,
             ratioStableCriterion = 0.99,
             nStableCriterion = 10)

nClass <- 1
nRun <- 3
sink(file = "mc_output.txt")
res <- mixtCompLearn(data, model, algo, nClass = nClass, criterion = "ICL", nRun = nRun, nCore = 1)
sink(file = NULL)

Alternatives on Windows 11

Use the online notebooks: https://github.com/vandaele/mixtcomp-notebook
Install Windows Subsystem for Linux to use Ubuntu on Windows: https://ubuntu.com/tutorials/install-ubuntu-on-wsl2-on-windows-10#1-overview
Use Docker to run RStudio in an Ubuntu container on your computer (cf. https://rocker-project.org/images/versioned/rstudio.html):
  Install Docker, then run in a terminal:
  docker run --rm -ti -e PASSWORD=mixtcomp -p 8787:8787 rocker/rstudio
  In a browser, open http://localhost:8787/
  Log in with username rstudio and password mixtcomp

Use std distribution instead of boost ones

https://en.cppreference.com/w/cpp/header/random
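A minimal sketch of what sampling with the standard header could look like. StdSampler is a hypothetical wrapper, not an existing MixtComp class. One caveat to keep in mind: std::mt19937 is fully specified by the standard, but the algorithms behind the distributions are implementation-defined, so the same seed may give different draws with different standard libraries.

```cpp
#include <cassert>
#include <random>

// Hypothetical sampler built only on <random>.
class StdSampler {
public:
    explicit StdSampler(unsigned int seed) : gen_(seed) {}

    // Draw from a normal distribution with the given mean and standard deviation.
    double normal(double mean, double sd) {
        std::normal_distribution<double> dist(mean, sd);
        return dist(gen_);
    }

    // Draw from a Poisson distribution with the given rate.
    int poisson(double lambda) {
        std::poisson_distribution<int> dist(lambda);
        return dist(gen_);
    }

    // Draw from a uniform distribution on the unit interval.
    double uniform01() {
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        return dist(gen_);
    }

private:
    std::mt19937 gen_;  // engine owned by the sampler, seeded once
};
```

Keeping the engine as a member (rather than a static variable) lets each sampler be seeded independently.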

Issue with new Eigen version

Eigen 3.4.0 was released earlier in the year, and I have been asked to update
RcppEigen to it. As documented in this issue [1] at its GitHub repo, there are
about nine packages that built at CRAN under the previous release, but not
with the Eigen 3.4.0 changes in the release candidate package -- which can be
installed via

install.packages("RcppEigen", repo="https://RcppCore.github.io/drat")

For RMixtCompIO, I could not work out a minimal change. Something in the
multinomial sampler setup makes Eigen 3.4.0 (via the prerelease of RcppEigen
in the repo above) unhappy.

It would be terrific if you and the MixtComp team could take a look at this
and possibly update the package to work with Eigen 3.4.0 which brings a few
new features other R users would like to deploy.

Let me know if you have any question, and please do not hesitate to ask.

Best regards, Dirk

[1] RcppCore/RcppEigen#103

mean curves out of confidence interval in plotDataCI

res.zip

res = readRDS("res.rds")


plotDataCI(res, var = "func", class = 2)

add a github action with clang17

In order to check with cran settings.
The checks with clang on cran are usually the ones that give errors
https://r-hub.github.io/containers/gha.html

basic_mode modifies the user's data set

from pyMixtComp import MixtComp
from pyMixtComp.data import load_iris

iris, _ = load_iris()
iris = iris.rename(columns={"species": "z_class"})
iris

sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)	species
5.1	3.5	1.4	0.2	setosa
4.9	3.0	1.4	0.2	setosa
4.7	3.2	1.3	0.2	setosa
4.6	3.1	1.5	0.2	setosa
5.0	3.6	1.4	0.2	setosa
...	...	...	...	...
6.7	3.0	5.2	2.3	virginica
6.3	2.5	5.0	1.9	virginica
6.5	3.0	5.2	2.0	virginica
6.2	3.4	5.4	2.3	virginica
5.9	3.0	5.1	1.8	virginica

mod = MixtComp(n_components=3, n_init=5)
mod.fit(iris)
iris

sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)	z_class
5.1	3.5	1.4	0.2	0
4.9	3.0	1.4	0.2	0
4.7	3.2	1.3	0.2	0
4.6	3.1	1.5	0.2	0
5.0	3.6	1.4	0.2	0
...	...	...	...	...
6.7	3.0	5.2	2.3	2
6.3	2.5	5.0	1.9	2
6.5	3.0	5.2	2.0	2
6.2	3.4	5.4	2.3	2
5.9	3.0	5.1	1.8	2

RMixtCompIO tests fail on windows with R < 4.3

Dear package maintainer,

your package RMixtCompIO_4.0.9.tar.gz did not pass 'R CMD check' on
Windows with R versions < 4.3.0 and will be omitted from the
corresponding CRAN directory.

Please check the attached log-file and submit a version
with increased version number that passes R CMD check on Windows.

please also fix the gcc13 issues shown on
https://cran.r-project.org/web/checks/check_results_RMixtCompIO.html

All the best,
Uwe Ligges
(Maintainer of binary packages for Windows)

Test for NegativeBinomial failed
00check.log

CI fails on jenkins

The CI fails on jenkins due to the change for the python part of MixtComp.
It requires access to jenkins to change the pipeline.

Add hierarchical mode for functional data in pyMixtComp

Implement in python the hierarchical version of MixtComp
https://github.com/modal-inria/MixtComp/blob/master/RMixtComp/R/MIXTCOMP_hierarchical.R

variable named z_class with the wrong type in basic mode

The variable name z_class is used for LatentClass.
In basic mode (a data.frame is given without a model, so MixtComp infers the model from the data types), when a variable named z_class has the wrong type (numeric instead of integer), it is processed as a Gaussian variable. The real z_class variable (the partition) then cannot be accessed in the output, which causes bugs in other functions.

library(RMixtComp)

X <- data.frame(x = rnorm(100), y = c(rnorm(50), rnorm(50, 2)), z_class = rep(c(1., NA, 2., NA), each = 25))
sapply(X, class)

res <- mixtCompLearn(X, nClass = 2)

res$variable$type$z_class
# [1] "Gaussian" # instead of "LatentClass"
# res$variable$param$z_class has gaussian parameter

plot(res)
# error

Idea:

When inferring type, refuse to use a variable named z_class if it is not an integer and send a warning to the user
