MixtComp

MixtComp (Mixture Composer) is a model-based clustering package for mixed data originating from the Modal team (Inria Lille).

Mixture model parameters are estimated using an SEM algorithm. Five basic models (Gaussian, Multinomial, Poisson, Weibull, NegativeBinomial) are implemented to manage real, integer and categorical variables, as well as two advanced models (Func_CS for functional data and Rank_ISR for rank data). MixtComp natively handles missing data, whether a value is completely missing or known only up to an interval.

MixtComp is used as an R package, but its internals are coded in C++ using state-of-the-art libraries for faster computation. It has been engineered around the idea of easy and quick integration of new univariate models, under the conditional independence assumption. New models will eventually become available from research carried out by the Modal team or by other contributors. Currently, the central architecture of MixtComp is built and its functionality has been field-tested through industry partnerships.

Credits

The following people contributed to the development of MixtComp: Vincent Kubicki, Christophe Biernacki, Quentin Grimonprez, Serge Iovleff, Matthieu Marbac-Lourdelle, Étienne Goffinet.

Copyright Inria - Université de Lille - CNRS

Licence

MixtComp is distributed under the AGPL 3.0 licence. For more details about the licences of MixtComp and its dependencies see the LICENCE.md file.

Code organization

  • MixtComp: MixtComp C++ library
  • JMixtComp: C++ executable using JSON files as input/output
  • RMixtComp: main R package loading RMixtCompIO and RMixtCompUtilities
  • RMixtCompIO: R package linking the MixtComp C++ library with Rcpp
  • RMixtCompUtilities: R package containing graphical, formatting and getter functions
  • RJMixtComp: R package using a JMixtComp executable
  • RMixtCompHier: R package containing a hierarchical version of MixtComp
  • pyMixtComp: minimal Python interface using Boost.Python

A description of the links between packages and external libraries can be found here in a text version and here in a visual version

Documentation

Scientific papers about the algorithm and the models are available in the article folder.

Examples

Other tools

Branches

There are two branches tested with GitHub Actions:

  • master: this branch is protected; MixtComp must always work on it.
  • staging: this branch is used for short-lived development (testing new features, bug fixes, ...); its content is regularly pushed to master when the tests pass.

Contributors

egoffi, jonasrenault, mostafaabdelrashied, quentin62, vandaele, vkubicki

mixtcomp's Issues

c++ warnings ignoring return value

To correct before 10/05

https://cran.r-project.org/web/checks/check_results_RMixtCompIO.html

Version: 4.0.9
Check: whether package can be installed
Result: WARN
    Found the following significant warnings:
     RGraph.cpp:95:58: warning: ignoring return value of 'std::__cxx11::basic_string<_CharT, _Traits, _Allocator> std::operator+(__cxx11::basic_string<_CharT, _Traits, _Allocator>&&, const __cxx11::basic_string<_CharT, _Traits, _Allocator>&) [with _CharT = char; _Traits = char_traits<char>; _Alloc = allocator<char>]', declared with attribute 'nodiscard' [-Wunused-result]
     RGraph.h:122:41: warning: ignoring return value of 'std::__cxx11::basic_string<_CharT, _Traits, _Allocator> std::operator+(__cxx11::basic_string<_CharT, _Traits, _Allocator>&&, const __cxx11::basic_string<_CharT, _Traits, _Allocator>&) [with _CharT = char; _Traits = char_traits<char>; _Alloc = allocator<char>]', declared with attribute 'nodiscard' [-Wunused-result]
    See ‘/data/gannet/ripley/R/packages/tests-devel/RMixtCompIO.Rcheck/00install.out’ for details.
    * used C++ compiler: ‘g++-13 (GCC) 13.1.0’

Flavor: r-devel-linux-x86_64-fedora-gcc
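
Both warnings above are raised when the result of a std::string concatenation is thrown away. The statements at RGraph.cpp:95 and RGraph.h:122 are not shown in the log, so the following is only a minimal sketch of the pattern and of one possible fix. The function name appendKey is hypothetical and not taken from MixtComp.

```cpp
#include <string>

// Hypothetical helper (not MixtComp code): appends a key to a path.
inline std::string appendKey(std::string path, const std::string& key) {
    // A statement such as
    //     path + "/" + key;
    // builds a temporary string and discards it. According to the log above,
    // g++-13 declares this operator+ with 'nodiscard', hence -Wunused-result.
    // Storing the result (here with +=) keeps the value and avoids the warning.
    path += "/" + key;
    return path;
}
```

Calling appendKey("a", "b") returns "a/b". If the discarded expression in the real code was meant to update a string, += is probably the intended operator; if it was leftover code, deleting the statement is enough.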

Error with only intervals for gaussian data

data.zip

library(RMixtComp)

dat = readRDS("bugdata.rds")
algo <- createAlgo(nInitPerClass = 1000)
model = list(molybdène = "Gaussian")

res <- mixtCompLearn(dat, model, algo, nClass = 1:3, nRun = 2, criterion = "ICL")

An R error is generated:

 Error in res[[indMax]] : 
  attempt to select less than one element in get1index 
3.
rmcMultiRun(algo, dataList, model, list(), nRun, nCore, verbose) at MIXTCOMP_mixtCompLearn.R#396
2.
classicLearn(data, model, algo, nClass, criterion, nRun, nCore, 
    verbose, mode) at MIXTCOMP_mixtCompLearn.R#283
1.
mixtCompLearn(datMC[datMC$Racine == "ELE100", ], model, algo, 
    nClass = 1:10, nRun = 10, criterion = "ICL")

The error comes from the following code in RMixtCompIO:

  logLikelihood <- sapply(res, function(x) {ifelse(is.null(x$warnLog), x$mixture$lnObservedLikelihood, -Inf)})

  indMax <- which.max(logLikelihood)

  return(res[[indMax]])

If every run has a non-null warnLog, then logLikelihood should be a vector of -Inf, and this should not generate an error in which.max.

warning: use of bitwise with boolean operands

https://cran.r-project.org/web/checks/check_results_RMixtCompIO.html

Version: 4.0.7
Check: whether package can be installed
Result: WARN
Found the following significant warnings:
lib/Composer/MixtureComposer.h:247:7: warning: use of bitwise '&' with boolean operands [-Wbitwise-instead-of-logical]
optim/include/cppoptlib/solver/../linesearch/morethuente.h:175:20: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical]
Flavors: r-devel-linux-x86_64-debian-clang, r-devel-linux-x86_64-fedora-clang
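
The warning is about applying '&' or '|' to boolean operands. The bitwise operators always evaluate both sides, while '&&' and '||' short-circuit, so a blind replacement can change behaviour when the right-hand side has side effects. The sketch below illustrates the two possible fixes; the function names are hypothetical and the code is not taken from MixtureComposer.h or morethuente.h.

```cpp
// Hypothetical check with a side effect: it counts how often it is called.
inline bool checkAndCount(bool ok, int& calls) {
    ++calls;
    return ok;
}

// Fix 1: use the logical operator. The second check is skipped
// as soon as the first one returns false.
inline bool bothShortCircuit(bool a, bool b, int& calls) {
    return checkAndCount(a, calls) && checkAndCount(b, calls);
}

// Fix 2: keep the "always evaluate both sides" behaviour of the bitwise
// form by storing each result first, then combine with '&&'.
inline bool bothAlwaysEvaluated(bool a, bool b, int& calls) {
    const bool first = checkAndCount(a, calls);
    const bool second = checkAndCount(b, calls);
    return first && second;
}
```

With a = false, bothShortCircuit calls the check once and bothAlwaysEvaluated calls it twice; both return false. Which fix is appropriate depends on whether the original code relied on both operands being evaluated.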

Create docker image for universal usage

To make the Python package user-friendly and platform-agnostic, it would be useful to create a Docker image in which it can be built and run directly by the user without any hassle.

RMixtCompIO does not compile on clang 17

RMixtCompIO.log

All errors are of the form "error: invalid operands to binary expression", for example:

/usr/local/clang-trunk/bin/../include/c++/v1/__algorithm/sort.h:287:15: 
error: invalid operands to binary expression ('const Eigen::MatrixBase<Eigen::Matrix<double, 1, -1>>::Iterator' and 'Eigen::MatrixBase<Eigen::Matrix<double, 1, -1>>::Iterator')

To build with clang 17:

Create a check directory, copy the tar.gz of the RMixtCompIO package into it, then run:

docker run -v `pwd`/check:/check ghcr.io/r-hub/containers/clang17:latest r-check
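
The diagnostic quoted above involves a 'const ... Iterator' on the left and a non-const 'Iterator' on the right. One common cause of this kind of error (an assumption here, since the excerpt does not show which operator is involved) is a hand-written iterator whose operators are non-const member functions, so they cannot be called on a const object. The sketch below shows a minimal random-access iterator whose read-only operators are all const-qualified. RowIterator is a hypothetical stand-in, not the Eigen or MixtComp iterator.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical random-access iterator over a contiguous row of doubles.
// Every operator that does not modify the iterator is const-qualified,
// so it can be used with a const iterator on either side.
class RowIterator {
public:
    using iterator_category = std::random_access_iterator_tag;
    using value_type = double;
    using difference_type = std::ptrdiff_t;
    using pointer = double*;
    using reference = double&;

    RowIterator() = default;
    explicit RowIterator(double* p) : p_(p) {}

    reference operator*() const { return *p_; }
    reference operator[](difference_type n) const { return p_[n]; }

    RowIterator& operator++() { ++p_; return *this; }
    RowIterator operator++(int) { RowIterator tmp(*this); ++p_; return tmp; }
    RowIterator& operator--() { --p_; return *this; }
    RowIterator operator--(int) { RowIterator tmp(*this); --p_; return tmp; }
    RowIterator& operator+=(difference_type n) { p_ += n; return *this; }
    RowIterator& operator-=(difference_type n) { p_ -= n; return *this; }

    RowIterator operator+(difference_type n) const { return RowIterator(p_ + n); }
    RowIterator operator-(difference_type n) const { return RowIterator(p_ - n); }
    difference_type operator-(const RowIterator& o) const { return p_ - o.p_; }

    bool operator==(const RowIterator& o) const { return p_ == o.p_; }
    bool operator!=(const RowIterator& o) const { return p_ != o.p_; }
    bool operator<(const RowIterator& o) const { return p_ < o.p_; }
    bool operator>(const RowIterator& o) const { return p_ > o.p_; }
    bool operator<=(const RowIterator& o) const { return p_ <= o.p_; }
    bool operator>=(const RowIterator& o) const { return p_ >= o.p_; }

private:
    double* p_ = nullptr;
};

inline RowIterator operator+(std::ptrdiff_t n, const RowIterator& it) {
    return it + n;
}

// Sorts a row in place through the custom iterator.
inline void sortRow(std::vector<double>& row) {
    std::sort(RowIterator(row.data()), RowIterator(row.data() + row.size()));
}
```

Calling sortRow on a vector sorts it in ascending order through the custom iterator. If the failing operator in the real code turns out to be a different one, the same const-qualification check applies to it.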

MC_DETERMINISTIC (RNG seed) behaviour

Not really a bug, but something to be aware of for reproducibility:
the RNG seed (set with the MC_DETERMINISTIC environment variable) takes effect once per session.
So, if you run mixtCompLearn several times with the same seed inside the same R session, you will get different results:

library(RMixtComp)

data(simData)
 
algo <- list(nbBurnInIter = 50, nbIter = 50, nbGibbsBurnInIter = 50,
             nbGibbsIter = 50,  nInitPerClass = 20, nSemTry = 20, confidenceLevel = 0.95)

Sys.setenv(MC_DETERMINISTIC = 42)
resLearn1 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)
Sys.unsetenv("MC_DETERMINISTIC") 

Sys.setenv(MC_DETERMINISTIC = 42)
resLearn2 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)
Sys.unsetenv("MC_DETERMINISTIC") 

resLearn1$mixture$lnObservedLikelihood
# [1] -1036.835
resLearn2$mixture$lnObservedLikelihood
# [1] -1040.155

If you restart R between the two runs, you get the same results:

library(RMixtComp)

data(simData)
 
algo <- list(nbBurnInIter = 50, nbIter = 50, nbGibbsBurnInIter = 50,
             nbGibbsIter = 50,  nInitPerClass = 20, nSemTry = 20, confidenceLevel = 0.95)

Sys.setenv(MC_DETERMINISTIC = 42)
resLearn1 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)
Sys.unsetenv("MC_DETERMINISTIC") 

resLearn1$mixture$lnObservedLikelihood
# [1] -1036.835

# If we close R and restart a new session

library(RMixtComp)

data(simData)
 
algo <- list(nbBurnInIter = 50, nbIter = 50, nbGibbsBurnInIter = 50,
             nbGibbsIter = 50,  nInitPerClass = 20, nSemTry = 20, confidenceLevel = 0.95)

Sys.setenv(MC_DETERMINISTIC = 42)
resLearn1 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)
Sys.unsetenv("MC_DETERMINISTIC") 


resLearn1$mixture$lnObservedLikelihood
# [1] -1036.835

If you change the seed inside a session, it has no effect:

library(RMixtComp)

data(simData)
 
algo <- list(nbBurnInIter = 50, nbIter = 50, nbGibbsBurnInIter = 50,
             nbGibbsIter = 50,  nInitPerClass = 20, nSemTry = 20, confidenceLevel = 0.95)

Sys.setenv(MC_DETERMINISTIC = 42)
resLearn1 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)
Sys.unsetenv("MC_DETERMINISTIC") 

Sys.setenv(MC_DETERMINISTIC = 50)
resLearn2 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)
Sys.unsetenv("MC_DETERMINISTIC") 

resLearn1$mixture$lnObservedLikelihood
# [1] -1036.835
resLearn2$mixture$lnObservedLikelihood
# [1] -1040.155  # the same second results as previously. Changing the seed inside the same session has no effect  

If we start a new session and run with a seed of 50:

library(RMixtComp)

data(simData)
 
algo <- list(nbBurnInIter = 50, nbIter = 50, nbGibbsBurnInIter = 50,
             nbGibbsIter = 50,  nInitPerClass = 20, nSemTry = 20, confidenceLevel = 0.95)

Sys.setenv(MC_DETERMINISTIC = 50)
resLearn2 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)
Sys.unsetenv("MC_DETERMINISTIC") 

resLearn2$mixture$lnObservedLikelihood
# [1] -1036.879

If you run without a seed and then with a seed inside the same R session, the seed has no effect:

library(RMixtComp)

data(simData)

algo <- list(nbBurnInIter = 50, nbIter = 50, nbGibbsBurnInIter = 50,
             nbGibbsIter = 50,  nInitPerClass = 20, nSemTry = 20, confidenceLevel = 0.95)

resLearn1 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)

Sys.setenv(MC_DETERMINISTIC = 42)
resLearn2 <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
                           nClass = 1:2, nRun = 1, nCore = 1)
Sys.unsetenv("MC_DETERMINISTIC") 

resLearn1$mixture$lnObservedLikelihood
# [1] -1037.089
resLearn2$mixture$lnObservedLikelihood
# [1] -1037.915 # not the results associated with seed 42

This is because the seed is managed in C++ with a static variable:
https://github.com/modal-inria/MixtComp/blob/master/MixtComp/src/lib/Statistic/RNG.h
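
The behaviour described in this issue is what a statically initialised generator produces. The following is a minimal sketch of that mechanism under stated assumptions: readSeed and sessionGenerator are hypothetical names, and the code is not a copy of RNG.h. Only the MC_DETERMINISTIC variable name comes from the issue.

```cpp
#include <cstdlib>
#include <random>
#include <string>

// Hypothetical helper: uses MC_DETERMINISTIC as the seed when it is set,
// otherwise falls back to a non-deterministic seed.
inline unsigned long readSeed() {
    const char* value = std::getenv("MC_DETERMINISTIC");
    return value ? std::stoul(value) : std::random_device{}();
}

// The generator is a function-local static: it is constructed (and the
// environment variable is read) on the first call only. Later changes to
// MC_DETERMINISTIC in the same process are never seen.
inline std::mt19937& sessionGenerator() {
    static std::mt19937 gen(readSeed());
    return gen;
}
```

With this design, setting the variable to 42, drawing a number, then setting it to 50 and drawing again yields the first and second values of the stream seeded with 42. The generator state simply keeps advancing, which matches the observation that reseeding inside a session has no effect.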

build: remove debug and release folder

build: remove the debug and release folders to have a cleaner repo and fewer shell scripts.
Instead, use a more common "build" folder.

This requires access to Jenkins to change the CI.

One test fails on 32-bit: Error: cannot allocate vector of size 381.5 Mb

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: powerpc-apple-darwin10.8.0 (32-bit)

> # MixtComp version 4.0  - july 2019
> # Copyright (C) Inria - Université de Lille - CNRS
> 
> # This program is free software: you can redistribute it and/or modify
> # it under the terms of the GNU Affero General Public License as
> # published by the Free Software Foundation, either version 3 of the
> # License, or (at your option) any later version.
> # This program is distributed in the hope that it will be useful,
> # but WITHOUT ANY WARRANTY; without even the implied warranty of
> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> # GNU Affero General Public License for more details.
> #
> # You should have received a copy of the GNU Affero General Public License
> # along with this program.  If not, see <https://www.gnu.org/licenses/>
> 
> 
> library(testthat)
> library(RMixtCompIO)
> 
> test_check("RMixtCompIO")
R(17741,0xa0dfb620) malloc: *** mmap(size=400003072) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(17741,0xa0dfb620) malloc: *** mmap(size=400003072) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(17741,0xa0dfb620) malloc: *** mmap(size=400003072) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
        partition
zPredict 1 2
       1 0 3
       2 3 0
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 50 ]

══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test.run.R:174:3'): NegativeBinomial model works ────────────────────
Error: cannot allocate vector of size 381.5 Mb
Backtrace:
    ▆
 1. ├─testthat::expect_gte(rand.index(partition, resGen$z), 0.9) at test.run.R:174:2
 2. │ └─testthat::quasi_label(enquo(object), label, arg = "object")
 3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
 4. └─RMixtCompIO:::rand.index(partition, resGen$z)

[ FAIL 1 | WARN 0 | SKIP 0 | PASS 50 ]
Error: Test failures
Execution halted

boost version warning

/usr/local/include/boost/math/tools/config.hpp:23:6: warning: "The minimum language standard to use Boost.Math will be C++14 starting in July 2023 (Boost 1.82 release)" [-W#warnings]
#    warning "The minimum language standard to use Boost.Math will be C++14 starting in July 2023 (Boost 1.82 release)"

The CMake file and the docs need to be updated.

clang16 warnings

Message from Brian:

clang 16 is being prepared for release in early March, so it is time to
start reporting its issues. See
https://www.stats.ox.ac.uk/pub/bdr/clang16/

Shortly these logs will appear as 'clang16' additional issues.

Please correct before 2023-02-16 to safely retain your package on CRAN.
(Some have much closer deadlines for other issues.)

README.txt

Tests as for fedora-clang but using clang 16.0.0git rather than 15.0.7.
This is scheduled for release on Mar 7th.

Currently using C23 as the default C standard (but C23 issues are
reported elsewhere).

Other details as
https://www.stats.ox.ac.uk/pub/bdr/Rconfig/r-devel-linux-x86_64-fedora-clang

References for Rf_length in system headers come from including R headers
before system headers rather than after: see 'Writing R Extensions'.

The [-Wenum-constexpr-conversion] errors seen in

RSQLite

are from the old Boost headers.

std::unary_function was deprecated in C++11 and removed in C++17, and finally
in clang 16 in C++17 mode. Boost 1.80 was still using it.

RMixtCompIO.out

  • using log directory ‘/data/gannet/ripley/R/packages/tests-clang-trunk/RMixtCompIO.Rcheck’
  • using R Under development (unstable) (2023-01-20 r83646)
  • using platform: x86_64-pc-linux-gnu (64-bit)
  • R was compiled by
    clang version 16.0.0 (https://github.com/llvm/llvm-project.git 2133e8b9f942f91ec54e28c580fccf6d6b26c62e)
    GNU Fortran (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)
  • running under: Fedora Linux 36 (Workstation Edition)
  • using session charset: UTF-8
  • using option ‘--no-stop-on-test-error’
  • checking for file ‘RMixtCompIO/DESCRIPTION’ ... OK
  • checking extension type ... Package
  • this is package ‘RMixtCompIO’ version ‘4.0.8’
  • package encoding: UTF-8
  • checking package namespace information ... OK
  • checking package dependencies ... OK
  • checking if this is a source package ... OK
  • checking if there is a namespace ... OK
  • checking for executable files ... OK
  • checking for hidden files and directories ... OK
  • checking for portable file names ... OK
  • checking for sufficient/correct file permissions ... OK
  • checking whether package ‘RMixtCompIO’ can be installed ... [454s/374s] ERROR
    Installation failed.
    See ‘/data/gannet/ripley/R/packages/tests-clang-trunk/RMixtCompIO.Rcheck/00install.out’ for details.
  • DONE

Status: 1 ERROR
See
‘/data/gannet/ripley/R/packages/tests-clang-trunk/RMixtCompIO.Rcheck/00check.log’
for details.

Command exited with non-zero status 1
Time 6:16.95, 436.39 + 19.82

RMixtCompIO.log

Compiling pyMixtComp produces a .dylib file and not a .so

After compiling, the pyMixtCompBridge library cannot be found at the expected location (build/lib/pyMixtCompBridge.so). Instead, another file, build/lib/pyMixtCompBridge.dylib, is created, and the Python scripts cannot read it.

The build environment was:

  1. MacBook Intel-Chip
  2. boost==1.8.0
  3. boost-python3 (latest version)
  4. python==3.10.x

Use c++17 as default instead of c++

From Brian

random_shuffle was replaced by shuffle in C++11

Full message

That is

BET RMixtCompIO chngpt cit diversityForest fbati fdaPDE
genepop geojsonsf ggraph jsonify keyATM lmSubsets mapdeck
mapscanner matchingMarkets plfm prioritizr quanteda.textmodels
ranger ruimtehol sctransform sgd sirus spatialwidget stream

You will notice that R CMD check is now reporting

  • checking C++ specification ... NOTE
    Specified C++11: please update to current default of C++17

and we investigated for those packages if a non-default C++ standard was
actually needed. For all but 26 out of 1205 it was not. Common issues

random_shuffle was replaced by shuffle in C++11: BET RMixtCompIO cit
ggraph matchingMarkets sctransform sgd stream

bind1st was deprecated in C++11 and removed in C++17: chngpt

diversityForest ranger sirus: define own make_unique (a C++14 addition)
in a namespace but do not qualify the usages.

fbati genepop quanteda.textmodels : 'data' clashes with C++17 headers
which have std::data.

CRAN submissions will now auto-reject packages with this NOTE, so do
update before your next submission.
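
For reference, std::random_shuffle was deprecated in C++14 and removed in C++17, while std::shuffle (available since C++11) takes an explicit random bit generator. The sketch below shows the replacement pattern on a plain index vector. The function name shuffledIndices and the choice of std::mt19937 are illustrative assumptions, not the code used in RMixtCompIO.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns the indices 0..n-1 in a pseudo-random order determined by seed.
// Before: std::random_shuffle(idx.begin(), idx.end());   // gone in C++17
// After:  std::shuffle(idx.begin(), idx.end(), gen);     // explicit generator
inline std::vector<int> shuffledIndices(std::size_t n, unsigned int seed) {
    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    std::mt19937 gen(seed);
    std::shuffle(idx.begin(), idx.end(), gen);
    return idx;
}
```

Two calls with the same seed return the same permutation on a given standard library. The exact permutation is not guaranteed to be identical across standard library implementations, so tests should not hard-code it.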

error with class parameter in plotDataCI and plotDataBoxplot

  • With class = 2, class 2 does not have the same color as class 2 when class = NULL, in both plotDataCI and plotDataBoxplot.
  • With class = 2, class 2 has a wrong position in plotDataCI.
  • Using both grl = TRUE and class = 2 generates an error in plotDataCI.

library(RMixtComp)

data(simData)

algo <- list(
  nInd = 100,
  nbBurnInIter = 100,
  nbIter = 100,
  nbGibbsBurnInIter = 100,
  nbGibbsIter = 100,
  nInitPerClass = 3,
  nSemTry = 20,
  confidenceLevel = 0.95,
  ratioStableCriterion = 0.95,
  nStableCriterion = 10
  )

resLearn <- mixtCompLearn(simData$dataLearn$data.frame[,3:4], simData$model$supervised[3:4], nClass = 2, nRun = 3)

plotDataBoxplot(resLearn, "Gaussian1", grl = FALSE)
plotDataBoxplot(resLearn, "Gaussian1", class = 1, grl = FALSE) 
plotDataBoxplot(resLearn, "Gaussian1", class = 2, grl = FALSE) # wrong color

plotDataBoxplot(resLearn, "Gaussian1", grl = TRUE)
plotDataBoxplot(resLearn, "Gaussian1", class = 1, grl = TRUE) # wrong color 
plotDataBoxplot(resLearn, "Gaussian1", class = 2, grl = TRUE) # wrong color


plotDataCI(resLearn, "Gaussian1", grl = FALSE)
plotDataCI(resLearn, "Gaussian1", class = 1, grl = FALSE) #  wrong position
plotDataCI(resLearn, "Gaussian1", class = 2, grl = FALSE) # wrong color + wrong position

plotDataCI(resLearn, "Gaussian1", grl = TRUE)
plotDataCI(resLearn, "Gaussian1", class = 1, grl = TRUE) # error
plotDataCI(resLearn, "Gaussian1", class = 2, grl = TRUE) # error




plotDataBoxplot(resLearn, "Categorical1", grl = FALSE)
plotDataBoxplot(resLearn, "Categorical1", class = 1, grl = FALSE) 
plotDataBoxplot(resLearn, "Categorical1", class = 2, grl = FALSE) # wrong color

plotDataBoxplot(resLearn, "Categorical1", grl = TRUE)
plotDataBoxplot(resLearn, "Categorical1", class = 1, grl = TRUE) # wrong color 
plotDataBoxplot(resLearn, "Categorical1", class = 2, grl = TRUE) # wrong color


plotDataCI(resLearn, "Categorical1", grl = FALSE)
plotDataCI(resLearn, "Categorical1", class = 1, grl = FALSE) #  wrong position
plotDataCI(resLearn, "Categorical1", class = 2, grl = FALSE) # wrong color + wrong position

plotDataCI(resLearn, "Categorical1", grl = TRUE)
plotDataCI(resLearn, "Categorical1", class = 1, grl = TRUE) # error
plotDataCI(resLearn, "Categorical1", class = 2, grl = TRUE) # error

Python package

Develop pyMixtComp to run the C++ executable from Python. Reproduce all R functions in Python.

  • Check boost python
  • MixtComp class following scikit API with
    • fit
    • predict
    • fit_predict
    • score
    • score_sample
    • predict_proba
    • aic
    • bic
    • icl
    • sample
  • multiple run in parallel
  • manage different data types:
    • dict
    • numpy array
    • pandas dataframe
  • plot functions
    • plotDataCI
    • plotDataBoxplot
    • similarities, discriminative power
    • proportion
    • convergence
  • basic mode (auto detection of model using dataframe type)
  • #17
  • Getter functions
  • data

heatmap* functions generate an error with pkg = "plotly"

require(RMixtCompIO) # for learning a mixture model
dataLearn <- list(var1 = as.character(c(rnorm(50, -2, 0.8), rnorm(50, 2, 0.8))),
                  var2 = as.character(c(rnorm(50, 2), rpois(50, 8))))

model <- list(var1 = list(type = "Gaussian", paramStr = ""),
              var2 = list(type = "Poisson", paramStr = ""))

algo <- list(
  nClass = 2,
  nInd = 100,
  nbBurnInIter = 100,
  nbIter = 100,
  nbGibbsBurnInIter = 100,
  nbGibbsIter = 100,
  nInitPerClass = 3,
  nSemTry = 20,
  confidenceLevel = 0.95,
  ratioStableCriterion = 0.95,
  nStableCriterion = 10,
  mode = "learn"
)

resLearn <- rmcMultiRun(algo, dataLearn, model, nRun = 3)

# plot
heatmapVar(resLearn, pkg = "plotly")
heatmapTikSorted(resLearn, pkg = "plotly")
heatmapClass(resLearn, pkg = "plotly")
Error in matchSignature(signature, fdef) : 
  more elements in the method signature (2) than in the generic signature (1) for function ‘asJSON’

RMixtComp does not work on windows 11

Running the following code on Windows 11 crashes the R session, or the session has to be stopped manually.
This does not happen on Linux.

The problem is the same with the examples from the documentation.
Those examples are tested by CRAN on Windows Server 2022 (see config), where they work. So the problem seems to be related to Windows 11.

Data: prostate.csv

data <- read.table("~/Téléchargements/prostate.csv", sep = ";", header = TRUE)
z = data[,1]
data = data[,2:13]
head(data)

library(RMixtComp)

model <- list(Age = "Gaussian", Wt = "Gaussian", PF = "Multinomial", 
              HX = "Multinomial", SBP = "Gaussian", DBP = "Gaussian", 
              EKG = "Multinomial", HG = "Gaussian", SZ = "Gaussian", 
              SG = "Gaussian", AP = "Gaussian", BM = "Multinomial")

algo <- list(nbBurnInIter = 50,
             nbIter = 100,
             nbGibbsBurnInIter = 50,
             nbGibbsIter = 10,
             nInitPerClass = floor(nrow(data)/2),
             nSemTry = 5,
             confidenceLevel = 0.95,
             ratioStableCriterion = 0.99,
             nStableCriterion = 10)

nClass <- 1
nRun <- 3
sink(file = "mc_output.txt")
res <- mixtCompLearn(data, model, algo, nClass = nClass, criterion = "ICL", nRun = nRun, nCore = 1)
sink(file = NULL)

alternatives on windows 11

Issue with new Eigen version

Eigen 3.4.0 was released earlier in the year, and I have been asked to update
RcppEigen to it. As documented in this issue [1] at its GitHub repo, there are
about nine packages that built at CRAN under the previous release, but not
with the Eigen 3.4.0 changes in the release candidate package -- which can be
installed via

install.packages("RcppEigen", repo="https://RcppCore.github.io/drat")

For RMixtCompIO, I could not work out a minimal change. Something in the
multinomial sampler setup makes Eigen 3.4.0 (via the prerelease of RcppEigen
in the repo above) unhappy.

It would be terrific if you and the MixtComp team could take a look at this
and possibly update the package to work with Eigen 3.4.0 which brings a few
new features other R users would like to deploy.

Let me know if you have any question, and please do not hesitate to ask.

Best regards, Dirk

[1] RcppCore/RcppEigen#103

basic_mode modifies the user's data set

from pyMixtComp import MixtComp
from pyMixtComp.data import load_iris

iris, _ = load_iris()
iris = iris.rename(columns={"species": "z_class"})
iris
sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)    species
              5.1               3.5                1.4               0.2     setosa
              4.9               3.0                1.4               0.2     setosa
              4.7               3.2                1.3               0.2     setosa
              4.6               3.1                1.5               0.2     setosa
              5.0               3.6                1.4               0.2     setosa
              ...               ...                ...               ...        ...
              6.7               3.0                5.2               2.3  virginica
              6.3               2.5                5.0               1.9  virginica
              6.5               3.0                5.2               2.0  virginica
              6.2               3.4                5.4               2.3  virginica
              5.9               3.0                5.1               1.8  virginica
mod = MixtComp(n_components=3, n_init=5)
mod.fit(iris)
iris
sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  z_class
              5.1               3.5                1.4               0.2        0
              4.9               3.0                1.4               0.2        0
              4.7               3.2                1.3               0.2        0
              4.6               3.1                1.5               0.2        0
              5.0               3.6                1.4               0.2        0
              ...               ...                ...               ...      ...
              6.7               3.0                5.2               2.3        2
              6.3               2.5                5.0               1.9        2
              6.5               3.0                5.2               2.0        2
              6.2               3.4                5.4               2.3        2
              5.9               3.0                5.1               1.8        2

RMixtCompIO tests fail on windows with R < 4.3

Dear package maintainer,

your package RMixtCompIO_4.0.9.tar.gz did not pass 'R CMD check' on
Windows with R versions < 4.3.0 and will be omitted from the
corresponding CRAN directory.

Please check the attached log-file and submit a version
with increased version number that passes R CMD check on Windows.

please also fix the gcc13 issues shown on
https://cran.r-project.org/web/checks/check_results_RMixtCompIO.html

All the best,
Uwe Ligges
(Maintainer of binary packages for Windows)

Test for NegativeBinomial failed
00check.log

CI fails on jenkins

The CI fails on Jenkins due to the changes made to the Python part of MixtComp.
Changing the pipeline requires access to Jenkins.

variable named z_class with the wrong type in basic mode

The variable name z_class is used for the LatentClass variable.
In basic mode (a data.frame is given without a model, so MixtComp infers the model from the data types), when a variable named z_class has the wrong type (numeric instead of integer), it is processed as a Gaussian variable. The real z_class variable (the partition) then cannot be accessed in the output, which causes bugs in other functions.

library(RMixtComp)

X <- data.frame(x = rnorm(100), y = c(rnorm(50), rnorm(50, 2)), z_class = rep(c(1., NA, 2., NA), each = 25))
sapply(X, class)

res <- mixtCompLearn(X, nClass = 2)

res$variable$type$z_class
# [1] "Gaussian" # instead of "LatentClass"
# res$variable$param$z_class has gaussian parameter

plot(res)
# error

Idea:

When inferring types, refuse to use a variable named z_class if it is not an integer, and send a warning to the user.
