missCforest

missCforest is an ensemble conditional trees algorithm for missing data imputation. It performs single imputation based on the Cforest algorithm, an ensemble of conditional inference trees.

missCforest aims to produce a complete dataset through an iterative prediction approach: it learns from the complete cases and then predicts the missing values.
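
The "learn from the complete cases, then predict the missing values" idea can be sketched in base R for a single variable. This is only an illustration and not the package's implementation: lm() stands in for the conditional forest, only one pass over one column is shown, and impute_from_complete_cases is a hypothetical helper name.

```r
# Hypothetical helper, not part of missCforest.
# lm() is a stand-in learner; missCforest itself relies on Cforest.
impute_from_complete_cases <- function(data, target, predictors) {
  miss <- is.na(data[[target]])
  # A row can be used only if all of its predictors are observed
  pred_ok <- stats::complete.cases(data[, predictors, drop = FALSE])

  # Learn from rows where the target is observed
  train <- data[!miss & pred_ok, , drop = FALSE]
  fit <- stats::lm(stats::reformulate(predictors, response = target),
                   data = train)

  # Predict the target where it is missing and leave observed values alone
  fill <- miss & pred_ok
  data[[target]][fill] <- stats::predict(fit,
                                         newdata = data[fill, , drop = FALSE])
  data
}

data("airquality")
imp <- impute_from_complete_cases(airquality, "Ozone", c("Wind", "Temp"))
sum(is.na(airquality$Ozone))  # missing values before
sum(is.na(imp$Ozone))         # missing values after
```

An iterative scheme would repeat this kind of step across every variable that has missing values.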

Installing

You can install the development version of missCforest as follows:

#install.packages("devtools")
devtools::install_github("ielbadisy/missCforest")

Examples

library(missCforest)
#> Loading required package: partykit
#> Loading required package: grid
#> Loading required package: libcoin
#> Loading required package: mvtnorm

# import the GBSG2 dataset
library(TH.data)
#> Loading required package: survival
#> Loading required package: MASS
#> 
#> Attaching package: 'TH.data'
#> The following object is masked from 'package:MASS':
#> 
#>     geyser
data("GBSG2")

# consider the cens variable as a factor
GBSG2$cens <- as.factor(GBSG2$cens)

# randomly introduce 20% of NA values into the variables
datNA <- missForest::prodNA(GBSG2, 0.2)
head(datNA)
#>   horTh age menostat tsize tgrade pnodes progrec estrec time cens
#> 1    no  70     Post    21     II      3      NA     NA 1814 <NA>
#> 2   yes  56     Post    12     II      7      NA     77 2018 <NA>
#> 3   yes  58     <NA>    35     II     NA      52    271  712 <NA>
#> 4   yes  NA     Post    17   <NA>      4      60     NA   NA    1
#> 5  <NA>  NA     <NA>    NA     II     NA      26     65  772    1
#> 6    no  32      Pre    57    III     24       0     13  448 <NA>

You can impute the missing values in every variable, using all the other variables as predictors, by passing .~. as the imputation model formula:

impdat <- missCforest(datNA, .~., 
                      ntree = 300L,
                      minsplit = 20L,
                      minbucket = 7L,
                      alpha = 0.05,
                      cores = 4)  
head(impdat)
#>   horTh      age menostat    tsize tgrade    pnodes  progrec    estrec     time
#> 1    no 70.00000     Post 21.00000     II  3.000000 107.2613 188.43849 1814.000
#> 2   yes 56.00000     Post 12.00000     II  7.000000 115.5780  77.00000 2018.000
#> 3   yes 58.00000     Post 35.00000     II  5.829284  52.0000 271.00000  712.000
#> 4   yes 57.86864     Post 17.00000     II  4.000000  60.0000  77.58405 1051.515
#> 5    no 51.75196     Post 30.54075     II  5.685274  26.0000  65.00000  772.000
#> 6    no 32.00000      Pre 57.00000    III 24.000000   0.0000  13.00000  448.000
#>   cens
#> 1    0
#> 2    0
#> 3    1
#> 4    1
#> 5    1
#> 6    1
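
After imputation it can be useful to confirm that no NA remains and that the observed values were left untouched. The following is a minimal base R sketch. check_imputation is a hypothetical helper, not a function exported by missCforest, and the two small data frames are made up for the demonstration.

```r
# Hypothetical helper, not part of missCforest.
check_imputation <- function(original, imputed) {
  stopifnot(identical(dim(original), dim(imputed)))
  # Compare column by column, only at positions observed in the original
  unchanged <- mapply(function(o, i) {
    keep <- !is.na(o)
    all(as.character(o[keep]) == as.character(i[keep]))
  }, original, imputed)
  list(
    remaining_na = sum(is.na(imputed)),
    observed_unchanged = all(unchanged)
  )
}

# Toy data standing in for datNA and impdat
original <- data.frame(x = c(1, NA, 3), g = factor(c("a", NA, "b")))
imputed  <- data.frame(x = c(1, 2, 3),  g = factor(c("a", "a", "b")))
res <- check_imputation(original, imputed)
str(res)
```

With the objects from the example above, the call would be check_imputation(datNA, impdat).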

Citing

To cite missCforest in publications please use:

El Badisy I (2023). missCforest: Ensemble Conditional Trees for Missing Data Imputation. R package version 0.0.8, https://CRAN.R-project.org/package=missCforest.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {missCforest: Ensemble Conditional Trees for Missing Data Imputation},
    author = {Imad {El Badisy}},
    year = {2023},
    note = {R package version 0.0.8},
    url = {https://CRAN.R-project.org/package=missCforest},
  }

Contributing

  • If you encounter any bugs or have an idea for contribution, please submit an issue.

  • Please include a reprex for reproducibility.

misscforest's Issues

Warnings

I am getting warnings using this example. What do these warnings mean?

data("airquality")
airquality_subset <- airquality[ , 1:4]
ximp <- missCforest::missCforest(airquality_subset, .~.)

There were 50 or more warnings (use warnings() to see the first 50)

warnings()

Warning messages:
1: In retid[indx] <- fitted_node(kids_node(node)[[i]], data, ... :
number of items to replace is not a multiple of replacement length
[warnings 2 to 50 are identical to warning 1]
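
As background on the message itself: "number of items to replace is not a multiple of replacement length" is a base R warning raised when a subset assignment receives a replacement whose length does not divide the number of positions being replaced. The sketch below reproduces it in isolation. It does not explain why the retid[indx] assignment in the traceback triggers it for this dataset, which would need investigation in the package code.

```r
captured <- character(0)
x <- 1:6

# Assign 3 values into 4 positions: 4 is not a multiple of 3,
# so R recycles the replacement values and emits the warning.
withCallingHandlers(
  {
    x[1:4] <- 1:3
  },
  warning = function(w) {
    captured <<- c(captured, conditionMessage(w))
    invokeRestart("muffleWarning")  # record the warning, then continue
  }
)

print(captured)
print(x)
```

The assignment still completes, so a warning of this kind can mean that some values were filled in by recycling rather than by the intended one-to-one replacement.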
