See: UCI Machine Learning datasets for R
- D1 - bank-marketing
- D2 - census-income
- D3 - credit-card
- D4 - magic
- D5 - wine-quality
See: caret
- K1 - glm
- K2 - nnet
- K3 - svmLinear
- K4 - rpart
- K5 - nnet
- I1 - median & mode
- I2 - random forest (missForest)
- I3 - chained equations (mice)
- Ai - aggregation strategies from the article "Solving the problem of incomplete data in medical diagnosis via interval modeling"
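
To make imputation I1 (median & mode) concrete, here is a minimal Python sketch. It is an illustration only, not the code used in these experiments. It assumes that missing values are encoded as `None`, that numeric columns are filled with the median and all other columns with the mode, and that every column has at least one observed value. The function name `impute_median_mode` and the sample rows are hypothetical.

```python
from statistics import median, mode


def impute_median_mode(rows):
    """Fill None cells column by column: median for numeric columns, mode otherwise.

    Assumes every column contains at least one observed (non-None) value.
    """
    n_cols = len(rows[0])
    fills = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        # A column counts as numeric only if all observed values are int/float.
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in observed
        )
        fills.append(median(observed) if numeric else mode(observed))
    # Build a new table; the input rows are left unchanged.
    return [[fills[c] if v is None else v for c, v in enumerate(r)] for r in rows]


# Hypothetical sample: age, marital status, balance.
rows = [
    [25, "married", 1200.0],
    [None, "single", 800.0],
    [40, None, None],
    [31, "married", 1000.0],
]
imputed = impute_median_mode(rows)
```

The fill values are computed from the observed cells only, so a heavily obscured column is filled from whatever values remain.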
For each Di:
- Divide Di into Di^1 and Di^2
- Train classifiers Kj on Di^1 (10-fold cross-validation)
- Du^2 := Di^2 with values randomly obscured, at data-loss levels from 0% to 50%
- Calculate accuracy, sensitivity, specificity and decisiveness for classifiers Kj on Di^2
- Choose the best imputation on Di^2
- Find the aggregation operator:
  - Calculate interval predictions on Di^2
  - Choose the best aggregation operator on the interval predictions
- Compare Kj with the best imputation and aggregation operator
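
Two steps of the procedure above, random obscuring and the evaluation metrics, can be sketched in Python as follows. This is a sketch under stated assumptions, not the experiments' actual code. Cells are hidden uniformly at random and replaced with `None`. A prediction of `None` means the classifier made no decision. Decisiveness is taken to be the share of cases that received a decision, and the other three metrics are computed on decided cases only. The names `obscure` and `evaluate`, the 10% step between loss levels, and the sample data are hypothetical.

```python
import random


def obscure(rows, loss, seed=0):
    """Return a copy of rows with round(loss * n_cells) cells replaced by None."""
    rng = random.Random(seed)
    cells = [(i, j) for i, r in enumerate(rows) for j in range(len(r))]
    hidden = set(rng.sample(cells, round(loss * len(cells))))
    return [
        [None if (i, j) in hidden else v for j, v in enumerate(r)]
        for i, r in enumerate(rows)
    ]


def evaluate(y_true, y_pred, positive=1):
    """Compute the four metrics; None in y_pred means 'no decision' (assumption)."""
    decided = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    tp = sum(t == positive and p == positive for t, p in decided)
    tn = sum(t != positive and p != positive for t, p in decided)
    fp = sum(t != positive and p == positive for t, p in decided)
    fn = sum(t == positive and p != positive for t, p in decided)

    def ratio(a, b):
        # Undefined ratios (empty denominator) are reported as NaN.
        return a / b if b else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(decided)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "decisiveness": ratio(len(decided), len(y_true)),
    }


# Hypothetical test part Di^2: 4 rows with 5 features each.
data = [[1, 2, 3, 4, 5] for _ in range(4)]

# One obscured copy per data-loss level: 0%, 10%, ..., 50%.
obscured = {pct: obscure(data, pct / 100) for pct in range(0, 51, 10)}

# Hypothetical labels and predictions; the fourth case has no decision.
metrics = evaluate([1, 1, 0, 0, 1, 0], [1, 0, 0, None, 1, 1])
```

Fixing the seed keeps the obscured copies reproducible, so every classifier and imputation method is compared on the same missing-data pattern.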