This package is based on randomForest 4.6-14 from CRAN: https://cran.r-project.org/web/packages/randomForest/
I couldn't find a repository for the most current version of randomForest, so I took the source code from the link above.
An option has been added to use the sequential bootstrap (see: Advances in Financial Machine Learning, Marcos Lopez de Prado, 2018) instead of the ordinary bootstrap for random forests / bagging.
Currently, the sequential bootstrap relies on an index vector (SB) that is generated by R code (see below), which can be very time-consuming; a C implementation should generate SB much faster.
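For readers unfamiliar with the technique, here is a minimal sketch of the sequential bootstrap idea as described by Lopez de Prado. It is not the package's seqBoot implementation: the function name seq_boot_sketch, the argument n_draws, and the layout of the indicator matrix (rows are time bars, columns are observations) are assumptions made for illustration, and the matrix returned by getIndMat may be organized differently.

```r
# Hypothetical sketch of the sequential bootstrap (not the package's seqBoot).
# ind_mat: 0/1 matrix, assumed layout: rows = time bars, columns = observations.
seq_boot_sketch <- function(ind_mat, n_draws = ncol(ind_mat)) {
  # Assumption: every observation is active in at least one bar.
  stopifnot(all(colSums(ind_mat) > 0))
  drawn <- integer(0)
  # Per-bar concurrency contributed by the observations drawn so far.
  conc <- numeric(nrow(ind_mat))
  for (k in seq_len(n_draws)) {
    # Average uniqueness of each candidate if it were added to the draws.
    avg_u <- vapply(seq_len(ncol(ind_mat)), function(j) {
      active <- ind_mat[, j] > 0
      mean(1 / (conc[active] + 1))
    }, numeric(1))
    # Draw one observation with probability proportional to its uniqueness.
    pick <- sample.int(ncol(ind_mat), size = 1, prob = avg_u / sum(avg_u))
    drawn <- c(drawn, pick)
    conc <- conc + ind_mat[, pick]
  }
  drawn
}

# Toy example: three observations spanning bars 1-3, 3-4 and 5-6.
ind_mat <- cbind(c(1, 1, 1, 0, 0, 0),
                 c(0, 0, 1, 1, 0, 0),
                 c(0, 0, 0, 0, 1, 1))
set.seed(1)
seq_boot_sketch(ind_mat)
```

Each draw recomputes the uniqueness of every candidate, so the cost grows with the number of observations times the number of draws, which is consistent with the slow run times noted above.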
devtools::install_github("larryleihua/randomForestFML")
library(randomForestFML)
ntree <- 10 # keep this small for a quick experiment; generating SB takes about 13 sec per tree
seedvec <- seq(1,ntree)
data(trainSet)
indMat <- getIndMat(trainSet$tFea, trainSet$tLabel)
iRun <- function(i)
{
set.seed(seedvec[i])
return(seqBoot(indMat))
}
iRun <- Vectorize(iRun, "i")
# check how long one tree takes on your computer
system.time(iRun(1))
library(parallel)
cc <- makeCluster(detectCores()-1)
clusterExport(cc, c("indMat", "seedvec", "seqBoot"))
i_sB <- tryCatch(parLapply(cc, 1:ntree, iRun), error=function(e) NA, warning=function(w) NA)
clusterEvalQ(cc, {gc()})
stopCluster(cc)
# bind the per-tree index vectors column by column (one column per tree)
sBmat <- do.call(cbind, i_sB)
# flatten column by column into a single index vector
SB <- as.vector(sBmat)
bag <- randomForestFML(Y ~ C+V, data = trainSet, mtry = 2, importance = TRUE, ntree = ntree, SB=SB)
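The snippet above builds SB by stacking one column of sampled indices per tree and flattening the matrix column by column. The toy sketch below illustrates only that layout; the names draws, sb_mat, sb_vec and ntree_toy are made up for the example, and how randomForestFML reads SB internally is not shown here.

```r
# Toy illustration of the column-major layout of a flattened index matrix.
ntree_toy <- 3
draws <- list(c(2L, 1L, 2L, 4L),   # indices drawn for tree 1
              c(3L, 3L, 1L, 4L),   # indices drawn for tree 2
              c(1L, 4L, 4L, 2L))   # indices drawn for tree 3

sb_mat <- do.call(cbind, draws)    # 4 rows (draws) x 3 columns (trees)
sb_vec <- as.vector(sb_mat)        # flattened column by column

# The draws for tree 2 occupy the second block of nrow(sb_mat) entries.
n <- nrow(sb_mat)
tree2 <- sb_vec[((2 - 1) * n + 1):(2 * n)]

# Reshaping the flat vector recovers the one-column-per-tree matrix.
recovered <- matrix(sb_vec, ncol = ntree_toy)
```

A quick check such as length(SB) %% ntree == 0 can catch a failed or incomplete parallel run before the vector is passed on.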
To do: implement a function that generates SB for the C functions (rf.c and regrf.c in src/).