kapelner / bartmachine
An R-Java Bayesian Additive Regression Trees implementation
License: MIT License
Hi, thank you for developing such a fantastic package; I find bartMachine very convenient to use. I have a question about y_hat_train in the bartMachine output object and the calc_credible_intervals function: both can give negative predicted probabilities when I use a binary outcome as the response variable and run bartMachine as a regression. Given that the default model for binary regression in bartMachine is probit, shouldn't the predicted probabilities be bounded between 0 and 1?
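A minimal sketch (not from the thread) of the distinction likely at play here: as far as I know, bartMachine only fits probit classification, with probabilities in [0, 1], when y is passed as a two-level factor; a numeric 0/1 y is treated as a continuous response, so negative fitted values are expected.

```r
# Sketch: how the response type determines bartMachine's model family.
# A numeric 0/1 response is treated as continuous (Gaussian regression),
# so y_hat_train can fall outside [0, 1]; a two-level factor triggers
# probit classification, where p_hat_train holds proper probabilities.
y_num <- c(0, 1, 1, 0, 1)
y_fac <- factor(y_num, levels = c(0, 1))
is.numeric(y_num)   # TRUE -> regression branch
is.factor(y_fac)    # TRUE -> classification branch
# bm <- bartMachine(X, y_fac)   # bm$p_hat_train would then lie in [0, 1]
```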
Hello, I don't know if I should raise an issue, since this is more of a theoretical problem.
I am using BART for causal prediction with a dichotomous outcome, comparing observation-level predictions after setting a specific predictor to various counterfactual values. I noticed that the distribution of the average prediction (across all observed individuals within a posterior draw) is sometimes multimodal.
I guess this may be due to the natural discreteness of a tree-based distribution, where a different choice near the root of a tree can cause a totally different structure in the rest of it.
How should I interpret these multimodal posteriors? That there are many possible effects and the model cannot decide between them? That there is a possible interaction with other variables? Or could it be a by-product of forcing a predictor value on an individual for whom that value is unlikely given the other covariates?
Use Java's implementation of the Mersenne Twister (see http://cs.gmu.edu/~sean/research/)
Hello,
I guess there is a typo in the checkmate package's name:
ERROR: dependency ‘chekmate’ is not available for package ‘bartMachine’
Hello!
I am trying to use bartMachine as part of a larger package, but I'm struggling against Java memory management.
When bartMachine is used inside a package, I never run library(bartMachine), and even if the java.parameters option is set, the information is never passed to the underlying Java machine.
Trying to emulate the loading of the package I run:
libname <- list.files(R.home(), 'library', full.names = T)
rJava::.jpackage('bartMachine', lib.loc = libname)
rJava::.jpackage('bartMachineJARs', lib.loc = libname)
after setting the memory option. Nevertheless, running rJava::.jcall(rJava::.jnew("java/lang/Runtime"), "J", "maxMemory") / 1e9
afterwards still shows the default amount of memory.
To give more info, this is my workflow:
1. Check whether java.parameters is set; if not, ask the user to choose an amount of memory to use.
2. Run the .jpackage() code as above, but I'm not sure it's doing anything.
3. Call bart_machine_get_posterior().
4. Call get_var_props_over_chain(), which is where I get the memory problem:
<OutOfMemoryError/VirtualMachineError/Error/Throwable/Object/Exception/error/condition>
Error in `.jcall(bart_machine$java_bart_machine, "[D", "getAttributeProps",
type)`: java.lang.OutOfMemoryError: Java heap space
Is it possible to use bartMachine inside a package without loading and attaching it?
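A hedged sketch of one way around this, assuming the standard rJava behaviour that the heap size is fixed when the JVM starts: the java.parameters option has to be set before the first rJava call (e.g., in a wrapping package's .onLoad, before .jpackage()); once the JVM is up, changing the option has no effect. The .onLoad hook below is a hypothetical example, not bartMachine code.

```r
# Sketch: set the heap option before rJava ever initializes the JVM.
.onLoad <- function(libname, pkgname) {
  if (is.null(getOption("java.parameters"))) {
    options(java.parameters = "-Xmx4g")  # must happen before the JVM starts
  }
  # rJava::.jpackage("bartMachine", lib.loc = libname)  # JVM starts here
}
# Setting the option is just an R-level options() call; the JVM only
# reads it once, at initialization time:
options(java.parameters = "-Xmx4g")
getOption("java.parameters")
```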
I would like to plot some sample trees in the BART model. Is "printTreeIllustations" used for visualizing trees?
I uncommented ".jcall(java_bart_machine, "V", "printTreeIllustations")" in "build_bart_machine". However, when I ran it, I got an error: method printTreeIllustations with signature ()V not found.
How can I fix it? Or is there another function for visualizing sample trees?
Thanks!
Min
Hi,
This bartMachine object was loaded from an R image but was not serialized.
Please build bartMachine using the option "serialize = TRUE" next time.
I've seen this error message reported elsewhere, but the suggested solution has been to ensure the bartMachine version used is the same version the model was built with. For version 1.2.6, this does not remove the error message.
Thank you!
I ran a BART model with 11,000 samples and 20 features (half of them categorical variables). My Mac has 8 GB of RAM. At first, I set the memory to 5000 MB via set_bart_machine_memory(5000).
Then I can fit a model through the bartMachine function once. If I try to run another model, R returns an error like this:
Exception in thread "pool-10-thread-1" Exception in thread "pool-10-thread-3"
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-10-thread-2" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-10-thread-4" java.lang.OutOfMemoryError: Java heap space
Error in .jcall(bart_machine$java_bart_machine, "Z", "isDestroyed") :
java.lang.OutOfMemoryError: Java heap space
I think that having two bartMachine objects in memory may not be a good idea, so I just kill the first model with destroy_bart_machine(); then the second model runs fine.
The main problem is with bartMachineCV(). There are about 20 models to fit by default, and a memory error like the one above hits me when R is running the BART model with the second set of parameters (that is: bartMachine CV try: k: 2 nu, q: 3, 0.9 m: 200).
Does the bartMachineCV() function run all 20 or more models, keep all of them in memory, and then pick the one with the best RMSE performance? I think that will be a problem for computers with limited memory.
If bartMachineCV() could finish the first model, save the RMSE result, destroy the first BART object in memory, then run the second model, and so on until all the CV models are finished, you would have 20 RMSE values; pick the one with the best RMSE and then refit that best model. It would take a little more time, but it would save a lot of memory. Is that a good idea? Or is there some way to run bartMachineCV() on a computer with 8 GB of RAM?
Thanks.
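The sequential strategy suggested above can be sketched roughly as follows. The grid values, the rmse_train field, and the use of destroy_bart_machine() are assumptions drawn from this thread, not the actual bartMachineCV() internals; the fitting loop is left commented since it needs the package and data.

```r
# Sketch: fit the CV grid one model at a time, keeping only each RMSE,
# so that at most one Java-side model lives in memory at once.
grid <- expand.grid(k = c(2, 3, 5),
                    nu_q = c("3/0.9", "3/0.99", "10/0.75"),
                    num_trees = c(50, 200))
rmse <- rep(NA_real_, nrow(grid))
# for (i in seq_len(nrow(grid))) {
#   bm <- bartMachine(X, y, k = grid$k[i], num_trees = grid$num_trees[i])
#   rmse[i] <- bm$rmse_train          # ideally an out-of-sample estimate
#   destroy_bart_machine(bm)          # free the Java heap before the next fit
# }
best <- which.min(rmse)               # refit only this winning configuration
nrow(grid)                            # 18 candidate models in this toy grid
```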
Hi, I'm interested in using bartMachine to build a BART model to explore interactions with the interaction_investigator function. My dataset is ~200,000 rows and 49 variables. However, even on a subset of this dataset (~20,000 rows and 9 columns), running bartMachine(predictors, y, num_trees = 20) gives an error:
Error in .jarray(model_matrix_training_data, dispatch = TRUE) : java.lang.OutOfMemoryError: Java heap space
Is bartMachine able to deal with a dataset of this size or is it intended for small datasets? If it can deal with this, are there any tips/workarounds for avoiding this error?
Thanks for your help!
The package is reversing either the binary classifications or the class probabilities. Whether I'm using the package through mlr (installed from master today via devtools) or bartMachine itself from CRAN, here's what happens when I run the example from the vignette on R 3.2.3:
> data("Pima.te", package = "MASS")
> X <- data.frame(Pima.te[, -8])
> y <- Pima.te[, 8]
> bartMachine(X, y)
bartMachine initializing with 50 trees...
bartMachine vars checked...
bartMachine java init...
bartMachine factors created...
bartMachine before preprocess...
bartMachine after preprocess... 8 total features...
bartMachine sigsq estimated...
bartMachine training data finalized...
Now building bartMachine for classification ...
evaluating in sample data...
Iteration 100/1250
Iteration 200/1250
Iteration 300/1250
Iteration 400/1250
Iteration 500/1250
Iteration 600/1250
Iteration 700/1250
Iteration 800/1250
Iteration 900/1250
Iteration 1000/1250
Iteration 1100/1250
Iteration 1200/1250
done building BART in 2.11 sec
burning and aggregating chains from all threads... done
done
bartMachine v1.2.2 for classification
training data n = 332 and p = 7
built in 2.4 secs on 1 core, 50 trees, 250 burn-in and 1000 post. samples
confusion matrix:
predicted No predicted Yes model errors
actual No 13.000 210.000 0.942
actual Yes 71.000 38.000 0.651
use errors 0.845 0.847 0.846
Obviously not a great classification there...
I looked at #10, which seems like a similar problem, but that should be fixed in 1.2.2, right? Is this a new bug?
Greetings,
I have spent a lot of time researching how to fix the memory issue in bartMachine, but nothing that I have tried has worked.
Interestingly enough, it worked on the first try last night. Then I saved my script and logged off. Today I went back in, and nothing is working. I have started a new session with no packages loaded and run the "java.parameters" code BEFORE library(bartMachine) a few dozen times; nothing works. During some iterations it works, but upon repeating the exact same steps that made it work, it fails again.
I have tried every suggestion here https://stackoverflow.com/questions/34624002/r-error-java-lang-outofmemoryerror-java-heap-space and here #5 . Is there something that I am missing? My machine has 64 GB; I have been allocating up to 50 GB, but nothing seems to work.
Thanks,
Lou
Thank you for the excellent package.
I would like to work with the individual trees from a bartMachine object. Is this possible?
To clarify, let me explain my reason for wanting to do so. It is based on a result in Stefan Wager & Susan Athey (JASA, 2018; link: https://www.tandfonline.com/doi/full/10.1080/01621459.2017.1319839). They show that a conditional mean prediction from a random forest can be interpreted as a kernel-weighted average: kernel weights are assigned to each observation to generate a prediction at point X=x in the covariate space, and these weights equal the share of trees for which a given observation is placed into a leaf that would include X=x.
If I have the individual trees, then I can construct such kernel weights. I would just need the ability to input a covariate value X=x and know, for each tree, what observations fall into the leaf that would generate the prediction for X=x.
Doing so is useful for various reasons (not to get too deep into the weeds, but I am interested in using this for a kernel-based implementation of "trimming bounds" from a paper by David S. Lee (2008, REStat)).
you can save some cycles there...
First of all, thank you for creating this package, the documentation is very clear and easy to follow. I apologize in advance for the long post, but I'd like to make my problem as clear as possible.
I have a response variable, the nighttime light luminosity (called ntl in the data.frame I am using). As you can see in the image below, it has some bright spots (highlighted in red), which are areas with high brightness (outliers in the data.frame).
Also, here is the histogram of the response. It's clear that the distribution is right-skewed (and possibly bimodal).
Purpose of my analysis
My goal is to predict the ntl from a coarse spatial resolution to a higher one. This means that I need to maintain these bright spots (the outliers) in the predicted image. Because at a later stage I will downscale the residuals of the regression using area-to-point Kriging, I need them to be random (no spatial structure).
Analysis
Following the approach found in your paper, I created this code:
options(java.parameters = "-Xmx5g")
library("bartMachine")
set_bart_machine_num_cores(3)
# set working directory
wd <- "path/"
# Projected reference system (in order to convert the residuals into a raster image)
provoliko <- "EPSG:24313"
# original df
df <- read.csv(paste0(wd, 'block.data.csv'))
# extract the x and y columns (coordinates) from the df
crds <- df[, 1:2]
# here I keep only the necessary columns for my analysis
keep <- c("ntl", "pop", "agbh", "nir", "ebbi", "ndbi", "road", "pan", "nbai",
"tirs")
df <- df[keep]
x <- df[, 2:10]
y <- df[, 1]
bart_machine <- bartMachine(x, y)
bart_machine
The output of the default bartMachine model is:
> bart_machine
bartMachine v1.3.4.1 for regression
training data size: n = 5658 and p = 9
built in 11.7 secs on 1 core, 50 trees, 250 burn-in and 1000 post. samples
sigsq est for y beforehand: 76.35
avg sigsq estimate after burn-in: 40.8314
in-sample statistics:
L1 = 21551.53
L2 = 208688.47
rmse = 6.07
Pseudo-Rsq = 0.83
p-val for shapiro-wilk test of normality of residuals: 0
p-val for zero-mean noise: 0.98127
Using the function plot_convergence_diagnostics(bart_machine), the result is:
For a "good" model I would like to see a symmetric scatter of points around the horizontal line at zero, indicating random deviations of predictions from the observed values, but from the plot above I see that this isn't the case.
Moreover, a map of the residuals is shown below. In red, I highlighted the areas where I believe the model didn't fit well (you can compare these areas to the areas in the first image above).
As you can see, there is clearly a spatial structure (i.e., the residuals do not show a random pattern).
My question is: is there a way to tell BART to consider the outliers (i.e., the bright spots in the study area) as "more important" when modelling the NTL? What are your recommendations?
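As far as I know, bartMachine exposes no observation-weight argument, so one blunt workaround (an assumption on my part, not a package feature) is to oversample the bright pixels so the likelihood pays more attention to them. The ntl vector below is a toy stand-in for the real column.

```r
# Sketch: emulate up-weighting by duplicating rows above the 95th percentile.
ntl <- c(rep(5, 95), rep(60, 5))                  # toy stand-in for df$ntl
w   <- ifelse(ntl > quantile(ntl, 0.95), 3L, 1L)  # triple-count bright spots
idx <- rep(seq_along(ntl), times = w)
length(idx)   # 95 ordinary rows + 5 bright rows counted 3 times = 110
# bm <- bartMachine(x[idx, ], y[idx])             # fit on the expanded data
```

Duplication inflates the effective sample size, so in-sample uncertainty estimates would be optimistic; it only biases the fit toward reproducing the bright spots.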
Because the csv I'm using has several thousand rows, I can share it via a link, from here. Just so you know, running a model with default parameters takes less than 30 seconds on my laptop (8 GB of RAM, 4-core Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz).
Lastly, I tried a model computed by the bartMachineCV function (as well as variable selection), but the results are not better.
supposedly a 2x speedup
This is an enhancement request - not an Issue.
I was wondering if you were still planning on implementing multi-class models. The following quote is from Kapelner A, Bleich J. bartMachine: Machine Learning with Bayesian Additive Regression Trees. J. Stat. Soft. [Internet]. 2016 Apr. 4;70(4):1-40.
"2.3. BART for classification
BART can easily be modified to handle classification problems for categorical response variables. In Chipman et al. (2010), only binary outcomes were explored but recent work has extended BART to the multiclass problem (Kindo, Wang, and Pe 2013). Our implementation handles binary classification and we plan to implement multiclass outcomes in a future release."
(emphasis mine)
I currently have to switch to an alternative package, with substantially different formatting requirements, output, etc. It would be wonderful to have that capacity in bartMachine.
Hello,
Thanks for the amazingly great package! I'm loving it, and able to do lots of useful things.
I seem to have found a minor bug related to creating reproducible results.
When I run k_fold_cv several times with the same seed, I get different results. I assumed the seed = argument would work the same way it does in build_bart_machine.
Example below (ignore the poor classification results - those are unrelated to the issue)
oos_stats <- k_fold_cv(xx, ww, k_folds = 5, seed = 82002)
.predicting probabilities where "1" is considered the target level...
.predicting probabilities where "1" is considered the target level...
.predicting probabilities where "1" is considered the target level...
.predicting probabilities where "1" is considered the target level...
.predicting probabilities where "1" is considered the target level...
oos_stats$confusion_matrix
predicted 1 predicted 0 model errors
actual 1 6.000 129.000 0.956
actual 0 8.000 1915.000 0.004
use errors 0.571 0.063 0.067
Then run again....
oos_stats <- k_fold_cv(xx, ww, k_folds = 5, seed = 82002)
.predicting probabilities where "1" is considered the target level...
.predicting probabilities where "1" is considered the target level...
.predicting probabilities where "1" is considered the target level...
.predicting probabilities where "1" is considered the target level...
.predicting probabilities where "1" is considered the target level...
oos_stats$confusion_matrix
predicted 1 predicted 0 model errors
actual 1 4.0 131.000 0.970
actual 0 6.0 1917.000 0.003
use errors 0.6 0.064 0.067
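A hedged workaround, assuming the fold assignment in k_fold_cv draws from R's own RNG rather than from the Java seed: reset R's seed immediately before each call in addition to passing seed =. The base-R lines below just demonstrate the reset pattern; the commented call shows the assumed usage.

```r
# Sketch: resetting set.seed() right before each run makes any R-side
# randomness (such as a fold split via sample()) repeat exactly.
set.seed(82002)
folds_a <- sample(rep(1:5, length.out = 20))
set.seed(82002)
folds_b <- sample(rep(1:5, length.out = 20))
identical(folds_a, folds_b)   # TRUE
# set.seed(82002); oos_stats <- k_fold_cv(xx, ww, k_folds = 5, seed = 82002)
```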
Hello,
For interpretability reasons it would be helpful to access the individual trees in the model. Is there a way to do this?
Thank you very much!
Hi Adam,
Would it be possible to add an argument to the function bart_machine_get_posterior to choose the number of posterior samples? I see that at the moment the default is 1000.
Many thanks,
Claudio
I'm getting a segmentation fault while installing bartMachineJARs in R 3.4.0:
> install.packages("bartMachineJARs", repos="https://cran.r-project.org")
Installing package into ‘/home/aorth/R/x86_64-pc-linux-gnu-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.r-project.org/src/contrib/bartMachineJARs_1.0.tar.gz'
Content type 'application/x-gzip' length 3213066 bytes (3.1 MB)
==================================================
downloaded 3.1 MB
* installing *source* package ‘bartMachineJARs’ ...
** package ‘bartMachineJARs’ successfully unpacked and MD5 sums checked
** R
** inst
** preparing package for lazy loading
** help
No man pages found in package ‘bartMachineJARs’
*** installing help indices
** building package indices
** testing if installed package can be loaded
sh: line 1: 5502 Segmentation fault '/export/apps/R/3.4.0/lib64/R/bin/R' --no-save --slave 2>&1 < '/tmp/Rtmpdjw7VM/file157a539d6b34'
ERROR: loading failed
* removing ‘/home/aorth/R/x86_64-pc-linux-gnu-library/3.4/bartMachineJARs’
The downloaded source packages are in
‘/tmp/RtmpYLaVlB/downloaded_packages’
Warning message:
In install.packages("bartMachineJARs", repos = "https://cran.r-project.org") :
installation of package ‘bartMachineJARs’ had non-zero exit status
rJava and other dependencies are already installed. The environment is CentOS 6 with Java OpenJDK 1.7.0_121.
Note: I just succeeded in installing bartMachine in R 3.3.3, but I'm posting this issue to track the problem with R 3.4.0.
Is there a way to get pd_plot() to plot predictor variables on the original scale, i.e., similar to the 'x_quantile' parameter option in ICEbox, rather than calculating only at specified quantiles?
Additionally, can a data frame with the credible interval values be extracted from pd_plot, in order to plot with other plotting packages? The list returned by the pd_plot function doesn't contain CI values.
Thank you for the help!
Hello @kapelner,
I'm replicating the results from your paper, but I am getting different results for the interaction effects in section 4.11. I think it's because of this loop, specifically the j <= i part.
for (i in 1:bart_machine$p) {
  for (j in 1:bart_machine$p) {
    if (j <= i) {
      avg_counts[iter] = interaction_counts_avg[i, j]
      sd_counts[iter] = interaction_counts_sd[i, j]
      names(avg_counts)[iter] = paste(rownames(interaction_counts_avg)[i], "x", rownames(interaction_counts_avg)[j])
      iter = iter + 1
    }
  }
}
Could you have a look?
First of all, thank you for this implementation; it works really well in R and is easy to use as well as efficient.
There are a lot of implemented functions that allow the user to measure the performance of a BART model. Other metrics are normally implemented by the user (e.g., correlation or R2). What about explaining the model? I don't know if there is an implemented way to draw the final decision tree made by the BART machine.
For example, I am looking for a way to generate a DAG graph from the decisions made by the final BART tree. This way we could calculate the importance/contribution of each feature to the final model performance, possibly implementing SHAP values as well.
In short, is there a way to draw the BART tree as a DAG or any other way to measure feature contribution?
It looks like there's no way to set bartMachine's random seed from R?
(I don't mean to intrude or anything, so let me know if Github issues aren't the best way to communicate this sort of thing to you.)
Hi, the installation I am seeing hangs on
checking whether Java run-time works... yes
checking whether -Xrs is supported...
I can confirm that I have a JDK installed, and I have done the R CMD javareconf step. I installed rJava through apt-get install r-cran-rjava. Do you have any tips on what to look for while debugging?
Hello,
I was wondering if the heteroskedastic BART (hBART) features are/will be available in the main branch. It seems like the hBART branch is very outdated at this point, and I'm not sure if the features of that branch will work with the more modern features of the bartMachine package (such as visualization and variable selection).
Best,
Jacob
Hi Adam,
For the partial plot of binary responses, is it possible to change the y-axis from probits to probabilities? I find it more useful to display probabilities directly rather than probits when I show the partial plot to people.
Best,
Jason
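In the meantime, one manual route (a sketch, not a pd_plot option) is to push whatever probit-scale values you can extract through the standard normal CDF, since under the probit link a probit value z corresponds to probability Φ(z). The probits vector below is hypothetical, standing in for values read off a partial plot.

```r
# Sketch: map probit-scale partial effects to the probability scale.
probits <- c(-1.5, 0, 1.5)     # hypothetical values from a partial plot
probs   <- pnorm(probits)      # standard normal CDF
round(probs, 3)                # 0.067 0.500 0.933
```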
Hello,
I'm generating an ensemble model with bartMachineArr() to produce a more robust posterior predictive distribution. I need to save the model for later use.
When I restore the array, though, only the first model works with bart_machine_get_posterior(); for the others I get:
Error in check_serialization(bart_machine) :
This bartMachine object was loaded from an R image but was not serialized.
Please build bartMachine using the option "serialize = TRUE" next time.
I guess the serialize argument of bartMachine doesn't get passed to the other models, or some connection is lost.
Here's the dummy code to produce the model:
n_models <- 5
model <- bartMachine(X = X, y = y, serialize = TRUE, ...)
if (n_models > 1) {
model <- bartMachineArr(model, R = n_models)
} else {
model <- list(model)
}
readr::write_rds(model, 'model.rds', compress = 'gz')
And to produce averaged predictive posteriors:
pred_post <- bart_machine_get_posterior(model[[1]], new_data = data)$y_hat_posterior_samples
if (n_models > 1) {
for (i in 2:n_models) {
pred_post <- pred_post + bart_machine_get_posterior(model[[i]], new_data = data)$y_hat_posterior_samples
}
}
pred_post <- pred_post / n_models
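For what it's worth, the accumulation loop above can be written more compactly with Reduce(); the toy matrices below stand in for the y_hat_posterior_samples matrices from each model in the array.

```r
# Sketch: element-wise average of a list of equally sized matrices.
posts <- list(matrix(1, 2, 2), matrix(3, 2, 2))   # stand-ins for posteriors
avg   <- Reduce(`+`, posts) / length(posts)
all(avg == 2)   # TRUE
```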
My session info:
R version 4.0.5 (2021-03-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] it_IT.UTF-8/it_IT.UTF-8/it_IT.UTF-8/C/it_IT.UTF-8/it_IT.UTF-8
attached base packages:
[1] grid parallel stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] knitr_1.31 tidytrees_0.2.2 pROC_1.17.0.1 magrittr_2.0.1
[5] pbmcapply_1.5.0 pbapply_1.4-3 readr_1.4.0 glue_1.4.2
[9] bartMachine_1.2.6 missForest_1.4 itertools_0.1-3 iterators_1.0.13
[13] foreach_1.5.1 randomForest_4.6-14 bartMachineJARs_1.1 rJava_0.9-13
[17] bayestestR_0.8.2 partykit_1.2-11 mvtnorm_1.1-1 libcoin_1.0-7
[21] readxl_1.3.1 stringr_1.4.0 dplyr_1.0.6
loaded via a namespace (and not attached):
[1] pkgload_1.1.0 splines_4.0.5 Formula_1.2-4 assertthat_0.2.1
[5] pander_0.6.3 cellranger_1.1.0 yaml_2.2.1 remotes_2.2.0
[9] sessioninfo_1.1.1 pillar_1.6.0 backports_1.2.1 lattice_0.20-41
[13] digest_0.6.27 pryr_0.1.4 checkmate_2.0.0 htmltools_0.5.1.1
[17] Matrix_1.3-2 plyr_1.8.6 pkgconfig_2.0.3 devtools_2.3.2
[21] magick_2.6.0 purrr_0.3.4 processx_3.5.2 tibble_3.1.1
[25] generics_0.1.0 usethis_2.0.0 ellipsis_0.3.2 cachem_1.0.4
[29] withr_2.4.1 cli_2.5.0 survival_3.2-10 crayon_1.4.1
[33] memoise_2.0.0 evaluate_0.14 ps_1.6.0 fs_1.5.0
[37] fansi_0.4.2 pkgbuild_1.2.0 rapportools_1.0 tools_4.0.5
[41] prettyunits_1.1.1 hms_1.0.0 matrixStats_0.58.0 lifecycle_1.0.0
[45] callr_3.5.1 compiler_4.0.5 inum_1.0-2 tinytex_0.30
[49] rlang_0.4.11 base64enc_0.1-3 rmarkdown_2.7 testthat_3.0.1
[53] codetools_0.2-18 DBI_1.1.1 R6_2.5.0 lubridate_1.7.10
[57] fastmap_1.1.0 utf8_1.2.1 rprojroot_2.0.2 insight_0.12.0
[61] desc_1.3.0 stringi_1.6.1 Rcpp_1.0.6 vctrs_0.3.8
[65] rpart_4.1-15 tidyselect_1.1.1 xfun_0.22
I was wondering if it would be possible to add interaction constraints to bartMachine. These have recently been added to xgboost (link). Having interaction constraints would be very handy for fitting models where some variables are held out from the whole response surface, e.g., y ~ f(x1) + f(x2, x3, x4, x5, ...).
When I tried to run bartMachine on a dataset with 350,000 obs * 13 features, I got the following message:
building BART with mem-cache speedup...
Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap space
done building BART in 1.513 sec
burning and aggregating chains from all threads...
at gnu.trove.list.array.TDoubleArrayList.(TDoubleArrayList.java:91)
at gnu.trove.list.array.TDoubleArrayList.(TDoubleArrayList.java:79)
at bartMachine.bartMachineTreeNode.propagateDataByChangedRule(Unknown Source)
at bartMachine.bartMachine_g_mh.doMHGrowAndCalcLnR(Unknown Source)
at bartMachine.bartMachine_g_mh.metroHastingsPosteriorTreeSpaceIteration(Unknown Source)
done
at bartMachine.bartMachine_e_gibbs_base.SampleTree(Unknown Source)
at bartMachine.bartMachine_e_gibbs_base.DoOneGibbsSample(Unknown Source)
at bartMachine.bartMachine_e_gibbs_base.DoGibbsSampling(Unknown Source)
at bartMachine.bartMachine_e_gibbs_base.Build(Unknown Source)
at bartMachine.bartMachineRegressionMultThread$1.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:695)
evaluating in sample data...
Error in .jarray(model_matrix_training_data, dispatch = TRUE) :
java.lang.OutOfMemoryError: Java heap space
Any suggestions on how to solve this problem?
Hi there. First, thanks a lot for this great package :)
Second, I was wondering if there's a way to save/access the confidence bounds created by the pd_plot command? I would like to customise the output in ggplot before exporting, but pd_plot only saves quantiles and posterior means, not the confidence bounds shown in the graph.
Any suggestions would be very welcome. Many thanks in advance!
bartMachine is fantastic and has much nicer features than the other BART implementations available in R. The one thing it's missing is the capacity to handle multinomial outcomes. Is that on the roadmap?
The vector of posterior samples of the error variance returned by get_sigsqs contains many zeroes. Minimal example:
library(bartMachine)
n = 50
X = data.frame(cippa=rnorm(n), lippa=rnorm(n), pasqualino=rnorm(n))
y = rnorm(n)
bm = bartMachine(X, y)
ss = get_sigsqs(bm, plot_hist=T)
print(ss)
Output:
Loading required package: rJava
Loading required package: bartMachineJARs
Loading required package: randomForest
randomForest 4.7-1.1
Type rfNews() to see new features/changes/bug fixes.
Loading required package: missForest
Welcome to bartMachine v1.3.2! You have 0.54GB memory available.
If you run out of memory, restart R, and use e.g.
'options(java.parameters = "-Xmx5g")' for 5GB of RAM before you call
'library(bartMachine)'.
bartMachine initializing with 50 trees...
bartMachine vars checked...
bartMachine java init...
bartMachine factors created...
bartMachine before preprocess...
bartMachine after preprocess... 3 total features...
bartMachine sigsq estimated...
bartMachine training data finalized...
Now building bartMachine for regression...
evaluating in sample data...done
[1] 0.6738567 0.6823280 0.3963788 0.5420501 0.5526897 0.4949011 0.6561506
[8] 0.4309214 0.6297360 0.6574286 0.8740465 0.6049023 0.7190190 0.6186070
[15] 0.4147947 0.5096590 0.3854179 0.3236273 0.6343288 0.7104224 0.5492379
[22] 0.5403672 0.7041673 0.5576870 0.6404842 0.4569708 0.7032812 0.6057967
[29] 0.4068528 0.6529362 0.4497582 0.5405523 0.5146371 0.5606146 0.7231270
[36] 0.4251352 0.5017961 0.4039385 0.4185928 0.5516639 0.5349040 0.6444614
[43] 0.4168752 0.4755697 0.4472841 0.6164776 0.5950383 0.6032409 0.4814202
[50] 0.3817277 0.4124045 0.6495234 0.6515946 0.6564860 0.5243605 0.4625689
[57] 0.5676971 0.5400568 0.3962279 0.3540788 0.4758006 0.4983467 0.4925431
[64] 0.5626715 0.9368755 0.7203849 0.4467512 0.5900052 0.2964828 0.6399533
[71] 0.4715198 0.7480451 0.5043543 0.5348309 0.4169205 0.3890896 0.5225142
[78] 0.5893633 0.6248662 0.4411586 0.5316090 0.5906821 0.6600916 0.6358257
[85] 0.5430974 0.4191855 0.4234045 0.6401893 0.6160910 0.6149260 0.5239037
[92] 0.7346262 0.7927698 0.8559771 0.8564749 0.6399547 0.6617545 0.4862764
[99] 0.5218394 0.5361528 0.4645625 0.4862391 0.4249653 0.4966983 0.5722455
[106] 0.5756632 0.5889968 0.7468448 0.7112704 0.4752701 0.4422910 0.6224502
[113] 0.7478145 0.6917788 0.6593739 0.5041079 0.5702368 0.4908382 0.5388601
[120] 0.6565747 0.7446141 0.4281959 0.8973551 0.5082524 0.6022598 0.6682022
[127] 0.6210314 0.6441824 0.4827757 0.7639993 0.4104385 0.8480981 0.7278081
[134] 0.6674551 0.7050705 0.5499230 0.7574979 0.6489151 0.7373134 0.5471537
[141] 0.5827605 0.5526380 0.5107312 0.4410340 0.4361805 0.3881677 0.6540108
[148] 0.4434175 0.5201778 0.7684820 0.6036935 0.7783705 0.8112201 0.5085767
[155] 0.4166957 0.5891744 0.8272326 0.8059974 0.6039739 0.4926725 0.5685766
[162] 0.4819520 0.4345115 0.7241730 0.5001127 0.6093101 0.8074775 0.6211340
[169] 0.7598558 0.6495594 0.5982428 0.6298588 0.7029029 0.5206628 0.6280212
[176] 0.5671791 0.4642438 0.9423288 0.6641100 0.5236050 0.4615422 0.5714215
[183] 0.6319731 0.5353613 0.4966538 0.5876032 0.6829575 0.5461618 0.3516722
[190] 0.4463553 0.4113644 0.7175616 0.7268501 0.9897334 0.5659359 0.5467450
[197] 0.3853242 0.4799703 0.4543558 0.3864065 0.3867739 0.4059116 0.4904520
[204] 0.4990398 0.5829876 0.6681405 0.5245365 0.4816886 0.7247148 0.4489095
[211] 0.4673745 0.5346889 0.5267316 0.5896845 0.7151791 0.4212330 0.6294356
[218] 0.7690222 0.6634902 0.6094897 0.5036922 0.5318404 0.4286724 0.4636125
[225] 0.3526284 0.4528986 0.3979473 0.6440758 0.4455897 0.4236689 0.5220958
[232] 0.5161978 0.6882160 0.5583662 0.6369836 0.4804866 0.5249673 0.3036010
[239] 0.3186854 0.3680794 0.3473360 0.3445205 0.3993789 0.6134708 0.5605042
[246] 0.4203016 0.5262919 0.7055306 0.4151140 0.3950877 0.4945300 0.3403064
[253] 0.6336636 0.5760601 0.5937295 0.7816578 0.5743999 0.5130912 0.3211351
[260] 0.4333243 0.4532728 0.8836527 0.5542467 0.5189086 0.3853733 0.5863797
[267] 0.6456744 0.5490664 0.6830404 0.5591154 0.6707948 0.7266904 0.8191645
[274] 0.6519134 0.4418197 0.6754014 0.6498065 0.6172073 0.6850096 0.7220211
[281] 0.4933638 0.3720437 0.5880862 0.4299151 0.5928170 0.6562778 0.8242513
[288] 0.4299219 0.4942766 0.5546288 0.4497238 0.5144395 0.8645564 0.5358512
[295] 0.6815332 0.4826089 0.5361426 0.6610768 0.4361343 0.7495521 0.8360831
[302] 0.6435257 0.4058136 0.3989772 0.7589361 0.6532333 0.6084130 0.6920568
[309] 0.4990992 0.7679044 0.6186806 0.7343711 0.7986024 0.5366567 0.4798369
[316] 0.6301960 0.4961707 0.5888135 0.4025134 0.5232452 0.6341495 0.5325702
[323] 0.5920682 0.4763510 0.7678790 0.3672965 0.8628438 0.5431980 0.5667032
[330] 0.4591356 0.5990987 0.5662614 0.4698281 0.6696803 0.5874177 0.5867314
[337] 0.5061874 0.4649711 0.6824143 0.4890839 0.5424590 0.3392641 0.3994007
[344] 0.3851744 0.5515101 0.5159149 0.4030643 0.6579546 0.5439394 0.4285955
[351] 0.5869903 0.4381423 0.8446627 0.3845370 0.5609957 0.4207567 0.6653356
[358] 0.4841911 0.4964988 0.4404760 0.5421151 0.3389921 0.7477354 0.9128842
[365] 0.6247109 0.4614823 0.6427378 0.4976938 0.7045662 0.4602396 0.4469913
[372] 0.5559259 0.5829157 0.4815411 0.4331116 0.6426673 0.0000000 0.0000000
[379] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
(... all remaining entries, [386] through [665], are 0.0000000)
[666] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[673] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[680] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[687] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[694] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[701] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[708] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[715] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[722] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[729] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[736] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[743] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[750] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[757] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[764] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[771] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[778] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[785] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[792] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[799] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[806] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[813] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[820] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[827] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[834] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[841] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[848] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[855] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[862] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[869] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[876] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[883] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[890] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[897] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[904] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[911] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[918] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[925] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[932] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[939] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[946] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[953] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[960] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[967] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[974] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[981] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[988] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[995] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
building BART with mem-cache speedup...
Iteration 100/1250
Iteration 200/1250
Iteration 300/1250
Iteration 400/1250
Iteration 500/1250
Iteration 600/1250
Iteration 700/1250
Iteration 800/1250
Iteration 900/1250
Iteration 1000/1250
Iteration 1100/1250
Iteration 1200/1250
done building BART in 0.217 sec
burning and aggregating chains from all threads... done
Hello,
This is quite an interesting package, with some important and fast results. Is there an implementation in Python, or an efficient alternative in Python?
Thank you in advance :)
First of all, I would like to thank you for your continued support of this software.
I would like to use this package in MATLAB for a regression problem. Since MATLAB has an embedded Java runtime, I thought this would be easy to accomplish, since bartMachine is implemented in Java. However, this did not go as planned. For testing, I used a deterministic scalar function of two variables, which the bartMachine should approximate. The prediction does not approximate the desired function at all; instead, the predicted points on the domain seem to follow a parabolic surface. If you could give me some pointers on how to fix this issue, I would be highly grateful.
The following code was used as a test in Matlab R2022a:
% load bartMachine java dependencies
bartJARs = dir('path-to-bartMachine-folder/**/inst/java/*.jar');
for ii = 1:length(bartJARs)
javaaddpath(strcat(bartJARs(ii).folder, '\', bartJARs(ii).name));
end
rng(12341234);
[X, Y] = ndgrid(-10:0.5:10,-10:0.5:10);
f = @(x,y) x.^3 + x.*y + y.^3;
Z = f(X,Y);
mesh(X,Y,Z)
Feature = [X(:),Y(:)];
Label = Z(:);
Combined = [Feature,Label];
bart = bartMachine.bartMachineRegressionMultThread;
alist = java.util.ArrayList;
for ii = 1:size(Combined,1)
alist.add(Combined(ii,:));
end
bart.setData(alist);
bart.Build();
Label_pred = zeros(size(Label));
for ii = 1:size(Label_pred,1)
Label_pred(ii) = bart.Evaluate(Feature(ii,:));
end
figure;
scatter3(Feature(:,1), Feature(:,2), Label_pred, '.');
Hi there,
Is there any way to analyze survival outcomes using the package currently?
Thanks!
Have you thought about implementing thinning of the iterations? I'm thinking this would be a parameter that lets you specify that only every n iterations (e.g. n = 10) are saved after burn-in.
I've only played with your implementation a little bit, but in a toy example, 10k post-burn-in iterations got me an effective sample size of around 250. I'd be almost as well off (and my memory would be much better off) keeping every 10th iteration.
(P.S. Thanks for doing this. I'm really excited about bartMachine!)
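Until thinning is implemented in the package, it can be approximated after the fact by subsampling the posterior draws yourself. A minimal sketch: the matrix `post` stands in for the `y_hat_posterior_samples` component returned by `bart_machine_get_posterior` (an n_obs x n_iterations matrix); it is faked here with random draws so the example is self-contained.

```r
# Post-hoc thinning sketch: bartMachine has no thinning argument, but the
# posterior draws can be subsampled after the fact.
# 'post' stands in for bart_machine_get_posterior(...)$y_hat_posterior_samples;
# faked here so the example runs without Java.
set.seed(1)
n_obs  <- 5
n_iter <- 10000
post <- matrix(rnorm(n_obs * n_iter), nrow = n_obs)

thin <- 10
keep <- seq(from = thin, to = ncol(post), by = thin)  # keep every 10th draw
post_thinned <- post[, keep, drop = FALSE]

dim(post_thinned)  # 5 1000
```

With an effective sample size of around 250 out of 10k draws, the thinned matrix carries nearly the same information at a tenth of the memory.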
I've been using bartMachine for a while now, but I recently tried to install it in a fresh installation of R, and in the R GUI, when I try to load the bartMachine package (or even the bartMachineJARs package), I get an immediate fatal error with no output. I'm using macOS 10.12.6. Here is my sessionInfo() output:
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0 yaml_2.1.19
The last error in my log file is below, but I don't think it relates to this because the timing is off (it's currently 13:41 and the last log file entry is labeled today at 17:25):
09 Jul 2018 17:25:55 [rsession-david] ERROR Unexpected exception: boost: mutex lock failed in pthread_mutex_lock: Invalid argument; LOGGED FROM: void rstudio::session::ClientEventService::run() /Users/vagrant/workspace/IDE/macos/src/cpp/session/SessionClientEventService.cpp:351
Now, the really strange thing is that when I open a terminal and start R there, it loads bartMachine just fine.
MacBook-Pro:~ david$ R
R version 3.5.0 (2018-04-23) -- "Joy in Playing"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin15.6.0 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
[Previously saved workspace restored]
> library(bartMachine
+ )
Loading required package: rJava
Loading required package: bartMachineJARs
Loading required package: car
Loading required package: carData
Loading required package: randomForest
randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.
Loading required package: missForest
Loading required package: foreach
Loading required package: itertools
Loading required package: iterators
Welcome to bartMachine v1.2.3! You have 0.54GB memory available.
If you run out of memory, restart R, and use e.g.
'options(java.parameters = "-Xmx5g")' for 5GB of RAM before you call
'library(bartMachine)'.
I tried installing the previous versions of both bartMachine and bartMachineJARs, to no avail. In the interest of completeness, here's what happens when I run javareconf (which I had previously run as root).
MacBook-Pro:~ david$ R CMD javareconf
Java interpreter : /usr/bin/java
Java version : 10.0.1
Java home path : /Library/Java/JavaVirtualMachines/jdk-10.0.1.jdk/Contents/Home
Java compiler : /usr/bin/javac
Java headers gen.: /usr/bin/javah
Java archive tool: /usr/bin/jar
trying to compile and link a JNI program
detected JNI cpp flags : -I$(JAVA_HOME)/include -I$(JAVA_HOME)/include/darwin
detected JNI linker flags : -L$(JAVA_HOME)/lib/server -ljvm
clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I/Library/Java/JavaVirtualMachines/jdk-10.0.1.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/jdk-10.0.1.jdk/Contents/Home/include/darwin -I/usr/local/include -fPIC -Wall -g -O2 -c conftest.c -o conftest.o
clang -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o conftest.so conftest.o -L/Library/Java/JavaVirtualMachines/jdk-10.0.1.jdk/Contents/Home/lib/server -ljvm -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
JAVA_HOME : /Library/Java/JavaVirtualMachines/jdk-10.0.1.jdk/Contents/Home
Java library path: $(JAVA_HOME)/lib/server
JNI cpp flags : -I$(JAVA_HOME)/include -I$(JAVA_HOME)/include/darwin
JNI linker flags : -L$(JAVA_HOME)/lib/server -ljvm
Updating Java configuration in /Library/Frameworks/R.framework/Resources
Done.
This isn't a deal breaker, but it would be nice to have it working in the GUI or RStudio.
When I predict probabilities, I'm getting probabilities for the opposite class of what I'm expecting. Example:
library(bartMachine)
data(Sonar, package = "mlbench")
model = bartMachine(Sonar[-61], Sonar$Class)
classes = predict(model, new_data = Sonar[-61], type = "class")
probs = predict(model, new_data = Sonar[-61], type = "prob")
levels(Sonar$Class)
I'm getting something like
[1] R R R R R R R M R R R R ...
Levels: M R
for classes
and for probs
[1] 0.61762749 0.63869063 0.51221708 ...
So the probabilities are for the "R" class, which is the second class in the level set. I would expect probabilities for the first class.
Changing the level set before giving the data to bartMachine doesn't seem to make a difference.
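With only two classes, the probability of the first level is simply one minus the probability of the second, so a workaround until the level ordering is clarified is to take the complement. A sketch using the values printed above (which appear to be probabilities for class "R"):

```r
# With a binary outcome, P(first level) = 1 - P(second level).
# 'probs' copies the values printed above, reported as P(class "R").
probs   <- c(0.61762749, 0.63869063, 0.51221708)  # P("R"), the second level
probs_M <- 1 - probs                              # complement gives P("M")
probs_M  # 0.3823725 0.3613094 0.4877829
```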
Hi, I ran into a memory issue when running bartMachine in parallel with the function foreach. Sample code looks like this:
options(java.parameters = '-Xmx5g')
library(bartMachine)
bart = bartMachine(X, Y)
result = foreach(i = 1:n, .combine = c, .packages = c('bartMachine')) %dopar% {
newX = …
predict(bart, newX)
}
The execution hangs after entering the foreach loop, with an error message saying 'java.lang.OutOfMemoryError: Java heap space', no matter how much memory I set at the beginning.
No error is returned when I remove predict(bart, newX) from the foreach loop, and no error is returned when I change the foreach loop to a regular for loop. I feel like there is some conflict between bartMachine and foreach? Can you help me with this? Thank you!
Hello,
I was trying to reproduce the example in the package vignette, but when I use the function plot_y_vs_yhat with the argument prediction_intervals = TRUE, I receive the following error:
Error in credible_intervals[, 1] : incorrect number of dimensions
I tried also with another set of data and the same happens.
I have been attempting to create a PS model using bartMachine. My code is as follows (note the argument name is num_trees, not numtrees):
new_ps_model <- bartMachine(X = data %>%
                              dplyr::select(colnames(bart_ps_model$X)),
                            y = data[, "drugclass"] %>%
                              unlist(),
                            num_trees = 50,
                            use_missing_data = TRUE,
                            num_burn_in = 10,
                            num_iterations_after_burn_in = 10,
                            serialize = FALSE)
When this function runs, I get the following error:
Error in if (xi > xj) 1L else -1L : missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In xtfrm.data.frame(x) : cannot xtfrm data frames
2: In Ops.factor(xi, xj) : '>' not meaningful for factors
Not sure exactly where to go with this - I went through my data and while there are some missing values in the columns in X, there are no missing values in y. Would appreciate any recommendations for next steps.
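One plausible cause (an assumption, not confirmed from the report): with tibble-style subsetting, data[, "drugclass"] remains a one-column data frame rather than a vector, and attempting to sort a data frame produces exactly the "cannot xtfrm data frames" and "'>' not meaningful for factors" messages shown. A base-R sketch of the distinction:

```r
# Likely cause of the xtfrm / factor-comparison errors: y passed as a
# one-column data frame instead of a plain factor vector.
data <- data.frame(drugclass = factor(c("a", "b", "a", "b")))

y_bad  <- data[, "drugclass", drop = FALSE]  # still a data frame
y_good <- data[["drugclass"]]                # a plain factor vector

class(y_bad)   # "data.frame"
class(y_good)  # "factor"
```

Passing data[["drugclass"]] (or dplyr::pull(data, drugclass)) ensures bartMachine receives a plain factor for classification.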
I am running a dichotomous classification problem using bartMachine.
When I check the posterior (example below) using bart_machine_get_posterior I see some negative values.
According to the help, the units are probabilities (e.g., not probits).
"y_hat_posterior_samples The full set of posterior samples of size num_iterations_after_burn_in for each observation. For regression, the estimates have the same units as the response. For classification, the estimates are probabilities."
Thus, negative values should be impossible. Any thoughts regarding what's happening here? It's not an isolated case.
Running the code directly from the vignette, I get the following error when attempting to fit the model with missing covariates.
Error in .jcall(java_bart_machine, "V", "addTrainingDataRow", as.character(model_matrix_training_data[i, :
java.lang.NullPointerException
Specifically, I ran
library("bartMachine")
options(java.parameters="-Xmx1000m")
set_bart_machine_num_cores(4)
y <- automobile$log_price
X <- automobile; X$log_price <- NULL
bart_machine <- bartMachine(X=X, y=y, use_missing_data = TRUE,
use_missing_data_dummies_as_covars = TRUE)
Particularly confusing, because I ran this code on old versions of the package as well (and got the same error), so I'm unsure whether this is a problem with the package or a problem with my install. For reference, this issue also appears here.
I use bartMachine together with caret. As recommended, I always set the option serialize to TRUE in train. However, I noticed that bartMachine v1.2.5.1 is not able to use a model created by bartMachine v1.2.5.0. I receive the following error message:
Error in check_serialization(object) :
This bartMachine object was loaded from an R image but was not serialized.
Please build bartMachine using the option "serialize = TRUE" next time.
Edit: this is reproducible with the example from ?bartMachine.
library(bartMachine) ### Here version 1.2.5.0 is mandatory
## Generate Friedman data
set.seed(11)
n = 200
p = 5
X = data.frame(matrix(runif(n * p), ncol = p))
y = 10 * sin(pi * X[, 1] * X[, 2]) + 20 * (X[, 3] - .5)^2 + 10 * X[, 4] + 5 * X[, 5] + rnorm(n)
## Build BART regression model
bart_machine = bartMachine(X, y, serialize = TRUE)
summary(bart_machine)
bartMachine v1.2.3 for regression
training data n = 200 and p = 5
built in 0.8 secs on 1 core, 50 trees, 250 burn-in and 1000 post. samples
sigsq est for y beforehand: 7.281
avg sigsq estimate after burn-in: 0.76294
in-sample statistics:
L1 = 104.28
L2 = 92.58
rmse = 0.68
Pseudo-Rsq = 0.9801
p-val for shapiro-wilk test of normality of residuals: 0.0258
p-val for zero-mean noise: 0.41082
## Save the bartMachine object
saveRDS(bart_machine, "~/bartMachine_version1.2.5.0.rda")
Now we upgrade bartMachine to version 1.2.5.1:
bart_machine <- readRDS("~/bartMachine_version1.2.5.0.rda")
summary(bart_machine)
bartMachine v1.2.5.1 for regression
training data n = 200 and p = 5
built in 0.8 secs on 1 core, 50 trees, 250 burn-in and 1000 post. samples
Error in .jcall(bart_machine$java_bart_machine, "[D", "getGibbsSamplesSigsqs") :
RcallMethod: attempt to call a method of a NULL object.
y_hat = predict(bart_machine, X)
Error in check_serialization(object) :
This bartMachine object was loaded from an R image but was not serialized.
Please build bartMachine using the option "serialize = TRUE" next time.
Hi,
This is probably a very naive question, but I have multiple datasets from different sites that might have different properties. My response variable and my explanatory variables are the same across all datasets. I wonder how to take the site information into account.
If I fit each site separately, I might lose some sites, as they have less data than others.
Thanks!
Nico
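One common approach (a sketch, not a bartMachine-specific recommendation): pool the datasets and include the site as a factor covariate, which the trees can then split on; bartMachine dummy-codes factor columns automatically. The toy frames d1 and d2 below stand in for two sites' data:

```r
# Sketch: pool per-site datasets and add 'site' as a factor predictor.
# 'd1' and 'd2' are toy stand-ins for two sites' data.
d1 <- data.frame(x = 1:3, y = c(2, 4, 6))
d2 <- data.frame(x = 4:5, y = c(9, 11))
pooled <- rbind(cbind(d1, site = "A"), cbind(d2, site = "B"))
pooled$site <- factor(pooled$site)

X <- pooled[, c("x", "site")]  # would be passed to bartMachine(X, y)
y <- pooled$y
table(X$site)
```

Sites with fewer observations then borrow strength from the pooled fit instead of being dropped.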
Sorry - but I might not be using the terminology correctly. Maybe this is an "R" issue (I'm a user - but not a programming expert). However, this behavior differs from any other function I have used in R (base or packages).
The issue is that when I call bartMachine(X, y) from inside of a function, it pulls X and y from the Global environment - not the values of X and y that I passed it.
Here is a reproducible example:
Basically, the first time through I make sure x and y are not present in the global environment. I then check that x and y are present inside my function (i.e., I didn't screw up passing them). But, when bartMachine(x, y) is called, it says it can't find x.
Then I define x and y in the global environment, make the same call, and voila.
library(tidyverse)
library(bartMachine)
data(mtcars)
vlist <- c("cyl", "disp")
testmod <- function(d, t, v){
x <- d[v]
y <- d[[t]]
cat("Is x here? Yes - here is x[1,1] ", x[1,1])
cat("\n")
cat("Is y here? Yes, here is y[1] ", y[1])
cat("\n")
mod <- bartMachine(X = x, y = y)
return(mod)
}
rm(x, y)
#> Warning in rm(x, y): object 'x' not found
#> Warning in rm(x, y): object 'y' not found
modwt <- testmod(mtcars, "wt", vlist)
#> Is x here? Yes - here is x[1,1] 6
#> Is y here? Yes, here is y[1] 2.62
#> bartMachine initializing with 50 trees...
#> Error in (function (X = NULL, y = NULL, Xy = NULL, num_trees = 50, num_burn_in = 250, : object 'x' not found
x <- mtcars[c("cyl", "disp")]
y <- mtcars[["wt"]]
modwt <- testmod(mtcars, "wt", vlist)
#> Is x here? Yes - here is x[1,1] 6
#> Is y here? Yes, here is y[1] 2.62
#> bartMachine initializing with 50 trees...
#> bartMachine vars checked...
#> bartMachine java init...
#> bartMachine factors created...
#> bartMachine before preprocess...
#> bartMachine after preprocess... 2 total features...
#> bartMachine sigsq estimated...
#> bartMachine training data finalized...
#> Now building bartMachine for regression...
#> evaluating in sample data...done
Created on 2023-02-27 with reprex v2.0.2
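Until the evaluation scoping is fixed in the package, one hedged workaround is to place the data in the environment where the function looks for it before calling, restoring afterwards. In the sketch below, nse_model is a toy stand-in that mimics the reported behavior by looking x and y up in globalenv(), so the example runs without Java:

```r
# 'nse_model' mimics the reported bug: it ignores its arguments and
# fetches 'x' and 'y' from the global environment.
nse_model <- function(X, y) {
  x <- get("x", envir = globalenv())
  y <- get("y", envir = globalenv())
  list(coef = mean(y), n = length(x))
}

# Workaround: temporarily export the data to globalenv(), call, restore.
fit_in_global <- function(fun, x, y) {
  old <- mget(c("x", "y"), envir = globalenv(),
              ifnotfound = list(NULL, NULL))
  assign("x", x, envir = globalenv())
  assign("y", y, envir = globalenv())
  on.exit({  # restore whatever was there before
    assign("x", old$x, envir = globalenv())
    assign("y", old$y, envir = globalenv())
  })
  fun(x, y)
}

mod <- fit_in_global(nse_model, x = 1:4, y = c(2, 4, 6, 8))
mod$coef  # 5
```

The same wrapper pattern should work with bartMachine in place of nse_model, at the cost of briefly polluting the global environment.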
Hello,
First of all, many compliments for bartMachine, a really nice implementation of a wonderful algorithm.
I am using BART for potential outcomes causal inference, which is based on comparing Yhat at the observation level, predicted after assigning specific values to variable X for all observations while keeping the other covariates Z fixed at their original value. (https://nyuscholars.nyu.edu/en/publications/bayesian-nonparametric-modeling-for-causal-inference)
The problem is that my dataset is very large [27358 x 224] and predictions made with bart_machine_get_posterior simply take forever. Since I'll need to do this for 224 variables with multiple evaluated values each the analysis would take days.
Reading through the issues, I saw that the Array version of BART would fix memory issues, but would it also fix speed? Or is there any setting in bartMachine that would make predictions faster? Considering that my problem is prediction time rather than estimation time (fitting the model takes ~30 mins), is there a way to trade off toward faster prediction?
The only alternative solution I could think of is to build the model on 2/3 of the dataset and estimate the variable effects on the other third.
Here are the arguments I use for the model:
bartMachine(X = X, y = Y,
verbose = T,
num_trees = 200,
num_iterations_after_burn_in = 5000,
run_in_sample = F,
mem_cache_for_speed = F, # Otherwise it crashes
use_missing_data = T, serialize = save)
And this is the code I use to estimate the Individual Treatment Effect (maybe some speedup is possible also here):
compute_BART_ITE <- function(bart.mod, data = NULL, vars = NULL, quants = c(.1, .3, .5, .7, .9)) {
if (is.null(vars)) vars <- bart.mod$X %>% colnames()
data <- if (is.null(data)) bart.mod$X else data %>% select(any_of(vars))
lapply(vars, function(V) {
print(glue("{which(vars == V)}/{length(vars)}: {V}"))  # which(V %in% vars) always returns 1
if (n_distinct(data[[V]]) > 5 && is.numeric(data[[V]])) {
pred.val <- quantile(data[[V]], quants, na.rm = T) %>% sort %>% signif(3)
} else pred.val <- unique(data[[V]]) %>% sort
data[[V]] <- pred.val[1]
tictoc::tic('Computed reference matrix')
ref.matrix <- bart_machine_get_posterior(bart.mod, new_data = data)$y_hat_posterior_samples
tictoc::toc()
pblapply(pred.val[-1], function(val) {
data[[V]] <- val
log(bart_machine_get_posterior(bart.mod, new_data = data)$y_hat_posterior_samples) - log(ref.matrix)
}) %>% magrittr::set_names(paste(pred.val[-1], 'vs', pred.val[1]))
}) %>% magrittr::set_names(vars)
}
bartMachineCV is very verbose, even with verbose = FALSE. I know some messages come from Java directly, but there are others that come from R and that, arguably, should not be produced with verbose = FALSE. Two main issues:
1. In build_bart_machine_cv, the call to bart_machine_cv does not pass the verbose argument (passing the ... does not do it here), and thus bart_machine_cv is run with its default verbose = TRUE. These are some screenshots of such a debugging session. Note the value of verbose is FALSE. We run, and this happens (note the verbose output). Now, if we call bart_machine_cv explicitly passing verbose = verbose, it honors the argument.
2. build_bart_machine_cv has many cat calls that are not surrounded by the if (verbose) construction that is present in, say, build_bart_machine itself. (I actually wonder whether using cat instead of message is best practice, but that is a different issue.)
Is it possible to suppress output messages completely when running bartMachine? I tried verbose = F and even redirected the output to a temporary file, to no avail.
set.seed(11)
n = 200
p = 5
X = data.frame(matrix(runif(n * p), ncol = p))
y = 10 * sin(pi * X[, 1] * X[, 2]) + 20 * (X[, 3] - .5)^2 + 10 * X[, 4] + 5 * X[, 5] + rnorm(n)
sink(file = tempfile())
bartMachine(X, y, verbose = F)
sink()
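For the R-side messages, combining an output sink with suppressMessages() covers both stdout and the message stream; output printed directly by the JVM bypasses R's connections entirely, which may be why sink() alone did not help. A sketch with a chatty toy function standing in for bartMachine:

```r
# 'chatty_fit' stands in for a verbose model-fitting call.
chatty_fit <- function() {
  cat("building model...\n")       # goes to stdout
  message("a diagnostic message")  # goes to stderr
  42
}

# Silence both streams: sink stdout to a temp file, suppress messages.
quiet <- function(expr) {
  tmp <- tempfile()
  sink(tmp)
  on.exit({ sink(); unlink(tmp) })
  suppressMessages(expr)
}

res <- quiet(chatty_fit())
res  # 42
```

Output the JVM writes directly to the process's stdout (as bartMachine's Java side does) is not routed through R connections, so it cannot be captured this way.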