Giter Club home page Giter Club logo

bas's Introduction

  • 👋 Hi, I’m @merliseclyde ! I am a professor at Duke University in the the Department of Statistical Science.
  • 👀 I’m interested in Bayesian Statistics, in particular, methods for addressing Bayesian model uncertainty and Bayesian non-parametric regression.
  • 💞️ I’m interested in collaborating on packages so let me know if you are interested in contributing to the R package BAS or bark
  • 📫 If you encounter an issue with any of my R packages please report an issue here on GitHub rather than emailing me!

bas's People

Contributors

akashrajkn avatar betsybersson avatar merliseclyde avatar olivroy avatar vandenman avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

bas's Issues

predict.basglm not compatible with MPM estimator

When estimator = 'MPM' predict.basglm produces an error regarding the prior values.

Reproducible example:

data(Pima.tr, package="MASS")
data(Pima.te, package="MASS")
Pima.bas = bas.glm(type ~ ., data=Pima.tr, n.models= 2^7, method="BAS",
                   betaprior=CCH(a=1, b=nrow(Pima.tr)/2, s=0), family=binomial(),
                   modelprior=uniform())
pred = predict(Pima.bas, newdata=Pima.te, estimator = 'MPM')

#Error in bas.lm(eval(object$call$formula), data = eval(object$call$data,  : 
#  prior  1 is not one of  g-prior, hyper-g, hyper-g-laplace, hyper-g-n, AIC, BIC, ZS-null, ZS-full, EB-local, EB-global, JZSprior

Desktop (please complete the following information):

  • OS: Windows x86_64-w64-mingw32/x64 (64-bit)
  • R Version 4.0.2

bas.glm() function

We are interested in implementing the bas.glm() function cannot be applied in a quasi-Poisson or negative binomial regression model framework. It would be very useful to deal with over-dispersed count data. We look forward to this extension in the near future.

BPM coefficients don't align with explanatory variables in BPM model - bas and pred.bas objects

Update:

I found a way to pull the coefficients for the model. I still can't pull the correct coefficients using the method I originally asked about.

Method that works, in case it helps someone else reading this:

  • Variables in the model: lmmovies$namesx[lmBPM$bestmodel + 1] (there are multiple ways of getting these)
  • Coefficients for the variables: lmmovies$mle[lmBPM$best]

To confirm I had pulled the correct information, I used this information to predict the first record, and it matched the fitted value from the model for that record.

  • Observations for each variable (factors converted to 1/0 by the model) for the first record: lmmovies$X[1, ]
  • Because the model is centered, the sample mean for each variable is needed: lmmovies$mean.x
  • Model shrinkage is also needed, if shrinkage isn't equal to 1: lmmovies$shrinkage[lmBPM$best]

There's some matching up/data prep to be done to get ready to do the math, which I'm not including here because I'm now working on a different model than the one shown below and the code wouldn't work.

For each variable, calculate: a = (coefficient * shrinkage) * (observation - sample mean)

Then, Y = Intercept + sum(a)

Original post:

I am using the BAS package to do Bayesian multiple regression modeling for the Coursera course from Duke University, "Bayesian Statistics."

I am experiencing what I think is a bug. When I pull the mean values for the coefficients for the BPM, I have been assuming that:

  • there will be non-zero values for the variables included in the BPM model's best.vars list.
  • there will be zero values for the variables not included in that list.

That isn't what I'm experiencing, however.

I'm using BAS version 1.5.5 and statsr version 0.2.1 in R version 4.0.2. My OS is Mac OSX Catalina 10.15.5.

Here's sample code that reproduces what I'm seeing.

library(dplyr)
library(statsr)
library(BAS)

load("movies.Rdata")

# data set for modeling
dsmodeling <- movies %>%
select(audience_score, genre, thtr_rel_year,
critics_score, best_pic_nom, best_pic_win, best_actor_win,
best_actress_win, best_dir_win)

# bas object
lmmovies <- bas.lm(
formula = audience_score ~ ., data = dsmodeling, prior = "BIC", modelprior = uniform())

# pred.bas object
lmBPM = predict(lmmovies, estimator = "BPM", se.fit = TRUE)

##### Comparing the bas and pred.bas objects

### pred.bas object
# BPM model: its model #
vmodel = lmBPM$best
vmodel
# explanatory variable names
lmBPM$best.vars
# intercept & explanatory variable #s
lmBPM$bestmodel

### bas object
# intercept & explanatory variable #s for [vmodel]th model
# these match what I get from lmBPM$bestmodel
lmmovies$which[vmodel]

### get mean values of the coefficients for the [vmodel]th model
coefBPM <- coef(lmmovies)$conditionalmeans[vmodel,]
coefBPM <- as.data.frame(coefBPM)
colnames(coefBPM) <- "Posterior Mean Coefficient"
coefBPM

I noticed the same behavior in the example of this procedure given in the textbook for this course, the version last built on 2020-06-01, on pp. 191-192. On p. 191, the list of variables for the BPM is pulled. On p. 192, the coefficients are pulled. As an example of what I mean, the variable Pop has a non-zero coefficient, however, it is not in the list of coefficients. The variable M, on the other hand, is on the list of coefficients but has a zero coefficient.

Weights not found when using formula object for bas.lm

Describe the bug
coef() is unable to find the weights when using estimator "MPM"

To Reproduce
Steps to reproduce the behavior:

library(MASS)
library(BAS)

data(UScrime, package = "MASS")
UScrime[, -2] <- log(UScrime[, -2])

Crime_aic =step(lm(y~., data = UScrime),k=2, trace = FALSE)

formula(Crime_aic)

Crime_ZS <- bas.lm(formula(Crime_aic),
data = UScrime,
prior = "ZS-null",
modelprior = uniform())

coef(Crime_ZS, estimator = "MPM")

Desktop (please complete the following information):

  • OS: macOS Sierra

"Error in contrasts" when using bas.predict

Myself and some other students in the Coursera course on Bayesian Statistics that you helped teach are running into problems using bas.predict. When we try to use that function with the result of bas.lm along with new data, we are getting the following error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

We've made sure that the new data is a dataframe with all of the required columns, including making sure that the factor levels in the new dataframe match the factor levels found in the dataframe fed to bas.lm. As such, every factor should have at least 2 levels.

We're all pretty much stuck on this, so if you could respond here I will relay the message to the rest of the class. Thank you!

non-conformable arrays error when confint is used on a BMA output of predict.bas

library(BAS)
set.seed(42)

d <- data.frame(
  likes.cats = c(rep("yes", 60), rep("no", 40)),
  likes.dogs = c(rep("yes", 30), rep("no", 70)),
  fish.eaten.per.year = c(rnorm(100, mean=20, sd=5)),
  donated.to.cats = c(rnorm(50, mean=50, sd=10), rep(0, 50))
)

m <- bas.lm(donated.to.cats ~ likes.cats + likes.dogs + fish.eaten.per.year,
            data=d, prior="BIC", modelprior=uniform())

nd <- data.frame("yes", "no", 100, 50)
colnames(nd) <- colnames(d)

dnd <- rbind(d, nd)

fit <- predict(m, dnd[101,], se.fit=TRUE)
confint(fit)

Output is:

Error in pred * sd : non-conformable arrays

This only happens when predict is used with estimator = "BMA", which is the default.

Warning messages in bas.lm

Noted the following warning message while using bas.lm. As discussed earlier today, you mentioned that potentially can ignore it.

Warning message:In model.matrix.default(mt, mf, contrasts) : non-list contrasts argument ignored

[Bug] NaN posterior inclusion probability

Hi Merlise,

Here's the link to the R file that makes BAS produce NaNs.

https://www.dropbox.com/s/girhufa20d3vsi1/basPosteriorInclusionProbZero.R?dl=0

Note that this analysis uses the JASP debug data set that was developed to make things crash:

https://www.dropbox.com/s/6cv1mi9zi3xylgi/debug.csv?dl=0

Upon further inspection I think that the NaNs are caused by (at least) one log marginal being undefined, which is due to the MLE not being well defined (because of an overflow problem??). I've also included the analysis using the BayesFactor package. Please let me know if I need to clarify something. Thanks for looking into this!

Cheers,
Alexander

always.include works differently in 1.5.2 (github version)

Hi Merlise,

I’ve been playing with the GitHub version of BAS (1.5.2) and compared this to the CRAN version (1.5.1). In both versions I added contcor1 to the null model:

“include.always=as.formula("contNormal ~ contcor1”)”

The CRAN version (1.5.1) gives me the following:

bas1 5 1

which I think is correct. When I apply the new image function (1.5.2) with drop.always.included=TRUE to the cranBasObject I get the following:

bas1 5 1drop

Only the intercept is removed, but not contcor1.

————————

On the other hand, the GitHub version (1.5.2) gives me:

bas1 5 2

which seems to be quite different, and note that contcor1 is not included in some of the models. The new feature drop.always.included=TRUE does remove contcor1, but not the intercept.

bas1 5 2drop

The estimates also differ. Do you know whether there’s something wrong with my installation? Or did I specify the include.always variables incorrectly? Here’s the R script:

https://www.dropbox.com/s/jjqlsm7u9qtyp94/basIncludeAlwaysGithubVersion.R?dl=0

https://www.dropbox.com/s/varuoy0pf2ud8a6/cranBasObj.RData?dl=0

Cheers,
Alexander

MCMC+BAS method errors

in bas.glm, method="MCMC+BAS" leads to an error in the C code when MCMC.iterations not provided

bestmodel documentation and use

A clear and concise description of what you would like to see added to BAS.
I think the bestmodel argument is not clearly explained if one wants to modify it, would it be possible to include an example of using it with something other than the default?

I understand it as being able to specify which model should come first but it's not clear how and I haven't been able to determine that.

eplogprob returns NaN

Describe the bug
When coefficients are highly significant and p-values equal 0.0, the function eplogprob returns NaN. This in turn causes bas.lm or bas.glm to fail if init.probs = "eplogp" is used.

To Reproduce
Steps to reproduce the behavior:

See unit test in test-eplogprob.R

loc <- system.file("testdata", package = "BAS")
load(paste0(loc, "/eplogp-testdata.Rdata"))
prob = eplogprob(lm(Y ~ ., data = df))
expect_true(sum(is.na(prob)) == 0)

Expected behavior
If p-values are less than Machine precision, return max probability as this is just used to sort variables internally and initialize sampling probabilities. Do not set init probs to 1 in this case.

predict.bas ignores type = link

in predict.basglm, type="response" is the default in the code, although documentation suggests that the default is to predict on the linear predictor scale.

documentation should reflect that predictions on the response scale are the default.

errors with intrinsic prior

REPREX

test_that("intrinsic prior for GLM", {
  data(Pima.tr, package = "MASS")
  expect_error( pima.BAS <- bas.glm(type ~ .,
                      data = Pima.tr, n.models = 2^7, method = "BAS",
                      betaprior = intrinsic(n = nrow(Pima.tr)), family = binomial(),
                      modelprior = uniform()
  ))
#  expect_equal(0, sum(pima.BAS$shrinkage > 1))
})

Error in bas.glm(type ~ ., data = Pima.tr, n.models = 2^7, method = "BAS", :
REAL() can only be applied to a 'numeric', not a 'integer'

phi1 function in R/cch.R

function should be vectorized, but currently wrong length passed in so only valid for scalar case.

Save messages sent by bas.lm

Hi Merlise,

Sometimes Bas.lm prints messages such as

"There are 3 degenerate sampling probabilities (0 or 1); decreasing the number of models to 16"

or returns a warning

“Warning message: In BAS::bas.lm(simpleFormula, data = d, alpha = 0.125316, prior = "JZS",  : log marginals and posterior probabilities contain NA's.  Consider using `pivot=TRUE` if there are models that are not full rank”

Would it be possible to save these messages in the object? We can then print these messages as a footnote in JASP.

Cheers,
Alexander

include.always errors with bas.glm

from test-bas-glm.R

test_that("include always", {
  data("Pima.tr", package="MASS")
  expect_error(bas.glm(type ~ .,
                        data = Pima.tr, method = "MCMC",
                        include.always = ~ bp,
                        betaprior = g.prior(g=100), family = binomial(),
                        modelprior = beta.binomial(1, 1))
  )
})

error:  object 'prob' not found 

Jeffreys prior with MCMC returns marginal prob > 1

rexpr: in test-bas-glm.R

test_that("Jeffreys & MCMC", {
 data(Pima.tr, package="MASS")
 pima_BAS <-  bas.glm(type ~ .,
                      data = Pima.tr, method = "MCMC",
                      betaprior = Jeffreys(),
                      family = binomial(),
                      modelprior = tr.beta.binomial(1, 1, 4))
 expect_equal(0, sum(pima_BAS$probne0 > 1))

Using JZS gives error

Error in BAS::bas.lm(kid_score ~ . , data = cognitive, prior = "JZS", modelprior = uniform()) :
INTEGER() can only be applied to a 'integer', not a 'NULL'

This doesn't seem to happen before. I tried adding argument "a=1" and "alpha=1" but the error still showed.

Create option that searches models only to reduce memory

Add option that does not save all coefficients and standard errors (other?) to reduce memory usage when enumerating with option BAS. i.e enumeration with p=26 require 50+ GigaBytes ram; enumeration with p=21 requires about 2 GB.

diagnostic plot error for bas.glm objects

When using MCMC, the length of the MCMC estimates of the posterior inclusion probabilities is the number of models. This leads to an error in the diagnostic plot function.

REPREX:

data(Pima.tr, package = "MASS")
pima.MCMC <- bas.glm(type ~ .,
    data = Pima.tr, MCMC.iterations = 2^10,
    method = "MCMC", betaprior = aic.prior(),
    family = binomial(),
    modelprior = tr.poisson(ncol(Pima.tr) - 1, 7)
  )
  expect_null(diagnostics(pima.MCMC, type = "pip"))

In tests/testthat/test-bas_glm.R

beta.prime prior not updating value of n in bas.glm

REPREX:

 data(Pima.tr, package = "MASS")
  pima.BAS <- bas.glm(type ~ .,
                      data = Pima.tr, n.models = 2^7, method = "BAS",
                      betaprior = beta.prime(n=200), family = binomial(),
                      modelprior = uniform()
  )   #OK

pima.BAS <- bas.glm(type ~ .,
                      data = Pima.tr, n.models = 2^7, method = "BAS",
                      betaprior = beta.prime(), family = binomial(),
                      modelprior = uniform()
# default n=NULL leads to error in C code 

Identified in tests/testthat/test-bayes_glm.R "beta.prime prior for GLM"

image function on the bas object.

Is it possible to raise a warning or a clearer error message to user when using the image function on a bas object which has only one model? Right now the error message is a pure error message from image saying dimensions of z are not length(x)(-1) times length(y)(-1)

BLAS/LAPACK routine `DSYRK` gave error code -7

Describe the bug
When knitting the attache Rmd Document I get this error.

To Reproduce
Steps to reproduce the behavior:

See attached file (remove .txt extension and run on RStudio)
test_BAS.Rmd.txt

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • MacOS Mojave

Additional context
test in versions from both CRAN and github repo, locally installed

shrinkage greater than 1 with 'JZS' prior

shrinkage values returned using the Jeffreys-Zellner-Siow prior are greater than one in the following reproducible example:

test_that("JZS: shrinkage is less than or equal to 1", {
  data(Hald)
  hald.gprior = bas.lm(Y ~ ., prior="JZS", modelprior=uniform(), data=Hald)
  expect_equal(0, sum(hald.gprior$shrinkage > 1))
})

This breaks predicted/fitted values

predict.basglm error with newdata when se.fit = TRUE

When trying to generate standard errors about fitted values using predict.basglm (i.e., se.fit = TRUE), the function issues an error when data are supplied to newdata:

Error in array(STATS, dims[perm]) :
'data' must be of a vector type, was 'NULL'
In addition: Warning message:
In max(cumDim[cumDim <= lstats]) :
no non-missing arguments to max; returning -Inf

Example which reproduces the behavior:

data(Pima.tr, package="MASS")
data(Pima.te, package="MASS")
Pima.bas = bas.glm(type ~ ., data=Pima.tr, n.models= 2^7, method="BAS",
           betaprior=CCH(a=1, b=nrow(Pima.tr)/2, s=0), family=binomial(),
           modelprior=uniform())
pred = predict(Pima.bas, newdata=Pima.te, se.fit = TRUE, top=1)  # Highest Probability model

Desktop (please complete the following information):

  • OS: Windows x86_64-w64-mingw32/x64 (64-bit)
  • R Version 4.0.2
  • RStudio 1.1.463

[Enhancement] Remove the include.always variable from the image(obj) function

Hi Merlise,

Would it be possible to remove the variables that are "include.always" from the plotting output:

  • plot(obj) [in particular, the Inclusion Probabilities plot]
  • image(obj)
    I think that this might be sensible. By removing the include.always variables, there's more space to display the effects of the variables that are of interest, and it might save some space for memory as well.

Cheers,
Alexander

CRAN check fails on solaris for version 1.5.2

Describe the bug
Test

To Reproduce
Steps to reproduce the behavior:

test_that("herdity and BAS", { set.seed(2) data(Pima.tr, package="MASS") pima_BAS <- bas.glm(type ~ (bp + glu + npreg)^2, data = Pima.tr, method = "BAS", betaprior = bic.prior(), family = binomial(), update=NULL, modelprior =uniform(), force.heredity=TRUE) pima_BAS_no <- bas.glm(type ~ (bp + glu + npreg)^2, data = Pima.tr, method = "BAS", betaprior = bic.prior(), family = binomial(), update=NULL, modelprior =uniform(), force.heredity=FALSE) pima_BAS_no <- force.heredity.bas(pima_BAS_no) expect_equal(0L, sum(pima_BAS$probne0 > 1.0)) expect_equal(0L, sum(pima_BAS_no$probne0[-1] > 1.0)) expect_equal(pima_BAS$probne0, pima_BAS_no$probne0) expect_equal(pima_BAS$n.models, pima_BAS_no$n.models) expect_equal(0L, sum(duplicated(pima_BAS$which))) })

Number of models returned by bas.lm with force.heredity = TRUE is greater than the number of models under post processing.

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • Solaris on hub
  • R Version R-devel and R 3.5.2

Additional context
Add any other context about the problem here.

Depends on random seed; also impacts bas.lm

Unable to calculate Laplace Approximation with hypergeometric2F1()

In general, I can use the hypergeometric function as expected. However, when I set method = "Laplace" R always returns NaN. Was this ever an issue in development? Do you have any suggestions on correcting this?

> hypergeometric2F1(12, 1, 2, 0.65)
[1] 9.580921
> hypergeometric2F1(12, 1, 2, 0.65, method = "Laplace")
[1] NaN

Also, when installing from source, I also get the following warning, and I'm not sure if this has any effect on my issue.

bayesreg.c:520:9: warning: variable 'ghat' is uninitialized when used here [-Wuninitialized]
   if ( ghat < 0) Rprintf("ERROR: In Laplace approximation to 2F1");
        ^~~~
bayesreg.c:477:15: note: initialize the variable 'ghat' to silence this warning
   double ghat,logint,sigmahat, D, A, B, C;
              ^
               = 0.0

To include graph parameters for image.bas function

Noted that the graph parameters (e.g., for inner margin etc.) are hardcoded in image.bas function. Think would be great if can include them in the function so that user inputs can overwrite function default parameters. This would be useful for cases where the parameter names are very long to be fully shown under the current set up.

Details here:

BAS:::image.bas

if (rotate) {
        par(mar = c(6, 6, 3, 5) + 0.1)

...

else {
        par(mar = c(6, 8, 6, 2) + 0.1)
....

Thanks!!

Errors in prior = "ZS-null" when R2 is not finite

When a model is not full rank, and hence R2 is not defined using model fitting without pivoting, calculation of the log marginals under the ZS prior lead to errors, rather than returning NA's and a warning.

May be a problem with other priors as well, such as the hyper-g family

phi1 function returns NA or Inf for large values of x

Describe the bug
For large values of the Wald statistic (typical with large n) the current implementation of the Phi1 function in BAS returns NA or Inf
due to overflow in the exponential function. This impacts Bayes factor calculations with several prior distributions
in bas.glm, such as the hyper-g/n

To Reproduce
Steps to reproduce the behavior:

BAS::phi1(1, 2, 1.5, 1000, 1/100)

Reported by Daniel Heeman

coerce argument of g.prior to be real

rexpr:

test_that("g prior for GLM", {
  data(Pima.tr, package = "MASS")
  pima.BAS <- bas.glm(type ~ .,
                      data = Pima.tr, n.models = 2^7, method = "BAS",
                      betaprior = g.prior(g = as.numeric(nrow(Pima.tr))), family = binomial(),
                      modelprior = uniform()
  )
  expect_equal(0, sum(pima.BAS$shrinkage > 1))
})

Error in BAS:::coef.bas(..., estimator = "MPM"): Error in eval(object$call$weights): object 'object' not found

Describe the bug
Calling BAS:::coef.bas(basobj, estimator = "MPM") may error depending on how basobj was created.

To Reproduce
The first example works as intended. The second example changes the seemingly harmless formula = M ~ So + Ed + Po1 + Po2 into formula = form where form is defined earlier form = M ~ So + Ed + Po1 + Po2.

data(UScrime, package = "MASS")
UScrime <- UScrime[, 1:5]

crime.bic <- BAS::bas.lm(
  formula = M ~ So + Ed + Po1 + Po2,
  data = UScrime,
  prior = "JZS",
  initprobs = c(1, 0.5, 0.5, 0.5, 0.5),
  renormalize = TRUE
)
BAS:::coef.bas(crime.bic, estimator = "MPM")
#> 
#>  Marginal Posterior Summaries of Coefficients: 
#> 
#>  Using  MPM 
#> 
#>  Based on the top  1 models 
#>            post mean  post SD    post p(B != 0)
#> Intercept  138.57447    1.40051    1.00000     
#> So          11.53684    3.13294    1.00000     
#> Ed           0.00000    0.00000    0.00000     
#> Po1          0.00000    0.00000    0.00000     
#> Po2         -0.14815    0.05367    1.00000

form <- M ~ So + Ed + Po1 + Po2
crime.bic2 <- BAS::bas.lm(
  formula = form,
  data = UScrime,
  prior = "JZS",
  initprobs = c(1, 0.5, 0.5, 0.5, 0.5),
  renormalize = TRUE
)
BAS:::coef.bas(crime.bic2, estimator = "MPM")
#> Error in eval(object$call$weights): object 'object' not found

Created on 2022-02-02 by the reprex package (v2.0.1)

What goes wrong, is that this code

BAS/R/coefficients.R

Lines 87 to 99 in f92fc3e

object <- bas.lm(
eval(object$call$formula),
data = eval(object$call$data),
weights = eval(object$call$weights),
n.models = 1,
alpha = object$g,
initprobs = object$probne0,
prior = object$prior,
modelprior = object$modelprior,
update = NULL,
bestmodel = models,
prob.local = 0.0
)

is doing things with eval. However, since R is lazy, the evals are not evaluated until we get to

BAS/R/bas_lm.R

Line 514 in f92fc3e

mf <- eval(mf, parent.frame())

where R tries to do eval(expression(stats::model.frame(formula = eval(object$call$formula), data = eval(object$call$data), weights = eval(object$call$weights), drop.unused.levels = TRUE)), parent.frame()) but this fails. I don't really understand why this fails though, as env <- parent.frame(); env$object during debugging does return the objects that R cannot find.

Expected behavior
The same output as in the first version.

Desktop:

  • OS: Ubuntu 20.04
  • R Version 4.1.2

[Enhancement] Specific flat priors for the always include variables

Hi Merlise,

Do you think it's sensible to have a flat priors on the always include variables as we discussed in Edinburgh? I think that this makes sense and I believe that this boils down to a simple numerical integral with a change of degrees of freedom. I'll look into the formulas a bit more, once I'm done with some other work.

Cheers,
Alexander

Error in bas.lm: $ operator is invalid for atomic vectors

Describe the bug
function "bas.lm" can't work with error message "Error: $ operator is invalid for atomic vectors".

To Reproduce

`x<- mvtnorm::rmvnorm(100, mean=rep(0,9), sigma=diag(9))
b <- c(2,2,2,0.2,0.2,0.2,0,0,0)
y <- x%*%b + rnorm(100, mean=0, sd=1)
test <- cbind(x,y)
test <- as.data.frame(test)

bas.out<-bas.lm(V10~V1+V2+V3+V4+V5+V6+V7+V8+V9, data=test, method="MCMC", MCMC.iterations=20000, prior="hyper-g", modelprior = "uniform", pivot = FALSE)`

Expected behavior
Have the function "bas.lm" run with clear instruction on the input data format.

Screenshots
Capture

Desktop:

  • OS: Windows 10 Pro
  • R Version 4.0.4

Add function to extract bas objects for single models

** feature request**
Add a function to return a BAS object for a single model (HPM, MPM, BPM or others) given a formula or a vector for the model that can be passed to other summaries.

Is your feature request related to a problem? Please describe.

related to #49 where coef.bas() does not yet return coefficients for the best predictive model.

See also solution in StatsWithR/book#80

include hereditary constraints

Add option for factors and interactions so that

  1. all levels of a factor enter or leave the model together
  2. all lower order terms are included whenever a higher order term is added to the model for i.e. for a model formula with factor A having J levels and continuous variable B: Y ~ A*B
    there are 4 models Int, Int + A, Int + B, Int + A + B + A:B
    Should work with higher order interactions in factorial models.

unable to install on mac for R version 3.1.

When I try to install it from github using the command,

devtools::install_github('merliseclyde/BAS')

I get the follwing error

make: gfortran-4.2: No such file or directory

what do I do for this? (I have gfortran 4.8 installed on my system)

NAs must be omitted for predict function to work

When building a BMA regression model and using the predict function (i.e. predict.bas) there can be no NAs or NULLs in new data set for prediction. If there are NAs, prediction returns
numeric(0)
attr(,"best")
integer(0)

Currently, recommend using complete.cases() to remove any NAs before prediction.

Hopefully, function can be made more robust by either calculating values for all partially NA rows or excluding NAs entirely with a warning issued.

What is the default number of iterations for 'MCMC' method

@merliseclyde, thank you for this package.

I have a question. In the docs, for the function bas.lm, it says, the default value for

`MCMC.iterations = n.models * 10` 

if not set by user. I was looking at the source for this function, at this line, it assigns the value of n.models*2 if MCMC.iterations is not specified by the user. Can you please clarify this?

Odd behavior Beta-binomial model prior with include.always

Describe the bug

This is more a theoretical issue rather than a programming bug. Prior model probabilities of a beta-binomial model prior are surprising and perhaps incorrect when always including some predictors.

To Reproduce

A small example below:

rm(list = ls())
library(BAS)

getModelName <- function(x, params) {
  x <- x[-1L]
  if (length(x))
    return(paste(params[x], collapse = " + "))
  else return("Null model")
}

getPriorProbsPerModel <- function(basObj) {
  params <- basObj$namesx[-1L]
  paramsInModels <- sapply(basObj$which, getModelName, params = params)
  out <- data.frame(model = paramsInModels, priorProb = basObj$priorprobs)
  return(out[order(lengths(basObj$which)), ])
}

dat <- read.table("https://raw.githubusercontent.com/jasp-stats/jasp-desktop/development/Resources/Data%20Sets/Big%205%20(Dolan%2C%20Oort%2C%20Stoel%20%26%20Wicherts%2C%202009).csv", header = TRUE)
head(dat)

modelprior <- beta.binomial(1.0, 1.0)

# constraint via include.always
fit1 <- bas.lm(Neuroticism ~ Extraversion + Openness + Agreeableness, data = dat, modelprior = modelprior, 
               include.always = ~ 1 + Agreeableness)

# no constraint
fit2 <- bas.lm(Neuroticism ~ Extraversion + Openness + Agreeableness, data = dat, modelprior = modelprior)

which gives the following prior model probabilities:

getPriorProbsPerModel(fit1) # constrained model, always includes Agreeableness
                                    model  priorProb
1                           Agreeableness 0.0833 # <- Least complex model, low prob
3                Openness + Agreeableness 0.0833
4            Extraversion + Agreeableness 0.0833
2 Extraversion + Openness + Agreeableness 0.25   # <- most complex model, highest prob
getPriorProbsPerModel(fit2) # unconstrained model, as expected
                                    model  priorProb
1                              Null model 0.25
3                                Openness 0.0833
5                            Extraversion 0.0833
7                           Agreeableness 0.0833
2                Openness + Agreeableness 0.0833
4                 Extraversion + Openness 0.0833
6            Extraversion + Agreeableness 0.0833
8 Extraversion + Openness + Agreeableness 0.25

So it seems that BAS computes the prior probabilities as if there is no constraint and then assigns 0 prior probability to models that do not include Agreeableness. If the prior model probabilities of the constrained model are normalized we get:

tb <- getPriorProbsPerModel(fit1)
tb[, 2] <- tb[, 2] / sum(tb[, 2])
tb
                                    model priorProb
1                           Agreeableness 0.1666667
3                Openness + Agreeableness 0.1666667
4            Extraversion + Agreeableness 0.1666667
2 Extraversion + Openness + Agreeableness 0.5000000 # <- favors the most complex model

However, this may lead to biased inference as there is a strong prior preference for the most complex model!

Expected behavior

I'd expect these prior model probabilties:

tb2 <- getPriorProbsPerModel(fit1)
tb2[, 2] <- (extraDistr::dbbinom(0:2, 2, 1, 1) / c(choose(2, 0:2)))[c(1, 2, 2, 1)]
tb2
                                    model priorProb
1                           Agreeableness 0.333
3                Openness + Agreeableness 0.167
4            Extraversion + Agreeableness 0.167
2 Extraversion + Openness + Agreeableness 0.333

Effectively, because one predictor included in all models, I would expect that the prior model probabilities are computed as if the model space contained one predictor less.

This example shows what I would expect when always including one predictor but naturally this generalizes to always including l out of k predictors.

Desktop:

  • OS: OSX, Windows, Ubuntu 18.04
  • R Versions 3.5.2 and 3.5.3

If anything is unclear, please let me know!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.