For details, I refer to <a href="https://discourse.mc-stan.org/t/auc

I'll check the difference between varsel and <code cl

Point 0 is not wrong, is just has a different interpretation when you do LOO-CV.

Incorrectly computed AUC for cv_varsel,about stan-dev/projpred

Comments (25)

dmenne commented on June 4, 2024 1

Ok, that's the example you promised above, it should do.

from projpred.

AlejandroCatalina commented on June 4, 2024

I'll check the difference between varsel and cv_varsel. Thanks for the feedback.

from projpred.

dmenne commented on June 4, 2024

Any progress here? A re-submitted paper is on hold.

from projpred.

AlejandroCatalina commented on June 4, 2024

Could you share your fit object so I play with this? I checked the code and on paper the same code runs for both objects. There is one difference regarding the baseline used for the plot that I believe it's causing this difference but I would like to check with an actual object. I tested a toy classification problem locally and I got the same plot for both cv_varsel and varsel so I could not reproduce this bug.

from projpred.

dmenne commented on June 4, 2024

The data are proprietary, so I had to do some obfuscation and will send these to you via email. I have been able to reproduce the plot problem with a simpler example using stan_glm which anyway is the right model to use

library(bayesplot)
library(projpred)
library(rstan)
options(mc.cores = 2)

dtx = readRDS("dtx.rds") # These data will be sent by private mail

# This is the model I should have used, because there are not repeats per subject
# However, it fails to honour the structure, see
# https://discourse.mc-stan.org/t/projpred-interactions-without-marginal-terms/14893/11
fit_glm = stan_glm(r ~ (pre + bmi + age)*group,
                        family = binomial('logit'),
                        data = dtx, chains = 2,
                        iter = 10000, seed = 1123)
vs = varsel(fit_glm, verbose = FALSE)
vs_cv = cv_varsel(fit_glm, verbose = FALSE)

# First assumption is that
plot(vs, stats=c("auc", "acc"))
plot(vs_cv, stats=c("auc", "acc"))
# The acc plots look ok. One could argue that only the point at 0 and should
# be 0.5, but to me it looks like the other point are also incorrectly scaled.

####### I would prefer when the above would work correctly by honouring
####### the formula structure, and the auc would be correct.

# The following are just trial-and-error attemtps with many
# singular cases.

# Because model structure is not taken into account with stan_glm,
# I used a dummy (1|id) and stan_glmer.
# In glmer/lme4, this would fail with good reasons.

if (FALSE){
  fit_r = stan_glmer(r ~ (pre + bmi +age)*group + (1|id),
                          family = binomial('logit'),
                          data = dtx, chains = 2,
                          iter = 10000, seed = 1123)
  vs = varsel(fit_r, verbose = FALSE)
  # This gives warnings, but the result looks right and takes
  # into
  summary(vs)
  plot(vs, stats="auc")

  vs_cv = cv_varsel(fit_r, verbose = FALSE)
  plot(vs_cv, stats="auc")
}

Correct for `varsel`

Incorrect scaling of auc for `cv_varsel` (probably ok for acc)

from projpred.

AlejandroCatalina commented on June 4, 2024

Thanks, I’ll look into this to see what happens. Best, Alejandro

…

On 15 Mar 2021, 12:23 PM +0200, Dieter Menne ***@***.***>, wrote: The data are proprietary, so I had to do some obfuscation and will send these to you via email. I have been able to reproduce the plot problem with a simpler example using stan_glm which anyway is the right model to use library(bayesplot) library(projpred) library(rstan) options(mc.cores = 2) dtx = readRDS("dtx.rds") # These data will be sent by private mail # This is the model I should have used, because there are not repeats per subject # However, it fails to honour the structure, see # https://discourse.mc-stan.org/t/projpred-interactions-without-marginal-terms/14893/11 fit_glm = stan_glm(r ~ (pre + bmi + age)*group, family = binomial('logit'), data = dtx, chains = 2, iter = 10000, seed = 1123) vs = varsel(fit_glm, verbose = FALSE) vs_cv = cv_varsel(fit_glm, verbose = FALSE) # First assumption is that plot(vs, stats=c("auc", "acc")) plot(vs_cv, stats=c("auc", "acc")) # The acc plots look ok. One could argue that only the point at 0 and should # be 0.5, but to me it looks like the other point are also incorrectly scaled. ####### I would prefer when the above would work correctly by honouring ####### the formula structure, and the auc would be correct. # The following are just trial-and-error attemtps with many # singular cases. # Because model structure is not taken into account with stan_glm, # I used a dummy (1|id) and stan_glmer. # In glmer/lme4, this would fail with good reasons. if (FALSE){ fit_r = stan_glmer(r ~ (pre + bmi +age)*group + (1|id), family = binomial('logit'), data = dtx, chains = 2, iter = 10000, seed = 1123) vs = varsel(fit_r, verbose = FALSE) # This gives warnings, but the result looks right and takes # into summary(vs) plot(vs, stats="auc") vs_cv = cv_varsel(fit_r, verbose = FALSE) plot(vs_cv, stats="auc") } Correct for varsel Incorrect scaling of auc for cv_varsel (probably ok for acc) — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

from projpred.

dmenne commented on June 4, 2024

I believe that the pure prior value a 0 should not be fixed, but also be simulated and have error bars.

from projpred.

AlejandroCatalina commented on June 4, 2024

I've had a bit of time to inspect this.

I checked that indeed the AUC plot is different in cv_varsel compared to varsel. Per my investigation, this is expected when you think about it because varsel predicts a constant probability for all data points, which results in the desired 0.5 AUC. However, when using LOOCV for projpred, the prediction of each data point i is made with a model where the ith observation is removed, and therefore the intercept is slightly different. This has a big impact in the AUC computation because we need to “plot” the ROC curve and compute the area under it by ordering the predictions according to decreasing probability. Since the intercept is slightly larger for models where we have removed a data point of class 1, this ordering results in all data points of class 0 appearing first in the ordered predictions vector. The ROC for this data is just 0 until the FPR gets to 1, and then it shoots up to 1. It just happens that the AUC for this ROC is 0. This makes sense to me, but it’s also true that you can simply reverse the sign of the predictions (or say the probability is equal to 1 - current prediction) to have the opposite behaviour, now all data points of class 1 appear first as true positives, and therefore the AUC under this model would be 1 (which does not mean it’s good as the accuracy is bad anyway because the prediction is almost constant).

This is a well known limitation of CV in extreme cases corresponding to non-smooth utility functions (AUC for instance). A reference on this issue is Celisse & Arlot (2010) [5.2.1].

This came up earlier during projpred development and the development team decided against doing anything in particular to address this, so I'm closing the issue and leaving this answer as explanation. We will probably document these cases better.

from projpred.

dmenne commented on June 4, 2024

This should be very carefully documented, because it will cause confusion with terms commonly used in binary decision, and a "reviewer hardened" reference without too much math must be available. AUC of 0 is considered perfect information in the community.

However, would you agree with the following:

Point at 0 is degenerate. So why not leave it out totally, and document it? Showing a degenerate case does not sound reasonable to me.
Are the other points comparable? To explain to end-users with clinical background only, is varsel in-sample and cv_varsel out-of-sample?
Is there any chance that https://discourse.mc-stan.org/t/projpred-interactions-without-marginal-terms/14893/11 could be resolved, i.e. varsel(stan_glm) follow the behavior of stan_glmer?

As an aside: it would be nice if plot.varsel were a real ggplot object. I expected that on could scale axis etc with ggplot tools, but have not been successful.

from projpred.

AlejandroCatalina commented on June 4, 2024

Point 0 is not wrong, is just has a different interpretation when you do LOO-CV. This behaviour is different if you use KFold-CV or stratified Leave-Two-Out. We have not implemente LTO-CV but you can play with KFold.
You will get AUC = 1 if you reverse the predictions, of course, but that does not indicate perfect predictions anyway, because the same LOO-CV effect applies to the reverse.
Every submodel is correctly computed, just be careful with the intercept-only model (point 0). As you correctly point out, varsel evaluates in-sample whereas cv_varsel shows out-of-sample performance.
You can pass a custom search_terms list to varsel to enforce that interactions are only added after evaluating the main effects.
As far as I know, plot returns a ggplot2 object?

from projpred.

dmenne commented on June 4, 2024

K-fold works - I know that the prediction is bad, but that's the important part I want to document. I am a bit surprised that 0 variables are almost as good as all variables (0.4 being equivalent to 0.6 in decision terms). Larger error bars would have made this palatable.

K-fold gives many warnings which look more like a programming problem than some convergence warnings:

1: In cbind(y, mu, weights) :
  number of rows of result is not a multiple of vector length (arg 1)
2: In cbind(y, mu.bs, weights) :
  number of rows of result is not a multiple of vector length (arg 1)
3: In cbind(y, mu, weights) :
  number of rows of result is not a multiple of vector length (arg 3)

An example using search_terms would be helpful, it is unclear how to use it. After reading @avehtari 's comment on https://discourse.mc-stan.org/t/projpred-interactions-without-marginal-terms/14893/11?u=denne, I thought that this was simply not implemented.

About ggplot2: I will check more carefully and make a separate post of it.

from projpred.

AlejandroCatalina commented on June 4, 2024

I would not say that 0 variables are almost equal to all variables. AUC is just one metric and it doesn’t completely specify a model. But I guess you are the one to measure that as the end user! Search_terms is not heavily used and therefore still quite experimental, which explains why Aki may not have suggested that. I will write an example tomorrow for your model. Regarding kfold warnings, you may want to switch to develop branch as I run your model for Kfold and it run fine with maybe unreliable results but no warnings. I’ll check if master is having issues with kfold or if you run a particularly troublesome kfold setting. Best, Alejandro

…

On 15 Mar 2021, 19:03 +0200, Dieter Menne ***@***.***>, wrote: • K-fold works - I know that the prediction is bad, but that's the important part I want to document. I am a bit surprised that 0 variables are almost as good as all variables (0.4 being equivalent to 0.6 in decision terms). Larger error bars would have made this palatable. • K-fold gives many warnings which look more like a programming problem than some convergence warnings: 1: In cbind(y, mu, weights) : number of rows of result is not a multiple of vector length (arg 1) 2: In cbind(y, mu.bs, weights) : number of rows of result is not a multiple of vector length (arg 1) 3: In cbind(y, mu, weights) : number of rows of result is not a multiple of vector length (arg 3) • An example using search_terms would be helpful, it is unclear how to use it. After reading @avehctari comment on https://discourse.mc-stan.org/t/projpred-interactions-without-marginal-terms/14893/11?u=denne, I thought that this was simply not implemented. About ggplot2: I will check more carefully and make a separate post of it. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

from projpred.

dmenne commented on June 4, 2024

Develop branch give only one presumable uncritical warning:
Quick-TRANSfer stage steps exceeded maximum (= 500000). Some explanation what this means would be nice.

Stangely, the result looks much better than k-fold in the CRAN/master version:

from projpred.

dmenne commented on June 4, 2024

Sorry for the comment about ggplot2.
Typical error: I had used %>% instead of + to add ylim(0,1). I know that hadley has regretted that inconsistency many times, but such is history.

from projpred.

AlejandroCatalina commented on June 4, 2024

It happens that we have increased the number of posterior draws we fit in develop branch to 400, which is a lot more than in master and that gives much better stability. I have not faced that warning before though, it might be related to your fit object having many posterior draws and consuming a lot of memory in some operations, I can check. Best, Alejandro

…

On 15 Mar 2021, 19:25 +0200, Dieter Menne ***@***.***>, wrote: Develop branch give only one presumable uncritical warning: Quick-TRANSfer stage steps exceeded maximum (= 500000). Some explanation what this means would be nice. Stangely, the result looks much better than k-fold in the CRAN/master version: — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

from projpred.

dmenne commented on June 4, 2024

Looks like this is a warning in classical stats/kmeans
https://stackoverflow.com/questions/21382681/kmeans-quick-transfer-stage-steps-exceeded-maximum

from projpred.

AlejandroCatalina commented on June 4, 2024

Makes sense, when trying to cluster a lot of posterior draws, should not be concerning I think. Best, Alejandro

…

On 15 Mar 2021, 19:35 +0200, Dieter Menne ***@***.***>, wrote: Looks like this is a warning in classical stats/kmeans https://stackoverflow.com/questions/21382681/kmeans-quick-transfer-stage-steps-exceeded-maximum — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

from projpred.

dmenne commented on June 4, 2024

Maybe add this to the docs (please check if it works for you, it could also be some problem with different random generatords):

vs_cv = cv_varsel(fit_glm, verbose = FALSE, cv_method="kfold",   algorithm="Lloyd")

from projpred.

dmenne commented on June 4, 2024

The default setting produce quite variable results on multiple runs. I will have to check how to make results more stable.

Looking ahead to searchterms`.

Or maybe make stan_glm behave like stan_glmer? That would be far more consistent.

from projpred.

AlejandroCatalina commented on June 4, 2024

We can make projpred internally adjust the search space for specific terms like interactions in this case, but glms are very fundamentally different to glmms so I'm not exactly sure what you mean by make glm behave like glmer?

from projpred.

avehtari commented on June 4, 2024

Stangely, the result looks much better than k-fold in the CRAN/master version:

Thanks for reporting this. It's good to know that the new defaults have visible effect also for others than just for me :)

This should be very carefully documented, because it will cause confusion with terms commonly used in binary decision

As Alejandro mentioned we had seen this case, but as it was artificially created case and no-one had reported seeing this, and there was no easy fix, we did nothing. In all these years, you are the first one to see this outside of simulated examples. Now re-thinking this, we could add warnings when ACC or AUC are clearly below random chance and mention the inherent unstability of those utilities.

from projpred.

dmenne commented on June 4, 2024

you mean by make glm behave like glmer

What Bill Venables hammered into our brains when we all were younger: using interactions before main effects is justified only in few cases, and doing this in a some forward/backward way jumps between incompatible models.
stepAIC at that time did and still does it right, whatever you think about AIC as a criterion.

That's not different between glm and glmer (or their stan-versions), even if the algorithms are different. My surrogate (1|id) when there is not real id-repeat is painful to me.

from projpred.

AlejandroCatalina commented on June 4, 2024

That for sure, I can think of something to automate that in glms. Best, Alejandro

…

On 15 Mar 2021, 20:16 +0200, Dieter Menne ***@***.***>, wrote: > > you mean by make glm behave like glmer What Bill Venables hammered into our brains when we all were younger: using interactions before main effects is justified only in few cases, and doing this in a some forward/backward way jumps between incompatible models. stepAIC at that time did and does it right, whatever you think about AIC as a criterion. That's not different between glm and glmer (or their stan-versions), even if the algorithms are different. My surrogate (1|id) when there is not real id-repeat is painful to me. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

from projpred.

dmenne commented on June 4, 2024

Any progress in glm honoring model structure as glmer does? Reviewers are waiting for a re-submission.

from projpred.

AlejandroCatalina commented on June 4, 2024

Not on our side. You can achieve this behaviour by manually setting the search_terms argument as follows.

For every interaction x:y you include in the model, you add the following terms to search_terms <- c(..., "x", "y", "x + y + x:y", ...), so that x:y is only considered after including both x and y. Note that search_tems has to explicitly include the intercept 1, so a full example would be search_terms <- c("1", "x", "y", "x + y + x:y").

We are working on several projects at the moment improving projpred in several directions and I'm afraid we won't have time to implement this as a whole feature in the short term.

from projpred.

Incorrectly computed AUC for cv_varsel about projpred HOT 25 CLOSED

Comments (25)

Correct for `varsel`

Incorrect scaling of auc for `cv_varsel` (probably ok for acc)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent