Comments (8)
Thanks for your code; I have run the experiment with different ICs.
First of all, for this question:
"I thought I could perhaps simulate abess using MBIC by specifying "aic" as tune.type but using ic.scale = (1/2)log(n(p^2)/16) so that 2 (the penalty factor implied by AIC multiplied by ic.scale would return the penalty implied by MBIC, but that didn't seem to work. "
This idea is almost correct, but `ic.scale` is ignored when `tune.type` is "aic": we treat the AIC penalty factor as a fixed constant that nobody should need to modify. Instead, you can specify "bic" as `tune.type` and set `ic.scale = log(n*(p^2)/16) / log(n)` to implement MBIC.
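To see why this scaling works, assuming abess's BIC penalty is `log(n)` per parameter (as in the formulas later in this thread), multiplying that penalty by `ic.scale = log(n*p^2/16)/log(n)` recovers the mBIC penalty `log(n*p^2/16)` per parameter. A quick numeric check, with illustrative values for n, p, and edf:

```python
import math

n, p = 500, 1_000_000  # example problem dimensions (assumed for illustration)
edf = 34               # an example support size

ic_scale = math.log(n * p**2 / 16) / math.log(n)

bic_penalty = math.log(n) * edf               # total penalty BIC adds
mbic_penalty = math.log(n * p**2 / 16) * edf  # total penalty mBIC adds

# scaling the BIC penalty by ic_scale reproduces the mBIC penalty exactly
assert math.isclose(ic_scale * bic_penalty, mbic_penalty)
```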
The results below show the number of true positive variables and the support-set size selected by the algorithm, under five ICs and the subsetted/original dimension settings.
TP/ best.size | p=76 | p=1000000 |
---|---|---|
aic | 41/76 | 41/76 |
bic | 41/76 | 41/76 |
gic | 41/76 | 34/34 |
ebic | 38/43 | 33/33 |
mbic | 41/76 | 33/33 |
from abess.
Let's take '41/76' as an example to explain the result. Its corresponding confusion matrix is
estimated/true | FALSE | TRUE |
---|---|---|
FALSE | 999915 | 9 |
TRUE | 35 | 41 |
'76' means the algorithm judged the best support-set size to be 76. The fact that `best.size` equals the total number of candidate variables implies that the IC's penalty is too weak to select the correct variables.
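As a sanity check, the reported '41/76' can be reconstructed from the four cells of the confusion matrix above:

```python
# cells of the confusion matrix above: rows = estimated, cols = true
tn, fn = 999_915, 9   # estimated FALSE
fp, tp = 35, 41       # estimated TRUE

assert tn + fn + fp + tp == 1_000_000  # all p = 1e6 candidate variables
assert tp == 41                        # true positives (before the slash)
assert tp + fp == 76                   # selected support size (after the slash)
```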
The results above imply that using the true dimension (p=1e6) increases the penalty and thereby improves precision, except for AIC and BIC.
> Maybe slightly counterintuitive that `ic.scale` would work on all ICs except AIC - would it not be more logical to apply it to all? E.g. so that if `ic.scale` were set at 1/2 and `tune.type="aic"`, the penalisation would be half as strong as implied by AIC?
Thanks for your suggestion, we will align the behavior of AIC with other ICs soon.
from abess.
@tomwenseleers thanks for this question. We are working on it now. I'm not quite sure what the MBIC you mentioned is. Can you provide a reference?
from abess.
Many thanks for that! The specific version of mBIC I was using is the one cited in Frommlet & Nuel (2016), with their choice of c=4 (which is actually a hyperparameter):
https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0148620&type=printable
But it's a bit confusing, since there are several versions of mBIC available:
https://onlinelibrary.wiley.com/doi/epdf/10.1002/qre.936
https://link.springer.com/chapter/10.1007/978-3-642-29210-1_39
And there is still a whole zoo of other ICs that have been suggested, e.g. (hope I am getting these formulae right):

```r
hq   = min2LL + c*log(log(n))*edf          # Hannan-Quinn information criterion
ric  = min2LL + 2 * log(p) * edf           # risk inflation criterion
mric = min2LL + 2 * sum(log(p/(1:edf)))    # modified risk inflation criterion
cic  = min2LL + 4 * sum(log(p/(1:edf)))    # covariance inflation criterion
bicg = min2LL + log(n)*edf + 2*g*lchoose(p,round(edf))  # g = 1 suggested as default
# (https://www.proquest.com/openview/918b8b1efc7e0a0aa4d565ed54fa37dd/1?cbl=18750&diss=y&pq-origsite=gscholar)
bicq = min2LL + log(n)*edf - 2*edf*log(q/(1-q))
# (see Xu, C. and McLeod, A.I. (2009). Bayesian Information Criterion with Bernoulli Prior;
#  https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=cb9a4547704f6a401116e04263e0445feab10cba)
```
For aic, bic, gic, ebic & mbic I was using:

```r
aic  = min2LL + 2 * edf       # Akaike information criterion
# aicc = ifelse(n > edf, min2LL + 2*edf*n/(n-edf), NA)  # small-sample AIC
bic  = min2LL + log(n) * edf  # Bayesian information criterion
gic  = min2LL + log(p) * log(log(n)) * edf  # generalized information criterion; GIC = SIC in https://www.pnas.org/doi/10.1073/pnas.2014241117
ebic = min2LL + (log(n) + 2 * (1 - log(n) / (2 * log(p))) * log(p)) * edf
# extended BIC; Chen, J. and Chen, Z. (2008). Extended Bayesian information criterion for
# model selection with large model space. Biometrika, 95, 759-771; https://arxiv.org/abs/1107.2502
# (note the original has an additional tuning parameter)
mbic = min2LL + log(n * (p ^ 2) / 16) * edf  # see Frommlet & Nuel (2016)
```
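For intuition, the per-parameter penalties implied by the formulas above can be compared numerically. Notably, with this particular choice of the EBIC tuning parameter the EBIC penalty simplifies to 2*log(p) per parameter, and mBIC exceeds it by exactly log(n/16). A sketch with assumed n and p:

```python
import math

n, p = 500, 1_000_000  # assumed problem dimensions for illustration

# per-parameter penalties implied by the formulas above
aic  = 2
bic  = math.log(n)
gic  = math.log(p) * math.log(math.log(n))
ebic = math.log(n) + 2 * (1 - math.log(n) / (2 * math.log(p))) * math.log(p)
mbic = math.log(n * p**2 / 16)

# with this tuning choice EBIC collapses to 2*log(p) per parameter,
# and mBIC exceeds it by exactly log(n/16)
assert math.isclose(ebic, 2 * math.log(p))
assert math.isclose(mbic, ebic + math.log(n / 16))

# for these dimensions the penalties are increasingly strict
assert aic < bic < gic < ebic < mbic
```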
For gic several versions have also been suggested though, so gic as a name is in fact a little ambiguous.
No idea, though, which ones are now generally recommended to achieve either optimal predictive performance or optimal variable-selection consistency in the n>p or p>n setting...

For n > p, I use AIC when I am interested in optimal predictive performance (as optimising AIC is asymptotically equivalent to minimising leave-one-out cross-validation error; Stone, M. (1977). An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion. Journal of the Royal Statistical Society Series B, 39, 44-47) and BIC for optimal variable-selection consistency (Shao, J. (1997). An asymptotic theory for linear model selection. Statistica Sinica, 7, 221-242). For p > n, I was using mBIC for variable selection (with c=4, following Frommlet & Nuel), but I'm not sure what's best for optimal predictive performance when p >> n. Which ICs do you find generally perform best for either purpose? Some of the ICs also have a hyperparameter related to the actual number of variables that you think have nonzero coefficients, which also makes sense.
Maybe an easy way to support any of these ICs would be to allow for some argument `ic.factor`, which the user could set as a function of the true p and n of the original problem and any other hyperparameters, so that passing 2, log(n), or log(n * (p ^ 2) / 16) would correspond to AIC, BIC, or mBIC, etc. Maybe that could be used instead of the argument `ic.scale`, which I find a little ambiguous in terms of what exactly it does. That would address both the support of other alternative ICs and allow passing the correct penalization in case the variables have already been subsetted via another method.
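As a sketch of what such an `ic.factor` interface could compute (hypothetical; `ic.factor` is not an actual abess argument), the generic criterion would just be the deviance plus `ic.factor` times the effective degrees of freedom:

```python
import math

def generic_ic(min2LL, edf, ic_factor):
    """Hypothetical generic criterion: deviance plus a user-chosen per-parameter penalty."""
    return min2LL + ic_factor * edf

# example values (assumed purely for illustration)
min2LL, edf, n, p = 1234.5, 34, 500, 1_000_000

aic  = generic_ic(min2LL, edf, 2)                        # AIC
bic  = generic_ic(min2LL, edf, math.log(n))              # BIC
mbic = generic_ic(min2LL, edf, math.log(n * p**2 / 16))  # mBIC

assert aic < bic < mbic  # increasingly strict penalties for these n, p
```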
Note that the leave one out cross validation error can also be calculated analytically from the residuals and the diagonal of the hat matrix, without having to actually carry out any cross validation (see also https://www.efavdb.com/leave-one-out-cross-validation). This would always be an alternative to the AIC, as that is only an asymptotic approximation of the LOOCV error. I suppose that would also work for generalized linear models if one works on the adjusted z scale of the GLM and if one uses the working observation weights.
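The analytic LOOCV identity mentioned here (leave-one-out residual = e_i / (1 - h_ii), with h_ii the hat-matrix diagonal) is easy to verify numerically; a minimal sketch with numpy on a small random linear model:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design with intercept
y = X @ rng.normal(size=p + 1) + rng.normal(size=n)

# full-data fit, residuals, and hat-matrix diagonal
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
h = np.einsum('ij,ji->i', X, np.linalg.pinv(X))  # diag of X (X'X)^-1 X'

loocv_analytic = np.mean((resid / (1 - h)) ** 2)

# brute-force leave-one-out for comparison
errs = []
for i in range(n):
    mask = np.arange(n) != i
    b = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    errs.append((y[i] - X[i] @ b) ** 2)
loocv_brute = np.mean(errs)

# the analytic formula matches actual leave-one-out refitting
assert np.isclose(loocv_analytic, loocv_brute)
```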
from abess.
Ha, many thanks - great! So what are the numbers after the slash? The number of true positives is the number before the slash, but what is the second? Isn't 76 always the set considered by the algorithm, given that that's the number of MCP-preselected variables? Or are some variables kicked out from the very start by the algorithm, with that number depending on the penalization?
So would you say GIC works best based on this? But what were the numbers of false positives? I imagine those would be far too high with AIC, and maybe also with GIC?
Maybe slightly counterintuitive that `ic.scale` would work on all ICs except AIC - would it not be more logical to apply it to all? E.g. so that if `ic.scale` were set at 1/2 and `tune.type="aic"`, the penalisation would be half as strong as implied by AIC?
from abess.
Ha OK, makes sense! So AIC & BIC always provide insufficient penalisation in the high-dimensional case, even when working with the original problem size, while GIC here seems best, giving 34 true positives & 0 false positives when working with the original problem size! That's cool! It amazes me that you can pick out 34 true positives from a set of 1 million candidate variables with zero false positives, even when the effect size of many of the selected variables is relatively modest in terms of Cohen's d. If you use "gic" on the full dataset without MCP preselection, I noticed abess selects 33 true positives (0 false positives). So it seems both gic & ebic work here... That runs in 56 s on my laptop, but only if you specify support.size = c(1:76) - c(1:(n-1)) would be much, much slower... And if you specify the max support size of the MCP fit, one might of course just as well use those variables as the initial active set, or subset to them...
from abess.
I'm sorry that the experimental results were not clear enough and may have caused a misunderstanding.
TP/ best.size | p=76 | p=1000000 |
---|---|---|
aic | 41/76 | 41/76 |
bic | 41/76 | 41/76 |
gic | 41/76 | 34/34 |
ebic | 38/43 | 33/33 |
mbic | 41/76 | 33/33 |
So, AIC and BIC aren't able to provide sufficient penalties in this case. EBIC is the best one without correcting the value of p; after correcting, EBIC, GIC, and MBIC give better and similar results.
This implies the IC needs to be corrected after pre-screening. May I ask if there is anything else that needs to be discussed?
from abess.
No thanks that's clear! I'll close this then!
from abess.