Comments (14)
Because the loss
(here, RSE) is already returned for the model, I think you can compute the AIC:
set.seed(1)
# Simulating some data
N <- 530
M <- 730
x <- matrix(rnorm(N * M, sd = 5), N)
y <- rowSums(x[, 1:5]) + 8 * rnorm(N)
library(biglasso)
X <- as.big.matrix(x, shared = FALSE)
# Training
ind.train <- sample(N, 400)
mod <- biglasso(X, y, row.idx = ind.train)
lambda <- mod$lambda
n <- length(ind.train)
loss.train <- mod$loss / n
# over-fitting on training, use AIC instead
plot(lambda, loss.train, pch = 20)
k <- ncol(X) - mod$rejections ## add 1 for the intercept?
## AIC not correct, but works well
aic <- 2 * (k + loss.train)
plot(lambda, aic, pch = 20)
# Check on test set
ind.test <- setdiff(seq_len(N), ind.train)
preds <- predict(mod, X, row.idx = ind.test)
y.test <- y[ind.test]
loss.test <- apply(preds, 2, function(pred) {
mean((pred - y.test)^2)
})
plot(lambda, loss.train, pch = 20, ylim = c(0, max(aic)))
points(lambda, aic, pch = 20, col = "blue")
points(lambda, loss.test, pch = 20, col = "red")
from biglasso.
Many thanks - yes that will do it, perfect! Could still be nice though to formally define AIC and BIC methods... cheers & thanks again! Ha and would you know if implementing positivity and/or box constraints on the coefficients would be hard to implement in biglasso (see the other issue I posted a while ago)?
from biglasso.
Note that I'm not sure at all of the AIC formula.
You can search for the "real" ones in many posts:
- https://stats.stackexchange.com/a/277205/135793
- https://stats.stackexchange.com/a/261422/135793
- https://stats.stackexchange.com/a/87357/135793
- https://www.researchgate.net/post/What_is_the_AIC_formula
None of those works well on my example.
It would be easy to make an AIC method for the class biglasso, but we need to agree on some formula (for the linear and logistic regressions).
Also note that I'm not sure AIC/BIC can be used for this kind of models: https://en.wikipedia.org/wiki/Bayesian_information_criterion#Limitations
from biglasso.
For the gaussian case (together with glmnet) I was using the formulae used here:
https://github.com/gabrielrvsc/HDeconometrics/blob/master/R/ic.glmnet.R
As degrees of freedom it is using the effective degrees of freedom of the LASSO, which is the nr of nonzero coefficients for a given lambda (which is what the slot in glmnet $df is returning), see
https://projecteuclid.org/euclid.aos/1194461726. This would be correct for the LASSO but maybe not for elastic net or ridge (see below).
For the binomial case the recipe would be similar but would then use the log likelihood directly.
The tricky thing I am not sure of myself is how to calculate the effective degree of freedom for the elastic net in general (the ic.glmnet uses effective degrees of the LASSO for everything) - I think the correct formula is given here
https://stats.stackexchange.com/questions/327088/how-can-i-calculate-the-number-of-degrees-of-freedom-in-the-elastic-net-regulari
but I am not sure if this might have been implemented already somewhere or not...
For ridge regression an effective AIC and effective degrees of freedom can be calculated using the rms package, see
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/FHHandouts/iscb98.pdf
If biglasso would return a slot $df with a correct estimate of the effective degrees of freedom for LASSO, ridge or elastic net, then the formulae for AIC or BIC will of course be easy enough... Main thing though would be to get the effective nr of degrees of freedom correct...
from biglasso.
The slot $df
in glmnet is returning the number of non-zero coefficients (you said non-negative).
And do you know the formula used by the ncvreg package?
If we try to use that, it doesn't work well:
fit2 <- ncvreg(x[ind.train, ], y[ind.train], family = "gaussian",
penalty = "lasso", alpha = 1)
plot(AIC(fit2))
from biglasso.
Ha yes sorry that's what I meant!
The calculation of the log likelihood in ncvreg
is given in function
logLik.ncvreg <- function(object, REML=FALSE, ...) {
n <- as.numeric(object$n)
df <- predict(object, type="nvars") + 1
if (object$family=="gaussian") {
if (REML) rdf <- n-df
else rdf <- n
RSS <- object$loss
l <- -n/2 * (log(2*pi) + log(RSS) - log(rdf)) - rdf/2
df <- df + 1
} else if (object$family=="binomial") {
l <- -1*object$loss
} else if (object$family=="poisson") {
y <- object$y
ind <- y != 0
l <- -object$loss + sum(y[ind]*log(y[ind])) - sum(y) - sum(lfactorial(y))
}
val <- l
attr(val,"df") <- df
attr(val,"nobs") <- n
class(val) <- "logLik"
val
}
and the degrees of freedom are calculated as in glmnet, ie as the total nr of nonzero coefficients, which in ncvreg
is in the predict.R
file, line
if (type=="nvars") return(apply(beta!=0,2,sum))
I think using these degrees of freedom in the AIC/BIC calculation would only be correct for the LASSO though, not for elastic net in general or ridge...
from biglasso.
For me BIC
in ncvreg
for variable selection works fine though:
fit2 <- ncvreg(x, y, family = "gaussian",
penalty = "lasso", alpha = 1, nlambda=500)
# AIC doesn't work so well for variable selection (though it's not terrible either, just overfits a bit):
plot(AIC(fit2))
optlambda=fit2$lambda[which.min(AIC(fit2))]
plot(coef(fit2, lambda=optlambda),type="h",ylab="Estimated coefficients")
which(coef(fit2, lambda=optlambda)!=0) # not so good
# but BIC does:
plot(BIC(fit2))
optlambda=fit2$lambda[which.min(BIC(fit2))]
plot(coef(fit2, lambda=optlambda),type="h",ylab="Estimated coefficients")
which(coef(fit2, lambda=optlambda)!=0)
# (Intercept) V1 V2 V3 V4 V5
# 1 2 3 4 5 6
To get rid of the downward bias you could probably do adaptive LASSO instead, or just re-estimate the selected covariates using regular least squares... And using BIC with penalty="SCAD"
or "MCP"
also seems to work and also results in less biased coefficients.
from biglasso.
Can you format the code? It is difficult to read your answers right now.
from biglasso.
Done!
from biglasso.
This is already supported by biglasso
:
data(colon)
X <- as.big.matrix(colon$X)
fit <- biglasso(X, colon$y)
head(AIC(fit))
# 0.3022 0.2932 0.2844 0.276 0.2677 0.2598
# 88.53887 89.06932 87.65349 86.29061 84.97980 83.72012
> head(BIC(fit))
# 0.3022 0.2932 0.2844 0.276 0.2677 0.2598
# 92.79314 95.45072 94.03489 92.67201 91.36120 90.10152
@YaohuiZeng, it seems to me you can close this issue.
from biglasso.
The problem is that the path of AIC
/BIC
- is not smooth at all
- doesn't always have a minimum
from biglasso.
Not sure what you mean by "doesn't always have a minimum" (some value is guaranteed to be the smallest, right?). As for it not being smooth, that is certainly true, but I would argue that this is more of a fundamental limitation of AIC/BIC than a problem with the biglasso
package.
In my experience, this requires looking at a plot of the AIC/BIC results and using your judgment (see below). I've tried to come up with automated ways of choosing the best AIC/BIC/OtherIC, but it's hard to come up with something foolproof.
data(colon)
X <- as.big.matrix(colon$X)
fit <- biglasso(X, colon$y)
ll <- log(fit$lambda)
plot(ll, BIC(fit), xlim=rev(range(ll)), pch=19, las=1)
Or for something smoother:
ss <- smooth.spline(ll, BIC(fit))
plot(ss, xlim=rev(range(ll)), pch=19, las=1)
from biglasso.
What I meant is that for the examples I tested, AIC kept decreasing (so the minimum was the last value).
I'll try to test it again.
from biglasso.
OK, that's kind of what I thought. But that's a fundamental flaw of AIC -- it completely breaks down in p > n situations, and its use as a model selection criteria for penalized regression is not justified.
from biglasso.
Related Issues (20)
- Automatically detect and cast variables to expected type(s) HOT 4
- standardize = FALSE ?
- predict.cv.biglasso - address 0x2b2da5f69ae0, cause 'memory not mapped' HOT 2
- Dealing with Extremely large datasets (750 GB) HOT 6
- Bug in cv.biglasso() triggered by edge case HOT 1
- Split available cores in cv.biglasso() HOT 1
- Rigorous LASSO and causal inference HOT 1
- Disabling intercept HOT 1
- Warning: stack imbalance
- Error R_Reprotect: only x protected items, can't reprotect index y HOT 3
- row.idx doesn't work ?
- Default CV error contradictory documentation HOT 1
- Installation error on CRAN HOT 3
- Cross validation tests failing HOT 3
- Removed from CRAN HOT 1
- Update benchmarks
- .
- data storage mode affects model fit -- why??
- are NCV penalties working correctly? HOT 2
- bug in setupX? HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from biglasso.