Comments (7)
Thanks for your question! Do you mean that there is no sparsity across groups, only sparsity within each group?
As I understand it, my initial question as posed assumes no sparsity across groups, i.e. all groups should be considered, but an extension could also be to allow sparsity across groups, e.g. choosing 1 or N variables from each of M groups, perhaps.
Sorry for the delay. It took us some time to discuss a possible solution. At present, we cannot handle this problem with a one-line command, but we can give an alternative solution here.
Our idea is to select s variables in each group (suppose there are J groups) in a block-coordinate-wise manner. Specifically, we fix the selections of the other J-1 groups and use the splicing algorithm to obtain the optimal s variables for the first group. This update is repeated in turn for the remaining J-1 groups, and the iteration stops when the active set remains unchanged after updating all J groups.
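To make the target precise, here is one way to write it down (the notation is ours, not from the abess documentation). Writing $G_1, \dots, G_J$ for the groups, we aim to solve

$$\min_{\beta \in \mathbb{R}^p} \; \lVert y - X\beta \rVert_2^2 \quad \text{s.t.} \quad \lVert \beta_{G_j} \rVert_0 = s, \quad j = 1, \dots, J.$$

One block update then re-solves this for a single group $j$: with $A_{-j}$ denoting the variables currently selected in the other $J-1$ groups (always included, their coefficients re-estimated), it solves

$$\min_{\beta} \; \lVert y - X_{G_j}\beta_{G_j} - X_{A_{-j}}\beta_{A_{-j}} \rVert_2^2 \quad \text{s.t.} \quad \lVert \beta_{G_j} \rVert_0 = s.$$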
Here is an R implementation of our idea:
library(abess)
library(Matrix)
abess.group <- function(x, y, group, s) {
  n <- nrow(x)
  p <- ncol(x)
  group.num <- length(table(group))
  # Marginal screening for each group to find its top s variables
  x.cor <- apply(x, 2, function(x) abs(cor(x, y)))
  gi <- unique(group)
  index <- match(gi, group)  # the starting column of each group in x
  A1 <- tapply(x.cor, group, function(x) order(x, decreasing = TRUE)[1:s])
  A1 <- unlist(mapply(function(x, y) x + y - 1, index, A1, SIMPLIFY = FALSE))
  A1 <- sort(A1)
  # The iteration for block-wise best-subset selection:
  A <- numeric(length(A1))
  while (any(A1 != A)) {
    A <- A1
    for (i in 1:group.num) {
      x_temp <- cbind(
        x[, group == i],                       # the data of the i-th group
        x[, A1[-(((i - 1) * s + 1):(i * s))]]  # the selected variables in the remaining groups
      )
      res <- abess(
        x_temp, y,
        # to select s variables in the i-th group (leveraging warm-start):
        support.size = ((group.num - 1) * s + 1):(group.num * s),
        # to always include the selected variables of the remaining groups:
        always.include = tail(1:ncol(x_temp), (group.num - 1) * s)
      )
      ind <- which(extract(res, support.size = group.num * s)$beta != 0)[1:s]
      A1[((i - 1) * s + 1):(i * s)] <- ind + index[i] - 1
    }
  }
  return(A1)
}
We examined this idea with the code below and found that it works quite well.
set.seed(1)
n <- 100
p <- 50
group <- rep(1:5, each = 10)
beta <- rep(c(1, -1, rep(0, 8)), times = 5)
x <- matrix(rnorm(n*p, 0, 1), n, p)
y <- x %*% beta + rnorm(n, 0, 1)
s <- 2  # select s (= 2) variables from each group
abess.group(x, y, group, s)
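For reference, beta above places coefficients (1, -1) on the first two variables of each block of ten, so the true support is c(1, 2, 11, 12, 21, 22, 31, 32, 41, 42), and the call above should return exactly these indices.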
@MattWenham, to ease understanding, I have just modified the code above; I hope that helps. We are also willing to implement it in Python, but it will take a few days.
Many thanks, I will take a look at this in more detail soon. If you have a Python version of this implementation, that would be very useful; otherwise I will work on a conversion if we feel it will be useful.
@MattWenham @Weiniily Hi, I have programmed a Python version of this idea. Here is the code:
from abess.linear import LinearRegression
from abess.datasets import make_glm_data
from sklearn.feature_selection import r_regression
from sklearn.feature_selection import SelectFromModel
import numpy as np
def abess_group(x, y, group, s):
    n, p = x.shape
    group_label = np.unique(group)
    group_num = len(group_label)
    # Marginal screening for each group to find its top s variables:
    x_cor = np.abs(r_regression(x, y))
    A = []
    group_start_index = []
    for i in group_label:
        group_i_index = group == i
        group_start_index.append(np.where(group_i_index)[0].min())
        each_group_num = np.sum(group_i_index)
        A.extend(np.where(
            np.logical_and(group_i_index,
                           x_cor >= np.sort(x_cor[group_i_index])[each_group_num - s])
        )[0].tolist())
    A = np.array(A)
    group_start_index = np.array(group_start_index)
    # The iteration for block-wise best-subset selection:
    model = LinearRegression()
    A1 = np.zeros(s * group_num, dtype=int)
    select_var_group = group_label.repeat(s)  # the group each slot of A belongs to
    while np.any(A != A1):
        A1 = A.copy()
        for i in group_label:
            x_toselect = x[:, group == i]             # the data of the i-th group
            x_fixed = x[:, A[select_var_group != i]]  # the selected variables in the remaining groups
            candidate_num = x_toselect.shape[1]
            x_fit = np.hstack([x_toselect, x_fixed])
            model.set_params(
                # to always include the selected variables of the remaining groups:
                always_select=np.arange(candidate_num, x_fit.shape[1]),
                # to select s variables in the i-th group (leveraging warm-start):
                support_size=range(candidate_num, s + candidate_num),
            )
            model.fit(x_fit, y)
            # update the selected s variables of the i-th group:
            select_result_tmp = SelectFromModel(estimator=model, prefit=True).get_support()[:candidate_num]
            A[select_var_group == i] = np.where(select_result_tmp)[0] + group_start_index[i]
    return A
I tested this implementation with a simple example:
np.random.seed(1)
n, p = 100, 50
group = np.arange(5).repeat(10)  # 5 groups of 10 variables each
# true coefficients: the first two variables of each group are active
coef_ = np.array([1.0 if i % 10 in (0, 1) else 0.0 for i in range(p)])
dataset = make_glm_data(n=n, p=p, coef_=coef_, family='gaussian', k=10)
print('True:', np.where(coef_ != 0.0)[0])
est = abess_group(dataset.x, dataset.y, group, s=2)
print('Estimate:', est)
And the result exactly recovers the true support:
True: [ 0 1 10 11 20 21 30 31 40 41]
Estimate: [ 0 1 10 11 20 21 30 31 40 41]
Finally, I attach my Python and OS info:
- Python 3.9.7 with abess (version 0.4.6)
- OS: macOS 13.2.1 (22D68), Apple M1
I hope this helps your research and data analysis!
Many thanks, that's certainly code we can work with for our application 😊