Comments (7)
Thanks for your question! Do you mean that there is no sparsity across groups, only sparsity within each group?
As I understand it, my initial question as posed assumes no sparsity across groups, i.e. all groups should be considered, but an extension could also be to allow sparsity across groups, e.g. choosing 1 or N variables from each of M groups, perhaps.
Sorry for the delay. It took us some time to discuss a possible solution. At present, we cannot handle this problem with a one-line command, but we can give an alternative solution here.
Our idea is to select s variables in each group (suppose there are J groups) in a block-coordinate-wise manner. Specifically, we fix the selections of the other J-1 groups and use the splicing algorithm to obtain the optimal s variables for the first group. This update is repeated in turn for the remaining J-1 groups, and the iteration stops when the active set remains unchanged after updating all J groups.
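To make the target precise, here is one way to write it down (the notation is ours, not from the abess documentation). Writing $G_1, \dots, G_J$ for the groups, we aim to solve

$$\min_{\beta \in \mathbb{R}^p} \; \lVert y - X\beta \rVert_2^2 \quad \text{s.t.} \quad \lVert \beta_{G_j} \rVert_0 = s, \quad j = 1, \dots, J.$$

One block update then re-solves this for a single group $j$: with $A_{-j}$ denoting the variables currently selected in the other $J-1$ groups (always included, their coefficients re-estimated), it solves

$$\min_{\beta} \; \lVert y - X_{G_j}\beta_{G_j} - X_{A_{-j}}\beta_{A_{-j}} \rVert_2^2 \quad \text{s.t.} \quad \lVert \beta_{G_j} \rVert_0 = s.$$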
Here is an R implementation of our idea:
library(abess)
library(Matrix)
abess.group <- function(x, y, group, s) {
  n <- nrow(x)
  p <- ncol(x)
  group.num <- length(table(group))
  # Marginal screening for each group to find its top s variables
  x.cor <- apply(x, 2, function(x) abs(cor(x, y)))
  gi <- unique(group)
  index <- match(gi, group)  # the starting column of each group in x
  A1 <- tapply(x.cor, group, function(x) order(x, decreasing = TRUE)[1:s])
  A1 <- unlist(mapply(function(x, y) x + y - 1, index, A1, SIMPLIFY = FALSE))
  A1 <- sort(A1)
  # The iteration for block-wise best-subset selection:
  A <- numeric(length(A1))
  while (any(A1 != A)) {
    A <- A1
    for (i in 1:group.num) {
      x_temp <- cbind(
        x[, group == i],                       # the data of the i-th group
        x[, A1[-(((i - 1) * s + 1):(i * s))]]  # the selected variables in the remaining groups
      )
      res <- abess(
        x_temp, y,
        # to select s variables in the i-th group (leveraging warm-start):
        support.size = ((group.num - 1) * s + 1):(group.num * s),
        # to always include the selected variables of the remaining groups:
        always.include = tail(1:ncol(x_temp), (group.num - 1) * s)
      )
      ind <- which(extract(res, support.size = group.num * s)$beta != 0)[1:s]
      A1[((i - 1) * s + 1):(i * s)] <- ind + index[i] - 1
    }
  }
  return(A1)
}
We examined this idea with the code below and found that it works quite well.
set.seed(1)
n <- 100
p <- 50
group <- rep(1:5, each = 10)
beta <- rep(c(1, -1, rep(0, 8)), times = 5)
x <- matrix(rnorm(n*p, 0, 1), n, p)
y <- x %*% beta + rnorm(n, 0, 1)
s <- 2  # select s (= 2) variables from each group
abess.group(x, y, group, s)
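For reference, beta above places coefficients (1, -1) on the first two variables of each block of ten, so the true support is c(1, 2, 11, 12, 21, 22, 31, 32, 41, 42), and the call above should return exactly these indices.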
@MattWenham, to ease understanding, I have just modified the code above; I hope that helps. We are also willing to implement it in Python, but it will take a few days.
Many thanks, I will take a look at this in more detail soon. If you have a Python version of this implementation, that would be very useful; otherwise I will work on a conversion if we feel it will be useful.
@MattWenham @Weiniily Hi, I have programmed a Python version of this idea. Here is the code:
from abess.linear import LinearRegression
from abess.datasets import make_glm_data
from sklearn.feature_selection import r_regression
from sklearn.feature_selection import SelectFromModel
import numpy as np
def abess_group(x, y, group, s):
    n, p = x.shape
    group_label = np.unique(group)
    group_num = len(group_label)
    # Marginal screening for each group to find its top s variables:
    x_cor = np.abs(r_regression(x, y))
    A = []
    group_start_index = []
    for i in group_label:
        group_i_index = group == i
        group_start_index.append(np.where(group_i_index)[0].min())
        each_group_num = np.sum(group_i_index)
        A.extend(np.where(
            np.logical_and(group_i_index,
                           x_cor >= np.sort(x_cor[group_i_index])[each_group_num - s])
        )[0].tolist())
    A = np.array(A)
    group_start_index = np.array(group_start_index)
    # The iteration for block-wise best-subset selection:
    model = LinearRegression()
    A1 = np.zeros(s * group_num, dtype=int)
    select_var_group = group_label.repeat(s)  # the group each slot of A belongs to
    while np.any(A != A1):
        A1 = A.copy()
        for i in group_label:
            x_toselect = x[:, group == i]             # the data of the i-th group
            x_fixed = x[:, A[select_var_group != i]]  # the selected variables in the remaining groups
            candidate_num = x_toselect.shape[1]
            x_fit = np.hstack([x_toselect, x_fixed])
            model.set_params(
                # to always include the selected variables of the remaining groups:
                always_select=np.arange(candidate_num, x_fit.shape[1]),
                # to select s variables in the i-th group (leveraging warm-start):
                support_size=range(candidate_num, s + candidate_num),
            )
            model.fit(x_fit, y)
            # update the selected s variables of the i-th group:
            select_result_tmp = SelectFromModel(estimator=model, prefit=True).get_support()[:candidate_num]
            A[select_var_group == i] = np.where(select_result_tmp)[0] + group_start_index[i]
    return A
I tested this implementation with a simple example:
np.random.seed(1)
n, p = 100, 50
group = np.arange(5).repeat(10)  # 5 groups of 10 variables each
# true coefficients: the first two variables of each group are active
coef_ = np.array([1.0 if i % 10 in (0, 1) else 0.0 for i in range(p)])
dataset = make_glm_data(n=n, p=p, coef_=coef_, family='gaussian', k=10)
print('True:', np.where(coef_ != 0.0)[0])
est = abess_group(dataset.x, dataset.y, group, s=2)
print('Estimate:', est)
And the result exactly recovers the true support:
True: [ 0 1 10 11 20 21 30 31 40 41]
Estimate: [ 0 1 10 11 20 21 30 31 40 41]
Finally, I attach my Python and OS info:
- Python 3.9.7 with abess (version 0.4.6)
- OS: macOS 13.2.1 (22D68), Apple M1
I hope this helps your research and data analysis!
Many thanks, that's certainly code we can work with for our application 😊