Comments (7)

MattWenham avatar MattWenham commented on July 20, 2024 3

> @MattWenham @Weiniily hi, I have programmed a python version for this idea.

Many thanks, that's certainly code we can work with for our application 😊

from abess.

Yanhang129 avatar Yanhang129 commented on July 20, 2024 2

> Thanks for your question! Do you mean that there is no sparsity across groups, only sparsity within each group?

> As I understand your question, my initial question as posed assumes no sparsity across groups - i.e. all groups should be considered - but an extension could also be to allow sparsity across groups, e.g. choose 1 or N from each of M groups, perhaps.

Sorry for the delay. It took some time for us to discuss a possible solution. At present, this problem cannot be handled with a one-line command, but we can offer an alternative solution here.

Our idea is to select s variables in each group (suppose there are J groups) in a block-coordinate-wise manner. Specifically, we fix the selections of the other J-1 groups and use the splicing algorithm to obtain the optimal s variables for the first group. The same update is then applied to each of the remaining J-1 groups in turn. The iteration stops when the active set remains unchanged after updating all J groups.
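For intuition, the loop above can be sketched in plain NumPy, replacing the splicing solver with an exhaustive least-squares search over each group's candidates (`block_coordinate_select` is a hypothetical helper for illustration, not part of abess):

```python
import itertools
import numpy as np

def block_coordinate_select(x, y, group, s, max_iter=20):
    # Illustration of the block-coordinate idea: cycle through the groups,
    # and for each group re-pick its s variables by exhaustive search
    # (standing in for the splicing algorithm) while the other groups'
    # selected variables stay fixed; stop when no selection changes.
    labels = np.unique(group)
    active = {g: np.where(group == g)[0][:s] for g in labels}  # naive start
    for _ in range(max_iter):
        changed = False
        for g in labels:
            fixed = np.concatenate([active[h] for h in labels if h != g])
            best_rss, best_subset = np.inf, active[g]
            for subset in itertools.combinations(np.where(group == g)[0], s):
                cols = np.concatenate([np.array(subset), fixed])
                beta, *_ = np.linalg.lstsq(x[:, cols], y, rcond=None)
                rss = np.sum((y - x[:, cols] @ beta) ** 2)
                if rss < best_rss:
                    best_rss, best_subset = rss, np.array(subset)
            if not np.array_equal(best_subset, active[g]):
                active[g], changed = best_subset, True
        if not changed:  # active set unchanged after a full sweep
            break
    return np.sort(np.concatenate([active[g] for g in labels]))

# Small demonstration: 5 groups of 10 variables, 2 true signals per group.
rng = np.random.default_rng(1)
x = rng.standard_normal((100, 50))
beta = np.tile([1.0, -1.0] + [0.0] * 8, 5)
y = x @ beta + rng.standard_normal(100)
group = np.repeat(np.arange(5), 10)
print(block_coordinate_select(x, y, group, s=2))
```

In the real implementation the exhaustive inner search is replaced by abess's splicing algorithm, which scales to much larger groups.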

Here is an implementation of our idea:

library(abess)
library(Matrix)

abess.group <- function(x, y, group, s) {
  group.num <- length(unique(group))
  # Marginal screening: the top-s variables (by absolute correlation with y) in each group
  x.cor <- apply(x, 2, function(col) abs(cor(col, y)))
  gi <- unique(group)
  index <- match(gi, group)  # starting column of each group
  A1 <- tapply(x.cor, group, function(v) order(v, decreasing = TRUE)[1:s])
  A1 <- unlist(mapply(function(st, ord) st + ord - 1, index, A1, SIMPLIFY = FALSE))
  A1 <- sort(A1)
  # The iteration for block-wise best-subset selection:
  A <- numeric(length(A1))
  while (any(A1 != A)) {
    A <- A1
    for (i in 1:group.num) {
      x_temp <- cbind(
        x[, group == i],                 # the data of the i-th group
        x[, A1[-(((i-1)*s+1):(i*s))]]    # the selected variables in the remaining groups
      )
      res <- abess(
        x_temp, y,
        support.size = ((group.num-1)*s+1):(group.num*s),      # select s variables in the i-th group (leveraging warm starts)
        always.include = tail(1:ncol(x_temp), (group.num-1)*s) # keep the selected variables of the remaining groups
      )
      ind <- which(extract(res, support.size = group.num*s)$beta != 0)[1:s]
      A1[((i-1)*s+1):(i*s)] <- ind + index[i] - 1
    }
  }
  A1
}

We test this idea with the code below and find that it works well.

set.seed(1)
n <- 100
p <- 50
group <- rep(1:5, each = 10)
beta <- rep(c(1, -1, rep(0, 8)), times = 5)
x <- matrix(rnorm(n*p, 0, 1), n, p)
y <- x %*% beta + rnorm(n, 0, 1)
s <- 2 # Each group selects s(=2) variables
abess.group(x, y, group, s)

Mamba413 avatar Mamba413 commented on July 20, 2024 2

@MattWenham , to ease understanding, I have just modified the code above and I hope that helps. We are also willing to implement it with python, but it will take a few days.

Mamba413 avatar Mamba413 commented on July 20, 2024 1

@MattWenham @Weiniily hi, I have programmed a python version for this idea. Here is the code:

from abess.linear import LinearRegression
from abess.datasets import make_glm_data
from sklearn.feature_selection import r_regression
import numpy as np

def abess_group(x, y, group, s):
    group_label = np.unique(group)
    group_num = len(group_label)

    # Marginal screening: the top-s variables (by absolute correlation with y) in each group
    x_cor = np.abs(r_regression(x, y))
    A = []
    group_start_index = []
    for i in group_label:
        group_i_index = group == i
        group_start_index.append(np.where(group_i_index)[0].min())
        each_group_num = np.sum(group_i_index)
        A.extend(np.where(
            np.logical_and(group_i_index, x_cor >= np.sort(x_cor[group_i_index])[each_group_num - s])
        )[0].tolist())
    A = np.array(A)
    group_start_index = np.array(group_start_index)

    # The iteration for block-wise best-subset selection:
    model = LinearRegression()
    A1 = np.zeros(s * group_num, dtype=int)
    select_var_group = group_label.repeat(s)
    while np.any(A != A1):
        A1 = A.copy()
        for i in group_label:
            x_toselect = x[:, group == i]               # the data of the i-th group
            x_fixed = x[:, A[select_var_group != i]]    # the selected variables in the remaining groups
            candidate_num = x_toselect.shape[1]
            fixed_num = x_fixed.shape[1]
            x_fit = np.hstack([x_toselect, x_fixed])
            model.set_params(
                # keep the selected variables of the remaining groups in the model:
                always_select=np.arange(candidate_num, x_fit.shape[1]),
                # the support size counts the always-selected variables as well,
                # so fixed_num + s leaves exactly s slots for the i-th group:
                support_size=[fixed_num + s],
            )
            model.fit(x_fit, y)
            # update the s selected variables of the i-th group:
            in_group_support = np.nonzero(model.coef_[:candidate_num])[0]
            A[select_var_group == i] = in_group_support + group_start_index[i]
    return A

I ran a simple test of this implementation:

np.random.seed(1)
n = 100
p = 50
group = np.arange(5).repeat(10).astype(np.int32)
coef_ = np.array([1.0 if i % 10 in (0, 1) else 0.0 for i in range(p)])
dataset = make_glm_data(n=n, p=p, coef_=coef_, family='gaussian', k=10)
print('True:', np.where(coef_ != 0.0)[0])
est = abess_group(dataset.x, dataset.y, group, s=2)
print("Estimate:", est)

And the result recovers the truth exactly:

True: [ 0  1 10 11 20 21 30 31 40 41]
Estimate: [ 0  1 10 11 20 21 30 31 40 41]

Finally, I attach my python and OS info:

  • Python 3.9.7 with abess (version: 0.4.6)
  • OS: Mac, Apple M1, 13.2.1 (22D68)

I hope that helps your research and data analysis!

MattWenham avatar MattWenham commented on July 20, 2024

Many thanks, I will take a look at this in more detail soon. If you have a python version of this implementation, that would be very useful, otherwise I will work on a conversion if we feel it will be useful.
