abess-team / abess
Fast Best-Subset Selection Library
Home Page: https://abess.readthedocs.io/
License: Other
When I run the following code in the terminal, I get the error "zsh: illegal hardware instruction python":
Python 3.9.7 (default, Sep 16 2021, 08:50:36)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from abess.linear import abessLogistic
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.metrics import make_scorer, roc_auc_score
>>> from sklearn.preprocessing import PolynomialFeatures
>>> from sklearn.model_selection import GridSearchCV
>>> pipe = Pipeline([('poly', PolynomialFeatures(include_bias=False)), ('alogistic', abessLogistic())])
>>> param_grid = {'poly__interaction_only': [True, False],'poly__degree': [1, 2, 3]}
>>> scorer = make_scorer(roc_auc_score, greater_is_better=True)
>>> grid_search = GridSearchCV(pipe, param_grid, scoring=scorer, cv=5)
>>> X, y = load_breast_cancer(return_X_y=True)
>>> grid_search.fit(X, y)
Dunstan et al., in "Easy computation of the Bayes factor to fully quantify Occam’s razor in least-squares fitting and to guide actions", present "an easy way of calculating [the Bayes Factor] so that it can be routinely used with all least-squares fitting to complement and augment other figures of merit".
Would it be possible to include these calculations as an alternative ic_type?
Describe the bug
After installing v0.4.6, any call to the abess library causes a segmentation fault.
The same simple script with v0.4.5 executes normally.
Code for Reproduction
from abess.linear import LinearRegression
from abess.datasets import make_glm_data
import numpy as np
np.random.seed(12345)
data = make_glm_data(n = 100, p = 50, k = 10, family = 'gaussian')
model = LinearRegression(support_size = 10)
model.fit(data.x, data.y)
print(model.predict(data.x)[:4])
Using v0.4.6 results in the following error:
Expected behavior
I expected the predictions to be printed without a segmentation fault.
Example run with v0.4.5:
Desktop (please complete the following information):
abess would be even more helpful if make_glm_data and make_multivariate_glm_data could support simulating datasets with an exponential correlation structure. This structure is widely considered in recent literature (e.g. https://arxiv.org/pdf/2104.12576.pdf).
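A minimal sketch of what such a generator could look like, assuming "exponential correlation" means a covariance with entries rho^|i-j|. The function names exponential_correlation and simulate_design are hypothetical and are not part of abess.

```python
import numpy as np

def exponential_correlation(p, rho):
    """Return the p x p matrix whose (i, j) entry is rho ** |i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate_design(n, p, rho, seed=None):
    """Draw n rows from N(0, Sigma) with Sigma = exponential_correlation(p, rho)."""
    rng = np.random.default_rng(seed)
    sigma = exponential_correlation(p, rho)
    return rng.multivariate_normal(np.zeros(p), sigma, size=n)

sigma = exponential_correlation(4, 0.5)
x = simulate_design(n=200, p=4, rho=0.5, seed=0)
```

The resulting x could then stand in for the independent design that the existing generators draw before the response is simulated.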
Describe the bug
Why does the data generator function make_glm_data define n shape parameters for a single data set when the family is gamma?
elif family == "gamma":
    x = x / 16
    m = 5 * np.sqrt(2 * np.log(p) / n)
    if coef_ is None:
        Tbeta[nonzero] = np.random.uniform(m, 100 * m, k) * sign
    else:
        Tbeta = coef_
    # add noise
    eta = x @ Tbeta + np.random.normal(0, sigma, n)
    # set coef_0 to make eta<0
    eta = eta - np.abs(np.max(eta)) - 10
    eta = -1 / eta
    # set the shape para of gamma uniformly in [0.1,100.1]
    shape_para = 100 * np.random.uniform(0, 1, n) + 0.1
    y = np.random.gamma(
        shape=shape_para,
        scale=eta / shape_para,
        size=n)
Additional context
Would it be more sensible for all observations in a data set to share the same shape parameter?
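The two variants can be compared side by side in a small sketch. Here mu is a stand-in for the per-sample mean (eta in the excerpt) and common_shape is a hypothetical value chosen only for illustration. In both cases shape * scale equals mu, so the means agree and only the variance structure differs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
mu = rng.uniform(0.5, 2.0, size=n)  # stand-in for the per-sample mean

# Variant from the excerpt: one shape value per observation.
shape_per_obs = 100 * rng.uniform(0, 1, n) + 0.1
y_varying = rng.gamma(shape=shape_per_obs, scale=mu / shape_per_obs, size=n)

# Variant proposed in the question: one shape shared by every observation.
common_shape = 5.0  # hypothetical value
y_common = rng.gamma(shape=common_shape, scale=mu / common_shape, size=n)
```

With a shared shape the variance of each observation is mu**2 / common_shape, which is the usual gamma GLM assumption of a single dispersion parameter.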
When I import abess in my Python file, I see the following error:
Traceback (most recent call last):
  File "c:\Users\igork\code\best_subset.py", line 1, in <module>
    from abess.linear import abessLm
  File "C:\Users\igork\AppData\Roaming\Python\Python39\site-packages\abess\__init__.py", line 9, in <module>
    from abess.linear import abessLogistic, abessLm, abessCox, abessPoisson, abessMultigaussian, abessMultinomial, abessGamma
  File "C:\Users\igork\AppData\Roaming\Python\Python39\site-packages\abess\linear.py", line 3, in <module>
    from .bess_base import bess_base
  File "C:\Users\igork\AppData\Roaming\Python\Python39\site-packages\abess\bess_base.py", line 6, in <module>
    from .cabess import *
  File "C:\Users\igork\AppData\Roaming\Python\Python39\site-packages\abess\cabess.py", line 13, in <module>
    from . import _cabess
ImportError: DLL load failed while importing _cabess: The specified module could not be found.
I tried installing MinGW-w64 and adding it to PATH as described in #259, but the problem persists.
abess was installed via pip install; the Python version is 3.9.6.
Can you help me?
Describe the bug
In my experiments, after updating abess from 0.4.0 to 0.4.5, I found that the cv procedure gets slower in some cases. The following code provides an example.
Code for Reproduction
library(microbenchmark)
library(abess)
n <- 3000
p <- 500
support.size <- 10
sim_once <- function(seed) {
  dataset <- generate.data(n, p, support.size, family = "binomial", seed = seed)
  time_cv <- microbenchmark(
    abess_fit <- abess(dataset[["x"]], dataset[["y"]], family = "binomial", tune.type = "cv", nfolds = 10),
    times = 1
  )[["time"]] / 10^9
  time_cv
}
# average time
time <- sapply(1:5, sim_once)
mean(time)
Is your feature request related to a problem? Please describe.
I find that the data generators always give positive coef_, as described on this page. Could you support random negative coefficients too?
Describe the solution you'd like
coef_ contains both positive and negative values.
If we wish to use always_select to select an entire group of targets, how should this be done? I can imagine two possibilities:
always_select to all members of the group, or
always_select to any member of the group.
Which of these should work correctly?
I've encountered a strange issue: abess() does not terminate in a specific situation. The following code is a reproducible example. It runs for at least 10 minutes without terminating. However, with support.size = 0:13 or support.size = 14, it terminates immediately (perhaps within 1 second). Moreover, with tune.type = "gic" the issue does not occur, which really confuses me.
The version of abess is 0.4.7 (installed from CRAN). I've tested the code on two different Linux systems and hit the same issue on both.
library(abess)
seed <- 1
n <- 100
p <- 1000
family <- "poisson"
snr <- Inf
beta <- rep(0, p)
nonzero <- sample(1:p, 10)
beta[nonzero] <- c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5)
k <- 10
data <- generate.data(n, p, beta = beta, snr = snr, family = family, support.size = k, seed = seed)
x <- data$x
y <- data$y
abess(x, y, tune.type = "cv", family = "poisson", support.size = 0:14)
In an HPC environment (Intel, Linux), even a very simple script gives Illegal Instruction errors. The following suffices to trigger the error:
import abess
import numpy as np
model = abess.LinearRegression()
model.fit(np.random.rand(2,2), np.random.rand(2))
However, this occurs only with the latest abess binary from conda-forge. If I use pip to install the package, everything runs smoothly.
Describe the bug
In my experiment, using abess.linear.LogisticRegression in cross_validate returns a negative test_score. The following code provides an example.
Code for Reproduction
Fit LogisticRegression to samples on Hypersphere(dim=9) embedded in 10-dimensional Euclidean space (without applying the logarithm map).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_validate
from geomstats.geometry.hypersphere import Hypersphere
from abess import LogisticRegression
sphere = Hypersphere(dim=9)
labels = np.concatenate((np.zeros(1000),np.ones(1000)))
data0 = sphere.random_riemannian_normal(mean=np.array([1/3,0,2/3,0,2/3,0,0 ,0,0 ,0]), n_samples=1000, precision=5)
data1 = sphere.random_riemannian_normal(mean=np.array([0 ,0,0 ,0,2/3,0,2/3,0,1/3,0]), n_samples=1000, precision=5)
data = np.concatenate((data0,data1))
train_data, test_data, train_labels, test_labels = train_test_split(data, labels, test_size=0.33, random_state=0)
result = cross_validate(LogisticRegression(support_size=range(0, 11)), train_data, train_labels)
print(result)
return:
{'fit_time': array([0.01018214, 0.0107348 , 0.00791812, 0.00899959, 0.00998664]), 'score_time': array([0.0010004, 0. , 0.0010004, 0. , 0. ]), 'test_score': array([-96.61424923, -94.28383166, -91.52310614, -95.26857002,
-87.02766473])}
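One possible reading of these numbers, stated here as an assumption rather than confirmed behaviour: when no scoring is passed, cross_validate uses the estimator's own score method, and a score defined as a log-likelihood can never be positive. The sketch below uses plain NumPy and made-up probabilities, not abess, to show how the same predictions give a negative log-likelihood and a perfectly ordinary accuracy.

```python
import numpy as np

def log_likelihood(y_true, prob):
    """Sum of Bernoulli log-likelihoods; the result is always <= 0."""
    prob = np.clip(prob, 1e-12, 1 - 1e-12)
    return float(np.sum(y_true * np.log(prob) + (1 - y_true) * np.log(1 - prob)))

def accuracy(y_true, prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    return float(np.mean((prob >= threshold) == (y_true == 1)))

y_true = np.array([0, 0, 1, 1, 1])
prob = np.array([0.2, 0.4, 0.7, 0.9, 0.6])  # illustrative predicted P(y = 1)

ll = log_likelihood(y_true, prob)
acc = accuracy(y_true, prob)
```

If that reading is right, passing scoring="accuracy" to cross_validate would report accuracy instead of the estimator's default score.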
Describe the bug
I was testing abess on the multitask learning problem below (deconvolution of a multichannel signal using a known point spread function / blur kernel, by regressing the multichannel signal on shifted copies of the point spread function), but I am experiencing poor performance: the best subset is inferred to be much larger than it actually is (support size of best subset = 157, while the true support size of the simulated dataset is 50). Is there a way to resolve this?
Does this have to do with the fact that abess currently does not support nonnegativity constraints (all my simulated model coefficients are positive)? Is there any way to allow for nonnegativity or box constraints? Even if the constraints are not taken into account during fitting, it could be handy if the coefficients could be clipped to the allowed range, so that the information criteria would at least indicate the correct support size of the best allowed model. Taking the constraints into account during fitting would of course be even better.
I was also interested in fitting a multitask identity-link Poisson model. As only the Poisson log link is currently incorporated, I could potentially still get something close to that by using an observation weights matrix 1/(Y+0.1), which would be approximately 1/Poisson variance weights, together with family="mgaussian". Is that allowed by any chance? Can observation weights be a matrix with family="mgaussian", to allow for different observation weights per outcome channel/task? If not, could this be allowed for?
Code for Reproduction
Paste your code for reproducing the bug:
library(remotes)
remotes::install_github("tomwenseleers/L0glm/L0glm")
library(L0glm)
# simulate blurred multichannel spike train
set.seed(1)
s <- 0.1 # sparsity (% of timepoints where there is a peak)
p <- 500 # 500 variables
# simulate multichannel blurred spike train with Gaussian noise
sd_noise <- 1
sim <- simulate_spike_train(n=p,
                            p=p,
                            k=round(s*p), # true support size = 0.1*500 = 50
                            mean_beta = 10000,
                            sd_logbeta = 1,
                            family="gaussian",
                            sd_noise = sd_noise,
                            multichannel=TRUE, sparse=TRUE)
X <- sim$X # covariate matrix with shifted copies of point spread function, n x p matrix
Y <- sim$y # multichannel signal (blurred spike train), n x m matrix
colnames(X) = paste0("x", 1:ncol(X)) # NOTE: if colnames of X and Y are not set abess gives an error message, maybe fix this?
colnames(Y) = paste0("y", 1:ncol(Y))
true_coefs <- sim$beta_true # true coefficients
m <- ncol(Y) # nr of tasks
n <- nrow(X) # nr of observations
p <- ncol(X) # nr of independent variables (shifted copies of point spread functions)
W <- 1/(Y+0.1) # approx 1/variance Poisson observation weights with family="poisson", n x m matrix
# best subset multitask learning using family="mgaussian" using abess
abess_fit <- abess(x = X,
                   y = Y,
                   # weights = 1/(Y+0.1), # QUESTION: if I use family="poisson" above I would like to be able to use this matrix as approx. 1/Poisson variance observation weights to be able to fit identity link poisson using family="mgaussian" in abess - is this possible/allowed ?
                   family = "mgaussian",
                   tune.path = "sequence",
                   support.size = c(1:200),
                   lambda=0,
                   warm.start = TRUE,
                   tune.type = "gic") # or cv or ebic or bic or aic
plot(abess_fit, type="tune") # optimal support size would come out at 150 when in fact it is 50 - I presume because my coefficients are constrained to be all positive, and abess currently does not allow for nonnegativity or box constraints?
beta_abess = as.matrix(extract(abess_fit)$beta) # coefficient matrix for best subset
library(qlcMatrix)
image(x=1:nrow(sim$y), y=1:ncol(sim$y), z=beta_abess^0.1, col = topo.colors(255),
      useRaster=TRUE,
      xlab="Time", ylab="Channel", main="abess multitask mgaussian (red=true peaks)")
abline(v=(1:nrow(sim$X))[as.vector(rowMax(sim$beta_true)!=0)], col="red")
abline(v=(1:nrow(sim$X))[as.vector(rowMin(beta_abess)!=0)], col="cyan")
sum(rowMax(sim$beta_true)!=0) # 50 true peaks
sum(rowMin(beta_abess)!=0) # 157 peaks detected
Expected behavior
Couple of questions here:
Desktop (please complete the following information):
The loss function seems incorrect in the first section of the online tutorial. Should the 2-norm of matrices be changed to the Frobenius norm instead? Both for R & Python tutorials.
When the input matrix X contains a constant column, the LinearRegression() class in the abess package predicts nan instead of estimated values, which is what the scikit-learn class LassoCV() returns. One way to avoid this is to set the parameter is_normal=False; however, this is not what users expect, nor how scikit-learn works. Since I have run into this many times, I wonder whether you could improve this API. The following code illustrates the case concisely:
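The snippet itself is missing from this page. One plausible mechanism, assumed here rather than checked against the abess source, is that normalising a constant column divides zero by a zero standard deviation. The NumPy sketch below reproduces that effect and shows one common workaround. The helper safe_standardize is hypothetical.

```python
import numpy as np

x = np.column_stack([np.ones(5), np.arange(5.0)])  # first column is constant

# Naive standardisation: the constant column becomes 0 / 0 = nan.
with np.errstate(divide="ignore", invalid="ignore"):
    naive = (x - x.mean(axis=0)) / x.std(axis=0)

def safe_standardize(x):
    """Centre and scale, leaving zero-variance columns at zero instead of nan."""
    sd = x.std(axis=0)
    sd_safe = np.where(sd == 0, 1.0, sd)  # avoid dividing by zero
    return (x - x.mean(axis=0)) / sd_safe

safe = safe_standardize(x)
```

Replacing a zero scale by one keeps the constant column at exactly zero after centring, so it can never be selected and never contaminates the predictions.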
I want to compute GIC to select the true model, but I get different results from the abess package and from a manual calculation.
set.seed(2)
p = 250
N = 2500
X = matrix(rnorm(N * p), ncol = p)
A = sort(sample(p, 10))
beta = rep(0, p)
beta = replace(beta, A, rnorm(10, mean = 6))
xbeta <- X %*% beta
Y <- xbeta + rnorm(N)
Compute the estimator with the abess package:
C = abess(X, Y, family = "gaussian", tune.path="sequence",tune.type = "gic")
k = C$best.size
mid=coef(abess(X, Y, family = "gaussian",support.size =k))
Central =mid[2:(p+1)]
intercept=mid[1]
#compute GIC[10]=131.3686
GIC= N*log(1/(2*N)*t(Y-X%*%Central-intercept)%*%(Y-X%*%Central-intercept))+k*log(p)*(log(log(N)))
#GIC=-1601.499
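For reference, the manual formula in the R line above can be transcribed into Python as follows. This is only a transcription of the question's own expression, N * log(RSS / (2N)) + k * log(p) * log(log(N)). It makes no claim about which normalisation abess uses internally, and the function name manual_gic is hypothetical.

```python
import numpy as np

def manual_gic(X, y, coef, intercept, support_size):
    """GIC as written in the question: n*log(RSS/(2n)) + k*log(p)*log(log(n))."""
    n, p = X.shape
    resid = y - X @ coef - intercept
    loss = float(resid @ resid) / (2 * n)
    return n * np.log(loss) + support_size * np.log(p) * np.log(np.log(n))

# Tiny worked case: residuals are (1, -1, 1, -1), so RSS = 4 and loss = 0.5.
X_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
coef_demo = np.array([1.0, 1.0])
y_demo = X_demo @ coef_demo + np.array([1.0, -1.0, 1.0, -1.0])
gic_demo = manual_gic(X_demo, y_demo, coef_demo, 0.0, support_size=2)
```

Comparing such a value with the one reported by the package for the same support size is a quick way to see whether the two differ by a constant, a scale factor, or something else.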
Describe the bug
Unexpected NaN values appear in the regression coefficients when fitting logistic regression on real data.
Code for Reproduction
library(abess)
x_train <- as.matrix(read.table("arcene_train.txt"))
y_train <- (read.table("arcene_train_labels.txt")$V1 + 1) / 2
fit_abess <- abess(x_train, y_train, family = "binomial")
beta <- coef(fit_abess, fit_abess$best.size)
sum(is.nan(as.vector(beta)))
Issue
The coefficient contains NaNs, which is not expected, since there is no NA or NaN in the data, which can be easily checked via
sum(is.na(x_train))
sum(is.na(y_train))
sum(is.nan(x_train))
sum(is.nan(y_train))
Desktop
Dataset
When I import abess in any Python version I try (3.5, 3.6, 3.7), they all report the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\envs\cp3710\lib\site-packages\abess\__init__.py", line 9, in <module>
    from abess.linear import abessLogistic, abessLm, abessCox, abessPoisson, abessMultigaussian, abessMultinomial
  File "C:\ProgramData\Anaconda3\envs\cp3710\lib\site-packages\abess\linear.py", line 3, in <module>
    from .bess_base import bess_base
  File "C:\ProgramData\Anaconda3\envs\cp3710\lib\site-packages\abess\bess_base.py", line 7, in <module>
    from abess.cabess import pywrap_abess
  File "C:\ProgramData\Anaconda3\envs\cp3710\lib\site-packages\abess\cabess.py", line 13, in <module>
    from . import _cabess
ImportError: DLL load failed: The specified module could not be found.
I have tried everything I could find on the web, but nothing works. Could anyone help me?
I tried to install the latest version of the abess Python package, following the guide at https://abess.readthedocs.io/en/latest/Installation.html, but the error "Error: invalid register for .seh_savexmm" occurs.
My system information:
Platform Version: Windows-10-10.0.22000-SP0, 64bit
Python Version: 3.7.4
CPU info: Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
Package Version: 0.4.0
Hello, I am doing some real data analysis with a high-dimensional Cox model. My real dataset has shape 240*7000. When I use abess.CoxPHSurvivalAnalysis() with cv, it does not select any feature, so I have to apply screening before running abess for the Cox model. I also ran a simulation test of the screening method alone in the abess package and found that the screened set does not contain all the true features generated by make_glm_data. So I have doubts about the screening algorithm in this package and hope you can improve it. Thank you!
Describe the bug
The result of the Gamma model is wrong. With approximate Newton, the estimate is always the all-zero vector; with exact Newton, the result is completely wrong.
Code for Reproduction
Here is the R code:
n <- 10000
p <- 5
support.size <- 3
dataset <- generate.data(n, p, support.size, family = "gamma", seed = 1)
approx_fit <- abess(
  dataset[["x"]],
  dataset[["y"]],
  family = "gamma",
  newton = "approx",
)
exact_fit <- abess(
  dataset[["x"]],
  dataset[["y"]],
  family = "gamma",
  newton = "exact",
)
print("true_coef: ")
print(dataset$beta)
print("approx newton est_coef: ")
print(approx_fit$beta[,support.size])
print("exact newton est_coef: ")
print(exact_fit$beta[,support.size])
Result:
[1] "true_coef: "
[1] 0.000000 0.000000 3.069073 6.725235 7.974553
[1] "approx newton est_coef: "
x1 x2 x3 x4 x5
0 0 0 0 0
[1] "exact newton est_coef: "
x1 x2 x3 x4 x5
2.159370e+01 0.000000e+00 5.188777e-13 0.000000e+00 0.000000e+00
Here is the Python code:
import abess
import numpy as np
np.random.seed(1)
data = abess.make_glm_data(n=10000, p=5, k=3, family="gamma")
model1 = abess.GammaRegression(support_size = 3, approximate_Newton = False)
model1.fit(data.x, data.y)
model2 = abess.GammaRegression(support_size = 3, approximate_Newton = True)
model2.fit(data.x, data.y)
print("true_coef: ",data.coef_)
print("approx newton est_coef: ",model2.coef_)
print("exact newton est_coef: ",model1.coef_)
Results:
true_coef: [ 1.47594114 6.66687502 -2.85407881 0. 0. ]
approx newton est_coef: [0. 0. 0. 0. 0.]
exact newton est_coef: [ 0.00000000e+00 0.00000000e+00 -2.10497802e-34 -1.93607800e-35 1.82703568e-34]
Desktop (please complete the following information):
That is
BayesSUR KSgeneral L0Learn MrSGUIDE RPEGLMEN SCAT abess clustermq diseq
gaselect imager landsepi rayrender
which specify a C++ standard, usually C++11, but do not use it in their
configure script. See
https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Using-C_002b_002b-code
L0Learn already has a deadline for another issue: for the rest please
correct before 2023-02-18.
Note that in the cases we have looked at no C++ specification is
actually needed: the current default of C++14, soon to be C++17, works.
(And C++11 has been the default since R 3.6.2 in 2019.) So please try
removing it.
--
Brian D. Ripley, [email protected]
Emeritus Professor of Applied Statistics, University of Oxford
When I want to use abess for multi-class classification, I use abessMultinomial and run the example code:
from abess.linear import abessMultinomial
from abess.datasets import make_multivariate_glm_data
import numpy as np
np.random.seed(12345)
data = make_multivariate_glm_data(n = 100, p = 50, k = 10, M = 3, family = 'multinomial')
model = abessMultinomial(support_size = [10])
model.fit(data.x, data.y)
model.predict(data.x)
I get the int 47, but I think it should return an array of the classes, such as:
array([[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 1., 0.],
[0., 1., 0.],
[1., 0., 0.],
[0., 1., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 1., 0.],
I'm so sorry to bother you again, and thanks a lot for your help.
Describe the bug
All tune types go wrong in the PCA model, including "gic", "aic", "bic", "ebic" and "cv". Specifically, all information criterion methods return 0, and the result of the "cv" method decreases monotonically as support_size increases, so it is useless for selecting support_size.
Code for Reproduction
n <- 10000
p <- 5
support_size <- 3
dataset <- generate.spc.matrix(n, p, support_size, snr = 100, seed = 1)
for(ic_type in c("gic", "aic", "bic", "ebic")){
  spca_fit <- abesspca(dataset[["x"]], tune.type = ic_type)
  if(all(spca_fit[["tune.value"]] == 0)){
    print(sprintf("tune.value of %s is all zero!", ic_type))
  }
}
spca_fit2 <- abesspca(dataset[["x"]], tune.type = "cv")
if(!is.unsorted(-spca_fit2[["tune.value"]])){
  print("tune.value of cv is sorted!")
}
Results:
[1] "tune.value of gic is all zero!"
[1] "tune.value of aic is all zero!"
[1] "tune.value of bic is all zero!"
[1] "tune.value of ebic is all zero!"
[1] "tune.value of cv is sorted!"
Desktop (please complete the following information):
Describe the bug
abess::abess() from the R package fails when there is only one feature.
Code for Reproduction
library(abess)
library(data.table)
x = data.table(x = sample(100, size = 100))
y = factor(sample(c("a", "b", "c"), size = 100, replace = TRUE))
abess(x = x, y = y, family = "multinomial")
#> Error in Matrix::Matrix(x[, -y_dim], sparse = TRUE, dimnames = list(vn, : length of 'dimnames' [1] not equal to array extent
Created on 2023-03-01 with reprex v2.0.2
Version: ‘0.4.7’
This did not happen in previous releases (you broke the CI of mlr3extralearners)
I use an abess learner in DoubleMLPLR, but there are some warnings. The following is my code:
import numpy as np
from doubleml import DoubleMLData, DoubleMLPLR
from sklearn.base import clone
from abess.linear import LinearRegression
n_obs = 500
n_vars = 100
theta = 3
X = np.random.normal(size=(n_obs, n_vars))
d = np.dot(X[:, :3], np.array([5, 5, 5])) + np.random.standard_normal(size=(n_obs,))
y = theta * d + np.dot(X[:, :3], np.array([5, 5, 5])) + np.random.standard_normal(size=(n_obs,))
dml_data_sim = DoubleMLData.from_arrays(X, y, d)
abess = LinearRegression(cv = 5)
ml_g_abess = clone(abess)
ml_m_abess = clone(abess)
ml_plr_abess = DoubleMLPLR(dml_data_sim, ml_g_abess, ml_m_abess)
ml_plr_abess.fit()
After running the code, there will be warnings like "Learner provided for ml_g is probably invalid".
In my fork, the development branch "bbayukari/abess/develop-ordinal" worked well before, but the new version that I want to submit as a PR upstream, "bbayukari/abess/ordinal", does not work (not only my part: no model can find the API). In the new version the C++ code is located in abess/src rather than inside R-package; that is the only difference between the two versions.
After cloning "bbayukari/abess/ordinal", running "install and restart", and testing with a simple example like:
dataset <- generate.data(150,100,3)
abess(dataset[["x"]],dataset[["y"]])
I get:
Error in abessGLM_API(x = x, y = y, n = nobs, p = nvars, normalize_type = normalize, :
could not find function "abessGLM_API"
Called from: abess.default(dataset[["x"]], dataset[["y"]])
Mention new reference https://arxiv.org/abs/2110.09697 in readme.
First of all: Thank you for the great package, it has been very helpful. Now to my suggestion:
Abess uses weight
https://abess.readthedocs.io/en/latest/Python-package/linear/Logistic.html?highlight=score#abess.linear.LogisticRegression.fit
Sklearn uses sample_weight
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit
I'm using both in a project and it would be helpful if abess followed the sklearn convention.
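Until the names are aligned, a thin adapter on the user side is one way to bridge the two conventions. The sketch below is hypothetical: SampleWeightAdapter is not part of abess or scikit-learn, and _RecordingEstimator is a stand-in that only mimics a fit(X, y, weight=...) signature so the example runs without abess installed.

```python
class SampleWeightAdapter:
    """Hypothetical wrapper: accepts sklearn-style `sample_weight`, forwards it as `weight`."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, sample_weight=None):
        if sample_weight is None:
            self.estimator.fit(X, y)
        else:
            self.estimator.fit(X, y, weight=sample_weight)
        return self

    def __getattr__(self, name):
        # Delegate everything else (predict, coef_, ...) to the wrapped estimator.
        if name == "estimator":
            raise AttributeError(name)
        return getattr(self.estimator, name)


class _RecordingEstimator:
    """Stand-in with a `weight` keyword; used only to exercise the adapter."""

    def fit(self, X, y, weight=None):
        self.seen_weight = weight
        return self

    def predict(self, X):
        return [0 for _ in X]


wrapped = SampleWeightAdapter(_RecordingEstimator())
wrapped.fit([[1.0], [2.0]], [0, 1], sample_weight=[0.5, 2.0])
```

An adapter like this is not a full scikit-learn estimator (it does not implement get_params, for example), so it is a stopgap rather than a replacement for renaming the argument in the library.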
Hello, I'm using the package on some real survival data. I find that cross-validation can only use the deviance on the test fold to determine the support size. Could you add the C-index as a criterion for the Cox model's cross-validation?
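For readers unfamiliar with the criterion, here is a minimal pure-Python sketch of Harrell's concordance index. The function name and the handling of ties (tied risks count one half, tied times are skipped) are choices made for this illustration, not a description of anything in abess.

```python
def concordance_index(time, event, risk):
    """Harrell's C: share of comparable pairs that `risk` orders correctly.

    A pair (i, j) is comparable when sample i has an observed event and
    time[i] < time[j]; it is concordant when risk[i] > risk[j].
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # censored samples cannot be the earlier member of a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


time = [2.0, 4.0, 6.0, 8.0]
event = [1, 1, 0, 1]
risk = [3.0, 2.0, 1.5, 0.5]  # higher risk should mean shorter survival
c = concordance_index(time, event, risk)
```

Computed on each held-out fold and averaged, a value like this could serve as the cross-validation score, with larger values preferred.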
Describe the bug
I want to check whether setting a sample weight to 0 is equivalent to removing the corresponding sample. However, in the following example, coef1 and coef2 are different. Does this mean that samples with zero weight can still affect the result?
Code for Reproduction
import numpy as np
from abess.linear import MultinomialRegression
from abess.datasets import make_multivariate_glm_data
n, p = 100, 50
np.random.seed(12345)
data = make_multivariate_glm_data(n=n, p=p, k=10, M=3, family='multinomial')
# construct dataset1 with 100 samples
X1, y1 = data.x, data.y
w1 = np.ones(n)
model1 = MultinomialRegression(support_size=10)
model1.fit(X1, y1, weight=w1)
coef1 = model1.coef_
# construct dataset2 by adding 100 different samples to dataset1
# simultaneously set weight to 0 for these additional 100 samples
X2 = np.vstack([data.x, -data.x])
y2 = np.vstack([data.y, data.y])
w2 = np.ones(n * 2)
w2[n:] = 0
model2 = MultinomialRegression(support_size=10)
model2.fit(X2, y2, weight=w2)
coef2 = model2.coef_
Thanks!
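As a reference point for the expected behaviour, the same construction can be run with ordinary weighted least squares in NumPy, where zero-weight rows drop out of the objective exactly. This sketch deliberately does not use abess, so it says nothing about where the discrepancy above comes from.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Solve min_b sum_i w_i * (y_i - x_i @ b)**2 by scaling rows with sqrt(w)."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

rng = np.random.default_rng(12345)
n, p = 100, 5
X1 = rng.normal(size=(n, p))
y1 = X1 @ np.arange(1.0, p + 1) + rng.normal(size=n)

# Append 100 extra samples and give them weight 0.
X2 = np.vstack([X1, -X1])
y2 = np.concatenate([y1, y1])
w2 = np.concatenate([np.ones(n), np.zeros(n)])

coef1 = weighted_least_squares(X1, y1, np.ones(n))
coef2 = weighted_least_squares(X2, y2, w2)
```

Here coef1 and coef2 agree up to floating-point error, which is the equivalence the question is testing for.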
Best Subset of Group Selection allows for all members of non-overlapping groups to be selected, but would the converse be possible, i.e. choosing only one variable from each group? This scheme could be extended to N-from-group, or allow the user to set a maximum proportion of each group (e.g. 50%) which should be selected.
If any of this is possible with the current version of the software, I would be grateful to know how it can be implemented.
I ran into trouble installing the R package abess directly from GitHub.
I deleted the original abess and then ran devtools::install_github("abess-team/abess/R-package") in RStudio. This is the error output:
Downloading GitHub repo abess-team/abess@HEAD
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?
1: All
2: CRAN packages only
3: None
4: Rcpp (1.0.7 -> 1.0.8) [CRAN]
Enter one or more numbers, or an empty line to skip updates:
√ checking for file 'C:\Users\ustc\AppData\Local\Temp\RtmpKkyovB\remotes32f06dd94478\abess-team-abess-4d97f97\R-package/DESCRIPTION' ...
Describe the bug
I'm doing some experiments combining abess with auto-sklearn. When using MultinomialRegression for classification, memory use grows very quickly, so much that the output cannot be displayed on a web page, but for LinearRegression there is no similar out-of-memory problem.
Code for Reproduction
My code is given as follows:
from pprint import pprint
from ConfigSpace.configuration_space import ConfigurationSpace
from ConfigSpace.hyperparameters import CategoricalHyperparameter, \
UniformIntegerHyperparameter, UniformFloatHyperparameter
import sklearn.metrics
import autosklearn.classification
import autosklearn.pipeline.components.classification
from autosklearn.pipeline.components.base \
import AutoSklearnClassificationAlgorithm
from autosklearn.pipeline.constants import DENSE, SIGNED_DATA, UNSIGNED_DATA, \
PREDICTIONS
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import openml
from abess import MultinomialRegression
from sklearn.ensemble import RandomForestClassifier
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn import preprocessing
from sklearn.tree import DecisionTreeClassifier
import time
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
class AbessClassifier(AutoSklearnClassificationAlgorithm):
    def __init__(self, exchange_num, random_state=None):
        self.exchange_num = exchange_num
        self.random_state = random_state
        self.estimator = None

    def fit(self, X, y):
        from abess import MultinomialRegression
        self.estimator = MultinomialRegression()
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        if self.estimator is None:
            raise NotImplementedError
        return self.estimator.predict(X)

    def predict_proba(self, X):
        if self.estimator is None:
            raise NotImplementedError()
        return self.estimator.predict_proba(X)

    @staticmethod
    def get_properties(dataset_properties=None):
        return {
            'shortname': 'abess Classifier',
            'name': 'abess logistic Classifier',
            'handles_regression': False,
            'handles_classification': True,
            'handles_multiclass': True,
            'handles_multilabel': False,
            'handles_multioutput': False,
            'is_deterministic': False,
            # Both input and output must be tuple(iterable)
            'input': [DENSE, SIGNED_DATA, UNSIGNED_DATA],
            'output': [PREDICTIONS]
        }

    @staticmethod
    def get_hyperparameter_search_space(dataset_properties=None):
        cs = ConfigurationSpace()
        exchange_num = UniformIntegerHyperparameter(
            name='exchange_num', lower=4, upper=6, default_value=5
        )
        cs.add_hyperparameters([exchange_num])
        return cs
# Add abess logistic classifier component to auto-sklearn.
autosklearn.pipeline.components.classification.add_classifier(AbessClassifier)
cs = AbessClassifier.get_hyperparameter_search_space()
print(cs)
dataset = fetch_openml(data_id = int(29),as_frame=True)#507,183,44136
X=dataset.data
y=dataset.target
X.replace([np.inf,-np.inf],np.NaN,inplace=True)
## Remove rows with NaN or Inf values
inx=X[X.isna().values==True].index.unique()
X.drop(inx,inplace=True)
y.drop(inx,inplace=True)
##use dummy variables to replace classification variables:
X = pd.get_dummies(X)
## Keep only numeric columns
X = X.select_dtypes(np.number)
## Remove columns with NaN or Inf values
nan = np.isnan(X).any()[np.isnan(X).any() == True]
inf = np.isinf(X).any()[np.isinf(X).any() == True]
X = X.drop(columns = list(nan.index))
X = X.drop(columns = list(inf.index))
##Encode target labels with value between 0 and 1
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape) #number of initial features
print(X_test.shape) #number of initial features
cls = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=60,
    per_run_time_limit=10,
    include={
        'classifier': ['AbessClassifier'],
        'feature_preprocessor': ['polynomial']
    },
    memory_limit=6144,
    ensemble_size=1,
)
cls.fit(X_train, y_train, X_test, y_test)
predictions = cls.predict(X_test)
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))
After running this code, the memory reaches about 159MB, which makes it hard for users to open the .ipynb. Again, regression does not run into this memory problem.
Hello! I want to get the result of each step of the splicing iteration, but I don't know how to get it. The C++ code is difficult for me; is there a simple way to get these results?
Your method performs quite well on high-dimensional linear models. I am curious whether it would work for ordinal logistic regression, which also seems to be a meaningful generalization. Will you implement this new feature in the near future? Thanks!
I am getting apparently inconsistent results from LinearRegression depending on how I specify the support_size.
Essentially, when using np.nonzero(model.coef_) to obtain the support set, I get inconsistent results between the following:
support_size = 1: 'A' is chosen.
support_size = 2: 'A' and 'B' are chosen.
support_size = [1,2]: 'A' and 'C' are chosen.
'A', 'B', and 'C' are all 'correct' to some extent, but one issue I am facing is that I cannot get 'C' to appear in a support set using a single value for support_size unless that value is much larger than it needs to be, in this case support_size = 18. All other arguments are their default values.
Before we delve into what might be happening, I guess I need to ask if this is expected behaviour or not?
Hello, I'm trying to run make html on the command line to convert comments to .html files, but there are some warnings and errors when compiling the basic examples such as Cox regression. The warnings show that the class OrdinalRegression can't be imported from the module abess.linear. I then tried deleting my comments, but the warnings and errors remain.
Here are the warnings:
WARNING: autodoc: failed to import class 'OrdinalRegression' from module 'abess.linear'
AttributeError: module 'abess.linear' has no attribute 'OrdinalRegression'
checking consistency... D:\My app\GithubDesktop\abess\docs\Tutorial\1-glm\README.rst: WARNING: document isn't included in any toctree
D:\My app\GithubDesktop\abess\docs\Tutorial\2-pca\README.rst: WARNING: document isn't included in any toctree
app\GithubDesktop\abess\docs\Tutorial\3-advanced features\README.rst: WARNING: document isn't included in any toctree
D:\My app\GithubDesktop\abess\docs\Tutorial\4-computation-tips\README.rst: WARNING: document isn't included in any toctree
D:\My app\GithubDesktop\abess\docs\Tutorial\5-scikit-learn-connection\README.rst: WARNING: document isn't included in any toctree
D:\My app\GithubDesktop\abess\docs\Tutorial\README.rst: WARNING: document isn't included in any toctree
No warning is raised when the rank of the design matrix <= support size. As the support size gets larger, the results seem to be meaningless. How about throwing a warning? The following code provides a demo.
library(abess)
data <- generate.data(n = 30, p = 100, support.size = 10)
x0 <- data$x
y0 <- data$y
idx <- c(1:10, 1:10)
x <- x0[idx, ]
y <- y0[idx]
abess(x, y, support.size = 0:15)
Call:
abess.default(x = x, y = y, support.size = 0:15)

   support.size          dev         GIC
1             0 9.934785e+05   276.17935
2             1 2.310636e+05   252.06170
3             2 8.155513e+04   236.28617
4             3 9.333066e+03   197.98460
5             4 2.467864e+03   176.43313
6             5 3.883858e+02   144.50369
7             6 5.223017e+02   155.48135
8             7 1.689733e+00    45.86059
9             8 2.741854e+01   106.64631
10            9 3.802369e-24 -1033.05370
11           10 1.188349e-25 -1097.31384
12           11 6.137819e-25 -1059.42301
13           12 1.259870e-25 -1086.03949
14           13 4.764910e-24 -1008.32964
15           14 1.402658e-25 -1073.78679
16           15 3.573510e-25 -1050.03047
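A minimal sketch of the kind of check being asked for, written in Python with NumPy rather than inside the package. The helper name check_support_size and the warning text are hypothetical and not part of abess. The sketch compares support sizes against the rank only; the demo output already collapses at support size 9, which may reflect the intercept, and that case is not handled here.

```python
import warnings

import numpy as np


def check_support_size(x, support_sizes):
    """Warn when any requested support size reaches the rank of x.

    Hypothetical helper for illustration; not part of abess.
    Returns the support sizes that are at or above the rank.
    """
    rank = int(np.linalg.matrix_rank(x))
    too_large = [s for s in support_sizes if s >= rank]
    if too_large:
        warnings.warn(
            f"design matrix has rank {rank}; support sizes {too_large} "
            "can fit the data exactly, so their criteria may be meaningless"
        )
    return too_large


# Mimic the demo: 10 distinct rows, each used twice, 100 columns.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((30, 100))
idx = np.concatenate([np.arange(10), np.arange(10)])
x = x0[idx]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = check_support_size(x, range(0, 16))

print(flagged)
```

With duplicated rows the rank is bounded by the number of distinct rows, not by the number of rows passed in, which is why the check uses the rank instead of n.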
Describe the bug
Hello, I tried to use OrdinalRegression() in cross_validate, but it raises a TypeError saying that no scoring is specified. Here is a demo to reproduce it.
Code for Reproduction
# %%
from sklearn.model_selection import cross_validate
from abess import OrdinalRegression, make_glm_data
import numpy as np
np.random.seed(0)
data = make_glm_data(n=1000, p=200, k=30, family='ordinal')
model = OrdinalRegression()
result = cross_validate(model, data.x ,data.y)
Expected behavior
Return the CV score of the 'OrdinalRegression' model.
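A possible workaround on the caller's side, sketched under the assumption that the error comes from the estimator not exposing a score method. MajorityClass below is a hypothetical toy estimator standing in for the model; it is not abess code. The sketch shows that scikit-learn raises a TypeError when neither a score method nor a scoring argument is available, and that passing scoring explicitly avoids it.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import cross_validate


class MajorityClass(BaseEstimator):
    """Toy estimator with fit/predict but no score method (hypothetical)."""

    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)


rng = np.random.default_rng(0)
X = rng.standard_normal((60, 3))
y = (np.arange(60) % 3 == 0).astype(int)

# Without scoring, scikit-learn needs estimator.score and fails without it.
try:
    cross_validate(MajorityClass(), X, y, cv=5)
    default_error = None
except TypeError as exc:
    default_error = str(exc)

# With an explicit scorer, only fit and predict are required.
scores = cross_validate(
    MajorityClass(), X, y, cv=5, scoring=make_scorer(accuracy_score)
)["test_score"]
print(default_error)
print(scores)
```

Which metric is appropriate for an ordinal response is a separate question; accuracy is used here only to keep the example short.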
More of a feature request: do you think it might be possible to implement some form of statistical inference in abess, e.g. confidence intervals on coefficients via bootstrapping? Another option: repeatedly bootstrap the dataset and refit an abess model on each bootstrapped dataset, calculate the union of selected variables across all fits, and repeat this until the size of the union no longer grows. Then refit a single regular GLM using base R's glm function on this union of selected variables. (I don't know if something like this has ever been suggested in the literature. I thought I had seen a suggestion along these lines in one of the Tibshirani articles on selective inference, but I can't seem to find it now.)
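The bootstrap-union idea can be sketched as follows, in Python with NumPy. Everything here is an assumption made for illustration: select_top_k is a simple correlation-screening stand-in and is NOT abess, the stopping rule (a fixed number of consecutive resamples that add nothing, called patience) is one reading of "until the union no longer grows", and the final refit is ordinary least squares in place of R's glm.

```python
import numpy as np


def select_top_k(x, y, k):
    """Stand-in selector: the k columns most correlated with y.

    Hypothetical placeholder for a best-subset fit; it is NOT abess.
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(xc.T @ yc) / (np.linalg.norm(xc, axis=0) + 1e-12)
    return set(np.argsort(score)[-k:].tolist())


def bootstrap_union(x, y, k, patience=20, max_rounds=500, seed=0):
    """Grow the union of selected variables over bootstrap resamples.

    Stops once `patience` consecutive resamples add nothing new.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    union, stale = set(), 0
    for _ in range(max_rounds):
        rows = rng.integers(0, n, size=n)  # sample rows with replacement
        new = select_top_k(x[rows], y[rows], k) - union
        if new:
            union |= new
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
    return sorted(union)


# Synthetic data: only the first three columns carry signal.
rng = np.random.default_rng(1)
x = rng.standard_normal((200, 30))
y = x[:, :3] @ np.array([3.0, -2.0, 1.5]) + 0.5 * rng.standard_normal(200)

support = bootstrap_union(x, y, k=3)

# Refit ordinary least squares (with intercept) on the union only.
design = np.column_stack([np.ones(len(y)), x[:, support]])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(support)
```

Note that coefficients from the final refit are conditional on a data-driven selection, so naive standard errors from that refit would not account for the selection step.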
Describe the bug
The cross-validation result reported by abess is not the same as the result computed by hand in R.
The code to reproduce
library(abess)
n <- 100
p <- 200
support.size <- 3
dataset <- generate.data(n, p, support.size, seed = 1)
ss <- 0:10
nfolds <- 5
foldid <- rep(1:nfolds, ceiling(n / nfolds))[1:n]
abess_fit <- abess(dataset[["x"]], dataset[["y"]],
tune.type = "cv", nfolds = nfolds,
foldid = foldid, support.size = ss, num.threads = 1)
cv <- rep(0, length(ss))
for (k in 1:nfolds) {
abess_fit_k <- abess(dataset[["x"]][foldid != k, ],
dataset[["y"]][foldid != k], support.size = ss)
y_hat_k <- predict(abess_fit_k, dataset[["x"]][foldid == k, ],
support.size = ss)
fold_cv <- apply(y_hat_k, 2, function(yh) {
mean((dataset[["y"]][foldid == k] - yh)^2)
})
fold_cv <- round(fold_cv, digits = 2)
print(fold_cv)
cv <- cv + fold_cv
}
cv <- cv / nfolds
names(cv) <- NULL
all.equal(cv, abess_fit$tune.value, digits = 2)
Expected behavior
The output of all.equal(cv, abess_fit$tune.value, digits = 2) should be TRUE. However, the actual output is "Mean relative difference: 0.0008444762".
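One thing that may contribute, independent of abess: the loop rounds each fold's error to two decimals before averaging, while all.equal compares with a default tolerance of about 1.5e-8. The Python sketch below uses made-up per-fold errors (not taken from this report) to show that rounding alone can produce a relative difference of this order. It does not establish that rounding is the whole explanation here.

```python
# Hypothetical per-fold mean squared errors; not taken from the report.
fold_mse = [1.0149, 0.9951, 1.1049, 0.9449, 1.0549]

# Average of the unrounded fold errors.
exact_mean = sum(fold_mse) / len(fold_mse)

# Average after rounding each fold to two decimals, as the R loop does.
rounded_mean = sum(round(v, 2) for v in fold_mse) / len(fold_mse)

# Relative difference between the two averages.
rel_diff = abs(rounded_mean - exact_mean) / abs(exact_mean)
print(exact_mean, rounded_mean, rel_diff)
```

Dropping the rounding step from the manual loop would be a quick way to see how much of the reported difference remains.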
System info
platform x86_64-apple-darwin17.0
arch x86_64
os darwin17.0
system x86_64, darwin17.0
status
major 4
minor 1.0
year 2021
month 05
day 18
svn rev 80317
language R
version.string R version 4.1.0 (2021-05-18)
nickname Camp Pontanezen
Instead of writing
"The abess software both Python and R's interfaces."
I suggest
"The abess software has both Python and R interfaces."
Thanks for your project.
I ran pip install abess in the terminal:
Last login: Tue Jan 18 20:32:17 on ttys000
(base) jiaqihu@Mac-M1 ~ % pip install abess
Collecting abess
Using cached abess-0.3.6.tar.gz (1.5 MB)
Requirement already satisfied: numpy in ./opt/anaconda3/lib/python3.9/site-packages (from abess) (1.20.3)
Requirement already satisfied: scipy in ./opt/anaconda3/lib/python3.9/site-packages (from abess) (1.7.1)
Requirement already satisfied: scikit-learn>=0.24 in ./opt/anaconda3/lib/python3.9/site-packages (from abess) (0.24.2)
Requirement already satisfied: joblib>=0.11 in ./opt/anaconda3/lib/python3.9/site-packages (from scikit-learn>=0.24->abess) (1.1.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in ./opt/anaconda3/lib/python3.9/site-packages (from scikit-learn>=0.24->abess) (2.2.0)
Building wheels for collected packages: abess
Building wheel for abess (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /Users/jiaqihu/opt/anaconda3/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"'; __file__='"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-wheel-j5ai4o5i
cwd: /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/
Complete output (19 lines):
bash: /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/copy_src.sh: No such file or directory
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-3.9
creating build/lib.macosx-10.9-x86_64-3.9/abess
copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/metrics.py -> build/lib.macosx-10.9-x86_64-3.9/abess
copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/linear.py -> build/lib.macosx-10.9-x86_64-3.9/abess
copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/cabess.py -> build/lib.macosx-10.9-x86_64-3.9/abess
copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/datasets.py -> build/lib.macosx-10.9-x86_64-3.9/abess
copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/__init__.py -> build/lib.macosx-10.9-x86_64-3.9/abess
copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/bess_base.py -> build/lib.macosx-10.9-x86_64-3.9/abess
copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/pca.py -> build/lib.macosx-10.9-x86_64-3.9/abess
running build_ext
building 'abess._cabess' extension
swigging /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap.i to /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap_wrap.cpp
swig -python -c++ -o /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap_wrap.cpp /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap.i
error: command 'swig' failed: No such file or directory
----------------------------------------
ERROR: Failed building wheel for abess
Running setup.py clean for abess
Failed to build abess
Installing collected packages: abess
Running setup.py install for abess ... error
ERROR: Command errored out with exit status 1:
command: /Users/jiaqihu/opt/anaconda3/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"'; __file__='"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-record-6eee5eyp/install-record.txt --single-version-externally-managed --compile --install-headers /Users/jiaqihu/opt/anaconda3/include/python3.9/abess
cwd: /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/
Complete output (19 lines):
bash: /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/copy_src.sh: No such file or directory
running install
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-3.9
creating build/lib.macosx-10.9-x86_64-3.9/abess
copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/metrics.py -> build/lib.macosx-10.9-x86_64-3.9/abess
copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/linear.py -> build/lib.macosx-10.9-x86_64-3.9/abess
copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/cabess.py -> build/lib.macosx-10.9-x86_64-3.9/abess
copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/datasets.py -> build/lib.macosx-10.9-x86_64-3.9/abess
copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/__init__.py -> build/lib.macosx-10.9-x86_64-3.9/abess
copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/bess_base.py -> build/lib.macosx-10.9-x86_64-3.9/abess
copying /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/abess/pca.py -> build/lib.macosx-10.9-x86_64-3.9/abess
running build_ext
building 'abess._cabess' extension
swigging /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap.i to /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap_wrap.cpp
swig -python -c++ -o /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap_wrap.cpp /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/src/pywrap.i
error: command 'swig' failed: No such file or directory
----------------------------------------
ERROR: Command errored out with exit status 1: /Users/jiaqihu/opt/anaconda3/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"'; __file__='"'"'/private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-install-ofm5rwzp/abess_e1c5333de72248a2bdb93137c36fb890/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/pip-record-6eee5eyp/install-record.txt --single-version-externally-managed --compile --install-headers /Users/jiaqihu/opt/anaconda3/include/python3.9/abess Check the logs for full command output.
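Both attempts in the log end with "error: command 'swig' failed: No such file or directory", which suggests the source build could not find a swig executable. A small check along those lines, assuming that is the cause; the helper name missing_build_tools is made up for this sketch.

```python
import shutil


def missing_build_tools(tools=("swig",)):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_build_tools()
if missing:
    print("not on PATH:", ", ".join(missing))
else:
    print("all build tools found")
```

If swig is reported as missing, installing it through the system or conda package manager before retrying the pip install would be the natural next step.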
Matrix 1.4-2 will formally deprecate 187 coercion methods. More precisely, coercions of the form
as(object, Class)
where
'object' inherits from the virtual class Matrix, is a traditional matrix, or is a logical or numeric vector
'Class' specifies a non-virtual subclass of Matrix, such as dgCMatrix, but really any subclass matching the pattern
^[dln]([gts][CRT]|di|ge|tr|sy|tp|sp)Matrix$
will continue to work as before but signal a deprecation message or warning (message in the widely used dg.Matrix and d.CMatrix cases).
To simplify the revision process, the development version of Matrix provides Matrix:::.as.via.virtual(), taking a pair of class names and returning as a call the correct nesting of coercion:
Matrix:::.as.via.virtual("matrix", "dgCMatrix")
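To see which target class names fall under the pattern quoted above, the regular expression can be tried directly. This is only a Python sketch of the pattern match; the list of example names is chosen for illustration, and it says nothing about what the Matrix package does internally.

```python
import re

# Pattern quoted in the notice above.
DEPRECATED_TARGET = re.compile(r"^[dln]([gts][CRT]|di|ge|tr|sy|tp|sp)Matrix$")


def is_deprecated_target(class_name):
    """True if `class_name` matches the pattern of affected target classes."""
    return DEPRECATED_TARGET.match(class_name) is not None


examples = ["dgCMatrix", "dgeMatrix", "lsTMatrix", "CsparseMatrix", "dMatrix", "matrix"]
affected = [name for name in examples if is_deprecated_target(name)]
print(affected)
```

Names such as CsparseMatrix and dMatrix do not match, consistent with the notice that only non-virtual subclasses are targeted.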
Would it be possible to introduce a fit_intercept option to LinearRegression, similar to the sklearn linear regression implementation?
Make the estimators in abess (like LinearRegression and LogisticRegression) usable via SelectFromModel. See details about sklearn.feature_selection.SelectFromModel at https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectFromModel.html#sklearn.feature_selection.SelectFromModel.
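For context, SelectFromModel reads feature importances from the fitted estimator, by default from coef_ or feature_importances_. The sketch below uses a hypothetical sparse regressor, TopKLeastSquares, as a stand-in (it is not an abess estimator) to show what the wrapper needs: a fit method that leaves a coef_ vector with zeros for unselected features.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.feature_selection import SelectFromModel


class TopKLeastSquares(RegressorMixin, BaseEstimator):
    """Hypothetical sparse regressor: least squares on the k columns
    most correlated with y. A stand-in for illustration, not abess."""

    def __init__(self, k=2):
        self.k = k

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        xc, yc = X - X.mean(axis=0), y - y.mean()
        score = np.abs(xc.T @ yc) / (np.linalg.norm(xc, axis=0) + 1e-12)
        keep = np.sort(np.argsort(score)[-self.k:])
        beta, *_ = np.linalg.lstsq(xc[:, keep], yc, rcond=None)
        self.coef_ = np.zeros(X.shape[1])  # zeros mark unselected features
        self.coef_[keep] = beta
        self.intercept_ = y.mean() - X.mean(axis=0) @ self.coef_
        self.n_features_in_ = X.shape[1]
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
y = 2.0 * X[:, 1] - 3.0 * X[:, 5] + 0.1 * rng.standard_normal(100)

# A tiny explicit threshold keeps exactly the features with nonzero coef_.
selector = SelectFromModel(TopKLeastSquares(k=2), threshold=1e-8).fit(X, y)
mask = selector.get_support()
X_reduced = selector.transform(X)
print(mask)
```

The explicit threshold matters: with the default, SelectFromModel would compare against the mean importance for an estimator like this one, which is not the same as "nonzero coefficient".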
I'm trying to use abess in R, but I couldn't find any argument for omitting the intercept. I also tried the formula y ~ 0 + x to avoid the intercept, but that failed.
Is there a way to do that?
I tried to run the usage example provided in the article, but it failed:
from abess.linear import abessLogistic
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import Pipeline
from sklearn.metrics import make_scorer, roc_auc_score
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import GridSearchCV
# combine feature transform and model:
pipe = Pipeline([('poly', PolynomialFeatures(include_bias=False)), ('alogistic', abessLogistic())])
param_grid = {'poly_interaction_only': [True, False],'poly_degree': [1, 2, 3]}
# Use cross validation to tune parameters:
scorer = make_scorer(roc_auc_score, greater_is_better=True)
grid_search = GridSearchCV(pipe, param_grid, scoring=scorer, cv=5)
# load and fitting example dataset:
X, y = load_breast_cancer(return_X_y=True)
grid_search.fit(X, y)
# print the best tuning parameter and associated AUC score:
print([grid_search.best_params_, grid_search.best_score_])
It gives the following errors:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/var/folders/pj/c4q3qfkx2119wyj2vbfr_yhr0000gn/T/ipykernel_28894/2314297278.py in <module>
13 # load and fitting example dataset:
14 X, y = load_breast_cancer(return_X_y=True)
---> 15 grid_search.fit(X, y)
16 # print the best tuning parameter and associated AUC score:
17 print([grid_search.best_params_, grid_search.best_score_])
~/opt/anaconda3/lib/python3.9/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
839 return results
840
--> 841 self._run_search(evaluate_candidates)
842
843 # multimetric is determined here because in the case of a callable
~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_search.py in _run_search(self, evaluate_candidates)
1294 def _run_search(self, evaluate_candidates):
1295 """Search all candidates in param_grid"""
-> 1296 evaluate_candidates(ParameterGrid(self.param_grid))
1297
1298
~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_search.py in evaluate_candidates(candidate_params, cv, more_results)
793 n_splits, n_candidates, n_candidates * n_splits))
794
--> 795 out = parallel(delayed(_fit_and_score)(clone(base_estimator),
796 X, y,
797 train=train, test=test,
~/opt/anaconda3/lib/python3.9/site-packages/joblib/parallel.py in __call__(self, iterable)
1041 # remaining jobs.
1042 self._iterating = False
-> 1043 if self.dispatch_one_batch(iterator):
1044 self._iterating = self._original_iterator is not None
1045
~/opt/anaconda3/lib/python3.9/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
859 return False
860 else:
--> 861 self._dispatch(tasks)
862 return True
863
~/opt/anaconda3/lib/python3.9/site-packages/joblib/parallel.py in _dispatch(self, batch)
777 with self._lock:
778 job_idx = len(self._jobs)
--> 779 job = self._backend.apply_async(batch, callback=cb)
780 # A job can complete so quickly than its callback is
781 # called before we get here, causing self._jobs to
~/opt/anaconda3/lib/python3.9/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
206 def apply_async(self, func, callback=None):
207 """Schedule a func to be run"""
--> 208 result = ImmediateResult(func)
209 if callback:
210 callback(result)
~/opt/anaconda3/lib/python3.9/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
570 # Don't delay the application, to avoid keeping the input
571 # arguments in memory
--> 572 self.results = batch()
573
574 def get(self):
~/opt/anaconda3/lib/python3.9/site-packages/joblib/parallel.py in __call__(self)
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
264
~/opt/anaconda3/lib/python3.9/site-packages/joblib/parallel.py in <listcomp>(.0)
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
264
~/opt/anaconda3/lib/python3.9/site-packages/sklearn/utils/fixes.py in __call__(self, *args, **kwargs)
220 def __call__(self, *args, **kwargs):
221 with config_context(**self.config):
--> 222 return self.function(*args, **kwargs)
~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, return_estimator, split_progress, candidate_progress, error_score)
584 cloned_parameters[k] = clone(v, safe=False)
585
--> 586 estimator = estimator.set_params(**cloned_parameters)
587
588 start_time = time.time()
~/opt/anaconda3/lib/python3.9/site-packages/sklearn/pipeline.py in set_params(self, **kwargs)
148 self
149 """
--> 150 self._set_params('steps', **kwargs)
151 return self
152
~/opt/anaconda3/lib/python3.9/site-packages/sklearn/utils/metaestimators.py in _set_params(self, attr, **params)
52 self._replace_estimator(attr, name, params.pop(name))
53 # 3. Step parameters and other initialisation arguments
---> 54 super().set_params(**params)
55 return self
56
~/opt/anaconda3/lib/python3.9/site-packages/sklearn/base.py in set_params(self, **params)
228 key, delim, sub_key = key.partition('__')
229 if key not in valid_params:
--> 230 raise ValueError('Invalid parameter %s for estimator %s. '
231 'Check the list of available parameters '
232 'with `estimator.get_params().keys()`.' %
ValueError: Invalid parameter poly_degree for estimator Pipeline(steps=[('poly', PolynomialFeatures(include_bias=False)),
('alogistic', abessLogistic())]). Check the list of available parameters with `estimator.get_params().keys()`.
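The last frame of the traceback shows set_params splitting each key with key.partition('__'). The grid above uses a single underscore ('poly_interaction_only', 'poly_degree'), so nothing is split off and the whole key is looked up as a parameter of the Pipeline itself. The plain-Python sketch below only imitates that split to make the difference visible; split_param_key is a made-up helper, not scikit-learn code.

```python
def split_param_key(key):
    """Split a parameter key the way `key.partition('__')` does.

    Returns (name, sub_key); sub_key is None when no '__' is present.
    """
    name, delim, sub_key = key.partition("__")
    return name, (sub_key if delim else None)


step_names = {"poly", "alogistic"}  # step names from the Pipeline above

for key in ["poly_degree", "poly__degree"]:
    name, sub_key = split_param_key(key)
    if name in step_names and sub_key:
        status = "routed to step"
    else:
        status = "not a step parameter"
    print(f"{key!r}: name={name!r}, sub_key={sub_key!r} -> {status}")
```

Under that reading, writing the grid keys as 'poly__interaction_only' and 'poly__degree' should let GridSearchCV route them to the 'poly' step.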
Thanks to the author for helping to respond to my question