lasso-net / lassonet
Feature selection in neural networks
License: MIT License
I'm training a custom model with the CoxPHLoss and have noticed that when using the Efron tie method the training will fail when a batch only contains censored events. The code giving the error is in lassonet/utils.py:
if hasattr(torch.Tensor, "scatter_reduce_"):
    # version >= 1.12
    def scatter_reduce(input, dim, index, reduce, *, output_size=None):
        src = input
        if output_size is None:
            output_size = index.max() + 1
        return torch.empty(output_size, device=input.device).scatter_reduce(
            dim=dim, index=index, src=src, reduce=reduce, include_self=False
        )
else:
    scatter_reduce = torch.scatter_reduce
When all samples are censored, index will be an empty tensor and index.max() fails.
Also, if I understand correctly, the Cox likelihood would be zero in that case so that the log likelihood is not defined.
For now I have resorted to skipping these problematic batches, but I was thinking that it might be helpful to handle this edge case directly in CoxPHLoss. Not sure what's the best way of doing it though.
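A minimal sketch of the batch-skipping workaround described above, written in plain Python so that it does not depend on any lassonet internals. The helper name `batches_with_events` and the `(inputs, durations, events)` batch layout are assumptions made for illustration; in a real PyTorch loop the events would be a tensor and the check would more likely be `events.any()`.

```python
from typing import Iterable, Iterator, Sequence, Tuple

# Hypothetical batch layout: (inputs, durations, events), with events[i] == 1
# for an observed event and 0 for a censored sample.
Batch = Tuple[Sequence[float], Sequence[float], Sequence[int]]


def batches_with_events(batches: Iterable[Batch]) -> Iterator[Batch]:
    """Yield only the batches that contain at least one observed event."""
    for batch in batches:
        _, _, events = batch
        if any(e != 0 for e in events):
            yield batch


batches = [
    ([0.1, 0.2], [5.0, 3.0], [0, 0]),  # all censored: would be skipped
    ([0.3, 0.4], [2.0, 4.0], [1, 0]),  # one event: kept
]
kept = list(batches_with_events(batches))
print(len(kept))
```

Filtering before the loss is computed keeps the loss itself unchanged; whether CoxPHLoss should instead return a defined value for an all-censored batch is the open question raised in the issue.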
Hi, thank you so much for the wonderful project and for providing the code. I am currently testing the LassoNet classifier on a dataset of mine. However, I need the best lambda value (best model) and want to see which features are selected for that lambda. So far I have tried model.best_lambda_, but without success. Some help and direction would be appreciated.
Second question: in the Diabetes.py file, I see that the importance of each feature is calculated using model.feature_importances_.numpy(). I am a bit confused by this approach: shouldn't we be using the features from the best model? It might be a misunderstanding on my part, but a clarification would be very helpful.
Looking forward to your help.
Hi, thank you so much for your wonderful code.
But I have a problem in practice: in PyCharm's debug mode, it shows "reg: Unable to get repr for <class 'lassonet.interfaces.LassoNetRegressor'>", as in the picture below.
Is my python package version not right?
Thank you very much!
Look forward to your reply.
Great work,
I found your approach very interesting and I was trying to generalize it to different PyTorch architectures.
I wanted to test your approach with custom models and other PyTorch models. The idea is basically to take a PyTorch model (arbitrary architecture) and test its ability to predict survival.
For example, I wanted to test with a simple PyTorch model.
What I am trying to understand is the following case, taking this dataset and starting from your example:
from pathlib import Path
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from lassonet import LassoNetCoxRegressor
from lassonet import plot_path
res_dir = './survival/'
X = np.genfromtxt(res_dir + "hnscc_x.csv", delimiter=",", skip_header=1)
y = np.genfromtxt(res_dir + "hnscc_y.csv", delimiter=",", skip_header=1)
This is a simple version of the approach, modelling survival as a binary classification problem:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, random_split, SubsetRandomSampler, ConcatDataset, Dataset
import pandas as pd
import seaborn as sns
# creating a simple MLP
class FCNNC(nn.Module):
    def __init__(self, input_size, constraint_size, hidden_size, num_classes):
        super(FCNNC, self).__init__()
        self.fc1 = nn.Linear(input_size, constraint_size)
        self.fc2 = nn.Linear(constraint_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = torch.tanh(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        x = self.fc3(x)
        return x

# simple class for the dataset
class DataClassifier(Dataset):
    def __init__(self, X_train, y_train):
        self.X = torch.from_numpy(X_train.astype(np.float32))
        self.y = torch.from_numpy(y_train).type(torch.LongTensor)
        self.len = self.X.shape[0]

    def __getitem__(self, index):
        return self.X[index], self.y[index]

    def __len__(self):
        return self.len

# binary accuracy
def multi_acc(y_pred, y_test):
    _, y_pred = torch.max(y_pred, dim = 1)
    correct_pred = (y_pred == y_test).float()
    acc = correct_pred.sum() / len(correct_pred)
    acc = torch.round(acc * 100)
    return acc
# transforming in binary classification
batch_size = 2048
X_train, X_test, Y_train, Y_test = train_test_split(X, y[:,1], random_state=0)
traindata = DataClassifier(X_train, Y_train)
trainloader = torch.utils.data.DataLoader(traindata, batch_size=batch_size, shuffle=True)
valdata = DataClassifier(X_test,Y_test)
valloader = torch.utils.data.DataLoader(valdata, batch_size=X_test.shape[0], shuffle=False)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
criterion = nn.CrossEntropyLoss()
model = FCNNC(X.shape[1],20,20,2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
n_epochs =1000
%matplotlib inline
# simple training loop to store results and plotting
accuracy_stats = {
    'train': [],
    "val": []
}
loss_stats = {
    'train': [],
    "val": []
}
for epoch in range(n_epochs):
    running_loss = 0.0
    train_epoch_acc = 0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        inputs = inputs.to(device)
        labels = labels.to(device)
        model.to(device)
        # set optimizer to zero grad to remove previous epoch gradients
        optimizer.zero_grad()
        # forward propagation
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        acc = multi_acc(outputs, labels)
        # backward propagation
        loss.backward()
        # optimize
        optimizer.step()
        running_loss += loss.item()
        train_epoch_acc += acc.item()
    with torch.no_grad():
        val_epoch_loss = 0
        val_epoch_acc = 0
        model.eval()
        for X_val_batch, y_val_batch in valloader:
            X_val_batch = X_val_batch.to(device)
            y_val_batch = y_val_batch.to(device)
            y_val_pred = model(X_val_batch)
            val_loss = criterion(y_val_pred, y_val_batch)
            val_acc = multi_acc(y_val_pred, y_val_batch)
            val_epoch_loss += val_loss.item()
            val_epoch_acc += val_acc.item()
    loss_stats['train'].append(running_loss/len(trainloader))
    loss_stats['val'].append(val_epoch_loss/len(valloader))
    accuracy_stats['train'].append(train_epoch_acc/len(trainloader))
    accuracy_stats['val'].append(val_epoch_acc/len(valloader))
    # report every 50 epochs
    if epoch % 50 == 0:
        print(f'Epoch {epoch+0:03}: | Train Loss: {running_loss/len(trainloader):.5f} | Val Loss: {val_epoch_loss/len(valloader):.5f} | Train Acc: {train_epoch_acc/len(trainloader):.3f}| Val Acc: {val_epoch_acc/len(valloader):.3f}')
train_val_acc_df = pd.DataFrame.from_dict(accuracy_stats).reset_index().melt(id_vars=['index']).rename(columns={"index":"epochs"})
train_val_loss_df = pd.DataFrame.from_dict(loss_stats).reset_index().melt(id_vars=['index']).rename(columns={"index":"epochs"})
# Plot the dataframes
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(20,7))
sns.lineplot(data=train_val_acc_df, x = "epochs", y="value", hue="variable", ax=axes[0]).set_title('Train-Val Accuracy/Epoch')
sns.lineplot(data=train_val_loss_df, x = "epochs", y="value", hue="variable", ax=axes[1]).set_title('Train-Val Loss/Epoch')
The idea is to start from a very simple example and transform the model so that it is able to handle censored data.
I was looking at this code from your repository:
import torch
from sortedcontainers import SortedList
def log_substract(x, y):
    """log(exp(x) - exp(y))"""
    return x + torch.log1p(-(y - x).exp())

def scatter_logsumexp(input, index, *, dim=-1, output_size=None):
    """Inspired by torch_scatter.logsumexp
    Uses torch.scatter_reduce for performance
    """
    max_value_per_index = scatter_reduce(
        input, dim=dim, index=index, output_size=output_size, reduce="amax"
    )
    max_per_src_element = max_value_per_index.gather(dim, index)
    recentered_scores = input - max_per_src_element
    sum_per_index = scatter_reduce(
        recentered_scores.exp(),
        dim=dim,
        index=index,
        output_size=output_size,
        reduce="sum",
    )
    return max_value_per_index + sum_per_index.log()

class CoxPHLoss(torch.nn.Module):
    """Loss for CoxPH model. """
    allowed = ("breslow", "efron")

    def __init__(self, method):
        super().__init__()
        assert method in self.allowed, f"Method must be one of {self.allowed}"
        self.method = method

    def forward(self, log_h, y):
        log_h = log_h.flatten()
        durations, events = y.T
        # sort input
        durations, idx = durations.sort(descending=True)
        log_h = log_h[idx]
        events = events[idx]
        event_ind = events.nonzero().flatten()
        # numerator
        log_num = log_h[event_ind].mean()
        # logcumsumexp of events
        event_lcse = torch.logcumsumexp(log_h, dim=0)[event_ind]
        # number of events for each unique risk set
        _, tie_inverses, tie_count = torch.unique_consecutive(
            durations[event_ind], return_counts=True, return_inverse=True
        )
        # position of last event (lowest duration) of each unique risk set
        tie_pos = tie_count.cumsum(axis=0) - 1
        # logcumsumexp by tie for each event
        event_tie_lcse = event_lcse[tie_pos][tie_inverses]
        if self.method == "breslow":
            log_den = event_tie_lcse.mean()
        elif self.method == "efron":
            # based on https://bydmitry.github.io/efron-tensorflow.html
            # logsumexp of ties, duplicated within tie set
            tie_lse = scatter_logsumexp(log_h[event_ind], tie_inverses, dim=0)[
                tie_inverses
            ]
            # multiply (add in log space) with corrective factor
            aux = torch.ones_like(tie_inverses)
            aux[tie_pos[:-1] + 1] -= tie_count[:-1]
            event_id_in_tie = torch.cumsum(aux, dim=0) - 1
            discounted_tie_lse = (
                tie_lse
                + torch.log(event_id_in_tie)
                - torch.log(tie_count[tie_inverses])
            )
            # denominator
            log_den = log_substract(event_tie_lcse, discounted_tie_lse).mean()
        # loss is negative log likelihood
        return log_den - log_num

def concordance_index(risk, time, event):
    """
    O(n log n) implementation of https://square.github.io/pysurvival/metrics/c_index.html
    """
    assert len(risk) == len(time) == len(event)
    n = len(risk)
    order = sorted(range(n), key=time.__getitem__)
    past = SortedList()
    num = 0
    den = 0
    for i in order:
        num += len(past) - past.bisect_right(risk[i])
        den += len(past)
        if event[i]:
            past.add(risk[i])
    return num / den
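As a cross-check on the O(n log n) function quoted above, here is a quadratic reference version in plain Python. It is a sketch rather than part of the repository: the name `concordance_index_naive` is hypothetical, and it assumes there are no tied durations. Like the quoted code, it counts a pair only when the earlier sample has an observed event, and it treats tied risks as not concordant.

```python
def concordance_index_naive(risk, time, event):
    """O(n^2) reference: fraction of comparable pairs that are ranked correctly.

    A pair (i, j) is comparable when sample i has an observed event and
    time[i] < time[j]. It is concordant when the earlier failure has the
    strictly higher risk. Assumes no tied times.
    """
    assert len(risk) == len(time) == len(event)
    num = 0
    den = 0
    for i in range(len(risk)):
        if not event[i]:
            continue  # censored samples never start a comparable pair
        for j in range(len(risk)):
            if time[i] < time[j]:
                den += 1
                num += risk[i] > risk[j]
    return num / den


# Three samples; the last one is censored.
ci = concordance_index_naive(
    risk=[0.9, 0.1, 0.5], time=[1.0, 2.0, 3.0], event=[1, 1, 0]
)
print(ci)
```

On small inputs the two functions should agree, which makes the slow version a convenient test oracle when adapting the loss to a custom model.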
Thank you very much
Salvatore
Hi,
I would like to evaluate Cox LassoNet on my data for predicting end-point survival.
Is there a way to compute event probability for a given time (or for a set of given times) from a fitted LassoNetCoxRegressorCV?
It seems that model.predict(X_test) returns predictors in CoxPH assumption, so c-index can be computed, but I could not find in examples how to compute survival/event probability
Thank you!
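One common way to turn Cox risk scores into survival probabilities is the Breslow estimate of the cumulative baseline hazard. The sketch below is plain Python and is not a lassonet API: the function name `breslow_survival` is hypothetical, and it assumes that `model.predict` returns the log hazard ratio for each sample, as the question suggests.

```python
import math


def breslow_survival(log_h_train, durations, events, log_h_new, t):
    """Estimate S(t | x) = exp(-H0(t) * exp(log_h_new)).

    H0(t) is the Breslow cumulative baseline hazard: for every observed
    training event at time t_i <= t, add 1 / sum(exp(log_h_j)) over the
    training samples still at risk at t_i (duration_j >= t_i).
    """
    cumulative_baseline = 0.0
    for t_i, e_i in zip(durations, events):
        if e_i and t_i <= t:
            at_risk = sum(
                math.exp(h) for h, d in zip(log_h_train, durations) if d >= t_i
            )
            cumulative_baseline += 1.0 / at_risk
    return math.exp(-cumulative_baseline * math.exp(log_h_new))


# Toy training set with identical risk scores and three observed events.
s = breslow_survival(
    log_h_train=[0.0, 0.0, 0.0],
    durations=[1.0, 2.0, 3.0],
    events=[1, 1, 1],
    log_h_new=0.0,
    t=2.0,
)
print(s)
```

The event probability by time t is then 1 - S(t | x). The inner sum makes this quadratic in the training set size, which is acceptable for a sketch but would need sorting and a cumulative sum for large datasets.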
I am looking into the mean cross-validation error of the best model selected in LassoNet with (M set to 0.0) and hidden_dims=(1,) and cv.glmnet() [documentation: https://www.rdocumentation.org/packages/glmnet/versions/1.6/topics/cv.glmnet ] with the same set of lambda values for a classification problem. However, they do not yield similar results. The parameters used are exactly as follows:
lambdas = (0,0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000)
LassoNet=
LassoNetClassifierCV(hidden_dims=(1,),M=0.0,random_state=0,lambda_seq=lambdas,torch_seed=0,cv=LeaveOneOut())
Glmnet=
cv.glmnet(X, Y, family = "binomial", alpha = 1, lambda =lambdas, type.measure = "class", nfolds = 34) [there are 34 instances in the dataset, so nfolds=34 is same as LeaveOneOut()]
If you could explain why these two are behaving differently, it would be really helpful. Also, do you consider the matrix multiplication of skip.weight and layer.weight of the output layer equivalent to feature coefficients in the logistic regression with lasso penalty?
My feature count is 30000, and I get an error:
Loss is 511581280.0
Did you normalize input?
Choosing lambda with cross-validation: 0%| | 0/5 [01:12<?, ?it/s]
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 3, in
path = model.fit( x, y)
File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 744, in fit
self.path(X, y, return_state_dicts=False)
File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 679, in path
path = super().path(
File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 472, in path
last = self._train(
File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 331, in _train
optimizer.step(closure)
File "/opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py", line 373, in wrapper
out = func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
ret = func(self, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/optim/sgd.py", line 66, in step
loss = closure()
File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 326, in closure
assert False
AssertionError
However, when the feature count is 1000, I do not get this error.
I think there is a bug in the following lines:
lassonet/lassonet/interfaces.py
Lines 132 to 134 in 3b3b529
device is set to CUDA if torch reports that CUDA is available, but this is done after self.device is already set.
Hi,
This package is super helpful :-)!
When applying LassoNetRegressor to my data I get a constant model with model.predict(X) (for the test or validation set), i.e. a vector of predictions where all entries are equal. But the feature importances still make sense. I made the same observation in the diabetes.py example.
Do you have any idea what this is?
Thanks a lot,
Mara
Line 28 in d33bab1
This line here:
lassonet/lassonet/interfaces.py
Line 450 in adf0aaf
True
if lambda_seq is passed in without setting lambda_start to None. It expects self.lambda_start_ to exist, but that only gets set if lambda_seq is None.
Note setting lambda_start to None in init wouldn't work since lambda_seq can also be passed to path.
Commit 37d682e removed the lassonet_trainer.py file, which is still used in experiments/evaluate.py. At first glance it looks like this function is now replaced by lassonet_utils.lassonet, so it would make sense to change that.
I tried to check the page: https://lassonet.ml/ but failed.
Could you check and update the link? Thanks!
Hi,
I would like to use lassonet as an unsupervised feature selection algorithm, but I can't find an example that shows how to do this in a simple way.
The only script that shows a reconstruction example is mnist_ae.py, but it doesn't work (I get an error: LassoNetAutoEncoder does not exist!).
My use case:
I have an input matrix without labels, and I want to have a new reduced matrix with only 30% of the important features.
I noticed that lassonet uses only half of the available threads when running. How can I make it use more threads?
I’m currently testing LassoNet as a potential model in my research and would like to use a custom model. Is there a recommended way to do so? I was scanning the examples and didn’t quite see one.
I am performing a [0,1] classification task on high dimensional data, and would like to use a custom data generator / specific activation functions / add layers.
Hi,
I would like to use lassonet as an unsupervised feature selection algorithm, and I have made some attempts. However, it seems that I didn't get the correct outcome.
My use case:
I have an input matrix without labels, and I want to have a new reduced matrix with only several important features. I also want to know which features have been selected.
Dear Devs,
My name is Dufot Nicolas, and I work on picture classification using neural networks (with PyTorch).
I found your publication "LassoNet: A Neural Network with Feature Sparsity" very interesting, and your LassoNetClassifier is very useful for prioritizing pixels and identifying informative sub-parts of the image in complex picture classification.
I took the mnist_classif.py script in the example folder with the aim of adapting LassoNetClassifier to my picture data.
I understand the NumPy array inputs X_train, X_test, y_train, y_test, with X for the pixel data and y for the classification labels.
In mnist_classif.py, the data in X_train and X_test are single-channel pixels (the black and white MNIST dataset).
The data looks like this: [ [pixel data for picture 1] [pixel data for picture 2] ... [pixel data for picture n] ]
I.e., a list of pictures, each presented as a list of pixel values.
This works for single-channel pixels. The question is: how do I feed in my 3-channel color pictures?
My data looks like: [ [[ pixel data for channel 1 picture 1 ] [ pixel data for channel 2 picture 1 ] [ pixel data for channel 3 picture 1 ]] ... [[ pixel data for channel 1 picture n ] [ pixel data for channel 2 picture n ] [ pixel data for channel 3 picture n ]]]
I.e., a list of pictures, each presented as 3 sub-lists giving one value per pixel per channel. This is the standard data layout in PyTorch.
How should I deal with it?
Second question, linked to the first: how do I specify network parameters using LassoNetClassifier, and how do I make it work with my dataset?
Currently, when I try to use LassoNetClassifier with my arrays, the error "RuntimeError: mat1 and mat2 shapes cannot be multiplied (2436x28 and 3x2)" occurs. I have seen this error many times with PyTorch; it is due to wrong shape parameters for the different network layers, which also depend on the image input size and the number of channels (hence the link with the first question).
So the configuration of the neural network needs to be adapted to my dataset (larger image size than in the MNIST dataset).
Have a nice day,
Cordially.
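One way to approach the 3-channel question is to flatten each picture into a single feature row, so that every (channel, row, column) value becomes its own feature. The sketch below uses NumPy only and assumes, based on the single-channel layout described above, that LassoNetClassifier expects a 2-D array of shape (n_samples, n_features). The image size and the helper name `feature_to_pixel` are illustrative choices, not part of lassonet.

```python
import numpy as np

# Hypothetical batch: 4 RGB pictures of 8x8 pixels in PyTorch's (N, C, H, W) layout.
images = np.arange(4 * 3 * 8 * 8, dtype=np.float32).reshape(4, 3, 8, 8)

# One row per picture; each (channel, row, column) value becomes a feature.
X = images.reshape(len(images), -1)
print(X.shape)  # (4, 192)


def feature_to_pixel(feature_index, shape=(3, 8, 8)):
    """Map a flat feature index back to (channel, row, column)."""
    return tuple(int(i) for i in np.unravel_index(feature_index, shape))


print(feature_to_pixel(64))  # (1, 0, 0): first pixel of the second channel
```

With this layout a selected feature corresponds to one pixel in one channel. Selecting a pixel across all three channels at once would instead require grouping the three features that belong to the same pixel.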
I just pulled the latest version, and am trying out training with backtrack on. I am getting an error:
Initialized dense model in 28 epochs, val loss 9.84e-02, regularization 1.39e+01
Traceback (most recent call last):
File "/home/psmirnov/Code/Github/lassonet_exp/lassonet/examples/ctrpv2_lassonet_path.py", line 103, in <module>
path = model.path(inner_train_X, inner_train_y.reshape(-1), X_val=valid_X, y_val=valid_y.reshape(-1))
File "/home/psmirnov/Code/Github/lassonet_exp/lassonet/lassonet/interfaces.py", line 369, in path
self._train(
File "/home/psmirnov/Code/Github/lassonet_exp/lassonet/lassonet/interfaces.py", line 270, in _train
loss = real_loss
UnboundLocalError: local variable 'real_loss' referenced before assignment
I think in your code, it corresponds to line 260 (I added tracking of some metrics other than loss on the validation set).
I suspect this is happening when the val_obj < real_best_val_obj condition on line 249 is not met prior to early stopping breaking out of the loop. I think real_loss would be unassigned then.
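The failure mode suspected above can be reproduced and avoided in a few lines of plain Python. This is a simplified stand-in, not the code in interfaces.py: the function `train`, its arguments, and the fallback to the most recent loss are all assumptions made for illustration.

```python
def train(losses, val_objectives, initial_best, patience=2):
    """Return the training loss recorded at the best validation objective."""
    best_val_obj = initial_best
    real_loss = None  # defined before the loop so it always exists
    loss = None
    stale_epochs = 0
    for loss, val_obj in zip(losses, val_objectives):
        if val_obj < best_val_obj:
            best_val_obj = val_obj
            real_loss = loss
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break  # early stopping
    if real_loss is None:
        # No epoch improved on the starting objective: fall back to the
        # most recent loss instead of hitting an UnboundLocalError.
        real_loss = loss
    return real_loss


print(train([0.5, 0.4, 0.3], [3.0, 2.0, 1.0], initial_best=10.0))  # 0.3
print(train([0.5, 0.4, 0.3], [3.0, 2.0, 1.0], initial_best=0.5))   # 0.4
```

In the second call no validation objective beats the starting value, so early stopping fires after two epochs, which is the situation described in the report.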
The miceprotein.py example runs, printing progress on lambda and feature selection, but then gives the following error. Do you have a sense for what the problem might be? Thanks.
AttributeError: 'numpy.ndarray' object has no attribute 'log_softmax'
Detailed traceback:
File "<string>", line 1, in <module>
File "/Users/cbuerkle/Library/Python/3.8/lib/python/site-packages/lassonet/utils.py", line 26, in plot_path
score.append(model.criterion(model.predict(X_test), y_test))
File "/Users/cbuerkle/Library/Python/3.8/lib/python/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/Users/cbuerkle/Library/Python/3.8/lib/python/site-packages/torch/nn/modules/loss.py", line 1047, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "/Users/cbuerkle/Library/Python/3.8/lib/python/site-packages/torch/nn/functional.py", line 2693, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/Users/cbuerkle/Library/Python/3.8/lib/python/site-packages/torch/nn/functional.py", line 1672, in log_softmax
ret = input.log_softmax(dim)
Hello! I want to ask whether LassoNet has an early-stopping function for when the loss has stopped decreasing.
When I run the code, some errors occur.
boston_housing.py Line 39 It should be n_selected.append(save.selected.sum().cpu().numpy())
utils.py Line 38 n_selected.append(save.selected.sum().cpu().numpy())
The implementation was started on https://github.com/lasso-net/lassonet/tree/online and https://github.com/lasso-net/lassonline
Next step would be to provide plotting with mpld3
Hi,
I tried to run the code in 'Usage'. However, I encounter this error:
from lassonet import LassoNetClassifierCV
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python3.7/dist-packages/lassonet/__init__.py", line 4, in
from .interfaces import (
File "", line 1
(current_lambda=)
^
SyntaxError: invalid syntax
Could you please tell me why it's like this?
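The `(current_lambda=)` in the traceback points at the `=` specifier inside an f-string, which Python only accepts from version 3.8 onward; the traceback paths show Python 3.7. The sketch below illustrates the syntax difference. The variable value is made up, and the exact line in interfaces.py is not reproduced here.

```python
current_lambda = 0.5

# Python 3.8+ only: a trailing "=" prints the expression text and its repr.
new_style = f"{current_lambda=}"

# Equivalent spelling that also parses on Python 3.7 and earlier.
old_style = f"current_lambda={current_lambda!r}"

print(new_style)  # current_lambda=0.5
print(old_style)  # current_lambda=0.5
```

Running the package on Python 3.8 or newer avoids the error without editing the source.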
Hi,
thank you for this model and especially for the extension to the Cox/survival outcomes.
I am trying to test LassoNetCoxRegressor() for the example with the Hnscc data in python 3.9 (and for my own datasets),
I appreciate if you could help with the questions/issues:
X = pd.read_csv("x.csv")
y = pd.read_csv("y.csv")
model = LassoNetCoxRegressor(
hidden_dims=(32,), lambda_start=1e-2, path_multiplier=1.02,
gamma=1, verbose=True, tie_approximation="breslow")
X_train = np.array(X_train)
X_test = np.array(X_test)
y_train = np.array(y_train)
y_test = np.array(y_test)
path = model.path(X_train, y_train)
plot_path(model, path, np.array(X_test), np.array(y_test))
model.score(X_test,y_test) #0.0
I have a problem when I import lassonet in Python 3.7:
File "", line 1
(current lambda=)
^ SyntaxError: invalid syntax
When I delete the “=” in
lassonet/lassonet/interfaces.py
Line 491 in d740ff3
Hey, I was wondering how did you report the scores in table 1 of the paper, was it the accuracy over the test set for one trial or an average over several trials? Thanks in advance.
Hi, is it possible to run GPU lassonet with CUDA 11.2? This is not clear in the documentation. In installation I can see that the packages are trying to install CUDA 12.3, but in my environment I have previously installed CUDA 11.2 via Conda. Does this mean that installing lassonet overwrites system’s CUDA version to 12.3? My driver cannot support 12.3 yet…
Dear all, we are working on physics guided AI where we wish to use several sensors to decode a full aerodynamics field of interest (https://royalsocietypublishing.org/doi/10.1098/rspa.2020.0097). Our decoder works well. However, we wish to optimise the sensor placement that will have the best results for decoding the aerodynamic field.
This leaves us with let's say 5- 30 or 100 input to the decoder and several thousand or hundreds of thousands of outputs. How can we apply LassoNet to our problem when we want to optimise the overall (several thousand) output decoded field quality?
i.e. study which input sensor features are most important for the 'genera' field reconstruction (minimum error at all outputs together). Multiple inputs - multiple output SHAP.
If you have any ideas or have heard of such an application of Lassonet please can you let us know!?
Iordan Doytchinov, Ph.D.
Postdoctoral researcher and scientific collaborator
Ecole Polytechnique Fédérale de Lausanne (EPFL)
EPFL – TOPO
Station 18 – Bâtiment GC C2 398
CH–1015 Lausanne
Office telephone: +41216939832
Personal mobile: +33699850592
Hello! I have a question. I want to know whether LassoNet has a function that returns something like importance values or SHAP values.
Can lassonet be extended to multitask learning neural networks? If so, how do I go about implementing it? TIA
Hello, thank you for the work you have done. In my attempt to replicate the experiments reported in the LassoNet paper, I found that the results I am getting are totally different. The performance at 50 selected features is significantly lower than the one reported in the paper (<60% vs 88% for the ISOLET dataset). I have repeated the experiment on 20 or 30 runs and tried all the possible hidden_dim. I was wondering if there I am missing something or if there is a default parameter which significantly affects the performance and needs to be changed.
I will detail the steps I have done:
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from lassonet import LassoNetClassifier
from lassonet.interfaces import LassoNetClassifierCV
from lassonet.plot import plot_path
from lassonet.utils import eval_on_path
from data_utils import load_mice, load_coil, load_activity, load_isolet
import torch
import pickle
(X_train, y_train), (X_test, y_test) = load_isolet()
X_train_valid_fixed = X_train
y_train_valid_fixed = y_train
seed = None
device = 'cuda'
data_dim = X_train.shape[1]
hidden_dim = (data_dim//3,)
score_list_of_lists = []
n_selected_list_of_lists = []
lambda_list_of_lists = []
for i in range(30):
    X_train, X_val, y_train, y_val = train_test_split(X_train_valid_fixed, y_train_valid_fixed, test_size=0.125, random_state=seed)
    model = LassoNetClassifier(M=10, hidden_dims=hidden_dim, verbose=1, torch_seed=seed, random_state=seed, device=device)
    path = model.path(X_train, y_train, X_val=X_val, y_val=y_val)
    score = eval_on_path(model, path, X_test, y_test, score_function=None)
    n_selected = [save.selected.sum().item() for save in path]
    lambda_ = [save.lambda_ for save in path]
    score_list_of_lists.append(score)
    n_selected_list_of_lists.append(n_selected)
    lambda_list_of_lists.append(lambda_)
And the following to plot
plt.figure(figsize=(30, 10))
for sublist_A, sublist_B in zip(score_list_of_lists, n_selected_list_of_lists):
    plt.plot(sublist_B, sublist_A)
plt.xlabel('Features Selected')
plt.ylabel('Accuracy')
plt.title('Accuracy vs Features Selected for hidden_dim=(data_dim//3,) on ISOLET dataset --- GPU version for 30 runs')
plt.savefig('isolet_1.png')
Surprisingly, I got the following plots:
To rule out possible GPU issues, I ran the first experiment on the CPU for fewer runs (as it was taking longer)
Similarly, I repeated the first experiment on COIL dataset
I was surprised of the plots given the steps I have followed. However, I realized that similar plots were reported by a paper that studies LassoNet (especially for the case of 50 features).
I suspect the behavior should be consistent, and I'm still wondering what I might have missed. Could you kindly provide insight or assistance to help resolve these discrepancies? Thank you in advance!
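Comparing against a figure quoted "at 50 selected features" requires picking one point on each path, since the path rarely lands on exactly 50 features. The helper below is a hypothetical sketch in plain Python of one such convention (the densest model that uses at most the budget). It works on lists like `n_selected` and `score` from the loop above and does not by itself explain the discrepancy with the paper.

```python
def score_at_budget(n_selected, scores, budget=50):
    """Return (n_features, score) for the densest model within the budget."""
    candidates = [(n, s) for n, s in zip(n_selected, scores) if n <= budget]
    if not candidates:
        raise ValueError(f"no model on the path uses <= {budget} features")
    return max(candidates, key=lambda pair: pair[0])


# Made-up path values, for illustration only.
n_selected = [617, 120, 50, 48, 10]
scores = [0.95, 0.90, 0.80, 0.78, 0.40]
print(score_at_budget(n_selected, scores))             # (50, 0.8)
print(score_at_budget(n_selected, scores, budget=49))  # (48, 0.78)
```

Reporting the result this way for each of the 30 runs gives a single number per run that can be averaged, which is easier to compare with a table entry than a set of overlaid curves.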
The LassoNet paper mentions that LassoNet generalises a method, though it's unclear how/when this is the case.
In Section 1.2 Related work, the paper says "Recently, Feng and Simon (2017) proposed an input-sparse neural network, where the input weights are penalized using the group Lasso penalty. As will become evident in Section 3, our proposed method extends and generalizes this approach in a natural way."
The Feng and Simon (2017) add a sparse group Lasso on the first layer (see figure below), which is a convex combination of a Lasso and a group Lasso.
How/When does LassoNet generalize the method of Feng and Simon (2017)? Looking in Section 3, I see that LassoNet is equivalent to a standard Lasso (when M=0) and an unregularized feed-forward neural network (when M → +∞); though the connection to the method of Feng and Simon (2017) isn't mentioned.
Hello,
I noticed that you provide code to load the datasets used in table 1 but do not provide the code to replicate the experiments, mentioning that it'll be possible if there is user demand. So I was wondering if you are planning on releasing the relevant code anytime soon, as I am working on a relevant project and it'll be much easier to have (at least) a clear pipeline to follow in order to replicate the experiments.
Your assistance is highly appreciated!
Firstly, I would like to congratulate for the amazing solution you have developed. I am using it in a classification problem (ANN). I just can't figure out what are the features in the number of selected features (x-axis). Please, can you help me? Thank you very much!
As per the documentation, LassoNet is supposed to behave as a Linear Regressor when the hyperparameter M is set to 0. I'm comparing this configuration with that of another model which can act as a Linear Regressor, i.e. Glmnet (https://www.rdocumentation.org/packages/glmnet/versions/1.6/topics/cv.glmnet), along with a Lasso penalty. This is to check if they yield the same/ similar optimal lambda value, cross-validation error and feature coefficients.
As per my understanding, the two only differ in the objective function. Glmnet uses the Gaussian equation as the objective function while operating as a Linear Regressor and it differs from that of LassoNet by a constant multiple of 0.5. Hence, the optimal lambda value in Glmnet should be half of that in LassoNet. However, after repeated attempts I've found that to not be the case. The minimum cross-validation error and coefficients also differ between the two models.
To keep the comparison fair, I used the same standardized dataset, the same list of lambda values (Lambda_.txt) that the LassoNet model takes up automatically, along with the same 5 fold cross validation. I've given a code snippet below for better understanding:
lambdas = [Lambda_.txt]
LassoNetRegressorCV(hidden_dims=(2,), M=0.0, random_state=42, torch_seed=0, cv=5)
cv.glmnet(X, y, nfolds=5, alpha = 1, lambda= lambdas, intercept=False)
It'd be really helpful for me if you could help explain this difference as I intend to use LassoNet in further research endeavors. Please let me know if you need any further clarification.
Thanks.
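The factor-of-two argument in this question can be written out explicitly. The first line is the Gaussian objective documented for glmnet with alpha = 1 and no intercept. The second line is an assumption about LassoNet with M = 0, namely a mean-squared-error loss with a plain L1 penalty on the skip-layer weights; it reflects the poster's reading rather than a statement confirmed in this thread.

```latex
% glmnet, family = "gaussian", alpha = 1, no intercept
\min_{\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - x_i^{\top}\beta\bigr)^2
  + \lambda_{\text{glmnet}}\,\lVert\beta\rVert_1

% LassoNet with M = 0 (assumed form: MSE loss, L1 penalty on skip weights \theta)
\min_{\theta}\; \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - x_i^{\top}\theta\bigr)^2
  + \lambda_{\text{LassoNet}}\,\lVert\theta\rVert_1

% Multiplying the first objective by 2 leaves its minimizer unchanged,
% so the two problems coincide when
\lambda_{\text{LassoNet}} = 2\,\lambda_{\text{glmnet}}
```

Under these assumptions the two solutions should match only if both solvers converge to the exact minimizer, so differences in optimization or in data standardization can still produce different coefficients.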
I'm trying to pull the model weights selected in the final optimized regularization path for LassoNet. Is there anyway to show the weights for each of the features used in that path? I know you can print out the most complex path using path[-1] but how do I access the weights there? I've tried using path[-1].state_dict() but that didn't work. Is there a special way to call the weights in the path? I know you can use model.feature_importance_ to show the feature importance but that doesn't serve as the weight in the model. Correct? The model.coef_ attribute doesn't work just to print out the weight. Could you please provide some guidance on how to obtain the weights or coefficients of the LassoNet model? Thanks for your help.
In the example mnist_ae.py the module LassoNetAutoEncoder is imported from lassonet, but this is not in the documentation?
I am using LassoNet for my thesis, but I want to do so with a quantile loss function instead of mean squared error. Going through the code, I found a variable self.criterion set to MSE loss by default (interfaces.py, class LassoNetRegressor, line 566). After instantiating the class, I manually set self.criterion to a quantile loss and trained the LassoNet.
However, the loss didn't converge and remained high even after several epochs; the training assertion then failed and it exited with an error at line 316 of interfaces.py.
Can someone suggest a solution?
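For reference, the quantile loss mentioned above is usually defined as the pinball loss. The sketch below writes it in plain Python for clarity; the function name `pinball_loss` is an illustrative choice. A PyTorch criterion would apply the same formula to tensors, and the sketch does not address the convergence problem itself.

```python
def pinball_loss(y_true, y_pred, quantile=0.5):
    """Mean pinball (quantile) loss for one quantile level in (0, 1)."""
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        error = y - y_hat
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += max(quantile * error, (quantile - 1.0) * error)
    return total / len(y_true)


print(pinball_loss([1.0], [0.0], quantile=0.9))  # 0.9
```

At quantile 0.5 the loss is half the mean absolute error, so its scale differs from MSE, which may matter when reusing a learning rate or lambda range tuned for MSE.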
Problem: Calling the model.score() function repeatedly gives different results. I suspect that's because LassoNet doesn't call model.eval() to stop the stochastic components (e.g., dropout).
Solution: Call model.eval() inside the .score() function. See: https://stackoverflow.com/questions/60018578/what-does-model-eval-do-in-pytorch
Hi,
I wonder why does the l1_regularization_skip() in model.py use L2-norm, then isn't it the same as l2_regularization_skip()? Thanks.
Here is the link to the function:
Lines 62 to 63 in e3a3754
Hi, we are very thankful for sharing this code, and we would like to apply it to a group lasso problem. We have a classification problem where the inputs are grouped in blocks all of the same sizes.
We want to adapt your code.
In our case, each input node j has k features that we would like to have the same theta_j.
One form of writing our problem is modifying the constraints In equation (2) of the Lassonet paper, just adding another index for W and leaving the rest as it is.
Do you think it's possible to make minor changes to your code to do that? Could you help us with that?
@xyang23 Could you add some README in HNSCC_data to explain how that data is made?
In the classifier, is it possible for y to be a vector instead of a single number?
i hope to use lassonet in S4, is it okay to do that?
Hi, thanks for sharing the code. I have some issues with the implementation of group lasso, here are my problems.
About Algorithm 4 of the paper: In line 14, the notation indicates the vector theta_j \in R^k, having the same dimension with W_j^{(1)}. That makes me a little confused. K is the size of the first hidden layer, and if it is a multi-class classification problem, theta should be \in R^{d*c}, where c is the number of classes (since theta is a linear classifier), and d is the input feature dimension. So theta_j should be in R^{c} in my opinion. Do I understand it correctly?
About Section 6 of the paper: In the group lasso problem, how does the group L1 norm regularizer construct? Assuming it is a multi-class classification problem, and theta is in R^{d*c}, where d is the number of features, and c is the number of classes. So if we want to choose a sparse subset of features for the linear classifier, the regularization term should be |theta|_ {1} = \ sum_{i=1}^d |\sum_{j=1}^c theta_{i,j}^2|, am I right?
How should I use the API provided in this repo? For example, in the function prox (in lassonet/prox.py), I found the theta (variable v in the code) is calculated by:
norm_v = torch.norm(v, p=2, dim=0)
It seems that this has the same formulation as the pseudocode in line 6 of Algorithm 4 of the paper. Does this mean that the function prox can solve the feature subset selection problem I described above?
What is the function inplace_group_prox (in lassonet/prox.py) used for? I notice it passes each group of parameters to the prox function. That confuses me because I think prox is used to give the features sparse weights, and we hope the group lasso can make a group of features share similar weights (for example, features in group 1 all have large weights, and features in group 2 all have small weights, etc.). However, if we pass a group of features into prox, I would expect this function to return sparse weights, which means the features in this single group have sparse weights (some weights are big and some weights are small), and not that the features in this group share similar weights (all large or all small). Do I understand this function correctly?
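A small numeric example may help with the last question. The sketch below implements the standard group soft-thresholding operator, i.e. the proximal operator of an L2-norm penalty on one group. It is not the repository's prox function, which may do more than this, and the name `group_soft_threshold` is hypothetical.

```python
import math


def group_soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_2 applied to one group of weights."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= lam:
        # The whole group is removed together.
        return [0.0] * len(v)
    # Otherwise every weight in the group shrinks by the same factor.
    scale = 1.0 - lam / norm
    return [scale * x for x in v]


print(group_soft_threshold([3.0, 4.0], lam=1.0))   # roughly [2.4, 3.2]
print(group_soft_threshold([3.0, 4.0], lam=10.0))  # [0.0, 0.0]
```

The operator either zeroes a whole group or rescales it by a common factor, so the sparsity it induces is between groups rather than inside a group. It does not push the weights within a group toward similar values.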
There is no LassoNetAutoEncoder in the lassonet folder, so [examples](https://github.com/lasso-net/lassonet/tree/master/examples)/mnist_ae.py cannot run correctly.
Thanks for your help in advance!