lasso-net / lassonet

Feature selection in neural networks

License: MIT License

Python 97.84% Makefile 1.52% TeX 0.64%

lassonet's Issues

CoxPHLoss does not handle batches where all samples are censored

I'm training a custom model with the CoxPHLoss and have noticed that when using the Efron tie method the training will fail when a batch only contains censored events. The code giving the error is in lassonet/utils.py:

if hasattr(torch.Tensor, "scatter_reduce_"):
    # version >= 1.12
    def scatter_reduce(input, dim, index, reduce, *, output_size=None):
        src = input
        if output_size is None:
            output_size = index.max() + 1
        return torch.empty(output_size, device=input.device).scatter_reduce(
            dim=dim, index=index, src=src, reduce=reduce, include_self=False
        )

else:
    scatter_reduce = torch.scatter_reduce

When all samples are censored, index is an empty tensor and index.max() fails.
Also, if I understand correctly, the Cox likelihood would be zero in that case so that the log likelihood is not defined.
For now I have resorted to skipping these problematic batches, but I was thinking that it might be helpful to handle this edge case directly in CoxPHLoss. Not sure what's the best way of doing it though.
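
One possible workaround is to guard the call, as sketched below. It assumes that a batch without events should contribute zero loss (the partial likelihood is then an empty product), which is a convention rather than something the library states. `guarded_cox_loss` and `dummy_loss` are hypothetical names used only for illustration; `y` is assumed to hold durations in column 0 and event indicators in column 1, as in CoxPHLoss.forward.

```python
import torch


def guarded_cox_loss(loss_fn, log_h, y):
    """Call loss_fn only when the batch contains at least one event.

    Hypothetical helper: y[:, 0] holds durations, y[:, 1] event indicators.
    """
    events = y[:, 1]
    if not bool((events != 0).any()):
        # No events: return a zero that stays connected to the graph,
        # so loss.backward() still works and yields zero gradients.
        return log_h.sum() * 0.0
    return loss_fn(log_h, y)


# Stand-in loss used only to demonstrate the guard (not the real CoxPHLoss).
def dummy_loss(log_h, y):
    return log_h.mean()


log_h = torch.randn(4, 1, requires_grad=True)
all_censored = torch.tensor([[5.0, 0.0], [3.0, 0.0], [2.0, 0.0], [1.0, 0.0]])
loss = guarded_cox_loss(dummy_loss, log_h, all_censored)
loss.backward()
```

Skipping the batch, as described above, has the same effect on the parameters; the guard just keeps the training loop free of special cases.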

Need help regarding extracting lambda values and feature_importances_

Hi, thank you so much for the wonderful project and for providing the example code. I am currently testing the LassoNet classifier on a dataset of mine. However, I need the best lambda value (best model), and I also want to see which features are selected for this lambda value. So far, I have tried using model.best_lambda_, without success. Some help and direction would be appreciated.
Second question: in the Diabetes.py file, I see that the importance of each feature is calculated using model.feature_importances_.numpy(). I am a bit confused by this approach: shouldn't we be using the features from the best model? It might be a misunderstanding on my part, but a clarification would be very welcome.
Looking forward to your help.
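
As a rough illustration of what "the features of the best model" could mean, the sketch below picks the path entry with the lowest validation loss and reads off its selected features. `PathItem` is a hypothetical stand-in for the objects returned by model.path(); the `lambda_` and `selected` fields mirror attributes used elsewhere in these issues, while `val_loss` is an assumption that should be checked against the installed version.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PathItem:
    """Hypothetical stand-in for one entry of the regularization path."""
    lambda_: float
    selected: List[bool]   # mask of features still active at this lambda
    val_loss: float        # assumed field: validation loss at this lambda


def best_on_path(path):
    """Return (lambda, indices of selected features) for the lowest val_loss."""
    best = min(path, key=lambda item: item.val_loss)
    features = [i for i, keep in enumerate(best.selected) if keep]
    return best.lambda_, features


path = [
    PathItem(0.1, [True, True, True, True], 0.52),
    PathItem(1.0, [True, False, True, True], 0.47),
    PathItem(10.0, [True, False, False, False], 0.80),
]
best_lambda, best_features = best_on_path(path)
print(best_lambda, best_features)
```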

Some errors

Hi! There are some errors with 0.0.14 and Python 3.11. Could you tell me how to resolve them?

unable to get repr for lassonet class

Hi, thank you so much for your wonderful code.
But I have run into a problem: in PyCharm's debug mode, it shows "reg: Unable to get repr for <'class 'lassonet.interfaces.LassoNetRegressor'>", as in the picture below.
Is my Python package version wrong?
Thank you very much!
Look forward to your reply.

Using the Cox loss and methods with a custom model

Great work,

I found your approach very interesting and am trying to generalize it to different PyTorch architectures.

I want to test your approach with custom models and other PyTorch models. The idea is basically to take a PyTorch model of arbitrary architecture and test its ability to predict survival.

For example, I wanted to test with a simple PyTorch model.

Let's say:

consider a simple PyTorch loop with a generic PyTorch model;
the idea is to turn it into a model that predicts survival.

To explain this better, below there is:

simple code that turns the task into binary classification (I used the dataset you provided, just to have code that works);
the functions from the repository that I think may be useful for the task.

What I am trying to understand, in this case, is:

how to modify a simple architecture for survival prediction (I am not exactly sure what the last layer should be);
how to incorporate the loss into the training loop. More broadly, how to modify the loop so that a PyTorch model (which can have different layers, different architectures and so on) can be used for the survival task with the loss you provide.

Taking this dataset and starting from your example:

from pathlib import Path
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from lassonet import LassoNetCoxRegressor
from lassonet import  plot_path
res_dir = './survival/'
X = np.genfromtxt(res_dir + "hnscc_x.csv", delimiter=",", skip_header=1)
y = np.genfromtxt(res_dir +  "hnscc_y.csv", delimiter=",", skip_header=1)

This is a simple version that models survival as binary classification:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, random_split, SubsetRandomSampler, ConcatDataset, Dataset
import pandas as pd
import seaborn as sns
# creating a simple MLP
class FCNNC(nn.Module):
    def __init__(self, input_size, constraint_size, hidden_size, num_classes):
        super(FCNNC, self).__init__()
        self.fc1 = nn.Linear(input_size, constraint_size) 
        self.fc2 = nn.Linear(constraint_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = torch.tanh(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        x = self.fc3(x)
        return x

# simple class for the dataset
class DataClassifier(Dataset):  
    def __init__(self, X_train, y_train):
        self.X = torch.from_numpy(X_train.astype(np.float32))
        self.y = torch.from_numpy(y_train).type(torch.LongTensor)
        self.len = self.X.shape[0]

    def __getitem__(self, index):
        return self.X[index], self.y[index]  
    
    def __len__(self):
        return self.len  
# binary accuracy
def multi_acc(y_pred, y_test):
    _, y_pred = torch.max(y_pred, dim = 1)    
    
    correct_pred = (y_pred == y_test).float()
    acc = correct_pred.sum() / len(correct_pred)
    
    acc = torch.round(acc * 100)
    
    return acc


# transforming in binary classification
batch_size = 2048
X_train, X_test, Y_train, Y_test = train_test_split(X, y[:,1], random_state=0)
traindata = DataClassifier(X_train, Y_train)
trainloader = torch.utils.data.DataLoader(traindata, batch_size=batch_size, shuffle=True)

valdata = DataClassifier(X_test,Y_test)
valloader = torch.utils.data.DataLoader(valdata, batch_size=X_test.shape[0], shuffle=False)


device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
criterion = nn.CrossEntropyLoss()
model = FCNNC(X.shape[1],20,20,2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)


n_epochs =1000
%matplotlib inline

# simple training loop to store results and plotting
accuracy_stats = {
        'train': [],
        "val": []
    }
loss_stats = {
        'train': [],
        "val": []
    }
for epoch in range(n_epochs):
    running_loss = 0.0
    train_epoch_acc = 0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        inputs = inputs.to(device)
        labels = labels.to(device)
        model.to(device)
            # set optimizer to zero grad to remove previous epoch gradients
        optimizer.zero_grad()
            
            # forward propagation
        outputs = model(inputs)
            
        loss = criterion(outputs, labels)
        acc = multi_acc(outputs, labels)
            
            # backward propagation
        loss.backward()
            # optimize
        optimizer.step()
            
        running_loss += loss.item()
        train_epoch_acc += acc.item()
    
    with torch.no_grad():
        val_epoch_loss = 0
        val_epoch_acc = 0
        model.eval()
        for X_val_batch, y_val_batch in valloader:
            X_val_batch = X_val_batch.to(device)
            y_val_batch = y_val_batch.to(device)

            y_val_pred = model(X_val_batch)

            val_loss = criterion(y_val_pred, y_val_batch)
            val_acc = multi_acc(y_val_pred, y_val_batch)

            val_epoch_loss += val_loss.item()
            val_epoch_acc += val_acc.item()
    
        loss_stats['train'].append(running_loss/len(trainloader))
        loss_stats['val'].append(val_epoch_loss/len(valloader))
        accuracy_stats['train'].append(train_epoch_acc/len(trainloader))
        accuracy_stats['val'].append(val_epoch_acc/len(valloader))
                              
    if epoch % 50 == 0:
        print(f'Epoch {epoch+0:03}: | Train Loss: {running_loss/len(trainloader):.5f} | Val Loss: {val_epoch_loss/len(valloader):.5f} | Train Acc: {train_epoch_acc/len(trainloader):.3f}| Val Acc: {val_epoch_acc/len(valloader):.3f}')
    
train_val_acc_df = pd.DataFrame.from_dict(accuracy_stats).reset_index().melt(id_vars=['index']).rename(columns={"index":"epochs"})
train_val_loss_df = pd.DataFrame.from_dict(loss_stats).reset_index().melt(id_vars=['index']).rename(columns={"index":"epochs"})
    # Plot the dataframes
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(20,7))
sns.lineplot(data=train_val_acc_df, x = "epochs", y="value", hue="variable",  ax=axes[0]).set_title('Train-Val Accuracy/Epoch')
sns.lineplot(data=train_val_loss_df, x = "epochs", y="value", hue="variable", ax=axes[1]).set_title('Train-Val Loss/Epoch')

The idea is to start from a very simple example and turn the model into one that can handle censored data.

I was highlighting this code from your repository:

import torch
from sortedcontainers import SortedList


def log_substract(x, y):
    """log(exp(x) - exp(y))"""
    return x + torch.log1p(-(y - x).exp())


def scatter_logsumexp(input, index, *, dim=-1, output_size=None):
    """Inspired by torch_scatter.logsumexp
    Uses torch.scatter_reduce for performance
    """
    max_value_per_index = scatter_reduce(
        input, dim=dim, index=index, output_size=output_size, reduce="amax"
    )
    max_per_src_element = max_value_per_index.gather(dim, index)
    recentered_scores = input - max_per_src_element
    sum_per_index = scatter_reduce(
        recentered_scores.exp(),
        dim=dim,
        index=index,
        output_size=output_size,
        reduce="sum",
    )
    return max_value_per_index + sum_per_index.log()

class CoxPHLoss(torch.nn.Module):
    """Loss for CoxPH model. """

    allowed = ("breslow", "efron")

    def __init__(self, method):
        super().__init__()
        assert method in self.allowed, f"Method must be one of {self.allowed}"
        self.method = method

    def forward(self, log_h, y):
        log_h = log_h.flatten()

        durations, events = y.T

        # sort input
        durations, idx = durations.sort(descending=True)
        log_h = log_h[idx]
        events = events[idx]

        event_ind = events.nonzero().flatten()

        # numerator
        log_num = log_h[event_ind].mean()

        # logcumsumexp of events
        event_lcse = torch.logcumsumexp(log_h, dim=0)[event_ind]

        # number of events for each unique risk set
        _, tie_inverses, tie_count = torch.unique_consecutive(
            durations[event_ind], return_counts=True, return_inverse=True
        )

        # position of last event (lowest duration) of each unique risk set
        tie_pos = tie_count.cumsum(axis=0) - 1

        # logcumsumexp by tie for each event
        event_tie_lcse = event_lcse[tie_pos][tie_inverses]

        if self.method == "breslow":
            log_den = event_tie_lcse.mean()

        elif self.method == "efron":
            # based on https://bydmitry.github.io/efron-tensorflow.html

            # logsumexp of ties, duplicated within tie set
            tie_lse = scatter_logsumexp(log_h[event_ind], tie_inverses, dim=0)[
                tie_inverses
            ]
            # multiply (add in log space) with corrective factor
            aux = torch.ones_like(tie_inverses)
            aux[tie_pos[:-1] + 1] -= tie_count[:-1]
            event_id_in_tie = torch.cumsum(aux, dim=0) - 1
            discounted_tie_lse = (
                tie_lse
                + torch.log(event_id_in_tie)
                - torch.log(tie_count[tie_inverses])
            )

            # denominator
            log_den = log_substract(event_tie_lcse, discounted_tie_lse).mean()

        # loss is negative log likelihood
        return log_den - log_num


def concordance_index(risk, time, event):
    """
    O(n log n) implementation of https://square.github.io/pysurvival/metrics/c_index.html
    """
    assert len(risk) == len(time) == len(event)
    n = len(risk)
    order = sorted(range(n), key=time.__getitem__)
    past = SortedList()
    num = 0
    den = 0
    for i in order:
        num += len(past) - past.bisect_right(risk[i])
        den += len(past)
        if event[i]:
            past.add(risk[i])
    return num / den

Thank you very much

Salvatore
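
A minimal sketch of how such a loop might look, under stated assumptions: the last layer has a single output interpreted as the log hazard, and the loss is a plain negative Cox partial log-likelihood that ignores tied durations (a simplification, not the Breslow/Efron handling in CoxPHLoss). The data is synthetic and `cox_partial_nll` is a hypothetical helper, not a function from the repository.

```python
import torch
import torch.nn as nn


def cox_partial_nll(log_h, durations, events):
    """Negative Cox partial log-likelihood, ignoring tied durations."""
    log_h = log_h.flatten()
    order = torch.argsort(durations, descending=True)
    log_h, events = log_h[order], events[order].bool()
    if not events.any():
        return log_h.sum() * 0.0  # no events in the batch: nothing to fit
    # After sorting by decreasing duration, the risk set of sample i is
    # samples 0..i, so logcumsumexp gives the log of the summed hazards.
    log_risk = torch.logcumsumexp(log_h, dim=0)
    return (log_risk - log_h)[events].mean()


torch.manual_seed(0)
# Synthetic data for illustration only: 64 samples, 10 features.
X = torch.randn(64, 10)
durations = torch.rand(64) * 100
events = (torch.rand(64) < 0.7).float()

# Any architecture can be used as long as the last layer has one output,
# read as the log hazard (no softmax or sigmoid on top).
model = nn.Sequential(nn.Linear(10, 20), nn.Tanh(), nn.Linear(20, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for epoch in range(20):
    optimizer.zero_grad()
    loss = cox_partial_nll(model(X), durations, events)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

To use the repository's CoxPHLoss quoted above instead, the same loop should apply with `y` passed as a two-column tensor of durations and events, since its forward(log_h, y) unpacks `y.T` into those two vectors.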

Prediction of survival probability at a specific time with LassoNet

Hi,
I would like to evaluate Cox LassoNet on my data for predicting end-point survival.

Is there a way to compute event probability for a given time (or for a set of given times) from a fitted LassoNetCoxRegressorCV?

It seems that model.predict(X_test) returns predictors under the CoxPH assumption, so the c-index can be computed, but I could not find in the examples how to compute survival/event probabilities.
Thank you!
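
One generic way to go from Cox risk scores to survival probabilities is the Breslow estimator of the baseline cumulative hazard H0, combined with S(t | x) = exp(-H0(t) * exp(log_h)). The numpy sketch below is not a lassonet API; it assumes that the fitted model's predictions can be used as the log hazard `log_h` of each sample, which should be verified against the library. The function names are hypothetical.

```python
import numpy as np


def breslow_cumulative_hazard(log_h, durations, events):
    """Baseline cumulative hazard H0 at each distinct event time (Breslow)."""
    risk = np.exp(log_h)
    events = events.astype(bool)
    event_times = np.unique(durations[events])
    increments = np.array([
        # number of events at time t divided by the total risk still at risk
        events[durations == t].sum() / risk[durations >= t].sum()
        for t in event_times
    ])
    return event_times, np.cumsum(increments)


def survival_probability(log_h, times, event_times, cum_hazard):
    """S(t | x) = exp(-H0(t) * exp(log_h)); rows are samples, columns times."""
    # H0 is a step function: take its value at the last event time <= t,
    # and 0 before the first event time.
    idx = np.searchsorted(event_times, times, side="right")
    h0 = np.concatenate([[0.0], cum_hazard])[idx]
    return np.exp(-np.outer(np.exp(log_h), h0))


# Toy training data: three samples with equal risk, the second one censored.
train_log_h = np.zeros(3)
train_durations = np.array([1.0, 2.0, 3.0])
train_events = np.array([1, 0, 1])
event_times, cum_hazard = breslow_cumulative_hazard(
    train_log_h, train_durations, train_events
)
surv = survival_probability(np.zeros(1), np.array([0.5, 2.0, 3.0]),
                            event_times, cum_hazard)
```

The event probability by time t is then 1 - S(t | x). The baseline hazard is estimated on the training set and reused for test samples.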

Comparison with cv.glmnet(alpha=1)

I am comparing the mean cross-validation error of the best model selected by LassoNet (with M set to 0.0 and hidden_dims=(1,)) against cv.glmnet() [documentation: https://www.rdocumentation.org/packages/glmnet/versions/1.6/topics/cv.glmnet ], using the same set of lambda values for a classification problem. However, they do not yield similar results. The parameters used are exactly as follows:

lambdas = (0,0.00001,0.0001,0.001,0.01,0.1,1,10,100,1000,10000)

LassoNet=

LassoNetClassifierCV(hidden_dims=(1,),M=0.0,random_state=0,lambda_seq=lambdas,torch_seed=0,cv=LeaveOneOut())

Glmnet=

cv.glmnet(X, Y, family = "binomial", alpha = 1, lambda =lambdas, type.measure = "class", nfolds = 34) [there are 34 instances in the dataset, so nfolds=34 is same as LeaveOneOut()]

If you could explain why these two are behaving differently, it would be really helpful. Also, do you consider the matrix multiplication of skip.weight and layer.weight of the output layer equivalent to feature coefficients in the logistic regression with lasso penalty?

Large dataset error

My dataset has 30000 features, and I get an error:
Loss is 511581280.0
Did you normalize input?
Choosing lambda with cross-validation: 0%| | 0/5 [01:12<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 3, in
    path = model.fit( x, y)
  File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 744, in fit
    self.path(X, y, return_state_dicts=False)
  File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 679, in path
    path = super().path(
  File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 472, in path
    last = self._train(
  File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 331, in _train
    optimizer.step(closure)
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py", line 373, in wrapper
    out = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/sgd.py", line 66, in step
    loss = closure()
  File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 326, in closure
    assert False
AssertionError

However, when the number of features is 1000, I do not get this error.
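
The "Did you normalize input?" message in the log suggests that unscaled features may be involved. A minimal sketch of standardizing the features with statistics from the training set only is shown below; the data is synthetic and `standardize` is a hypothetical helper (scikit-learn's StandardScaler does the same job).

```python
import numpy as np


def standardize(X_train, X_test):
    """Scale features to zero mean and unit variance using training statistics."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # constant features: avoid division by zero
    return (X_train - mean) / std, (X_test - mean) / std


rng = np.random.default_rng(0)
X_train = rng.normal(loc=1000.0, scale=50.0, size=(100, 5))
X_train[:, 4] = 7.0  # a constant column
X_test = rng.normal(loc=1000.0, scale=50.0, size=(20, 5))
X_train_s, X_test_s = standardize(X_train, X_test)
```

Whether this resolves the assertion for 30000 features is not something the issue confirms; it is only the first thing the log message points to.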

Check for presence of cuda is not being saved to self

I think there is a bug in the following lines:

lassonet/lassonet/interfaces.py

Lines 132 to 134 in 3b3b529

self.device = device
if device is None:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

device is set to CUDA if torch reports that CUDA is available, but that is done after self.device has already been set.
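
A sketch of the reordering this report implies: resolve the default first, then store the resolved value. `DeviceHolder` is a hypothetical stand-in for the real constructor, and whether the maintainers intended self.device to stay None is not stated in the issue.

```python
import torch


class DeviceHolder:
    """Hypothetical stand-in for the constructor discussed above."""

    def __init__(self, device=None):
        # Resolve the default first, then store the resolved value.
        if device is None:
            device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.device = device


holder = DeviceHolder()
print(holder.device)
```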

model.predict() gives constant values

Hi,

This package is super helpful :-)!

When applying LassoNetRegressor to my data, I get a constant model from model.predict(X) (for the test or validation set), i.e. a vector of predictions where all entries are equal. But the feature importances still make sense. I made the same observation in the diabetes.py example.

Do you have any idea what this is?

Thanks a lot,
Mara

need to add device from input

lassonet/lassonet/utils.py

Line 28 in d33bab1

return torch.empty(output_size).scatter_reduce(

suggested fix:
return torch.empty(output_size, device=input.device).scatter_reduce(
dim=dim, index=index, src=src, reduce=reduce, include_self=False
)

Passing in lambda_seq does not change default value of lambda_start, causing bug in path()

This line here:

lassonet/lassonet/interfaces.py

Line 450 in adf0aaf

if self.lambda_start == "auto":

will evaluate to True if lambda_seq is passed in without setting lambda_start to None. It expects self.lambda_start_ to exist, but that only gets set if lambda_seq is None.

Note setting lambda_start to None in init wouldn't work since lambda_seq can also be passed to path.

lassonet_trainer no longer exists.

Commit 37d682e removed the lassonet_trainer.py file, which is still used in experiments/evaluate.py. At first glance it looks like this function is now replaced by lassonet_utils.lassonet, so it would make sense to change that.

site cannot be reached

I tried to check the page: https://lassonet.ml/ but failed.

Could you check and update the link? Thanks!

lassonet as unsupervised feature selection algorithm (LassoNetAutoEncoder does not exist !)

Hi,
I would like to use lassonet as an unsupervised feature selection algorithm, but I can't find an example that shows how to do this in a simple way.
The only script that shows a reconstruction example is mnist_ae.py, but it doesn't work (I get an error: LassoNetAutoEncoder does not exist!).

My use case:
I have an input matrix without labels, and I want to have a new reduced matrix with only 30% of the important features.

How to set the number of threads when using lassonet?

I noticed that lassonet uses half of the available threads when running. How can I make it use more threads?
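
Assuming the CPU work is done by PyTorch's intra-op thread pool (an assumption about lassonet, not something confirmed in this issue), the thread count can be raised with torch.set_num_threads before training. Seeing half of the logical threads in use is possibly explained by PyTorch defaulting to the number of physical cores.

```python
import os

import torch

# Assumption: lassonet's CPU computations run on PyTorch's intra-op threads.
n_threads = os.cpu_count() or 1
torch.set_num_threads(n_threads)
print(torch.get_num_threads())
```

Setting the OMP_NUM_THREADS or MKL_NUM_THREADS environment variables before starting Python is another common way to control this; using more threads than physical cores does not necessarily make training faster.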

Add instructions to use a custom model

I’m currently testing LassoNet as a potential model in my research and would like to use a custom model. Is there a recommended way to do so? I was scanning the examples and didn’t quite see one.

I am performing a [0,1] classification task on high dimensional data, and would like to use a custom data generator / specific activation functions / add layers.

LassoNet-Unsupervised Feature Selection

Hi,

I would like to use lassonet as an unsupervised feature selection algorithm, and I have made some attempts. However, it seems that I didn't get the correct outcome.

My use case:

I have an input matrix without labels, and I want to have a new reduced matrix with only several important features. I also want to know which features have been selected.

Lassonet Python library data input

Dear Devs,

My name is Dufot Nicolas, and I work on image classification using neural networks (with PyTorch).
I found your publication "LassoNet: A Neural Network with Feature Sparsity" very interesting, and LassoNetClassifier is very useful for prioritizing pixels and identifying informative sub-parts of an image in complex image classification.

I took the mnist_classif.py script from the example folder in order to adapt LassoNetClassifier to my image data.
I understand the numpy array inputs X_train, X_test, y_train, y_test, with X for the pixel data and y for the classification labels.

In mnist_classif.py, the data in X_train and X_test are single-channel pixels (the black-and-white MNIST dataset).
The data looks like this: [ [pixel data for picture 1] [pixel data for picture 2] ... [pixel data for picture n] ]
i.e. a list of pictures, each presented as a list of pixel values.

This works for single-channel pixels. The question is: how do I feed in my 3-channel color pictures?

My data looks like: [ [[ pixel data for channel 1 picture 1 ] [ pixel data for channel 2 picture 1 ] [ pixel data for channel 3 picture 1 ]] ... [[ pixel data for channel 1 picture n ] [ pixel data for channel 2 picture n ] [ pixel data for channel 3 picture n ]]]
i.e. a list of pictures, each presented as 3 sub-lists, one per channel, giving 3 values per pixel. This is the standard data layout in PyTorch.

How should I deal with this?

Second question, linked to the first: how do I specify the network parameters with LassoNetClassifier, and how do I make it work with my dataset?

Currently, when I try to use LassoNetClassifier with my arrays, the error "RuntimeError: mat1 and mat2 shapes cannot be multiplied (2436x28 and 3x2)" occurs. I have seen this error many times with PyTorch; it is caused by mismatched shape parameters between layers, which also depend on the image input size and the number of channels (hence the link with the first question).
So the configuration of the neural network needs to be adapted to my dataset (larger images than in the MNIST dataset).

Have a nice day,
Cordially.
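
Since the MNIST example described above passes one flat row of pixel values per picture, one plausible approach is to flatten each colour image into a single row as well, so that every (channel, row, column) value becomes its own feature. The sketch below uses synthetic data and plain numpy; the image size is an arbitrary choice for illustration.

```python
import numpy as np

# Synthetic stand-in for a batch of colour images:
# shape (n_images, channels, height, width)
n_images, channels, height, width = 8, 3, 28, 28
images = np.random.default_rng(0).random((n_images, channels, height, width))

# One row per image, one column per (channel, row, column) value.
X = images.reshape(n_images, -1)

# Map a flat feature index back to its position in the image,
# e.g. to see which pixel and channel a selected feature refers to.
flat_index = 1000
channel, row, col = np.unravel_index(flat_index, (channels, height, width))
print(X.shape, channel, row, col)
```

With this layout a selected feature is one channel of one pixel, so the three channels of a pixel can be kept or dropped independently.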

Having trouble with backtrack option

I just pulled the latest version, and am trying out training with backtrack on. I am getting an error:

Initialized dense model in 28 epochs, val loss 9.84e-02, regularization 1.39e+01
Traceback (most recent call last):
  File "/home/psmirnov/Code/Github/lassonet_exp/lassonet/examples/ctrpv2_lassonet_path.py", line 103, in <module>
    path = model.path(inner_train_X, inner_train_y.reshape(-1), X_val=valid_X, y_val=valid_y.reshape(-1))
  File "/home/psmirnov/Code/Github/lassonet_exp/lassonet/lassonet/interfaces.py", line 369, in path
    self._train(
  File "/home/psmirnov/Code/Github/lassonet_exp/lassonet/lassonet/interfaces.py", line 270, in _train
    loss = real_loss
UnboundLocalError: local variable 'real_loss' referenced before assignment

I think in your code, it corresponds to line 260 (I added tracking of some metrics other than loss on the validation set).

I suspect this is happening when val_obj < real_best_val_obj condition on line 249 is not met prior to early stopping breaking out of the loop. I think real_loss would be unassigned then.

miceprotein.py

The miceprotein.py example runs, printing progress on lambda and feature selection, but then gives the following error. Do you have a sense for what the problem might be? Thanks.

AttributeError: 'numpy.ndarray' object has no attribute 'log_softmax'

Detailed traceback:
  File "<string>", line 1, in <module>
  File "/Users/cbuerkle/Library/Python/3.8/lib/python/site-packages/lassonet/utils.py", line 26, in plot_path
    score.append(model.criterion(model.predict(X_test), y_test))
  File "/Users/cbuerkle/Library/Python/3.8/lib/python/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/cbuerkle/Library/Python/3.8/lib/python/site-packages/torch/nn/modules/loss.py", line 1047, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "/Users/cbuerkle/Library/Python/3.8/lib/python/site-packages/torch/nn/functional.py", line 2693, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/Users/cbuerkle/Library/Python/3.8/lib/python/site-packages/torch/nn/functional.py", line 1672, in log_softmax
    ret = input.log_softmax(dim)

EarlyStopped

Hello! I would like to ask whether LassoNet has an "EarlyStopped" feature for when the loss has stopped decreasing?

Some errors

When I run the code, some errors occur.
boston_housing.py, line 39: it should be n_selected.append(save.selected.sum().cpu().numpy())
utils.py, line 38: n_selected.append(save.selected.sum().cpu().numpy())

Online logging feature

The implementation was started on https://github.com/lasso-net/lassonet/tree/online and https://github.com/lasso-net/lassonline

Next step would be to provide plotting with mpld3

An error when trying to run 'Usage' code in Readme

Hi,

I tried to run the code in 'Usage'. However, I encounter this error:

from lassonet import LassoNetClassifierCV
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python3.7/dist-packages/lassonet/__init__.py", line 4, in
    from .interfaces import (
  File "", line 1
    (current_lambda=)
    ^
SyntaxError: invalid syntax

Could you please tell me why it's like this?

LassoNet for Cox question

Hi,
thank you for this model and especially for the extension to the Cox/survival outcomes.

I am trying to test LassoNetCoxRegressor() for the example with the Hnscc data in python 3.9 (and for my own datasets),

I appreciate if you could help with the questions/issues:

It runs model.path() and produces reasonable plots (attached), but the returned model.score(X_test, y_test) is zero (?), which is confusing; it is also 0 for model.fit(train) followed by model.score(test). Does model.score only work with CV?
Also, model.path() and model.fit() do not accept data frames and only work with np.array(X_test), np.array(y_test), but as I write this, that is probably expected.
A more general question: would LassoNet zero out any input from a predictor that ends up with a Lasso-optimized weight of 0 in the outer loop (i.e. with no linear contribution to the outcome)?

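import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from lassonet import LassoNetCoxRegressor, plot_path

X = pd.read_csv("x.csv")
y = pd.read_csv("y.csv")
# the original snippet omitted the train/test split; its parameters are assumed here
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LassoNetCoxRegressor(
    hidden_dims=(32,),
    lambda_start=1e-2,
    path_multiplier=1.02,
    gamma=1,
    verbose=True,
    tie_approximation="breslow",
)
# path() and fit() are called with numpy arrays, not data frames
X_train, X_test = np.array(X_train), np.array(X_test)
y_train, y_test = np.array(y_train), np.array(y_test)
path = model.path(X_train, y_train)
plot_path(model, path, X_test, y_test)
model.score(X_test, y_test)  # 0.0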

Cannot import lassonet

I have a problem when I import lassonet in Python 3.7:

File "", line 1
(current lambda=)
^ SyntaxError: invalid syntax

When I delete the “=” in

lassonet/lassonet/interfaces.py

Line 491 in d740ff3

f"Features start to disappear at {current_lambda=:.3f}."

It works.
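
For context, the `=` specifier inside f-strings was added in Python 3.8, which is why the line fails to parse on 3.7. Deleting the `=` also drops the variable name from the message; the sketch below shows an equivalent form that keeps it and still parses on 3.7. The value of `current_lambda` is made up for illustration.

```python
current_lambda = 0.123456

# Python 3.8+ only: the "=" specifier prints the expression and its value.
# message = f"Features start to disappear at {current_lambda=:.3f}."

# Equivalent form that also parses on Python 3.7.
message = f"Features start to disappear at current_lambda={current_lambda:.3f}."
print(message)
```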

Significance of the reported results

Hey, I was wondering how did you report the scores in table 1 of the paper, was it the accuracy over the test set for one trial or an average over several trials? Thanks in advance.

Running lassonet with CUDA 11.2

Hi, is it possible to run GPU lassonet with CUDA 11.2? This is not clear in the documentation. In installation I can see that the packages are trying to install CUDA 12.3, but in my environment I have previously installed CUDA 11.2 via Conda. Does this mean that installing lassonet overwrites system’s CUDA version to 12.3? My driver cannot support 12.3 yet…

LassoNet for multiple input multiple output problems (sensors selection for physics field decoder challenge)

Dear all, we are working on physics guided AI where we wish to use several sensors to decode a full aerodynamics field of interest (https://royalsocietypublishing.org/doi/10.1098/rspa.2020.0097). Our decoder works well. However, we wish to optimise the sensor placement that will have the best results for decoding the aerodynamic field.

This leaves us with let's say 5- 30 or 100 input to the decoder and several thousand or hundreds of thousands of outputs. How can we apply LassoNet to our problem when we want to optimise the overall (several thousand) output decoded field quality?

i.e. study which input sensor features are most important for the 'general' field reconstruction (minimum error at all outputs together). Multiple inputs - multiple output SHAP.

If you have any ideas or have heard of such an application of Lassonet please can you let us know!?

Iordan Doytchinov, Ph.D.

Postdoctoral researcher and scientific collaborator

Ecole Polytechnique Fédérale de Lausanne (EPFL)
EPFL – TOPO
Station 18 – Bâtiment GC C2 398

CH–1015 Lausanne

Office telephone: +41216939832
Personal mobile: +33699850592

http://topo.epfl.ch/

importance values

Hello! I have a question: does LassoNet have a function that returns something like importance values or SHAP values?

LassoNet for Multitask Learning

Can lassonet be extended to multitask learning neural networks? If so, how do I go about implementing it? TIA

Lower accuracy when reproducing the experiments

Hello, thank you for the work you have done. In my attempt to replicate the experiments reported in the LassoNet paper, I found that the results I am getting are totally different. The performance at 50 selected features is significantly lower than the one reported in the paper (<60% vs 88% for the ISOLET dataset). I have repeated the experiment over 20 or 30 runs and tried all the possible hidden_dim values. I was wondering if I am missing something, or if there is a default parameter which significantly affects the performance and needs to be changed.

I will detail the steps I have done:

I downloaded the datasets from Google drive provided on the github repo.
I created a fresh virtual environment using venv with python 3.9.2 and only installed lassonet.
For loading the data, I also used the loaders in data_utils.py in the experiments folder of the repo.
Instead of tuning the hidden_dim (as the paper indicates), I experimented with all the possible options [d//3, 2d//3, d, 4d//3]
I used the following script for running the experiments (for one hidden_dim at a time):

import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from lassonet import LassoNetClassifier
from lassonet.interfaces import LassoNetClassifierCV
from lassonet.plot import plot_path
from lassonet.utils import eval_on_path
from data_utils import load_mice, load_coil, load_activity, load_isolet
import torch
import pickle


(X_train, y_train), (X_test, y_test) = load_isolet()
X_train_valid_fixed = X_train
y_train_valid_fixed = y_train

seed = None
device = 'cuda'

data_dim = X_train.shape[1]
hidden_dim = (data_dim//3,)

score_list_of_lists = []
n_selected_list_of_lists = []
lambda_list_of_lists = []

for i in range(30):
    X_train, X_val, y_train, y_val = train_test_split(X_train_valid_fixed, y_train_valid_fixed, test_size=0.125, random_state=seed)
    model = LassoNetClassifier(M=10, hidden_dims=hidden_dim, verbose=1, torch_seed=seed, random_state=seed, device=device)
    path = model.path(X_train, y_train, X_val=X_val, y_val=y_val)

    score = eval_on_path(model, path, X_test, y_test, score_function=None)
    n_selected = [save.selected.sum().item() for save in path]
    lambda_ = [save.lambda_ for save in path]

    score_list_of_lists.append(score)
    n_selected_list_of_lists.append(n_selected)
    lambda_list_of_lists.append(lambda_)

And the following to plot

plt.figure(figsize=(30, 10))
for sublist_A, sublist_B in zip(score_list_of_lists, n_selected_list_of_lists):
    plt.plot(sublist_B, sublist_A)   
    plt.xlabel('Features Selected')
    plt.ylabel('Accuracy')
    plt.title('Accuracy vs Features Selected for hidden_dim=(data_dim//3,) on ISOLET dataset --- GPU version for 30 runs')

plt.savefig('isolet_1.png')

Surprisingly, I got the following plots:

To rule out possible GPU issues, I ran the first experiment on the CPU for fewer runs (as it was taking longer)

Similarly, I repeated the first experiment on COIL dataset

I was surprised by the plots, given the steps I have followed. However, I realized that similar plots were reported by a paper that studies LassoNet (especially for the case of 50 features).

I suspect the behavior should be consistent, and I'm still wondering what I might have missed. Could you kindly provide insight or assistance to help resolve these discrepancies? Thank you in advance!

Unclear connection of LassoNet and SPINN

The LassoNet paper mentions that LassoNet generalizes a method, though it's unclear how/when this is the case.

In Section 1.2 Related work, the paper says "Recently, Feng and Simon (2017) proposed an input-sparse neural network, where the input weights are penalized using the group Lasso penalty. As will become evident in Section 3, our proposed method extends and generalizes this approach in a natural way."

Feng and Simon (2017) add a sparse group Lasso penalty on the first layer (see figure below), which is a convex combination of a Lasso penalty and a group Lasso penalty.

How/When does LassoNet generalize the method of Feng and Simon (2017)? Looking in Section 3, I see that LassoNet is equivalent to a standard Lasso (when M=0) and an unregularized feed-forward neural network (when M → +∞); though the connection to the method of Feng and Simon (2017) isn't mentioned.
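For reference, the two formulations being compared can be written side by side. This is only a sketch: the LassoNet objective is given in the constraint form this thread refers to as equation (2), and the second expression is the generic sparse group Lasso penalty applied to the first-layer weights. The exact weighting convention used by Feng and Simon (2017) may differ, and the symbol \Omega is introduced here purely for illustration.

```latex
% LassoNet: L1 penalty on the skip (linear) weights theta, with a hierarchy
% constraint tying each feature's first-layer weights W_j^{(1)} to theta_j.
\min_{\theta, W} \; L(\theta, W) + \lambda \lVert \theta \rVert_1
\quad \text{s.t.} \quad
\lVert W_j^{(1)} \rVert_\infty \le M \lvert \theta_j \rvert, \qquad j = 1, \dots, d

% Generic sparse group Lasso penalty on the first layer, with \alpha \in [0, 1]:
% a convex combination of an elementwise Lasso and a per-feature group Lasso.
\Omega\big(W^{(1)}\big) = \lambda \Big[ \alpha \lVert W^{(1)} \rVert_1
  + (1 - \alpha) \sum_{j=1}^{d} \lVert W_j^{(1)} \rVert_2 \Big]
```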

Pipeline for reproducing the experiments

Hello,
I noticed that you provide code to load the datasets used in Table 1 but do not provide the code to replicate the experiments, mentioning that this will be possible if there is user demand. Are you planning to release the relevant code anytime soon? I am working on a related project, and it would be much easier to have (at least) a clear pipeline to follow in order to replicate the experiments.
Your assistance is highly appreciated!

plot_path(model, path, X_test, y_test)

Firstly, I would like to congratulate you on the amazing solution you have developed. I am using it in a classification problem (ANN). I just can't figure out which features are counted in the number of selected features (x-axis). Could you please help me? Thank you very much!
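One way to see which features sit behind each point on the x-axis is to map a boolean selection mask back to feature names. The sketch below is illustrative only: the feature names and masks are hypothetical stand-ins. With a real path, each entry's `selected` mask (the one used in `save.selected.sum()` elsewhere in this thread) could presumably be passed in via `save.selected.tolist()`.

```python
# Hypothetical feature names and selection masks, standing in for a real
# dataset and for the `selected` mask of each entry on the path.
feature_names = ["age", "bmi", "bp", "glucose"]
path_masks = [
    [True, True, True, True],     # small lambda: every feature kept
    [True, False, True, True],
    [False, False, True, False],  # large lambda: a single feature left
]


def selected_names(mask, names):
    """Return the names whose mask entry is True."""
    return [name for name, keep in zip(names, mask) if keep]


for mask in path_masks:
    kept = selected_names(mask, feature_names)
    # The count is what the x-axis shows; the list says which features they are.
    print(len(kept), kept)
```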

Comparison of LassoNet with Glmnet in terms of Linear Regression

As per the documentation, LassoNet is supposed to behave as a linear regressor when the hyperparameter M is set to 0. I'm comparing this configuration with another model that can act as a linear regressor, namely glmnet (https://www.rdocumentation.org/packages/glmnet/versions/1.6/topics/cv.glmnet) with a Lasso penalty. The goal is to check whether they yield the same or similar optimal lambda value, cross-validation error and feature coefficients.

As per my understanding, the two differ only in the objective function. glmnet uses the Gaussian (squared-error) objective when operating as a linear regressor, and it differs from LassoNet's objective by a constant factor of 0.5. Hence, the optimal lambda value in glmnet should be half of that in LassoNet. However, after repeated attempts I've found that this is not the case. The minimum cross-validation error and the coefficients also differ between the two models.
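The factor-of-two argument can be written out explicitly. The sketch below assumes, as the post does, that LassoNet with M=0 reduces to a mean-squared-error loss plus an L1 penalty on the linear coefficients. The symbols \lambda_g and \lambda_L are introduced here only to tell the two penalties apart.

```latex
% glmnet, Gaussian family, alpha = 1 (pure Lasso):
\min_{\beta} \; \frac{1}{2N} \sum_{i=1}^{N} \big( y_i - x_i^\top \beta \big)^2
  + \lambda_g \lVert \beta \rVert_1

% Mean-squared error plus L1 penalty (assumed M = 0 form of LassoNet):
\min_{\beta} \; \frac{1}{N} \sum_{i=1}^{N} \big( y_i - x_i^\top \beta \big)^2
  + \lambda_L \lVert \beta \rVert_1
  \;=\; 2 \Big[ \frac{1}{2N} \sum_{i=1}^{N} \big( y_i - x_i^\top \beta \big)^2
  + \frac{\lambda_L}{2} \lVert \beta \rVert_1 \Big]

% Scaling an objective by a positive constant leaves its minimizer unchanged,
% so the two problems share a solution when
\lambda_g = \frac{\lambda_L}{2}
```

This only establishes that the two exact minimizers coincide under that mapping. It says nothing about whether the two optimizers actually reach them.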

To keep the comparison fair, I used the same standardized dataset, the same list of lambda values (Lambda_.txt) that the LassoNet model takes up automatically, along with the same 5 fold cross validation. I've given a code snippet below for better understanding:

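lambdas = [Lambda_.txt]  # placeholder for the lambda values listed in Lambda_.txt

# Python (LassoNet)
LassoNetRegressorCV(hidden_dims=(2,), M=0.0, random_state=42, torch_seed=0, cv=5)

# R (glmnet); note that R spells the boolean as FALSE
cv.glmnet(X, y, nfolds = 5, alpha = 1, lambda = lambdas, intercept = FALSE)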

It'd be really helpful for me if you could help explain this difference as I intend to use LassoNet in further research endeavors. Please let me know if you need any further clarification.

Thanks.

Add support for data loader

As per https://pytorch.org/docs/stable/data.html

How to Access LassoNet Model Weights

I'm trying to pull the model weights selected in the final optimized regularization path for LassoNet. Is there any way to show the weights for each of the features used in that path? I know you can get the most complex entry of the path using path[-1], but how do I access the weights there? I've tried path[-1].state_dict(), but that didn't work. Is there a special way to get the weights from the path? I know you can use model.feature_importances_ to show the feature importance, but that isn't the same as the weights in the model, correct? The model.coef_ attribute doesn't work for printing the weights either. Could you please provide some guidance on how to obtain the weights or coefficients of the LassoNet model? Thanks for your help.

implement l1_regularization_skip for groups

Add documentation for LassoNetAutoEncoder

In the example mnist_ae.py the module LassoNetAutoEncoder is imported from lassonet, but this is not in the documentation?

Using a different loss function than MSE

I am using LassoNet for my thesis, but I want to do so with a quantile loss function instead of mean squared error. Going through the code, I found a variable self.criterion that is set to MSE loss by default (interfaces.py, class LassoNetRegressor, line 566). After instantiating the class, I manually set self.criterion to a quantile loss and trained the LassoNet.

However, the loss doesn't converge and remains high even after several epochs. The assertion in train then fails and it exits with an error at line 316 of interfaces.py.
Can someone suggest a solution?
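For readers unfamiliar with the term, the quantile (pinball) loss mentioned above can be sketched in plain Python as follows. This is an illustrative reference implementation, not the poster's code and not part of the library; the function name `pinball_loss` is made up for this example.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss for a quantile level q in (0, 1)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    total = 0.0
    for target, pred in zip(y_true, y_pred):
        err = target - pred
        # Under-prediction (err > 0) costs q * err;
        # over-prediction (err < 0) costs (1 - q) * |err|.
        total += max(q * err, (q - 1.0) * err)
    return total / len(y_true)


y_true = [1.0, 2.0, 4.0]
y_pred = [2.0, 2.0, 2.0]
# At q = 0.5 both directions cost the same, giving half the mean absolute error.
print(pinball_loss(y_true, y_pred, 0.5))
# At q = 0.9 under-predictions are penalized nine times more than over-predictions.
print(pinball_loss(y_true, y_pred, 0.9))
```

Note that this loss is on a different scale from mean squared error, so thresholds tuned for MSE do not carry over directly.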

Bug: dropout is not deactivated during evaluation

Problem: Calling the model.score() function repeatedly gives different results. I suspect that's because LassoNet doesn't call model.eval() to disable the stochastic components (e.g., dropout).

Solution: Call model.eval() inside the .score() function. See: https://stackoverflow.com/questions/60018578/what-does-model-eval-do-in-pytorch

l1_regularization_skip()

Hi,
I wonder why l1_regularization_skip() in model.py uses the L2 norm. Isn't it then the same as l2_regularization_skip()? Thanks.
Here is the link to the function:

lassonet/lassonet/model.py, lines 62 to 63 in e3a3754:

    def l1_regularization_skip(self):
        return torch.norm(self.skip.weight.data, p=2, dim=0).sum()
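One possible reading of the snippet above: the L2 norm is taken along dim=0 only, giving one norm per input feature, and those norms are then summed. That is an L1 norm over per-feature L2 norms (a group Lasso style penalty), which differs from a single L2 norm over all weights. The pure-Python sketch below illustrates the difference on a hypothetical weight matrix. It assumes the skip layer stores its weights like a standard linear layer, with one row per output and one column per input feature.

```python
import math

# Hypothetical skip weights: 2 outputs (rows) x 3 input features (columns).
skip_weight = [
    [3.0, 0.0, 1.0],
    [4.0, 0.0, -1.0],
]


def column_l2_norms(weight):
    """L2 norm of each column, i.e. of each input feature's weight vector."""
    n_cols = len(weight[0])
    return [math.sqrt(sum(row[j] ** 2 for row in weight)) for j in range(n_cols)]


per_feature = column_l2_norms(skip_weight)  # mirrors torch.norm(w, p=2, dim=0)
group_l1 = sum(per_feature)                 # mirrors the trailing .sum()
plain_l2 = math.sqrt(sum(v ** 2 for row in skip_weight for v in row))

print(per_feature)  # one norm per input feature; a zero means the feature is unused
print(group_l1)     # sum of per-feature norms
print(plain_l2)     # a single L2 norm over all entries, a different quantity
```

With a single output row, each per-feature L2 norm is just the absolute value of that weight, so the sum reduces to the ordinary L1 norm.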

The link does not work

Hi,

I tried https://lassonet.ml, but it does not work :(

Would you be able to fix it?

Group Lasso

Hi, thank you very much for sharing this code. We would like to apply it to a group lasso problem: we have a classification problem where the inputs are grouped in blocks that all have the same size.
We want to adapt your code.
In our case, each input node j has k features that we would like to share the same theta_j.
One way of writing our problem is to modify the constraints in equation (2) of the LassoNet paper by adding another index for W and leaving the rest as it is.
Do you think it's possible to do that with minor changes to your code? Could you help us with that?

About Algorithm 4 of the paper: in line 14, the notation indicates that the vector theta_j \in R^k has the same dimension as W_j^{(1)}, which confuses me a little. K is the size of the first hidden layer, and in a multi-class classification problem theta should be in R^{d*c}, where c is the number of classes (since theta is a linear classifier) and d is the input feature dimension. So, in my opinion, theta_j should be in R^{c}. Do I understand this correctly?
About Section 6 of the paper: in the group lasso problem, how is the group L1 norm regularizer constructed? Assume a multi-class classification problem with theta in R^{d*c}, where d is the number of features and c is the number of classes. If we want to choose a sparse subset of features for the linear classifier, the regularization term should be |theta|_{1} = \sum_{i=1}^d |\sum_{j=1}^c theta_{i,j}^2|, am I right?
How should I use the API provided in this repo? For example, in the function prox (in lassonet/prox.py), I found that theta (variable v in the code) is calculated by:
    norm_v = torch.norm(v, p=2, dim=0)
This seems to have the same formulation as the pseudocode in line 6 of Algorithm 4 of the paper. Does this mean that the function prox can solve the feature subset selection problem I described above?
What is the function inplace_group_prox (in lassonet/prox.py) used for? I notice that it passes each group of parameters to the prox function. This confuses me, because I think prox is used to give the features sparse weights, whereas we hope the group lasso makes a group of features share similar weights (for example, features in group 1 all have large weights, features in group 2 all have small weights, etc.). If we pass a group of features into prox, I would expect it to return sparse weights, meaning that the features within this single group have sparse weights (some large and some small) rather than similar weights (all large or all small). Do I understand this function correctly?
There is no LassoNetAutoEncoder in the lassonet folder, so examples/mnist_ae.py (https://github.com/lasso-net/lassonet/tree/master/examples) cannot run correctly.

Thanks for your help in advance!
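On the question of what a group-wise proximal step does to the weights inside a group, the textbook group soft-thresholding operator may help. The sketch below is that plain operator only, not the library's hierarchical prox, and the function name and example groups are hypothetical. It shows that a group is either zeroed out as a whole or shrunk by one common factor, and is never sparsified entry by entry.

```python
import math


def group_soft_threshold(group, lam):
    """Proximal operator of lam * ||group||_2 applied to one group of weights."""
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= lam:
        # The whole group is removed together.
        return [0.0 for _ in group]
    # Otherwise every entry is scaled by the same factor, so relative sizes are kept.
    scale = 1.0 - lam / norm
    return [scale * v for v in group]


# Hypothetical groups of weights.
groups = {
    "group_1": [3.0, 4.0],   # L2 norm 5: survives, shrunk by a common factor
    "group_2": [0.3, -0.4],  # L2 norm 0.5: below the threshold, zeroed as a block
}
shrunk = {name: group_soft_threshold(vals, lam=1.0) for name, vals in groups.items()}
for name, vals in shrunk.items():
    print(name, vals)
```

Sparsity therefore appears across groups rather than within a group.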

	self.device = device
	if device is None:
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
