
benchmark_logreg_l2's Introduction

https://raw.githubusercontent.com/benchopt/communication_materials/main/posters/images/logo_benchopt.png

—Making your ML and optimization benchmarks simple and open—



Benchopt is a benchmarking suite tailored for machine learning workflows. It is built for simplicity, transparency, and reproducibility. It is implemented in Python but can run algorithms written in many programming languages.

So far, benchopt has been tested with Python, R, Julia and C/C++ (compiled binaries with a command line interface). Programs available via conda should be compatible as well. See for instance an example of usage with R.

Install

It is recommended to use benchopt within a conda environment to fully benefit from the benchopt command line interface (CLI).

To install benchopt, start by creating a new conda environment and then activate it

conda create -n benchopt python
conda activate benchopt

Then run the following command to install the latest release of benchopt

pip install -U benchopt

It is also possible to use the latest development version. To do so, run instead

pip install git+https://github.com/benchopt/benchopt.git

Getting started

After installing benchopt, you can

  • replicate/modify an existing benchmark
  • create your own benchmark

Using an existing benchmark

Replicating an existing benchmark is simple. Here is how to do so for the L2-regularized logistic regression benchmark.

  1. Clone the benchmark repository and cd to it

     git clone https://github.com/benchopt/benchmark_logreg_l2
     cd benchmark_logreg_l2

  2. Install the desired solvers automatically with benchopt

     benchopt install . -s lightning -s sklearn

  3. Run the benchmark to get the figure below

     benchopt run . --config ./example_config.yml
https://benchopt.github.io/_images/sphx_glr_plot_run_benchmark_001.png

These steps reproduce the L2-regularized logistic regression benchmark. The complete list of benchmarks is given below in Available benchmarks. Also, refer to the documentation to learn more about the benchopt CLI and its features. You can also easily extend this benchmark by adding a dataset, solver, or metric; learn this and more in the Benchmark workflow section of the documentation.
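
The --config option points benchopt to a YAML file describing the run. As a rough sketch, assuming (as in the benchopt documentation) that the keys mirror the CLI options; the objective, dataset, and solver names below are illustrative, and the actual example_config.yml shipped with the benchmark may differ:

objective:
  - L2 Logistic Regression[lmbd=1.0]
dataset:
  - Simulated
  - rcv1
solver:
  - sklearn
  - lightning
n-repetitions: 5
max-runs: 100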

Creating a benchmark

The section Write a benchmark of the documentation provides a tutorial for creating a benchmark. The benchopt community also maintains a template benchmark to quickly and easily start a new benchmark.
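
To give a flavor of what writing a benchmark involves, here is a rough sketch of a solver file, loosely modeled on this benchmark's Python-GD solver. It assumes the BaseSolver API described in the benchopt documentation (set_objective receives what the objective provides, run iterates, get_result returns the iterate to evaluate); details such as the result format may differ between benchopt versions.

import numpy as np
from benchopt import BaseSolver


class Solver(BaseSolver):
    # Plain gradient descent on the L2-regularized logistic loss
    name = 'Python-GD'

    def set_objective(self, X, y, lmbd):
        self.X, self.y, self.lmbd = X, y, lmbd

    def run(self, n_iter):
        X, y, lmbd = self.X, self.y, self.lmbd
        # Step size from a bound on the gradient's Lipschitz constant
        L = np.linalg.norm(X, ord=2) ** 2 / 4 + lmbd
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            # Gradient of sum(log(1 + exp(-y * X @ beta))) + lmbd/2 * ||beta||^2
            grad = -X.T @ (y / (1 + np.exp(y * (X @ beta)))) + lmbd * beta
            beta -= grad / L
        self.beta = beta

    def get_result(self):
        # Recent benchopt versions expect a dict; older ones returned
        # the array directly.
        return dict(beta=self.beta)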

Finding help

Join the benchopt Discord server and get in touch with the community! Feel free to drop us a message to get help with running or constructing benchmarks, or (why not) to discuss new features and future development directions that benchopt should take.

Citing Benchopt

Benchopt is a continuous effort to make reproducible and transparent ML and optimization benchmarks. Join us in this endeavor! If you use benchopt in a scientific publication, please cite

@inproceedings{benchopt,
   author    = {Moreau, Thomas and Massias, Mathurin and Gramfort, Alexandre
                and Ablin, Pierre and Bannier, Pierre-Antoine
                and Charlier, Benjamin and Dagréou, Mathieu and Dupré la Tour, Tom
                and Durif, Ghislain and F. Dantas, Cassio and Klopfenstein, Quentin
                and Larsson, Johan and Lai, En and Lefort, Tanguy
                and Malézieux, Benoit and Moufad, Badr and T. Nguyen, Binh and Rakotomamonjy,
                Alain and Ramzi, Zaccharie and Salmon, Joseph and Vaiter, Samuel},
   title     = {Benchopt: Reproducible, efficient and collaborative optimization benchmarks},
   year      = {2022},
   booktitle = {NeurIPS},
   url       = {https://arxiv.org/abs/2206.13424}
}

Available benchmarks

Problem                                  | Results | Build Status
Ordinary Least Squares (OLS)             | Results | Build Status
Non-Negative Least Squares (NNLS)        | Results | Build Status
LASSO: L1-Regularized Least Squares      | Results | Build Status
LASSO Path                               | Results | Build Status
Elastic Net                              | n/a     | Build Status
MCP                                      | Results | Build Status
L2-Regularized Logistic Regression       | Results | Build Status
L1-Regularized Logistic Regression       | Results | Build Status
L2-Regularized Huber Regression          | n/a     | Build Status
L1-Regularized Quantile Regression       | Results | Build Status
Linear SVM for Binary Classification     | n/a     | Build Status
Linear ICA                               | n/a     | Build Status
Approximate Joint Diagonalization (AJD)  | n/a     | Build Status
1D Total Variation Denoising             | n/a     | Build Status
2D Total Variation Denoising             | n/a     | Build Status
ResNet Classification                    | Results | Build Status
Bilevel Optimization                     | Results | Build Status

benchmark_logreg_l2's People

Contributors

agramfort, badr-moufad, ceelestin, geoffnn, mathurinm, ogrisel, tanglef, tomdlt, tommoral


benchmark_logreg_l2's Issues

Handle issues from other packages' side?

The CI is currently red, and one of the reasons is a problem in a package that the benchmark uses (see L.186 of the action test log).
Technically, this is not a benchopt error, and it can impact multiple repositories when the same library is used for different problems.

So where is the limit (if there is one) between a CI warning that something is broken on a library's side and a real benchopt error that we can deal with?

(poke @josephsalmon)

Make stochastic solvers fit in this benchmark

Now that we use SufficientProgressCriterion to stop the benchmark and can report multiple losses at once, there is no real gain in keeping a separate benchmark for logreg_l2 with stochastic solvers. We should thus merge benchopt/benchmark_stochastic_logreg_l2 into this benchmark. To do this, we should:

  • port the SGD solver to this benchmark
  • add a notion of train/test losses in objective.py (see the sketch after this list)
  • improve the plotting utils for multi-valued loss functions to get train/test graphs
  • close the benchopt/benchmark_stochastic_logreg_l2 repo
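
A rough sketch of how train/test losses could surface in objective.py, assuming the BaseObjective API where the computed metrics are returned as a dict with 'value' as the main convergence measure; the split strategy, objective name, and parameter names here are illustrative only:

import numpy as np
from benchopt import BaseObjective


def logloss(X, y, beta):
    # Logistic loss for labels y in {-1, 1}
    return np.log1p(np.exp(-y * (X @ beta))).sum()


class Objective(BaseObjective):
    name = "L2 Logistic Regression"
    parameters = {'lmbd': [1.]}

    def set_data(self, X, y):
        # Illustrative 80/20 split; a real version should be configurable
        n_train = int(0.8 * len(y))
        self.X_train, self.y_train = X[:n_train], y[:n_train]
        self.X_test, self.y_test = X[n_train:], y[n_train:]

    def get_objective(self):
        return dict(X=self.X_train, y=self.y_train, lmbd=self.lmbd)

    def compute(self, beta):
        train = logloss(self.X_train, self.y_train, beta)
        test = logloss(self.X_test, self.y_test, beta)
        penalty = 0.5 * self.lmbd * beta @ beta
        # Extra keys are tracked and can be plotted alongside 'value'
        return dict(value=train + penalty, train_loss=train, test_loss=test)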

Moreover, to bring this benchmark more in line with practices in the literature, we should also try to reproduce the kind of figures found in published papers on this problem.

`benchopt install --env .` failure on this benchmark

(base) ➜  benchmark_logreg_l2 git:(main) benchopt install --env .
Installing 'benchmark_logreg_l2' requirements
Creating conda env 'benchopt_benchmark_logreg_l2':... done
# Install
Collecting packages:
- 'Python-GD' already available in 'benchopt_benchmark_logreg_l2'
- 'Simulated' already available in 'benchopt_benchmark_logreg_l2'
... done
Installing required packages for:
- cd
- chop
- copt
- Lightning
- sklearn
- covtype_binary
- madelon
- news20
- rcv1
...Traceback (most recent call last):
  File "/home/mathurin/anaconda3/bin/benchopt", line 33, in <module>
    sys.exit(load_entry_point('benchopt', 'console_scripts', 'benchopt')())
  File "/home/mathurin/anaconda3/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/mathurin/anaconda3/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/mathurin/anaconda3/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/mathurin/anaconda3/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mathurin/anaconda3/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/mathurin/workspace/benchopt/benchopt/cli/main.py", line 274, in install
    benchmark.install_all_requirements(
  File "/home/mathurin/workspace/benchopt/benchopt/benchmark.py", line 233, in install_all_requirements
    install_in_conda_env(
  File "/home/mathurin/workspace/benchopt/benchopt/utils/conda_env_cmd.py", line 181, in install_in_conda_env
    _run_shell_in_conda_env(
  File "/home/mathurin/workspace/benchopt/benchopt/utils/shell_cmd.py", line 130, in _run_shell_in_conda_env
    return _run_shell(
  File "/home/mathurin/workspace/benchopt/benchopt/utils/shell_cmd.py", line 68, in _run_shell
    raise RuntimeError(raise_on_error.format(output=output))
RuntimeError: Failed to conda install packages ('pip:https://github.com/openopt/copt/archive/master.zip', 'scikit-learn', 'pip:git+https://github.com/scikit-learn-contrib/lightning.git', 'pip:https://github.com/openopt/chop/archive/master.zip', 'numba', 'pip:scikit-learn', 'pip:libsvmdata')
Error:Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed
Solving environment: ...working...
[conda progress bars trimmed: building graph of deps, examining packages, determining conflicts]
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with a past
explicit spec that is not an explicit spec in this operation (numpy):

  - numba -> numpy[version='1.13.*|>=1.10,<1.11.0a0|>=1.11.3,<2.0a0|>=1.16.6,<2.0a0|>=1.14.6,<2.0a0|>=1.9.3,<2.0a0|>=1.12,<1.13.0a0|>=1.14,<1.15.0a0|>=1.13,<1.14.0a0|>=1.11,<1.12.0a0']
  - numba -> python[version='>=3.8,<3.9.0a0'] -> pip
  - scikit-learn -> numpy[version='>=1.11.3,<2.0a0|>=1.14.6,<2.0a0|>=1.16.6,<2.0a0|>=1.9.3,<2.0a0']
  - scikit-learn -> python[version='>=3.7,<3.8.0a0'] -> pip
  - scikit-learn -> scipy[version='>=1.1.0'] -> numpy[version='>=1.15.1,<2.0a0']

The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

Package python_abi conflicts for:
pip -> setuptools -> python_abi=3.8[build=*_cp38]
cython -> python_abi=3.8[build=*_cp38]
numba -> numpy[version='>=1.16.6,<2.0a0'] -> python_abi=3.8[build=*_cp38]
scikit-learn -> numpy[version='>=1.16.6,<2.0a0'] -> python_abi=3.8[build=*_cp38]
numpy -> python_abi=3.8[build=*_cp38]

Package setuptools conflicts for:
numba -> setuptools
scikit-learn -> joblib[version='>=0.11'] -> setuptools
pip -> setuptools
cython -> setuptools
python=3.8 -> pip -> setuptools

Package _libgcc_mutex conflicts for:
numpy -> libgcc-ng[version='>=7.5.0'] -> _libgcc_mutex[version='*|0.1|0.1',build='main|conda_forge']
cython -> libgcc-ng[version='>=7.5.0'] -> _libgcc_mutex[version='*|0.1|0.1',build='main|conda_forge']
python=3.8 -> libgcc-ng[version='>=7.5.0'] -> _libgcc_mutex[version='*|0.1|0.1',build='main|conda_forge']
numba -> _openmp_mutex[version='>=4.5'] -> _libgcc_mutex[version='*|0.1',build='main|conda_forge|main']
scikit-learn -> _openmp_mutex -> _libgcc_mutex[version='*|0.1',build='main|conda_forge|main']

Package libgfortran4 conflicts for:
scikit-learn -> scipy[version='>=1.1.0'] -> libgfortran4[version='>=7.5.0']
numpy -> libgfortran-ng[version='>=7,<8.0a0'] -> libgfortran4=7.5.0

Package pip conflicts for:
python=3.8 -> pip
cython -> python[version='>=3.8,<3.9.0a0'] -> pip
numpy -> python[version='>=3.8,<3.9.0a0'] -> pip

Package certifi conflicts for:
cython -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']
pip -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']
numba -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']

Package libgomp conflicts for:
numba -> _openmp_mutex[version='>=4.5'] -> libgomp[version='>=7.5.0']
scikit-learn -> _openmp_mutex -> libgomp[version='>=7.5.0']

Package wheel conflicts for:
python=3.8 -> pip -> wheel
pip -> wheel

The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.31=0
  - feature:|@/linux-64::__glibc==2.31=0
  - cython -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - numba -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - numpy -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - python=3.8 -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - scikit-learn -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']

Your installed version is: 2.31

DATA: add ill-conditioned simulated data

As discussed in this comment from sklearn, when the features of the dataset are not scaled, optimization methods can converge slowly.

Adding an example with such an ill-conditioned matrix would be very interesting.
The data generation mechanism is (quick extract, check this before coding :) ):

import numpy as np
from sklearn.datasets import make_low_rank_matrix

rng = np.random.RandomState(0)  # rng was undefined in the snippet; any fixed seed works

n_samples, n_features = 1000, 10000

w_true = rng.randn(n_features)

X = make_low_rank_matrix(n_samples, n_features, random_state=rng)
# Blow up two columns to make the design badly scaled
X[:, 0] *= 1e3
X[:, -1] *= 1e3

z = X @ w_true + 1
z += 1e-1 * rng.randn(n_samples)

# Balanced binary classification problem
y = (z > np.median(z)).astype(np.int32)
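
To turn this snippet into a dataset of the benchmark, here is a rough sketch of a datasets/simulated_ill_conditioned.py file, assuming the BaseDataset API from the benchopt documentation; the file name, class parameters, and return format are assumptions and may need adapting:

import numpy as np
from benchopt import BaseDataset
from sklearn.datasets import make_low_rank_matrix


class Dataset(BaseDataset):
    name = "Simulated-ill-conditioned"
    parameters = {'n_samples, n_features': [(1000, 10000)]}

    def get_data(self):
        rng = np.random.RandomState(0)
        w_true = rng.randn(self.n_features)
        X = make_low_rank_matrix(self.n_samples, self.n_features,
                                 random_state=rng)
        X[:, 0] *= 1e3  # badly scaled columns, as in the snippet above
        X[:, -1] *= 1e3
        z = X @ w_true + 1 + 1e-1 * rng.randn(self.n_samples)
        y = (z > np.median(z)).astype(np.int32)
        return dict(X=X, y=y)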

Add unpenalized logistic regression

The logistic regression problem sets all include a penalty. It would be very interesting, at least to me, to add the zero-penalty case.

Note: for n_features > n_samples, as with the news20 dataset, this is real fun (from an optimization point of view): the data are then typically linearly separable, so the unpenalized problem has no finite minimizer.
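
If the objective's regularization strength is exposed as a parameter (the parameter name lmbd and the objective name below are assumptions, not confirmed by this issue), the zero-penalty case could in principle be requested from the CLI along these lines:

benchopt run . -o "L2 Logistic Regression[lmbd=0]"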

[ENH] Cannot run a single solver from sklearn

We cannot run a single sklearn solver at a time. The following command does not work

benchopt run ./benchmark_logreg_l2 -s sklearn[lbfgs]

and returns

Usage: benchopt run [OPTIONS] BENCHMARK

Error: Invalid value: Patterns ['sklearn[lbfgs]'] did not matched any solver.
Available solvers are:
- Lightning
- sklearn[liblinear]
- sklearn[newton-cg]
- sklearn[lbfgs]

which looks contradictory.
