
benchmark_logreg_l2's Introduction

https://raw.githubusercontent.com/benchopt/communication_materials/main/posters/images/logo_benchopt.png

—Making your ML and optimization benchmarks simple and open—



Benchopt is a benchmarking suite tailored for machine learning workflows. It is built for simplicity, transparency, and reproducibility. It is implemented in Python but can run algorithms written in many programming languages.

So far, benchopt has been tested with Python, R, Julia and C/C++ (compiled binaries with a command line interface). Programs available via conda should be compatible as well. See for instance an example of usage with R.

Install

It is recommended to use benchopt within a conda environment to fully benefit from the benchopt command line interface (CLI).

To install benchopt, start by creating a new conda environment and then activate it

conda create -n benchopt python
conda activate benchopt

Then run the following command to install the latest release of benchopt

pip install -U benchopt

It is also possible to use the latest development version. To do so, run instead

pip install git+https://github.com/benchopt/benchopt.git

Getting started

After installing benchopt, you can

  • replicate/modify an existing benchmark
  • create your own benchmark

Using an existing benchmark

Replicating an existing benchmark is simple. Here is how to do so for the L2-regularized logistic regression benchmark.

  1. Clone the benchmark repository and cd to it

     git clone https://github.com/benchopt/benchmark_logreg_l2
     cd benchmark_logreg_l2

  2. Install the desired solvers automatically with benchopt

     benchopt install . -s lightning -s sklearn

  3. Run the benchmark to get the figure below

     benchopt run . --config ./example_config.yml
https://benchopt.github.io/_images/sphx_glr_plot_run_benchmark_001.png

These steps reproduce the L2-regularized logistic regression benchmark. The complete list of benchmarks is given below in Available benchmarks. Also, refer to the documentation to learn more about the benchopt CLI and its features. You can also easily extend this benchmark by adding a dataset, solver, or metric; learn this and more in the Benchmark workflow section of the documentation.
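
The --config option points benchopt to a YAML file describing the run. As a rough sketch, assuming (as in the benchopt documentation) that the keys mirror the CLI options; the objective, dataset, and solver names below are illustrative, and the actual example_config.yml shipped with the benchmark may differ:

objective:
  - L2 Logistic Regression[lmbd=1.0]
dataset:
  - Simulated
  - rcv1
solver:
  - sklearn
  - lightning
n-repetitions: 5
max-runs: 100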

Creating a benchmark

The section Write a benchmark of the documentation provides a tutorial for creating a benchmark. The benchopt community also maintains a template benchmark to quickly and easily start a new benchmark.
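
To give a flavor of what writing a benchmark involves, here is a rough sketch of a solver file, loosely modeled on this benchmark's Python-GD solver. It assumes the BaseSolver API described in the benchopt documentation (set_objective receives what the objective provides, run iterates, get_result returns the iterate to evaluate); details such as the result format may differ between benchopt versions.

import numpy as np
from benchopt import BaseSolver


class Solver(BaseSolver):
    # Plain gradient descent on the L2-regularized logistic loss
    name = 'Python-GD'

    def set_objective(self, X, y, lmbd):
        self.X, self.y, self.lmbd = X, y, lmbd

    def run(self, n_iter):
        X, y, lmbd = self.X, self.y, self.lmbd
        # Step size from a bound on the gradient's Lipschitz constant
        L = np.linalg.norm(X, ord=2) ** 2 / 4 + lmbd
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            # Gradient of sum(log(1 + exp(-y * X @ beta))) + lmbd/2 * ||beta||^2
            grad = -X.T @ (y / (1 + np.exp(y * (X @ beta)))) + lmbd * beta
            beta -= grad / L
        self.beta = beta

    def get_result(self):
        # Recent benchopt versions expect a dict; older ones returned
        # the array directly.
        return dict(beta=self.beta)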

Finding help

Join the benchopt Discord server and get in touch with the community! Feel free to drop us a message to get help with running or constructing benchmarks, or (why not) to discuss new features and future development directions that benchopt should take.

Citing Benchopt

Benchopt is a continuous effort to make reproducible and transparent ML and optimization benchmarks. Join us in this endeavor! If you use benchopt in a scientific publication, please cite

@inproceedings{benchopt,
   author    = {Moreau, Thomas and Massias, Mathurin and Gramfort, Alexandre
                and Ablin, Pierre and Bannier, Pierre-Antoine
                and Charlier, Benjamin and Dagréou, Mathieu and Dupré la Tour, Tom
                and Durif, Ghislain and F. Dantas, Cassio and Klopfenstein, Quentin
                and Larsson, Johan and Lai, En and Lefort, Tanguy
                and Malézieux, Benoit and Moufad, Badr and T. Nguyen, Binh and Rakotomamonjy,
                Alain and Ramzi, Zaccharie and Salmon, Joseph and Vaiter, Samuel},
   title     = {Benchopt: Reproducible, efficient and collaborative optimization benchmarks},
   year      = {2022},
   booktitle = {NeurIPS},
   url       = {https://arxiv.org/abs/2206.13424}
}

Available benchmarks

Problem                                  | Results | Build Status
Ordinary Least Squares (OLS)             | Results | Build Status
Non-Negative Least Squares (NNLS)        | Results | Build Status
LASSO: L1-Regularized Least Squares      | Results | Build Status
LASSO Path                               | Results | Build Status
Elastic Net                              | n/a     | Build Status
MCP                                      | Results | Build Status
L2-Regularized Logistic Regression       | Results | Build Status
L1-Regularized Logistic Regression       | Results | Build Status
L2-Regularized Huber Regression          | n/a     | Build Status
L1-Regularized Quantile Regression       | Results | Build Status
Linear SVM for Binary Classification     | n/a     | Build Status
Linear ICA                               | n/a     | Build Status
Approximate Joint Diagonalization (AJD)  | n/a     | Build Status
1D Total Variation Denoising             | n/a     | Build Status
2D Total Variation Denoising             | n/a     | Build Status
ResNet Classification                    | Results | Build Status
Bilevel Optimization                     | Results | Build Status

benchmark_logreg_l2's People

Contributors

agramfort, badr-moufad, ceelestin, geoffnn, mathurinm, ogrisel, tanglef, tomdlt, tommoral


benchmark_logreg_l2's Issues

Handle issues from other packages' side?

The CI is currently red, and one of the reasons is a problem in a package that the benchmark uses (see L.186 of the action test log).
Technically, this is not a benchopt error, and it can impact multiple repositories when the same library is used for different problems.

So where is the limit (if there is one) between a CI warning that something is broken on a library's side and a real benchopt error that we can deal with?

(poke @josephsalmon)

Make stochastic solvers fit in this benchmark

Now that we use SufficientProgressCriterion to stop the benchmark and can report multiple losses at once, there is no real gain in keeping a separate benchmark for logreg_l2 with stochastic solvers. We should thus merge benchopt/benchmark_stochastic_logreg_l2 into this benchmark. To do this, we should:

  • port the SGD solver to this benchmark
  • add a notion of train/test losses in objective.py (see the sketch after this list)
  • improve the plotting utils for multi-valued loss functions to get train/test graphs
  • close the benchopt/benchmark_stochastic_logreg_l2 repo
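
A rough sketch of how train/test losses could surface in objective.py, assuming the BaseObjective API where the computed metrics are returned as a dict with 'value' as the main convergence measure; the split strategy, objective name, and parameter names here are illustrative only:

import numpy as np
from benchopt import BaseObjective


def logloss(X, y, beta):
    # Logistic loss for labels y in {-1, 1}
    return np.log1p(np.exp(-y * (X @ beta))).sum()


class Objective(BaseObjective):
    name = "L2 Logistic Regression"
    parameters = {'lmbd': [1.]}

    def set_data(self, X, y):
        # Illustrative 80/20 split; a real version should be configurable
        n_train = int(0.8 * len(y))
        self.X_train, self.y_train = X[:n_train], y[:n_train]
        self.X_test, self.y_test = X[n_train:], y[n_train:]

    def get_objective(self):
        return dict(X=self.X_train, y=self.y_train, lmbd=self.lmbd)

    def compute(self, beta):
        train = logloss(self.X_train, self.y_train, beta)
        test = logloss(self.X_test, self.y_test, beta)
        penalty = 0.5 * self.lmbd * beta @ beta
        # Extra keys are tracked and can be plotted alongside 'value'
        return dict(value=train + penalty, train_loss=train, test_loss=test)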

Moreover, to bring this benchmark more in line with practices in the literature, we should also try to reproduce the kind of figures found in published papers on this problem.

`benchopt install --env .` failure on this benchmark

(base) ➜  benchmark_logreg_l2 git:(main) benchopt install --env .
Installing 'benchmark_logreg_l2' requirements
Creating conda env 'benchopt_benchmark_logreg_l2':... done
# Install
Collecting packages:
- 'Python-GD' already available in 'benchopt_benchmark_logreg_l2'
- 'Simulated' already available in 'benchopt_benchmark_logreg_l2'
... done
Installing required packages for:
- cd
- chop
- copt
- Lightning
- sklearn
- covtype_binary
- madelon
- news20
- rcv1
...Traceback (most recent call last):
  File "/home/mathurin/anaconda3/bin/benchopt", line 33, in <module>
    sys.exit(load_entry_point('benchopt', 'console_scripts', 'benchopt')())
  File "/home/mathurin/anaconda3/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/mathurin/anaconda3/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/mathurin/anaconda3/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/mathurin/anaconda3/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mathurin/anaconda3/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/mathurin/workspace/benchopt/benchopt/cli/main.py", line 274, in install
    benchmark.install_all_requirements(
  File "/home/mathurin/workspace/benchopt/benchopt/benchmark.py", line 233, in install_all_requirements
    install_in_conda_env(
  File "/home/mathurin/workspace/benchopt/benchopt/utils/conda_env_cmd.py", line 181, in install_in_conda_env
    _run_shell_in_conda_env(
  File "/home/mathurin/workspace/benchopt/benchopt/utils/shell_cmd.py", line 130, in _run_shell_in_conda_env
    return _run_shell(
  File "/home/mathurin/workspace/benchopt/benchopt/utils/shell_cmd.py", line 68, in _run_shell
    raise RuntimeError(raise_on_error.format(output=output))
RuntimeError: Failed to conda install packages ('pip:https://github.com/openopt/copt/archive/master.zip', 'scikit-learn', 'pip:git+https://github.com/scikit-learn-contrib/lightning.git', 'pip:https://github.com/openopt/chop/archive/master.zip', 'numba', 'pip:scikit-learn', 'pip:libsvmdata')
Error:Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed
Solving environment: ...working...
[conda progress bars trimmed: building graph of deps, examining packages, determining conflicts]
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with a past
explicit spec that is not an explicit spec in this operation (numpy):

  - numba -> numpy[version='1.13.*|>=1.10,<1.11.0a0|>=1.11.3,<2.0a0|>=1.16.6,<2.0a0|>=1.14.6,<2.0a0|>=1.9.3,<2.0a0|>=1.12,<1.13.0a0|>=1.14,<1.15.0a0|>=1.13,<1.14.0a0|>=1.11,<1.12.0a0']
  - numba -> python[version='>=3.8,<3.9.0a0'] -> pip
  - scikit-learn -> numpy[version='>=1.11.3,<2.0a0|>=1.14.6,<2.0a0|>=1.16.6,<2.0a0|>=1.9.3,<2.0a0']
  - scikit-learn -> python[version='>=3.7,<3.8.0a0'] -> pip
  - scikit-learn -> scipy[version='>=1.1.0'] -> numpy[version='>=1.15.1,<2.0a0']

The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

Package python_abi conflicts for:
pip -> setuptools -> python_abi=3.8[build=*_cp38]
cython -> python_abi=3.8[build=*_cp38]
numba -> numpy[version='>=1.16.6,<2.0a0'] -> python_abi=3.8[build=*_cp38]
scikit-learn -> numpy[version='>=1.16.6,<2.0a0'] -> python_abi=3.8[build=*_cp38]
numpy -> python_abi=3.8[build=*_cp38]

Package setuptools conflicts for:
numba -> setuptools
scikit-learn -> joblib[version='>=0.11'] -> setuptools
pip -> setuptools
cython -> setuptools
python=3.8 -> pip -> setuptools

Package _libgcc_mutex conflicts for:
numpy -> libgcc-ng[version='>=7.5.0'] -> _libgcc_mutex[version='*|0.1|0.1',build='main|conda_forge']
cython -> libgcc-ng[version='>=7.5.0'] -> _libgcc_mutex[version='*|0.1|0.1',build='main|conda_forge']
python=3.8 -> libgcc-ng[version='>=7.5.0'] -> _libgcc_mutex[version='*|0.1|0.1',build='main|conda_forge']
numba -> _openmp_mutex[version='>=4.5'] -> _libgcc_mutex[version='*|0.1',build='main|conda_forge|main']
scikit-learn -> _openmp_mutex -> _libgcc_mutex[version='*|0.1',build='main|conda_forge|main']

Package libgfortran4 conflicts for:
scikit-learn -> scipy[version='>=1.1.0'] -> libgfortran4[version='>=7.5.0']
numpy -> libgfortran-ng[version='>=7,<8.0a0'] -> libgfortran4=7.5.0

Package pip conflicts for:
python=3.8 -> pip
cython -> python[version='>=3.8,<3.9.0a0'] -> pip
numpy -> python[version='>=3.8,<3.9.0a0'] -> pip

Package certifi conflicts for:
cython -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']
pip -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']
numba -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']

Package libgomp conflicts for:
numba -> _openmp_mutex[version='>=4.5'] -> libgomp[version='>=7.5.0']
scikit-learn -> _openmp_mutex -> libgomp[version='>=7.5.0']

Package wheel conflicts for:
python=3.8 -> pip -> wheel
pip -> wheel

The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.31=0
  - feature:|@/linux-64::__glibc==2.31=0
  - cython -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - numba -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - numpy -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - python=3.8 -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
  - scikit-learn -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']

Your installed version is: 2.31

DATA: add ill-conditioned simulated data

As discussed in this comment from sklearn, when the features of the dataset are not scaled, optimization methods can converge slowly.

Adding an example with such an ill-conditioned matrix would be very interesting.
The data generation mechanism is (quick extract, check this before coding :) ):

import numpy as np
from sklearn.datasets import make_low_rank_matrix

rng = np.random.RandomState(0)  # rng was undefined in the snippet; any fixed seed works

n_samples, n_features = 1000, 10000

w_true = rng.randn(n_features)

X = make_low_rank_matrix(n_samples, n_features, random_state=rng)
# Blow up two columns to make the design badly scaled
X[:, 0] *= 1e3
X[:, -1] *= 1e3

z = X @ w_true + 1
z += 1e-1 * rng.randn(n_samples)

# Balanced binary classification problem
y = (z > np.median(z)).astype(np.int32)
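
To turn this snippet into a dataset of the benchmark, here is a rough sketch of a datasets/simulated_ill_conditioned.py file, assuming the BaseDataset API from the benchopt documentation; the file name, class parameters, and return format are assumptions and may need adapting:

import numpy as np
from benchopt import BaseDataset
from sklearn.datasets import make_low_rank_matrix


class Dataset(BaseDataset):
    name = "Simulated-ill-conditioned"
    parameters = {'n_samples, n_features': [(1000, 10000)]}

    def get_data(self):
        rng = np.random.RandomState(0)
        w_true = rng.randn(self.n_features)
        X = make_low_rank_matrix(self.n_samples, self.n_features,
                                 random_state=rng)
        X[:, 0] *= 1e3  # badly scaled columns, as in the snippet above
        X[:, -1] *= 1e3
        z = X @ w_true + 1 + 1e-1 * rng.randn(self.n_samples)
        y = (z > np.median(z)).astype(np.int32)
        return dict(X=X, y=y)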

Add unpenalized logistic regression

The logistic regression problem sets all include a penalty. It would be very interesting, at least to me, to add the zero-penalty case.

Note: for n_features > n_samples, as with the news20 dataset, this is real fun (from an optimization point of view): the data are then typically linearly separable, so the unpenalized problem has no finite minimizer.
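
If the objective's regularization strength is exposed as a parameter (the parameter name lmbd and the objective name below are assumptions, not confirmed by this issue), the zero-penalty case could in principle be requested from the CLI along these lines:

benchopt run . -o "L2 Logistic Regression[lmbd=0]"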

[ENH] Cannot run a single solver from sklearn

We cannot run a single sklearn solver at a time. The following command does not work

benchopt run ./benchmark_logreg_l2 -s sklearn[lbfgs]

and returns

Usage: benchopt run [OPTIONS] BENCHMARK

Error: Invalid value: Patterns ['sklearn[lbfgs]'] did not matched any solver.
Available solvers are:
- Lightning
- sklearn[liblinear]
- sklearn[newton-cg]
- sklearn[lbfgs]

which looks contradictory.
