Comments (12)

jbrea commented on June 2, 2024

I think users (like me) will expect that the default behavior of the algorithm, especially as simple as a logistic regression, is to work out of the box and provide a reasonable fit with the default hyperparameters.

I don't think you can expect a default that is not scaled to work well across the board for users. More generally I don't think you can expect a good default for this full stop. These parameters must be tuned and the tuning should not be affected by sample size.

I agree with both. I think scale_penalty_with_samples = true is the better default. However, the current default lambda = 1 means rather strong regularisation when the input is (close to) standardised, which may be the most common case (see below). In this case, users usually have to change lambda to avoid underfitting. We could also set the default to lambda = 1e-8 (or some other small value), with the argument that it basically doesn't affect the unregularised solution in the non-separable case while still avoiding runaway solutions in the separable case. With such a default, users would more often have to deal with overfitting instead.

So, if we have good evidence that 1) (close to) standardised input is the most common case and 2) the majority of users perceive (potential) overfitting as a more reasonable fit than (potential) underfitting, I would argue for lowering the default lambda.


If we write the solution of logistic regression as $\theta = \beta \tilde\theta$, where $\|\tilde\theta\|_2 = 1$, and assume perfect separability and uncorrelated predictors with mean 0 and variance 1, so that roughly $y_i\tilde\theta'x_i \approx 1$, we can find $\beta$ by minimising $\log(1 + \exp(-\beta)) + \frac{\lambda}{2}\beta^2$. For $\lambda = 1$ the stationarity condition $\beta = 1/(1 + \exp(\beta))$ gives $\beta \approx 0.4$, and therefore the prediction for the correct class is approximately $1/(1 + \exp(-\beta)) \approx 0.6$. This looks heavily regularised for a separable problem. For lambda = 1e-8 the prediction for the correct class would be basically 1.
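The one-dimensional problem above is easy to check numerically. The following is a minimal sketch in plain Python (used here only for illustration; the package itself is Julia), and the helper names `solve_beta` and `correct_class_probability` are hypothetical. It relies only on the simplified objective stated above, not on anything in MLJLinearModels.jl.

```python
import math

def sigmoid_neg(beta):
    """Return 1 / (1 + exp(beta)) for beta >= 0 without overflow."""
    e = math.exp(-beta)
    return e / (1.0 + e)

def solve_beta(lam, iters=200):
    """Minimise log(1 + exp(-beta)) + lam / 2 * beta**2 over beta >= 0.

    The derivative lam * beta - 1 / (1 + exp(beta)) is increasing in beta,
    negative at beta = 0 and positive at beta = 1 / lam, so bisection on
    that interval finds its unique root.
    """
    lo, hi = 0.0, 1.0 / lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lam * mid - sigmoid_neg(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def correct_class_probability(beta):
    """Predicted probability of the correct class at margin beta."""
    return 1.0 / (1.0 + math.exp(-beta))

for lam in (1.0, 1e-8):
    beta = solve_beta(lam)
    print(lam, beta, correct_class_probability(beta))
```

Running it for other values of lambda gives a quick feel for how strongly a given default shrinks the fit in the standardised, separable setting.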

from mljlinearmodels.jl.

jbrea commented on June 2, 2024

why not completely lambda = 0

We could do this. I just don't much like the fact that, in the separable case, the solution would have infinite norm, $\|\theta\|_2 = \infty$, which obviously is never reached by any optimiser. Therefore I would prefer the default to be at least lambda = eps().
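To illustrate the runaway behaviour, here is a small sketch in plain Python (illustrative only, not the package's solver) that runs gradient descent on the simplified one-dimensional separable objective $\log(1 + \exp(-\beta)) + \frac{\lambda}{2}\beta^2$. The function name `gradient_descent_beta` and the step size are assumptions made for the example. With lambda = 0 the iterate keeps growing as more steps are taken, whereas any positive lambda gives a finite minimiser.

```python
import math

def gradient_descent_beta(steps, lam=0.0, lr=1.0):
    """Run plain gradient descent on log(1 + exp(-beta)) + lam / 2 * beta**2."""
    beta = 0.0
    for _ in range(steps):
        e = math.exp(-beta)  # beta stays >= 0 in this setup, so no overflow
        grad = lam * beta - e / (1.0 + e)
        beta -= lr * grad
    return beta

# Without a penalty, beta never settles: more iterations give a larger norm.
for steps in (10, 1000, 100000):
    print("lambda = 0,", steps, "steps:", gradient_descent_beta(steps))

# With a penalty the iterates converge to a finite value.
print("lambda = 1, 1000 steps:", gradient_descent_beta(1000, lam=1.0))
```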

ablaom commented on June 2, 2024

@tlienart This is breaking, no? I think we need a breaking (minor) release not a patch. Or am I missing something?

ablaom commented on June 2, 2024

Thanks @tlienart. I'm making a PR to General to yank 0.6.5 from the registry.

tlienart commented on June 2, 2024

It's a convention on the objective function. The reason is to put the loss and the penalty on the same scale, so that if you have twice as much data you don't have to change the regularisation. In the case of ridge, for instance, the objective is

1/n ||y - Xb||^2 + lambda * ||b||^2

Multiplying through by n gives the equivalent objective ||y - Xb||^2 + n * lambda * ||b||^2, i.e. the penalty is scaled by the number of samples.
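The effect of this convention can be seen on a toy one-dimensional ridge problem without intercept, where the minimiser has a closed form. The sketch below is plain Python written for illustration; `ridge_1d` is a hypothetical helper, the data are made up, and the flag only mirrors the name of the `scale_penalty_with_samples` option rather than reproducing the package's implementation.

```python
def ridge_1d(x, y, lam, scale_penalty_with_samples):
    """Minimise sum((y_i - x_i * b)**2) + penalty * b**2 in closed form.

    With scaling, penalty = n * lam, which is the same minimiser as
    1/n * sum((y_i - x_i * b)**2) + lam * b**2.
    """
    n = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    penalty = n * lam if scale_penalty_with_samples else lam
    return sxy / (sxx + penalty)

# Made-up data, then the same data repeated twice ("twice as much data").
x = [0.5, -1.0, 1.5, 2.0]
y = [1.0, -2.1, 2.9, 4.2]
x2, y2 = x + x, y + y

scaled = ridge_1d(x, y, 1.0, True)
scaled_doubled = ridge_1d(x2, y2, 1.0, True)
unscaled = ridge_1d(x, y, 1.0, False)
unscaled_doubled = ridge_1d(x2, y2, 1.0, False)

print("scaled penalty:  ", scaled, scaled_doubled)      # same fit for both sizes
print("unscaled penalty:", unscaled, unscaled_doubled)  # fit changes with n
```

With the scaled penalty the same lambda gives the same coefficient when the data are duplicated; without scaling, the effective regularisation weakens as n grows, so a tuned lambda would not transfer across sample sizes.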

tlienart commented on June 2, 2024

PS: wait, I'm confused, you made those changes, didn't you? I don't think anyone has touched that logic since you did.

Ah no, it wasn't you, it was @jbrea; maybe he can chip in if you have further questions.

Note: in any case I think that parameter is best obtained via hyperparameter optimisation.

olivierlabayle commented on June 2, 2024

Thanks for the explanation @tlienart. I think users (like me) will expect that the default behavior of the algorithm, especially as simple as a logistic regression, is to work out of the box and provide a reasonable fit with the default hyperparameters. As far as I can see, this new hyperparameter does seem to mess things up: the output is almost like a biased random coin toss. It would probably make more sense to default to false, wouldn't it? Moreover, this would have been a non-breaking change from 0.5.7, if I followed the history correctly.

tlienart commented on June 2, 2024

Please have a look at #108 for the reasoning behind it, specifically the tuning.

I don't think you can expect a default that is not scaled to work well across the board for users. More generally I don't think you can expect a good default for this full stop. These parameters must be tuned and the tuning should not be affected by sample size.

tlienart commented on June 2, 2024

Thanks, I like this suggestion

olivierlabayle commented on June 2, 2024

Also agree, why not completely lambda=0 which is vanilla logistic regression?

tlienart commented on June 2, 2024

Thanks both for the discussion, default set to eps(), patch release under way.

tlienart commented on June 2, 2024

Ok, would you mind doing it? Thanks!

Done! 5bb7c6d#commitcomment-80390857
