juliaai / mljlinearmodels.jl Goto Github PK

Generalized Linear Regressions Models (penalized regressions, robust regressions, ...)

License: MIT License

Julia 100.00%

regressions julia-language mlj sparse-regression lasso-regression ridge-regression robust-regression elastic-net logistic-regression multinomial-regression

mljlinearmodels.jl's Introduction

MLJLinearModels.jl

[Linux]	Coverage	Documentation

This is a package gathering functionalities to solve a number of generalised linear regression/classification problems which, inherently, correspond to an optimisation problem of the form

$$ L(y, X\theta) + P(\theta) $$

where:

$L$ is a loss function
$X$ is the $n \times p$ matrix of training observations, where $n$ is the number of observations (sample size) and $p$ is the number of features (dimension)
$\theta$ the length $p$ vector of weights to be optimized
$P$ is a penalty function

Additional regression/classification methods which do not directly correspond to this formulation may be added in the future.

The core aims of this package are:

make these regressions models "easy to call" and callable in a unified way,
interface with MLJ.jl,
focus on performance including in "big data" settings exploiting packages such as Optim.jl, IterativeSolvers.jl,
use a "machine learning" perspective, i.e.: focus essentially on prediction, hyper-parameters should be obtained via a data-driven procedure such as cross-validation.

Head to the quickstart section of the docs to see how to use this package.

NOTES

This section is only useful if you're interested in implementation details or would like to help extend the library. For usage instruction please head to the docs.

Implemented

Regressors	Formulation¹	Available solvers	Comments
OLS & Ridge	L2Loss + 0/L2	Analytical² or CG³
Lasso & Elastic-Net	L2Loss + 0/L2 + L1	(F)ISTA⁴
Robust 0/L2	RobustLoss⁵ + 0/L2	Newton, NewtonCG, LBFGS, IWLS-CG⁶	no scale⁷
Robust L1/EN	RobustLoss + 0/L2 + L1	(F)ISTA
Quantile⁸ + 0/L2	RobustLoss + 0/L2	LBFGS, IWLS-CG
Quantile L1/EN	RobustLoss + 0/L2 + L1	(F)ISTA

"0" stands for no penalty
Analytical means the solution is computed in "one shot" using the \ solver,
CG = conjugate gradient
(Accelerated) Proximal Gradient Descent
Huber, Andrews, Bisquare, Logistic, Fair and Talwar weighing functions available.
Iteratively re-Weighted Least Squares where each system is solved iteratively via CG
In other packages such as Scikit-Learn, a scale factor is estimated along with the parameters, this is a bit ad-hoc and corresponds more to a statistical perspective, further it does not work well with penalties; we recommend using cross-validation to set the parameter of the Huber Loss.
Includes as special case the least absolute deviation (LAD) regression when δ=0.5.

Classifiers	Formulation	Available solvers	Comments
Logistic 0/L2	LogisticLoss + 0/L2	Newton, Newton-CG, LBFGS	`yᵢ∈{±1}`
Logistic L1/EN	LogisticLoss + 0/L2 + L1	(F)ISTA	`yᵢ∈{±1}`
Multinomial 0/L2	MultinomialLoss + 0/L2	Newton-CG, LBFGS	`yᵢ∈{1,...,c}`
Multinomial L1/EN	MultinomialLoss + 0/L2 + L1	ISTA, FISTA	`yᵢ∈{1,...,c}`

Unless otherwise specified:

Newton-like solvers use Hager-Zhang line search (default in Optim.jl)
ISTA, FISTA solvers use backtracking line search and a shrinkage factor of β=0.8

Note: these models were all tested for correctness whenever a direct comparison with another package was possible, usually by comparing the objective function at the coefficients returned (cf. the tests):

(against scikit-learn): Lasso, Elastic-Net, Logistic (L1/L2/EN), Multinomial (L1/L2/EN)
(against quantreg): Quantile (0/L1)

Systematic timing benchmarks have not been run yet but it's planned (see this issue).

Current limitations

The models are built and tested assuming n > p; if this doesn't hold, tricks should be employed to speed up computations; these have not been implemented yet.
CV-aware code not implemented yet (code that re-uses computations when fitting over a number of hyper-parameters); "Meta" functionalities such as One-vs-All or Cross-Validation are left to other packages such as MLJ.
No support yet for sparse matrices.
Stochastic solvers have not yet been implemented.
All computations are assumed to be done in Float64.

Possible future models

Future

Model	Formulation	Comments
Group Lasso	L2Loss + ∑L1 over groups	⭒
Adaptive Lasso	L2Loss + weighted L1	⭒ A
SCAD	L2Loss + SCAD	A, B, C
MCP	L2Loss + MCP	A
OMP	L2Loss + L0Loss	D
SGD Classifiers	*Loss + No/L2/L1 and OVA	SkL

(⭒) should be added soon

Other regression models

There are a number of other regression models that may be included in this package in the longer term but may not directly correspond to the paradigm Loss+Penalty introduced earlier.

In some cases it will make more sense to just use GLM.jl.

Sklearn's list: https://scikit-learn.org/stable/supervised_learning.html#supervised-learning

Model	Note	Link(s)
LARS	--
Quantile Regression	--	Yang et al, 2013, QuantileRegression.jl
L∞ approx (Logsumexp)	--	slides
Passive Agressive	--	Crammer et al, 2006 SkL
Orthogonal Matching Pursuit	--	SkL
Least Median of Squares	--	Rousseeuw, 1984
RANSAC, Theil-Sen	Robust reg	Overview RANSAC, SkL, SkL, More Ransac
Ordinal regression	need to figure out how they work	E
Count regression	need to figure out how they work	R
Robust M estimators	--	F
Perceptron, MIRA classifier	Sklearn just does OVA with binary in SGDClassif	H
Robust PTS and LTS	--	PTS LTS

What about other packages

While the functionalities in this package overlap with a number of existing packages, the hope is that this package will offer a general entry point for all of them in a way that won't require too much thinking from an end user (similar to how someone would use the tools from sklearn.linear_model). If you're looking for specific functionalities/algorithms, it's probably a good idea to look at one of the packages below:

SparseRegression.jl
Lasso.jl
QuantileRegression.jl
(unmaintained) Regression.jl
(unmaintained) LARS.jl
(unmaintained) FISTA.jl
(unmaintained) RobustLeastSquares.jl

There's also GLM.jl which is more geared towards statistical analysis for reasonably-sized datasets and does (as far as I'm aware) lack a few key functionalities for ML such as penalised regressions or multinomial regression.

References

Minka, Algorithms for Maximum Likelihood Regression, 2003. For a review of numerical methods for the binary Logistic Regression.
Beck and Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, 2009. For the ISTA and FISTA algorithms.
Raman et al, DS-MLR: Exploiting Double Separability for Scaling up DistributedMultinomial Logistic Regression, 2018. For a discussion of multinomial regression.
Robust regression
- Mastronardi, Fast Robust Regression Algorithms for Problems with Toeplitz Structure, 2007. For a discussion on algorithms for robust regression.
- Fox and Weisberg, Robust Regression, 2013. For a discussion on robust regression and the IWLS algorithm.
- Statsmodels, M Estimators for Robust Linear Modeling. For a list of weight functions beyond Huber's.
- O'Leary, Robust Regression Computation using Iteratively Reweighted Least Squares, 1990. Discussion of a few common robust regressions and implementation with IWLS.

Dev notes

Probit Loss --> via StatsFuns // Φ(x) (normcdf); ϕ(x) (normpdf); -xϕ(x)
Newton, LBFGS take linesearches, seems NewtonCG doesn't
several ways of doing backtracking (e.g. https://archive.siam.org/books/mo25/mo25_ch10.pdf); for FISTA many though see http://www.seas.ucla.edu/~vandenbe/236C/lectures/fista.pdf; probably best to have "decent safe defaults"; also this for FISTA http://150.162.46.34:8080/icassp2017/pdfs/0004521.pdf ; https://github.com/tiepvupsu/FISTA#in-case-lf-is-hard-to-find ; https://hal.archives-ouvertes.fr/hal-01596103/document; not so great https://github.com/klkeys/FISTA.jl/blob/master/src/lasso.jl ;
https://www.ljll.math.upmc.fr/~plc/prox.pdf
proximal QN http://www.stat.cmu.edu/~ryantibs/convexopt-S15/lectures/24-prox-newton.pdf; https://www.cs.utexas.edu/~inderjit/public_papers/Prox-QN_nips2014.pdf; https://github.com/yuekai/PNOPT; https://arxiv.org/pdf/1206.1623.pdf
group lasso http://myweb.uiowa.edu/pbreheny/7600/s16/notes/4-27.pdf

mljlinearmodels.jl's People

Contributors

Stargazers

Watchers

Forkers

erensu okonsamuel lodewijkbrand aviatesk stephanegaiffas adityasaini70 jbrea rikhuijzer adienes takashi-takahashi rohitpandey13 martinuzzifrancesco

mljlinearmodels.jl's Issues

refs on accelerated ADMM

(2014?) ftp://ftp.math.ucla.edu/pub/camreport/cam12-35.pdf -- heavily cited
(2015?) seems more general https://www-users.cs.umn.edu/~baner029/papers/15/A2DM2_kdd15.pdf
http://proceedings.mlr.press/v37/nishihara15.pdf

throw error much ealier for LogisticClassifier

@tlienart . Sorry to disturb you again. I think a more informative error message should be thrown.
I ran LogisticClassifier using the default values on a dataset having nclasses > 2. it thows the following Error

ERROR: DimensionMismatch("new dimensions (4, 4) must be consistent with array size 4")

Although setting multi_class=true works. i think there should be a more informative error message.

Bump [compat] MLJModelInterface="^0.1,^0.2"

Here the changes since 0.1.8, which I doubt break anything in MLJLinearModels:

Bring in is_same_except and similar gizmos from MLJBase; see JuliaAI/MLJBase.jl#178
Let all models share same traits (#19, JuliaAI/MLJBase.jl#163)
Add save and restore stubs for serialisation

Offer use of lsqr for Ridge (?)

The only issue is that would need to form X_ but can offer the option.

Other models / solvers

(see readme)

Ridge classification
LARS see LARS.jl
Quantile regression see [QuantileRegression.jl] (https://github.com/pkofod/QuantileRegression.jl) see also #4 and #5
glmnet see GLMNet wrapper
prox Newton or L-BFGS (see also https://github.com/yuekai/PNOPT; and paper https://arxiv.org/pdf/1206.1623.pdf as well as https://papers.nips.cc/paper/5384-proximal-quasi-newton-for-computationally-intensive-l1-regularized-m-estimators.pdf)
count -- http://hua-zhou.github.io/media/pdf/ZhangZhouZhouSun17mglm.pdf
Ridge classifier -- https://stackoverflow.com/questions/53911663/what-does-sklearn-ridgeclassifier-do/53912015
TheilSen -- https://en.wikipedia.org/wiki/Theil–Sen_estimator

Other stuff

add option for size of tape in LBFGS solver(s)
add option(s) for linesearches
add and test options generally for solvers

NewtonCG is too slow

It shouldn't be; this may be due to issues with the Krylov Solver; consider rewriting your own NewtonCG using the cg from IterativeSolvers & starting with a simple backtracking; maybe later with Hager-Zhang.

schema(X) fails

at https://github.com/alan-turing-institute/MLJLinearModels.jl/blob/master/src/mlj/interface.jl#L22 the function schema(X) fails. @tlienart

add possibility to specify qr/chol in analytical

also use Symmetric for good measure

See this discussion for some insight on things https://discourse.julialang.org/t/efficient-way-of-doing-linear-regression/31232/26

Move away from global pointers in order to be threadsafe

Replace scratch space by allocations in the factory functions
pass whatever is needed to apply_Xt etc

this will likely make the code a bit less readable in some places but is probably better practice and shouldn't affect performance, additionally it should help avoid non-thread safe stuff

Bump [compat] MLJBase="^0.8.1"

To bring MLJLinearModels into synch with the other repos.

Documentation

(see branch docs)

the fact that, when we can, we avoid doing hcat(X, 1)
use IterativeSolvers.cg and not IterativeSolvers.lsqr as it allows specifying the operator as a linear map which is efficient and avoids copying when having to add a column for X; anyway it should be identical apart from pathological cases
robust regression with concomitant scale estimation, not done yet
LAD/quantile explain weight clipping for pathological zero residual
generally detail how algos do IWLS etc so that it's 100% clear what is being done
- Newton
- NewtonCG (explain briefly then link to Optim)
- LBFGS (explain briefly then link to Optim)
- IWLSCG

Types workflow

What happens if input data is entered as integer? as float32, how about the target? verify that it's consistent (at least with float64)

Option to scale loss by `n`

In case users prefer this;

that means though that things should be ScaledLosss{...} in fit methods etc, it shouldn't change much otherwise.

RidgeCV, can also use Hessenberg factorisation

see https://github.com/JuliaLang/julia/pull/31853/files#diff-3a896b2398ad83022a6e197921c125c6R274-R277

Though it may be overkill to compute all of the theta for all lambdas and better to get a proxy for the GCV error and then compute just for that

Stochastic solvers

Related code

(depr) SGDOptim.jl discussed in here
(tutorial) sgd.jl

Quantile loss

One day there may be a refactoring to use LossFunctions, in the mean time it's irrelevant but anyway:

-( 1 - tau) * r if r < 0
tau * r otherwise

https://www.stat.berkeley.edu/~mmahoney/pubs/quantile-icml13.pdf

For 1/2 it's basically l1

Multinomial regressor requires integer categories, and greater than zero

Hi,
I faced this issue when using the multinomial regression (aka multiclass Logistic Regression).
In the following, I show an example with MNIST classification.

I have to shift all categories of 1 integer (0->1, 1->2 .... 9->10) otherwise it gave the error I report in the end notes

# Classify MNIST digits with a simple multi-layer-perceptron
using Flux.Data.MNIST
using MLJLinearModels
# Get MNIST dataset and transpose for (records, features)
imgs = MNIST.images()
X = Array(transpose(hcat(float.(reshape.(imgs, :))...) )
# MNIST labels: Categorical labels must be 1...c, hence add .+1 to each label
labels = MNIST.labels() .+1
# and the number of classes
n_classes = length(Set(labels))
n_features = size(X,2)
# The MNIST database does not need the intercept
intercept = false
# deploy MultinomialRegression from MLJLinearModels, λ being the strenght of the reguliser
mnr = MultinomialRegression(λ; fit_intercept=intercept)
# Fit the model
θ  = fit(mnr, X, labels)
# The model parameters are organized such we can apply X⋅θ, the following is only to clarify
params = reshape(θ, n_features +Int(intercept), n_classes)
# Get the predictions X⋅θ 
preds = MLJLinearModels.softmax(MLJLinearModels.apply_X(X,θ,n_classes))
# map each vector to its maximal element 
targets = map(x->argmax(x),eachrow(preds))
#and evaluate the model over the labels
scores = 1- sum(targets-labels)/length(preds)

Error with 0 among the categories:


DimensionMismatch("new dimensions (785, 10) must be consistent with array size 7056")
(::Base.var"#throw_dmrsa#197")(::Tuple{Int64,Int64}, ::Int64) at reshapedarray.jl:41
reshape at reshapedarray.jl:45 [inlined]
reshape at reshapedarray.jl:116 [inlined]
apply_X!(::Array{Float64,2}, ::Array{Float64,2}, ::Array{Float64,1}, ::Int64) at utils.jl:66
(::MLJLinearModels.var"#102#103"{GeneralizedLinearRegression{MultinomialLoss,ScaledPenalty{LPPenalty{2}}},Array{Float64,2},Array{Int64,1},Int64,Int64,Int64,Float64})(::Float64, ::Array{Float64,1}, ::Array{Float64,1}) at d_logistic.jl:149
(::NLSolversBase.var"#61#62"{NLSolversBase.InplaceObjective{Nothing,MLJLinearModels.var"#102#103"{GeneralizedLinearRegression{MultinomialLoss,ScaledPenalty{LPPenalty{2}}},Array{Float64,2},Array{Int64,1},Int64,Int64,Int64,Float64},Nothing,Nothing,Nothing},Float64})(::Array{Float64,1}, ::Array{Float64,1}) at incomplete.jl:45
value_gradient!!(::NLSolversBase.OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}) at interface.jl:82
initial_state(::Optim.LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#19#21"}, ::Optim.Options{Float64,Nothing}, ::NLSolversBase.OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}) at l_bfgs.jl:158
optimize(::NLSolversBase.OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}, ::Optim.LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#19#21"}, ::Optim.Options{Float64,Nothing}) at optimize.jl:33
#optimize#93 at interface.jl:116 [inlined]
optimize(::NLSolversBase.InplaceObjective{Nothing,MLJLinearModels.var"#102#103"{GeneralizedLinearRegression{MultinomialLoss,ScaledPenalty{LPPenalty{2}}},Array{Float64,2},Array{Int64,1},Int64,Int64,Int64,Float64},Nothing,Nothing,Nothing}, ::Array{Float64,1}, ::Optim.LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#19#21"}, ::Optim.Options{Float64,Nothing}) at interface.jl:115
optimize(::NLSolversBase.InplaceObjective{Nothing,MLJLinearModels.var"#102#103"{GeneralizedLinearRegression{MultinomialLoss,ScaledPenalty{LPPenalty{2}}},Array{Float64,2},Array{Int64,1},Int64,Int64,Int64,Float64},Nothing,Nothing,Nothing}, ::Array{Float64,1}, ::Optim.LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#19#21"}) at interface.jl:115
_fit(::GeneralizedLinearRegression{MultinomialLoss,ScaledPenalty{LPPenalty{2}}}, ::LBFGS, ::Array{Float64,2}, ::Array{Int64,1}) at newton.jl:114
#fit#144(::LBFGS, ::typeof(fit), ::GeneralizedLinearRegression{MultinomialLoss,ScaledPenalty{LPPenalty{2}}}, ::Array{Float64,2}, ::Array{Int64,1}) at default.jl:48
fit(::GeneralizedLinearRegression{MultinomialLoss,ScaledPenalty{LPPenalty{2}}}, ::Array{Float64,2}, ::Array{Int64,1}) at default.jl:38
top-level scope at test_LR.jl:159

Fwd / Backward selection + regression metrics

All regressions have residuals, there could be metrics on the residuals. In most cases it would make more sense to just implement them in MLJBase and use them here
FWD/BWD selection procedure is a bit dumb and expensive but people use it I guess.

Expose more of iterativesolvers' options (eg LMSR, LSQR, etc)

Try adding on the fly scale estimation

basically just re-scale the calls by median(r). This may be what matlab does though a bit unclear.

Add tests vs rlm

https://stat.ethz.ch/R-manual/R-patched/library/MASS/html/rlm.html

Expose example creation

Expose the functions allowing to to create examples (continuous, binary etc)

Tests LAD/Quantile against R package

Investigate old "robustleastsquares" package

take a look at https://github.com/FugroRoames/RobustLeastSquares.jl

Cyanure (Python)

http://julien.mairal.org/cyanure/welcome.html

Add option to toggle penalty of intercept term

Update to MLJBase 0.9

No other action required. (The MLJBase API has not changed since 0.8 as far as MLJLinearModels is concerned.)

At present, if the update is not executed, MLJLinearModels will, it seems, disappear in the next @update to the MLJ registry.

Improved solvers

Quantile, LAD regression

ADMM should work but if requires adapting rho then needs refactoring of H often which is wasteful, could imagine doing CG for that bit but that would also end up being expensive see also #8
MM and other algorithms see issue #3 and #4
IP Method with or without pre-proc ox-code
Frisch-Newton (for L1 reg may be good) ref

NewtonCG, IWLSCG

should add a field where the user can specify :cg or :minres or something else, unlikely it would make a big difference in perf though.

Hoist all allocations out of factory

it should be possible to just pass the scratch spaces to the factory.

Add precise tests for all d* functions

At the moment indirectly tested but they may actually be incorrect but look correct.

Performance benchmarking (fit)

Before starting this, need a way to systematically:

trace number of function calls, number of gradient calls, number of hessian calls
have a way to stop with universal criterion OR show a plot where the objective function decreases and eventually hits the same value as that from ref package.

Against scikitlearn

expect on par or better

Against quantreg

expect a bit worse (quantreg is effectively in cpp)

quantile regression

Allow `rng` keyword as alias to `seed`

For consistency with MLJ

Use fghv! in Hv! for MultinomialLoss

use JuliaNLSolvers/Optim.jl#742 in d_logistic.jl Hv!

ADMM/FADMM

It still lives on the branch admm-bk, should probably refactor thinking:

if adapting rho need to refactor H
probably should check on a simple problem first to see if works
simple matlab code directly translated does not work

SGJ comment

https://discourse.julialang.org/t/efficient-way-of-doing-linear-regression/31232

LOOCV / GCV for ridge / logistic

Ridge

refs

https://arxiv.org/pdf/1509.09169.pdf
GCV and GCVPack (probably a good place to start): http://pages.stat.wisc.edu/~wahba/stat860public/pdf1/775r.pdf
page 22 https://elib.uni-stuttgart.de/bitstream/11682/9236/1/BA_fertig_QianKun_25072017.pdf
slides https://www.wias-berlin.de/workshops/validation2010/workshop_slides/talk_goeman.pdf

Basically just have to minimise a criterion that can be evaluated in constant time after the first fit.

Similar stuff can be obtained for LOOCV it seems

proper refs

http://w3.atomki.hu/~efo/hornyak/Tikhonov_references/Technometrics_Golub_Heath_Wahba

Logistic: https://arxiv.org/pdf/1711.05420.pdf

Add possibility to take symbols to match sklearn

Context: alan-turing-institute/MLJ.jl#387

Fix: allow symbols to be passed around and just re-interpret them as string.

Feature Requests

@tlienart

All Subset (Best K Subset):
I've got parsimonious code for All Subset Regression which might have a nice home in this package that I'd like to share.
While it's much faster than R's Leaps pkg, I would include a strong warning to users:
This optimization problem is sooo not convex. Avoid for p>20.
Approximate All Subset (MIO):
As you probably know,
Bertsimas etal (2016): show how to use MIO to approximately train all subset regression for p =1000 in minutes.
Bertsimas et al (2019): do this for p=100k in minutes.
The corresponding Julia package is SubsetSelection.jl.
Also see forthcoming Julia package MIP-Boost.
It would be really awesome if these linear models were featured in MLJ.
Relaxed Lasso:
Hastie et al 2017: responded that relaxed Lasso matches Lasso in low-SNR scenarios & beats All Subset in high-SNR scenarios
glmnet: now features "relaxed Lasso"
Would this be difficult to add to MLJLinearModels? (maybe we can work on it together?)
Linear Interactions:
There are some cool new packages for interactions in linear models.
I find these compare in out-of-sample fit w/ non-linear models (xgboost etc, no joke)
R: sprintr & HierNet
Python: HierScale
Nonconvex penalties (MCP SCAD):
https://github.com/joshday/SparseRegression.jl/blob/master/README.md
Forward selection/stepwise regression:
(stackoverflow, has Julia code)
Textbook code
Ensembles of Penalized Linear Models:
Christidis et al 2019: Paper, old code EnsembleEN, new code SplitReg

Feature names for LinearRegressor fitted_params

Currently, the fitted parameters (coefficient and intercept) for LinearRegressor does not include feature names, which makes mapping coefficients back to their input columns complicated when combined with transformations such as OneHotEncoder. E.g.

julia> fps = fitted_params(selected_model)

julia> fps.machines
3-element Array{Any,1}:
 NodalMachine{UnivariateStandardizer} @ 5…02
 NodalMachine{LinearRegressor} @ 6…58
 NodalMachine{OneHotEncoder} @ 8…55

julia> fps.fitted_params_given_machine[fps.machines[2]]
(coefs = [XXX],
 intercept = XXX,)

I think this would be a useful diagnostic feature to have something like

julia> fps.fitted_params_given_machine[fps.machines[2]]
(coefs = [XXX],
 intercept = XXX,
 names = [XXX])

Example of logistic regression

Hey looks like a nice library. I would appreciate an example of a logistic regression. Also i was wondering what 0/L2 and L1/EN mean, i don't know which one to choose. Thank you

Explain that all models are deterministic and can wrap if required

Reduce allocations more by keeping a cache

This is micro-optimization for later on but the functions that are called often like fgh! and the functions therein, could probably have a cache, would have to see if that's digestible to Optim but for instance it seems wasteful to allocate X*theta or the associated residual every time it's called.

Update [compat] MLJBase="^0.10" (or higher if now appropriate)

alan-turing-institute/MLJ.jl#430

Quantile reg

algos

Maj Min algorithm for non-reg case (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.206.1351&rep=rep1&type=pdf, also https://arxiv.org/pdf/1709.04126.pdf)
ADMM for gen reg case (https://arxiv.org/pdf/1709.04126.pdf)
L1 penalised: Frisch Newton http://www.econ.uiuc.edu/~roger/research/sparse/fn3.pdf
non-reg also IRWLS http://users.stat.umn.edu/~sandy/courses/8053/handouts/robust.pdf

ADMM, MM, and CD approaches, while MM and CD are faster and ADMM slower than the IP algorithm available in quantreg. The results so far suggest that the MM algorithm is the best-suited for non-regularized (composite) quantile regression among the four methods tested, especially for data sets with large n and relatively small p. In regularized quantile regression, all methods perform similarly in terms of variable selection, but CD and ADMM show clear superiority in run time, particularly relative to the IP and MM methods when p is large. In the case of regularized composite quantile regression, CD and ADMM dis

(...)

Applying existing optimization algorithms to (composite) quantile regression requires a non-trivial reformulation of the problem due to the non-linearity and non-differentiability of the loss and regularization terms of the objective. The well-known quantreg package for R (Koenker, 2017) uses an interior point (IP) approach for quantile and composite quantile regression with the option of l1 (lasso) regularization for the former and no regulariza- tion options for the latter. Although advanced IP algorithms in quantreg, such as the one using prediction-correction (Mehrotra, 1992) for non-regularized quantile regression, have greatly improved upon earlier attempts using simplex methods, the time spent on matrix inversion in IP approaches (Chen and Wei, 2005) motivates us to seek faster algorithms for quantile and composite quantile regression, particularly for high-dimensional data where regularization is required. In addition, following the conjectures of Fan and Li (2001), Zou (2006) showed lasso variable selection—currently the most commonly-implemented penalty for quantile regression—to be inconsistent in certain situations and presented adaptive lasso regularization as a solution. Our work in the present paper is thus motivated by both a search for faster quantile regression algorithms as well as the lack of publicly-available meth- ods for adaptive-lasso regularized quantile and composite quantile regression, particularly for high-dimensional data.

in https://arxiv.org/pdf/1709.04126.pdf (cqreg package in R)

refs

Hunter & Lange: MM algo for QR http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.206.1351&rep=rep1&type=pdf

Other penalties

References

scad, mcp, adaptive lasso ref
trimmed l1 ref

A PkgEval run for a PR which changes the generated numbers for randn! indicates that the tests of this package might fail in Julia 1.5 (and on Julia current master). Apologies if this is a false positive.
cf. https://github.com/JuliaCI/NanosoldierReports/blob/7de24e455342298cbef56826b5827f0d7640d2c1/pkgeval/by_hash/b89e35c_vs_098ef24/logs/MLJLinearModels/1.5.0-DEV-71a4a114c2.log

MAD/LAD regression

https://statweb.stanford.edu/~jtaylo/courses/stats203/notes/robust.pdf note discussion of scale

admm

could probably also benchmark against sequence of Huber estimators with delta going to zero

Performance benchmarking (base)

check allocations and speed of elementary step such as

apply_X!
apply_Xt!
get_residuals!
...

juliaai / mljlinearmodels.jl Goto Github PK

mljlinearmodels.jl's Introduction

MLJLinearModels.jl

NOTES

Implemented

Current limitations

Possible future models

Future

Other regression models

What about other packages

References

Dev notes

mljlinearmodels.jl's People

Contributors

Stargazers

Watchers

Forkers

mljlinearmodels.jl's Issues

Other stuff

Related code

Quantile, LAD regression

NewtonCG, IWLSCG

Against scikitlearn

Against quantreg

Ridge

proper refs

algos

refs

References

Recommend Projects

Recommend Topics

Recommend Org