Giter Club home page Giter Club logo

bolasso's Introduction

bolasso

R-CMD-check pkgdown Codecov test coverage CRAN status

The goal of bolasso is to implement model-consistent Lasso estimation via the bootstrap [1].

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("dmolitor/bolasso")

Usage

To illustrate the usage of bolasso, we’ll use the Pima Indians Diabetes dataset to determine which factors are important predictors of testing positive for diabetes. For a full description of the input variables, see the link above.

Load requisite packages and data

library(bolasso)

data(PimaIndiansDiabetes, package = "mlbench")

# Quick overview of the dataset
str(PimaIndiansDiabetes)
#> 'data.frame':    768 obs. of  9 variables:
#>  $ pregnant: num  6 1 8 1 0 5 3 10 2 8 ...
#>  $ glucose : num  148 85 183 89 137 116 78 115 197 125 ...
#>  $ pressure: num  72 66 64 66 40 74 50 0 70 96 ...
#>  $ triceps : num  35 29 0 23 35 0 32 0 45 0 ...
#>  $ insulin : num  0 0 0 94 168 0 88 0 543 0 ...
#>  $ mass    : num  33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 0 ...
#>  $ pedigree: num  0.627 0.351 0.672 0.167 2.288 ...
#>  $ age     : num  50 31 32 21 33 30 26 29 53 54 ...
#>  $ diabetes: Factor w/ 2 levels "neg","pos": 2 1 2 1 2 1 2 1 2 2 ...

First, we run 100-fold bootstrapped Lasso with the glmnet implementation. We can get a rough estimate of the elapsed time using system.time().

system.time({
  model <- bolasso(
    diabetes ~ .,
    data = PimaIndiansDiabetes,
    n.boot = 100, 
    implement = "glmnet",
    family = "binomial"
  )
})
#> Loaded glmnet 4.1-3
#>    user  system elapsed 
#>   19.32    0.14   19.58

We can get a quick overview of the model by printing the bolasso object.

model
#> ------------- 100-fold bootstrapped Lasso -------------
#> 
#> Model matrix dimensions:
#>    - 8 Predictors
#>    - 768 Observations
#> 
#> Selected variables:
#>    - 6/8 predictors selected with 90% threshold
#>    - 4/8 predictors selected with 100% threshold

Extracting selected variables

Next, we can extract all variables that were selected in 90% and 100% of the bootstrapped Lasso models. We can also pass any relevant arguments to predict on the cv.glmnet or cv.gamlr model objects. In this case we will use the lambda value that minimizes OOS error.

selected_vars(model, threshold = 0.9, select = "lambda.min")
#> # A tibble: 7 x 2
#>   variable  mean_coef
#>   <chr>         <dbl>
#> 1 Intercept   -8.15  
#> 2 pregnant     0.119 
#> 3 glucose      0.0348
#> 4 pressure    -0.0113
#> 5 mass         0.0821
#> 6 pedigree     0.849 
#> 7 age          0.0138

selected_vars(model, threshold = 1, select = "lambda.min")
#> # A tibble: 5 x 2
#>   variable  mean_coef
#>   <chr>         <dbl>
#> 1 Intercept   -8.15  
#> 2 pregnant     0.119 
#> 3 glucose      0.0348
#> 4 mass         0.0821
#> 5 pedigree     0.849

Plotting selected variables

We can also quickly plot the selected variables at the 90% and 100% threshold values.

plot(model, threshold = 0.9)

plot(model, threshold = 1)

Parallelizing bolasso

We can execute bolasso in parallel via the future package. To do so we can copy the code from above with only one minor tweak shown below.

future::plan("multisession")

We can now run the code from above, unaltered, and it will execute in parallel.

system.time({
  model <- bolasso(
    diabetes ~ .,
    data = PimaIndiansDiabetes,
    n.boot = 100, 
    implement = "glmnet",
    family = "binomial"
  )
})
#>    user  system elapsed 
#>    0.17    0.03    5.58

References

[1] Bach, Francis. “Bolasso: Model Consistent Lasso Estimation through the Bootstrap.” ArXiv:0804.1302 [Cs, Math, Stat], April 8, 2008. https://arxiv.org/abs/0804.1302.

bolasso's People

Contributors

dmolitor avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.