
parsnip's People

Contributors

bcjaeger, davisvaughan, emilhvitfeldt, grayskripko, hfrick, jtlandis, juliasilge, kiendang, klahrich, kscott-1, ledell, malcolmbarrett, mdancho84, mine-cetinkaya-rundel, oj713, patr1ckm, pursuitofdatascience, qiushiyan, rorynolan, salim-b, schoonees, sharleenw, shosaco, simonpcouch, stefanbras, stevenpawley, t-kalinowski, tanho63, tiagomaie, topepo

parsnip's Issues

predict(..., type = "oob")

I couldn't find a way to get out-of-bag predictions. Keeping this issue open in case support gets added.
A couple of examples:

rand_forest(mode = "classification") %>% 
    fit(hp ~ ., data = mtcars, engine = "ranger") %>% 
    {.$fit$predictions[1:5]}
# [1] 127.44091 128.56689  94.72514 121.45245 165.98219


rand_forest(mode = "classification", others = list(probability = T)) %>% 
    fit(hp ~ ., data = mtcars, engine = "ranger") %>% 
    {.$fit$predictions[1:5, 1:5]}
#            110         93        175        105         245
# [1,] 0.24082457 0.024893872 0.19493817 0.01449105 0.040247785
# [2,] 0.25530948 0.038556354 0.16503503 0.01379800 0.039072892
# [3,] 0.14358238 0.000000000 0.02238186 0.02633872 0.002298851
# [4,] 0.05724574 0.053273285 0.10391786 0.08190140 0.009538507
# [5,] 0.09842348 0.009909852 0.16075958 0.04509576 0.079740338


rand_forest(mode = "classification") %>% 
    fit(hp ~ ., data = mtcars, engine = "randomForest") %>% 
    {.$fit$predicted[1:5]}
#         Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive Hornet Sportabout 
#         123.09100         123.47858          96.17599         125.62471         164.67905

Remove S3 from model functions

Previously, the idea was to figure out the mode from data, so there were different methods for formulas, recipes etc.

However, the mode is manually specified now so we don't need a class for the models (and different methods).

If we kept them, we would need to always specify the mode even when there is only one choice (e.g. logistic regression). For example:

> library(parsnip)
> logistic_reg()
Logistic Regression Model Specification (classification)


> logistic_reg(mixture = varying())
Error in varying() : This is a placeholder and should not be evaluated
>
> #we would need to do:
> logistic_reg(mode = "classification", mixture = varying())
Logistic Regression Model Specification (classification)

Main Arguments:
mixture:   varying()

Should "main" model spec args be protected from use in `others=`?

Meaning, should this be allowed?

> translate(linear_reg(penalty = 1, others = list(lambda = 1.5)), "glmnet")
Linear Regression Model Specification (regression)

Main Arguments:
  penalty = 1

Engine-Specific Arguments:
  lambda = 1.5

Computational engine: glmnet 

Model fit template:
glmnet::glmnet(x = missing_arg(), y = missing_arg(), weights = missing_arg(), 
    lambda = 1, lambda = 1.5, family = "gaussian")

We should be able to prevent this by including protect = c("lambda") in linear_reg_glmnet_data.
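
A minimal sketch of what that guard might look like (the function and field names below are assumptions for illustration, not the actual parsnip internals):

# Hypothetical check: drop engine-specific arguments whose names are protected
# because they are already controlled by the main arguments.
check_others <- function(others, protect) {
  clash <- intersect(names(others), protect)
  if (length(clash) > 0) {
    warning("These engine arguments are set by main arguments and will be ignored: ",
            paste(clash, collapse = ", "), call. = FALSE)
    others <- others[setdiff(names(others), clash)]
  }
  others
}

# With protect = c("lambda"), the lambda in `others` would be dropped instead
# of producing a duplicated argument in the glmnet call:
check_others(list(lambda = 1.5), protect = c("lambda"))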

Regression Example: Error in current_env() : could not find function "current_env"

I was going through the Regression Example. Below is my code.

#devtools::install_github("topepo/modelgenerics")
#devtools::install_github("topepo/parsnip", dependencies=TRUE)
#install.packages("AmesHousing")
#install.packages("rsample")
#devtools::install_github("imbs-hl/ranger")
#install.packages("rlang")

library(parsnip)
library(AmesHousing)
library(tidyverse)
library(rsample)
library(ranger)

ames <- make_ames()
set.seed(4595)
data_split <- initial_split(ames, strata = "Sale_Price", p = 0.75)

ames_train <- training(data_split)
ames_test  <- testing(data_split)

rf_defaults <- rand_forest(mode = "regression")
rf_defaults

preds <- c("Longitude", "Latitude", "Lot_Area", "Neighborhood", "Year_Sold")

rf_xy_fit <- rf_defaults %>%
  fit(
    x = ames_train[, preds],
    y = log10(ames_train$Sale_Price),
    engine = "ranger"
  )

I get the following error

Error in current_env() : could not find function "current_env"

I thought this was an rlang issue, so I removed and reinstalled rlang. Then I thought it was a session issue, so I restarted R, but still no luck. I was wondering if you had any insight into this error.

protect against common interface mistakes

  • The user didn't use named arguments and the wrong argument is picked up:
# `ovarian` gets picked up as a `recipe` argument
fit(tt, Surv(futime, fustat) ~ ecog.ps + rx, ovarian)
  • The user tries to fit directly from the specification function without calling fit() (possible guards are sketched after these examples):
surv_reg(Surv(futime, fustat) ~ ecog.ps + rx, data = ovarian)
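
A rough sketch of the kind of guards that could catch these mistakes (the checks and messages are illustrative assumptions, not existing parsnip code):

# Hypothetical argument checks run at the top of fit():
check_fit_interface <- function(formula, data) {
  if (!inherits(formula, "formula"))
    stop("`formula` should be a formula; use named arguments to avoid mix-ups.",
         call. = FALSE)
  if (!is.data.frame(data))
    stop("`data` should be a data frame.", call. = FALSE)
  invisible(TRUE)
}

# The specification functions themselves (e.g. surv_reg()) could likewise error
# informatively if a formula or data frame shows up in their arguments,
# pointing the user to fit().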

more options for argument control

Currently, there is a "protect" field that is a list of argument names that the user should not be able to mess with. For example, with stats::glm, "data" should not be modified until fit is run, and so on.

There should be at least one other option though:

  • some parameters that we might always want to include, regardless of whether they have been modified from their original values (a possible layout is sketched below). Examples include "family" for logistic regression and "seed" for ranger.
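
One way this could be stored, sketched here with made-up field names, is as per-engine metadata alongside the existing protect list:

# Hypothetical per-engine argument metadata:
ranger_arg_info <- list(
  # never user-modifiable; filled in when fit() runs
  protect = c("formula", "data", "case.weights"),
  # always included in the fit call, even if left at their defaults
  always_include = list(
    seed = quote(sample.int(10^5, 1)),
    num.threads = 1
  )
)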

intermediary constructors for different engines

This would help further modify default arguments as well as protect against common issues.

For example, randomForest has an option called importance that is used a lot and it takes a logical value. ranger has an argument of the same name but it takes character strings. A ranger-specific constructor (or function) can be used to protect against this problem.

Also, since not all primary arguments are available for every engine (e.g. there is no regularization in glm), we also need to intercept and/or modify these arguments when they are used inappropriately (instead of just ignoring them).
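
A minimal sketch of what a ranger-specific constructor might do for the importance clash (the mapping and function name are assumptions, not actual parsnip code):

# Hypothetical: translate a randomForest-style logical `importance` value
# into one of the character options that ranger expects.
ranger_importance <- function(args) {
  if (is.logical(args$importance)) {
    args$importance <- if (isTRUE(args$importance)) "impurity" else "none"
  }
  args
}

ranger_importance(list(importance = TRUE, num.trees = 500))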

What is a model?

Coffee-addled rant here, but bear with me. I think it'll be really valuable as the tidymodels universe takes off to have a clear and well documented definition of what a model is.

In classic statistics land, if you have some data x that live in a space X, a model is a distribution P(X) indexed by parameters theta. In linear regression with three features, theta lives in R^3. Then a fit often refers to P(X) where we've picked a particular theta to work with, and there's an isomorphism between R^3 and all possible fits.

(Aside: calling a particular theta a fit isn't great language because fit should be a verb referring to model fitting, not a noun referring to the object returned by the fitting process).

To me, a key question is how do we express this idea in code. For example, if we write out a linear model:

y = theta_0 + theta_1 * x_1 + ... + theta_p * x_p + epsilon

where epsilon are IID Gaussian, then the following are all the same model (in the sense that they all have the same parameter space)

  • OLS
  • LASSO
  • RIDGE
  • Any other penalized regression technique
  • OLS estimated with Horseshoe priors
  • Etc

Sure, for the penalized regression methods you have to estimate the penalization parameter, but this is a hyperparameter, which I think we can broadly think about as a parameter that we have to estimate but that we don't really care what value it takes on. So these all have the same parameter space, but different hyperparameter spaces. Another way to express this same idea is that what differentiates MCP from LASSO from OLS, etc, etc is not that they are different models but rather that they are different techniques for estimating the same model.

(Aside: one interesting question is whether or not hierarchical models belong on the list above. I think it depends on whether or not you care about the group level parameters, in which case you are now in a new parameter space. OLS with HC errors is another interesting case to think about. In this case the model is still the linear model, but now we're more explicitly declaring that we want to estimate the covariance matrix, and also that we are going to use, say, HC1 to do so. I'd still call this a linear model, but only if the original definition of the linear model specified covariance as an estimand).

If I'm going to actually implement things in code, I want to work with an object that specifies the estimation method, which likely is closely tied to a hyperparameter space.

I think that a parsnip model specification shouldn't work with the classical stats sense of a model as defined above, but rather should encapsulate all the things you need to do to get parameters back. Parsnip is already doing a lot of this, but I think there's a lot of value in being very clear about what a parsnip object should specify. In my mind this includes, at the minimum:

  • The estimand, or parameters you want to estimate, mostly implicit in the model you select
  • The estimating procedure (i.e. LARS, or the analytic solution to OLS). Often implicit in the package you call.
  • Any hyperparameter spaces (ex: lambda in R+ for LASSO)
  • Procedures for picking hyperparameters (ex: random search over bootstrap samples picking the smallest lambda within 1 SE of the minimum RMSE)

For now I think it makes sense to call this a model specification, but I think it's critically important to distinguish between the model and the model plus all this other stuff. Similarly, after the model fitting process, when you have many different fits (one for each hyperparameter combination, say), there are tasks that involve working with all the fits together (you might be curious which LASSO variable entered the model first), and tasks that involve working with just one fit (i.e. looking at the LASSO coefficients themselves).

I strongly believe that a good interface very clearly differentiates between a group of fits together, and single fit, and provides type-safe methods for working with each of these.

Related issue: canonical modelling examples

A related issue is to find canonical modelling examples that are sufficient to develop our intuition about what the code objects should look like. OLS is too simple because it doesn't need a lot of the machinery that other models need. I think that a good starting place is to have one canonical example where we can employ the submodel trick (penalized regression seems like a good place to start), and one where we can't (maybe SVMs here?). Another way to think about this: we should have one canonical example where there is exploitable structure in the hyperparameter space, and one canonical example where there isn't.

Differentiating between models, estimators and engines

I think I can finally translate the thoughts from the modeling abstraction essay (a separate doc that grew out of #19) into parsnip terms. Some concepts to start:

  • A model is a family of probability distributions or functions. That is, a model is a set.
  • An estimator is a way to calculate the parameters of a model from a dataset. Note that hyperparameters are most often properties of estimators.
  • The resulting estimates are a fit (I think @topepo often refers to this as a sub-model). This is an element of the model.
  • There are often multiple algorithms and implementations of the same estimator. In this case, using parsnip terminology, each implementation is a different engine.

Estimators are typically implicit

  • lm specifies the OLS estimator for the linear model
  • glmnet specifies the elastic net estimator for the linear model

Estimator selection should be explicit

Something along the lines of

ols_hc1_fit <- linear_reg() %>% 
  linear_estimator(coefs = "ols", coef_covariance = "HC1") %>% 
  fit_xy(
    x = ...,
    y = ...,
    engine = "lm_robust"
  )

Perhaps the linear_reg() isn't necessary here, but it does feel the most explicit / low-level to me. In particular, I think it's important to explicitly select an estimator, rather than letting it be implicit in engine. All estimators are not created equal.

Different estimators should have informative subclasses

Currently the parsnip behavior is to always produce a model_fit object:

ols <- linear_reg() %>% 
  fit(hp ~ ., data = mtcars, engine = "lm")

class(ols)
># [1] "model_fit"

I'm strongly of the opinion that ols should have subclasses that indicate:

  • the model_fit was estimated using ordinary least squares
  • the model_fit object contains a single fit/submodel, as opposed to a set of fits/submodels

Without this differentiation I don't think it's possible to meaningfully define methods on ols for inference. Consider the following methods, all for the linear model:

  • plot_lasso_path() only makes sense for a set of fits from the LASSO estimator
  • coef_standard_errors() makes sense for a fit from the OLS estimator but not the LASSO estimator
  • interpret_coefficients() should have different behavior for an OLS fit and a GEE fit
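
To make the proposal concrete, here is a small sketch of how such subclasses and methods might look (the class and function names are invented for illustration):

# Hypothetical subclasses added at fit time:
# class(ols) might become c("ols_fit", "single_fit", "model_fit")

coef_standard_errors <- function(object, ...) UseMethod("coef_standard_errors")

coef_standard_errors.ols_fit <- function(object, ...) {
  # the underlying lm object is assumed to live in object$fit
  summary(object$fit)$coefficients[, "Std. Error"]
}

coef_standard_errors.default <- function(object, ...) {
  stop("Standard errors are not defined for this estimator.", call. = FALSE)
}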

Vignette update request

Just took a look over the "making a parsnip model object from scratch" vignette. There was a lot going on and I had a bit of difficulty putting the pieces together from my short glance through. I think the vignette might benefit from being broken into two separate pieces:

  • Creating new model specifications
  • Adding a new estimator for an existing model specification
    • Aside: it would be easy to confuse an engine with an estimator, so we should probably document the difference somewhere.

I finally have parsnip running on my laptop and I'm going to try to use it exclusively for a regression course this semester and see where I run into problems. Some things I imagine I'll be building fairly early on:

  • Mixed models. I should just be able to pass formulas to lme4 with the usual random effects syntax, right?
  • Robust covariance estimators. The easy way to do this is probably via a new lm_robust engine.

Parsnip name

Hi Max,

Great meeting you at rstudio::conf. So you did decide to ultimately call it parsnip? :-)

Cheers,
Bohdan

detect when to calculate descriptors

Right now, the fit function makes available certain variables that characterize the training data at the time of model fitting. The two underlying functions that do this are get_descr_form and get_descr_xy.

These may be costly, so we should have some code that determines if any data descriptors are used in the argument values. If at least one is found, we can execute the code to make them available.
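
A small sketch of how that detection could work, assuming the descriptors have known names (the helper and the descriptor names below are illustrative, not the actual implementation):

# Hypothetical: scan the captured argument expressions for descriptor names
# and only compute the descriptors if at least one is referenced.
uses_descriptors <- function(args, descriptors = c(".n", ".p", ".levs")) {
  used <- vapply(
    args,
    function(x) any(all.vars(rlang::get_expr(x)) %in% descriptors),
    logical(1)
  )
  any(used)
}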

passing contrasts and other model.matrix options

In a lot of cases, there will be some data conversion from a data frame to a model matrix. There needs to be a clean interface so that the usual options can be passed along.

Also, we might need a flag for when to stop at a model frame and when to go all the way with calling model.matrix.
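
For reference, a bare-bones sketch of the conversion step and the options that would need to be passed through (the argument names follow base R; how they would be surfaced in parsnip is an open question):

# Base-R conversion from a data frame to a model matrix, with contrasts:
mf <- model.frame(mpg ~ cyl + factor(gear), data = mtcars, na.action = na.pass)
mm <- model.matrix(
  attr(mf, "terms"), mf,
  contrasts.arg = list(`factor(gear)` = contr.helmert)
)

# A flag such as composition = "model_frame" vs "model_matrix" (name invented
# here) could control whether the conversion stops at the model frame.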

troubleshoot testing issues

There are a number of test cases that pass in a fresh interactive session but fail when running devtools::check().

Reporting uncertainty

I'm reworking lots of broom::augment() methods at the moment and am discovering that packages do some crazy stuff to report uncertainty. Defining some standards for reporting uncertainty early on seems like a good idea.

For classification problems, reporting the class probabilities makes sense, but this can become problematic for outcomes with high cardinality. Nobody wants 1000 columns of class probabilities. One option is to report only the most likely class along with its probability, or the top k = 5 or so classes by default (a small sketch follows the questions below).

For regression problems I think there's more nuance. Open questions:

  • Is the best way to report uncertainty in a regression outcome to add a column of standard errors se_fit or similar?
  • How should users specify that they want confidence intervals vs prediction intervals?
  • Should confidence intervals or prediction intervals be the default reporting option?
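
A tiny sketch of the top-k idea for classification (the column names and the k default are assumptions, not a settled convention):

# Hypothetical: keep only the k most probable classes per row of a
# class-probability matrix.
top_k_classes <- function(probs, k = 5) {
  do.call(rbind, lapply(seq_len(nrow(probs)), function(i) {
    ord <- order(probs[i, ], decreasing = TRUE)[seq_len(min(k, ncol(probs)))]
    data.frame(
      .row = i,
      .pred_class = colnames(probs)[ord],
      .pred_prob = probs[i, ord],
      row.names = NULL
    )
  }))
}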

tasks for user-defined models

  • export and document check_empty_ellipse, make_classes, model_printer, and show_call
  • document the model_fit and model_spec object structures
  • complete vignette on creating models from scratch.

next set of models

  • knn via kknn package
  • decision_tree via rpart, C5.0, spark (others?)
  • SVM models: linear, RBF, polynomial as separate functions (kernlab)
  • multinomial regression via glmnet and spark
  • mars via earth package
  • null model wrapper as well as fit and pred functions
  • naive Bayes (klaR, spark ?)
  • cubist
  • discriminant analysis (of various types)
  • PLS (sparse, DA)
  • FDA models with different basis functions (MARS, poly)
  • bagged trees (helped by potential new rpart version and side package)
  • bagged MARS (based on side package)
  • Poisson regression (perhaps including ZIP models; otherwise a clone of linear_reg)
  • ARIMA and other time series models
  • generalized additive models
  • multilevel model extension engines for linear, logistic, multinomial, and Poisson regression (in the multilevelmod package)
  • more models for censored data (in the censored package)

👆 Already in parsnip or an adjacent package
👇 Working on or thinking about

  • ordinal regression
  • rotation forests

Is `fit` actually checking for installed libs?

If I remove randomForest and then try and run:

fit(rand_forest(), formula = Species ~ ., data = iris, engine = "randomForest")

I get the following traceback

Error in loadNamespace(name) : there is no package called 'randomForest'
15. stop(e)
14. value[[3L]](cond)
13. tryCatchOne(expr, names, parentenv, handlers[[1L]])
12. tryCatchList(expr, classes, parentenv, handlers)
11. tryCatch(loadNamespace(name), error = function(e) stop(e))
10. getNamespace(ns)
9. asNamespace(ns)
8. getExportedValue(pkg, name)
7. randomForest::randomForest
6. eval_tidy(e, ...) at fit.R#272
5. eval_mod(fit_call, capture = control$verbosity == 0, catch = control$catch, env = env, ...) at fit_helpers.R#107
4. xy_xy(object = object, env = env, control = control, target = target) at fit_helpers.R#138
3. form_xy(object = object, control = control, env = eval_env, target = object$method$fit$interface, ...) at fit.R#135
2. fit.model_spec(rand_forest(), formula = Species ~ ., data = iris, engine = "randomForest") at models.R#116
1. fit(rand_forest(), formula = Species ~ ., data = iris, engine = "randomForest")

Pretty sure that this line in fit() is supposed to be right after the check_engine() line:

# populate `method` with the details for this model type
    object <- get_method(object, engine = object$engine)

because check_installs() and load_libs() both use things from x$method$library (which I think is now x$method$libs) but x$method isn't populated until get_method() is run

non-right censoring for survival models with recipes

Surv objects of type = "counting", "interval1", or "interval2" are not currently supported.

This would mostly be a matter of defining roles, but in the case of interval censoring it is difficult. If we let two time variables be used with the outcome role, we don't know what their order should be (and the recipe might change the order). If we used roles like tmin and tmax, then juice and helpers won't recognize them as outcomes.

fully evaluate primary arguments

such as mtry for random forests. Currently:

args <- list(
  mtry = rlang::enquo(mtry),
  trees = rlang::enquo(trees),
  min_n = rlang::enquo(min_n)
)
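
A sketch of what eager evaluation could look like, keeping the quosure only when evaluation fails (purely illustrative; the helper name is invented):

# Hypothetical: try to evaluate each captured argument immediately and fall
# back to the quosure for things that depend on data (e.g. descriptors).
maybe_eval <- function(quo) {
  out <- try(rlang::eval_tidy(quo), silent = TRUE)
  if (inherits(out, "try-error")) quo else out
}

args <- list(
  mtry  = maybe_eval(rlang::enquo(mtry)),
  trees = maybe_eval(rlang::enquo(trees)),
  min_n = maybe_eval(rlang::enquo(min_n))
)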

how to include data checks and/or manipulations

For example, some functions have specific data requirements:

  • some must have a matrix as input (instead of a data frame) (e.g. glmnet),
  • want factors to be encoded as integers :-O (tensorflow),
  • a sparse matrix is required (xgboost), or
  • some require specific data types such as all factor predictors and so on.

Some of this could occur by modifying the default argument to the function (e.g. x = as.matrix(x)), but it would probably be better to include some code or module that checks or modifies the data.

The problem is the different interfaces: we would need one for each of the formula, recipe, and x/y interfaces.
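
One possible shape for such a module, sketched with invented names (one converter per engine and interface):

# Hypothetical per-engine converter applied just before the fit call:
glmnet_convert_x <- function(x) {
  if (is.data.frame(x)) {
    # glmnet needs a numeric matrix, so dummy variables have to be created
    x <- model.matrix(~ . - 1, data = x)
  }
  x
}

# An xgboost converter would instead return a sparse matrix, e.g. via
# Matrix::sparse.model.matrix(~ . - 1, data = x)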

rename some objects

alternates is poorly named. defaults? The same is slightly true for the prediction function's args.

The constructors aren't really constructors (I think that they originally were constructor functions). Change them to "modules".

The {model name}_{engine}_fit objects contain everything so maybe call them {model name}_{engine}_data?

predict method

Change the predict function to predict_num and write a general wrapper predict method that switches between these.
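
Something along these lines for the wrapper (the type names and helper functions other than predict_num are assumptions):

# Hypothetical wrapper that dispatches on the requested prediction type:
predict.model_fit <- function(object, new_data, type = "numeric", ...) {
  switch(
    type,
    numeric = predict_num(object, new_data, ...),
    class   = predict_class(object, new_data, ...),
    prob    = predict_probs(object, new_data, ...),
    stop("Unknown prediction `type`: ", type, call. = FALSE)
  )
}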

parameterize prediction functions

Figure out how to pass arguments down to the prediction code.

For example, with glmnet and other sub-model enabled prediction methods, figure out how to pass args to get other parameter estimates back.

A more pipeable fit() interface

I gave a little thought to the fit() interface problem and this is what I came up with. I don't really like the interface arg name, but that's just a naming thing.

# helper if required
xy <- function(x, y) {
  list(x = x, y = y)
}

# notice how engine would come before the _optional_ data param for pipeability
# all required params are now moved to the front
# engine could come before interface if you want to keep interface+data together
fit <- function(model_spec, interface, engine, data, control, ...) {
  #switch based on interface being a formula VS list
}

linear_reg() %>%
  fit(y ~ x1 + x2, "lm", fit_data)

linear_reg() %>%
  fit(xy(fit_data[,c("x1", "x2")], fit_data[,c("y")]), "lm")

# slightly simpler
xy_defn <- xy(
  x = fit_data[,c("x1", "x2")], 
  y = fit_data[,c("y")]
)

linear_reg() %>%
  fit(xy_defn, "lm")

formula:formula calls are borked

This works:

rand_forest(mode = "regression") %>%
  fit(mpg ~ ., data = mtcars, engine = "ranger")

but not this:

foo <- function(dat, ...) {
  rand_forest(mode = "regression") %>%
    fit(..., data = dat, engine = "ranger")
}
foo(mtcars, mpg ~ .)
#  Error in eval_tidy(data) : object 'dat' not found 

We try to avoid getting fancy by avoiding execution of the data object until it is needed here, but we pass in the call$data expression. eval_tidy is digging in the wrong place! This should probably be a quosure to capture the environment (or try not to be fancy); a possible fix is sketched below.

We do the same thing with the formula too.
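
A minimal demonstration of the quosure-based fix (illustrative only; the real fit() internals are more involved):

# A quosure keeps the expression *and* the environment it came from, so the
# data can be found later even when fit() is called inside another function.
capture_data <- function(data) {
  rlang::enquo(data)
}

foo <- function(dat, ...) {
  data_quo <- capture_data(dat)
  rlang::eval_tidy(data_quo)   # finds `dat` because the quosure carries foo()'s environment
}

head(foo(mtcars), 2)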

determine convention for available data characteristics

When fit is called, there will be a need to pass unevaluated arguments to the fit function. For example, someone might want to use:

rand_forest(mtry = expr(floor(sqrt(ncol(x)))))

but x might not exist when this expression is evaluated.

There should be a standard set of objects that are guaranteed to be available at the time of fit, such as:

  • number of columns
  • number of rows
  • minimum and maximum class sizes

This will need to be well documented; the number of columns depends on whether dummy variables have been created or not. As such, this might vary depending on how the data are exposed via fit.

Perhaps variables such as .n, .p and others could be used so that

rand_forest(mtry = expr(floor(sqrt(.p))))

would work.
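
A sketch of how those variables could be made available when the captured expression is finally evaluated (the names and mechanism are assumptions):

# Hypothetical: bind the descriptors into a data mask before evaluating the
# user's expression for mtry.
eval_with_descriptors <- function(quo, x) {
  descriptors <- list(.n = nrow(x), .p = ncol(x))
  rlang::eval_tidy(quo, data = descriptors)
}

eval_with_descriptors(rlang::quo(floor(sqrt(.p))), x = mtcars[, -1])
# [1] 3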

switching between prediction arguments

For "multi-use" models, there might be prediction arguments that are variable depending on the type of prediction being requested.

For example, with randomForest, predict should use the argument type = "response" when predicting classes or numbers but should use type = "prob" for predict_probs.

We could parameterize translate to work with specific classes and set this as appropriate. Right now, the prediction call object is created in predict.model_fit and similar functions. translate could do this too, based on the embedded model specification sub-object.
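
A sketch of how the engine-specific prediction call could switch on the prediction type (the module layout is invented for illustration):

# Hypothetical per-engine prediction argument defaults:
randomForest_pred_args <- function(prediction_type) {
  switch(
    prediction_type,
    class   = list(type = "response"),
    numeric = list(type = "response"),
    prob    = list(type = "prob"),
    stop("Unknown prediction type: ", prediction_type, call. = FALSE)
  )
}

# translate() (or predict.model_fit) could then splice these arguments into
# the predict() call for the fitted randomForest object.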

glmnet predictions and lambda

We want to have the number of rows in the prediction results the same as the number of rows in new_data.

Right now, the code will make predictions at all lambda values contained in the model fit. Here is an example:

> all_lambda <- 
+     linear_reg() %>%
+     fit(mpg ~ ., data = mtcars, engine = "glmnet")
> 
> predict(all_lambda, new_data = mtcars[1:3, -1])
# A tibble: 237 x 2
   .pred_values .pred_lambda
          <dbl>        <dbl>
 1         20.1         5.15
 2         20.1         5.15
 3         20.1         5.15
 4         20.4         4.69
 5         20.2         4.69
 6         20.5         4.69
 7         20.5         4.27
 8         20.4         4.27
 9         21.0         4.27
10         20.7         3.89
# ... with 227 more rows
> # yuk ".pred_lambda" needs to go regardless
> 
> length(unique(.Last.value$.pred_lambda))
[1] 79

The new multi_predict will generate predictions at multiple lambda values and should be preferred in this case. I suggest that

  • predict only produces predictions at a single lambda (and otherwise throws an error that directs people to multi_predict for this instance).

  • We could write some specialized predict methods for the glmnet subclasses (e.g. multnet, lognet, etc) that have a penalty argument that accepts a single value. This would appear seamless to the user since

predict(all_lambda, new_data = mtcars[1:3, -1])               # errors but
predict(all_lambda, new_data = mtcars[1:3, -1], penalty = .1) # would work

since

> class(all_lambda)
[1] "model_fit" "_elnet"  

(and predict._elnet would just call predict.model_fit with a single parameter value)
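
For concreteness, a sketch of what predict._elnet could look like under that proposal (simplified and illustrative; where the penalty gets stored is an assumption):

# Hypothetical method for glmnet-based fits requiring a single penalty value:
predict._elnet <- function(object, new_data, penalty = NULL, ...) {
  if (is.null(penalty) || length(penalty) != 1) {
    stop("Please supply a single `penalty` value, or use `multi_predict()` ",
         "to get predictions across the whole path.", call. = FALSE)
  }
  # record the chosen penalty on the specification, then fall through
  object$spec$args$penalty <- penalty
  predict.model_fit(object, new_data = new_data, ...)
}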

crappy straw-man alternative: The issue is that a lot of people are going to leave penalty unspecified, expect to get all possible predictions back, and be frustrated that they have to use a different predict function to get them. Making predict.model_fit make predictions at all lambdas is possible but then it behaves differently for this model (which is how we got into this mess).

Any other suggestions?

fixed method dispatch

Separate backend for tidy prediction

I've been reworking the augment() methods and it's rapidly becoming clear that dealing with idiosyncratic predict() methods is going to slow down progress immensely.

In the end broom and parsnip are both going to want to wrap a bajillion predict methods, and we should report predictions in the same way for consistency's sake. I think we should move this functionality to a separate package. Potentially we could use the prediction package by Thomas Leeper, but we should decide on the behavior we want first.

If we define a new generic / series of generics, we can then test these behaviors in modeltests and allow other modelling package developers to guarantee that their predict() methods are sane and consistent.

What I want from a predict method:

  • Returns a tidy tibble
  • Never drops missing data (i.e. matches the behavior of predict.lm(..., na.action = na.pass))
  • Consistent naming of fitted values
  • Uncertainty in predictions

I want all of these to be guaranteed, and for methods that cannot meet these guarantees, I want an informative error rather than a partially correct output.

Installation problems

Hello! I'm having trouble installing parsnip here from GitHub.

devtools::install_github("topepo/parsnip")
#> Using GitHub PAT from envvar GITHUB_PAT
#> Downloading GitHub repo topepo/parsnip@master
#> from URL https://api.github.com/repos/topepo/parsnip/zipball/master
#> Installing parsnip
#> Using GitHub PAT from envvar GITHUB_PAT
#> Using GitHub PAT from envvar GITHUB_PAT
#> '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file  \
#>   --no-environ --no-save --no-restore --quiet CMD INSTALL  \
#>   '/private/var/folders/nj/s2k7d2_93t9_87brhynnfwvc0000gn/T/Rtmpo0ApMz/devtools147a71ced9c8/topepo-parsnip-de55683'  \
#>   --library='/Library/Frameworks/R.framework/Versions/3.5/Resources/library'  \
#>   --install-tests
#>
#> ERROR: dependency ‘modelgenerics’ is not available for package ‘parsnip’
#> * removing ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library/parsnip’
#> Installation failed: Command failed (1)

Created on 2018-08-26 by the reprex package (v0.2.0).

It appears that the problem comes from modelgenerics, which is being downloaded remotely as tidymodels/modelgenerics. However, that links directly to r-lib/generics.

ranger with probabilities = TRUE

When getting class predictions, the post-processor has a bug:

obj <- 
  rand_forest(mode = "classification", others = list(probability = TRUE)) %>% 
  fit(Species ~ ., data = iris, engine = "ranger") 

predict(obj, newdata = iris[1:4, -5])
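
For reference, a sketch of the post-processing that would be needed when ranger returns a probability matrix (purely illustrative, not the actual post-processor):

# Hypothetical fix: convert the probability matrix from ranger into class
# predictions by taking the most likely class per row.
probs_to_class <- function(probs) {
  factor(colnames(probs)[max.col(probs)], levels = colnames(probs))
}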

easily populate parameters from dials

If you use

param_grid <- random_grid(mtry, min_n, size = 5)

and want to populate a model specification, it gets kludgy.

If param is a row of param_grid:

   rand_forest(mtry = param[["mtry"]], min_n = param[["min_n"]])
# or
update(object, mtry = param[["mtry"]], min_n = param[["min_n"]])

there should be some version of update that automatically populates the parameters (which is why the names are standardized between dials and parsnip). Maybe co-opt merge or some other relevant verb? A possible helper is sketched below.
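
Something like the following, where the helper name is invented, could do the population automatically since the argument names already match:

# Hypothetical: splice a single row of a dials grid into a model specification.
merge_params <- function(object, param) {
  do.call(update, c(list(object = object), as.list(param)))
}

# e.g. merge_params(rand_forest(), param_grid[1, ])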

[Docs] Parameter Tuning

I am trying to do some parameter tuning with the ranger engine in parsnip. I see that there is a varying() parameter, but I am confused about how to implement tuning with it.

I really liked your tuning example here from the rsample package. I would benefit from documentation on how to implement this nested resampling strategy in parsnip.
