
DavisVaughan commented on June 26, 2024

This sounds useful! model.matrix() actually checks for this automatically: it throws a warning and drops the duplicated predictor, but only if it is exactly the same as the outcome, so log(Sepal.Width) would not count as a duplicate. I think this is rather aggressive.
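As a quick check of the log(Sepal.Width) case, here is a minimal base R sketch (it only assumes the built-in iris data). The transformed predictor is a different term from the outcome, so model.matrix() keeps it:

```r
# Base R only; `iris` ships with R.
# The right-hand side is a transformation of the outcome, not the outcome
# itself, so model.matrix() does not treat it as a duplicate.
mm <- model.matrix(Sepal.Width ~ log(Sepal.Width), data = iris)

colnames(mm)
#> [1] "(Intercept)"      "log(Sepal.Width)"
```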

I don't think I would put this in mold(), as I wouldn't call this a "required" check, but I want hardhat to have a number of extra optional validate_***() functions that developers can use, and this seems like one of them.

Below is one version of a validate function for this. This uses the original column names and checks for duplicates. So Sepal.Width ~ Sepal.Width and Sepal.Width ~ log(Sepal.Width) will both be flagged as having duplicates. There could also be a version that works more like model.matrix() and checks that the processed training data does not have duplicates (so log(Sepal.Width) would look different than Sepal.Width).

library(hardhat)

.formula <- Sepal.Width ~ Sepal.Width

# //////////////////////////////////////////////////////////////////////////////

# mold() accepts a formula with the same column on both sides
x <- mold(.formula, iris)

x$predictors
#> # A tibble: 150 x 1
#>    Sepal.Width
#>          <dbl>
#>  1         3.5
#>  2         3  
#>  3         3.2
#>  4         3.1
#>  5         3.6
#>  6         3.9
#>  7         3.4
#>  8         3.4
#>  9         2.9
#> 10         3.1
#> # … with 140 more rows

x$outcomes
#> # A tibble: 150 x 1
#>    Sepal.Width
#>          <dbl>
#>  1         3.5
#>  2         3  
#>  3         3.2
#>  4         3.1
#>  5         3.6
#>  6         3.9
#>  7         3.4
#>  8         3.4
#>  9         2.9
#> 10         3.1
#> # … with 140 more rows

# //////////////////////////////////////////////////////////////////////////////

# model.matrix() warns here and drops the duplicated predictor
mf <- model.frame(.formula, iris)
head(model.matrix(terms(mf), mf))
#> Warning in model.matrix.default(terms(mf), mf): the response appeared on
#> the right-hand side and was dropped
#> Warning in model.matrix.default(terms(mf), mf): problem with term 1 in
#> model.matrix: no columns are assigned
#>   (Intercept)
#> 1           1
#> 2           1
#> 3           1
#> 4           1
#> 5           1
#> 6           1

# //////////////////////////////////////////////////////////////////////////////

# the original column names are stored on the preprocessor
x$preprocessor$predictors$names
#> [1] "Sepal.Width"
x$preprocessor$outcomes$names
#> [1] "Sepal.Width"

# //////////////////////////////////////////////////////////////////////////////

validate_lhs_rhs_duplication <- function(preprocessor) {
  
  if (!inherits(preprocessor, "terms_preprocessor")) {
    return(preprocessor)
  }
  
  original_predictor_names <- preprocessor$predictors$names
  original_outcome_names <- preprocessor$outcomes$names
  
  dups <- intersect(original_predictor_names, original_outcome_names)
  
  if (length(dups) > 0) {
    
    dups <- glue::glue_collapse(glue::single_quote(dups), ", ")
    
    rlang::abort(glue::glue(
      "The supplied `formula` cannot have the same term ",
      "as both an outcome and a predictor. The following terms ",
      "appear on both sides of the formula: {dups}."
    ))
  }
  
  invisible(preprocessor)
}

validate_lhs_rhs_duplication(x$preprocessor)
#> Error: The supplied `formula` cannot have the same term as both an outcome and a predictor. The following terms appear on both sides of the formula: 'Sepal.Width'.
#> Backtrace:
#>     █
#>  1. └─global::validate_lhs_rhs_duplication(x$preprocessor)

Created on 2019-02-16 by the reprex package (v0.2.1.9000)
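The other variant mentioned above, which behaves more like model.matrix() and looks at the processed data, could be sketched roughly as follows. This is not part of hardhat: validate_processed_duplication() is a hypothetical name, it uses base R only, and it assumes that "duplicate" means a column name shared by the processed predictors and outcomes (it does not compare values). With mold(), the two inputs would be x$predictors and x$outcomes; plain data frames stand in for them here so the example runs on its own.

```r
# Hypothetical helper (not a hardhat function): flags column names that
# appear in both the processed predictors and the processed outcomes.
validate_processed_duplication <- function(predictors, outcomes) {
  dups <- intersect(colnames(predictors), colnames(outcomes))

  if (length(dups) > 0) {
    stop(
      "The following columns appear in both the processed predictors ",
      "and the processed outcomes: ",
      paste0("'", dups, "'", collapse = ", "),
      ".",
      call. = FALSE
    )
  }

  invisible(TRUE)
}

# Stand-ins for processed data (assumed shapes, built from iris)
outcomes <- iris["Sepal.Width"]

predictors_log <- data.frame(log(iris$Sepal.Width))
names(predictors_log) <- "log(Sepal.Width)"

predictors_same <- iris["Sepal.Width"]

# The log-transformed predictor has a different processed name, so it passes
validate_processed_duplication(predictors_log, outcomes)

# The untransformed predictor shares its name with the outcome, so it errors;
# the message is captured here instead of stopping the script
result <- tryCatch(
  validate_processed_duplication(predictors_same, outcomes),
  error = function(e) conditionMessage(e)
)
```

Unlike the name check on the original columns, this version lets Sepal.Width ~ log(Sepal.Width) through and only flags Sepal.Width ~ Sepal.Width.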

from hardhat.

github-actions commented on June 26, 2024

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

