tidy.outliers's People

Contributors

brunocarlin

tidy.outliers's Issues

Minor documentation changes

The package looks good. Although I'm not a fan of automated outlier removals, I like what you've done.

In the docs, could you:

  • use bake(rec, new_data = NULL) instead of juice()
  • have steps use selectors for type and role? So all_numeric_predictors() instead of all_numeric(), -all_outcomes()
  • have steps do some sort of normalization procedures before outlier detection (e.g. maybe Yeo-Johnson)? I feel that most identified outliers are just the edge of healthy skewed distributions.
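Taken together, the three suggestions above could look roughly like this in the docs. This is only a sketch, not the package's documented example: it reuses step_outliers_outForest() as it appears in the issue reports on this page, and placing step_YeoJohnson() (from recipes) ahead of it is an assumption about how the normalization would be wired in.

```r
library(recipes)
library(tidy.outliers)

rec <-
  recipe(mpg ~ ., data = mtcars) %>%
  # Assumption: reduce skew first so that tail values are less likely to be flagged
  step_YeoJohnson(all_numeric_predictors()) %>%
  # Select by type and role in one selector instead of all_numeric(), -all_outcomes()
  step_outliers_outForest(all_numeric_predictors()) %>%
  prep()

# bake(new_data = NULL) returns the processed training set, replacing juice()
baked <- bake(rec, new_data = NULL)
```

Whether Yeo-Johnson is the right transformation depends on the data; any other normalization step could take its place in the same position.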

Output problem of step_outliers_outForest

Hello, it seems that there's a problem with the output of step_outliers_outForest.

When the code is run by reprex(), everything is fine:

library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
library(tidy.outliers)

rec <-
  recipe(mpg ~ ., data = mtcars) %>%
  step_outliers_outForest(all_numeric_predictors()) %>%
  prep(mtcars)
#> Due to small sample size, reduced 'min.node.size' to 11

bake(rec, new_data = NULL) %>% 
  select(.outliers_outForest)
#> # A tibble: 32 × 1
#>    .outliers_outForest$score
#>                        <dbl>
#>  1                         0
#>  2                         0
#>  3                         0
#>  4                         0
#>  5                         0
#>  6                         0
#>  7                         0
#>  8                         0
#>  9                         1
#> 10                         0
#> # … with 22 more rows

However, when the same code is run in an .Rmd file, the .outliers_outForest column has type tibble:

image
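The `.outliers_outForest$score` header in the printed output suggests that the step returns a data-frame column containing a `score` column. If a plain numeric column is wanted, one option is to flatten it with tidyr::unpack(). The sketch below uses a hand-built tibble as a stand-in for the baked data, so the `id` column and the values are illustrative only.

```r
library(tibble)
library(tidyr)

# Hypothetical stand-in for bake() output: a tibble nested inside a column
packed <- tibble(
  id = 1:3,
  .outliers_outForest = tibble(score = c(0, 0, 1))
)

# Flatten the nested column; names_sep prefixes the inner column name
flat <- unpack(packed, .outliers_outForest, names_sep = "_")

names(flat)
```

After unpacking, the score is an ordinary double column named `.outliers_outForest_score`.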

Function `step_outliers_lookout` doesn't process new data

It seems that step_outliers_lookout doesn't work on the testing set:

library(tidymodels)
library(tidy.outliers)

# split data into the training and testing sets
set.seed(123)
split <- mtcars %>% 
  initial_split(prop = 0.8)
df_train = training(split)
df_test = testing(split)

# preprocessing steps
rec <-
  recipe(mpg ~ ., data = mtcars) %>%
  step_outliers_lookout(all_numeric_predictors(), skip = FALSE) %>%
  step_outliers_remove(contains(".outliers"), skip = FALSE) %>% 
  prep(training = df_train, retain = TRUE)

# processing the training data
df_train_preprocessed <- bake(rec, new_data = NULL)

# processing the testing data
df_test_preprocessed <- bake(rec, new_data = df_test)
#> Error:
#> ! Assigned data `object$outlier_score` must be compatible with existing data.
#> ✖ Existing data has 7 rows.
#> ✖ Assigned data has 25 rows.
#> ℹ Only vectors of size 1 are recycled.

#> Backtrace:
#>      ▆
#>   1. ├─recipes::bake(rec, new_data = df_test)
#>   2. ├─recipes:::bake.recipe(rec, new_data = df_test)
#>   3. │ ├─recipes::bake(step, new_data = new_data)
#>   4. │ └─tidy.outliers:::bake.step_outliers_lookout(step, new_data = new_data)
#>   5. │   ├─base::`[[<-`(`*tmp*`, object$name_mutate, value = `<dbl>`)
#>   6. │   └─tibble:::`[[<-.tbl_df`(`*tmp*`, object$name_mutate, value = `<dbl>`)
#>   7. │     └─tibble:::tbl_subassign(...)
#>   8. │       └─tibble:::vectbl_recycle_rhs_rows(...)
#>   9. │         ├─base::withCallingHandlers(...)
#>  10. │         └─vctrs::vec_recycle(value[[j]], nrow)
#>  11. ├─vctrs:::stop_recycle_incompatible_size(...)
#>  12. │ └─vctrs:::stop_vctrs(...)
#>  13. │   └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = vctrs_error_call(call))
#>  14. │     └─rlang:::signal_abort(cnd, .file)
#>  15. │       └─base::signalCondition(cnd)
#>  16. └─tibble (local) `<fn>`(`<vctrs___>`)
#>  17.   └─rlang::cnd_signal(...)
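The backtrace points at bake.step_outliers_lookout assigning `object$outlier_score` to the new data. The row counts (25 assigned, 7 existing) match the training and testing sets, which suggests the scores stored at prep() time are being reused at bake() time. The sketch below reproduces only that size mismatch with tibble; the column name and score values are made up, and it does not call the package itself.

```r
library(tibble)

train <- mtcars[1:25, ]
test <- mtcars[26:32, ]

# Stand-in for scores stored on the step object: one per training row
stored_scores <- rep(0.5, nrow(train))

new_data <- as_tibble(test)

# Assigning 25 values to a 7-row tibble fails, as in the report above
result <- tryCatch(
  {
    new_data[[".outliers_lookout"]] <- stored_scores
    "assigned"
  },
  error = function(e) "size mismatch"
)

result
```

If that reading is right, a fix would presumably need bake() to score `new_data` itself rather than reuse the stored vector. Leaving both steps at `skip = TRUE` may also sidestep the error, since recipes does not apply skipped steps to new data.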
