Comments (4)
I would be happy to look into it, but I would need you to create a reproducible example. The code you provided is fine, but to debug it I would need access to the dataset, or to a similar one that shows the same behaviour...
from lime.
Here it is. Is this reprex enough? Thanks for looking into it.
library(tidyverse)
#> -- Attaching packages ------------------------------------ tidyverse 1.2.0 --
#> v ggplot2 2.2.1 v purrr 0.2.4
#> v tibble 1.3.4 v dplyr 0.7.4
#> v tidyr 0.7.2 v stringr 1.2.0
#> v readr 1.1.1 v forcats 0.2.0
#> -- Conflicts --------------------------------------- tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
library(h2o)
#>
#> ----------------------------------------------------------------------
#>
#> Your next step is to start H2O:
#> > h2o.init()
#>
#> For H2O package documentation, ask for help:
#> > ??h2o
#>
#> After starting H2O, you can use the Web UI at http://localhost:54321
#> For more information visit http://docs.h2o.ai
#>
#> ----------------------------------------------------------------------
#>
#> Attaching package: 'h2o'
#> The following objects are masked from 'package:stats':
#>
#> cor, sd, var
#> The following objects are masked from 'package:base':
#>
#> %*%, %in%, &&, ||, apply, as.factor, as.numeric, colnames,
#> colnames<-, ifelse, is.character, is.factor, is.numeric, log,
#> log10, log1p, log2, round, signif, trunc
library(lime)
#>
#> Attaching package: 'lime'
#> The following object is masked from 'package:dplyr':
#>
#> explain
dataset_url <- "https://www.dropbox.com/s/t3o1zvzq0t7emz4/sales.RDS?raw=1"
sales_aug <- readRDS(gzcon(url(dataset_url)))
train <- sales_aug %>% filter(month <= 8)
valid <- sales_aug %>% filter(month == 9)
test <- sales_aug %>% filter(month >= 10)
h2o.init()
#> Connection successful!
#>
#> R is connected to the H2O cluster:
#> H2O cluster uptime: 8 minutes 21 seconds
#> H2O cluster version: 3.14.0.7
#> H2O cluster version age: 19 days
#> H2O cluster name: H2O_started_from_R_andre_crw711
#> H2O cluster total nodes: 1
#> H2O cluster total memory: 1.71 GB
#> H2O cluster total cores: 4
#> H2O cluster allowed cores: 4
#> H2O cluster healthy: TRUE
#> H2O Connection ip: localhost
#> H2O Connection port: 54321
#> H2O Connection proxy: NA
#> H2O Internal Security: FALSE
#> H2O API Extensions: Algos, AutoML, Core V3, Core V4
#> R Version: R version 3.4.2 (2017-09-28)
h2o.no_progress()
train <- as.h2o(train)
valid <- as.h2o(valid)
test <- as.h2o(test)
y <- "amount"
x <- setdiff(names(train), y)
leaderboard <- h2o.automl(
  x, y,
  training_frame = train,
  validation_frame = valid,
  leaderboard_frame = test,
  max_runtime_secs = 30,
  stopping_metric = "MSE",
  seed = 12345
)
gbm_model <- leaderboard@leader
explainer <- lime(as.data.frame(train), gbm_model, bin_continuous = FALSE)
explanation <- explain(as.data.frame(test[1:5,]), explainer, n_features = 5)
#> Warning in `[<-.factor`(`*tmp*`, iseq, value = structure(c(1L, 1L, 1L,
#> 1L, : invalid factor level, NA generated
#> Error in if (r2 > max) {: missing value where TRUE/FALSE needed
from lime.
The problem is that your test data includes factor levels that are not present in your training data, specifically in the month.lbl column. This means that the input cannot be permuted properly, which produces NAs that trip up the model. Either make sure that your training data covers the full feature space (this is good practice anyway), or use regular strings instead of factors to indicate that the column might take any value...
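As a rough illustration of the check described above, the sketch below lists factor levels that occur in the test data but not in the training data, and then shows one way to convert factor columns to plain strings. It uses only base R. The toy data frames and the helper name `unseen_levels` are assumptions made for this example; they are not part of lime or h2o, and only the column name month.lbl comes from the thread.

```r
# Toy data standing in for the sales data (values are made up for illustration).
train <- data.frame(
  month.lbl = factor(c("January", "February", "March")),
  amount = c(10, 12, 9)
)
test <- data.frame(
  month.lbl = factor(c("March", "October")),
  amount = c(11, 15)
)

# Hypothetical helper: for each factor column in `ref`, return the values
# that appear in `new` but are not among the levels of `ref`.
unseen_levels <- function(ref, new) {
  factor_cols <- names(ref)[vapply(ref, is.factor, logical(1))]
  out <- lapply(factor_cols, function(col) {
    setdiff(unique(as.character(new[[col]])), levels(ref[[col]]))
  })
  names(out) <- factor_cols
  Filter(length, out)  # keep only columns that have unseen levels
}

unseen_levels(train, test)

# Alternative mentioned above: store factor columns as plain strings.
train_chr <- train
train_chr[] <- lapply(train_chr, function(col) {
  if (is.factor(col)) as.character(col) else col
})
```

Running a check like this on `as.data.frame(train)` and `as.data.frame(test)` before calling `explain()` should show which columns cause the mismatch. Whether the string conversion is enough for a given model backend is not confirmed here.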
from lime.
Thank you very much. I suppose I'll have to settle for the h2o.varimp() function until I have enough data to cover the feature space.
from lime.