Comments (10)
Hi
Thanks for posting the issue. I will need a reproducible example in order to help you further, so please provide a code block that I can copy into a clean R session that will produce the error
from lime.
I'm trying to follow some code on a blog from Matt Dancho and the folks at Business Science. Here's their blog if you need it: http://www.business-science.io/business/2017/09/18/hr_employee_attrition.html
Here's the R code I'm working from in an Rmd file. I set it up as two blocks for this post. The first block runs fully. The second block gives me the error when I try to set the explainer in lime. When I run the first set of code to set the explainer, I get the following error:
Error: Unknown feature type
7. stop("Unknown feature type", call. = FALSE)
6. FUN(X[[i]], ...)
5. lapply(X = X, FUN = FUN, ...)
4. sapply(x, function(f) { if (is.integer(f)) { "integer" } ...
3. setNames(sapply(x, function(f) { if (is.integer(f)) { "integer" } ...
2. lime.data.frame(as.data.frame(train_h2o[, -1]), model = automl_leader, bin_continuous = FALSE)
1. lime::lime(as.data.frame(train_h2o[, -1]), model = automl_leader, bin_continuous = FALSE)
When I try the second line of code, which I tried to adapt from the description file for the lime package, I get this error:
Error in UseMethod("lime") :
no applicable method for 'lime' applied to an object of class "H2OFrame"
I've also attached the data file I'm using if you want to access it.
# Load the following packages
library(tidyquant) # Loads tidyverse and several other pkgs
library(readxl) # Super simple excel reader
library(h2o) # Professional grade ML pkg
library(lime) # Explain complex black-box ML models
# Read excel data
hr_data_raw <- read_excel(path = "/Users/crawfordw/Downloads/WA_Fn-UseC_-HR-Employee-Attrition.xlsx")
# View first 10 rows
hr_data_raw[1:10, ] %>%
    knitr::kable(caption = "First 10 rows")
hr_data <- hr_data_raw %>%
    mutate_if(is.character, as.factor) %>%
    select(Attrition, everything())
glimpse(hr_data)
# Initialize H2O JVM
h2o.init()
h2o.no_progress() # Turn off output of progress bars
# Split data into Train/Validation/Test Sets
hr_data_h2o <- as.h2o(hr_data)
split_h2o <- h2o.splitFrame(hr_data_h2o, c(0.7, 0.15), seed = 1234 )
train_h2o <- h2o.assign(split_h2o[[1]], "train" ) # 70%
valid_h2o <- h2o.assign(split_h2o[[2]], "valid" ) # 15%
test_h2o <- h2o.assign(split_h2o[[3]], "test" ) # 15%
# Set names for h2o
y <- "Attrition"
x <- setdiff(names(train_h2o), y)
# Run the automated machine learning
automl_models_h2o <- h2o.automl(
    x = x,
    y = y,
    training_frame = train_h2o,
    leaderboard_frame = valid_h2o,
    max_runtime_secs = 30
)
# Extract leader model
automl_leader <- automl_models_h2o@leader
# Predict on hold-out set, test_h2o
pred_h2o <- h2o.predict(object = automl_leader, newdata = test_h2o)
# Prep for performance assessment
test_performance <- test_h2o %>%
    tibble::as_tibble() %>%
    select(Attrition) %>%
    add_column(pred = as.vector(pred_h2o$predict)) %>%
    mutate_if(is.character, as.factor)
test_performance
# Confusion table counts
confusion_matrix <- test_performance %>%
    table()
confusion_matrix
# Performance analysis
tn <- confusion_matrix[1]
tp <- confusion_matrix[4]
fp <- confusion_matrix[3]
fn <- confusion_matrix[2]
accuracy <- (tp + tn) / (tp + tn + fp + fn)
misclassification_rate <- 1 - accuracy
recall <- tp / (tp + fn)
precision <- tp / (tp + fp)
null_error_rate <- tn / (tp + tn + fp + fn)
tibble(
    accuracy,
    misclassification_rate,
    recall,
    precision,
    null_error_rate
) %>%
    transpose()
class(automl_leader)
# Setup lime::model_type() function for h2o
model_type.H2OBinomialModel <- function(x, ...) {
    # Function tells lime() what model type we are dealing with
    # 'classification', 'regression', 'survival', 'clustering', 'multilabel', etc
    #
    # x is our h2o model
    return("classification")
}
# Setup lime::predict_model() function for h2o
predict_model.H2OBinomialModel <- function(x, newdata, type, ...) {
    # Function performs prediction and returns dataframe with Response
    #
    # x is h2o model
    # newdata is data frame
    # type is only setup for data frame
    pred <- h2o.predict(x, as.h2o(newdata))
    # return probs
    return(as.data.frame(pred[, -1]))
}
# Test our predict_model() function
predict_model(x = automl_leader, newdata = as.data.frame(test_h2o[, -1]), type = 'raw') %>%
    tibble::as_tibble()
# Run lime() on training set
explainer <- lime::lime(
    as.data.frame(train_h2o[, -1]),
    model = automl_leader,
    bin_continuous = FALSE)
# Second attempt: passing the H2OFrame directly (gives the UseMethod("lime") error)
explainer <- lime(train_h2o, automl_leader)
WA_Fn-UseC_-HR-Employee-Attrition.xlsx
I think you can close this issue out. I uninstalled lime, and then reinstalled it, and also updated all packages and rebooted R. Now the first set of code works for me.
Now I'm having trouble with plot_features(explanation).
# Run lime() on training set
explainer <- lime::lime(
    as.data.frame(train_h2o[, -1]),
    model = automl_leader,
    bin_continuous = FALSE)
# Run explain() on explainer
explanation <- lime::explain(
    as.data.frame(test_h2o[1:10, -1]),
    explainer = explainer,
    n_labels = 1,
    n_features = 4,
    kernel_width = 0.5)
plot_features(explanation)
When I run that, I get something like this:
This may be worth a separate issue (or feature request?), but it would be ideal for h2o models to avoid converting H2OFrames back to R data.frames (needed at the moment to call lime or explain) and then back to H2OFrames (needed to call h2o's predict method).
With large distributed data this process is very slow.
I'll try to look into that
I tried to implement it but ran into a wall, because currently there's no way in h2o to sample from the values of an H2OFrame, though there may be in a future release.
@crawfordwsc For your last issue with the incomprehensible plot, it is simply a matter of trying to plot too many explanations in too little space. Either subset your explanation data.frame or use plot_explanations(), which is much more compact.
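A rough sketch of both suggestions, assuming `explanation` is the data.frame returned by `lime::explain()` earlier in the thread, with one row per case/feature pair and a `case` column identifying the explained observation. The helper `keep_cases()` is hypothetical (it is not part of lime), and the mock data.frame with made-up weights only stands in for a real explanation so the subsetting step can be run on its own:

```r
# Hypothetical helper (not part of lime): keep only the rows belonging to the
# first `n` distinct cases of an explanation data.frame.
keep_cases <- function(explanation, n = 4) {
  wanted <- head(unique(explanation$case), n)
  explanation[explanation$case %in% wanted, , drop = FALSE]
}

# Mock stand-in shaped like the thread's explanation:
# 10 cases x 4 features = 40 rows. The weights are made up.
mock_explanation <- data.frame(
  case = rep(as.character(1:10), each = 4),
  feature = rep(c("OverTime", "JobRole", "Age", "MonthlyIncome"), times = 10),
  feature_weight = seq(-1, 1, length.out = 40),
  stringsAsFactors = FALSE
)

small <- keep_cases(mock_explanation, n = 4)
nrow(small)         # 16 rows: 4 cases x 4 features
unique(small$case)  # "1" "2" "3" "4"

# With the real objects from the thread (needs lime and a live explanation):
# plot_features(keep_cases(explanation, n = 4), ncol = 2)
# plot_explanations(explanation)  # one compact heatmap covering all cases
```

Plotting four cases instead of ten gives each facet of `plot_features()` enough room for its labels, while `plot_explanations()` keeps all cases in a single panel.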
I was having the exact same issue with this code and this data. The solution is to avoid H2O for now.
You can use H2O models just fine - just pass in regular data.frames when creating the explainer and predictions...
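The `no applicable method for 'lime'` error earlier in the thread looks like ordinary S3 dispatch: a generic can only be called on classes it has a method for, so converting to a plain data.frame first is what makes the call go through. A minimal base-R sketch of that mechanism, using a made-up generic `explain_me()` and a made-up class `FakeFrame` (neither comes from lime or h2o):

```r
# Made-up generic with a method for data.frame only
explain_me <- function(x, ...) UseMethod("explain_me")
explain_me.data.frame <- function(x, ...) {
  paste("explaining", ncol(x), "columns")
}

# Made-up class standing in for a non-data.frame container such as an H2OFrame
fake <- structure(list(a = 1:3, b = 4:6), class = "FakeFrame")

# Dispatch fails: there is no explain_me.FakeFrame and no default method
err <- tryCatch(explain_me(fake), error = function(e) conditionMessage(e))
err

# Converting to a regular data.frame first lets dispatch succeed
as.data.frame.FakeFrame <- function(x, ...) as.data.frame(unclass(x))
ok <- explain_me(as.data.frame(fake))
ok
```

In the thread's code the same role is played by `as.data.frame(train_h2o[, -1])` before calling `lime()`, and by `as.h2o(newdata)` inside `predict_model.H2OBinomialModel()` when going back to h2o for predictions.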
I am having the same issue with the garbled plots. I am new to both Lime and h2o, so I was wondering if there is any further information I could get on how to resolve the problem? Thanks.