auditor's Introduction

Model verification, validation, and error analysis

Overview

The auditor package is a tool for model-agnostic validation. The implemented techniques help assess and compare the goodness of fit and performance of models. They can also be used to analyze the similarity of residuals and to identify outliers and influential observations. The examination is carried out with diagnostic scores and visual verification. Thanks to a flexible and consistent grammar, it is simple to validate models of any class.

An up-to-date paper about auditor is available, along with a shorter version in The R Journal.

auditor is part of the DrWhy collection of tools for Visual Exploration, Explanation and Debugging of Predictive Models.

auditor’s pipeline: model %>% DALEX::explain() %>% plot(type=…)

Installation

Stable version from CRAN:

install.packages("auditor")

Developer version from GitHub:

source("https://install-github.me/ModelOriented/auditor")

# or with the devtools package
devtools::install_github("ModelOriented/auditor")

Demo

Run the code below or try the auditor.

library(auditor)
library(randomForest)
data(mtcars)

# fitting models
model_lm <- lm(mpg ~ ., data = mtcars)
set.seed(123)
model_rf <- randomForest(mpg ~ ., data = mtcars)

# create explainers with the explain() function from the DALEX package;
# they contain all the components required for further processing
exp_lm <- DALEX::explain(model_lm, data = mtcars, y = mtcars$mpg, verbose = FALSE)
exp_rf <- DALEX::explain(model_rf, data = mtcars, y = mtcars$mpg, label = "rf", verbose = FALSE)

# compute model residuals from the explainers
mr_lm <- model_residual(exp_lm)
mr_rf <- model_residual(exp_rf)

# generating plots
plot_residual(mr_lm, mr_rf, variable = "wt", smooth = TRUE)

More Resources

Short overview of plots

The Type column contains the character string to pass to the type= argument of the plot() function. The Regr and Class columns indicate whether the plot can be used for regression and classification models.

| Name of a plot | Function | Interactive version | Type | Regr | Class |
|---|---|---|---|---|---|
| Autocorrelation Function | plot_acf() | plotD3_acf() | "acf" | yes | yes |
| Autocorrelation | plot_autocorrelation() | plotD3_autocorrelation() | "autocorrelation" | yes | yes |
| Influence of Observations | plot_cooksdistance() | plotD3_cooksdistance() | "cooksdistance" | yes | yes |
| Half-Normal | plot_halfnormal() | plotD3_halfnormal() | "halfnormal" | yes | yes |
| LIFT Chart | plot_lift() | plotD3_lift() | "lift" | no | yes |
| Model Correlation | plot_correlation() | - | "correlation" | yes | yes |
| Principal Component Analysis of Models | plot_pca() | - | "pca" | yes | yes |
| Model Ranking Radar Plot | plot_radar() | - | "radar" | yes | yes |
| Predicted Response vs Actual or Variable Values | plot_prediction() | plotD3_prediction() | "prediction" | yes | yes |
| Regression Error Characteristic Curve (REC) | plot_rec() | plotD3_rec() | "rec" | yes | yes |
| Plot Residuals vs Actual, Fitted or Variable Values | plot_residual() | plotD3_residual() | "residual" | yes | yes |
| Residual Boxplot | plot_residual_boxplot() | - | "residual_boxplot" | yes | yes |
| Residual Density | plot_residual_density() | - | "residual_density" | yes | yes |
| Receiver Operating Characteristic (ROC) Curve | plot_roc() | plotD3_roc() | "roc" | no | yes |
| Regression Receiver Operating Characteristic (RROC) | plot_rroc() | plotD3_rroc() | "rroc" | yes | yes |
| Scale-Location Plot | plot_scalelocation() | plotD3_scalelocation() | "scalelocation" | yes | yes |
| Two-sided Cumulative Distribution Function | plot_tsecdf() | - | "tsecdf" | yes | yes |

Acknowledgments

Work on this package was financially supported by the NCN Opus grant 2016/21/B/ST6/02176.

auditor's People

Contributors

agosiewska, byrolew, hbaniecki, maksymiuks, michbur, mstaniak, pbiecek, tmikolajczyk

auditor's Issues

dragons data is outdated

If the dragons data are used in examples, link to DALEX2 and use the newest version.
If they are not used in examples, do we really need them here?

A problem with `plotACF`

I am getting a strange error:


library("titanic")
titanic <- titanic_train[,c("Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked")]
titanic$Survived <- factor(titanic$Survived)
titanic$Sex <- factor(titanic$Sex)
titanic$Embarked <- factor(titanic$Embarked)
titanic <- na.omit(titanic)
titanic <- titanic[titanic$Embarked != "",]
titanic$Embarked <- factor(titanic$Embarked)
head(titanic)

library("randomForest")
rf_model <- randomForest(Survived ~ .,  data = titanic)
rf_model



library("DALEX2")
predict_fuction <- function(m,x) predict(m, x, type = "prob")[,2]
rf_explain <- explain(rf_model, data = titanic[,-1],
                      y = titanic$Survived == "1", label = "RF",
                      predict_function = predict_fuction)

library(auditor)
rf_audit <- audit(rf_explain)
auditor::plotACF(rf_audit)

Error in `$<-.data.frame`(`*tmp*`, "index", value = c("1", "2", "3", "4",  : 
  replacement has 712 rows, data has 1000

Enter a frame number, or 0 to exit   

1: auditor::plotACF(rf_audit)
2: modelResiduals(object, variable)
3: orderResidualsDF(object, variable, is.df = TRUE)
4: `$<-`(`*tmp*`, "index", value = c("1", "2", "3", "4", "5", "7", "8", "9", "10", "11", "12", "1
5: `$<-.data.frame`(`*tmp*`, "index", value = c("1", "2", "3", "4", "5", "7", "8", "9", "10", "11


New theme_drwhy for D3 plots

  • plotACF() 
  • plotAutocorrelation()
  • plotCooksDistance() 
  • plotHalfNormal() 
  • plotLIFT() 
  • plotModelCorrelation()
  • plotModelPCA() 
  • plotModelRanking() 
  • plotPrediction() 
  • plotREC() 
  • plotResidual() 
  • plotResidualBoxplot() 
  • plotResidualDensity() 
  • plotROC() 
  • plotRROC() 
  • plotScaleLocation() 
  • plotTwoSidedECDF()

While installing from CRAN version 2.1 is installed

Hi,
this might not be an auditor-related error, but I'm not sure, so I am posting it here.

I'm installing auditor in the following way:
install.packages("auditor")

After installing and running:
library(auditor)
I have the following environment:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /home/damian/miniconda3/envs/jakbadacdane.pl/lib/R/lib/libRblas.so
LAPACK: /home/damian/miniconda3/envs/jakbadacdane.pl/lib/R/lib/libRlapack.so

locale:
[1] en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RevoUtils_11.0.1     RevoUtilsMath_11.0.0

loaded via a namespace (and not attached):
 [1] gtools_3.8.1       zoo_1.8-3          tidyselect_0.2.4   purrr_0.2.5       
 [5] lattice_0.20-35    haven_1.1.2        carData_3.0-1      colorspace_1.3-2  
 [9] yaml_2.2.0         rlang_0.2.1        pillar_1.3.0       foreign_0.8-71    
[13] glue_1.3.0         RColorBrewer_1.1-2 TTR_0.23-3         readxl_1.1.0      
[17] bindrcpp_0.2.2     factoextra_1.0.5   bindr_0.1.1        plyr_1.8.4        
[21] quantmod_0.4-13    munsell_0.5.0      gtable_0.2.0       cellranger_1.1.0  
[25] zip_1.0.0          caTools_1.17.1.1   tseries_0.10-45    rio_0.5.10        
[29] GGally_1.4.0       forcats_0.3.0      curl_3.2           auditor_0.2.1     
[33] fdrtool_1.2.15     xts_0.11-0         Rcpp_0.12.18       KernSmooth_2.23-15
[37] ROCR_1.0-7         scales_0.5.0       gdata_2.18.0       abind_1.4-5       
[41] gplots_3.0.1       ggplot2_3.0.0      hms_0.4.2          openxlsx_4.1.0    
[45] dplyr_0.7.6        ggrepel_0.8.0      grid_3.5.1         quadprog_1.5-5    
[49] tools_3.5.1        bitops_1.0-6       magrittr_1.5       lazyeval_0.2.1    
[53] tibble_1.4.2       crayon_1.3.4       car_3.0-0          pkgconfig_2.0.1   
[57] MASS_7.3-50        data.table_1.11.4  hnp_1.2-6          assertthat_0.2.0  
[61] reshape_0.8.7      rstudioapi_0.7     plotROC_2.2.1      R6_2.2.2          
[65] rpart_4.1-13       compiler_3.5.1    

I believe I should have auditor_3.0.1 or something similar (based on CRAN). What am I doing wrong?

convert data to data.frame internally

I've been using auditor with data stored in a tibble, and this resulted in a weird error in the plotResidualDensity function. Converting the data argument passed to the audit function to a data.frame solved the issue. I think the same problem has occurred in other packages, such as pdp. I can give more details if you need them.
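
A minimal sketch of the workaround, in base R only. The helper name to_plain_df is hypothetical and not part of auditor, and the tibble is simulated by attaching the tibble class names to a data.frame so that the example runs without the tibble package installed.

```r
# Hypothetical helper (not part of auditor): coerce a data-frame-like
# object, such as a tibble, to a plain data.frame before auditing.
to_plain_df <- function(data) {
  as.data.frame(data)
}

# Simulate a tibble by attaching its class names to a data.frame
# (assumption: done only so the sketch does not need the tibble package).
tbl_like <- structure(mtcars, class = c("tbl_df", "tbl", "data.frame"))

plain <- to_plain_df(tbl_like)
class(plain)
```

The converted object can then be passed as the data argument in place of the original tibble.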

data consistency

I accidentally provided the wrong y argument (from outside the data.frame specified in the data argument). Could there be a check that the inputs are consistent (y and data, and maybe the model)?
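
One way such a check could look, as a rough sketch rather than anything auditor currently does. The function name check_inputs is hypothetical. It only catches a length mismatch or missing values, so a wrong y of the correct length would still pass.

```r
# Hypothetical validation helper (not part of auditor or DALEX).
# It verifies that y has one value per row of data.
check_inputs <- function(data, y) {
  if (!is.data.frame(data)) {
    stop("`data` must be a data.frame")
  }
  if (length(y) != nrow(data)) {
    stop(sprintf("`y` has %d values but `data` has %d rows",
                 length(y), nrow(data)))
  }
  if (anyNA(y)) {
    warning("`y` contains missing values")
  }
  invisible(TRUE)
}

# Consistent inputs pass silently
ok <- check_inputs(mtcars, mtcars$mpg)

# Inconsistent inputs raise an error; capture its message for inspection
msg <- tryCatch(
  check_inputs(mtcars, mtcars$mpg[1:10]),
  error = function(e) conditionMessage(e)
)
msg
```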

plotPrediction - some options to add

  1. To have the y axis show residuals instead of the actual values (y-y_predicted)
  2. Add some smoothed line over the scatter plot (geom_smooth with method = 'loess' would do the job)
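
Both suggestions can be sketched in base R, independently of auditor. The model and variable names below are illustrative assumptions; the residual is computed as y - y_predicted, and stats::loess stands in for geom_smooth(method = 'loess').

```r
# Illustrative model; any fitted model with a predict() method would do.
model <- lm(mpg ~ wt + hp, data = mtcars)

predicted <- predict(model, newdata = mtcars)
resid_manual <- mtcars$mpg - predicted   # y - y_predicted

# Fit a loess smoother of the residuals against the predictions
smooth_fit <- loess(resid_manual ~ predicted)
ord <- order(predicted)

# Scatter plot with residuals on the y axis and the smoothed line on top
plot(predicted, resid_manual,
     xlab = "Predicted mpg", ylab = "Residual (y - y_predicted)")
lines(predicted[ord], predict(smooth_fit)[ord], col = "blue", lwd = 2)
abline(h = 0, lty = 2)
```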

New theme_drwhy for ggplot2 plots

  • plotACF() 
  • plotAutocorrelation()
  • plotCooksDistance() 
  • plotHalfNormal() 
  • plotLIFT() 
  • plotModelCorrelation()
  • plotModelPCA() 
  • plotModelRanking() 
  • plotPrediction() 
  • plotREC() 
  • plotResidual() 
  • plotResidualBoxplot() 
  • plotResidualDensity() 
  • plotROC() 
  • plotRROC() 
  • plotScaleLocation() 

Fix LIFT

The plotLIFT function gives different values than plotLift from the lift package.
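
When comparing the two implementations, a hand-computed reference value may help. The sketch below uses one common definition of cumulative lift: the positive rate among the top-scored fraction of observations divided by the overall positive rate. Whether either package uses exactly this definition is an assumption to verify. The function name cumulative_lift and the toy data are hypothetical.

```r
# Hypothetical reference implementation of cumulative lift.
# y: 0/1 outcomes, score: predicted scores, fraction: share of top-scored cases
cumulative_lift <- function(y, score, fraction = 0.1) {
  stopifnot(length(y) == length(score), fraction > 0, fraction <= 1)
  n_top <- ceiling(length(y) * fraction)
  top <- order(score, decreasing = TRUE)[seq_len(n_top)]
  mean(y[top]) / mean(y)
}

# Toy data: 3 positives out of 10, two of them ranked at the top
y <- c(1, 1, 0, 1, 0, 0, 0, 0, 0, 0)
score <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05)

lift_top20 <- cumulative_lift(y, score, fraction = 0.2)
lift_all <- cumulative_lift(y, score, fraction = 1)
```

Differences between packages often come from details such as tie handling, binning, or whether the lift is cumulative, so those are worth checking first.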

naming convention

Some function names end with s and some do not,
e.g. plotPrediction vs plotResiduals.

Two models - one plot

Very nice cheatsheet.
What about the possibility of plotting two models in the same chart? Then it would be easier to compare models.

And/or what about plots inspired by ROC curves or lift curves? In most cases they are used for binary classification, but maybe they can be extended to other glm models as well.

List of diagnostic plots with `plot.modelAudit`

Wouldn't it be cool to handle a vector of diagnostic plots in the plot.modelAudit function?
I.e., if the type argument is longer than a single element, then a list of plots is returned (more or less like in the plot.lm function).

RandomForest

For random forest models, auditor requires a particular version of the broom package (not yet released on CRAN):

devtools::install_github("tidyverse/broom", force=TRUE, ref = "3df7a2d")
