interpretml / interpret

Fit interpretable models. Explain blackbox machine learning.

Home Page: https://interpret.ml/docs

License: MIT License

Batchfile 0.20% Shell 2.16% C++ 60.11% Python 33.45% CSS 0.45% JavaScript 0.28% Jupyter Notebook 0.27% C 1.27% R 0.80% Makefile 0.13% SCSS 0.04% Cuda 0.52% Dockerfile 0.02% TypeScript 0.29%
machine-learning interpretability gradient-boosting blackbox scikit-learn xai interpretml interpretable-machine-learning interpretable-ai transparency

interpret's Introduction

InterpretML


In the beginning machines learned in darkness, and data scientists struggled in the void to explain them.

Let there be light.

InterpretML is an open-source package that incorporates state-of-the-art machine learning interpretability techniques under one roof. With this package, you can train interpretable glassbox models and explain blackbox systems. InterpretML helps you understand your model's global behavior, or understand the reasons behind individual predictions.

Interpretability is essential for:

  • Model debugging - Why did my model make this mistake?
  • Feature Engineering - How can I improve my model?
  • Detecting fairness issues - Does my model discriminate?
  • Human-AI cooperation - How can I understand and trust the model's decisions?
  • Regulatory compliance - Does my model satisfy legal requirements?
  • High-risk applications - Healthcare, finance, judicial, ...

Installation

Python 3.7+ | Linux, Mac, Windows

pip install interpret
# OR
conda install -c conda-forge interpret

Introducing the Explainable Boosting Machine (EBM)

EBM is an interpretable model developed at Microsoft Research. It uses modern machine learning techniques like bagging, gradient boosting, and automatic interaction detection to breathe new life into traditional GAMs (Generalized Additive Models). This makes EBMs as accurate as state-of-the-art techniques like random forests and gradient boosted trees. However, unlike these blackbox models, EBMs produce exact explanations and are editable by domain experts.
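For concreteness, here is the standard formulation (generic GAM notation, not anything specific to this package). A GAM passes a sum of per-feature shape functions through a link function g; GA2M/EBM adds a small set of pairwise terms:

g(E[y]) = \beta_0 + \sum_i f_i(x_i)                                    (GAM)
g(E[y]) = \beta_0 + \sum_i f_i(x_i) + \sum_{(i,j)} f_{ij}(x_i, x_j)    (GA2M / EBM)

Each f_i depends on a single feature, so it can be plotted directly; that is what makes the explanations exact.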

Dataset/AUROC | Domain   | Logistic Regression | Random Forest | XGBoost   | Explainable Boosting Machine
------------- | -------- | ------------------- | ------------- | --------- | ----------------------------
Adult Income  | Finance  | .907±.003           | .903±.002     | .927±.001 | .928±.002
Heart Disease | Medical  | .895±.030           | .890±.008     | .851±.018 | .898±.013
Breast Cancer | Medical  | .995±.005           | .992±.009     | .992±.010 | .995±.006
Telecom Churn | Business | .849±.005           | .824±.004     | .828±.010 | .852±.006
Credit Fraud  | Security | .979±.002           | .950±.007     | .981±.003 | .981±.003

Notebook for reproducing table

Supported Techniques

Interpretability Technique  | Type
--------------------------- | ------------------
Explainable Boosting        | glassbox model
Decision Tree               | glassbox model
Decision Rule List          | glassbox model
Linear/Logistic Regression  | glassbox model
SHAP Kernel Explainer       | blackbox explainer
LIME                        | blackbox explainer
Morris Sensitivity Analysis | blackbox explainer
Partial Dependence          | blackbox explainer

Train a glassbox model

Let's fit an Explainable Boosting Machine:

from interpret.glassbox import ExplainableBoostingClassifier

ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

# or substitute with LogisticRegression, DecisionTreeClassifier, RuleListClassifier, ...
# EBM supports pandas dataframes, numpy arrays, and handles "string" data natively.

Understand the model

from interpret import show

ebm_global = ebm.explain_global()
show(ebm_global)

Global Explanation Image


Understand individual predictions

ebm_local = ebm.explain_local(X_test, y_test)
show(ebm_local)

Local Explanation Image


And if you have multiple model explanations, compare them

show([logistic_regression_global, decision_tree_global])

Dashboard Image


If you need to keep your data private, use Differentially Private EBMs (see DP-EBMs)

from interpret.privacy import DPExplainableBoostingClassifier, DPExplainableBoostingRegressor

dp_ebm = DPExplainableBoostingClassifier(epsilon=1, delta=1e-5) # Specify privacy parameters
dp_ebm.fit(X_train, y_train)

show(dp_ebm.explain_global()) # Identical function calls to standard EBMs
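For reference, the epsilon/delta parameters refer to the standard (\varepsilon, \delta)-differential-privacy guarantee: for any two datasets D and D' differing in a single record, and any set S of possible models,

\Pr[M(D) \in S] \le e^{\varepsilon} \Pr[M(D') \in S] + \delta

Smaller values of epsilon and delta give stronger privacy, typically at some cost in accuracy.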


For more information, see the documentation.


EBMs include pairwise interactions by default. For 3-way interactions and higher see this notebook: https://interpret.ml/docs/python/examples/custom-interactions.html
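As a hedged sketch of the interactions parameter (an integer asks FAST to detect that many pairwise terms automatically; in the versions I have seen, a list of feature-index tuples pins specific interactions, so treat exact support, particularly beyond pairs, as version-dependent and defer to the notebook above):

from interpret.glassbox import ExplainableBoostingClassifier

ebm = ExplainableBoostingClassifier(interactions=10)        # auto-detect up to 10 pairwise terms
ebm = ExplainableBoostingClassifier(interactions=[(0, 1)])  # request a specific pair by feature index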


EBMs in interpret can be fit on datasets with 100 million samples in several hours. For larger workloads, consider using distributed EBMs on Azure SynapseML: classification EBMs and regression EBMs



Acknowledgements

InterpretML was originally created by (equal contributions): Samuel Jenkins, Harsha Nori, Paul Koch, and Rich Caruana

EBMs are a fast derivative of GA2M, invented by: Yin Lou, Rich Caruana, Johannes Gehrke, and Giles Hooker

Many people have supported us along the way. Check out ACKNOWLEDGEMENTS.md!

We also build on top of many great packages. Please check them out!

plotly | dash | scikit-learn | lime | shap | salib | skope-rules | treeinterpreter | gevent | joblib | pytest | jupyter

InterpretML
"InterpretML: A Unified Framework for Machine Learning Interpretability" (H. Nori, S. Jenkins, P. Koch, and R. Caruana 2019)
@article{nori2019interpretml,
  title={InterpretML: A Unified Framework for Machine Learning Interpretability},
  author={Nori, Harsha and Jenkins, Samuel and Koch, Paul and Caruana, Rich},
  journal={arXiv preprint arXiv:1909.09223},
  year={2019}
}
    
Paper link

Explainable Boosting
"Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission" (R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad 2015)
@inproceedings{caruana2015intelligible,
  title={Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission},
  author={Caruana, Rich and Lou, Yin and Gehrke, Johannes and Koch, Paul and Sturm, Marc and Elhadad, Noemie},
  booktitle={Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  pages={1721--1730},
  year={2015},
  organization={ACM}
}
    
Paper link
"Accurate intelligible models with pairwise interactions" (Y. Lou, R. Caruana, J. Gehrke, and G. Hooker 2013)
@inproceedings{lou2013accurate,
  title={Accurate intelligible models with pairwise interactions},
  author={Lou, Yin and Caruana, Rich and Gehrke, Johannes and Hooker, Giles},
  booktitle={Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining},
  pages={623--631},
  year={2013},
  organization={ACM}
}
    
Paper link
"Intelligible models for classification and regression" (Y. Lou, R. Caruana, and J. Gehrke 2012)
@inproceedings{lou2012intelligible,
  title={Intelligible models for classification and regression},
  author={Lou, Yin and Caruana, Rich and Gehrke, Johannes},
  booktitle={Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining},
  pages={150--158},
  year={2012},
  organization={ACM}
}
    
Paper link
"Interpretability, Then What? Editing Machine Learning Models to Reflect Human Knowledge and Values" (Zijie J. Wang, Alex Kale, Harsha Nori, Peter Stella, Mark E. Nunnally, Duen Horng Chau, Mihaela Vorvoreanu, Jennifer Wortman Vaughan, Rich Caruana 2022)
@article{wang2022interpretability,
  title={Interpretability, Then What? Editing Machine Learning Models to Reflect Human Knowledge and Values},
  author={Wang, Zijie J and Kale, Alex and Nori, Harsha and Stella, Peter and Nunnally, Mark E and Chau, Duen Horng and Vorvoreanu, Mihaela and Vaughan, Jennifer Wortman and Caruana, Rich},
  journal={arXiv preprint arXiv:2206.15465},
  year={2022}
}
    
Paper link
"Axiomatic Interpretability for Multiclass Additive Models" (X. Zhang, S. Tan, P. Koch, Y. Lou, U. Chajewska, and R. Caruana 2019)
@inproceedings{zhang2019axiomatic,
  title={Axiomatic Interpretability for Multiclass Additive Models},
  author={Zhang, Xuezhou and Tan, Sarah and Koch, Paul and Lou, Yin and Chajewska, Urszula and Caruana, Rich},
  booktitle={Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  pages={226--234},
  year={2019},
  organization={ACM}
}
    
Paper link
"Distill-and-compare: auditing black-box models using transparent model distillation" (S. Tan, R. Caruana, G. Hooker, and Y. Lou 2018)
@inproceedings{tan2018distill,
  title={Distill-and-compare: auditing black-box models using transparent model distillation},
  author={Tan, Sarah and Caruana, Rich and Hooker, Giles and Lou, Yin},
  booktitle={Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society},
  pages={303--310},
  year={2018},
  organization={ACM}
}
    
Paper link
"Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models" (B. Lengerich, S. Tan, C. Chang, G. Hooker, R. Caruana 2019)
@article{lengerich2019purifying,
  title={Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models},
  author={Lengerich, Benjamin and Tan, Sarah and Chang, Chun-Hao and Hooker, Giles and Caruana, Rich},
  journal={arXiv preprint arXiv:1911.04974},
  year={2019}
}
    
Paper link
"Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning" (H. Kaur, H. Nori, S. Jenkins, R. Caruana, H. Wallach, J. Wortman Vaughan 2020)
@inproceedings{kaur2020interpreting,
  title={Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning},
  author={Kaur, Harmanpreet and Nori, Harsha and Jenkins, Samuel and Caruana, Rich and Wallach, Hanna and Wortman Vaughan, Jennifer},
  booktitle={Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems},
  pages={1--14},
  year={2020}
}
    
Paper link
"How Interpretable and Trustworthy are GAMs?" (C. Chang, S. Tan, B. Lengerich, A. Goldenberg, R. Caruana 2020)
@article{chang2020interpretable,
  title={How Interpretable and Trustworthy are GAMs?},
  author={Chang, Chun-Hao and Tan, Sarah and Lengerich, Ben and Goldenberg, Anna and Caruana, Rich},
  journal={arXiv preprint arXiv:2006.06466},
  year={2020}
}
    
Paper link

Differential Privacy
"Accuracy, Interpretability, and Differential Privacy via Explainable Boosting" (H. Nori, R. Caruana, Z. Bu, J. Shen, J. Kulkarni 2021)
@inproceedings{pmlr-v139-nori21a,
  title = 	 {Accuracy, Interpretability, and Differential Privacy via Explainable Boosting},
  author =       {Nori, Harsha and Caruana, Rich and Bu, Zhiqi and Shen, Judy Hanwen and Kulkarni, Janardhan},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {8227--8237},
  year = 	 {2021},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  publisher =    {PMLR}
}
    
Paper link

LIME
"Why should i trust you?: Explaining the predictions of any classifier" (M. T. Ribeiro, S. Singh, and C. Guestrin 2016)
@inproceedings{ribeiro2016should,
  title={Why should i trust you?: Explaining the predictions of any classifier},
  author={Ribeiro, Marco Tulio and Singh, Sameer and Guestrin, Carlos},
  booktitle={Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining},
  pages={1135--1144},
  year={2016},
  organization={ACM}
}
    
Paper link

SHAP
"A Unified Approach to Interpreting Model Predictions" (S. M. Lundberg and S.-I. Lee 2017)
@incollection{NIPS2017_7062,
 title = {A Unified Approach to Interpreting Model Predictions},
 author = {Lundberg, Scott M and Lee, Su-In},
 booktitle = {Advances in Neural Information Processing Systems 30},
 editor = {I. Guyon and U. V. Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett},
 pages = {4765--4774},
 year = {2017},
 publisher = {Curran Associates, Inc.},
 url = {https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf}
}
    
Paper link
"Consistent individualized feature attribution for tree ensembles" (Lundberg, Scott M and Erion, Gabriel G and Lee, Su-In 2018)
@article{lundberg2018consistent,
  title={Consistent individualized feature attribution for tree ensembles},
  author={Lundberg, Scott M and Erion, Gabriel G and Lee, Su-In},
  journal={arXiv preprint arXiv:1802.03888},
  year={2018}
}
    
Paper link
"Explainable machine-learning predictions for the prevention of hypoxaemia during surgery" (S. M. Lundberg et al. 2018)
@article{lundberg2018explainable,
  title={Explainable machine-learning predictions for the prevention of hypoxaemia during surgery},
  author={Lundberg, Scott M and Nair, Bala and Vavilala, Monica S and Horibe, Mayumi and Eisses, Michael J and Adams, Trevor and Liston, David E and Low, Daniel King-Wai and Newman, Shu-Fang and Kim, Jerry and others},
  journal={Nature Biomedical Engineering},
  volume={2},
  number={10},
  pages={749},
  year={2018},
  publisher={Nature Publishing Group}
}
    
Paper link

Sensitivity Analysis
"SALib: An open-source Python library for Sensitivity Analysis" (J. D. Herman and W. Usher 2017)
@article{herman2017salib,
  title={SALib: An open-source Python library for Sensitivity Analysis.},
  author={Herman, Jonathan D and Usher, Will},
  journal={J. Open Source Software},
  volume={2},
  number={9},
  pages={97},
  year={2017}
}
    
Paper link
"Factorial sampling plans for preliminary computational experiments" (M. D. Morris 1991)
@article{morris1991factorial,
  title={Factorial sampling plans for preliminary computational experiments},
  author={Morris, Max D},
  journal={Technometrics},
  volume={33},
  number={2},
  pages={161--174},
  year={1991},
  publisher={Taylor \& Francis Group}
}
    
Paper link

Partial Dependence
"Greedy function approximation: a gradient boosting machine" (J. H. Friedman 2001)
@article{friedman2001greedy,
  title={Greedy function approximation: a gradient boosting machine},
  author={Friedman, Jerome H},
  journal={Annals of statistics},
  pages={1189--1232},
  year={2001},
  publisher={JSTOR}
}
    
Paper link

Open Source Software
"Scikit-learn: Machine learning in Python" (F. Pedregosa et al. 2011)
@article{pedregosa2011scikit,
  title={Scikit-learn: Machine learning in Python},
  author={Pedregosa, Fabian and Varoquaux, Ga{\"e}l and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and others},
  journal={Journal of machine learning research},
  volume={12},
  number={Oct},
  pages={2825--2830},
  year={2011}
}
    
Paper link
"Collaborative data science" (Plotly Technologies Inc. 2015)
@online{plotly, 
  author = {Plotly Technologies Inc.}, 
  title = {Collaborative data science}, 
  publisher = {Plotly Technologies Inc.}, 
  address = {Montreal, QC}, 
  year = {2015}, 
  url = {https://plot.ly}
}
    
Link
"Joblib: running python function as pipeline jobs" (G. Varoquaux and O. Grisel 2009)
@article{varoquaux2009joblib,
  title={Joblib: running python function as pipeline jobs},
  author={Varoquaux, Ga{\"e}l and Grisel, O},
  journal={packages. python. org/joblib},
  year={2009}
}
    
Link

Videos

External links

Papers that use or compare EBMs

Books that cover EBMs

External tools

Contact us

There are multiple ways to get in touch:

If a tree fell in your random forest, would anyone notice?

interpret's People

Contributors

alvanli, ashton-sidhu, bamdevm, blengerich, brandongreenwell-8451, chhetri22, dependabot[bot], derweh, ecederstrand, geokshitij, harsha-nori, imatiach-msft, interpret-ml, luisffranca, mczhu, microsoftopensource, msftgits, msplants, mtl-tony, ncherrier, nopdive, paulbkoch, prateekiiest, richcaruana, rxm7706, v-rr, vbernardes, wamartin-aml, xiaohk, zhangxz1123


interpret's Issues

Plotting Questions

All-
I fit a model: ebm.fit(X, y). Then I am able to see the overall feature importance by running:

import plotly 
plotly.tools.set_credentials_file(username='XXX', api_key='123')
import plotly.plotly as plotly_py
ebm_global = ebm.explain_global()
plotly_py.iplot(ebm_global.visualize())

Question 1: Is there a way to see more than the default top features?

Then if we pass an index into visualize, we get the shape of the effect of that feature:

plotly_py.iplot(ebm_global.visualize(0))

I can see local explanations:

ebm_local = ebm.explain_local(X,y)
plotly_py.iplot(ebm_local.visualize(20))

Question 2: Is it possible to get this data back without plotting it?

Question 3: How does the above differ from show()? I have been unable to get show() to work, but it looks like the only difference is a drop-down selector?

Weights on data

Hi!

First, thanks for your amazing work.
Would it be possible to take into account weights on the data samples, similar to what is done in a few sklearn models?
And a last, unrelated question: will you consider adding alternatives for fitting GAMs (other than boosting, for instance using splines)?

Thanks,

pip install Error (pyscaffold)

Python 3.6
Hello, when I run

pip install -U interpret

I get the following error:

Could not find suitable distribution for Requirement.parse('pyscaffold<3.1a0,>=3.0a0')

So I downgraded the pyscaffold package version as required:

pip install pyscaffold==3.0.0

Then I tried to install again and got:

pyscaffold.exceptions.OldSetuptools: Your setuptools version is too old (<30.3.0). Use `pip install -U setuptools` to upgrade.

The problem is that my setuptools version is 41.2.0.

Thanks for the help

EBM inside LightGBM?

Hi!

Maybe this is a silly question (mostly because codebases are obviously different), but what do you think about integrating this library with LightGBM, since it is also a top-notch gradient boosting library by Microsoft?

Now that the project is small, this should be easier than trying to do the same later. In the long term, maintenance costs should go down IMHO...

BTW great job with this library. I absolutely love your approach.

interpretation of local feature importances

Hi,
thank you for this nice package.

It would be helpful if the examples were extended with some documentation on how the visualized results are computed.

For instance, what puzzles me is how the local feature importance values relate to the predicted value of the Explainable Boosting Classifier.

How do they relate to each other? Since the model is a GAM, my first guess would be some linear relation between them, but it doesn't add up.

My apologies if the question reveals some ignorance on my side to standard literature.

Kind regards,

Tomas

##########
EDIT below:

By trial and error, I believe I found the solution: the probability score given by the Explainable Boosting Classifier is computed as:

import numpy as np

def compute_p(rs):
    return 1. / (1. + np.exp(-rs))

t = np.sum(ebm_local._internal_obj['specific'][0]['scores']) + \
    ebm_local._internal_obj['specific'][0]['extra']['scores']
p = compute_p(t)
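Restated in the GAM notation from the README (based on the finding above): the predicted probability is the logistic function applied to the intercept plus the sum of the local term scores,

p = \sigma\left(\beta_0 + \sum_i f_i(x_i)\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}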

How to get categorical variables in the graph?

I am opening an issue because I am trying to reproduce the results obtained in this graph:

image

I would like to get the categorical features as text, just as can be seen in the image. However, when I try it myself, I get an error unless I convert the categorical values to numeric, and then the graph shows the number each category was encoded with instead of the text:

image

Do you know why this is happening and what I should do to get the text labels? Thanks
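A minimal sketch of the usual fix (column and label names here are made up for illustration): EBMs handle string data natively, so categorical columns can be left as text rather than integer-encoded, and the plots will then show the original labels:

import pandas as pd
from interpret.glassbox import ExplainableBoostingClassifier

# Toy frame with a string-valued categorical column.
df = pd.DataFrame({
    "Age": [25, 40, 33, 51, 62, 29],
    "Occupation": ["Sales", "Tech", "Sales", "Admin", "Tech", "Admin"],
    "label": [0, 1, 0, 1, 1, 0],
})
X = df.drop(columns=["label"])  # "Occupation" stays as text, not integer codes
ebm = ExplainableBoostingClassifier().fit(X, df["label"])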

Bug in Notebook for reproducing table

In the UCI heart disease dataset used by the load_heart_dataset function in the notebook, the label is taken from the second-to-last column instead of the last column. The last column in the dataset corresponds to a location variable (taking values such as 'Hungary', 'VA', etc.).

"Error loading dependecies" in show() method

Hi guys, thanks for this great contribution.

Each time I use the show() method I get the following error:
"Error loading dependencies"

Examples:

ebm_global = ebm.explain_global(name='EBM')
show(ebm_global)  # "Error loading dependencies"

I'm using Python 3.7

Thanks in advance,

Nelson

"127.0.0.1 refused to connect" error after re-opening Jupyter notebook

Thanks for this library!

I have been able to run everything as-is following the README.md. However, when I re-open the Jupyter notebook I have been using, instead of the images created with the show(hist) and show(ebm_perf) commands I get the following error: 127.0.0.1 refused to connect.

I have read through the existing issues and a similar problem was reported in #14, but I couldn't figure out how to fix it myself. The same happens if I download the notebook in .html format and open it after closing the Jupyter notebook connection.

Can anyone throw some light on it?

scoring on the continuous feature looks unreal

This graphic shows the scoring on a continuous variable. In contrast to a spline regression, the score has a lot of volatility and might overfit the local observations. I tried to tweak the fitting parameters to get something smoother, but with no success. What's the recommendation?

graphic

Dashboard API

Hi all,

Thanks so much for your work on this package!

Here I'm requesting a Dashboard API -- maybe just a more transparent interface with the Dash backend so I can customize how the predictions and explanations are served. For example, I want to serve/access individual local explanations as .png's remotely by using some row ID. If this can be done easily on my end, please excuse my bothering you all!

Feature importance using Permutation

Hi,
At work, we use the "Permutation Importance" method to inspect feature importance.
We use the awesome library eli5 for that.

Would it be possible to include a version of that in this library?
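In the meantime, a workaround sketch (this uses scikit-learn's permutation_importance rather than an interpret API, and assumes a fitted ebm plus held-out X_test / y_test as in the README):

from sklearn.inspection import permutation_importance

# Score drop when each column is shuffled, averaged over repeats.
result = permutation_importance(ebm, X_test, y_test, n_repeats=10, random_state=0)
for name, mean_drop in zip(X_test.columns, result.importances_mean):
    print(f"{name}: {mean_drop:.4f}")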

Install Fails - Greenlet

I am trying to pip install interpret and currently the following error appears to be stopping the install. Any ideas for getting around this and proceeding?

I am using Ubuntu 16.04
Cannot uninstall 'greenlet'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

Question on Calibration

I'm testing the classifier algorithm on a dataset (unfortunately confidential) with a binary target (70,000 rows and about 40 predictors). While the rank ordering is competitive with other tree-based methods, the predictions seem poorly calibrated, even on the training data itself: the prediction is always lower than the actual rate. I am wondering whether there might be an algorithmic cause that could be tuned, or whether this has been seen in development?

The model is trained using the default settings (I have tweaked multiple parameters and not found any impact):

ebm = ExplainableBoostingClassifier(n_estimators=16,interactions=0,n_jobs=10)
ebm.fit(X,y)

The prediction is made on the training data.

p=ebm.predict_proba(X)[:,1]
print(np.mean(p))  # THIS IS 0.023

I rank the predictions into deciles (10% bins) and plot the actual target rate and the mean prediction probability for each decile. The rank order is good and AUC is high (this is the training data, of course), but we underpredict systematically. The red horizontal line is the overall mean of the training data, which is significantly higher than the mean prediction noted above (0.1 versus 0.02).
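For reference, a minimal sketch of the decile check described above (reusing p and y from the snippet; pandas does the binning):

import numpy as np
import pandas as pd

deciles = pd.qcut(p, 10, labels=False, duplicates="drop")  # decile index per prediction
calib = (pd.DataFrame({"p": p, "y": np.asarray(y)})
           .groupby(deciles)
           .agg(mean_pred=("p", "mean"), actual_rate=("y", "mean")))
print(calib)  # compare mean predicted probability vs. actual rate per decile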

image

NaNs cause hanging issue while training

Hey there, thank you for this awesome project!

At first I thought this was an issue with joblib but found out it had to do with NaNs in my dataframe.

Problem: when I try to train with NaNs in the data, the job basically just hangs, pegging only one core. It seems like it would be better to raise an error or something, because I had no idea it was just hanging, since #7 isn't implemented yet.

image

I am running Fedora 30 with python 3.7.3.

Perhaps a simple check for NaNs in the array, raising an error, would be useful to prevent this from happening? (Something like the sketch below.) Lemme know how I can help and thank you! Great speech at Strata btw
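A hypothetical pre-fit guard along those lines (illustrative only; this is not the library's actual validation code):

import numpy as np

def check_no_nans(X):
    # Fail fast instead of hanging when the input contains NaNs.
    if np.isnan(np.asarray(X, dtype=float)).any():
        raise ValueError("Input contains NaN; impute or drop missing values before fitting.")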

Thank you

-Matt

Use graphs in a Jupyter notebook?

Thanks for this library.

I'm following along with the README.md and got to:

from interpret import show

ebm_global = ebm.explain_global()
show(ebm_global)

When I run that in my Jupyter notebook I get: RuntimeError: Could not find open port.

Maybe it's trying to run a web server from a notebook?

Can I just make the individual graphs in the notebook? How?

I see functions in interpret.visual.plot, but I'm having a bit of trouble finding the right objects to pass to it.

TerminatedWorkerError

Got this error on a machine with 72 cores and 137 GB of RAM. The same data runs fine with XGBoost.

I tried reducing n_jobs and n_estimators; same error.

---------------------------------------------------------------------------
TerminatedWorkerError                     Traceback (most recent call last)
<ipython-input-46-777da5fd101f> in <module>
      1 import time
      2 before = time.time()
----> 3 ebm.fit(X_train.values, y_train)
      4 total = (time.time()-before)/60
      5 print(f"spent %.2f minutes" % total)

~/.local/lib/python3.6/site-packages/interpret/glassbox/ebm/ebm.py in fit(self, X, y)
    783         )
    784 
--> 785         estimators = provider.parallel(train_model, train_model_args_iter)
    786 
    787         if isinstance(self.interactions, int) and self.interactions > 0:

~/.local/lib/python3.6/site-packages/interpret/utils/distributed.py in parallel(self, compute_fn, compute_args_iter)
     16     def parallel(self, compute_fn, compute_args_iter):
     17         results = Parallel(n_jobs=self.n_jobs)(
---> 18             delayed(compute_fn)(*args) for args in compute_args_iter
     19         )
     20         # NOTE: Force gc, as Python does not free native memory easy.

~/.local/lib/python3.6/site-packages/joblib/parallel.py in __call__(self, iterable)
    932 
    933             with self._backend.retrieval_context():
--> 934                 self.retrieve()
    935             # Make sure that we get a last message telling us we are done
    936             elapsed_time = time.time() - self._start_time

~/.local/lib/python3.6/site-packages/joblib/parallel.py in retrieve(self)
    831             try:
    832                 if getattr(self._backend, 'supports_timeout', False):
--> 833                     self._output.extend(job.get(timeout=self.timeout))
    834                 else:
    835                     self._output.extend(job.get())

~/.local/lib/python3.6/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
    519         AsyncResults.get from multiprocessing."""
    520         try:
--> 521             return future.result(timeout=timeout)
    522         except LokyTimeoutError:
    523             raise TimeoutError()

/usr/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
    430                 raise CancelledError()
    431             elif self._state == FINISHED:
--> 432                 return self.__get_result()
    433             else:
    434                 raise TimeoutError()

/usr/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGSEGV(-11)}

Some details:

~$ python3 --version
Python 3.6.7
~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.2 LTS
Release:	18.04
Codename:	bionic
~$ pip3 freeze
asn1crypto==0.24.0
atomicwrites==1.3.0
attrs==19.1.0
Automat==0.6.0
backcall==0.1.0
bleach==3.1.0
blinker==1.4
certifi==2019.3.9
chardet==3.0.4
Click==7.0
cloud-init==18.4
cloudpickle==0.8.1
colorama==0.3.7
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
cryptography==2.1.4
cycler==0.10.0
dash==0.39.0
dash-core-components==0.44.0
dash-cytoscape==0.1.1
dash-html-components==0.14.0
dash-renderer==0.20.0
dash-table==3.6.0
dash-table-experiments==0.6.0
dask==1.2.0
decorator==4.4.0
defusedxml==0.6.0
distro-info==0.18
entrypoints==0.3
Flask==1.0.2
Flask-Compress==1.4.0
gevent==1.4.0
greenlet==0.4.15
hibagent==1.0.1
httplib2==0.9.2
hyperlink==17.3.1
hypothesis==4.23.5
idna==2.8
imageio==2.5.0
incremental==16.10.1
interpret==0.1.1
ipykernel==5.1.1
ipython==7.5.0
ipython-genutils==0.2.0
ipywidgets==7.4.2
itsdangerous==1.1.0
jedi==0.13.3
Jinja2==2.10.1
joblib==0.13.2
jsonpatch==1.16
jsonpointer==1.10
jsonschema==3.0.1
jupyter==1.0.0
jupyter-client==5.2.4
jupyter-console==6.0.0
jupyter-core==4.4.0
keyring==10.6.0
keyrings.alt==3.0
kiwisolver==1.1.0
language-selector==0.1
lime==0.1.1.34
locket==0.2.0
MarkupSafe==1.1.1
matplotlib==3.0.3
mistune==0.8.4
more-itertools==7.0.0
nbconvert==5.5.0
nbformat==4.4.0
netifaces==0.10.4
networkx==2.3
notebook==5.7.8
numpy==1.16.3
oauthlib==2.0.6
PAM==0.4.2
pandas==0.24.2
pandocfilters==1.4.2
parso==0.4.0
partd==0.3.10
pexpect==4.7.0
pickleshare==0.7.5
Pillow==6.0.0
plotly==3.9.0
pluggy==0.11.0
prometheus-client==0.6.0
prompt-toolkit==2.0.9
ptyprocess==0.6.0
py==1.8.0
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycrypto==2.6.1
Pygments==2.4.0
pygobject==3.26.1
PyJWT==1.5.3
pyOpenSSL==17.5.0
pyparsing==2.4.0
pyrsistent==0.15.2
pyserial==3.4
pytest==4.5.0
pytest-runner==4.4
python-apt==1.6.3+ubuntu1
python-dateutil==2.8.0
python-debian==0.1.32
pytz==2019.1
PyWavelets==1.0.3
pyxdg==0.25
PyYAML==3.12
pyzmq==18.0.1
qtconsole==4.4.3
requests==2.22.0
requests-unixsocket==0.1.5
retrying==1.3.3
SALib==1.3.4
scikit-image==0.15.0
scikit-learn==0.21.1
scipy==1.2.1
SecretStorage==2.3.1
Send2Trash==1.5.0
service-identity==16.0.0
shap==0.28.5
six==1.12.0
sklearn==0.0
skope-rules==1.0.0
ssh-import-id==5.7
systemd-python==234
terminado==0.8.2
testpath==0.4.2
toolz==0.9.0
tornado==6.0.2
tqdm==4.32.1
traitlets==4.3.2
Twisted==17.9.0
ufw==0.35
unattended-upgrades==0.1
urllib3==1.25.2
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.15.4
widgetsnbextension==3.4.2
xgboost==0.82
zope.interface==4.3.2

Could not find open port

I am getting the error "RuntimeError: Could not find open port" when I use the show command. Is there a way to set the IP address and port number for running the dashboard? Thanks

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
    header=None,
)
df.columns = [
    "Age", "WorkClass", "fnlwgt", "Education", "EducationNum",
    "MaritalStatus", "Occupation", "Relationship", "Race", "Gender",
    "CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry", "Income",
]
train_cols = df.columns[0:-1]
label = df.columns[-1]
X = df[train_cols]
y = df[label].apply(lambda x: 0 if x == " <=50K" else 1) #Turning response into 0 and 1

seed = 1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)

from interpret import show
from interpret.data import ClassHistogram

hist = ClassHistogram().explain_data(X_train, y_train, name = 'Train Data')
show(hist)


RuntimeError Traceback (most recent call last)
in <module>
      3
      4 hist = ClassHistogram().explain_data(X_train, y_train, name = 'Train Data')
----> 5 show(hist)

~/anaconda/envs/fastai/lib/python3.7/site-packages/interpret/visual/interactive.py in show(explanation, share_tables)
120 except Exception as e: # pragma: no cover
121 log.error(e, exc_info=True)
--> 122 raise e
123
124 return None

~/anaconda/envs/fastai/lib/python3.7/site-packages/interpret/visual/interactive.py in show(explanation, share_tables)
110 # Initialize server if needed
111 if this.app_runner is None: # pragma: no cover
--> 112 init_show_server(this.app_addr)
113
114 # Register

~/anaconda/envs/fastai/lib/python3.7/site-packages/interpret/visual/interactive.py in init_show_server(addr, base_url, use_relative_links)
83 log.debug("Create app runner at {0}".format(addr))
84 this.app_runner = AppRunner(
---> 85 addr, base_url=base_url, use_relative_links=use_relative_links
86 )
87 this.app_runner.start()

~/anaconda/envs/fastai/lib/python3.7/site-packages/interpret/visual/dashboard.py in __init__(self, addr, base_url, use_relative_links)
58 msg = "Could not find open port"
59 log.error(msg)
---> 60 raise RuntimeError(msg)
61 else:
62 self.ip = addr[0]

RuntimeError: Could not find open port

how to graph without spinning a local web-server

Is this API available yet? We can't seem to plot it locally with plotly offline.

Hi @dfrankow, thanks for the issue! We're just about to introduce a few new API changes that should make this easier in our next release. One, we'll let you specify a port in the show method, so that you can pick your own port that you know is open. Second, we'll introduce a new function that doesn't spin up the local web-server, and directly uses plotly to visualize it. For now, here are a few notes:

visualize() does return a plotly object, and you can use plotly.offline so that you don't need an API key. And yes, if you pass in a key to visualize(), you can get a specific graph back out!

If you run this code at the top of your notebook:

from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

you can then use "iplot(plotly_figure)" in your notebook to get a direct plotly graph. We'll have a nicer API around this soon!

Originally posted by @interpret-ml in #1 (comment)

EBM parameter search with Ray Tune

Hi!

I'm using Tune to search for an optimal set of hyperparameters for different models. It works without issues for CatBoost and Keras models; however, I haven't been able to successfully run it with EBM on more than one CPU, let alone on GPU (it seems that EBM can't be run on GPU at all). When I haven't explicitly set n_jobs=1, I get this warning:

UserWarning: Loky-backed parallel loops cannot be nested below threads, setting n_jobs=1

Do you know if there's something I'm missing, or is it not possible to run EBM with Tune on multiple CPUs?

Code to reproduce

import ray
from ray import tune

ray.init(ignore_reinit_error=True)

import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier

def train_ebm(config, reporter):
    n_samples = 100000
    x_train = np.random.rand(n_samples, 3)  # Random training data with 3 features
    y_train = np.random.randint(0, 2, n_samples)  # Random binary labels
    
    model = ExplainableBoostingClassifier(
        learning_rate=config['lr'], 
        n_jobs=12
    )
    
    model.fit(x_train, y_train)
    
    reporter()

ebm_experiment = tune.Experiment(
    name='ebm_test', 
    run=train_ebm, 
    num_samples=1,
    resources_per_trial={
        'gpu': 0,
        'cpu': 12
    },
    config={
        'lr': 0.01,
    }
)

trials = tune.run_experiments(ebm_experiment)

Versions
ray: 0.7.3
interpret: 0.1.15

Custom validation set

Hi,
Would it be possible to add an option to specify the validation data in the EBM fit() function instead of letting sklearn calculate it?

Bump dash version

dash 1.0 was released on June 20, 2019 and 1.1 on August 5, 2019.

The currently supported dash==0.39 is from March 5, 2019.

Range-based instead of exact dependencies would help when building environments where other packages than interpret are installed.
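For illustration, the two styles of specifier side by side (the proposed bounds are only an example):

# exact pin (what interpret currently declares):
dash==0.39.0
# range-based (what this issue proposes):
dash>=1.0,<2.0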

Installation failed on Windows 10

It is a shame Microsoft cannot develop a package that installs on Windows.

The error message is:

  Found existing installation: greenlet 0.4.12
Cannot uninstall 'greenlet'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

Full installation history:

Microsoft Windows [Version 10.0.17763.557]
(c) 2018 Microsoft Corporation. All rights reserved.

C:\Users\sndr\Downloads>pip install numpy scipy pyscaffold
Requirement already satisfied: numpy in c:\users\sndr\anaconda3\lib\site-packages (1.15.4)
Requirement already satisfied: scipy in c:\users\sndr\anaconda3\lib\site-packages (1.0.0)
Collecting pyscaffold
  Downloading https://files.pythonhosted.org/packages/d3/3f/0ce77998683cb7967ba7d98b114b8a6a954a731b812f455dee57f1636853/PyScaffold-3.1-py3-none-any.whl (163kB)
    100% |████████████████████████████████| 174kB 75kB/s
Requirement already satisfied: setuptools>=38.3 in c:\users\sndr\anaconda3\lib\site-packages (from pyscaffold) (40.8.0)
Installing collected packages: pyscaffold
Successfully installed pyscaffold-3.1

C:\Users\sndr\Downloads>pip install -U interpret
Collecting interpret
  Downloading https://files.pythonhosted.org/packages/d8/0c/3b4b55e69dad95131126ffb3eaa7a8b2f43e7796775aa5dd8123531fab8a/interpret-0.1.9-py3-none-any.whl (4.1MB)
    100% |████████████████████████████████| 4.1MB 3.3MB/s
Collecting pytest>=4.3.0 (from interpret)
  Downloading https://files.pythonhosted.org/packages/b3/eb/df264c0b1ff4aaf263375dc09aabd9093364f66060be9b26f3a2c166d558/pytest-4.6.3-py2.py3-none-any.whl (229kB)
    100% |████████████████████████████████| 235kB 6.0MB/s
Collecting gevent>=1.4.0 (from interpret)
  Downloading https://files.pythonhosted.org/packages/51/97/2e1e8aa7ea27171c3e249480d382e78b49ab4cead5dafb2124d2a1b58a83/gevent-1.4.0-cp36-cp36m-win_amd64.whl (3.0MB)
    100% |████████████████████████████████| 3.0MB 3.8MB/s
Collecting skope-rules>=1.0.0 (from interpret)
  Downloading https://files.pythonhosted.org/packages/56/b0/b56fb8d186f35089a469dc788c32ac99cf0276eae567736325b179b71db0/skope-rules-1.0.0.tar.gz (2.0MB)
    100% |████████████████████████████████| 2.0MB 3.4MB/s
Collecting dash==0.39.0 (from interpret)
  Downloading https://files.pythonhosted.org/packages/38/c0/353ba9f56f171389f0b4985f0481805219fc1921d651586c51345b89c1ea/dash-0.39.0.tar.gz (40kB)
    100% |████████████████████████████████| 40kB 2.6MB/s
Requirement already satisfied, skipping upgrade: nbconvert>=5.4.1 in c:\users\sndr\anaconda3\lib\site-packages (from interpret) (5.4.1)
Collecting dash-table-experiments==0.6.0 (from interpret)
  Downloading https://files.pythonhosted.org/packages/6f/4a/e201fe7419a250c35635fb0b81f3cba8cf19ed4e3663fda6cd08e7bd0655/dash_table_experiments-0.6.0.tar.gz (738kB)
    100% |████████████████████████████████| 747kB 4.5MB/s
Collecting dash-renderer==0.20.0 (from interpret)
  Downloading https://files.pythonhosted.org/packages/c4/dd/f686321d054bb1e145d3a7d1f6600516de535b0d597bcf7701dbb96b1262/dash_renderer-0.20.0.tar.gz (920kB)
    100% |████████████████████████████████| 921kB 5.4MB/s
Collecting SALib>=1.3.3 (from interpret)
  Downloading https://files.pythonhosted.org/packages/12/8b/14f6c0f0a12b29d5e1766e7a585269cd6ec9728a63886c161a6eddb4e7fa/SALib-1.3.7.tar.gz (854kB)
    100% |████████████████████████████████| 860kB 3.8MB/s
Collecting lime>=0.1.1.33 (from interpret)
  Downloading https://files.pythonhosted.org/packages/07/20/a4a59ed562610e19fea333da48bb5fab978a72acbe8e831930f444cd69c9/lime-0.1.1.34.tar.gz (272kB)
    100% |████████████████████████████████| 276kB 3.8MB/s
Collecting shap>=0.28.5 (from interpret)
  Downloading https://files.pythonhosted.org/packages/5d/34/4a3e429f969cc69ab4e910154360adab3f56cdde02a42f12e170625e71e1/shap-0.29.1-cp36-cp36m-win_amd64.whl (258kB)
    100% |████████████████████████████████| 266kB 2.8MB/s
Collecting ipython>=7.4.0 (from interpret)
  Downloading https://files.pythonhosted.org/packages/a9/2e/41dce4ed129057e05a555a7f9629aa2d5f81fdcd4d16568bc24b75a1d2c9/ipython-7.5.0-py3-none-any.whl (770kB)
    100% |████████████████████████████████| 778kB 6.5MB/s
Collecting dash-core-components==0.44.0 (from interpret)
  Downloading https://files.pythonhosted.org/packages/07/8b/e7193b60288f62c6c40da7d3fdbd01ccdc6752dbf25e9ef60912a5948938/dash_core_components-0.44.0.tar.gz (4.2MB)
    100% |████████████████████████████████| 4.2MB 3.8MB/s
Collecting scipy>=1.2.1 (from interpret)
  Downloading https://files.pythonhosted.org/packages/9e/fd/9a995b7fc18c6c17ce570b3cfdabffbd2718e4f1830e94777c4fd66e1179/scipy-1.3.0-cp36-cp36m-win_amd64.whl (30.5MB)
    100% |████████████████████████████████| 30.5MB 1.1MB/s
Collecting psutil>=5.6.2 (from interpret)
  Downloading https://files.pythonhosted.org/packages/86/91/f15a3aae2af13f008ed95e02292d1a2e84615ff42b7203357c1c0bbe0651/psutil-5.6.3-cp36-cp36m-win_amd64.whl (234kB)
    100% |████████████████████████████████| 235kB 3.5MB/s
Requirement already satisfied, skipping upgrade: pandas>=0.24.0 in c:\users\sndr\anaconda3\lib\site-packages (from interpret) (0.24.2)
Requirement already satisfied, skipping upgrade: ipykernel>=5.1.0 in c:\users\sndr\anaconda3\lib\site-packages (from interpret) (5.1.0)
Collecting dash-html-components==0.14.0 (from interpret)
  Downloading https://files.pythonhosted.org/packages/08/1f/943c0f90d957fdff6c5968ea80694b2959d0b0ec959be17a1478e3c97e5a/dash_html_components-0.14.0.tar.gz (537kB)
    100% |████████████████████████████████| 542kB 5.9MB/s
Collecting scikit-learn>=0.20.0 (from interpret)
  Downloading https://files.pythonhosted.org/packages/a9/bc/18663f6d75838b73353ba49fabd631347e68470ec9e623d7b3f3ccd4f426/scikit_learn-0.21.2-cp36-cp36m-win_amd64.whl (5.9MB)
    100% |████████████████████████████████| 5.9MB 3.4MB/s
Collecting plotly>=3.8.1 (from interpret)
  Downloading https://files.pythonhosted.org/packages/ff/75/3982bac5076d0ce6d23103c03840fcaec90c533409f9d82c19f54512a38a/plotly-3.10.0-py2.py3-none-any.whl (41.5MB)
    100% |████████████████████████████████| 41.5MB 390kB/s
Requirement already satisfied, skipping upgrade: joblib>=0.12.5 in c:\users\sndr\anaconda3\lib\site-packages (from interpret) (0.13.1)
Collecting dash-cytoscape==0.1.1 (from interpret)
  Downloading https://files.pythonhosted.org/packages/aa/93/d9db22331dcad4a055631372816bf4544a1a1a852fb2fa3a2905c6682198/dash_cytoscape-0.1.1.tar.gz (3.4MB)
    100% |████████████████████████████████| 3.4MB 3.2MB/s
Collecting pytest-runner>=4.4 (from interpret)
  Downloading https://files.pythonhosted.org/packages/f8/31/f291d04843523406f242e63b5b90f7b204a756169b4250ff213e10326deb/pytest_runner-5.1-py2.py3-none-any.whl
Requirement already satisfied, skipping upgrade: numpy>=1.15.1 in c:\users\sndr\anaconda3\lib\site-packages (from interpret) (1.15.4)
Requirement already satisfied, skipping upgrade: six>=1.10.0 in c:\users\sndr\anaconda3\lib\site-packages (from pytest>=4.3.0->interpret) (1.11.0)
Requirement already satisfied, skipping upgrade: py>=1.5.0 in c:\users\sndr\anaconda3\lib\site-packages (from pytest>=4.3.0->interpret) (1.5.2)
Requirement already satisfied, skipping upgrade: colorama; sys_platform == "win32" in c:\users\sndr\anaconda3\lib\site-packages (from pytest>=4.3.0->interpret) (0.3.9)
Collecting pluggy<1.0,>=0.12 (from pytest>=4.3.0->interpret)
  Downloading https://files.pythonhosted.org/packages/06/ee/de89e0582276e3551df3110088bf20844de2b0e7df2748406876cc78e021/pluggy-0.12.0-py2.py3-none-any.whl
Collecting importlib-metadata>=0.12 (from pytest>=4.3.0->interpret)
  Downloading https://files.pythonhosted.org/packages/bd/23/dce4879ec58acf3959580bfe769926ed8198727250c5e395e6785c764a02/importlib_metadata-0.18-py2.py3-none-any.whl
Requirement already satisfied, skipping upgrade: attrs>=17.4.0 in c:\users\sndr\anaconda3\lib\site-packages (from pytest>=4.3.0->interpret) (17.4.0)
Requirement already satisfied, skipping upgrade: packaging in c:\users\sndr\anaconda3\lib\site-packages (from pytest>=4.3.0->interpret) (16.8)
Requirement already satisfied, skipping upgrade: wcwidth in c:\users\sndr\anaconda3\lib\site-packages (from pytest>=4.3.0->interpret) (0.1.7)
Collecting atomicwrites>=1.0 (from pytest>=4.3.0->interpret)
  Downloading https://files.pythonhosted.org/packages/52/90/6155aa926f43f2b2a22b01be7241be3bfd1ceaf7d0b3267213e8127d41f4/atomicwrites-1.3.0-py2.py3-none-any.whl
Requirement already satisfied, skipping upgrade: more-itertools>=4.0.0; python_version > "2.7" in c:\users\sndr\anaconda3\lib\site-packages (from pytest>=4.3.0->interpret) (6.0.0)
Collecting greenlet>=0.4.14; platform_python_implementation == "CPython" (from gevent>=1.4.0->interpret)
  Downloading https://files.pythonhosted.org/packages/a9/a3/2a7a15c2dc23f764eaed46d41e081659aadf45570b4170156dde1c76d4f7/greenlet-0.4.15-cp36-cp36m-win_amd64.whl
Collecting cffi>=1.11.5; sys_platform == "win32" and platform_python_implementation == "CPython" (from gevent>=1.4.0->interpret)
  Downloading https://files.pythonhosted.org/packages/f1/b5/ca3583cbf7975f53b030be773caeabd4e19bac467714e525eaff447a8ac8/cffi-1.12.3-cp36-cp36m-win_amd64.whl (171kB)
    100% |████████████████████████████████| 174kB 2.1MB/s
Requirement already satisfied, skipping upgrade: Flask>=0.12 in c:\users\sndr\anaconda3\lib\site-packages (from dash==0.39.0->interpret) (0.12.2)
Collecting flask-compress (from dash==0.39.0->interpret)
  Downloading https://files.pythonhosted.org/packages/0e/2a/378bd072928f6d92fd8c417d66b00c757dc361c0405a46a0134de6fd323d/Flask-Compress-1.4.0.tar.gz
Collecting dash-table==3.6.0 (from dash==0.39.0->interpret)
  Downloading https://files.pythonhosted.org/packages/a3/3a/eae584bb7eccdf93d2931c4ebf43e55937cf22d51ad63551241fc83d68fc/dash_table-3.6.0.tar.gz (468kB)
    100% |████████████████████████████████| 471kB 5.1MB/s
Requirement already satisfied, skipping upgrade: mistune>=0.8.1 in c:\users\sndr\anaconda3\lib\site-packages (from nbconvert>=5.4.1->interpret) (0.8.3)
Requirement already satisfied, skipping upgrade: jinja2 in c:\users\sndr\anaconda3\lib\site-packages (from nbconvert>=5.4.1->interpret) (2.10)
Requirement already satisfied, skipping upgrade: pygments in c:\users\sndr\anaconda3\lib\site-packages (from nbconvert>=5.4.1->interpret) (2.2.0)
Requirement already satisfied, skipping upgrade: traitlets>=4.2 in c:\users\sndr\anaconda3\lib\site-packages (from nbconvert>=5.4.1->interpret) (4.3.2)
Requirement already satisfied, skipping upgrade: jupyter_core in c:\users\sndr\anaconda3\lib\site-packages (from nbconvert>=5.4.1->interpret) (4.4.0)
Requirement already satisfied, skipping upgrade: nbformat>=4.4 in c:\users\sndr\anaconda3\lib\site-packages (from nbconvert>=5.4.1->interpret) (4.4.0)
Requirement already satisfied, skipping upgrade: entrypoints>=0.2.2 in c:\users\sndr\anaconda3\lib\site-packages (from nbconvert>=5.4.1->interpret) (0.2.3)
Requirement already satisfied, skipping upgrade: bleach in c:\users\sndr\anaconda3\lib\site-packages (from nbconvert>=5.4.1->interpret) (3.1.0)
Requirement already satisfied, skipping upgrade: pandocfilters>=1.4.1 in c:\users\sndr\anaconda3\lib\site-packages (from nbconvert>=5.4.1->interpret) (1.4.2)
Requirement already satisfied, skipping upgrade: testpath in c:\users\sndr\anaconda3\lib\site-packages (from nbconvert>=5.4.1->interpret) (0.3.1)
Requirement already satisfied, skipping upgrade: defusedxml in c:\users\sndr\anaconda3\lib\site-packages (from nbconvert>=5.4.1->interpret) (0.5.0)
Requirement already satisfied, skipping upgrade: matplotlib in c:\users\sndr\anaconda3\lib\site-packages (from SALib>=1.3.3->interpret) (2.2.2)
Requirement already satisfied, skipping upgrade: scikit-image>=0.12 in c:\users\sndr\anaconda3\lib\site-packages (from lime>=0.1.1.33->interpret) (0.13.1)
Requirement already satisfied, skipping upgrade: tqdm in c:\users\sndr\anaconda3\lib\site-packages (from shap>=0.28.5->interpret) (4.26.0)
Collecting prompt-toolkit<2.1.0,>=2.0.0 (from ipython>=7.4.0->interpret)
  Downloading https://files.pythonhosted.org/packages/f7/a7/9b1dd14ef45345f186ef69d175bdd2491c40ab1dfa4b2b3e4352df719ed7/prompt_toolkit-2.0.9-py3-none-any.whl (337kB)
    100% |████████████████████████████████| 337kB 6.8MB/s
Requirement already satisfied, skipping upgrade: pickleshare in c:\users\sndr\anaconda3\lib\site-packages (from ipython>=7.4.0->interpret) (0.7.4)
Requirement already satisfied, skipping upgrade: jedi>=0.10 in c:\users\sndr\anaconda3\lib\site-packages (from ipython>=7.4.0->interpret) (0.11.1)
Requirement already satisfied, skipping upgrade: setuptools>=18.5 in c:\users\sndr\anaconda3\lib\site-packages (from ipython>=7.4.0->interpret) (40.8.0)
Requirement already satisfied, skipping upgrade: decorator in c:\users\sndr\anaconda3\lib\site-packages (from ipython>=7.4.0->interpret) (4.2.1)
Collecting backcall (from ipython>=7.4.0->interpret)
  Downloading https://files.pythonhosted.org/packages/84/71/c8ca4f5bb1e08401b916c68003acf0a0655df935d74d93bf3f3364b310e0/backcall-0.1.0.tar.gz
Requirement already satisfied, skipping upgrade: pytz>=2011k in c:\users\sndr\anaconda3\lib\site-packages (from pandas>=0.24.0->interpret) (2018.9)
Requirement already satisfied, skipping upgrade: python-dateutil>=2.5.0 in c:\users\sndr\anaconda3\lib\site-packages (from pandas>=0.24.0->interpret) (2.6.1)
Requirement already satisfied, skipping upgrade: jupyter-client in c:\users\sndr\anaconda3\lib\site-packages (from ipykernel>=5.1.0->interpret) (5.2.4)
Requirement already satisfied, skipping upgrade: tornado>=4.2 in c:\users\sndr\anaconda3\lib\site-packages (from ipykernel>=5.1.0->interpret) (4.5.3)
Requirement already satisfied, skipping upgrade: requests in c:\users\sndr\anaconda3\lib\site-packages (from plotly>=3.8.1->interpret) (2.18.4)
Collecting retrying>=1.3.3 (from plotly>=3.8.1->interpret)
  Downloading https://files.pythonhosted.org/packages/44/ef/beae4b4ef80902f22e3af073397f079c96969c69b2c7d52a57ea9ae61c9d/retrying-1.3.3.tar.gz
Collecting zipp>=0.5 (from importlib-metadata>=0.12->pytest>=4.3.0->interpret)
  Downloading https://files.pythonhosted.org/packages/a0/0f/9bf71d438d2e9d5fd0e4569ea4d1a2b6f5a524c234c6d221b494298bb4d1/zipp-0.5.1-py2.py3-none-any.whl
Requirement already satisfied, skipping upgrade: pyparsing in c:\users\sndr\anaconda3\lib\site-packages (from packaging->pytest>=4.3.0->interpret) (2.2.0)
Requirement already satisfied, skipping upgrade: pycparser in c:\users\sndr\anaconda3\lib\site-packages (from cffi>=1.11.5; sys_platform == "win32" and platform_python_implementation == "CPython"->gevent>=1.4.0->interpret) (2.18)
Requirement already satisfied, skipping upgrade: Werkzeug>=0.7 in c:\users\sndr\anaconda3\lib\site-packages (from Flask>=0.12->dash==0.39.0->interpret) (0.14.1)
Requirement already satisfied, skipping upgrade: itsdangerous>=0.21 in c:\users\sndr\anaconda3\lib\site-packages (from Flask>=0.12->dash==0.39.0->interpret) (0.24)
Requirement already satisfied, skipping upgrade: click>=2.0 in c:\users\sndr\anaconda3\lib\site-packages (from Flask>=0.12->dash==0.39.0->interpret) (6.7)
Requirement already satisfied, skipping upgrade: MarkupSafe>=0.23 in c:\users\sndr\anaconda3\lib\site-packages (from jinja2->nbconvert>=5.4.1->interpret) (1.0)
Requirement already satisfied, skipping upgrade: ipython_genutils in c:\users\sndr\anaconda3\lib\site-packages (from traitlets>=4.2->nbconvert>=5.4.1->interpret) (0.2.0)
Requirement already satisfied, skipping upgrade: jsonschema!=2.5.0,>=2.4 in c:\users\sndr\anaconda3\lib\site-packages (from nbformat>=4.4->nbconvert>=5.4.1->interpret) (2.6.0)
Requirement already satisfied, skipping upgrade: webencodings in c:\users\sndr\anaconda3\lib\site-packages (from bleach->nbconvert>=5.4.1->interpret) (0.5.1)
Requirement already satisfied, skipping upgrade: cycler>=0.10 in c:\users\sndr\anaconda3\lib\site-packages (from matplotlib->SALib>=1.3.3->interpret) (0.10.0)
Requirement already satisfied, skipping upgrade: kiwisolver>=1.0.1 in c:\users\sndr\anaconda3\lib\site-packages (from matplotlib->SALib>=1.3.3->interpret) (1.0.1)
Requirement already satisfied, skipping upgrade: networkx>=1.8 in c:\users\sndr\anaconda3\lib\site-packages (from scikit-image>=0.12->lime>=0.1.1.33->interpret) (1.11)
Requirement already satisfied, skipping upgrade: pillow>=2.1.0 in c:\users\sndr\anaconda3\lib\site-packages (from scikit-image>=0.12->lime>=0.1.1.33->interpret) (5.0.0)
Requirement already satisfied, skipping upgrade: PyWavelets>=0.4.0 in c:\users\sndr\anaconda3\lib\site-packages (from scikit-image>=0.12->lime>=0.1.1.33->interpret) (0.5.2)
Requirement already satisfied, skipping upgrade: parso==0.1.* in c:\users\sndr\anaconda3\lib\site-packages (from jedi>=0.10->ipython>=7.4.0->interpret) (0.1.1)
Requirement already satisfied, skipping upgrade: pyzmq>=13 in c:\users\sndr\anaconda3\lib\site-packages (from jupyter-client->ipykernel>=5.1.0->interpret) (16.0.3)
Requirement already satisfied, skipping upgrade: chardet<3.1.0,>=3.0.2 in c:\users\sndr\anaconda3\lib\site-packages (from requests->plotly>=3.8.1->interpret) (3.0.4)
Requirement already satisfied, skipping upgrade: idna<2.7,>=2.5 in c:\users\sndr\anaconda3\lib\site-packages (from requests->plotly>=3.8.1->interpret) (2.6)
Requirement already satisfied, skipping upgrade: urllib3<1.23,>=1.21.1 in c:\users\sndr\anaconda3\lib\site-packages (from requests->plotly>=3.8.1->interpret) (1.22)
Requirement already satisfied, skipping upgrade: certifi>=2017.4.17 in c:\users\sndr\anaconda3\lib\site-packages (from requests->plotly>=3.8.1->interpret) (2019.3.9)
Building wheels for collected packages: skope-rules, dash, dash-table-experiments, dash-renderer, SALib, lime, dash-core-components, dash-html-components, dash-cytoscape, flask-compress, dash-table, backcall, retrying
  Building wheel for skope-rules (setup.py) ... done
  Stored in directory: C:\Users\sndr\AppData\Local\pip\Cache\wheels\3e\8d\56\464f328ff3200c785626967ee39a6b2efc455469dab615f03e
  Building wheel for dash (setup.py) ... done
  Stored in directory: C:\Users\sndr\AppData\Local\pip\Cache\wheels\fb\75\e5\278d80ca56f3c1d623565079cacf3db4e672948d34311e0c91
  Building wheel for dash-table-experiments (setup.py) ... done
  Stored in directory: C:\Users\sndr\AppData\Local\pip\Cache\wheels\17\46\7c\936c2a123c17673d9f46ecc74e1692a118673009bc92c192ae
  Building wheel for dash-renderer (setup.py) ... done
  Stored in directory: C:\Users\sndr\AppData\Local\pip\Cache\wheels\6f\33\33\6473598a2a280dcfe8507b020b66da25dafe063fff31bb28f6
  Building wheel for SALib (setup.py) ... done
  Stored in directory: C:\Users\sndr\AppData\Local\pip\Cache\wheels\73\94\42\b5160b20f13581c0e7e4d9bc0afa77828900296f8bca82bafe
  Building wheel for lime (setup.py) ... done
  Stored in directory: C:\Users\sndr\AppData\Local\pip\Cache\wheels\2f\8e\c1\c1cddd9cf8fbae812904fa5c84ef571e782891288d309d04c8
  Building wheel for dash-core-components (setup.py) ... done
  Stored in directory: C:\Users\sndr\AppData\Local\pip\Cache\wheels\83\ac\bb\68cefc4f1e6ec359183f3d198cadbec07193b1e3087256a5a2
  Building wheel for dash-html-components (setup.py) ... done
  Stored in directory: C:\Users\sndr\AppData\Local\pip\Cache\wheels\72\e5\cd\a82fd0f01affb14d3f3f19a19407f32a1845825603a7f9664b
  Building wheel for dash-cytoscape (setup.py) ... done
  Stored in directory: C:\Users\sndr\AppData\Local\pip\Cache\wheels\32\bd\34\4c0a61c252c4bcee42ab4943e66e7c2d1f7809de90d4caf070
  Building wheel for flask-compress (setup.py) ... done
  Stored in directory: C:\Users\sndr\AppData\Local\pip\Cache\wheels\96\32\88\a1f6d9dd3c29570ab3a8acc0d556b3b20abcf3c623c868ce0a
  Building wheel for dash-table (setup.py) ... done
  Stored in directory: C:\Users\sndr\AppData\Local\pip\Cache\wheels\b9\7e\8a\1249b5961f59668eba0471800e618c47b4219f77e2887536bd
  Building wheel for backcall (setup.py) ... done
  Stored in directory: C:\Users\sndr\AppData\Local\pip\Cache\wheels\98\b0\dd\29e28ff615af3dda4c67cab719dd51357597eabff926976b45
  Building wheel for retrying (setup.py) ... done
  Stored in directory: C:\Users\sndr\AppData\Local\pip\Cache\wheels\d7\a9\33\acc7b709e2a35caa7d4cae442f6fe6fbf2c43f80823d46460c
Successfully built skope-rules dash dash-table-experiments dash-renderer SALib lime dash-core-components dash-html-components dash-cytoscape flask-compress dash-table backcall retrying
jupyter-console 5.2.0 has requirement prompt_toolkit<2.0.0,>=1.0.0, but you'll have prompt-toolkit 2.0.9 which is incompatible.
datashader 0.6.9 has requirement dask[complete]>=0.18.0, but you'll have dask 0.16.1 which is incompatible.
Installing collected packages: zipp, importlib-metadata, pluggy, atomicwrites, pytest, greenlet, cffi, gevent, scipy, scikit-learn, skope-rules, flask-compress, retrying, plotly, dash-renderer, dash-core-components, dash-html-components, dash-table, dash, dash-table-experiments, SALib, lime, prompt-toolkit, backcall, ipython, shap, psutil, dash-cytoscape, pytest-runner, interpret
  Found existing installation: pluggy 0.6.0
    Uninstalling pluggy-0.6.0:
      Successfully uninstalled pluggy-0.6.0
  Found existing installation: pytest 3.3.2
    Uninstalling pytest-3.3.2:
      Successfully uninstalled pytest-3.3.2
  Found existing installation: greenlet 0.4.12
Cannot uninstall 'greenlet'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

C:\Users\sndr\Downloads>
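
A common workaround for this "Cannot uninstall" failure with distutils-installed packages (a suggestion based on standard pip behavior, not an official fix from the maintainers) is to tell pip to install over the old copy instead of trying to remove it:

pip install --ignore-installed greenlet
pip install -U interpret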

Example of an interaction term from GA2M (a.k.a. EBM)?

GAMs are non-linear terms per feature, combined in a linear way.
GA2Ms also include pairwise interactions, chosen in a heuristically efficient way with FAST.

If I use an explainable boosting classifier/regressor, how can I tell whether it considered interaction terms?

Can you document an example where interaction terms are used, including graphs?

Thanks.
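
Until there are official docs, here is a minimal sketch of how one might check for pairwise terms, assuming the interactions parameter and that the global explanation's overall data() dictionary exposes the term names (the exact separator in combined names varies by release):

import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show

# Toy data, just to keep the sketch self-contained.
X_train = np.random.random((100, 20))
y_train = np.random.randint(2, size=100)

# Ask FAST to select up to 10 pairwise interaction terms.
ebm = ExplainableBoostingClassifier(interactions=10)
ebm.fit(X_train, y_train)

# Assumption: the overall data() dict lists one name per term; pairwise
# terms appear with combined names such as "feature_0001 x feature_0002".
ebm_global = ebm.explain_global()
print(ebm_global.data()['names'])

# In the dashboard, pairwise terms render as 2-D heatmaps.
show(ebm_global)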

Dependency on Matplotlib 2.1.0 in Linux

Hi,

I'm trying to install inside a Linux Docker container. There seems to be a dependency on an older version of matplotlib (2.1.0) that does not install cleanly, while the latest matplotlib (3.0.3) installs fine. Could you please upgrade the dependency, or else suggest a workaround?

ERROR: Complete output from command python setup.py egg_info:
ERROR: IMPORTANT WARNING:
pkg-config is not installed.
matplotlib may not be able to find some of its dependencies
============================================================================
Edit setup.cfg to change the build options

BUILDING MATPLOTLIB
            matplotlib: yes [2.1.0]
                python: yes [3.7.3 | packaged by conda-forge | (default, Mar
                        27 2019, 23:01:00)  [GCC 7.3.0]]
              platform: yes [linux]

REQUIRED DEPENDENCIES AND EXTENSIONS
                 numpy: yes [version 1.16.3]
                   six: yes [using six version 1.12.0]
              dateutil: yes [using dateutil version 2.8.0]
backports.functools_lru_cache: yes [Not required]
          subprocess32: yes [Not required]
                  pytz: yes [using pytz version 2019.1]
                cycler: yes [using cycler version 0.10.0]
               tornado: yes [using tornado version 6.0.2]
             pyparsing: yes [using pyparsing version 2.4.0]
                libagg: yes [pkg-config information for 'libagg' could not
                        be found. Using local copy.]
              freetype: no  [The C/C++ header for freetype2 (ft2build.h)
                        could not be found.  You may need to install the
                        development package.]
                   png: no  [pkg-config information for 'libpng' could not
                        be found.]
                 qhull: yes [pkg-config information for 'libqhull' could not
                        be found. Using local copy.]

OPTIONAL SUBPACKAGES
           sample_data: yes [installing]
              toolkits: yes [installing]
                 tests: no  [skipping due to configuration]
        toolkits_tests: no  [skipping due to configuration]

OPTIONAL BACKEND EXTENSIONS
                macosx: no  [Mac OS-X only]
                qt5agg: no  [PySide2 not found; PyQt5 not found]
                qt4agg: no  [PySide not found; PyQt4 not found]
               gtk3agg: no  [Requires pygobject to be installed.]
             gtk3cairo: no  [Requires cairocffi or pycairo to be installed.]
                gtkagg: no  [Requires pygtk]
                 tkagg: yes [installing; run-time loading from Python Tcl /
                        Tk]
                 wxagg: no  [requires wxPython]
                   gtk: no  [Requires pygtk]
                   agg: yes [installing]
                 cairo: no  [cairocffi or pycairo not found]
             windowing: no  [Microsoft Windows only]

OPTIONAL LATEX DEPENDENCIES
                dvipng: no
           ghostscript: no
                 latex: yes [version 3.14159265]
               pdftops: no

OPTIONAL PACKAGE DATA
                  dlls: no  [skipping due to configuration]

============================================================================
                        * The following required packages can not be built:
                        *   freetype, png
                        * Try installing freetype with `apt-get install libfreetype6-dev`
                        * Try installing png with `apt-get install libpng12-dev`
----------------------------------------

ERROR: Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-tvjuo8p6/matplotlib/

Using interpretML in a python script

Hi!
I was wondering if it's possible to use the library in a Python script. I've tried to do that for SHAP plots, expecting interpretML to produce a link to a dashboard:

from interpret.blackbox import ShapKernel
from interpret import show

shap = ShapKernel(predict_fn=bbc.predict, data=background_val, feature_names=feature_names)
shap_local = shap.explain_local(most_probable.drop(['prediction', 'probabiltiy', 'target'], axis=1),
                                most_probable.target, name='SHAP')
show([shap_local])

The last line produces:
{'text/html': '<!-- http://127.0.0.1:7974/140466213906184/ -->\n<a href="http://127.0.0.1:7974/140466213906184/" target="_new">Open in new window</a><iframe src="http://127.0.0.1:7974/140466213906184/" width=100% height=800 frameBorder="0"></iframe>'} as in jupyter notebook.

However, when I open the link it says this site can't be reached.
Is there a way to call dashboards from the console, or is the library meant to be used exclusively in Jupyter?
Thanks!
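
For context, the dashboard is served by an in-process server, so the link stops working as soon as the script exits. A minimal sketch that keeps a plain script alive (the blocking input() is just one way to do it, not an official API; the toy model stands in for the SHAP explanation above):

import numpy as np
from interpret import show, set_show_addr
from interpret.glassbox import ExplainableBoostingClassifier

set_show_addr(('127.0.0.1', 7001))  # pin the server to a predictable port

# Toy model so the sketch is self-contained.
X = np.random.random((100, 4))
y = np.random.randint(2, size=100)
ebm = ExplainableBoostingClassifier().fit(X, y)
show(ebm.explain_global())

# Keep the process (and therefore the dashboard server) running until done.
input("Dashboard at http://127.0.0.1:7001/ -- press Enter to exit.")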

How to capture the URL being generated each time?

Dear Interpret Team,

How can I capture the URL being generated for the dashboard. I need to capture this URL and integrate with third party UI. If you can suggest an idea to overcome this technical hurdle, it would be of great help.

Rakesh
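
A sketch of one approach while waiting for an answer: pin the address with set_show_addr and read it back with get_show_addr (both appear in the public API); anything beyond the base URL, such as the per-explanation path in the generated iframe HTML, is created per object and should be treated as an assumption:

from interpret import set_show_addr, get_show_addr

# Fix the address up front so it is predictable across runs...
set_show_addr(('127.0.0.1', 7001))

# ...or query whatever address the dashboard server is actually using.
host, port = get_show_addr()
base_url = "http://{}:{}/".format(host, port)
print(base_url)  # hand this base URL to the third-party UI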

"GLIBC_2.14" error

I get "GLIBC_2.14" error when I run the below command. Is there a way to solve this without root access - in my conda env

my os:
Distributor ID: CentOS
Description: CentOS release 6.6 (Final)
Release: 6.6

from interpret.glassbox import ExplainableBoostingClassifier, LogisticRegression, ClassificationTree, DecisionListClassifier

ebm = ExplainableBoostingClassifier(random_state=seed)
ebm.fit(X_train, y_train)

/lib64/libc.so.6: version `GLIBC_2.14' not found


OSError                                   Traceback (most recent call last)
<ipython-input> in <module>
----> 1 from interpret.glassbox import ExplainableBoostingClassifier, LogisticRegression, ClassificationTree, DecisionListClassifier
      2
      3 ebm = ExplainableBoostingClassifier(random_state=seed)
      4 ebm.fit(X_train, y_train)  # Works on dataframes and numpy arrays

~/anaconda/envs/fastai/lib/python3.7/site-packages/interpret/glassbox/__init__.py in <module>
      5 from .linear import LogisticRegression, LinearRegression  # noqa: F401
      6 from .skoperules import DecisionListClassifier  # noqa: F401
----> 7 from .ebm.ebm import ExplainableBoostingClassifier  # noqa: F401
      8 from .ebm.ebm import ExplainableBoostingRegressor  # noqa: F401

~/anaconda/envs/fastai/lib/python3.7/site-packages/interpret/glassbox/ebm/ebm.py in <module>
      5 from ...utils import perf_dict
      6 from .utils import EBMUtils
----> 7 from .internal import NativeEBM
      8 from ...utils import unify_data, autogen_schema
      9 from ...api.base import ExplainerMixin

~/anaconda/envs/fastai/lib/python3.7/site-packages/interpret/glassbox/ebm/internal.py in <module>
     63
     64
---> 65 Lib = load_library(debug=False)
     66
     67 # C-level interface

~/anaconda/envs/fastai/lib/python3.7/site-packages/interpret/glassbox/ebm/internal.py in load_library(debug)
     59     is_debug = debug
     60
---> 61     lib = ct.cdll.LoadLibrary(get_ebm_lib_path(debug=is_debug))
     62     return lib
     63

~/anaconda/envs/fastai/lib/python3.7/ctypes/__init__.py in LoadLibrary(self, name)
    432
    433     def LoadLibrary(self, name):
--> 434         return self._dlltype(name)
    435
    436 cdll = LibraryLoader(CDLL)

~/anaconda/envs/fastai/lib/python3.7/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
    354
    355         if handle is None:
--> 356             self._handle = _dlopen(self._name, mode)
    357         else:
    358             self._handle = handle

OSError: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/narjunan/anaconda/envs/fastai/lib/python3.7/site-packages/interpret/glassbox/ebm/../../lib/ebmcore_linux_x64.so)

requirements.txt doesn't have all the requirements

The README says:

pip install numpy scipy pyscaffold
pip install -U interpret

I am confused why the project's requirements.txt (or equivalent) doesn't just also list numpy, scipy, and pyscaffold if they are, well, required.

If they're already in the environment, they won't be reinstalled, so listing them would be harmless.

I see in setup.py they are listed separately, with the comment:

  # NOTE: Numpy here is a workaround to skope-rules' dependencies.

so I assume you have your reasons?

Feature importance with interactions

Thank you for giving a detailed explanation of feature importance on this page (#12). My question is: what if the EBM model has interactions (say, I pass interactions=10 when training ExplainableBoostingRegressor, so it has 10 pairwise interactions)?

In particular, for a given feature, if some interactions containing it are selected, would those be involved in the calculation of "how each training data point is scored by that feature"? Would this affect the global interpretation of the model, and/or the local interpretation?

Thank you.
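
A hedged sketch of how to inspect this empirically: list every term's overall importance and check whether the selected pairwise terms appear as separate entries next to the single-feature terms (the data() dictionary layout below is an assumption based on the structures the dashboard exports, not a documented contract):

import numpy as np
from interpret.glassbox import ExplainableBoostingRegressor

# Toy data with a built-in pairwise interaction.
X = np.random.random((200, 5))
y = X[:, 0] * X[:, 1] + np.random.normal(scale=0.1, size=200)

ebm = ExplainableBoostingRegressor(interactions=10)
ebm.fit(X, y)

# Pairwise terms are listed as their own entries, so they carry their own
# importance scores rather than folding into the single-feature scores.
overall = ebm.explain_global().data()
for name, score in zip(overall['names'], overall['scores']):
    print(name, score)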

Run by jupyter

It's not an issue, just a note that I hope is helpful for newbies :)

I run this on Win10 + Anaconda-Jupyter + Python3.7

Some trouble:
I couldn't search for & install interpret in Anaconda, so I installed it with pip and added the library location to sys.path in code:


import sys
sys.path.append("D:\software\python\python37\Lib\site-packages") # my python3.7 location

import numpy as np
x_train = np.random.random((100, 20))
y_train = np.random.randint(2, size=(100, 1))

from interpret.glassbox import ExplainableBoostingClassifier
ebm = ExplainableBoostingClassifier()
ebm.fit(x_train, y_train)
ebm_global = ebm.explain_global()

from interpret import show
from interpret import set_show_addr, get_show_addr
set_show_addr(('127.0.0.1', 7001)) # Will run on 127.0.0.1 at port 7001
show(ebm_global) 

I got the result in Jupyter, but when I try visiting "http://127.0.0.1:7001/" directly in a browser, the page shows "Internal Server Error" and Jupyter prints the following (not a problem, just wondering what happened):

Traceback (most recent call last):
  File "D:\hw\software\Anaconda3\lib\site-packages\gevent\pywsgi.py", line 976, in handle_one_response
    self.run_application()
  File "D:\hw\software\Anaconda3\lib\site-packages\gevent\pywsgi.py", line 923, in run_application
    self.result = self.application(self.environ, self.start_response)
  File "D:\software\python\python37\Lib\site-packages\interpret\visual\dashboard.py", line 187, in __call__
    app = self.pool[ctx_id]
KeyError: 'favicon.ico'
2019-05-20T10:43:12Z {'REMOTE_ADDR': '127.0.0.1', 'REMOTE_PORT': '59185', 'HTTP_HOST': '127.0.0.1:7001', (hidden keys: 23)} failed with KeyError

Show function while on the remote cloud

Sorry if I make any grammatical errors (not a native speaker).

I've read this issue: #17

I think my problem is very similar; it might be some kind of server issue.

I have a virtual environment on Amazon's AWS EC2, opened a Jupyter notebook there, and connect to it from my local machine via SSH.

But when I use the show function (like the one in "Explaining Blackbox Classifiers.ipynb"), the plot shows "127.0.0.1 refused to connect".

Is there any way I can still get the interactive plot in the cloud Jupyter?
I can successfully plot an interactive one in local Jupyter, but on the cloud it just fails.
Thank you for your attention!
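
Since the dashboard binds to 127.0.0.1 on the EC2 box, a browser on the local machine can only reach it through a tunnel. A hedged sketch (port 7001 is an arbitrary choice; the ssh command is standard port forwarding, nothing interpret-specific):

from interpret import set_show_addr

# On the EC2 host, pin the dashboard to a known port before calling show().
set_show_addr(('127.0.0.1', 7001))

# On the local machine, forward that port through SSH:
#   ssh -L 7001:127.0.0.1:7001 <user>@<ec2-host>
# then open http://127.0.0.1:7001/ in the local browser.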

[BUG] Unexpected Error Message in the notebook

When running the Interpretable Classification Methods notebook, if explain_global or explain_local is called on ExplainableBoostingClassifier without fitting the model first, the NotFittedError is not raised. Instead an AttributeError is raised.

Similarly, when running the Interpretable Regression Methods notebook, if explain_global or explain_local is called on ExplainableBoostingRegressor without fitting the model first, the NotFittedError is not raised. Instead an AttributeError is raised.
(Screenshot attached: interpret_error)
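
A minimal repro sketch of the report (fresh, unfitted estimator):

from interpret.glassbox import ExplainableBoostingClassifier

ebm = ExplainableBoostingClassifier()
try:
    ebm.explain_global()  # called before fit()
except Exception as e:
    # Prints AttributeError rather than sklearn's NotFittedError.
    print(type(e).__name__, e)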

Issue in barplot

Hi,
In ebm.explain_global(), in the categorical feature plots (barplots), the calculated x-axis values seem wrong to me. For categorical features an extra nan appears in the 'specific' 'names' list; this causes a length mismatch between 'names' and 'scores', so the last categorical value is dropped from the plot.
And there are no null values in my data.
For example:
{'density': {'names': [nan, 'abc', 'efg'], 'scores': [66486, 118521]}, 'lower_bounds': [-2.246691779337516, 1.1945678479167394], 'names': [nan, 'abc', 'efg'], 'scores': [-2.136401943158369, 1.1984443228864698], 'type': 'univariate', 'upper_bounds': [-2.026112106979222, 1.2023207978562003]}

Segmentation faults since interpret 0.1.11

Hi, thank you for your great work.

I just tried updating from 0.1.10 to 0.1.11 using pip and I am getting segmentation faults.
The issue does not seem to originate from a memory resource limitation (I have a 10 GB free-memory margin when using 0.1.10).
I was not able to diagnose further.

Platform : x86_64, Linux, ubuntu 16

Code extract: (works fine with 0.1.10)

_ebm = ExplainableBoostingRegressor(
    n_estimators=16,
    learning_rate=0.1,
    early_stopping_run_length=10,
    data_n_episodes=200,
    n_jobs=1, # same error with n_jobs=-2
    feature_names=_categorical_features + _numerical_features,
    feature_types= ['categorical'] * len(_categorical_features) + ['continuous'] * len(_numerical_features))

_ebm.fit(X, y)  # X: 108,728 x 39 numpy array

GDB trace:

# Segmentation fault (core dumped):

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
__GI___libc_free (mem=0x2) at malloc.c:2951
2951    malloc.c: No such file or directory.
(gdb) bt
#0  __GI___libc_free (mem=0x2) at malloc.c:2951
#1  0x00007ffed376128c in DataSetAttributeCombination::~DataSetAttributeCombination() ()
   from /home/ubuntu/miniconda3/envs/env/lib/python3.7/site-packages/interpret/glassbox/ebm/../../lib/lib_ebmcore_linux_x64.so
#2  0x00007ffed377508d in FreeTraining ()
   from /home/ubuntu/miniconda3/envs/env/lib/python3.7/site-packages/interpret/glassbox/ebm/../../lib/lib_ebmcore_linux_x64.so

Python logs tail (lots of INFO and ERROR messages since this release!):

[...]
INFO: Entered GetBestModel: ebmTraining=0x555e1f55ad90, indexAttributeCombination=38
INFO: Exited GetBestModel 0x555e24b7e6f0
INFO: Deallocation start
INFO: Entered FreeTraining: ebmTraining=0x555e1f55ad90
INFO: Entered ~EbmTrainingState
INFO: ~EbmTrainingState identified as regression type
INFO: Entered ~CachedTrainingThreadResources
INFO: Exited ~CachedTrainingThreadResources
INFO: Entered SamplingWithReplacement::FreeSamplingSets
INFO: Entered ~SamplingWithReplacement
INFO: Exited ~SamplingWithReplacement
INFO: Exited SamplingWithReplacement::FreeSamplingSets
INFO: Entered ~DataSetAttributeCombination

Segmentation fault (core dumped)

(Full log here: ebm_error_log.txt.)

amazing work

(ok not an issue)
really :) had to properly register it
huge congrats

Worked without issues on Windows too.

Doing some treatment for missing values

Can we add some random values, or treat them as "null", for missing data? That way we can avoid errors during data exploration using:

from interpret.data import ClassHistogram
hist = ClassHistogram().explain_data(X_train, y_train, name='TrainData')
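
A hedged sketch of a user-side workaround in the meantime: impute with pandas before handing the frame to ClassHistogram (the column names here are hypothetical):

import numpy as np
import pandas as pd
from interpret.data import ClassHistogram

# Hypothetical training frame with gaps.
X_train = pd.DataFrame({'age': [25, np.nan, 40], 'income': [50000, 60000, np.nan]})
y_train = [0, 1, 0]

# Fill numeric gaps with per-column medians before exploration.
X_filled = X_train.fillna(X_train.median(numeric_only=True))

hist = ClassHistogram().explain_data(X_filled, y_train, name='TrainData')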

Required by libstdc++.so.6

Found error:
libstdc++.so.6: version CXXABI_1.3.8

I have libstdc++.so.6, but it does not support CXXABI_1.3.8; it only provides the versions below:
CXXABI_1.3
CXXABI_1.3.1
CXXABI_1.3.2
CXXABI_1.3.3
CXXABI_1.3.4
CXXABI_1.3.5
CXXABI_1.3.6
CXXABI_1.3.7
CXXABI_TM_1

when trying the basic example:

 from interpret.glassbox import ExplainableBoostingClassifier
 ebm = ExplainableBoostingClassifier()
 ebm.fit(X_train, y_train)

My Server:
Centos7.5 ( Aliyun, Qcloud both tried)
gcc/g++ 4.8.5
libstdc++.so.6.0.19

Solution:
Copy a newer libstdc++.so.6.0.24 from somewhere and make a soft link to it named libstdc++.so.6.

Of course it's not a problem of InterpretML, but it's still a hurdle for newbies.

Build Error

I'm trying to build a local version of the demo.
Using VS 2017, I get an error during the build:

Severity: Error
Code: MSB8020
Description: The build tools for v142 (Platform Toolset = 'v142') cannot be found. To build using the v142 build tools, please install v142 build tools. Alternatively, you may upgrade to the current Visual Studio tools by selecting the Project menu or right-click the solution, and then selecting "Retarget solution".
Project: ebmcore
File: C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.Cpp.Platform.targets
Line: 57

I tried a "Retarget solution" with no success.

OSError: exception: access violation reading 0x000002ABA0141000

Hi dear InterpretML Team,

I'm having this issue: OSError: exception: access violation reading 0x000002ABA0141000. It occurs with n_jobs=-1; when I set n_jobs=1 I get the same error, but with access violation reading 0x0000026108C46000. I have no idea how to fix it; it seems to come from joblib. Here is what I'm trying to do in a Jupyter notebook:

ebm = ExplainableBoostingClassifier(n_jobs=-1)
ebm.fit(x_train, y_train)
preds_interpret = ebm.predict_proba(x_test)

And here the traceback:

Traceback (most recent call last):
  File "C:\anaconda\lib\site-packages\joblib\externals\loky\process_executor.py", line 418, in _process_worker
    r = call_item()
  File "C:\anaconda\lib\site-packages\joblib\externals\loky\process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\anaconda\lib\site-packages\joblib\_parallel_backends.py", line 567, in __call__
    return self.func(*args, **kwargs)
  File "C:\anaconda\lib\site-packages\joblib\parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "C:\anaconda\lib\site-packages\joblib\parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "C:\anaconda\lib\site-packages\interpret\glassbox\ebm\ebm.py", line 789, in train_model
    return estimator.fit(X, y)
  File "C:\anaconda\lib\site-packages\interpret\glassbox\ebm\ebm.py", line 386, in fit
    validation_scores=None,
  File "C:\anaconda\lib\site-packages\interpret\glassbox\ebm\internal.py", line 376, in __init__
    self._initialize_training_classification()
  File "C:\anaconda\lib\site-packages\interpret\glassbox\ebm\internal.py", line 461, in _initialize_training_classification
    self.num_inner_bags,
OSError: exception: access violation reading 0x000002ABA0141000
