interpretml / gam-changer
Editing machine learning models to reflect human knowledge and values
Home Page: https://interpret.ml/gam-changer
License: MIT License
Hi there,
I see that the new version of interpretml (0.4.0) adds new link functions for regression.
Will these also be added to gam-changer?
Kind Regards
Klaas
```python
import gamchanger as gc
from json import dump

# Extract model weights
model_data = gc.get_model_data(ebm)

# Generate sample data
sample_data = gc.get_sample_data(ebm, x_test, y_test)

# Save to `model.json` and `sample.json`
dump(model_data, open('./model.json', 'w'))
dump(sample_data, open('./sample.json', 'w'))
```
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-81-f99310b1230c> in <module>
      3
      4 # Extract model weights
----> 5 model_data = gc.get_model_data(ebm)
      6
      7 # Generate sample data

/conda/envs/notebook/lib/python3.6/site-packages/gamchanger/gamchanger.py in get_model_data(ebm, resort_categorical)
     71     score_range = [np.inf, -np.inf]
     72
---> 73     for i in range(len(ebm.feature_names)):
     74         cur_feature = {}
     75         cur_feature["name"] = ebm.feature_names[i]

TypeError: object of type 'NoneType' has no len()
```
The `ebm` object does not have the `feature_names` attribute. Which version of gamchanger should I downgrade to?
```
%watermark -iv
gamchanger 0.1.10
numpy      1.18.0
interpret  0.3.0
pandas     1.0.4
```
Hi all,
When using an EBM with no missing values (I saw this was an issue before) and feature types of 'continuous' and 'nominal' only, I see no information in my metric panel when passing sample_data.json.
I have to subset my data, as loading all 200k samples of my test dataframe causes an out-of-memory error in my Google Chrome browser.
```python
import gamchanger as gc
from json import dump

model_data = gc.get_model_data(ebm)
sample_data = gc.get_sample_data(ebm, X_test.tail(2000), y_test.tail(2000))

# Save to `model.json` and `sample.json`
dump(model_data, open('./model.json', 'w'))
dump(sample_data, open('./sample.json', 'w'))
```
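Instead of taking the tail of the test set, a random subset may give more representative validation metrics while still avoiding the browser's memory limit. Below is a minimal sketch; the helper name `subsample` is mine, and it assumes pandas `DataFrame`/`Series` inputs:

```python
import pandas as pd

# Hypothetical helper: draw a reproducible random subset instead of the tail,
# so validation metrics reflect the whole test distribution.
def subsample(X, y, n=2000, seed=0):
    idx = X.sample(n=min(n, len(X)), random_state=seed).index
    return X.loc[idx], y.loc[idx]
```

The subsets can then be passed to `gc.get_sample_data` in place of `X_test.tail(2000)` and `y_test.tail(2000)`.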
```python
import gamchanger as gc
gc.visualize(ebm, model_data=model_data, sample_data=sample_data)
```
In several domains, contribution scores/shape functions have real-world meaning. It would be useful to visualise these learned shape functions on their correct scale.
For example, when using an EBM to predict the effect of an individual Air Handler Unit's (AHU) heating valve on the energy used by an attached boiler system, these shape functions approximate the efficiency of the AHU: e.g. for each 1% increase in valve openness, energy usage increases by 0.2 kWh. However, the predicted values are in kWh (energy). In this domain, negative energy doesn't make sense, and at 0% valve openness we would expect 0 extra heat usage. Therefore, the ability to rescale the shape function so that it is non-negative (in this case) would be really useful for stakeholder understanding and model interpretation.
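The rescaling described above could be done outside the interface by shifting the shape function so its minimum is zero and folding the offset into the model intercept, which leaves predictions unchanged. A minimal sketch (the function name and this particular rescaling choice are mine, not part of GAM Changer):

```python
import numpy as np

# Sketch: shift a learned shape function so its minimum is zero, moving the
# offset into the intercept so that score + intercept is unchanged per bin.
def rescale_nonnegative(scores, intercept):
    scores = np.asarray(scores, dtype=float)
    offset = scores.min()
    return scores - offset, intercept + offset
```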
The demo notebook gam-changer-adult.ipynb breaks because of the bug below:
```
/usr/local/lib/python3.8/dist-packages/gamchanger/gamchanger.py in get_model_data(ebm, resort_categorical)
     71     score_range = [np.inf, -np.inf]
     72
---> 73     for i in range(len(ebm.feature_names)):
     74         cur_feature = {}
     75         cur_feature["name"] = ebm.feature_names[i]

TypeError: object of type 'NoneType' has no len()
```
A potential quick fix would be to change `ebm.feature_names` to `ebm.term_names_`.
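Until gamchanger itself picks up the rename, a user-side shim could alias the attribute before calling `gc.get_model_data`. A hedged sketch (the helper name is mine; it only assumes the `term_names_` attribute that newer interpret versions expose, as suggested above):

```python
# Sketch of a compatibility shim: newer interpret versions expose
# `term_names_` instead of `feature_names`, so alias it before export.
def patch_feature_names(ebm):
    if getattr(ebm, "feature_names", None) is None and hasattr(ebm, "term_names_"):
        ebm.feature_names = list(ebm.term_names_)
    return ebm
```

Usage would be `patch_feature_names(ebm)` followed by the usual `gc.get_model_data(ebm)` call.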
When I run

```python
import gamchanger as gc
model_data = gc.get_model_data(ebm)
```

I get the following error. I'm already using gamchanger version 0.1.13, and I trained my EBM model with interpret version 0.3.2.
I know that interpret has changed the names of some parameters, but I thought the latest version of your package was up to date with these changes.
After training an EBM regressor and manually specifying the feature types as 'nominal' or 'ordinal' (as 'categorical' is not supported), I cannot create a GAM Changer view, even when manually trying to change the feature_types afterwards. See the example below for the MPG dataset.
```python
ebm = ExplainableBoostingRegressor(
    feature_names=['displacement', 'horsepower', 'weight', 'acceleration',
                   'origin', 'cylinders', 'model_year'],
    feature_types=['continuous', 'continuous', 'continuous', 'continuous',
                   'nominal', 'ordinal', 'ordinal'],
    random_state=42,
    n_jobs=-1,
)
ebm.fit(X_train, y_train)

ebm.feature_types = ['continuous', 'continuous', 'continuous', 'continuous',
                     'categorical', 'categorical', 'categorical']
```
gc only seems to work with feature_type='none'.
Is there a workaround or fix?
I've found a bug where GAMChanger sometimes doesn't populate the 'metrics' / 'feature' / 'history' panel. It seems that when this happens, the GAMChanger interface has failed to load the validation samples, because the status bar says "0/0 validation samples selected".
This seems to occur sometimes based on the data that is provided, and might have something to do with missing data points, but I'm struggling to figure out exactly what the cause is.
Below is the smallest reproducing example I can come up with.
See following comment for a better MWE.
```python
import pandas as pd
import gamchanger as gc
from interpret.glassbox import ExplainableBoostingRegressor

# Works
X = pd.read_csv('demo-X-succeed.csv')
y = pd.read_csv('demo-y-succeed.csv')['OrderedFractionOfEstate']

# Doesn't work
# X = pd.read_csv('demo-X-fail.csv')
# y = pd.read_csv('demo-y-fail.csv')['OrderedFractionOfEstate']

ebm = ExplainableBoostingRegressor(interactions=False)
ebm.fit(X, y)
gc.visualize(ebm, X, y)
```
I've attached the CSV files, which differ in that the 'succeed' files have a single extra data point. That is, when loading 'demo-[X|y]-fail.csv' the GamChanger interface loads, but the side panel doesn't populate (unexpected behaviour). When loading 'demo-[X|y]-succeed.csv', the GamChanger interface loads and the side panel populates the metrics as expected.
demo-X-fail.csv
demo-X-succeed.csv
demo-y-fail.csv
demo-y-succeed.csv
When using GAM Changer on the outer/end bins, you cannot split these bins into extra bins. And because you cannot split them, it seems you also cannot make a stepped interpolation (either monotonically increasing or decreasing) there.
Finally, it would be really useful to be able to extend the domain of the EBM past the uppermost (lowermost) value in the training domain. These are likely to be the times expert knowledge is going to be most useful: when the model cannot "see" the data.
P.s. Amazing work. This is soooo useful ⭐⭐⭐⭐⭐
First, thanks for adding this module - it is a great idea and a great contribution to transparent models.
When running your code there seems to be an old (I think) mapping at line 430 of gamchanger.py:

```python
# Use the ebm's mapping to map level name to bin index
ebm_col_mapping = ebm_copy.pair_preprocessor_.col_mapping_
```

I think this relates to earlier versions of EBM. When I change this line to

```python
ebm_col_mapping = ebm_copy.feature_groups_
```

the adjusted model is properly exported.
Hello,
There is a check for NaNs in 0.1.4 (line 268 of gamchanger.py) that breaks on string values. If I use version 0.1.3, the notebook still works. I think columns of string type should be encoded before this check.
Thanks,
Jessica
```
/usr/local/lib/python3.7/dist-packages/gamchanger/gamchanger.py in get_sample_data(ebm, x_test, y_test, resort_categorical)
    266
    267     # Drop all rows with any NA values
--> 268     if np.isnan(x_test_copy).any():
    269         na_row_indexes = np.isnan(x_test_copy).any(axis=1)
    270         x_test_copy = x_test_copy[~na_row_indexes]

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
```
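The crash happens because `np.isnan` rejects object/string dtypes, while `pandas.DataFrame.isna` handles them. A dtype-agnostic row filter could look like the sketch below (the helper name is mine, not gamchanger's API):

```python
import numpy as np
import pandas as pd

# Sketch: drop rows containing any NA, working for both numeric arrays and
# object/string columns (np.isnan raises a TypeError on the latter).
def drop_na_rows(x):
    df = pd.DataFrame(x)
    return df[~df.isna().any(axis=1)].to_numpy()
```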
I don't know if this is a bug or if the zoom is just limited, but when I try to modify the function for some features that are long-tailed distributed, the points end up overlapping, and I cannot continue using the interface. 🤔
When displaying the explanations from the interpret package, this doesn't happen because plotly has an 'infinite' zoom-in.
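Until finer zoom is available, one possible workaround is to train and edit on a log-transformed copy of the long-tailed feature, so its bins spread apart in the editor. This is only a sketch; whether `np.log1p` suits the data (non-negative values) is an assumption, and the helper name is mine:

```python
import numpy as np

# Possible workaround sketch: compress a long tail with log1p so bins are
# more evenly spaced in the editor. Assumes the feature is non-negative.
def spread_long_tail(values):
    return np.log1p(np.asarray(values, dtype=float))
```

The transformed feature would then need to be interpreted on the log scale when editing its shape function.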
Hello,
I encountered an issue when trying to use the get_model_data function to extract model data from an ExplainableBoostingClassifier instance with monotonized features.
Steps to Reproduce:
1. Train an ExplainableBoostingClassifier instance.
2. Apply the monotonize method to one of the features.
3. Call get_model_data.

Expected Behavior:
The function should return the model data successfully.

Actual Behavior:
The function fails with a TypeError in this line, and upon further investigation, I noticed that the entry in ebm.standard_deviations_ for the feature I applied monotonize to is set to None.
Workaround:
Currently, I'm manually replacing the None value with a placeholder value, but I believe this should be handled gracefully by the library itself.
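The manual workaround described above might be sketched as follows. The attribute name comes from the report; using zero arrays as the placeholder, and the helper name, are my assumptions:

```python
import numpy as np

# Sketch of the reported workaround: replace a None entry in
# `standard_deviations_` with zeros shaped like that term's scores.
def fill_missing_stddevs(stddevs, scores):
    return [np.zeros_like(s) if sd is None else sd
            for sd, s in zip(stddevs, scores)]
```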
Please let me know if any further information is required or if there's a known solution to this problem.
Thanks in advance for your help.
Best,
Krzysztof