interpretml / gam-changer
Editing machine learning models to reflect human knowledge and values
Home Page: https://interpret.ml/gam-changer
License: MIT License
Hi there,
I see that the new version of interpretml (0.4.0) adds new link functions for regression.
Will these also be added to gam-changer?
Kind Regards
Klaas
```python
import gamchanger as gc
from json import dump

# Extract model weights
model_data = gc.get_model_data(ebm)

# Generate sample data
sample_data = gc.get_sample_data(ebm, x_test, y_test)

# Save to `model.json` and `sample.json`
dump(model_data, open('./model.json', 'w'))
dump(sample_data, open('./sample.json', 'w'))
```
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-81-f99310b1230c> in <module>
      3
      4 # Extract model weights
----> 5 model_data = gc.get_model_data(ebm)
      6
      7 # Generate sample data

/conda/envs/notebook/lib/python3.6/site-packages/gamchanger/gamchanger.py in get_model_data(ebm, resort_categorical)
     71     score_range = [np.inf, -np.inf]
     72
---> 73     for i in range(len(ebm.feature_names)):
     74         cur_feature = {}
     75         cur_feature["name"] = ebm.feature_names[i]

TypeError: object of type 'NoneType' has no len()
```
The `ebm` object does not have the `feature_names` attribute. Which version of gamchanger should I downgrade to?
```
%watermark -iv
gamchanger 0.1.10
numpy      1.18.0
interpret  0.3.0
pandas     1.0.4
```
Hi all,
When using an EBM with no missing values (I saw this was an issue before) and feature types of 'continuous' and 'nominal' only, I see no information in my metric panel when passing sample_data.json.
I have to subset my data, as loading all 200k samples of my test dataframe causes an out-of-memory error in my Google Chrome browser.
```python
import gamchanger as gc
from json import dump

model_data = gc.get_model_data(ebm)
sample_data = gc.get_sample_data(ebm, X_test.tail(2000), y_test.tail(2000))

# Save to `model.json` and `sample.json`
dump(model_data, open('./model.json', 'w'))
dump(sample_data, open('./sample.json', 'w'))
```
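Instead of taking the tail of the test set, a random subset may give more representative validation metrics while still avoiding the browser's memory limit. Below is a minimal sketch; the helper name `subsample` is mine, and it assumes pandas `DataFrame`/`Series` inputs:

```python
import pandas as pd

# Hypothetical helper: draw a reproducible random subset instead of the tail,
# so validation metrics reflect the whole test distribution.
def subsample(X, y, n=2000, seed=0):
    idx = X.sample(n=min(n, len(X)), random_state=seed).index
    return X.loc[idx], y.loc[idx]
```

The subsets can then be passed to `gc.get_sample_data` in place of `X_test.tail(2000)` and `y_test.tail(2000)`.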
```python
import gamchanger as gc
gc.visualize(ebm, model_data=model_data, sample_data=sample_data)
```
In several domains, contribution scores/shape functions have real-world meaning. It would be useful to visualise these learned shape functions on their correct scale.
For example, when using an EBM to predict the effect of an individual Air Handler Unit's (AHU) heating valve on the energy used by an attached boiler system, these shape functions approximate the efficiency of the AHU: e.g. for each 1% increase in valve openness, energy usage increases by 0.2 kWh. However, the predicted values are in kWh (energy). In this domain, negative energy doesn't make sense, and at 0% valve openness we would expect 0 extra heat usage. Therefore, the ability to rescale the shape function so that it is non-negative (in this case) would be really useful for stakeholder understanding and model interpretation.
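The rescaling described above could be done outside the interface by shifting the shape function so its minimum is zero and folding the offset into the model intercept, which leaves predictions unchanged. A minimal sketch (the function name and this particular rescaling choice are mine, not part of GAM Changer):

```python
import numpy as np

# Sketch: shift a learned shape function so its minimum is zero, moving the
# offset into the intercept so that score + intercept is unchanged per bin.
def rescale_nonnegative(scores, intercept):
    scores = np.asarray(scores, dtype=float)
    offset = scores.min()
    return scores - offset, intercept + offset
```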
The demo notebook gam-changer-adult.ipynb breaks because of the bug below:
```
/usr/local/lib/python3.8/dist-packages/gamchanger/gamchanger.py in get_model_data(ebm, resort_categorical)
     71     score_range = [np.inf, -np.inf]
     72
---> 73     for i in range(len(ebm.feature_names)):
     74         cur_feature = {}
     75         cur_feature["name"] = ebm.feature_names[i]

TypeError: object of type 'NoneType' has no len()
```
A potential quick fix would be to change `ebm.feature_names` to `ebm.term_names_`.
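Until gamchanger itself picks up the rename, a user-side shim could alias the attribute before calling `gc.get_model_data`. A hedged sketch (the helper name is mine; it only assumes the `term_names_` attribute that newer interpret versions expose, as suggested above):

```python
# Sketch of a compatibility shim: newer interpret versions expose
# `term_names_` instead of `feature_names`, so alias it before export.
def patch_feature_names(ebm):
    if getattr(ebm, "feature_names", None) is None and hasattr(ebm, "term_names_"):
        ebm.feature_names = list(ebm.term_names_)
    return ebm
```

Usage would be `patch_feature_names(ebm)` followed by the usual `gc.get_model_data(ebm)` call.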
When I run

```python
import gamchanger as gc
model_data = gc.get_model_data(ebm)
```

I get the following error. I'm already using gamchanger version 0.1.13, and I trained my EBM model with interpret version 0.3.2.
I know that interpret has changed the names of some parameters, but I thought the latest version of your package was up to date with these changes.
After training an EBM regressor and manually specifying the feature types as 'nominal' or 'ordinal' (as 'categorical' is not supported), I cannot create a GAM Changer view, even when manually trying to change the feature_types afterwards. See the example below for the MPG dataset.
```python
ebm = ExplainableBoostingRegressor(
    feature_names=['displacement', 'horsepower', 'weight', 'acceleration',
                   'origin', 'cylinders', 'model_year'],
    feature_types=['continuous', 'continuous', 'continuous', 'continuous',
                   'nominal', 'ordinal', 'ordinal'],
    random_state=42,
    n_jobs=-1,
)
ebm.fit(X_train, y_train)

ebm.feature_types = ['continuous', 'continuous', 'continuous', 'continuous',
                     'categorical', 'categorical', 'categorical']
```
gc only seems to work with feature_type='none'.
Is there a workaround or fix?
I've found a bug where GAMChanger sometimes doesn't populate the 'metrics' / 'feature' / 'history' panel. It seems that when this happens, the GAMChanger interface has failed to load the validation samples, because the status bar says "0/0 validation samples selected".
This seems to occur sometimes based on the data that is provided, and might have something to do with missing data points, but I'm struggling to figure out exactly what the cause is.
Below is the smallest reproducing example I can come up with.
See following comment for a better MWE.
```python
import pandas as pd
import gamchanger as gc
from interpret.glassbox import ExplainableBoostingRegressor

# Works
X = pd.read_csv('demo-X-succeed.csv')
y = pd.read_csv('demo-y-succeed.csv')['OrderedFractionOfEstate']

# Doesn't work
# X = pd.read_csv('demo-X-fail.csv')
# y = pd.read_csv('demo-y-fail.csv')['OrderedFractionOfEstate']

ebm = ExplainableBoostingRegressor(interactions=False)
ebm.fit(X, y)
gc.visualize(ebm, X, y)
```
I've attached the CSV files, which differ in that the 'succeed' files have a single extra data point. That is, when loading 'demo-[X|y]-fail.csv' the GamChanger interface loads, but the side panel doesn't populate (unexpected behaviour). When loading 'demo-[X|y]-succeed.csv', the GamChanger interface loads and the side panel populates the metrics as expected.
demo-X-fail.csv
demo-X-succeed.csv
demo-y-fail.csv
demo-y-succeed.csv
When using GAM Changer on the outer/end bins, you cannot split these bins into extra bins. And because you cannot split them, it seems you also cannot make a stepped interpolation (either monotonically increasing or decreasing) there.
Finally, it would be really useful to be able to extend the domain of the EBM past the uppermost (lowermost) value in the training domain. These are likely to be the times expert knowledge is going to be most useful: when the model cannot "see" the data.
P.s. Amazing work. This is soooo useful ⭐⭐⭐⭐⭐
First, thanks for adding this module - it is a great idea and a great contribution to transparent models.
When running your code there seems to be an old (I think) mapping at line 430 of gamchanger.py:

```python
# Use the ebm's mapping to map level name to bin index
ebm_col_mapping = ebm_copy.pair_preprocessor_.col_mapping_
```

I think this relates to earlier versions of EBM. When I change this line to

```python
ebm_col_mapping = ebm_copy.feature_groups_
```

the adjusted model is properly exported.
Hello,
There is a check for NaNs in 0.1.4 (line 268 of gamchanger.py) that breaks on string values. If I use version 0.1.3, the notebook still works. I think columns of string type should be encoded before this check.
Thanks,
Jessica
```
/usr/local/lib/python3.7/dist-packages/gamchanger/gamchanger.py in get_sample_data(ebm, x_test, y_test, resort_categorical)
    266
    267     # Drop all rows with any NA values
--> 268     if np.isnan(x_test_copy).any():
    269         na_row_indexes = np.isnan(x_test_copy).any(axis=1)
    270         x_test_copy = x_test_copy[~na_row_indexes]

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
```
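The crash happens because `np.isnan` rejects object/string dtypes, while `pandas.DataFrame.isna` handles them. A dtype-agnostic row filter could look like the sketch below (the helper name is mine, not gamchanger's API):

```python
import numpy as np
import pandas as pd

# Sketch: drop rows containing any NA, working for both numeric arrays and
# object/string columns (np.isnan raises a TypeError on the latter).
def drop_na_rows(x):
    df = pd.DataFrame(x)
    return df[~df.isna().any(axis=1)].to_numpy()
```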
I don't know if this is a bug or if the zoom is just limited, but when I try to modify the function for some features that are long-tailed distributed, the points end up overlapping, and I cannot continue using the interface. 🤔
When displaying the explanations from the interpret package, this doesn't happen because plotly has an 'infinite' zoom-in.
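Until finer zoom is available, one possible workaround is to train and edit on a log-transformed copy of the long-tailed feature, so its bins spread apart in the editor. This is only a sketch; whether `np.log1p` suits the data (non-negative values) is an assumption, and the helper name is mine:

```python
import numpy as np

# Possible workaround sketch: compress a long tail with log1p so bins are
# more evenly spaced in the editor. Assumes the feature is non-negative.
def spread_long_tail(values):
    return np.log1p(np.asarray(values, dtype=float))
```

The transformed feature would then need to be interpreted on the log scale when editing its shape function.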
Hello,
I encountered an issue when trying to use the get_model_data function to extract model data from an ExplainableBoostingClassifier instance with monotonized features.
Steps to Reproduce:
1. Train an ExplainableBoostingClassifier instance.
2. Apply the monotonize method to one of the features.
3. Call get_model_data.

Expected Behavior:
The function should return the model data successfully.

Actual Behavior:
The function fails with a TypeError in this line, and upon further investigation, I noticed that the entry in ebm.standard_deviations_ for the feature I applied monotonize to is set to None.
Workaround:
Currently, I'm manually replacing the None value with a placeholder value, but I believe this should be handled gracefully by the library itself.
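The manual workaround described above might be sketched as follows. The attribute name comes from the report; using zero arrays as the placeholder, and the helper name, are my assumptions:

```python
import numpy as np

# Sketch of the reported workaround: replace a None entry in
# `standard_deviations_` with zeros shaped like that term's scores.
def fill_missing_stddevs(stddevs, scores):
    return [np.zeros_like(s) if sd is None else sd
            for sd, s in zip(stddevs, scores)]
```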
Please let me know if any further information is required or if there's a known solution to this problem.
Thanks in advance for your help.
Best,
Krzysztof