
gam-changer's People

Contributors

harsha-nori, interpret-ml, wmeints, xiaohk


gam-changer's Issues

pair_preprocessor_ is None

First, thanks for adding this module - it is a great idea and a great contribution to transparent models.
When running your code, there seems to be an outdated (I think) mapping on line 430 of gamchanger.py:
# Use the ebm's mapping to map level name to bin index
ebm_col_mapping = ebm_copy.pair_preprocessor_.col_mapping_

I think this relates to earlier versions of EBM.
When I change this line to
ebm_col_mapping = ebm_copy.feature_groups_
the adjusted model is exported properly.

Version 0.1.4 breaks gam-changer-adult.ipynb notebook

Hello,

There is a check for NaNs in 0.1.4 (line 268 of gamchanger.py) that breaks on string values. With version 0.1.3, the notebook still works. I think string columns should be encoded before this check.

Thanks,
Jessica

/usr/local/lib/python3.7/dist-packages/gamchanger/gamchanger.py in get_sample_data(ebm, x_test, y_test, resort_categorical)
    266 
    267     # Drop all rows with any NA values
--> 268     if np.isnan(x_test_copy).any():
    269         na_row_indexes = np.isnan(x_test_copy).any(axis=1)
    270         x_test_copy = x_test_copy[~na_row_indexes]

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
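The missing-value check can be made tolerant of string columns. Below is a minimal sketch, not the library's actual fix, assuming `x_test` arrives as a 2-D array that may mix numbers and strings (dtype `object`); the helper name `drop_na_rows` is hypothetical. It swaps `np.isnan`, which only accepts numeric input, for `pd.isna`, which checks element by element and treats both `None` and `NaN` as missing.

```python
import numpy as np
import pandas as pd


def drop_na_rows(x_test, y_test):
    """Drop rows of x_test that contain any missing value, plus the matching y_test entries.

    Hypothetical helper: pd.isna works element-wise on object arrays, so
    string columns do not trigger the TypeError raised by np.isnan.
    """
    x = np.asarray(x_test)
    y = np.asarray(y_test)
    na_row_indexes = pd.isna(x).any(axis=1)  # True for rows with at least one NA
    return x[~na_row_indexes], y[~na_row_indexes]


# Usage with a mixed numeric/string array
x = np.array([[1.0, "a"], [np.nan, "b"], [3.0, None]], dtype=object)
y = np.array([0, 1, 0])
x_clean, y_clean = drop_na_rows(x, y)
print(x_clean.shape, y_clean.tolist())  # (1, 2) [0]
```

The same helper also accepts purely numeric arrays, so it could replace the `np.isnan` check without a separate code path for string data.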

Support for new link functions

Hi there,
I see that the new version of interpretml (0.4.0) adds new link functions for regression.
Will these also be added to gam-changer?

Kind Regards
Klaas

Insufficient Zoom

I don't know if this is a bug or if the zoom is just limited, but when I try to modify the shape function of features with long-tailed distributions, the points end up overlapping and I can no longer use the interface. 🤔

When displaying the explanations from the interpret package, this doesn't happen because plotly has an 'infinite' zoom-in.

Inconsistent with the interpret package

import gamchanger as gc
from json import dump

# Extract model weights
model_data = gc.get_model_data(ebm)

# Generate sample data
sample_data = gc.get_sample_data(ebm, x_test, y_test)

# Save to `model.json` and `sample.json`
dump(model_data, open('./model.json', 'w'))
dump(sample_data, open('./sample.json', 'w'))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-81-f99310b1230c> in <module>
      3 
      4 # Extract model weights
----> 5 model_data = gc.get_model_data(ebm)
      6 
      7 # Generate sample data

/conda/envs/notebook/lib/python3.6/site-packages/gamchanger/gamchanger.py in get_model_data(ebm, resort_categorical)
     71     score_range = [np.inf, -np.inf]
     72 
---> 73     for i in range(len(ebm.feature_names)):
     74         cur_feature = {}
     75         cur_feature["name"] = ebm.feature_names[i]

TypeError: object of type 'NoneType' has no len()

The ebm object's feature_names attribute is None, so len() fails.

Which version of gamchanger can I downgrade to?

%watermark -iv

gamchanger 0.1.10
numpy      1.18.0
interpret  0.3.0
pandas     1.0.4

Empty Metric Panel

Hi all,

I am using an EBM with no missing values (I saw this was an issue before) and only 'continuous' and 'nominal' feature types. When I pass sample_data.json, I see no information in my metric panel.

I have to subset my data, as loading all 200k samples of my test dataframe causes an out-of-memory error in my Google Chrome browser.

import gamchanger as gc
from json import dump

# Extract model weights
model_data = gc.get_model_data(ebm)

# Generate sample data
sample_data = gc.get_sample_data(ebm, X_test.tail(2000), y_test.tail(2000))

# Save to `model.json` and `sample.json`
dump(model_data, open('./model.json', 'w'))
dump(sample_data, open('./sample.json', 'w'))

# Load GAM Changer with the model and sample data
gc.visualize(ebm, model_data=model_data, sample_data=sample_data)
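Taking the last 2,000 rows with `tail` keeps memory down, but if the test set is sorted (by date, label, or ID) the tail may not be representative. A minimal sketch of a random subsample that keeps `X_test` and `y_test` aligned, assuming both are pandas objects sharing the same index; `subsample_xy` is a hypothetical helper, not part of gamchanger.

```python
import pandas as pd


def subsample_xy(X, y, n=2000, seed=0):
    """Draw the same n random rows from X and y (hypothetical helper).

    Assumes X (DataFrame) and y (Series) share an index, so y can be
    realigned with .loc after sampling X.
    """
    n = min(n, len(X))  # sample() without replacement raises if n exceeds the row count
    X_sub = X.sample(n=n, random_state=seed)
    y_sub = y.loc[X_sub.index]
    return X_sub, y_sub


# Usage with toy data; in the issue this would be X_test / y_test
X = pd.DataFrame({"a": range(10)})
y = pd.Series(range(10)) * 2
X_sub, y_sub = subsample_xy(X, y, n=4)
print(len(X_sub), (y_sub.values == X_sub["a"].values * 2).all())  # 4 True
```

The resulting `X_sub` and `y_sub` could be passed to `gc.get_sample_data` in place of the `tail(2000)` slices.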


End bins cannot be interpolated, extrapolated or split

When using GAM Changer on the outer (end) bins, you cannot split them into additional bins. And because these bins cannot be split, it seems you also cannot create a stepped interpolation (either monotonically increasing or decreasing).

Finally, it would be really useful to be able to extend the EBM's domain past the upper-most (lower-most) value in the training domain. These are likely to be the cases where expert knowledge is most useful: when the model cannot "see" the data.

P.s. Amazing work. This is soooo useful ⭐⭐⭐⭐⭐
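To make the request concrete: a binned shape function can be viewed as a piecewise-constant lookup in which every value above the upper-most cut point falls into the last bin, so that single bin governs predictions outside the training range. The sketch below models this with a hypothetical `cuts`/`scores` representation (one more score than cuts); it is an illustration, not EBM's or GAM Changer's internal data structure, and the boundary convention (`side="right"`) is an assumption.

```python
import numpy as np


def lookup(cuts, scores, x):
    """Piecewise-constant shape function: len(scores) == len(cuts) + 1.

    Values below cuts[0] get scores[0]; values at or above cuts[-1] get scores[-1].
    """
    idx = np.searchsorted(cuts, x, side="right")
    return np.asarray(scores)[idx]


def add_end_bin(cuts, scores, new_cut, new_score):
    """Split the upper end bin at new_cut; the new top bin gets new_score (hypothetical edit)."""
    if len(cuts) > 0 and new_cut <= cuts[-1]:
        raise ValueError("new_cut must lie above the current upper-most cut")
    return list(cuts) + [new_cut], list(scores) + [new_score]


cuts, scores = [10.0, 20.0], [0.1, 0.5, 0.9]
print(lookup(cuts, scores, [5, 15, 100]))  # [0.1 0.5 0.9]: 100 reuses the end bin
cuts2, scores2 = add_end_bin(cuts, scores, 50.0, 1.5)
print(lookup(cuts2, scores2, [30, 100]))  # [0.9 1.5]
```

After the split, values between the old upper-most cut and `new_cut` keep the old end-bin score, while values beyond `new_cut` take the expert-supplied score.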

Incompatibility with ExplainableBoostingRegressor: unsupported feature type 'ordinal'

After training an ExplainableBoostingRegressor and manually setting the feature types to 'nominal' or 'ordinal' (as 'categorical' is not supported), I cannot visualize the model with gc, even when I manually change feature_types afterwards. See the example below for the MPG dataset.

from interpret.glassbox import ExplainableBoostingRegressor

ebm = ExplainableBoostingRegressor(
    feature_names=['displacement', 'horsepower', 'weight', 'acceleration',
                   'origin', 'cylinders', 'model_year'],
    feature_types=['continuous', 'continuous', 'continuous', 'continuous',
                   'nominal', 'ordinal', 'ordinal'],
    random_state=42,
    n_jobs=-1,
)
ebm.fit(X_train, y_train)

# Attempted workaround: relabel the non-continuous features after fitting
ebm.feature_types = ['continuous', 'continuous', 'continuous', 'continuous',
                     'categorical', 'categorical', 'categorical']

gc only seems to work with feature_type='none'.

Is there a workaround or fix?

``get_model_data`` fails when trying to extract model data with monotonized features

Hello,

I encountered an issue when trying to use the get_model_data function to extract model data from an ExplainableBoostingClassifier instance with monotonized features.

Steps to Reproduce:

  1. Fit an ExplainableBoostingClassifier instance.
  2. Apply the monotonize method for one of the features.
  3. Attempt to extract the model data using get_model_data.

Expected Behavior:
The function should return the model data successfully.

Actual Behavior:
The function fails with a TypeError in this line, and upon further investigation, I noticed that the entry in ebm.standard_deviations_ for the feature I applied monotonize to is being set to None.

Workaround:
Currently, I'm manually replacing the None value with a placeholder value, but I believe this should be handled gracefully by the library itself.
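The workaround described above could look like the following minimal sketch. It assumes, as in interpret releases that expose both attributes, that `ebm.standard_deviations_[i]` has the same shape as `ebm.term_scores_[i]`; the helper name `fill_missing_std` is hypothetical, and the zeros are only a placeholder that does not reflect the model's real uncertainty.

```python
from types import SimpleNamespace

import numpy as np


def fill_missing_std(ebm):
    """Replace None entries in ebm.standard_deviations_ with zero arrays (in place).

    Assumption: standard_deviations_[i] should match term_scores_[i] in shape.
    The zeros are placeholders, not real uncertainty estimates.
    """
    for i, std in enumerate(ebm.standard_deviations_):
        if std is None:
            ebm.standard_deviations_[i] = np.zeros_like(ebm.term_scores_[i], dtype=float)
    return ebm


# Usage with a stand-in object; a real fitted EBM would be passed instead
ebm = SimpleNamespace(
    term_scores_=[np.array([0.2, -0.1, 0.4]), np.array([1.0, 2.0])],
    standard_deviations_=[np.array([0.01, 0.02, 0.01]), None],  # None after monotonize
)
fill_missing_std(ebm)
print(ebm.standard_deviations_[1])  # [0. 0.]
```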

Please let me know if any further information is required or if there's a known solution to this problem.

Thanks in advance for your help.

Best,
Krzysztof

Option to make the Contribution Scores more Interpretable

In several domains, Contribution Scores/Shape Functions have real-world meaning. It would be useful to visualise these learned shape functions on their correct scale.

For example, when using an EBM to predict the effect of an individual Air Handler Unit's (AHU) heating valve on the energy used by an attached boiler system, the shape functions approximate the efficiency of the AHU, e.g. each 1% increase in valve openness adds 0.2 kWh of energy usage. The predicted values are therefore in kWh (energy). In this domain, negative energy doesn't make sense, and at 0% valve openness we would expect zero extra heat usage. The ability to rescale the shape function so that it is non-negative (in this case) would be really useful for stakeholder understanding and model interpretation.
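Because an EBM's prediction is the intercept plus a sum of per-term scores (on the link scale), a constant can be moved from one term into the intercept without changing any prediction. That is one way to get the rescaling requested above, for example anchoring the 0% valve-openness bin at zero. A minimal sketch with hypothetical bin scores; `shift_term` is illustrative and not a GAM Changer or interpret function.

```python
import numpy as np


def shift_term(scores, intercept, offset):
    """Subtract `offset` from one term's scores and add it to the intercept.

    intercept + sum of term scores is unchanged for every sample, so
    predictions stay identical; only the split between term and intercept moves.
    """
    return np.asarray(scores, dtype=float) - offset, intercept + offset


# Hypothetical valve-openness bins (0%, 25%, 50%, 75%), scores in kWh
valve_scores = np.array([-0.6, -0.2, 0.3, 0.9])
other_term = 0.1  # score of some other term for one sample
intercept = 5.0

# Anchor the 0% bin at zero (here this also makes the term non-negative)
new_scores, new_intercept = shift_term(valve_scores, intercept, valve_scores[0])

before = intercept + valve_scores[2] + other_term
after = new_intercept + new_scores[2] + other_term
print(new_scores, np.isclose(before, after))
```

Passing `valve_scores.min()` as the offset instead would guarantee non-negative scores for any shape, at the cost of no longer pinning a specific bin to zero.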

GAMChanger fails to load data in some cases

I've found a bug where GAMChanger sometimes doesn't populate the 'metrics' / 'feature' / 'history' panel. It seems that when this happens, the GAMChanger interface has failed to load the validation samples, because the status bar says "0/0 validation samples selected".

It seems to depend on the data provided and might have something to do with missing data points, but I'm struggling to pin down the exact cause.

Below is the smallest reproducing example I can come up with.

See following comment for a better MWE.

import pandas as pd
import gamchanger as gc
from interpret.glassbox import ExplainableBoostingRegressor

# Works
X = pd.read_csv('demo-X-succeed.csv')
y = pd.read_csv('demo-y-succeed.csv')['OrderedFractionOfEstate']

# Doesn't work
#X = pd.read_csv('demo-X-fail.csv')
#y = pd.read_csv('demo-y-fail.csv')['OrderedFractionOfEstate']

ebm = ExplainableBoostingRegressor(interactions=False)
ebm.fit(X, y)

gc.visualize(ebm, X, y)

I've attached the CSV files, which differ in that the 'succeed' files have a single extra data point. That is, when loading 'demo-[X|y]-fail.csv' the GamChanger interface loads, but the side panel doesn't populate (unexpected behaviour). When loading 'demo-[X|y]-succeed.csv', the GamChanger interface loads and the side panel populates the metrics as expected.

demo-X-fail.csv
demo-X-succeed.csv
demo-y-fail.csv
demo-y-succeed.csv

Bug in gamchanger.py

The demo gam-changer-adult.ipynb breaks because of the bug below:

/usr/local/lib/python3.8/dist-packages/gamchanger/gamchanger.py in get_model_data(ebm, resort_categorical)
     71     score_range = [np.inf, -np.inf]
     72 
---> 73     for i in range(len(ebm.feature_names)):
     74         cur_feature = {}
     75         cur_feature["name"] = ebm.feature_names[i]

TypeError: object of type 'NoneType' has no len()

A potential quick fix would be to change ebm.feature_names to ebm.term_names_.
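A version-tolerant variant of that quick fix might look like the sketch below. It assumes that newer interpret releases (0.3+) store fitted names in `term_names_` while older ones populated `feature_names` during `fit`; the helper name `get_term_names` is hypothetical, and the stand-in objects only simulate the two attribute layouts.

```python
from types import SimpleNamespace


def get_term_names(ebm):
    """Return an EBM's term names, trying the newer attribute first.

    Assumption: interpret >= 0.3 exposes `term_names_` after fitting, while
    older releases populated `feature_names` during fit.
    """
    names = getattr(ebm, "term_names_", None)
    if names is None:
        names = getattr(ebm, "feature_names", None)
    if names is None:
        raise AttributeError("EBM has neither term_names_ nor feature_names; is it fitted?")
    return list(names)


# Stand-ins simulating the two attribute layouts
new_style = SimpleNamespace(feature_names=None, term_names_=["age", "hours"])
old_style = SimpleNamespace(feature_names=["age", "hours"])
print(get_term_names(new_style), get_term_names(old_style))  # ['age', 'hours'] ['age', 'hours']
```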
