eli5's People

Contributors

ashwinb-hat, guillemgsubies, ivanprado, jnothman, kmike, krkd, lopuhin, mehaase, rg2410, rmax, teabolt, zzz4zzz

eli5's Issues

html features: preserve whitespaces

Features with leading whitespace lose that whitespace when rendered as HTML.

Compare:

+2.837  spa 
+2.805   spa

and

[screenshot: 2016-10-21 16 59 04]

I think whitespace should be replaced with &nbsp; for HTML display. It could also make sense to use a different background color for feature text, so that trailing whitespace is visible.
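A minimal sketch of the replacement suggested above, using only the standard library. The function name `feature_to_html` is hypothetical and is not part of eli5; it only illustrates escaping a feature name and then turning every space into a non-breaking space so that browsers do not collapse it.

```python
from html import escape


def feature_to_html(name):
    """Escape a feature name and keep every space visible in HTML.

    Hypothetical helper for illustration only, not an eli5 function.
    """
    # Escape first, so the '&' in '&nbsp;' is not escaped a second time.
    return escape(name).replace(" ", "&nbsp;")


print(feature_to_html(" spa"))
print(feature_to_html("spa "))
```

Escaping before replacing matters here: doing it the other way round would turn `&nbsp;` into `&amp;nbsp;`.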

It is hard to customize formatting in IPython notebook

Currently, to change formatting options in an IPython notebook, the user has to do something like this:

from IPython.display import HTML
from eli5 import explain_weights, format_as_html
# clf is a fitted classifier and fe is its feature extractor (vectorizer)
expl = explain_weights(clf, vec=fe, top=20)
HTML(format_as_html(expl, highlight_spaces=False, horizontal_layout=False))

It'd be nice to reduce it to a one-liner.
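One way such a one-liner could work is to accept all keyword arguments in a single call and route each one to either the explain step or the format step. The sketch below is only an illustration: `show` is a hypothetical helper, and the explain and format functions are passed in as parameters so the snippet does not depend on eli5 or IPython.

```python
import inspect


def show(explain, fmt, *args, wrap=lambda html: html, **kwargs):
    """Call explain(), then fmt(), routing keyword arguments by signature.

    Hypothetical helper: keyword arguments that fmt() declares go to fmt(),
    everything else goes to explain(). `wrap` post-processes the result
    (in a notebook it could be IPython.display.HTML).
    """
    fmt_params = set(inspect.signature(fmt).parameters)
    fmt_kwargs = {k: v for k, v in kwargs.items() if k in fmt_params}
    explain_kwargs = {k: v for k, v in kwargs.items() if k not in fmt_params}
    return wrap(fmt(explain(*args, **explain_kwargs), **fmt_kwargs))
```

Under these assumptions the notebook call would shrink to something like show(explain_weights, format_as_html, clf, wrap=HTML, vec=fe, top=20, highlight_spaces=False).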

Text highlighting: should we preserve density?

When highlighting a feature, we can either use the same intensity regardless of the feature's length (current behaviour in master), or try to preserve density by coloring longer features with a less intense color. I tried the second approach in the preserve-density branch; here are some screenshots, with master behaviour on top (links to notebooks: https://github.com/TeamHG-Memex/eli5/blob/preserve-density/notebooks/explain_text_prediction.ipynb for words and https://github.com/TeamHG-Memex/eli5/blob/preserve-density/notebooks/explain_text_prediction_char.ipynb for chars).

[screenshot: 2016-10-21 11 28 54]

[screenshot: 2016-10-21 11 31 22]
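The difference between the two approaches can be sketched in a few lines. This is not the code from master or from the preserve-density branch; `opacities` is a hypothetical function that assumes "density" means the feature weight divided by the number of characters it covers.

```python
def opacities(features, preserve_density=False):
    """Return one highlight opacity in [0, 1] per (text, weight) pair.

    Illustrative assumption: with preserve_density=True the weight is
    spread over the characters of the feature, so longer features with
    the same weight get a paler color.
    """
    if preserve_density:
        strengths = [abs(w) / max(len(text), 1) for text, w in features]
    else:
        strengths = [abs(w) for _, w in features]
    top = max(strengths, default=0.0)
    # Normalise so that the strongest feature is fully opaque.
    return [s / top if top else 0.0 for s in strengths]


features = [("ab", 2.0), ("abcd", 2.0)]
print(opacities(features))
print(opacities(features, preserve_density=True))
```

With equal weights, the first call colors both features identically, while the second colors the four-character feature half as intensely as the two-character one.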

drop scikit-learn 0.17.x support

I tried to add tests for scikit-learn 0.17, but it turns out the compatibility shims in eli5.lime don't work; for example, KFold has a different API. What do you think about dropping scikit-learn 0.17 support and supporting only 0.18.x? //cc @lopuhin

Extra white borders in html table for feature importances

This happens at least when the table has no extra styles. To reproduce:

py.test tests/test_sklearn_explain_weights.py::test_explain_random_forest -s
open .html/test_sklearn_explain_weights_test_explain_random_forest_RandomForestClassifier.html

[screenshot: 2016-12-14 18 52 34]

TODO:

  • check weights table styles
  • check styles in ipython notebook

defer generating dummy feature names in FeatureUnhasher

I think FeatureUnhasher.get_feature_names should have an option to use nan / None as feature names instead of generated FEATURE[%d] string names. Creating all these strings is the slowest part of this code, and it looks unnecessary, because the printing/formatting code can easily generate missing feature names itself.
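A rough sketch of what "the formatting code generates missing names itself" could look like. The helper `format_feature_name` and the sample list are hypothetical; the only thing taken from the issue is the FEATURE[%d] template and the idea of using None or nan as a marker for an unknown name.

```python
def format_feature_name(name, index, template="FEATURE[%d]"):
    """Return the real name if known, else build a placeholder on demand.

    Hypothetical helper: None and nan both mark an unknown feature name.
    """
    missing = name is None or name != name  # nan is the only value != itself
    return template % index if missing else name


# What an unhasher might return with the proposed option switched on.
feature_names = ["people", None, "considered", float("nan")]
shown = [format_feature_name(n, i) for i, n in enumerate(feature_names)]
print(shown)
```

The placeholder string is then built only for the few features that are actually displayed, instead of for every column up front.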

Unstable test test_lime_utils.py::test_fit_proba

https://travis-ci.org/TeamHG-Memex/eli5/jobs/173112065 - I think this is the same failure I saw before. I added random_state, but it did not help:

=================================== FAILURES ===================================
________________________________ test_fit_proba ________________________________
    def test_fit_proba():
        X = np.array([
            [0.0, 0.8],
            [0.0, 0.5],
            [1.0, 0.1],
            [0.9, 0.2],
            [0.7, 0.3],
        ])
        y_proba = np.array([
            [0.0, 1.0],
            [0.1, 0.9],
            [1.0, 0.0],
            [0.55, 0.45],
            [0.4, 0.6],
        ])
        y_bin = y_proba.argmax(axis=1)
    
        # fit on binary labels
        clf = LogisticRegression(C=10, random_state=42)
        clf.fit(X, y_bin)
        y_pred = clf.predict_proba(X)[:,1]
        mae = mean_absolute_error(y_proba[:,1], y_pred)
        print(y_pred, mae)
    
        # fit on probabilities
        clf2 = LogisticRegression(C=10, random_state=42)
        fit_proba(clf2, X, y_proba, expand_factor=200)
        y_pred2 = clf2.predict_proba(X)[:,1]
        mae2 = mean_absolute_error(y_proba[:,1], y_pred2)
        print(y_pred2, mae2)
    
        assert mae2 * 1.2 < mae
    
        # let's get 3th example really right
        sample_weight = np.array([0.1, 0.1, 0.1, 10.0, 0.1])
        clf3 = LogisticRegression(C=10, random_state=42)
        fit_proba(clf3, X, y_proba, expand_factor=200, sample_weight=sample_weight)
        y_pred3 = clf3.predict_proba(X)[:,1]
        print(y_pred3)
    
        val = y_proba[3][1]
        assert abs(y_pred3[3] - val) * 1.5 < abs(y_pred2[3] - val)
>       assert abs(y_pred3[3] - val) * 1.5 < abs(y_pred[3] - val)
E       assert (0.077946544208881308 * 1.5) < 0.10327808741270417
E        +  where 0.077946544208881308 = abs((0.3720534557911187 - 0.45000000000000001))
E        +  and   0.10327808741270417 = abs((0.34672191258729584 - 0.45000000000000001))
tests/test_lime_utils.py:53: AssertionError
----------------------------- Captured stdout call -----------------------------
[ 0.92137462  0.87156298  0.26152978  0.34672191  0.49837953] 0.114698148448
[ 0.99854408  0.90620802  0.1122826   0.31398412  0.59140365] 0.0529117527887
[ 0.9862338   0.94839957  0.23016764  0.37205346  0.59652343]

JSON serialization of Explanation

I think it makes sense to add something like an asdict method to Explanation that returns a JSON-serializable object (it would just call attr.asdict(self)).
We should also add a test that checks that the result is indeed JSON-serializable (right now it can contain some numpy ints that are not serializable).
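The numpy problem mentioned above could be handled by a small conversion pass over the dict before it is handed to json. The sketch below is an assumption about one possible approach, not eli5 code: `to_jsonable` is a hypothetical name, it relies on numpy scalars and arrays exposing a tolist() method, and it assumes dict keys are already plain strings.

```python
import json


def to_jsonable(obj):
    """Recursively convert obj into plain Python types json.dumps accepts.

    Hypothetical helper: anything with a tolist() method (numpy scalars
    and arrays have one) is converted through it; dict keys are left as-is.
    """
    if isinstance(obj, dict):
        return {k: to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    if hasattr(obj, "tolist"):
        return to_jsonable(obj.tolist())
    return obj


print(json.dumps(to_jsonable({"pos": [("x5", 6.83)], "pos_remaining": 0})))
```

An asdict method along the lines proposed here could then return to_jsonable(attr.asdict(self)), and the test would simply call json.dumps on the result.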

Make _weight_range and _weight_color functions from formatters.html public

And maybe some other functions as well? They are needed if we want to render weights in HTML the same way the html formatter does.
Another option would be to use an object instead of a (name, weight) tuple, and add an hsl_color attribute to it. I'm not sure which is better; making the functions public feels like less of a commitment.
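For readers unfamiliar with what such helpers do, here is a hedged sketch of a weight-to-color mapping. It is not the implementation in formatters.html: the function names mirror the ones in the title without the underscore, and the hue and lightness values are arbitrary assumptions chosen for illustration.

```python
def weight_range(weights):
    """Largest absolute weight, used to normalise color intensity."""
    return max((abs(w) for w in weights), default=0.0)


def weight_color(weight, max_abs):
    """Return a CSS hsl() string for a weight.

    Assumed scheme: green for positive, red for negative, and a paler
    color (higher lightness) the closer the weight is to zero.
    """
    hue = 120 if weight > 0 else 0
    ratio = abs(weight) / max_abs if max_abs else 0.0
    lightness = 100 - 20 * min(ratio, 1.0)  # from 100% (white) down to 80%
    return "hsl(%d, 100%%, %.2f%%)" % (hue, lightness)


weights = [2.0, -1.0, 0.5]
rng = weight_range(weights)
print([weight_color(w, rng) for w in weights])
```

With public functions like these, external code could color its own tables consistently with the built-in formatter without depending on the shape of the explanation objects.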

show_weights with OneVsRestClassifier

Hi guys, I really like this tool! I have a pipeline say

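from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
y_train = mlb.fit_transform(y_train)
vec = TfidfVectorizer(ngram_range=(1, 2), stop_words='english')
clf = OneVsRestClassifier(LogisticRegressionCV())
pipeline = make_pipeline(vec, clf)
pipeline.fit(X_train, y_train)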

show_prediction works neatly, but calling eli5.show_weights(clf.estimator, vec=vec, target_names=mlb.classes_) fails with "'LogisticRegressionCV' object has no attribute 'classes_'", and I get an unsupported class error if I pass clf directly.

Is it possible to work around this problem, or do you plan to add support for this soon?

Cheers!
Simon

allow to filter features by their names

Sometimes it is useful to check the coefficients of only some of the features. For example, here (scroll down to "What are important features?") one may want to check how e.g. query:... features affect the result, without looking at all the other features. This can also be helpful when adding a new feature.

What about adding a 'feature_re' or 'feature_patterns' argument to the explain_weights functions?
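The filtering itself is straightforward; a minimal sketch follows. The function `filter_features` and the sample data are hypothetical, and only the argument name feature_re comes from the proposal above.

```python
import re


def filter_features(feature_weights, feature_re):
    """Keep only (name, weight) pairs whose name matches feature_re.

    Hypothetical helper; uses re.search, so anchor the pattern with '^'
    to match from the start of the feature name.
    """
    pattern = re.compile(feature_re)
    return [(name, w) for name, w in feature_weights if pattern.search(name)]


feature_weights = [("query:foo", 1.2), ("title:bar", -0.3), ("query:baz", 0.1)]
print(filter_features(feature_weights, r"^query:"))
```

One open design question is whether filtering should happen before or after the top-N cut; filtering first means the "top" limit applies to the matching features only.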

Negative feature weights have different order in text and html

Order in text is wrong:

 $ py.test tests/test_sklearn_explain_prediction.py::test_explain_linear_regression[reg0] -s
============================================================================== test session starts ===============================================================================
platform darwin -- Python 3.5.1, pytest-3.0.2, py-1.4.31, pluggy-0.3.1
rootdir: /Users/kostia/shub/memex/eli5, inifile: 
plugins: hypothesis-3.4.2
collected 25 items 

tests/test_sklearn_explain_prediction.py {'estimator': 'ElasticNet(alpha=1.0, copy_X=True, fit_intercept=True, '
              'l1_ratio=0.5,\n'
              '      max_iter=1000, normalize=False, positive=False, '
              'precompute=False,\n'
              "      random_state=42, selection='cyclic', tol=0.0001, "
              'warm_start=False)',
 'method': 'linear model',
 'targets': [{'feature_weights': {'neg': [('x10', -19.656206335733643),
                                          ('x12', -16.947217711388856),
                                          ('x9', -3.368443508747657),
                                          ('x7', -0.73147197826808674)],
                                  'neg_remaining': 0,
                                  'pos': [('<BIAS>', 38.96972344614295),
                                          ('x5', 6.8348858609128671),
                                          ('x11', 4.8082096167385444),
                                          ('x8', 1.8485323743243427),
                                          ('x0', 0.23929256935816867)],
                                  'pos_remaining': 0},
              'score': 11.997304333338633,
              'target': 'y'}]}
Explained as: linear model
'y' (score=11.997) top features
----------------
 +38.970  <BIAS>
  +6.835  x5    
  +4.808  x11   
  +1.849  x8    
  +0.239  x0    
 -19.656  x10   
 -16.947  x12   
  -3.368  x9    
  -0.731  x7    

unhashing: sign of a feature can be confusing in case of collisions

A follow-up to #10 and #18: when deciding whether a feature belongs in the top positive or the top negative features, we should take into account the sign of the most popular term, e.g. instead of

(-)people | considered | approximately +1.739 (as it is now)

it would be better to show

people | (-)considered | (-)approximately -1.739
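The proposed display rule can be sketched as follows. The function `format_collision` and its input format are assumptions made for illustration: each colliding term carries its hashing sign (+1 or -1), the terms are ordered by popularity, and signs and weight are shown relative to the first term.

```python
def format_collision(terms, weight):
    """Format colliding terms relative to the sign of the most popular one.

    terms: list of (name, sign) ordered by popularity, sign is +1 or -1.
    weight: coefficient of the hashed column.
    Hypothetical helper, not the eli5 implementation.
    """
    if not terms:
        return "%+.3f" % weight
    lead_sign = terms[0][1]
    shown = " | ".join(
        ("(-)" if sign != lead_sign else "") + name for name, sign in terms
    )
    # The weight is flipped when the most popular term has a negative sign.
    return "%s %+.3f" % (shown, weight * lead_sign)


terms = [("people", -1), ("considered", 1), ("approximately", 1)]
print(format_collision(terms, 1.739))
```

For the example from this issue the sketch yields the second form, so the feature would be sorted among the negative features, which matches how the most popular term actually contributes.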

add helpers for non-text data to eli5.lime

add IPython interactive widget

A widget could allow changing options, e.g.:

  • change the number of features to show;
  • show only some of the classes;
  • filter features by name;
  • switch between layouts;
  • etc.
