I am learning how to use LIME to explain my data/model. When I was using "lime/doc/not

Use of scaler for continuous variables about lime HOT 3 CLOSED

marcotcr commented on July 23, 2024

Comments (3)

marcotcr commented on July 23, 2024

Please ignore that notebook : ). There is a pr that should make regression easier.

Anyway, to your question:
The default behavior for tabular data is discretization because it is hard to reason about weights with non-discretized data: you have to multiply the weight by the feature value to get the 'contribution'. Then you get 'double negatives' (a negative weight times a negative feature value gives a positive contribution), and the like.
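To make the 'double negative' point concrete, here is a minimal sketch with made-up numbers. The weights and feature values are hypothetical and not taken from any real explanation; the only thing illustrated is that the sign of the contribution depends on both the weight and the feature value.

```python
def contribution(weight, value):
    """Contribution of a non-discretized feature: weight times feature value."""
    return weight * value

# Hypothetical (weight, feature value) pairs.
examples = [(-0.5, -2.0), (-0.5, 2.0), (0.5, -2.0)]
contributions = [contribution(w, v) for w, v in examples]

for (w, v), c in zip(examples, contributions):
    print(f"weight={w:+.1f}, value={v:+.1f} -> contribution={c:+.1f}")
```

The first pair has a negative weight and a negative value, yet contributes positively, which is why a negative weight alone cannot be read as "this feature pushes the prediction down".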

You are right that the distance is not used for weighting when discretization is on. That is a bug.
You are also right in noting that if the discretization is too broad, the explanations may not be appropriate. I would recommend trying deciles, or even trying discretize_continuous=False, but with the caveat that explanations require a little more thought for interpreting in that case.
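As a rough illustration of why finer bins can matter, the sketch below bins one continuous feature by quartiles and by deciles using only the Python standard library. It does not call lime; the data and the helper bin_index are hypothetical. In lime itself, the corresponding settings are, as far as I know, the discretizer and discretize_continuous arguments of LimeTabularExplainer.

```python
import bisect
import statistics

def bin_index(value, cut_points):
    """Return the index of the bin that `value` falls into."""
    return bisect.bisect_right(cut_points, value)

# Hypothetical training values for one continuous feature: 1..100.
values = list(range(1, 101))

quartile_cuts = statistics.quantiles(values, n=4)   # 3 cut points, 4 bins
decile_cuts = statistics.quantiles(values, n=10)    # 9 cut points, 10 bins

# Two fairly different values share a quartile bin but not a decile bin.
a, b = 30, 45
same_quartile = bin_index(a, quartile_cuts) == bin_index(b, quartile_cuts)
same_decile = bin_index(a, decile_cuts) == bin_index(b, decile_cuts)
print(same_quartile, same_decile)
```

With quartiles, both values are described by the same discretized feature, so an explanation cannot distinguish them; with deciles they are separated.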

irissandiego commented on July 23, 2024

Thanks! That's very helpful.

I was trying to get the top 3 explanations/features for about 300K records for a model with 150 variables, and it took a really long time (a few days). I am wondering whether it should take this long or whether it's because of the way I am using it. I have an xgboost model, and I cannot use the xgboost classifier in sklearn because I have to use xgboost's monotonic constraints, so I wrote this predict_fn to make the xgboost prediction work with LIME (I have a binary tag):

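import numpy as np
import pandas as pd
import xgboost

sample_size = 5000
n_features = 150

# `features` (column names) and `model` (assumed to be a trained xgboost Booster) are defined elsewhere.
def predict_fn(x):
    # LIME passes one row per perturbed sample; -1 lets the row count vary.
    df = pd.DataFrame(np.asarray(x).reshape(-1, n_features), columns=features)
    dtest = xgboost.DMatrix(df)
    preds = model.predict(dtest).reshape(-1, 1)  # probability of class 1
    p0 = 1 - preds  # probability of class 0
    return np.hstack((p0, preds))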

Here is how I print the reasons:

for i in range(len(X_test)):
    # .values replaces the deprecated as_matrix(); print() works on Python 2 and 3.
    exp = explainer.explain_instance(X_test.iloc[i].values, predict_fn, num_features=3, num_samples=sample_size)
    print(exp.as_list())

Should it take this long? Or is it the workaround I am using, or as_list()?

marcotcr commented on July 23, 2024

It sounds about right.
You are having the model predict 5K * 300K times, which totals 1.5 billion predictions. If it's taking you ~3.5 days, your model is making around 5000 predictions per second, which is about what I would expect depending on your machine.
If you want to make it faster, make the sample_size smaller.
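The arithmetic behind that estimate can be sketched as follows. The 1,000-sample figure is a hypothetical alternative chosen for illustration, and the projection assumes that runtime scales linearly with the number of model predictions, ignoring any overhead in LIME itself.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def total_predictions(num_records, num_samples):
    """Model calls needed: one per perturbed sample per explained record."""
    return num_records * num_samples

def estimated_days(num_records, num_samples, predictions_per_second):
    """Projected runtime in days at a fixed prediction throughput."""
    seconds = total_predictions(num_records, num_samples) / predictions_per_second
    return seconds / SECONDS_PER_DAY

num_records = 300_000

# Throughput implied by 5,000 samples per record taking about 3.5 days.
observed_rate = total_predictions(num_records, 5_000) / (3.5 * SECONDS_PER_DAY)

# Hypothetical smaller sample size, assuming the same throughput.
days_with_1000_samples = estimated_days(num_records, 1_000, observed_rate)

print(total_predictions(num_records, 5_000))
print(round(observed_rate))
print(round(days_with_1000_samples, 2))
```

Under these assumptions, cutting the sample size by a factor of five cuts the projected runtime by the same factor, at the cost of fitting each local explanation on fewer perturbed samples.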
