jlevy44 / interactiontransformer
Extract meaningful interactions from machine learning models to obtain machine-learning performance with statistical model interpretability.
License: MIT License
Dear jlevy44,
Once again, thank you for making your code available.
I have used the code with both XGBoost and Random Forest, and I would now like to use it with a neural network. I know that I can't use TreeExplainer to compute the SHAP values for such a model, but I can use KernelExplainer. However, KernelExplainer has no attribute 'shap_interaction_values'. Do you have any suggestions on how to implement this, or otherwise make the code usable with a neural network?
Thanks in advance.
Kind regards,
Jeroen
Dear jlevy44,
I wanted to use the InteractionTransformer in combination with the XGBClassifier. Following your demo on GitHub, I run:
from xgboost import XGBClassifier
transformer=InteractionTransformer(untrained_model=XGBClassifier(random_state=42, tree_method='hist'),max_train_test_samples=1000,mode_interaction_extract=int(np.sqrt(X_train.shape[1])))
transformer.fit(X_train,y_train)
Here X_train and y_train are DataFrames with shapes (700000, 39) and (700000, 1), respectively.
I get the following error:
---------------------------------------------------------------------------------
ValueError Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
1661 blocks = [
-> 1662 make_block(values=blocks[0], placement=slice(0, len(axes[0])))
1663 ]
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in make_block(values, placement, klass, ndim, dtype)
2721
-> 2722 return klass(values, ndim=ndim, placement=placement)
2723
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in __init__(self, values, placement, ndim)
129 if self._validate_ndim and self.ndim and len(self.mgr_locs) != len(self.values):
--> 130 raise ValueError(
131 f"Wrong number of items passed {len(self.values)}, "
ValueError: Wrong number of items passed 1, placement implies 39
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
in
1 from xgboost import XGBClassifier
2 transformer=InteractionTransformer(untrained_model=XGBClassifier(random_state=42),max_train_test_samples=1000,mode_interaction_extract=int(np.sqrt(X_train.shape[1]))) # mode_interaction_extract='sqrt'
----> 3 transformer.fit(X_train,y_train)
~\InteractionTransformer.py in fit(self, X, y)
204 # import pickle
205 # pickle.dump(shap_vals,open('shap_test.pkl','wb'))
--> 206 true_top_interactions=self.get_top_interactions(shap_vals)
207 #print(true_top_interactions)
208 self.design_terms='+'.join((np.core.defchararray.add(np.vectorize(lambda x: "Q('{}')*".format(x))(true_top_interactions.iloc[:,0]),np.vectorize(lambda x: "Q('{}')".format(x))(true_top_interactions.iloc[:,1]))).tolist())
~\InteractionTransformer.py in get_top_interactions(self, shap_vals)
223
224 """
--> 225 interaction_matrix=pd.DataFrame(shap_vals.mean(0),columns=self.features,index=self.features)#reduce(lambda x,y:x+y,shap_vals)/len(shap_vals)
226 interation_matrix_self_interact_removed=interaction_matrix.copy()
227 if not self.self_interactions:
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
495 mgr = init_dict({data.name: data}, index, columns, dtype=dtype)
496 else:
--> 497 mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
498
499 # For data is list-like, or Iterable (will consume into list)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in init_ndarray(values, index, columns, dtype, copy)
232 block_values = [values]
233
--> 234 return create_block_manager_from_blocks(block_values, [columns, index])
235
236
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
1670 blocks = [getattr(b, "values", b) for b in blocks]
1671 tot_items = sum(b.shape[0] for b in blocks)
-> 1672 raise construction_error(tot_items, blocks[0].shape[1:], axes, e)
1673
1674
ValueError: Shape of passed values is (39, 1), indices imply (39, 39)
---------------------------------------------------------------------------------
I then tried it with the data provided in your demo, and everything worked fine. Do you know what could be going wrong with my data?
Thanks in advance,
Hassan
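The traceback shows that get_top_interactions builds a DataFrame from shap_vals.mean(0) with the feature names as both index and columns, so the averaged array has to be square (39 by 39 here). The sketch below is only a diagnostic aid, not part of the package: check_interaction_shape is a hypothetical helper, and the zero-filled arrays are stand-ins for real SHAP output. It reproduces the mismatch with an array whose mean over axis 0 has shape (39, 1).

```python
import numpy as np
import pandas as pd


def check_interaction_shape(shap_vals, features):
    """Hypothetical helper: confirm the averaged interaction values form a
    square (n_features, n_features) matrix before building the DataFrame."""
    mean_vals = np.asarray(shap_vals).mean(0)
    expected = (len(features), len(features))
    if mean_vals.shape != expected:
        raise ValueError(
            f"mean interaction matrix has shape {mean_vals.shape}, expected {expected}"
        )
    return pd.DataFrame(mean_vals, columns=features, index=features)


features = [f"f{i}" for i in range(39)]  # placeholder feature names

# Well-formed stand-in: (n_samples, n_features, n_features)
good = np.zeros((5, 39, 39))
matrix = check_interaction_shape(good, features)

# Malformed stand-in: averaging over axis 0 leaves (39, 1), as in the traceback
bad = np.zeros((5, 39, 1))
try:
    check_interaction_shape(bad, features)
    bad_message = None
except ValueError as exc:
    bad_message = str(exc)

print(matrix.shape)
print(bad_message)
```

Printing shap_vals.shape just before line 225 would show which axis is wrong. The traceback alone does not show what produces the unexpected shape.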
Dear jlevy44,
I am very thankful for being able to use your code. However, I ran into a problem while trying to use the code for the Random Forest Regressor. If I run:
"
transformer=InteractionTransformer(RandomForestRegressor(random_state = 42),max_train_test_samples=100,mode_interaction_extract=10, cv_scoring='r2',num_workers=8,compute_interaction_dask=False,use_background_data=False)
transformer.fit(X_train,y_train)
"
Then the code finishes in 7 to 8 minutes, and I get the results that I want. But when I pass additional parameters to the Random Forest Regressor, for example:
"
transformer=InteractionTransformer(RandomForestRegressor(random_state = 42, n_estimators = 2000, max_features = 0.2, max_depth = 50, bootstrap = True),max_train_test_samples=100,mode_interaction_extract=10, cv_scoring='r2',num_workers=8,compute_interaction_dask=False,use_background_data=False)
transformer.fit(X_train,y_train)
"
then the code keeps running and never finishes.
I really want to see the output for the Random Forest Regressor with all the specified parameters because this model has a much better fit on my data. Do you know how to solve the problem?
Thanks in advance.
Kind regards,
Jeroen
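A rough back-of-the-envelope estimate may help judge whether the second run is stuck or just slow. The sketch assumes that the work of fitting and explaining a tree ensemble grows about linearly with the number of trees, and that the first run used scikit-learn's default of 100 trees (the default in recent versions). Both are assumptions, estimated_minutes is a hypothetical helper, and the other changed parameters are ignored.

```python
def estimated_minutes(baseline_minutes, baseline_trees, new_trees):
    """Hypothetical estimate: scale a measured runtime linearly with tree count."""
    return baseline_minutes * new_trees / baseline_trees


# 7 to 8 minutes measured with the assumed default of 100 trees
low = estimated_minutes(7, 100, 2000)
high = estimated_minutes(8, 100, 2000)
print(f"roughly {low:.0f} to {high:.0f} minutes")
```

Under these assumptions the 2000-tree run would need a couple of hours rather than minutes, so a long wait by itself does not indicate a hang.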
Hello,
I am trying to use your transformer with XGBoostRegressor, but I keep getting the following error: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.
Do you have any suggestions on how to solve this problem?
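This message is the one scikit-learn's StratifiedKFold raises when it is given a continuous target. The sketch below reproduces it with scikit-learn alone on toy data. Whether the transformer uses a stratified splitter internally is an assumption here, not something this thread confirms.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Toy data: 10 samples, 2 features, continuous (regression) target
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.linspace(0.0, 1.0, 10)

# Stratified splitting needs class labels, so a continuous target is rejected
try:
    list(StratifiedKFold(n_splits=2).split(X, y))
    stratified_error = None
except ValueError as exc:
    stratified_error = str(exc)

# Plain K-fold splitting does not inspect the target values
n_plain_folds = len(list(KFold(n_splits=2).split(X, y)))

print(stratified_error)
print(n_plain_folds)
```

If the error does come from a stratified split, a regression target would need a non-stratified splitter such as KFold.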
Hi,
When running the demo file in Python, I encountered an error at the line "transformer.fit(X_train,y_train)". I uncommented lines 204, 205, and 208 in InteractionTransformer.py. The array in shap_test.pkl contains only NaN. The Python output and the error message are below:
Shap Interaction Size: (240, 56, 56)
Empty DataFrame
Columns: [feature_1, feature_2, shap_interaction_score]
Index: []
<pandas.core.indexing._iLocIndexer object at 0x2abb2b457770>
Traceback (most recent call last):
File "", line 1, in
File "/scratch/ducryf/int/interactiontransformer/InteractionTransformer.py", line 209, in fit
self.design_terms='+'.join((np.core.defchararray.add(np.vectorize(lambda x: "Q('{}')*".format(x))(true_top_interactions.iloc[:,0]),np.vectorize(lambda x: "Q('{}')".format(x))(true_top_interactions.iloc[:,1]))).tolist())
File "/usr/scratch/blauen/ducryf/miniconda3/lib/python3.7/site-packages/numpy/lib/function_base.py", line 2091, in __call__
return self._vectorize_call(func=func, args=vargs)
File "/usr/scratch/blauen/ducryf/miniconda3/lib/python3.7/site-packages/numpy/lib/function_base.py", line 2161, in _vectorize_call
ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
File "/usr/scratch/blauen/ducryf/miniconda3/lib/python3.7/site-packages/numpy/lib/function_base.py", line 2117, in _get_ufunc_and_otypes
raise ValueError('cannot call `vectorize` on size 0 inputs '
ValueError: cannot call `vectorize` on size 0 inputs unless `otypes` is set
I get the same error on two systems. One uses WSL (Windows Subsystem for Linux) with Ubuntu 20.04, Anaconda 4.9.2, and Python 3.8.5. The other runs CentOS 7.4.1708 with Miniconda 4.9.2 and Python 3.7.4.
How can I solve this issue?
Cheers,
Fabian
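The final ValueError can be reproduced with NumPy alone: np.vectorize refuses empty input unless otypes is given, which matches the empty DataFrame of top interactions printed above. The sketch below uses stand-in data only. It shows the failure, the effect of passing otypes, and an all-NaN check of the kind that could be run on the pickled array. It does not explain why the SHAP interaction values are NaN in the first place.

```python
import numpy as np


def fmt(x):
    """Same formatting as in the traceback: wrap a feature name in Q('...')."""
    return "Q('{}')".format(x)


empty = np.array([], dtype=object)  # stand-in for an empty column of feature names

# Without otypes, vectorize cannot infer an output type from empty input
try:
    np.vectorize(fmt)(empty)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# With otypes set, empty input simply yields an empty result
safe = np.vectorize(fmt, otypes=[str])(empty)

# Stand-in for the pickled array: same shape as reported, filled with NaN
shap_vals = np.full((240, 56, 56), np.nan)
all_nan = bool(np.isnan(shap_vals).all())

print(error_message)
print(safe.size, all_nan)
```

Setting otypes would only hide the symptom. With no interactions selected, the underlying problem remains the all-NaN SHAP array.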