
interactiontransformer's People

Contributors

jlevy44

Forkers

chenhz1223

interactiontransformer's Issues

Using InteractionTransformer in combination with a Neural Network

Dear jlevy44,

Once again I want to thank you for being able to use your code.

I have used the code with both XGBoost and Random Forest. Now I would like to use it with a neural network. I know that I can't use TreeExplainer to compute the SHAP values, but I can use KernelExplainer. However, KernelExplainer has no attribute 'shap_interaction_values'. Do you have any suggestions on how to implement this or otherwise make the code usable with a neural network?

Thanks in advance.

Kind regards,
Jeroen

Error in dimensions when using InteractionTransformer

Dear jlevy44,

I wanted to use the InteractionTransformer in combination with the XGBClassifier. Following your demo on GitHub, I run:

from xgboost import XGBClassifier
transformer = InteractionTransformer(
    untrained_model=XGBClassifier(random_state=42, tree_method='hist'),
    max_train_test_samples=1000,
    mode_interaction_extract=int(np.sqrt(X_train.shape[1])),
)
transformer.fit(X_train, y_train)

Where my X_train and y_train are dataframes with shape (700000,39) and (700000,1), respectively.

I get the following error:
---------------------------------------------------------------------------------
ValueError Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
1661 blocks = [
-> 1662 make_block(values=blocks[0], placement=slice(0, len(axes[0])))
1663 ]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in make_block(values, placement, klass, ndim, dtype)
2721
-> 2722 return klass(values, ndim=ndim, placement=placement)
2723

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in __init__(self, values, placement, ndim)
129 if self._validate_ndim and self.ndim and len(self.mgr_locs) != len(self.values):
--> 130 raise ValueError(
131 f"Wrong number of items passed {len(self.values)}, "

ValueError: Wrong number of items passed 1, placement implies 39

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
in
1 from xgboost import XGBClassifier
2 transformer=InteractionTransformer(untrained_model=XGBClassifier(random_state=42),max_train_test_samples=1000,mode_interaction_extract=int(np.sqrt(X_train.shape[1]))) # mode_interaction_extract='sqrt'
----> 3 transformer.fit(X_train,y_train)

~\InteractionTransformer.py in fit(self, X, y)
204 # import pickle
205 # pickle.dump(shap_vals,open('shap_test.pkl','wb'))
--> 206 true_top_interactions=self.get_top_interactions(shap_vals)
207 #print(true_top_interactions)
208 self.design_terms='+'.join((np.core.defchararray.add(np.vectorize(lambda x: "Q('{}')*".format(x))(true_top_interactions.iloc[:,0]),np.vectorize(lambda x: "Q('{}')".format(x))(true_top_interactions.iloc[:,1]))).tolist())

~\InteractionTransformer.py in get_top_interactions(self, shap_vals)
223
224 """
--> 225 interaction_matrix=pd.DataFrame(shap_vals.mean(0),columns=self.features,index=self.features)#reduce(lambda x,y:x+y,shap_vals)/len(shap_vals)
226 interation_matrix_self_interact_removed=interaction_matrix.copy()
227 if not self.self_interactions:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
495 mgr = init_dict({data.name: data}, index, columns, dtype=dtype)
496 else:
--> 497 mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
498
499 # For data is list-like, or Iterable (will consume into list)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in init_ndarray(values, index, columns, dtype, copy)
232 block_values = [values]
233
--> 234 return create_block_manager_from_blocks(block_values, [columns, index])
235
236

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
1670 blocks = [getattr(b, "values", b) for b in blocks]
1671 tot_items = sum(b.shape[0] for b in blocks)
-> 1672 raise construction_error(tot_items, blocks[0].shape[1:], axes, e)
1673
1674

ValueError: Shape of passed values is (39, 1), indices imply (39, 39)
---------------------------------------------------------------------------------

I then tried it with the data provided in your demo and everything worked fine. Do you know what could possibly go wrong?

Thanks in advance,

Hassan
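
The final message in the traceback says the averaged SHAP array had shape (39, 1) where a 39 x 39 matrix was expected. That is consistent with pandas receiving a 1-D vector of length 39 together with 39 row labels and 39 column labels. The following is a minimal diagnostic sketch, assuming the array passed to get_top_interactions was 2-D (samples x features) rather than 3-D (samples x features x features). The helper name mean_interaction_matrix is hypothetical and not part of the library.

```python
import numpy as np
import pandas as pd


def mean_interaction_matrix(shap_vals, features):
    """Average SHAP interaction values over samples, checking the shape first.

    Hypothetical helper for illustration; not part of interactiontransformer.
    """
    shap_vals = np.asarray(shap_vals)
    n = len(features)
    # Interaction values must be (n_samples, n_features, n_features).
    if shap_vals.ndim != 3 or shap_vals.shape[1:] != (n, n):
        raise ValueError(
            f"expected shape (n_samples, {n}, {n}), got {shap_vals.shape}"
        )
    return pd.DataFrame(shap_vals.mean(0), columns=features, index=features)


features = [f"f{i}" for i in range(39)]

# A 3-D array of interaction values averages to a 39 x 39 matrix.
ok = mean_interaction_matrix(np.zeros((10, 39, 39)), features)
print(ok.shape)

# A 2-D array (plain SHAP values) is rejected with a readable message
# instead of failing inside the pandas constructor.
try:
    mean_interaction_matrix(np.zeros((10, 39)), features)
except ValueError as err:
    print(err)
```

Printing the shape of the SHAP output before it reaches the DataFrame constructor would show which of the two cases applies to a given model and data set.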

Endless run with RandomForest

Dear jlevy44,

I am very thankful for being able to use your code. However, I ran into a problem while trying to use the code for the Random Forest Regressor. If I run:

transformer=InteractionTransformer(RandomForestRegressor(random_state = 42),max_train_test_samples=100,mode_interaction_extract=10, cv_scoring='r2',num_workers=8,compute_interaction_dask=False,use_background_data=False)
transformer.fit(X_train,y_train)

then the run finishes in 7 to 8 minutes and I get the results I want. But when I pass additional parameters to the Random Forest Regressor, for example:

transformer=InteractionTransformer(RandomForestRegressor(random_state = 42, n_estimators = 2000, max_features = 0.2, max_depth = 50, bootstrap = True),max_train_test_samples=100,mode_interaction_extract=10, cv_scoring='r2',num_workers=8,compute_interaction_dask=False,use_background_data=False)
transformer.fit(X_train,y_train)

then the code keeps running and never finishes.
I would really like to see the output for the Random Forest Regressor with all of these parameters, because that model fits my data much better. Do you know how to solve this problem?

Thanks in advance.

Kind regards,
Jeroen

XGBoostRegressor

Hello,
I am trying to use your transformer with XGBoostRegressor, but I keep getting the following error: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.
Do you have any suggestions on how to solve this problem?
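
The wording of this message matches the error that scikit-learn's StratifiedKFold raises when it is given a continuous target. Whether InteractionTransformer uses stratified splitting internally is an assumption here, not something confirmed by the report. A small sketch that reproduces the message with scikit-learn alone, using synthetic data:

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y_continuous = rng.normal(size=30)  # regression-style target

# Stratified splitting needs class labels, so a continuous target fails.
try:
    list(StratifiedKFold(n_splits=3).split(X, y_continuous))
    message = ""
except ValueError as err:
    message = str(err)
print(message)

# Plain K-fold ignores the target values and accepts a regression target.
folds = list(KFold(n_splits=3, shuffle=True, random_state=0).split(X, y_continuous))
print(len(folds))
```

If the error does come from a stratified splitter, a non-stratified splitter such as KFold is the usual choice for regression targets.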

Empty data frame encountered during demo run with epistasis.test.csv

Hi,

When running the demo file in Python, I encountered an error at the line "transformer.fit(X_train,y_train)". I uncommented lines 204, 205, and 208 of InteractionTransformer.py. The array in shap_test.pkl contains only NaN. The Python output and the error message are below:

Shap Interaction Size: (240, 56, 56)
Empty DataFrame
Columns: [feature_1, feature_2, shap_interaction_score]
Index: []
<pandas.core.indexing._iLocIndexer object at 0x2abb2b457770>
Traceback (most recent call last):
File "", line 1, in
File "/scratch/ducryf/int/interactiontransformer/InteractionTransformer.py", line 209, in fit
self.design_terms='+'.join((np.core.defchararray.add(np.vectorize(lambda x: "Q('{}')*".format(x))(true_top_interactions.iloc[:,0]),np.vectorize(lambda x: "Q('{}')".format(x))(true_top_interactions.iloc[:,1]))).tolist())
File "/usr/scratch/blauen/ducryf/miniconda3/lib/python3.7/site-packages/numpy/lib/function_base.py", line 2091, in __call__
return self._vectorize_call(func=func, args=vargs)
File "/usr/scratch/blauen/ducryf/miniconda3/lib/python3.7/site-packages/numpy/lib/function_base.py", line 2161, in _vectorize_call
ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
File "/usr/scratch/blauen/ducryf/miniconda3/lib/python3.7/site-packages/numpy/lib/function_base.py", line 2117, in _get_ufunc_and_otypes
raise ValueError('cannot call vectorize on size 0 inputs '
ValueError: cannot call vectorize on size 0 inputs unless otypes is set

I get the same error on two systems. One uses WSL (Windows Subsystem for Linux) with Ubuntu 20.04, Anaconda 4.9.2, and Python 3.8.5. The other runs CentOS 7.4.1708 with Miniconda 4.9.2 and Python 3.7.4.

How can I solve this issue?

Cheers,
Fabian
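
The traceback ends in np.vectorize because the table of top interactions is empty, and NumPy refuses to vectorize over size-0 inputs unless otypes is set. The sketch below shows one way to surface the underlying problem (all-NaN SHAP values) earlier and to build the same "Q('a')*Q('b')" formula string without np.vectorize. Both helper names are hypothetical and not part of the library, and the column names follow the printed empty DataFrame.

```python
import numpy as np
import pandas as pd


def all_nan(shap_vals):
    """Return True if every SHAP value is NaN (hypothetical diagnostic)."""
    return bool(np.isnan(np.asarray(shap_vals, dtype=float)).all())


def build_design_terms(top_interactions):
    """Join feature pairs into a formula string like "Q('a')*Q('b')+Q('c')*Q('d')".

    Hypothetical helper; raises a descriptive error on an empty table
    instead of failing inside np.vectorize.
    """
    if top_interactions.empty:
        raise ValueError(
            "no interactions were selected; check the SHAP values for NaN"
        )
    pairs = zip(top_interactions.iloc[:, 0], top_interactions.iloc[:, 1])
    return "+".join(f"Q('{a}')*Q('{b}')" for a, b in pairs)


top = pd.DataFrame(
    {
        "feature_1": ["a", "c"],
        "feature_2": ["b", "d"],
        "shap_interaction_score": [0.3, 0.1],
    }
)
terms = build_design_terms(top)
print(terms)  # Q('a')*Q('b')+Q('c')*Q('d')

# An array like the one described in shap_test.pkl is flagged up front.
print(all_nan(np.full((240, 56, 56), np.nan)))  # True
```

The guard only changes how the failure is reported. Why the SHAP interaction values are NaN in the first place still needs to be investigated separately.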
