genesim's Introduction

GENESIM: GENetic Extraction of a Single, Interpretable Model

This repository contains an algorithm that first constructs an ensemble using well-known decision tree induction algorithms such as CART, C4.5, QUEST and GUIDE, combined with bagging and boosting. This ensemble is then converted to a single, interpretable decision tree in a genetic fashion. For a fixed number of iterations, random pairs of decision trees are merged by first converting each tree to a set of k-dimensional hyperplanes and then calculating the intersection of the two sets (a classic problem from computational geometry). In each iteration, an individual is also mutated with a certain probability. After these iterations, the accuracy on a validation set is measured for each decision tree in the population, and the one with the highest accuracy (and the lowest number of nodes in case of a tie) is returned.

example.py runs all implemented algorithms and reports their average predictive performance, computational complexity and model complexity on a number of datasets.
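
The genetic procedure described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function name genetic_merge and the callables merge, mutate, accuracy and node_count are hypothetical placeholders, the hyperplane intersection itself is left abstract, and the sketch merges only one pair per iteration with no pruning of the population.

```python
import random
from typing import Callable, List, Sequence, TypeVar

Tree = TypeVar("Tree")


def genetic_merge(
    population: Sequence[Tree],
    merge: Callable[[Tree, Tree], Tree],    # hypothetical: merges two trees
    mutate: Callable[[Tree], Tree],         # hypothetical: mutates one tree
    accuracy: Callable[[Tree], float],      # hypothetical: validation accuracy
    node_count: Callable[[Tree], int],      # hypothetical: model complexity
    iterations: int = 10,
    mutation_prob: float = 0.1,
    seed: int = 0,
) -> Tree:
    """Merge random pairs, occasionally mutate, then return the best tree."""
    rng = random.Random(seed)
    pool: List[Tree] = list(population)
    for _ in range(iterations):
        # Merge a random pair of individuals and add the child to the pool.
        parent_a, parent_b = rng.sample(pool, 2)
        pool.append(merge(parent_a, parent_b))
        # With probability mutation_prob, replace one individual by its mutant.
        if rng.random() < mutation_prob:
            idx = rng.randrange(len(pool))
            pool[idx] = mutate(pool[idx])
    # Highest validation accuracy wins; ties go to the tree with fewer nodes.
    return max(pool, key=lambda t: (accuracy(t), -node_count(t)))
```

The final line encodes the selection rule from the text: sorting on the pair (accuracy, negated node count) means that a smaller tree is preferred only when the accuracies are equal.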

Dependencies

An install.sh script is provided that installs all required dependencies.

Documentation

A documentation page is available in the doc/ directory. Download the complete directory and open index.html.

Decision Tree Induction Algorithm Wrappers

A wrapper is written around Orange C4.5, sklearn CART, GUIDE and QUEST. The returned object is a Decision Tree, which can be found in decisiontree.py. Moreover, different methods are available on this decision tree: classify new, unknown samples; visualise the tree; export it to string, JSON and DOT; etc.

Ensemble Technique Wrappers

A wrapper is written around the well-known ensemble techniques XGBoost and Random Forests.

Similar techniques

A wrapper written around the R package inTrees and an implementation of ISM can be found in the constructors package.

New dataset

A new dataset can easily be plugged into the benchmark. To do so, write a load_dataset() function in load_datasets.py.
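
A loader of this kind might look like the sketch below. The expected signature and return value of load_dataset() are not documented here, so everything in this example is an assumption: the function name load_toy_dataset, the returned tuple (DataFrame, feature columns, label column, dataset name) and the inline toy data are all hypothetical. The tuple shape is guessed from the construct_classifier(train, feature_cols, label_col) call that appears in the traceback further down.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; a real loader would read from a file path.
_TOY_CSV = (
    "sepal_length,sepal_width,species\n"
    "5.1,3.5,setosa\n"
    "7.0,3.2,versicolor\n"
    "6.3,3.3,virginica\n"
)


def load_toy_dataset():
    """Hypothetical loader returning (df, feature_cols, label_col, name)."""
    df = pd.read_csv(io.StringIO(_TOY_CSV))
    label_col = "species"
    feature_cols = [c for c in df.columns if c != label_col]
    # Encode string labels as integer codes (categories sorted alphabetically).
    df[label_col] = df[label_col].astype("category").cat.codes
    return df, feature_cols, label_col, "toy"


df, feature_cols, label_col, name = load_toy_dataset()
```

Check the existing functions in load_datasets.py for the actual return convention before adding a loader along these lines.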

Contact

You can contact me at givdwiel.vandewiele at ugent.be for any questions, proposals or if you wish to contribute.

Referring

Please cite this work when you use it, either by referring to this GitHub repository or to the following (as yet unpublished) paper:

@article{vandewiele2016genesim,
  title={GENESIM: genetic extraction of a single, interpretable model},
  author={Vandewiele, Gilles and Janssens, Olivier and Ongenae, Femke and De Turck, Filip and Van Hoecke, Sofie},
  journal={arXiv preprint arXiv:1611.05722},
  year={2016}
}

genesim's People

Contributors

gillesvandewiele, jamlamberti

genesim's Issues

Vectorize code

Remove as many for-loops as possible and vectorize as much as possible

Evaluation TODO's

  • Do an evaluation with oracle method (LIME?) where predictions of the black box are used as labels to train a new decision tree. Compare GENESIM with this oracle method for Neural Nets, XGB and RF.

  • Re-run all evaluations and put the accuracy of the initial ensemble (that gets merged by GENESIM) in the table as well (how much of the accuracy remains after GENESIM?)
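
The oracle baseline from the first item could be prototyped roughly as below. This is a generic surrogate-tree sketch using scikit-learn, not LIME and not part of GENESIM: a Random Forest stands in for the black box, its predictions on the training set replace the true labels, and a shallow decision tree is fitted on them. The dataset, the depth limit and the variable names are illustrative choices.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# 1. Train the black box on the true labels.
black_box = RandomForestClassifier(n_estimators=100, random_state=0)
black_box.fit(X_train, y_train)

# 2. Use the black box's predictions as labels for a single, small tree.
oracle_labels = black_box.predict(X_train)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, oracle_labels)

# 3. Fidelity: agreement with the black box. Accuracy: agreement with the truth.
fidelity = (surrogate.predict(X_test) == black_box.predict(X_test)).mean()
accuracy = surrogate.score(X_test, y_test)
print(f"fidelity={fidelity:.3f} accuracy={accuracy:.3f}")
```

Reporting the surrogate's accuracy next to the accuracy of the black box gives the kind of "how much accuracy remains" comparison that the second item asks for.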

Can you help me with some environment installation issues?

Bro, I am a computer graduate student studying your paper, and I would like to try to reproduce it (date: 10 Dec 2023). Unfortunately, I'm currently having problems configuring my environment, and I was wondering if you could help me see if I can fix it 😂🤞.
My OS is Windows 10 and I can't run install.sh directly, so I copied the commands and executed them one by one, but there is an error when installing the "rpy2" package. I tried to skip this step, but then I can't execute "example.py". Is it possible to remove this package or to find an alternative? Here is the error message.

pip list info

(venv) E:\study\Python_Workspace\GENESIM-master>pip list
Package             Version
------------------- ------------
contourpy           1.1.1       
cycler              0.12.1      
fonttools           4.46.0      
graphviz            0.20.1      
imbalanced-learn    0.11.0      
importlib-resources 6.1.1       
joblib              1.3.2       
scikit-learn        1.3.2
scipy               1.10.1
setuptools          69.0.2
six                 1.16.0
threadpoolctl       3.2.0
tzdata              2023.3
wheel               0.41.2
xgboost             2.0.2
zipp                3.17.0

Give list of decision trees as parameter instead of constructors

In the ISM, inTrees and GENESIM class, TreeConstructors are given along as a parameter to the merging functions. Then in these functions, decision trees are constructed using these TreeConstructors. The code would look nicer if instead a list of Decision Trees was already given along as a parameter.

Unable to run the example provided

After following the instructions to set up GENESIM, I tried to run it for just one dataset (the wine dataset), but I got exceptions when the CART algorithm tries to build the trees.
The following is the output I get:

CART
Traceback (most recent call last):
  File "example.py", line 65, in <module>
    clf = algorithms[algorithm].construct_classifier(train, feature_cols, label_col)
  File "D:\GENESIM\GENESIM\constructors\treeconstructor.py", line 237, in construct_classifier
    shuffle=True, random_state=None))
  File "D:\GENESIM\GENESIM\constructors\treeconstructor.py", line 336, in get_best_cart_classifier
    tree = cart.construct_classifier(train_tune, X_train_tune.columns, label_col, param_opt=False)
  File "D:\GENESIM\GENESIM\constructors\treeconstructor.py", line 249, in construct_classifier
    self.dt.fit(self.X, self.y)
  File "C:\Users\Naveen Kaushik\Anaconda2\lib\site-packages\sklearn\tree\tree.py", line 739, in fit
    X_idx_sorted=X_idx_sorted)
  File "C:\Users\Naveen Kaushik\Anaconda2\lib\site-packages\sklearn\tree\tree.py", line 199, in fit
    % self.min_samples_split)
ValueError: min_samples_split must be at least 2 or in (0, 1], got 1

Before CART, it built trees with XGBoost (although with an exception related to Bayesian optimization).

What could be the possible reason for this?
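
The error message itself states the constraint: scikit-learn accepts min_samples_split either as an integer of at least 2 or as a fraction in (0, 1], and here it received 1. One possible workaround is to clamp the value before it reaches the classifier, as in the sketch below. The helper safe_min_samples_split is hypothetical and not part of GENESIM; where the value 1 originates in treeconstructor.py is not shown in the traceback, so where to apply such a guard is an assumption.

```python
def safe_min_samples_split(value):
    """Coerce a tuned value into the range scikit-learn accepts.

    Floats in (0, 1] are kept as fractions of the sample count;
    everything else is truncated to an integer and clamped to at least 2.
    """
    if isinstance(value, float) and 0.0 < value <= 1.0:
        return value
    return max(2, int(value))


# Hypothetical usage before building the tree:
# DecisionTreeClassifier(min_samples_split=safe_min_samples_split(candidate))
```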
