Experiments of the ITEA paper

OBS: although this repository contains a standalone implementation of ITEA, it is outdated. It was written to serve the specific purpose of the paper. I highly recommend using the high-performing Haskell version (which comes with a Python wrapper) by @folivetti, or my most up-to-date version (the only one that I am maintaining).


In order to validate our proposed algorithm, several other methods were fine-tuned through a grid search process and then applied to the same set of problems.

We also performed some specific investigations, such as the marginal effects of the expressions generated by the ITEA, SymTree, and FEAT algorithms.

Finally, a Bonferroni-adjusted Wilcoxon test was performed between ITEA and every other algorithm.
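As a rough illustration of the Bonferroni adjustment mentioned above, the sketch below multiplies each raw p-value by the number of comparisons and caps the result at 1. The function name `bonferroni_adjust`, the significance level, and the p-values are placeholders chosen for illustration; they are not results from the paper, and the actual analysis code lives in ./src/analysis. The raw p-values are assumed to come from a paired Wilcoxon test computed elsewhere (for example with `scipy.stats.wilcoxon`).

```python
def bonferroni_adjust(p_values, alpha=0.05):
    """Bonferroni-adjust a dict of raw p-values.

    Returns {name: (adjusted_p, reject_null)}, where adjusted_p is
    min(1, p * m) and m is the number of comparisons.
    """
    m = len(p_values)
    results = {}
    for name, p in p_values.items():
        adjusted = min(1.0, p * m)
        results[name] = (adjusted, adjusted < alpha)
    return results


# Placeholder p-values (NOT taken from the paper): one paired test of
# ITEA against each competing algorithm.
raw_p_values = {"gplearn": 0.004, "FEAT": 0.03, "SymTree": 0.2}
adjusted = bonferroni_adjust(raw_p_values)

for name, (p_adj, reject) in adjusted.items():
    print(f"ITEA vs {name}: adjusted p = {p_adj:.3f}, reject H0: {reject}")
```

With three comparisons, a raw p-value has to be below 0.05 / 3 to remain significant after the adjustment, which is why the second placeholder comparison no longer rejects the null hypothesis.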

To make the results more transparent and to share our methodology, this repository organizes the source code, data sets, and results used in the paper.

The next sections present the folder structure, followed by a detailed description of the main folders.

Citing us:
@misc{defranca2020interactiontransformation,
      title={Interaction-Transformation Evolutionary Algorithm for Symbolic Regression}, 
      author={Fabricio Olivetti de Franca and Guilherme Seidyo Imai Aldeia},
      year={2020},
      eprint={1902.03983},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Folder structure

.
├── datasets 
│   ├── commaSeparated
│   └── tabSeparated
├── docs
│   └── GSGP
│       ├── GSGP_documentation
│       ├── GSGP_examples
│       └── GSGP_original_code
├── results
│   ├── disentanglement
│   ├── gridsearch
│   ├── iteaMarginalEffect
│   └── rmse
└── src
    ├── analysis
    │   ├── RMSEs
    │   ├── disentanglement
    │   ├── hypothesysTesting
    │   └── marginalEffect
    └── gridsearch
        ├── GSGP-gridsearch
        └── itea-gridsearch

22 directories

  • datasets: contains the data sets used in the paper, already split into 5 folds. This ensures that every algorithm is trained and tested on the same partitions, regardless of how the random number generator is seeded. GSGP requires tab-separated input, so there are two folders holding the same data with different separators. This folder also contains a script that splits a data set into the 5 fold files.
  • docs: the original documentation of the GSGP is in this folder.
  • results
    • disentanglement: CSV files containing the expressions and disentanglement measures for the ITEA, SymTree, and FEAT (full) algorithms;
    • gridsearch: some of the studied algorithms are slow, and running a grid search over them takes even longer. To cope with this, the grid search of specific algorithms can be interrupted and resumed from checkpoints. This is achieved with a file that stores the RMSE of each fold for each configuration. The files here are not meant to be used directly; they serve only as checkpoints.
    • iteaMarginalEffect: notebook to plot marginal effects for ITEA expressions.
    • rmse: files with the RMSE on the train and test partitions for every fold of every data set, obtained with the best configuration found in the grid search. These are the results reported in the paper.
  • src
    • analysis: source code for all analyses performed in the paper (statistical tests, marginal effects analysis, disentanglement, etc.).
    • gridsearch: source code to perform the grid search and evaluate all algorithms, producing the results inside ./results/rmse.
      • GSGP is executed from a Jupyter notebook to facilitate debugging (Python is used mainly to run shell instructions) and lives in a separate folder because the C++ implementation requires auxiliary files. There is also a Python script with the content of the notebook.
      • Lasso, LassoLars, Ridge, Tree, Forest, kNN, and elnet results are obtained with the GridSearchCV and regressor implementations provided by scikit-learn. Using the script is straightforward.
      • dcgp, feat, gplearn, and itea are evolutionary algorithms and use our own grid search implementation. For reference, the implementation creates a file enumerating the possible configurations of the grid search.
        • gplearn, itea, and feat are designed to create checkpoints during the grid search because of their long execution times.
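
The checkpointing idea described above (one file storing the RMSE of each fold for each configuration, so that an interrupted grid search can resume) can be sketched as follows. This is a minimal illustration and not the code in ./src/gridsearch: the CSV layout, the column names, and the names `load_checkpoint`, `gridsearch_with_checkpoints`, and `evaluate` are all assumptions made for the example.

```python
import csv
import itertools
import os


def load_checkpoint(path):
    """Read already-evaluated (config, fold) pairs and their RMSE."""
    done = {}
    if os.path.exists(path):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                done[(row["config"], int(row["fold"]))] = float(row["rmse"])
    return done


def config_key(config):
    """Stable string identifier for a configuration (hypothetical format)."""
    return repr(sorted(config.items()))


def gridsearch_with_checkpoints(param_grid, n_folds, evaluate, path):
    """Evaluate every configuration on every fold, skipping stored results.

    `evaluate(config, fold)` is a user-supplied callable returning the RMSE.
    Returns the configuration with the lowest mean RMSE across folds.
    """
    names = sorted(param_grid)
    configs = [
        dict(zip(names, values))
        for values in itertools.product(*(param_grid[n] for n in names))
    ]
    done = load_checkpoint(path)
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0

    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["config", "fold", "rmse"])
        if write_header:
            writer.writeheader()
        for config in configs:
            key = config_key(config)
            for fold in range(n_folds):
                if (key, fold) in done:
                    continue  # already computed in a previous run
                rmse = evaluate(config, fold)
                writer.writerow({"config": key, "fold": fold, "rmse": rmse})
                f.flush()  # persist each result so an interruption loses little
                done[(key, fold)] = rmse

    def mean_rmse(config):
        key = config_key(config)
        return sum(done[(key, fold)] for fold in range(n_folds)) / n_folds

    return min(configs, key=mean_rmse)
```

Because every result is appended and flushed as soon as it is available, rerunning the same call after an interruption only evaluates the (configuration, fold) pairs that are still missing from the file.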

Acknowledgments

This project is supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), grant number 2018/14173-8.