OpenFE: An efficient automated feature generation tool

OpenFE is a new framework for automated feature generation for tabular data. OpenFE is easy to use, effective, and efficient, with the following advantages:

  • OpenFE can discover effective candidate features for improving the learning performance of both GBDT and neural networks.
  • OpenFE is efficient and supports parallel computing.
  • OpenFE covers 23 useful and effective operators for generating candidate features.
  • OpenFE supports binary classification, multiclass classification, and regression tasks.
  • OpenFE can automatically handle missing values and categorical features.
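
To make "operators for generating candidate features" concrete, here is a minimal sketch in plain Python of three operators of the kind a feature generation tool might apply to a table. The operator names, column names, and toy data are illustrative assumptions and are not taken from OpenFE's implementation.

```python
from collections import defaultdict

# Toy table stored column-wise; the column names and values are hypothetical.
table = {
    "city": ["A", "A", "B", "B"],
    "income": [40.0, 60.0, 30.0, 50.0],
    "rooms": [2.0, 3.0, 1.0, 5.0],
}


def divide(a, b):
    """Row-wise ratio of two numeric columns (None where the divisor is 0)."""
    return [x / y if y != 0 else None for x, y in zip(a, b)]


def group_by_then_mean(values, keys):
    """Replace each value with the mean of its group, where `keys` defines the groups."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, key in zip(values, keys):
        sums[key] += value
        counts[key] += 1
    return [sums[key] / counts[key] for key in keys]


def frequency(keys):
    """Relative frequency of each row's category."""
    counts = defaultdict(int)
    for key in keys:
        counts[key] += 1
    return [counts[key] / len(keys) for key in keys]


# Each candidate feature is a new column derived from existing columns.
candidates = {
    "income / rooms": divide(table["income"], table["rooms"]),
    "mean(income) by city": group_by_then_mean(table["income"], table["city"]),
    "freq(city)": frequency(table["city"]),
}

for name, column in candidates.items():
    print(name, column)
```

A feature generation tool would enumerate many such candidates and keep only those that improve the downstream model; that selection step is not shown here.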

For further details, please refer to the paper.

Extensive comparison experiments on public datasets show that OpenFE outperforms existing feature generation methods in both effectiveness and efficiency. Moreover, we validate OpenFE on the IEEE-CIS Fraud Detection Kaggle competition and show that a simple XGBoost model with features generated by OpenFE beats 99.3% of 6351 data science teams. The features generated by OpenFE result in a larger performance improvement than the features provided by the first-place team in the competition.

🔥 News

  • [2023-06-25]: The code and datasets to reproduce the results in our paper are now available at OpenFE_reproduce. Please note that the code for OpenFE in OpenFE_reproduce is not the most recent version, as it is intended solely for reproduction purposes. Typically, employing the latest version here will yield superior performance.
  • [2023-04-26]: OpenFE has been accepted by ICML2023!

🏴󠁶󠁵󠁭󠁡󠁰󠁿 Get Started and Documentation

Installation

It is recommended to use pip for installation.

pip install openfe

Please do not use conda install openfe for installation. It installs a different Python package that is unrelated to ours.
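
One way to confirm that the right package ended up in your environment is to check that the installed openfe module exposes the two names used in the quick example, OpenFE and transform. The helper below is a small sketch built only on the standard library; the function name has_openfe_api is hypothetical and is not part of OpenFE.

```python
import importlib
import importlib.util


def has_openfe_api(module_name="openfe", required=("OpenFE", "transform")):
    """Return True if `module_name` is importable and exposes every name in `required`."""
    # find_spec returns None when no top-level module with this name is installed.
    if importlib.util.find_spec(module_name) is None:
        return False
    module = importlib.import_module(module_name)
    return all(hasattr(module, name) for name in required)


if __name__ == "__main__":
    if has_openfe_api():
        print("openfe looks like the expected package.")
    else:
        print("openfe is missing or is a different package; try: pip install openfe")
```

This check is only a heuristic: it looks at names, not behaviour, so it cannot prove that the installed version is current.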

⚡️ A Quick Example

It takes only four lines of code to generate features with OpenFE. First, we generate features with OpenFE. Next, we augment the train and test data with the generated features.

from openfe import OpenFE, transform

# train_x, test_x, train_y, and n_jobs (the number of parallel jobs) are assumed to be defined beforehand.
ofe = OpenFE()
features = ofe.fit(data=train_x, label=train_y, n_jobs=n_jobs)  # generate new features
train_x, test_x = transform(train_x, test_x, features, n_jobs=n_jobs)  # augment the train and test data with the generated features

We provide an example using the standard california_housing dataset in this link. A more complicated example, demonstrating that OpenFE can outperform machine learning experts in the IEEE-CIS Fraud Detection Kaggle competition, is provided in this link. Users can also refer to our documentation for more advanced usage of OpenFE and an FAQ about feature generation.
