
Comments (3)

gykovacs commented on June 11, 2024

Hi, only a handful of oversampling techniques consider categorical variables, and even those are not implemented in the smote-variants package. Most oversampling techniques operate in Euclidean space and treat all attributes as continuous. A commonly followed way to use oversampling techniques with categorical variables is to encode the categorical variables, for example using one-hot encoding. The oversampling techniques might then produce fractional feature values, but from the regression point of view this is not a problem: it just expresses that the sample might lie somewhere between the two categories.

Alternatively, once the one-hot encoding is done and the oversampling is applied, you might convert the oversampled fractional values to crisp binary ones to keep the categorical nature.
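As a rough illustration of that last step, here is a minimal sketch in plain Python. The helper name `harden_one_hot`, the column layout, and the interpolation weight are all assumptions made up for this example; none of it is part of smote-variants. The synthetic row is produced by a simple linear interpolation between two encoded samples, which stands in for whatever oversampler you actually use.

```python
def harden_one_hot(row, groups):
    """Return a copy of `row` where, inside each one-hot group,
    the largest value becomes 1.0 and all other values become 0.0.

    `groups` is a list of column-index lists, one per original
    categorical variable (hypothetical layout, chosen by the caller).
    """
    out = list(row)
    for cols in groups:
        # Column with the largest fractional value wins (first one on ties).
        winner = max(cols, key=lambda c: row[c])
        for c in cols:
            out[c] = 1.0 if c == winner else 0.0
    return out


# Assumed layout: column 0 is continuous, columns 1-3 one-hot encode
# a single categorical variable with three levels.
a = [0.2, 1.0, 0.0, 0.0]
b = [0.8, 0.0, 1.0, 0.0]

# Stand-in for an oversampler: a point 30% of the way from a to b.
t = 0.3
synthetic = [x + t * (y - x) for x, y in zip(a, b)]

# The one-hot block is fractional in `synthetic`; harden it.
hardened = harden_one_hot(synthetic, groups=[[1, 2, 3]])
print(synthetic)
print(hardened)
```

The continuous column is left untouched, and only the columns listed in `groups` are snapped back to 0/1, so each synthetic sample ends up in exactly one category per original variable.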

from smote_variants.

gykovacs commented on June 11, 2024

SMOTENC is just a hack to apply SMOTE to categorical data. If you one-hot encode your categorical features and standardize the continuous features to have a standard deviation of 1, vanilla SMOTE and all other SMOTE variants (including DEAGO) will operate in the same metric space as SMOTENC. So there is no need for special arguments to pass categorical features; you just need to encode them properly.
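A minimal sketch of that preprocessing in plain Python, assuming a small table with one categorical column. The helper `encode_rows` and the toy data are hypothetical and only illustrate the idea; in practice you would more likely use an existing encoder and scaler.

```python
from statistics import pstdev


def encode_rows(rows, cat_col):
    """One-hot encode column `cat_col` and scale every other column
    (assumed continuous) to have standard deviation 1.

    Returns the encoded rows and the sorted list of category levels.
    """
    levels = sorted({r[cat_col] for r in rows})
    num_cols = [i for i in range(len(rows[0])) if i != cat_col]
    # Population std per continuous column; fall back to 1.0 for constant columns.
    stds = {i: pstdev([r[i] for r in rows]) or 1.0 for i in num_cols}

    encoded = []
    for r in rows:
        scaled = [r[i] / stds[i] for i in num_cols]
        one_hot = [1.0 if r[cat_col] == lv else 0.0 for lv in levels]
        encoded.append(scaled + one_hot)
    return encoded, levels


# Toy data (made up): one continuous feature and one categorical feature.
rows = [[1.0, "red"], [3.0, "blue"], [5.0, "red"], [7.0, "green"]]
X, levels = encode_rows(rows, cat_col=1)
print(levels)
print(X[0])
```

The encoded matrix `X` contains only numeric columns, so it could be handed to any oversampler that expects continuous features, without a separate argument listing the categorical columns.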


eponraj27392 commented on June 11, 2024

Since I found SMOTENC in the imbalanced-learn library, which can take the categorical feature indices as input, I thought this library might also have an argument for specifying the categorical features.

