gykovacs commented on June 5, 2024

There can be multiple reasons for this. In many cases the authors of a particular SMOTE variant did not cover all possible corner cases, for example:

  1. all minority samples are treated as noise according to the technique's noise definition,
  2. the method wants to work with, say, 5 nearest neighbors, but there are only 3 minority samples,
  3. mathematical techniques such as self-organizing maps do not converge,
  4. etc.

All of these happen because the nature of the data is not compatible with the parameter settings and assumptions of the SMOTE variant.
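Corner case 1 can be reproduced with a few lines of plain Python. This is a minimal sketch, assuming a Borderline-SMOTE-style noise rule (a minority sample counts as noise when all k of its nearest neighbours belong to another class); `find_noise` and the toy data are hypothetical and not taken from smote_variants. Each minority sample below sits inside a small majority cluster, so the filter flags all of them and nothing is left to interpolate between.

```python
import math


def find_noise(X, y, minority_label, k=5):
    """Return indices of minority samples whose k nearest neighbours
    (searched in the whole dataset) all belong to other classes.
    Hypothetical helper illustrating a Borderline-SMOTE-like noise rule."""
    noise = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            continue
        # (distance, label) pairs for every other sample, nearest first
        dists = sorted(((math.dist(xi, xj), yj)
                        for j, (xj, yj) in enumerate(zip(X, y)) if j != i),
                       key=lambda pair: pair[0])
        if all(label != minority_label for _, label in dists[:k]):
            noise.append(i)
    return noise


# Three isolated minority samples (label 1), each surrounded by
# six majority samples (label -1) at distance 1 or 2.
X, y = [], []
for cx in (0, 10, 20):
    X.append((cx, 0))
    y.append(1)
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1), (0, 2), (0, -2)):
        X.append((cx + dx, dy))
        y.append(-1)

noise = find_noise(X, y, minority_label=1, k=5)
print(noise)  # [0, 7, 14]
remaining = [i for i, label in enumerate(y) if label == 1 and i not in noise]
print(remaining)  # []
```

With an empty minority set after filtering, a variant has nothing to work with, which is one way an oversampler can end up returning its input unchanged.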

Where I found reasonable resolutions, I implemented them. In cases where no resolution is feasible (for example, determining the 5 closest neighbors when a class has only 3 samples), the data is returned unaltered, although I would expect a message in the logs if logging is enabled.
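The fallback for corner case 2 can be sketched as plain SMOTE interpolation with a guard in front of it. This is an illustration only, assuming neighbours are searched within the minority class and a sample cannot be its own neighbour, so k neighbours require at least k + 1 minority samples; `smote_with_fallback` and the `smote_sketch` logger name are hypothetical and not smote_variants' actual code.

```python
import logging
import math
import random

# Hypothetical logger name; configure logging to see the warning.
logger = logging.getLogger("smote_sketch")


def smote_with_fallback(X, y, minority_label, n_new, k=5, seed=0):
    """Basic SMOTE interpolation that returns the data unaltered,
    with a logged warning, when the minority class is too small for k."""
    idx = [i for i, label in enumerate(y) if label == minority_label]
    # Each sample needs k neighbours other than itself.
    if len(idx) < k + 1:
        logger.warning(
            "only %d minority samples, cannot use %d neighbours; "
            "returning the data unaltered", len(idx), k)
        return list(X), list(y)

    rng = random.Random(seed)
    X_new = []
    for _ in range(n_new):
        i = rng.choice(idx)
        # k nearest minority neighbours of sample i (excluding i itself)
        neighbours = sorted((j for j in idx if j != i),
                            key=lambda j: math.dist(X[i], X[j]))[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # Synthetic point on the segment between X[i] and X[j]
        X_new.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
    return list(X) + X_new, list(y) + [minority_label] * n_new


X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
y = [1, 1, 1, -1, -1]

# k=5 needs 6 minority samples but only 3 exist: data comes back unaltered
X_res, y_res = smote_with_fallback(X, y, minority_label=1, n_new=4, k=5)
print(y_res.count(1))  # 3

# k=2 is feasible with 3 minority samples: 4 synthetic samples are added
X_res, y_res = smote_with_fallback(X, y, minority_label=1, n_new=4, k=2)
print(y_res.count(1))  # 7
```

The first call returns the input untouched and emits a warning (through Python's last-resort handler if no logging is configured); the second call has enough minority samples for k = 2 and adds 4 synthetic points on segments between minority samples.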

Most likely, your data hits a corner case of the SOMO implementation with the parameters you used. Adjusting the parameters might make SOMO work properly.

Also, if you share a minimal working example, I can look into it.

from smote_variants.

leaphan commented on June 5, 2024

Thanks for your reply. I wrote code like this:

# Install the dependencies first (in a shell):
#   pip install -U imbalanced-learn
#   pip install smote-variants
import numpy as np
import smote_variants as sv
from imblearn.datasets import fetch_datasets

# Load the 'oil' dataset from the imbalanced-learn benchmark collection
datasets = fetch_datasets(filter_data=['oil'])
X, y = datasets['oil']['data'], datasets['oil']['target']
for label, count in zip(*np.unique(y, return_counts=True)):
    print('Class {} has {} instances'.format(label, count))

# Oversample with SOMO using its default parameters
oversampler = sv.SOMO()
X_samp, y_samp = oversampler.sample(X, y)

for label, count in zip(*np.unique(y_samp, return_counts=True)):
    print('Class {} has {} instances after oversampling'.format(label, count))
print(X_samp, y_samp)

And the printed result:
Class -1 has 896 instances
Class 1 has 41 instances
Class -1 has 896 instances after oversampling
Class 1 has 41 instances after oversampling
After oversampling, there is no change in the number of samples in either class.
