
smote_tune's Introduction

Smote_tune

Tuning Smote

smote_tune's People

Contributors

amritbhanu


smote_tune's Issues

need DE pseudocode

Here's a lightweight description. Note that point 3 has to be changed for numeric attributes.


2.   DE scores each {\em pop}$_i$ according to various objective
   scores $o$. In the case of our goal models, the objectives are $o_1$, the sum of the cost
 of its decisions; $o_2$, the number of ignored edges; $o_3$, the number of satisfied goals;
 and $o_4$, the number of satisfied softgoals.

 3. OPTIMIZE tries to replace each {\em pop}$_i$ with a mutant $m$
 built by extrapolating between three other members of the population, $a,b,c$.
 For each decision $a_k \in a$, at probability $p_1$,
 $m_k= a_k \vee (p_1 < \mathit{rand}() \wedge( b_k \vee c_k))$.

 4. Each mutant $m$ is assessed by calling  $\text{SAMPLE}(\textit{model,prior=m})$;
 i.e. by seeing what can be achieved within a goal after first assuming
 that $\textit{prior}=m$.

 5.  To test if the mutant $m$ is preferred to {\em pop}$_i$, OPTIMIZE uses
  Zitzler's continuous domination {\em cdom}
  predicate~\cite{Zitzler2004}. This predicate compares two sets of objectives
  from sets $x$ and $y$. In that comparison,
  $x$ is better than $y$ if $x$ ``loses'' least.
  In the following, $n$ is the number of objectives and $w_j \in \{-1, 1\}$
  shows whether we seek to minimize or maximize $o_j$.
\[
\begin{array}{rcl}
x \succ y & = & \textit{loss}(y,x) > \textit{loss}(x,y)\\
\textit{loss}(x,y) & = & \sum_j^n -e^{\Delta(j,x,y,n)}/n\\
\Delta(j,x,y,n) & = & w_j(o_{j,x} - o_{j,y})/n
\end{array}
\]

  6. OPTIMIZE repeatedly loops over the population, trying to replace items with mutants,
    until no new better mutants are found.

  7. Return the population.
    \\hline
    \end{tabular}
    \caption{Procedure OPTIMIZE: strives to find ``good'' priors which,
    when passed to SAMPLE, maximize the number of edges used
    while also minimizing cost, and
    maximizing satisfied hard goals and soft goals.
    OPTIMIZE is based on Storn's differential evolution optimizer~\protect\cite{storn1997differential}.
    OPTIMIZE is called by the RANK procedure of \fig{rank}.
    For the reader unfamiliar with the mutation technique of step 3 and the {\em cdom}
    scoring of step 5, we note that these
    are standard practice in the search-based
    SE community
    \cite{Fu2016,krall2015gale}.
    }\label{fig:optimize}
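
Since this issue asks for DE pseudocode, the procedure above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the repository's implementation. The function names (`loss`, `cdom`, `mutate_bool`, `mutate_numeric`, `optimize`), the parameters `f`, `cr`, `np_`, `max_gens` and `seed`, and the `evaluate` callback (standing in for SAMPLE plus objective scoring) are all hypothetical. `mutate_bool` follows the boolean rule of step 3; `mutate_numeric` uses a Storn-style rule, $a_k + f(b_k - c_k)$, as one possible way to adapt step 3 to numeric attributes.

```python
import math
import random


def loss(x, y, weights):
    """loss(x, y) from the equations above: sum_j -exp(w_j * (x_j - y_j) / n) / n.

    Assumes the objectives are on comparable scales (for example 0..1),
    otherwise exp() can overflow.
    """
    n = len(weights)
    return sum(-math.exp(w * (xj - yj) / n) for w, xj, yj in zip(weights, x, y)) / n


def cdom(x, y, weights):
    """True if objective vector x is preferred to y, i.e. x 'loses' least."""
    return loss(y, x, weights) > loss(x, y, weights)


def mutate_bool(a, b, c, p1, rng):
    """Boolean decisions, as in step 3: m_k = a_k or (p1 < rand() and (b_k or c_k))."""
    return [ak or (p1 < rng.random() and (bk or ck)) for ak, bk, ck in zip(a, b, c)]


def mutate_numeric(parent, a, b, c, f, cr, bounds, rng):
    """Numeric decisions (assumed rule, not from the issue text).

    With probability cr, use a_k + f * (b_k - c_k) clipped to its bounds;
    otherwise keep the parent's value.
    """
    mutant = []
    for pk, ak, bk, ck, (lo, hi) in zip(parent, a, b, c, bounds):
        if rng.random() < cr:
            mutant.append(min(hi, max(lo, ak + f * (bk - ck))))
        else:
            mutant.append(pk)
    return mutant


def optimize(evaluate, bounds, weights, np_=10, f=0.75, cr=0.3, max_gens=20, seed=1):
    """Loop over the population, replacing members with preferred mutants.

    `evaluate(candidate)` must return a tuple of objective scores.
    Stops when a full pass finds no better mutant, or after max_gens passes
    (the cap is an assumption added so the sketch always terminates).
    Needs np_ >= 4 so that three other members can be drawn.
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(np_)]
    scores = [evaluate(p) for p in pop]
    for _ in range(max_gens):
        improved = False
        for i in range(len(pop)):
            others = [p for j, p in enumerate(pop) if j != i]
            a, b, c = rng.sample(others, 3)
            mutant = mutate_numeric(pop[i], a, b, c, f, cr, bounds, rng)
            mutant_scores = evaluate(mutant)
            if cdom(mutant_scores, scores[i], weights):
                pop[i], scores[i] = mutant, mutant_scores
                improved = True
        if not improved:
            break
    return pop, scores
```

For SMOTE tuning, each candidate would hold numeric settings such as `k`, `m` and `r`, and `evaluate` would train a learner on the resampled data and return the measure being optimized, with `weights=(1,)` to maximize it.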

To dos:

  • Change to histograms
  • tune on x evaluate on x.
  • auc1 and auc2 (loc,recall)

Important attributes:

Weka: 20 attributes in total in each dataset. Datasets are listed from top to bottom in order of high to low imbalance. Attributes were selected with CFS attribute selection and breadth-first search.

  • ant: cbo, rfc, lcom, loc, cam, amc, max_cc
  • redaktor: cbm, max_cc
  • arc: cbo, rfc, ce, npm, cam
  • ivy: wmc, cbo, rfc, ce, npm, loc, moa, amc
  • prop: lcom, ce, loc, moa, max_cc
  • tomcat: cbo, rfc, loc, moa, max_cc
  • camel: cbo, lcom, ca, avg_cc
  • jedit: rfc, moa

To dos:

  • original SMOTE paper datasets and the measures
  • include F2 score; use absolute numbers for oversampling and undersampling (50, 100, 200, 400)

Tuning Results (with % m, n)

Experiment:

  • Goal - maximizing Fscore for each of 6 learners separately.
  • Once the parameters are found, reporting all 5 evaluation measures.
  • train, validation, and test sets.
  • parameters tuned are:
    • m(20,50) and n(80,50) % of oversampling and undersampling respectively.
    • power of distance metric (r) (0.1 to 5)??
    • k=(2,20) exponential??
    • Didn't do the preprocessing part (exp = 0.3 to 3)
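
The five measures reported in this issue (accuracy, recall, precision, F-score and false alarm) can all be derived from a confusion matrix. The sketch below uses the standard textbook definitions; how the repository actually computes them is not shown here, so the function name `measures` and its arguments are hypothetical. The `beta` parameter is included because the to-do list mentions an F2 score (`beta=2`).

```python
def measures(tp, fp, tn, fn, beta=1.0):
    """Standard confusion-matrix measures for the positive (minority) class.

    tp, fp, tn, fn are counts of true/false positives/negatives.
    Ratios with a zero denominator are reported as 0.0 (a convention chosen
    for this sketch).
    """
    def ratio(num, den):
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)            # also called pd
    precision = ratio(tp, tp + fp)
    false_alarm = ratio(fp, fp + tn)       # also called pf
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    b2 = beta * beta
    # F-beta weighs recall beta times as much as precision; beta=1 gives F1.
    f_score = ratio((1 + b2) * precision * recall, b2 * precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_score": f_score,
        "false_alarm": false_alarm,
    }
```

Calling `measures(tp, fp, tn, fn)` once per learner on the test set gives all five numbers, while only `f_score` would be used as the tuning goal.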

Results:

Accuracy

file

Recall

file

Precision

file

F_score

file

False Alarm

file

Raw AUC Values

Within Measure Assessment

AUC for SMOTE

smote

AUC for SMOTUNED

smotuned

this para makes no sense

Results by Tantithamthavorn et al. [50] also suggested that
every dataset comes with different attributes. And also classification
techniques often have configurable parameters that
control characteristics of these classifiers that they produce.
Now time has come to even think about hyperparameter optimization
of these techniques and come up with an automated
process [2], [16] to tune these parameters for every dataset.

tuning per goal

  • Doesn't include 3 datasets: prop, tomcat and jedit.
  • AUC(pf,pd)

Conclusions:

  • AUC shows the highest improvement.
  • Recall shows good improvements.
  • Fscore: DT and RF show the most improvement.
  • Others are either the same or worse.

Results:

Accuracy - Maximize

file

AUC - Maximize

file

Recall - Maximize

file

Precision - Maximize

file

F_score - Maximize

file

False Alarm - Minimize

file

lit review needs more work

It leaves open issues like:

  • why we are using SMOTE and not Pelayo's trial and error

More generally, lit reviews must respect and disrespect: respectfully present others' work, then point out their fatal mistake and why this work is needed.

sentence order reversed?

It is important to select how many synthetic examples to create ($m$) and how much undersampling ($m$) of majority class needs to be done. In this case number of oversampling and undersampling are the same.

paper too short.

increase width of figs 2, 3, 4, 5. make them full page wide (but don't increase font size)

how?

SMOTE's super-sampling selects instances from the minority class and finds $k$ nearest neighbors for each instance and then creates new instances using the selected instances and their neighbors until we have $m$ numbers of minority class samples.
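
One way to answer the "how?" is with a short sketch of the oversampling step described in that sentence. This is a minimal illustration under stated assumptions, not the paper's or the repository's code: the names `minkowski` and `smote` and the `seed` argument are hypothetical, rows are assumed to be lists of numbers, and the distance is assumed to be a Minkowski distance with power `r` (suggested by the "power of distance metric (r)" tuning parameter listed in this issue). New instances are placed at a random point on the line between a minority instance and one of its `k` nearest minority neighbors.

```python
import random


def minkowski(u, v, r=2.0):
    """Minkowski distance with power r (r=2 is Euclidean, r=1 is Manhattan)."""
    return sum(abs(a - b) ** r for a, b in zip(u, v)) ** (1.0 / r)


def smote(minority, m, k=5, r=2.0, seed=1):
    """Oversample `minority` until it holds m rows.

    minority: list of numeric rows from the minority class (at least 2 rows).
    m: target number of minority rows after oversampling.
    k: number of nearest neighbors to choose from.
    """
    rng = random.Random(seed)
    out = [list(row) for row in minority]
    while len(out) < m:
        # Pick a minority instance and rank the other instances by distance.
        x = rng.choice(minority)
        neighbors = sorted(
            (row for row in minority if row is not x),
            key=lambda row: minkowski(x, row, r),
        )[:k]
        neighbor = rng.choice(neighbors)
        # Interpolate: the new row lies between x and the chosen neighbor.
        gap = rng.random()
        out.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return out
```

Undersampling the majority class would then be a separate step, for example drawing a random subset of the majority rows.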
