ecol's People

Contributors

aclorena, lpfgarcia

ecol's Issues

d3

The measure should return the number of noisy examples per class. We are currently returning the total number of noisy examples.

F1

Equation 2.5

D1

I think the value is inverted (it should be nrow/volume). Definition: average number of examples per unit of volume.
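A minimal sketch of the nrow/volume reading, not the package's code. The helper name `density_per_volume` is hypothetical, and it assumes the volume is the product of the ranges of the numeric features, which the issue does not specify.

```r
# Hypothetical helper, not part of ECoL: examples per unit of volume.
# Assumption: volume = product of the per-feature ranges (numeric features only).
density_per_volume <- function(x) {
  ranges <- vapply(x, function(col) diff(range(col)), numeric(1))
  volume <- prod(ranges)
  nrow(x) / volume  # Inf when any feature has zero range
}

# Toy data: ranges are 2 and 5, so the volume is 10.
toy <- data.frame(a = c(0, 2), b = c(0, 5))
density_per_volume(toy)  # 2 examples / 10 units of volume
```

Under this assumption the inverted form would be volume/nrow, which grows as the data gets sparser instead of denser.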

F1v

LDA returns a one-column dataset for acute-*

t1 vs t2

T1 in Lorena's paper is log(d/n), while in the technical report it is n/d.

Does summarization for N3 and N4 make sense?

I have doubts about the summarization applied to N3 and N4. By the same logic, the L2 and L3 measures should also be summarized this way, that is, per instance in addition to their multiclass summarization. I think that, although it keeps the rate between 0 and 1, it ends up not making much sense for N3 and N4.

F4

Check the code of the removal function again

C2

1 - C2

Bug due to version of R

Hi Luis, I'm using R 3.3.3

library(ECoL)
complexity(iris[,1:4], iris[,5], type="class")
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/mnt/nfs/modules/apps/R/3.3.3/lib64/R/library/igraph/libs/igraph.so':
libgmp.so.3: cannot open shared object file: No such file or directory

Any suggestion?
Cheers

L3

Check execution time

Value of the C1 measure (balance)

Hi Luis, I noticed that we got one of the measures wrong, and it is wrong in the CSUR paper too... It is the C1 balance measure. It returns 1 for balanced problems, and it should return 0. We have to return 1 - C1 as the answer... Worse, the definition in the paper is also wrong. I will fix this for the current paper.
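A minimal sketch of the proposed fix, assuming C1 is the entropy of the class proportions normalized by log(number of classes). The function names `c1_entropy` and `c1_complexity` are hypothetical and are not ECoL's API.

```r
# Assumption: C1 is the normalized entropy of the class proportions.
c1_entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]                      # drop empty classes to avoid log(0)
  -sum(p * log(p)) / log(length(p))  # 1 for perfectly balanced classes
}

# Proposed answer: 1 - C1, so balanced problems score 0 (lowest complexity).
# Note: a single-class y gives NaN because log(1) is 0.
c1_complexity <- function(y) 1 - c1_entropy(y)

balanced <- iris$Species                        # 50 / 50 / 50
imbalanced <- c(rep("a", 90), rep("b", 10))     # 90 / 10

c1_entropy(balanced)       # entropy form: highest value for balanced data
c1_complexity(balanced)    # inverted form: lowest value for balanced data
c1_complexity(imbalanced)  # larger than the balanced case
```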

d1

I emailed the authors. I am waiting for a reply.

T3 and T4

I found a problem with the PCA function, in line:

aux <- which(summary(aux)$importance[3,] <= 0.95)

I applied it to a subset of the iris dataset containing only classes and I got:

PC1 PC2
0.9435 1.0000

In this case, both PCs must be used, but the test only chooses the first.

In another test (another subset), I obtain:

PC1 PC2
0.9919 1.0000

In this case one PC must be returned, but the function obtains 0. (This gives rise to Inf in T3, for instance.)
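One possible fix, sketched on the two cumulative-proportion vectors reported above: take the first component whose cumulative proportion reaches the threshold, rather than counting the components that stay below it. The helper names `n_components` and `buggy_count` are hypothetical.

```r
# Number of PCs needed to reach the threshold of cumulative variance:
# the index of the first component at or above it.
n_components <- function(cum_prop, threshold = 0.95) {
  unname(which(cum_prop >= threshold)[1])
}

# Behaviour of the reported line: counts the components at or below the threshold.
buggy_count <- function(cum_prop, threshold = 0.95) {
  length(which(cum_prop <= threshold))
}

case1 <- c(PC1 = 0.9435, PC2 = 1.0000)
case2 <- c(PC1 = 0.9919, PC2 = 1.0000)

n_components(case1)  # both PCs are needed
n_components(case2)  # the first PC alone is enough
buggy_count(case1)   # keeps only the first PC
buggy_count(case2)   # keeps no PC at all
```

In the package, `cum_prop` would correspond to `summary(aux)$importance[3,]`, the cumulative proportion row of the PCA summary.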

Normalization

The measures F3, F4, N2, LSC, L1, Density, ClsCoef and Hubs need to be normalized to return higher values with the increase of complexity.

Change the acronyms of some measures

I thought about changing the acronyms of some measures to make them more uniform. Suggestions:

  • C1 from balance becomes B1
  • C2 from balance becomes B2
  • T2, T3 and T4 from dimensionality become D1, D2 and D3
  • T1 and LSC from neighborhood become N5 and N6
  • Density, ClsCoef and Hubs from Network become G1, G2 and G3 (for graph)

I also don't know whether this is feasible given the existing dependencies.

N1, N2, N3 and T1

The neighborhood measures are using the Gower distance. Is that OK?

Error while executing complexity()

complexity(Species ~., iris)
Error in complexity.default(modFrame[, -1, drop = FALSE], modFrame[, 1, :
argument "type" is missing, with no default

Name change suggestion

I suggest renaming the overlapping category to feature-based or something similar, because these measures capture overlapping from the point of view of the attributes only, and measures in other categories also capture overlapping. But I understand this may not be feasible, since other libraries use the measures.

L1

Check the normalization parameter

N4

Check execution time

Dist with normalization

Discuss this and, if necessary, apply normalization to all measures that use a distance function.

Discrepancies between implemented measures and original definitions

Hi, I've come across two packages for data complexity measures: yours and RomeroBarata/dcme, which I have somewhat extended. I am seeing some discrepancies among the results from both packages, and from yours and the definitions included in the paper:

  • Lorena, A. C., Garcia, L. P. F., Lehmann, J., de Souto, M. C. P., and Ho, T. K. (2018). How Complex is your classification problem? A survey on measuring classification complexity. arXiv:1808.03591

I was hoping you could help me understand if/why your package may be altering some calculations and whether this is an error or an intended effect.

Thanks in advance!

David


Here's a list of the discrepancies I found, and a minimal example using the following variables:

x <- iris[, 1:4]
y <- iris$Species == "setosa"

Overlapping

  • F1: last line of the implementation computes 1/(aux + 1). This is not indicated in the original formulation (since F1 is the maximum of the Fisher's discriminant ratios of each feature), and gives strange results:
> ECoL::overlapping(x, y, measures = "F1")
0.148504
> dcme::F1(x, y)
16.66501
  • F2: I have to check both implementations of this to see the differences with respect to the definition.
> ECoL::overlapping(x, y, measures = "F2")
0
> dcme::F2(x, y)
0.004855226
  • F3: again, I still have to check implementations and the definition. Since F3 is higher when complexity is lower, I would assume for this example it should be 1 or close to 1.
> ECoL::overlapping(x, y, measures = "F3")
0 
> dcme::F3(x, y)
1

Dimensionality

  • T2: instead of the ratio of number of examples per dimension, seems to be the ratio of dimensions per example:
> ECoL::dimensionality(x, y, measures = "T2")
0.02666667 
> dcme::T2(x)
37.5
> 1/ECoL::dimensionality(x, y, measures = "T2")
37.5
  • T3: instead of the ratio of number of examples per PCA dimension, seems to be the ratio of PCA dimensions per example:
> ECoL::dimensionality(x, y, measures = "T3")
0.01333333 
> dcme::T3(x)
75
> 1/ECoL::dimensionality(x, y, measures = "T3")
75 
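The T2 numbers above can be reproduced by hand on the same `x`. This is a sketch of the two readings of the ratio, not code from either package.

```r
x <- iris[, 1:4]

# Reading 1: examples per dimension (n / m), matching dcme::T2.
t2_examples_per_dim <- nrow(x) / ncol(x)

# Reading 2: dimensions per example (m / n), matching the ECoL output.
t2_dims_per_example <- ncol(x) / nrow(x)

t2_examples_per_dim  # 150 / 4
t2_dims_per_example  # 4 / 150
```

The two values are reciprocals, which is why `1/ECoL::dimensionality(x, y, measures = "T2")` recovers the dcme result.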

Balance

  • C2: the last line of the implemented version returns 1 - 1/aux where aux already had the value of C2, according to the definition:
> ECoL::balance(x, y, measures = "C2")
0.2 
> dcme::C2(y)
1.25
> # ECoL implementation without the last line
> (function(y) {
    ii <- summary(y)
    nc <- length(ii)
    aux <- ((nc - 1)/nc) * sum(ii/(length(y) - ii))
    return(aux)
  })(as.factor(y))
1.25
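To make the relation between the two outputs explicit, the sketch below recomputes the same quantity and then applies the final `1 - 1/aux` step described above. The variable names `ir` and `c2_normalized` are only illustrative.

```r
y <- as.factor(iris$Species == "setosa")

ii <- summary(y)   # class counts: 100 FALSE, 50 TRUE
nc <- length(ii)   # number of classes

# Value before the last line of the ECoL implementation.
ir <- ((nc - 1) / nc) * sum(ii / (length(y) - ii))

# Value after the last line: 1 - 1/aux.
c2_normalized <- 1 - 1 / ir

ir             # matches dcme::C2(y)
c2_normalized  # matches ECoL::balance(x, y, measures = "C2")
```

So the 0.2 reported by ECoL is the 1.25 from dcme passed through `1 - 1/aux`, which maps the ratio into the interval [0, 1).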

T1

Check the radios function (dj - tmp)

m1 and m2 are inverted

The original T2 (m1) measure takes n/m, in which n is the number of examples and m is the number of features.

D3

D3 has a problem with flare.arff

c2 is inverted

I assumed it is the imbalance ratio. In this case it should be n_max/n_min
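A minimal sketch of the n_max/n_min reading, assuming the measure is the plain imbalance ratio between the largest and smallest class. The helper name `imbalance_ratio` is hypothetical.

```r
# Hypothetical helper: size of the largest class over the size of the smallest.
imbalance_ratio <- function(y) {
  counts <- table(y)
  counts <- counts[counts > 0]  # ignore unused factor levels
  max(counts) / min(counts)
}

y <- iris$Species == "setosa"   # 100 FALSE vs 50 TRUE
imbalance_ratio(y)              # 100 / 50
imbalance_ratio(iris$Species)   # balanced classes give the minimum value
```

With this orientation the ratio is 1 for balanced data and grows with the imbalance, whereas n_min/n_max would shrink toward 0.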

Is C3 a multiclass version?

hyperretangle

The function hyperretangle only works for numeric attrs.

L1, L2 and L3

The linearity measures are using the OVO function to decompose multiclass problems into binary ones. Is OVO the most suitable technique?

F1

Discuss with Ana about the Inf values in the matrix of attributes and classes

Normalization of the graph-measures

In the paper I will introduce some simple alterations into the graph-based measures in order to normalize them between 0 and 1. The codes will have to be adapted accordingly, but the alterations are quite simple.

id1

Does the term "eigenvalue transformation" mean PCA?

LSCAvg

I think the current implementation is imprecise. You do not need to remove the hyperspheres contained inside others in this measure.

L3

The L3 measure is generating, through the interpolation step, the same number of examples as the training set. Is this number correct?

T1

Is T1 correct?

D2

How to calculate the volume of dataset with categorical attributes?

t4

Is T4 the mean of N/V?

ID1

Fix the bug.

LSCAvg

Check why it should be between 0 and 1

F1

Check it and look into a new algorithm
