lpfgarcia / ecol

Extended Complexity Library in R

License: Other

R 95.83% Rebol 4.17%

complexity-measures pattern-recognition r-package

ecol's Issues

d3

The measure should return the number of noisy examples per class. We are currently returning the total number of noisy examples.

D1

I think the value is inverted (it should be nrow/volume). Definition: average number of examples per unit of volume.
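
A minimal sketch of the nrow/volume reading proposed here. The issue does not say how the volume is computed, so this assumes the volume of the axis-aligned bounding box spanned by the feature ranges; `density_per_volume` is a hypothetical helper, not an ECoL function.

```r
# Hypothetical helper: average number of examples per unit of volume.
# Assumption: volume = product of the per-feature ranges (bounding box).
density_per_volume <- function(x) {
  x <- as.matrix(x)
  ranges <- apply(x, 2, function(col) diff(range(col)))
  volume <- prod(ranges)
  nrow(x) / volume
}

d1 <- density_per_volume(iris[, 1:4])
```

With this orientation the value grows as more examples are packed into the same volume; the inverted form (volume/nrow) would shrink instead.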

F1v

LDA returns a one-column dataset for acute-*

t1 vs t2

In Lorena's paper, T1 is log(d/n), whereas in the technical report it is n/d.

Does summarization make sense for N3 and N4?

I was unsure about the summarization applied to N3 and N4. By that logic, the L2 and L3 measures should also be summarized in the same way, that is, per instance in addition to their multiclass summarization. Although it keeps the rate between 0 and 1, I think it ends up not making much sense for N3 and N4.

F4

Check the code of the removing function again.

Feature Selection

Look at the articles and run one of the FS techniques.

C2

1 - C2

F4 - overlapping.r

Is F4 correct?

Bug due to version of R

Hi Luis, I'm using R 3.3.3

library(ECoL)
complexity(iris[,1:4], iris[,5], type="class")
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/mnt/nfs/modules/apps/R/3.3.3/lib64/R/library/igraph/libs/igraph.so':
libgmp.so.3: cannot open shared object file: No such file or directory

Any suggestion?
Cheers

L3

Check execution time

Value of the C1 measure (balance)

Hi Luis, I noticed that we got one of the measures wrong, and it is wrong in the CSUR paper as well... It is the C1 balance measure. It returns 1 for balanced problems, when it should return 0. We need to return 1 - C1 instead... Worse, the definition in the paper is wrong too. I will fix this for the current paper.
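
A small sketch of the proposed 1 - C1 correction, assuming C1 is the entropy of the class proportions normalized by log(nc) (the issue does not restate the formula, so this is an assumption; `class_entropy` is a hypothetical helper, not the ECoL implementation).

```r
# Assumption: C1 is the entropy of the class proportions, normalized by log(nc).
# Requires at least two non-empty classes (log(1) would give a division by zero).
class_entropy <- function(y) {
  p <- as.numeric(table(y)) / length(y)
  p <- p[p > 0]                      # drop empty classes to avoid log(0)
  nc <- length(p)
  -sum(p * log(p)) / log(nc)
}

balanced   <- factor(rep(c("a", "b"), each = 50))
imbalanced <- factor(rep(c("a", "b"), times = c(95, 5)))

c1_raw_balanced   <- class_entropy(balanced)        # 1 for balanced classes
c1_fixed_balanced <- 1 - class_entropy(balanced)    # 0 after the proposed fix
c1_fixed_imbal    <- 1 - class_entropy(imbalanced)  # grows as imbalance increases
```

Under this assumption the complement makes the measure behave like the others: 0 for a balanced problem and larger values for more imbalanced ones.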

d1

I emailed the authors. I am waiting for a reply.

Hyperrectangle and volume functions

Discuss with Ana

T3 and T4

I found a problem with the PCA function, in line:

aux <- which(summary(aux)$importance[3,] <= 0.95)

I applied it to a subset of the iris dataset containing only classes and I got:

PC1 PC2
0.9435 1.0000

In this case, both PCs must be used, but the test selects only the first.

In another test (another subset), I obtain:

PC1 PC2
0.9919 1.0000

In this case one PC should be returned, but the function returns 0. (This leads to Inf in T3, for instance.)
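
One possible fix, sketched under the assumption that the intent is to keep the fewest leading components whose cumulative proportion of variance reaches 95%. `n_components` is a hypothetical helper; the two inputs are the cumulative proportions quoted above.

```r
# Smallest number of leading PCs whose cumulative variance reaches the threshold.
# The last cumulative proportion is always 1, so a match always exists.
n_components <- function(cumulative, threshold = 0.95) {
  unname(which(cumulative >= threshold)[1])
}

k_first  <- n_components(c(PC1 = 0.9435, PC2 = 1.0000))  # 2: PC1 alone is below 95%
k_second <- n_components(c(PC1 = 0.9919, PC2 = 1.0000))  # 1: PC1 already exceeds 95%
```

Selecting with `<= 0.95` instead picks the components that are still below the threshold, which is why it yields one component in the first case and none in the second.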

F3 - overlapping.r

Is F3 correct?

F1v: LDA

Using Nuria's code

T1: I don't understand why we summarize

For the T1 measure, I don't understand what taking the mean, minimum, maximum, etc. means.

Normalization

The measures F3, F4, N2, LSC, L1, Density, ClsCoef and Hubs need to be normalized to return higher values with the increase of complexity.

Rename the acronyms of some measures

I was thinking of renaming the acronyms of some measures to make them more uniform. Suggestions:

C1 from balance becomes B1
C2 from balance becomes B2
T2, T3 and T4 from dimensionality become D1, D2 and D3
T1 and LSC from neighborhood become N5 and N6
Density, ClsCoef and Hubs from network become G1, G2 and G3 (for graph)

I am also not sure whether this is feasible given the existing dependencies.

N1, N2, N3 and T1

The neighborhood measures are using the Gower distance. OK?
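
For reference, a base-R sketch of the Gower distance restricted to numeric features, where each feature contributes its absolute difference divided by the feature range and the contributions are averaged. `gower_numeric` is a hypothetical helper and not necessarily how ECoL computes it; categorical columns are not handled here.

```r
# Gower distance restricted to numeric features (assumption: no categorical columns).
gower_numeric <- function(x) {
  x <- as.matrix(x)
  ranges <- apply(x, 2, function(col) diff(range(col)))
  ranges[ranges == 0] <- 1           # constant features contribute zero distance
  scaled <- sweep(x, 2, ranges, "/")
  dist(scaled, method = "manhattan") / ncol(x)
}

d <- as.matrix(gower_numeric(iris[, 1:4]))
```

Because every feature is divided by its range, the resulting distances already lie between 0 and 1, which is relevant to the question of whether distances need a separate normalization step.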

Error while executing complexity()

complexity(Species ~., iris)
Error in complexity.default(modFrame[, -1, drop = FALSE], modFrame[, 1, :
argument "type" is missing, with no default

Suggested name change

I suggest renaming the overlapping category to feature-based or something similar, because these measures capture overlapping only from the point of view of the features, and measures in other categories also capture overlapping. But I understand this may not be feasible, since other libraries use the measures.

L1

Check the normalization parameter

N4

Check execution time

LSC

1 - LSC

Dist with normalization

Discuss this and, if necessary, apply normalization to all measures that use a distance function.

Discrepancies between implemented measures and original definitions

Hi, I've come across two packages for data complexity measures: yours and RomeroBarata/dcme, which I have somewhat extended. I am seeing some discrepancies among the results from both packages, and from yours and the definitions included in the paper:

Lorena, A. C., Garcia, L. P. F., Lehmann, J., de Souto, M. C. P., and Ho, T. K. (2018). How Complex is your classification problem? A survey on measuring classification complexity. arXiv:1808.03591

I was hoping you could help me understand if/why your package may be altering some calculations and whether this is an error or an intended effect.

Thanks in advance!

David

Here's a list of the discrepancies I found, and a minimal example using the following variables:

x <- iris[, 1:4]
y <- iris$Species == "setosa"

Overlapping

F1: last line of the implementation computes 1/(aux + 1). This is not indicated in the original formulation (since F1 is the maximum of the Fisher's discriminant ratios of each feature), and gives strange results:

> ECoL::overlapping(x, y, measures = "F1")
0.148504
> dcme::F1(x, y)
16.66501
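
A sketch of the definition referred to here, assuming the two-class per-feature Fisher ratio (mu1 - mu2)^2 / (var1 + var2); `fisher_ratio` is a hypothetical helper and neither package's actual code. On this example the raw maximum should come out close to the `dcme::F1` value quoted above. The inverse transform is shown only to illustrate what 1/(aux + 1) does to such a ratio; it need not reproduce ECoL's output, which may compute the ratio differently.

```r
# Assumption: two-class Fisher ratio per feature, (mu1 - mu2)^2 / (var1 + var2).
fisher_ratio <- function(col, y) {
  a <- col[y]
  b <- col[!y]
  (mean(a) - mean(b))^2 / (var(a) + var(b))
}

x <- iris[, 1:4]
y <- iris$Species == "setosa"

ratios <- sapply(x, fisher_ratio, y = y)
f1_raw <- max(ratios)            # largest ratio: the most discriminative feature
f1_inv <- 1 / (f1_raw + 1)       # maps [0, Inf) onto (0, 1]; higher means more overlap
```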

F2: I have to check both implementations of this to see the differences with respect to the definition.

> ECoL::overlapping(x, y, measures = "F2")
0
> dcme::F2(x, y)
0.004855226

F3: again, I still have to check implementations and the definition. Since F3 is higher when complexity is lower, I would assume for this example it should be 1 or close to 1.

> ECoL::overlapping(x, y, measures = "F3")
0 
> dcme::F3(x, y)
1

Dimensionality

T2: instead of the number of examples per dimension, ECoL seems to return the number of dimensions per example:

> ECoL::dimensionality(x, y, measures = "T2")
0.02666667 
> dcme::T2(x)
37.5
> 1/ECoL::dimensionality(x, y, measures = "T2")
37.5

T3: instead of the number of examples per PCA dimension, ECoL seems to return the number of PCA dimensions per example:

> ECoL::dimensionality(x, y, measures = "T3")
0.01333333 
> dcme::T3(x)
75
> 1/ECoL::dimensionality(x, y, measures = "T3")
75
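
Both orientations can be reproduced by hand. This sketch assumes the PCA dimension is the smallest number of components whose cumulative variance reaches 95%, which may differ from either package's selection rule.

```r
x <- iris[, 1:4]
n <- nrow(x)
d <- ncol(x)

# Assumption: keep the fewest PCs whose cumulative variance reaches 95%.
cum_var <- summary(prcomp(x))$importance[3, ]
d_pca <- unname(which(cum_var >= 0.95)[1])

t2_examples_per_dim <- n / d        # 37.5
t2_dims_per_example <- d / n        # 0.02666667
t3_examples_per_pc  <- n / d_pca    # 75 if two PCs are kept
t3_pcs_per_example  <- d_pca / n
```

The two orientations are reciprocals of each other, so the discrepancy is a matter of which direction should indicate higher complexity.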

Balance

C2: the last line of the implemented version returns 1 - 1/aux where aux already had the value of C2, according to the definition:

> ECoL::balance(x, y, measures = "C2")
0.2 
> dcme::C2(y)
1.25
> # ECoL implementation without the last line
> (function(y) {
    ii <- summary(y)
    nc <- length(ii)
    aux <- ((nc - 1)/nc) * sum(ii/(length(y) - ii))
    return(aux)
  })(as.factor(y))
1.25
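
The relation between the two outputs can be checked directly. This sketch reuses the formula from the snippet above (with `table()` in place of `summary()`) and applies the final 1 - 1/aux step separately; `imbalance_ratio` is a hypothetical name.

```r
# Imbalance ratio, following the formula in the snippet above.
imbalance_ratio <- function(y) {
  ii <- table(y)
  nc <- length(ii)
  ((nc - 1) / nc) * sum(ii / (length(y) - ii))
}

y <- factor(iris$Species == "setosa")
ir <- imbalance_ratio(y)         # 1.25
c2 <- 1 - 1 / ir                 # 0.2

ir_balanced <- imbalance_ratio(iris$Species)   # 1 for three equal classes
c2_balanced <- 1 - 1 / ir_balanced             # 0
```

The extra step maps the ratio, which starts at 1 for balanced classes and is unbounded above, onto the interval [0, 1), with 0 for a balanced problem.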