lpfgarcia / ecol
Extended Complexity Library in R
License: Other
The measure should return the amount of noise per class. We are currently returning the overall amount of noise.
Equation 2.5
I think the value is inverted (it should be nrow/volume). Definition: average number of examples per unit of volume.
LDA returns a one-column dataset for acute-*
In Lorena's article T1 is log(d/n), while in the technical report it is n/d.
I was unsure about the summarization applied to N3 and N4. If you look at it, the L2 and L3 measures should also be summarized in the same way, that is, in addition to their multiclass summarization, they should have a per-instance one. I think that, although it keeps the rate between 0 and 1, it ends up not making much sense for N3 and N4.
Check the code of the removing function again
Look at the articles and run one of the FS techniques
1 - C2
Is F4 correct?
Hi Luis, I'm using R 3.3.3
library(ECoL)
complexity(iris[,1:4], iris[,5], type="class")
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/mnt/nfs/modules/apps/R/3.3.3/lib64/R/library/igraph/libs/igraph.so':
libgmp.so.3: cannot open shared object file: No such file or directory
Any suggestion?
Cheers
Check execution time
Hi Luis, I noticed that we made a mistake in one of the measures, and it is wrong in the CSUR article too... It is the C1 measure from the balance group. It returns 1 for balanced problems, when it should return 0. We have to return 1 - C1 as the answer... Worse, the definition in the article is wrong as well. I will fix this for the current article.
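A minimal sketch of the 1 - C1 correction described above, assuming C1 is the entropy of the class proportions normalized by the log of the number of classes (an assumption drawn from the remark that it equals 1 for balanced problems). The helper names `class_entropy` and `c1_complexity` are hypothetical and are not ECoL functions.

```r
# Hypothetical helper: entropy of the class proportions, normalized to [0, 1].
# Assumes at least two classes are present in y.
class_entropy <- function(y) {
  p <- as.numeric(table(y)) / length(y)
  p <- p[p > 0]                      # drop empty factor levels
  -sum(p * log(p)) / log(length(p))
}

# Hypothetical helper: complement, so that balanced problems score 0.
c1_complexity <- function(y) 1 - class_entropy(y)

balanced   <- iris$Species                      # 50 / 50 / 50
imbalanced <- factor(iris$Species == "setosa")  # 100 / 50

c1_complexity(balanced)    # 0 (up to floating-point error)
c1_complexity(imbalanced)  # greater than 0
```

With this orientation the value grows as the class distribution becomes more skewed, which matches the convention that higher values mean more complexity.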
I emailed the authors. I am waiting for a reply.
Discuss with Ana
I found a problem with the PCA function, in line:
aux <- which(summary(aux)$importance[3,] <= 0.95)
I applied it to a subset of the iris dataset containing only classes and I got:
PC1 PC2
0.9435 1.0000
In this case both PCs must be used, but the test chooses only the first.
In another test (another subset), I obtain:
PC1 PC2
0.9919 1.0000
In this case one PC must be returned, but the function returns 0. (This gives rise to Inf in T3, for instance.)
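The two cases above can be reproduced from the cumulative proportions alone. This is a sketch of one possible selection rule (the smallest number of components whose cumulative proportion reaches 0.95), not necessarily the fix adopted in the package; `n_components` is a hypothetical helper.

```r
# Hypothetical helper: smallest number of components whose cumulative
# proportion of variance reaches the threshold.
n_components <- function(cumvar, threshold = 0.95) {
  unname(which(cumvar >= threshold)[1])
}

case1 <- c(PC1 = 0.9435, PC2 = 1.0000)
case2 <- c(PC1 = 0.9919, PC2 = 1.0000)

# Behaviour of the reported test (cumulative proportion <= 0.95):
length(which(case1 <= 0.95))  # 1: only PC1 is kept
length(which(case2 <= 0.95))  # 0: no component is kept

# Behaviour of the alternative rule:
n_components(case1)  # 2
n_components(case2)  # 1
```

Because the last cumulative proportion is always 1, this rule never returns zero components, which avoids the division by zero behind the Inf in T3.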
Is F3 correct?
Using Nuria's code
For the T1 measure, I did not understand what taking the mean, minimum, maximum, etc. means.
The measures F3, F4, N2, LSC, L1, Density, ClsCoef and Hubs need to be normalized to return higher values with the increase of complexity.
I thought about changing the acronyms of some measures to make them more uniform. Suggestions:
I also don't know whether this is feasible given the existing dependencies.
The neighborhood measures are using the Gower distance. Is that OK?
complexity(Species ~., iris)
Error in complexity.default(modFrame[, -1, drop = FALSE], modFrame[, 1, :
argument "type" is missing, with no default
I suggest renaming the overlapping category to feature-based or something similar, because these measures capture overlapping from the point of view of the attributes only, and measures in other categories also capture overlapping. But I understand this may not be feasible, since other libraries use the measures.
Check the normalization parameter
Check execution time
1 - LSC
Discuss this and, if necessary, apply normalization to all measures that use a distance function.
Hi, I've come across two packages for data complexity measures: yours and RomeroBarata/dcme, which I have somewhat extended. I am seeing some discrepancies among the results from both packages, and from yours and the definitions included in the paper:
I was hoping you could help me understand if/why your package may be altering some calculations and whether this is an error or an intended effect.
Thanks in advance!
David
Here's a list of the discrepancies I found, and a minimal example using the following variables:
x <- iris[, 1:4]
y <- iris$Species == "setosa"
1/(aux + 1). This is not indicated in the original formulation (since F1 is the maximum of the Fisher's discriminant ratios of each feature), and gives strange results:
> ECoL::overlapping(x, y, measures = "F1")
0.148504
> dcme::F1(x, y)
16.66501
> ECoL::overlapping(x, y, measures = "F2")
0
> dcme::F2(x, y)
0.004855226
> ECoL::overlapping(x, y, measures = "F3")
0
> dcme::F3(x, y)
1
> ECoL::dimensionality(x, y, measures = "T2")
0.02666667
> dcme::T2(x)
37.5
> 1/ECoL::dimensionality(x, y, measures = "T2")
37.5
> ECoL::dimensionality(x, y, measures = "T3")
0.01333333
> dcme::T3(x)
75
> 1/ECoL::dimensionality(x, y, measures = "T3")
75
1 - 1/aux, where aux already had the value of C2, according to the definition:
> ECoL::balance(x, y, measures = "C2")
0.2
> dcme::C2(y)
1.25
> # ECoL implementation without the last line
> (function(y) {
    ii <- summary(y)
    nc <- length(ii)
    aux <- ((nc - 1)/nc) * sum(ii/(length(y) - ii))
    return(aux)
  })(as.factor(y))
1.25
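The two C2 values reported above can be checked in base R without either package. This sketch only reuses the expression quoted in the report; it makes no claim about which value is the intended output.

```r
# Same label vector as in the report: setosa vs. the rest.
y <- as.factor(iris$Species == "setosa")

ii  <- summary(y)   # class counts: FALSE = 100, TRUE = 50
nc  <- length(ii)   # number of classes
aux <- ((nc - 1) / nc) * sum(ii / (length(y) - ii))

aux          # 1.25, the value reported for dcme::C2
1 - 1 / aux  # 0.2, the value reported for ECoL::balance
```

The 1 - 1/aux transformation maps the unbounded ratio into the interval [0, 1), which would be consistent with the other issues here that ask for normalized measures.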
Check the radios function (dj - tmp)
The original T2 (m1) measure takes n/m, in which n is the number of examples and m is the number of features.
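As a quick check of the two orientations on the iris features (n = 150 examples, m = 4 features), using base R only:

```r
x <- iris[, 1:4]
n <- nrow(x)  # number of examples
m <- ncol(x)  # number of features

n / m  # 37.5, the original n/m form
m / n  # 0.02666667, the inverted form
```

These match the 37.5 and 0.02666667 values for T2 quoted elsewhere in this list.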
SVM should use scale=TRUE
D3 has a problem with flare.arff
I assumed it is the imbalance ratio. In this case it should be n_max/n_min
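A minimal sketch of the imbalance ratio as n_max/n_min, assuming it is computed from the class counts alone. `imbalance_ratio` is a hypothetical helper, not a function from the package.

```r
# Hypothetical helper: size of the largest class over the smallest class.
imbalance_ratio <- function(y) {
  counts <- as.numeric(table(y))
  counts <- counts[counts > 0]  # ignore empty factor levels
  max(counts) / min(counts)
}

imbalance_ratio(iris$Species)              # 1: all classes have 50 examples
imbalance_ratio(iris$Species == "setosa")  # 2: 100 vs. 50 examples
```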
Is C3 a multiclass version?
The function hyperretangle only works for numeric attributes.
The linearity measures use the OVO function to decompose multiclass problems into binary ones. Is OVO the most suitable technique?
Discuss with Ana about the Inf values in the matrix of attributes and classes
In the paper I will introduce some simple alterations to the graph-based measures in order to normalize them between 0 and 1. The code will have to be adapted accordingly, but the alterations are quite simple.
Does the term "eigenvalue transformation" mean PCA?
Check execution time
I think the current implementation is imprecise. You do not need to remove the hyperspheres contained inside others in this measure.
Using the interpolation procedure, the L3 measure is generating the same number of examples as the training set. Is this number correct?
Is T1 correct?
How to calculate the volume of a dataset with categorical attributes?
Is T4 the mean of N/V?
Is F2 correct?
Fix the bug.
Generate the same dataset
Check why it should be between 0 and 1
Check and look into a new algorithm