
scikit-feature's People

Contributors

bacalfa, jundongl, vivian1993

scikit-feature's Issues

Return JMI Values

Is there a way to get the JMI values that are calculated during feature selection?

SKLearn fit/transform compatibility?

Is there an "out of the box" way to use this directly as a scikit-learn "transformer", i.e. do the methods support fit, fit_transform, etc.?
Sorry if this exists and I missed it somewhere! (Without this, the methods can't be used directly with sklearn's pipelines or CV methods.)
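
A minimal sketch of what such a wrapper could look like. The class name `ScoreSelector`, its parameters, and the toy `variance_score` function are illustrative assumptions and are not part of scikit-feature. To be cloned by scikit-learn's pipelines and cross-validation tools, the class would additionally need to inherit from `sklearn.base.BaseEstimator` and `sklearn.base.TransformerMixin`; that is left out here to keep the example dependent on NumPy only.

```python
import numpy as np


def variance_score(X, y=None):
    """Toy stand-in for a feature-scoring function (higher = better)."""
    return np.var(X, axis=0)


class ScoreSelector:
    """Keep the n_selected highest-scoring features (hypothetical wrapper)."""

    def __init__(self, score_func, n_selected=10):
        self.score_func = score_func
        self.n_selected = n_selected

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        scores = self.score_func(X, y)
        # Indices of features sorted from highest to lowest score.
        order = np.argsort(scores)[::-1]
        # Store the kept columns in their original order.
        self.selected_ = np.sort(order[: self.n_selected])
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.selected_]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


X = np.array([[1.0, 10.0, 0.0], [1.1, 20.0, 0.5], [0.9, 30.0, 1.0]])
X_reduced = ScoreSelector(variance_score, n_selected=2).fit_transform(X)
```

Any scoring function that returns one value per feature could be passed as `score_func`, provided that higher values mean "more relevant"; functions that return a ranking instead of scores would need a different `fit`.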

Bug in `lap_score.py`

Hi there!

There seems to be an error in lap_score.py. Please take a look at the following excerpt:

    # if 'W' is not specified, use the default W
    if 'W' not in kwargs.keys():
        W = construct_W(X)
    # construct the affinity matrix W
    W = kwargs['W']

If the user does not pre-compute W, then the last line results in a KeyError. I think it's easy to fix, since there seems to be a missing else:

    if 'W' not in kwargs.keys():
        # if 'W' is not specified, use the default W
        W = construct_W(X)
    else:
        # construct the affinity matrix W
        W = kwargs['W']

Thanks. Regards.

TypeError: __init__() got an unexpected keyword argument 'n_folds'

When trying score = svm_forward.svm_forward(X_train_values, y_train_reg_values, 50), I got the following error:

TypeError Traceback (most recent call last)
in
8
9 # score = svm_backward.svm_backward(X_train_values, y_train_reg_values,50)
---> 10 score = svm_forward.svm_forward(X_train_values, y_train_reg_values, 50)

~/anaconda3/envs/ish3test/lib/python3.6/site-packages/skfeature/function/wrapper/svm_forward.py in svm_forward(X, y, n_selected_features)
26 n_samples, n_features = X.shape
27 # using 10 fold cross validation
---> 28 cv = KFold(n_samples, n_folds=10, shuffle=True)
29 # choose SVM as the classifier
30 clf = SVC()

TypeError: __init__() got an unexpected keyword argument 'n_folds'

I'm wondering whether it's because I'm using Python3?
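
The error is most likely unrelated to Python 3 itself. Since scikit-learn 0.18, `KFold` lives in `sklearn.model_selection`, and its constructor takes `n_splits` instead of the number of samples and `n_folds`. A minimal sketch of the newer call pattern, using made-up toy data:

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data, only here to make the example runnable.
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([0, 1] * 10)

# The number of folds is n_splits; the data is passed to split(), not to the constructor.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
fold_sizes = [len(test_idx) for _, test_idx in cv.split(X)]
```

The `cv` object can then be passed to `sklearn.model_selection.cross_val_score(clf, X, y, cv=cv)` in place of the old `sklearn.cross_validation` helpers.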

number of selected features

# perform evaluation on classification task
num_fea = 100 # number of selected features
clf = svm.LinearSVC() # linear SVM
This is the code from your script.
I would like to know how you chose num_fea = 100 (the number of selected features).
Is there any criterion? In some scripts you set it to 10.
If I have 193 features, what value should I give num_fea?

Please help me understand this.

Thanks

reliefF error

Hi, when I run reliefF, I get the following error:

File "C:\Users\Massimo\Anaconda3\lib\site-packages\skfeature\function\similarity_based\reliefF.py", line 101, in reliefF
score += near_miss_term[label]/(k*p_dict[label])

TypeError: ufunc 'add' output (typecode 'O') could not be coerced to provided output parameter (typecode 'd') according to the casting rule ''same_kind''

Can you help me please?

Massimo
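
One common cause of this message (an assumption here, since the input data is not shown) is that `X` has dtype `object`, for example after converting a mixed-type table to an array. The float accumulator `score` then cannot receive an object-typed result in place. A minimal reproduction and workaround:

```python
import numpy as np

score = np.zeros(3)                               # float64 accumulator, like `score` in the traceback
term = np.array([1.0, 2.0, 3.0], dtype=object)    # values coming from an object-dtype array

try:
    score += term / 2.0                           # object result cannot be written into float64
    failed = False
except TypeError:
    failed = True

# Casting the input to float before scoring avoids the problem.
score += term.astype(float) / 2.0
```

In practice this would mean calling something like `X = np.asarray(X, dtype=float)` before passing the data to the scoring function.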

Dimension mismatch

Hello there ! hope you're doing fine.

I was just trying to use the SPEC and Laplacian Score modules to de-noise a BoW (489 docs, 7895 terms) and got the following errors:

SPEC:
File "", line 2, in
spectral_fs.spec(x)

File "C:\Users\Erick Garciaoliva\Anaconda3\lib\site-packages\skfeature\function\similarity_based\SPEC.py", line 74, in spec
l = LA.norm(F_hat)

File "C:\Users\Erick Garciaoliva\Anaconda3\lib\site-packages\numpy\linalg\linalg.py", line 2450, in norm
sqnorm = dot(x, x)

File "C:\Users\Erick Garciaoliva\Anaconda3\lib\site-packages\scipy\sparse\base.py", line 481, in __mul__
raise ValueError('dimension mismatch')

ValueError: dimension mismatch

LP-Score:
File "", line 1, in
lapscore = LaplacianScore(x)

File "", line 35, in LaplacianScore
t=np.matmul(np.matmul(Xt,D.toarray()),I)/np.matmul(np.matmul(np.transpose(I),D.toarray()),I)

ValueError: matmul: Input operand 0 does not have enough dimensions (has 0, gufunc core with signature (n?,k),(k,m?)->(n?,m?) requires 1)

Maybe this has to do with the data type of the BoW?
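
The SPEC traceback ends inside scipy/sparse/base.py, which suggests that the bag-of-words matrix is a SciPy sparse matrix while the code expects a dense NumPy array. A minimal sketch of a conversion helper follows; `as_dense` and `FakeSparse` are hypothetical names, and `FakeSparse` only stands in for a real sparse matrix so that the example needs nothing beyond NumPy.

```python
import numpy as np


def as_dense(X):
    """Return X as a dense float ndarray; sparse inputs are expanded via toarray()."""
    if hasattr(X, "toarray"):          # scipy.sparse matrices expose toarray()
        X = X.toarray()
    return np.asarray(X, dtype=float)


class FakeSparse:
    """Stand-in for a scipy.sparse matrix, used only to keep this example dependency-free."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def toarray(self):
        return self._data


bow = FakeSparse([[0, 2, 0], [1, 0, 0]])
X_dense = as_dense(bow)
norm = np.linalg.norm(X_dense)         # works on the dense array
```

For a 489 x 7895 matrix the dense float64 copy needs about 31 MB, so the conversion should be affordable at that size.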

Bug in `construct_W.py`

Hi there,

I'm trying to construct the W weight matrix to work with lap_score on the following simple dataset: employes-region.txt. I've tried the following code, which is provided as an example in file test_lap_score.py:

    kwargs_W = {"metric": "euclidean", "neighbor_mode": "knn", "weight_mode": "heat_kernel", "k": 5, 't': 1}
    W = construct_W.construct_W(X, **kwargs_W)

Unfortunately, it fails with the following exception at line 152 of file construct_W.py:

could not broadcast input array from shape (25) into shape (30) 

I've gone through the code, and I think the problem is that the dimensions of G are wrong. This is the piece of code involved in the exception:

            t = kwargs['t']
            # compute pairwise euclidean distances
            D = pairwise_distances(X)
            D **= 2
            # sort the distance matrix D in ascending order
            dump = np.sort(D, axis=1)
            idx = np.argsort(D, axis=1)  #  *** 1
            idx_new = idx[:, 0:k+1]  #  *** 2
            dump_new = dump[:, 0:k+1] #  *** 2
            # compute the pairwise heat kernel distances
            dump_heat_kernel = np.exp(-dump_new/(2*t*t))
            G = np.zeros((n_samples*(k+1), 3)) #  *** 2
            G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1) #  *** 2
            G[:, 1] = np.ravel(idx_new, order='F') # *** EXCEPTION HERE!!
            G[:, 2] = np.ravel(dump_heat_kernel, order='F')
            # build the sparse affinity matrix W
            W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W) > W
            W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)

I think that there's a problem at line *** 1. Should it compute idx using dump? I mean:

            idx = np.argsort(dump, axis=1)  #  *** 1

And the other problem is at the lines *** 2. Shouldn't they use k as a multiplier instead of k+1? That is:

            idx_new = idx[:, 0:k]  #  *** 2
            dump_new = dump[:, 0:k] #  *** 2
            # compute the pairwise heat kernel distances
            dump_heat_kernel = np.exp(-dump_new/(2*t*t))
            G = np.zeros((n_samples*(k), 3)) #  *** 2
            G[:, 0] = np.tile(np.arange(n_samples), (k, 1)).reshape(-1) #  *** 2

I've fixed my local installation using this patch and I've run the system on a large collection with 200+ datasets. It works correctly now.

I've seen that there are many other lines in which a similar patch might apply, but I haven't tried other configuration options.

Thanks! Regards
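
A possible alternative reading of the error, offered as an assumption: the shapes 25 = 5 x 5 and 30 = 5 x 6 would be consistent with a dataset of only 5 samples and k = 5, so that `idx[:, 0:k+1]` can return at most 5 columns while `G` is allocated for k + 1 = 6 neighbours per sample. Under that reading, clamping k to n_samples - 1 removes the mismatch while keeping the k + 1 convention (each sample is its own nearest neighbour at distance 0). Note also that applying argsort to the already-sorted copy would return 0..n-1 for every row rather than the neighbour indices. The helper name `knn_indices` below is hypothetical.

```python
import numpy as np


def knn_indices(D, k):
    """For each row of a square distance matrix D, return the indices of the
    sample itself plus its k nearest neighbours, and the k actually used."""
    n_samples = D.shape[0]
    # Clamp k so that k + 1 never exceeds the number of available columns.
    k = min(k, n_samples - 1)
    # argsort is taken on D itself so that the values are real sample indices.
    idx = np.argsort(D, axis=1)
    return idx[:, : k + 1], k


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                                  # only 5 samples
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)       # squared Euclidean distances

idx_new, k_eff = knn_indices(D, k=5)
G = np.zeros((D.shape[0] * (k_eff + 1), 3))
G[:, 1] = np.ravel(idx_new, order="F")                       # shapes now agree
```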

Please add compatibility with sparse matrices

For the library to work, I have to convert the sparse matrix to a dense one, which takes a lot of memory; as a result, I am sometimes unable to perform the required task because of a memory error. I am referring specifically to the statistical methods such as CHI2, giniIndex, etc.

FCBF.py Error fixed

Hi,
There is a typo on line 39 of the FCBF.py file. I fixed it and was able to get it working on my system by
changing dtypes to dtype and removing the quotes around object.

Current: t1 = np.zeros((n_features, 2), dtypes='object')

Correct: t1 = np.zeros((n_features, 2), dtype=object)

Best,
Sparkle

Maybe a mistake in CFS

Hi

Maybe I'm wrong but I think that

rff *= 2

should be:

rff *= (n_features **2 - n_features)

in the merit calculation function

Thanks for sharing your code!
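
For reference, a short derivation under the following assumptions: the function follows Hall's CFS merit, rcf is the plain sum of the k feature-class correlations, and rff is the sum over the k(k-1)/2 unordered feature pairs (j > i), as in the merit_calculation snippet quoted in the speed-up issue further down this page. Under those assumptions the factor 2 appears to be consistent with the averaged form of the merit:

```latex
\begin{align*}
\text{Merit}_S
  &= \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}},
  \qquad
  \overline{r_{cf}} = \frac{1}{k}\sum_{i} r_{c f_i},
  \qquad
  \overline{r_{ff}} = \frac{2}{k(k-1)}\sum_{i<j} r_{f_i f_j} \\
  &= \frac{\sum_{i} r_{c f_i}}{\sqrt{k + 2\sum_{i<j} r_{f_i f_j}}}
\end{align*}
```

The term k(k-1) cancels against the normalisation of the mean feature-feature correlation, leaving only the factor 2 in front of the pairwise sum.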

MemoryError

Hello:

I am using unsupervised feature selection with the Laplacian Score, but I am getting the error below:

Traceback (most recent call last):
File "fs.py", line 12, in
W = construct_W.construct_W(frame, **kwargs_W)
File "/usr/local/lib/python2.7/dist-packages/skfeature/utility/construct_W.py", line 141, in construct_W
D = pairwise_distances(X)
File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/pairwise.py", line 1207, in pairwise_distances
return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/pairwise.py", line 1054, in _parallel_pairwise
return func(X, Y, **kwds)
File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/pairwise.py", line 231, in euclidean_distances
distances = safe_sparse_dot(X, Y.T, dense_output=True)
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/extmath.py", line 184, in safe_sparse_dot
return fast_dot(a, b)
MemoryError

I have updated scikit-learn, but the issue persists. Any input would be helpful.

Regards
Sayantan Guha
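
The traceback fails inside `pairwise_distances(X)`, which builds a dense n_samples x n_samples matrix. A rough way to check whether that matrix can fit in memory at all, assuming 8-byte float64 entries (the function name is hypothetical):

```python
def dense_distance_matrix_bytes(n_samples, bytes_per_value=8):
    """Memory needed for a dense n_samples x n_samples distance matrix."""
    return n_samples * n_samples * bytes_per_value


for n in (1_000, 10_000, 100_000):
    gib = dense_distance_matrix_bytes(n) / 2 ** 30
    print(f"{n:>7} samples -> {gib:.2f} GiB")
```

Because the cost grows quadratically with the number of samples, updating scikit-learn would not be expected to help; subsampling the rows before building W is one way to reduce it.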

gini_index implementation

Hello, when I use gini_index to get the importance of the features, the output is always:
gini_index : [0.5 0.5 0.5 0.5 0.5]
Is there a problem with this function?

Error in Laplacian Score function

The following is the error message:
C:\ProgramData\Anaconda3\envs\py27\lib\site-packages\skfeature\function\similarity_based\lap_score.pyc in lap_score(X, **kwargs)
34 W = construct_W(X)
35 # construct the affinity matrix W
---> 36 W = kwargs['W']
37 # build the diagonal D matrix from affinity matrix W
38 D = np.array(W.sum(axis=1))

To fix it, line 34 needs to be changed as follows:
kwargs['W'] = construct_W(X)

Speed-up recommendation for `cfs` function

In the code for cfs(X, y), the function merit_calculation(X, y) is called repeatedly, and it in turn repeatedly calls su_calculation, sometimes with exactly the same feature(s) as in previous rounds.

To avoid repeatedly computing su_calculation(fi, y) with the same feature fi, it would be ideal to save the computation results into a list or a dictionary when they are called the first time, and to load those values instead of recomputing them when they are called afterwards. That would ensure the linear complexity of the algorithm and improve its speed.

This could be the code to achieve that:

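import numpy as np
from skfeature.utility.mutual_information import su_calculation

def merit_calculation(X, y, F, memo):
    rff = 0
    rcf = 0
    for i in F:
        # fetch fi unconditionally: it is also needed for the pairwise terms below
        fi = X[:, i]
        if i not in memo:
            memo[i] = su_calculation(fi, y)
        rcf += memo[i]
        for j in F:
            if j > i:
                if (i, j) not in memo:
                    fj = X[:, j]
                    memo[(i, j)] = su_calculation(fi, fj)
                rff += memo[(i, j)]
    rff *= 2
    merits = rcf / np.sqrt(len(F) + rff)
    return merits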

And the usage, supplying the indices and the memory dictionary on each call.

...
memo = {}
...
t = merit_calculation(X, y, F, memo)
F = ...
t = merit_calculation(X, y, F, memo)
F = ...
...

Problem while installing

Hi,
while installing the scikit-feature, I get the following error:
Could not find a version that satisfies the requirement scikit-feature (from versions: )
No matching distribution found for scikit-feature.

I tried it using Python 2.7 and 3.6, but the same error occurred. Could you please help me?

IndexError when using JMI and MRMR

Hi.
I have tried JMI and MRMR but I got the following error:
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

The same error is also raised when I run the examples provided with the package, such as test_JMI.py.
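
This message typically appears when an array is indexed with floating-point values, which newer NumPy versions reject. Whether that is the exact cause inside JMI and MRMR is an assumption, but the pattern and a workaround can be sketched as follows:

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
idx = np.array([2.0, 0.0])            # feature indices stored as floats

try:
    X[:, idx]                         # float indices are rejected
    failed = False
except IndexError:
    failed = True

selected = X[:, idx.astype(int)]      # integer indices are accepted
```

If the selected-feature indices returned by a method turn out to be floats, casting them with `astype(int)` before slicing the data should be enough on the caller's side.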

Example code for calculating fisher score

Hey! Could you please provide example code for calculating the Fisher score using the function at the following path:

skfeature.function.similarity_based.fisher_score

Could you also please clarify which class labels have to be provided, given that the function is supposed to extract the Fisher features and provide the labels?
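
As an illustration of what the inputs look like, here is a self-contained sketch of the textbook Fisher score (between-class scatter divided by within-class scatter, per feature). It is not the library's implementation, and the function name `fisher_score_textbook` is hypothetical. Fisher score is a supervised criterion, so `y` holds one known class label per row of `X` and has to be supplied by the caller.

```python
import numpy as np


def fisher_score_textbook(X, y):
    """Per-feature Fisher score: sum_c n_c (mu_c - mu)^2 / sum_c n_c var_c."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]                       # samples belonging to this class
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # Guard against division by zero for features with no within-class variance.
    return between / np.maximum(within, 1e-12)


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = np.array([[1.0, 5.0], [1.2, 1.0], [0.9, 3.0],
              [5.0, 4.0], [5.1, 2.0], [4.8, 3.5]])
y = np.array([0, 0, 0, 1, 1, 1])

scores = fisher_score_textbook(X, y)
ranking = np.argsort(scores)[::-1]               # best feature first
```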

Error occurs when running an example on Python 3

Hi, thanks for your great work.
I ran into an error when I tried to run the test_MRMR.py example.
It seems that some of the code is written for Python 2.x; when I try to run it on Python 3.x, it crashes.

Could you please update these parts of the code to make them work with Python 3?
Thanks a lot.

Missing LCSI. What is this, and how can I resolve this error?

D:\prj>python findFeatures.py
Traceback (most recent call last):
File "findFeatures.py", line 7, in
from skfeature.function.information_theoretical_based import MIM # infogain
File "D:\ProgramData\Anaconda3\lib\site-packages\skfeature\function\information_theoretical_based\MIM.py", line 1, in
import LCSI
ModuleNotFoundError: No module named 'LCSI'

Limited Features

We have a dataset of size 72 x 3571, which means that our features are fewer than our samples. We tested our dataset with your spec() feature selection technique, and the output is an array of zeros.
Kindly check this issue.

UDFS Errors

UDFS.py L97: The additive parameter \lambda should be independent from gamma (introduced in eq 8 in the paper). It should probably default to something small. It's just used to make the covariance invertible.

Also, construction of S_i seems incorrect to me: UDFS.py L100: indexing on idx_new should be idx_new[:,q]?

construct_W modifies X with cosine metric

Dear,

As the title indicates, when the function construct_W is called using the 'cosine' option, the following operation is done:

    for i in range(n_samples):
        X[i, :] = X[i, :]/max(1e-12, X_normalized[i])

But this actually changes the values contained in the input variable X outside the function. This may be problematic when we want to use X elsewhere and nothing indicated that X was modified.

I noticed this by running the same code before and after calling construct_W and getting different results.

best,
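
A small sketch of the effect and of one possible fix, which is to normalise a copy instead of the caller's array. Both function names are hypothetical, and it is assumed that `X_normalized` in the quoted loop holds the row norms.

```python
import numpy as np


def normalize_rows_in_place(X):
    """Mimics the quoted loop: the caller's array is overwritten."""
    norms = np.sqrt((X ** 2).sum(axis=1))
    for i in range(X.shape[0]):
        X[i, :] = X[i, :] / max(1e-12, norms[i])
    return X


def normalize_rows_copy(X):
    """Same result, but computed on a private copy of the input."""
    X = np.array(X, dtype=float)                   # np.array copies by default
    norms = np.sqrt((X ** 2).sum(axis=1))
    return X / np.maximum(norms, 1e-12)[:, None]


original = np.array([[3.0, 4.0], [6.0, 8.0]])
safe = normalize_rows_copy(original)               # original is still intact here
untouched = original.copy()
normalize_rows_in_place(original)                  # original is now overwritten
```

Until the library copies its input, passing `X.copy()` to construct_W should protect the caller's data.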

The linear_assignment_ module is deprecated

warnings.warn(
"The linear_assignment_ module is deprecated in 0.21 "
"and will be removed from 0.23. Use "
"scipy.optimize.linear_sum_assignment instead.",
DeprecationWarning)
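
A minimal sketch of the replacement named in the warning. Unlike the deprecated helper, which returned an array of (row, column) pairs, `scipy.optimize.linear_sum_assignment` returns two index arrays; stacking them recovers the pair layout. The cost matrix is made up for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.array([[4, 1, 3],
                 [2, 0, 5],
                 [3, 2, 2]])

# Minimum-cost assignment of rows to columns.
row_ind, col_ind = linear_sum_assignment(cost)

# Pair layout comparable to the output of the deprecated function.
pairs = np.column_stack((row_ind, col_ind))
total_cost = cost[row_ind, col_ind].sum()
```

Code that maximises a score by negating the matrix before the call can keep doing so with the new function.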

Parallel evaluation of MI based measures

I added a PR with a prototype solution (which you closed without comment). If you would rather use another library, e.g. joblib, that should be relatively straightforward, and I can take a look at it.

entropy value is negative

I found that for some continuous variables, the entropy_estimators library returns a negative number. Here is the reply I got from the author of this library:

For continuous variables, this package is calculating the differential entropy. Unfortunately, the differential entropy can be negative, making interpretation more difficult than in the discrete case. See chapter 8 of Cover and Thomas, for example, for a discussion of how to interpret negative differential entropies. (Consider, for instance, the differential entropy for a Gaussian which is proportional to log variance. If the variance is small, you get a negative number.)

My question concerns the information-theoretical methods that use this library for entropy calculation: if the entropy result is negative, will the feature selection result still be valid?
Thanks
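
The Gaussian case mentioned in the reply can be checked numerically. The differential entropy of a normal distribution with standard deviation sigma is 0.5 * log(2 * pi * e * sigma^2), which drops below zero once sigma is smaller than about 0.242. The function name below is hypothetical.

```python
import math


def gaussian_differential_entropy(sigma, base=2):
    """Differential entropy of N(mu, sigma^2) in the given log base."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2, base)


wide = gaussian_differential_entropy(1.0)     # positive
narrow = gaussian_differential_entropy(0.1)   # negative
threshold = 1.0 / math.sqrt(2 * math.pi * math.e)
```

This only illustrates why negative values can occur for continuous variables; it does not settle whether a particular selection method remains valid in that case.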

Please publish on PyPI

Currently I have to go through a hassle to install this dependency for my project. If it were published on PyPI, the package would be more accessible.

Create a pip package

Dear authors

Congratulations. The package is very good.
My research group is using scikit-feature inside other projects, and we would like to know whether it is possible to publish a PyPI package.

No module named 'entropy_estimators'

I was trying to use CFS.py with python3. It gave me the following error:

File "/usr/local/lib/python3.5/dist-packages/skfeature/function/statistical_based/CFS.py", line 2, in <module>
    from skfeature.utility.mutual_information import su_calculation
  File "/usr/local/lib/python3.5/dist-packages/skfeature/utility/mutual_information.py", line 1, in <module>
    import entropy_estimators as ee
ImportError: No module named 'entropy_estimators'

Please tell me what should I do to solve this problem. Am I missing something or doing something wrong?
Thanks.

Does K-L refer to the Kozachenko-Leonenko k-nearest neighbour estimator used to estimate the entropy?

K-L is used in
https://github.com/jundongl/scikit-feature/blob/master/skfeature/utility/entropy_estimators.py

    def entropy(x, k=3, base=2):
        """
        The classic K-L k-nearest neighbor continuous entropy estimator x should be a list of vectors,
        e.g. x = [[1.3],[3.7],[5.1],[2.4]] if x is a one-dimensional scalar and we have four samples
        """

Is it the
Kozachenko-Leonenko k-nearest neighbour estimator used to estimate the entropy?

https://stackoverflow.com/questions/43265770/entropy-python-implementation

Please add Python 3 support

A large portion of the Python user base doesn't use Python 2 any more, and therefore can't use a package that doesn't support Python 3. Please add Python 3 support; it shouldn't take too much extra work.

TypeError: transpose() takes exactly 1 argument (2 given)

Hello:

I am facing the below error while using the unsupervised feature selection using Laplacian Score.

Traceback (most recent call last):
File "testfs.py", line 13, in
score = lap_score.lap_score(frame, W=W)
File "/usr/local/lib/python2.7/dist-packages/skfeature/function/similarity_based/lap_score.py", line 42, in lap_score
Xt = np.transpose(X)
File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 534, in transpose
return transpose(axes)
TypeError: transpose() takes exactly 1 argument (2 given)

Thanks
Sayantan Guha

Issue with UDFS; MCFS

Hi,
when I try to run the code for UDFS and MCFS, I get the following issue:

image

Kindly help me on this.

Get " sparse matrix length is ambiguous; use getnnz() or shape[0]" error!

corpus, categories = get_detail_content_category()

vectorizer = TfidfVectorizer(max_df=1.0, max_features=6000, min_df=1,
                             stop_words=get_cn_stopwords(),
                             encoding='utf-8', decode_error='ignore',
                             analyzer='word', tokenizer=cn_tokenize)

X = vectorizer.fit_transform(corpus)
y = categories

idx = ICAP.icap(X, y, n_selected_features=1000)
selected_X = X[:, idx[0:1000]]

After I run this code, I get the error shown in the title. I don't know why; any help is appreciated.

'function' object has no attribute 'entropyd'


AttributeError Traceback (most recent call last)
in ()
----> 1 mod(0,1,2,0,1)

in mod(cross, smote, norstd, model, graph)
96 if cross==0:
97 X2=stdselector(X,norstd)
---> 98 X3=fcb(X2,Y)
99 X_train, X_test, Y_train, Y_test=nocross(X3,Y)
100 print(X2.shape)

in fcb(X_train, Y_train)
88 print("10foldcrossvalidation mean SPECIFICITY",np.mean(spe))
89 def fcb(X_train,Y_train):
---> 90 idx =CFS.cfs(X_train,Y_train)
91 features = X[:, idx[0:num_fea]]
92 print(features)

c:\python\lib\site-packages\skfeature\function\statistical_based\CFS.py in cfs(X, y)
70 F.append(i)
71 # calculate the merit of current selected features
---> 72 t = merit_calculation(X[:, F], y)
73 if t > merit:
74 merit = t

c:\python\lib\site-packages\skfeature\function\statistical_based\CFS.py in merit_calculation(X, y)
28 for i in range(n_features):
29 fi = X[:, i]
---> 30 rcf += su_calculation(fi, y)
31 for j in range(n_features):
32 if j > i:

c:\python\lib\site-packages\skfeature\utility\mutual_information.py in su_calculation(f1, f2)
57 # calculate information gain of f1 and f2, t1 = ig(f1,f2)
58 t1 = information_gain(f1, f2)
---> 59 # calculate entropy of f1, t2 = H(f1)
60 t2 = ee.entropyd(f1)
61 # calculate entropy of f2, t3 = H(f2)

c:\python\lib\site-packages\skfeature\utility\mutual_information.py in information_gain(f1, f2)
17
18 ig = ee.entropyd(f1) - conditional_entropy(f1, f2)
---> 19 return ig
20
21

AttributeError: 'function' object has no attribute 'entropyd'

Examples are not compatible with scikit-learn 0.20.4

I have been trying to execute the examples in source code (in particular http://featureselection.asu.edu) and I am struggling with which scikit-learn version to use.

By default (if no particular version is specified), pip downloads scikit-learn version 0.20.4. This version yields the following error:

ImportError: cannot import name cross_validation

I have tried manually installing older versions, but got different errors.

Versions 0.10 and 0.12 yield

ImportError: cannot import name accuracy_score

Version 0.15 yields

ImportError: No module named skfeature.function.similarity_based

Could you provide which version of scikit-learn, numpy and scipy should be installed to execute the examples?

In any case, I am able to import the algorithms from skfeature.function.*; the issue is with running the examples.

Thank you.


System configuration:
Python 2.7.17 (Anaconda)
Numpy 1.16.4
SciPy 1.2.3

Sample datasets

Can we include sample datasets to have a proper testing suite?
