danilkolikov / fsfc

Feature Selection for Clustering

License: MIT License

Makefile 0.18% Python 99.82%

clustering feature-selection machine-learning

fsfc's Issues

Entropy based Feature Selection for Clustering method

Hi,

Great package, I hope that you continue to develop it. I don't see this paper included. Have I overlooked it? Any plans to implement it?

http://www.public.asu.edu/~huanliu/papers/pakdd00clu.pdf

Best,

Andrew

Installation with pip

Hey,

Since your project has the right setup, it can be installed directly with pip using the following command:

pip install git+https://github.com/danilkolikov/fsfc.git

This also installs all the requirements. So there is no need for the complicated solution!

Kind regards,
Alexandra

make test errors

Thanks for this library!
There is an error when importing calinski_harabaz_score in fsfc/fsfc/__test__/AlgorithmTest.py, line 4.
The name calinski_harabaz_score is misspelled: an 's' is missing. It should be calinski_harabasz_score.
The same change is needed on lines 112 and 114.
I am using scikit-learn version 0.24.1.

Trying to extract relevant features after clustering

Hi,
I am using this library as part of my thesis project to extract relevant features for my multitask learning model and to prevent negative gradient flow.

I followed the steps described on the GitHub page. The code is attached below.

from fsfc.generic import NormalizedCut
from sklearn.pipeline import Pipeline
from sklearn.cluster import KMeans

# dt is the DataFrame loaded earlier (not shown)
X = dt.to_numpy()
pipeline = Pipeline([
    ('select', NormalizedCut(3)),
    ('cluster', KMeans())
])
pipeline.fit_predict(X)

The error is attached below.

MemoryError Traceback (most recent call last)
in
3 ('cluster', KMeans())
4 ])
----> 5 pipeline.fit_predict(X)

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\utils\metaestimators.py in (*args, **kwargs)
118
119 # lambda, but not partial, allows help() to work with update_wrapper
--> 120 out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
121 # update the docstring of the returned function
122 update_wrapper(out, self.fn)

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\pipeline.py in fit_predict(self, X, y, **fit_params)
447 """
448 fit_params_steps = self._check_fit_params(**fit_params)
--> 449 Xt = self._fit(X, y, **fit_params_steps)
450
451 fit_params_last_step = fit_params_steps[self.steps[-1][0]]

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\pipeline.py in _fit(self, X, y, **fit_params_steps)
305 message_clsname='Pipeline',
306 message=self._log_message(step_idx),
--> 307 **fit_params_steps[name])
308 # Replace the transformer of the step with the fitted
309 # transformer. This is necessary when loading the transformer

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\joblib\memory.py in call(self, *args, **kwargs)
350
351 def call(self, *args, **kwargs):
--> 352 return self.func(*args, **kwargs)
353
354 def call_and_shelve(self, *args, **kwargs):

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
752 with _print_elapsed_time(message_clsname, message):
753 if hasattr(transformer, 'fit_transform'):
--> 754 res = transformer.fit_transform(X, y, **fit_params)
755 else:
756 res = transformer.fit(X, y, **fit_params).transform(X)

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
697 if y is None:
698 # fit method of arity 1 (unsupervised transformation)
--> 699 return self.fit(X, **fit_params).transform(X)
700 else:
701 # fit method of arity 2 (supervised transformation)

~\Desktop\Master Thesis\DataSet\fsfc\base.py in fit(self, x, *rest)
70
71 def fit(self, x, *rest):
---> 72 self.scores = self._calc_scores(x)
73 return self
74

~\Desktop\Master Thesis\DataSet\fsfc\generic\SPEC.py in _calc_scores(self, x)
42
43 def _calc_scores(self, x):
---> 44 similarity = rbf_kernel(x)
45 adjacency = similarity
46 degree_vector = np.sum(adjacency, 1)

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\pairwise.py in rbf_kernel(X, Y, gamma)
1103 gamma = 1.0 / X.shape[1]
1104
-> 1105 K = euclidean_distances(X, Y, squared=True)
1106 K *= -gamma
1107 np.exp(K, K) # exponentiate K in-place

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\pairwise.py in euclidean_distances(X, Y, Y_norm_squared, squared, X_norm_squared)
311 else:
312 # if dtype is already float64, no need to chunk and upcast
--> 313 distances = - 2 * safe_sparse_dot(X, Y.T, dense_output=True)
314 distances += XX
315 distances += YY

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\utils\extmath.py in safe_sparse_dot(a, b, dense_output)
150 ret = np.dot(a, b)
151 else:
--> 152 ret = a @ b
153
154 if (sparse.issparse(a) and sparse.issparse(b)

MemoryError: Unable to allocate 1.44 TiB for an array with shape (444234, 444234) and data type float64
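
The traceback shows where the memory goes: rbf_kernel builds a dense pairwise matrix with one row and one column per sample, so 444234 samples at 8 bytes per float64 entry need roughly 1.44 TiB. The sketch below reproduces that arithmetic and shows one possible workaround, which is to fit the selector on a random subset of rows. The helper names (pairwise_matrix_bytes, max_rows_for_budget, subsample_indices) are hypothetical and not part of fsfc or scikit-learn, and whether feature scores computed on a subsample are good enough is an assumption to check on your own data.

```python
import math
import random


def pairwise_matrix_bytes(n_samples, itemsize=8):
    """Bytes needed for a dense n_samples x n_samples matrix (float64 by default)."""
    return n_samples * n_samples * itemsize


def max_rows_for_budget(budget_bytes, itemsize=8):
    """Largest n such that an n x n matrix fits into budget_bytes."""
    return math.isqrt(budget_bytes // itemsize)


def subsample_indices(n_samples, max_rows, seed=0):
    """Sorted random row indices, at most max_rows of them, without replacement."""
    if n_samples <= max_rows:
        return list(range(n_samples))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_samples), max_rows))


# Shape taken from the MemoryError message above.
n = 444234
print(f"{pairwise_matrix_bytes(n) / 2**40:.2f} TiB")  # prints 1.44 TiB

# Hypothetical budget of 2 GiB for the kernel matrix.
limit = max_rows_for_budget(2 * 2**30)
idx = subsample_indices(n, limit)
print(len(idx), "rows fit into the budget")
```

With the indices in hand, something like `NormalizedCut(3).fit(X[idx])` followed by `transform(X)` on the full array would keep the kernel matrix at `len(idx)` squared entries. This assumes the fitted selector can transform rows it was not fitted on, which the traceback suggests (fit only stores per-feature scores) but does not confirm.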

Requires installing sklearn 22.1 and still does not work

I was trying to use your code, but unfortunately I ran into some problems:

ModuleNotFoundError Traceback (most recent call last)
in
----> 1 from sklearn.feature_selection.base import SelectorMixin
ModuleNotFoundError: No module named 'sklearn.feature_selection.base'

To resolve that, I installed scikit-learn==22.1, but I still get:

ImportError Traceback (most recent call last)
in
----> 1 from sklearn.feature_selection.base import SelectorMixin
~/anaconda3/envs/python3/lib/python3.6/site-packages/sklearn/feature_selection/base.py in
4 from . import _base
5 from ..externals._pep562 import Pep562
----> 6 from ..utils.deprecation import _raise_dep_warning_if_not_pytest
7
8 deprecated_path = 'sklearn.feature_selection.base'
ImportError: cannot import name '_raise_dep_warning_if_not_pytest

Can you update the code so that it runs on the latest version without errors?
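
Both tracebacks point at a module path that moved between scikit-learn releases. One common way to tolerate that kind of move is to try several import locations in order and use the first one that works. The helper below is a minimal sketch of that pattern. The name import_first is hypothetical, and the candidate paths mentioned afterwards are assumptions to verify against the installed scikit-learn version, not something this thread confirms.

```python
import importlib


def import_first(candidates):
    """Return the first importable attribute from (module_path, attribute) pairs.

    Candidates are tried in order; ImportError is raised if none of them resolve.
    """
    errors = []
    for module_path, attribute in candidates:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, attribute)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed and move on to the next one.
            errors.append(f"{module_path}.{attribute}: {exc}")
    raise ImportError("no candidate could be imported: " + "; ".join(errors))


# Demonstration with the standard library only: the first path does not exist,
# so the helper falls back to math.sqrt.
sqrt = import_first([("no_such_package_xyz.base", "sqrt"), ("math", "sqrt")])
print(sqrt(16.0))
```

For this issue, the candidates might be `("sklearn.feature_selection", "SelectorMixin")` followed by `("sklearn.feature_selection.base", "SelectorMixin")`. The same pattern would cover the renamed metric from the make test issue, with `("sklearn.metrics", "calinski_harabasz_score")` tried before `("sklearn.metrics", "calinski_harabaz_score")`.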

