spherecluster's People

Contributors

cv3d, jasonlaska

spherecluster's Issues

AttributeError: 'SphericalKMeans' object has no attribute '_check_fit_data'

I've been getting an
AttributeError: 'SphericalKMeans' object has no attribute '_check_fit_data'

error every time I try to run a SphericalKMeans fit. Diving a little further, it looks like the _check_fit_data method only exists in two locations in the repository...

Line 318 in spherical_kmeans.py (the line that throws the error in this case).

Line 772 in von_mises_fisher_mixture.py where it seems to be a defined method.

Based on what I can see from the imports, it looks like the _check_fit_data doesn't actually exist in the context of spherical_kmeans.py, so the error kind of makes sense.

Could this be the result of some accidental deletions? I went through the commit history and couldn't find anything that immediately seemed like the issue. Or is there something very obvious that I'm missing... wouldn't be the first time :)

Also as an FYI, I'm running Python 3.6.4.

Source install fails due to exceptions in setup.py

Hello,

I stumbled over the following issue.

When a package that depends on spherecluster is installed and spherecluster itself is built from its source distribution (.tgz), the install fails with the message "numpy is required during installation", raised from setup.py#12, even though the package correctly lists numpy (and scipy) as dependencies.

How to reproduce:

  1. make a toy setup.py

from setuptools import setup

setup(
    name="spherecluster_test",
    version="1.0.0",
    install_requires=["numpy", "scipy", "spherecluster"]
)
  2. build it into a wheel
python setup.py bdist_wheel
  3. try to install it with the --no-binary option to force the spherecluster source distribution:
pip install --no-binary "spherecluster" dist/spherecluster_test-1.0.0-py3-none-any.whl

Processing ./dist/spherecluster_test-1.0.0-py3-none-any.whl
Collecting scipy (from spherecluster-test==1.0.0)
  Using cached https://files.pythonhosted.org/packages/a8/0b/f163da98d3a01b3e0ef1cab8dd2123c34aee2bafbb1c5bffa354cc8a1730/scipy-1.1.0-cp36-cp36m-manylinux1_x86_64.whl
Collecting numpy (from spherecluster-test==1.0.0)
  Using cached https://files.pythonhosted.org/packages/16/21/2e88568c134cc3c8d22af290865e2abbd86efa58a1358ffcb19b6c74f9a3/numpy-1.15.3-cp36-cp36m-manylinux1_x86_64.whl
Collecting spherecluster (from spherecluster-test==1.0.0)
  Using cached https://files.pythonhosted.org/packages/27/27/614b9e568e9a9a8d46938310b7caf092657343bf037b9fae416baf611d06/spherecluster-0.1.6.tar.gz
    Complete output from command python setup.py egg_info:
    numpy is required during installation

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-gtqjhbjz/spherecluster/

I suggest removing the following lines from setup.py#L9-L19.

try:
    import numpy  # NOQA
except ImportError:
    print('numpy is required during installation')
    sys.exit(1)

try:
    import scipy  # NOQA
except ImportError:
    print('scipy is required during installation')
    sys.exit(1)

a question about updating centroids

hi ~ I have a question about updating centroids in your code as follows:

        # computation of the means
        if sp.issparse(X):
            centers = _k_means._centers_sparse(X, labels, n_clusters,
                                               distances)
        else:
            centers = _k_means._centers_dense(X, labels, n_clusters, distances)

        # l2-normalize centers (this is the main contibution here)
        centers = normalize(centers)

When clustering with cosine similarity, the centers are first computed with _k_means._centers_XXX, which was designed to update centers under Euclidean distance, and are then simply normalized. Won't the resulting centers point in different directions from what they should?
I hope I've described my question clearly, and I'm looking forward to your reply. Thanks!
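
A small numerical sketch may help with the question above. For a fixed set of points, the unit vector that maximizes the summed dot product (the summed cosine similarity when the points are unit length) is the normalized sum of the points, which has the same direction as the normalized arithmetic mean. The snippet checks this on synthetic data. The data and the helper total_cosine are illustrative assumptions, and nothing here is taken from the library's internals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic cluster members, projected onto the unit sphere (illustrative data).
X = rng.normal(loc=[2.0, 1.0, 0.5], scale=0.3, size=(50, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Euclidean-style update (arithmetic mean), followed by l2-normalization.
center = X.mean(axis=0)
center /= np.linalg.norm(center)


def total_cosine(c):
    """Sum of cosine similarities between unit vector c and the rows of X."""
    return float(np.sum(X @ c))


# Compare against many random unit directions: none should score higher.
candidates = rng.normal(size=(10000, 3))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
best_random = max(total_cosine(c) for c in candidates)

print(total_cosine(center), best_random)
```

The first printed value should be at least as large as the second, because sum_i x_i . c = S . c <= ||S||, with equality at c = S / ||S||.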

Initialization is using euclidean distance

centers = _init_centroids(
    X, n_clusters, init, random_state=random_state, x_squared_norms=x_squared_norms
)
if verbose:
    print("Initialization complete")

I might be getting this wrong, but the code here seems to use the initialization function from sklearn. This could cause problems, since the k-means++ initialization in sklearn is based on Euclidean distance. It should be replaced with an initialization based on cosine distance.
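
One relevant identity, shown as a sketch: for unit-norm vectors, squared Euclidean distance equals twice the cosine distance, so a scheme that samples in proportion to squared Euclidean distance weights the points exactly as one based on cosine distance would. The snippet uses illustrative data and does not establish whether the data reaching the initializer in this code path is already normalized.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative unit-norm data and one unit-norm center.
X = rng.normal(size=(5, 4))
X /= np.linalg.norm(X, axis=1, keepdims=True)
c = X[0]

sq_euclidean = np.sum((X - c) ** 2, axis=1)
cosine_distance = 1.0 - X @ c

# For unit vectors: ||x - c||^2 = 2 - 2 x.c = 2 * (1 - cos(x, c))
print(np.allclose(sq_euclidean, 2.0 * cosine_distance))
```

If the inputs are not normalized first, the identity does not hold and the two distances can rank points differently.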

Cannot import spherecluster with scikit-learn 1.0.2: sklearn.cluster.k_means_ has been renamed

The traceback is

Traceback (most recent call last):
  File "//python3.9/site-packages/IPython/core/interactiveshell.py", line 3457, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in
    import spherecluster
  File "/snap/pycharm-community/267/plugins/python-ce/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "//python3.9/site-packages/spherecluster/__init__.py", line 2, in
    from .spherical_kmeans import SphericalKMeans
  File "/snap/pycharm-community/267/plugins/python-ce/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "//python3.9/site-packages/spherecluster/spherical_kmeans.py", line 7, in
    from sklearn.cluster.k_means_ import (
  File "/snap/pycharm-community/267/plugins/python-ce/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
ModuleNotFoundError: No module named 'sklearn.cluster.k_means_'

The problem is that 'sklearn.cluster.k_means_' has been renamed to 'sklearn.cluster._kmeans' in some intermediate scikit-learn version.
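
A common way to tolerate a module rename is to try several module paths in order. The sketch below uses a hypothetical helper, import_first, and demonstrates it with standard-library names so that it runs anywhere. For this issue the candidates would be "sklearn.cluster._kmeans" followed by "sklearn.cluster.k_means_", the two names mentioned above. This only addresses locating the module: private helpers inside it may also have changed between versions, which the sketch does not handle.

```python
import importlib


def import_first(candidates):
    """Return the first module in `candidates` that imports successfully."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append("{}: {}".format(name, exc))
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))


# Demonstration with standard-library names (the first one does not exist).
mod = import_first(["module_that_does_not_exist_xyz", "json"])
print(mod.__name__)
```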

ImportError: cannot import name '_k_means'

Hi. I installed spherecluster successfully with pip install spherecluster on Ubuntu 18.04. But when I run from spherecluster import SphericalKMeans, I get an ImportError.

Traceback (most recent call last):
  File "LM/vectors_cluster.py", line 9, in <module>
    from spherecluster import SphericalKMeans
  File "/path_to_anaconda/lib/python3.6/site-packages/spherecluster/__init__.py", line 2, in <module>
    from .spherical_kmeans import SphericalKMeans
  File "/path_to_anaconda/lib/python3.6/site-packages/spherecluster/spherical_kmeans.py", line 16, in <module>
    from sklearn.cluster import _k_means
ImportError: cannot import name '_k_means'

Here is my environment information:

Package        Version
numpy          1.14.3
scipy          1.1.0
scikit-learn   0.22.2.post1
pytest         3.5.1
nose           1.3.7
joblib         0.14.1
spherecluster  0.1.7

If anyone can help me, I would really appreciate it!

TypeError: Expected sequence or array-like, got <class 'NoneType'>

    self.clus.fit(self.data)
  File "D:\工程\知识图谱自动构建\AutoBuild\venv\lib\site-packages\spherecluster\spherical_kmeans.py", line 363, in fit
    return_n_iter=True,
  File "D:\工程\知识图谱自动构建\AutoBuild\venv\lib\site-packages\spherecluster\spherical_kmeans.py", line 189, in spherical_k_means
    random_state=random_state,
  File "D:\工程\知识图谱自动构建\AutoBuild\venv\lib\site-packages\spherecluster\spherical_kmeans.py", line 39, in _spherical_kmeans_single_lloyd
    sample_weight = _check_sample_weight(X, sample_weight)
  File "D:\工程\知识图谱自动构建\AutoBuild\venv\lib\site-packages\sklearn\utils\validation.py", line 1215, in _check_sample_weight
    n_samples = _num_samples(X)
  File "D:\工程\知识图谱自动构建\AutoBuild\venv\lib\site-packages\sklearn\utils\validation.py", line 147, in _num_samples
    raise TypeError(message)
TypeError: Expected sequence or array-like, got <class 'NoneType'>

Hi, sklearn defines this function as "_check_sample_weight(sample_weight, X, dtype=None)", with the parameters in the order "sample_weight, X", but spherical_kmeans.py (line 39) calls it as "_check_sample_weight(X, sample_weight)". Could the argument order be causing this error?
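
The effect described above can be reproduced with a toy function. In the sketch below, check_sample_weight is a hypothetical stand-in that only mimics the parameter order quoted in this issue; it is not sklearn's implementation. Passing the arguments by keyword avoids depending on positional order.

```python
def check_sample_weight(sample_weight, X):
    """Hypothetical stand-in with the parameter order quoted in the issue."""
    n_samples = len(X)  # raises TypeError when X is None
    if sample_weight is None:
        return [1.0] * n_samples
    return list(sample_weight)


X = [[0.0, 1.0], [1.0, 0.0]]
sample_weight = None

# Swapped positional arguments: X lands in `sample_weight`, None lands in `X`.
try:
    check_sample_weight(X, sample_weight)
    swapped_failed = False
except TypeError:
    swapped_failed = True

# Keyword arguments do not depend on the order of the signature.
weights = check_sample_weight(sample_weight=sample_weight, X=X)
print(swapped_failed, weights)
```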

How to use spherecluster in Jupyter notebook

Hi Jason,
I am trying to use the 'spherecluster' package in a Jupyter notebook, but after running "import spherecluster" I get the following message:
ModuleNotFoundError: No module named 'spherecluster'
I installed the package from the Windows command prompt without any problem.
Could you give me some insight into what I have done wrong?
Thank you.
Dmitry

SphericalKMeans does not converge

Hi,
For some reason SphericalKMeans doesn't find any valid centroids on my data. The same happens with randomly generated data that clearly has clusters.

Running the following code always leads to the cluster centers [1. 1. 1.]:

from scipy.stats import vonmises
import numpy as np
from spherecluster import SphericalKMeans

ang = vonmises.rvs(25, loc=1, size=100)
ang = np.hstack((ang, vonmises.rvs(25, loc=3, size=100)))
ang = np.hstack((ang, vonmises.rvs(25, loc=5, size=100)))

skm = SphericalKMeans(n_clusters=3)
skm.fit(ang.reshape((-1, 1)))
print(skm.cluster_centers_)

Python 2.7.12, spherecluster 0.1.2

Am I missing something?
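
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the data is passed as a single column of angles. In one dimension the only unit-length values are +1 and -1, so if inputs and centers are l2-normalized, every positive angle maps to 1. The sketch below uses stand-in data (normal noise around the same three locations) and shows the collapse, together with an embedding of each angle as a 2-D point on the unit circle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the three angular clusters in the report (illustrative data).
ang = np.concatenate([rng.normal(loc, 0.2, size=100) for loc in (1.0, 3.0, 5.0)])

# A single column of positive values collapses to 1 after l2-normalization.
col = ang.reshape((-1, 1))
col_unit = col / np.linalg.norm(col, axis=1, keepdims=True)

# Embedding each angle as a point on the unit circle keeps the structure.
X = np.column_stack((np.cos(ang), np.sin(ang)))

print(np.unique(np.round(col_unit, 12)), X.shape)
```

With this embedding, X has one unit-norm row per angle and could be passed to SphericalKMeans(n_clusters=3).fit(X) in place of ang.reshape((-1, 1)).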

Mistake in sample_vMF

In the function _sample_weight, b is wrongly calculated as b = dim / (np.sqrt(4. * kappa ** 2 + dim ** 2) + 2 * kappa).
The reference material (eq 4 in Wood 1994) has it as b = (-2 * kappa + sqrt(4 * kappa ** 2 + dim ** 2)) / dim
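
Before changing the code, it may be worth checking the two expressions numerically. Writing s = sqrt(4 * kappa ** 2 + dim ** 2), we have (s - 2 * kappa) * (s + 2 * kappa) = dim ** 2, so (s - 2 * kappa) / dim and dim / (s + 2 * kappa) are algebraically the same quantity. The sketch below compares them on a small grid. The function names are illustrative, and whether dim means the same thing in the code and in the paper is not checked here.

```python
import numpy as np


def b_as_in_code(kappa, dim):
    # Form quoted from the code in this issue.
    return dim / (np.sqrt(4.0 * kappa ** 2 + dim ** 2) + 2.0 * kappa)


def b_as_in_reference(kappa, dim):
    # Form quoted from the reference in this issue.
    return (-2.0 * kappa + np.sqrt(4.0 * kappa ** 2 + dim ** 2)) / dim


kappas = np.array([0.1, 1.0, 10.0, 100.0])
dims = np.array([2.0, 3.0, 10.0, 50.0])
K, D = np.meshgrid(kappas, dims)

print(np.allclose(b_as_in_code(K, D), b_as_in_reference(K, D)))
```

The first form avoids subtracting two nearly equal numbers when kappa is large, which may be why it is written that way.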

ValueError: Data l2-norm must be 1, found 0.0

Thank you for your great source code.
While using the soft von_mises_fisher_mixture, I got this error. It only happens with 1000 short documents; it runs fine with a smaller number. The full error log is below.

Could you please show me how to fix it? Thank you so much.

  File "mvmf_document_clustering.py", line 65, in <module>
    vmf_soft.fit(X)
  File "C:\Users\phuocphan\miniconda3\envs\Py36\lib\site-packages\spherecluster\von_mises_fisher_mixture.py", line 826, in fit
    X = self._check_fit_data(X)
  File "C:\Users\phuocphan\miniconda3\envs\Py36\lib\site-packages\spherecluster\von_mises_fisher_mixture.py", line 789, in _check_fit_data
    raise ValueError("Data l2-norm must be 1, found {}".format(n))
ValueError: Data l2-norm must be 1, found 0.0
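
A norm of 0.0 suggests that at least one row of X is all zeros, for example a short document with no in-vocabulary terms after vectorization. That cause is an assumption, not something the log confirms. A minimal sketch for dense arrays, with illustrative data, that drops such rows and normalizes the rest before fitting:

```python
import numpy as np

# Illustrative document-term matrix; the second row is an empty document.
X = np.array([
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 3.0, 4.0],
])

norms = np.linalg.norm(X, axis=1)
keep = norms > 0  # rows with at least one non-zero feature

X_unit = X[keep] / norms[keep, None]
dropped = np.flatnonzero(~keep)

print(X_unit.shape, dropped)
```

The indices in dropped identify which documents were removed, so cluster labels can be mapped back to the original rows. A sparse matrix would need a different norm computation.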

Question about sample_vMF

Hi,
@jasonlaska Thanks for your code! I learned a lot from it!

Recently I ran into a problem: when I used 'VonMisesFisherMixture' to estimate the distribution of sequence data and then used 'sample_vMF' to produce some pseudo samples, I found that all pseudo samples have the same trend as the real ones, but their Y values are always smaller than those of the real samples.

Later, I created a list of 10 sequences, all with the values [0.95, 0.9, 0.85, 0.8, ..., 0.05, 0]. I expected that pseudo samples drawn from the vMF distribution would reproduce the same sequence. However, I got [3.82291703e-01, 3.62182865e-01, 3.42056000e-01, 3.21941382e-01, 3.01821386e-01, 2.81705993e-01, 2.61568550e-01, 2.41452144e-01, 2.21317166e-01, 2.01214795e-01, 1.81081612e-01, 1.60967944e-01, 1.40855783e-01, 1.20740353e-01, 1.00600080e-01, 8.04905383e-02, 6.03600387e-02, 4.02586200e-02, 2.01175857e-02, -8.79339154e-06], which is still a straight line, but each value is smaller (for example, 0.95 -> 0.382). Why is that? How can I solve this problem?

Thank you!
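
The von Mises-Fisher distribution is defined on the unit sphere, so its samples have l2-norm 1. The sketch below normalizes the sequence described above to unit length. The first entries come out at about 0.382 and 0.362, which is close to the reported samples and suggests the output is the input direction with its scale removed.

```python
import numpy as np

# The sequence described above: 0.95, 0.90, ..., 0.05, 0.0 (20 values).
seq = np.arange(19, -1, -1) * 0.05

norm = np.linalg.norm(seq)
unit = seq / norm

print(norm, unit[:3])

# If the original scale matters, it has to be stored and reapplied separately.
rescaled = unit * norm
```

Under this reading, recovering values on the original scale means keeping the norm of each input sequence (or modeling it separately) and multiplying the unit-norm samples by it.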

VMF scaling denominator was inf

Hi,
I tried to run the soft version and got the error 'VMF scaling denominator was inf'. What could be the reason for that?
Another question: is there any way to estimate the number of clusters?

Thanks
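
On the first question, one plausible cause, stated as an assumption about where the message comes from: the vMF normalizing constant involves the modified Bessel function of the first kind of order d/2 - 1 evaluated at kappa, and that function overflows double precision for large kappa. The sketch below shows the overflow and a log-domain alternative using SciPy's exponentially scaled Bessel function. The values of n_features and kappa are illustrative.

```python
import numpy as np
from scipy.special import iv, ive

n_features = 5
kappa = 1000.0
order = n_features / 2.0 - 1.0

# Direct evaluation overflows for large kappa.
direct = iv(order, kappa)

# ive(v, x) = iv(v, x) * exp(-x) for x > 0, so log iv = log ive + x.
log_bessel = np.log(ive(order, kappa)) + kappa

print(direct, log_bessel)
```

Working with the logarithm of the normalizer keeps the quantity finite. Whether that fits into the library's code path is not something this sketch establishes.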

Strict Matplotlib requirement prevents installation on python 3.6

Hello,

Thanks for making this package, it is really useful!

I am using Python 3.6 on OS X 10.12.6 and I tried to install the package through pip. The installation fails while trying to install the matplotlib version specified in the requirements.txt file. This is, I think, a known issue for matplotlib which was fixed in later versions (see issues like matplotlib/matplotlib#3889). I have successfully installed matplotlib version 2.0.2 in my environment.

I think that there are 3 ways of solving this issue.

  1. Bump the version of matplotlib this package depends on
  2. Remove the strict dependency (==) requirement
  3. Remove the matplotlib dependency altogether. From what I understand, matplotlib is only used in the examples and is thus not needed for the packaged version of the library (similar to seaborn and tabulate)

Spherical K-Means is producing different results each run even when fixing `random_state` at an integer

It seems to me that fixing the random_state parameter at an integer when calling SphericalKMeans constructor is still seeding a numpy.random.RandomState pseudo-random number generator and thus not making Spherical K-Means completely deterministic. The consequent call to _k_init provides it with a random_state of the type RandomState instance instead of int, which preserves the randomness as per the routine's documentation:

    random_state : int, RandomState instance
        The generator used to initialize the centers. Use an int to make the
        randomness deterministic.

As a result, I get slightly different results across runs.

I want the algorithm to be deterministic for the sake of research. Am I missing something in that regard?

P.S. I am initialising the algorithm with k-means++.
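
For reference, a numpy.random.RandomState seeded with a fixed integer is itself deterministic: two generators created from the same seed yield the same draws. What does change results is reusing a single generator instance across calls, because its state advances. A minimal sketch with illustrative variable names:

```python
import numpy as np

# Two generators seeded with the same integer produce identical draws.
a = np.random.RandomState(42).rand(3)
b = np.random.RandomState(42).rand(3)

# Reusing one generator instance advances its state between calls.
shared = np.random.RandomState(42)
first = shared.rand(3)
second = shared.rand(3)

print(np.array_equal(a, b), np.array_equal(first, second))
```

If runs still differ with a fixed integer seed, the difference would have to come from somewhere other than the seeding itself, for example the seed not reaching every call or parallel execution. The snippet does not investigate that.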

pip package not up to date?

Hi

First of all, thank you for sharing this package!

I'm installing spherecluster with pip and had to manually edit spherical_kmeans.py to fix the import of _k_means (I changed it to from sklearn.cluster import _k_means_fast as _k_means).

I can see this change has already been made in the repo.

Maybe the pip package isn't up to date?

Best
Mehdi

Add `normalize=True` parameter

Add normalize=True parameter that normalizes data (optional to user) to both classes so that check_estimator can be applied in tests.

installation: no module spherical kmeans

I'm using Anaconda and installing into an environment that has Python 3.4.
I tried installing via pip and via python setup.py install, both in this environment and in a global Python 2.7 environment, uninstalling everything and trying again each time. No luck. Everywhere I get a no-module "spherical_kmeans" message. What can be done here? Thanks a lot!

Using Spherical clustering for Mini-Batch K-Means

Thank you for sharing this great package.
I want to experiment with K-Means on a data set big enough to require a Mini-Batch version of K-Means.
Do you have any pointers on how to extend your implementation to Mini-Batch?

Error in _sample_orthonormal_to

In line 55 in spherecluster/util.py, in the _sample_orthonormal_to function, it reads:

proj_mu_v = mu * np.dot(mu, v) / np.linalg.norm(mu)

Shouldn't it instead be:

proj_mu_v = mu * np.dot(mu, v) / np.linalg.norm(mu)**2

If norm(mu)=1 then it doesn't make a difference, but otherwise they are quite different.
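
The difference is easy to check numerically: after subtracting a correct projection of v onto mu, the residual is orthogonal to mu. The sketch below uses an illustrative mu that is deliberately not unit length and compares both variants.

```python
import numpy as np

mu = np.array([3.0, 0.0, 4.0])  # deliberately not unit length (norm 5)
v = np.array([1.0, 2.0, 3.0])

proj_norm = mu * np.dot(mu, v) / np.linalg.norm(mu)
proj_norm_sq = mu * np.dot(mu, v) / np.linalg.norm(mu) ** 2

# A correct projection leaves a residual orthogonal to mu.
resid_norm = np.dot(v - proj_norm, mu)
resid_norm_sq = np.dot(v - proj_norm_sq, mu)

print(resid_norm, resid_norm_sq)
```

Only the variant that divides by the squared norm gives a residual with zero dot product against mu. With a unit-length mu, both variants coincide.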

black dependency breaks python3.5 install

black is only available for python3.6 and higher; its presence in requirements.txt makes installation of this package on python3.5 (under, for me, Ubuntu 16.04 LTS) fail.

Since black doesn't seem to be used by the package (just in the dev process?) maybe it can be removed from requirements.txt? Doing so makes local installation work fine for me.
