
Comments (16)

lmcinnes avatar lmcinnes commented on April 28, 2024 2

I think the problem is that sigmas[j] there should be sigmas[k] instead, but I'll have to trace through the whole thing to be sure.

from umap.

lmcinnes avatar lmcinnes commented on April 28, 2024

Thanks for the detailed bug report! This may be the same as issue #33. I believe the latest master version should have fixed this, but I haven't put out a release on pip yet (waiting to roll a few more bug fixes and features together). If you have time to clone the master branch from GitHub, install it (after removing the old pip version), and try that instead, I would appreciate it. I believe it should fix the issue you are seeing, and if it doesn't then I clearly have a little more work to do.


duhaime avatar duhaime commented on April 28, 2024

@lmcinnes thanks for your quick response. It seems the problem has been resolved on master. I've run several dozen runs since reinstalling and haven't hit the division by zero error, so I think it should be safe to close this issue. Thanks again for this great work!


lmcinnes avatar lmcinnes commented on April 28, 2024

Thanks for checking that out so fast.


100518832 avatar 100518832 commented on April 28, 2024

I am running into zero division errors every couple of runs. I am using CentOS and Spyder (Python 3.6).

    runfile('/home/anaconda3/envs/ML/scripts/beta_projects/umap_test.py', wdir='/home/anaconda3/envs/ML/scripts/beta_projects')

  File "/home/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile
    execfile(filename, namespace)

  File "/home/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/home/anaconda3/envs/ML/scripts/beta_projects/umap_test.py", line 83, in <module>
    embedding = umap.UMAP(n_neighbors = ideal_model['n_neighbors'], min_dist = ideal_model['min_dist'], metric = ideal_model['metric_iso'], n_components = 3).fit_transform(input_data)

  File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 1402, in fit_transform
    self.fit(X)

  File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 1361, in fit
    self.verbose

  File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 385, in rptree_leaf_array
    angular=angular)

  File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 310, in make_tree
    angular)

  File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 315, in make_tree
    angular)

  File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 310, in make_tree
    angular)

  File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 310, in make_tree
    angular)

  File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 310, in make_tree
    angular)

  File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 310, in make_tree
    angular)

  File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 315, in make_tree
    angular)

  File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 301, in make_tree
    rng_state)

ZeroDivisionError: division by zero


duhaime avatar duhaime commented on April 28, 2024

@100518832 did you git clone from master and then run python setup.py install inside the directory to install the library? I believe the latest release on PyPI (0.1.5) still has this problem, so if you pip installed you may be affected, but running the setup.py install should clear this up...


100518832 avatar 100518832 commented on April 28, 2024

@duhaime, I do not recall running python setup.py install; however, I will try this. Also, I have noticed that I only get a zero division error with the following three parameter combinations (n_neighbors, min_dist, metric):

Embedding Failed - info: 15, 0.01, correlation
Embedding Failed - info: 15, 0.05, correlation
Embedding Failed - info: 20, 0.01, correlation

Here are all the parameter values I am currently testing:

n_neighbors = [15,20,40]
min_dist = [0.01,0.05]
metric_iso = ['correlation','euclidean','manhattan']
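A sweep like this can be wrapped so that every failing combination is recorded instead of stopping at the first error. The following is only a sketch: `find_failing_combinations` and `fake_embed` are hypothetical names, and `fake_embed` is a stand-in rigged to fail on the three combinations reported above. The real call would be the `umap.UMAP(...).fit_transform(input_data)` line from the traceback.

```python
from itertools import product


def find_failing_combinations(embed, n_neighbors, min_dist, metrics):
    """Call embed(n, d, m) for every combination and collect the ones
    that raise ZeroDivisionError."""
    failures = []
    for n, d, m in product(n_neighbors, min_dist, metrics):
        try:
            embed(n, d, m)
        except ZeroDivisionError:
            failures.append((n, d, m))
    return failures


def fake_embed(n, d, m):
    # Hypothetical stand-in for the real embedding call. It is rigged to
    # fail on the three combinations reported in this comment.
    if m == "correlation" and (n, d) in {(15, 0.01), (15, 0.05), (20, 0.01)}:
        raise ZeroDivisionError("division by zero")


failures = find_failing_combinations(
    fake_embed,
    n_neighbors=[15, 20, 40],
    min_dist=[0.01, 0.05],
    metrics=["correlation", "euclidean", "manhattan"],
)
for n, d, m in failures:
    print(f"Embedding Failed - info: {n}, {d}, {m}")
```

Passing the embedding call in as a function keeps the loop independent of the library, so the same sweep can be rerun unchanged after installing from master.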


duhaime avatar duhaime commented on April 28, 2024

@100518832 I'd try installing from the repo's setup.py file, as the fixes for the div by zero problem are on master but not PyPI...


paxtonfitzpatrick avatar paxtonfitzpatrick commented on April 28, 2024

I'm now getting this error with the inverse_transform method. I'm using the code in the repo's master branch, at the 0.4rc1 tag (fc59aa7)

Code:

import numpy as np
from umap import UMAP

to_reduce = np.vstack(data)        # shape: (1137, 25); dtype: 'float64'

np.random.seed(0)
reducer = UMAP(random_state=0).fit(to_reduce)
embeddings = reducer.transform(to_reduce)

# create a 2D grid over the embedding space
resolution = 50
x_min, y_min = embeddings.min(axis=0) // 1
x_max, y_max = embeddings.max(axis=0) // 1 + 1
x_step = (x_max - x_min) / resolution
y_step = (y_max - y_min) / resolution
xs = np.arange(x_min, x_max, x_step)
ys = np.arange(y_min, y_max, y_step)

X, Y = np.meshgrid(xs, ys)
xy_grid = np.empty((resolution, resolution, 2), dtype=np.float64)
for (x_ix, y_ix), X_val in np.ndenumerate(X):
    xy_grid[x_ix, y_ix] = (X_val, Y[x_ix, y_ix])

# recover vector in original space for each gridpoint
vertices = xy_grid.reshape(resolution**2, 2)
np.random.seed(0)
high_dim_vertices = reducer.inverse_transform(vertices)
high_dim_grid = high_dim_vertices.reshape(resolution, resolution, 25)

Traceback:

ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-127-93accc0c2718> in <module>
      1 # recover vector in original space for each gridpoint
      2 np.random.seed(0)
----> 3 high_dim_vertices = reducer.inverse_transform(vertices)
      4 high_dim_grid = high_dim_vertices.reshape(resolution, resolution, 25)

/opt/conda/lib/python3.7/site-packages/umap/umap_.py in inverse_transform(self, X)
   2203             _input_distance_func,
   2204             tuple(self._metric_kwds.values()),
-> 2205             verbose=self.verbose,
   2206         )
   2207 

ZeroDivisionError: division by zero

At first I thought it was related to #33 because I can get the error to go away by setting min_dist sufficiently high (~0.5), which makes the 2D grid points spread out enough that they wouldn't get rounded off to the same value as float32s. But now I'm not so sure, because I still get the error if I do
vertices += np.random.uniform(-10, 10, vertices.shape)
before inverse_transforming them. Any help would be greatly appreciated!!
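To make the float32 hypothesis concrete (the commenter goes on to doubt it is the actual cause), here is a small standard-library sketch of two distinct float64 values collapsing to the same float32. The helper name `to_float32` and the sample values are illustrative assumptions, not taken from UMAP.

```python
import struct


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Two coordinates that differ by 1e-8 as float64 values.
a, b = 1.00000001, 1.00000002

print(a == b)                          # False: distinct as float64
print(to_float32(a) == to_float32(b))  # True: both round to 1.0 as float32
```

Near 1.0, adjacent float32 values are about 1.2e-7 apart, so any two points closer than that can become identical after conversion.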

Additional info:

numpy==1.16.5
sklearn==0.21.3
scipy==1.2.1
numba==0.45.1
tbb==2020.0.133


lmcinnes avatar lmcinnes commented on April 28, 2024

This is somewhat disconcerting, and it is non-obvious to me where exactly this could be occurring. One thing you could try, presuming the compute isn't too expensive, is turning off the numba compilation of the inverse transform related functions and running again -- that way we can get a stack trace to the exact location of the error.
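One way to do this, sketched under the assumption that disabling compilation globally is acceptable, is numba's `NUMBA_DISABLE_JIT` environment variable. It has to be set before numba (and therefore umap) is first imported, and everything will run as plain Python, so expect it to be much slower.

```python
import os

# Must be set before numba is imported anywhere in the process;
# jitted functions then run as ordinary Python and give full tracebacks.
os.environ["NUMBA_DISABLE_JIT"] = "1"

# Import umap only after the variable is set, for example:
# from umap import UMAP
```

This disables compilation for all jitted functions rather than only the inverse transform ones, which is a coarser switch than the one suggested above but needs no changes to the library source.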


paxtonfitzpatrick avatar paxtonfitzpatrick commented on April 28, 2024

So the actual error is coming from here, in optimize_layout_inverse. At least for me, it's happening when more data points are inverse transformed than were used to fit the model. To compute grad_coeff, the jth item from sigmas is used. But sigmas is the distance to the kth nearest neighbor for each data point used to fit the model, so it has self._raw_data.shape[0] entries. Meanwhile, j indexes rows of the adjacency matrix for the 1-skeleton of the inverse-transformed data, so it can exceed that size.

Traceback is kinda weird-looking since I ran it through the PyCharm debugger, but here it is:

Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 2060, in <module>
    main()
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 2054, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1405, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1412, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/paxtonfitzpatrick/Documents/Dartmouth/CDL/umap/debugging.py", line 79, in <module>
    topic_space_grid = reducer.inverse_transform(vertices)
  File "/Users/paxtonfitzpatrick/Documents/Dartmouth/CDL/umap/umap/umap_.py", line 2270, in inverse_transform
    verbose=self.verbose,
  File "/Users/paxtonfitzpatrick/Documents/Dartmouth/CDL/umap/umap/layouts.py", line 510, in optimize_layout_inverse
    grad_coeff = -(1 / (w_l * sigmas[j] + 1e-6))
IndexError: index 1172 is out of bounds for axis 0 with size 1137
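The mismatch can be reproduced in a toy setting. This is a hedged illustration, not UMAP's code: only the `grad_coeff` expression is copied from the traceback line above, while the array sizes, values, and the helper signature are made up for the example.

```python
def grad_coeff(w_l, sigmas, idx):
    # Same expression as in the traceback, with the index made a parameter.
    return -(1 / (w_l * sigmas[idx] + 1e-6))


# One sigma per point used to fit the model (3 training points here).
sigmas = [0.5, 0.8, 1.2]

j = 4  # index of an inverse-transformed point; 5 new points exceed 3 training points
k = 1  # index of a training point, always within range of sigmas

try:
    grad_coeff(1.0, sigmas, j)
except IndexError as err:
    print("sigmas[j] fails:", err)

print("sigmas[k] works:", grad_coeff(1.0, sigmas, k))
```

As long as no more points are inverse transformed than were used for fitting, `j` happens to stay in range, which is consistent with the error only appearing for larger grids.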


paxtonfitzpatrick avatar paxtonfitzpatrick commented on April 28, 2024

That did it! And that makes sense based on how grad_coeff is computed just below that... Thanks so much!


lmcinnes avatar lmcinnes commented on April 28, 2024

Can I assume that this will arrive with your upcoming PR, or should I fix it and push it myself?


paxtonfitzpatrick avatar paxtonfitzpatrick commented on April 28, 2024

I changed it on my fork, so it'll get included in my PR. I'm ready to submit that today btw -- just want to confirm about what I commented on #367


lmcinnes avatar lmcinnes commented on April 28, 2024

Excellent -- I'm looking forward to it. Thanks for all your hard work on this!


o1lo01ol1o avatar o1lo01ol1o commented on April 28, 2024

Any update on this? I have a dataset that seems to be throwing it when the size of the data is large.

