lmcinnes commented on April 28, 2024

I wanted to provide an update on this -- I know you have moved on to other issues, but just in case you run into another project where UMAP might be useful...

I believe this issue has finally been resolved. It proved remarkably tricky to get to the bottom of, but was, in the end, a code bug in the SGD optimization of the layout. The latest master branch (v0.2.0+) has a new SGD optimization layout algorithm that does not encounter this issue and should produce much better looking embeddings for large datasets.

from umap.

lmcinnes commented on April 28, 2024

Hi, thanks for the report. You are correct that this is likely a version issue, well spotted. I should fix that up (my fault for only looking at the latest docs and going with RecursionError). Of course the other issue is that you are getting such a thing at all, but then with 1.7 million data points that might just be possible if they don't split too nicely. Still, the expected depth is something like 21, so something is going seriously astray. If you can share the data I would be interested in trying to figure out exactly what is going wrong.
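For reference, the depth figure above is roughly what plain arithmetic gives: a perfectly balanced binary tree over n points needs about log2(n) levels to isolate every point. The sketch below only does that arithmetic; `expected_depth` and its `leaf_size` parameter are hypothetical names for illustration and are not part of umap.

```python
import math
import sys


def expected_depth(n_points, leaf_size=1):
    """Depth of a perfectly balanced binary tree whose leaves hold
    at most `leaf_size` points (hypothetical helper, not umap code)."""
    n_leaves = max(1, math.ceil(n_points / leaf_size))
    return math.ceil(math.log2(n_leaves))


print(expected_depth(1_700_000))   # 21
print(sys.getrecursionlimit())     # interpreter's current limit, 1000 by default
```

A balanced split is therefore nowhere near the recursion limit, so hitting it suggests very lopsided splits. On the version point: `RecursionError` was added in Python 3.5 as a subclass of `RuntimeError`, so catching `RuntimeError` covers both older and newer interpreters.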

nicolerg commented on April 28, 2024

Now that I've got it running in 3.5, I got a much more detailed error:

`Traceback (most recent call last):
File "umapping.py", line 25, in <module>
u = umap.UMAP(metric="correlation", n_neighbors=25).fit_transform(data)
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/umap/umap_.py", line 1573, in fit_transform
self.fit(X)
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/umap/umap_.py", line 1534, in fit
self.verbose
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/umap/umap_.py", line 553, in rptree_leaf_array
angular=angular)
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/umap/umap_.py", line 359, in make_tree
rng_state)
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/dispatcher.py", line 330, in _compile_for_args
raise e
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/dispatcher.py", line 307, in _compile_for_args
return self.compile(tuple(argtypes))
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/dispatcher.py", line 579, in compile
cres = self._compiler.compile(args, return_type)
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/dispatcher.py", line 80, in compile
flags=flags, locals=self.locals)
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/compiler.py", line 763, in compile_extra
return pipeline.compile_extra(func)
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/compiler.py", line 360, in compile_extra
return self._compile_bytecode()
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/compiler.py", line 722, in _compile_bytecode
return self._compile_core()
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/compiler.py", line 709, in _compile_core
res = pm.run(self.status)
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/compiler.py", line 246, in run
raise patched_exception
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/compiler.py", line 238, in run
stage()
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/compiler.py", line 452, in stage_nopython_frontend
self.locals)
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/compiler.py", line 865, in type_inference_stage
infer.propagate()
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/typeinfer.py", line 844, in propagate
raise errors[0]
numba.errors.TypingError: Failed at nopython (nopython frontend)
Internal error at <numba.typeinfer.ArgConstraint object at 0x7f5325955b00>:
--%<-----------------------------------------------------------------
Traceback (most recent call last):
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/errors.py", line 259, in new_error_context
yield
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/typeinfer.py", line 189, in __call__
assert ty.is_precise()
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/typeinfer.py", line 137, in propagate
constraint(typeinfer)
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/typeinfer.py", line 190, in __call__
typeinfer.add_type(self.dst, ty, loc=self.loc)
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/errors.py", line 265, in new_error_context
six.reraise(type(newerr), newerr, sys.exc_info()[2])
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/six.py", line 658, in reraise
raise value.with_traceback(tb)
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/errors.py", line 259, in new_error_context
yield
File "/users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/numba/typeinfer.py", line 189, in __call__
assert ty.is_precise()
numba.errors.InternalError:
[1] During: typing of argument at /users/nicolerg/anaconda2/envs/py35/lib/python3.5/site-packages/umap/umap_.py (248)
--%<-----------------------------------------------------------------

File "../../../../anaconda2/envs/py35/lib/python3.5/site-packages/umap/umap_.py", line 248

This error may have been caused by the following argument(s):

  • argument 0: cannot determine Numba type of <class 'pandas.core.frame.DataFrame'>`

I don't think I'm allowed to share the data, so I'll keep troubleshooting and let you know if I figure anything out. I'd appreciate any insight you have from this more detailed error.

lmcinnes commented on April 28, 2024

That's fine, data is often not shareable.

This new error is a pre-error that is again my fault. There is no input array checking in place yet, of the kind scikit-learn does, which means you can't pass in a pandas DataFrame (at least not yet). To get past this you should be able to do

u = umap.UMAP(metric="correlation", n_neighbors=25).fit_transform(data.values)

but that may land you back in recursion error land. You can probably just change that by hand yourself in the code if you need to -- I won't get a chance to test and commit a fix for a little while (a PR would be very welcome if you do get the fix sorted).

I'll also try and add in scikit-learn's checking to make this work better.
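As a stopgap until such checking exists, the conversion can be done on the caller's side. The sketch below is one way to coerce input into a plain 2-D float array before handing it to numba-compiled code; `as_plain_array` is a hypothetical helper, not a umap function, and it assumes all columns are numeric.

```python
import numpy as np


def as_plain_array(data):
    """Return a 2-D float64 ndarray (hypothetical helper, not umap API).

    np.asarray accepts nested lists, ndarrays and pandas DataFrames alike.
    """
    arr = np.asarray(data, dtype=np.float64)
    if arr.ndim != 2:
        raise ValueError("expected 2-D input, got %d-D" % arr.ndim)
    if not np.all(np.isfinite(arr)):
        raise ValueError("input contains NaN or infinity")
    return arr


rows = [[1, 2, 3], [4, 5, 6]]   # stand-in for a DataFrame
X = as_plain_array(rows)
print(type(X).__name__, X.dtype, X.shape)   # ndarray float64 (2, 3)
```

scikit-learn's `sklearn.utils.check_array` performs a more thorough version of the same validation.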

lmcinnes commented on April 28, 2024

After looking at options I went with the simplest possible thing I could do. Unfortunately I have had issues replicating the RecursionError so I admit my testing that this will actually fix the issue is quite inadequate. I would greatly appreciate it if, when you have available time, you could pull from master, rebuild and reinstall and see if this fixes the problem for you. Thanks again.

nicolerg commented on April 28, 2024

The recursion error went away as soon as I changed the input from a data frame to an array (I didn't pull from master to get this result). The job does finish now, granted it takes some time, but the first result I got is really strange. This is with metric="correlation" and n_neighbors=25.
[Image: roadcode-all-umap]

lmcinnes commented on April 28, 2024

The correlation metric actually has some issues (which I only very recently debugged). Try cosine instead and see if that helps. It certainly is an odd result, but may look stranger due to overplotting. Consider setting the s value on the scatterplot to 1, or something similar.
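If correlation semantics are still wanted while avoiding the correlation metric, one option is to lean on a standard identity: correlation distance equals cosine distance computed on mean-centered vectors. The sketch below checks that identity with NumPy only; the helper names are hypothetical and nothing here calls umap.

```python
import numpy as np


def cosine_distance(x, y):
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))


def correlation_distance(x, y):
    return 1.0 - np.corrcoef(x, y)[0, 1]


def center_rows(X):
    """Subtract each row's own mean (hypothetical preprocessing helper)."""
    return X - X.mean(axis=1, keepdims=True)


rng = np.random.default_rng(0)
X = rng.normal(size=(2, 50))
Xc = center_rows(X)

# Cosine distance on centered rows matches correlation distance on raw rows.
print(np.isclose(cosine_distance(Xc[0], Xc[1]),
                 correlation_distance(X[0], X[1])))   # True
```

Under that identity, passing `center_rows(data)` with `metric="cosine"` should measure the same distances that `metric="correlation"` is meant to.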

KeithTheEE commented on April 28, 2024

I might know something about that weird result. I haven't checked in with the latest version of the code for a while so I might be out of date here, but here goes.

Briefly: Those results pop up when the learning rate is 'bad' (there could be other causes, this is the cause I know about). Playing with the inputs a and b could help that.

In more depth: the program determines a learning rate (or embedding adjustment rate) based on a few values. The user has control over those values to various degrees (layers of abstraction). Using the defaults, the program determines this from the spread and min_dist values, whose defaults are 1.0 and 0.1 respectively. From there it passes those values to a function which solves for a and b using a curve fit on an exponential decay. Using the default spread and min_dist, the fitted a and b values are about 1.577 and 0.895 respectively. After that those values, combined with the alpha and gamma parameters (defaults being 1.0 and 1.0) and the embedding data, are used in optimize_layout to determine grad_coeff and grad_d, which affect the embedding adjustments. If you need to use correlation, playing around with those values might help resolve a signal from the noise.
(Also, please correct me if I'm misunderstanding the implications of the code.)
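To make the curve-fit step concrete, here is a minimal sketch of how a and b could be obtained from spread and min_dist. It assumes the target curve is 1 up to min_dist and decays exponentially with scale spread beyond it, and that the fitted family is 1 / (1 + a * x^(2b)); both the exact form and the name `fit_ab` are assumptions made for illustration, so check the umap source for what it actually does.

```python
import numpy as np
from scipy.optimize import curve_fit


def fit_ab(spread=1.0, min_dist=0.1):
    """Fit a, b so that 1 / (1 + a * x**(2b)) tracks the target curve.

    Hypothetical helper; the target and grid are assumptions.
    """
    def curve(x, a, b):
        return 1.0 / (1.0 + a * x ** (2 * b))

    x = np.linspace(0, spread * 3, 300)
    # Flat at 1 below min_dist, exponential decay beyond it.
    target = np.where(x < min_dist, 1.0, np.exp(-(x - min_dist) / spread))
    (a, b), _ = curve_fit(curve, x, target)
    return float(a), float(b)


a, b = fit_ab(spread=1.0, min_dist=0.1)
print(round(a, 3), round(b, 3))   # close to the 1.577 and 0.895 quoted above
```

Increasing min_dist or spread flattens the target curve, so the fitted a and b change accordingly.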

nicolerg commented on April 28, 2024

[Image: roadcode-all-umap-10n-cosine]

Above is the result using metric "cosine". It looks similar to "correlation". (Groups are NOT labeled - color is random.)

I ran t-SNE with a subset of 100k data points, and this is what it looks like (groups ARE labelled; color is not random).

[Image: 100k-tsne-single]

Perhaps UMAP would look similar if I only chose a subset of 100k data points? This update is for your benefit. I've moved on from this point.
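For anyone wanting to try that comparison, a like-for-like test would embed the same random subset with both methods. A minimal sketch of drawing such a subset with NumPy follows; `subsample_rows` is a hypothetical helper, and the returned indices let the group labels be subset the same way.

```python
import numpy as np


def subsample_rows(X, n_samples, seed=0):
    """Pick n_samples distinct rows at random (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_samples = min(n_samples, X.shape[0])
    idx = rng.choice(X.shape[0], size=n_samples, replace=False)
    idx.sort()   # keep the original row order
    return X[idx], idx


X = np.arange(40, dtype=float).reshape(20, 2)   # toy stand-in for the data
subset, idx = subsample_rows(X, 5)
print(subset.shape)   # (5, 2)
```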

lmcinnes commented on April 28, 2024

Thanks for the update, it does look like something is going astray. It may be related to issue #32. It may also relate to hubness issues. I'm really not sure, but it is something I will continue to look into. I greatly appreciate the feedback, and am sorry that UMAP didn't seem to work for you. Hopefully I'll have some fixes in the future, so please don't write off the algorithm entirely just yet.

sleighsoft commented on April 28, 2024

Looks resolved. Closing.