
Comments (13)

cglopezespina avatar cglopezespina commented on April 28, 2024

Is trustworthiness a good method for selecting UMAP parameters? It is mentioned above, but I have not seen it in any other resource.

from umap.

lmcinnes avatar lmcinnes commented on April 28, 2024


DataWaveAnalytics avatar DataWaveAnalytics commented on April 28, 2024

Thanks for the hints, Leland. I will try to implement my version of the metrics for large datasets or, at least, a methodology to do it with what is available.

BTW, there is a new MNIST-like dataset called "Fashion-MNIST" (released Aug. 2017), if you want to test on it. They argue we should move away from MNIST for testing new algorithms (see the link for details).


lmcinnes avatar lmcinnes commented on April 28, 2024


DataWaveAnalytics avatar DataWaveAnalytics commented on April 28, 2024

I got these visualizations of Fashion-MNIST with t-SNE and LargeVis (train + test = 70000 points, using labels only for the colors). LargeVis looks better visually, but when I evaluate both with a kNN classifier (10 runs for each training percentage, taking the mean), t-SNE is the better embedding for this task. (Should I trust my eyes or the numbers?)

Do you have a visualization using umap that you can share?

fashion_mnist_tsne
fashion_mnist_largevis
k50_fashion_mnist
k100_fashion_mnist
k500_fashion_mnist
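
The kNN evaluation described above (several random splits per training percentage, then the mean accuracy) could be sketched roughly as follows. This is only an illustration, not the code used for the plots: the helper name `knn_embedding_score`, the scikit-learn digits data, and the 2-D PCA embedding are all stand-in assumptions, and any precomputed t-SNE, LargeVis, or UMAP embedding could be passed in instead.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_embedding_score(embedding, labels, train_fraction, n_runs=10, n_neighbors=5):
    """Mean kNN test accuracy over n_runs random splits of a fixed embedding.

    Hypothetical helper for illustration; not part of umap or scikit-learn.
    """
    scores = []
    for seed in range(n_runs):
        # A different random split per run; labels are used only for scoring.
        emb_train, emb_test, y_train, y_test = train_test_split(
            embedding, labels, train_size=train_fraction,
            random_state=seed, stratify=labels,
        )
        clf = KNeighborsClassifier(n_neighbors=n_neighbors)
        clf.fit(emb_train, y_train)
        scores.append(clf.score(emb_test, y_test))
    return float(np.mean(scores))


# Stand-in data and embedding (assumptions): sklearn digits and a 2-D PCA.
X, y = load_digits(return_X_y=True)
embedding = PCA(n_components=2, random_state=0).fit_transform(X)

# Mean accuracy for a few training fractions.
results = {frac: knn_embedding_score(embedding, y, frac) for frac in (0.1, 0.3, 0.5)}
for frac, acc in results.items():
    print(f"train fraction {frac:.1f}: mean accuracy {acc:.3f}")
```

Running the same function on each embedding of the same data gives curves that can be compared directly.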


lmcinnes avatar lmcinnes commented on April 28, 2024

I can dig up a visualization, I think. It is closer to LargeVis in appearance, but still a little different. As to what to trust: I think you have to trust both to some extent. t-SNE does do something right, and those curves do matter, so despite the clearly better appearance of LargeVis there seems to be something deceptive going on underneath it all.


lmcinnes avatar lmcinnes commented on April 28, 2024

Here's what UMAP did:

image

As I said, more similar to LargeVis. It is worth noting that UMAP has kept some of the groups together where LargeVis split them into multiple blobs (the royal blue category in your LargeVis plot, equivalent to the pale purple in the UMAP plot, for example). I wonder if that may affect the kNN-classifier accuracy?

I also find the banding of three classes that all three algorithms reproduced quite interesting; the fact that the three algorithms all generated it gives me confidence that it probably isn't an artifact of the reduction but actually a property of the data, but if so ... that's quite intriguing.


lmcinnes avatar lmcinnes commented on April 28, 2024

I tweaked the min_dist parameter (which defines how closely the embedding should pack points together in the embedded space) to compress things less (and hence resemble the t-SNE result more) and got this:

image
Still very similar (up to rotation) but less aggressive in separating clusters and showing a little more of the interconnected structure. I believe this would almost certainly embed a whole lot better in 3 or 4 dimensions.


DataWaveAnalytics avatar DataWaveAnalytics commented on April 28, 2024

Thank you for sharing. UMAP is doing great (visually). I definitely need to study the details of your implementation. Are you planning to submit a preprint soon? (I'm just trying to decide whether to wait for your document or jump straight to the implementation.)

I believe a less aggressive separation would lead to better k-NN classifier performance, but we should evaluate with trustworthiness and continuity anyway (or with others, like the scale-independent criteria).
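
For reference, trustworthiness and continuity are usually written as below (the definitions commonly attributed to Venna and Kaski). The notation is an assumption chosen for this note: n is the number of points, k the neighborhood size (with k < n/2), U_k(i) the points that are among the k nearest neighbors of i in the embedding but not in the original space, V_k(i) the points that are among the k nearest neighbors of i in the original space but not in the embedding, r(i, j) the rank of j by distance from i in the original space, and \hat{r}(i, j) the same rank in the embedding.

```latex
T(k) = 1 - \frac{2}{n k (2n - 3k - 1)} \sum_{i=1}^{n} \sum_{j \in U_k(i)} \bigl( r(i, j) - k \bigr)

C(k) = 1 - \frac{2}{n k (2n - 3k - 1)} \sum_{i=1}^{n} \sum_{j \in V_k(i)} \bigl( \hat{r}(i, j) - k \bigr)
```

Both scores lie in [0, 1]. Trustworthiness drops when the embedding introduces neighbors that are far apart in the original data, and continuity drops when original neighbors are pushed apart in the embedding.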


lmcinnes avatar lmcinnes commented on April 28, 2024

I'm struggling to find time to shore up all the math and get the preprint done (because I really want sound explanations of why things work, which means getting good explanations well hammered out). It will be a little while yet, unfortunately. The code may be a little hard to follow, but check the numba branch, as that has code that is perhaps easier to wrap one's head around. The preprint will probably help rather a lot though. Thanks for the extra reminder that I really need to get to work on getting that done.


lionely avatar lionely commented on April 28, 2024

Is it too much to ask for a code example displaying an implementation of "trustworthiness" and "continuity"? I'm trying to evaluate the quality of dimensionality reductions acquired from t-SNE.

Any help would be greatly appreciated!


lmcinnes avatar lmcinnes commented on April 28, 2024

There's something in https://github.com/lmcinnes/umap/blob/master/umap/validation.py and you can see https://github.com/scikit-learn/scikit-learn/blob/ccd3331f7eb3468ac96222dc5350e58c58ccba20/sklearn/manifold/t_sne.py#L394 for a (semi-canonical) implementation.


hoangthienan95 avatar hoangthienan95 commented on April 28, 2024

Are "Trustworthiness" and "continuity" still the two best measures for evaluating the embedding? In validation.py, I see the parameter max_k in the function trustworthiness_vector. How do I choose this parameter?
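
One way to explore the choice of neighborhood size, sketched here under assumptions, is to compute trustworthiness over a range of k values and inspect the whole curve rather than a single number. The sketch uses `sklearn.manifold.trustworthiness` in a loop and deliberately does not call `trustworthiness_vector` from validation.py, whose exact signature is not assumed here. The data, the PCA embedding, and the list of k values are all placeholders.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

# Stand-in data and embedding (assumptions): digits subset and 2-D PCA.
X, _ = load_digits(return_X_y=True)
X = X[:500]
X_embedded = PCA(n_components=2, random_state=0).fit_transform(X)

# Placeholder k values; each must stay below n_samples / 2.
k_values = [5, 10, 20, 50, 100]
trust_by_k = {
    k: trustworthiness(X, X_embedded, n_neighbors=k) for k in k_values
}

for k, score in trust_by_k.items():
    print(f"k={k:3d}: trustworthiness={score:.3f}")
```

Small k values probe how well very local neighborhoods are preserved, while larger ones look at progressively broader structure, so the upper limit mostly depends on the scale of structure that matters for the task at hand.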

Also, somewhat related: you said there is some guidance on how many n_components we should choose. Any update on that? Without the metric above, I also don't know how to optimize n_components and other parameters. TIA!
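
In the absence of specific guidance, one possible approach is to sweep the embedding dimension and score each result with trustworthiness. The sketch below is an assumption-laden illustration, not a recommendation from the thread: PCA stands in for the reducer so that the example needs only scikit-learn, and the candidate dimensions and the neighborhood size are arbitrary.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

# Stand-in data (assumption): a 500-point subset of the sklearn digits.
X, _ = load_digits(return_X_y=True)
X = X[:500]

# Arbitrary candidate dimensions; PCA is a placeholder for the real reducer.
candidate_dims = [2, 3, 5, 10]
trust_by_dim = {}
for n_components in candidate_dims:
    reducer = PCA(n_components=n_components, random_state=0)
    X_embedded = reducer.fit_transform(X)
    trust_by_dim[n_components] = trustworthiness(X, X_embedded, n_neighbors=10)

for n_components, score in trust_by_dim.items():
    print(f"n_components={n_components:2d}: trustworthiness={score:.3f}")
```

The same loop could wrap any reducer with a `fit_transform` method and any other parameter of interest; a point where the score stops improving noticeably is one plausible place to stop adding dimensions.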

