Comments (11)

lmcinnes commented on April 28, 2024

That would definitely be a bug. I believe I had the seed working such that it eliminated randomness: the latest version of UMAP has a random_state parameter you can set on initialisation. You can set it with a number or a numpy RandomState; either way, you should be able to fix it to reproduce results.

Just setting numpy's random seed is not going to be enough because of interactions with numba and the fact that UMAP uses its own internal PRNG for speed. Can you clarify under what conditions you aren't getting repeatability?
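As a rough illustration of why a global seed does not reach a generator that a library keeps for itself, here is a sketch using only Python's standard random module. It is an analogy, not UMAP's actual code: noisy_step and _private_rng are hypothetical stand-ins for a routine that carries its own PRNG state.

```python
import random

# Hypothetical stand-in for a library routine that owns its PRNG.
# It never consults the module-level generator that random.seed() controls.
_private_rng = random.Random()  # seeded from OS entropy


def noisy_step(rng=None):
    """Return one random draw, from `rng` if given, else the private generator."""
    source = rng if rng is not None else _private_rng
    return source.random()


def draws_with_global_seed(seed, n=5):
    """Seed the global generator, then draw from the private one."""
    random.seed(seed)
    return [noisy_step() for _ in range(n)]


def draws_with_explicit_state(seed, n=5):
    """Pass an explicitly seeded generator all the way down."""
    rng = random.Random(seed)
    return [noisy_step(rng) for _ in range(n)]
```

Calling draws_with_global_seed(42) twice gives different lists, while draws_with_explicit_state(42) repeats exactly, which is the reason for passing a seed through a dedicated parameter such as random_state.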

from umap.

allenqm commented on April 28, 2024

Hi, thanks for responding. In this case I'm using the random_state parameter and setting it to 42:

import umap

# wordvectors is an (n_samples, n_dims) numpy array of word vectors
embedding = umap.UMAP(n_neighbors=15,
                      min_dist=0.1,
                      n_components=2,
                      random_state=42,
                      metric='correlation', verbose=3).fit_transform(wordvectors)

I then plot the embedding like so:
pandas.DataFrame(embedding).plot(kind='scatter', x=0, y=1, alpha=0.05)

The graph I get is different each time. I tried switching the axes, but that doesn't explain the differences.

FYI, I just pip installed umap.

Any thoughts would be helpful. Thanks!

lmcinnes commented on April 28, 2024

Okay, that's definitely disconcerting, because I worked through getting random_state to work properly (which turned out to be frustratingly non-trivial), and for at least the test dataset I was working with it produced perfectly consistent results when fixed. I'll try a few other datasets to verify that it is indeed working for me at least, and then perhaps we can start trying to track down why it isn't working for you. Which Python version are you using? That's potentially one reason for issues...
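A check like this can be automated instead of comparing plots by eye. The helper below is only a sketch: is_reproducible is a hypothetical name, and it assumes the embedding step can be wrapped in a single callable.

```python
import operator


def is_reproducible(fit, data, runs=3, equal=operator.eq):
    """Call fit(data) `runs` times and report whether every result matches the first.

    `fit` is any callable that returns an embedding.
    `equal` compares two results and must return a single bool.
    """
    if runs < 2:
        raise ValueError("need at least two runs to compare")
    first = fit(data)
    return all(equal(first, fit(data)) for _ in range(runs - 1))
```

With UMAP, one could pass lambda X: umap.UMAP(random_state=42).fit_transform(X) as fit and numpy.array_equal as equal, since == on numpy arrays compares elementwise rather than returning one bool.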

vseledkin commented on April 28, 2024

The nondeterminism probably comes from an unstable result of the metric_nn_descent function. I observe that some rows of the returned knn_indices and knn_dists are not sorted by knn distance (this may be a serious bug, not sure).
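One way to confirm that observation is to list the rows that are out of order. This is a minimal sketch, assuming knn_dists is a 2D array-like with one row of neighbour distances per sample. The helper name unsorted_rows is hypothetical and not part of umap.

```python
def unsorted_rows(knn_dists):
    """Return indices of rows whose distances are not in non-decreasing order."""
    bad = []
    for i, row in enumerate(knn_dists):
        row = list(row)
        # A row is out of order if any distance exceeds the one after it.
        if any(a > b for a, b in zip(row, row[1:])):
            bad.append(i)
    return bad
```

It accepts nested lists as well as numpy arrays, because it only iterates over rows. An empty result means every row is sorted by distance.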

lmcinnes commented on April 28, 2024

That is a bug that was caught and should be fixed in more recent versions. It should either be in the current master or will appear in version 0.3.

vmarkovtsev commented on April 28, 2024

This is still happening for me on 0.3.8. However, @warenlg found that fixing the numpy seed makes it fully deterministic: numpy.random.seed(42)

lmcinnes commented on April 28, 2024

So this is somewhat disconcerting, and is definitely on my list of things to fix. Honestly, I am not quite sure where or how this is happening.

ericloud commented on April 28, 2024

This is still happening for me on 0.3.10.
I tried several ways to fix the numpy seed, as proposed by @vmarkovtsev, but it doesn't work for me.
Is it possible to have more details? An example would be great.

sleighsoft commented on April 28, 2024

@ericloud Can you provide example code showing what you did, so that others can reproduce the issue?

ericloud commented on April 28, 2024

Eureka!
random_state works perfectly fine on my side.
The problem was in the input matrix, where the features were randomly ordered.

# This transformation returns unique ids, but not in a deterministic order
ids_selected = list(set(ids_selected))

mat = mat.loc[ids_selected]

# One way to fix it: sort the index so the row order is deterministic
mat.sort_index(inplace=True)

embedding = umap.UMAP(
    n_neighbors=10,
    random_state=42
).fit_transform(mat.T)

Thanks.

sleighsoft commented on April 28, 2024

Glad you resolved it. For details on sets, see the Python data structures tutorial: https://docs.python.org/3/tutorial/datastructures.html#sets
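For anyone hitting the same thing: a set has no guaranteed iteration order, and for strings the order can change between interpreter runs because of hash randomization. Below is a small sketch of two deterministic ways to deduplicate ids. The function names are hypothetical, and the first one assumes the ids can be compared with each other.

```python
def unique_sorted(ids):
    """Deduplicate and return the ids in sorted order."""
    return sorted(set(ids))


def unique_first_seen(ids):
    """Deduplicate while keeping the order of first appearance.

    Relies on dicts preserving insertion order (guaranteed since Python 3.7).
    """
    return list(dict.fromkeys(ids))
```

Either result can be passed to mat.loc[...] in place of list(set(ids_selected)), so the rows reach UMAP in the same order on every run.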
