Comments (11)
That would definitely be a bug. I believe I had the seed working such that it eliminated randomness; that is, the latest version of UMAP has a random_state parameter you can set on initialisation. You can set it with a number or a numpy RandomState; either way, you should be able to fix it to reproduce results.
Just setting numpy's random seed is not going to be enough because of interactions with numba and the fact that UMAP uses its own internal PRNG for speed. Can you clarify under what conditions you aren't getting repeatability?
from umap.
Hi, thanks for responding. In this case I'm using the random_state parameter and setting it to 42:
    import umap

    # wordvectors is an n_samples x n_dims numpy array of word vectors
    embedding = umap.UMAP(
        n_neighbors=15,
        min_dist=0.1,
        n_components=2,
        random_state=42,
        metric='correlation',
        verbose=3,
    ).fit_transform(wordvectors)
I then plot the embedding like so:
    pandas.DataFrame(embedding).plot(kind='scatter', x=0, y=1, alpha=0.05)
The graph I get is different each time. I tried switching the axes, but that doesn't explain the differences.
FYI, I just pip installed umap.
Any thoughts would be helpful. Thanks!
from umap.
Okay, that's definitely disconcerting, because I worked through getting the random_state to work properly (which turned out to be frustratingly non-trivial), and for at least the test dataset I was working with it produced perfectly consistent results when fixed. I'll try a few other datasets to verify that it is indeed working for me at least, and then perhaps we can start trying to track down why it isn't working for you. Which Python version are you using? That's potentially one reason for issues.
from umap.
The nondeterminism probably comes from an unstable result of the metric_nn_descent function. I observe that some rows of the returned knn_indices and knn_dists are not sorted by kNN distance (this may be a serious bug, I'm not sure).
from umap.
That is a bug that was caught and should be fixed in more recent versions. It should either be in the current master or will appear in version 0.3.
from umap.
This is still happening for me on 0.3.8. However, @warenlg found that fixing the numpy seed makes it fully deterministic: numpy.random.seed(42)
from umap.
So this is somewhat disconcerting, and is definitely on my list of things to fix. Honestly, I am not quite sure where or how this is happening.
from umap.
This is still happening for me on 0.3.10.
I tried several ways to fix the numpy seed, as proposed by @vmarkovtsev, but none of them worked for me.
Is it possible to have more details? An example would be great.
from umap.
@ericloud Can you provide example code of what you did in order for others to reproduce the issue?
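A reproduction for this kind of report can be as small as running the same fit twice on the same input and comparing the results numerically rather than by eye. The sketch below is only an illustration: `embed_stub` and `max_abs_difference` are hypothetical helpers, and the stub is a toy stand-in for a seeded embedding, not UMAP itself. In a real report the stub call would be replaced with something like `umap.UMAP(random_state=42).fit_transform(data)`.

```python
import random


def embed_stub(rows, random_state):
    """Hypothetical stand-in for a seeded embedding routine (NOT UMAP)."""
    rng = random.Random(random_state)
    # Each output row mixes the input row with seeded pseudo-random noise.
    return [[sum(row) + rng.random(), rng.random()] for row in rows]


def max_abs_difference(a, b):
    """Largest element-wise absolute difference between two 2-D lists."""
    return max(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


data = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# Same input, same seed, two independent runs.
first = embed_stub(data, random_state=42)
second = embed_stub(data, random_state=42)

# A value of 0.0 means the two runs are identical.
print(max_abs_difference(first, second))
```

If the difference is non-zero with a fixed seed, it is worth checking that the input itself (row and column order included) is identical between runs before suspecting the library.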
from umap.
Eureka!
random_state works perfectly fine on my side.
The problem was in the input matrix, where the features were randomly ordered.
    # This transformation returns unique ids, but not in a deterministic order.
    ids_selected = list(set(ids_selected))
    mat = mat.loc[ids_selected]

    # One way to fix it: sort the index so the order is always the same.
    mat.sort_index(inplace=True)

    embedding = umap.UMAP(
        n_neighbors=10,
        random_state=42,
    ).fit_transform(mat.T)
Thanks.
from umap.
Glad you resolved it. For details on sets being unordered collections, see the Python tutorial section on data structures: https://docs.python.org/3/tutorial/datastructures.html#sets.
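The ordering problem from this thread can be illustrated with the standard library alone. The sketch below uses made-up string ids (the `gene_*` names are assumptions for illustration only) and shows two common ways to deduplicate with a stable order: sorting after deduplication, or keeping first-seen order with `dict.fromkeys`.

```python
# Made-up ids with duplicates; the names are purely illustrative.
ids_selected = ["gene_b", "gene_a", "gene_c", "gene_a", "gene_b"]

# set() removes duplicates but makes no promise about iteration order.
# For strings the order can differ between interpreter runs because
# string hashing is randomized per process by default.
unordered_unique = list(set(ids_selected))

# Option 1: sort after deduplicating, so the order is always the same.
sorted_unique = sorted(set(ids_selected))

# Option 2: keep the first-seen order; dicts preserve insertion order
# (guaranteed since Python 3.7).
first_seen_unique = list(dict.fromkeys(ids_selected))

print(sorted_unique)
print(first_seen_unique)
```

Either option gives the same ordering on every run, so anything built from the ids (such as the rows selected with `mat.loc[...]` earlier in the thread) is assembled in the same order each time.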
from umap.