
Comments (12)

DmitryUlyanov commented:

I don't know for sure, but the format the digits are stored in can differ, e.g. [0,1] or 0...255, and t-SNE runs gradient descent, which may fail if the scaling and learning rate are mismatched.

Try the example test.py from above; do you get a pretty image?
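
For instance, a minimal sketch of that sanity check (the array X and its 0...255 range are hypothetical illustrations, not code from this repo): rescale raw pixel values into [0,1] before handing them to t-SNE.

import numpy as np

# hypothetical input: digit images stored as 0...255 integers
X = np.random.randint(0, 256, size=(1000, 784))

# map pixel values into [0, 1] so gradient-descent step sizes stay sensible
X_scaled = X.astype(np.float64) / 255.0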


DmitryUlyanov commented:

Did you try py_bh_tsne or any other non-sklearn package? Do they also produce a worse result? There can be implementation differences, different default parameters, and so on. This repo uses py_bh_tsne as its base; I fixed some errors there, but it may still be imperfect. I would give it another try and check the implementation, but I hope the sklearn developers improve their t-SNE's efficiency sooner, making this repo useless (which is how it should be).


areshytko commented:

Yes, unfortunately sklearn's t-SNE is currently unusable except on such toy datasets. And yes, it's strange: the output shows that the algorithm quickly converged to a low error and then made no further progress.


Learning embedding...
Iteration 50: error is 43.405481 (50 iterations in 0.00 seconds)
Iteration 100: error is 44.709520 (50 iterations in 0.00 seconds)
Iteration 150: error is 43.567784 (50 iterations in 0.00 seconds)
Iteration 200: error is 42.564679 (50 iterations in 0.00 seconds)
Iteration 250: error is 1.118502 (50 iterations in 0.00 seconds)
Iteration 300: error is 0.238091 (50 iterations in 0.00 seconds)
Iteration 350: error is 0.117268 (50 iterations in 0.00 seconds)
Iteration 400: error is 0.120770 (50 iterations in 0.00 seconds)
Iteration 450: error is 0.121062 (50 iterations in 0.00 seconds)
Iteration 500: error is 0.121366 (50 iterations in 0.00 seconds)
Iteration 550: error is 0.121098 (50 iterations in 0.00 seconds)
Iteration 600: error is 0.121540 (50 iterations in 0.00 seconds)
Iteration 650: error is 0.121057 (50 iterations in 0.00 seconds)
Iteration 700: error is 0.120856 (50 iterations in 0.00 seconds)
Iteration 750: error is 0.121666 (50 iterations in 0.00 seconds)
Iteration 800: error is 0.121161 (50 iterations in 0.00 seconds)
Iteration 850: error is 0.121708 (50 iterations in 0.00 seconds)
Iteration 900: error is 0.121865 (50 iterations in 0.00 seconds)
Iteration 950: error is 0.122631 (50 iterations in 0.00 seconds)
Iteration 999: error is 0.121577 (50 iterations in 0.00 seconds)
Fitting performed in 0.00 seconds.

By comparison, the MNIST test example progressed slowly but steadily until the last iteration. And the Iris dataset is a simple one: linearly separable.

No, I haven't tried other implementations yet.



shaidams64 commented:

I also got a very different result from the sklearn implementation on the MNIST dataset:
Multi-core t-SNE: [screenshot]
sklearn t-SNE: [screenshot]


DmitryUlyanov commented:

Hi, the picture in the README file is a t-SNE visualization of the MNIST dataset, made with the code from this repository. Here is the code: https://github.com/DmitryUlyanov/Multicore-TSNE/blob/master/python/tests/test.py


shaidams64 commented:

Hey, I loaded the dataset from sklearn and ran multicore_tsne on it; would that make a difference?
from MulticoreTSNE import MulticoreTSNE as MultiTSNE
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt

digits2 = load_digits()
m_tsne = MultiTSNE(n_jobs=4, init='pca', random_state=0)
m_y = m_tsne.fit_transform(digits2.data)
plt.scatter(m_y[:, 0], m_y[:, 1], c=digits2.target)
plt.show()
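
For a side-by-side comparison, a minimal sketch of the matching sklearn run, assuming the digits2 and plt from the snippet above and mirroring its parameters:

from sklearn.manifold import TSNE

# sklearn's own t-SNE with the same init and seed for comparison
sk_tsne = TSNE(init='pca', random_state=0)
sk_y = sk_tsne.fit_transform(digits2.data)
plt.scatter(sk_y[:, 0], sk_y[:, 1], c=digits2.target)
plt.show()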


shaidams64 commented:

Yes, it works with your example. It appears the scalings of the datasets are different: the dataset from sklearn is 0...16, but the one in your example is [-1,1]. So does this version only work with normalized datasets?
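
As a quick check (a minimal sketch, assuming the sklearn digits dataset from the snippet above), the value range can be inspected directly:

from sklearn.datasets import load_digits

digits = load_digits()
# load_digits stores 8x8 images with integer pixel values in 0...16
print(digits.data.min(), digits.data.max())  # 0.0 16.0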


bartimus9 commented:

Thank you for putting this together; it is the only multicore t-SNE application I can get to run to completion. However, my results are identical to shaidams64's. I have an arcsinh-transformed dataset: a single-core R implementation of this method gives good results, and the sklearn (Python) implementation returns a very similar result on the same data. This multicore implementation runs quickly but produces an indiscernible cloud of points. I have carefully aligned all the arguments I can, and the result is the same, even when I set MulticoreTSNE to use only one core. Any recommendations on how to fix this?

EDIT: This discussion thread ends with a multicore t-SNE implementation that does reproduce my results from sklearn and Rtsne: lvdmaaten/bhtsne#18


YubinXie commented:

Is this problem solved in this multicore t-SNE?



orihomie commented:

Hi, I'm facing the same problem now: the results of sklearn's t-SNE and this implementation differ on the same parameters.

> Yes, it works with your example. It appears the scalings of the datasets are different: the dataset from sklearn is 0...16, but the one in your example is [-1,1]. So does this version only work with normalized datasets?

So, if I understand correctly, normalizing the data should help (make the results roughly the same)?
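
If the scaling mismatch is indeed the cause, a minimal sketch of that normalization (assuming sklearn's MinMaxScaler and the digits data; feature_range=(-1, 1) matches the [-1,1] range noted above, and the constructor arguments mirror those used earlier in the thread):

from sklearn.datasets import load_digits
from sklearn.preprocessing import MinMaxScaler
from MulticoreTSNE import MulticoreTSNE

digits = load_digits()
# rescale each feature into [-1, 1] before embedding
X = MinMaxScaler(feature_range=(-1, 1)).fit_transform(digits.data)
embedding = MulticoreTSNE(n_jobs=4, random_state=0).fit_transform(X)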

