multicore-tsne's Introduction

Multicore t-SNE

This is a multicore modification of Barnes-Hut t-SNE by L. Van der Maaten with Python CFFI-based wrappers. This code also runs faster than sklearn's TSNE on a single core (as of sklearn 0.18).

What to expect

Barnes-Hut t-SNE is done in two steps.

  • First step: an efficient data structure for nearest-neighbour search is built and used to compute probabilities. This can be done in parallel for each point in the dataset, which is why we can expect a good speed-up from using more cores.

  • Second step: the embedding is optimized using gradient descent. This part is essentially sequential, so we can only parallelize within an iteration. Some parts can in fact be parallelized effectively, but not all of them are parallelized yet. That is why the speed-up of the second step is not as significant as that of the first step, but there is still room for improvement.
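The independence of step 1 can be seen in a minimal sketch, assuming the standard t-SNE input-similarity computation. For each point, a binary search over the Gaussian precision finds the conditional probabilities p_{j|i} whose perplexity matches the target. It uses 3 * perplexity nearest neighbours, as the original bhtsne does. No row depends on any other row, so rows can be handed to workers freely.

This is not the library's code. Brute-force distances and a Python thread pool stand in for the C++ tree search and native threads. The names `row_probabilities` and `input_similarities` are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def row_probabilities(sq_dists, perplexity, tol=1e-5, max_iter=100):
    """Calibrate p_{j|i} for one point from squared distances to its neighbours.

    Binary search on beta = 1 / (2 * sigma_i^2) until the Shannon entropy
    (in nats) of the row equals log(perplexity).
    """
    target = np.log(perplexity)
    beta, lo, hi = 1.0, 0.0, np.inf
    for _ in range(max_iter):
        # Shifting by the minimum distance does not change p, but avoids underflow.
        w = np.exp(-beta * (sq_dists - sq_dists.min()))
        p = w / w.sum()
        entropy = -np.sum(p * np.log(np.maximum(p, 1e-300)))
        if abs(entropy - target) < tol:
            break
        if entropy > target:
            # Distribution too flat: increase the precision beta.
            lo = beta
            beta = beta * 2 if hi == np.inf else (beta + hi) / 2
        else:
            # Distribution too peaked: decrease the precision beta.
            hi = beta
            beta = (beta + lo) / 2
    return p


def input_similarities(X, perplexity=30.0, n_neighbors=None, n_jobs=4):
    """Row-normalized sparse-in-spirit P (dense here for simplicity)."""
    n = X.shape[0]
    k = n_neighbors or min(n - 1, int(3 * perplexity))
    # Brute-force squared distances: O(N^2 D), a stand-in for a tree search.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)  # a point is not its own neighbour
    nbrs = np.argsort(sq, axis=1)[:, :k]

    def one_row(i):
        # Each row only reads shared data, so rows can run concurrently.
        return i, row_probabilities(sq[i, nbrs[i]], perplexity)

    P = np.zeros((n, n))
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        for i, p in pool.map(one_row, range(n)):
            P[i, nbrs[i]] = p
    return P


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
P = input_similarities(X, perplexity=30.0)
print(np.allclose(P.sum(axis=1), 1.0))  # True
```

Because of Python's GIL, this sketch shows the structure of the work rather than a real speed-up. Step 2 has no such per-point independence across iterations, since every gradient step depends on the previous embedding.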

So when can you benefit from parallelization? To a good approximation, the time of the second step does not depend on D and is determined mostly by N, while the time of the first step depends strongly on D. So for small D, time(Step 1) << time(Step 2), and for large D, time(Step 1) >> time(Step 2). Since we are only good at parallelizing step 1, we benefit most when D is large enough (MNIST's D = 784 is large; D = 10 is not much, even for N = 1000000). I originally wrote the multicore modification for the Springleaf competition, where my data table was about 300000 x 3000 and only a few days were left until the end of the competition, so any speed-up was handy.

Benchmark

1 core

Interestingly, this code beats other implementations. We compare against sklearn (Barnes-Hut, of course), L. Van der Maaten's bhtsne, and the py_bh_tsne repo (a Cython wrapper for bhtsne with a QuadTree). All runs use perplexity = 30 and theta = 0.5. In fact, py_bh_tsne runs at the same speed as this code when compiled with more aggressive optimization flags.

This is a benchmark for 70000x784 MNIST data:

Method                     Step 1 (sec)   Step 2 (sec)
MulticoreTSNE(n_jobs=1)    912            350
bhtsne                     4257           1233
py_bh_tsne                 1232           367
sklearn (0.18)             ~5400          ~20920

I did my best to find out what is wrong with the sklearn numbers, but this is the best benchmark I could do (you can find the test script in the MulticoreTSNE/examples folder).

Multicore

This table shows the speed-up relative to 1 core when using n cores.

n_jobs   Step 1   Step 2
1        1x       1x
2        1.54x    1.05x
4        2.6x     1.2x
8        5.6x     1.65x
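The per-step speed-ups can be combined into an end-to-end estimate using the 1-core step times from the MNIST benchmark above. This is a back-of-the-envelope, Amdahl-style calculation. It assumes this speed-up table was measured on that same MNIST run, which the table does not state.

```python
# Single-core step times for 70000x784 MNIST, from the benchmark table (seconds).
STEP1_SEC, STEP2_SEC = 912.0, 350.0

# Per-step speed-ups relative to n_jobs=1, from the multicore table.
SPEEDUPS = {1: (1.0, 1.0), 2: (1.54, 1.05), 4: (2.6, 1.2), 8: (5.6, 1.65)}


def overall_speedup(s1, s2, t1=STEP1_SEC, t2=STEP2_SEC):
    """End-to-end speed-up when step 1 is s1 times faster and step 2 is s2 times faster."""
    return (t1 + t2) / (t1 / s1 + t2 / s2)


for n, (s1, s2) in SPEEDUPS.items():
    print(f"n_jobs={n}: ~{overall_speedup(s1, s2):.2f}x overall")
```

With these numbers, 8 cores give roughly 3.4x end to end. At that point step 2 (about 212 s) takes longer than step 1 (about 163 s), which is why improving step 2 matters.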

How to use

Install

Directly from PyPI

pip install MulticoreTSNE

From source

Make sure cmake is installed on your system; you will also need a sensible C++ compiler, such as gcc or llvm-clang. On macOS, you can get both via Homebrew.

To install the package, please do:

git clone https://github.com/DmitryUlyanov/Multicore-TSNE.git
cd Multicore-TSNE/
pip install .

Tested with Python >= 3.6 (conda).

Run

You can use it as a near drop-in replacement for sklearn.manifold.TSNE.

from MulticoreTSNE import MulticoreTSNE as TSNE

tsne = TSNE(n_jobs=4)
Y = tsne.fit_transform(X)

Please refer to the sklearn TSNE documentation for an explanation of the parameters.

This implementation only supports n_components=2, which is the most common case (use Barnes-Hut t-SNE or sklearn otherwise). Also note that some parameters exist only for compatibility with sklearn and are otherwise ignored. See the MulticoreTSNE class docstring for more info.

MNIST example

from sklearn.datasets import fetch_openml
from MulticoreTSNE import MulticoreTSNE as TSNE
from matplotlib import pyplot as plt
import numpy as np

# Load the 70000x784 MNIST images (X) and their digit labels (y, returned as strings).
X, y = fetch_openml(
  "mnist_784", version=1, return_X_y=True, as_frame=False, parser="pandas"
)
X = X.astype(np.float64)  # the C++ code works on doubles
labels = y.astype(int)    # numeric labels for the colormap
embeddings = TSNE(n_jobs=4).fit_transform(X)
vis_x = embeddings[:, 0]
vis_y = embeddings[:, 1]
plt.scatter(vis_x, vis_y, c=labels, cmap=plt.get_cmap("jet", 10), marker='.')
plt.colorbar(ticks=range(10))
plt.clim(-0.5, 9.5)
plt.show()

Test

You can test it on the MNIST dataset with the following command:

python MulticoreTSNE/examples/test.py --n_jobs <n_jobs>

Note on jupyter use

To make the computation log visible in Jupyter, install wurlitzer (pip install wurlitzer) and execute this line in any cell beforehand:

%load_ext wurlitzer

Memory leaks are possible if you interrupt the process. It should be OK if you let it run until the end.

License

Inherited from the original repository's license.

Future work

  • Allow other types than double
  • Improve step 2 performance (possible)

Citation

Please cite this repository if it was useful for your research:

@misc{Ulyanov2016,
  author = {Ulyanov, Dmitry},
  title = {Multicore-TSNE},
  year = {2016},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/DmitryUlyanov/Multicore-TSNE}},
}

Of course, do not forget to cite L. Van der Maaten's paper

multicore-tsne's People

Contributors

cciccole, chmodsss, dmitryulyanov, ecederstrand, falexwolf, guenteru, hurutoriya, jona-sassenhagen, jorvis, kernc, milianw, nighttrain42, sbodenstein, sergiuser1, sroecker, thocevar

multicore-tsne's Issues

Using this package does bad things to my NumPy

import numpy as np

print(np.log(np.nextafter(0, np.inf, dtype=np.float64)))

from MulticoreTSNE import MulticoreTSNE as TSNE
tsne = TSNE(n_jobs=6)

print(np.log(np.nextafter(0, np.inf, dtype=np.float64)))

Result:

# print(np.log(np.nextafter(0, np.inf, dtype=np.float64)))
-744.4400719213812
# from MulticoreTSNE import MulticoreTSNE as TSNE
# tsne = TSNE(n_jobs=6)
# print(np.log(np.nextafter(0, np.inf, dtype=np.float64)))
__main__:1: RuntimeWarning: divide by zero encountered in log
-inf

After calling tsne = TSNE(n_jobs=6), NumPy no longer works as intended.

How can I fix this?
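One plausible explanation is that loading the compiled library switches the CPU into flush-to-zero / denormals-are-zero mode. This can happen when a shared library is built with fast-math style compiler flags. This is an assumption, not confirmed in this thread. In that mode the smallest subnormal double behaves like 0, which would turn log() into -inf exactly as shown.

Below is a NumPy-only check; the helper name `subnormals_flushed` is hypothetical.

```python
import numpy as np


def subnormals_flushed():
    """Return True if subnormal float64 values are being treated as zero.

    With IEEE-compliant settings, the smallest positive subnormal double
    (~4.94e-324) survives a multiplication by 1.0; under flush-to-zero or
    denormals-are-zero modes the product becomes 0.0.
    """
    tiny = np.nextafter(np.float64(0), np.float64(1))
    product = np.array([tiny]) * np.array([1.0])
    return bool(product[0] == 0.0)


print(subnormals_flushed())
```

Run it once before and once after creating the TSNE object. If the result flips from False to True, the floating-point mode of the process was changed.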

Results differ from scikit-learn implementation

t-SNE is inherently randomized, but not to this extent: it consistently produces different (much worse) results compared to the scikit-learn Barnes-Hut implementation.

Example on IRIS dataset:

Scikit-learn with default parameters and learning rate 100

[Image: scikit-learn embedding]

Multicore T-SNE with default parameters and learning rate 100

[Image: Multicore t-SNE embedding]

The greater distance of the setosa cluster is also supported by the general statistical properties of the dataset (and by other embedding algorithms), so the results of the scikit-learn library are more consistent with the original manifold structure.

Segmentation fault on big datasets

Hello @DmitryUlyanov !

I have been trying to run TSNE on big datasets and so far it has been working great, but I think I have reached your program's limit.

I have a huge dataset (3091356 x 1120), and as soon as I start to fit the data I get a segmentation fault (core dumped).

I have the RAM required to run this, so is it possible that there is some sort of pointer or malloc error?

Thanks!
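A quick size check makes a hypothesis concrete. It assumes the C++ code computes flat offsets such as i * D + j in a 32-bit int, which has not been verified against the source. A 3091356 x 1120 input has more elements than a signed 32-bit integer can index, so such offsets would overflow and could address invalid memory. That would happen regardless of how much RAM is available.

```python
N, D = 3_091_356, 1_120   # dataset shape from the report
INT32_MAX = 2**31 - 1     # 2,147,483,647

n_values = N * D
print(f"N*D = {n_values:,}")
print(f"exceeds int32 range: {n_values > INT32_MAX}")
print(f"float64 input size: {n_values * 8 / 1e9:.1f} GB")
```

This prints 3,462,318,720 elements, True, and about 27.7 GB for the input alone.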

visualize for every step

Wonderful work! The API is simply perfect. But I would like to know how to see which steps have run and how to visualize every step during the process.

MulticoreTSNE on python 3.7.3 using conda

I am using Python 3.7 and could not install MulticoreTSNE using conda or pip (pip install MulticoreTSNE), since it tries to downgrade a few installed packages, including Python itself (to 3.6.8). Below is the error message.

Should we expect MulticoreTSNE to be compatible with Python 3.7, or would you recommend installing 3.6.8? I would rather avoid the latter, since it means quite a bit of reinstalling.

Cheers

Upon trying to install:

The following packages will be DOWNGRADED:

_ipyw_jlab_nb_ext~ 0.1.0-py37_0 --> 0.1.0-py36_0
louvain 0.6.1-py37h0a44026_2 --> 0.6.1-py36h0a44026_2
mkl-service 1.1.2-py37hfbe908c_5 --> 1.1.2-py36hfbe908c_5
navigator-updater 0.2.1-py37_0 --> 0.2.1-py36_0
pot 0.5.1-py37h1702cab_1000 --> 0.5.1-py36h1702cab_1000
pycairo 1.18.0-py37ha54c0a8_1000 --> 1.18.0-py36ha54c0a8_1000
pycurl 7.43.0.2-py37ha12b0ac_0 --> 7.43.0.2-py36ha12b0ac_0
pyqt 5.9.2-py37h655552a_2 --> 5.9.2-py36h655552a_2
pyreadr 0.1.9-py37h2573ce8_0 --> 0.1.9-py36h2573ce8_0
python 3.7.3-h359304d_0 --> 3.6.8-haf84260_0
python-igraph 0.7.1.post7-py37h01d97ff_0 --> 0.7.1.post7-py36h01d97ff_0
sphinxcontrib 1.0-py37_1 --> 1.0-py36_1

Puzzle about the speed

Did you implement this algorithm simply with the help of OpenMP?
Did you make any changes to the bhtsne procedure itself?

Using single core even when n_jobs=4 is used

I am seeing

Performing t-SNE using 1 cores.
Using no_dims = 2, perplexity = 30.000000, and theta = 0.500000
Computing input similarities...
Building tree...

I see this verbose output even though I set n_jobs=4.

Error in builds using cmake >= 3.20?

I'm trying to install MulticoreTSNE into a Docker image built on top of the Jupyter Minimal distribution (Anaconda Python, etc.). Previously I was able to run this without a hitch. I've tested the following combinations:

  1. jupyter/minimal-notebook:584f43f0658 (Late April 2021)
    • cmake 3.18.2 = Build successful (this was the version dumped from my last working image)
    • cmake 3.20.1 = Build unsuccessful (error below)
  2. jupyter/minimal-notebook:016833b15ceb (Late Feb 2021)
    • cmake 3.18.2 = Build successful (this was the version dumped from my last working image)
    • cmake 3.20.1 = Build unsuccessful (error below)

Forcing cmake to downgrade to 3.18.2 seems to force a downgrade of Python from 3.9 to 3.8 so it's possible that is the source of the problem, but the report suggests it's cmake.

The error is:

#13 53.91     running build_ext
#13 53.91     cmake version 3.20.1
#13 53.91
#13 53.91     CMake suite maintained and supported by Kitware (kitware.com/cmake).
#13 53.91     CMake Error: Unknown argument --
#13 53.91     CMake Error: Run 'cmake --help' for all supported options.
#13 53.91
#13 53.91     ERROR: Cannot generate Makefile. See above errors.

Full context is:

> [4/5] RUN conda-env create -n ethos -f ./python.test.yml     && conda clean --all --yes --force-pkgs-dirs     && find /opt/conda/ -follow -type f -name '*.a' -delete     && find /opt/conda/ -follow -type f -name '*.pyc' -delete     && find /opt/conda/ -follow -type f -name '*.js.map' -delete     && pip cache purge     && rm -rf /home/jovyan/.cache/pip     && rm ./python.test.yml:
#12 0.549 Collecting package metadata (repodata.json): ...working... done
#12 25.27 Solving environment: ...working... done
#12 29.94
#12 29.94 Downloading and Extracting Packages
libffi-3.3           | 51 KB     | ########## | 100%
lz4-c-1.9.3          | 179 KB    | ########## | 100%
libgomp-9.3.0        | 376 KB    | ########## | 100%
libedit-3.1.20191231 | 121 KB    | ########## | 100%
xz-5.2.5             | 343 KB    | ########## | 100%
rhash-1.4.1          | 192 KB    | ########## | 100%
krb5-1.17.2          | 1.4 MB    | ########## | 100%
readline-8.1         | 295 KB    | ########## | 100%
bzip2-1.0.8          | 484 KB    | ########## | 100%
_openmp_mutex-4.5    | 22 KB     | ########## | 100%
tzdata-2021a         | 121 KB    | ########## | 100%
libssh2-1.9.0        | 226 KB    | ########## | 100%
certifi-2020.12.5    | 143 KB    | ########## | 100%
sqlite-3.35.4        | 1.4 MB    | ########## | 100%
ca-certificates-2020 | 137 KB    | ########## | 100%
zstd-1.4.9           | 431 KB    | ########## | 100%
libuv-1.41.0         | 1.0 MB    | ########## | 100%
setuptools-49.6.0    | 943 KB    | ########## | 100%
libstdcxx-ng-9.3.0   | 4.0 MB    | ########## | 100%
libcurl-7.76.1       | 328 KB    | ########## | 100%
cmake-3.20.1         | 14.7 MB   | ########## | 100%
libgcc-ng-9.3.0      | 7.8 MB    | ########## | 100%
python-3.9.2         | 27.3 MB   | ########## | 100%
zlib-1.2.11          | 106 KB    | ########## | 100%
pip-21.0.1           | 1.1 MB    | ########## | 100%
ld_impl_linux-64-2.3 | 618 KB    | ########## | 100%
tk-8.6.10            | 3.2 MB    | ########## | 100%
openssl-1.1.1k       | 2.1 MB    | ########## | 100%
libnghttp2-1.43.0    | 808 KB    | ########## | 100%
wheel-0.36.2         | 31 KB     | ########## | 100%
python_abi-3.9       | 4 KB      | ########## | 100%
libev-4.33           | 104 KB    | ########## | 100%
_libgcc_mutex-0.1    | 3 KB      | ########## | 100%
expat-2.3.0          | 168 KB    | ########## | 100%
ncurses-6.2          | 985 KB    | ########## | 100%
c-ares-1.17.1        | 109 KB    | ########## | 100%
#12 44.17 Preparing transaction: ...working... done
#12 44.43 Verifying transaction: ...working... done
#12 45.95 Executing transaction: ...working... done
#12 47.59 Installing pip dependencies: ...working... Ran pip subprocess with arguments:
#12 52.31 ['/opt/conda/envs/ethos/bin/python', '-m', 'pip', 'install', '-U', '-r', '/home/jovyan/condaenv.4kzuzpkn.requirements.txt']
#12 52.31 Pip subprocess output:
#12 52.31 Collecting MulticoreTSNE
#12 52.31   Downloading MulticoreTSNE-0.1.tar.gz (20 kB)
#12 52.31 Collecting numpy
#12 52.31   Downloading numpy-1.20.2-cp39-cp39-manylinux2010_x86_64.whl (15.4 MB)
#12 52.31 Collecting cffi
#12 52.31   Downloading cffi-1.14.5-cp39-cp39-manylinux1_x86_64.whl (406 kB)
#12 52.31 Collecting pycparser
#12 52.31   Downloading pycparser-2.20-py2.py3-none-any.whl (112 kB)
#12 52.31 Building wheels for collected packages: MulticoreTSNE
#12 52.31   Building wheel for MulticoreTSNE (setup.py): started
#12 52.31   Building wheel for MulticoreTSNE (setup.py): finished with status 'error'
#12 52.31   Running setup.py clean for MulticoreTSNE
#12 52.31 Failed to build MulticoreTSNE
#12 52.31 Installing collected packages: pycparser, numpy, cffi, MulticoreTSNE
#12 52.31     Running setup.py install for MulticoreTSNE: started
#12 52.31     Running setup.py install for MulticoreTSNE: finished with status 'error'
#12 52.31
#12 52.31 failed
#12 52.31
#12 52.31
#12 52.31 ==> WARNING: A newer version of conda exists. <==
#12 52.31   current version: 4.10.0
#12 52.31   latest version: 4.10.1
#12 52.31
#12 52.31 Please update conda by running
#12 52.31
#12 52.31     $ conda update -n base conda
#12 52.31
#12 52.31
#12 52.31 Pip subprocess error:
#12 52.31   ERROR: Command errored out with exit status 1:
#12 52.31    command: /opt/conda/envs/ethos/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-110qdwqv/multicoretsne_4b5d168de5e04f6c894a77b8595839b9/setup.py'"'"'; __file__='"'"'/tmp/pip-install-110qdwqv/multicoretsne_4b5d168de5e04f6c894a77b8595839b9/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-gm7zmxgs
#12 52.31        cwd: /tmp/pip-install-110qdwqv/multicoretsne_4b5d168de5e04f6c894a77b8595839b9/
#12 52.31   Complete output (26 lines):
#12 52.31   running bdist_wheel
#12 52.31   running build
#12 52.31   running build_py
#12 52.31   creating build
#12 52.31   creating build/lib.linux-x86_64-3.9
#12 52.31   creating build/lib.linux-x86_64-3.9/MulticoreTSNE
#12 52.31   copying MulticoreTSNE/__init__.py -> build/lib.linux-x86_64-3.9/MulticoreTSNE
#12 52.31   creating build/lib.linux-x86_64-3.9/MulticoreTSNE/tests
#12 52.31   copying MulticoreTSNE/tests/test_base.py -> build/lib.linux-x86_64-3.9/MulticoreTSNE/tests
#12 52.31   copying MulticoreTSNE/tests/__init__.py -> build/lib.linux-x86_64-3.9/MulticoreTSNE/tests
#12 52.31   running egg_info
#12 52.31   writing MulticoreTSNE.egg-info/PKG-INFO
#12 52.31   writing dependency_links to MulticoreTSNE.egg-info/dependency_links.txt
#12 52.31   writing requirements to MulticoreTSNE.egg-info/requires.txt
#12 52.31   writing top-level names to MulticoreTSNE.egg-info/top_level.txt
#12 52.31   reading manifest file 'MulticoreTSNE.egg-info/SOURCES.txt'
#12 52.31   reading manifest template 'MANIFEST.in'
#12 52.31   writing manifest file 'MulticoreTSNE.egg-info/SOURCES.txt'
#12 52.31   running build_ext
#12 52.31   cmake version 3.20.1
#12 52.31
#12 52.31   CMake suite maintained and supported by Kitware (kitware.com/cmake).
#12 52.31   CMake Error: Unknown argument --
#12 52.31   CMake Error: Run 'cmake --help' for all supported options.
#12 52.31
#12 52.31   ERROR: Cannot generate Makefile. See above errors.
#12 52.31   ----------------------------------------
#12 52.31   ERROR: Failed building wheel for MulticoreTSNE
#12 52.31     ERROR: Command errored out with exit status 1:
#12 52.31      command: /opt/conda/envs/ethos/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-110qdwqv/multicoretsne_4b5d168de5e04f6c894a77b8595839b9/setup.py'"'"'; __file__='"'"'/tmp/pip-install-110qdwqv/multicoretsne_4b5d168de5e04f6c894a77b8595839b9/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-5jx__mkf/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/envs/ethos/include/python3.9/MulticoreTSNE
#12 52.31          cwd: /tmp/pip-install-110qdwqv/multicoretsne_4b5d168de5e04f6c894a77b8595839b9/
#12 52.31     Complete output (26 lines):
#12 52.31     running install
#12 52.31     running build
#12 52.31     running build_py
#12 52.31     creating build
#12 52.31     creating build/lib.linux-x86_64-3.9
#12 52.31     creating build/lib.linux-x86_64-3.9/MulticoreTSNE
#12 52.31     copying MulticoreTSNE/__init__.py -> build/lib.linux-x86_64-3.9/MulticoreTSNE
#12 52.31     creating build/lib.linux-x86_64-3.9/MulticoreTSNE/tests
#12 52.31     copying MulticoreTSNE/tests/test_base.py -> build/lib.linux-x86_64-3.9/MulticoreTSNE/tests
#12 52.31     copying MulticoreTSNE/tests/__init__.py -> build/lib.linux-x86_64-3.9/MulticoreTSNE/tests
#12 52.31     running egg_info
#12 52.31     writing MulticoreTSNE.egg-info/PKG-INFO
#12 52.31     writing dependency_links to MulticoreTSNE.egg-info/dependency_links.txt
#12 52.31     writing requirements to MulticoreTSNE.egg-info/requires.txt
#12 52.31     writing top-level names to MulticoreTSNE.egg-info/top_level.txt
#12 52.31     reading manifest file 'MulticoreTSNE.egg-info/SOURCES.txt'
#12 52.31     reading manifest template 'MANIFEST.in'
#12 52.31     writing manifest file 'MulticoreTSNE.egg-info/SOURCES.txt'
#12 52.31     running build_ext
#12 52.31     cmake version 3.20.1
#12 52.31
#12 52.31     CMake suite maintained and supported by Kitware (kitware.com/cmake).
#12 52.31     CMake Error: Unknown argument --
#12 52.31     CMake Error: Run 'cmake --help' for all supported options.
#12 52.31
#12 52.31     ERROR: Cannot generate Makefile. See above errors.
#12 52.31     ----------------------------------------
#12 52.31 ERROR: Command errored out with exit status 1: /opt/conda/envs/ethos/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-110qdwqv/multicoretsne_4b5d168de5e04f6c894a77b8595839b9/setup.py'"'"'; __file__='"'"'/tmp/pip-install-110qdwqv/multicoretsne_4b5d168de5e04f6c894a77b8595839b9/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-5jx__mkf/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/envs/ethos/include/python3.9/MulticoreTSNE Check the logs for full command output.
#12 52.31
#12 52.31
#12 52.31 CondaEnvException: Pip failed
#12 52.31
------
executor failed running [/bin/bash -c conda-env create -n ${env_nm} -f ./${yaml_nm}     && conda clean --all --yes --force-pkgs-dirs     && find /opt/conda/ -follow -type f -name '*.a' -delete     && find /opt/conda/ -follow -type f -name '*.pyc' -delete     && find /opt/conda/ -follow -type f -name '*.js.map' -delete     && pip cache purge     && rm -rf /home/$NB_USER/.cache/pip     && rm ./${yaml_nm}]: exit code: 1

Low CPU utilization during second phase

It says in "future work" that phase 2 can be improved, so maybe this is what you're referring to?

During phase 2 I see very low CPU utilization (see attached screenshot). Running Multicore-TSNE on 38 cores utilizes only 1 core at 100%, while the others sit at around 8%.

[Screenshot: mc-tsne-low-cpu]

Memory Allocation Fail (Big Data)

Hello,

I am using Multicore-opt-TSNE on big data (24 000 000 events and 18 parameters) on an Ubuntu server with 40 cores and 500 GB of RAM, with this command line:

python2 MulticoreTSNE/run/run_optsne.py --optsne --data Data.csv --outfile Data_tsne.csv --n_threads 40 --perp 50

and I get this error: Memory allocation failed!

Can you tell me whether changing something in my command line (in my parameters) would give me a better chance of running the job, or do you know the setup needed to run Multicore-TSNE on data this big?

Best regards.

Quentin Barbier
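Here is a rough estimate of the sparse input-similarity matrix P for this job. It assumes this fork keeps the original bhtsne choice of K = 3 * perplexity neighbours per point; that is not verified for Multicore-opt-TSNE. The input itself is small, about 3.5 GB as float64. However, the number of non-zeros in P passes the signed 32-bit limit at --perp 50, so any int-typed allocation size or index could overflow.

```python
N = 24_000_000
PERPLEXITY = 50
K = int(3 * PERPLEXITY)   # neighbours per point (bhtsne convention, assumed here)
INT32_MAX = 2**31 - 1

nnz = N * K               # non-zeros in the sparse input-similarity matrix P
print(f"nnz(P) = {nnz:,}, exceeds int32: {nnz > INT32_MAX}")
# Rough size of one CSR copy: a float64 value plus an int32 column index per entry.
print(f"~{nnz * (8 + 4) / 1e9:.1f} GB per copy of P")
```

That gives 3.6 billion non-zeros and about 43.2 GB per copy. The symmetrized P can hold up to twice as many entries. If overflow is the limiting factor, a lower --perp reduces K proportionally.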

Support for n_components==1, others support it.

Looks like the failure might be happening in:
Multicore-TSNE/multicore_tsne/tsne.cpp
with this function: evaluateError where it is producing nans.

Willing to send $100USD in Bitcoin to the first person that can demonstrate a solution before I do.

Support for embedding in D>2

The algorithm does not seem to work properly if the target space is bigger than 2-dimensional. Is there a plan for extended functionality?

Build Fails on ubuntu 18.04

Hello,

With both the pip install and the git clone install, I get this:

ERROR: running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/MulticoreTSNE
copying MulticoreTSNE/__init__.py -> build/lib.linux-x86_64-3.7/MulticoreTSNE
creating build/lib.linux-x86_64-3.7/MulticoreTSNE/tests
copying MulticoreTSNE/tests/__init__.py -> build/lib.linux-x86_64-3.7/MulticoreTSNE/tests
copying MulticoreTSNE/tests/test_base.py -> build/lib.linux-x86_64-3.7/MulticoreTSNE/tests
running egg_info
creating MulticoreTSNE.egg-info
writing MulticoreTSNE.egg-info/PKG-INFO
writing dependency_links to MulticoreTSNE.egg-info/dependency_links.txt
writing requirements to MulticoreTSNE.egg-info/requires.txt
writing top-level names to MulticoreTSNE.egg-info/top_level.txt
writing manifest file 'MulticoreTSNE.egg-info/SOURCES.txt'
reading manifest file 'MulticoreTSNE.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'MulticoreTSNE.egg-info/SOURCES.txt'
running build_ext
cmake version 3.14.4

CMake suite maintained and supported by Kitware (kitware.com/cmake).
-- The CXX compiler identification is unknown
CMake Error at CMakeLists.txt:1 (PROJECT):
No CMAKE_CXX_COMPILER could be found.

Tell CMake where to find the compiler by setting either the environment
variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
to the compiler, or to the compiler name if it is in the PATH.

-- Configuring incomplete, errors occurred!
See also "/tmp/pip-req-build-ztrw2075/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeOutput.log".
See also "/tmp/pip-req-build-ztrw2075/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeError.log".

ERROR: Cannot generate Makefile. See above errors

Segmentation Fault (core dumped)

I got a segmentation fault (core dumped) error.

This was at the start of the "Computing input similarities..." step.
Any ideas how to debug?

Thanks

Results differ based on number of threads

This is a "forward" from RGLab/Rtsne.multicore#7

Part 1

I observed that the results differ based on the number of threads specified.

In my application, which uses BH-SNE to create a 2D embedding followed by automated clustering with DBSCAN, I replaced the single-threaded Rtsne call with a call to your multi-threaded Rtsne.multicore. This was nice and easy thanks to the similarity of both interfaces.

However, when I run the application, the results differ ever so slightly, as indicated below (just the first couple of points each time):
Using 1 thread

-4.3473001944841 -9.88816236259427
-0.264536173449281 2.26121958696939
-11.8037471711157 -1.23420653192463
18.5043209507443 -13.4638139443446
1.51823629529208 -27.2209786228982
8.44296382274354 11.5004388863181
17.0385503073606 -19.5842234534257
-1.80122124653633 -35.1542911986375
-14.9339466535662 11.4724805072396
-16.7179891732902 10.300907221322

Using 2 threads

-4.33102494052646 -9.94346771160292
-0.300330796745644 2.47627128482164
-14.4865548712467 3.83169546954971
18.0266761572745 -13.3481838170748
1.55009711170931 -27.3536683521347
8.57133969496983 11.704078885386
16.8146752705904 -19.4804761345993
-1.67702875389705 -35.6116919363096
-16.328562693303 10.9834569354747
-17.9212513482976 10.1738069116024

Using 3 threads

-4.15202535615338 -9.91628914440292
-0.266922842312901 2.30165398545058
-12.0458514750223 -1.26327092092668
18.3116039523395 -13.4472311793933
1.8728867702686 -27.0478452540983
8.21259960134093 11.338018514761
16.938103908809 -19.4664656504238
-1.51129210868152 -35.5926372619633
-15.7107052664802 10.622091607029
-16.9275577907434 10.5760540704756

Using 4 threads

-4.40493207317474 -10.2542865145978
-0.240311071414228 2.34386945654285
-11.613066543124 -1.22167721092907
17.978213066292 -13.6367838896947
1.68103298346623 -27.3950001130062
8.48320430773571 11.5841961868582
16.5975194709815 -19.6467988772466
-1.21063128661383 -35.6738754692542
-16.2962040171112 11.6000609166704
-16.4988660902924 10.7927849813962

The results using the same number of threads seem to be consistent between different runs, though, which is good at least :)

Using 1 thread - a second run

-4.3473001944841 -9.88816236259427
-0.264536173449281 2.26121958696939
-11.8037471711157 -1.23420653192463
18.5043209507443 -13.4638139443446
1.51823629529208 -27.2209786228982
8.44296382274354 11.5004388863181
17.0385503073606 -19.5842234534257
-1.80122124653633 -35.1542911986375
-14.9339466535662 11.4724805072396
-16.7179891732902 10.300907221322

And for all the points, computing the MD5SUM:

cat ./one_threads/one.bin.embedding.tsv | awk '{print $1,$2}' | gmd5sum
2410c2539be68ffe1f52d1be0f04bfac  -
cat ./one_threads_old/one.bin.embedding.tsv | awk '{print $1,$2}' | gmd5sum
2410c2539be68ffe1f52d1be0f04bfac  -
cat ./two_threads/two.bin.embedding.tsv | awk '{print $1,$2}' | gmd5sum
1f7dd4212d74b162420c79e619b3b91b  -
cat ./three_threads/three.bin.embedding.tsv | awk '{print $1,$2}' | gmd5sum
f659b3527318c9545766fed14fc72daa  -
cat ./four_threads/four.bin.embedding.tsv | awk '{print $1,$2}' | gmd5sum
0e7425b7acf3438d047fb1550bbd069f  -

While the differences are hard to spot by eye in a 2D scatterplot, the automatic clustering is affected by them.

Your input is greatly appreciated!

Part 2

I explored this further; here is a minimal working example:

library(Rtsne.multicore) # Load package
library(digest)
iris_unique <- unique(iris) # Remove duplicates
mat <- as.matrix(iris_unique[,1:4])
set.seed(42) # Sets seed for reproducibility
tsne_out1 <- Rtsne.multicore(mat, num_threads = 1) # Run TSNE
set.seed(42) # Sets seed for reproducibility
tsne_out1_2 <- Rtsne.multicore(mat, num_threads = 1) # Run TSNE
set.seed(42) # Sets seed for reproducibility
tsne_out2 <- Rtsne.multicore(mat, num_threads = 2) # Run TSNE
set.seed(42) # Sets seed for reproducibility
tsne_out2_2 <- Rtsne.multicore(mat, num_threads = 2) # Run TSNE
set.seed(42) # Sets seed for reproducibility
tsne_out3 <- Rtsne.multicore(mat, num_threads = 3) # Run TSNE
set.seed(42) # Sets seed for reproducibility
tsne_out3_2 <- Rtsne.multicore(mat, num_threads = 3) # Run TSNE
set.seed(42) # Sets seed for reproducibility
tsne_out4 <- Rtsne.multicore(mat, num_threads = 4) # Run TSNE
print(digest(tsne_out1))
print(digest(tsne_out1_2))
print(digest(tsne_out2))
print(digest(tsne_out2_2))
print(digest(tsne_out3))
print(digest(tsne_out3_2))
print(digest(tsne_out4))

and some demo output from Rstudio:

> source('~/.active-rstudio-document')
[1] "6adbcd6eb0106f49c7ac0a99eae369fc"
[1] "6adbcd6eb0106f49c7ac0a99eae369fc"
[1] "6269caaf71aca51ca57e2ead7425a14f"
[1] "6269caaf71aca51ca57e2ead7425a14f"
[1] "82974082989bc301349e03f3d9ee5c5b"
[1] "a8c779d9a4f54f2c14d84b624ffe9da9"
[1] "ccc0b4af068a4c2005504c0b1493e256"
> source('~/.active-rstudio-document')
[1] "6adbcd6eb0106f49c7ac0a99eae369fc"
[1] "6adbcd6eb0106f49c7ac0a99eae369fc"
[1] "6269caaf71aca51ca57e2ead7425a14f"
[1] "6269caaf71aca51ca57e2ead7425a14f"
[1] "b3479248cefc9b979521e13b25418223"
[1] "07dd9ce0d52e0cb0d1332f8d4849675c"
[1] "8b3a73318d64dd07f96ecdc2e06251d5"

As you can see, the results are consistent between different runs using the same number of threads (here for 1 or 2 threads) yet differ when using different numbers of threads.
Moreover, I am confused as to why the results for 3 and 4 threads differ between two runs, i.e., behave differently from 1 or 2 threads.

This is quite puzzling to me and your input is highly appreciated!

Best,

Cedric
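Floating-point addition is not associative, which offers a plausible explanation, though the thread does not confirm it. If sums (for example gradient or normalization terms) are reduced in parallel, the summation order depends on how the work is split across threads. The last bits of the result then change with the thread count, and t-SNE's iterative optimization can amplify those tiny differences. Run-to-run variation at a fixed thread count, as seen for 3 and 4 threads, would further suggest an accumulation order that depends on thread timing; that is also an assumption.

The sketch below emulates a static reduction in pure Python. The names `sequential_sum` and `chunked_sum` are hypothetical.

```python
import math
import numpy as np

# Floating-point addition is not associative:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False


def sequential_sum(xs):
    """Plain left-to-right float accumulation (no compensation)."""
    total = 0.0
    for x in xs:
        total += float(x)
    return total


def chunked_sum(values, n_threads):
    """Emulate a static parallel reduction: each 'thread' sums one contiguous
    chunk, then the partial sums are combined in thread order."""
    partials = [sequential_sum(c) for c in np.array_split(values, n_threads)]
    return sequential_sum(partials)


rng = np.random.default_rng(42)
values = rng.normal(scale=1e3, size=10_000)
print("exact:", math.fsum(values))
for t in (1, 2, 3, 4):
    # Same data, different grouping: results typically differ in the last bits.
    print(t, repr(chunked_sum(values, t)))
```

A fixed split gives the same answer on every run. Only the grouping differs between thread counts, which matches the behaviour reported for 1 and 2 threads.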

'No, no this should not happen' Happens

The title is pretty self-explanatory. I used your implementation successfully a few weeks ago and everything was perfect, but now that I have installed it on another machine, it starts spamming that error message after a few iterations. I have tried installing everything from scratch and nothing seems to work. I am using the same data as on the other machines.

Has something changed? It's pretty silly, but this is the only t-SNE implementation I can find that won't take me a day per attempt.

Cannot find GOMP_4.0?

I'm running into this error:

OSError: cannot load library /home/vsilva/anaconda2/lib/python2.7/site-packages/MulticoreTSNE/libtsne_multicore.so: /home/vsilva/anaconda2/bin/../lib/libgomp.so.1: version `GOMP_4.0' not found (required by /home/vsilva/anaconda2/lib/python2.7/site-packages/MulticoreTSNE/libtsne_multicore.so). Additionally, ctypes.util.find_library() did not manage to locate a library called '/home/vsilva/anaconda2/lib/python2.7/site-packages/MulticoreTSNE/libtsne_multicore.so'

I've searched everywhere, but no one has a consistent answer. Has anyone else run into this?

N jobs doesn't affect number of cores used

I downloaded and installed as per instructions, however on my 2017 MacBook Pro running High Sierra, the test script using MNIST always uses 1 core, regardless of the n_jobs parameter passed in.

n_components=2

Hi

Is there any roadmap for allowing higher dimensionality? t-SNE can also be used to reduce the dimension of datasets, e.g. from 200 down to 10. Being able to do this with something much faster than sklearn would be really cool.

Thanks

Ian

other metrics

Hi!

Is it possible to add other metrics? Some data are not well suited to the default 'euclidean' one.

Many thanks!
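Until other metrics are supported, one common workaround is to transform the data so that Euclidean distance encodes the metric you want. This is not a feature of this library. For cosine distance, L2-normalizing each row works because, for unit vectors, ||a - b||^2 = 2 - 2 cos(a, b). Euclidean nearest neighbours of the normalized rows are therefore exactly the cosine nearest neighbours.

A sketch follows; the helper name `l2_normalize_rows` is hypothetical.

```python
import numpy as np


def l2_normalize_rows(X, eps=1e-12):
    """Scale each row to unit L2 norm (rows of all zeros stay all zeros)."""
    X = np.asarray(X, dtype=np.float64)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Xn = l2_normalize_rows(X)

# Check the identity ||a - b||^2 == 2 - 2*cos(a, b) on the first two rows.
a, b = Xn[0], Xn[1]
cos_ab = X[0] @ X[1] / (np.linalg.norm(X[0]) * np.linalg.norm(X[1]))
print(np.isclose(np.sum((a - b) ** 2), 2 - 2 * cos_ab))  # True
```

Xn can then be passed to fit_transform in place of X. Other metrics need their own transform, or cannot be expressed this way at all.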

Will there be a version for Windows?

Whilst the compilation and installation worked fine on Windows 8.1, running the code in Python results in

OSError: cannot load library \lib\site-packages\MulticoreTSNE/libtsne_multicore.so: error 0x7e

I guess that's because Windows expects a DLL rather than a .so library. Unfortunately my CMake skills are not sufficient to adjust the current build instructions to also produce a .dll on Windows, so I'm hoping that someone might fix that.

Feature Request: Output

I would like to request some form of status output.
For example, I have access to a machine with 40 cores and 100 GB of RAM and have been running Multicore-TSNE for a few days. It would be nice to get some output every now and again.
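Since the gradient-descent step is an iterative loop, status output mostly amounts to reporting every k iterations. The sketch below is a generic loop with hypothetical names (optimize, step, report_every), not this library's code; it prints the iteration, the current cost, the elapsed time, and a naive linear ETA.

```python
import time


def optimize(step, n_iter, report_every=50, log=print):
    """Run `step(i)` n_iter times, logging progress every `report_every` iterations.

    `step` is a hypothetical callable performing one gradient-descent update and
    returning the current cost (e.g. the KL divergence).
    """
    start = time.perf_counter()
    cost = None
    for i in range(1, n_iter + 1):
        cost = step(i)
        if i % report_every == 0 or i == n_iter:
            elapsed = time.perf_counter() - start
            eta = elapsed / i * (n_iter - i)  # assumes iterations take roughly equal time
            log(f"iter {i}/{n_iter}  cost={cost:.4f}  elapsed={elapsed:.1f}s  eta={eta:.1f}s")
    return cost
```

Passing log as a parameter keeps the loop testable and lets the output go to a file or logger on long runs.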

License Clarification

Hi @DmitryUlyanov, thanks for this library. I was looking at the license and it looks like the copyright year and name aren't filled out.

Copyright (c) [year], [fullname]

In the source code, there are a few headers that mention:

 *  Created by Laurens van der Maaten.
 *  Copyright 2012, Delft University of Technology. All rights reserved.
 *
 *  Multicore version by Dmitry Ulyanov, 2016. [email protected]

Also, the README says the license is inherited from bhtsne, but that repo uses the original 4-clause BSD license, while this one uses the 3-clause BSD license.

Can you provide some clarification on the licensing?

macOS Sierra - add linker flags

Building on macOS 10.12.6: in link.txt, generated by the makefile, I needed to add the following flags to get it to link:
-lc++ -lstdc++

segmentation fault (core dumped) sometimes

Hello

I get "segmentation fault (core dumped)" when I run Multicore-TSNE. One case where the crash occurs is when the input data contains lots of zeros. Is there any fix for this problem?
Thanks

from MulticoreTSNE import MulticoreTSNE as TSNE
import numpy as np
tsne = TSNE(n_jobs=40, perplexity=30)
tsne.fit(np.zeros([5,3]))
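The reproduction above has unusual properties worth checking before calling fit: every row is identical (all zeros), so all pairwise distances are zero and every column has zero variance; in addition, with only 5 points a perplexity of 30 exceeds the limit discussed in the next issue. Whether either is the root cause of the segfault is not confirmed here. A hypothetical pre-flight check in plain Python:

```python
def duplicate_rows(rows):
    """Hypothetical check: number of rows that duplicate an earlier row."""
    return len(rows) - len({tuple(r) for r in rows})


def zero_variance_columns(rows):
    """Indices of columns whose values are all identical."""
    return [j for j, col in enumerate(zip(*rows)) if len(set(col)) == 1]


zeros = [[0.0, 0.0, 0.0] for _ in range(5)]
print(duplicate_rows(zeros))         # 4
print(zero_variance_columns(zeros))  # [0, 1, 2]
```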

KL Divergence Not Provided and Perplexity Range

This is a great work! The speed is impressive!

Meanwhile, I notice that this version does not provide the KL divergence attribute, right? In scikit-learn, you can easily get it from

tsne.kl_divergence_
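For reference, the value scikit-learn reports is the Kullback-Leibler divergence between the input affinities P and the output affinities Q after optimization, KL(P || Q) = sum over pairs of p_ij * log(p_ij / q_ij). A minimal sketch for small, already-normalized affinity lists (computing P and Q themselves is out of scope here; kl_divergence is a hypothetical helper):

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) for two sequences of probabilities over the same pairs.

    Pairs with p == 0 contribute 0 by convention; q must be > 0 wherever p > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Identical distributions give 0.0; for example, P = [1, 0] against Q = [0.5, 0.5] gives log 2.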

Also, it seems that the perplexity must be smaller than 1/3 of the number of data points. Is there any way to use a larger perplexity?
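The 1/3 limit appears to come from the Barnes-Hut approximation in bhtsne, which (as far as I can tell from that code) uses K = 3 * perplexity nearest neighbours per point and rejects inputs where N - 1 < 3 * perplexity. Assuming this implementation inherits that rule, the largest usable perplexity is (N - 1) / 3. The helpers below are hypothetical:

```python
import math


def max_perplexity(n_points):
    """Largest perplexity allowed if the rule is 3 * perplexity <= n_points - 1."""
    return (n_points - 1) / 3


def check_perplexity(perplexity, n_points):
    """Raise ValueError if there are too few points for 3 * perplexity neighbours."""
    if 3 * perplexity > n_points - 1:
        needed = math.ceil(3 * perplexity) + 1
        raise ValueError(
            f"perplexity={perplexity} needs at least {needed} points, got {n_points}; "
            f"use perplexity <= {max_perplexity(n_points):.2f}"
        )
```

For example, perplexity 30 needs at least 91 points.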

Squared euclidean distance breaks VPTree

Squared Euclidean distance cannot be used for VPTree search, so sqrt() should be applied to the result, either in https://github.com/DmitryUlyanov/Multicore-TSNE/blob/master/multicore_tsne/vptree.h#L71
or in https://github.com/DmitryUlyanov/Multicore-TSNE/blob/master/multicore_tsne/vptree.h#L206

Using squared Euclidean distance in the VPTree causes the search to miss some of the k nearest points.
See lvdmaaten/bhtsne#41 (comment) and http://stevehanov.ca/blog/index.php?id=130

"It is worth repeating that you must use a distance metric that satisfies the triangle inequality. I spent a lot of time wondering why my VP tree was not working. It turns out that I had not bothered to find the square root in the distance calculation. This step is important to satisfy the requirements of a metric space, because if the straight line distance to a <= b+c, it does not necessarily follow that a^2 <= b^2 + c^2."

Omitting sqrt in the VPTree search only seems to improve performance because the search skips branches of the tree it should visit. You can verify this by running t-SNE with both metrics from the same initial coordinates: the outputs differ slightly. I have done this in lvdmaaten/bhtsne#41 (comment).
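A one-dimensional example makes the quoted point concrete: for the points 0, 1 and 2, plain distances satisfy the triangle inequality, but squared distances violate it, which is what lets a VP-tree prune branches it should have searched. The function names are illustrative only.

```python
def dist(a, b):
    return abs(a - b)


def sq_dist(a, b):
    return (a - b) ** 2


def triangle_ok(d, x, y, z):
    """Check the triangle inequality d(x, z) <= d(x, y) + d(y, z)."""
    return d(x, z) <= d(x, y) + d(y, z)


print(triangle_ok(dist, 0, 1, 2))     # True:  2 <= 1 + 1
print(triangle_ok(sq_dist, 0, 1, 2))  # False: 4 >  1 + 1
```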

Build currently fails (error: ‘mean_y’ was not declared in this scope)

When I pull today's trunk and attempt to install, I get the following error:

/usr/bin/make -f CMakeFiles/tsne_multicore.dir/build.make CMakeFiles/tsne_multicore.dir/build
make[2]: Entering directory '/home/jorvis/git/Multicore-TSNE/build/temp.linux-x86_64-3.7'
[ 33%] Building CXX object CMakeFiles/tsne_multicore.dir/splittree.cpp.o
/usr/bin/c++  -Dtsne_multicore_EXPORTS  -Wall -fopenmp -O3 -DNDEBUG -O3 -fPIC -ffast-math -funroll-loops -fPIC   -o CMakeFiles/tsne_multicore.dir/splittree.cpp.o -c /home/jorvis/git/Multicore-TSNE/multicore_tsne/splittree.cpp
/home/jorvis/git/Multicore-TSNE/multicore_tsne/splittree.cpp: In member function ‘void SplitTree::subdivide()’:
/home/jorvis/git/Multicore-TSNE/multicore_tsne/splittree.cpp:197:18: error: ‘mean_y’ was not declared in this scope
         delete[] mean_y;
                  ^
CMakeFiles/tsne_multicore.dir/build.make:65: recipe for target 'CMakeFiles/tsne_multicore.dir/splittree.cpp.o' failed
make[2]: *** [CMakeFiles/tsne_multicore.dir/splittree.cpp.o] Error 1
make[2]: Leaving directory '/home/jorvis/git/Multicore-TSNE/build/temp.linux-x86_64-3.7'
CMakeFiles/Makefile2:70: recipe for target 'CMakeFiles/tsne_multicore.dir/all' failed
make[1]: *** [CMakeFiles/tsne_multicore.dir/all] Error 2
make[1]: Leaving directory '/home/jorvis/git/Multicore-TSNE/build/temp.linux-x86_64-3.7'
Makefile:86: recipe for target 'all' failed
make: *** [all] Error 2

Has anyone got MulticoreTSNE running on Python 3.5?

Hi,

I was trying to get TSNE running on Ubuntu in a Docker container, but I am getting the error message below. Is there any way to get around this? Thanks!

user12@82cf25a0ccd7:~/Multicore-TSNE$ pip install .
File "/miniconda/lib/python3.5/site.py", line 176
file=sys.stderr)
^
SyntaxError: invalid syntax

Cannot find cmake but cmake is installed

I tried both pip install, as in the directions, and running the setup file directly, but it can't find my cmake.

jespinozlt-osx:Multicore-TSNE jespinoz$ ls
MANIFEST.in		README.md		mnist-tsne.png		multicore_tsne		python			requirements.txt	setup.py		torch

jespinozlt-osx:Multicore-TSNE jespinoz$ which cmake
/Users/jespinoz/anaconda/bin/cmake


jespinozlt-osx:Multicore-TSNE jespinoz$ python setup.py install
running install
-- The C compiler identification is AppleClang 8.0.0.8000042
-- The CXX compiler identification is AppleClang 8.0.0.8000042
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Checking if C linker supports --verbose
-- Checking if C linker supports --verbose - no
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Checking if CXX linker supports --verbose
-- Checking if CXX linker supports --verbose - no
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Try OpenMP C flag = [-fopenmp=libomp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [ ]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [/openmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-Qopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-openmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-xopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [+Oopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-qsmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-mp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-fopenmp=libomp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [ ]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [/openmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-Qopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-openmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-xopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [+Oopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-qsmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-mp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
CMake Error at /Users/jespinoz/anaconda/share/cmake-3.6/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
  Could NOT find OpenMP (missing: OpenMP_C_FLAGS OpenMP_CXX_FLAGS)
Call Stack (most recent call first):
  /Users/jespinoz/anaconda/share/cmake-3.6/Modules/FindPackageHandleStandardArgs.cmake:388 (_FPHSA_FAILURE_MESSAGE)
  /Users/jespinoz/anaconda/share/cmake-3.6/Modules/FindOpenMP.cmake:234 (find_package_handle_standard_args)
  CMakeLists.txt:6 (FIND_PACKAGE)


-- Configuring incomplete, errors occurred!
See also "/Users/jespinoz/Multicore-TSNE/multicore_tsne/release/CMakeFiles/CMakeOutput.log".
See also "/Users/jespinoz/Multicore-TSNE/multicore_tsne/release/CMakeFiles/CMakeError.log".
cannot find cmake

installation fails on macOS

I followed the instructions on this link to get a version of gcc that supports OpenMP,

but it looks like the install script isn't using gcc:

-- The C compiler identification is AppleClang 8.0.0.8000038
-- The CXX compiler identification is AppleClang 8.0.0.8000038
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc

help?
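CMake picks the C and C++ compilers from the CC and CXX environment variables the first time a build directory is configured, so one way to make the install script use GCC instead of AppleClang is to set them before building (and delete any stale build directory first). The sketch below assumes Homebrew-style versioned names such as gcc-9 and g++-9; adjust them to whatever your GCC installation provides. find_gcc and gcc_env are hypothetical helpers.

```python
import os
import shutil

# Assumed Homebrew-style versioned names; adjust to your installation.
CANDIDATE_VERSIONS = ("13", "12", "11", "10", "9", "8")


def find_gcc(which=shutil.which, versions=CANDIDATE_VERSIONS):
    """Return (cc, cxx) paths for the first versioned GCC found, or None."""
    for v in versions:
        cc, cxx = which(f"gcc-{v}"), which(f"g++-{v}")
        if cc and cxx:
            return cc, cxx
    return None


def gcc_env(base_env=None, which=shutil.which):
    """Copy of the environment with CC/CXX pointing at GCC, for the build step."""
    found = find_gcc(which)
    if found is None:
        raise RuntimeError("no versioned gcc/g++ found on PATH")
    env = dict(os.environ if base_env is None else base_env)
    env["CC"], env["CXX"] = found
    return env
```

The resulting environment can be passed to the build, e.g. subprocess.run([sys.executable, "-m", "pip", "install", "."], env=gcc_env(), check=True).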

Unable to set n_components to anything other than 2

When I try to set n_components = 3:
ts_2 = TSNE(n_components=3, n_jobs=4, perplexity=100, random_state=5, verbose=2)

This is the error message I get:
assert n_components == 2, 'n_components should be 2'
AssertionError: n_components should be 2

Failed building wheel for MulticoreTSNE

Hello,

I'm trying to install MulticoreTSNE using pip install MulticoreTSNE, but I get the following error:

/Users/jason/opt/miniconda3/lib/python3.8/site-packages/cmake/data/CMake.app/Contents/bin/cmake -E cmake_progress_start /private/var/folders/st/t3rpc2cn5m3c761j79yj7dm40000gn/T/pip-req-build-y8q7m_a5/build/temp.macosx-10.9-x86_64-3.8/CMakeFiles 0
  installing to build/bdist.macosx-10.9-x86_64/wheel
  running install
  running install_lib
  creating build/bdist.macosx-10.9-x86_64
  creating build/bdist.macosx-10.9-x86_64/wheel
  creating build/bdist.macosx-10.9-x86_64/wheel/MulticoreTSNE
  copying build/lib.macosx-10.9-x86_64-3.8/MulticoreTSNE/libtsne_multicore.so -> build/bdist.macosx-10.9-x86_64/wheel/MulticoreTSNE
  creating build/bdist.macosx-10.9-x86_64/wheel/MulticoreTSNE/tests
  copying build/lib.macosx-10.9-x86_64-3.8/MulticoreTSNE/tests/__init__.py -> build/bdist.macosx-10.9-x86_64/wheel/MulticoreTSNE/tests
  copying build/lib.macosx-10.9-x86_64-3.8/MulticoreTSNE/tests/test_base.py -> build/bdist.macosx-10.9-x86_64/wheel/MulticoreTSNE/tests
  copying build/lib.macosx-10.9-x86_64-3.8/MulticoreTSNE/__init__.py -> build/bdist.macosx-10.9-x86_64/wheel/MulticoreTSNE
  running install_egg_info
  Copying MulticoreTSNE.egg-info to build/bdist.macosx-10.9-x86_64/wheel/MulticoreTSNE-0.1-py3.8.egg-info
  running install_scripts
  [WARNING] This wheel needs a higher macOS version than the version your Python interpreter is compiled against.  To silence this warning, set MACOSX_DEPLOYMENT_TARGET to at least 11_0 or recreate these files with lower MACOSX_DEPLOYMENT_TARGET:
  build/bdist.macosx-10.9-x86_64/wheel/MulticoreTSNE/libtsne_multicore.so
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/private/var/folders/st/t3rpc2cn5m3c761j79yj7dm40000gn/T/pip-req-build-y8q7m_a5/setup.py", line 74, in <module>
      setup(
    File "/Users/jason/opt/miniconda3/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
      return distutils.core.setup(**attrs)
    File "/Users/jason/opt/miniconda3/lib/python3.8/distutils/core.py", line 148, in setup
      dist.run_commands()
    File "/Users/jason/opt/miniconda3/lib/python3.8/distutils/dist.py", line 966, in run_commands
      self.run_command(cmd)
    File "/Users/jason/opt/miniconda3/lib/python3.8/distutils/dist.py", line 985, in run_command
      cmd_obj.run()
    File "/Users/jason/opt/miniconda3/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 328, in run
      impl_tag, abi_tag, plat_tag = self.get_tag()
    File "/Users/jason/opt/miniconda3/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 278, in get_tag
      assert tag in supported_tags, "would build wheel with unsupported tag {}".format(tag)
  AssertionError: would build wheel with unsupported tag ('cp38', 'cp38', 'macosx_11_0_x86_64')
  ----------------------------------------
  ERROR: Failed building wheel for MulticoreTSNE

The version of python is 3.8.5, and the operating system is macOS Big Sur.
Any suggestions would be appreciated.
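The warning in the log suggests setting MACOSX_DEPLOYMENT_TARGET. One option, untested here, is to rebuild with the deployment target your Python interpreter was compiled against (the macosx-10.9 build directory names suggest 10.9), so that the wheel tag matches a supported one. On macOS, sysconfig.get_config_var("MACOSX_DEPLOYMENT_TARGET") reports that target; build_env is a hypothetical helper and the 10.9 fallback is an assumption.

```python
import os
import sysconfig


def build_env(default_target="10.9", base_env=None):
    """Environment for pip/setup.py with MACOSX_DEPLOYMENT_TARGET set.

    Uses the target this Python was built with when available (macOS),
    otherwise the assumed `default_target`.
    """
    target = sysconfig.get_config_var("MACOSX_DEPLOYMENT_TARGET") or default_target
    env = dict(os.environ if base_env is None else base_env)
    env["MACOSX_DEPLOYMENT_TARGET"] = str(target)
    return env
```

For example: subprocess.run([sys.executable, "-m", "pip", "install", "--no-cache-dir", "MulticoreTSNE"], env=build_env(), check=True).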

cmake required

Make sure you have cmake installed; otherwise the build will silently fail. This should be in the docs.
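A pre-build check can turn that silent failure into an explicit error. require_cmake is a hypothetical helper built on shutil.which; the cmake package on PyPI (visible under site-packages in another log on this page) is one way to install it.

```python
import shutil


def require_cmake(which=shutil.which):
    """Fail loudly instead of silently if cmake is not on PATH (hypothetical pre-build check)."""
    path = which("cmake")
    if path is None:
        raise RuntimeError(
            "cmake was not found on PATH; install it (e.g. 'pip install cmake' "
            "or your system package manager) before building MulticoreTSNE"
        )
    return path
```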

TSNE results in dense centred ball

What could cause t-SNE to produce a dense centred ball, and where should I look for a solution?
I have 650K images with perplexity 50, angle 0.1, learning rate 1000.

Other attempts with similar sized batch of images did not exhibit this behaviour.

Any thoughts?
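One thing worth checking, with no guarantee that it explains this case: for large datasets a fixed learning rate can be far too small. scikit-learn's learning_rate='auto' uses the heuristic max(N / early_exaggeration / 4, 50). For N = 650,000 and scikit-learn's default early exaggeration of 12 (this library's default may differ), that gives roughly 13,500, compared with the 1000 used here.

```python
def auto_learning_rate(n_samples, early_exaggeration=12.0):
    """scikit-learn's 'auto' learning-rate heuristic: max(N / early_exaggeration / 4, 50)."""
    return max(n_samples / early_exaggeration / 4, 50.0)


print(round(auto_learning_rate(650_000)))  # 13542
```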

html_tsne

test fails with "OSError: cannot load library"

OS: Ubuntu
python version: Python 3.6.4 :: Anaconda, Inc.

steps to reproduce:

  1. python MulticoreTSNE/examples/test.py

full error:

downloading MNIST
downloaded
Traceback (most recent call last):
  File "MulticoreTSNE/examples/test.py", line 81, in <module>
    tsne = TSNE(n_jobs=int(args.n_jobs), verbose=1, n_components=args.n_components, random_state=660)
  File "/home/marder/anaconda3/envs/mctsne/lib/python3.6/site-packages/MulticoreTSNE/__init__.py", line 63, in __init__
    self.C = self.ffi.dlopen(path + "/libtsne_multicore.so")
  File "/home/marder/anaconda3/envs/mctsne/lib/python3.6/site-packages/cffi/api.py", line 141, in dlopen
    lib, function_cache = _make_ffi_library(self, name, flags)
  File "/home/marder/anaconda3/envs/mctsne/lib/python3.6/site-packages/cffi/api.py", line 802, in _make_ffi_library
    backendlib = _load_backend_lib(backend, libname, flags)
  File "/home/marder/anaconda3/envs/mctsne/lib/python3.6/site-packages/cffi/api.py", line 797, in _load_backend_lib
    raise OSError(msg)
OSError: cannot load library '/home/marder/anaconda3/envs/mctsne/lib/python3.6/site-packages/MulticoreTSNE/libtsne_multicore.so': /home/marder/anaconda3/envs/mctsne/lib/python3.6/site-packages/MulticoreTSNE/libtsne_multicore.so: undefined symbol: _ZNSt8ios_base4InitD1Ev.  Additionally, ctypes.util.find_library() did not manage to locate a library called '/home/marder/anaconda3/envs/mctsne/lib/python3.6/site-packages/MulticoreTSNE/libtsne_multicore.so'
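The undefined symbol _ZNSt8ios_base4InitD1Ev demangles to std::ios_base::Init::~Init(), which lives in the C++ standard library, so the compiled .so apparently was not linked against libstdc++ (compare the macOS issue above, where -lc++ -lstdc++ had to be added at link time). Relinking is the real fix. As an untested stopgap, loading libstdc++ with RTLD_GLOBAL before importing the package can make its symbols available when the library is opened; preload_libstdcxx is a hypothetical helper.

```python
import ctypes
import ctypes.util


def preload_libstdcxx():
    """Try to load libstdc++ globally; return True if it was found and loaded.

    Stopgap only: assumes the missing symbols come from libstdc++.
    """
    name = ctypes.util.find_library("stdc++")
    if name is None:
        return False
    try:
        ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("libstdc++ preloaded:", preload_libstdcxx())
    # Import MulticoreTSNE only after the preload, e.g.:
    # from MulticoreTSNE import MulticoreTSNE as TSNE
```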

Use option `random_state` / Set seed via `srand (random_state);`

Hi Dmitry,

Currently, the random_state option is ignored, so every t-SNE plot looks different. Would you consider seeding the initialization as described above when random_state != None? If you want, I can make a pull request for that.

Cheers,
Alex
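The request amounts to drawing the initial embedding from a seeded generator whenever random_state is given. In the C++ code that would be the srand(random_state) call from the title; the Python sketch below shows the same idea with a local random.Random instance. initial_embedding is hypothetical, and the small Gaussian initialization scale of 1e-4 is an assumption modelled on bhtsne-style code.

```python
import random


def initial_embedding(n_points, n_components=2, random_state=None, scale=1e-4):
    """Small Gaussian initial coordinates; reproducible when random_state is an int."""
    rng = random.Random(random_state)  # None -> seeded from system entropy
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_components)]
            for _ in range(n_points)]
```

Using a local generator rather than a global seed keeps the initialization reproducible without affecting other code that uses the global random state.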

Feature request: saving transform at intermediate iterations

I'd like to save the transformation at some range of intermediate iterations (or even every iteration if possible).

Something like this. Specifically for this sort of example animation.

Right now this is sort of possible by setting the random state and init and running from scratch each time with a different niter, but that's not exactly right.
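Saving intermediate embeddings needs a hook inside the optimization loop rather than reruns with different iteration counts. A generic sketch with a hypothetical callback (none of these names are this library's API): the loop hands a copy of the current coordinates to the callback every save_every iterations.

```python
import copy


def run_with_snapshots(update, embedding, n_iter, save_every, on_snapshot):
    """Apply `update(embedding, i)` in place for n_iter iterations.

    Every `save_every` iterations, pass (i, deep copy of embedding) to
    `on_snapshot`, so later in-place updates do not alter saved frames.
    """
    for i in range(1, n_iter + 1):
        update(embedding, i)
        if i % save_every == 0:
            on_snapshot(i, copy.deepcopy(embedding))
    return embedding
```

For an animation, on_snapshot could write each frame to disk; copying matters because the optimizer updates the coordinates in place.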

Unable to install using Pip

I am unable to install using "pip install .": it fails at "Running setup.py bdist_wheel for MulticoreTSNE ... error" with "CMake Error at CMakeLists.txt:1 ... Failed to run MSBuild command", and then the build fails. I am using cmake version 3.11.0-rc4 and Microsoft .NET Framework v4.0.30319.
I am getting this error:
$ pip install .
Processing c:\users\deep chatterjee\multicore-tsne
Requirement already satisfied: numpy in e:\anaconda3\lib\site-packages (from MulticoreTSNE==0.1) (1.14.5)
Requirement already satisfied: cffi in e:\anaconda3\lib\site-packages (from MulticoreTSNE==0.1) (1.10.0)
Requirement already satisfied: pycparser in e:\anaconda3\lib\site-packages (from cffi->MulticoreTSNE==0.1) (2.18)
Building wheels for collected packages: MulticoreTSNE
Running setup.py bdist_wheel for MulticoreTSNE: started
Running setup.py bdist_wheel for MulticoreTSNE: finished with status 'error'
Complete output from command E:\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-wheel-c0r7vyex --python-tag cp36:
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-3.6
creating build\lib.win-amd64-3.6\MulticoreTSNE
copying MulticoreTSNE\__init__.py -> build\lib.win-amd64-3.6\MulticoreTSNE
creating build\lib.win-amd64-3.6\MulticoreTSNE\tests
copying MulticoreTSNE\tests\test_base.py -> build\lib.win-amd64-3.6\MulticoreTSNE\tests
copying MulticoreTSNE\tests\__init__.py -> build\lib.win-amd64-3.6\MulticoreTSNE\tests
running egg_info
creating MulticoreTSNE.egg-info
writing MulticoreTSNE.egg-info\PKG-INFO
writing dependency_links to MulticoreTSNE.egg-info\dependency_links.txt
writing requirements to MulticoreTSNE.egg-info\requires.txt
writing top-level names to MulticoreTSNE.egg-info\top_level.txt
writing manifest file 'MulticoreTSNE.egg-info\SOURCES.txt'
reading manifest file 'MulticoreTSNE.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'MulticoreTSNE.egg-info\SOURCES.txt'
running build_ext
cmake version 3.11.0-rc4

CMake suite maintained and supported by Kitware (kitware.com/cmake).
-- Building for: Visual Studio 10 2010
CMake Error at CMakeLists.txt:1 (PROJECT):
Failed to run MSBuild command:

  C:/Windows/Microsoft.NET/Framework/v4.0.30319/MSBuild.exe

to get the value of VCTargetsPath:

  Microsoft (R) Build Engine version 4.6.1055.0
  [Microsoft .NET Framework, version 4.0.30319.42000]
  Copyright (C) Microsoft Corporation. All rights reserved.

  Build started 02-08-2018 17:04:52.
  Project "C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj" on node 1 (default targets).
  C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj(14,2): error MSB4019: The imported project "C:\Microsoft.Cpp.Default.props" was not found. Confirm that the path in the <Import> declaration is correct, and that the file exists on disk.
  Done Building Project "C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj" (default targets) -- FAILED.

  Build FAILED.

  "C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj" (default target) (1) ->
    C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj(14,2): error MSB4019: The imported project "C:\Microsoft.Cpp.Default.props" was not found. Confirm that the path in the <Import> declaration is correct, and that the file exists on disk.

      0 Warning(s)
      1 Error(s)

  Time Elapsed 00:00:00.06


Exit code: 1

-- Configuring incomplete, errors occurred!
See also "C:/Users/Public/Documents/Wondershare/CreatorTemp/pip-req-build-fc9af9iu/build/temp.win-amd64-3.6/Release/CMakeFiles/CMakeOutput.log".

ERROR: Cannot generate Makefile. See above errors.


Failed building wheel for MulticoreTSNE
Running setup.py clean for MulticoreTSNE
Failed to build MulticoreTSNE
Installing collected packages: MulticoreTSNE
Running setup.py install for MulticoreTSNE: started
Running setup.py install for MulticoreTSNE: finished with status 'error'
Complete output from command E:\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-record-rni883bh\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.6
creating build\lib.win-amd64-3.6\MulticoreTSNE
copying MulticoreTSNE\__init__.py -> build\lib.win-amd64-3.6\MulticoreTSNE
creating build\lib.win-amd64-3.6\MulticoreTSNE\tests
copying MulticoreTSNE\tests\test_base.py -> build\lib.win-amd64-3.6\MulticoreTSNE\tests
copying MulticoreTSNE\tests\__init__.py -> build\lib.win-amd64-3.6\MulticoreTSNE\tests
running egg_info
writing MulticoreTSNE.egg-info\PKG-INFO
writing dependency_links to MulticoreTSNE.egg-info\dependency_links.txt
writing requirements to MulticoreTSNE.egg-info\requires.txt
writing top-level names to MulticoreTSNE.egg-info\top_level.txt
reading manifest file 'MulticoreTSNE.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'MulticoreTSNE.egg-info\SOURCES.txt'
running build_ext
cmake version 3.11.0-rc4

CMake suite maintained and supported by Kitware (kitware.com/cmake).
-- Building for: Visual Studio 10 2010
CMake Error at CMakeLists.txt:1 (PROJECT):
  Failed to run MSBuild command:

    C:/Windows/Microsoft.NET/Framework/v4.0.30319/MSBuild.exe

  to get the value of VCTargetsPath:

    Microsoft (R) Build Engine version 4.6.1055.0
    [Microsoft .NET Framework, version 4.0.30319.42000]
    Copyright (C) Microsoft Corporation. All rights reserved.

    Build started 02-08-2018 17:04:57.
    Project "C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj" on node 1 (default targets).
    C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj(14,2): error MSB4019: The imported project "C:\Microsoft.Cpp.Default.props" was not found. Confirm that the path in the <Import> declaration is correct, and that the file exists on disk.
    Done Building Project "C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj" (default targets) -- FAILED.

    Build FAILED.

    "C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj" (default target) (1) ->
      C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\build\temp.win-amd64-3.6\Release\CMakeFiles\3.11.0-rc4\VCTargetsPath.vcxproj(14,2): error MSB4019: The imported project "C:\Microsoft.Cpp.Default.props" was not found. Confirm that the path in the <Import> declaration is correct, and that the file exists on disk.

        0 Warning(s)
        1 Error(s)

    Time Elapsed 00:00:00.04


  Exit code: 1



-- Configuring incomplete, errors occurred!
See also "C:/Users/Public/Documents/Wondershare/CreatorTemp/pip-req-build-fc9af9iu/build/temp.win-amd64-3.6/Release/CMakeFiles/CMakeOutput.log".

ERROR: Cannot generate Makefile. See above errors.

----------------------------------------

Command "E:\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-record-rni883bh\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\Public\Documents\Wondershare\CreatorTemp\pip-req-build-fc9af9iu\

Thanks in advance

Non-verbose crash on input containing NaN

Hello and thank you for creating this library!

Some of my students got stuck using this library: they passed data in which they had forgotten to impute NaN values, and the library simply crashed without any message explaining the nature of the error.

I will be making a pull request shortly to address this case and to add an exception that explains what needs to be fixed.
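Until such a check exists in the library, a pre-flight validation along these lines gives a readable error instead of a crash. check_finite is a hypothetical plain-Python helper; with NumPy arrays, numpy.isfinite(X).all() performs the same test in one call.

```python
import math


def check_finite(rows):
    """Raise ValueError naming the first row/column holding NaN or infinity."""
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isfinite(value):
                raise ValueError(
                    f"input contains {value!r} at row {i}, column {j}; "
                    "impute or drop missing values before calling fit()"
                )
```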

thread exception

Hello.
Thanks for your work.
I'm trying to use your library, but I get this exception:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.5/dist-packages/MulticoreTSNE/__init__.py", line 20, in run
    self._target(*self._args)
TypeError: an integer is required
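The traceback ends inside the thread that calls into the compiled library, and "an integer is required" is the kind of error raised when a float reaches a C integer parameter. Which argument triggered it is not shown, so this is only a guess: passing parameters such as n_jobs or n_iter as Python ints (n_iter=1000 rather than 1e3) is worth trying. as_int is a hypothetical helper that coerces integral values and rejects the rest.

```python
import numbers


def as_int(name, value):
    """Return value as a Python int, accepting integral floats like 1e3; reject others."""
    if isinstance(value, bool):
        raise TypeError(f"{name} must be an integer, got a bool")
    if isinstance(value, numbers.Integral):
        return int(value)
    if isinstance(value, numbers.Real) and float(value).is_integer():
        return int(value)
    raise TypeError(f"{name} must be an integer, got {value!r}")
```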
