Gradient Dimensionality Reduction; an implementation of tSNE, UMAP, and GDR

This repository accompanies the paper "ActUp: Analyzing and Consolidating tSNE and UMAP", written by Andrew Draganov, Jakob Rødsgaard Jørgensen, and Katrine Scheel Nellemann.

Overview

This library contains simplified, standalone implementations of TSNE and UMAP, along with our generalization of both (GDR), which can recreate either one by flipping the normalization. We also support multiple backends. The following implementations are available:

  • UMAP
    • numba
    • cython
  • TSNE
    • cython
  • GDR (which can recreate both TSNE and UMAP embeddings)
    • numba
    • cython
    • gpu
    • pytorch (cpu/gpu)

On our machine, the cython implementation performs the gradient updates almost an order of magnitude faster than the numba one. The gpu code is simply a bare-bones translation of the cython code into CUDA.

We have tried to trim as much fat as possible from these algorithms, which should make the methods easier to extend for future research and design. As a result, some features are missing, such as UMAP's ability to call .fit() and then .transform() (we only support .fit_transform()). Additionally, we default to the NNDescent nearest-neighbor algorithm in all cases.

Despite this, we added the option to toggle every relevant hyperparameter between the UMAP and TSNE algorithms. For example, you can run TSNE with UMAP's pseudo-distance metric and normalization, or UMAP with TSNE's symmetrization and the standard Euclidean distance metric.
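
As a rough sketch of what such a mix-and-match call might look like (only optimize_method, cython, and normalized are documented in this README; treat the exact meaning of normalized here as an assumption and check the constructor's signature for the full set of flags):

from GDR import GradientDR

# Illustrative sketch: TSNE-style optimization with UMAP-style (unnormalized)
# similarities. Assumption: normalized=True corresponds to TSNE's normalization
# and normalized=False to UMAP's, as suggested in the Usage section below.
dr = GradientDR(optimize_method='tsne', cython=True, normalized=False)
dr.fit_transform(dataset)  # dataset: an (n_samples, n_features) numpy array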

Installation

Numba install

If you'd like to install just the numba versions of UMAP and GDR, then you are good to go with a simple

pip install GradientDR

You can then use it by calling

from GDR import GradientDR
dr = GradientDR()
dr.fit_transform(dataset)
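
For a more complete, illustrative example, assuming fit_transform follows the scikit-learn convention and returns the embedding, and using scikit-learn's digits dataset as a stand-in for your own (n_samples, n_features) array:

from sklearn.datasets import load_digits
from GDR import GradientDR

# Any (n_samples, n_features) numpy array works; digits is just a small example
dataset = load_digits().data  # shape (1797, 64)

dr = GradientDR()  # GDR is the default optimize_method (see Usage below)
embedding = dr.fit_transform(dataset)
print(embedding.shape)  # (n_samples, embedding_dim)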

Cython install

The cython backend requires OpenMP support. We must compile it against clang++, as g++ does not handle memory allocation correctly when parallelizing.

  • Clone the repository and cd into it
  • Export your version of clang++ that has OpenMP support to the environment variable CLANG_PATH.
    • export CLANG_PATH=clang++ should work on Linux machines
    • Mac users must first install LLVM with OpenMP support and set the CLANG_PATH variable to point to that clang++ compiler
  • Run python setup_cython.py build_ext --inplace from the base directory in a python>=3.8 venv or conda environment

Run the cython implementations by

from GDR import GradientDR
dr = GradientDR(cython=True)
dr.fit_transform(dataset)

GPU install

  • Clone the repository and cd into it
  • Run make install_cuda_code from the base directory in a python>=3.8 venv or conda environment
    • We have tested with CUDA 11, 11.3, 11.5, and 11.6.
    • This assumes that the nvcc environment variable points to your default nvcc binary. If it is not set, you must first export it:
      • export nvcc=/usr/local/cuda-11.X/bin/nvcc

Run the gpu implementations by

from GDR import GradientDR
dr = GradientDR(gpu=True)
dr.fit_transform(dataset)

Usage

You can set up a model to run each algorithm by the following constructors:

  • UMAP -- dr = GradientDR(optimize_method='umap')
  • TSNE -- dr = GradientDR(optimize_method='tsne', cython=True)
    • Requires cython=True, as TSNE's Barnes-Hut trees cannot easily be implemented in numba
  • GDR -- dr = GradientDR(optimize_method='gdr')
    • This is the default, as it can reproduce both UMAP and TSNE embeddings by toggling the normalized parameter (see the sketch below)
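
As a sketch of that last point (assuming, as in the paper, that normalized=True gives tSNE-style normalization and normalized=False gives UMAP-style behaviour; dataset is again an (n_samples, n_features) numpy array):

from GDR import GradientDR

# GDR reproducing a tSNE-like embedding (normalized similarities)
tsne_like = GradientDR(optimize_method='gdr', normalized=True).fit_transform(dataset)

# GDR reproducing a UMAP-like embedding (unnormalized similarities)
umap_like = GradientDR(optimize_method='gdr', normalized=False).fit_transform(dataset)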

Repo name

The repository name comes from the acronym we originally wanted to use, GiDR-DUN: Gradient Dimensionality Reduction; Differences between and UNification of tSNE and UMAP.

Contact

For questions, please raise an issue or email draganovandrew (at) cs.au.dk.

If you use this code, please cite our paper :)
