
distributed-embeddings


distributed-embeddings is a library for building large embedding-based models (e.g. recommenders) in TensorFlow 2. It provides a scalable model parallel wrapper that automatically distributes embedding tables across multiple GPUs, as well as efficient embedding operations that cover and extend TensorFlow's embedding functionality.

Refer to the NVIDIA Developer blog post on Terabyte-scale Recommender Training for more details.

Features

Distributed Model Parallel Wrappers

dist_model_parallel contains tools that enable model parallel training by changing only three lines of your script. It can also be used alongside data parallelism to form hybrid parallel training. Users can easily experiment with large-scale embeddings that exceed a single GPU's memory capacity, without writing complex code to handle cross-worker communication.

To start model parallel training, simply wrap a list of Keras Embedding layers with dist_model_parallel.DistributedEmbedding.
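To give a feel for what distributing embedding tables across GPUs involves, here is a minimal, library-independent sketch in plain Python. It is not the library's placement algorithm: the function name `place_tables`, the size metric, and the greedy largest-first strategy are all assumptions made for illustration.

```python
import heapq


def place_tables(table_sizes, num_workers):
    """Greedily assign each table to the currently least-loaded worker.

    table_sizes: list of table sizes (for example rows * embedding width).
    Returns one list of table indices per worker.
    Hypothetical helper, not part of distributed-embeddings.
    """
    # Min-heap of (current_load, worker_id): the least-loaded worker pops first.
    heap = [(0, worker) for worker in range(num_workers)]
    heapq.heapify(heap)
    placement = [[] for _ in range(num_workers)]

    # Visit the largest tables first so the big ones get spread out early.
    order = sorted(range(len(table_sizes)),
                   key=lambda i: table_sizes[i], reverse=True)
    for idx in order:
        load, worker = heapq.heappop(heap)
        placement[worker].append(idx)
        heapq.heappush(heap, (load + table_sizes[idx], worker))
    return placement


sizes = [100, 40, 30, 20, 10]
placement = place_tables(sizes, num_workers=2)
loads = [sum(sizes[i] for i in tables) for tables in placement]
print(placement, loads)
```

In this toy case the single large table lands on one worker and the four smaller ones on the other, so both workers hold the same total size. A real wrapper also has to route lookups to the worker that owns each table and gather the results, which is the cross-worker communication the paragraph above refers to.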

Embedding Layers

distributed_embeddings.Embedding combines the functionality of tf.keras.layers.Embedding and tf.nn.embedding_lookup_sparse under a unified Keras layer API. The backend is designed to achieve high GPU efficiency.
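The difference between the two lookup styles being unified can be sketched in plain Python. This is a conceptual illustration only, not the layer's implementation or its API: the function `lookup` and its `combiner` argument are hypothetical names chosen for this example.

```python
def lookup(table, ids, combiner=None):
    """Look up embedding rows, optionally reducing several ids per sample.

    table: list of rows, each a list of floats.
    ids: list of ints when combiner is None (one id per sample),
         otherwise a list of id lists (a variable number of ids per sample).
    combiner: None, "sum" or "mean".
    Hypothetical helper for illustration only.
    """
    if combiner is None:
        # One id per sample: a plain embedding lookup.
        return [list(table[i]) for i in ids]
    if combiner not in ("sum", "mean"):
        raise ValueError(f"unsupported combiner: {combiner!r}")

    width = len(table[0])
    outputs = []
    for bag in ids:
        # Several ids per sample: reduce their rows into a single vector.
        acc = [0.0] * width
        for i in bag:
            for d in range(width):
                acc[d] += table[i][d]
        if combiner == "mean" and bag:
            acc = [v / len(bag) for v in acc]
        outputs.append(acc)
    return outputs


table = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
one_hot = lookup(table, [2, 0])
multi_hot_sum = lookup(table, [[0, 1], [2]], combiner="sum")
multi_hot_mean = lookup(table, [[0, 1], [2]], combiner="mean")
```

Without a combiner, each id yields one row. With a combiner, each sample's ids are reduced to one vector, which is the case tf.nn.embedding_lookup_sparse covers.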

Input Key Mapping with IntegerLookup Layers

distributed_embeddings.IntegerLookup extends tf.keras.layers.IntegerLookup's functionality with on-the-fly vocabulary building. This allows users to start training directly from input keys without offline preprocessing. A highly optimized GPU backend is provided along with CPU support.
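As a rough illustration of on-the-fly vocabulary building, the plain-Python sketch below assigns an index to each new key the first time it is seen. The class name `OnTheFlyLookup`, the `max_tokens` parameter, and the choice to reserve index 0 for keys that arrive after the vocabulary is full are assumptions for this example and may differ from the library's behavior.

```python
class OnTheFlyLookup:
    """Map raw integer keys to contiguous indices, growing the vocabulary lazily.

    Hypothetical class for illustration; not the library's IntegerLookup.
    """

    def __init__(self, max_tokens):
        # Index 0 is reserved (by assumption) for keys seen once the table is full,
        # leaving max_tokens - 1 slots for real keys.
        self.max_tokens = max_tokens
        self.vocab = {}

    def __call__(self, keys):
        indices = []
        for key in keys:
            if key not in self.vocab and len(self.vocab) < self.max_tokens - 1:
                # First time this key is seen and there is room: assign the next index.
                self.vocab[key] = len(self.vocab) + 1
            indices.append(self.vocab.get(key, 0))
        return indices


layer = OnTheFlyLookup(max_tokens=4)
first_batch = layer([9001, 17, 9001])
second_batch = layer([17, 555, 42])
```

Repeated keys map to the same index across batches, so the resulting indices can feed an embedding table directly without a separate preprocessing pass over the data.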

See the User Guide for more details.

Installation

Requirements

Python 3, CUDA 11 or newer, TensorFlow 2

Containers

You can build inside the 22.03 or later NGC TF2 image:

Note: building v0.3+ requires Horovod v0.27 and TensorFlow 2.10 or, alternatively, the NGC 23.03 container.

docker pull nvcr.io/nvidia/tensorflow:23.06-tf2-py3

Build from source

After cloning this repository, run:

git submodule update --init --recursive
make pip_pkg && pip install artifacts/*.whl

Test installation with:

python -c "import distributed_embeddings"

You can also run the Synthetic and DLRM examples.

Feedback and Support

If you'd like to contribute to the library directly, see the CONTRIBUTING.md. We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this survey.

If you're interested in learning more about how distributed-embeddings works, see the documentation.
