
Nebullvm

nebullvm is an open-source tool designed to speed up AI inference in just a few lines of code. nebullvm boosts your model to achieve the maximum acceleration that is physically possible on your hardware.

We are building a new AI inference acceleration product that leverages state-of-the-art open-source optimization tools to optimize the whole software-to-hardware stack. If you like the idea, give us a star to support the project ⭐

The core nebullvm workflow consists of 3 steps:

  • Select: input your model in your preferred DL framework and express your preferences regarding:
    • Accuracy loss: do you want to trade off a little accuracy for much higher performance?
    • Optimization time: stellar accelerations can be time-consuming. Can you wait, or do you need an instant answer?
  • Search: nebullvm automatically tests every combination of optimization techniques across the software-to-hardware stack (sparsity, quantization, compilers, etc.) that is compatible with your needs and local hardware.
  • Serve: finally, nebullvm chooses the best configuration of optimization techniques and returns an accelerated version of your model in the DL framework of your choice (just on steroids 🚀).

API quick view

Only a single line of code is needed to get your accelerated model:

import torch
import torchvision.models as models
from nebullvm.api.functions import optimize_model

# Load a ResNet-50 as an example
model = models.resnet50()

# Provide sample input data for the model as a list of
# ((input_tensors, ...), label) tuples
input_data = [((torch.randn(1, 3, 256, 256), ), torch.tensor([0]))]

# Run nebullvm optimization in one line of code
optimized_model = optimize_model(
    model, input_data=input_data, optimization_time="constrained"
)

# Try the optimized model
x = torch.randn(1, 3, 256, 256)
res = optimized_model(x)
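
The preferences from the Select step map to optional arguments of optimize_model. Continuing from the snippet above, here is a minimal sketch of capping the accepted accuracy loss; metric_drop_ths is the parameter name as of the API version shown here, so treat it as an assumption and verify it against the docs for your installed release:

optimized_model = optimize_model(
    model,
    input_data=input_data,
    # "unconstrained" allows slower, more aggressive searches (e.g. compressors)
    optimization_time="unconstrained",
    # accept at most a 2% drop in the evaluation metric; assumed parameter
    # name, check the docs for your installed version
    metric_drop_ths=0.02,
)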

For more details, please visit Installation and Get started.

How it works

We are not here to reinvent the wheel, but to build an all-in-one open-source product to master all the available AI acceleration techniques and deliver the fastest AI ever. As a result, nebullvm leverages available enterprise-grade open-source optimization tools. Where these tools and communities already exist and are distributed under a permissive license (Apache, MIT, etc.), we integrate them and happily contribute to their communities. However, many tools do not exist yet, in which case we implement them and open-source the code so that the community can benefit from it.

Product design

nebullvm is shaped around 4 building blocks and leverages a modular design to foster scalability and integration of new acceleration components across the stack (a schematic sketch of the pipeline follows the list below).

  • Converter: converts the input model from its original framework to the framework backends supported by nebullvm, namely PyTorch, TensorFlow, and ONNX. This allows the Compressor and Optimizer modules to apply any optimization technique to the model.
  • Compressor: applies various compression techniques to the model, such as pruning, knowledge distillation, or quantization-aware training.
  • Optimizer: converts the compressed models to the intermediate representation (IR) of the supported deep learning compilers. The compilers apply both post-training quantization techniques and graph optimizations to produce compiled binary files.
  • Inference Learner: takes the best-performing compiled model and converts it to the same interface as the original input model.
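
The sketch below makes the modular flow concrete. Every class and function name in it is an illustrative stand-in, not nebullvm's actual internal API; it only shows how the four stages hand work to one another.

from dataclasses import dataclass

# Hypothetical stand-ins for the four building blocks; all names below are
# illustrative only, not nebullvm's internals.

@dataclass
class CompiledModel:
    backend: str
    latency_ms: float  # measured on the local hardware during the search

def convert(model):
    """Converter: translate the input model to each supported backend."""
    return {"pytorch": model, "tensorflow": model, "onnx": model}

def compress(backend_model):
    """Compressor: pruning, distillation, quantization-aware training."""
    return backend_model  # no-op in this sketch

def compile_model(backend, backend_model):
    """Optimizer: lower to a compiler IR, quantize post-training, compile."""
    return CompiledModel(backend=backend, latency_ms=1.0)  # dummy latency

def optimize(model):
    """Search every backend and keep the fastest compiled artifact."""
    candidates = [
        compile_model(backend, compress(m))
        for backend, m in convert(model).items()
    ]
    # Inference Learner: wrap the winner behind the original model's interface
    return min(candidates, key=lambda c: c.latency_ms)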

The Compressor stage leverages the following open-source projects (a small standalone pruning example follows the list):

  • Intel/neural-compressor: aims to provide unified APIs for network compression technologies, such as low-precision quantization, sparsity, pruning, and knowledge distillation, across different deep learning frameworks in pursuit of optimal inference performance.
  • SparseML: libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models.
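
To make "pruning and sparsity" concrete, here is a minimal standalone illustration using PyTorch's built-in pruning utility. This is not nebullvm's or SparseML's code path, just the kind of transformation the Compressor stage applies:

import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(256, 256)

# Zero out the 50% of weights with the smallest L1 magnitude
prune.l1_unstructured(layer, name="weight", amount=0.5)

# Measure the resulting sparsity of the weight tensor
sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.0%}")  # ~50%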

The Optimizer stage leverages the following open-source projects (an example of serving a compiled model with one of them, ONNX Runtime, follows the list):

  • Apache TVM: open deep learning compiler stack for CPUs, GPUs, and specialized accelerators.
  • BladeDISC: end-to-end Dynamic Shape Compiler project for machine learning workloads.
  • DeepSparse: neural network inference engine that delivers GPU-class performance for sparsified models on CPUs.
  • OpenVINO: open-source toolkit for optimizing and deploying AI inference.
  • ONNX Runtime: cross-platform, high-performance ML inferencing and training accelerator.
  • TensorRT: C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators.
  • TFLite and XLA: open-source libraries to accelerate TensorFlow models.
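
As a minimal illustration of what the Optimizer stage produces, the snippet below runs an exported ONNX model through ONNX Runtime. The model filename is hypothetical, and nebullvm normally wires all of this up behind the original model's interface:

import numpy as np
import onnxruntime as ort

# Load a compiled/exported model; the filename is hypothetical
session = ort.InferenceSession("resnet50.onnx", providers=["CPUExecutionProvider"])

# Feed a random input matching the example shape used earlier
input_name = session.get_inputs()[0].name
x = np.random.randn(1, 3, 256, 256).astype(np.float32)

outputs = session.run(None, {input_name: x})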

Documentation

Guides and references live in Installation, Get started, Notebooks, and Benchmarks.

Community

  • Discord: best for sharing your projects, hanging out with the community and learning about AI acceleration.
  • GitHub issues: ideal for suggesting new acceleration components, requesting new features, and reporting bugs and improvements.

We're developing nebullvm together with our community, so the best way to get started is to pick a good first issue. Please read our contribution guidelines for a deep dive on how to best contribute to our project!

Don't forget to leave a star ⭐ to support the project, and happy acceleration 🚀

Status

  • Model converter backends
    • ONNX, PyTorch, TensorFlow (supported)
    • Jax (in development)
  • Compressor
    • Pruning and sparsity (supported)
    • Quantization-aware training, distillation, layer replacement and low-rank compression (in development)
  • Optimizer
    • TensorRT, OpenVINO, ONNX Runtime, TVM, PyTorch, DeepSparse, BladeDISC (supported)
    • TFLite, XLA (in development)
  • Inference learners
    • PyTorch, ONNX, Hugging Face, TensorFlow (supported)
    • Jax (in development)

Join the community | Contribute to the library

Installation ā€¢ Get started ā€¢ Notebooks ā€¢ Benchmarks
