Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Nimble is a deep learning execution engine that accelerates model inference and training by running GPU tasks (i.e., GPU kernels and memory operations) in parallel with minimal scheduling overhead. Given a PyTorch DL model, Nimble automatically generates a GPU task schedule, which employs an optimal parallelization strategy for the model. The schedule is wrapped in a Nimble object and can be seamlessly applied to PyTorch programs. Nimble improves the speed of inference and training by up to 22.34× and 3.61× compared to PyTorch, respectively. Moreover, Nimble outperforms TensorRT by up to 2.81×.

  • Speedup in Inference (ImageNet models): inference performance comparison on an NVIDIA V100 GPU.

  • Speedup in Training (CIFAR-10 models, batch sizes 32, 64, and 128): training performance comparison on an NVIDIA V100 GPU.
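
The numbers above were measured by the authors. As a rough way to compare PyTorch and Nimble on your own hardware, the sketch below times both with CUDA events; the measure_latency helper and the warmup/iteration counts are illustrative choices, not part of Nimble's API.

import torch
import torchvision

def measure_latency(fn, inp, warmup=10, iters=100):
    # Warm up to exclude one-time costs (CUDA context init, cuDNN autotuning).
    for _ in range(warmup):
        fn(inp)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(inp)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average milliseconds per call

model = torchvision.models.resnet50().cuda().eval()
dummy_input = torch.randn(1, 3, 224, 224).cuda()

nimble_model = torch.cuda.Nimble(model)
nimble_model.prepare(dummy_input, training=False)

with torch.no_grad():
    baseline_ms = measure_latency(model, dummy_input)
nimble_ms = measure_latency(nimble_model, dummy_input)
print(f"PyTorch: {baseline_ms:.3f} ms/iter, Nimble: {nimble_ms:.3f} ms/iter")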

Version

This version of Nimble is built on top of PyTorch v1.7.1 with CUDA 11.0. If you want to see the old version of Nimble used for the experiments in the paper, please check out the main_pytorch_v1.4.1 branch.

Install Nimble

Please refer to the installation instructions to build and install Nimble from source.

Use Nimble

Nimble supports both inference and training of neural networks.

Model Inference

import torch
import torchvision

# Instantiate a PyTorch Module and move it to a GPU
model = torchvision.models.resnet50()
model = model.cuda()
model.eval()

# Prepare a dummy input
input_shape = [1, 3, 224, 224]
dummy_input = torch.randn(*input_shape).cuda()

# Create a Nimble object
nimble_model = torch.cuda.Nimble(model)
nimble_model.prepare(dummy_input, training=False)

# Execute the object
rand_input = torch.rand(*input_shape).cuda()
output = nimble_model(rand_input)
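
Since the Nimble object is built from the same module and parameters, a quick sanity check is to compare its output against the plain PyTorch model on the same input. The snippet below continues from the inference example above; the tolerance values are illustrative.

# Continuing from the inference example above: compare Nimble's output
# with the original PyTorch model's output on the same input.
with torch.no_grad():
    reference = model(rand_input)

# Minor numerical differences may arise from different kernel choices,
# so compare with a loose, illustrative tolerance.
print(torch.allclose(output, reference, rtol=1e-3, atol=1e-5))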

Model Training

import torch
import torchvision

BATCH = 32

# Instantiate a PyTorch Module and move it to a GPU
model = torchvision.models.resnet50(num_classes=10)
model = model.cuda()
model.train()

# Define a loss function and an optimizer
loss_fn = torch.nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Prepare a dummy input
input_shape = [BATCH, 3, 32, 32]
dummy_input = torch.randn(*input_shape).cuda()

# Create a Nimble object
nimble_model = torch.cuda.Nimble(model)
nimble_model.prepare(dummy_input, training=True)

# Execute the forward pass
rand_input = torch.rand(*input_shape).cuda()
output = nimble_model(rand_input)

# Compute loss
label = torch.zeros(BATCH, dtype=torch.long).cuda()
loss = loss_fn(output, label)

# Execute the backward pass
loss.backward()

# Perform an optimization step
optimizer.step()
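
The snippet above shows a single training step. Below is a minimal sketch of a full loop that continues from it; the synthetic dataset is a stand-in for CIFAR-10, and drop_last is used on the assumption that the prepared Nimble object expects inputs with the same shape as the dummy input.

from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for CIFAR-10; replace with a real dataset as needed.
images = torch.randn(1024, 3, 32, 32)
labels = torch.randint(0, 10, (1024,))
# drop_last keeps every batch at the shape used in prepare()
# (assumption: the prepared schedule expects a fixed input shape).
loader = DataLoader(TensorDataset(images, labels), batch_size=BATCH,
                    shuffle=True, drop_last=True)

for epoch in range(2):
    for x, y in loader:
        x, y = x.cuda(), y.cuda()
        optimizer.zero_grad()
        out = nimble_model(x)
        loss = loss_fn(out, y)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last-batch loss {loss.item():.4f}")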

Reproduce Evaluation Results

Please refer to the evaluation instructions to reproduce the results reported in the paper.

Publication

Woosuk Kwon*, Gyeong-In Yu*, Eunji Jeong, and Byung-Gon Chun (* equal contribution), Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning, 34th Conference on Neural Information Processing Systems (NeurIPS), Spotlight, December 2020.

Citation

@inproceedings{kwon2020nimble,
  title={Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning},
  author={Kwon, Woosuk and Yu, Gyeong-In and Jeong, Eunji and Chun, Byung-Gon},
  booktitle={NeurIPS},
  year={2020}
}

Troubleshooting

Create an issue for questions and bug reports.

Contribution

We welcome your contributions to Nimble! We aim to build an open-source project driven by the open-source community. For general discussions about development, please subscribe to [email protected].

License

BSD 3-clause license
