
All benchmarks are wrong, some will cost you less than others.

Optimum-Benchmark 🏋️

Optimum-Benchmark is a unified multi-backend & multi-device utility for benchmarking Transformers, Diffusers, PEFT, TIMM and Optimum libraries, along with all their supported optimizations & quantization schemes, for inference & training, in distributed & non-distributed settings, in the most correct, efficient and scalable way possible.

News 📰

  • PyPI release soon.
  • Added a Python API to run benchmarks with isolation, distribution and tracking features supported by the library.

Motivations 🤔

  • HuggingFace hardware partners wanting to know how their hardware performs compared to other hardware on the same models.
  • HuggingFace ecosystem users wanting to know how their chosen model performs in terms of latency, throughput, memory usage, energy consumption, etc., compared to another model.
  • Experimenting with hardware & backend specific optimizations & quantization schemes that can be applied to models and improve their computational/memory/energy efficiency.
  • [...]

Note

Optimum-Benchmark is a work in progress and is not yet ready for production use, but we're working hard to make it so. Please keep an eye on the project and help us improve it and make it more useful for the community.

CI Status 🚦

Optimum-Benchmark is continuously and intensively tested on a variety of devices, backends, benchmarks and launchers to ensure its stability, with over 300 tests running on every PR (you can request more tests if needed).

API 📈

API_CPU API_CUDA API_MISC API_ROCM

CLI 📈

CLI_CPU_NEURAL_COMPRESSOR CLI_CPU_ONNXRUNTIME CLI_CPU_OPENVINO CLI_CPU_PYTORCH CLI_CPU_PY_TXI CLI_CUDA_ONNXRUNTIME CLI_CUDA_PYTORCH_MULTI_GPU CLI_CUDA_PYTORCH_SINGLE_GPU CLI_CUDA_TENSORRT_LLM CLI_CUDA_TORCH_ORT_MULTI_GPU CLI_CUDA_TORCH_ORT_SINGLE_GPU CLI_MISC CLI_ROCM_PYTORCH_MULTI_GPU CLI_ROCM_PYTORCH_SINGLE_GPU

Quickstart 🚀

Installation 📥

You can install optimum-benchmark using pip:

pip install optimum-benchmark@git+https://github.com/huggingface/optimum-benchmark.git

or if you want to tinker with the code, you can clone the repository and install it in editable mode:

git clone https://github.com/huggingface/optimum-benchmark.git
cd optimum-benchmark
pip install -e .

Advanced install options

Depending on the backends you want to use, you can install optimum-benchmark with the following extras:

  • PyTorch (default): pip install optimum-benchmark
  • OpenVINO: pip install optimum-benchmark[openvino]
  • Torch-ORT: pip install optimum-benchmark[torch-ort]
  • OnnxRuntime: pip install optimum-benchmark[onnxruntime]
  • TensorRT-LLM: pip install optimum-benchmark[tensorrt-llm]
  • OnnxRuntime-GPU: pip install optimum-benchmark[onnxruntime-gpu]
  • Intel Neural Compressor: pip install optimum-benchmark[neural-compressor]
  • Py-TXI: pip install optimum-benchmark[py-txi]

Running backend benchmarks using the Python API 🧪

You can run benchmarks from the Python API, using the launch entrypoint. It takes an ExperimentConfig object as input and returns a BenchmarkReport object containing the benchmark results. The use of configuration files is optional, but recommended for utmost correctness and reproducibility of benchmarks.

Here's an example of how to run a benchmark using the pytorch backend, torchrun launcher and inference benchmark.

from optimum_benchmark.experiment import launch, ExperimentConfig
from optimum_benchmark.backends.pytorch.config import PyTorchConfig
from optimum_benchmark.launchers.torchrun.config import TorchrunConfig
from optimum_benchmark.benchmarks.inference.config import InferenceConfig

if __name__ == "__main__":
    # torchrun launcher spawning 2 processes
    launcher_config = TorchrunConfig(nproc_per_node=2)
    # inference benchmark tracking latency and memory
    benchmark_config = InferenceConfig(latency=True, memory=True)
    # pytorch backend running gpt2 on CUDA devices 0 and 1, with randomly initialized weights (no download)
    backend_config = PyTorchConfig(model="gpt2", device="cuda", device_ids="0,1", no_weights=True)
    experiment_config = ExperimentConfig(
        experiment_name="api-launch",
        benchmark=benchmark_config,
        launcher=launcher_config,
        backend=backend_config,
    )

    benchmark_report = launch(experiment_config)

    # push artifacts to the hub
    experiment_config.push_to_hub("IlyasMoutawwakil/benchmarks")
    benchmark_report.push_to_hub("IlyasMoutawwakil/benchmarks")

If you're on VSCode, you can hover over the configuration classes to see the available parameters and their descriptions. Documentation will be available soon (help is welcome!).

Running backend benchmarks using the Hydra CLI 🧪

You can also run a benchmark using the command line by specifying the configuration directory and the configuration name. Both arguments are mandatory for Hydra: --config-dir is the directory where the configuration files are stored and --config-name is the name of the configuration file without its .yaml extension.

optimum-benchmark --config-dir examples/ --config-name pytorch_bert

This will run the benchmark using the configuration in examples/pytorch_bert.yaml and store the results in runs/pytorch_bert.

The result files are benchmark_report.json (the benchmark results), cli.log (the program's logs) and experiment_config.json (the configuration that was used, including backend, launcher, benchmark and environment information).
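
Since these artifacts are plain JSON, you can inspect them programmatically. A minimal sketch, assuming the runs/pytorch_bert output directory produced by the command above:

import json
from pathlib import Path

# results directory created by the CLI run above
run_dir = Path("runs/pytorch_bert")

# load the benchmark results and the configuration that produced them
report = json.loads((run_dir / "benchmark_report.json").read_text())
config = json.loads((run_dir / "experiment_config.json").read_text())

print(json.dumps(report, indent=2))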

Advanced CLI options

Configuration overrides 🎛️

It's easy to override the default behavior of an existing configuration file from the command line. For example, to run the same benchmark with a different model and device, you can use the following command:

optimum-benchmark --config-dir examples/ --config-name pytorch_bert backend.model=gpt2 backend.device=cuda

Configuration sweeps 🧹

You can easily run configuration sweeps using the --multirun option. By default, configurations are executed serially, but other kinds of execution are supported with Hydra's launcher plugins (e.g. hydra/launcher=joblib).

optimum-benchmark --config-dir examples --config-name pytorch_bert -m backend.device=cpu,cuda

Configurations structure 📁

You can create custom and more complex configuration files following these examples. They are heavily commented to help you understand the structure of the configuration files.

Features 🎨

optimum-benchmark allows you to run backend benchmarks with minimal configuration. A backend benchmark is defined by three main components:

  • The launcher to use (e.g. process)
  • The benchmark to run (e.g. training)
  • The backend to run on (e.g. onnxruntime)

Launchers 🚀

  • Isolated process launcher (launcher=process).
  • Distributed inference/training launcher (launcher=torchrun).
  • Inline launcher (launcher=inline), not recommended for benchmarking.

General launcher features 🧰

  • GPU device isolation assertion for NVIDIA & AMD devices (launcher.device_isolation=true). This feature makes sure no processes other than the benchmark's are running on the targeted GPU devices. Especially useful when running benchmarks on shared resources (see the sketch below).
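
Device isolation can also be enabled from the Python API. A minimal sketch, assuming the process launcher's configuration class is ProcessConfig in optimum_benchmark.launchers.process.config (by analogy with the TorchrunConfig import shown earlier):

from optimum_benchmark.launchers.process.config import ProcessConfig

# isolated process launcher asserting that no other process
# is running on the targeted GPU devices during the benchmark
launcher_config = ProcessConfig(device_isolation=True)  # assumed field name, mirroring launcher.device_isolation=true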

Benchmarks 🏋

  • Training benchmark (benchmark=training) which benchmarks the model using the trainer class with a randomly generated dataset.
  • Inference benchmark (benchmark=inference) which benchmarks the model's inference method (forward/call/generate) with randomly generated inputs.

Inference benchmark features 🧰
  • Memory tracking (benchmark.memory=true)
  • Energy and efficiency tracking (benchmark.energy=true)
  • Latency and throughput tracking (benchmark.latency=true)
  • Warm up runs before inference (benchmark.warmup_runs=20)
  • Inputs shapes control (e.g. benchmark.input_shapes.sequence_length=128)
  • Forward, call and generate kwargs (e.g. benchmark.generate_kwargs.max_new_tokens=100 for an LLM, benchmark.call_kwargs.num_images_per_prompt=4 for a diffusion model); see the sketch below.
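
Put together, an inference benchmark configuration using these knobs could look like the sketch below; the field names mirror the CLI flags above, but exact signatures and types may differ:

from optimum_benchmark.benchmarks.inference.config import InferenceConfig

benchmark_config = InferenceConfig(
    memory=True,  # track memory usage
    energy=True,  # track energy consumption and efficiency
    latency=True,  # track latency and throughput
    warmup_runs=20,  # warm-up runs before measuring
    input_shapes={"sequence_length": 128},  # control generated input shapes
    generate_kwargs={"max_new_tokens": 100},  # forwarded to the model's generate method
)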

Training benchmark features 🧰
  • Memory tracking (benchmark.memory=true)
  • Energy and efficiency tracking (benchmark.energy=true)
  • Latency and throughput tracking (benchmark.latency=true)
  • Warm up steps before training (benchmark.warmup_steps=20)
  • Dataset shapes control (e.g. benchmark.dataset_shapes.sequence_length=128)
  • Training arguments control (e.g. benchmark.training_args.per_device_train_batch_size=4); see the sketch below.
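
Similarly, a training benchmark configuration could be sketched as follows, assuming a TrainingConfig class in optimum_benchmark.benchmarks.training.config analogous to InferenceConfig (field names mirror the CLI flags above and are assumptions):

from optimum_benchmark.benchmarks.training.config import TrainingConfig

benchmark_config = TrainingConfig(
    memory=True,  # track memory usage
    latency=True,  # track latency and throughput
    warmup_steps=20,  # warm-up steps before measuring
    dataset_shapes={"sequence_length": 128},  # control the generated dataset shapes
    training_args={"per_device_train_batch_size": 4},  # forwarded to the trainer
)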

Backends & Devices 📱

  • Pytorch backend for CPU (backend=pytorch, backend.device=cpu)
  • Pytorch backend for CUDA (backend=pytorch, backend.device=cuda, backend.device_ids=0,1)
  • Pytorch backend for Habana Gaudi Processor (backend=pytorch, backend.device=hpu, backend.device_ids=0,1)
  • OnnxRuntime backend for CPUExecutionProvider (backend=onnxruntime, backend.device=cpu)
  • OnnxRuntime backend for CUDAExecutionProvider (backend=onnxruntime, backend.device=cuda)
  • OnnxRuntime backend for ROCMExecutionProvider (backend=onnxruntime, backend.device=cuda, backend.provider=ROCMExecutionProvider)
  • OnnxRuntime backend for TensorrtExecutionProvider (backend=onnxruntime, backend.device=cuda, backend.provider=TensorrtExecutionProvider)
  • Py-TXI backend for CPU and GPU (backend=py-txi, backend.device=cpu or backend.device=cuda)
  • Intel Neural Compressor backend for CPU (backend=neural-compressor, backend.device=cpu)
  • TensorRT-LLM backend for CUDA (backend=tensorrt-llm, backend.device=cuda)
  • Torch-ORT backend for CUDA (backend=torch-ort, backend.device=cuda)
  • OpenVINO backend for CPU (backend=openvino, backend.device=cpu)
  • OpenVINO backend for GPU (backend=openvino, backend.device=gpu)

General backend features 🧰
  • Model selection (backend.model=gpt2), can be a model id from the HuggingFace model hub or an absolute path to a model folder.
  • Device selection (backend.device=cuda), can be cpu, cuda, mps, etc.
  • Device ids selection (backend.device_ids=0,1), can be a list of device ids to run the benchmark on multiple devices.
  • "No weights" feature, to benchmark models without downloading their weights (backend.no_weights=true)

Backend specific features 🧰

For more information on the features of each backend, you can check their respective configuration files.

Contributing 🤝

Contributions are welcome! And we're happy to help you get started. Feel free to open an issue or a pull request. Things that we'd like to see:

  • More backends (TensorFlow, TFLite, JAX, etc.).
  • More tests (for optimizations and quantization schemes).
  • More hardware support (Habana Gaudi Processor (HPU), Apple M series, etc).
  • Task evaluators for the most common tasks (would be great for output regression).

To get started, you can check the CONTRIBUTING.md file.
