This folder contains an example of optimizing the Phi-3-Mini-4K-Instruct model, from Hugging Face or the Azure Machine Learning Model Catalog, for different hardware targets with Olive.
Install the dependencies:
```
pip install -r requirements.txt
```
- einops
- PyTorch >= 2.2.0. The official website offers packages compatible with CUDA 11.8 and 12.1; please select the appropriate version according to your needs.
- onnxruntime >= 1.18.0
- onnxruntime-genai >= 0.2.0. If you target GPU, please install the GPU packages for onnxruntime and onnxruntime-genai, for example as shown below.
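As an illustration, assuming a CUDA-enabled environment, the GPU variants can be installed as follows (the package names follow the ONNX Runtime and onnxruntime-genai distributions; verify them against your CUDA version):
```
pip install onnxruntime-gpu
pip install onnxruntime-genai-cuda
```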
If you have not logged in to your Hugging Face account:
- Install the Hugging Face CLI and log in to your Hugging Face account for model access
```
huggingface-cli login
```
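If the Hugging Face CLI is not already installed, one common way to get it (a pip install, assumed here rather than taken from this example's requirements) is:
```
pip install -U "huggingface_hub[cli]"
```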
- Install Olive with the Azure Machine Learning dependency
```
pip install olive-ai[azureml]
```
If you have not logged in to your Azure account:
- Install the Azure Command-Line Interface (CLI) following this link
- Run
```
az login
```
to log in to your Azure account so that Olive can access the model.
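After logging in, you can confirm that the expected subscription is active with a standard Azure CLI command:
```
az account show
```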
We will use the phi3.py script to fine-tune and optimize the model for a chosen hardware target by running the following commands.
```
python phi3.py [--target HARDWARE_TARGET] [--precision DATA_TYPE] [--source SOURCE] [--finetune_method METHOD] [--inference] [--prompt PROMPT] [--max_length LENGTH]

# Examples
python phi3.py --target mobile
python phi3.py --target mobile --source AzureML
python phi3.py --target mobile --inference --prompt "Write a story starting with once upon a time" --max_length 200
python phi3.py --target cuda --finetune_method lora --inference --prompt "Write a story starting with once upon a time" --max_length 200

# qlora introduces quantization into the base model, which is not supported by onnxruntime-genai as of now!
python phi3.py --target cuda --finetune_method qlora
```
- `--target`: cpu, cuda, mobile, web
- `--finetune_method`: optional. The method used for fine-tuning. Options: `qlora`, `lora`. Default is none. Note that onnxruntime-genai only supports the `lora` method as of now.
- `--precision`: optional, for data precision. fp32 or int4 (default) for the cpu target; fp32, fp16, or int4 (default) for GPU targets; int4 (default) for mobile or web.
- `--source`: optional, for the model source. HF or AzureML. HF (Hugging Face model) by default.
- `--inference`: optional, for non-web model inference/validation.
- `--prompt`: optional, the prompt text fed into the model. Takes effect only when `--inference` is set.
- `--max_length`: optional, the max length of the output from the model. Takes effect only when `--inference` is set.
This script includes the following steps:
- Generate the Olive configuration file for the chosen hardware target
- Fine-tune the model with the lora or qlora method on the `nampdn-ai/tiny-codes` dataset
- Generate the optimized model with Olive based on the configuration file for the chosen hardware target
- (optional) Run inference on the optimized model with the ONNX Runtime Generate() API for non-web targets, as sketched below
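As an illustration of the last step, here is a minimal sketch of loading and running an optimized model with the ONNX Runtime Generate() API. The model path and search options are assumptions for this example, and the exact API surface may differ slightly across onnxruntime-genai versions:
```python
import onnxruntime_genai as og

# Path to the Olive-optimized model folder (assumed location for this sketch)
model_path = "models/phi3_cpu_int4"

model = og.Model(model_path)
tokenizer = og.Tokenizer(model)

prompt = "Write a story starting with once upon a time"
input_tokens = tokenizer.encode(prompt)

# Configure generation: cap the output length, then attach the input tokens
params = og.GeneratorParams(model)
params.set_search_options(max_length=200)
params.input_ids = input_tokens

# Generate and decode the first (and only) output sequence
output_tokens = model.generate(params)[0]
print(tokenizer.decode(output_tokens))
```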
If you have an Olive configuration file, you can also run the olive command directly for model generation:
```
olive run [--config CONFIGURATION_FILE]

# Examples
olive run --config phi3_mobile_int4.json
```
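For reference, an Olive configuration file is a JSON description of the input model and the optimization passes to apply. The snippet below is a heavily trimmed, illustrative sketch, not the actual content generated by phi3.py: field names follow Olive's general config schema, which can vary across Olive versions, and the generated files contain more passes and settings.
```json
{
  "input_model": {
    "type": "PyTorchModel",
    "config": {
      "hf_config": { "model_name": "microsoft/Phi-3-mini-4k-instruct" }
    }
  },
  "passes": {
    "builder": { "type": "ModelBuilder", "config": { "precision": "int4" } }
  },
  "engine": { "output_dir": "models" }
}
```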