Phi3 optimization with Olive

This folder contains an example of optimizing the Phi-3-Mini-4K-Instruct model from Hugging Face or Azure Machine Learning Model Catalog for different hardware targets with Olive.

Prerequisites

Install the dependencies

pip install -r requirements.txt

If you are targeting GPU, please install the GPU packages of onnxruntime and onnxruntime-genai.

For optimizing a model from Hugging Face

If you have not logged in to your Hugging Face account:

  • Install the Hugging Face CLI and log in to your Hugging Face account for model access
huggingface-cli login

For optimizing a model from Azure Machine Learning Model Catalog

  • Install Olive with the Azure Machine Learning dependency
pip install olive-ai[azureml]

If you have not logged in to your Azure account:

  • Install the Azure Command-Line Interface (CLI) following this link
  • Run az login to log in to your Azure account, which allows Olive to access the model.

Usage

Use the phi3.py script to fine-tune and optimize the model for a chosen hardware target by running the following commands.

python phi3.py [--target HARDWARE_TARGET] [--precision DATA_TYPE] [--source SOURCE] [--finetune_method METHOD] [--inference] [--prompt PROMPT] [--max_length LENGTH]

# Examples
python phi3.py --target mobile

python phi3.py --target mobile --source AzureML

python phi3.py --target mobile --inference --prompt "Write a story starting with once upon a time" --max_length 200

python phi3.py --target cuda --finetune_method lora --inference --prompt "Write a story starting with once upon a time" --max_length 200
# qlora introduces quantization into the base model, which is not supported by onnxruntime-genai as of now!
python phi3.py --target cuda --finetune_method qlora
  • --target: required. The hardware target: cpu, cuda, mobile, or web.
  • --finetune_method: optional. The method used for fine-tuning: lora or qlora. Default is none. Note that onnxruntime-genai only supports the lora method as of now.
  • --precision: optional. The data precision: fp32 or int4 (default) for the cpu target; fp32, fp16, or int4 (default) for the cuda target; int4 (default) for mobile or web.
  • --source: optional. The model source: HF (Hugging Face, default) or AzureML.
  • --inference: optional. Run inference/validation on the optimized model (non-web targets only).
  • --prompt: optional. The prompt text fed into the model. Takes effect only when --inference is set.
  • --max_length: optional. The maximum length of the output from the model. Takes effect only when --inference is set.
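As a quick reference, the options above can be sketched as a small argparse interface. This is an illustrative sketch only: the names and choices are taken from this README, the --max_length default is an assumption based on the examples, and the real phi3.py may define its arguments differently.

```python
# Illustrative sketch of the CLI surface described above (not the real phi3.py).
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        description="Fine-tune and optimize Phi-3 with Olive")
    parser.add_argument("--target", required=True,
                        choices=["cpu", "cuda", "mobile", "web"])
    # Valid precision choices depend on the target (see the bullets above).
    parser.add_argument("--precision", default="int4",
                        choices=["fp32", "fp16", "int4"])
    parser.add_argument("--source", default="HF", choices=["HF", "AzureML"])
    parser.add_argument("--finetune_method", choices=["lora", "qlora"])  # default: none
    parser.add_argument("--inference", action="store_true")
    parser.add_argument("--prompt")  # only used with --inference
    # Default of 200 is assumed from the examples; the README does not state one.
    parser.add_argument("--max_length", type=int, default=200)
    return parser

args = build_parser().parse_args(["--target", "mobile"])
print(args.precision, args.source)  # int4 HF
```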

This script does the following:

  • Generates the Olive configuration file for the chosen hardware target
  • Fine-tunes the model with the lora or qlora method on the nampdn-ai/tiny-codes dataset
  • Generates the optimized model with Olive based on the configuration file for the chosen hardware target
  • (Optional) Runs inference on the optimized model with the ONNX Runtime Generate() API for non-web targets
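The inference step in the last bullet amounts to a simple token-generation loop. A rough pseudocode sketch of it (the exact onnxruntime-genai calls vary by package version, so this is illustrative only):

```
model     = load optimized ONNX model from the output folder
tokenizer = tokenizer bundled with the model
tokens    = tokenizer.encode(prompt)
while len(tokens) < max_length and tokens[-1] != EOS:
    logits = model.run(tokens)          # forward pass over the sequence
    tokens.append(argmax(logits))       # greedy search over the vocabulary
print(tokenizer.decode(tokens))
```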

If you have an Olive configuration file, you can also run the olive command for model generation:

olive run [--config CONFIGURATION_FILE]

# Examples
olive run --config phi3_mobile_int4.json
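For orientation, an Olive configuration file has roughly the shape below. This is a minimal hand-written sketch, not the exact file phi3.py generates; the pass type, options, and paths here are assumptions, so consult the Olive documentation for the real schema.

```json
{
  "input_model": {
    "type": "HfModel",
    "model_path": "microsoft/Phi-3-mini-4k-instruct"
  },
  "passes": {
    "builder": { "type": "ModelBuilder", "precision": "int4" }
  },
  "output_dir": "models/phi3-mobile-int4"
}
```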

More Inference Examples
