Giter Club home page Giter Club logo

sillm's Introduction

sillm

SiLLM - Silicon LLM Training & Inference Toolkit

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework. Building upon the foundation provided by MLX Examples, this project introduces additional features specifically designed to enhance LLM operations with MLX in a streamlined package.

  • LLM Loading: load LLMs for chat and training in different formats (Huggingface, Torch, GGUF, MLX)
  • LoRA Training: train LLMs using Low-rank Adaptation
  • DPO Training: train LLMs with Direct Preference Optimization

Features

  • Web app for a seamless chat experience running on local hardware
  • API server with OpenAI compatible chat endpoints
  • Model architectures: Llama, Mistral, Mixtral, Phi-2, Gemma, Qwen2, Starcoder2, DBRX, Cohere Command-R
  • Conversation templates: llama-2, chatml, alpaca, vicuna, gemma, phi, openchat
  • Loss functions for DPO: sigmoid, hinge, IPO, DPOP
  • Training loss plots using matplotlib
  • Perplexity calculation

Installation

Using pip:

pip install sillm-mlx

Usage

Chat web application

The web app uses Chainlit to provide a frontend for conversational AI running locally on Apple Silicon hardware.

sillm-app.mov

To start the web app, clone the repository and start the app using chainlit:

git clone https://github.com/armbues/SiLLM.git
cd SiLLM/app
pip install -r requirements.txt
python -m chainlit run app.py -w

Set the environment variables SILLM_MODEL_DIR and SILLM_ADAPTER_DIR to load local models/adapters.

Command-line interface (CLI) scripts

Run the CLI scripts with the argument -h to see a print-out of all available arguments.

Chat:

Simple CLI interface for chatting with an LLM in the terminal.

python -m sillm.chat /path/to/model

Running sillm.chat in the terminal with Gemma-2B-it on a MacBook Air M2 with 16GB memory:

sillm-chat.mov

Server:

Run an API server with basic functionality compatible with OpenAI compatible chat endpoints.

python -m sillm.server /path/to/model --port 8000

LoRA Fine-tuning:

Fine-tune a model with low-rank adaptation (LoRA).

python -m sillm.lora /path/to/model -d /path/to/dataset -o /output/adapters

DPO Fine-tuning:

Fine-tune a model with LoRA and direct preference optimization (DPO).

python -m sillm.dpo /path/to/model -d /path/to/dataset -o /output/adapters

Conversion

Convert a model while merging adapters or quantizing the weights.

Example of merging an adapter into a model:

python -m sillm.convert /path/to/input/model /path/to/output/model -a /path/to/adapters

Quantization

Quantize a model serially (without loading it entirely into memory):

python -m sillm.quantize /path/to/input/model /path/to/output/model --bits 4

Python

Minimal example of loading a model with SiLLM and generating a text completion:

import sillm

model = sillm.load("/path/to/model")
for s, _ in model.generate("On a beautiful Sunday morning,"):
    print(s, flush=True, end="")

Examples

LoRA training Mistral-7B-Instruct-v0.2 with the Nvidia HelpSteer dataset.

DPO training Qwen1.5-7B-Chat with the DPO Mix 7K dataset. The training consists of a supervised fine tuning (SFT) followed by direct preference optimization (DPO).

Implementation of the "Massive Multitask Language Understanding" benchmark using the MMLU dataset.

Calculating perplexity scores for a sample dataset of entry paragraphs from Wikipedia articles.

Model Support

SiLLM generally supports loading LLMs of the following model architectures/families: Llama 2, Mistral, Mixtral, Gemma, Phi, Qwen 2, StarCoder2.

Here is a list of models that were successfully tested with SiLLM:

Model Family Models/Sizes (HF) Models/Sizes (GGUF) Models/Sizes (MLX)
Llama-2 7b-chat.Q8_0, 13b-chat.Q8_0 7b, 7b-chat
Mistral v0.2 7b-instruct-v0.2 7b-instruct-v0.2.Q8_0
Mixtral v0.1 8x7B-Instruct
Gemma 2b, 2b-it, 7b, 7b-it
Phi-2 2.7b
Qwen 1.5 7b-chat, 14b-chat
StarCoder2 3b, 7b, 15b
CodeLlama 70b-instruct.Q4_0, Phind-34b-v2.Q4_0
DBRX (currently not supported) dbrx-instruct-4bit
Cohere Command-R, Command-R+

Roadmap

  • Repetition penalty for inference
  • Learning rate schedulers for training
  • Merging models
  • Saving models to GGUF
  • Fine tuning with ORPO

License

This project uses the MIT License.

Acknowledgments

Big thanks to the Apple MLX team for implementing and maintaining the MLX framework that makes it possible to unlock the power of Apple Silicon and run/train LLMs on MacBooks and other Apple devices. Thank you to all the contributors of the MLX Examples project and developers sharing model implementations online. Last but not least, thank you to the larger community sharing open weights models, fine tunes, and datasets - without you all the gen AI progress would happen behind locked doors!

sillm's People

Contributors

armbues avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.