

LLM Inference - Quickly Deploy Productive LLM Service

Chinese documentation

LLM Inference is a large language model serving solution for quickly deploying production-ready LLM services.

We drew a great deal of inspiration and motivation from this open source project. We are grateful to its authors for the chance to explore and innovate further by standing on the shoulders of giants.


With this solution, you can:

  • Rapidly deploy various LLMs on CPU/GPU.
  • Deploy LLMs across multiple nodes through a Ray cluster.
  • Speed up inference with the vLLM engine.
  • Manage model inference through a RESTful API.
  • Customize model deployment with YAML configuration files.
  • Compare inference results across models.

More features on the roadmap are coming soon.

Getting started

Deploy locally

Install LLM Inference and dependencies

You can start by cloning the repository and installing llm-serve with pip. Python 3.10+ is recommended for deploying llm-serve.

git clone https://github.com/OpenCSGs/llm-inference.git
cd llm-inference
pip install .

Optionally, use another pip source for faster downloads if needed:

pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple/

Install dependencies for specific components:

pip install '.[backend]'
pip install '.[frontend]'

Note: If your runtime supports GPUs, install the vllm dependency by running the following command:

pip install '.[vllm]'

Optionally, use another pip source for faster downloads if needed:

pip install '.[backend]' -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install '.[frontend]' -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install '.[vllm]' -i https://pypi.tuna.tsinghua.edu.cn/simple/

Install Ray and start a Ray Cluster locally

Install Ray with pip:

pip install -U "ray[serve-grpc]==2.8.0"

Optionally, use another pip source for faster downloads if needed:

pip install -U "ray[serve-grpc]==2.8.0" -i https://pypi.tuna.tsinghua.edu.cn/simple/

Note: ChatGLM2-6b requires transformers<=4.33.3, while the latest vllm requires transformers>=4.36.0.
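This conflict can be checked before installing. A minimal pure-Python sketch (not part of llm-inference) that compares version strings against the two constraints:

```python
# Sketch: detect the transformers version conflict noted above.
# Assumes plain numeric versions like "4.33.3"; real-world version
# parsing should use the `packaging` library instead.

def parse_version(v):
    """Turn '4.33.3' into a comparable tuple (4, 33, 3)."""
    return tuple(int(part) for part in v.split("."))

def compatible_with_chatglm2(transformers_version):
    # ChatGLM2-6b needs transformers <= 4.33.3
    return parse_version(transformers_version) <= parse_version("4.33.3")

def compatible_with_vllm(transformers_version):
    # Recent vllm needs transformers >= 4.36.0
    return parse_version(transformers_version) >= parse_version("4.36.0")

for v in ("4.33.3", "4.35.0", "4.36.0"):
    print(v, compatible_with_chatglm2(v), compatible_with_vllm(v))
```

No single version satisfies both constraints, so keep ChatGLM2-6b and vllm in separate environments.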

Then start the cluster:

ray start --head --port=6379 --dashboard-host=0.0.0.0 --dashboard-port=8265
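Once the head node is up, you can sanity-check that the GCS port (6379) and the dashboard port (8265) are listening. A small stdlib-only sketch (not part of llm-inference):

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# After `ray start`, both should report True on the head node.
for port in (6379, 8265):
    print(port, port_open("127.0.0.1", port))
```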

See reference here.

Quick start

You can follow the quick start to run an end-to-end case for model serving.

Uninstall

Uninstall llm-serve package:

pip uninstall llm-serve

Then shutdown the Ray cluster:

ray stop

API server

See the guide for the API server and the API documentation.
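As a flavor of driving the API server from Python, the snippet below builds a JSON request with the standard library. Note that the endpoint path and payload fields here are hypothetical placeholders; consult the API documents above for the actual routes and schema.

```python
import json
import urllib.request

def build_inference_request(base_url, model, prompt):
    """Build (but do not send) a JSON POST request for a model inference call.

    NOTE: the '/api/v1/inference' path and the payload keys are
    illustrative placeholders, not the documented llm-serve API.
    """
    payload = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/inference",   # hypothetical route
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_inference_request("http://127.0.0.1:8000", "chatglm2-6b", "Hello")
# Against a running server you would then call: urllib.request.urlopen(req)
print(req.full_url, req.get_method())
```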

Deploy on bare metal

See the guide to deploy on bare metal.

Deploy on Kubernetes

See the guide to deploy on Kubernetes.

FAQ

How to use a model from a local path, Git server, S3 storage, or the OpenCSG Hub

See the guide for how to use a model from a local path, Git server, or S3 storage.

How to add new models using the LLMServe Model Registry

LLMServe allows you to easily add new models by adding a single configuration file. To learn more about how to customize or add new models, see the LLMServe Model Registry.
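To give a flavor of what such a configuration file might look like, here is a hypothetical YAML sketch. The actual field names and layout are defined by the LLMServe Model Registry, so treat every key below as illustrative rather than authoritative:

```yaml
# Hypothetical model registry entry -- all field names are
# illustrative placeholders, not the actual LLMServe schema.
model_config:
  model_id: my-org/my-model        # placeholder model identifier
  model_task: text-generation
  scaling_config:
    num_workers: 1
    num_gpus_per_worker: 1
```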

Developer Guide

See the Developer Guide for how to set up a development environment so you can get started contributing.

Common Issues

See the document for some common issues.

Contributors

jasonhe258, depenglee1707, seanhh86, pulltheflower

