PPL LLM Serving

Overview

ppl.llm.serving is a part of PPL.LLM system.

We recommend users who are new to this project to read the Overview of system.

ppl.llm.serving is a serving based on ppl.nn.llm for various Large Language Models(LLMs). This repository contains a server based on gRPC and inference support for LLaMA.

Prerequisites

Linux running on x86_64 or arm64 CPUs
GCC >= 9.4.0
CMake >= 3.18
Git >= 2.7.0
CUDA Toolkit >= 11.4. 11.6 recommended. (for CUDA)

Quick Start

Here is a brief tutorial, refer to LLaMA Guide for more details.

Installing Prerequisites(on Debian or Ubuntu for example)
```
apt-get install build-essential cmake git
```

Cloning Source Code

git clone https://github.com/openppl-public/ppl.llm.serving.git

Building from Source

./build.sh -DPPLNN_USE_LLM_CUDA=ON -DPPLNN_CUDA_ENABLE_NCCL=ON -DPPLNN_ENABLE_CUDA_JIT=OFF -DPPLNN_CUDA_ARCHITECTURES="'80;86;87'" -DPPLCOMMON_CUDA_ARCHITECTURES="'80;86;87'"

NCCL is required if multiple GPU devices are used.

Exporting Models

Refer to ppl.pmx for details.
Running Server
```
./ppl-build/ppl_llama_server /path/to/server/config.json
```
Server config examples can be found in src/models/llama/conf. You are expected to give the correct values before running the server.
- model_dir: path of models exported by ppl.pmx.
- model_param_path: params of models. $model_dir/params.json.
- tokenizer_path: tokenizer files for sentencepiece.
Running client: send request through gRPC to query the model
```
./ppl-build/client_sample 127.0.0.1:23333
```
See tools/client_sample.cc for more details.
Benchmarking
```
./ppl-build/client_qps_measure --target=127.0.0.1:23333 --tokenizer=/path/to/tokenizer/path --dataset=tools/samples_1024.json --request_rate=inf
```
See tools/client_qps_measure.cc for more details. --request_rate is the number of request per second, and value inf means send all client request with no interval.
Running inference offline:
```
./ppl-build/offline_inference /path/to/server/config.json
```
See tools/offline_inference.cc for more details.

License

This project is distributed under the Apache License, Version 2.0.

hijdk / ppl.llm.serving Goto Github PK

ppl.llm.serving's Introduction

PPL LLM Serving

Overview

Prerequisites

Quick Start

License

ppl.llm.serving's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent