PPL LLM Serving

Overview

ppl.llm.serving is part of the PPL.LLM system.

[Figure: PPL.LLM system overview]

If you are new to this project, we recommend reading the Overview of system first.

ppl.llm.serving is a serving framework built on ppl.nn.llm for various Large Language Models (LLMs). This repository contains a gRPC-based server and inference support for LLaMA.

Prerequisites

  • Linux running on x86_64 or arm64 CPUs
  • GCC >= 9.4.0
  • CMake >= 3.18
  • Git >= 2.7.0
  • CUDA Toolkit >= 11.4, 11.6 recommended (for CUDA builds)
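
The minimum versions above can be checked before building. The following is a minimal sketch, not part of the repository: `version_ge` and `check_tool` are hypothetical helper names, the comparison relies on GNU `sort -V`, and the version number is assumed to be the first dotted number each tool prints.

```shell
#!/usr/bin/env bash
# Sketch: compare installed tool versions against the minimums listed above.

# version_ge A B: succeeds when version A >= version B (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME MIN VERSION_COMMAND...: prints one status line for the tool.
# Assumption: the first dotted number in the command output is the version.
check_tool() {
    local name="$1" min="$2"
    shift 2
    local found
    found="$("$@" 2>/dev/null | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)"
    if [ -z "$found" ]; then
        echo "$name: not found (need >= $min)"
    elif version_ge "$found" "$min"; then
        echo "$name: $found OK"
    else
        echo "$name: $found is older than $min"
    fi
}

check_tool gcc   9.4.0 gcc -dumpfullversion
check_tool cmake 3.18  cmake --version
check_tool git   2.7.0 git --version
check_tool nvcc  11.4  nvcc --version   # only relevant for CUDA builds
```

The script only reports; it does not stop on a failed check, so a missing `nvcc` on a machine without CUDA is shown as "not found" rather than treated as an error.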

Quick Start

Here is a brief tutorial; refer to the LLaMA Guide for more details.

  • Installing Prerequisites (on Debian or Ubuntu, for example)

    apt-get install build-essential cmake git
  • Cloning Source Code

    git clone https://github.com/openppl-public/ppl.llm.serving.git
  • Building from Source

    ./build.sh -DPPLNN_USE_LLM_CUDA=ON -DPPLNN_CUDA_ENABLE_NCCL=ON -DPPLNN_ENABLE_CUDA_JIT=OFF -DPPLNN_CUDA_ARCHITECTURES="'80;86;87'" -DPPLCOMMON_CUDA_ARCHITECTURES="'80;86;87'"

    NCCL is required if multiple GPU devices are used.

  • Exporting Models

    Refer to ppl.pmx for details.

  • Running Server

    ./ppl-build/ppl_llama_server /path/to/server/config.json

    Server config examples can be found in src/models/llama/conf. Set the following fields to the correct values before running the server:

    • model_dir: path to the model exported by ppl.pmx.
    • model_param_path: path to the model parameter file, $model_dir/params.json.
    • tokenizer_path: path to the tokenizer files for sentencepiece.
  • Running Client: send requests through gRPC to query the model

    ./ppl-build/client_sample 127.0.0.1:23333

    See tools/client_sample.cc for more details.

  • Benchmarking

    ./ppl-build/client_qps_measure --target=127.0.0.1:23333 --tokenizer=/path/to/tokenizer/path --dataset=tools/samples_1024.json --request_rate=inf

    See tools/client_qps_measure.cc for more details. --request_rate is the number of requests per second; the value inf sends all client requests with no interval between them.

  • Running Inference Offline

    ./ppl-build/offline_inference /path/to/server/config.json

    See tools/offline_inference.cc for more details.
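
Because the server and the offline tool both depend on the paths set in the config, it can help to confirm that those paths exist before launching either one. The sketch below is an illustration only: `preflight` is a hypothetical helper, it takes the paths as arguments instead of parsing the JSON config, and it assumes `tokenizer_path` points to a single file.

```shell
#!/usr/bin/env bash
# Sketch: check the paths used in the server config before starting the server.

# preflight MODEL_DIR TOKENIZER_PATH
# Prints one line per problem and returns 1 if any path is missing.
preflight() {
    local model_dir="$1" tokenizer_path="$2" status=0
    if [ ! -d "$model_dir" ]; then
        echo "model_dir is not a directory: $model_dir"
        status=1
    fi
    # model_param_path is expected at $model_dir/params.json
    if [ ! -f "$model_dir/params.json" ]; then
        echo "model_param_path not found: $model_dir/params.json"
        status=1
    fi
    # Assumption: the tokenizer is a single file
    if [ ! -f "$tokenizer_path" ]; then
        echo "tokenizer_path not found: $tokenizer_path"
        status=1
    fi
    return "$status"
}

# Example usage (paths are placeholders):
# preflight /path/to/model_dir /path/to/tokenizer \
#     && ./ppl-build/ppl_llama_server /path/to/server/config.json
```

Chaining with `&&` starts the server only when every path check passes.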

License

This project is distributed under the Apache License, Version 2.0.
