LLM API

This application runs LLMs (Large Language Models) in Docker containers. It is built in a generic way so that it can be reused for multiple types of models.

Tested with the following models:

  • Llama 7b
  • Llama 13b
  • Llama 30b
  • Llama 65b
  • Alpaca 7b
  • Alpaca 13b
  • Alpaca 30b
  • Vicuna 13b

Alpaca and Vicuna are based on the Llama language model. These models are intended only for academic research, and any commercial use is prohibited. This project doesn't provide any links to download these models.

Contributions that add support for more models are welcome.

Roadmap

  • Write an implementation for Alpaca
  • Write an implementation for Llama
  • Write an implementation for Vicuna
  • Write an implementation for OpenAI
  • Write an implementation for RWKV-LM

Usage

To run this API on a local machine, a running Docker engine is needed.

Run using Docker:

Create a config.yaml file with the configs described below, then run:

docker run -v $PWD/models/:/models:rw -v $PWD/config.yaml:/llm-api/config.yaml:ro -p 8000:8000 --ulimit memlock=16000000000 1b5d/llm-api

Or use the docker-compose.yaml in this repo and run using Compose:

docker compose up

When running for the first time, the app downloads the model from Hugging Face based on the configuration in setup_params and names the local model file accordingly. On later runs, it looks up the same local file and loads it into memory.
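
The download-once, reuse-later behaviour described above can be sketched roughly as follows. This is only an illustration of the idea, not the project's actual code: the function name `ensure_model`, the `download` callback, and the `<family>-<name>.bin` file naming are all hypothetical.

```python
from pathlib import Path
from typing import Callable


def ensure_model(models_dir: str, family: str, name: str,
                 download: Callable[[Path], None]) -> Path:
    """Return the local model path, downloading only if the file is missing."""
    # Hypothetical naming scheme; the real app may name files differently.
    target = Path(models_dir) / f"{family}-{name}.bin"
    if not target.exists():
        # First run: create the models dir and fetch the model file.
        target.parent.mkdir(parents=True, exist_ok=True)
        download(target)
    # Later runs: the existing local file is reused as-is.
    return target
```

Passing the downloader in as a callback keeps the caching logic independent of where the model actually comes from.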

Config

To configure the application, edit config.yaml, which is mounted into the Docker container. The config file looks like this:

models_dir: /models     # dir inside the container
model_family: alpaca
model_name: 7b
setup_params:
  key: value
model_params:
  key: value

setup_params and model_params are model-specific; see below for model-specific configs.

You can override any of the configs above using environment variables prefixed with LLM_API_, for example: LLM_API_MODELS_DIR=/models
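
As a rough sketch of how such prefixed overrides might be resolved, the snippet below merges environment variables over a config dict. It assumes the variable name is the prefix plus the upper-cased config key (as in LLM_API_MODELS_DIR) and only handles top-level keys; the function `apply_env_overrides` is hypothetical, and the application's real lookup may differ, especially for nested values such as setup_params.

```python
import os
from typing import Mapping

PREFIX = "LLM_API_"


def apply_env_overrides(config: dict,
                        environ: Mapping[str, str] = os.environ) -> dict:
    """Return a copy of config with top-level keys overridden from environ."""
    merged = dict(config)
    for key in config:
        # Assumed mapping: models_dir -> LLM_API_MODELS_DIR
        env_name = PREFIX + key.upper()
        if env_name in environ:
            merged[key] = environ[env_name]
    return merged


config = {"models_dir": "/models", "model_family": "alpaca", "model_name": "7b"}
overridden = apply_env_overrides(config, {"LLM_API_MODEL_NAME": "13b"})
print(overridden["model_name"])
```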

Endpoints

In general, all LLMs expose the same generalized set of endpoints:

POST /generate
{
    "prompt": "What is the capital of France?",
    "params": {
        ...
    }
}
POST /agenerate
{
    "prompt": "What is the capital of France?",
    "params": {
        ...
    }
}
POST /embeddings
{
    "text": "What is the capital of France?"
}
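
For callers who prefer Python over raw HTTP, the request bodies above could be built with the standard library along these lines. This is a minimal sketch: the base URL assumes the port published by the docker run command, the helper `build_request` is hypothetical, `params` is left empty because its contents are model-specific, and the response format is not documented here, so the commented-out call just prints the raw body.

```python
import json
import urllib.request

# Assumption: the API is reachable on the port published by docker run.
BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the API endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


generate_req = build_request(
    "/generate",
    {"prompt": "What is the capital of France?", "params": {}},
)
embeddings_req = build_request(
    "/embeddings",
    {"text": "What is the capital of France?"},
)

# To actually send a request (requires the container to be running):
# with urllib.request.urlopen(generate_req) as resp:
#     print(resp.read().decode("utf-8"))
```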

Llama / Alpaca

You can configure the model usage in a local config.yaml file. Here is an example:

models_dir: /models     # dir inside the container
model_family: alpaca
model_name: 7b
setup_params:
  repo_id: user/repo_id
  filename: ggml-model-q4_0.bin
  convert: false
  migrate: false
model_params:
  ctx_size: 2000
  seed: -1
  n_threads: 8
  n_batch: 2048
  n_parts: -1
  last_n_tokens_size: 16

Set repo_id and filename to point to a Hugging Face repo where a model is hosted, and let the application download it for you.

The following example shows the different params you can send to the Alpaca generate and agenerate endpoints:

POST /generate

curl --location 'localhost:8000/generate' \
--header 'Content-Type: application/json' \
--data '{
    "prompt": "What is the capital of France?",
    "params": {
        "n_predict": 300,
        "temp": 0.1,
        "top_k": 40,
        "top_p": 0.95,
        "stop": ["\Q"],
        "repeat_penalty": 1.3
    }
}'

Credits

Credits go to:

  • llama.cpp for making it possible to run Llama and Alpaca models on CPU.
  • serge for providing an example of how to build an API using FastAPI.
  • llama-cpp-python for the Python bindings for llama.cpp.
