the api seems similar to vllm. would love to see a latency / through

benchmark against vllm? about aphrodite-engine HOT 3 CLOSED

pygmalionai commented on June 11, 2024

benchmark against vllm?

from aphrodite-engine.

Comments (3)

AlpinDale commented on June 11, 2024

Hi! Thanks for your interest.

Yup, we do make heavy use of vLLM so it's quite similar. Aphrodite is designed more to be used in-house for our future website, so our main focus won't really be supporting a user-base with this.

I'll keep it in mind to run a few benchmarks for the attention and get us some metrics when it's done.

ishaan-jaff commented on June 11, 2024

Hi @AlpinDale, I'm the maintainer of LiteLLM, which lets you maximize throughput by load balancing between multiple LLM endpoints.
I thought it might be useful for you, and I'd love feedback if it isn't.

Here's the quick start for using the LiteLLM load balancer (works with 100+ LLMs):
doc: https://docs.litellm.ai/docs/simple_proxy#model-alias

Step 1: Create a config.yaml

model_list:
  - model_name: openhermes
    litellm_params:
      model: openhermes
      temperature: 0.6
      max_tokens: 400
      custom_llm_provider: "openai"
      api_base: http://192.168.1.23:8000/v1
  - model_name: openhermes
    litellm_params:
      model: openhermes
      custom_llm_provider: "openai"
      api_base: http://192.168.1.23:8001/v1
  - model_name: openhermes
    litellm_params:
      model: openhermes
      custom_llm_provider: "openai"
      frequency_penalty: 0.6
      api_base: http://192.168.1.23:8010/v1
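
As a rough illustration of what listing three deployments under the same `model_name` implies, the sketch below cycles requests across the three `api_base` values in round-robin order. This is only a conceptual model: LiteLLM's actual routing strategy may differ, and `make_picker` is a hypothetical helper, not part of LiteLLM.

```python
from itertools import cycle
from typing import Callable, Sequence

# The three api_base values from the config above.
ENDPOINTS = [
    "http://192.168.1.23:8000/v1",
    "http://192.168.1.23:8001/v1",
    "http://192.168.1.23:8010/v1",
]


def make_picker(endpoints: Sequence[str]) -> Callable[[], str]:
    """Return a function that yields endpoints in round-robin order.

    Hypothetical helper for illustration; not LiteLLM's router.
    """
    rotation = cycle(endpoints)

    def pick() -> str:
        return next(rotation)

    return pick
```

Calling `pick = make_picker(ENDPOINTS)` and then `pick()` four times returns the three endpoints in order and then wraps back to the first.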

Step 2: Start the LiteLLM proxy:

litellm --config /path/to/config.yaml

Step 3: Make a request to the LiteLLM proxy:

curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "openhermes",
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'
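
For anyone who prefers Python over curl, here is a minimal sketch of the same request using only the standard library. The URL and model name are taken from the curl command; `build_request` and `send` are hypothetical helper names, and the response shape is assumed to be JSON, as for an OpenAI-compatible endpoint.

```python
import json
import urllib.request

# Same proxy URL as in the curl command above.
PROXY_URL = "http://0.0.0.0:8000/chat/completions"


def build_request(prompt: str, model: str = "openhermes",
                  url: str = PROXY_URL) -> urllib.request.Request:
    """Build a POST request carrying a single-turn chat payload."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and parse the body as JSON (assumed response format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the proxy running, `send(build_request("what llm are you"))` should return the parsed response. Note that `json.dumps` never emits a trailing comma, which many JSON parsers reject when it appears in a hand-written body.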

RonanKMcGovern commented on June 11, 2024

> Hi! Thanks for your interest.
>
> Yup, we do make heavy use of vLLM so it's quite similar. Aphrodite is designed more to be used in-house for our future website, so our main focus won't really be supporting a user-base with this.
>
> I'll keep it in mind to run a few benchmarks for the attention and get us some metrics when it's done.

Any chance there are some benchmarks vs vLLM?

Seems you're implying the main benefit is the fp8 attention option? (But no flash decoding? Unlike TGI?)
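
For anyone wanting to run a rough comparison themselves, the following is a minimal sketch of a sequential latency and throughput measurement. It is not an official benchmark for either engine: `benchmark` is a hypothetical helper, and `send_one` stands for any callable you supply that issues one request to the server under test. Because requests are sent one at a time, the figure it reports is requests per second without concurrency; measuring batched throughput would need concurrent clients.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(send_one: Callable[[], object],
              n_requests: int = 20) -> Dict[str, float]:
    """Time n_requests sequential calls to send_one and summarize them."""
    if n_requests < 1:
        raise ValueError("n_requests must be at least 1")

    latencies = []
    start = time.perf_counter()
    for _ in range(n_requests):
        t0 = time.perf_counter()
        send_one()  # one request against the server being measured
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    return {
        "requests": float(n_requests),
        "mean_latency_s": statistics.mean(latencies),
        "median_latency_s": statistics.median(latencies),
        # Guard against a zero elapsed time on very coarse clocks.
        "requests_per_s": n_requests / elapsed if elapsed > 0 else float("inf"),
    }
```

Running the same `send_one` workload against each server, with identical prompts and sampling settings, gives numbers that are at least comparable with each other.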
