🚀 The feature, motivation and pitch vLLM provides some metrics on

[Feature]: Additional metrics to enable better autoscaling / load balancing of vLLM servers in Kubernetes

achandrasekar commented on August 24, 2024 8

[Feature]: Additional metrics to enable better autoscaling / load balancing of vLLM servers in Kubernetes

Comments (6)

robertgshaw2-neuralmagic commented on August 24, 2024 2

@ywang96 a couple of these are implemented in a branch. Will triage and help get them merged.

simon-mo commented on August 24, 2024 1

They do look useful to me! Looking forward to the contribution! Also adding @ywang96 for awareness.

ywang96 commented on August 24, 2024 1

This is great - thank you @achandrasekar!

Note that a few metrics in the list (e.g., request_input_length, request_output_length) are already supported by vLLM, so it would be great to consolidate them in your upcoming contribution. I do think we're currently missing a metric for queue time, which is very important for deciding when to scale up inference services.
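
To make the queue-time idea concrete, here is a minimal sketch of how time spent waiting before scheduling could be recorded in Prometheus-style buckets. It is not vLLM's implementation: the class `QueueTimeHistogram`, its method names, and the bucket boundaries are all hypothetical, chosen only for illustration.

```python
import bisect
import time


class QueueTimeHistogram:
    """Hypothetical helper: records how long requests wait before scheduling."""

    def __init__(self, buckets=(0.01, 0.1, 0.5, 1.0, 5.0, 30.0)):
        # Upper bounds in seconds (assumed values, tune for your workload).
        self.buckets = list(buckets)
        # One counter per bucket, plus a final overflow (+Inf) slot.
        self.counts = [0] * (len(self.buckets) + 1)
        self.total_seconds = 0.0
        self._arrivals = {}

    def on_arrival(self, request_id, now=None):
        """Remember when a request entered the waiting queue."""
        self._arrivals[request_id] = time.monotonic() if now is None else now

    def on_scheduled(self, request_id, now=None):
        """Record the queue time once the request starts running."""
        now = time.monotonic() if now is None else now
        waited = now - self._arrivals.pop(request_id)
        # bisect_left finds the first bucket whose upper bound is >= waited.
        self.counts[bisect.bisect_left(self.buckets, waited)] += 1
        self.total_seconds += waited
        return waited

    def pending(self):
        """Number of requests still waiting, i.e. the current queue size."""
        return len(self._arrivals)
```

An autoscaler could then, for example, scale up when `pending()` stays high or when most observations land in the upper buckets; which signal works best likely depends on the workload.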

achandrasekar commented on August 24, 2024 1

@gyliu513 yes, thanks for bringing this up. We discussed this in the last LLM Semantic Conventions meeting. I've created an issue (open-telemetry/semantic-conventions#1102) and an initial PR (open-telemetry/semantic-conventions#1103) to create the server metrics. Let's collaborate there. We can also discuss it in the next semconv meeting.

davidgxue commented on August 24, 2024

+1 would be great to have these!!!

gyliu513 commented on August 24, 2024

@achandrasekar would you mind bringing this to the OTel semantic conventions team meeting to discuss there as well? We are working on the LLM Semantic Conventions, and this is an area we do not cover yet.

A related issue in the OTel semantic conventions repo: open-telemetry/semantic-conventions#1079

Here is the meeting info https://docs.google.com/document/d/1EKIeDgBGXQPGehUigIRLwAUpRGa7-1kXB736EaYuJ2M/edit#heading=h.ylazl6464n0c
