Comments (6)
@ywang96 a couple of these are already implemented in a branch. I'll triage and help get them merged.
from vllm.
They do look useful to me! Looking forward to the contribution! Also adding @ywang96 for awareness.
This is great - thank you @achandrasekar!
Note that a few metrics in the list (e.g., request_input_length, request_output_length) are already supported by vLLM, so it would be great to consolidate them in your upcoming contribution. I do think we're currently missing a metric for queue time, which is very important for deciding when to scale up inference services.
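To make the queue-time idea concrete, here is a minimal sketch, assuming "queue time" means the delay between a request arriving and the engine starting to process it. The class, method names, and bucket boundaries are hypothetical illustrations and are not part of vLLM's API.

```python
import time


class QueueTimeTracker:
    """Hypothetical helper (not a vLLM class) that records per-request queue time."""

    def __init__(self, buckets=(0.1, 0.5, 1.0, 5.0)):
        # Upper bounds in seconds; the final slot in `counts` is the overflow bucket.
        self.buckets = tuple(sorted(buckets))
        # Non-cumulative counts: each observation lands in exactly one slot.
        self.counts = [0] * (len(self.buckets) + 1)
        self.total_seconds = 0.0
        self._arrivals = {}

    def on_arrival(self, request_id, now=None):
        # Remember when the request entered the waiting queue.
        self._arrivals[request_id] = time.monotonic() if now is None else now

    def on_scheduled(self, request_id, now=None):
        # Called when the request leaves the queue and starts running.
        now = time.monotonic() if now is None else now
        waited = now - self._arrivals.pop(request_id)
        self.total_seconds += waited
        for i, upper in enumerate(self.buckets):
            if waited <= upper:
                self.counts[i] += 1
                break
        else:
            self.counts[-1] += 1
        return waited
```

An autoscaler could watch how many requests fall into the slower buckets and scale up when that share grows; the thresholds would have to be tuned per deployment.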
@gyliu513 yes, thanks for bringing this up. We discussed this in the last LLM Semantic Conventions meeting. I've created an issue (open-telemetry/semantic-conventions#1102) and an initial PR (open-telemetry/semantic-conventions#1103) to create the server metrics. Let's collaborate there. We can also discuss it in the next semconv meeting.
+1 would be great to have these!!!
@achandrasekar do you mind bringing this to the OTel semantic conventions team meeting and discussing it there as well? We are working on the LLM Semantic Conventions, and this is an area we do not cover yet.
A related issue in the OTel semantic conventions: open-telemetry/semantic-conventions#1079
Here is the meeting info: https://docs.google.com/document/d/1EKIeDgBGXQPGehUigIRLwAUpRGa7-1kXB736EaYuJ2M/edit#heading=h.ylazl6464n0c