Comments (5)
It feels like these are all the same problem: when the content is very long and the concurrency is high, the issue gets triggered.
Please provide the code for the client that can be used for reproduction, thanks.
I will do my best to get a reliable reproduction of this detokenize issue. Possibly not related, since there don't seem to be any tokenizer issues in the logs, but maybe worth referencing since it has the same "an illegal memory access" message:
@zhyncs It can be reproduced like this:
wrk -t10 -c100 -d30s -s 01_post.lua --latency http://0.0.0.0:8081/v1/chat/completions
01_post.lua file:
wrk.method = "POST"
wrk.body = [[
{
  "model": "yi",
  "temperature": 0.7,
  "messages": [
    {
      "role": "user",
      "content": "worker_rlimit_nofile 是一个在 Nginx 或其他基于 Unix-like 系统的 Web 服务器配置中的指令,用于设置工作进程可以打开的最大文件描述符数。这个设置对于服务器性能有重要影响,因为它决定了服务器可以同时处理多少个并发连接。在这里,655350 是设置的具体数值。这个数值设置的相当高,意味着服务器配置了非常高的并发处理能力。在 Unix-like 系统中,文件描述符用于访问所有类型的文件,包括网络套接字。因此,增加这个限制可以让服务器处理更多的并发请求,特别是对于需要处理大量静态文件或者提供大量 Web 服务的场景。设置这个值通常需要服务器管理员有适当的权限,并且可能需要在系统级别进行相应的调整,因为操作系统也有自己的限制。在实际应用中,服务器管理员需要根据服务器的硬件资源、预期的负载以及实际的应用场景来合理设置这个值,以确保服务器既能充分利用资源,又不会因为超过系统限制而导致性能问题。"
    }
  ],
  "stream": false,
  "max_tokens": 0
}]]
wrk.headers["Content-Type"] = "application/json"
The model was deployed with:
CUDA_VISIBLE_DEVICES=0 lmdeploy serve api_server ./Yi-1.5-9B-Chat --server-port 8081 --model-name yi --cache-max-entry-count 0.9 --tp 1 --session-len 4096 --enable-prefix-caching
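For reference, the same kind of load can also be driven from a small Python client instead of wrk. The sketch below is a rough, unverified equivalent of the wrk setup above: it assumes the same OpenAI-compatible endpoint, model name "yi", and payload fields, abbreviates the long prompt with a placeholder, and picks an arbitrary total request count.
# Hypothetical Python equivalent of the wrk load above (not from the original thread):
# 100 worker threads posting the same long-content request to the endpoint.
# Endpoint, model name, and payload fields mirror 01_post.lua; adjust as needed.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://0.0.0.0:8081/v1/chat/completions"
LONG_PROMPT = "worker_rlimit_nofile 是一个在 Nginx ... " * 50  # placeholder; use a real long prompt
PAYLOAD = json.dumps({
    "model": "yi",
    "temperature": 0.7,
    "messages": [{"role": "user", "content": LONG_PROMPT}],
    "stream": False,
    "max_tokens": 0,
}).encode("utf-8")

def send_one(_: int) -> int:
    # Send one chat completion request and return the HTTP status code.
    req = urllib.request.Request(
        URL,
        data=PAYLOAD,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        resp.read()
        return resp.status

if __name__ == "__main__":
    # Roughly matches wrk -c100: 100 concurrent connections, 1000 requests total.
    with ThreadPoolExecutor(max_workers=100) as pool:
        statuses = list(pool.map(send_one, range(1000)))
    print({code: statuses.count(code) for code in set(statuses)})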
@lzhangzz could you please investigate this issue?
Related Issues (20)
- I want to run profile_throughput.py using the smooth_quant model. Why did an error occur? HOT 3
- [Bug] Deploying internVL2-40B-AWQ with lmdeploy: the container has a triton environment, but the triton environment check reports an error
- [Bug] Qwen-vl and its LoRA were deployed via lmdeploy, but on inspection the LoRA was not loaded successfully HOT 3
- [Bug] Lmdeploy LLM Llama3 inference results are inconsistent between a single 4090 and dual 4090s
- [Feature] multi-node training HOT 2
- [Bug] LMDeploy docker image with finetuned InternVL model doesn't work HOT 1
- [Bug] lmdeploy gets stuck and cannot accept any requests HOT 3
- No inference performance improvement after smooth quantization HOT 1
- [Feature] Add `logits_processor` to `GenerationConfig` HOT 3
- CPU offload when InternVL2-40B inference using lmdeploy.pipeline HOT 1
- [Docs] Image preprocessing and forward inference process for llava-llama3 HOT 2
- [Bug] After AWQ quantization, internvl2-2b shows basically no inference speedup and even loses some accuracy HOT 4
- [Bug] lmdeploy deployment error: API call is not supported in the installed CUDA driver HOT 5
- [Bug] Deploying multiple models on a single GPU HOT 3
- Question about implementing the LRU policy
- [Feature] Support InternVL2-1B with the Turbomind Engine?
- Can quantization of InternVL2-8B be supported, and is there any related documentation HOT 1
- [Bug] lmdeploy - ERROR - run out of tokens. session_id=1 HOT 1
- Scale out LLM model deployment across GPUs on different machines HOT 1
- [Feature] Could support for SM60-level NVIDIA GPUs be added in a new version