Comments (5)
It feels like these are all the same problem: when the content is very long and the concurrency is high, the issue gets triggered.
Please provide the code for the client that can be used for reproduction, thanks.
I will do my best to get a reliable reproduction of this detokenize issue. Possibly not related, since there don't seem to be any tokenizer issues in the logs, but maybe worth referencing since it has the same "an illegal memory access" message:
@zhyncs It can be reproduced like this:
wrk -t10 -c100 -d30s -s 01_post.lua --latency http://0.0.0.0:8081/v1/chat/completions
01_post.lua file:
wrk.method = "POST"
wrk.body = [[
{
  "model": "yi",
  "temperature": 0.7,
  "messages": [
    {
      "role": "user",
      "content": "worker_rlimit_nofile 是一个在 Nginx 或其他基于 Unix-like 系统的 Web 服务器配置中的指令,用于设置工作进程可以打开的最大文件描述符数。这个设置对于服务器性能有重要影响,因为它决定了服务器可以同时处理多少个并发连接。在这里,655350 是设置的具体数值。这个数值设置的相当高,意味着服务器配置了非常高的并发处理能力。在 Unix-like 系统中,文件描述符用于访问所有类型的文件,包括网络套接字。因此,增加这个限制可以让服务器处理更多的并发请求,特别是对于需要处理大量静态文件或者提供大量 Web 服务的场景。设置这个值通常需要服务器管理员有适当的权限,并且可能需要在系统级别进行相应的调整,因为操作系统也有自己的限制。在实际应用中,服务器管理员需要根据服务器的硬件资源、预期的负载以及实际的应用场景来合理设置这个值,以确保服务器既能充分利用资源,又不会因为超过系统限制而导致性能问题。"
    }
  ],
  "stream": false,
  "max_tokens": 0
}]]
wrk.headers["Content-Type"] = "application/json"
The model was deployed with:
CUDA_VISIBLE_DEVICES=0 lmdeploy serve api_server ./Yi-1.5-9B-Chat --server-port 8081 --model-name yi --cache-max-entry-count 0.9 --tp 1 --session-len 4096 --enable-prefix-caching
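For reference, the same kind of load can also be driven from a small Python client instead of wrk. The sketch below is a rough, unverified equivalent of the wrk setup above: it assumes the same OpenAI-compatible endpoint, model name "yi", and payload fields, abbreviates the long prompt with a placeholder, and picks an arbitrary total request count.
# Hypothetical Python equivalent of the wrk load above (not from the original thread):
# 100 worker threads posting the same long-content request to the endpoint.
# Endpoint, model name, and payload fields mirror 01_post.lua; adjust as needed.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://0.0.0.0:8081/v1/chat/completions"
LONG_PROMPT = "worker_rlimit_nofile 是一个在 Nginx ... " * 50  # placeholder; use a real long prompt
PAYLOAD = json.dumps({
    "model": "yi",
    "temperature": 0.7,
    "messages": [{"role": "user", "content": LONG_PROMPT}],
    "stream": False,
    "max_tokens": 0,
}).encode("utf-8")

def send_one(_: int) -> int:
    # Send one chat completion request and return the HTTP status code.
    req = urllib.request.Request(
        URL,
        data=PAYLOAD,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        resp.read()
        return resp.status

if __name__ == "__main__":
    # Roughly matches wrk -c100: 100 concurrent connections, 1000 requests total.
    with ThreadPoolExecutor(max_workers=100) as pool:
        statuses = list(pool.map(send_one, range(1000)))
    print({code: statuses.count(code) for code in set(statuses)})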
@lzhangzz could you please investigate this issue?
Related Issues (20)
- I want to run profile_throughput.py using the smooth_quant model. Why did an error occur? HOT 3
- [Bug] Deploying internVL2-40B-AWQ with lmdeploy: the container has a triton environment, but the triton environment check reports an error
- [Bug] Qwen-vl and its LoRA were deployed via lmdeploy, but on inspection the LoRA was not loaded successfully HOT 3
- [Bug] Lmdeploy LLM Llama3 inference results are inconsistent between a single 4090 and dual 4090s
- [Feature] multi-node training HOT 2
- [Bug] LMDeploy docker image with finetuned InternVL model doesn't work HOT 1
- [Bug] lmdeploy gets stuck and cannot accept any requests HOT 3
- No inference performance improvement after smooth quantization HOT 1
- [Feature] Add `logits_processor` to `GenerationConfig` HOT 3
- CPU offload when InternVL2-40B inference using lmdeploy.pipeline HOT 1
- [Docs] Image preprocessing and forward inference process for llava-llama3 HOT 2
- [Bug] After AWQ quantization, internvl2-2b shows basically no inference speedup and even loses some accuracy HOT 4
- [Bug] lmdeploy deployment error: API call is not supported in the installed CUDA driver HOT 5
- [Bug] Deploying multiple models on a single GPU HOT 3
- Question about implementing the LRU policy
- [Feature] Support InternVL2-1B with the Turbomind Engine?
- Can quantization of InternVL2-8B be supported, and is there any related documentation HOT 1
- [Bug] lmdeploy - ERROR - run out of tokens. session_id=1 HOT 1
- Scale out LLM model deployment across GPUs on different machines HOT 1
- [Feature] Could support for SM60-level NVIDIA GPUs be added in a new version