GPU usage increasing continually while profile_serving benchmark about lmdeploy HOT 4 CLOSED

internlm commented on July 17, 2024

GPU usage increases continually while running the profile_serving benchmark.

from lmdeploy.

Comments (4)

tpoisonooo commented on July 17, 2024

The fp16 weights require 16 GB, and each concurrent request requires 1 GB to hold its kv_cache. As the concurrency increases, a total of 30+ GB is normal.

Try int8 quantization to save memory.
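
The arithmetic in this reply can be sketched roughly as follows. This is only a back-of-the-envelope model, not how lmdeploy accounts for memory: the function name `estimate_gpu_memory_gb` is hypothetical, and the defaults (16 GB of weights, 1 GB of kv_cache per concurrent request) are taken from the figures quoted above.

```python
def estimate_gpu_memory_gb(concurrency: int,
                           weight_gb: float = 16.0,
                           kv_cache_gb_per_request: float = 1.0) -> float:
    """Rough total GPU memory: fixed weight cost plus kv_cache per concurrent request.

    Hypothetical helper for illustration; the defaults are the numbers
    quoted in the reply, not values measured from lmdeploy.
    """
    if concurrency < 0:
        raise ValueError("concurrency must be non-negative")
    return weight_gb + concurrency * kv_cache_gb_per_request


if __name__ == "__main__":
    # Memory grows linearly with concurrency under this simple model.
    for c in (1, 4, 8, 16):
        print(f"concurrency={c:2d} -> ~{estimate_gpu_memory_gb(c):.0f} GB")
```

Under these assumptions, usage passes 30 GB once concurrency exceeds 14, which matches the growth seen during the benchmark. Lowering either parameter shows how a memory-saving option would shift the total; the actual savings depend on what is quantized.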

tpoisonooo commented on July 17, 2024

https://github.com/InternLM/lmdeploy/blob/main/docs/en/quantization.md

github-actions commented on July 17, 2024

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions commented on July 17, 2024

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.
