Comments (3)
This is not a bug. We pre-allocate memory for the KV cache based on the following formula:

- `(total memory * gpu_util) - weights - maximum_activation_size`

The `maximum_activation_size` is measured from the peak memory of a profile run with your `model_max_length`. So you are seeing memory go up during the profiling run and then drop back down.
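The budget formula above can be sketched in a few lines of Python. This is an illustrative sketch, not vLLM's actual code: the function name and all of the example numbers are assumptions.

```python
def kv_cache_budget_bytes(total_memory: float, gpu_util: float,
                          weights: float, max_activation: float) -> float:
    """Sketch of: (total memory * gpu_util) - weights - maximum_activation_size."""
    return total_memory * gpu_util - weights - max_activation

# Hypothetical example: 80 GiB GPU, gpu_memory_utilization=0.9,
# 14 GiB of weights, 6 GiB peak activation measured during the profile run.
GiB = 1024 ** 3
budget = kv_cache_budget_bytes(80 * GiB, 0.9, 14 * GiB, 6 * GiB)
print(f"{budget / GiB:.1f} GiB left for the KV cache")  # 52.0 GiB
```

Everything left of the budget after weights and peak activations are accounted for is pre-allocated for KV cache blocks up front, which is why memory usage settles at a high plateau after the profiling spike.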
> This is not a bug. We pre-allocate memory for the KV cache based on the following formula: `(total memory * gpu_util) - weights - maximum_activation_size` …

Small context lengths may consume more memory than large context lengths. What is the reason?
> Small context lengths may consume more memory than large context lengths. What is the reason?

We allocate memory for the KV cache and weights based on the maximum potential activation size:

- longer context ==> larger maximum activation size ==> less space for the KV cache ==> less memory allocated
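That chain can be made concrete with a toy calculation. All the numbers below and the linear activation model are assumptions for illustration, not measurements from vLLM:

```python
GiB = 1024 ** 3
TOTAL = 80 * GiB   # assumed total GPU memory
GPU_UTIL = 0.9     # gpu_memory_utilization
WEIGHTS = 14 * GiB # assumed model weight footprint

def peak_activation(max_model_len: int, bytes_per_token: int = 1024 ** 2) -> int:
    # Hypothetical linear model: the profile run's peak activation
    # memory grows with the context length being profiled.
    return max_model_len * bytes_per_token

for max_model_len in (2_048, 32_768, 131_072):
    kv_budget = TOTAL * GPU_UTIL - WEIGHTS - peak_activation(max_model_len)
    status = ("OOM during profiling" if kv_budget <= 0
              else f"{kv_budget / GiB:.0f} GiB for KV cache")
    print(max_model_len, "->", status)
```

Under these toy numbers, a longer profiled context length leaves a smaller KV-cache budget, and a large enough context length leaves none at all, which matches the OutOfMemoryError in this issue's title.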
Related Issues (20)
- [Bug]: flashinfer backend bug HOT 1
- [RFC] Changes to CI workflow for PRs HOT 6
- [Bug]: Exception during inference HOT 1
- [Feature]: Phi-3 vision -- allow multiple images as Microsoft shows can be done HOT 1
- [Bug]: AsyncEngineDeadError: Background loop is stopped after invalid parameter in request
- [Bug]: Request never returns if temperature > 2
- [RFC]: Classifier-Free Guidance
- [Bug]: Model architectures ['NVEmbedModel'] are not supported for now
- [Bug]: Internal Server Error when hosting Alibaba-NLP/gte-Qwen2-7B-instruct HOT 1
- [Bug]: unhandled system error with NCCL on v0.5.0.post1 HOT 3
- [Usage]: Multi-LoRA questions
- [Bug]: Inconsistent Output from OPT-x models
- [Feature]: Support in distributed speculative inference
- [Bug]: Neuron offline inference example assertion error
- [Installation]: ValueError: Quantization method specified in the model config (gptq) does not match the quantization method specified in the `quantization` argument (gptq_marlin). HOT 4
- [Bug]: VLM same chat different image results in serving_chat.py:238 Error in loading image data: HOT 5
- [Bug]: OutOfMemoryError when loading a small model with a huge context length HOT 3
- [CI] [Flaky test] distributed/test_shm_broadcast.py is flaky HOT 3
- [Bug]: server error when hosting TheBloke/Llama-2-7B-Chat-GPTQ with chunked-prefill HOT 1
- [Misc]: CUDAGraph captured generation stuck with custom_all_reduce and tensor_parallel=2 HOT 2