Comments (3)
This error occurs in the profile_run used to determine memory usage. If I patch the code to ignore the error, and a few lines below patch the model length to make raise_if_cache_size_invalid happy, the model starts. It won't reach the full 1M context, but it does work with a 200k context on an 80GB GPU.
This will become more pressing for users of small GPUs as popular models push their context lengths beyond 8k. I'm happy to submit my monkey patches if there isn't already a plan to support large-context models.
from vllm.
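For reference, the ceiling those patches run into can be estimated from KV-cache arithmetic alone. This is a hedged back-of-the-envelope sketch; the layer/head/dim counts are illustrative assumptions for a generic GQA model, not taken from any particular config:

```python
def max_kv_cache_tokens(free_gpu_bytes, num_layers, num_kv_heads,
                        head_dim, dtype_bytes=2):
    """Rough upper bound on context length: each token stores one key and
    one value vector per layer per KV head in the chosen dtype."""
    bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return free_gpu_bytes // bytes_per_token

# Assumed shapes: 32 layers, 8 KV heads (GQA), head_dim 128, fp16 (2 bytes).
# With ~40 GB left over for KV cache on an 80 GB card:
tokens = max_kv_cache_tokens(40 * 1024**3, 32, 8, 128)
# → 327680 tokens, i.e. well short of 1M but comfortably above 200k,
# which is consistent with the 200k-on-80GB observation above.
```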
You can manually set --max-model-len to reduce the context length. Not sure whether it's a good idea to automatically limit the context length based on available memory. @simon-mo any thoughts?
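A minimal sketch of what an automatic cap could look like; this is purely illustrative, and choose_max_model_len plus the memory figures are hypothetical, not an existing vLLM API:

```python
def choose_max_model_len(model_max_len, free_kv_bytes, kv_bytes_per_token):
    """Clamp the model's advertised context length to the number of tokens
    the KV cache can actually hold in the remaining GPU memory."""
    fits = free_kv_bytes // kv_bytes_per_token
    return min(model_max_len, fits)

# A 1M-context model on a card whose leftover memory (~40 GB) holds
# roughly 327k tokens of KV cache at 128 KiB per token:
capped = choose_max_model_len(1_000_000, 40 * 1024**3, 131072)
# → 327680, so the engine would start with a reduced context
# instead of crashing during profiling.
```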
Agreed that a purely automatic setting may give folks the wrong impression that they can use the full context of the model even if their hardware won't allow it. One alternative is --max-model-len max, which would start the model no matter what and report the actual max context in the logs.
Right now someone must start vLLM, see the crash, parse the max context size out of the log, and set that with --max-model-len. But that only works if profile_run() doesn't OOM with the exception in the OP; in that case the user must guess at the max model len (the log message with the actual max is printed later, and depends on profile_run() succeeding).
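The --max-model-len max proposal could be handled as a sentinel value resolved before the existing cache-size validation runs. A hypothetical sketch (vLLM has no such flag value today; resolve_max_model_len and profiled_fit are made-up names):

```python
def resolve_max_model_len(cli_value, model_max_len, profiled_fit):
    """Treat the literal string 'max' as 'whatever the hardware allows',
    where profiled_fit is the token count the memory profiler says fits."""
    if cli_value == "max":
        return min(model_max_len, profiled_fit)
    return int(cli_value)

# User asks for the hardware maximum on a 1M-context model that only
# fits ~200k tokens of KV cache:
resolve_max_model_len("max", 1_000_000, 200_000)   # → 200000
# Explicit values keep today's behavior:
resolve_max_model_len("8192", 1_000_000, 200_000)  # → 8192
```

The sentinel would also sidestep the chicken-and-egg problem above, since the resolved value is only known after profile_run() succeeds.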
Related Issues (20)
- To make discussion easier, a multimodal LLM chat group has been created; everyone is welcome to join and learn together~ HOT 2
- [Usage]: Why are the first few characters of streaming output set to empty strings? Can't the model's generation be output directly? HOT 1
- [Bug]: Compiling FSM index high memory && subprocess OOM
- [Usage]: Does vllm support dynamic quantization
- [Feature]: support voice llm like cosyvoice HOT 1
- [Bug]: Extra body don't work when response_format is also sent for serving. HOT 6
- [Feature]: Small Model Large Latency Compared to SGLang and TensorRT-LLM HOT 1
- [Bug]: `ops.scaled_fp8_quant` returns wrong shape when input shape is () HOT 1
- [Bug]: LLama3 LoRA load failed HOT 1
- [Bug]:`vllm server` will get some error and `python3 -m vllm.entrypoints.openai.api_server` is correct HOT 1
- [Bug]: internvl2-8b answers questions in an infinite loop HOT 1
- [Bug]: internvl2-8b infinite loop on questions HOT 2
- [Feature]: Why vllm cli not provide a config arg? HOT 2
- Create speculative decode dynamic parallel strategy
- [Bug]: CUDA out of memory for llama3.1 70gb gptq, while in llama3 70gb gptq doesn't HOT 1
- [Feature]: continuous batching for vllm.LLM HOT 2
- [Bug]: Using LLM Engine to infer the MiniCPM-V-2_6 model, the result is wrong HOT 1
- [Bug]: vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already. HOT 1
- [Bug]: `gemma-2-27b-it-GGUF`: `Architecture gemma2 not supported` HOT 5
- [RFC]: Encoder/decoder models & feature compatibility