Comments (3)
Something like this:

model = LLM(..., enable_chunked_prefill=True, max_num_batched_tokens=512, gpu_memory_utilization=0.9)

Try smaller values of gpu_memory_utilization and/or max_num_batched_tokens if you still see OOM.
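For reference, a minimal runnable sketch of that configuration (the model name and prompt below are placeholders, not from this thread):

```python
from vllm import LLM, SamplingParams

# Chunked prefill splits a long prompt's prefill across several
# scheduler steps instead of processing it in one go, which lowers
# peak memory during prefill.
llm = LLM(
    model="facebook/opt-125m",    # placeholder; substitute your own model
    enable_chunked_prefill=True,
    max_num_batched_tokens=512,   # upper bound on tokens processed per step
    gpu_memory_utilization=0.9,   # fraction of GPU memory vLLM may claim
)

outputs = llm.generate(
    ["A very long prompt ..."],
    SamplingParams(max_tokens=16),
)
print(outputs[0].outputs[0].text)
```
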
There is a pending PR trying to address this problem: #5355.
Meanwhile, you can try the chunked prefill feature, which worked for me as a workaround: https://docs.vllm.ai/en/latest/models/performance.html#chunked-prefill.

Would you mind sharing your code? Let's say I have n_prompts=10 and set prompt_logprobs=0; I'd ideally get the logprobs for all 10 prompts using a single call to model.generate(prompts=prompts, sampling_params=sampling_params).
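For what it's worth, here is a minimal sketch of how such a single batched call could look (the model name and prompts are placeholders; with prompt_logprobs=0, vLLM returns only the logprob of each actual prompt token, and the first entry is None because the first token has no preceding context):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model for illustration
prompts = [f"Example prompt {i}" for i in range(10)]  # n_prompts = 10

sampling_params = SamplingParams(
    max_tokens=1,       # we only want prompt scoring, not real generation
    prompt_logprobs=0,  # no extra top-k candidates, just the prompt tokens
)

outputs = llm.generate(prompts=prompts, sampling_params=sampling_params)
for out in outputs:
    # out.prompt_logprobs is aligned with out.prompt_token_ids; each
    # non-None entry maps token_id -> Logprob for that position.
    total = sum(
        entry[token_id].logprob
        for entry, token_id in zip(out.prompt_logprobs, out.prompt_token_ids)
        if entry is not None
    )
    print(f"sum of prompt logprobs: {total:.3f}")
```

Summing the per-token values gives a total prompt log-likelihood for each of the 10 prompts from one generate call.
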
Related Issues (20)
- [Bug]: call for stack trace for "Watchdog caught collective operation timeout"
- [Usage]: is there a way to turn off fast attention? a parameter maybe? my model deployment takes 30min to complete
- [Usage]: How to use --pipeline-parallel-size
- [Bug]: debugging guide for device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp"
- [Usage]: How to use beam search when request OpenAI Completions API
- [Bug]: load minicpm model, then get KeyError: 'lm_head.weight'
- [Bug][CI/Build]: Missing attribute 'nvmlDeviceGetHandleByIndex' in AMD tests
- [Bug]: Garbled Tokens appears in vllm generation result every time change to new LLM model (Qwen)
- [Bug]: ValidationError using langchain_community.llms.VLLM
- [Installation]: how to disable NCCL support on Jetson devices
- [Bug]: benchmark_serving.py cannot calculate Median TTFT correctly
- [Feature]: support Ascend 910B in the future
- [New Model]: Lora for Qwen/Qwen2-57B-A14B
- [Usage]: how to initiate the gemma2-27b with a 4-bit quantization?
- [Bug]: TypeError in benchmark_serving.py when using --model parameter
- [Gemma 2 27B]: Update docker hub image to support gemma-2-27B-it
- [Bug]: Loading LoRA is super slow when using tensor parallel
- [Feature]: Add readiness endpoint /ready and return /health earlier (vLLM on Kubernetes)
- [RFC]: Priority Scheduling
- [Feature]: Add support for interchangeable radix attention