[Bug]: chunked prefill scheduler uses up swap on many n>=2 requests

toslunar commented on July 3, 2024

[Bug]: chunked prefill scheduler uses up swap on many n>=2 requests

Comments (4)

simon-mo commented on July 3, 2024

@rkooo567 any possible causes?

toslunar commented on July 3, 2024

To make my suggestion clear,

-        # Schedule new prefills.
-        remaining_waiting, prefills = self._schedule_prefills(
-            self.waiting, budget, curr_loras, enable_chunking=True)
+        if len(remaining_swapped) == 0:
+            # Schedule new prefills.
+            remaining_waiting, prefills = self._schedule_prefills(
+                self.waiting, budget, curr_loras, enable_chunking=True)

on https://github.com/vllm-project/vllm/blob/v0.5.0.post1/vllm/core/scheduler.py#L871-L873 fixes the issue.

However, the condition if len(remaining_swapped) == 0 looks too strict and may hurt performance when most of the requests have n == best_of == 1. Something like "CPU KV cache usage < 50%" could be better.
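
The two gating policies mentioned here can be sketched in isolation as a small predicate. This is only an illustration under stated assumptions, not vLLM's scheduler code: `should_schedule_prefills`, `cpu_blocks_used`, `cpu_blocks_total`, and `max_cpu_usage` are hypothetical names, and the 50% threshold is just the value floated in this comment.

```python
from typing import Optional


def should_schedule_prefills(
    num_remaining_swapped: int,
    cpu_blocks_used: int = 0,
    cpu_blocks_total: int = 0,
    max_cpu_usage: Optional[float] = None,
) -> bool:
    """Decide whether new prefills may be admitted in this scheduling step.

    Hypothetical helper for illustration; not part of vLLM.
    """
    if max_cpu_usage is None or cpu_blocks_total <= 0:
        # Strict policy (also the fallback when no CPU swap space is known):
        # admit prefills only when no swapped request is left over.
        return num_remaining_swapped == 0
    # Relaxed policy: admit prefills while CPU KV cache usage is below
    # the threshold, even if some requests are still swapped out.
    return cpu_blocks_used / cpu_blocks_total < max_cpu_usage
```

With `max_cpu_usage=None` the helper behaves like the `len(remaining_swapped) == 0` check in the diff; passing `max_cpu_usage=0.5` gives the looser "CPU KV cache usage < 50%" variant.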

rkooo567 commented on July 3, 2024

I think n>1 creates more sequences, so it is more likely to use swap/preemption (because there's higher pressure on the KV cache). Checking remaining_swapped == 0 actually makes sense to me. We should prioritize swapped requests over prefills anyway (and if all swapped requests are scheduled, the number of remaining swapped requests becomes 0 anyway). @toslunar would you like to create a PR?
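
The ordering described here (swapped requests first, new prefills only once nothing remains swapped) can be illustrated with a toy scheduling step. This is a simplified sketch, not the vLLM scheduler: `plan_step` is a hypothetical function, and each request is reduced to a single integer block cost.

```python
from collections import deque
from typing import Deque, List, Tuple


def plan_step(
    swapped: Deque[int], waiting: Deque[int], free_blocks: int
) -> Tuple[List[int], List[int]]:
    """Toy scheduling step; each request is represented by its block cost.

    Returns the costs of the requests swapped in and of the prefills admitted.
    Both queues are mutated in place.
    """
    swapped_in: List[int] = []
    prefills: List[int] = []
    # Swapped requests go first, in FIFO order, for as long as they fit.
    while swapped and swapped[0] <= free_blocks:
        cost = swapped.popleft()
        free_blocks -= cost
        swapped_in.append(cost)
    # New prefills are admitted only if the swapped queue is now empty.
    if not swapped:
        while waiting and waiting[0] <= free_blocks:
            cost = waiting.popleft()
            free_blocks -= cost
            prefills.append(cost)
    return swapped_in, prefills
```

In this toy model, a step that cannot bring back every swapped request admits no new prefill, so new work does not compete with swapped requests for the freed blocks. Once the swapped queue drains, prefills resume in the same step.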

toslunar commented on July 3, 2024

Thank you @rkooo567. It makes sense.

I created a PR. The diff is slightly different from the one in my previous comment.
