Comments (3)
Your current environment
I set the vLLM attention backend to FlashInfer, but I get the error below:

INFO: 127.0.0.1:38616 - "POST /v1/completions HTTP/1.1" 200 OK
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
    message = await receive()
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 553, in receive
    await self.message_event.wait()
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/asyncio/locks.py", line 226, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f47bc642f40

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/routing.py", line 75, in app
    await response(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
    raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
🐛 Describe the bug
First, I set the vLLM attention backend to FlashInfer as shown below:
export VLLM_ATTENTION_BACKEND=FLASHINFER
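(As an optional sanity check, you can confirm the variable is actually visible in the shell that launches the server:)
echo $VLLM_ATTENTION_BACKEND   # should print FLASHINFER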
Second, I run the vLLM server as:
python -m vllm.entrypoints.openai.api_server --model LLM-Research/Meta-Llama-3-70B-Instruct --tensor-parallel-size 4 --trust-remote-code --max-model-len 8192 --port 30000 --swap-space 16 --disable-log-requests --enable-prefix-caching --enforce-eager
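For reference, the failing requests were plain /v1/completions calls against the port above. A minimal reproduction might look like the following; the prompt and max_tokens values are placeholders, and "stream": true is an assumption, inferred from the listen_for_disconnect frames in the traceback:
curl http://localhost:30000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "LLM-Research/Meta-Llama-3-70B-Instruct", "prompt": "Hello, my name is", "max_tokens": 32, "stream": true}'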
It turns out that FlashInfer does not support prefix caching; a workaround is sketched below.
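Assuming that incompatibility is indeed the cause, the workaround is simply to drop the --enable-prefix-caching flag, i.e. the same launch command minus that one option:
python -m vllm.entrypoints.openai.api_server --model LLM-Research/Meta-Llama-3-70B-Instruct --tensor-parallel-size 4 --trust-remote-code --max-model-len 8192 --port 30000 --swap-space 16 --disable-log-requests --enforce-eager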
problem solved
I encountered the same errors here. Can you share how you solved this? Many thanks.