Comments (3)
Your current environment
I set the vLLM attention backend to FlashInfer, but I get the error below:

INFO: 127.0.0.1:38616 - "POST /v1/completions HTTP/1.1" 200 OK
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
    message = await receive()
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 553, in receive
    await self.message_event.wait()
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/asyncio/locks.py", line 226, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f47bc642f40

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/routing.py", line 75, in app
    await response(scope, receive, send)
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/starlette/responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/mnt/harddisk/miniconda3/envs/flashinfer/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
    raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
🐛 Describe the bug
First, I set the vLLM attention backend to FlashInfer as shown below:
export VLLM_ATTENTION_BACKEND=FLASHINFER
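(As an optional sanity check, you can confirm the variable is actually visible in the shell that launches the server:)
echo $VLLM_ATTENTION_BACKEND   # should print FLASHINFER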
Second, I run the vLLM server as:
python -m vllm.entrypoints.openai.api_server --model LLM-Research/Meta-Llama-3-70B-Instruct --tensor-parallel-size 4 --trust-remote-code --max-model-len 8192 --port 30000 --swap-space 16 --disable-log-requests --enable-prefix-caching --enforce-eager
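For reference, the failing requests were plain /v1/completions calls against the port above. A minimal reproduction might look like the following; the prompt and max_tokens values are placeholders, and "stream": true is an assumption, inferred from the listen_for_disconnect frames in the traceback:
curl http://localhost:30000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "LLM-Research/Meta-Llama-3-70B-Instruct", "prompt": "Hello, my name is", "max_tokens": 32, "stream": true}'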
It turns out that FlashInfer does not support prefix caching; a workaround is sketched below.
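Assuming that incompatibility is indeed the cause, the workaround is simply to drop the --enable-prefix-caching flag, i.e. the same launch command minus that one option:
python -m vllm.entrypoints.openai.api_server --model LLM-Research/Meta-Llama-3-70B-Instruct --tensor-parallel-size 4 --trust-remote-code --max-model-len 8192 --port 30000 --swap-space 16 --disable-log-requests --enforce-eager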
problem solved
I encountered the same errors here. Can you share how you solved this? Many thanks.