Comments (5)
I have the same error message with 4*A100. I tried both the latest version and 0.4.2; neither of them works with Phi-3-medium-128k-instruct. My env is built from the Dockerfile.
@UCASZ I think the issue is with the number of GPUs. Once I changed tensor parallel size from 4 to 2, the errors all went away.
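For anyone hitting the same thing, a minimal sketch of the kind of launch that worked for me; the model ID matches this thread, but the prompt and sampling values are placeholders:

```python
# Sketch, not a verified repro: offline vLLM engine for Phi-3-medium
# with tensor parallelism reduced from 4 to 2.
from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3-medium-128k-instruct",
    trust_remote_code=True,   # the Phi-3 repos ship a custom config class
    tensor_parallel_size=2,   # 4 triggered the error on my setup
)
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```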
Phi-3-medium-* is supported, and I can confirm it's working for me with `vllm==0.4.2`. Medium has the same architecture as mini, `Phi3ForCausalLM`, vs. small, which for whatever reason has `Phi3SmallForCausalLM` as its architecture and isn't supported. And looking at your error, it doesn't look related to support for the model architecture.
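If you want to verify the architectures yourself, they are declared in each repo's config on the Hub. A quick sketch (repo IDs assumed from the HF hub; the Phi-3 repos needed `trust_remote_code` at the time):

```python
# Print the architecture each Phi-3 checkpoint declares in its config.
from transformers import AutoConfig

for repo in (
    "microsoft/Phi-3-mini-128k-instruct",
    "microsoft/Phi-3-medium-128k-instruct",
    "microsoft/Phi-3-small-128k-instruct",
):
    cfg = AutoConfig.from_pretrained(repo, trust_remote_code=True)
    print(repo, "->", cfg.architectures)
# Expected: mini and medium report ['Phi3ForCausalLM'] (supported),
# while small reports ['Phi3SmallForCausalLM'] (not supported by vLLM 0.4.2).
```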
Doesn't work for me. Using vllm-worker on Runpod.
Getting:
```
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
engine.py :105 2024-05-30 07:28:53,428 Error initializing vLLM engine: Failed to load the model config. If the model is a custom model not yet available in the HuggingFace transformers library, consider setting `trust_remote_code=True` in LLM or using the `--trust-remote-code` flag in the CLI.
Traceback (most recent call last):
  File "/vllm-installation/vllm/transformers_utils/config.py", line 30, in get_config
    config = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 931, in from_pretrained
    trust_remote_code = resolve_trust_remote_code(
  File "/usr/local/lib/python3.10/dist-packages/transformers/dynamic_module_utils.py", line 627, in resolve_trust_remote_code
    raise ValueError(
ValueError: Loading microsoft/Phi-3-medium-128k-instruct requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/src/handler.py", line 6, in <module>
    vllm_engine = vLLMEngine()
  File "/src/engine.py", line 24, in __init__
    self.llm = self._initialize_llm() if engine is None else engine
  File "/src/engine.py", line 106, in _initialize_llm
    raise e
  File "/src/engine.py", line 103, in _initialize_llm
    return AsyncLLMEngine.from_engine_args(AsyncEngineArgs(**self.config))
  File "/vllm-installation/vllm/engine/async_llm_engine.py", line 622, in from_engine_args
    engine_configs = engine_args.create_engine_configs()
  File "/vllm-installation/vllm/engine/arg_utils.py", line 287, in create_engine_configs
    model_config = ModelConfig(
  File "/vllm-installation/vllm/config.py", line 111, in __init__
```
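The `ValueError` in the first traceback is the real failure: transformers refuses to execute the custom configuration code shipped in the Phi-3 repo unless remote code is trusted. With vLLM directly, the fix is exactly what the log suggests, e.g. (minimal sketch):

```python
# Allow transformers to run the Phi-3 repo's custom config code,
# as the error message itself asks for.
from vllm import LLM

llm = LLM(
    model="microsoft/Phi-3-medium-128k-instruct",
    trust_remote_code=True,  # without this, config loading raises ValueError
)
```

or the `--trust-remote-code` flag when launching the API server from the CLI. For the RunPod vllm-worker, the setting has to be forwarded through the worker's own configuration; I believe it is exposed as a `TRUST_REMOTE_CODE` environment variable, but check the worker's README to be sure.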
Related Issues (20)
- [Bug]: The openai deployment model takes twice as long to deploy as fastapi's approach to offline inference. HOT 1
- [Feature]: Linear adapter support for Mixtral
- [Feature]: vLLM support for function calling in Mistral-7B-Instruct-v0.3 HOT 1
- [Bug]: Issue with Token Processing Efficiency and Key-Value Cache Utilization in AsyncLLMEngine
- [Bug]: WSL2(Including Docker) 2 GPU problem --tensor-parallel-size 2 HOT 1
- [Bug]: Unable to Use Prefix Caching in AsyncLLMEngine HOT 10
- [Performance]: What can we learn from OctoAI HOT 7
- [Bug]: Model Launch Hangs with 16+ Ranks in vLLM HOT 2
- [Usage]: Prefix caching in VLLM HOT 1
- [Bug]: Incorrect Example for the Inference with Prefix
- [Feature]: BERT models for embeddings HOT 1
- [Bug]: The Offline Inference Embedding Example Fails HOT 5
- [Bug]: Offline Inference with the OpenAI Batch file format yields unnecessary `asyncio.exceptions.CancelledError` HOT 2
- [Feature]: MoE kernels (Mixtral-8x22B-Instruct-v0.1) are not yet supported on CPU only ?
- [Bug]: vLLM api_server.py when using with prompt_token_ids causes error.
- [Bug]: loading squeezellm model
- [Usage]: How can I deploy llama3-70b on a server with 8 3090 GPUs with lora and CUDA graph.
- [Usage]: how to use the gpu_cache_usage_perc as a custom metric in k8s HPA?
- [Bug]: Issues with Applying LoRA in vllm on a T4 GPU HOT 10