Comments (1)
Hmm, I'm getting this error for #5276:
https://buildkite.com/vllm/ci-aws/builds/3678#0190710a-4a20-4bb1-8f85-8efe1a7615a1
The stack trace suggests that `import torch` inside `conftest.py` is to blame, but I'm pretty sure the import was there from the beginning, so that can't be why. When I try to log the traceback in `torch.cuda.device_count()`, I get this:
  File "/home/cyrusleung/miniconda3/envs/vllm/bin/pytest", line 8, in <module>
    sys.exit(console_main())
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/_pytest/config/__init__.py", line 197, in console_main
    code = main()
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/_pytest/config/__init__.py", line 174, in main
    ret: Union[ExitCode, int] = config.hook.pytest_cmdline_main(
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/pluggy/_hooks.py", line 501, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/pluggy/_manager.py", line 119, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/pluggy/_callers.py", line 102, in _multicall
    res = hook_impl.function(*args)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/_pytest/main.py", line 332, in pytest_cmdline_main
    return wrap_session(config, _main)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/_pytest/main.py", line 285, in wrap_session
    session.exitstatus = doit(config, session) or 0
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/_pytest/main.py", line 339, in _main
    config.hook.pytest_runtestloop(session=session)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/pluggy/_hooks.py", line 501, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/pluggy/_manager.py", line 119, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/pluggy/_callers.py", line 102, in _multicall
    res = hook_impl.function(*args)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/_pytest/main.py", line 364, in pytest_runtestloop
    item.config.hook.pytest_runtest_protocol(item=item, nextitem=nextitem)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/pluggy/_hooks.py", line 501, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/pluggy/_manager.py", line 119, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/pluggy/_callers.py", line 102, in _multicall
    res = hook_impl.function(*args)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/_pytest/runner.py", line 115, in pytest_runtest_protocol
    runtestprotocol(item, nextitem=nextitem)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/_pytest/runner.py", line 134, in runtestprotocol
    reports.append(call_and_report(item, "call", log))
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/_pytest/runner.py", line 239, in call_and_report
    call = CallInfo.from_call(
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/_pytest/runner.py", line 340, in from_call
    result: Optional[TResult] = func()
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/_pytest/runner.py", line 240, in <lambda>
    lambda: runtest_hook(item=item, **kwds), when=when, reraise=reraise
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/pluggy/_hooks.py", line 501, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/pluggy/_manager.py", line 119, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/pluggy/_callers.py", line 102, in _multicall
    res = hook_impl.function(*args)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/_pytest/runner.py", line 172, in pytest_runtest_call
    item.runtest()
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/_pytest/python.py", line 1772, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/pluggy/_hooks.py", line 501, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/pluggy/_manager.py", line 119, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/pluggy/_callers.py", line 102, in _multicall
    res = hook_impl.function(*args)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/_pytest/python.py", line 195, in pytest_pyfunc_call
    result = testfunction(**testargs)
  File "/home/cyrusleung/vllm-rocm/tests/distributed/test_multimodal_broadcast.py", line 43, in test_models
    run_test(
  File "/home/cyrusleung/vllm-rocm/tests/models/test_llava.py", line 113, in run_test
    with vllm_runner(model_id,
  File "/home/cyrusleung/vllm-rocm/tests/conftest.py", line 439, in __init__
    self.model = LLM(
  File "/home/cyrusleung/vllm-rocm/vllm/entrypoints/llm.py", line 144, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/home/cyrusleung/vllm-rocm/vllm/engine/llm_engine.py", line 405, in from_engine_args
    engine = cls(
  File "/home/cyrusleung/vllm-rocm/vllm/engine/llm_engine.py", line 238, in __init__
    self.model_executor = executor_class(
  File "/home/cyrusleung/vllm-rocm/vllm/executor/distributed_gpu_executor.py", line 25, in __init__
    super().__init__(*args, **kwargs)
  File "/home/cyrusleung/vllm-rocm/vllm/executor/executor_base.py", line 41, in __init__
    self._init_executor()
  File "/home/cyrusleung/vllm-rocm/vllm/executor/multiproc_gpu_executor.py", line 68, in _init_executor
    self.driver_worker = self._create_worker(
  File "/home/cyrusleung/vllm-rocm/vllm/executor/gpu_executor.py", line 67, in _create_worker
    wrapper.init_worker(**self._get_worker_kwargs(local_rank, rank,
  File "/home/cyrusleung/vllm-rocm/vllm/worker/worker_base.py", line 311, in init_worker
    self.worker = worker_class(*args, **kwargs)
  File "/home/cyrusleung/vllm-rocm/vllm/worker/worker.py", line 87, in __init__
    self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
  File "/home/cyrusleung/vllm-rocm/vllm/worker/model_runner.py", line 196, in __init__
    self.attn_backend = get_attn_backend(
  File "/home/cyrusleung/vllm-rocm/vllm/attention/selector.py", line 45, in get_attn_backend
    backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
  File "/home/cyrusleung/vllm-rocm/vllm/attention/selector.py", line 151, in which_attn_to_use
    if torch.cuda.get_device_capability()[0] < 8:
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
    prop = get_device_properties(device)
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/cuda/__init__.py", line 444, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/cuda/__init__.py", line 306, in _lazy_init
    queued_call()
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/cuda/__init__.py", line 173, in _check_capability
    for d in range(device_count()):
  File "/home/cyrusleung/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/cuda/__init__.py", line 745, in device_count
    import traceback; traceback.print_stack()
But I think this is supposed to happen, right?
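The `import traceback; traceback.print_stack()` line at the bottom of the stack is the instrumentation that produced the output. A minimal standalone sketch of the same technique (the function names below are placeholders standing in for `torch.cuda.device_count` and vLLM's attention selector, not the real code):

```python
import traceback

def device_count():
    # Debugging aid: print the call stack each time this function is
    # entered, mirroring the `import traceback; traceback.print_stack()`
    # line added to torch.cuda.device_count above.
    # (Placeholder body -- not the real torch function.)
    print("".join(traceback.format_stack()[:-1]))  # frames leading here
    return 0

def which_attn_to_use():
    # Placeholder caller standing in for vllm.attention.selector.
    return device_count()

which_attn_to_use()
```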
Edit: The traceback is from a local version of the PR which has some additional changes compared to the CI build. I'll push it once its dependency has been merged so I can check whether the failure persists.
Update: Using a lazy import in `vllm.transformers_utils.image_processor` seems to fix the problem.
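For reference, the lazy-import pattern just means moving a module-level import into the function that needs it, so that `import vllm...` no longer pulls in the heavy dependency (and its CUDA initialization) at import time. A runnable sketch of the pattern, using the stdlib `decimal` module as a stand-in for the heavy dependency (the real fix touches `vllm.transformers_utils.image_processor`, whose contents are not shown here):

```python
import sys

# Before (eager): a top-level import of the heavy dependency runs as
# soon as the package is imported, e.g. during pytest collection.
# After (lazy): the import is deferred into the function body below.

def get_image_processor(name: str):
    # Deferred import: the dependency loads on first call, not at module
    # import time. `decimal` is a stand-in so this sketch stays runnable.
    import decimal
    return f"{name}: {decimal.Decimal('2') ** 10}"

loaded_before = "decimal" in sys.modules  # may already be True elsewhere
result = get_image_processor("demo")
loaded_after = "decimal" in sys.modules   # True once the function ran
print(loaded_before, loaded_after, result)
```

The import statement is cheap on repeat calls because Python caches modules in `sys.modules`, so the deferral only costs anything the first time.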