
Comments (4)

ZZhangxian commented on August 25, 2024

2024-06-21T14:00:57 ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 506, in engine_step
    request_outputs = await self.engine.step_async()
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
    output = await self.model_executor.execute_model_async(
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 166, in execute_model_async
    return await self._driver_execute_model_async(execute_model_req)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 149, in _driver_execute_model_async
    return await self.driver_exec_model(execute_model_req)
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/anaconda3/envs/vllm/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
    return fut.result()
asyncio.exceptions.CancelledError

from vllm.

ZZhangxian commented on August 25, 2024

Driver process (logged via async_llm_engine.py:52, 2024-06-21 15:09:02):

    x, _ = self.down_proj(x)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 804, in forward
    output_parallel = self.quant_method.apply(self, input_parallel)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/layers/quantization/awq.py", line 169, in apply
    out = torch.matmul(reshaped_x, out)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 492.00 MiB. GPU

ERROR:asyncio:Exception in callback functools.partial(<function _log_task_completion at 0x7fa2827113f0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7fa258667df0>>)
handle: <Handle functools.partial(<function _log_task_completion at 0x7fa2827113f0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7fa258667df0>>)>
Traceback (most recent call last):
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
    return_value = task.result()
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 532, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
  File "/root/anaconda3/envs/vllm/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 506, in engine_step
    request_outputs = await self.engine.step_async()
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
    output = await self.model_executor.execute_model_async(
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 166, in execute_model_async
    return await self._driver_execute_model_async(execute_model_req)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 149, in _driver_execute_model_async
    return await self.driver_exec_model(execute_model_req)
  File "/root/anaconda3/envs/vllm/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/worker.py", line 280, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,

Worker process (VllmWorkerProcess pid=960, logged via multiproc_worker_utils.py:226):

Exception in worker VllmWorkerProcess while processing method start_worker_execution_loop: CUDA out of memory. Tried to allocate 492.00 MiB. GPU has a total capacity of 79.32 GiB of which 77.56 MiB is free. Process 2811341 has 79.24 GiB memory in use. Of the allocated memory 73.08 GiB is allocated by PyTorch, and 1.52 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Traceback (most recent call last):
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
    output = executor(*args, **kwargs)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/worker.py", line 294, in start_worker_execution_loop
    while self._execute_model_non_driver():
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/worker.py", line 317, in _execute_model_non_driver
    self.model_runner.execute_model(None, self.gpu_cache)
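The OOM message itself suggests one mitigation (`PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`); the other common lever is reducing vLLM's memory pressure at launch. A minimal sketch of building such a launch command follows — the model path and the specific numeric values are hypothetical placeholders, not values taken from this issue:

```python
import os

# Allocator hint from the OOM message: expandable segments reduce
# fragmentation of memory that is reserved by PyTorch but unallocated.
env = dict(os.environ, PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True")

# Stay below the default --gpu-memory-utilization of 0.9 so transient
# buffers (e.g. the AWQ matmul output in the traceback) still fit, and cap
# the context length to bound the KV cache. Values here are illustrative.
cmd = [
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--model", "path/to/your-awq-model",   # hypothetical placeholder
    "--quantization", "awq",
    "--gpu-memory-utilization", "0.85",
    "--max-model-len", "4096",
]
# subprocess.run(cmd, env=env) would then launch the server with these settings.
```

Whether 0.85 and 4096 are enough depends on the model and batch sizes; the point is to leave headroom for activation peaks rather than giving vLLM nearly all of the 79 GiB up front.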


ZZhangxian commented on August 25, 2024

Why doesn't the vLLM-related process exit after the program hits this error? The vLLM processes keep occupying GPU memory, and the request interface keeps returning the same error.


ZZhangxian commented on August 25, 2024

> Why doesn't the vLLM-related process exit after the program hits this error? The vLLM processes keep occupying GPU memory, and the request interface keeps returning the same error.

You need to manually kill the vLLM-related processes and then restart the asynchronous service to restore normal operation.
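That manual cleanup can be sketched in a few lines — the `ps`-output parsing helper here is illustrative, assuming `ps -eo pid,cmd`-style lines, and the SIGKILL loop should be run with care since it kills every process whose command line mentions vllm:

```python
import os
import signal
import subprocess

def find_vllm_pids(ps_lines):
    """Return PIDs from `ps -eo pid,cmd`-style lines whose command mentions vllm."""
    pids = []
    for line in ps_lines:
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].isdigit() and "vllm" in parts[1]:
            pids.append(int(parts[0]))
    return pids

def kill_stale_vllm_workers():
    """List all processes, pick out the vLLM ones, and SIGKILL them."""
    out = subprocess.run(["ps", "-eo", "pid,cmd"], capture_output=True, text=True)
    for pid in find_vllm_pids(out.stdout.splitlines()):
        os.kill(pid, signal.SIGKILL)
```

The shell equivalent is `pkill -9 -f vllm`; either way, confirm with `nvidia-smi` that the GPU memory was actually released before restarting the service.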

