
Comments (4)

ZZhangxian commented on August 25, 2024

2024-06-21T14:00:57 ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 506, in engine_step
    request_outputs = await self.engine.step_async()
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
    output = await self.model_executor.execute_model_async(
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 166, in execute_model_async
    return await self._driver_execute_model_async(execute_model_req)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 149, in _driver_execute_model_async
    return await self.driver_exec_model(execute_model_req)
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/anaconda3/envs/vllm/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
    return fut.result()
asyncio.exceptions.CancelledError

from vllm.

ZZhangxian commented on August 25, 2024

Driver process (logged via async_llm_engine.py:52, 2024-06-21 15:09:02):

    x, _ = self.down_proj(x)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 804, in forward
    output_parallel = self.quant_method.apply(self, input_parallel)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/layers/quantization/awq.py", line 169, in apply
    out = torch.matmul(reshaped_x, out)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 492.00 MiB. GPU

ERROR:asyncio:Exception in callback functools.partial(<function _log_task_completion at 0x7fa2827113f0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7fa258667df0>>)
handle: <Handle functools.partial(<function _log_task_completion at 0x7fa2827113f0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7fa258667df0>>)>
Traceback (most recent call last):
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 42, in _log_task_completion
    return_value = task.result()
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 532, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
  File "/root/anaconda3/envs/vllm/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 506, in engine_step
    request_outputs = await self.engine.step_async()
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 235, in step_async
    output = await self.model_executor.execute_model_async(
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 166, in execute_model_async
    return await self._driver_execute_model_async(execute_model_req)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 149, in _driver_execute_model_async
    return await self.driver_exec_model(execute_model_req)
  File "/root/anaconda3/envs/vllm/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/worker.py", line 280, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,

Worker process (VllmWorkerProcess pid=960, logged via multiproc_worker_utils.py:226):

Exception in worker VllmWorkerProcess while processing method start_worker_execution_loop: CUDA out of memory. Tried to allocate 492.00 MiB. GPU has a total capacity of 79.32 GiB of which 77.56 MiB is free. Process 2811341 has 79.24 GiB memory in use. Of the allocated memory 73.08 GiB is allocated by PyTorch, and 1.52 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Traceback (most recent call last):
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
    output = executor(*args, **kwargs)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/worker.py", line 294, in start_worker_execution_loop
    while self._execute_model_non_driver():
  File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/worker.py", line 317, in _execute_model_non_driver
    self.model_runner.execute_model(None, self.gpu_cache)
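The OOM message itself suggests one mitigation (`PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`); the other common lever is reducing vLLM's memory pressure at launch. A minimal sketch of building such a launch command follows — the model path and the specific numeric values are hypothetical placeholders, not values taken from this issue:

```python
import os

# Allocator hint from the OOM message: expandable segments reduce
# fragmentation of memory that is reserved by PyTorch but unallocated.
env = dict(os.environ, PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True")

# Stay below the default --gpu-memory-utilization of 0.9 so transient
# buffers (e.g. the AWQ matmul output in the traceback) still fit, and cap
# the context length to bound the KV cache. Values here are illustrative.
cmd = [
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--model", "path/to/your-awq-model",   # hypothetical placeholder
    "--quantization", "awq",
    "--gpu-memory-utilization", "0.85",
    "--max-model-len", "4096",
]
# subprocess.run(cmd, env=env) would then launch the server with these settings.
```

Whether 0.85 and 4096 are enough depends on the model and batch sizes; the point is to leave headroom for activation peaks rather than giving vLLM nearly all of the 79 GiB up front.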


ZZhangxian commented on August 25, 2024

Why doesn't the vLLM-related process exit after the program hits this error? The vLLM processes keep occupying GPU memory, and the request interface keeps returning the same error.


ZZhangxian commented on August 25, 2024

> Why doesn't the vLLM-related process exit after the program hits this error? The vLLM processes keep occupying GPU memory, and the request interface keeps returning the same error.

You need to manually kill the vLLM-related processes and then restart the asynchronous service to restore normal operation.
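That manual cleanup can be sketched in a few lines — the `ps`-output parsing helper here is illustrative, assuming `ps -eo pid,cmd`-style lines, and the SIGKILL loop should be run with care since it kills every process whose command line mentions vllm:

```python
import os
import signal
import subprocess

def find_vllm_pids(ps_lines):
    """Return PIDs from `ps -eo pid,cmd`-style lines whose command mentions vllm."""
    pids = []
    for line in ps_lines:
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].isdigit() and "vllm" in parts[1]:
            pids.append(int(parts[0]))
    return pids

def kill_stale_vllm_workers():
    """List all processes, pick out the vLLM ones, and SIGKILL them."""
    out = subprocess.run(["ps", "-eo", "pid,cmd"], capture_output=True, text=True)
    for pid in find_vllm_pids(out.stdout.splitlines()):
        os.kill(pid, signal.SIGKILL)
```

The shell equivalent is `pkill -9 -f vllm`; either way, confirm with `nvidia-smi` that the GPU memory was actually released before restarting the service.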

