Comments (4)
One more detail: the Docker container was started as follows:
docker run -d \
--shm-size=10.24gb \
--gpus '"device=2"' \
-v /data/models:/root/.cache/huggingface \
--env "HF_TOKEN=ma_token" \
-p 8000:8000 \
--restart unless-stopped \
--name vllm-openai \
vllm/vllm-openai \
--host 0.0.0.0 \
--port 8000 \
--model=astronomer/Llama-3-8B-Instruct-GPTQ-4-Bit \
--enforce-eager \
--dtype=half \
--gpu-memory-utilization 0.95 \
--tensor-parallel-size=1
from vllm.
I am closing this myself. The issue appears to be in loading the model: the Python script never reaches the uvicorn
section, so the actual server on port 8000 never starts, even though the Python script is running.
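For anyone hitting the same symptom, here is a rough way to tell "the process is running" apart from "the server is actually answering". This is only a sketch using the Python standard library: it polls the `/health` route of the OpenAI-compatible server over HTTP until it returns 200 or a deadline passes. The function names, the 600-second deadline and the `127.0.0.1:8000` address (taken from the `-p 8000:8000` mapping above) are my own assumptions, not anything vLLM ships. It uses an HTTP request rather than a bare TCP connect because Docker may accept connections on a published port even when nothing inside the container is listening yet.

```python
import http.client
import time
import urllib.request


def server_is_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True only if an HTTP GET on `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused/reset, timeouts and non-2xx responses all land here
        # (urllib's URLError and HTTPError are subclasses of OSError).
        return False


def wait_for_server(url: str, deadline_s: float, interval_s: float = 2.0) -> bool:
    """Poll `url` until it is ready or `deadline_s` seconds have elapsed."""
    end = time.monotonic() + deadline_s
    while True:
        if server_is_ready(url):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)


# Hypothetical usage (address and deadline are assumptions):
#   ok = wait_for_server("http://127.0.0.1:8000/health", deadline_s=600)
#   print("server is up" if ok else "server never answered; check the container logs")
```

If this keeps returning False while the container stays up, that is consistent with what I described: the process is stuck before uvicorn starts, and `docker logs vllm-openai` is the place to look for where model loading stops.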
from vllm.
Hello! I have the same issue...
Did you solve it? I would be very interested if you have a solution :)
Thank you!
from vllm.
Hello. I am facing a similar issue. Is there a solution for this? Thanks
from vllm.