Comments (4)
Normally there should be a log similar to
INFO ] Started server process [1754022]
INFO ] Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Could you check whether the port is open?
from vllm.
Dear @simon-mo, it is; it just appears in another server log:
--2024-04-17 21:17:33-- https://raw.githubusercontent.com/vllm-project/vllm/main/collect_env.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24853 (24K) [text/plain]
Saving to: ‘collect_env.py’
0K .......... .......... .... 100% 19.6M=0.001s
2024-04-17 21:17:33 (19.6 MB/s) - ‘collect_env.py’ saved [24853/24853]
File "collect_env.py", line 715
print(msg, file=sys.stderr)
^
SyntaxError: invalid syntax
INFO: Started server process [83214]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
I see. The API server is supposed to be running forever because it is a live server. This is the intended behavior.
For your use case, consider running it in a background process?
@simon-mo Thanks a lot! It seems to be working. Fingers crossed. I also had to include a sleep step to give the server time to spin up in the background.
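For anyone who lands here with the same setup, one way to avoid guessing a sleep duration is to poll the port until the server accepts connections. The sketch below is only an illustration and is not part of vLLM: `wait_for_port` and `start_in_background` are hypothetical helper names, and `cmd` stands for whatever command you already use to launch the server. Port 8000 in the usage note comes from the Uvicorn log above.

```python
import socket
import subprocess
import time
from typing import Sequence


def wait_for_port(host: str, port: int, timeout: float = 120.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds.

    Returns False if no connection succeeds before `timeout` seconds elapse.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the server is not up yet, so retry.
            time.sleep(interval)
    return False


def start_in_background(cmd: Sequence[str], host: str, port: int,
                        timeout: float = 120.0) -> subprocess.Popen:
    """Launch `cmd` without blocking, then wait until the port is open.

    `cmd` is an assumption: pass the launch command you normally run
    in the foreground.
    """
    proc = subprocess.Popen(list(cmd))
    if not wait_for_port(host, port, timeout):
        proc.terminate()
        raise TimeoutError(f"server did not open {host}:{port} in {timeout}s")
    return proc
```

Calling `start_in_background(cmd, "127.0.0.1", 8000)` returns the `Popen` handle once the port is open, so the rest of the script can send requests and call `proc.terminate()` when done. An open port only shows that the process is listening; if the model is still loading, the first request may need a retry.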