Comments (6)
+1
from vllm.
Found the related Docker commands:
https://hub.docker.com/layers/vllm/vllm-openai/v0.4.3/images/sha256-d55c38328bfda7b049dd6c3695d60fff8d88a63b19db058791879dc6bf3188af?context=explore
What new features require CUDA >= 12.4?
from vllm.
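(A minimal sketch of one way to confirm which CUDA toolkit the published image is based on. It assumes Docker is available locally and that the vllm/vllm-openai image inherits a CUDA_VERSION environment variable from its nvidia/cuda base image; neither assumption is confirmed in the thread.)

```python
import json
import subprocess

IMAGE = "vllm/vllm-openai:v0.4.3"

def image_cuda_version(image: str):
    # Pull the image so `docker image inspect` has local metadata to read.
    subprocess.run(["docker", "pull", image], check=True)
    # Dump the image's environment variables as a JSON array of "KEY=VALUE" strings.
    out = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{json .Config.Env}}", image],
        check=True, capture_output=True, text=True,
    ).stdout
    for entry in json.loads(out):
        key, _, value = entry.partition("=")
        # CUDA_VERSION is typically set by the nvidia/cuda base image (assumption).
        if key == "CUDA_VERSION":
            return value
    return None

if __name__ == "__main__":
    print(image_cuda_version(IMAGE))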
We use 12.4 because the old versions of the base container will be deleted soon, according to the NVIDIA support policy: https://gitlab.com/nvidia/container-images/cuda/-/blob/master/doc/support-policy.md
from vllm.
It seems even v0.4.3 is now built with CUDA 12.4 required.
from vllm.
> We use 12.4 because the old versions of the base container will be deleted soon, according to the NVIDIA support policy: https://gitlab.com/nvidia/container-images/cuda/-/blob/master/doc/support-policy.md
I see. Thanks.
from vllm.
> It seems even v0.4.3 is now built with CUDA 12.4 required.
Yes, but the documentation wasn't updated to reflect that: https://docs.vllm.ai/en/v0.4.3/getting_started/installation.html
from vllm.
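(For reference, a minimal sketch of how to check which CUDA toolkit the installed vLLM wheel was compiled against, using the build metadata of the PyTorch version it bundles; it assumes vLLM and its pinned torch are already installed.)

```python
# Check the CUDA toolkit version the installed vLLM stack was built with.
import torch
import vllm

print("vllm version:", vllm.__version__)
# CUDA version that PyTorch (and hence the vLLM wheel pinned to it) was compiled against:
print("built with CUDA:", torch.version.cuda)
# Whether a compatible driver/GPU is actually usable on this machine:
print("GPU available:", torch.cuda.is_available())
```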
Related Issues (20)
- [Bug]: RuntimeError with tensor_parallel_size > 1 in Process Bootstrapping Phase HOT 14
- [Bug]: error when requesting max-num-seqs with speculation
- [Installation]: vllm on NVIDIA jetson AGX orin HOT 4
- [Bug]: Error while running command-r in high load for a long period in OpenBLAS on 2 GPUs
- [Bug]: multiprocessing KeyError from `cache[rtype].remove(name)` HOT 4
- IfEval Metrics not consistent with different vLLM versions HOT 6
- [Bug]: OOT models not included in ModelRegistry.get_supported_archs() HOT 4
- [Bug]: Ray distributed backend does not support out-of-tree models via ModelRegistry APIs HOT 12
- [Usage]: Running Cohere command R+ 104b using VLLM 16bf, getting 5 tokens per second very slow
- Error on startup after quantization
- [Usage]: vllm0.5.0.post error when using openai completion interface HOT 1
- [Bug]: Qwen2-72B-Instruct-gptq-int4 Repetitive issues HOT 1
- [Usage]: Does class LLM support inference quantization on CPU?
- [Bug]: Enabling Prefix-Caching doesn't speed up inference HOT 2
- [Bug]: Option for preemption_mode uses underscore (_) instead of dash (-) HOT 1
- [Installation]: Build from source: Could NOT find Python. Could not build wheels for vllm. HOT 4
- [Installation]: pip install -e failed HOT 9
- [Bug]: ValueError: The input size is not aligned with the quantized weight shape. This can be caused by too large tensor parallel size. HOT 4
- [Usage]: Use quantization=gptq_marlin for faster inference HOT 2
- [Bug]: `flash_attn_cuda.varlen_fwd` may output a bad result when enabling prefix caching HOT 2