Comments (4)
Should be fixed in the next release, by #4609.
from vllm.
Thanks, do you know when? Or do you have an idea how I can work around this? It happens when I run LoRA on a 70B model that is running on 2 GPUs. I'm trying to load Llama 3 70B on a single GPU (A100) instead, but it doesn't seem to work; I run out of VRAM...
The next release is just around the corner. If you can't wait for that, you can install vLLM directly from the main branch.
Fixed by #4609, which has been released in v0.4.3.
Edit: Technically it is still in pre-release but should be out very soon.
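For anyone unsure whether their environment already includes the fix, here is a rough sketch of a version check against the v0.4.3 release mentioned above. The helper names (`parse_release`, `has_fix`, `installed_vllm_has_fix`) are made up for illustration and are not part of vLLM. The parsing is deliberately coarse: it only looks at the leading numeric components, so pre-release tags are treated the same as the final release, and a build from the main branch may report a different version string than expected.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Release named in the thread as containing the fix (assumption: final v0.4.3).
FIX_RELEASE = (0, 4, 3)


def parse_release(version_string):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version_string.lstrip("v").split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "post1"
        parts.append(int(match.group()))
    return tuple(parts)


def has_fix(version_string):
    """True if the numeric part of the version is at least FIX_RELEASE."""
    return parse_release(version_string) >= FIX_RELEASE


def installed_vllm_has_fix():
    """Check the installed vLLM package; returns None if vLLM is not installed."""
    try:
        return has_fix(version("vllm"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_vllm_has_fix())
```

If this prints `False`, upgrading to the release (or installing from main, as suggested above) would be the next thing to try.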