Comments (2)
GPTQModel v0.9.3 added Gemma 2 support for GPTQ 4-bit quantization, but the 27B model has inference issues. We haven't had time to test whether vLLM has the same inference issue with the 27B model as HF Transformers. The 9B model is perfect, though, and passing with flying colors.
You can try quantizing the 27B model with GPTQModel (use format=FORMAT.GPTQ, sym=True) and then try inference with vLLM. Let me know if you get it working.
from vllm.
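A minimal sketch of the suggested quantization path, assuming GPTQModel's v0.9.x API (`QuantizeConfig`, `GPTQModel.from_pretrained`); the model id, calibration texts, and output directory are placeholders, not values from this thread:

```python
# Sketch: quantize Gemma 2 27B to 4-bit GPTQ with GPTQModel (assumed v0.9.x API).
from gptqmodel import GPTQModel, QuantizeConfig
from gptqmodel.quantization import FORMAT

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    sym=True,            # symmetric quantization, as suggested above
    format=FORMAT.GPTQ,  # plain GPTQ format so vLLM can load the checkpoint
)

# Placeholder calibration set; in practice use a few hundred representative texts.
calibration_dataset = ["vLLM is a fast inference engine for LLMs."]

model = GPTQModel.from_pretrained("google/gemma-2-27b-it", quant_config)
model.quantize(calibration_dataset)
model.save_quantized("gemma-2-27b-it-gptq-4bit")
```

Quantizing a 27B model this way needs substantial GPU memory; the saved directory can then be passed to vLLM as the model path.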
I have a similar question: can vLLM directly use `load_in_4bit` to load a quantized model? If not, will it be implemented in the future?
from vllm.
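For context, vLLM does not take a Transformers-style `load_in_4bit` flag; it loads checkpoints that were already quantized (e.g. GPTQ or AWQ) and selects the kernel via the `quantization` argument. A minimal sketch, with a hypothetical checkpoint id:

```python
# Sketch: serving an already-quantized GPTQ checkpoint with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="some-org/gemma-2-9b-it-gptq-4bit",  # hypothetical pre-quantized checkpoint
    quantization="gptq",  # often auto-detected from the checkpoint's quantize_config.json
)
outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```

So the workflow is: quantize offline (e.g. with GPTQModel or AutoGPTQ), then point vLLM at the quantized checkpoint.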
Related Issues (20)
- [Bug]: autogen can't work with vllm v0.5.1
- v0.5.2, v0.5.3, v0.6.0 Release Tracker HOT 5
- [Bug]: Severe computation errors when batching request for microsoft/Phi-3-mini-128k-instruct HOT 3
- [Bug]: The shape of the embed_tokens of llama model doesn't match the llama3 configuration
- [Bug]: TypeError: 'NoneType' object is not callable when start Gemma2-27b-it HOT 5
- [Bug]: Seed issue with Pipeline Parallel HOT 2
- [Bug]: vLLM is unable to load Mistral on Inferentia and AWS neuron HOT 1
- [Installation]: ERROR: Could not find a version that satisfies the requirement pyzmq (from versions: none) HOT 4
- [Bug]: No metrics exposed at /metrics with 0.5.2 (0.5.1 is fine), possible regression? HOT 6
- [Bug]: Can't load gemma-2-9b-it with vllm 0.5.2 HOT 9
- unable to run vllm model deployment HOT 6
- [Bug]: failed when run Qwen2-54B-A14B-GPTQ-Int4(MOE) HOT 3
- [Performance]: [Speculative Decoding] Measurement of Cost Coefficient through vLLM HOT 5
- [Usage]: PeftModelForCausalLM is not JSON serializable HOT 1
- [Feature]: Pipeline parallelism support for qwen model HOT 1
- [Installation]: Unable to build docker image for v0.5.2
- [Bug]: [vllm-openvino]: ValueError: `use_cache` was set to `True` but the loaded model only supports `use_cache=False`. HOT 7
- [Bug]: Gemma 27B crashes on GCP A100 HOT 2
- [Bug]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding' / gemma-2-9b with vllm=0.5.2 HOT 12
- [New Model]: Codestral Mamba HOT 1