Comments (2)
GPTQModel v0.9.3 added Gemma 2 support for GPTQ 4-bit quantization, but the 27B model has inference issues. We haven't had time to test whether vLLM has the same inference issue with the 27B model as HF Transformers. The 9B model is perfect, though, and passing with flying colors.
You can try quantizing the 27B model with GPTQModel (use format=FORMAT.GPTQ, sym=True) and then try inference with vLLM. Let me know if you get it working.
from vllm.
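A minimal sketch of the suggested quantization path, assuming GPTQModel's v0.9.x API (`QuantizeConfig`, `GPTQModel.from_pretrained`); the model id, calibration texts, and output directory are placeholders, not values from this thread:

```python
# Sketch: quantize Gemma 2 27B to 4-bit GPTQ with GPTQModel (assumed v0.9.x API).
from gptqmodel import GPTQModel, QuantizeConfig
from gptqmodel.quantization import FORMAT

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    sym=True,            # symmetric quantization, as suggested above
    format=FORMAT.GPTQ,  # plain GPTQ format so vLLM can load the checkpoint
)

# Placeholder calibration set; in practice use a few hundred representative texts.
calibration_dataset = ["vLLM is a fast inference engine for LLMs."]

model = GPTQModel.from_pretrained("google/gemma-2-27b-it", quant_config)
model.quantize(calibration_dataset)
model.save_quantized("gemma-2-27b-it-gptq-4bit")
```

Quantizing a 27B model this way needs substantial GPU memory; the saved directory can then be passed to vLLM as the model path.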
I have a similar question: can vLLM directly use `load_in_4bit` to load a quantized model? If not, will it be implemented in the future?
from vllm.
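For context, vLLM does not take a Transformers-style `load_in_4bit` flag; it loads checkpoints that were already quantized (e.g. GPTQ or AWQ) and selects the kernel via the `quantization` argument. A minimal sketch, with a hypothetical checkpoint id:

```python
# Sketch: serving an already-quantized GPTQ checkpoint with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="some-org/gemma-2-9b-it-gptq-4bit",  # hypothetical pre-quantized checkpoint
    quantization="gptq",  # often auto-detected from the checkpoint's quantize_config.json
)
outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```

So the workflow is: quantize offline (e.g. with GPTQModel or AutoGPTQ), then point vLLM at the quantized checkpoint.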
Related Issues (20)
- [Bug]: autogen can't work with vllm v0.5.1
- v0.5.2, v0.5.3, v0.6.0 Release Tracker HOT 5
- [Bug]: Severe computation errors when batching request for microsoft/Phi-3-mini-128k-instruct HOT 3
- [Bug]: The shape of the embed_tokens of llama model doesn't match the llama3 configuration
- [Bug]: TypeError: 'NoneType' object is not callable when start Gemma2-27b-it HOT 5
- [Bug]: Seed issue with Pipeline Parallel HOT 2
- [Bug]: vLLM is unable to load Mistral on Inferentia and AWS neuron HOT 1
- [Installation]: ERROR: Could not find a version that satisfies the requirement pyzmq (from versions: none) HOT 4
- [Bug]: No metrics exposed at /metrics with 0.5.2 (0.5.1 is fine), possible regression? HOT 6
- [Bug]: Can't load gemma-2-9b-it with vllm 0.5.2 HOT 9
- unable to run vllm model deployment HOT 6
- [Bug]: failed when run Qwen2-54B-A14B-GPTQ-Int4(MOE) HOT 3
- [Performance]: [Speculative Decoding] Measurement of Cost Coefficient through vLLM HOT 5
- [Usage]: PeftModelForCausalLM is not JSON serializable HOT 1
- [Feature]: Pipeline parallelism support for qwen model HOT 1
- [Installation]: Unable to build docker image for v0.5.2
- [Bug]: [vllm-openvino]: ValueError: `use_cache` was set to `True` but the loaded model only supports `use_cache=False`. HOT 7
- [Bug]: Gemma 27B crashes on GCP A100 HOT 2
- [Bug]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding' / gemma-2-9b with vllm=0.5.2 HOT 12
- [New Model]: Codestral Mamba HOT 1