
Comments (6)

Isotr0py avatar Isotr0py commented on September 12, 2024 1

It seems the model config was created incorrectly. I ran the reproduction code, and the vocab size in the config is wrong:

from transformers import AutoConfig, AutoTokenizer

model_id = "Qwen/Qwen1.5-32B-Chat-GGUF"
filename = "qwen1_5-32b-chat-q4_k_m.gguf"

# Build the config and tokenizer directly from the GGUF checkpoint
config = AutoConfig.from_pretrained(model_id, gguf_file=filename)
print(config)

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
print(tokenizer)

Outputs

Qwen2Config {
  "_model_name_or_path": "Qwen2-beta-32B-Chat-AWQ-fp16",
  "_name_or_path": "Qwen/Qwen1.5-32B-Chat-GGUF",
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 27392,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "pad_token_id": 151643,
  "rms_norm_eps": 9.999999974752427e-07,
  "rope_theta": 1000000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "transformers_version": "4.42.3",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

Qwen2TokenizerFast(name_or_path='Qwen/Qwen1.5-32B-Chat-GGUF', vocab_size=152064, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}

The tokenizer's vocab_size=152064 is correct and matches the shape of token_embd.weight, while "vocab_size": 151936 in the config is wrong.

In fact, the current GGUF config extraction depends on the vocab_size key in the GGUF metadata. However, the Qwen1.5-32B-Chat-GGUF model file doesn't include this optional key, so the extraction fell back to the default value in Qwen2Config.
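The fallback described above can be sketched as a small helper. This is an illustration only, not the actual transformers internals: the function name `resolve_vocab_size` and the plain-dict `metadata` argument are assumptions. The idea is to prefer the optional `<arch>.vocab_size` metadata key, and when it is absent, derive the size from the `tokenizer.ggml.tokens` list instead of the architecture default.

```python
def resolve_vocab_size(metadata, tokens, arch="qwen2", default=151936):
    """Pick a vocab size for a GGUF checkpoint (illustrative sketch).

    Prefer the optional '<arch>.vocab_size' metadata key; if it is
    absent (as in Qwen1.5-32B-Chat-GGUF), fall back to the number of
    entries in 'tokenizer.ggml.tokens' rather than the config default.
    """
    key = f"{arch}.vocab_size"
    if key in metadata:
        return metadata[key]
    if tokens is not None:
        return len(tokens)
    return default


# Buggy behaviour: the key is missing and no tokens are consulted,
# so the Qwen2Config default leaks into the config.
print(resolve_vocab_size({}, None))              # 151936 (wrong)

# Fixed behaviour: derive the size from the tokenizer's token list.
print(resolve_vocab_size({}, ["tok"] * 152064))  # 152064 (correct)
```

With this ordering, the config and tokenizer agree on 152064, matching the token_embd.weight shape reported above.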

from transformers.

amyeroberts avatar amyeroberts commented on September 12, 2024

cc @SunMarc


SunMarc avatar SunMarc commented on September 12, 2024

Hi @kunger97, can you share the version of gguf you are using? Also, did this work in the past, or is this your first attempt?


kunger97 avatar kunger97 commented on September 12, 2024

This is my first attempt; I'm using gguf 0.9.1.


kunger97 avatar kunger97 commented on September 12, 2024

Is there currently a solution to this problem?


Isotr0py avatar Isotr0py commented on September 12, 2024

@kunger97 I have created a PR #32551 to fix this.

