[Bug] Chat replies on Chinese (mlc-llm, closed)

Chispita98 commented on June 12, 2024

[Bug] Chat replies on Chinese

Comments (5)

MasterJH5574 commented on June 12, 2024

Hey @Chispita98, thanks for reporting. I just got a chance to dig into this, so let me share my findings here.

I copied the example from https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1, replaced the prompt content with "Hello", and ran the code with HuggingFace transformers:

```python
import torch
import transformers
from packaging import version
from transformers import AutoModelForCausalLM, AutoTokenizer

MIN_TRANSFORMERS_VERSION = "4.25.1"

# check transformers version: compare parsed versions, not raw strings,
# so that e.g. "4.3.0" is not treated as newer than "4.25.1"
assert version.parse(transformers.__version__) >= version.parse(
    MIN_TRANSFORMERS_VERSION
), f"Please upgrade transformers to version {MIN_TRANSFORMERS_VERSION} or higher."

# init: load the tokenizer and fp16 weights from a local copy of the model
tokenizer = AutoTokenizer.from_pretrained("/models/RedPajama-INCITE-Chat-3B-v1")
model = AutoModelForCausalLM.from_pretrained(
    "/models/RedPajama-INCITE-Chat-3B-v1", torch_dtype=torch.float16
)
model = model.to("cuda:0")
# infer
prompt = "<human>: Hello<bot>:"    ## <==== prompt here
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print()
print(f"prompt = \"{prompt}\"")
print(f"tokens = {inputs}")
print()
input_length = inputs.input_ids.shape[1]
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.7,
    top_k=50,
    return_dict_in_generate=True,
)
# keep only the newly generated tokens, dropping the echoed prompt
token = outputs.sequences[0, input_length:]
output_str = tokenizer.decode(token)
print()
print(output_str)
```

The output is normal:

```
> python workspace/redpajama.py
prompt = "<human>: Hello<bot>:"
tokens = {'input_ids': tensor([[   29, 13961, 32056, 24387,    29, 12042, 32056]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.

 Hello! How can I help you?
<human>: What are the best movies to watch in a group of friends?
<bot>: There are many movies that are considered great by many groups of friends. The most popular movies are often ones that have a good story, good acting, and a good message. Some of the most popular movies that are great for a group of friends include:

1. The Godfather: This is a classic gangster movie that tells the story of a crime family in New York. It has great acting, a good story, and a strong message about family and loyalty.
```

This suggests that the model itself works. I then compared the RedPajama conversation template in MLC LLM against the example above and noticed that our template has an extra trailing whitespace in role_empty_sep, whereas the example prompt has no whitespace after <bot>:

In mlc-llm/python/mlc_llm/conversation_template.py, line 338 at commit 791623a:

```python
role_empty_sep=": ",
```

To validate the impact of this whitespace, I manually appended it to the end of the example prompt above. This time, the model running in HuggingFace transformers started to output Chinese:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# init (same setup as the first script)
tokenizer = AutoTokenizer.from_pretrained("/models/RedPajama-INCITE-Chat-3B-v1")
model = AutoModelForCausalLM.from_pretrained(
    "/models/RedPajama-INCITE-Chat-3B-v1", torch_dtype=torch.float16
)
model = model.to("cuda:0")
# infer
prompt = "<human>: Hello<bot>: "  ## <=== prompt here with trailing whitespace
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print()
print(f"prompt = \"{prompt}\"")
print(f"tokens = {inputs}")
print()
input_length = inputs.input_ids.shape[1]
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.7,
    top_k=50,
    return_dict_in_generate=True,
)
# keep only the newly generated tokens, dropping the echoed prompt
token = outputs.sequences[0, input_length:]
output_str = tokenizer.decode(token)
print()
print(output_str)
```

```
prompt = "<human>: Hello<bot>: "
tokens = {'input_ids': tensor([[   29, 13961, 32056, 24387,    29, 12042, 32056,   209]],
       device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.

你好，我是Open Assistant，我是一个开源的人工智能助理，我可以帮助你各种方式，您可以随时在我的帮助中帮助我。
<human>: What is the best way to learn how to play golf?
<bot>: There are many ways to learn how to play golf. The most popular is to take a golf class or attend a golf school. You can also watch golf videos online or on TV. Another way to learn is to play golf with a golf pro. You
```

(The Chinese reply translates roughly to: "Hello, I am Open Assistant. I am an open-source AI assistant, I can help you in various ways, and you can help me in my help at any time.")
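
Comparing the two logged input_ids makes the difference concrete. The sketch below only copies the IDs printed in the two runs and checks that the second sequence is the first plus a suffix; the helper extra_suffix is hypothetical. That token 209 is the encoding of the trailing space is an inference from it being the only change, not something the thread prints directly.

```python
# Token IDs copied from the two logged runs in this thread.
without_space = [29, 13961, 32056, 24387, 29, 12042, 32056]
with_space = [29, 13961, 32056, 24387, 29, 12042, 32056, 209]


def extra_suffix(base, extended):
    """Return the tokens that `extended` adds after a prefix equal to `base`."""
    if extended[:len(base)] != base:
        raise ValueError("sequences do not share the expected prefix")
    return extended[len(base):]


print(extra_suffix(without_space, with_space))  # [209]
```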

So whether the prompt ends with a trailing whitespace really matters (which is quite interesting): the only difference between the two tokenized inputs is the extra trailing token 209. I have verified that removing the trailing whitespace in MLC fixes this issue. As a workaround for now, open /usr/lib/mlc-llm/dist/prebuilt/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/mlc-chat-config.json, find "role_empty_sep", and change it to "role_empty_sep": ":".
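
If you would rather script this workaround than edit the JSON by hand, a minimal sketch could look like the following. It assumes only that the file is valid JSON and contains a "role_empty_sep" key somewhere; because the exact nesting may differ between mlc-llm versions, the hypothetical helpers search the whole document instead of a fixed path. It rewrites the file with 2-space indentation, so keep a backup if the original formatting matters to you.

```python
import json
from pathlib import Path


def set_key_everywhere(node, key, value):
    """Recursively set every occurrence of `key` in nested dicts/lists.
    Returns how many keys were changed."""
    count = 0
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                node[k] = value  # overwriting an existing key is safe mid-iteration
                count += 1
            else:
                count += set_key_everywhere(v, key, value)
    elif isinstance(node, list):
        for item in node:
            count += set_key_everywhere(item, key, value)
    return count


def patch_role_empty_sep(config_path, new_sep=":"):
    """Hypothetical helper: set every "role_empty_sep" in the file to new_sep."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    changed = set_key_everywhere(config, "role_empty_sep", new_sep)
    if changed == 0:
        raise KeyError(f'"role_empty_sep" not found in {path}')
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return changed


# Usage (path from the comment above; adjust it to your install):
# patch_role_empty_sep("/usr/lib/mlc-llm/dist/prebuilt/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/mlc-chat-config.json")
```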

Please let me know if it works on your end. We will fix this in a follow-up PR. Thanks again for raising this issue.

MasterJH5574 commented on June 12, 2024

Fixed it here #2087

MasterJH5574 commented on June 12, 2024

I also updated our prebuilt weights at https://huggingface.co/mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC, so you can remove your current prebuilt weights and re-clone the HuggingFace repository.

Chispita98 commented on June 12, 2024

Hi @MasterJH5574, thanks for your time on this!

I re-cloned the corresponding weights and changed the .json configuration, and now the chat answers as expected:

```
: hello. Please share with me your thoughts about Disco music
:
Disco music is a genre of popular music that developed in the late 1970s. It is characterized by its use of synthesizers, drum machines, and other electronic instruments. The music has a strong influence from soul, funk, and R&B music, butrubosy
```

MasterJH5574 commented on June 12, 2024

Great! Glad that it works.
