Comments (4)
I used fast_contrastive_search, copied from https://github.com/yxuansu/SimCTG/blob/main/SimCTGEncDec/SimCTGT5/simctgt5.py, as follows, but it generated repeated tokens:
Hi @wuzhiye7,
Can you send me the name of the Chinese BART huggingface model and your inputs? I would like to test the instance and give you some feedback.
from simctg.
model: t5-pegasus-base, huggingface model hub: imxly/t5-pegasus
input_tokens :['[CLS]', '我', '不会', '贴', '假', '睫', '毛', '呀', ',', '好', '难', '!', '[SEP]']
input_ids :[[101, 1909, 6932, 4745, 463, 3466, 2644, 840, 5661, 1266, 5314, 5658, 102], [101, 32018, 1909, 7117, 7914, 4913, 3399, 179, 505, 1963, 3443, 26300, 2808, 6312, 135, 40959, 731, 31348, 15699, 5661, 24630, 1963, 463, 3466, 2644, 637, 199, 2374, 2106, 4866, 5661, 541, 1963, 26300, 4745, 198, 28756, 4745, 615, 26257, 198, 179, 102]]
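As a side note, the `[CLS] ... [SEP]` wrapping visible in the input_ids above is standard BertTokenizer behaviour. Below is a minimal self-contained sketch of that encode/decode round trip; the tiny vocab is hypothetical (the real imxly/t5-pegasus vocab is far larger), though it reuses the same ids shown above.

```python
# Sketch of the BERT-style encode / convert_ids_to_tokens round trip.
# The vocab below is hypothetical; only the ids mirror the real
# tokenizer output shown above ([CLS]=101, [SEP]=102, etc.).
vocab = {'[CLS]': 101, '[SEP]': 102, '我': 1909, '不会': 6932, '贴': 4745}
inv_vocab = {idx: tok for tok, idx in vocab.items()}

def encode(tokens):
    # wrap the token sequence with [CLS] ... [SEP], then map tokens to ids
    return [vocab['[CLS]']] + [vocab[t] for t in tokens] + [vocab['[SEP]']]

def convert_ids_to_tokens(ids):
    # inverse mapping, keeping special tokens (as BertTokenizer does)
    return [inv_vocab[i] for i in ids]

ids = encode(['我', '不会', '贴'])
print(ids)                                   # [101, 1909, 6932, 4745, 102]
print(''.join(convert_ids_to_tokens(ids)))   # [CLS]我不会贴[SEP]
```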
Hi @wuzhiye7,
I have tested the case on my end. Please follow the instructions below:
(1) First, install simctg from pip:
pip install simctg --upgrade
(2) Second, run the example below:
from simctg.simctgt5 import SimCTGT5
from transformers import BertTokenizer
from transformers.models.mt5.modeling_mt5 import MT5ForConditionalGeneration

model_name = 'imxly/t5-pegasus'
# initialize tokenizer
tokenizer = BertTokenizer.from_pretrained(model_name)
# initialize model
t5model = MT5ForConditionalGeneration.from_pretrained(model_name)
model = SimCTGT5(model_name, user_defined_model=t5model, user_defined_tokenizer=tokenizer, special_token_list=[])
print('------------------------------------------')
# prepare input
text = '我不会贴假睫毛呀,好难!'
ids = tokenizer.encode(text, return_tensors='pt')
print('The input text is: {}'.format(text))
print('------------------------------------------')
# generate with contrastive search
output = model.fast_contrastive_search(input_ids=ids, beam_width=5, alpha=0.5, decoding_len=30,
                                       start_of_sequence_token_id=tokenizer.cls_token_id,
                                       end_of_sequence_token_id=tokenizer.sep_token_id, early_stop=True)
output_text = ''.join(tokenizer.convert_ids_to_tokens(output))
print('The output text is: {}'.format(output_text))
'''
------------------------------------------
The input text is: 我不会贴假睫毛呀,好难!
------------------------------------------
The output text is: 如何贴假睫毛?我是女生
'''
P.S. If you are interested, the source code of the simctg package is located here (https://github.com/yxuansu/SimCTG/tree/main/simctg).
Please let me know if you have any questions.
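For intuition about why beam_width and alpha matter, here is a minimal sketch of the scoring rule behind contrastive search (the helper names are illustrative, not the simctg API): each of the beam_width candidate tokens is scored as (1 - alpha) times the model's confidence, minus alpha times the maximum cosine similarity between the candidate's hidden state and the hidden states of all previously generated tokens, so a repetitive candidate is penalized.

```python
# Minimal sketch of the contrastive-search degeneration penalty.
# Function names here are illustrative, not part of the simctg API.
import math

def cosine(u, v):
    # cosine similarity between two vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_score(prob, cand_vec, prev_vecs, alpha):
    # (1 - alpha) * model confidence - alpha * max similarity to context
    penalty = max(cosine(cand_vec, h) for h in prev_vecs)
    return (1 - alpha) * prob - alpha * penalty

# Candidate A is more probable but nearly identical to an earlier hidden
# state (a repetition); candidate B is less probable but novel.
prev = [[1.0, 0.0]]
a = contrastive_score(0.9, [1.0, 0.01], prev, alpha=0.5)
b = contrastive_score(0.6, [0.0, 1.0], prev, alpha=0.5)
print(a < b)  # True: the novel candidate wins despite lower probability
```

With alpha=0 this reduces to greedy search, which is why a suitable alpha (0.5 in the example above) helps suppress the repeated tokens reported in this issue.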
Thanks, it's OK now.
Related Issues (20)
- How much does the contrastive loss affect generation, and how should the relative weights of the contrastive loss and the generation cross-entropy loss be set? HOT 2
- Dialogue generation training with simctg library HOT 1
- Does fast_contrastive_search only support batch_size=1? HOT 1
- about contrastive search HOT 1
- Typo in story_generation part HOT 1
- license file HOT 6
- Bloom Ai HOT 3
- about the repetition of the ground-turth HOT 3
- Questions about SimCTGT5 and fast_contrastive_search (EncDecContrastiveDecodingOneStepFast) HOT 2
- Some confusion about the contrastive training loss HOT 1
- Questions about Document Generation HOT 1
- Questions about evaluation metrics coherence and gen-ppl?
- About metric reproduction
- What's the difference between SimCTG and CnNT?
- Can you provide BartModel for SimCTG code?
- SimCTG BART training
- pip install error HOT 2
- Question about replicating the MAUVE scores HOT 3
- similar sentence generation HOT 2
- Understanding the embedding (hv) of newly generated tokens in the fast_contrastive_search implementation HOT 6