Comments (4)

yxuansu commented on May 20, 2024

I used fast_contrastive_search, copied from the code at https://github.com/yxuansu/SimCTG/blob/main/SimCTGEncDec/SimCTGT5/simctgt5.py, as follows: [image] but it generated repeated tokens: [image]

Hi @wuzhiye7,

Can you send me the name of the Chinese BART Hugging Face model and your inputs? I would like to test the instance and give you some feedback.

from simctg.

wuzhiye7 commented on May 20, 2024

Model: t5-pegasus-base; Hugging Face model hub: imxly/t5-pegasus
[image]
input_tokens: ['[CLS]', '我', '不会', '贴', '假', '睫', '毛', '呀', ',', '好', '难', '!', '[SEP]']
input_ids: [[101, 1909, 6932, 4745, 463, 3466, 2644, 840, 5661, 1266, 5314, 5658, 102], [101, 32018, 1909, 7117, 7914, 4913, 3399, 179, 505, 1963, 3443, 26300, 2808, 6312, 135, 40959, 731, 31348, 15699, 5661, 24630, 1963, 463, 3466, 2644, 637, 199, 2374, 2106, 4866, 5661, 541, 1963, 26300, 4745, 198, 28756, 4745, 615, 26257, 198, 179, 102]]

@yxuansu

yxuansu commented on May 20, 2024

Hi @wuzhiye7,

I have tested the case on my end. Please follow the instructions below:

(1) First, install simctg from pip:

pip install simctg --upgrade

(2) Second, run the example below:

from transformers import BertTokenizer
from transformers.models.mt5.modeling_mt5 import MT5ForConditionalGeneration
from simctg.simctgt5 import SimCTGT5

model_name = 'imxly/t5-pegasus'
# initialize tokenizer
tokenizer = BertTokenizer.from_pretrained(model_name)
# initialize model
t5model = MT5ForConditionalGeneration.from_pretrained(model_name)
model = SimCTGT5(model_name, user_defined_model=t5model, user_defined_tokenizer=tokenizer, special_token_list=[])

print('------------------------------------------')
# prepare input
text = '我不会贴假睫毛呀,好难!'
ids = tokenizer.encode(text, return_tensors='pt')
print('The input text is: {}'.format(text))
print('------------------------------------------')
# generate result
output = model.fast_contrastive_search(input_ids=ids, beam_width=5, alpha=0.5, decoding_len=30,
        start_of_sequence_token_id=tokenizer.cls_token_id,
        end_of_sequence_token_id=tokenizer.sep_token_id, early_stop=True)
output_text = ''.join(tokenizer.convert_ids_to_tokens(output))
print('The output text is: {}'.format(output_text))
'''
  ------------------------------------------
  The input text is: 我不会贴假睫毛呀,好难!
  ------------------------------------------
  The output text is: 如何贴假睫毛?我是女生
'''
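
For context on the beam_width and alpha arguments used above, here is a minimal sketch of a single contrastive search selection step. It is not the simctg implementation: the helper names (cosine, contrastive_step) and the toy probabilities and vectors are made up for illustration. It assumes the usual formulation, in which each of the top-k candidates is scored as (1 - alpha) * probability minus alpha * its maximum cosine similarity to the representations of the tokens already in the context.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_step(candidates, prev_hidden, alpha):
    """Pick one token among the top-k candidates (hypothetical helper).

    candidates:  list of (token, probability, hidden_vector) tuples, standing
                 in for the beam_width most probable next tokens.
    prev_hidden: hidden vectors of the tokens already in the context.
    alpha:       weight of the degeneration penalty; 0 reduces to greedy search.
    """
    best_token, best_score = None, -math.inf
    for token, prob, hidden in candidates:
        # Degeneration penalty: how similar the candidate is to the context.
        penalty = max((cosine(hidden, h) for h in prev_hidden), default=0.0)
        score = (1 - alpha) * prob - alpha * penalty
        if score > best_score:
            best_token, best_score = token, score
    return best_token


# Toy data (made up): "A" is more probable but identical to the context,
# "B" is less probable but orthogonal to it.
prev_hidden = [[1.0, 0.0]]
candidates = [("A", 0.6, [1.0, 0.0]), ("B", 0.4, [0.0, 1.0])]

greedy_choice = contrastive_step(candidates, prev_hidden, alpha=0.0)
contrastive_choice = contrastive_step(candidates, prev_hidden, alpha=0.5)
print(greedy_choice, contrastive_choice)
```

With alpha=0.0 the more probable candidate wins. With alpha=0.5 the penalty outweighs the probability gap, so the candidate that does not resemble the context is chosen. This is the mechanism that discourages repeated tokens.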

P.S. If you are interested, the source code of the simctg package is located here (https://github.com/yxuansu/SimCTG/tree/main/simctg).

Please let me know if you have any questions.

wuzhiye7 commented on May 20, 2024

Thanks, it's OK now.
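
One way to check that an output no longer degenerates is to measure the share of repeated n-grams. The sketch below is an illustration only: rep_n is a hypothetical helper, and the degenerate sample is made up, since the repeated output originally reported was only posted as an image. The healthy sample is the output text from the example in this thread, split into characters.

```python
def rep_n(tokens, n):
    """Fraction of n-grams that duplicate another n-gram, in [0, 1]."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


# Output text from the example in this thread, split into characters.
healthy = list("如何贴假睫毛?我是女生")
# Made-up degenerate output that loops over the same three tokens.
degenerate = ["假", "睫", "毛"] * 4

print("rep-2 (healthy):    {:.2f}".format(rep_n(healthy, 2)))
print("rep-2 (degenerate): {:.2f}".format(rep_n(degenerate, 2)))
```

A value near 0 means almost every n-gram is unique. Values approaching 1 indicate the looping behaviour described in the original report.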
