Excuse me, can I use your code for Chinese...

How can i apply your code for Chinese? about bert-extractive-summarizer HOT 20 OPEN

dmmiller612 commented on June 10, 2024

How can I apply your code to Chinese?

from bert-extractive-summarizer.

Comments (20)

ttxs69 commented on June 10, 2024 8

I have solved the problem. The default spaCy setup uses English for sentence segmentation; just change it to Chinese and it works well.
Thanks @dmmiller612

Excuse me, where did you change it to use jieba rather than spaCy? I can't find it. Thank you.

Sorry for the late reply.
Just change two lines of code in sentence_handler.py:

bert-extractive-summarizer/summarizer/sentence_handler.py

Line 3 in f94c024

from spacy.lang.en import English

change to

from spacy.lang.zh import Chinese

and

bert-extractive-summarizer/summarizer/sentence_handler.py

Line 8 in f94c024

def __init__(self, language=English):

change to

def __init__(self, language=Chinese):

With those two changes, the code in #45 (comment) works well.

dmmiller612 commented on June 10, 2024 1

The only limitation right now for Chinese is that you need a BERT model and tokenizer that support Chinese. If you have both the tokenizer and the model, you can easily pass them in for summarization.

shaofengzeng commented on June 10, 2024

OK, thanks

1615070057 commented on June 10, 2024

Excuse me, can I use your code for Chinese...

Hello, has anyone successfully applied bert-extractive-summarizer to Chinese summarization? I don't know how to modify it and would like to ask.

BIRlz commented on June 10, 2024

OK, thanks

Have you ever tested this model on a Chinese dataset? It didn't work on my dataset and outputs nothing

dmmiller612 commented on June 10, 2024

It would need a Chinese-based BERT model. I am not sure whether the bert-multilingual model supports Chinese or not. The model would need to be in the form of a Hugging Face Transformers model.

ttxs69 commented on June 10, 2024

I have tried using the bert-base-chinese model, but it outputs nothing.
This is my code:

from transformers import AutoConfig, AutoModel, AutoTokenizer
from summarizer import Summarizer

# Load model, model config and tokenizer via Transformers
custom_config = AutoConfig.from_pretrained('bert-base-chinese')
custom_config.output_hidden_states = True  # expose hidden states to the summarizer
custom_tokenizer = AutoTokenizer.from_pretrained('bert-base-chinese')
custom_model = AutoModel.from_pretrained('bert-base-chinese', config=custom_config)

body = '这是一个测试句子'
model = Summarizer(custom_model=custom_model, custom_tokenizer=custom_tokenizer)
# Print the result explicitly so an empty summary is visible when run as a script
print(model(body))
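
One possible contributor to an empty result with a one-sentence input like the one above is a minimum sentence length filter. The sketch below is not the library's code. It assumes, as I understand the package defaults, that candidate sentences shorter than about 40 characters are discarded before any are selected; `filter_sentences`, `MIN_LENGTH` and `MAX_LENGTH` are hypothetical names used only for illustration, so check the behaviour of your installed version.

```python
# Hypothetical illustration, not the library's implementation.
# Assumption: a sentence is kept only if its character length lies
# strictly between a minimum and a maximum.
MIN_LENGTH = 40   # assumed default; verify against your installed version
MAX_LENGTH = 600  # assumed default

def filter_sentences(sentences, min_length=MIN_LENGTH, max_length=MAX_LENGTH):
    """Keep sentences whose stripped length is within (min_length, max_length)."""
    return [s.strip() for s in sentences if max_length > len(s.strip()) > min_length]

short = '这是一个测试句子'
print(len(short))                               # character count of the test sentence
print(filter_sentences([short]))                # dropped under the assumed default
print(filter_sentences([short], min_length=5))  # kept with a smaller minimum
```

If this assumption holds for your version, testing with a longer multi-sentence text, or passing a smaller `min_length` when calling the model, may be worth trying alongside the segmentation change discussed in this thread.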

ttxs69 commented on June 10, 2024

I have solved the problem. The default spaCy setup uses English for sentence segmentation; just change it to Chinese and it works well.
Thanks @dmmiller612

Bibabo-BUPT commented on June 10, 2024

I have solved the problem. The default spaCy setup uses English for sentence segmentation; just change it to Chinese and it works well.
Thanks @dmmiller612

Excuse me, where did you change it to use jieba rather than spaCy? I can't find it. Thank you.

lmq990417 commented on June 10, 2024

@ttxs69
Why is the final output just the original Chinese text after I modified the code for Chinese following your steps?
I urgently need to know and hope you can reply!

jnkr36 commented on June 10, 2024

@ttxs69
Why is the final output just the original Chinese text after I modified the code for Chinese following your steps?
I urgently need to know and hope you can reply!

I just tried it, and it works after following the steps to change the two lines of code. You can step into model(body) with a debugger to see what happens.

lmq990417 commented on June 10, 2024

@ttxs69
OK, thanks, I will try. If it is convenient, could you please send me a copy of the code you ran? My email address is [email protected].

lmq990417 commented on June 10, 2024

@jnkr36
I'm sorry, I misread the name this morning. First of all, thank you very much for replying. I'm in a bit of a hurry and can't find my mistake, so I will try the method you described. If it is convenient, could you also send me a copy of the code you ran? My email address is [email protected].
Thank you very much again.

lmq990417 commented on June 10, 2024

@jnkr36
I'm back again!
I just have one question: had you downloaded zh_core_web_sm beforehand?

jnkr36 commented on June 10, 2024

@jnkr36
I'm back again!
I just have one question: had you downloaded zh_core_web_sm beforehand?

@jnkr36
I'm sorry, I misread the name this morning. First of all, thank you very much for replying. I'm in a bit of a hurry and can't find my mistake, so I will try the method you described. If it is convenient, could you also send me a copy of the code you ran? My email address is [email protected].
Thank you very much again.

Sorry for the late response. I have sent you my project; please check your email. If you have any other questions, we can talk again.

FrontMage commented on June 10, 2024

Just for convenience, I forked the repo and modified it as suggested above. It works nicely.

pip install git+https://github.com/FrontMage/bert-extractive-summarizer.git

tuzcsap commented on June 10, 2024

@FrontMage
Hello!
I've installed your modified fork, transformers, and spaCy 3.0.0, and downloaded zh_core_web_sm. I then tried to run the model as in ttxs69's snippet, but it generates empty output on Chinese sentences.
Could you please provide more details on your setup?

zhangsirf commented on June 10, 2024

If it is convenient, could you please send me a copy of the code you ran? My email address is [email protected]. Thanks.

zhangsirf commented on June 10, 2024

@ttxs69
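Why is the final output just the original Chinese text after I modified the code for Chinese following your steps?
I urgently need to know and hope you can reply!

I just tried it, and it works after following the steps to change the two lines of code. You can step into model(body) with a debugger to see what happens.

If it is convenient, could you please send me a copy of the code you ran? My email address is [email protected]. Thanks.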

ilingen commented on June 10, 2024

Regarding the output being the original text: I just found out that every sentence in your long text needs to end with a Chinese period (。).
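
The point about sentence-ending punctuation can be illustrated with a small standalone sketch. It does not use spaCy or the summarizer; `split_chinese_sentences` is a hypothetical helper that splits only on the full-width terminators 。！？. Text without them comes back as a single "sentence", which would leave an extractive summarizer nothing to choose between.

```python
import re

# Hypothetical helper for illustration only; the summarizer itself relies on spaCy.
# Each match is a run of non-terminator characters plus an optional terminator.
_SENTENCE = re.compile(r'[^。！？]+[。！？]?')

def split_chinese_sentences(text):
    """Split text after each full-width sentence terminator (。！？)."""
    return [part.strip() for part in _SENTENCE.findall(text) if part.strip()]

with_periods = '今天天气很好。我们去公园散步！你来吗？'
without_periods = '今天天气很好，我们去公园散步，你来吗'

print(split_chinese_sentences(with_periods))     # three separate sentences
print(split_chinese_sentences(without_periods))  # one undivided chunk
```

A quick check like this on your own input can show whether it contains the terminators a segmenter would look for before you debug the model itself.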
