
Comments (20)

ttxs69 commented on June 10, 2024

I have solved the problem: by default, spaCy uses English for sentence segmentation. Just change it to Chinese and it works well.
Thanks @dmmiller612

Excuse me, where did you change it to use jieba rather than spaCy? I can't find it. Thank you.

Sorry for the late reply.
Just change two lines of code in sentence_handler.py:

from spacy.lang.en import English

change to

from spacy.lang.zh import Chinese

and

def __init__(self, language=English):

change to

def __init__(self, language=Chinese):

After that, the code in #45 (comment) works well.

from bert-extractive-summarizer.

dmmiller612 commented on June 10, 2024

The only limitation right now for Chinese is that you would need a BERT model and tokenizer that support Chinese. If you have both the tokenizer and the model, you can easily pass them in for summarization.

shaofengzeng commented on June 10, 2024

OK, thanks

1615070057 commented on June 10, 2024

Excuse me, may I use your Chinese code...

Hello, has anyone successfully applied 'bert-extractive-summarizer' to Chinese summarization? I don't know how to modify it, so I would like to ask.

BIRlz commented on June 10, 2024

OK, thanks

Have you ever tested this model on a Chinese dataset? It didn't work on my dataset; it outputs nothing.

dmmiller612 commented on June 10, 2024

It would need a Chinese-based BERT model. I am not sure whether the bert-multilingual model supports Chinese. The model would need to be in the form of a Hugging Face transformer.

ttxs69 commented on June 10, 2024

I have tried using the bert-base-chinese model, but it outputs nothing.
This is my code:

from transformers import AutoConfig, AutoModel, AutoTokenizer

# Load model, model config and tokenizer via Transformers
custom_config = AutoConfig.from_pretrained('bert-base-chinese')
custom_config.output_hidden_states=True
custom_tokenizer = AutoTokenizer.from_pretrained('bert-base-chinese')
custom_model = AutoModel.from_pretrained('bert-base-chinese', config=custom_config)

from summarizer import Summarizer

body = '这是一个测试句子'
model = Summarizer(custom_model=custom_model, custom_tokenizer=custom_tokenizer)
model(body)
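
A possible explanation for the empty output on this particular input, offered as a guess rather than a confirmed diagnosis: the summarizer appears to drop sentences that are too short before it does anything else, via a `min_length` argument that I believe defaults to 40 characters (with `max_length` at 600). The test sentence above is only 8 characters long. The sketch below is not the library's code. It is a plain-Python stand-in with a hypothetical helper name and a hand-written regex, meant only to show how such a length filter would leave nothing to summarize.

```python
import re

# Hypothetical stand-in for a sentence-length filter; NOT the library's code.
# Assumption: a sentence is kept only if min_length < len(sentence) < max_length.
# A "sentence" here is a run of text ending in a Chinese terminator (。!?).
SENTENCE = re.compile(r"[^。!?]+[。!?]?")


def keep_candidate_sentences(body, min_length=40, max_length=600):
    """Split on Chinese sentence terminators, then apply the length filter."""
    sentences = [s.strip() for s in SENTENCE.findall(body) if s.strip()]
    return [s for s in sentences if min_length < len(s) < max_length]


if __name__ == "__main__":
    short = "这是一个测试句子"
    print(len(short))                       # 8
    print(keep_candidate_sentences(short))  # []
```

If this guess is right, testing with a longer multi-sentence text, or passing a smaller `min_length` when calling the model, would be worth trying before changing anything else.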

ttxs69 commented on June 10, 2024

I have solved the problem: by default, spaCy uses English for sentence segmentation. Just change it to Chinese and it works well.
Thanks @dmmiller612

Bibabo-BUPT commented on June 10, 2024

I have solved the problem: by default, spaCy uses English for sentence segmentation. Just change it to Chinese and it works well.
Thanks @dmmiller612

Excuse me, where did you change it to use jieba rather than spaCy? I can't find it. Thank you.

lmq990417 commented on June 10, 2024

@ttxs69
Why is the final output just the original Chinese text after I switch to the Chinese model following your steps?
I urgently want to know and hope you can reply!

jnkr36 commented on June 10, 2024

@ttxs69
Why is the final output just the original Chinese text after I switch to the Chinese model following your steps?
I urgently want to know and hope you can reply!

I just tried it, and it works after I followed the steps and changed the two lines of code. You can step into model(body) in a debugger to investigate.

lmq990417 commented on June 10, 2024

@ttxs69
OK, thanks, I will try. If it is convenient, could you please send me a copy of the code you ran? My email address is [email protected].

lmq990417 commented on June 10, 2024

@jnkr36
I'm sorry, I misread the name this morning. First of all, thank you very much for replying to me. I'm in a bit of a hurry and can't find my mistake, so I will try the method you described. If it is convenient, could you also send me a copy of the code you ran? My email address is [email protected].
Thank you very much again

lmq990417 commented on June 10, 2024

@jnkr36
I'm back again!
I just have one question: had you downloaded zh_core_web_sm beforehand?

jnkr36 commented on June 10, 2024

@jnkr36
I'm back again!
I just have one question: had you downloaded zh_core_web_sm beforehand?

@jnkr36
I'm sorry, I misread the name this morning. First of all, thank you very much for replying to me. I'm in a bit of a hurry and can't find my mistake, so I will try the method you described. If it is convenient, could you also send me a copy of the code you ran? My email address is [email protected].
Thank you very much again

Sorry for the late response. I have sent you my project; please check your email. If you have any other questions, we can talk again.

FrontMage commented on June 10, 2024

Just for convenience, I forked the repo and modified it as suggested above. It works nicely.

pip install git+https://github.com/FrontMage/bert-extractive-summarizer.git

tuzcsap commented on June 10, 2024

@FrontMage
Hello!
I've installed your modified fork, transformers, and spaCy 3.0.0, and downloaded zh_core_web_sm. I then tried to run the model as in ttxs69's snippet, but it generates empty output on Chinese sentences.
Could you please provide more details on your setup?

zhangsirf commented on June 10, 2024

If it is convenient, could you please send me a copy of the code you ran? My email address is [email protected]. Thanks.

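zhangsirf commented on June 10, 2024

@ttxs69
Why is the final output just the original Chinese text after I switch to the Chinese model following your steps?
I urgently want to know and hope you can reply!

I just tried it, and it works after I followed the steps and changed the two lines of code. You can step into model(body) in a debugger to investigate.

If it is convenient, could you please send me a copy of the code you ran? My email address is [email protected]. Thanks.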

ilingen commented on June 10, 2024

If the output is just the original text: I found that every sentence in your long text needs to end with a Chinese period (。).
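
A minimal sketch of that preprocessing step, assuming the input mixes ASCII periods into Chinese text. The helper name `normalize_sentence_endings` is made up for illustration and is not part of the summarizer or spaCy. It only rewrites a `.` that directly follows a Chinese character, so decimals such as `3.5` are left alone. Whether this is enough for the sentence splitter to find boundaries depends on your spaCy setup, so treat it as a starting point.

```python
import re

# Hypothetical helper, not part of bert-extractive-summarizer or spaCy.
# Matches an ASCII period that directly follows a CJK ideograph (U+4E00 to U+9FFF).
ASCII_PERIOD_AFTER_CJK = re.compile(r"(?<=[\u4e00-\u9fff])\.")


def normalize_sentence_endings(text):
    """Replace ASCII periods that end a Chinese sentence with the Chinese period."""
    return ASCII_PERIOD_AFTER_CJK.sub("。", text)


if __name__ == "__main__":
    raw = "今天天气很好.我们去公园."
    print(normalize_sentence_endings(raw))  # 今天天气很好。我们去公园。
```

The normalized string can then be passed to the model in place of the raw text.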
