Comments (20)
I have solved the problem: by default, spaCy uses English for sentence segmentation. Just change it to Chinese and it works well.
Thanks @dmmiller612
Excuse me, where did you change it to use jieba rather than spaCy? I can't find it. Thank you.
Sorry for the late reply.
Just change two lines of code in sentence_handler.py.
Change the spaCy language import to
from spacy.lang.zh import Chinese
and change the constructor signature to
def __init__(self, language=Chinese):
With those changes, the code in #45 (comment) works well.
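To illustrate what the change above affects, here is a minimal sketch of Chinese sentence splitting with spaCy. It assumes spaCy 3.x (where pipeline components are added by name) and uses spaCy's rule-based `sentencizer`. The helper name `split_chinese_sentences` is hypothetical and is not part of bert-extractive-summarizer.

```python
from spacy.lang.zh import Chinese


def split_chinese_sentences(text):
    """Split text into sentences with spaCy's rule-based sentencizer.

    Hypothetical helper for illustration only; assumes spaCy 3.x.
    """
    # A blank Chinese pipeline; no trained model download is required.
    nlp = Chinese()
    # The sentencizer marks sentence boundaries at punctuation such as "。".
    nlp.add_pipe("sentencizer")
    doc = nlp(text)
    return [sent.text.strip() for sent in doc.sents]


if __name__ == "__main__":
    for sentence in split_chinese_sentences("这是第一句话。这是第二句话。"):
        print(sentence)
```

If this returns a single sentence for your text, the summarizer likely has nothing to choose between, which may explain empty or unchanged output.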
from bert-extractive-summarizer.
The only limitation right now for Chinese is that you need a BERT model and tokenizer that support Chinese. If you have both the tokenizer and the model, you can easily pass them in for summarization.
from bert-extractive-summarizer.
OK, thanks
from bert-extractive-summarizer.
Excuse me, may I use your Chinese code...
Hello, did applying bert-extractive-summarizer to Chinese summarization work? I don't know how to modify it, so I would like to ask.
from bert-extractive-summarizer.
Have you ever tested this model on a Chinese dataset? It didn't work on my dataset and outputs nothing
from bert-extractive-summarizer.
It would need a Chinese-based BERT model. I am not sure whether the bert-multilingual model supports Chinese. The model would need to be in the form of a Hugging Face transformer.
from bert-extractive-summarizer.
I have tried using the bert-base-chinese model, but it outputs nothing.
This is my code:
from summarizer import Summarizer
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Load model, model config and tokenizer via Transformers
custom_config = AutoConfig.from_pretrained('bert-base-chinese')
# Make the model return its hidden states
custom_config.output_hidden_states = True
custom_tokenizer = AutoTokenizer.from_pretrained('bert-base-chinese')
custom_model = AutoModel.from_pretrained('bert-base-chinese', config=custom_config)

body = '这是一个测试句子'  # "This is a test sentence"
model = Summarizer(custom_model=custom_model, custom_tokenizer=custom_tokenizer)
model(body)
from bert-extractive-summarizer.
I have solved the problem: by default, spaCy uses English for sentence segmentation. Just change it to Chinese and it works well.
Thanks @dmmiller612
from bert-extractive-summarizer.
Excuse me, where did you change it to use jieba rather than spaCy? I can't find it. Thank you.
from bert-extractive-summarizer.
@ttxs69
After I modified the code for Chinese following your steps, why is the final output just the original Chinese text?
I urgently want to know and hope you can reply!
from bert-extractive-summarizer.
I just tried it, and it works after following the steps to change the two lines of code. You can step into model(body) with a debugger to see what happens.
from bert-extractive-summarizer.
@ttxs69
OK, thanks, I will try. If it is convenient, could you please send me a copy of the code you ran? My email address is [email protected].
from bert-extractive-summarizer.
@jnkr36
I'm sorry, I misread the name this morning. First of all, thank you very much for replying. I'm in a bit of a hurry and can't find the mistake, so I will try the method you described. If it is convenient, could you please also send me a copy of the code you ran? My email address is [email protected]!
Thank you very much again.
from bert-extractive-summarizer.
@jnkr36
I'm back again!
I just have one question: had you downloaded zh_core_web_sm beforehand?
from bert-extractive-summarizer.
Sorry for the late response. I have sent you my project; please check your email. If you have any other questions, we can talk again.
from bert-extractive-summarizer.
For convenience, I forked the repo and modified it as suggested above. It works nicely.
pip install git+https://github.com/FrontMage/bert-extractive-summarizer.git
from bert-extractive-summarizer.
@FrontMage
Hello!
I've installed your modified fork, transformers, and spaCy 3.0.0, and downloaded zh_core_web_sm. I then tried to run the model as in ttxs69's snippet, but it generates empty output on Chinese sentences.
Could you please provide more details on your setup?
from bert-extractive-summarizer.
If it is convenient, could you please send me a copy of the code you ran? My email address is [email protected]. Thanks.
from bert-extractive-summarizer.
As for the output being the original text: I just found out that every sentence in your long text needs to end with a Chinese period (。).
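As a rough illustration of that fix, the sketch below rewrites ASCII periods that directly follow a Chinese character into Chinese periods before the text is passed to the summarizer. The helper name `normalize_sentence_endings` is hypothetical, and the character range used covers only the basic CJK Unified Ideographs block, so treat it as a starting point rather than a complete solution.

```python
import re

# An ASCII period (plus any trailing whitespace) that directly follows a
# CJK ideograph. Periods after digits or Latin letters are left untouched.
_ASCII_PERIOD_AFTER_CJK = re.compile(r"(?<=[\u4e00-\u9fff])\.\s*")


def normalize_sentence_endings(text):
    """Replace ASCII periods that end Chinese sentences with '。'.

    Hypothetical helper for illustration only.
    """
    return _ASCII_PERIOD_AFTER_CJK.sub("。", text)


if __name__ == "__main__":
    print(normalize_sentence_endings("今天天气很好. 我们去公园散步."))
```

The normalized text can then be passed to the model in place of the original body.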
from bert-extractive-summarizer.