Comments (17)
Whoops, it looks like I fixed one part, but I still need to fix the summarizer contract. I will get to that this weekend.
from bert-extractive-summarizer.
You should be able to load a custom (Transformers-based) model using the library. Here is an example from the readme; let me know if you are still having issues.

```python
from transformers import AutoConfig, AutoModel, AutoTokenizer
from summarizer import Summarizer

# Load model, model config and tokenizer via Transformers
custom_config = AutoConfig.from_pretrained('allenai/scibert_scivocab_uncased')
custom_config.output_hidden_states = True
custom_tokenizer = AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased')
custom_model = AutoModel.from_pretrained('allenai/scibert_scivocab_uncased', config=custom_config)

body = 'Text body that you want to summarize with BERT'
body2 = 'Something else you want to summarize with BERT'
model = Summarizer(custom_model=custom_model, custom_tokenizer=custom_tokenizer)
model(body)
model(body2)
```
Sorry, I actually fixed this last night, and forgot to commit. I will update when I get home this evening.
I am having the same issue. I am trying to load a trained model using:

```python
ext_model = Summarizer(model="../models/CNN_DailyMail_Extractive/bertext_cnndm_transformer.pt")
```

I also tried:

```python
ext_model = Summarizer(custom_model="../models/CNN_DailyMail_Extractive/bertext_cnndm_transformer.pt")
```

However, I get the following error:

```
File "/usr/local/lib/python3.6/dist-packages/summarizer/BertParent.py", line 38, in __init__
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
AttributeError: 'NoneType' object has no attribute 'from_pretrained'
```
I'll take a look.
Experiencing the same thing
How do I use this with Docker? I'm trying the german-bert from here:

```shell
docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-german-cased
```

but get:

```
root@docker2:~/bert-extractive-summarizer/summarizer# docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-german-cased
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
100%|##########| 40155833/40155833 [00:02<00:00, 19742925.37B/s]
Using Model: bert-base-german-cased
Traceback (most recent call last):
  File "./server.py", line 86, in <module>
    summarizer = Summarizer(args.model, int(args.hidden), args.reduce, float(args.greediness))
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 73, in __init__
    super(Summarizer, self).__init__(model, hidden, reduce_option, greedyness)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 53, in __init__
    super(SingleModel, self).__init__(model, hidden, reduce_option, greedyness)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 15, in __init__
    self.model = BertParent(model)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/BertParent.py", line 41, in __init__
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
AttributeError: 'NoneType' object has no attribute 'from_pretrained'
```
@davidlenz Sorry, I wasn't using Docker when checking in the last changes for this issue, so I didn't look at that setup. Making this work would require some more code updates, I think: `server.py` and `summarize.py` would need to be updated to accept arguments for, e.g., the path where your custom model is stored, plus some code to create a `BertModel` (and a `BertTokenizer`, if required) from those paths, which can then be passed into the `Summarizer(...)` constructor.
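As a rough illustration of the changes described above, a sketch of what the argument handling in `server.py` might look like. The flag names (`-custom-model`, `-custom-tokenizer`) are hypothetical, not the service's actual CLI:

```python
import argparse


def parse_args(argv=None):
    """Hypothetical argument parsing for server.py: in addition to the
    existing -model flag, accept paths (or hub names) for a custom
    pretrained model and tokenizer."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-model', default='bert-large-uncased')
    parser.add_argument('-custom-model', dest='custom_model', default=None,
                        help='path or hub name of a pretrained model')
    parser.add_argument('-custom-tokenizer', dest='custom_tokenizer', default=None,
                        help='path or hub name of a pretrained tokenizer')
    return parser.parse_args(argv)


# The parsed paths would then be turned into objects and handed to the
# Summarizer (sketch only; requires transformers + summarizer installed):
#
# from transformers import BertModel, BertTokenizer
# args = parse_args()
# custom_model = BertModel.from_pretrained(args.custom_model, output_hidden_states=True)
# custom_tokenizer = BertTokenizer.from_pretrained(args.custom_tokenizer)
# summarizer = Summarizer(custom_model=custom_model, custom_tokenizer=custom_tokenizer)
```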
@hdatteln this is a good starting point, thanks! Got the following afterwards:

```
Traceback (most recent call last):
  File "./server.py", line 87, in <module>
    summarizer = Summarizer(args.model, args.custom_model, args.custom_tokenizer, int(args.hidden), args.reduce, float(args.greediness))
TypeError: __init__() takes from 1 to 5 positional arguments but 7 were given
```

So, from the `requirements-service.txt` here, it looks like `bert-extractive-summarizer` is installed via pip as version 0.2.0, which needs to be changed to reflect the latest changes in version 0.2.2.

I applied the changes locally and rebuilt the Docker container (`docker build` uses the local `server.py` and `requirements-service.txt`), but had no luck. I am actually uncertain how to correctly provide inputs to `custom_model` and `custom_tokenizer`.

Staring at the code for a while, I came to the conclusion that my model is not really a custom model in the sense it is meant here, but rather another pretrained model already in the transformers repo. Thus I concluded it would suffice to include `bert-base-german-cased` in the MODELS dict from `BertParent.py`. However, as I currently understand it, these changes would also need to be published to PyPI to be usable with Docker.
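To make the proposed fix concrete, here is a minimal sketch of the MODELS lookup; this is not the actual `BertParent.py` source, and the `(model class, tokenizer class)` values are stubbed as strings for illustration (in the real code they would be transformers classes such as `BertModel` / `BertTokenizer`):

```python
# Hypothetical sketch of the MODELS dict in BertParent.py.
MODELS = {
    'bert-base-uncased': ('BertModel', 'BertTokenizer'),
    'bert-large-uncased': ('BertModel', 'BertTokenizer'),
    # Proposed addition, so the lookup no longer falls through to (None, None):
    'bert-base-german-cased': ('BertModel', 'BertTokenizer'),
}


def lookup(model_name):
    # An unknown name yields (None, None); calling
    # None.from_pretrained(...) is exactly the AttributeError in the
    # tracebacks above.
    return MODELS.get(model_name, (None, None))
```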
Thanks for the feedback! Unfortunately, it is still not working for me, and I am not sure how to go on or how to correctly use the german-bert.

```shell
docker run --rm -it -p 5000:5000 summary-service:latest -model bert-large-uncased
```

works well, but

```shell
docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-german-cased
```

still throws `AttributeError: 'NoneType' object has no attribute 'from_pretrained'`:

```
root@docker:~/bert-extractive-summarizer# docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-german-cased
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
100%|##########| 40155833/40155833 [00:02<00:00, 17176330.37B/s]
Using Model: bert-base-german-cased
Traceback (most recent call last):
  File "./server.py", line 90, in <module>
    greedyness=float(args.greediness)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 106, in __init__
    super(Summarizer, self).__init__(model, custom_model, custom_tokenizer, hidden, reduce_option, greedyness, language, random_state)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 80, in __init__
    greedyness, language=language, random_state=random_state)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 25, in __init__
    self.model = BertParent(model, custom_model, custom_tokenizer)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/BertParent.py", line 38, in __init__
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
AttributeError: 'NoneType' object has no attribute 'from_pretrained'
```
Is it also possible to add the multilingual BERT models as an option? (Or make it possible to indicate which BERT tokenizer, model, and pre-trained weights it should use?)

Update: I found out it is already possible, but the documentation leaves some room for interpretation (namely, that the custom model needs to already be pre-trained). Maybe it is possible to include the following passage for others to see how they can use it? @dmmiller612

```python
import transformers
from summarizer import Summarizer

bert_model = "bert-base-multilingual-cased"
custom_model = transformers.BertModel.from_pretrained(bert_model, output_hidden_states=True)
custom_tokenizer = transformers.BertTokenizer.from_pretrained(bert_model)
model = Summarizer(model=bert_model, custom_model=custom_model, custom_tokenizer=custom_tokenizer)
```
Yep, I can update the documentation.
I am having the same issue:

```
$ docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-multilingual-cased
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
100%|##########| 40155833/40155833 [00:25<00:00, 1561989.72B/s]
Using Model: bert-base-multilingual-cased
Traceback (most recent call last):
  File "./server.py", line 90, in <module>
    greedyness=float(args.greediness)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 106, in __init__
    super(Summarizer, self).__init__(model, custom_model, custom_tokenizer, hidden, reduce_option, greedyness, language, random_state)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 80, in __init__
    greedyness, language=language, random_state=random_state)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 25, in __init__
    self.model = BertParent(model, custom_model, custom_tokenizer)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/BertParent.py", line 38, in __init__
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
AttributeError: 'NoneType' object has no attribute 'from_pretrained'
```
Yeah, right now the service doesn't have a good way to load a custom model (it can easily be done with the library). I'll add something to hopefully address the issue sometime this week.
There's an ad-hoc solution if it's urgent. Replace this part in `bert_parent.py`:

```python
base_model, base_tokenizer = self.MODELS.get(model, (None, None))

if custom_model:
    self.model = custom_model
else:
    self.model = base_model.from_pretrained(model, output_hidden_states=True)

if custom_tokenizer:
    self.tokenizer = custom_tokenizer
else:
    self.tokenizer = base_tokenizer.from_pretrained(model)
```

with

```python
base_model, base_tokenizer = self.MODELS.get('bert-large-uncased', (None, None))

if custom_model:
    self.model = base_model.from_pretrained(custom_model, output_hidden_states=True)
else:
    self.model = base_model.from_pretrained(model, output_hidden_states=True)

if custom_tokenizer:
    self.tokenizer = base_tokenizer.from_pretrained(custom_tokenizer)
else:
    self.tokenizer = base_tokenizer.from_pretrained(model)
```

to make it work. Use with caution, since it's not a permanent solution. You can now use `Summarizer(custom_model='path_or_model', custom_tokenizer='path_or_model')`.
> Yeah, right now the service doesn't have a good way to load a custom model (it can easily be done with the library). I'll add something to hopefully address the issue sometime this week.

Hi, any update on loading the custom model?
Closing as stale. Let me know if any issues arise here.