Comments (17)

dmmiller612 commented on June 10, 2024

Whoops, looks like I fixed one part, but need to fix the summarizer contract. I will get to that this weekend.

from bert-extractive-summarizer.

dmmiller612 commented on June 10, 2024

You should be able to load a custom (Transformers-based) model using the library. Here is an example from the readme; let me know if you are still having issues.

from transformers import AutoConfig, AutoModel, AutoTokenizer

# Load model, model config and tokenizer via Transformers
custom_config = AutoConfig.from_pretrained('allenai/scibert_scivocab_uncased')
custom_config.output_hidden_states=True
custom_tokenizer = AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased')
custom_model = AutoModel.from_pretrained('allenai/scibert_scivocab_uncased', config=custom_config)

from summarizer import Summarizer

body = 'Text body that you want to summarize with BERT'
body2 = 'Something else you want to summarize with BERT'
model = Summarizer(custom_model=custom_model, custom_tokenizer=custom_tokenizer)
model(body)
model(body2)

dmmiller612 commented on June 10, 2024

Sorry, I actually fixed this last night, and forgot to commit. I will update when I get home this evening.

igormis commented on June 10, 2024

I am having the same issue:
I am trying to load a trained model using:
ext_model = Summarizer(model="../models/CNN_DailyMail_Extractive/bertext_cnndm_transformer.pt")
I also tried
ext_model = Summarizer(custom_model="../models/CNN_DailyMail_Extractive/bertext_cnndm_transformer.pt")
However, I get the following error:
  File "/usr/local/lib/python3.6/dist-packages/summarizer/BertParent.py", line 38, in __init__
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
AttributeError: 'NoneType' object has no attribute 'from_pretrained'
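The failure can be sketched as follows (a minimal, hypothetical sketch with placeholder values; the real MODELS dict in BertParent.py maps model names to transformers model/tokenizer classes):

```python
# Minimal sketch of the lookup in BertParent.py (placeholder string values;
# in the real code the dict values are transformers classes).
MODELS = {
    'bert-base-uncased': ('BertModel', 'BertTokenizer'),
    'bert-large-uncased': ('BertModel', 'BertTokenizer'),
}

# A file path is not a known model name, so the lookup falls back to
# (None, None), and the later base_model.from_pretrained(...) call then
# fails with the AttributeError shown above.
path = '../models/CNN_DailyMail_Extractive/bertext_cnndm_transformer.pt'
base_model, base_tokenizer = MODELS.get(path, (None, None))
```

In other words, `model=` expects one of the known pretrained names, not a checkpoint path.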

dmmiller612 commented on June 10, 2024

I'll take a look.

hdatteln commented on June 10, 2024

Experiencing the same thing

davidlenz commented on June 10, 2024

How do I use this with Docker? I am trying the german-bert from here

docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-german-cased

but get

root@docker2:~/bert-extractive-summarizer/summarizer# docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-german-cased
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
100%|#######################################################################################################################################################################################| 40155833/40155833 [00:02<00:00, 19742925.37B/s]
Using Model: bert-base-german-cased
Traceback (most recent call last):
  File "./server.py", line 86, in <module>
    summarizer = Summarizer(args.model, int(args.hidden), args.reduce, float(args.greediness))
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 73, in __init__
    super(Summarizer, self).__init__(model, hidden, reduce_option, greedyness)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 53, in __init__
    super(SingleModel, self).__init__(model, hidden, reduce_option, greedyness)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 15, in __init__
    self.model = BertParent(model)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/BertParent.py", line 41, in __init__
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
AttributeError: 'NoneType' object has no attribute 'from_pretrained'
root@docker2:~/bert-extractive-summarizer/summarizer#

hdatteln commented on June 10, 2024

@davidlenz, sorry, I wasn't using Docker when checking in the last changes for this issue, so I didn't look at that setup.
Making this work would require some more code updates, I think: server.py and summarize.py would need to accept arguments for, e.g., the path where your custom model is stored, plus some code to create a BertModel (and BertTokenizer, if required) from those paths, which could then be passed into the Summarizer(...) constructor.
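A rough sketch of what those argument changes to server.py might look like (the flag names below are hypothetical, just following the existing single-dash style of -model):

```python
# Hypothetical sketch: extend server.py's argument parsing so custom
# model/tokenizer paths can be passed in alongside -model.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-model', default='bert-large-uncased')
parser.add_argument('-custom-model', dest='custom_model', default=None,
                    help='path to a saved custom model (hypothetical flag)')
parser.add_argument('-custom-tokenizer', dest='custom_tokenizer', default=None,
                    help='path to a saved custom tokenizer (hypothetical flag)')

args = parser.parse_args(['-model', 'bert-base-german-cased'])
# server.py would then build a BertModel/BertTokenizer from these paths
# (when given) and pass the resulting objects into Summarizer(...).
```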

davidlenz commented on June 10, 2024

@hdatteln this is a good starting point, thanks! Got the following afterwards:

Traceback (most recent call last):
  File "./server.py", line 87, in <module>
    summarizer = Summarizer(args.model, args.custom_model, args.custom_tokenizer, int(args.hidden), args.reduce, float(args.greediness))
TypeError: __init__() takes from 1 to 5 positional arguments but 7 were given
root@docker2:~/bert-extractive-summarizer#

So, judging from the requirements-service.txt here, it looks like bert-extractive-summarizer is installed via pip as version 0.2.0, which needs to be bumped to 0.2.2 to pick up the latest changes.

I applied the changes locally and rebuilt the Docker container (docker build uses the local server.py and requirements-service.txt), but had no luck. I am actually uncertain how to correctly provide inputs for custom_model and custom_tokenizer.

After staring at the code for a while, I came to the conclusion that my model is not really a custom model in the sense meant here, but rather another pretrained model already in the transformers repo. Thus I concluded it would suffice to include bert-base-german-cased in the MODELS dict in BertParent.py. However, as I currently understand it, these changes would also need to be published to PyPI to be usable with Docker.
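That workaround can be sketched like this (placeholder string values; in the real BertParent.py the dict values are transformers model and tokenizer classes):

```python
# Sketch of the local patch: add the German model name to the MODELS dict
# so the lookup no longer falls back to (None, None).
MODELS = {
    'bert-base-uncased': ('BertModel', 'BertTokenizer'),   # placeholders
    'bert-large-uncased': ('BertModel', 'BertTokenizer'),
}
MODELS['bert-base-german-cased'] = ('BertModel', 'BertTokenizer')

# The lookup that previously produced None now succeeds.
entry = MODELS.get('bert-base-german-cased', (None, None))
```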

davidlenz commented on June 10, 2024

Thanks for the feedback! Unfortunately it is still not working for me, and I am not sure how to proceed or how to correctly use the german-bert model.

docker run --rm -it -p 5000:5000 summary-service:latest -model bert-large-uncased

works well, but

docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-german-cased

still throws AttributeError: 'NoneType' object has no attribute 'from_pretrained'


root@docker:~/bert-extractive-summarizer# docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-german-cased
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
100%|#######################################################################################################################################################################################| 40155833/40155833 [00:02<00:00, 17176330.37B/s]
Using Model: bert-base-german-cased
Traceback (most recent call last):
  File "./server.py", line 90, in <module>
    greedyness=float(args.greediness)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 106, in __init__
    super(Summarizer, self).__init__(model, custom_model, custom_tokenizer, hidden, reduce_option, greedyness, language, random_state)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 80, in __init__
    greedyness, language=language, random_state=random_state)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 25, in __init__
    self.model = BertParent(model, custom_model, custom_tokenizer)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/BertParent.py", line 38, in __init__
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
AttributeError: 'NoneType' object has no attribute 'from_pretrained'
root@docker:~/bert-extractive-summarizer#

houda96 commented on June 10, 2024

Is it also possible to add the multilingual options from BERT? (Or to make it possible to indicate which BERT tokenizer, model, and pre-trained weights it should use?)

Update: I found out it is already possible, but the documentation leaves some room for interpretation (namely, that the custom model needs to be already pre-trained). Maybe the following passage could be included so others can see how to use it? @dmmiller612

import transformers
from summarizer import Summarizer

bert_model = "bert-base-multilingual-cased"
custom_model = transformers.BertModel.from_pretrained(bert_model, output_hidden_states=True)
custom_tokenizer = transformers.BertTokenizer.from_pretrained(bert_model)
model = Summarizer(model=bert_model, custom_model=custom_model, custom_tokenizer=custom_tokenizer)

dmmiller612 commented on June 10, 2024

Yep, I can update the documentation.

elmeligy commented on June 10, 2024

I am having the same issue
$ docker run --rm -it -p 5000:5000 summary-service:latest -model bert-base-multilingual-cased
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
100%|################| 40155833/40155833 [00:25<00:00, 1561989.72B/s]
Using Model: bert-base-multilingual-cased
Traceback (most recent call last):
  File "./server.py", line 90, in <module>
    greedyness=float(args.greediness)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 106, in __init__
    super(Summarizer, self).__init__(model, custom_model, custom_tokenizer, hidden, reduce_option, greedyness, language, random_state)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 80, in __init__
    greedyness, language=language, random_state=random_state)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/model_processors.py", line 25, in __init__
    self.model = BertParent(model, custom_model, custom_tokenizer)
  File "/usr/local/lib/python3.6/dist-packages/summarizer/BertParent.py", line 38, in __init__
    self.model = base_model.from_pretrained(model, output_hidden_states=True)
AttributeError: 'NoneType' object has no attribute 'from_pretrained'

dmmiller612 commented on June 10, 2024

Yeah, right now the service doesn't have a good way to load a custom model (it can easily be done with the library). I'll add something to hopefully address the issue sometime this week.

commented on June 10, 2024

There's an ad-hoc solution if it's urgent.

Replace

base_model, base_tokenizer = self.MODELS.get(model, (None, None))

if custom_model:
    self.model = custom_model
else:
    self.model = base_model.from_pretrained(model, output_hidden_states=True)

if custom_tokenizer:
    self.tokenizer = custom_tokenizer
else:
    self.tokenizer = base_tokenizer.from_pretrained(model)

with

base_model, base_tokenizer = self.MODELS.get('bert-large-uncased', (None, None))

if custom_model:
    self.model = base_model.from_pretrained(custom_model, output_hidden_states=True)
else:
    self.model = base_model.from_pretrained(model, output_hidden_states=True)

if custom_tokenizer:
    self.tokenizer = base_tokenizer.from_pretrained(custom_tokenizer)
else:
    self.tokenizer = base_tokenizer.from_pretrained(model)

in this part of bert_parent.py to make it work. Use with caution, since it's not a permanent solution. You can now use Summarizer(custom_model='path_or_model', custom_tokenizer='path_or_model').

nvenkatesh2409 commented on June 10, 2024

> Yeah, right now the service doesn't have a good way to load a custom model (it can easily be done with the library). I'll add something to hopefully address the issue sometime this week.

Hi, any update on loading the custom model?

dmmiller612 commented on June 10, 2024

Closing as stale. Let me know if any issues arise here.
