tingofurro / summac
Codebase, data and models for the SummaC paper in TACL
Home Page: https://arxiv.org/abs/2111.09525
License: Apache License 2.0
Hi, would two identical sentences get a summac_conv score near 1.0?
I am having trouble interpreting the output results.
Hi, I encountered a KeyError when using benchmark.py to load the cogensumm dataset.
The self.get_cnndm_document function always raises a KeyError indicating that the "aid" is not in the loaded CNN/DM dataset. I can't figure out why this happens. Could you please help?
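While waiting for a fix, a defensive lookup can at least report which id is missing. This is a hypothetical sketch, assuming the benchmark keeps CNN/DM documents in a mapping keyed by article id ("aid"); the names `get_document` and `dataset` are illustrative, not the repo's own:

```python
# Hypothetical defensive lookup (not summac's code): report the missing
# article id instead of failing with a bare KeyError.
def get_document(dataset, aid):
    doc = dataset.get(aid)
    if doc is None:
        raise KeyError(
            f"article id {aid!r} not found; "
            "the loaded CNN/DM version may differ from the one the benchmark expects"
        )
    return doc

dataset = {"abc123": "Some article text."}
print(get_document(dataset, "abc123"))
```

A mismatch between CNN/DM dataset versions (different id formats) would show up immediately in the error message.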
I tried multiple python versions but keep running into compatibility issues across the different required libraries:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gradient 2.0.6 requires click<9.0,>=8.0.1, but you have click 7.1.2 which is incompatible.
sentence-transformers 2.2.2 requires huggingface-hub>=0.4.0, but you have huggingface-hub 0.0.12 which is incompatible.
Successfully installed click-7.1.2 datasets-1.7.0 huggingface-hub-0.0.12 nltk-3.6.6 pyarrow-3.0.0 scikit-learn-1.0.2 summac-0.0.3 tokenizers-0.10.3 tqdm-4.49.0 transformers-4.8.1 xlrd-1.2.0
Can you dump a pip freeze somewhere listing all of the Python library versions you are using, along with the Python version? Then I can hopefully install it.
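In the meantime, here is a minimal standard-library-only sketch for capturing the interpreter version and installed package versions in one place (an illustration, not the project's tooling):

```python
# Sketch: record the Python version and installed package versions,
# using only the standard library (no pip invocation needed).
import sys
from importlib import metadata

lines = [f"# python {sys.version.split()[0]}"]
for dist in metadata.distributions():
    name = dist.metadata["Name"]
    if name:  # a few distributions can lack a Name field
        lines.append(f"{name}=={dist.version}")

print(f"{len(lines) - 1} packages recorded")
```

Writing `lines` to a file produces something directly consumable by `pip install -r`.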
Hi,
I encountered an error:
File "/add_score.py", line 53, in add_score
res = function(["? I haven't had a birthday since 2007. I have a b-day in October and it's almost completely ignored."], ["",])
File "/add_score_summac.py", line 28, in <lambda>
"my_summacZS_batched": lambda summs, docs: modelZS.score(docs, summs)['scores'],
File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 351, in score
score = self.score_one(source, gen)
File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 322, in score_one
image = self.imager.build_image(original, generated)
File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 113, in build_image
generated_chunks = self.split_text(generated, granularity=gran_sum)
File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 94, in split_text
return self.split_sentences(text)
File "/usr/local/lib/python3.9/site-packages/summac/model_summac.py", line 71, in split_sentences
sentences = nltk.tokenize.sent_tokenize(text)
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/__init__.py", line 107, in sent_tokenize
return tokenizer.tokenize(text)
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1276, in tokenize
return list(self.sentences_from_text(text, realign_boundaries))
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1332, in sentences_from_text
return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1332, in <listcomp>
return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1322, in span_tokenize
for sentence in slices:
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1421, in _realign_boundaries
for sentence1, sentence2 in _pair_iter(slices):
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 318, in _pair_iter
prev = next(iterator)
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1395, in _slices_from_text
for match, context in self._match_potential_end_contexts(text):
File "/usr/local/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1382, in _match_potential_end_contexts
before_words[match] = split[-1]
IndexError: list index out of range
I think this is caused by the leading "? ", which may produce an empty sentence inside the metric.
Is this expected and documented somewhere, or is it a bug?
kind regards
Edit:
I circumvented (did not fix) this issue for now using this code:
import re

match = re.match(r"(\s*[.?!]+\s)", summaries[i])
if match:
    summaries[i] = summaries[i][len(match.group(1)):]
because empty leading sentences with symbols other than "?" also caused this issue.
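A self-contained version of that workaround (an illustration of the stripping step only, not a fix inside summac itself; `strip_leading_punct` is a name I made up):

```python
import re

def strip_leading_punct(text):
    # Drop a leading punctuation-only "sentence" (e.g. "? ", "!! ")
    # before the text reaches the sentence tokenizer.
    match = re.match(r"(\s*[.?!]+\s)", text)
    if match:
        return text[len(match.group(1)):]
    return text

print(strip_leading_punct("? I haven't had a birthday since 2007."))
# I haven't had a birthday since 2007.
```

Applying this to each summary before scoring avoids triggering the IndexError in the Punkt tokenizer path shown in the traceback.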
Hi,
I am wondering whether this metric runs as time-efficiently as possible on my machine.
I assume running on the GPU is faster than on the CPU.
What would be the best indicator that my machine is fully utilized?
nvidia-smi only reports about 1 GB of additional GPU memory usage during evaluation, but close to 100% GPU-Util.
Does this indicate maximal GPU usage? I am curious because of the low memory usage, even though I processed a batch of documents and summaries, which took a few minutes to finish.
Are there any parameters to tweak performance, like batch sizes etc.?
kind regards
Hi,
I am currently writing a thesis based on the SummaC model, but I found that the author has deleted it. If someone downloaded it, could you send it to me? My school email address is [email protected]
I got negative scores from the SummaCZS model when following the example usage with my data. For example, the scores look like the following:
"scores": {
"for_reference_summary": [
-0.193,
-0.746
],
"for_generated_summary": [
-0.259,
-0.402
]
},
Is it normal to get negative scores? If so, how should I interpret them?
Thanks
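For context, my reading of the paper is that the zero-shot model can combine NLI probabilities by subtracting the contradiction probability from the entailment probability, which makes negative scores possible; treat the exact combination as an assumption. A toy sketch:

```python
# Assumed per-pair combination (not necessarily the default configuration):
# P(entailment) - P(contradiction). When the NLI model judges a pair
# contradictory, the score goes negative.
def pair_score(p_entail, p_contradict):
    return p_entail - p_contradict

print(pair_score(0.10, 0.85))  # strongly contradictory pair -> negative
```

Under that reading, a score near -1 means the NLI model is confident the summary sentence contradicts the source, while a score near +1 means confident entailment.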
Hi,
I'm trying to reproduce your results with SummaC - Main Results.ipynb, but I guess there are some missing packages such as utils_scoring and model_guardrails.
It would be very helpful if you could provide them.
Thank you for your great work!
Hi,
thanks for providing the project as a pip package!
The installation doesn't work on Python 3.10 for me. Has anybody had the same experience?
I can work around the issue by using Python 3.9.
If the issue is confirmed, could you please fix the package or add the Python version requirement to the README?
kind regards.
I was looking at your code and attempting to recreate your results.
If this is how the results quoted in the paper were obtained, it seems a bit strange that you are tuning your threshold on the test set, notwithstanding the fact that the threshold is tuned per dataset in the benchmark (which is mentioned in the paper).
I am unable to run the notebook, and I feel like I am debugging all of the provided files. The code files seem to have bugs in quite a few places.
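To make the concern concrete, here is a generic sketch (not the repo's code; `best_threshold` is an illustrative name) of the standard alternative: select the per-dataset threshold on a validation split only, then freeze it for the test set.

```python
# Sketch: tune a classification threshold on validation data only.
# The chosen threshold is then applied unchanged to the test set.
def best_threshold(val_scores, val_labels):
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(val_scores)):  # each score is a candidate cutoff
        preds = [s >= t for s in val_scores]
        acc = sum(p == bool(l) for p, l in zip(preds, val_labels)) / len(val_labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

val_scores = [0.2, 0.4, 0.6, 0.9]
val_labels = [0, 0, 1, 1]
print(best_threshold(val_scores, val_labels))  # 0.6
```

Tuning on validation rather than test keeps the reported test accuracy an unbiased estimate.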
Thanks for your work, it's great! But I have some trouble running the example code provided in the README.md.
When I ran the code, the package utils_misc was missing, and I cannot find the right package used in model_summac.py. I tried pip install utils_misc, but that is obviously not the required package.
How can I install utils_misc? Or could you provide a requirements.txt file?